All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints
@ 2015-03-25  9:36 Wen Congyang
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 01/13] docs: block replication's description Wen Congyang
                   ` (15 more replies)
  0 siblings, 16 replies; 55+ messages in thread
From: Wen Congyang @ 2015-03-25  9:36 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Stefan Hajnoczi, Yang Hongyang

Block replication is a very important feature which is used for
continuous checkpoints(for example: COLO).

Usage:
Please refer to docs/block-replication.txt

You can get the patch here:
https://github.com/wencongyang/qemu-colo/commits/block-replication-v2

Changs Log:
V2:
1. Redesign the secondary qemu(use image-fleecing)
2. Use Error objects to return error message
3. Address the comments from Max Reitz and Eric Blake

Wen Congyang (13):
  docs: block replication's description
  quorum: allow ignoring child errors
  NBD client: connect to nbd server later
  Add new block driver interfaces to control block replication
  quorum: implement block driver interfaces for block replication
  NBD client: implement block driver interfaces for block replication
  allow writing to the backing file
  Allow creating backup jobs when opening BDS
  block: Parse "backing_reference" option to reference existing BDS
  Backup: clear all bitmap when doing block checkpoint
  qcow2: support colo
  skip nbd_target when starting block replication
  Don't allow a disk use backing reference target

 block.c                    | 242 +++++++++++++++++++++++-
 block/Makefile.objs        |   2 +-
 block/backup.c             |  12 ++
 block/nbd.c                | 171 +++++++++++++++--
 block/qcow2.c              | 447 ++++++++++++++++++++++++++++++++++++++++++++-
 block/qcow2.h              |   6 +
 block/quorum.c             | 143 ++++++++++++++-
 docs/block-replication.txt | 147 +++++++++++++++
 include/block/block.h      |   5 +
 include/block/block_int.h  |  13 ++
 include/qemu/hbitmap.h     |   8 +
 qapi/block.json            |  16 ++
 tests/qemu-iotests/051     |  13 ++
 tests/qemu-iotests/051.out |  13 ++
 util/hbitmap.c             |  19 ++
 15 files changed, 1230 insertions(+), 27 deletions(-)
 create mode 100644 docs/block-replication.txt

-- 
2.1.0

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [Qemu-devel] [RFC PATCH COLO v2 01/13] docs: block replication's description
  2015-03-25  9:36 [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints Wen Congyang
@ 2015-03-25  9:36 ` Wen Congyang
  2015-03-25 15:38   ` [Qemu-devel] [Qemu-block] " Eric Blake
  2015-03-26  6:31   ` [Qemu-devel] " Fam Zheng
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 02/13] quorum: allow ignoring child errors Wen Congyang
                   ` (14 subsequent siblings)
  15 siblings, 2 replies; 55+ messages in thread
From: Wen Congyang @ 2015-03-25  9:36 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 docs/block-replication.txt | 147 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 147 insertions(+)
 create mode 100644 docs/block-replication.txt

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
new file mode 100644
index 0000000..874ed8e
--- /dev/null
+++ b/docs/block-replication.txt
@@ -0,0 +1,147 @@
+Block replication
+----------------------------------------
+Copyright Fujitsu, Corp. 2015
+Copyright (c) 2015 Intel Corporation
+Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+The block replication is used for continuous checkpoints. It is designed
+for COLO that Secondary VM is running. It can also be applied for FT/HA
+scene that Secondary VM is not running.
+
+This document gives an overview of block replication's design.
+
+== Background ==
+High availability solutions such as micro checkpoint and COLO will do
+consecutive checkpoint. The VM state of Primary VM and Secondary VM is
+identical right after a VM checkpoint, but becomes different as the VM
+executes till the next checkpoint. To support disk contents checkpoint,
+the modified disk contents in the Secondary VM must be buffered, and are
+only dropped at next checkpoint time. To reduce the network transportation
+effort at the time of checkpoint, the disk modification operations of
+Primary disk are asynchronously forwarded to the Secondary node.
+
+== Workflow ==
+The following is the image of block replication workflow:
+
+        +----------------------+            +------------------------+
+        |Primary Write Requests|            |Secondary Write Requests|
+        +----------------------+            +------------------------+
+                  |                                       |
+                  |                                      (4)
+                  |                                       V
+                  |                              /-------------\
+                  |      Copy and Forward        |             |
+                  |---------(1)----------+       | Disk Buffer |
+                  |                      |       |             |
+                  |                     (3)      \-------------/
+                  |                 speculative      ^
+                  |                write through    (2)
+                  |                      |           |
+                  V                      V           |
+           +--------------+           +----------------+
+           | Primary Disk |           | Secondary Disk |
+           +--------------+           +----------------+
+
+    1) Primary write requests will be copied and forwarded to Secondary
+       QEMU.
+    2) Before Primary write requests are written to Secondary disk, the
+       original sector content will be read from Secondary disk and
+       buffered in the Disk buffer, but it will not overwrite the existing
+       sector content in the Disk buffer.
+    3) Primary write requests will be written to Secondary disk.
+    4) Secondary write requests will be buffered in the Disk buffer and it
+       will overwrite the existing sector content in the buffer.
+
+== Architecture ==
+We are going to implement COLO block replication from many basic
+blocks that are already in QEMU.
+
+         virtio-blk       ||
+             ^            ||                            .----------
+             |            ||                            | Secondary
+        1 Quorum          ||                            '----------
+         /      \         ||
+        /        \        ||
+   Primary      2 NBD  ------->  2 NBD
+     disk       client    ||     server                                         virtio-blk
+                          ||        ^                                                ^
+--------.                 ||        |                                                |
+Primary |                 ||  Secondary disk <--------- hidden-disk 4 <--------- active-disk 3
+--------'                 ||        |          backing        ^       backing
+                          ||        |                         |
+                          ||        |                         |
+                          ||        '-------------------------'
+                          ||           drive-backup sync=none
+
+1) The disk on the primary is represented by a block device with two
+children, providing replication between a primary disk and the host that
+runs the secondary VM. The read pattern for quorum can be extended to
+make the primary always read from the local disk instead of going through
+NBD.
+
+2) The secondary disk receives writes from the primary VM through QEMU's
+embedded NBD server (speculative write-through).
+
+3) The disk on the secondary is represented by a custom block device
+(called active-disk). It should be an empty disk, and the format should
+be qcow2.
+
+4) The hidden-disk is created automatically. It buffers the original content
+that is modified by the primary VM. It should also be an empty disk, and
+the dirver supports bdrv_make_empty().
+
+== New block driver interface ==
+We add three block driver interfaces to control block replication:
+a. bdrv_start_replication()
+   Start block replication, called in migration/checkpoint thread.
+   We must call bdrv_start_replication() in secondary QEMU before
+   calling bdrv_start_replication() in primary QEMU.
+b. bdrv_do_checkpoint()
+   This interface is called after all VM state is transfered to
+   Secondary QEMU. The Disk buffer will be dropped in this interface.
+   The caller must hold the I/O mutex lock if it is in migration/checkpoint
+   thread.
+c. bdrv_stop_replication()
+   It is called when failover. We will flush the Disk buffer into
+   Secondary Disk and stop block replication. The vm should be stopped
+   before calling it. The caller must hold the I/O mutex lock if it is
+   in migration/checkpoint thread.
+
+== Usage ==
+Primary:
+  -drive if=xxx,driver=quorum,read-pattern=fifo,\
+         children.0.file.filename=1.raw,\
+         children.0.driver=raw,\
+         children.1.file.driver=nbd+colo,\
+         children.1.file.host=xxx,\
+         children.1.file.port=xxx,\
+         children.1.file.export=xxx,\
+         children.1.driver=raw,\
+         children.1.ignore-errors=on
+  Note:
+  1. NBD Client should not be the first child of quorum.
+  2. There should be only one NBD Client.
+  3. host is the secondary physical machine's hostname or IP
+  4. Each disk must have its own export name.
+
+Secondary:
+  -drive if=none,driver=raw,file=1.raw,id=nbd_target1 \
+  -drive if=xxx,driver=qcow2+colo,file=active_disk.qcow2,export=xxx,\
+         backing_reference.drive_id=nbd_target1,\
+         backing_reference.hidden-disk.file.filename=hidden_disk.qcow2,\
+         backing_reference.hidden-disk.driver=qcow2,\
+         backing_reference.hidden-disk.allow-write-backing-file=on
+  Then run qmp command:
+    nbd_server_start host:port
+  Note:
+  1. The export name for the same disk must be the same in primary
+     and secondary QEMU command line
+  2. The qmp command nbd_server_start must be run before running the
+     qmp command migrate on primary QEMU
+  3. Don't use nbd_server_start's other options
+  4. Active disk, hidden disk and nbd target's length should be the
+     same.
+  5. It is better to put active disk and hidden disk in ramdisk.
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [RFC PATCH COLO v2 02/13] quorum: allow ignoring child errors
  2015-03-25  9:36 [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints Wen Congyang
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 01/13] docs: block replication's description Wen Congyang
@ 2015-03-25  9:36 ` Wen Congyang
  2015-03-25 12:45   ` Paolo Bonzini
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 03/13] NBD client: connect to nbd server later Wen Congyang
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Wen Congyang @ 2015-03-25  9:36 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

If the child is not ready, read/write/getlength/flush will
return -errno. It is not critical error, and can be ignored:
1. read/write:
   Just not report the error event.
2. getlength:
   just ignore it. If all children's getlength return -errno,
   and be ignored, return -EIO.
3. flush:
   Just ignore it. If all children's getlength return -errno,
   and be ignored, return 0.

Usage: children.x.ignore-errors=true

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block/quorum.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 57 insertions(+), 7 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 437b122..d087135 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -30,6 +30,7 @@
 #define QUORUM_OPT_BLKVERIFY      "blkverify"
 #define QUORUM_OPT_REWRITE        "rewrite-corrupted"
 #define QUORUM_OPT_READ_PATTERN   "read-pattern"
+#define QUORUM_CHILDREN_OPT_IGNORE_ERRORS   "ignore-errors"
 
 /* This union holds a vote hash value */
 typedef union QuorumVoteValue {
@@ -65,6 +66,7 @@ typedef struct QuorumVotes {
 /* the following structure holds the state of one quorum instance */
 typedef struct BDRVQuorumState {
     BlockDriverState **bs; /* children BlockDriverStates */
+    bool *ignore_errors;   /* ignore children's error? */
     int num_children;      /* children count */
     int threshold;         /* if less than threshold children reads gave the
                             * same result a quorum error occurs.
@@ -97,6 +99,7 @@ typedef struct QuorumChildRequest {
     uint8_t *buf;
     int ret;
     QuorumAIOCB *parent;
+    int index;
 } QuorumChildRequest;
 
 /* Quorum will use the following structure to track progress of each read/write
@@ -209,6 +212,7 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
         acb->qcrs[i].buf = NULL;
         acb->qcrs[i].ret = 0;
         acb->qcrs[i].parent = acb;
+        acb->qcrs[i].index = i;
     }
 
     return acb;
@@ -305,7 +309,7 @@ static void quorum_aio_cb(void *opaque, int ret)
     acb->count++;
     if (ret == 0) {
         acb->success_count++;
-    } else {
+    } else if (!s->ignore_errors[sacb->index]) {
         quorum_report_bad(acb, sacb->aiocb->bs->node_name, ret);
     }
     assert(acb->count <= s->num_children);
@@ -720,19 +724,31 @@ static BlockAIOCB *quorum_aio_writev(BlockDriverState *bs,
 static int64_t quorum_getlength(BlockDriverState *bs)
 {
     BDRVQuorumState *s = bs->opaque;
-    int64_t result;
+    int64_t result = -EIO;
     int i;
 
     /* check that all file have the same length */
-    result = bdrv_getlength(s->bs[0]);
-    if (result < 0) {
-        return result;
-    }
-    for (i = 1; i < s->num_children; i++) {
+    for (i = 0; i < s->num_children; i++) {
         int64_t value = bdrv_getlength(s->bs[i]);
+
         if (value < 0) {
             return value;
         }
+
+        if (value == 0 && s->ignore_errors[i]) {
+            /*
+             * If the child is not ready, it cannot return -errno,
+             * otherwise refresh_total_sectors() will fail when
+             * we open the child.
+             */
+            continue;
+        }
+
+        if (result == -EIO) {
+            result = value;
+            continue;
+        }
+
         if (value != result) {
             return -EIO;
         }
@@ -770,6 +786,9 @@ static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
 
     for (i = 0; i < s->num_children; i++) {
         result = bdrv_co_flush(s->bs[i]);
+        if (result < 0 && s->ignore_errors[i]) {
+            result = 0;
+        }
         result_value.l = result;
         quorum_count_vote(&error_votes, &result_value, i);
     }
@@ -844,6 +863,19 @@ static QemuOptsList quorum_runtime_opts = {
     },
 };
 
+static QemuOptsList quorum_children_common_opts = {
+    .name = "qcow2 children",
+    .head = QTAILQ_HEAD_INITIALIZER(quorum_children_common_opts.head),
+    .desc = {
+        {
+            .name = QUORUM_CHILDREN_OPT_IGNORE_ERRORS,
+            .type = QEMU_OPT_BOOL,
+            .help = "ignore child I/O error",
+        },
+        { /* end of list */ }
+    },
+};
+
 static int parse_read_pattern(const char *opt)
 {
     int i;
@@ -939,11 +971,14 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
     /* allocate the children BlockDriverState array */
     s->bs = g_new0(BlockDriverState *, s->num_children);
     opened = g_new0(bool, s->num_children);
+    s->ignore_errors = g_new0(bool, s->num_children);
 
     for (i = 0, lentry = qlist_first(list); lentry;
          lentry = qlist_next(lentry), i++) {
         QDict *d;
         QString *string;
+        QemuOpts *children_opts = NULL;
+        bool value;
 
         switch (qobject_type(lentry->value))
         {
@@ -951,6 +986,20 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
             case QTYPE_QDICT:
                 d = qobject_to_qdict(lentry->value);
                 QINCREF(d);
+
+                children_opts = qemu_opts_create(&quorum_children_common_opts,
+                                                 NULL, 0, &error_abort);
+                qemu_opts_absorb_qdict(children_opts, d, &local_err);
+                if (local_err) {
+                    ret = -EINVAL;
+                    qemu_opts_del(children_opts);
+                    goto close_exit;
+                }
+                value = qemu_opt_get_bool(children_opts,
+                                          QUORUM_CHILDREN_OPT_IGNORE_ERRORS,
+                                          false);
+                s->ignore_errors[i]  = value;
+                qemu_opts_del(children_opts);
                 ret = bdrv_open(&s->bs[i], NULL, NULL, d, flags, NULL,
                                 &local_err);
                 break;
@@ -1008,6 +1057,7 @@ static void quorum_close(BlockDriverState *bs)
     }
 
     g_free(s->bs);
+    g_free(s->ignore_errors);
 }
 
 static void quorum_detach_aio_context(BlockDriverState *bs)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [RFC PATCH COLO v2 03/13] NBD client: connect to nbd server later
  2015-03-25  9:36 [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints Wen Congyang
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 01/13] docs: block replication's description Wen Congyang
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 02/13] quorum: allow ignoring child errors Wen Congyang
@ 2015-03-25  9:36 ` Wen Congyang
  2015-03-25 12:46   ` Paolo Bonzini
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 04/13] Add new block driver interfaces to control block replication Wen Congyang
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Wen Congyang @ 2015-03-25  9:36 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

The secondary qemu starts later than the primary qemu, so we
cannot connect to nbd server in bdrv_open().

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block/nbd.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 108 insertions(+), 14 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 2176186..3faf865 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -44,6 +44,8 @@
 typedef struct BDRVNBDState {
     NbdClientSession client;
     QemuOpts *socket_opts;
+    char *export;
+    bool connected;
 } BDRVNBDState;
 
 static int nbd_parse_uri(const char *filename, QDict *options)
@@ -254,50 +256,95 @@ static int nbd_establish_connection(BlockDriverState *bs, Error **errp)
     return sock;
 }
 
-static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
-                    Error **errp)
+static int nbd_connect_server(BlockDriverState *bs, Error **errp)
 {
     BDRVNBDState *s = bs->opaque;
-    char *export = NULL;
     int result, sock;
-    Error *local_err = NULL;
-
-    /* Pop the config into our state object. Exit if invalid. */
-    nbd_config(s, options, &export, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        return -EINVAL;
-    }
 
     /* establish TCP connection, return error if it fails
      * TODO: Configurable retry-until-timeout behaviour.
      */
     sock = nbd_establish_connection(bs, errp);
     if (sock < 0) {
-        g_free(export);
+        g_free(s->export);
         return sock;
     }
 
     /* NBD handshake */
-    result = nbd_client_init(bs, sock, export, errp);
-    g_free(export);
+    result = nbd_client_init(bs, sock, s->export, errp);
+    g_free(s->export);
+    s->export = NULL;
+    if (!result) {
+        s->connected = true;
+    }
+
     return result;
 }
 
+static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
+                    Error **errp)
+{
+    BDRVNBDState *s = bs->opaque;
+    Error *local_err = NULL;
+
+    /* Pop the config into our state object. Exit if invalid. */
+    nbd_config(s, options, &s->export, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return -EINVAL;
+    }
+
+    return nbd_connect_server(bs, errp);
+}
+
+static int nbd_open_colo(BlockDriverState *bs, QDict *options, int flags,
+                         Error **errp)
+{
+    BDRVNBDState *s = bs->opaque;
+    Error *local_err = NULL;
+
+    /* Pop the config into our state object. Exit if invalid. */
+    nbd_config(s, options, &s->export, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
 static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
                         int nb_sectors, QEMUIOVector *qiov)
 {
+    BDRVNBDState *s = bs->opaque;
+
+    if (!s->connected) {
+        return -EIO;
+    }
+
     return nbd_client_co_readv(bs, sector_num, nb_sectors, qiov);
 }
 
 static int nbd_co_writev(BlockDriverState *bs, int64_t sector_num,
                          int nb_sectors, QEMUIOVector *qiov)
 {
+    BDRVNBDState *s = bs->opaque;
+
+    if (!s->connected) {
+        return -EIO;
+    }
+
     return nbd_client_co_writev(bs, sector_num, nb_sectors, qiov);
 }
 
 static int nbd_co_flush(BlockDriverState *bs)
 {
+    BDRVNBDState *s = bs->opaque;
+
+    if (!s->connected) {
+        return -EIO;
+    }
+
     return nbd_client_co_flush(bs);
 }
 
@@ -310,6 +357,12 @@ static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
 static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num,
                           int nb_sectors)
 {
+    BDRVNBDState *s = bs->opaque;
+
+    if (!s->connected) {
+        return -EIO;
+    }
+
     return nbd_client_co_discard(bs, sector_num, nb_sectors);
 }
 
@@ -319,23 +372,44 @@ static void nbd_close(BlockDriverState *bs)
 
     qemu_opts_del(s->socket_opts);
     nbd_client_close(bs);
+    s->connected = false;
 }
 
 static int64_t nbd_getlength(BlockDriverState *bs)
 {
     BDRVNBDState *s = bs->opaque;
 
+    if (!s->connected) {
+        /*
+         * We cannot return -ENOTCONN, otherwise refresh_total_sectors()
+         * will fail, and we cannot open nbd client.
+         */
+        return 0;
+    }
+
     return s->client.size;
 }
 
 static void nbd_detach_aio_context(BlockDriverState *bs)
 {
+    BDRVNBDState *s = bs->opaque;
+
+    if (!s->connected) {
+        return;
+    }
+
     nbd_client_detach_aio_context(bs);
 }
 
 static void nbd_attach_aio_context(BlockDriverState *bs,
                                    AioContext *new_context)
 {
+    BDRVNBDState *s = bs->opaque;
+
+    if (!s->connected) {
+        return;
+    }
+
     nbd_client_attach_aio_context(bs, new_context);
 }
 
@@ -438,11 +512,31 @@ static BlockDriver bdrv_nbd_unix = {
     .bdrv_refresh_filename      = nbd_refresh_filename,
 };
 
+static BlockDriver bdrv_nbd_colo = {
+    .format_name                = "nbd+colo",
+    .protocol_name              = "nbd+colo",
+    .instance_size              = sizeof(BDRVNBDState),
+    .bdrv_parse_filename        = nbd_parse_filename,
+    .bdrv_file_open             = nbd_open_colo,
+    .bdrv_co_readv              = nbd_co_readv,
+    .bdrv_co_writev             = nbd_co_writev,
+    .bdrv_close                 = nbd_close,
+    .bdrv_co_flush_to_os        = nbd_co_flush,
+    .bdrv_co_discard            = nbd_co_discard,
+    .bdrv_getlength             = nbd_getlength,
+    .bdrv_detach_aio_context    = nbd_detach_aio_context,
+    .bdrv_attach_aio_context    = nbd_attach_aio_context,
+    .bdrv_refresh_filename      = nbd_refresh_filename,
+
+    .has_variable_length        = true,
+};
+
 static void bdrv_nbd_init(void)
 {
     bdrv_register(&bdrv_nbd);
     bdrv_register(&bdrv_nbd_tcp);
     bdrv_register(&bdrv_nbd_unix);
+    bdrv_register(&bdrv_nbd_colo);
 }
 
 block_init(bdrv_nbd_init);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [RFC PATCH COLO v2 04/13] Add new block driver interfaces to control block replication
  2015-03-25  9:36 [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints Wen Congyang
                   ` (2 preceding siblings ...)
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 03/13] NBD client: connect to nbd server later Wen Congyang
@ 2015-03-25  9:36 ` Wen Congyang
  2015-03-25 12:48   ` Paolo Bonzini
  2015-03-26  7:12   ` Fam Zheng
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 05/13] quorum: implement block driver interfaces for " Wen Congyang
                   ` (11 subsequent siblings)
  15 siblings, 2 replies; 55+ messages in thread
From: Wen Congyang @ 2015-03-25  9:36 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Luiz Capitulino, Gonglei,
	Stefan Hajnoczi, Yang Hongyang, Michael Roth, zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 block.c                   | 39 +++++++++++++++++++++++++++++++++++++++
 include/block/block.h     |  4 ++++
 include/block/block_int.h | 11 +++++++++++
 qapi/block.json           | 16 ++++++++++++++++
 4 files changed, 70 insertions(+)

diff --git a/block.c b/block.c
index 0fe97de..0ff5cf8 100644
--- a/block.c
+++ b/block.c
@@ -6196,3 +6196,42 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
 {
     return &bs->stats;
 }
+
+void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error **errp)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (drv && drv->bdrv_start_replication) {
+        drv->bdrv_start_replication(bs, mode, errp);
+    } else if (bs->file) {
+        bdrv_start_replication(bs->file, mode, errp);
+    } else {
+        error_set(errp, QERR_UNSUPPORTED);
+    }
+}
+
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (drv && drv->bdrv_do_checkpoint) {
+        drv->bdrv_do_checkpoint(bs, errp);
+    } else if (bs->file) {
+        bdrv_do_checkpoint(bs->file, errp);
+    } else {
+        error_set(errp, QERR_UNSUPPORTED);
+    }
+}
+
+void bdrv_stop_replication(BlockDriverState *bs, Error **errp)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (drv && drv->bdrv_stop_replication) {
+        drv->bdrv_stop_replication(bs, errp);
+    } else if (bs->file) {
+        bdrv_stop_replication(bs->file, errp);
+    } else {
+        error_set(errp, QERR_UNSUPPORTED);
+    }
+}
diff --git a/include/block/block.h b/include/block/block.h
index 4c57d63..68f3b1a 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -569,4 +569,8 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
 
 BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
 
+void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error **errp);
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
+void bdrv_stop_replication(BlockDriverState *bs, Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index dccb092..08dd8ba 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -290,6 +290,17 @@ struct BlockDriver {
      */
     int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
 
+
+    void (*bdrv_start_replication)(BlockDriverState *bs, COLOMode mode,
+                                   Error **errp);
+    /* Drop Disk buffer when doing checkpoint. */
+    void (*bdrv_do_checkpoint)(BlockDriverState *bs, Error **errp);
+    /*
+     * After failover, we should flush Disk buffer into secondary disk
+     * and stop block replication.
+     */
+    void (*bdrv_stop_replication)(BlockDriverState *bs, Error **errp);
+
     QLIST_ENTRY(BlockDriver) list;
 };
 
diff --git a/qapi/block.json b/qapi/block.json
index e313465..e640566 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -40,6 +40,22 @@
   'data': ['auto', 'none', 'lba', 'large', 'rechs']}
 
 ##
+# @COLOMode
+#
+# An enumeration of COLO mode.
+#
+# @unprotected: COLO is not started or after failover
+#
+# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
+#
+# @secondary: Secondary mode, receive the vm's state from primary QEMU.
+#
+# Since: 2.4
+##
+{ 'enum' : 'COLOMode',
+  'data' : ['unprotected', 'primary', 'secondary']}
+
+##
 # @BlockdevSnapshotInternal
 #
 # @device: the name of the device to generate the snapshot from
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [RFC PATCH COLO v2 05/13] quorum: implement block driver interfaces for block replication
  2015-03-25  9:36 [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints Wen Congyang
                   ` (3 preceding siblings ...)
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 04/13] Add new block driver interfaces to control block replication Wen Congyang
@ 2015-03-25  9:36 ` Wen Congyang
  2015-03-25 12:50   ` Paolo Bonzini
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 06/13] NBD client: " Wen Congyang
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Wen Congyang @ 2015-03-25  9:36 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block/quorum.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index d087135..eee829d 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -84,6 +84,8 @@ typedef struct BDRVQuorumState {
                             */
 
     QuorumReadPattern read_pattern;
+
+    int colo_index; /* store which child supports block replication */
 } BDRVQuorumState;
 
 typedef struct QuorumAIOCB QuorumAIOCB;
@@ -1024,6 +1026,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     g_free(opened);
+    s->colo_index = -1;
     goto exit;
 
 close_exit:
@@ -1114,6 +1117,78 @@ static void quorum_refresh_filename(BlockDriverState *bs)
     bs->full_open_options = opts;
 }
 
+static void quorum_stop_replication(BlockDriverState *bs, Error **errp);
+static void quorum_start_replication(BlockDriverState *bs, COLOMode mode,
+                                     Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int count = 0, i, index;
+    Error *local_err = NULL;
+
+    /*
+     * TODO: support COLO_SECONDARY_MODE if we allow secondary
+     * QEMU becoming primary QEMU.
+     */
+    if (mode != COLO_MODE_PRIMARY) {
+        error_set(errp, QERR_INVALID_PARAMETER, "mode");
+        return;
+    }
+
+    if (s->read_pattern != QUORUM_READ_PATTERN_FIFO) {
+        error_set(errp, QERR_INVALID_PARAMETER, "read pattern");
+        return;
+    }
+
+    for (i = 0; i < s->num_children; i++) {
+        bdrv_start_replication(s->bs[i], mode, &local_err);
+        if (local_err) {
+            error_free(local_err);
+            local_err = NULL;
+        } else {
+            count++;
+            index = i;
+        }
+    }
+
+    if (count == 0) {
+        /* No child supports block replication */
+        error_set(errp, QERR_UNSUPPORTED);
+    } else if (count > 1) {
+        quorum_stop_replication(bs, NULL);
+        error_setg(errp, "too many children support block replication");
+    } else {
+        s->colo_index = index;
+    }
+}
+
+static void quorum_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+
+    if (s->colo_index < 0) {
+        error_setg(errp, "Block replication is not started");
+        return;
+    }
+
+    bdrv_do_checkpoint(s->bs[s->colo_index], errp);
+}
+
+static void quorum_stop_replication(BlockDriverState *bs, Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i;
+
+    if (s->colo_index >= 0) {
+        bdrv_stop_replication(s->bs[s->colo_index], errp);
+        s->colo_index = -1;
+        return;
+    }
+
+    for (i = 0; i < s->num_children; i++) {
+        bdrv_stop_replication(s->bs[i], NULL);
+    }
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name                        = "quorum",
     .protocol_name                      = "quorum",
@@ -1137,6 +1212,10 @@ static BlockDriver bdrv_quorum = {
 
     .is_filter                          = true,
     .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
+
+    .bdrv_start_replication             = quorum_start_replication,
+    .bdrv_do_checkpoint                 = quorum_do_checkpoint,
+    .bdrv_stop_replication              = quorum_stop_replication,
 };
 
 static void bdrv_quorum_init(void)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [RFC PATCH COLO v2 06/13] NBD client: implement block driver interfaces for block replication
  2015-03-25  9:36 [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints Wen Congyang
                   ` (4 preceding siblings ...)
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 05/13] quorum: implement block driver interfaces for " Wen Congyang
@ 2015-03-25  9:36 ` Wen Congyang
  2015-03-25 12:50   ` Paolo Bonzini
  2015-03-26  7:21   ` Fam Zheng
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 07/13] allow writing to the backing file Wen Congyang
                   ` (9 subsequent siblings)
  15 siblings, 2 replies; 55+ messages in thread
From: Wen Congyang @ 2015-03-25  9:36 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block/nbd.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 3faf865..753b322 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -458,6 +458,52 @@ static void nbd_refresh_filename(BlockDriverState *bs)
     bs->full_open_options = opts;
 }
 
+static void nbd_start_replication(BlockDriverState *bs, COLOMode mode,
+                                  Error **errp)
+{
+    BDRVNBDState *s = bs->opaque;
+
+    /*
+     * TODO: support COLO_SECONDARY_MODE if we allow secondary
+     * QEMU becoming primary QEMU.
+     */
+    if (mode != COLO_MODE_PRIMARY) {
+        error_set(errp, QERR_INVALID_PARAMETER, "mode");
+        return;
+    }
+
+    if (s->connected) {
+        error_setg(errp, "The connection is established");
+        return;
+    }
+
+    /* TODO: NBD client should be one child of quorum, how to verify it? */
+    nbd_connect_server(bs, errp);
+}
+
+static void nbd_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+    BDRVNBDState *s = bs->opaque;
+
+    if (!s->connected) {
+        error_setg(errp, "The connection is not established");
+        return;
+    }
+}
+
+static void nbd_stop_replication(BlockDriverState *bs, Error **errp)
+{
+    BDRVNBDState *s = bs->opaque;
+
+    if (!s->connected) {
+        error_setg(errp, "The connection is not established");
+        return;
+    }
+
+    nbd_client_close(bs);
+    s->connected = false;
+}
+
 static BlockDriver bdrv_nbd = {
     .format_name                = "nbd",
     .protocol_name              = "nbd",
@@ -527,6 +573,9 @@ static BlockDriver bdrv_nbd_colo = {
     .bdrv_detach_aio_context    = nbd_detach_aio_context,
     .bdrv_attach_aio_context    = nbd_attach_aio_context,
     .bdrv_refresh_filename      = nbd_refresh_filename,
+    .bdrv_start_replication     = nbd_start_replication,
+    .bdrv_do_checkpoint         = nbd_do_checkpoint,
+    .bdrv_stop_replication      = nbd_stop_replication,
 
     .has_variable_length        = true,
 };
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [RFC PATCH COLO v2 07/13] allow writing to the backing file
  2015-03-25  9:36 [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints Wen Congyang
                   ` (5 preceding siblings ...)
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 06/13] NBD client: " Wen Congyang
@ 2015-03-25  9:36 ` Wen Congyang
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 08/13] Allow creating backup jobs when opening BDS Wen Congyang
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 55+ messages in thread
From: Wen Congyang @ 2015-03-25  9:36 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 0ff5cf8..b4d629e 100644
--- a/block.c
+++ b/block.c
@@ -1247,6 +1247,20 @@ out:
     bdrv_refresh_limits(bs, NULL);
 }
 
+#define ALLOW_WRITE_BACKING_FILE    "allow-write-backing-file"
+static QemuOptsList backing_file_opts = {
+    .name = "backing_file",
+    .head = QTAILQ_HEAD_INITIALIZER(backing_file_opts.head),
+    .desc = {
+        {
+            .name = ALLOW_WRITE_BACKING_FILE,
+            .type = QEMU_OPT_BOOL,
+            .help = "allow write to backing file",
+        },
+        { /* end of list */ }
+    },
+};
+
 /*
  * Opens the backing file for a BlockDriverState if not yet open
  *
@@ -1261,6 +1275,8 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
     int ret = 0;
     BlockDriverState *backing_hd;
     Error *local_err = NULL;
+    QemuOpts *opts = NULL;
+    int flags;
 
     if (bs->backing_hd != NULL) {
         QDECREF(options);
@@ -1273,6 +1289,19 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
     }
 
     bs->open_flags &= ~BDRV_O_NO_BACKING;
+    flags = bdrv_backing_flags(bs->open_flags);
+
+    opts = qemu_opts_create(&backing_file_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (local_err) {
+        ret = -EINVAL;
+        goto free_exit;
+    }
+    if (qemu_opt_get_bool(opts, ALLOW_WRITE_BACKING_FILE, false)) {
+        flags |= BDRV_O_RDWR;
+    }
+    qemu_opts_del(opts);
+
     if (qdict_haskey(options, "file.filename")) {
         backing_filename[0] = '\0';
     } else if (bs->backing_file[0] == '\0' && qdict_size(options) == 0) {
@@ -1305,7 +1334,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
     assert(bs->backing_hd == NULL);
     ret = bdrv_open(&backing_hd,
                     *backing_filename ? backing_filename : NULL, NULL, options,
-                    bdrv_backing_flags(bs->open_flags), NULL, &local_err);
+                    flags, NULL, &local_err);
     if (ret < 0) {
         bdrv_unref(backing_hd);
         backing_hd = NULL;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [RFC PATCH COLO v2 08/13] Allow creating backup jobs when opening BDS
  2015-03-25  9:36 [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints Wen Congyang
                   ` (6 preceding siblings ...)
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 07/13] allow writing to the backing file Wen Congyang
@ 2015-03-25  9:36 ` Wen Congyang
  2015-03-26  7:07   ` Fam Zheng
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 09/13] block: Parse "backing_reference" option to reference existing BDS Wen Congyang
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Wen Congyang @ 2015-03-25  9:36 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jeff Cody, Jiang Yunhong,
	Dong Eddie, Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi,
	Yang Hongyang, zhanghailiang

When opening BDS, we need to create backup jobs for
image-fleecing.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Cc: Jeff Cody <jcody@redhat.com>
---
 block/Makefile.objs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index db2933e..0950973 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -21,10 +21,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
+block-obj-y += backup.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
-common-obj-y += backup.o
 
 iscsi.o-cflags     := $(LIBISCSI_CFLAGS)
 iscsi.o-libs       := $(LIBISCSI_LIBS)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [RFC PATCH COLO v2 09/13] block: Parse "backing_reference" option to reference existing BDS
  2015-03-25  9:36 [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints Wen Congyang
                   ` (7 preceding siblings ...)
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 08/13] Allow creating backup jobs when opening BDS Wen Congyang
@ 2015-03-25  9:36 ` Wen Congyang
  2015-03-26  7:31   ` Fam Zheng
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 10/13] Backup: clear all bitmap when doing block checkpoint Wen Congyang
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Wen Congyang @ 2015-03-25  9:36 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

Usage:
-drive file=xxx,id=Y, \
-drive file=xxxx,id=X,backing_reference.drive_id=Y,backing_reference.hidden-disk.*

It will create such backing chain:
               {virtio-blk dev 'Y'}                                      {virtio-blk dev 'X'}
                         |                                                          |
                         |                                                          |
                         v                                                          v

    [base] <- [mid] <- ( Y )  <----------------- (hidden target) <--------------- ( X )

                         v                              ^
                         v                              ^
                         v                              ^
                         v                              ^
                         >>>> drive-backup sync=none >>>>

X's backing file is hidden-disk, and hidden-disk's backing file is Y.
Disk Y may be opened or reopened in read-write mode, so A block backup
job is automatically created: source is Y and target is hidden-disk.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block.c                    | 145 ++++++++++++++++++++++++++++++++++++++++++++-
 include/block/block.h      |   1 +
 include/block/block_int.h  |   1 +
 tests/qemu-iotests/051     |  13 ++++
 tests/qemu-iotests/051.out |  13 ++++
 5 files changed, 170 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index b4d629e..bd7fa9c 100644
--- a/block.c
+++ b/block.c
@@ -1351,6 +1351,113 @@ free_exit:
     return ret;
 }
 
+static void backing_reference_completed(void *opaque, int ret)
+{
+    BlockDriverState *hidden_disk = opaque;
+
+    assert(!hidden_disk->backing_reference);
+}
+
+static int bdrv_open_backing_reference_file(BlockDriverState *bs,
+                                            QDict *options, Error **errp)
+{
+    const char *backing_name;
+    QDict *hidden_disk_options = NULL;
+    BlockDriverState *backing_hd, *hidden_disk;
+    BlockBackend *backing_blk;
+    Error *local_err = NULL;
+    int ret = 0;
+
+    backing_name = qdict_get_try_str(options, "drive_id");
+    if (!backing_name) {
+        error_setg(errp, "Backing reference needs option drive_id");
+        ret = -EINVAL;
+        goto free_exit;
+    }
+    qdict_del(options, "drive_id");
+
+    qdict_extract_subqdict(options, &hidden_disk_options, "hidden-disk.");
+    if (!qdict_size(hidden_disk_options)) {
+        error_setg(errp, "Backing reference needs option hidden-disk.*");
+        ret = -EINVAL;
+        goto free_exit;
+    }
+
+    if (qdict_size(options)) {
+        const QDictEntry *entry = qdict_first(options);
+        error_setg(errp, "Backing reference used by '%s' doesn't support "
+                   "the option '%s'", bdrv_get_device_name(bs), entry->key);
+        ret = -EINVAL;
+        goto free_exit;
+    }
+
+    backing_blk = blk_by_name(backing_name);
+    if (!backing_blk) {
+        error_set(errp, QERR_DEVICE_NOT_FOUND, backing_name);
+        ret = -ENOENT;
+        goto free_exit;
+    }
+
+    backing_hd = blk_bs(backing_blk);
+    /* Backing reference itself? */
+    if (backing_hd == bs || bdrv_find_overlay(backing_hd, bs)) {
+        error_setg(errp, "Backing reference itself");
+        ret = -EINVAL;
+        goto free_exit;
+    }
+
+    if (bdrv_op_is_blocked(backing_hd, BLOCK_OP_TYPE_BACKING_REFERENCE,
+                           errp)) {
+        ret = -EBUSY;
+        goto free_exit;
+    }
+
+    /* hidden-disk is bs's backing file */
+    ret = bdrv_open_backing_file(bs, hidden_disk_options, errp);
+    hidden_disk_options = NULL;
+    if (ret < 0) {
+        goto free_exit;
+    }
+
+    hidden_disk = bs->backing_hd;
+    if (!hidden_disk->drv || !hidden_disk->drv->supports_backing) {
+        ret = -EINVAL;
+        error_setg(errp, "Hidden disk's driver doesn't support backing files");
+        goto free_exit;
+    }
+
+    bdrv_set_backing_hd(hidden_disk, backing_hd);
+    bdrv_ref(backing_hd);
+
+    /*
+     * backing hd may be opened or reopened in read-write mode, so we
+     * should backup backing hd to hidden disk
+     */
+    bdrv_op_unblock(hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
+                    bs->backing_blocker);
+    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
+                    hidden_disk->backing_blocker);
+
+    bdrv_ref(hidden_disk);
+    backup_start(backing_hd, hidden_disk, 0, MIRROR_SYNC_MODE_NONE,
+                 BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
+                 backing_reference_completed, hidden_disk, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        bdrv_unref(hidden_disk);
+        /* FIXME, use which errno? */
+        ret = -EIO;
+        goto free_exit;
+    }
+
+    bs->backing_reference = true;
+
+free_exit:
+    QDECREF(hidden_disk_options);
+    QDECREF(options);
+    return ret;
+}
+
 /*
  * Opens a disk image whose options are given as BlockdevRef in another block
  * device's options.
@@ -1604,13 +1711,37 @@ int bdrv_open(BlockDriverState **pbs, const char *filename,
 
     /* If there is a backing file, use it */
     if ((flags & BDRV_O_NO_BACKING) == 0) {
-        QDict *backing_options;
+        QDict *backing_options, *backing_reference_options;
 
+        qdict_extract_subqdict(options, &backing_reference_options,
+                               "backing_reference.");
         qdict_extract_subqdict(options, &backing_options, "backing.");
-        ret = bdrv_open_backing_file(bs, backing_options, &local_err);
-        if (ret < 0) {
+
+        if (qdict_size(backing_reference_options) &&
+            qdict_size(backing_options)) {
+            error_setg(&local_err,
+                       "Option \"backing_reference.*\" and \"backing.*\""
+                       " cannot be used together");
+            ret = -EINVAL;
+            QDECREF(backing_reference_options);
+            QDECREF(backing_options);
             goto close_and_fail;
         }
+        if (qdict_size(backing_reference_options)) {
+            QDECREF(backing_options);
+            ret = bdrv_open_backing_reference_file(bs,
+                                                   backing_reference_options,
+                                                   &local_err);
+            if (ret) {
+                goto close_and_fail;
+            }
+        } else {
+            QDECREF(backing_reference_options);
+            ret = bdrv_open_backing_file(bs, backing_options, &local_err);
+            if (ret < 0) {
+                goto close_and_fail;
+            }
+        }
     }
 
     bdrv_refresh_filename(bs);
@@ -1941,6 +2072,14 @@ void bdrv_close(BlockDriverState *bs)
     if (bs->drv) {
         if (bs->backing_hd) {
             BlockDriverState *backing_hd = bs->backing_hd;
+            if (bs->backing_reference) {
+                assert(backing_hd->backing_hd);
+                if (backing_hd->backing_hd->job) {
+                    block_job_cancel(backing_hd->backing_hd->job);
+                }
+                bdrv_set_backing_hd(backing_hd, NULL);
+                bdrv_unref(backing_hd->backing_hd);
+            }
             bdrv_set_backing_hd(bs, NULL);
             bdrv_unref(backing_hd);
         }
diff --git a/include/block/block.h b/include/block/block.h
index 68f3b1a..7138e90 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -159,6 +159,7 @@ typedef enum BlockOpType {
     BLOCK_OP_TYPE_RESIZE,
     BLOCK_OP_TYPE_STREAM,
     BLOCK_OP_TYPE_REPLACE,
+    BLOCK_OP_TYPE_BACKING_REFERENCE,
     BLOCK_OP_TYPE_MAX,
 } BlockOpType;
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 08dd8ba..624945d 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -375,6 +375,7 @@ struct BlockDriverState {
     QDict *full_open_options;
     char exact_filename[PATH_MAX];
 
+    bool backing_reference;
     BlockDriverState *backing_hd;
     BlockDriverState *file;
 
diff --git a/tests/qemu-iotests/051 b/tests/qemu-iotests/051
index 0360f37..fd67f40 100755
--- a/tests/qemu-iotests/051
+++ b/tests/qemu-iotests/051
@@ -116,6 +116,19 @@ run_qemu -drive file="$TEST_IMG",file.backing.driver=file,file.backing.filename=
 run_qemu -drive file="$TEST_IMG",file.backing.driver=qcow2,file.backing.file.filename="$TEST_IMG.orig"
 
 echo
+echo === Backing file reference ===
+echo
+
+run_qemu -drive file="$TEST_IMG",if=none,id=drive0 \
+    -drive file="$TEST_IMG",driver=qcow2,backing_reference.drive_id=drive0,backing_reference.hidden-disk.filename="$TEST_IMG.hidden"
+
+run_qemu -drive file="$TEST_IMG",if=none,id=drive0 \
+    -drive file="$TEST_IMG",driver=qcow2,backing_reference.drive_id=drive0,backing_reference.hidden-disk.filename="$TEST_IMG.hidden",backing.file.filename="$TEST_IMG.orig"
+
+run_qemu -drive file="$TEST_IMG",if=none,id=drive0 \
+    -drive file="$TEST_IMG",driver=qcow2,file.backing_reference.drive_id=drive0,file.backing_reference.hidden-disk.filename="$TEST_IMG.hidden",file.backing.file.filename="$TEST_IMG.orig"
+
+echo
 echo === Enable and disable lazy refcounting on the command line, plus some invalid values ===
 echo
 
diff --git a/tests/qemu-iotests/051.out b/tests/qemu-iotests/051.out
index 2890eac..cb8340b 100644
--- a/tests/qemu-iotests/051.out
+++ b/tests/qemu-iotests/051.out
@@ -75,6 +75,19 @@ Testing: -drive file=TEST_DIR/t.qcow2,file.backing.driver=qcow2,file.backing.fil
 QEMU_PROG: -drive file=TEST_DIR/t.qcow2,file.backing.driver=qcow2,file.backing.file.filename=TEST_DIR/t.qcow2.orig: Driver doesn't support backing files
 
 
+=== Backing file reference ===
+
+Testing: -drive file=TEST_DIR/t.qcow2,if=none,id=drive0 -drive file=TEST_DIR/t.qcow2,driver=qcow2,backing_reference.drive_id=drive0,backing_reference.hidden-disk.filename=TEST_DIR/t.qcow2.hidden
+QEMU X.Y.Z monitor - type 'help' for more information
+(qemu) q^[[K^[[Dqu^[[K^[[D^[[Dqui^[[K^[[D^[[D^[[Dquit^[[K
+
+Testing: -drive file=TEST_DIR/t.qcow2,if=none,id=drive0 -drive file=TEST_DIR/t.qcow2,driver=qcow2,backing_reference.drive_id=drive0,backing_reference.hidden-disk.filename=TEST_DIR/t.qcow2.hidden,backing.file.filename=TEST_DIR/t.qcow2.orig
+QEMU_PROG: -drive file=TEST_DIR/t.qcow2,driver=qcow2,backing=drive0,backing.file.filename=TEST_DIR/t.qcow2.orig: Option "backing_reference.*" and "backing.*" cannot be used together
+
+Testing: -drive file=TEST_DIR/t.qcow2,if=none,id=drive0 -drive file=TEST_DIR/t.qcow2,driver=qcow2,file.backing_reference.drive_id=drive0,file.backing_reference.hidden-disk.filename=TEST_DIR/t.qcow2.hidden,file.backing.file.filename=TEST_DIR/t.qcow2.orig
+QEMU_PROG: -drive file=TEST_DIR/t.qcow2,driver=qcow2,file.backing=drive0,file.backing.file.filename=TEST_DIR/t.qcow2.orig: Option "backing_reference.*" and "backing.*" cannot be used together
+
+
 === Enable and disable lazy refcounting on the command line, plus some invalid values ===
 
 Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=on
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [RFC PATCH COLO v2 10/13] Backup: clear all bitmap when doing block checkpoint
  2015-03-25  9:36 [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints Wen Congyang
                   ` (8 preceding siblings ...)
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 09/13] block: Parse "backing_reference" option to reference existing BDS Wen Congyang
@ 2015-03-25  9:36 ` Wen Congyang
  2015-03-25 12:55   ` Paolo Bonzini
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 11/13] qcow2: support colo Wen Congyang
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Wen Congyang @ 2015-03-25  9:36 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jeff Cody, Jiang Yunhong,
	Dong Eddie, Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi,
	Yang Hongyang, zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Cc: Jeff Cody <jcody@redhat.com>
---
 block/backup.c            | 12 ++++++++++++
 include/block/block_int.h |  1 +
 include/qemu/hbitmap.h    |  8 ++++++++
 util/hbitmap.c            | 19 +++++++++++++++++++
 4 files changed, 40 insertions(+)

diff --git a/block/backup.c b/block/backup.c
index 1c535b1..4e9d535 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -435,3 +435,15 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
     job->common.co = qemu_coroutine_create(backup_run);
     qemu_coroutine_enter(job->common.co, job);
 }
+
+void backup_do_checkpoint(BlockJob *job, Error **errp)
+{
+    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+
+    if (job->driver != &backup_job_driver) {
+        error_setg(errp, "It is not backup job");
+        return;
+    }
+
+    hbitmap_reset_all(backup_job->bitmap);
+}
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 624945d..45d547b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -628,6 +628,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
                   BlockdevOnError on_target_error,
                   BlockCompletionFunc *cb, void *opaque,
                   Error **errp);
+void backup_do_checkpoint(BlockJob *job, Error **errp);
 
 void blk_dev_change_media_cb(BlockBackend *blk, bool load);
 bool blk_dev_has_removable_media(BlockBackend *blk);
diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index 550d7ce..95a55e4 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -109,6 +109,14 @@ void hbitmap_set(HBitmap *hb, uint64_t start, uint64_t count);
 void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t count);
 
 /**
+ * hbitmap_reset_all:
+ * @hb: HBitmap to operate on.
+ *
+ * Reset all bits in an HBitmap.
+ */
+void hbitmap_reset_all(HBitmap *hb);
+
+/**
  * hbitmap_get:
  * @hb: HBitmap to operate on.
  * @item: Bit to query (0-based).
diff --git a/util/hbitmap.c b/util/hbitmap.c
index ab13971..4111ca5 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -353,6 +353,25 @@ void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t count)
     hb_reset_between(hb, HBITMAP_LEVELS - 1, start, last);
 }
 
+void hbitmap_reset_all(HBitmap *hb)
+{
+#if 0
+    hbitmap_reset(hb, 0, hb->size << hb->granularity);
+#else
+    uint64_t size = hb->size;
+    unsigned int i;
+
+    /* Same as hbitmap_alloc() except memset() */
+    for (i = HBITMAP_LEVELS; i-- > 0; ) {
+        size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
+        memset(hb->levels[i], 0, size * sizeof(unsigned long));
+    }
+
+    assert(size == 1);
+    hb->levels[0][0] |= 1UL << (BITS_PER_LONG - 1);
+#endif
+}
+
 bool hbitmap_get(const HBitmap *hb, uint64_t item)
 {
     /* Compute position and bit in the last layer.  */
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [RFC PATCH COLO v2 11/13] qcow2: support colo
  2015-03-25  9:36 [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints Wen Congyang
                   ` (9 preceding siblings ...)
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 10/13] Backup: clear all bitmap when doing block checkpoint Wen Congyang
@ 2015-03-25  9:36 ` Wen Congyang
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 12/13] skip nbd_target when starting block replication Wen Congyang
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 55+ messages in thread
From: Wen Congyang @ 2015-03-25  9:36 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block/qcow2.c | 447 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 block/qcow2.h |   6 +
 2 files changed, 452 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 32bdf75..cc10af0 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -35,6 +35,7 @@
 #include "qapi-event.h"
 #include "trace.h"
 #include "qemu/option_int.h"
+#include "block/blockjob.h"
 
 /*
   Differences with QCOW:
@@ -1496,7 +1497,7 @@ static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
     memset(s, 0, sizeof(BDRVQcowState));
     options = qdict_clone_shallow(bs->options);
 
-    ret = qcow2_open(bs, options, flags, &local_err);
+    ret = bs->drv->bdrv_open(bs, options, flags, &local_err);
     QDECREF(options);
     if (local_err) {
         error_setg(errp, "Could not reopen qcow2 layer: %s",
@@ -2947,9 +2948,453 @@ BlockDriver bdrv_qcow2 = {
     .bdrv_amend_options  = qcow2_amend_options,
 };
 
+/*********************************************************/
+/*
+ * qcow2 colo functions.
+ *
+ * Note:
+ * 1. The image format is qcow2, but it is only used for block replication.
+ * 2. The image is created by qcow2, not qcow+colo.
+ * 3. The image is an empty image.
+ * 4. The image doesn't contain any snapshot.
+ * 5. The image doesn't contain backing_file in image header.
+ * 6. Active disk and hidden disk use this driver.
+ * 7. The size of Active disk, hidden disk, nbd target should be the same.
+ */
+
+enum {
+    COLO_NONE,      /* block replication is not started */
+    COLO_RUNNING,   /* block replication is running */
+    COLO_DONE,      /* block replication is done(failover) */
+};
+
+static int qcow2_colo_probe(const uint8_t *buf, int buf_size,
+                            const char *filename)
+{
+    /* Use qcow2 as default */
+    return 0;
+}
+
+#define COLO_OPT_EXPORT         "export"
+static QemuOptsList qcow2_colo_runtime_opts = {
+    .name = "qcow2+colo",
+    .head = QTAILQ_HEAD_INITIALIZER(qcow2_colo_runtime_opts.head),
+    .desc = {
+        {
+            .name = COLO_OPT_EXPORT,
+            .type = QEMU_OPT_STRING,
+            .help = "The NBD server name",
+        },
+        { /* end of list */ }
+    },
+};
+
+/*
+ * usage: -drive if=xxx,driver=qcow2+colo,export=xxx,\
+ *        backing_reference.drive_id=xxxx,backing_reference.hidden-disk.*
+ */
+static int qcow2_colo_open(BlockDriverState *bs, QDict *options, int flags,
+                           Error **errp)
+{
+    int ret;
+    BDRVQcowState *s = bs->opaque;;
+    Error *local_err = NULL;
+    QemuOpts *opts = NULL;
+
+    ret = qcow2_open(bs, options, flags, errp);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = -ENOTSUP;
+    if (s->nb_snapshots) {
+        error_setg(errp, "qcow2+colo doesn't support snapshot");
+        goto fail;
+    }
+
+    if (!bs->backing_hd && bs->backing_file[0] != '\0') {
+        error_setg(errp,
+                   "qcow2+colo doesn't support backing_file in image header");
+        goto fail;
+    }
+
+    opts = qemu_opts_create(&qcow2_colo_runtime_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (local_err) {
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    s->export_name = g_strdup(qemu_opt_get(opts, COLO_OPT_EXPORT));
+    if (!s->export_name) {
+        error_setg(&local_err, "Missing the option export");
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    return 0;
+
+fail:
+    qcow2_close(bs);
+    qemu_opts_del(opts);
+    /* propagate error */
+    if (local_err) {
+        error_propagate(errp, local_err);
+    }
+    return ret;
+}
+
+static coroutine_fn int qcow2_colo_co_readv(BlockDriverState *bs,
+                                            int64_t sector_num,
+                                            int remaining_sectors,
+                                            QEMUIOVector *qiov)
+{
+    BDRVQcowState *s = bs->opaque;
+    BlockDriverState *nbd_target;
+
+    switch (s->colo_state) {
+    case COLO_NONE:
+        return -EIO;
+    case COLO_RUNNING:
+        return qcow2_co_readv(bs, sector_num, remaining_sectors, qiov);
+    case COLO_DONE:
+        nbd_target = bs->backing_hd->backing_hd;
+        return bdrv_co_readv(nbd_target, sector_num, remaining_sectors, qiov);
+    default:
+        abort();
+    }
+}
+
+static coroutine_fn int qcow2_colo_co_writev(BlockDriverState *bs,
+                                             int64_t sector_num,
+                                             int remaining_sectors,
+                                             QEMUIOVector *qiov)
+{
+    BDRVQcowState *s = bs->opaque;
+    BlockDriverState *nbd_target;
+
+    switch (s->colo_state) {
+    case COLO_NONE:
+        return -EIO;
+    case COLO_RUNNING:
+        return qcow2_co_writev(bs, sector_num, remaining_sectors, qiov);
+    case COLO_DONE:
+        nbd_target = bs->backing_hd->backing_hd;
+        return bdrv_co_writev(nbd_target, sector_num, remaining_sectors, qiov);
+    default:
+        abort();
+    }
+}
+
+static coroutine_fn int qcow2_colo_co_write_zeroes(BlockDriverState *bs,
+                                                   int64_t sector_num,
+                                                   int nb_sectors,
+                                                   BdrvRequestFlags flags)
+{
+    BDRVQcowState *s = bs->opaque;
+    BlockDriverState *nbd_target;
+
+    switch (s->colo_state) {
+    case COLO_NONE:
+        return -EIO;
+    case COLO_RUNNING:
+        return qcow2_co_write_zeroes(bs, sector_num, nb_sectors, flags);
+    case COLO_DONE:
+        nbd_target = bs->backing_hd->backing_hd;
+        return bdrv_co_write_zeroes(nbd_target, sector_num, nb_sectors, flags);
+    default:
+        abort();
+    }
+}
+
+static coroutine_fn int qcow2_colo_co_flush_to_os(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    BlockDriverState *nbd_target;
+
+    switch (s->colo_state) {
+    case COLO_NONE:
+        return -EIO;
+    case COLO_RUNNING:
+        return qcow2_co_flush_to_os(bs);
+    case COLO_DONE:
+        nbd_target = bs->backing_hd->backing_hd;
+        return bdrv_co_flush(nbd_target);
+    default:
+        abort();
+    }
+}
+
+static coroutine_fn int qcow2_colo_co_discard(BlockDriverState *bs,
+                                              int64_t sector_num,
+                                              int nb_sectors)
+{
+    BDRVQcowState *s = bs->opaque;
+    BlockDriverState *nbd_target;
+
+    switch (s->colo_state) {
+    case COLO_NONE:
+        return -EIO;
+    case COLO_RUNNING:
+        return qcow2_co_discard(bs, sector_num, nb_sectors);
+    case COLO_DONE:
+        nbd_target = bs->backing_hd->backing_hd;
+        return bdrv_co_discard(nbd_target, sector_num, nb_sectors);
+    default:
+        abort();
+    }
+}
+
+static int qcow2_colo_write_compressed(BlockDriverState *bs, int64_t sector_num,
+                                       const uint8_t *buf, int nb_sectors)
+{
+    BDRVQcowState *s = bs->opaque;
+    BlockDriverState *nbd_target;
+
+    switch (s->colo_state) {
+    case COLO_NONE:
+        return -EIO;
+    case COLO_RUNNING:
+        return qcow2_write_compressed(bs, sector_num, buf, nb_sectors);
+    case COLO_DONE:
+        nbd_target = bs->backing_hd->backing_hd;
+        return bdrv_write_compressed(nbd_target, sector_num, buf, nb_sectors);
+    default:
+        abort();
+    }
+}
+
+static void qcow2_colo_start_replication(BlockDriverState *bs, COLOMode mode,
+                                         Error **errp)
+{
+    BDRVQcowState *s = bs->opaque;
+    BlockDriverState *nbd_target, *hidden_disk;
+    Error *local_err = NULL;
+    int64_t active_length, hidden_length, nbd_length;
+
+    /*
+     * TODO: support COLO_MODE_PRIMARY if we allow secondary
+     * QEMU becoming primary QEMU.
+     */
+    if (mode != COLO_MODE_SECONDARY) {
+        error_set(errp, QERR_INVALID_PARAMETER, "mode");
+        return;
+    }
+
+    if (!bs->backing_reference) {
+        error_set(errp, QERR_UNSUPPORTED);
+        return;
+    }
+
+    if (s->colo_state == COLO_RUNNING) {
+        error_setg(errp, "Block replication is running");
+        return;
+    } else if (s->colo_state == COLO_DONE) {
+        error_setg(errp, "Cannot restart block replication");
+        return;
+    }
+
+    nbd_target = bs->backing_hd->backing_hd;
+    if (!nbd_target->job ||
+        nbd_target->job->driver->job_type != BLOCK_JOB_TYPE_BACKUP) {
+        error_setg(errp, "Backup job is cancelled unexpectedly");
+        return;
+    }
+
+    hidden_disk = bs->backing_hd;
+    nbd_target = hidden_disk->backing_hd;
+
+    /* verify the length */
+    active_length = bdrv_getlength(bs);
+    hidden_length = bdrv_getlength(hidden_disk);
+    nbd_length = bdrv_getlength(nbd_target);
+    if (active_length < 0 || hidden_length < 0 || nbd_length < 0 ||
+        active_length != hidden_length || hidden_length != nbd_length) {
+        error_setg(errp, "active disk, hidden disk, nbd target's length are "
+                   "not the same");
+        return;
+    }
+
+    if (!hidden_disk->drv->bdrv_make_empty) {
+        error_set(errp, QERR_UNSUPPORTED);
+        return;
+    }
+
+    /* start NBD server */
+    s->exp = nbd_export_new(nbd_target->blk, 0, -1, 0, NULL, &local_err);
+    if (!s->exp) {
+        error_propagate(errp, local_err);
+        return;
+    }
+    nbd_export_set_name(s->exp, s->export_name);
+
+    s->colo_state = COLO_RUNNING;
+}
+
+static void qcow2_colo_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+    BDRVQcowState *s = bs->opaque;
+    BlockDriverState *hidden_disk, *nbd_target;
+    int ret;
+
+    if (s->colo_state != COLO_RUNNING) {
+        error_setg(errp, "Block replication is not running");
+        return;
+    }
+
+    hidden_disk = bs->backing_hd;
+    nbd_target = hidden_disk->backing_hd;
+    if (!nbd_target->job) {
+        error_setg(errp, "Backup job is cancelled unexpectedly");
+        return;
+    }
+
+    backup_do_checkpoint(nbd_target->job, errp);
+
+    ret = qcow2_make_empty(bs);
+    if (ret < 0) {
+        error_setg(errp, "Cannot make active disk empty");
+        return;
+    }
+
+    ret = hidden_disk->drv->bdrv_make_empty(hidden_disk);
+    if (ret < 0) {
+        error_setg(errp, "Cannot make hidden disk empty");
+        return;
+    }
+}
+
+/*
+ * TODO: Use blockjob?
+ */
+static void commit_data(BlockDriverState *from, BlockDriverState *to,
+                        int commit_buffer_sectors, Error **errp)
+{
+    int64_t len, target_length;
+    int64_t sector_num, end;
+    void *buf = NULL;
+    int n = 0, ret;
+
+    len = bdrv_getlength(from);
+    target_length = bdrv_getlength(to);
+    if (len < 0 || target_length < 0) {
+        /* should not happen */
+        error_set(errp, QERR_UNDEFINED_ERROR);
+        return;
+    }
+
+    assert(len == target_length);
+    end = len >> BDRV_SECTOR_BITS;
+    buf = qemu_blockalign(from, commit_buffer_sectors << BDRV_SECTOR_BITS);
+
+    for (sector_num = 0; sector_num < end; sector_num += n) {
+        ret = bdrv_is_allocated(from, sector_num, commit_buffer_sectors, &n);
+        if (ret < 0) {
+            error_set(errp, QERR_UNDEFINED_ERROR);
+            return;
+        }
+
+        if (ret == 0) {
+            continue;
+        }
+
+        ret = bdrv_read(from, sector_num, buf, n);
+        if (ret) {
+            error_set(errp, QERR_IO_ERROR);
+            return;
+        }
+
+        ret = bdrv_write(to, sector_num, buf, n);
+        if (ret) {
+            error_set(errp, QERR_IO_ERROR);
+            return;
+        }
+    }
+}
+
+static void qcow2_colo_stop_replication(BlockDriverState *bs, Error **errp)
+{
+    BDRVQcowState *s = bs->opaque;
+    BlockDriverState *nbd_target;
+    Error *local_err = NULL;
+
+    if (s->colo_state != COLO_RUNNING) {
+        error_setg(errp, "Block replication is not running");
+        return;
+    }
+
+    /* stop NBD server */
+    nbd_export_close(s->exp);
+    nbd_export_put(s->exp);
+
+    nbd_target = bs->backing_hd->backing_hd;
+
+    if (!nbd_target->job ||
+        nbd_target->job->driver->job_type != BLOCK_JOB_TYPE_BACKUP) {
+        error_setg(errp, "Backup job is cancelled unexpectedly");
+        return;
+    }
+
+    block_job_cancel(nbd_target->job);
+
+    /* commit data from active disk to hidden disk*/
+    commit_data(bs, bs->backing_hd, s->cluster_sectors, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /* commit data from hidden disk to nbd target */
+    commit_data(bs->backing_hd, nbd_target, s->cluster_sectors, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+}
+
+BlockDriver bdrv_qcow2_colo = {
+    .format_name        = "qcow2+colo",
+    .instance_size      = sizeof(BDRVQcowState),
+    .bdrv_probe         = qcow2_colo_probe,
+    .bdrv_open          = qcow2_colo_open,
+    .bdrv_close         = qcow2_close,
+    .bdrv_reopen_prepare  = qcow2_reopen_prepare,
+    .bdrv_has_zero_init = bdrv_has_zero_init_1,
+    .bdrv_co_get_block_status = qcow2_co_get_block_status,
+    .bdrv_set_key       = qcow2_set_key,
+
+    .bdrv_co_readv          = qcow2_colo_co_readv,
+    .bdrv_co_writev         = qcow2_colo_co_writev,
+    .bdrv_co_flush_to_os    = qcow2_colo_co_flush_to_os,
+
+    .bdrv_co_write_zeroes   = qcow2_colo_co_write_zeroes,
+    .bdrv_co_discard        = qcow2_colo_co_discard,
+    .bdrv_write_compressed  = qcow2_colo_write_compressed,
+    .bdrv_make_empty        = qcow2_make_empty,
+
+    .bdrv_get_info          = qcow2_get_info,
+    .bdrv_get_specific_info = qcow2_get_specific_info,
+
+    .bdrv_save_vmstate    = qcow2_save_vmstate,
+    .bdrv_load_vmstate    = qcow2_load_vmstate,
+
+    .supports_backing           = true,
+
+    .bdrv_refresh_limits        = qcow2_refresh_limits,
+    .bdrv_invalidate_cache      = qcow2_invalidate_cache,
+
+    .bdrv_check          = qcow2_check,
+    .bdrv_amend_options  = qcow2_amend_options,
+
+    .bdrv_start_replication     = qcow2_colo_start_replication,
+    .bdrv_do_checkpoint         = qcow2_colo_do_checkpoint,
+    .bdrv_stop_replication      = qcow2_colo_stop_replication,
+};
+
 static void bdrv_qcow2_init(void)
 {
     bdrv_register(&bdrv_qcow2);
+    bdrv_register(&bdrv_qcow2_colo);
 }
 
 block_init(bdrv_qcow2_init);
diff --git a/block/qcow2.h b/block/qcow2.h
index aa6d367..9d5e260 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -27,6 +27,7 @@
 
 #include "qemu/aes.h"
 #include "block/coroutine.h"
+#include "block/nbd.h"
 
 //#define DEBUG_ALLOC
 //#define DEBUG_ALLOC2
@@ -283,6 +284,11 @@ typedef struct BDRVQcowState {
     QLIST_HEAD(, Qcow2UnknownHeaderExtension) unknown_header_ext;
     QTAILQ_HEAD (, Qcow2DiscardRegion) discards;
     bool cache_discards;
+
+    /* Used for block replication */
+    int colo_state;
+    const char *export_name;
+    NBDExport *exp;
 } BDRVQcowState;
 
 struct QCowAIOCB;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [RFC PATCH COLO v2 12/13] skip nbd_target when starting block replication
  2015-03-25  9:36 [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints Wen Congyang
                   ` (10 preceding siblings ...)
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 11/13] qcow2: support colo Wen Congyang
@ 2015-03-25  9:36 ` Wen Congyang
  2015-03-26  7:03   ` Fam Zheng
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 13/13] Don't allow a disk use backing reference target Wen Congyang
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Wen Congyang @ 2015-03-25  9:36 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/block.c b/block.c
index bd7fa9c..3af5ad4 100644
--- a/block.c
+++ b/block.c
@@ -6368,6 +6368,12 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
 void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error **errp)
 {
     BlockDriver *drv = bs->drv;
+    Error *local_err = NULL;
+
+    if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, &local_err)) {
+        error_free(local_err);
+        return;
+    }
 
     if (drv && drv->bdrv_start_replication) {
         drv->bdrv_start_replication(bs, mode, errp);
@@ -6381,6 +6387,12 @@ void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error **errp)
 void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
 {
     BlockDriver *drv = bs->drv;
+    Error *local_err = NULL;
+
+    if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, &local_err)) {
+        error_free(local_err);
+        return;
+    }
 
     if (drv && drv->bdrv_do_checkpoint) {
         drv->bdrv_do_checkpoint(bs, errp);
@@ -6394,6 +6406,12 @@ void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
 void bdrv_stop_replication(BlockDriverState *bs, Error **errp)
 {
     BlockDriver *drv = bs->drv;
+    Error *local_err = NULL;
+
+    if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, &local_err)) {
+        error_free(local_err);
+        return;
+    }
 
     if (drv && drv->bdrv_stop_replication) {
         drv->bdrv_stop_replication(bs, errp);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [RFC PATCH COLO v2 13/13] Don't allow a disk use backing reference target
  2015-03-25  9:36 [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints Wen Congyang
                   ` (11 preceding siblings ...)
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 12/13] skip nbd_target when starting block replication Wen Congyang
@ 2015-03-25  9:36 ` Wen Congyang
  2015-03-25 12:56 ` [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints Paolo Bonzini
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 55+ messages in thread
From: Wen Congyang @ 2015-03-25  9:36 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/block.c b/block.c
index 3af5ad4..548cafa 100644
--- a/block.c
+++ b/block.c
@@ -1399,6 +1399,14 @@ static int bdrv_open_backing_reference_file(BlockDriverState *bs,
     }
 
     backing_hd = blk_bs(backing_blk);
+    /* Don't allow a disk use backing reference target */
+    ret = blk_attach_dev(backing_hd->blk, bs);
+    if (ret < 0) {
+        error_setg(errp, "backing_hd %s is used by the other device model",
+                   backing_name);
+        goto free_exit;
+    }
+
     /* Backing reference itself? */
     if (backing_hd == bs || bdrv_find_overlay(backing_hd, bs)) {
         error_setg(errp, "Backing reference itself");
@@ -2077,6 +2085,7 @@ void bdrv_close(BlockDriverState *bs)
                 if (backing_hd->backing_hd->job) {
                     block_job_cancel(backing_hd->backing_hd->job);
                 }
+                blk_detach_dev(backing_hd->backing_hd->blk, bs);
                 bdrv_set_backing_hd(backing_hd, NULL);
                 bdrv_unref(backing_hd->backing_hd);
             }
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 02/13] quorum: allow ignoring child errors
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 02/13] quorum: allow ignoring child errors Wen Congyang
@ 2015-03-25 12:45   ` Paolo Bonzini
  0 siblings, 0 replies; 55+ messages in thread
From: Paolo Bonzini @ 2015-03-25 12:45 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Fam Zheng, Max Reitz
  Cc: Kevin Wolf, Lai Jiangshan, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang



On 25/03/2015 10:36, Wen Congyang wrote:
> +static QemuOptsList quorum_children_common_opts = {
> +    .name = "qcow2 children",

.name should be "quorum children".

Paolo

> +    .head = QTAILQ_HEAD_INITIALIZER(quorum_children_common_opts.head),
> +    .desc = {
> +        {
> +            .name = QUORUM_CHILDREN_OPT_IGNORE_ERRORS,
> +            .type = QEMU_OPT_BOOL,
> +            .help = "ignore child I/O error",
> +        },
> +        { /* end of list */ }
> +    },
> +};
> +

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 03/13] NBD client: connect to nbd server later
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 03/13] NBD client: connect to nbd server later Wen Congyang
@ 2015-03-25 12:46   ` Paolo Bonzini
  0 siblings, 0 replies; 55+ messages in thread
From: Paolo Bonzini @ 2015-03-25 12:46 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Fam Zheng, Max Reitz
  Cc: Kevin Wolf, Lai Jiangshan, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang



On 25/03/2015 10:36, Wen Congyang wrote:
> The secondary qemu starts later than the primary qemu, so we
> cannot connect to nbd server in bdrv_open().
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  block/nbd.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 108 insertions(+), 14 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 2176186..3faf865 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -44,6 +44,8 @@
>  typedef struct BDRVNBDState {
>      NbdClientSession client;
>      QemuOpts *socket_opts;
> +    char *export;
> +    bool connected;
>  } BDRVNBDState;
>  
>  static int nbd_parse_uri(const char *filename, QDict *options)
> @@ -254,50 +256,95 @@ static int nbd_establish_connection(BlockDriverState *bs, Error **errp)
>      return sock;
>  }
>  
> -static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
> -                    Error **errp)
> +static int nbd_connect_server(BlockDriverState *bs, Error **errp)
>  {
>      BDRVNBDState *s = bs->opaque;
> -    char *export = NULL;
>      int result, sock;
> -    Error *local_err = NULL;
> -
> -    /* Pop the config into our state object. Exit if invalid. */
> -    nbd_config(s, options, &export, &local_err);
> -    if (local_err) {
> -        error_propagate(errp, local_err);
> -        return -EINVAL;
> -    }
>  
>      /* establish TCP connection, return error if it fails
>       * TODO: Configurable retry-until-timeout behaviour.
>       */
>      sock = nbd_establish_connection(bs, errp);
>      if (sock < 0) {
> -        g_free(export);
> +        g_free(s->export);
>          return sock;
>      }
>  
>      /* NBD handshake */
> -    result = nbd_client_init(bs, sock, export, errp);
> -    g_free(export);
> +    result = nbd_client_init(bs, sock, s->export, errp);
> +    g_free(s->export);
> +    s->export = NULL;
> +    if (!result) {
> +        s->connected = true;
> +    }
> +
>      return result;
>  }
>  
> +static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
> +                    Error **errp)
> +{
> +    BDRVNBDState *s = bs->opaque;
> +    Error *local_err = NULL;
> +
> +    /* Pop the config into our state object. Exit if invalid. */
> +    nbd_config(s, options, &s->export, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return -EINVAL;
> +    }
> +
> +    return nbd_connect_server(bs, errp);
> +}
> +
> +static int nbd_open_colo(BlockDriverState *bs, QDict *options, int flags,
> +                         Error **errp)
> +{
> +    BDRVNBDState *s = bs->opaque;
> +    Error *local_err = NULL;
> +
> +    /* Pop the config into our state object. Exit if invalid. */
> +    nbd_config(s, options, &s->export, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return -EINVAL;
> +    }
> +
> +    return 0;
> +}
> +
>  static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
>                          int nb_sectors, QEMUIOVector *qiov)
>  {
> +    BDRVNBDState *s = bs->opaque;
> +
> +    if (!s->connected) {
> +        return -EIO;
> +    }
> +
>      return nbd_client_co_readv(bs, sector_num, nb_sectors, qiov);
>  }
>  
>  static int nbd_co_writev(BlockDriverState *bs, int64_t sector_num,
>                           int nb_sectors, QEMUIOVector *qiov)
>  {
> +    BDRVNBDState *s = bs->opaque;
> +
> +    if (!s->connected) {
> +        return -EIO;
> +    }
> +
>      return nbd_client_co_writev(bs, sector_num, nb_sectors, qiov);
>  }
>  
>  static int nbd_co_flush(BlockDriverState *bs)
>  {
> +    BDRVNBDState *s = bs->opaque;
> +
> +    if (!s->connected) {
> +        return -EIO;
> +    }
> +
>      return nbd_client_co_flush(bs);
>  }
>  
> @@ -310,6 +357,12 @@ static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
>  static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num,
>                            int nb_sectors)
>  {
> +    BDRVNBDState *s = bs->opaque;
> +
> +    if (!s->connected) {
> +        return -EIO;
> +    }
> +
>      return nbd_client_co_discard(bs, sector_num, nb_sectors);
>  }
>  
> @@ -319,23 +372,44 @@ static void nbd_close(BlockDriverState *bs)
>  
>      qemu_opts_del(s->socket_opts);
>      nbd_client_close(bs);
> +    s->connected = false;
>  }
>  
>  static int64_t nbd_getlength(BlockDriverState *bs)
>  {
>      BDRVNBDState *s = bs->opaque;
>  
> +    if (!s->connected) {
> +        /*
> +         * We cannot return -ENOTCONN, otherwise refresh_total_sectors()
> +         * will fail, and we cannot open nbd client.
> +         */
> +        return 0;
> +    }
> +
>      return s->client.size;
>  }
>  
>  static void nbd_detach_aio_context(BlockDriverState *bs)
>  {
> +    BDRVNBDState *s = bs->opaque;
> +
> +    if (!s->connected) {
> +        return;
> +    }
> +
>      nbd_client_detach_aio_context(bs);
>  }
>  
>  static void nbd_attach_aio_context(BlockDriverState *bs,
>                                     AioContext *new_context)
>  {
> +    BDRVNBDState *s = bs->opaque;
> +
> +    if (!s->connected) {
> +        return;
> +    }
> +
>      nbd_client_attach_aio_context(bs, new_context);
>  }
>  
> @@ -438,11 +512,31 @@ static BlockDriver bdrv_nbd_unix = {
>      .bdrv_refresh_filename      = nbd_refresh_filename,
>  };
>  
> +static BlockDriver bdrv_nbd_colo = {
> +    .format_name                = "nbd+colo",
> +    .protocol_name              = "nbd+colo",
> +    .instance_size              = sizeof(BDRVNBDState),
> +    .bdrv_parse_filename        = nbd_parse_filename,
> +    .bdrv_file_open             = nbd_open_colo,
> +    .bdrv_co_readv              = nbd_co_readv,
> +    .bdrv_co_writev             = nbd_co_writev,
> +    .bdrv_close                 = nbd_close,
> +    .bdrv_co_flush_to_os        = nbd_co_flush,
> +    .bdrv_co_discard            = nbd_co_discard,
> +    .bdrv_getlength             = nbd_getlength,
> +    .bdrv_detach_aio_context    = nbd_detach_aio_context,
> +    .bdrv_attach_aio_context    = nbd_attach_aio_context,
> +    .bdrv_refresh_filename      = nbd_refresh_filename,
> +
> +    .has_variable_length        = true,
> +};
> +
>  static void bdrv_nbd_init(void)
>  {
>      bdrv_register(&bdrv_nbd);
>      bdrv_register(&bdrv_nbd_tcp);
>      bdrv_register(&bdrv_nbd_unix);
> +    bdrv_register(&bdrv_nbd_colo);
>  }
>  
>  block_init(bdrv_nbd_init);
> 

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 04/13] Add new block driver interfaces to control block replication
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 04/13] Add new block driver interfaces to control block replication Wen Congyang
@ 2015-03-25 12:48   ` Paolo Bonzini
  2015-03-25 15:43     ` Eric Blake
  2015-03-26  7:12   ` Fam Zheng
  1 sibling, 1 reply; 55+ messages in thread
From: Paolo Bonzini @ 2015-03-25 12:48 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Fam Zheng, Max Reitz
  Cc: Kevin Wolf, Lai Jiangshan, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Luiz Capitulino, Gonglei,
	Stefan Hajnoczi, Yang Hongyang, Michael Roth, zhanghailiang



On 25/03/2015 10:36, Wen Congyang wrote:
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  block.c                   | 39 +++++++++++++++++++++++++++++++++++++++
>  include/block/block.h     |  4 ++++
>  include/block/block_int.h | 11 +++++++++++
>  qapi/block.json           | 16 ++++++++++++++++
>  4 files changed, 70 insertions(+)
> 
> diff --git a/block.c b/block.c
> index 0fe97de..0ff5cf8 100644
> --- a/block.c
> +++ b/block.c
> @@ -6196,3 +6196,42 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
>  {
>      return &bs->stats;
>  }
> +
> +void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error **errp)
> +{
> +    BlockDriver *drv = bs->drv;
> +
> +    if (drv && drv->bdrv_start_replication) {
> +        drv->bdrv_start_replication(bs, mode, errp);
> +    } else if (bs->file) {
> +        bdrv_start_replication(bs->file, mode, errp);
> +    } else {
> +        error_set(errp, QERR_UNSUPPORTED);
> +    }
> +}
> +
> +void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
> +{
> +    BlockDriver *drv = bs->drv;
> +
> +    if (drv && drv->bdrv_do_checkpoint) {
> +        drv->bdrv_do_checkpoint(bs, errp);
> +    } else if (bs->file) {
> +        bdrv_do_checkpoint(bs->file, errp);
> +    } else {
> +        error_set(errp, QERR_UNSUPPORTED);
> +    }
> +}
> +
> +void bdrv_stop_replication(BlockDriverState *bs, Error **errp)
> +{
> +    BlockDriver *drv = bs->drv;
> +
> +    if (drv && drv->bdrv_stop_replication) {
> +        drv->bdrv_stop_replication(bs, errp);
> +    } else if (bs->file) {
> +        bdrv_stop_replication(bs->file, errp);
> +    } else {
> +        error_set(errp, QERR_UNSUPPORTED);
> +    }
> +}
> diff --git a/include/block/block.h b/include/block/block.h
> index 4c57d63..68f3b1a 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -569,4 +569,8 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
>  
>  BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
>  
> +void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error **errp);
> +void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
> +void bdrv_stop_replication(BlockDriverState *bs, Error **errp);
> +
>  #endif
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index dccb092..08dd8ba 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -290,6 +290,17 @@ struct BlockDriver {
>       */
>      int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
>  
> +
> +    void (*bdrv_start_replication)(BlockDriverState *bs, COLOMode mode,
> +                                   Error **errp);
> +    /* Drop Disk buffer when doing checkpoint. */
> +    void (*bdrv_do_checkpoint)(BlockDriverState *bs, Error **errp);
> +    /*
> +     * After failover, we should flush Disk buffer into secondary disk
> +     * and stop block replication.
> +     */
> +    void (*bdrv_stop_replication)(BlockDriverState *bs, Error **errp);
> +
>      QLIST_ENTRY(BlockDriver) list;
>  };
>  
> diff --git a/qapi/block.json b/qapi/block.json
> index e313465..e640566 100644
> --- a/qapi/block.json
> +++ b/qapi/block.json
> @@ -40,6 +40,22 @@
>    'data': ['auto', 'none', 'lba', 'large', 'rechs']}
>  
>  ##
> +# @COLOMode
> +#
> +# An enumeration of COLO mode.
> +#
> +# @unprotected: COLO is not started or after failover
> +#
> +# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
> +#
> +# @secondary: Secondary mode, receive the vm's state from primary QEMU.
> +#
> +# Since: 2.4
> +##
> +{ 'enum' : 'COLOMode',
> +  'data' : ['unprotected', 'primary', 'secondary']}

Perhaps replace COLOMode with ReplicationMode or FaultToleranceMode?

> +##
>  # @BlockdevSnapshotInternal
>  #
>  # @device: the name of the device to generate the snapshot from
> 

Apart from this,

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 05/13] quorum: implement block driver interfaces for block replication
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 05/13] quorum: implement block driver interfaces for " Wen Congyang
@ 2015-03-25 12:50   ` Paolo Bonzini
  0 siblings, 0 replies; 55+ messages in thread
From: Paolo Bonzini @ 2015-03-25 12:50 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Fam Zheng, Max Reitz
  Cc: Kevin Wolf, Lai Jiangshan, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang



On 25/03/2015 10:36, Wen Congyang wrote:
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  block/quorum.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 79 insertions(+)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index d087135..eee829d 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -84,6 +84,8 @@ typedef struct BDRVQuorumState {
>                              */
>  
>      QuorumReadPattern read_pattern;
> +
> +    int colo_index; /* store which child supports block replication */
>  } BDRVQuorumState;
>  
>  typedef struct QuorumAIOCB QuorumAIOCB;
> @@ -1024,6 +1026,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
>      }
>  
>      g_free(opened);
> +    s->colo_index = -1;
>      goto exit;
>  
>  close_exit:
> @@ -1114,6 +1117,78 @@ static void quorum_refresh_filename(BlockDriverState *bs)
>      bs->full_open_options = opts;
>  }
>  
> +static void quorum_stop_replication(BlockDriverState *bs, Error **errp);
> +static void quorum_start_replication(BlockDriverState *bs, COLOMode mode,
> +                                     Error **errp)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    int count = 0, i, index;
> +    Error *local_err = NULL;
> +
> +    /*
> +     * TODO: support COLO_SECONDARY_MODE if we allow secondary
> +     * QEMU becoming primary QEMU.
> +     */
> +    if (mode != COLO_MODE_PRIMARY) {
> +        error_set(errp, QERR_INVALID_PARAMETER, "mode");
> +        return;
> +    }
> +
> +    if (s->read_pattern != QUORUM_READ_PATTERN_FIFO) {
> +        error_set(errp, QERR_INVALID_PARAMETER, "read pattern");
> +        return;
> +    }
> +
> +    for (i = 0; i < s->num_children; i++) {
> +        bdrv_start_replication(s->bs[i], mode, &local_err);
> +        if (local_err) {
> +            error_free(local_err);
> +            local_err = NULL;
> +        } else {
> +            count++;
> +            index = i;
> +        }
> +    }
> +
> +    if (count == 0) {
> +        /* No child supports block replication */
> +        error_set(errp, QERR_UNSUPPORTED);
> +    } else if (count > 1) {
> +        quorum_stop_replication(bs, NULL);
> +        error_setg(errp, "too many children support block replication");
> +    } else {
> +        s->colo_index = index;
> +    }
> +}
> +
> +static void quorum_do_checkpoint(BlockDriverState *bs, Error **errp)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +
> +    if (s->colo_index < 0) {
> +        error_setg(errp, "Block replication is not started");
> +        return;
> +    }
> +
> +    bdrv_do_checkpoint(s->bs[s->colo_index], errp);
> +}
> +
> +static void quorum_stop_replication(BlockDriverState *bs, Error **errp)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    int i;
> +
> +    if (s->colo_index >= 0) {
> +        bdrv_stop_replication(s->bs[s->colo_index], errp);
> +        s->colo_index = -1;
> +        return;
> +    }
> +
> +    for (i = 0; i < s->num_children; i++) {
> +        bdrv_stop_replication(s->bs[i], NULL);
> +    }

Why do anything at all if s->colo_index < 0?  So, with the for loop
removed (and possibly the if condition changed to "s->colo_index < 0",
similar to quorum_do_checkpoint:

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

> +}
> +
>  static BlockDriver bdrv_quorum = {
>      .format_name                        = "quorum",
>      .protocol_name                      = "quorum",
> @@ -1137,6 +1212,10 @@ static BlockDriver bdrv_quorum = {
>  
>      .is_filter                          = true,
>      .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
> +
> +    .bdrv_start_replication             = quorum_start_replication,
> +    .bdrv_do_checkpoint                 = quorum_do_checkpoint,
> +    .bdrv_stop_replication              = quorum_stop_replication,
>  };
>  
>  static void bdrv_quorum_init(void)
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 06/13] NBD client: implement block driver interfaces for block replication
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 06/13] NBD client: " Wen Congyang
@ 2015-03-25 12:50   ` Paolo Bonzini
  2015-03-26  7:21   ` Fam Zheng
  1 sibling, 0 replies; 55+ messages in thread
From: Paolo Bonzini @ 2015-03-25 12:50 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Fam Zheng, Max Reitz
  Cc: Kevin Wolf, Lai Jiangshan, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang



On 25/03/2015 10:36, Wen Congyang wrote:
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  block/nbd.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 49 insertions(+)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 3faf865..753b322 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -458,6 +458,52 @@ static void nbd_refresh_filename(BlockDriverState *bs)
>      bs->full_open_options = opts;
>  }
>  
> +static void nbd_start_replication(BlockDriverState *bs, COLOMode mode,
> +                                  Error **errp)
> +{
> +    BDRVNBDState *s = bs->opaque;
> +
> +    /*
> +     * TODO: support COLO_SECONDARY_MODE if we allow secondary
> +     * QEMU becoming primary QEMU.
> +     */
> +    if (mode != COLO_MODE_PRIMARY) {
> +        error_set(errp, QERR_INVALID_PARAMETER, "mode");
> +        return;
> +    }
> +
> +    if (s->connected) {
> +        error_setg(errp, "The connection is established");
> +        return;
> +    }
> +
> +    /* TODO: NBD client should be one child of quorum, how to verify it? */
> +    nbd_connect_server(bs, errp);
> +}
> +
> +static void nbd_do_checkpoint(BlockDriverState *bs, Error **errp)
> +{
> +    BDRVNBDState *s = bs->opaque;
> +
> +    if (!s->connected) {
> +        error_setg(errp, "The connection is not established");
> +        return;
> +    }
> +}
> +
> +static void nbd_stop_replication(BlockDriverState *bs, Error **errp)
> +{
> +    BDRVNBDState *s = bs->opaque;
> +
> +    if (!s->connected) {
> +        error_setg(errp, "The connection is not established");
> +        return;
> +    }
> +
> +    nbd_client_close(bs);
> +    s->connected = false;
> +}
> +
>  static BlockDriver bdrv_nbd = {
>      .format_name                = "nbd",
>      .protocol_name              = "nbd",
> @@ -527,6 +573,9 @@ static BlockDriver bdrv_nbd_colo = {
>      .bdrv_detach_aio_context    = nbd_detach_aio_context,
>      .bdrv_attach_aio_context    = nbd_attach_aio_context,
>      .bdrv_refresh_filename      = nbd_refresh_filename,
> +    .bdrv_start_replication     = nbd_start_replication,
> +    .bdrv_do_checkpoint         = nbd_do_checkpoint,
> +    .bdrv_stop_replication      = nbd_stop_replication,
>  
>      .has_variable_length        = true,
>  };
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 10/13] Backup: clear all bitmap when doing block checkpoint
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 10/13] Backup: clear all bitmap when doing block checkpoint Wen Congyang
@ 2015-03-25 12:55   ` Paolo Bonzini
  2015-03-26  0:59     ` Wen Congyang
  0 siblings, 1 reply; 55+ messages in thread
From: Paolo Bonzini @ 2015-03-25 12:55 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Fam Zheng, Max Reitz
  Cc: Kevin Wolf, Lai Jiangshan, qemu block, Jeff Cody, Jiang Yunhong,
	Dong Eddie, Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi,
	Yang Hongyang, zhanghailiang



On 25/03/2015 10:36, Wen Congyang wrote:
> 
> +void backup_do_checkpoint(BlockJob *job, Error **errp)
> +{
> +    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
> +
> +    if (job->driver != &backup_job_driver) {
> +        error_setg(errp, "It is not backup job");
> +        return;
> +    }
> +
> +    hbitmap_reset_all(backup_job->bitmap);
> +}

Please add instead a block_job_do_checkpoint API, and a do_checkpoint
function pointer to BlockJobDriver.

> +{
> +#if 0
> +    hbitmap_reset(hb, 0, hb->size << hb->granularity);
> +#else
> +    uint64_t size = hb->size;
> +    unsigned int i;
> +
> +    /* Same as hbitmap_alloc() except memset() */
> +    for (i = HBITMAP_LEVELS; i-- > 0; ) {

This can be "--i >= 1"...

> +        size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
> +        memset(hb->levels[i], 0, size * sizeof(unsigned long));
> +    }
> +
> +    assert(size == 1);
> +    hb->levels[0][0] |= 1UL << (BITS_PER_LONG - 1);

... if you use "=" instead of "|=" here.

> +#endif
> +}

Please pick one implementation (no #if), and also add a testcase to
tests/test-hbitmap.c.

Paolo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints
  2015-03-25  9:36 [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints Wen Congyang
                   ` (12 preceding siblings ...)
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 13/13] Don't allow a disk use backing reference target Wen Congyang
@ 2015-03-25 12:56 ` Paolo Bonzini
  2015-03-25 14:24 ` Dr. David Alan Gilbert
  2015-07-01  3:09 ` Michael R. Hines
  15 siblings, 0 replies; 55+ messages in thread
From: Paolo Bonzini @ 2015-03-25 12:56 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Fam Zheng, Max Reitz
  Cc: Kevin Wolf, Lai Jiangshan, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Stefan Hajnoczi, Yang Hongyang



On 25/03/2015 10:36, Wen Congyang wrote:
> Block replication is a very important feature which is used for
> continuous checkpoints(for example: COLO).
> 
> Usage:
> Please refer to docs/block-replication.txt
> 
> You can get the patch here:
> https://github.com/wencongyang/qemu-colo/commits/block-replication-v2
> 
> Changs Log:
> V2:
> 1. Redesign the secondary qemu(use image-fleecing)
> 2. Use Error objects to return error message
> 3. Address the comments from Max Reitz and Eric Blake
> 
> Wen Congyang (13):
>   docs: block replication's description
>   quorum: allow ignoring child errors
>   NBD client: connect to nbd server later
>   Add new block driver interfaces to control block replication
>   quorum: implement block driver interfaces for block replication
>   NBD client: implement block driver interfaces for block replication
>   allow writing to the backing file
>   Allow creating backup jobs when opening BDS
>   block: Parse "backing_reference" option to reference existing BDS
>   Backup: clear all bitmap when doing block checkpoint
>   qcow2: support colo
>   skip nbd_target when starting block replication
>   Don't allow a disk use backing reference target
> 
>  block.c                    | 242 +++++++++++++++++++++++-
>  block/Makefile.objs        |   2 +-
>  block/backup.c             |  12 ++
>  block/nbd.c                | 171 +++++++++++++++--
>  block/qcow2.c              | 447 ++++++++++++++++++++++++++++++++++++++++++++-
>  block/qcow2.h              |   6 +
>  block/quorum.c             | 143 ++++++++++++++-
>  docs/block-replication.txt | 147 +++++++++++++++
>  include/block/block.h      |   5 +
>  include/block/block_int.h  |  13 ++
>  include/qemu/hbitmap.h     |   8 +
>  qapi/block.json            |  16 ++
>  tests/qemu-iotests/051     |  13 ++
>  tests/qemu-iotests/051.out |  13 ++
>  util/hbitmap.c             |  19 ++
>  15 files changed, 1230 insertions(+), 27 deletions(-)
>  create mode 100644 docs/block-replication.txt
> 

Looks nice!

I've reviewed the patches where I'm competent.

Paolo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints
  2015-03-25  9:36 [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints Wen Congyang
                   ` (13 preceding siblings ...)
  2015-03-25 12:56 ` [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints Paolo Bonzini
@ 2015-03-25 14:24 ` Dr. David Alan Gilbert
  2015-03-26  2:34   ` Gonglei
  2015-07-01  3:09 ` Michael R. Hines
  15 siblings, 1 reply; 55+ messages in thread
From: Dr. David Alan Gilbert @ 2015-03-25 14:24 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Max Reitz, Stefan Hajnoczi,
	Paolo Bonzini, Yang Hongyang

* Wen Congyang (wency@cn.fujitsu.com) wrote:
> Block replication is a very important feature which is used for
> continuous checkpoints(for example: COLO).
> 
> Usage:
> Please refer to docs/block-replication.txt
> 
> You can get the patch here:
> https://github.com/wencongyang/qemu-colo/commits/block-replication-v2

Do you have a branch with this code ontop of the main colo work?

Dave

> 
> Changs Log:
> V2:
> 1. Redesign the secondary qemu(use image-fleecing)
> 2. Use Error objects to return error message
> 3. Address the comments from Max Reitz and Eric Blake
> 
> Wen Congyang (13):
>   docs: block replication's description
>   quorum: allow ignoring child errors
>   NBD client: connect to nbd server later
>   Add new block driver interfaces to control block replication
>   quorum: implement block driver interfaces for block replication
>   NBD client: implement block driver interfaces for block replication
>   allow writing to the backing file
>   Allow creating backup jobs when opening BDS
>   block: Parse "backing_reference" option to reference existing BDS
>   Backup: clear all bitmap when doing block checkpoint
>   qcow2: support colo
>   skip nbd_target when starting block replication
>   Don't allow a disk use backing reference target
> 
>  block.c                    | 242 +++++++++++++++++++++++-
>  block/Makefile.objs        |   2 +-
>  block/backup.c             |  12 ++
>  block/nbd.c                | 171 +++++++++++++++--
>  block/qcow2.c              | 447 ++++++++++++++++++++++++++++++++++++++++++++-
>  block/qcow2.h              |   6 +
>  block/quorum.c             | 143 ++++++++++++++-
>  docs/block-replication.txt | 147 +++++++++++++++
>  include/block/block.h      |   5 +
>  include/block/block_int.h  |  13 ++
>  include/qemu/hbitmap.h     |   8 +
>  qapi/block.json            |  16 ++
>  tests/qemu-iotests/051     |  13 ++
>  tests/qemu-iotests/051.out |  13 ++
>  util/hbitmap.c             |  19 ++
>  15 files changed, 1230 insertions(+), 27 deletions(-)
>  create mode 100644 docs/block-replication.txt
> 
> -- 
> 2.1.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 01/13] docs: block replication's description Wen Congyang
@ 2015-03-25 15:38   ` Eric Blake
  2015-03-26  8:58     ` Wen Congyang
  2015-03-26  6:31   ` [Qemu-devel] " Fam Zheng
  1 sibling, 1 reply; 55+ messages in thread
From: Eric Blake @ 2015-03-25 15:38 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Lai Jiangshan, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

[-- Attachment #1: Type: text/plain, Size: 5305 bytes --]

On 03/25/2015 03:36 AM, Wen Congyang wrote:
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  docs/block-replication.txt | 147 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 147 insertions(+)
>  create mode 100644 docs/block-replication.txt
> 

Grammar review only (I'll leave the technical review to others)

> diff --git a/docs/block-replication.txt b/docs/block-replication.txt
> new file mode 100644
> index 0000000..874ed8e
> --- /dev/null
> +++ b/docs/block-replication.txt
> @@ -0,0 +1,147 @@
> +Block replication
> +----------------------------------------
> +Copyright Fujitsu, Corp. 2015
> +Copyright (c) 2015 Intel Corporation
> +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.

Space after comma in English writing.

> +
> +This work is licensed under the terms of the GNU GPL, version 2 or later.
> +See the COPYING file in the top-level directory.
> +
> +The block replication is used for continuous checkpoints. It is designed

Sounds better as either of:
The block replication feature is...
Block replication is...

> +for COLO that Secondary VM is running. It can also be applied for FT/HA

Please define COLO and FT/HA on first use (okay to abbreviate elsewhere
in the document, but the first use should not assume the acronym is
well-known)

s/for COLO that/for COLO (COurse-grain LOck-stepping), where the/

> +scene that Secondary VM is not running.

s/for FT/HA scene that/for the FT/HA (Fault-tolerance/High Assurance)
scenario, where the/

> +
> +This document gives an overview of block replication's design.
> +
> +== Background ==
> +High availability solutions such as micro checkpoint and COLO will do
> +consecutive checkpoint. The VM state of Primary VM and Secondary VM is

s/checkpoint/checkpoints/

> +identical right after a VM checkpoint, but becomes different as the VM
...
> +
> +4) The hidden-disk is created automatically. It buffers the original content
> +that is modified by the primary VM. It should also be an empty disk, and
> +the dirver supports bdrv_make_empty().

s/dirver/driver/

> +
> +== New block driver interface ==
> +We add three block driver interfaces to control block replication:
> +a. bdrv_start_replication()
> +   Start block replication, called in migration/checkpoint thread.
> +   We must call bdrv_start_replication() in secondary QEMU before
> +   calling bdrv_start_replication() in primary QEMU.
> +b. bdrv_do_checkpoint()
> +   This interface is called after all VM state is transfered to

s/transfered/transferred/

> +   Secondary QEMU. The Disk buffer will be dropped in this interface.
> +   The caller must hold the I/O mutex lock if it is in migration/checkpoint
> +   thread.
> +c. bdrv_stop_replication()
> +   It is called when failover. We will flush the Disk buffer into

s/when/on/

> +   Secondary Disk and stop block replication. The vm should be stopped
> +   before calling it. The caller must hold the I/O mutex lock if it is
> +   in migration/checkpoint thread.
> +
> +== Usage ==
> +Primary:
> +  -drive if=xxx,driver=quorum,read-pattern=fifo,\
> +         children.0.file.filename=1.raw,\
> +         children.0.driver=raw,\
> +         children.1.file.driver=nbd+colo,\
> +         children.1.file.host=xxx,\
> +         children.1.file.port=xxx,\
> +         children.1.file.export=xxx,\
> +         children.1.driver=raw,\
> +         children.1.ignore-errors=on

This command line looks like multiple arguments because of the leading
whitespace on succeeding lines.  I don't know if there is any better way
to format it, though, to make it obvious that it is all a single
argument to -drive.

> +  Note:
> +  1. NBD Client should not be the first child of quorum.
> +  2. There should be only one NBD Client.
> +  3. host is the secondary physical machine's hostname or IP
> +  4. Each disk must have its own export name.

Maybe a note 5 to call out the formatting aspect of the command line?

> +
> +Secondary:
> +  -drive if=none,driver=raw,file=1.raw,id=nbd_target1 \
> +  -drive if=xxx,driver=qcow2+colo,file=active_disk.qcow2,export=xxx,\
> +         backing_reference.drive_id=nbd_target1,\
> +         backing_reference.hidden-disk.file.filename=hidden_disk.qcow2,\
> +         backing_reference.hidden-disk.driver=qcow2,\
> +         backing_reference.hidden-disk.allow-write-backing-file=on
> +  Then run qmp command:
> +    nbd_server_start host:port
> +  Note:
> +  1. The export name for the same disk must be the same in primary
> +     and secondary QEMU command line
> +  2. The qmp command nbd_server_start must be run before running the
> +     qmp command migrate on primary QEMU
> +  3. Don't use nbd_server_start's other options
> +  4. Active disk, hidden disk and nbd target's length should be the
> +     same.
> +  5. It is better to put active disk and hidden disk in ramdisk.
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 04/13] Add new block driver interfaces to control block replication
  2015-03-25 12:48   ` Paolo Bonzini
@ 2015-03-25 15:43     ` Eric Blake
  0 siblings, 0 replies; 55+ messages in thread
From: Eric Blake @ 2015-03-25 15:43 UTC (permalink / raw)
  To: Paolo Bonzini, Wen Congyang, qemu devel, Fam Zheng, Max Reitz
  Cc: Kevin Wolf, Lai Jiangshan, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Luiz Capitulino, Gonglei,
	Stefan Hajnoczi, Yang Hongyang, Michael Roth, zhanghailiang

[-- Attachment #1: Type: text/plain, Size: 1033 bytes --]

On 03/25/2015 06:48 AM, Paolo Bonzini wrote:

>>  ##
>> +# @COLOMode
>> +#
>> +# An enumeration of COLO mode.

s/mode/modes/

>> +#
>> +# @unprotected: COLO is not started or after failover
>> +#
>> +# @primary: Primary mode, the vm's state will be sent to secondary QEMU.

Inconsistent on whether you end with '.'

>> +#
>> +# @secondary: Secondary mode, receive the vm's state from primary QEMU.
>> +#
>> +# Since: 2.4
>> +##
>> +{ 'enum' : 'COLOMode',
>> +  'data' : ['unprotected', 'primary', 'secondary']}
> 
> Perhaps replace COLOMode with ReplicationMode or FaultToleranceMode?

If you keep the name COLOMode, it would also be nice to spell out
'COurse-grain LOck-stepping' at least once somewhere in the .json file
(probably in the description of this enum).  A name like ReplicationMode
is a bit more self-documenting (I know what replication is, I have to
look up what COLO is).


-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 10/13] Backup: clear all bitmap when doing block checkpoint
  2015-03-25 12:55   ` Paolo Bonzini
@ 2015-03-26  0:59     ` Wen Congyang
  0 siblings, 0 replies; 55+ messages in thread
From: Wen Congyang @ 2015-03-26  0:59 UTC (permalink / raw)
  To: Paolo Bonzini, qemu devel, Fam Zheng, Max Reitz
  Cc: Kevin Wolf, Lai Jiangshan, qemu block, Jeff Cody, Jiang Yunhong,
	Dong Eddie, Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi,
	Yang Hongyang, zhanghailiang

On 03/25/2015 08:55 PM, Paolo Bonzini wrote:
> 
> 
> On 25/03/2015 10:36, Wen Congyang wrote:
>>
>> +void backup_do_checkpoint(BlockJob *job, Error **errp)
>> +{
>> +    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
>> +
>> +    if (job->driver != &backup_job_driver) {
>> +        error_setg(errp, "It is not backup job");
>> +        return;
>> +    }
>> +
>> +    hbitmap_reset_all(backup_job->bitmap);
>> +}
> 
> Please add instead a block_job_do_checkpoint API, and a do_checkpoint
> function pointer to BlockJobDriver.
> 
>> +{
>> +#if 0
>> +    hbitmap_reset(hb, 0, hb->size << hb->granularity);
>> +#else
>> +    uint64_t size = hb->size;
>> +    unsigned int i;
>> +
>> +    /* Same as hbitmap_alloc() except memset() */
>> +    for (i = HBITMAP_LEVELS; i-- > 0; ) {
> 
> This can be "--i >= 1"...
> 
>> +        size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
>> +        memset(hb->levels[i], 0, size * sizeof(unsigned long));
>> +    }
>> +
>> +    assert(size == 1);
>> +    hb->levels[0][0] |= 1UL << (BITS_PER_LONG - 1);
> 
> ... if you use "=" instead of "|=" here.
> 
>> +#endif
>> +}
> 
> Please pick one implementation (no #if), and also add a testcase to
> tests/test-hbitmap.c.

OK

Thanks
Wen Congyang

> 
> Paolo
> .
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints
  2015-03-25 14:24 ` Dr. David Alan Gilbert
@ 2015-03-26  2:34   ` Gonglei
  0 siblings, 0 replies; 55+ messages in thread
From: Gonglei @ 2015-03-26  2:34 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Max Reitz, Stefan Hajnoczi,
	Paolo Bonzini, Yang Hongyang

On 2015/3/25 22:24, Dr. David Alan Gilbert wrote:
> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>> Block replication is a very important feature which is used for
>> continuous checkpoints(for example: COLO).
>>
>> Usage:
>> Please refer to docs/block-replication.txt
>>
>> You can get the patch here:
>> https://github.com/wencongyang/qemu-colo/commits/block-replication-v2
> 
> Do you have a branch with this code on top of the main colo work?
> 
A new version of main COLO framework bases on this block patch
series will be posted soon. :)

Regards,
-Gonglei

> Dave
> 
>>
>> Changs Log:
>> V2:
>> 1. Redesign the secondary qemu(use image-fleecing)
>> 2. Use Error objects to return error message
>> 3. Address the comments from Max Reitz and Eric Blake
>>
>> Wen Congyang (13):
>>   docs: block replication's description
>>   quorum: allow ignoring child errors
>>   NBD client: connect to nbd server later
>>   Add new block driver interfaces to control block replication
>>   quorum: implement block driver interfaces for block replication
>>   NBD client: implement block driver interfaces for block replication
>>   allow writing to the backing file
>>   Allow creating backup jobs when opening BDS
>>   block: Parse "backing_reference" option to reference existing BDS
>>   Backup: clear all bitmap when doing block checkpoint
>>   qcow2: support colo
>>   skip nbd_target when starting block replication
>>   Don't allow a disk use backing reference target
>>
>>  block.c                    | 242 +++++++++++++++++++++++-
>>  block/Makefile.objs        |   2 +-
>>  block/backup.c             |  12 ++
>>  block/nbd.c                | 171 +++++++++++++++--
>>  block/qcow2.c              | 447 ++++++++++++++++++++++++++++++++++++++++++++-
>>  block/qcow2.h              |   6 +
>>  block/quorum.c             | 143 ++++++++++++++-
>>  docs/block-replication.txt | 147 +++++++++++++++
>>  include/block/block.h      |   5 +
>>  include/block/block_int.h  |  13 ++
>>  include/qemu/hbitmap.h     |   8 +
>>  qapi/block.json            |  16 ++
>>  tests/qemu-iotests/051     |  13 ++
>>  tests/qemu-iotests/051.out |  13 ++
>>  util/hbitmap.c             |  19 ++
>>  15 files changed, 1230 insertions(+), 27 deletions(-)
>>  create mode 100644 docs/block-replication.txt
>>
>> -- 
>> 2.1.0
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 01/13] docs: block replication's description
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 01/13] docs: block replication's description Wen Congyang
  2015-03-25 15:38   ` [Qemu-devel] [Qemu-block] " Eric Blake
@ 2015-03-26  6:31   ` Fam Zheng
  2015-03-26  7:17     ` Wen Congyang
  2015-04-03  2:35     ` Wen Congyang
  1 sibling, 2 replies; 55+ messages in thread
From: Fam Zheng @ 2015-03-26  6:31 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Yang Hongyang, Lai Jiangshan, qemu block,
	Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert, qemu devel,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Max Reitz,
	zhanghailiang

On Wed, 03/25 17:36, Wen Congyang wrote:
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  docs/block-replication.txt | 147 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 147 insertions(+)
>  create mode 100644 docs/block-replication.txt
> 
> diff --git a/docs/block-replication.txt b/docs/block-replication.txt
> new file mode 100644
> index 0000000..874ed8e
> --- /dev/null
> +++ b/docs/block-replication.txt
> @@ -0,0 +1,147 @@
> +Block replication
> +----------------------------------------
> +Copyright Fujitsu, Corp. 2015
> +Copyright (c) 2015 Intel Corporation
> +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
> +
> +This work is licensed under the terms of the GNU GPL, version 2 or later.
> +See the COPYING file in the top-level directory.
> +
> +The block replication is used for continuous checkpoints. It is designed
> +for COLO that Secondary VM is running. It can also be applied for FT/HA
> +scene that Secondary VM is not running.
> +
> +This document gives an overview of block replication's design.
> +
> +== Background ==
> +High availability solutions such as micro checkpoint and COLO will do
> +consecutive checkpoint. The VM state of Primary VM and Secondary VM is
> +identical right after a VM checkpoint, but becomes different as the VM
> +executes till the next checkpoint. To support disk contents checkpoint,
> +the modified disk contents in the Secondary VM must be buffered, and are
> +only dropped at next checkpoint time. To reduce the network transportation
> +effort at the time of checkpoint, the disk modification operations of
> +Primary disk are asynchronously forwarded to the Secondary node.
> +
> +== Workflow ==
> +The following is the image of block replication workflow:
> +
> +        +----------------------+            +------------------------+
> +        |Primary Write Requests|            |Secondary Write Requests|
> +        +----------------------+            +------------------------+
> +                  |                                       |
> +                  |                                      (4)
> +                  |                                       V
> +                  |                              /-------------\
> +                  |      Copy and Forward        |             |
> +                  |---------(1)----------+       | Disk Buffer |
> +                  |                      |       |             |
> +                  |                     (3)      \-------------/
> +                  |                 speculative      ^
> +                  |                write through    (2)
> +                  |                      |           |
> +                  V                      V           |
> +           +--------------+           +----------------+
> +           | Primary Disk |           | Secondary Disk |
> +           +--------------+           +----------------+
> +
> +    1) Primary write requests will be copied and forwarded to Secondary
> +       QEMU.
> +    2) Before Primary write requests are written to Secondary disk, the
> +       original sector content will be read from Secondary disk and
> +       buffered in the Disk buffer, but it will not overwrite the existing
> +       sector content in the Disk buffer.

Could you elaborate a bit about the "existing sector content" here? IIUC, it
could be from either "Secondary Write Requests" or previous COW of "Primary
Write Requests". Is that right?

> +    3) Primary write requests will be written to Secondary disk.
> +    4) Secondary write requests will be buffered in the Disk buffer and it
> +       will overwrite the existing sector content in the buffer.
> +
> +== Architecture ==
> +We are going to implement COLO block replication from many basic
> +blocks that are already in QEMU.
> +
> +         virtio-blk       ||
> +             ^            ||                            .----------
> +             |            ||                            | Secondary
> +        1 Quorum          ||                            '----------
> +         /      \         ||
> +        /        \        ||
> +   Primary      2 NBD  ------->  2 NBD
> +     disk       client    ||     server                                         virtio-blk
> +                          ||        ^                                                ^
> +--------.                 ||        |                                                |
> +Primary |                 ||  Secondary disk <--------- hidden-disk 4 <--------- active-disk 3
> +--------'                 ||        |          backing        ^       backing
> +                          ||        |                         |
> +                          ||        |                         |
> +                          ||        '-------------------------'
> +                          ||           drive-backup sync=none
> +
> +1) The disk on the primary is represented by a block device with two
> +children, providing replication between a primary disk and the host that
> +runs the secondary VM. The read pattern for quorum can be extended to
> +make the primary always read from the local disk instead of going through
> +NBD.
> +
> +2) The secondary disk receives writes from the primary VM through QEMU's
> +embedded NBD server (speculative write-through).
> +
> +3) The disk on the secondary is represented by a custom block device
> +(called active-disk). It should be an empty disk, and the format should
> +be qcow2.
> +
> +4) The hidden-disk is created automatically. It buffers the original content
> +that is modified by the primary VM. It should also be an empty disk, and
> +the dirver supports bdrv_make_empty().
> +
> +== New block driver interface ==
> +We add three block driver interfaces to control block replication:
> +a. bdrv_start_replication()
> +   Start block replication, called in migration/checkpoint thread.
> +   We must call bdrv_start_replication() in secondary QEMU before
> +   calling bdrv_start_replication() in primary QEMU.
> +b. bdrv_do_checkpoint()
> +   This interface is called after all VM state is transfered to
> +   Secondary QEMU. The Disk buffer will be dropped in this interface.
> +   The caller must hold the I/O mutex lock if it is in migration/checkpoint
> +   thread.
> +c. bdrv_stop_replication()
> +   It is called when failover. We will flush the Disk buffer into
> +   Secondary Disk and stop block replication. The vm should be stopped
> +   before calling it. The caller must hold the I/O mutex lock if it is
> +   in migration/checkpoint thread.
> +
> +== Usage ==
> +Primary:
> +  -drive if=xxx,driver=quorum,read-pattern=fifo,\
> +         children.0.file.filename=1.raw,\
> +         children.0.driver=raw,\
> +         children.1.file.driver=nbd+colo,\
> +         children.1.file.host=xxx,\
> +         children.1.file.port=xxx,\
> +         children.1.file.export=xxx,\
> +         children.1.driver=raw,\
> +         children.1.ignore-errors=on
   ^^^^^^^^^
Won't the leading spaces cause trouble? :)

> +  Note:
> +  1. NBD Client should not be the first child of quorum.
> +  2. There should be only one NBD Client.
> +  3. host is the secondary physical machine's hostname or IP
> +  4. Each disk must have its own export name.
> +
> +Secondary:
> +  -drive if=none,driver=raw,file=1.raw,id=nbd_target1 \
> +  -drive if=xxx,driver=qcow2+colo,file=active_disk.qcow2,export=xxx,\
> +         backing_reference.drive_id=nbd_target1,\
> +         backing_reference.hidden-disk.file.filename=hidden_disk.qcow2,\
> +         backing_reference.hidden-disk.driver=qcow2,\
> +         backing_reference.hidden-disk.allow-write-backing-file=on
> +  Then run qmp command:
> +    nbd_server_start host:port

s/nbd_server_start/nbd-server-start

For a few more too.


> +  Note:
> +  1. The export name for the same disk must be the same in primary
> +     and secondary QEMU command line
> +  2. The qmp command nbd_server_start must be run before running the
> +     qmp command migrate on primary QEMU
> +  3. Don't use nbd_server_start's other options
> +  4. Active disk, hidden disk and nbd target's length should be the
> +     same.
> +  5. It is better to put active disk and hidden disk in ramdisk.

Fam

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 12/13] skip nbd_target when starting block replication
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 12/13] skip nbd_target when starting block replication Wen Congyang
@ 2015-03-26  7:03   ` Fam Zheng
  2015-03-26  7:15     ` Wen Congyang
  0 siblings, 1 reply; 55+ messages in thread
From: Fam Zheng @ 2015-03-26  7:03 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Yang Hongyang, Lai Jiangshan, qemu block,
	Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert, qemu devel,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Max Reitz,
	zhanghailiang

On Wed, 03/25 17:36, Wen Congyang wrote:
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  block.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/block.c b/block.c
> index bd7fa9c..3af5ad4 100644
> --- a/block.c
> +++ b/block.c
> @@ -6368,6 +6368,12 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
>  void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error **errp)
>  {
>      BlockDriver *drv = bs->drv;
> +    Error *local_err = NULL;
> +
> +    if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, &local_err)) {
> +        error_free(local_err);

local_err is set but not used, just pass NULL. Same below.

> +        return;
> +    }
>  
>      if (drv && drv->bdrv_start_replication) {
>          drv->bdrv_start_replication(bs, mode, errp);
> @@ -6381,6 +6387,12 @@ void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error **errp)
>  void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
>  {
>      BlockDriver *drv = bs->drv;
> +    Error *local_err = NULL;
> +
> +    if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, &local_err)) {
> +        error_free(local_err);
> +        return;
> +    }
>  
>      if (drv && drv->bdrv_do_checkpoint) {
>          drv->bdrv_do_checkpoint(bs, errp);
> @@ -6394,6 +6406,12 @@ void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
>  void bdrv_stop_replication(BlockDriverState *bs, Error **errp)
>  {
>      BlockDriver *drv = bs->drv;
> +    Error *local_err = NULL;
> +
> +    if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, &local_err)) {
> +        error_free(local_err);
> +        return;
> +    }
>  
>      if (drv && drv->bdrv_stop_replication) {
>          drv->bdrv_stop_replication(bs, errp);
> -- 
> 2.1.0
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 08/13] Allow creating backup jobs when opening BDS
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 08/13] Allow creating backup jobs when opening BDS Wen Congyang
@ 2015-03-26  7:07   ` Fam Zheng
  2015-03-26  7:14     ` Wen Congyang
  0 siblings, 1 reply; 55+ messages in thread
From: Fam Zheng @ 2015-03-26  7:07 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Yang Hongyang, Lai Jiangshan, qemu block, Jeff Cody,
	Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert, qemu devel,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Max Reitz,
	zhanghailiang

On Wed, 03/25 17:36, Wen Congyang wrote:
> When opening BDS, we need to create backup jobs for
> image-fleecing.

How does this commit message relate to the Makefile change?

> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Cc: Jeff Cody <jcody@redhat.com>
> ---
>  block/Makefile.objs | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index db2933e..0950973 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -21,10 +21,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
>  block-obj-$(CONFIG_LIBSSH2) += ssh.o
>  block-obj-y += accounting.o
>  block-obj-y += write-threshold.o
> +block-obj-y += backup.o
>  
>  common-obj-y += stream.o
>  common-obj-y += commit.o
> -common-obj-y += backup.o
>  
>  iscsi.o-cflags     := $(LIBISCSI_CFLAGS)
>  iscsi.o-libs       := $(LIBISCSI_LIBS)
> -- 
> 2.1.0
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 04/13] Add new block driver interfaces to control block replication
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 04/13] Add new block driver interfaces to control block replication Wen Congyang
  2015-03-25 12:48   ` Paolo Bonzini
@ 2015-03-26  7:12   ` Fam Zheng
  2015-03-26  7:22     ` Wen Congyang
  1 sibling, 1 reply; 55+ messages in thread
From: Fam Zheng @ 2015-03-26  7:12 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Yang Hongyang, Lai Jiangshan, qemu block,
	Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert, qemu devel,
	Gonglei, Stefan Hajnoczi, Luiz Capitulino, Paolo Bonzini,
	Max Reitz, Michael Roth, zhanghailiang

On Wed, 03/25 17:36, Wen Congyang wrote:
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  block.c                   | 39 +++++++++++++++++++++++++++++++++++++++
>  include/block/block.h     |  4 ++++
>  include/block/block_int.h | 11 +++++++++++
>  qapi/block.json           | 16 ++++++++++++++++
>  4 files changed, 70 insertions(+)
> 
> diff --git a/block.c b/block.c
> index 0fe97de..0ff5cf8 100644
> --- a/block.c
> +++ b/block.c
> @@ -6196,3 +6196,42 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
>  {
>      return &bs->stats;
>  }
> +
> +void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error **errp)
> +{
> +    BlockDriver *drv = bs->drv;
> +
> +    if (drv && drv->bdrv_start_replication) {
> +        drv->bdrv_start_replication(bs, mode, errp);
> +    } else if (bs->file) {
> +        bdrv_start_replication(bs->file, mode, errp);
> +    } else {
> +        error_set(errp, QERR_UNSUPPORTED);

I think we should use error_setg in new code? (The same to following ones)

> +    }
> +}
> +
> +void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
> +{
> +    BlockDriver *drv = bs->drv;
> +
> +    if (drv && drv->bdrv_do_checkpoint) {
> +        drv->bdrv_do_checkpoint(bs, errp);
> +    } else if (bs->file) {
> +        bdrv_do_checkpoint(bs->file, errp);
> +    } else {
> +        error_set(errp, QERR_UNSUPPORTED);
> +    }
> +}
> +
> +void bdrv_stop_replication(BlockDriverState *bs, Error **errp)
> +{
> +    BlockDriver *drv = bs->drv;
> +
> +    if (drv && drv->bdrv_stop_replication) {
> +        drv->bdrv_stop_replication(bs, errp);
> +    } else if (bs->file) {
> +        bdrv_stop_replication(bs->file, errp);
> +    } else {
> +        error_set(errp, QERR_UNSUPPORTED);
> +    }
> +}
> diff --git a/include/block/block.h b/include/block/block.h
> index 4c57d63..68f3b1a 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -569,4 +569,8 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
>  
>  BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
>  
> +void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error **errp);
> +void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
> +void bdrv_stop_replication(BlockDriverState *bs, Error **errp);
> +
>  #endif
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index dccb092..08dd8ba 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -290,6 +290,17 @@ struct BlockDriver {
>       */
>      int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
>  
> +
> +    void (*bdrv_start_replication)(BlockDriverState *bs, COLOMode mode,
> +                                   Error **errp);

Need some documentation, but I have a generic question:

Why is a single interface with modes better than different functions for each
mode (bdrv_start_replication_{primary,secondary}? Asking because the behavior
is very different between them, and I don't see much sharing -- you implement
primary operation in quorum, and secondary in qcow2+colo.

> +    /* Drop Disk buffer when doing checkpoint. */
> +    void (*bdrv_do_checkpoint)(BlockDriverState *bs, Error **errp);
> +    /*
> +     * After failover, we should flush Disk buffer into secondary disk
> +     * and stop block replication.
> +     */
> +    void (*bdrv_stop_replication)(BlockDriverState *bs, Error **errp);
> +
>      QLIST_ENTRY(BlockDriver) list;
>  };
>  
> diff --git a/qapi/block.json b/qapi/block.json
> index e313465..e640566 100644
> --- a/qapi/block.json
> +++ b/qapi/block.json
> @@ -40,6 +40,22 @@
>    'data': ['auto', 'none', 'lba', 'large', 'rechs']}
>  
>  ##
> +# @COLOMode
> +#
> +# An enumeration of COLO mode.
> +#
> +# @unprotected: COLO is not started or after failover
> +#
> +# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
> +#
> +# @secondary: Secondary mode, receive the vm's state from primary QEMU.
> +#
> +# Since: 2.4
> +##
> +{ 'enum' : 'COLOMode',
> +  'data' : ['unprotected', 'primary', 'secondary']}

If split bdrv_start_replication, do we still need an enum? I can't find the
usage in QMP interface, is it in some other series?

Fam

> +
> +##
>  # @BlockdevSnapshotInternal
>  #
>  # @device: the name of the device to generate the snapshot from
> -- 
> 2.1.0
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 08/13] Allow creating backup jobs when opening BDS
  2015-03-26  7:07   ` Fam Zheng
@ 2015-03-26  7:14     ` Wen Congyang
  2015-03-26  7:18       ` Fam Zheng
  0 siblings, 1 reply; 55+ messages in thread
From: Wen Congyang @ 2015-03-26  7:14 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, Yang Hongyang, Lai Jiangshan, qemu block, Jeff Cody,
	Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert, qemu devel,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Max Reitz,
	zhanghailiang

On 03/26/2015 03:07 PM, Fam Zheng wrote:
> On Wed, 03/25 17:36, Wen Congyang wrote:
>> When opening BDS, we need to create backup jobs for
>> image-fleecing.
> 
> How does this commit message relate to the Makefile change?

Hmm, we need to use backup API in block.c, and block.o will
be used by qemu-img which doesn't use common-obj.

Thanks
Wen Congyang

> 
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Cc: Jeff Cody <jcody@redhat.com>
>> ---
>>  block/Makefile.objs | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index db2933e..0950973 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -21,10 +21,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
>>  block-obj-$(CONFIG_LIBSSH2) += ssh.o
>>  block-obj-y += accounting.o
>>  block-obj-y += write-threshold.o
>> +block-obj-y += backup.o
>>  
>>  common-obj-y += stream.o
>>  common-obj-y += commit.o
>> -common-obj-y += backup.o
>>  
>>  iscsi.o-cflags     := $(LIBISCSI_CFLAGS)
>>  iscsi.o-libs       := $(LIBISCSI_LIBS)
>> -- 
>> 2.1.0
>>
> .
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 12/13] skip nbd_target when starting block replication
  2015-03-26  7:03   ` Fam Zheng
@ 2015-03-26  7:15     ` Wen Congyang
  0 siblings, 0 replies; 55+ messages in thread
From: Wen Congyang @ 2015-03-26  7:15 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, Yang Hongyang, Lai Jiangshan, qemu block,
	Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert, qemu devel,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Max Reitz,
	zhanghailiang

On 03/26/2015 03:03 PM, Fam Zheng wrote:
> On Wed, 03/25 17:36, Wen Congyang wrote:
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> ---
>>  block.c | 18 ++++++++++++++++++
>>  1 file changed, 18 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index bd7fa9c..3af5ad4 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -6368,6 +6368,12 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
>>  void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error **errp)
>>  {
>>      BlockDriver *drv = bs->drv;
>> +    Error *local_err = NULL;
>> +
>> +    if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, &local_err)) {
>> +        error_free(local_err);
> 
> local_err is set but not used, just pass NULL. Same below.

Yes, will fix it in the next version.

Thanks
Wen Congyang

> 
>> +        return;
>> +    }
>>  
>>      if (drv && drv->bdrv_start_replication) {
>>          drv->bdrv_start_replication(bs, mode, errp);
>> @@ -6381,6 +6387,12 @@ void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error **errp)
>>  void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
>>  {
>>      BlockDriver *drv = bs->drv;
>> +    Error *local_err = NULL;
>> +
>> +    if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, &local_err)) {
>> +        error_free(local_err);
>> +        return;
>> +    }
>>  
>>      if (drv && drv->bdrv_do_checkpoint) {
>>          drv->bdrv_do_checkpoint(bs, errp);
>> @@ -6394,6 +6406,12 @@ void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
>>  void bdrv_stop_replication(BlockDriverState *bs, Error **errp)
>>  {
>>      BlockDriver *drv = bs->drv;
>> +    Error *local_err = NULL;
>> +
>> +    if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, &local_err)) {
>> +        error_free(local_err);
>> +        return;
>> +    }
>>  
>>      if (drv && drv->bdrv_stop_replication) {
>>          drv->bdrv_stop_replication(bs, errp);
>> -- 
>> 2.1.0
>>
> .
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 01/13] docs: block replication's description
  2015-03-26  6:31   ` [Qemu-devel] " Fam Zheng
@ 2015-03-26  7:17     ` Wen Congyang
  2015-04-03  2:35     ` Wen Congyang
  1 sibling, 0 replies; 55+ messages in thread
From: Wen Congyang @ 2015-03-26  7:17 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, Yang Hongyang, Lai Jiangshan, qemu block,
	Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert, qemu devel,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Max Reitz,
	zhanghailiang

On 03/26/2015 02:31 PM, Fam Zheng wrote:
> On Wed, 03/25 17:36, Wen Congyang wrote:
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> ---
>>  docs/block-replication.txt | 147 +++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 147 insertions(+)
>>  create mode 100644 docs/block-replication.txt
>>
>> diff --git a/docs/block-replication.txt b/docs/block-replication.txt
>> new file mode 100644
>> index 0000000..874ed8e
>> --- /dev/null
>> +++ b/docs/block-replication.txt
>> @@ -0,0 +1,147 @@
>> +Block replication
>> +----------------------------------------
>> +Copyright Fujitsu, Corp. 2015
>> +Copyright (c) 2015 Intel Corporation
>> +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
>> +
>> +This work is licensed under the terms of the GNU GPL, version 2 or later.
>> +See the COPYING file in the top-level directory.
>> +
>> +The block replication is used for continuous checkpoints. It is designed
>> +for COLO that Secondary VM is running. It can also be applied for FT/HA
>> +scene that Secondary VM is not running.
>> +
>> +This document gives an overview of block replication's design.
>> +
>> +== Background ==
>> +High availability solutions such as micro checkpoint and COLO will do
>> +consecutive checkpoint. The VM state of Primary VM and Secondary VM is
>> +identical right after a VM checkpoint, but becomes different as the VM
>> +executes till the next checkpoint. To support disk contents checkpoint,
>> +the modified disk contents in the Secondary VM must be buffered, and are
>> +only dropped at next checkpoint time. To reduce the network transportation
>> +effort at the time of checkpoint, the disk modification operations of
>> +Primary disk are asynchronously forwarded to the Secondary node.
>> +
>> +== Workflow ==
>> +The following is the image of block replication workflow:
>> +
>> +        +----------------------+            +------------------------+
>> +        |Primary Write Requests|            |Secondary Write Requests|
>> +        +----------------------+            +------------------------+
>> +                  |                                       |
>> +                  |                                      (4)
>> +                  |                                       V
>> +                  |                              /-------------\
>> +                  |      Copy and Forward        |             |
>> +                  |---------(1)----------+       | Disk Buffer |
>> +                  |                      |       |             |
>> +                  |                     (3)      \-------------/
>> +                  |                 speculative      ^
>> +                  |                write through    (2)
>> +                  |                      |           |
>> +                  V                      V           |
>> +           +--------------+           +----------------+
>> +           | Primary Disk |           | Secondary Disk |
>> +           +--------------+           +----------------+
>> +
>> +    1) Primary write requests will be copied and forwarded to Secondary
>> +       QEMU.
>> +    2) Before Primary write requests are written to Secondary disk, the
>> +       original sector content will be read from Secondary disk and
>> +       buffered in the Disk buffer, but it will not overwrite the existing
>> +       sector content in the Disk buffer.
> 
> Could you elaborate a bit about the "existing sector content" here? IIUC, it
> could be from either "Secondary Write Requests" or previous COW of "Primary
> Write Requests". Is that right?

Yes.

> 
>> +    3) Primary write requests will be written to Secondary disk.
>> +    4) Secondary write requests will be buffered in the Disk buffer and it
>> +       will overwrite the existing sector content in the buffer.
>> +
>> +== Architecture ==
>> +We are going to implement COLO block replication from many basic
>> +blocks that are already in QEMU.
>> +
>> +         virtio-blk       ||
>> +             ^            ||                            .----------
>> +             |            ||                            | Secondary
>> +        1 Quorum          ||                            '----------
>> +         /      \         ||
>> +        /        \        ||
>> +   Primary      2 NBD  ------->  2 NBD
>> +     disk       client    ||     server                                         virtio-blk
>> +                          ||        ^                                                ^
>> +--------.                 ||        |                                                |
>> +Primary |                 ||  Secondary disk <--------- hidden-disk 4 <--------- active-disk 3
>> +--------'                 ||        |          backing        ^       backing
>> +                          ||        |                         |
>> +                          ||        |                         |
>> +                          ||        '-------------------------'
>> +                          ||           drive-backup sync=none
>> +
>> +1) The disk on the primary is represented by a block device with two
>> +children, providing replication between a primary disk and the host that
>> +runs the secondary VM. The read pattern for quorum can be extended to
>> +make the primary always read from the local disk instead of going through
>> +NBD.
>> +
>> +2) The secondary disk receives writes from the primary VM through QEMU's
>> +embedded NBD server (speculative write-through).
>> +
>> +3) The disk on the secondary is represented by a custom block device
>> +(called active-disk). It should be an empty disk, and the format should
>> +be qcow2.
>> +
>> +4) The hidden-disk is created automatically. It buffers the original content
>> +that is modified by the primary VM. It should also be an empty disk, and
>> +the dirver supports bdrv_make_empty().
>> +
>> +== New block driver interface ==
>> +We add three block driver interfaces to control block replication:
>> +a. bdrv_start_replication()
>> +   Start block replication, called in migration/checkpoint thread.
>> +   We must call bdrv_start_replication() in secondary QEMU before
>> +   calling bdrv_start_replication() in primary QEMU.
>> +b. bdrv_do_checkpoint()
>> +   This interface is called after all VM state is transfered to
>> +   Secondary QEMU. The Disk buffer will be dropped in this interface.
>> +   The caller must hold the I/O mutex lock if it is in migration/checkpoint
>> +   thread.
>> +c. bdrv_stop_replication()
>> +   It is called when failover. We will flush the Disk buffer into
>> +   Secondary Disk and stop block replication. The vm should be stopped
>> +   before calling it. The caller must hold the I/O mutex lock if it is
>> +   in migration/checkpoint thread.
>> +
>> +== Usage ==
>> +Primary:
>> +  -drive if=xxx,driver=quorum,read-pattern=fifo,\
>> +         children.0.file.filename=1.raw,\
>> +         children.0.driver=raw,\
>> +         children.1.file.driver=nbd+colo,\
>> +         children.1.file.host=xxx,\
>> +         children.1.file.port=xxx,\
>> +         children.1.file.export=xxx,\
>> +         children.1.driver=raw,\
>> +         children.1.ignore-errors=on
>    ^^^^^^^^^
> Won't the leading spaces cause trouble? :)

Yes

> 
>> +  Note:
>> +  1. NBD Client should not be the first child of quorum.
>> +  2. There should be only one NBD Client.
>> +  3. host is the secondary physical machine's hostname or IP
>> +  4. Each disk must have its own export name.
>> +
>> +Secondary:
>> +  -drive if=none,driver=raw,file=1.raw,id=nbd_target1 \
>> +  -drive if=xxx,driver=qcow2+colo,file=active_disk.qcow2,export=xxx,\
>> +         backing_reference.drive_id=nbd_target1,\
>> +         backing_reference.hidden-disk.file.filename=hidden_disk.qcow2,\
>> +         backing_reference.hidden-disk.driver=qcow2,\
>> +         backing_reference.hidden-disk.allow-write-backing-file=on
>> +  Then run qmp command:
>> +    nbd_server_start host:port
> 
> s/nbd_server_start/nbd-server-start
> 
> For a few more too.

All will be fixed in the next version

Thanks
Wen Congyang

> 
> 
>> +  Note:
>> +  1. The export name for the same disk must be the same in primary
>> +     and secondary QEMU command line
>> +  2. The qmp command nbd_server_start must be run before running the
>> +     qmp command migrate on primary QEMU
>> +  3. Don't use nbd_server_start's other options
>> +  4. Active disk, hidden disk and nbd target's length should be the
>> +     same.
>> +  5. It is better to put active disk and hidden disk in ramdisk.
> 
> Fam
> .
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 08/13] Allow creating backup jobs when opening BDS
  2015-03-26  7:14     ` Wen Congyang
@ 2015-03-26  7:18       ` Fam Zheng
  2015-03-26  7:23         ` Wen Congyang
  0 siblings, 1 reply; 55+ messages in thread
From: Fam Zheng @ 2015-03-26  7:18 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Yang Hongyang, Lai Jiangshan, qemu block, Jeff Cody,
	Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert, qemu devel,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Max Reitz,
	zhanghailiang

On Thu, 03/26 15:14, Wen Congyang wrote:
> On 03/26/2015 03:07 PM, Fam Zheng wrote:
> > On Wed, 03/25 17:36, Wen Congyang wrote:
> >> When opening BDS, we need to create backup jobs for
> >> image-fleecing.
> > 
> > How does this commit message relate to the Makefile change?
> 
> Hmm, we need to use backup API in block.c, and block.o will
> be used by qemu-img which doesn't use common-obj.

I see. How about adding the referenced functions to stubs/?

Fam

> 
> Thanks
> Wen Congyang
> 
> > 
> >>
> >> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> >> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> >> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> >> Cc: Jeff Cody <jcody@redhat.com>
> >> ---
> >>  block/Makefile.objs | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/block/Makefile.objs b/block/Makefile.objs
> >> index db2933e..0950973 100644
> >> --- a/block/Makefile.objs
> >> +++ b/block/Makefile.objs
> >> @@ -21,10 +21,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
> >>  block-obj-$(CONFIG_LIBSSH2) += ssh.o
> >>  block-obj-y += accounting.o
> >>  block-obj-y += write-threshold.o
> >> +block-obj-y += backup.o
> >>  
> >>  common-obj-y += stream.o
> >>  common-obj-y += commit.o
> >> -common-obj-y += backup.o
> >>  
> >>  iscsi.o-cflags     := $(LIBISCSI_CFLAGS)
> >>  iscsi.o-libs       := $(LIBISCSI_LIBS)
> >> -- 
> >> 2.1.0
> >>
> > .
> > 
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 06/13] NBD client: implement block driver interfaces for block replication
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 06/13] NBD client: " Wen Congyang
  2015-03-25 12:50   ` Paolo Bonzini
@ 2015-03-26  7:21   ` Fam Zheng
  2015-03-26  7:32     ` Wen Congyang
  1 sibling, 1 reply; 55+ messages in thread
From: Fam Zheng @ 2015-03-26  7:21 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Yang Hongyang, Lai Jiangshan, qemu block,
	Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert, qemu devel,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Max Reitz,
	zhanghailiang

On Wed, 03/25 17:36, Wen Congyang wrote:
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  block/nbd.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 49 insertions(+)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 3faf865..753b322 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -458,6 +458,52 @@ static void nbd_refresh_filename(BlockDriverState *bs)
>      bs->full_open_options = opts;
>  }
>  
> +static void nbd_start_replication(BlockDriverState *bs, COLOMode mode,
> +                                  Error **errp)
> +{
> +    BDRVNBDState *s = bs->opaque;
> +
> +    /*
> +     * TODO: support COLO_SECONDARY_MODE if we allow secondary
> +     * QEMU becoming primary QEMU.
> +     */
> +    if (mode != COLO_MODE_PRIMARY) {
> +        error_set(errp, QERR_INVALID_PARAMETER, "mode");

Please use error_setg. (Please grep the whole series :)

Fam

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 04/13] Add new block driver interfaces to control block replication
  2015-03-26  7:12   ` Fam Zheng
@ 2015-03-26  7:22     ` Wen Congyang
  0 siblings, 0 replies; 55+ messages in thread
From: Wen Congyang @ 2015-03-26  7:22 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, Yang Hongyang, Lai Jiangshan, qemu block,
	Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert, qemu devel,
	Gonglei, Stefan Hajnoczi, Luiz Capitulino, Paolo Bonzini,
	Max Reitz, Michael Roth, zhanghailiang

On 03/26/2015 03:12 PM, Fam Zheng wrote:
> On Wed, 03/25 17:36, Wen Congyang wrote:
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Cc: Luiz Capitulino <lcapitulino@redhat.com>
>> Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
>> ---
>>  block.c                   | 39 +++++++++++++++++++++++++++++++++++++++
>>  include/block/block.h     |  4 ++++
>>  include/block/block_int.h | 11 +++++++++++
>>  qapi/block.json           | 16 ++++++++++++++++
>>  4 files changed, 70 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index 0fe97de..0ff5cf8 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -6196,3 +6196,42 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
>>  {
>>      return &bs->stats;
>>  }
>> +
>> +void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error **errp)
>> +{
>> +    BlockDriver *drv = bs->drv;
>> +
>> +    if (drv && drv->bdrv_start_replication) {
>> +        drv->bdrv_start_replication(bs, mode, errp);
>> +    } else if (bs->file) {
>> +        bdrv_start_replication(bs->file, mode, errp);
>> +    } else {
>> +        error_set(errp, QERR_UNSUPPORTED);
> 
> I think we should use error_setg in new code? (The same to following ones)

Hmm, do you mean that don't use QERR_UNSUPPORTED here?

> 
>> +    }
>> +}
>> +
>> +void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
>> +{
>> +    BlockDriver *drv = bs->drv;
>> +
>> +    if (drv && drv->bdrv_do_checkpoint) {
>> +        drv->bdrv_do_checkpoint(bs, errp);
>> +    } else if (bs->file) {
>> +        bdrv_do_checkpoint(bs->file, errp);
>> +    } else {
>> +        error_set(errp, QERR_UNSUPPORTED);
>> +    }
>> +}
>> +
>> +void bdrv_stop_replication(BlockDriverState *bs, Error **errp)
>> +{
>> +    BlockDriver *drv = bs->drv;
>> +
>> +    if (drv && drv->bdrv_stop_replication) {
>> +        drv->bdrv_stop_replication(bs, errp);
>> +    } else if (bs->file) {
>> +        bdrv_stop_replication(bs->file, errp);
>> +    } else {
>> +        error_set(errp, QERR_UNSUPPORTED);
>> +    }
>> +}
>> diff --git a/include/block/block.h b/include/block/block.h
>> index 4c57d63..68f3b1a 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -569,4 +569,8 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
>>  
>>  BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
>>  
>> +void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error **errp);
>> +void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
>> +void bdrv_stop_replication(BlockDriverState *bs, Error **errp);
>> +
>>  #endif
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index dccb092..08dd8ba 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -290,6 +290,17 @@ struct BlockDriver {
>>       */
>>      int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
>>  
>> +
>> +    void (*bdrv_start_replication)(BlockDriverState *bs, COLOMode mode,
>> +                                   Error **errp);
> 
> Need some documentation, but I have a generic question:
> 
> Why is a single interface with modes better than different functions for each
> mode (bdrv_start_replication_{primary,secondary}? Asking because the behavior
> is very different between them, and I don't see much sharing -- you implement
> primary operation in quorum, and secondary in qcow2+colo.

No special reason.

> 
>> +    /* Drop Disk buffer when doing checkpoint. */
>> +    void (*bdrv_do_checkpoint)(BlockDriverState *bs, Error **errp);
>> +    /*
>> +     * After failover, we should flush Disk buffer into secondary disk
>> +     * and stop block replication.
>> +     */
>> +    void (*bdrv_stop_replication)(BlockDriverState *bs, Error **errp);
>> +
>>      QLIST_ENTRY(BlockDriver) list;
>>  };
>>  
>> diff --git a/qapi/block.json b/qapi/block.json
>> index e313465..e640566 100644
>> --- a/qapi/block.json
>> +++ b/qapi/block.json
>> @@ -40,6 +40,22 @@
>>    'data': ['auto', 'none', 'lba', 'large', 'rechs']}
>>  
>>  ##
>> +# @COLOMode
>> +#
>> +# An enumeration of COLO mode.
>> +#
>> +# @unprotected: COLO is not started or after failover
>> +#
>> +# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
>> +#
>> +# @secondary: Secondary mode, receive the vm's state from primary QEMU.
>> +#
>> +# Since: 2.4
>> +##
>> +{ 'enum' : 'COLOMode',
>> +  'data' : ['unprotected', 'primary', 'secondary']}
> 
> If split bdrv_start_replication, do we still need an enum? I can't find the
> usage in QMP interface, is it in some other series?

I will check it.

Thanks
Wen Congyang

> 
> Fam
> 
>> +
>> +##
>>  # @BlockdevSnapshotInternal
>>  #
>>  # @device: the name of the device to generate the snapshot from
>> -- 
>> 2.1.0
>>
> .
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 08/13] Allow creating backup jobs when opening BDS
  2015-03-26  7:18       ` Fam Zheng
@ 2015-03-26  7:23         ` Wen Congyang
  2015-03-26 13:53           ` Paolo Bonzini
  0 siblings, 1 reply; 55+ messages in thread
From: Wen Congyang @ 2015-03-26  7:23 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, Yang Hongyang, Lai Jiangshan, qemu block, Jeff Cody,
	Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert, qemu devel,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Max Reitz,
	zhanghailiang

On 03/26/2015 03:18 PM, Fam Zheng wrote:
> On Thu, 03/26 15:14, Wen Congyang wrote:
>> On 03/26/2015 03:07 PM, Fam Zheng wrote:
>>> On Wed, 03/25 17:36, Wen Congyang wrote:
>>>> When opening BDS, we need to create backup jobs for
>>>> image-fleecing.
>>>
>>> How does this commit message relate to the Makefile change?
>>
>> Hmm, we need to use backup API in block.c, and block.o will
>> be used by qemu-img which doesn't use common-obj.
> 
> I see. How about adding the referenced functions to stubs/?

Good idea. I will try it.

Thanks
Wen Congyang

> 
> Fam
> 
>>
>> Thanks
>> Wen Congyang
>>
>>>
>>>>
>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>>>> Cc: Jeff Cody <jcody@redhat.com>
>>>> ---
>>>>  block/Makefile.objs | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>>>> index db2933e..0950973 100644
>>>> --- a/block/Makefile.objs
>>>> +++ b/block/Makefile.objs
>>>> @@ -21,10 +21,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
>>>>  block-obj-$(CONFIG_LIBSSH2) += ssh.o
>>>>  block-obj-y += accounting.o
>>>>  block-obj-y += write-threshold.o
>>>> +block-obj-y += backup.o
>>>>  
>>>>  common-obj-y += stream.o
>>>>  common-obj-y += commit.o
>>>> -common-obj-y += backup.o
>>>>  
>>>>  iscsi.o-cflags     := $(LIBISCSI_CFLAGS)
>>>>  iscsi.o-libs       := $(LIBISCSI_LIBS)
>>>> -- 
>>>> 2.1.0
>>>>
>>> .
>>>
>>
> .
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 09/13] block: Parse "backing_reference" option to reference existing BDS
  2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 09/13] block: Parse "backing_reference" option to reference existing BDS Wen Congyang
@ 2015-03-26  7:31   ` Fam Zheng
  0 siblings, 0 replies; 55+ messages in thread
From: Fam Zheng @ 2015-03-26  7:31 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Yang Hongyang, Lai Jiangshan, qemu block,
	Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert, qemu devel,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Max Reitz,
	zhanghailiang

On Wed, 03/25 17:36, Wen Congyang wrote:
> Usage:
> -drive file=xxx,id=Y, \
> -drive file=xxxx,id=X,backing_reference.drive_id=Y,backing_reference.hidden-disk.*
> 
> It will create such backing chain:
>                {virtio-blk dev 'Y'}                                      {virtio-blk dev 'X'}
>                          |                                                          |
>                          |                                                          |
>                          v                                                          v
> 
>     [base] <- [mid] <- ( Y )  <----------------- (hidden target) <--------------- ( X )
> 
>                          v                              ^
>                          v                              ^
>                          v                              ^
>                          v                              ^
>                          >>>> drive-backup sync=none >>>>
> 
> X's backing file is hidden-disk, and hidden-disk's backing file is Y.
> Disk Y may be opened or reopened in read-write mode, so A block backup
> job is automatically created: source is Y and target is hidden-disk.
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  block.c                    | 145 ++++++++++++++++++++++++++++++++++++++++++++-
>  include/block/block.h      |   1 +
>  include/block/block_int.h  |   1 +
>  tests/qemu-iotests/051     |  13 ++++
>  tests/qemu-iotests/051.out |  13 ++++
>  5 files changed, 170 insertions(+), 3 deletions(-)
> 
> diff --git a/block.c b/block.c
> index b4d629e..bd7fa9c 100644
> --- a/block.c
> +++ b/block.c
> @@ -1351,6 +1351,113 @@ free_exit:
>      return ret;
>  }
>  
> +static void backing_reference_completed(void *opaque, int ret)
> +{
> +    BlockDriverState *hidden_disk = opaque;
> +
> +    assert(!hidden_disk->backing_reference);
> +}
> +
> +static int bdrv_open_backing_reference_file(BlockDriverState *bs,
> +                                            QDict *options, Error **errp)
> +{
> +    const char *backing_name;
> +    QDict *hidden_disk_options = NULL;
> +    BlockDriverState *backing_hd, *hidden_disk;
> +    BlockBackend *backing_blk;
> +    Error *local_err = NULL;
> +    int ret = 0;
> +
> +    backing_name = qdict_get_try_str(options, "drive_id");
> +    if (!backing_name) {
> +        error_setg(errp, "Backing reference needs option drive_id");
> +        ret = -EINVAL;
> +        goto free_exit;
> +    }
> +    qdict_del(options, "drive_id");
> +
> +    qdict_extract_subqdict(options, &hidden_disk_options, "hidden-disk.");
> +    if (!qdict_size(hidden_disk_options)) {
> +        error_setg(errp, "Backing reference needs option hidden-disk.*");
> +        ret = -EINVAL;
> +        goto free_exit;
> +    }
> +
> +    if (qdict_size(options)) {
> +        const QDictEntry *entry = qdict_first(options);
> +        error_setg(errp, "Backing reference used by '%s' doesn't support "
> +                   "the option '%s'", bdrv_get_device_name(bs), entry->key);
> +        ret = -EINVAL;
> +        goto free_exit;
> +    }
> +
> +    backing_blk = blk_by_name(backing_name);
> +    if (!backing_blk) {
> +        error_set(errp, QERR_DEVICE_NOT_FOUND, backing_name);
> +        ret = -ENOENT;
> +        goto free_exit;
> +    }
> +
> +    backing_hd = blk_bs(backing_blk);
> +    /* Backing reference itself? */
> +    if (backing_hd == bs || bdrv_find_overlay(backing_hd, bs)) {
> +        error_setg(errp, "Backing reference itself");
> +        ret = -EINVAL;
> +        goto free_exit;
> +    }
> +
> +    if (bdrv_op_is_blocked(backing_hd, BLOCK_OP_TYPE_BACKING_REFERENCE,
> +                           errp)) {
> +        ret = -EBUSY;
> +        goto free_exit;
> +    }
> +
> +    /* hidden-disk is bs's backing file */
> +    ret = bdrv_open_backing_file(bs, hidden_disk_options, errp);
> +    hidden_disk_options = NULL;
> +    if (ret < 0) {
> +        goto free_exit;
> +    }
> +
> +    hidden_disk = bs->backing_hd;
> +    if (!hidden_disk->drv || !hidden_disk->drv->supports_backing) {
> +        ret = -EINVAL;
> +        error_setg(errp, "Hidden disk's driver doesn't support backing files");
> +        goto free_exit;
> +    }
> +
> +    bdrv_set_backing_hd(hidden_disk, backing_hd);
> +    bdrv_ref(backing_hd);
> +
> +    /*
> +     * backing hd may be opened or reopened in read-write mode, so we
> +     * should backup backing hd to hidden disk
> +     */
> +    bdrv_op_unblock(hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
> +                    bs->backing_blocker);
> +    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
> +                    hidden_disk->backing_blocker);
> +
> +    bdrv_ref(hidden_disk);
> +    backup_start(backing_hd, hidden_disk, 0, MIRROR_SYNC_MODE_NONE,
> +                 BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
> +                 backing_reference_completed, hidden_disk, &local_err);

We need to be careful, the backing_hd may not be safe to access, depending on
which AioContext it is running on. At least, we should make sure bs,
hidden_disk and backing_hd are all on the same AioContext.

Here, before accessing backing_hd, we need to call aio_context_acquire, like in
qmp_drive_backup.

Fam

> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        bdrv_unref(hidden_disk);
> +        /* FIXME, use which errno? */
> +        ret = -EIO;
> +        goto free_exit;
> +    }
> +
> +    bs->backing_reference = true;
> +
> +free_exit:
> +    QDECREF(hidden_disk_options);
> +    QDECREF(options);
> +    return ret;
> +}
> +
>  /*
>   * Opens a disk image whose options are given as BlockdevRef in another block
>   * device's options.
> @@ -1604,13 +1711,37 @@ int bdrv_open(BlockDriverState **pbs, const char *filename,
>  
>      /* If there is a backing file, use it */
>      if ((flags & BDRV_O_NO_BACKING) == 0) {
> -        QDict *backing_options;
> +        QDict *backing_options, *backing_reference_options;
>  
> +        qdict_extract_subqdict(options, &backing_reference_options,
> +                               "backing_reference.");
>          qdict_extract_subqdict(options, &backing_options, "backing.");
> -        ret = bdrv_open_backing_file(bs, backing_options, &local_err);
> -        if (ret < 0) {
> +
> +        if (qdict_size(backing_reference_options) &&
> +            qdict_size(backing_options)) {
> +            error_setg(&local_err,
> +                       "Option \"backing_reference.*\" and \"backing.*\""
> +                       " cannot be used together");
> +            ret = -EINVAL;
> +            QDECREF(backing_reference_options);
> +            QDECREF(backing_options);
>              goto close_and_fail;
>          }
> +        if (qdict_size(backing_reference_options)) {
> +            QDECREF(backing_options);
> +            ret = bdrv_open_backing_reference_file(bs,
> +                                                   backing_reference_options,
> +                                                   &local_err);
> +            if (ret) {
> +                goto close_and_fail;
> +            }
> +        } else {
> +            QDECREF(backing_reference_options);
> +            ret = bdrv_open_backing_file(bs, backing_options, &local_err);
> +            if (ret < 0) {
> +                goto close_and_fail;
> +            }
> +        }
>      }
>  
>      bdrv_refresh_filename(bs);
> @@ -1941,6 +2072,14 @@ void bdrv_close(BlockDriverState *bs)
>      if (bs->drv) {
>          if (bs->backing_hd) {
>              BlockDriverState *backing_hd = bs->backing_hd;
> +            if (bs->backing_reference) {
> +                assert(backing_hd->backing_hd);
> +                if (backing_hd->backing_hd->job) {
> +                    block_job_cancel(backing_hd->backing_hd->job);
> +                }
> +                bdrv_set_backing_hd(backing_hd, NULL);
> +                bdrv_unref(backing_hd->backing_hd);
> +            }
>              bdrv_set_backing_hd(bs, NULL);
>              bdrv_unref(backing_hd);
>          }
> diff --git a/include/block/block.h b/include/block/block.h
> index 68f3b1a..7138e90 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -159,6 +159,7 @@ typedef enum BlockOpType {
>      BLOCK_OP_TYPE_RESIZE,
>      BLOCK_OP_TYPE_STREAM,
>      BLOCK_OP_TYPE_REPLACE,
> +    BLOCK_OP_TYPE_BACKING_REFERENCE,
>      BLOCK_OP_TYPE_MAX,
>  } BlockOpType;
>  
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 08dd8ba..624945d 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -375,6 +375,7 @@ struct BlockDriverState {
>      QDict *full_open_options;
>      char exact_filename[PATH_MAX];
>  
> +    bool backing_reference;
>      BlockDriverState *backing_hd;
>      BlockDriverState *file;
>  
> diff --git a/tests/qemu-iotests/051 b/tests/qemu-iotests/051
> index 0360f37..fd67f40 100755
> --- a/tests/qemu-iotests/051
> +++ b/tests/qemu-iotests/051
> @@ -116,6 +116,19 @@ run_qemu -drive file="$TEST_IMG",file.backing.driver=file,file.backing.filename=
>  run_qemu -drive file="$TEST_IMG",file.backing.driver=qcow2,file.backing.file.filename="$TEST_IMG.orig"
>  
>  echo
> +echo === Backing file reference ===
> +echo
> +
> +run_qemu -drive file="$TEST_IMG",if=none,id=drive0 \
> +    -drive file="$TEST_IMG",driver=qcow2,backing_reference.drive_id=drive0,backing_reference.hidden-disk.filename="$TEST_IMG.hidden"
> +
> +run_qemu -drive file="$TEST_IMG",if=none,id=drive0 \
> +    -drive file="$TEST_IMG",driver=qcow2,backing_reference.drive_id=drive0,backing_reference.hidden-disk.filename="$TEST_IMG.hidden",backing.file.filename="$TEST_IMG.orig"
> +
> +run_qemu -drive file="$TEST_IMG",if=none,id=drive0 \
> +    -drive file="$TEST_IMG",driver=qcow2,file.backing_reference.drive_id=drive0,file.backing_reference.hidden-disk.filename="$TEST_IMG.hidden",file.backing.file.filename="$TEST_IMG.orig"
> +
> +echo
>  echo === Enable and disable lazy refcounting on the command line, plus some invalid values ===
>  echo
>  
> diff --git a/tests/qemu-iotests/051.out b/tests/qemu-iotests/051.out
> index 2890eac..cb8340b 100644
> --- a/tests/qemu-iotests/051.out
> +++ b/tests/qemu-iotests/051.out
> @@ -75,6 +75,19 @@ Testing: -drive file=TEST_DIR/t.qcow2,file.backing.driver=qcow2,file.backing.fil
>  QEMU_PROG: -drive file=TEST_DIR/t.qcow2,file.backing.driver=qcow2,file.backing.file.filename=TEST_DIR/t.qcow2.orig: Driver doesn't support backing files
>  
>  
> +=== Backing file reference ===
> +
> +Testing: -drive file=TEST_DIR/t.qcow2,if=none,id=drive0 -drive file=TEST_DIR/t.qcow2,driver=qcow2,backing_reference.drive_id=drive0,backing_reference.hidden-disk.filename=TEST_DIR/t.qcow2.hidden
> +QEMU X.Y.Z monitor - type 'help' for more information
> +(qemu) q^[[K^[[Dqu^[[K^[[D^[[Dqui^[[K^[[D^[[D^[[Dquit^[[K
> +
> +Testing: -drive file=TEST_DIR/t.qcow2,if=none,id=drive0 -drive file=TEST_DIR/t.qcow2,driver=qcow2,backing_reference.drive_id=drive0,backing_reference.hidden-disk.filename=TEST_DIR/t.qcow2.hidden,backing.file.filename=TEST_DIR/t.qcow2.orig
> +QEMU_PROG: -drive file=TEST_DIR/t.qcow2,driver=qcow2,backing=drive0,backing.file.filename=TEST_DIR/t.qcow2.orig: Option "backing_reference.*" and "backing.*" cannot be used together
> +
> +Testing: -drive file=TEST_DIR/t.qcow2,if=none,id=drive0 -drive file=TEST_DIR/t.qcow2,driver=qcow2,file.backing_reference.drive_id=drive0,file.backing_reference.hidden-disk.filename=TEST_DIR/t.qcow2.hidden,file.backing.file.filename=TEST_DIR/t.qcow2.orig
> +QEMU_PROG: -drive file=TEST_DIR/t.qcow2,driver=qcow2,file.backing=drive0,file.backing.file.filename=TEST_DIR/t.qcow2.orig: Option "backing_reference.*" and "backing.*" cannot be used together
> +
> +
>  === Enable and disable lazy refcounting on the command line, plus some invalid values ===
>  
>  Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=on
> -- 
> 2.1.0
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 06/13] NBD client: implement block driver interfaces for block replication
  2015-03-26  7:21   ` Fam Zheng
@ 2015-03-26  7:32     ` Wen Congyang
  2015-03-27  1:06       ` Fam Zheng
  0 siblings, 1 reply; 55+ messages in thread
From: Wen Congyang @ 2015-03-26  7:32 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, Yang Hongyang, Lai Jiangshan, qemu block,
	Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert, qemu devel,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Max Reitz,
	zhanghailiang

On 03/26/2015 03:21 PM, Fam Zheng wrote:
> On Wed, 03/25 17:36, Wen Congyang wrote:
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> ---
>>  block/nbd.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 49 insertions(+)
>>
>> diff --git a/block/nbd.c b/block/nbd.c
>> index 3faf865..753b322 100644
>> --- a/block/nbd.c
>> +++ b/block/nbd.c
>> @@ -458,6 +458,52 @@ static void nbd_refresh_filename(BlockDriverState *bs)
>>      bs->full_open_options = opts;
>>  }
>>  
>> +static void nbd_start_replication(BlockDriverState *bs, COLOMode mode,
>> +                                  Error **errp)
>> +{
>> +    BDRVNBDState *s = bs->opaque;
>> +
>> +    /*
>> +     * TODO: support COLO_SECONDARY_MODE if we allow secondary
>> +     * QEMU becoming primary QEMU.
>> +     */
>> +    if (mode != COLO_MODE_PRIMARY) {
>> +        error_set(errp, QERR_INVALID_PARAMETER, "mode");
> 
> Please use error_setg. (Please grep the whole series :)

Why? QERR_INVALID_PARAMETER includes ERROR_CLASS_GENERIC_ERROR.

Thanks
Wen Congyang

> 
> Fam
> .
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description
  2015-03-25 15:38   ` [Qemu-devel] [Qemu-block] " Eric Blake
@ 2015-03-26  8:58     ` Wen Congyang
  2015-03-26 10:28       ` Gonglei
  0 siblings, 1 reply; 55+ messages in thread
From: Wen Congyang @ 2015-03-26  8:58 UTC (permalink / raw)
  To: Eric Blake, qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Lai Jiangshan, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

On 03/25/2015 11:38 PM, Eric Blake wrote:
> On 03/25/2015 03:36 AM, Wen Congyang wrote:
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> ---
>>  docs/block-replication.txt | 147 +++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 147 insertions(+)
>>  create mode 100644 docs/block-replication.txt
>>
> 
> Grammar review only (I'll leave the technical review to others)
> 
>> diff --git a/docs/block-replication.txt b/docs/block-replication.txt
>> new file mode 100644
>> index 0000000..874ed8e
>> --- /dev/null
>> +++ b/docs/block-replication.txt
>> @@ -0,0 +1,147 @@
>> +Block replication
>> +----------------------------------------
>> +Copyright Fujitsu, Corp. 2015
>> +Copyright (c) 2015 Intel Corporation
>> +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
> 
> Space after comma in English writing.

Yes, but I am not sure I can change it. HUAWEI always use this way.
You can find it in bootdevice.c

Thanks
Wen Congyang

> 
>> +
>> +This work is licensed under the terms of the GNU GPL, version 2 or later.
>> +See the COPYING file in the top-level directory.
>> +
>> +The block replication is used for continuous checkpoints. It is designed
> 
> Sounds better as either of:
> The block replication feature is...
> Block replication is...
> 
>> +for COLO that Secondary VM is running. It can also be applied for FT/HA
> 
> Please define COLO and FT/HA on first use (okay to abbreviate elsewhere
> in the document, but the first use should not assume the acronym is
> well-known)
> 
> s/for COLO that/for COLO (COurse-grain LOck-stepping), where the/
> 
>> +scene that Secondary VM is not running.
> 
> s/for FT/HA scene that/for the FT/HA (Fault-tolerance/High Assurance)
> scenario, where the/
> 
>> +
>> +This document gives an overview of block replication's design.
>> +
>> +== Background ==
>> +High availability solutions such as micro checkpoint and COLO will do
>> +consecutive checkpoint. The VM state of Primary VM and Secondary VM is
> 
> s/checkpoint/checkpoints/
> 
>> +identical right after a VM checkpoint, but becomes different as the VM
> ...
>> +
>> +4) The hidden-disk is created automatically. It buffers the original content
>> +that is modified by the primary VM. It should also be an empty disk, and
>> +the dirver supports bdrv_make_empty().
> 
> s/dirver/driver/
> 
>> +
>> +== New block driver interface ==
>> +We add three block driver interfaces to control block replication:
>> +a. bdrv_start_replication()
>> +   Start block replication, called in migration/checkpoint thread.
>> +   We must call bdrv_start_replication() in secondary QEMU before
>> +   calling bdrv_start_replication() in primary QEMU.
>> +b. bdrv_do_checkpoint()
>> +   This interface is called after all VM state is transfered to
> 
> s/transfered/transferred/
> 
>> +   Secondary QEMU. The Disk buffer will be dropped in this interface.
>> +   The caller must hold the I/O mutex lock if it is in migration/checkpoint
>> +   thread.
>> +c. bdrv_stop_replication()
>> +   It is called when failover. We will flush the Disk buffer into
> 
> s/when/on/
> 
>> +   Secondary Disk and stop block replication. The vm should be stopped
>> +   before calling it. The caller must hold the I/O mutex lock if it is
>> +   in migration/checkpoint thread.
>> +
>> +== Usage ==
>> +Primary:
>> +  -drive if=xxx,driver=quorum,read-pattern=fifo,\
>> +         children.0.file.filename=1.raw,\
>> +         children.0.driver=raw,\
>> +         children.1.file.driver=nbd+colo,\
>> +         children.1.file.host=xxx,\
>> +         children.1.file.port=xxx,\
>> +         children.1.file.export=xxx,\
>> +         children.1.driver=raw,\
>> +         children.1.ignore-errors=on
> 
> This command line looks like multiple arguments because of the leading
> whitespace on succeeding lines.  I don't know if there is any better way
> to format it, though, to make it obvious that it is all a single
> argument to -drive.
> 
>> +  Note:
>> +  1. NBD Client should not be the first child of quorum.
>> +  2. There should be only one NBD Client.
>> +  3. host is the secondary physical machine's hostname or IP
>> +  4. Each disk must have its own export name.
> 
> Maybe a note 5 to call out the formatting aspect of the command line?
> 
>> +
>> +Secondary:
>> +  -drive if=none,driver=raw,file=1.raw,id=nbd_target1 \
>> +  -drive if=xxx,driver=qcow2+colo,file=active_disk.qcow2,export=xxx,\
>> +         backing_reference.drive_id=nbd_target1,\
>> +         backing_reference.hidden-disk.file.filename=hidden_disk.qcow2,\
>> +         backing_reference.hidden-disk.driver=qcow2,\
>> +         backing_reference.hidden-disk.allow-write-backing-file=on
>> +  Then run qmp command:
>> +    nbd_server_start host:port
>> +  Note:
>> +  1. The export name for the same disk must be the same in primary
>> +     and secondary QEMU command line
>> +  2. The qmp command nbd_server_start must be run before running the
>> +     qmp command migrate on primary QEMU
>> +  3. Don't use nbd_server_start's other options
>> +  4. Active disk, hidden disk and nbd target's length should be the
>> +     same.
>> +  5. It is better to put active disk and hidden disk in ramdisk.
>>
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description
  2015-03-26  8:58     ` Wen Congyang
@ 2015-03-26 10:28       ` Gonglei
  2015-03-26 12:30         ` Eric Blake
  0 siblings, 1 reply; 55+ messages in thread
From: Gonglei @ 2015-03-26 10:28 UTC (permalink / raw)
  To: Wen Congyang, Eric Blake, qemu devel, Fam Zheng, Max Reitz,
	Paolo Bonzini
  Cc: Lai Jiangshan, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

On 2015/3/26 16:58, Wen Congyang wrote:
> On 03/25/2015 11:38 PM, Eric Blake wrote:
>> > On 03/25/2015 03:36 AM, Wen Congyang wrote:
>>> >> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> >> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>> >> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>> >> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>>> >> ---
>>> >>  docs/block-replication.txt | 147 +++++++++++++++++++++++++++++++++++++++++++++
>>> >>  1 file changed, 147 insertions(+)
>>> >>  create mode 100644 docs/block-replication.txt
>>> >>
>> > 
>> > Grammar review only (I'll leave the technical review to others)
>> > 
>>> >> diff --git a/docs/block-replication.txt b/docs/block-replication.txt
>>> >> new file mode 100644
>>> >> index 0000000..874ed8e
>>> >> --- /dev/null
>>> >> +++ b/docs/block-replication.txt
>>> >> @@ -0,0 +1,147 @@
>>> >> +Block replication
>>> >> +----------------------------------------
>>> >> +Copyright Fujitsu, Corp. 2015
>>> >> +Copyright (c) 2015 Intel Corporation
>>> >> +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
>> > 
>> > Space after comma in English writing.
> Yes, but I am not sure I can change it. HUAWEI always use this way.
> You can find it in bootdevice.c

Good catch, Eric is right. I will change all of this writing way in Qemu at 2.4.

Thanks.

Regards,
-Gonglei

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description
  2015-03-26 10:28       ` Gonglei
@ 2015-03-26 12:30         ` Eric Blake
  2015-03-26 12:46           ` Gonglei
  0 siblings, 1 reply; 55+ messages in thread
From: Eric Blake @ 2015-03-26 12:30 UTC (permalink / raw)
  To: Gonglei, Wen Congyang, qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Lai Jiangshan, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

[-- Attachment #1: Type: text/plain, Size: 1237 bytes --]

On 03/26/2015 04:28 AM, Gonglei wrote:

>>>> Grammar review only (I'll leave the technical review to others)
>>>>

>>>>>> +Copyright Fujitsu, Corp. 2015
>>>>>> +Copyright (c) 2015 Intel Corporation
>>>>>> +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
>>>>
>>>> Space after comma in English writing.
>> Yes, but I am not sure I can change it. HUAWEI always use this way.

Copyright lines are one thing that I am reluctant to change if it is not
my own line (some companies have policies on how their lines must look,
and I'm not in a position to argue the policy as I am not a lawyer).

>> You can find it in bootdevice.c

Such existing precedence is a strong argument to NOT changing it, at
least for a patch author that is not the copyright owner.

> 
> Good catch, Eric is right. I will change all of this writing way in Qemu at 2.4.

So, sounds like such a change would be a separate patch (probably
through the -trivial tree), and cover all files in the repo in one go,
without affecting this series.  Such a patch by a copyright owner would
have no problem being accepted, if it is wanted.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description
  2015-03-26 12:30         ` Eric Blake
@ 2015-03-26 12:46           ` Gonglei
  0 siblings, 0 replies; 55+ messages in thread
From: Gonglei @ 2015-03-26 12:46 UTC (permalink / raw)
  To: Eric Blake, Wen Congyang, qemu devel, Fam Zheng, Max Reitz,
	Paolo Bonzini
  Cc: Lai Jiangshan, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

On 2015/3/26 20:30, Eric Blake wrote:
> On 03/26/2015 04:28 AM, Gonglei wrote:
> 
>>>>> Grammar review only (I'll leave the technical review to others)
>>>>>
> 
>>>>>>> +Copyright Fujitsu, Corp. 2015
>>>>>>> +Copyright (c) 2015 Intel Corporation
>>>>>>> +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
>>>>>
>>>>> Space after comma in English writing.
>>> Yes, but I am not sure I can change it. HUAWEI always use this way.
> 
> Copyright lines are one thing that I am reluctant to change if it is not
> my own line (some companies have policies on how their lines must look,
> and I'm not in a position to argue the policy as I am not a lawyer).
> 
>>> You can find it in bootdevice.c
> 
> Such existing precedence is a strong argument to NOT changing it, at
> least for a patch author that is not the copyright owner.
> 
>>
>> Good catch, Eric is right. I will change all of this writing way in Qemu at 2.4.
> 
> So, sounds like such a change would be a separate patch (probably
> through the -trivial tree), and cover all files in the repo in one go,
> without affecting this series.  Such a patch by a copyright owner would
> have no problem being accepted, if it is wanted.
> 
Okay, will do, if it is not too later for rc2. :)

Thanks,
-Gonglei

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 08/13] Allow creating backup jobs when opening BDS
  2015-03-26  7:23         ` Wen Congyang
@ 2015-03-26 13:53           ` Paolo Bonzini
  0 siblings, 0 replies; 55+ messages in thread
From: Paolo Bonzini @ 2015-03-26 13:53 UTC (permalink / raw)
  To: Wen Congyang, Fam Zheng
  Cc: Kevin Wolf, Yang Hongyang, Lai Jiangshan, qemu block, Jeff Cody,
	Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert, qemu devel,
	Gonglei, Stefan Hajnoczi, Max Reitz, zhanghailiang



On 26/03/2015 08:23, Wen Congyang wrote:
>>> >> Hmm, we need to use backup API in block.c, and block.o will
>>> >> be used by qemu-img which doesn't use common-obj.
>> > 
>> > I see. How about adding the referenced functions to stubs/?
> Good idea. I will try it.

Even better would be to move the functions that need backup APIs to
another file.  block.c is big, and it is going to be split soon into
multiple files.

Paolo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 06/13] NBD client: implement block driver interfaces for block replication
  2015-03-26  7:32     ` Wen Congyang
@ 2015-03-27  1:06       ` Fam Zheng
  2015-03-27  1:16         ` Wen Congyang
  2015-03-27  7:34         ` [Qemu-devel] Use of QERR_ macros and error classes (was: [RFC PATCH COLO v2 06/13] NBD client: implement block driver interfaces for block replication) Markus Armbruster
  0 siblings, 2 replies; 55+ messages in thread
From: Fam Zheng @ 2015-03-27  1:06 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Yang Hongyang, Lai Jiangshan, qemu block, armbru,
	Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert, qemu devel,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Max Reitz,
	zhanghailiang

On Thu, 03/26 15:32, Wen Congyang wrote:
> On 03/26/2015 03:21 PM, Fam Zheng wrote:
> > On Wed, 03/25 17:36, Wen Congyang wrote:
> >> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> >> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> >> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> >> ---
> >>  block/nbd.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 49 insertions(+)
> >>
> >> diff --git a/block/nbd.c b/block/nbd.c
> >> index 3faf865..753b322 100644
> >> --- a/block/nbd.c
> >> +++ b/block/nbd.c
> >> @@ -458,6 +458,52 @@ static void nbd_refresh_filename(BlockDriverState *bs)
> >>      bs->full_open_options = opts;
> >>  }
> >>  
> >> +static void nbd_start_replication(BlockDriverState *bs, COLOMode mode,
> >> +                                  Error **errp)
> >> +{
> >> +    BDRVNBDState *s = bs->opaque;
> >> +
> >> +    /*
> >> +     * TODO: support COLO_SECONDARY_MODE if we allow secondary
> >> +     * QEMU becoming primary QEMU.
> >> +     */
> >> +    if (mode != COLO_MODE_PRIMARY) {
> >> +        error_set(errp, QERR_INVALID_PARAMETER, "mode");
> > 
> > Please use error_setg. (Please grep the whole series :)
> 
> Why? QERR_INVALID_PARAMETER includes ERROR_CLASS_GENERIC_ERROR.

Because error classes are deprecated. See also commit 5b347c5410.

Fam

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 06/13] NBD client: implement block driver interfaces for block replication
  2015-03-27  1:06       ` Fam Zheng
@ 2015-03-27  1:16         ` Wen Congyang
  2015-03-27  7:34         ` [Qemu-devel] Use of QERR_ macros and error classes (was: [RFC PATCH COLO v2 06/13] NBD client: implement block driver interfaces for block replication) Markus Armbruster
  1 sibling, 0 replies; 55+ messages in thread
From: Wen Congyang @ 2015-03-27  1:16 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, Yang Hongyang, Lai Jiangshan, qemu block, armbru,
	Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert, qemu devel,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Max Reitz,
	zhanghailiang

On 03/27/2015 09:06 AM, Fam Zheng wrote:
> On Thu, 03/26 15:32, Wen Congyang wrote:
>> On 03/26/2015 03:21 PM, Fam Zheng wrote:
>>> On Wed, 03/25 17:36, Wen Congyang wrote:
>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>>>> ---
>>>>  block/nbd.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 49 insertions(+)
>>>>
>>>> diff --git a/block/nbd.c b/block/nbd.c
>>>> index 3faf865..753b322 100644
>>>> --- a/block/nbd.c
>>>> +++ b/block/nbd.c
>>>> @@ -458,6 +458,52 @@ static void nbd_refresh_filename(BlockDriverState *bs)
>>>>      bs->full_open_options = opts;
>>>>  }
>>>>  
>>>> +static void nbd_start_replication(BlockDriverState *bs, COLOMode mode,
>>>> +                                  Error **errp)
>>>> +{
>>>> +    BDRVNBDState *s = bs->opaque;
>>>> +
>>>> +    /*
>>>> +     * TODO: support COLO_SECONDARY_MODE if we allow secondary
>>>> +     * QEMU becoming primary QEMU.
>>>> +     */
>>>> +    if (mode != COLO_MODE_PRIMARY) {
>>>> +        error_set(errp, QERR_INVALID_PARAMETER, "mode");
>>>
>>> Please use error_setg. (Please grep the whole series :)
>>
>> Why? QERR_INVALID_PARAMETER includes ERROR_CLASS_GENERIC_ERROR.
> 
> Because error classes are deprecated. See also commit 5b347c5410.

I see. Will fix it in the next version.

Thanks
Wen Congyang

> 
> Fam
> .
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [Qemu-devel] Use of QERR_ macros and error classes (was: [RFC PATCH COLO v2 06/13] NBD client: implement block driver interfaces for block replication)
  2015-03-27  1:06       ` Fam Zheng
  2015-03-27  1:16         ` Wen Congyang
@ 2015-03-27  7:34         ` Markus Armbruster
  1 sibling, 0 replies; 55+ messages in thread
From: Markus Armbruster @ 2015-03-27  7:34 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, Lai Jiangshan, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, qemu devel, Gonglei, Stefan Hajnoczi,
	Paolo Bonzini, Yang Hongyang, Max Reitz, zhanghailiang

Fam Zheng <famz@redhat.com> writes:

> On Thu, 03/26 15:32, Wen Congyang wrote:
>> On 03/26/2015 03:21 PM, Fam Zheng wrote:
>> > On Wed, 03/25 17:36, Wen Congyang wrote:
>> >> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> >> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> >> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> >> ---
>> >>  block/nbd.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>> >>  1 file changed, 49 insertions(+)
>> >>
>> >> diff --git a/block/nbd.c b/block/nbd.c
>> >> index 3faf865..753b322 100644
>> >> --- a/block/nbd.c
>> >> +++ b/block/nbd.c
>> >> @@ -458,6 +458,52 @@ static void nbd_refresh_filename(BlockDriverState *bs)
>> >>      bs->full_open_options = opts;
>> >>  }
>> >>  
>> >> +static void nbd_start_replication(BlockDriverState *bs, COLOMode mode,
>> >> +                                  Error **errp)
>> >> +{
>> >> +    BDRVNBDState *s = bs->opaque;
>> >> +
>> >> +    /*
>> >> +     * TODO: support COLO_SECONDARY_MODE if we allow secondary
>> >> +     * QEMU becoming primary QEMU.
>> >> +     */
>> >> +    if (mode != COLO_MODE_PRIMARY) {
>> >> +        error_set(errp, QERR_INVALID_PARAMETER, "mode");
>> > 
>> > Please use error_setg. (Please grep the whole series :)
>> 
>> Why? QERR_INVALID_PARAMETER includes ERROR_CLASS_GENERIC_ERROR.
>
> Because error classes are deprecated. See also commit 5b347c5410.

Yes, new code should use ERROR_CLASS_GENERIC_ERROR, preferably through
error_setg() or similar.

The QERR_ macros in qerror.h expand to *two* expressions CLS, FMT, for
use as function arguments.  Dirty.  Was done that way when error classes
were introduced to reduce churn (commit 8546505..0f32cf6).

Besides being dirty, they hide the error class.  Bad, because we really
want unusual things like a funky error class stand out right where the
error is created.  If they don't, we're prone to accidental uses of
funky error classes creeping in.

So far, we've eliminated all QERR_ macros using a funky error class but
one: QERR_DEVICE_NOT_FOUND.  That one needs to go, too.  I've got
patches in my queue for 2.4.

In fact, all of qerror.h needs to go.  I've got patches for 2.4 to clean
it out except for a bunch of QERR_.  Replacing these isn't hard, just
prone to conflicts, so I'm saving it for a little later.

For now, adding uses of QERR_ macros other than QERR_DEVICE_NOT_FOUND is
still okay, it just creates a little extra work for me.  But avoiding
them makes their ultimate removal a bit easier, and is encouraged.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 01/13] docs: block replication's description
  2015-03-26  6:31   ` [Qemu-devel] " Fam Zheng
  2015-03-26  7:17     ` Wen Congyang
@ 2015-04-03  2:35     ` Wen Congyang
  2015-04-03  5:19       ` Fam Zheng
  1 sibling, 1 reply; 55+ messages in thread
From: Wen Congyang @ 2015-04-03  2:35 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, Yang Hongyang, Lai Jiangshan, qemu block,
	Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert, qemu devel,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Max Reitz,
	zhanghailiang

On 03/26/2015 02:31 PM, Fam Zheng wrote:
> On Wed, 03/25 17:36, Wen Congyang wrote:
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> ---
>>  docs/block-replication.txt | 147 +++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 147 insertions(+)
>>  create mode 100644 docs/block-replication.txt
>>
>> diff --git a/docs/block-replication.txt b/docs/block-replication.txt
>> new file mode 100644
>> index 0000000..874ed8e
>> --- /dev/null
>> +++ b/docs/block-replication.txt
>> @@ -0,0 +1,147 @@
>> +Block replication
>> +----------------------------------------
>> +Copyright Fujitsu, Corp. 2015
>> +Copyright (c) 2015 Intel Corporation
>> +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
>> +
>> +This work is licensed under the terms of the GNU GPL, version 2 or later.
>> +See the COPYING file in the top-level directory.
>> +
>> +The block replication is used for continuous checkpoints. It is designed
>> +for COLO that Secondary VM is running. It can also be applied for FT/HA
>> +scene that Secondary VM is not running.
>> +
>> +This document gives an overview of block replication's design.
>> +
>> +== Background ==
>> +High availability solutions such as micro checkpoint and COLO will do
>> +consecutive checkpoint. The VM state of Primary VM and Secondary VM is
>> +identical right after a VM checkpoint, but becomes different as the VM
>> +executes till the next checkpoint. To support disk contents checkpoint,
>> +the modified disk contents in the Secondary VM must be buffered, and are
>> +only dropped at next checkpoint time. To reduce the network transportation
>> +effort at the time of checkpoint, the disk modification operations of
>> +Primary disk are asynchronously forwarded to the Secondary node.
>> +
>> +== Workflow ==
>> +The following is the image of block replication workflow:
>> +
>> +        +----------------------+            +------------------------+
>> +        |Primary Write Requests|            |Secondary Write Requests|
>> +        +----------------------+            +------------------------+
>> +                  |                                       |
>> +                  |                                      (4)
>> +                  |                                       V
>> +                  |                              /-------------\
>> +                  |      Copy and Forward        |             |
>> +                  |---------(1)----------+       | Disk Buffer |
>> +                  |                      |       |             |
>> +                  |                     (3)      \-------------/
>> +                  |                 speculative      ^
>> +                  |                write through    (2)
>> +                  |                      |           |
>> +                  V                      V           |
>> +           +--------------+           +----------------+
>> +           | Primary Disk |           | Secondary Disk |
>> +           +--------------+           +----------------+
>> +
>> +    1) Primary write requests will be copied and forwarded to Secondary
>> +       QEMU.
>> +    2) Before Primary write requests are written to Secondary disk, the
>> +       original sector content will be read from Secondary disk and
>> +       buffered in the Disk buffer, but it will not overwrite the existing
>> +       sector content in the Disk buffer.
> 
> Could you elaborate a bit about the "existing sector content" here? IIUC, it
> could be from either "Secondary Write Requests" or previous COW of "Primary
> Write Requests". Is that right?
> 
>> +    3) Primary write requests will be written to Secondary disk.
>> +    4) Secondary write requests will be buffered in the Disk buffer and it
>> +       will overwrite the existing sector content in the buffer.
>> +
>> +== Architecture ==
>> +We are going to implement COLO block replication from many basic
>> +blocks that are already in QEMU.
>> +
>> +         virtio-blk       ||
>> +             ^            ||                            .----------
>> +             |            ||                            | Secondary
>> +        1 Quorum          ||                            '----------
>> +         /      \         ||
>> +        /        \        ||
>> +   Primary      2 NBD  ------->  2 NBD
>> +     disk       client    ||     server                                         virtio-blk
>> +                          ||        ^                                                ^
>> +--------.                 ||        |                                                |
>> +Primary |                 ||  Secondary disk <--------- hidden-disk 4 <--------- active-disk 3
>> +--------'                 ||        |          backing        ^       backing
>> +                          ||        |                         |
>> +                          ||        |                         |
>> +                          ||        '-------------------------'
>> +                          ||           drive-backup sync=none
>> +
>> +1) The disk on the primary is represented by a block device with two
>> +children, providing replication between a primary disk and the host that
>> +runs the secondary VM. The read pattern for quorum can be extended to
>> +make the primary always read from the local disk instead of going through
>> +NBD.
>> +
>> +2) The secondary disk receives writes from the primary VM through QEMU's
>> +embedded NBD server (speculative write-through).
>> +
>> +3) The disk on the secondary is represented by a custom block device
>> +(called active-disk). It should be an empty disk, and the format should
>> +be qcow2.
>> +
>> +4) The hidden-disk is created automatically. It buffers the original content
>> +that is modified by the primary VM. It should also be an empty disk, and
>> +the dirver supports bdrv_make_empty().
>> +
>> +== New block driver interface ==
>> +We add three block driver interfaces to control block replication:
>> +a. bdrv_start_replication()
>> +   Start block replication, called in migration/checkpoint thread.
>> +   We must call bdrv_start_replication() in secondary QEMU before
>> +   calling bdrv_start_replication() in primary QEMU.
>> +b. bdrv_do_checkpoint()
>> +   This interface is called after all VM state is transfered to
>> +   Secondary QEMU. The Disk buffer will be dropped in this interface.
>> +   The caller must hold the I/O mutex lock if it is in migration/checkpoint
>> +   thread.
>> +c. bdrv_stop_replication()
>> +   It is called when failover. We will flush the Disk buffer into
>> +   Secondary Disk and stop block replication. The vm should be stopped
>> +   before calling it. The caller must hold the I/O mutex lock if it is
>> +   in migration/checkpoint thread.
>> +
>> +== Usage ==
>> +Primary:
>> +  -drive if=xxx,driver=quorum,read-pattern=fifo,\
>> +         children.0.file.filename=1.raw,\
>> +         children.0.driver=raw,\
>> +         children.1.file.driver=nbd+colo,\
>> +         children.1.file.host=xxx,\
>> +         children.1.file.port=xxx,\
>> +         children.1.file.export=xxx,\
>> +         children.1.driver=raw,\
>> +         children.1.ignore-errors=on
>    ^^^^^^^^^
> Won't the leading spaces cause trouble? :)
> 
>> +  Note:
>> +  1. NBD Client should not be the first child of quorum.
>> +  2. There should be only one NBD Client.
>> +  3. host is the secondary physical machine's hostname or IP
>> +  4. Each disk must have its own export name.
>> +
>> +Secondary:
>> +  -drive if=none,driver=raw,file=1.raw,id=nbd_target1 \
>> +  -drive if=xxx,driver=qcow2+colo,file=active_disk.qcow2,export=xxx,\
>> +         backing_reference.drive_id=nbd_target1,\
>> +         backing_reference.hidden-disk.file.filename=hidden_disk.qcow2,\
>> +         backing_reference.hidden-disk.driver=qcow2,\
>> +         backing_reference.hidden-disk.allow-write-backing-file=on
>> +  Then run qmp command:
>> +    nbd_server_start host:port
> 
> s/nbd_server_start/nbd-server-start

The command name in qmp-commands.hx is nbd-server-start, but
(qemu) nbd-server-start 192.168.3.1:8889
unknown command: 'nbd-server-start'

What's wrong?

Thanks
Wen Congyang

> 
> For a few more too.
> 
> 
>> +  Note:
>> +  1. The export name for the same disk must be the same in primary
>> +     and secondary QEMU command line
>> +  2. The qmp command nbd_server_start must be run before running the
>> +     qmp command migrate on primary QEMU
>> +  3. Don't use nbd_server_start's other options
>> +  4. Active disk, hidden disk and nbd target's length should be the
>> +     same.
>> +  5. It is better to put active disk and hidden disk in ramdisk.
> 
> Fam
> .
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 01/13] docs: block replication's description
  2015-04-03  2:35     ` Wen Congyang
@ 2015-04-03  5:19       ` Fam Zheng
  0 siblings, 0 replies; 55+ messages in thread
From: Fam Zheng @ 2015-04-03  5:19 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Yang Hongyang, Lai Jiangshan, qemu block,
	Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert, qemu devel,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Max Reitz,
	zhanghailiang

On Fri, 04/03 10:35, Wen Congyang wrote:
> On 03/26/2015 02:31 PM, Fam Zheng wrote:
> > On Wed, 03/25 17:36, Wen Congyang wrote:
> >> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> >> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> >> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> >> ---
> >>  docs/block-replication.txt | 147 +++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 147 insertions(+)
> >>  create mode 100644 docs/block-replication.txt
> >>
> >> diff --git a/docs/block-replication.txt b/docs/block-replication.txt
> >> new file mode 100644
> >> index 0000000..874ed8e
> >> --- /dev/null
> >> +++ b/docs/block-replication.txt
> >> @@ -0,0 +1,147 @@
> >> +Block replication
> >> +----------------------------------------
> >> +Copyright Fujitsu, Corp. 2015
> >> +Copyright (c) 2015 Intel Corporation
> >> +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
> >> +
> >> +This work is licensed under the terms of the GNU GPL, version 2 or later.
> >> +See the COPYING file in the top-level directory.
> >> +
> >> +The block replication is used for continuous checkpoints. It is designed
> >> +for COLO that Secondary VM is running. It can also be applied for FT/HA
> >> +scene that Secondary VM is not running.
> >> +
> >> +This document gives an overview of block replication's design.
> >> +
> >> +== Background ==
> >> +High availability solutions such as micro checkpoint and COLO will do
> >> +consecutive checkpoint. The VM state of Primary VM and Secondary VM is
> >> +identical right after a VM checkpoint, but becomes different as the VM
> >> +executes till the next checkpoint. To support disk contents checkpoint,
> >> +the modified disk contents in the Secondary VM must be buffered, and are
> >> +only dropped at next checkpoint time. To reduce the network transportation
> >> +effort at the time of checkpoint, the disk modification operations of
> >> +Primary disk are asynchronously forwarded to the Secondary node.
> >> +
> >> +== Workflow ==
> >> +The following is the image of block replication workflow:
> >> +
> >> +        +----------------------+            +------------------------+
> >> +        |Primary Write Requests|            |Secondary Write Requests|
> >> +        +----------------------+            +------------------------+
> >> +                  |                                       |
> >> +                  |                                      (4)
> >> +                  |                                       V
> >> +                  |                              /-------------\
> >> +                  |      Copy and Forward        |             |
> >> +                  |---------(1)----------+       | Disk Buffer |
> >> +                  |                      |       |             |
> >> +                  |                     (3)      \-------------/
> >> +                  |                 speculative      ^
> >> +                  |                write through    (2)
> >> +                  |                      |           |
> >> +                  V                      V           |
> >> +           +--------------+           +----------------+
> >> +           | Primary Disk |           | Secondary Disk |
> >> +           +--------------+           +----------------+
> >> +
> >> +    1) Primary write requests will be copied and forwarded to Secondary
> >> +       QEMU.
> >> +    2) Before Primary write requests are written to Secondary disk, the
> >> +       original sector content will be read from Secondary disk and
> >> +       buffered in the Disk buffer, but it will not overwrite the existing
> >> +       sector content in the Disk buffer.
> > 
> > Could you elaborate a bit about the "existing sector content" here? IIUC, it
> > could be from either "Secondary Write Requests" or previous COW of "Primary
> > Write Requests". Is that right?
> > 
> >> +    3) Primary write requests will be written to Secondary disk.
> >> +    4) Secondary write requests will be buffered in the Disk buffer and it
> >> +       will overwrite the existing sector content in the buffer.
> >> +
> >> +== Architecture ==
> >> +We are going to implement COLO block replication from many basic
> >> +blocks that are already in QEMU.
> >> +
> >> +         virtio-blk       ||
> >> +             ^            ||                            .----------
> >> +             |            ||                            | Secondary
> >> +        1 Quorum          ||                            '----------
> >> +         /      \         ||
> >> +        /        \        ||
> >> +   Primary      2 NBD  ------->  2 NBD
> >> +     disk       client    ||     server                                         virtio-blk
> >> +                          ||        ^                                                ^
> >> +--------.                 ||        |                                                |
> >> +Primary |                 ||  Secondary disk <--------- hidden-disk 4 <--------- active-disk 3
> >> +--------'                 ||        |          backing        ^       backing
> >> +                          ||        |                         |
> >> +                          ||        |                         |
> >> +                          ||        '-------------------------'
> >> +                          ||           drive-backup sync=none
> >> +
> >> +1) The disk on the primary is represented by a block device with two
> >> +children, providing replication between a primary disk and the host that
> >> +runs the secondary VM. The read pattern for quorum can be extended to
> >> +make the primary always read from the local disk instead of going through
> >> +NBD.
> >> +
> >> +2) The secondary disk receives writes from the primary VM through QEMU's
> >> +embedded NBD server (speculative write-through).
> >> +
> >> +3) The disk on the secondary is represented by a custom block device
> >> +(called active-disk). It should be an empty disk, and the format should
> >> +be qcow2.
> >> +
> >> +4) The hidden-disk is created automatically. It buffers the original content
> >> +that is modified by the primary VM. It should also be an empty disk, and
> >> +the dirver supports bdrv_make_empty().
> >> +
> >> +== New block driver interface ==
> >> +We add three block driver interfaces to control block replication:
> >> +a. bdrv_start_replication()
> >> +   Start block replication, called in migration/checkpoint thread.
> >> +   We must call bdrv_start_replication() in secondary QEMU before
> >> +   calling bdrv_start_replication() in primary QEMU.
> >> +b. bdrv_do_checkpoint()
> >> +   This interface is called after all VM state is transfered to
> >> +   Secondary QEMU. The Disk buffer will be dropped in this interface.
> >> +   The caller must hold the I/O mutex lock if it is in migration/checkpoint
> >> +   thread.
> >> +c. bdrv_stop_replication()
> >> +   It is called when failover. We will flush the Disk buffer into
> >> +   Secondary Disk and stop block replication. The vm should be stopped
> >> +   before calling it. The caller must hold the I/O mutex lock if it is
> >> +   in migration/checkpoint thread.
> >> +
> >> +== Usage ==
> >> +Primary:
> >> +  -drive if=xxx,driver=quorum,read-pattern=fifo,\
> >> +         children.0.file.filename=1.raw,\
> >> +         children.0.driver=raw,\
> >> +         children.1.file.driver=nbd+colo,\
> >> +         children.1.file.host=xxx,\
> >> +         children.1.file.port=xxx,\
> >> +         children.1.file.export=xxx,\
> >> +         children.1.driver=raw,\
> >> +         children.1.ignore-errors=on
> >    ^^^^^^^^^
> > Won't the leading spaces cause trouble? :)
> > 
> >> +  Note:
> >> +  1. NBD Client should not be the first child of quorum.
> >> +  2. There should be only one NBD Client.
> >> +  3. host is the secondary physical machine's hostname or IP
> >> +  4. Each disk must have its own export name.
> >> +
> >> +Secondary:
> >> +  -drive if=none,driver=raw,file=1.raw,id=nbd_target1 \
> >> +  -drive if=xxx,driver=qcow2+colo,file=active_disk.qcow2,export=xxx,\
> >> +         backing_reference.drive_id=nbd_target1,\
> >> +         backing_reference.hidden-disk.file.filename=hidden_disk.qcow2,\
> >> +         backing_reference.hidden-disk.driver=qcow2,\
> >> +         backing_reference.hidden-disk.allow-write-backing-file=on
> >> +  Then run qmp command:
> >> +    nbd_server_start host:port
> > 
> > s/nbd_server_start/nbd-server-start
> 
> The command name in qmp-commands.hx is nbd-server-start, but
> (qemu) nbd-server-start 192.168.3.1:8889
> unknown command: 'nbd-server-start'
> 
> What's wrong?

You're using HMP interface. QMP is a JSON-like protocol, you can see
scripts/qmp/qmp.py as the python library, and tests/qemu-iotests for its usage.

Fam

> 
> Thanks
> Wen Congyang
> 
> > 
> > For a few more too.
> > 
> > 
> >> +  Note:
> >> +  1. The export name for the same disk must be the same in primary
> >> +     and secondary QEMU command line
> >> +  2. The qmp command nbd_server_start must be run before running the
> >> +     qmp command migrate on primary QEMU
> >> +  3. Don't use nbd_server_start's other options
> >> +  4. Active disk, hidden disk and nbd target's length should be the
> >> +     same.
> >> +  5. It is better to put active disk and hidden disk in ramdisk.
> > 
> > Fam
> > .
> > 
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints
  2015-03-25  9:36 [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints Wen Congyang
                   ` (14 preceding siblings ...)
  2015-03-25 14:24 ` Dr. David Alan Gilbert
@ 2015-07-01  3:09 ` Michael R. Hines
  2015-07-01  4:11   ` Wen Congyang
  15 siblings, 1 reply; 55+ messages in thread
From: Michael R. Hines @ 2015-07-01  3:09 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, Lai Jiangshan, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert,
	mrhines@us.ibm.com >> Michael R. Hines, Stefan Hajnoczi,
	Yang Hongyang

On 03/25/2015 04:36 AM, Wen Congyang wrote:
> Block replication is a very important feature which is used for
> continuous checkpoints(for example: COLO).
>
> Usage:
> Please refer to docs/block-replication.txt
>
> You can get the patch here:
> https://github.com/wencongyang/qemu-colo/commits/block-replication-v2
>
> Changs Log:
> V2:
> 1. Redesign the secondary qemu(use image-fleecing)
> 2. Use Error objects to return error message
> 3. Address the comments from Max Reitz and Eric Blake
>
> Wen Congyang (13):
>    docs: block replication's description
>    quorum: allow ignoring child errors
>    NBD client: connect to nbd server later
>    Add new block driver interfaces to control block replication
>    quorum: implement block driver interfaces for block replication
>    NBD client: implement block driver interfaces for block replication
>    allow writing to the backing file
>    Allow creating backup jobs when opening BDS
>    block: Parse "backing_reference" option to reference existing BDS
>    Backup: clear all bitmap when doing block checkpoint
>    qcow2: support colo
>    skip nbd_target when starting block replication
>    Don't allow a disk use backing reference target
>
>   block.c                    | 242 +++++++++++++++++++++++-
>   block/Makefile.objs        |   2 +-
>   block/backup.c             |  12 ++
>   block/nbd.c                | 171 +++++++++++++++--
>   block/qcow2.c              | 447 ++++++++++++++++++++++++++++++++++++++++++++-
>   block/qcow2.h              |   6 +
>   block/quorum.c             | 143 ++++++++++++++-
>   docs/block-replication.txt | 147 +++++++++++++++
>   include/block/block.h      |   5 +
>   include/block/block_int.h  |  13 ++
>   include/qemu/hbitmap.h     |   8 +
>   qapi/block.json            |  16 ++
>   tests/qemu-iotests/051     |  13 ++
>   tests/qemu-iotests/051.out |  13 ++
>   util/hbitmap.c             |  19 ++
>   15 files changed, 1230 insertions(+), 27 deletions(-)
>   create mode 100644 docs/block-replication.txt
>

When I try this patch with the MicroCheckpointing codebase, I'm having 
trouble starting up the secondary/backup VM
using the branch "block-replication-v7"

$ qemu-system-x86_64 -drive 
if=none,driver=raw,file=file.raw,id=nbd_target1 -drive 
if=virtio,driver=replication,mode=secondary,export=foo,file.file.filename=active_disk.qcow2,file.driver=qcow2,file.backing_reference.drive_id=nbd_target1,file.backing_reference.hidden-disk.file.filename=hidden_disk.qcow2,file.backing_reference.hidden-disk.driver=qcow2,file.backing_reference.hidden-disk.allow-write-backing-file=on

Block format 'qcow2' used by device '' doesn't support the option 
'backing_reference.hidden-disk.allow-write-backing-file'

What am I doing wrong here?

- Michael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints
  2015-07-01  3:09 ` Michael R. Hines
@ 2015-07-01  4:11   ` Wen Congyang
  2015-07-01 19:30     ` Michael R. Hines
  2015-07-01 19:37     ` Michael R. Hines
  0 siblings, 2 replies; 55+ messages in thread
From: Wen Congyang @ 2015-07-01  4:11 UTC (permalink / raw)
  To: Michael R. Hines, qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, Lai Jiangshan, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert,
	mrhines@us.ibm.com >> Michael R. Hines, Stefan Hajnoczi,
	Yang Hongyang

On 07/01/2015 11:09 AM, Michael R. Hines wrote:
> On 03/25/2015 04:36 AM, Wen Congyang wrote:
>> Block replication is a very important feature which is used for
>> continuous checkpoints(for example: COLO).
>>
>> Usage:
>> Please refer to docs/block-replication.txt
>>
>> You can get the patch here:
>> https://github.com/wencongyang/qemu-colo/commits/block-replication-v2
>>
>> Changs Log:
>> V2:
>> 1. Redesign the secondary qemu(use image-fleecing)
>> 2. Use Error objects to return error message
>> 3. Address the comments from Max Reitz and Eric Blake
>>
>> Wen Congyang (13):
>>    docs: block replication's description
>>    quorum: allow ignoring child errors
>>    NBD client: connect to nbd server later
>>    Add new block driver interfaces to control block replication
>>    quorum: implement block driver interfaces for block replication
>>    NBD client: implement block driver interfaces for block replication
>>    allow writing to the backing file
>>    Allow creating backup jobs when opening BDS
>>    block: Parse "backing_reference" option to reference existing BDS
>>    Backup: clear all bitmap when doing block checkpoint
>>    qcow2: support colo
>>    skip nbd_target when starting block replication
>>    Don't allow a disk use backing reference target
>>
>>   block.c                    | 242 +++++++++++++++++++++++-
>>   block/Makefile.objs        |   2 +-
>>   block/backup.c             |  12 ++
>>   block/nbd.c                | 171 +++++++++++++++--
>>   block/qcow2.c              | 447 ++++++++++++++++++++++++++++++++++++++++++++-
>>   block/qcow2.h              |   6 +
>>   block/quorum.c             | 143 ++++++++++++++-
>>   docs/block-replication.txt | 147 +++++++++++++++
>>   include/block/block.h      |   5 +
>>   include/block/block_int.h  |  13 ++
>>   include/qemu/hbitmap.h     |   8 +
>>   qapi/block.json            |  16 ++
>>   tests/qemu-iotests/051     |  13 ++
>>   tests/qemu-iotests/051.out |  13 ++
>>   util/hbitmap.c             |  19 ++
>>   15 files changed, 1230 insertions(+), 27 deletions(-)
>>   create mode 100644 docs/block-replication.txt
>>
> 
> When I try this patch with the MicroCheckpointing codebase, I'm having trouble starting up the secondary/backup VM
> using the branch "block-replication-v7"
> 
> $ qemu-system-x86_64 -drive if=none,driver=raw,file=file.raw,id=nbd_target1 -drive if=virtio,driver=replication,mode=secondary,export=foo,file.file.filename=active_disk.qcow2,file.driver=qcow2,file.backing_reference.drive_id=nbd_target1,file.backing_reference.hidden-disk.file.filename=hidden_disk.qcow2,file.backing_reference.hidden-disk.driver=qcow2,file.backing_reference.hidden-disk.allow-write-backing-file=on
> 
> Block format 'qcow2' used by device '' doesn't support the option 'backing_reference.hidden-disk.allow-write-backing-file'
> 
> What am I doing wrong here?

The command is wrong. If you use the branch block-replication-v7, the command line should like:
-drive if=none,driver=raw,file=/data/images/kvm/suse/suse11_3.img,id=colo1,cache=none,aio=native -drive if=virtio,driver=replication,mode=secondary,throttling.bps-total-max=70000000,file.file.filename=/mnt/ramfs/active_disk.img,file.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.driver=qcow2,file.backing.allow-write-backing-file=on,file.backing.backing.backing_reference=colo1

Thanks
Wen Congyang

> 
> - Michael
> 
> .
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints
  2015-07-01  4:11   ` Wen Congyang
@ 2015-07-01 19:30     ` Michael R. Hines
  2015-07-01 19:37     ` Michael R. Hines
  1 sibling, 0 replies; 55+ messages in thread
From: Michael R. Hines @ 2015-07-01 19:30 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, Lai Jiangshan, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert,
	mrhines@us.ibm.com >> Michael R. Hines, Stefan Hajnoczi,
	Yang Hongyang

On 06/30/2015 11:11 PM, Wen Congyang wrote:
> On 07/01/2015 11:09 AM, Michael R. Hines wrote:
>> On 03/25/2015 04:36 AM, Wen Congyang wrote:
>>> Block replication is a very important feature which is used for
>>> continuous checkpoints(for example: COLO).
>>>
>>> Usage:
>>> Please refer to docs/block-replication.txt
>>>
>>> You can get the patch here:
>>> https://github.com/wencongyang/qemu-colo/commits/block-replication-v2
>>>
>>> Changs Log:
>>> V2:
>>> 1. Redesign the secondary qemu(use image-fleecing)
>>> 2. Use Error objects to return error message
>>> 3. Address the comments from Max Reitz and Eric Blake
>>>
>>> Wen Congyang (13):
>>>     docs: block replication's description
>>>     quorum: allow ignoring child errors
>>>     NBD client: connect to nbd server later
>>>     Add new block driver interfaces to control block replication
>>>     quorum: implement block driver interfaces for block replication
>>>     NBD client: implement block driver interfaces for block replication
>>>     allow writing to the backing file
>>>     Allow creating backup jobs when opening BDS
>>>     block: Parse "backing_reference" option to reference existing BDS
>>>     Backup: clear all bitmap when doing block checkpoint
>>>     qcow2: support colo
>>>     skip nbd_target when starting block replication
>>>     Don't allow a disk use backing reference target
>>>
>>>    block.c                    | 242 +++++++++++++++++++++++-
>>>    block/Makefile.objs        |   2 +-
>>>    block/backup.c             |  12 ++
>>>    block/nbd.c                | 171 +++++++++++++++--
>>>    block/qcow2.c              | 447 ++++++++++++++++++++++++++++++++++++++++++++-
>>>    block/qcow2.h              |   6 +
>>>    block/quorum.c             | 143 ++++++++++++++-
>>>    docs/block-replication.txt | 147 +++++++++++++++
>>>    include/block/block.h      |   5 +
>>>    include/block/block_int.h  |  13 ++
>>>    include/qemu/hbitmap.h     |   8 +
>>>    qapi/block.json            |  16 ++
>>>    tests/qemu-iotests/051     |  13 ++
>>>    tests/qemu-iotests/051.out |  13 ++
>>>    util/hbitmap.c             |  19 ++
>>>    15 files changed, 1230 insertions(+), 27 deletions(-)
>>>    create mode 100644 docs/block-replication.txt
>>>
>> When I try this patch with the MicroCheckpointing codebase, I'm having trouble starting up the secondary/backup VM
>> using the branch "block-replication-v7"
>>
>> $ qemu-system-x86_64 -drive if=none,driver=raw,file=file.raw,id=nbd_target1 -drive if=virtio,driver=replication,mode=secondary,export=foo,file.file.filename=active_disk.qcow2,file.driver=qcow2,file.backing_reference.drive_id=nbd_target1,file.backing_reference.hidden-disk.file.filename=hidden_disk.qcow2,file.backing_reference.hidden-disk.driver=qcow2,file.backing_reference.hidden-disk.allow-write-backing-file=on
>>
>> Block format 'qcow2' used by device '' doesn't support the option 'backing_reference.hidden-disk.allow-write-backing-file'
>>
>> What am I doing wrong here?
> The command is wrong. If you use the branch block-replication-v7, the command line should like:
> -drive if=none,driver=raw,file=/data/images/kvm/suse/suse11_3.img,id=colo1,cache=none,aio=native -drive if=virtio,driver=replication,mode=secondary,throttling.bps-total-max=70000000,file.file.filename=/mnt/ramfs/active_disk.img,file.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.driver=qcow2,file.backing.allow-write-backing-file=on,file.backing.backing.backing_reference=colo1
>
> Thanks
> Wen Congyang
>
>> - Michael
>>
>> .
>>

Can you also tell me what's wrong with the primary/source VM command 
line? Or could you update the documentation please?
Am I using the correct branch?

Here's my command line:

qemu-system-x86_64: -drive 
if=virtio,driver=quorum,read-pattern=fifo,no-connect=on,children.0.file.filename=/kvm_repo/image.raw,children.0.driver=raw,children.1.file.driver=nbd,children.1.file.host=192.168.168.3,children.1.file.port=6262,children.1.file.export=foo,children.1.driver=replication,children.1.mode=primary,children.1.ignore-errors=on: 

Failed to read export length

What exactly is the purpose of the "export" parameter?

Thanks,

- Michael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints
  2015-07-01  4:11   ` Wen Congyang
  2015-07-01 19:30     ` Michael R. Hines
@ 2015-07-01 19:37     ` Michael R. Hines
  2015-07-02  0:58       ` Wen Congyang
  2015-07-02  1:43       ` Wen Congyang
  1 sibling, 2 replies; 55+ messages in thread
From: Michael R. Hines @ 2015-07-01 19:37 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, Lai Jiangshan, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert,
	mrhines@us.ibm.com >> Michael R. Hines, Stefan Hajnoczi,
	Yang Hongyang

On 06/30/2015 11:11 PM, Wen Congyang wrote:
> On 07/01/2015 11:09 AM, Michael R. Hines wrote:
>> On 03/25/2015 04:36 AM, Wen Congyang wrote:
>>> Block replication is a very important feature which is used for
>>> continuous checkpoints(for example: COLO).
>>>
>>> Usage:
>>> Please refer to docs/block-replication.txt
>>>
>>> You can get the patch here:
>>> https://github.com/wencongyang/qemu-colo/commits/block-replication-v2
>>>
>>> Changs Log:
>>> V2:
>>> 1. Redesign the secondary qemu(use image-fleecing)
>>> 2. Use Error objects to return error message
>>> 3. Address the comments from Max Reitz and Eric Blake
>>>
>>> Wen Congyang (13):
>>>     docs: block replication's description
>>>     quorum: allow ignoring child errors
>>>     NBD client: connect to nbd server later
>>>     Add new block driver interfaces to control block replication
>>>     quorum: implement block driver interfaces for block replication
>>>     NBD client: implement block driver interfaces for block replication
>>>     allow writing to the backing file
>>>     Allow creating backup jobs when opening BDS
>>>     block: Parse "backing_reference" option to reference existing BDS
>>>     Backup: clear all bitmap when doing block checkpoint
>>>     qcow2: support colo
>>>     skip nbd_target when starting block replication
>>>     Don't allow a disk use backing reference target
>>>
>>>    block.c                    | 242 +++++++++++++++++++++++-
>>>    block/Makefile.objs        |   2 +-
>>>    block/backup.c             |  12 ++
>>>    block/nbd.c                | 171 +++++++++++++++--
>>>    block/qcow2.c              | 447 ++++++++++++++++++++++++++++++++++++++++++++-
>>>    block/qcow2.h              |   6 +
>>>    block/quorum.c             | 143 ++++++++++++++-
>>>    docs/block-replication.txt | 147 +++++++++++++++
>>>    include/block/block.h      |   5 +
>>>    include/block/block_int.h  |  13 ++
>>>    include/qemu/hbitmap.h     |   8 +
>>>    qapi/block.json            |  16 ++
>>>    tests/qemu-iotests/051     |  13 ++
>>>    tests/qemu-iotests/051.out |  13 ++
>>>    util/hbitmap.c             |  19 ++
>>>    15 files changed, 1230 insertions(+), 27 deletions(-)
>>>    create mode 100644 docs/block-replication.txt
>>>
>> When I try this patch with the MicroCheckpointing codebase, I'm having trouble starting up the secondary/backup VM
>> using the branch "block-replication-v7"
>>
>> $ qemu-system-x86_64 -drive if=none,driver=raw,file=file.raw,id=nbd_target1 -drive if=virtio,driver=replication,mode=secondary,export=foo,file.file.filename=active_disk.qcow2,file.driver=qcow2,file.backing_reference.drive_id=nbd_target1,file.backing_reference.hidden-disk.file.filename=hidden_disk.qcow2,file.backing_reference.hidden-disk.driver=qcow2,file.backing_reference.hidden-disk.allow-write-backing-file=on
>>
>> Block format 'qcow2' used by device '' doesn't support the option 'backing_reference.hidden-disk.allow-write-backing-file'
>>
>> What am I doing wrong here?
> The command is wrong. If you use the branch block-replication-v7, the command line should like:
> -drive if=none,driver=raw,file=/data/images/kvm/suse/suse11_3.img,id=colo1,cache=none,aio=native -drive if=virtio,driver=replication,mode=secondary,throttling.bps-total-max=70000000,file.file.filename=/mnt/ramfs/active_disk.img,file.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.driver=qcow2,file.backing.allow-write-backing-file=on,file.backing.backing.backing_reference=colo1
>
> Thanks
> Wen Congyang
>
>> - Michael
>>
>> .
>>
>

Also, this command line you sent doesn't have the export option, but 
your wiki does. What's the difference?

- Michael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints
  2015-07-01 19:37     ` Michael R. Hines
@ 2015-07-02  0:58       ` Wen Congyang
  2015-07-02  1:43       ` Wen Congyang
  1 sibling, 0 replies; 55+ messages in thread
From: Wen Congyang @ 2015-07-02  0:58 UTC (permalink / raw)
  To: Michael R. Hines, qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, Lai Jiangshan, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert,
	mrhines@us.ibm.com >> Michael R. Hines, Stefan Hajnoczi,
	Yang Hongyang

On 07/02/2015 03:37 AM, Michael R. Hines wrote:
> On 06/30/2015 11:11 PM, Wen Congyang wrote:
>> On 07/01/2015 11:09 AM, Michael R. Hines wrote:
>>> On 03/25/2015 04:36 AM, Wen Congyang wrote:
>>>> Block replication is a very important feature which is used for
>>>> continuous checkpoints(for example: COLO).
>>>>
>>>> Usage:
>>>> Please refer to docs/block-replication.txt
>>>>
>>>> You can get the patch here:
>>>> https://github.com/wencongyang/qemu-colo/commits/block-replication-v2
>>>>
>>>> Changs Log:
>>>> V2:
>>>> 1. Redesign the secondary qemu(use image-fleecing)
>>>> 2. Use Error objects to return error message
>>>> 3. Address the comments from Max Reitz and Eric Blake
>>>>
>>>> Wen Congyang (13):
>>>>     docs: block replication's description
>>>>     quorum: allow ignoring child errors
>>>>     NBD client: connect to nbd server later
>>>>     Add new block driver interfaces to control block replication
>>>>     quorum: implement block driver interfaces for block replication
>>>>     NBD client: implement block driver interfaces for block replication
>>>>     allow writing to the backing file
>>>>     Allow creating backup jobs when opening BDS
>>>>     block: Parse "backing_reference" option to reference existing BDS
>>>>     Backup: clear all bitmap when doing block checkpoint
>>>>     qcow2: support colo
>>>>     skip nbd_target when starting block replication
>>>>     Don't allow a disk use backing reference target
>>>>
>>>>    block.c                    | 242 +++++++++++++++++++++++-
>>>>    block/Makefile.objs        |   2 +-
>>>>    block/backup.c             |  12 ++
>>>>    block/nbd.c                | 171 +++++++++++++++--
>>>>    block/qcow2.c              | 447 ++++++++++++++++++++++++++++++++++++++++++++-
>>>>    block/qcow2.h              |   6 +
>>>>    block/quorum.c             | 143 ++++++++++++++-
>>>>    docs/block-replication.txt | 147 +++++++++++++++
>>>>    include/block/block.h      |   5 +
>>>>    include/block/block_int.h  |  13 ++
>>>>    include/qemu/hbitmap.h     |   8 +
>>>>    qapi/block.json            |  16 ++
>>>>    tests/qemu-iotests/051     |  13 ++
>>>>    tests/qemu-iotests/051.out |  13 ++
>>>>    util/hbitmap.c             |  19 ++
>>>>    15 files changed, 1230 insertions(+), 27 deletions(-)
>>>>    create mode 100644 docs/block-replication.txt
>>>>
>>> When I try this patch with the MicroCheckpointing codebase, I'm having trouble starting up the secondary/backup VM
>>> using the branch "block-replication-v7"
>>>
>>> $ qemu-system-x86_64 -drive if=none,driver=raw,file=file.raw,id=nbd_target1 -drive if=virtio,driver=replication,mode=secondary,export=foo,file.file.filename=active_disk.qcow2,file.driver=qcow2,file.backing_reference.drive_id=nbd_target1,file.backing_reference.hidden-disk.file.filename=hidden_disk.qcow2,file.backing_reference.hidden-disk.driver=qcow2,file.backing_reference.hidden-disk.allow-write-backing-file=on
>>>
>>> Block format 'qcow2' used by device '' doesn't support the option 'backing_reference.hidden-disk.allow-write-backing-file'
>>>
>>> What am I doing wrong here?
>> The command is wrong. If you use the branch block-replication-v7, the command line should like:
>> -drive if=none,driver=raw,file=/data/images/kvm/suse/suse11_3.img,id=colo1,cache=none,aio=native -drive if=virtio,driver=replication,mode=secondary,throttling.bps-total-max=70000000,file.file.filename=/mnt/ramfs/active_disk.img,file.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.driver=qcow2,file.backing.allow-write-backing-file=on,file.backing.backing.backing_reference=colo1
>>
>> Thanks
>> Wen Congyang
>>
>>> - Michael
>>>
>>> .
>>>
>>
> 
> Also, this command line you sent doesn't have the export option, but your wiki does. What's the difference?

The wiki is older than the codes. I will update it. Please wait.

Thanks
Wen Congyang

> 
> - Michael
> 
> .
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints
  2015-07-01 19:37     ` Michael R. Hines
  2015-07-02  0:58       ` Wen Congyang
@ 2015-07-02  1:43       ` Wen Congyang
  1 sibling, 0 replies; 55+ messages in thread
From: Wen Congyang @ 2015-07-02  1:43 UTC (permalink / raw)
  To: Michael R. Hines, qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, Lai Jiangshan, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert,
	mrhines@us.ibm.com >> Michael R. Hines, Stefan Hajnoczi,
	Yang Hongyang

On 07/02/2015 03:37 AM, Michael R. Hines wrote:
> On 06/30/2015 11:11 PM, Wen Congyang wrote:
>> On 07/01/2015 11:09 AM, Michael R. Hines wrote:
>>> On 03/25/2015 04:36 AM, Wen Congyang wrote:
>>>> Block replication is a very important feature which is used for
>>>> continuous checkpoints(for example: COLO).
>>>>
>>>> Usage:
>>>> Please refer to docs/block-replication.txt
>>>>
>>>> You can get the patch here:
>>>> https://github.com/wencongyang/qemu-colo/commits/block-replication-v2
>>>>
>>>> Changs Log:
>>>> V2:
>>>> 1. Redesign the secondary qemu(use image-fleecing)
>>>> 2. Use Error objects to return error message
>>>> 3. Address the comments from Max Reitz and Eric Blake
>>>>
>>>> Wen Congyang (13):
>>>>     docs: block replication's description
>>>>     quorum: allow ignoring child errors
>>>>     NBD client: connect to nbd server later
>>>>     Add new block driver interfaces to control block replication
>>>>     quorum: implement block driver interfaces for block replication
>>>>     NBD client: implement block driver interfaces for block replication
>>>>     allow writing to the backing file
>>>>     Allow creating backup jobs when opening BDS
>>>>     block: Parse "backing_reference" option to reference existing BDS
>>>>     Backup: clear all bitmap when doing block checkpoint
>>>>     qcow2: support colo
>>>>     skip nbd_target when starting block replication
>>>>     Don't allow a disk use backing reference target
>>>>
>>>>    block.c                    | 242 +++++++++++++++++++++++-
>>>>    block/Makefile.objs        |   2 +-
>>>>    block/backup.c             |  12 ++
>>>>    block/nbd.c                | 171 +++++++++++++++--
>>>>    block/qcow2.c              | 447 ++++++++++++++++++++++++++++++++++++++++++++-
>>>>    block/qcow2.h              |   6 +
>>>>    block/quorum.c             | 143 ++++++++++++++-
>>>>    docs/block-replication.txt | 147 +++++++++++++++
>>>>    include/block/block.h      |   5 +
>>>>    include/block/block_int.h  |  13 ++
>>>>    include/qemu/hbitmap.h     |   8 +
>>>>    qapi/block.json            |  16 ++
>>>>    tests/qemu-iotests/051     |  13 ++
>>>>    tests/qemu-iotests/051.out |  13 ++
>>>>    util/hbitmap.c             |  19 ++
>>>>    15 files changed, 1230 insertions(+), 27 deletions(-)
>>>>    create mode 100644 docs/block-replication.txt
>>>>
>>> When I try this patch with the MicroCheckpointing codebase, I'm having trouble starting up the secondary/backup VM
>>> using the branch "block-replication-v7"
>>>
>>> $ qemu-system-x86_64 -drive if=none,driver=raw,file=file.raw,id=nbd_target1 -drive if=virtio,driver=replication,mode=secondary,export=foo,file.file.filename=active_disk.qcow2,file.driver=qcow2,file.backing_reference.drive_id=nbd_target1,file.backing_reference.hidden-disk.file.filename=hidden_disk.qcow2,file.backing_reference.hidden-disk.driver=qcow2,file.backing_reference.hidden-disk.allow-write-backing-file=on
>>>
>>> Block format 'qcow2' used by device '' doesn't support the option 'backing_reference.hidden-disk.allow-write-backing-file'
>>>
>>> What am I doing wrong here?
>> The command is wrong. If you use the branch block-replication-v7, the command line should like:
>> -drive if=none,driver=raw,file=/data/images/kvm/suse/suse11_3.img,id=colo1,cache=none,aio=native -drive if=virtio,driver=replication,mode=secondary,throttling.bps-total-max=70000000,file.file.filename=/mnt/ramfs/active_disk.img,file.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.driver=qcow2,file.backing.allow-write-backing-file=on,file.backing.backing.backing_reference=colo1
>>
>> Thanks
>> Wen Congyang
>>
>>> - Michael
>>>
>>> .
>>>
>>
> 
> Also, this command line you sent doesn't have the export option, but your wiki does. What's the difference?

You can get the codes from here:
https://github.com/coloft/qemu/tree/wency/colo_huawei_v7.2
And I have updated the wiki. Please setup your environment according to the wiki.

Thanks
Wen Congyang

> 
> - Michael
> 
> .
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2015-07-02  1:39 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-25  9:36 [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints Wen Congyang
2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 01/13] docs: block replication's description Wen Congyang
2015-03-25 15:38   ` [Qemu-devel] [Qemu-block] " Eric Blake
2015-03-26  8:58     ` Wen Congyang
2015-03-26 10:28       ` Gonglei
2015-03-26 12:30         ` Eric Blake
2015-03-26 12:46           ` Gonglei
2015-03-26  6:31   ` [Qemu-devel] " Fam Zheng
2015-03-26  7:17     ` Wen Congyang
2015-04-03  2:35     ` Wen Congyang
2015-04-03  5:19       ` Fam Zheng
2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 02/13] quorum: allow ignoring child errors Wen Congyang
2015-03-25 12:45   ` Paolo Bonzini
2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 03/13] NBD client: connect to nbd server later Wen Congyang
2015-03-25 12:46   ` Paolo Bonzini
2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 04/13] Add new block driver interfaces to control block replication Wen Congyang
2015-03-25 12:48   ` Paolo Bonzini
2015-03-25 15:43     ` Eric Blake
2015-03-26  7:12   ` Fam Zheng
2015-03-26  7:22     ` Wen Congyang
2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 05/13] quorum: implement block driver interfaces for " Wen Congyang
2015-03-25 12:50   ` Paolo Bonzini
2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 06/13] NBD client: " Wen Congyang
2015-03-25 12:50   ` Paolo Bonzini
2015-03-26  7:21   ` Fam Zheng
2015-03-26  7:32     ` Wen Congyang
2015-03-27  1:06       ` Fam Zheng
2015-03-27  1:16         ` Wen Congyang
2015-03-27  7:34         ` [Qemu-devel] Use of QERR_ macros and error classes (was: [RFC PATCH COLO v2 06/13] NBD client: implement block driver interfaces for block replication) Markus Armbruster
2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 07/13] allow writing to the backing file Wen Congyang
2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 08/13] Allow creating backup jobs when opening BDS Wen Congyang
2015-03-26  7:07   ` Fam Zheng
2015-03-26  7:14     ` Wen Congyang
2015-03-26  7:18       ` Fam Zheng
2015-03-26  7:23         ` Wen Congyang
2015-03-26 13:53           ` Paolo Bonzini
2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 09/13] block: Parse "backing_reference" option to reference existing BDS Wen Congyang
2015-03-26  7:31   ` Fam Zheng
2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 10/13] Backup: clear all bitmap when doing block checkpoint Wen Congyang
2015-03-25 12:55   ` Paolo Bonzini
2015-03-26  0:59     ` Wen Congyang
2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 11/13] qcow2: support colo Wen Congyang
2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 12/13] skip nbd_target when starting block replication Wen Congyang
2015-03-26  7:03   ` Fam Zheng
2015-03-26  7:15     ` Wen Congyang
2015-03-25  9:36 ` [Qemu-devel] [RFC PATCH COLO v2 13/13] Don't allow a disk use backing reference target Wen Congyang
2015-03-25 12:56 ` [Qemu-devel] [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints Paolo Bonzini
2015-03-25 14:24 ` Dr. David Alan Gilbert
2015-03-26  2:34   ` Gonglei
2015-07-01  3:09 ` Michael R. Hines
2015-07-01  4:11   ` Wen Congyang
2015-07-01 19:30     ` Michael R. Hines
2015-07-01 19:37     ` Michael R. Hines
2015-07-02  0:58       ` Wen Congyang
2015-07-02  1:43       ` Wen Congyang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.