All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints
@ 2015-07-07  8:42 Wen Congyang
  2015-07-07  8:42 ` [Qemu-devel] [PATCH COLO-BLOCK v8 01/18] Add new block driver interface to add/delete a BDS's child Wen Congyang
                   ` (18 more replies)
  0 siblings, 19 replies; 24+ messages in thread
From: Wen Congyang @ 2015-07-07  8:42 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Yang Hongyang

Block replication is a very important feature which is used for
continuous checkpoints(for example: COLO).

You can the detailed information about block replication from here:
http://wiki.qemu.org/Features/BlockReplication

Usage:
Please refer to docs/block-replication.txt

You can get the patch here:
https://github.com/coloft/qemu/tree/wency/block-replication-v8

You can get the patch with framework here:
https://github.com/coloft/qemu/tree/wency/colo_framework_v8

TODO:
1. Continuous block replication. It will be started after basic functions
   are accepted.

Wen Congyang (18):
  Add new block driver interface to add/delete a BDS's child
  quorum: implement block driver interfaces add/delete a BDS's child
  hmp: add monitor command to add/remove a child
  introduce a new API qemu_opts_absorb_qdict_by_index()
  quorum: allow ignoring child errors
  introduce a new API to enable/disable attach device model
  introduce a new API to check if blk is attached
  block: make bdrv_put_ref_bh_schedule() as a public API
  Backup: clear all bitmap when doing block checkpoint
  allow writing to the backing file
  Allow creating backup jobs when opening BDS
  block: Allow references for backing files
  docs: block replication's description
  Add new block driver interfaces to control block replication
  skip nbd_target when starting block replication
  quorum: implement block driver interfaces for block replication
  Implement new driver for block replication
  Add a new API to start/stop replication, do checkpoint to all BDSes

 block.c                        | 266 ++++++++++++++++++++++++-
 block/Makefile.objs            |   3 +-
 block/backup.c                 |  13 ++
 block/block-backend.c          |  33 +++
 block/quorum.c                 | 244 ++++++++++++++++++++++-
 block/replication.c            | 443 +++++++++++++++++++++++++++++++++++++++++
 blockdev.c                     |  90 ++++++---
 blockjob.c                     |  10 +
 docs/block-replication.txt     | 182 +++++++++++++++++
 hmp-commands.hx                |  28 +++
 include/block/block.h          |  15 ++
 include/block/block_int.h      |  19 ++
 include/block/blockjob.h       |  12 ++
 include/qemu/option.h          |   2 +
 include/sysemu/block-backend.h |   3 +
 include/sysemu/blockdev.h      |   2 +
 qapi/block.json                |  16 ++
 util/qemu-option.c             |  44 ++++
 18 files changed, 1378 insertions(+), 47 deletions(-)
 create mode 100644 block/replication.c
 create mode 100644 docs/block-replication.txt

-- 
2.4.3

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH COLO-BLOCK v8 01/18] Add new block driver interface to add/delete a BDS's child
  2015-07-07  8:42 [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
@ 2015-07-07  8:42 ` Wen Congyang
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 02/18] quorum: implement block driver interfaces " Wen Congyang
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Wen Congyang @ 2015-07-07  8:42 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang,
	zhanghailiang

In some cases, we want to take a quorum child offline, and take
another child online.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block.c                   | 39 +++++++++++++++++++++++++++++++++++++++
 include/block/block.h     |  4 ++++
 include/block/block_int.h |  5 +++++
 3 files changed, 48 insertions(+)

diff --git a/block.c b/block.c
index 7e130cc..2cbc4f9 100644
--- a/block.c
+++ b/block.c
@@ -4195,3 +4195,42 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
 {
     return &bs->stats;
 }
+
+/*
+ * Hot add/remove a BDS's child. So the user can take a child offline when
+ * it is broken and take a new child online
+ */
+void bdrv_add_child(BlockDriverState *bs, QDict *options, Error **errp)
+{
+
+    if (!bs->drv || !bs->drv->bdrv_add_child) {
+        error_setg(errp, "this feature or command is not currently supported");
+        return;
+    }
+
+    bs->drv->bdrv_add_child(bs, options, errp);
+}
+
+void bdrv_del_child(BlockDriverState *bs, BlockDriverState *child_bs,
+                    Error **errp)
+{
+    BdrvChild *child;
+
+    if (!bs->drv || !bs->drv->bdrv_del_child) {
+        error_setg(errp, "this feature or command is not currently supported");
+        return;
+    }
+
+    QLIST_FOREACH(child, &bs->children, next) {
+        if (child->bs == child_bs) {
+            break;
+        }
+    }
+
+    if (!child) {
+        error_setg(errp, "Invalid child");
+        return;
+    }
+
+    bs->drv->bdrv_del_child(bs, child_bs, errp);
+}
diff --git a/include/block/block.h b/include/block/block.h
index 06e4137..29d3363 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -609,4 +609,8 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
 
 BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
 
+void bdrv_add_child(BlockDriverState *bs, QDict *options, Error **errp);
+void bdrv_del_child(BlockDriverState *bs, BlockDriverState *child,
+                    Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 8996baf..6fddea4 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -288,6 +288,11 @@ struct BlockDriver {
      */
     int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
 
+    void (*bdrv_add_child)(BlockDriverState *bs, QDict *options,
+                           Error **errp);
+    void (*bdrv_del_child)(BlockDriverState *bs, BlockDriverState *child,
+                           Error **errp);
+
     QLIST_ENTRY(BlockDriver) list;
 };
 
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH COLO-BLOCK v8 02/18] quorum: implement block driver interfaces add/delete a BDS's child
  2015-07-07  8:42 [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
  2015-07-07  8:42 ` [Qemu-devel] [PATCH COLO-BLOCK v8 01/18] Add new block driver interface to add/delete a BDS's child Wen Congyang
@ 2015-07-07  8:43 ` Wen Congyang
  2015-08-05 12:19   ` Alberto Garcia
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 03/18] hmp: add monitor command to add/remove a child Wen Congyang
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 24+ messages in thread
From: Wen Congyang @ 2015-07-07  8:43 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Alberto Garcia, qemu block, Jiang Yunhong,
	Dong Eddie, Dr. David Alan Gilbert, Michael R. Hines, Gonglei,
	Yang Hongyang, zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Cc: Alberto Garcia <berto@igalia.com>
---
 block/quorum.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 71 insertions(+), 2 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index b0eead0..76b29b1 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -67,6 +67,9 @@ typedef struct QuorumVotes {
 typedef struct BDRVQuorumState {
     BlockDriverState **bs; /* children BlockDriverStates */
     int num_children;      /* children count */
+    int max_children;      /* The maximum children count, we need to reallocate
+                            * bs if num_children will larger than maximum.
+                            */
     int threshold;         /* if less than threshold children reads gave the
                             * same result a quorum error occurs.
                             */
@@ -879,9 +882,9 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
         ret = -EINVAL;
         goto exit;
     }
-    if (s->num_children < 2) {
+    if (s->num_children < 1) {
         error_setg(&local_err,
-                   "Number of provided children must be greater than 1");
+                   "Number of provided children must be 1 or more");
         ret = -EINVAL;
         goto exit;
     }
@@ -930,6 +933,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
     /* allocate the children BlockDriverState array */
     s->bs = g_new0(BlockDriverState *, s->num_children);
     opened = g_new0(bool, s->num_children);
+    s->max_children = s->num_children;
 
     for (i = 0; i < s->num_children; i++) {
         char indexstr[32];
@@ -1000,6 +1004,68 @@ static void quorum_attach_aio_context(BlockDriverState *bs,
     }
 }
 
+static void quorum_add_child(BlockDriverState *bs, QDict *options, Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int ret;
+    Error *local_err = NULL;
+
+    bdrv_drain(bs);
+
+    if (s->num_children == s->max_children) {
+        if (s->max_children >= INT_MAX) {
+            error_setg(errp, "Too many children");
+            return;
+        }
+
+        s->bs = g_renew(BlockDriverState *, s->bs, s->max_children + 1);
+        s->bs[s->num_children] = NULL;
+        s->max_children += 1;
+    }
+
+    ret = bdrv_open_image(&s->bs[s->num_children], NULL, options, "child", bs,
+                          &child_format, false, &local_err);
+    if (ret < 0) {
+        error_propagate(errp, local_err);
+        return;
+    }
+    s->num_children++;
+}
+
+static void quorum_del_child(BlockDriverState *bs, BlockDriverState *child_bs,
+                             Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->num_children; i++) {
+        if (s->bs[i] == child_bs) {
+            break;
+        }
+    }
+
+    if (i == s->num_children) {
+        error_setg(errp, "Invalid child");
+        return;
+    }
+
+    if (s->num_children <= s->threshold) {
+        error_setg(errp, "Cannot remove any more child");
+        return;
+    }
+
+    if (s->num_children == 1) {
+        error_setg(errp, "Cannot remove the last child");
+        return;
+    }
+
+    bdrv_drain(bs);
+    /* We can safe remove this child now */
+    memmove(&s->bs[i], &s->bs[i+1], (s->num_children - i - 1) * sizeof(void *));
+    s->num_children--;
+    s->bs[s->num_children] = NULL;
+}
+
 static void quorum_refresh_filename(BlockDriverState *bs)
 {
     BDRVQuorumState *s = bs->opaque;
@@ -1054,6 +1120,9 @@ static BlockDriver bdrv_quorum = {
     .bdrv_detach_aio_context            = quorum_detach_aio_context,
     .bdrv_attach_aio_context            = quorum_attach_aio_context,
 
+    .bdrv_add_child                     = quorum_add_child,
+    .bdrv_del_child                     = quorum_del_child,
+
     .is_filter                          = true,
     .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
 };
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH COLO-BLOCK v8 03/18] hmp: add monitor command to add/remove a child
  2015-07-07  8:42 [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
  2015-07-07  8:42 ` [Qemu-devel] [PATCH COLO-BLOCK v8 01/18] Add new block driver interface to add/delete a BDS's child Wen Congyang
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 02/18] quorum: implement block driver interfaces " Wen Congyang
@ 2015-07-07  8:43 ` Wen Congyang
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 04/18] introduce a new API qemu_opts_absorb_qdict_by_index() Wen Congyang
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Wen Congyang @ 2015-07-07  8:43 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 blockdev.c                | 53 +++++++++++++++++++++++++++++++++++++++++++++++
 hmp-commands.hx           | 28 +++++++++++++++++++++++++
 include/sysemu/blockdev.h |  2 ++
 3 files changed, 83 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index c11611d..e5f7779 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2186,6 +2186,59 @@ void hmp_drive_del(Monitor *mon, const QDict *qdict)
     aio_context_release(aio_context);
 }
 
+void hmp_child_add(Monitor *mon, const QDict *qdict)
+{
+    const char *id = qdict_get_str(qdict, "id");
+    const char *optstr = qdict_get_str(qdict, "opts");
+    QemuOpts *opts;
+    QDict *bs_opts = qdict_new();
+    BlockDriverState *bs;
+    Error *local_err = NULL;
+
+    opts = drive_def(optstr);
+    if (!opts) {
+        /* We have reported error in drive_def */
+        return;
+    }
+    bs_opts = qemu_opts_to_qdict(opts, bs_opts);
+
+    bs = bdrv_lookup_bs(id, id, &local_err);
+    if (!bs) {
+        error_report_err(local_err);
+        return;
+    }
+
+    bdrv_add_child(bs, bs_opts, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+    }
+}
+
+void hmp_child_del(Monitor *mon, const QDict *qdict)
+{
+    const char *id = qdict_get_str(qdict, "id");
+    const char *child_id = qdict_get_str(qdict, "child");
+    BlockDriverState *bs, *child_bs;
+    Error *local_err = NULL;
+
+    bs = bdrv_lookup_bs(id, id, &local_err);
+    if (!bs) {
+        error_report_err(local_err);
+        return;
+    }
+
+    child_bs = bdrv_lookup_bs(child_id, child_id, &local_err);
+    if (!child_bs) {
+        error_report_err(local_err);
+        return;
+    }
+
+    bdrv_del_child(bs, child_bs, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+    }
+}
+
 void qmp_block_resize(bool has_device, const char *device,
                       bool has_node_name, const char *node_name,
                       int64_t size, Error **errp)
diff --git a/hmp-commands.hx b/hmp-commands.hx
index d3b7932..1d5b392 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -193,6 +193,34 @@ actions (drive options rerror, werror).
 ETEXI
 
     {
+        .name       = "child_add",
+        .args_type  = "id:B,opts:s",
+        .params     = "device child.file=file",
+        .help       = "add a child to a BDS",
+        .mhandler.cmd = hmp_child_add,
+    },
+
+STEXI
+@item child_add @var{device} @var{options}
+@findex child_add
+Add a child to the block device.
+ETEXI
+
+    {
+        .name       = "child_del",
+        .args_type  = "id:B,child:B",
+        .params     = "parent child",
+        .help       = "remove a child from a BDS",
+        .mhandler.cmd = hmp_child_del,
+    },
+
+STEXI
+@item child_del @var{parent device} @var{child device}
+@findex child_del
+Remove a child from the parent device.
+ETEXI
+
+    {
         .name       = "change",
         .args_type  = "device:B,target:F,arg:s?",
         .params     = "device filename [format]",
diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
index 3104150..594bfab 100644
--- a/include/sysemu/blockdev.h
+++ b/include/sysemu/blockdev.h
@@ -67,4 +67,6 @@ void qmp_change_blockdev(const char *device, const char *filename,
                          const char *format, Error **errp);
 void hmp_commit(Monitor *mon, const QDict *qdict);
 void hmp_drive_del(Monitor *mon, const QDict *qdict);
+void hmp_child_add(Monitor *mon, const QDict *qdict);
+void hmp_child_del(Monitor *mon, const QDict *qdict);
 #endif
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH COLO-BLOCK v8 04/18] introduce a new API qemu_opts_absorb_qdict_by_index()
  2015-07-07  8:42 [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
                   ` (2 preceding siblings ...)
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 03/18] hmp: add monitor command to add/remove a child Wen Congyang
@ 2015-07-07  8:43 ` Wen Congyang
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 05/18] quorum: allow ignoring child errors Wen Congyang
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Wen Congyang @ 2015-07-07  8:43 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 include/qemu/option.h |  2 ++
 util/qemu-option.c    | 44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/include/qemu/option.h b/include/qemu/option.h
index 57e51c9..725a781 100644
--- a/include/qemu/option.h
+++ b/include/qemu/option.h
@@ -129,6 +129,8 @@ QemuOpts *qemu_opts_from_qdict(QemuOptsList *list, const QDict *qdict,
                                Error **errp);
 QDict *qemu_opts_to_qdict(QemuOpts *opts, QDict *qdict);
 void qemu_opts_absorb_qdict(QemuOpts *opts, QDict *qdict, Error **errp);
+void qemu_opts_absorb_qdict_by_index(QemuOpts *opts, QDict *qdict,
+                                     const char *index, Error **errp);
 
 typedef int (*qemu_opts_loopfunc)(void *opaque, QemuOpts *opts, Error **errp);
 int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func,
diff --git a/util/qemu-option.c b/util/qemu-option.c
index efe9d27..a93a269 100644
--- a/util/qemu-option.c
+++ b/util/qemu-option.c
@@ -1021,6 +1021,50 @@ void qemu_opts_absorb_qdict(QemuOpts *opts, QDict *qdict, Error **errp)
 }
 
 /*
+ * Adds all QDict entries to the QemuOpts that can be added and removes them
+ * from the QDict. The key starts with "%index." in the %qdict. When this
+ * function returns, the QDict contains only those entries that couldn't be
+ * added to the QemuOpts.
+ */
+void qemu_opts_absorb_qdict_by_index(QemuOpts *opts, QDict *qdict,
+                                     const char *index, Error **errp)
+{
+    const QDictEntry *entry, *next;
+    const char *key;
+    int len = strlen(index);
+
+    entry = qdict_first(qdict);
+
+    while (entry != NULL) {
+        Error *local_err = NULL;
+        OptsFromQDictState state = {
+            .errp = &local_err,
+            .opts = opts,
+        };
+
+        next = qdict_next(qdict, entry);
+        if (strncmp(entry->key, index, len) || *(entry->key + len) != '.') {
+            entry = next;
+            continue;
+        }
+
+        key = entry->key + len + 1;
+
+        if (find_desc_by_name(opts->list->desc, key)) {
+            qemu_opts_from_qdict_1(key, entry->value, &state);
+            if (local_err) {
+                error_propagate(errp, local_err);
+                return;
+            } else {
+                qdict_del(qdict, entry->key);
+            }
+        }
+
+        entry = next;
+    }
+}
+
+/*
  * Convert from QemuOpts to QDict.
  * The QDict values are of type QString.
  * TODO We'll want to use types appropriate for opt->desc->type, but
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH COLO-BLOCK v8 05/18] quorum: allow ignoring child errors
  2015-07-07  8:42 [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
                   ` (3 preceding siblings ...)
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 04/18] introduce a new API qemu_opts_absorb_qdict_by_index() Wen Congyang
@ 2015-07-07  8:43 ` Wen Congyang
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 06/18] introduce a new API to enable/disable attach device model Wen Congyang
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Wen Congyang @ 2015-07-07  8:43 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Alberto Garcia, qemu block, Jiang Yunhong,
	Dong Eddie, Dr. David Alan Gilbert, Michael R. Hines, Gonglei,
	Yang Hongyang, zhanghailiang

If the child is not ready, read/write/getlength/flush will
return -errno. It is not critical error, and can be ignored:
1. read/write:
   Just not report the error event.
2. getlength:
   just ignore it. If all children's getlength return -errno,
   and be ignored, return -EIO.
3. flush:
   Just ignore it. If all children's getlength return -errno,
   and be ignored, return 0.

Usage: children.x.ignore-errors=true

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Cc: Alberto Garcia <berto@igalia.com>
---
 block/quorum.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 87 insertions(+), 7 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 76b29b1..2a45d0e 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -31,6 +31,7 @@
 #define QUORUM_OPT_BLKVERIFY      "blkverify"
 #define QUORUM_OPT_REWRITE        "rewrite-corrupted"
 #define QUORUM_OPT_READ_PATTERN   "read-pattern"
+#define QUORUM_CHILDREN_OPT_IGNORE_ERRORS   "ignore-errors"
 
 /* This union holds a vote hash value */
 typedef union QuorumVoteValue {
@@ -66,6 +67,7 @@ typedef struct QuorumVotes {
 /* the following structure holds the state of one quorum instance */
 typedef struct BDRVQuorumState {
     BlockDriverState **bs; /* children BlockDriverStates */
+    bool *ignore_errors;   /* ignore children's error? */
     int num_children;      /* children count */
     int max_children;      /* The maximum children count, we need to reallocate
                             * bs if num_children will larger than maximum.
@@ -101,6 +103,7 @@ typedef struct QuorumChildRequest {
     uint8_t *buf;
     int ret;
     QuorumAIOCB *parent;
+    int index;
 } QuorumChildRequest;
 
 /* Quorum will use the following structure to track progress of each read/write
@@ -213,6 +216,7 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
         acb->qcrs[i].buf = NULL;
         acb->qcrs[i].ret = 0;
         acb->qcrs[i].parent = acb;
+        acb->qcrs[i].index = i;
     }
 
     return acb;
@@ -306,7 +310,7 @@ static void quorum_aio_cb(void *opaque, int ret)
     acb->count++;
     if (ret == 0) {
         acb->success_count++;
-    } else {
+    } else if (!s->ignore_errors[sacb->index]) {
         quorum_report_bad(acb, sacb->aiocb->bs->node_name, ret);
     }
     assert(acb->count <= s->num_children);
@@ -721,19 +725,31 @@ static BlockAIOCB *quorum_aio_writev(BlockDriverState *bs,
 static int64_t quorum_getlength(BlockDriverState *bs)
 {
     BDRVQuorumState *s = bs->opaque;
-    int64_t result;
+    int64_t result = -EIO;
     int i;
 
     /* check that all file have the same length */
-    result = bdrv_getlength(s->bs[0]);
-    if (result < 0) {
-        return result;
-    }
-    for (i = 1; i < s->num_children; i++) {
+    for (i = 0; i < s->num_children; i++) {
         int64_t value = bdrv_getlength(s->bs[i]);
+
         if (value < 0) {
             return value;
         }
+
+        if (value == 0 && s->ignore_errors[i]) {
+            /*
+             * If the child is not ready, it cannot return -errno,
+             * otherwise refresh_total_sectors() will fail when
+             * we open the child.
+             */
+            continue;
+        }
+
+        if (result == -EIO) {
+            result = value;
+            continue;
+        }
+
         if (value != result) {
             return -EIO;
         }
@@ -771,6 +787,9 @@ static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
 
     for (i = 0; i < s->num_children; i++) {
         result = bdrv_co_flush(s->bs[i]);
+        if (result < 0 && s->ignore_errors[i]) {
+            result = 0;
+        }
         result_value.l = result;
         quorum_count_vote(&error_votes, &result_value, i);
     }
@@ -845,6 +864,19 @@ static QemuOptsList quorum_runtime_opts = {
     },
 };
 
+static QemuOptsList quorum_children_common_opts = {
+    .name = "quorum children",
+    .head = QTAILQ_HEAD_INITIALIZER(quorum_children_common_opts.head),
+    .desc = {
+        {
+            .name = QUORUM_CHILDREN_OPT_IGNORE_ERRORS,
+            .type = QEMU_OPT_BOOL,
+            .help = "ignore child I/O error",
+        },
+        { /* end of list */ }
+    },
+};
+
 static int parse_read_pattern(const char *opt)
 {
     int i;
@@ -863,6 +895,37 @@ static int parse_read_pattern(const char *opt)
     return -EINVAL;
 }
 
+static int parse_children_options(BDRVQuorumState *s, QDict *options,
+                                  const char *indexstr, int index,
+                                  Error **errp)
+{
+    QemuOpts *children_opts = NULL;
+    Error *local_err = NULL;
+    int ret = 0;
+    bool value;
+
+    children_opts = qemu_opts_create(&quorum_children_common_opts, NULL, 0,
+                                     &error_abort);
+    qemu_opts_absorb_qdict_by_index(children_opts, options, indexstr,
+                                    &local_err);
+    if (local_err) {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    value = qemu_opt_get_bool(children_opts, QUORUM_CHILDREN_OPT_IGNORE_ERRORS,
+                              false);
+    s->ignore_errors[index] = value;
+
+out:
+    qemu_opts_del(children_opts);
+    /* propagate error */
+    if (local_err) {
+        error_propagate(errp, local_err);
+    }
+    return ret;
+}
+
 static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
                        Error **errp)
 {
@@ -934,12 +997,18 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
     s->bs = g_new0(BlockDriverState *, s->num_children);
     opened = g_new0(bool, s->num_children);
     s->max_children = s->num_children;
+    s->ignore_errors = g_new0(bool, s->num_children);
 
     for (i = 0; i < s->num_children; i++) {
         char indexstr[32];
         ret = snprintf(indexstr, 32, "children.%d", i);
         assert(ret < 32);
 
+        ret = parse_children_options(s, options, indexstr, i, &local_err);
+        if (ret < 0) {
+            goto close_exit;
+        }
+
         ret = bdrv_open_image(&s->bs[i], NULL, options, indexstr, bs,
                               &child_format, false, &local_err);
         if (ret < 0) {
@@ -981,6 +1050,7 @@ static void quorum_close(BlockDriverState *bs)
     }
 
     g_free(s->bs);
+    g_free(s->ignore_errors);
 }
 
 static void quorum_detach_aio_context(BlockDriverState *bs)
@@ -1019,10 +1089,18 @@ static void quorum_add_child(BlockDriverState *bs, QDict *options, Error **errp)
         }
 
         s->bs = g_renew(BlockDriverState *, s->bs, s->max_children + 1);
+        s->ignore_errors = g_renew(bool, s->ignore_errors, s->max_children + 1);
         s->bs[s->num_children] = NULL;
         s->max_children += 1;
     }
 
+    ret = parse_children_options(s, options, "child", s->num_children,
+                                 &local_err);
+    if (ret < 0) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
     ret = bdrv_open_image(&s->bs[s->num_children], NULL, options, "child", bs,
                           &child_format, false, &local_err);
     if (ret < 0) {
@@ -1062,6 +1140,8 @@ static void quorum_del_child(BlockDriverState *bs, BlockDriverState *child_bs,
     bdrv_drain(bs);
     /* We can safe remove this child now */
     memmove(&s->bs[i], &s->bs[i+1], (s->num_children - i - 1) * sizeof(void *));
+    memmove(&s->ignore_errors[i], &s->ignore_errors[i+1],
+            (s->num_children - i - 1) * sizeof(bool));
     s->num_children--;
     s->bs[s->num_children] = NULL;
 }
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH COLO-BLOCK v8 06/18] introduce a new API to enable/disable attach device model
  2015-07-07  8:42 [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
                   ` (4 preceding siblings ...)
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 05/18] quorum: allow ignoring child errors Wen Congyang
@ 2015-07-07  8:43 ` Wen Congyang
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 07/18] introduce a new API to check if blk is attached Wen Congyang
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Wen Congyang @ 2015-07-07  8:43 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Yang Hongyang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 block/block-backend.c          | 24 ++++++++++++++++++++++++
 include/sysemu/block-backend.h |  2 ++
 2 files changed, 26 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index aee8a12..72d8b2c 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -344,6 +344,30 @@ void *blk_get_attached_dev(BlockBackend *blk)
 }
 
 /*
+ * Disable to attach a device mode to @blk.
+ * Return 0 on success, -EBUSY when a device model is attached already.
+ */
+int blk_disable_attach_dev(BlockBackend *blk)
+{
+    if (blk->dev) {
+        return blk->dev == (void *)-1 ? 0 : -EBUSY;
+    }
+
+    blk->dev = (void *)-1;
+    return 0;
+}
+
+/*
+ * Enable to attach a device mode to @blk.
+ */
+void blk_enable_attach_dev(BlockBackend *blk)
+{
+    if (blk->dev == (void *)-1) {
+        blk->dev = NULL;
+    }
+}
+
+/*
  * Set @blk's device model callbacks to @ops.
  * @opaque is the opaque argument to pass to the callbacks.
  * This is for use by device models.
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 8fc960f..7619a9f 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -80,6 +80,8 @@ int blk_attach_dev(BlockBackend *blk, void *dev);
 void blk_attach_dev_nofail(BlockBackend *blk, void *dev);
 void blk_detach_dev(BlockBackend *blk, void *dev);
 void *blk_get_attached_dev(BlockBackend *blk);
+int blk_disable_attach_dev(BlockBackend *blk);
+void blk_enable_attach_dev(BlockBackend *blk);
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque);
 int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
              int nb_sectors);
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH COLO-BLOCK v8 07/18] introduce a new API to check if blk is attached
  2015-07-07  8:42 [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
                   ` (5 preceding siblings ...)
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 06/18] introduce a new API to enable/disable attach device model Wen Congyang
@ 2015-07-07  8:43 ` Wen Congyang
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 08/18] block: make bdrv_put_ref_bh_schedule() as a public API Wen Congyang
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Wen Congyang @ 2015-07-07  8:43 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Yang Hongyang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 block.c                        | 4 ++--
 block/block-backend.c          | 9 +++++++++
 include/sysemu/block-backend.h | 1 +
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 2cbc4f9..d560248 100644
--- a/block.c
+++ b/block.c
@@ -2032,7 +2032,7 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old)
     }
 
     /* bs_new must be unattached and shouldn't have anything fancy enabled */
-    assert(!bs_new->blk);
+    assert(!blk_is_attached(bs_new->blk));
     assert(QLIST_EMPTY(&bs_new->dirty_bitmaps));
     assert(bs_new->job == NULL);
     assert(bs_new->io_limits_enabled == false);
@@ -2049,7 +2049,7 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old)
     bdrv_move_feature_fields(bs_new, &tmp);
 
     /* bs_new must remain unattached */
-    assert(!bs_new->blk);
+    assert(!blk_is_attached(bs_new->blk));
 
     /* Check a few fields that should remain attached to the device */
     assert(bs_new->job == NULL);
diff --git a/block/block-backend.c b/block/block-backend.c
index 72d8b2c..1463c37 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -368,6 +368,15 @@ void blk_enable_attach_dev(BlockBackend *blk)
 }
 
 /*
+ * Return true if a device model is attached to @blk already,
+ * otherwise, return false.
+ */
+bool blk_is_attached(BlockBackend *blk)
+{
+    return blk != NULL && blk->dev != NULL && blk->dev != (void *)-1;
+}
+
+/*
  * Set @blk's device model callbacks to @ops.
  * @opaque is the opaque argument to pass to the callbacks.
  * This is for use by device models.
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 7619a9f..a8c6fd2 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -82,6 +82,7 @@ void blk_detach_dev(BlockBackend *blk, void *dev);
 void *blk_get_attached_dev(BlockBackend *blk);
 int blk_disable_attach_dev(BlockBackend *blk);
 void blk_enable_attach_dev(BlockBackend *blk);
+bool blk_is_attached(BlockBackend *blk);
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque);
 int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
              int nb_sectors);
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH COLO-BLOCK v8 08/18] block: make bdrv_put_ref_bh_schedule() as a public API
  2015-07-07  8:42 [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
                   ` (6 preceding siblings ...)
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 07/18] introduce a new API to check if blk is attached Wen Congyang
@ 2015-07-07  8:43 ` Wen Congyang
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 09/18] Backup: clear all bitmap when doing block checkpoint Wen Congyang
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Wen Congyang @ 2015-07-07  8:43 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Yang Hongyang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 block.c               | 25 +++++++++++++++++++++++++
 blockdev.c            | 37 ++++++-------------------------------
 include/block/block.h |  1 +
 3 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/block.c b/block.c
index d560248..f319921 100644
--- a/block.c
+++ b/block.c
@@ -3562,6 +3562,31 @@ void bdrv_unref(BlockDriverState *bs)
     }
 }
 
+typedef struct {
+    QEMUBH *bh;
+    BlockDriverState *bs;
+} BDRVPutRefBH;
+
+static void bdrv_put_ref_bh(void *opaque)
+{
+    BDRVPutRefBH *s = opaque;
+
+    bdrv_unref(s->bs);
+    qemu_bh_delete(s->bh);
+    g_free(s);
+}
+
+/* Release a BDS reference in a BH */
+void bdrv_put_ref_bh_schedule(BlockDriverState *bs)
+{
+    BDRVPutRefBH *s;
+
+    s = g_new(BDRVPutRefBH, 1);
+    s->bh = qemu_bh_new(bdrv_put_ref_bh, s);
+    s->bs = bs;
+    qemu_bh_schedule(s->bh);
+}
+
 struct BdrvOpBlocker {
     Error *reason;
     QLIST_ENTRY(BdrvOpBlocker) list;
diff --git a/blockdev.c b/blockdev.c
index e5f7779..7ad8401 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -276,37 +276,6 @@ static void bdrv_format_print(void *opaque, const char *name)
     error_printf(" %s", name);
 }
 
-typedef struct {
-    QEMUBH *bh;
-    BlockDriverState *bs;
-} BDRVPutRefBH;
-
-static void bdrv_put_ref_bh(void *opaque)
-{
-    BDRVPutRefBH *s = opaque;
-
-    bdrv_unref(s->bs);
-    qemu_bh_delete(s->bh);
-    g_free(s);
-}
-
-/*
- * Release a BDS reference in a BH
- *
- * It is not safe to use bdrv_unref() from a callback function when the callers
- * still need the BlockDriverState.  In such cases we schedule a BH to release
- * the reference.
- */
-static void bdrv_put_ref_bh_schedule(BlockDriverState *bs)
-{
-    BDRVPutRefBH *s;
-
-    s = g_new(BDRVPutRefBH, 1);
-    s->bh = qemu_bh_new(bdrv_put_ref_bh, s);
-    s->bs = bs;
-    qemu_bh_schedule(s->bh);
-}
-
 static int parse_block_error_action(const char *buf, bool is_read, Error **errp)
 {
     if (!strcmp(buf, "ignore")) {
@@ -2326,6 +2295,12 @@ static void block_job_cb(void *opaque, int ret)
         block_job_event_completed(bs->job, msg);
     }
 
+
+    /*
+     * It is not safe to use bdrv_unref() from a callback function when the
+     * callers still need the BlockDriverState. In such cases we schedule
+     * a BH to release the reference.
+     */
     bdrv_put_ref_bh_schedule(bs);
 }
 
diff --git a/include/block/block.h b/include/block/block.h
index 29d3363..cbe79bc 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -507,6 +507,7 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs);
 
 void bdrv_ref(BlockDriverState *bs);
 void bdrv_unref(BlockDriverState *bs);
+void bdrv_put_ref_bh_schedule(BlockDriverState *bs);
 
 bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp);
 void bdrv_op_block(BlockDriverState *bs, BlockOpType op, Error *reason);
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH COLO-BLOCK v8 09/18] Backup: clear all bitmap when doing block checkpoint
  2015-07-07  8:42 [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
                   ` (7 preceding siblings ...)
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 08/18] block: make bdrv_put_ref_bh_schedule() as a public API Wen Congyang
@ 2015-07-07  8:43 ` Wen Congyang
  2015-07-09  1:28   ` Dr. David Alan Gilbert
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 10/18] allow writing to the backing file Wen Congyang
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 24+ messages in thread
From: Wen Congyang @ 2015-07-07  8:43 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jeff Cody, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Cc: Jeff Cody <jcody@redhat.com>
---
 block/backup.c           | 13 +++++++++++++
 blockjob.c               | 10 ++++++++++
 include/block/blockjob.h | 12 ++++++++++++
 3 files changed, 35 insertions(+)

diff --git a/block/backup.c b/block/backup.c
index d3c7d9f..ebb8a88 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -211,11 +211,24 @@ static void backup_iostatus_reset(BlockJob *job)
     bdrv_iostatus_reset(s->target);
 }
 
+static void backup_do_checkpoint(BlockJob *job, Error **errp)
+{
+    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+
+    if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
+        error_setg(errp, "this feature or command is not currently supported");
+        return;
+    }
+
+    hbitmap_reset_all(backup_job->bitmap);
+}
+
 static const BlockJobDriver backup_job_driver = {
     .instance_size  = sizeof(BackupBlockJob),
     .job_type       = BLOCK_JOB_TYPE_BACKUP,
     .set_speed      = backup_set_speed,
     .iostatus_reset = backup_iostatus_reset,
+    .do_checkpoint  = backup_do_checkpoint,
 };
 
 static BlockErrorAction backup_error_action(BackupBlockJob *job,
diff --git a/blockjob.c b/blockjob.c
index ec46fad..cb412d1 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -400,3 +400,13 @@ void block_job_defer_to_main_loop(BlockJob *job,
 
     qemu_bh_schedule(data->bh);
 }
+
+void block_job_do_checkpoint(BlockJob *job, Error **errp)
+{
+    if (!job->driver->do_checkpoint) {
+        error_setg(errp, "this feature or command is not currently supported");
+        return;
+    }
+
+    job->driver->do_checkpoint(job, errp);
+}
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 57d8ef1..b832dc3 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -50,6 +50,9 @@ typedef struct BlockJobDriver {
      * manually.
      */
     void (*complete)(BlockJob *job, Error **errp);
+
+    /** Optional callback for job types that support checkpoint. */
+    void (*do_checkpoint)(BlockJob *job, Error **errp);
 } BlockJobDriver;
 
 /**
@@ -348,4 +351,13 @@ void block_job_defer_to_main_loop(BlockJob *job,
                                   BlockJobDeferToMainLoopFn *fn,
                                   void *opaque);
 
+/**
+ * block_job_do_checkpoint:
+ * @job: The job.
+ * @errp: Error object.
+ *
+ * Do block checkpoint on the specified job.
+ */
+void block_job_do_checkpoint(BlockJob *job, Error **errp);
+
 #endif
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH COLO-BLOCK v8 10/18] allow writing to the backing file
  2015-07-07  8:42 [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
                   ` (8 preceding siblings ...)
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 09/18] Backup: clear all bitmap when doing block checkpoint Wen Congyang
@ 2015-07-07  8:43 ` Wen Congyang
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 11/18] Allow creating backup jobs when opening BDS Wen Congyang
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Wen Congyang @ 2015-07-07  8:43 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index f319921..879ca75 100644
--- a/block.c
+++ b/block.c
@@ -747,6 +747,15 @@ static const BdrvChildRole child_backing = {
     .inherit_flags = bdrv_backing_flags,
 };
 
+static int bdrv_backing_rw_flags(int flags)
+{
+    return bdrv_backing_flags(flags) | BDRV_O_RDWR;
+}
+
+static const BdrvChildRole child_backing_rw = {
+    .inherit_flags = bdrv_backing_rw_flags,
+};
+
 static int bdrv_open_flags(BlockDriverState *bs, int flags)
 {
     int open_flags = flags | BDRV_O_CACHE_WB;
@@ -1133,6 +1142,20 @@ out:
     bdrv_refresh_limits(bs, NULL);
 }
 
+#define ALLOW_WRITE_BACKING_FILE    "allow-write-backing-file"
+static QemuOptsList backing_file_opts = {
+    .name = "backing_file",
+    .head = QTAILQ_HEAD_INITIALIZER(backing_file_opts.head),
+    .desc = {
+        {
+            .name = ALLOW_WRITE_BACKING_FILE,
+            .type = QEMU_OPT_BOOL,
+            .help = "allow write to backing file",
+        },
+        { /* end of list */ }
+    },
+};
+
 /*
  * Opens the backing file for a BlockDriverState if not yet open
  *
@@ -1147,6 +1170,9 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
     int ret = 0;
     BlockDriverState *backing_hd;
     Error *local_err = NULL;
+    QemuOpts *opts = NULL;
+    bool child_rw = false;
+    const BdrvChildRole *child_role = NULL;
 
     if (bs->backing_hd != NULL) {
         QDECREF(options);
@@ -1159,6 +1185,18 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
     }
 
     bs->open_flags &= ~BDRV_O_NO_BACKING;
+
+    opts = qemu_opts_create(&backing_file_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (local_err) {
+        ret = -EINVAL;
+        error_propagate(errp, local_err);
+        QDECREF(options);
+        goto free_exit;
+    }
+    child_rw = qemu_opt_get_bool(opts, ALLOW_WRITE_BACKING_FILE, false);
+    child_role = child_rw ? &child_backing_rw : &child_backing;
+
     if (qdict_haskey(options, "file.filename")) {
         backing_filename[0] = '\0';
     } else if (bs->backing_file[0] == '\0' && qdict_size(options) == 0) {
@@ -1191,7 +1229,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
     assert(bs->backing_hd == NULL);
     ret = bdrv_open_inherit(&backing_hd,
                             *backing_filename ? backing_filename : NULL,
-                            NULL, options, 0, bs, &child_backing,
+                            NULL, options, 0, bs, child_role,
                             NULL, &local_err);
     if (ret < 0) {
         bdrv_unref(backing_hd);
@@ -1205,6 +1243,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
     bdrv_set_backing_hd(bs, backing_hd);
 
 free_exit:
+    qemu_opts_del(opts);
     g_free(backing_filename);
     return ret;
 }
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH COLO-BLOCK v8 11/18] Allow creating backup jobs when opening BDS
  2015-07-07  8:42 [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
                   ` (9 preceding siblings ...)
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 10/18] allow writing to the backing file Wen Congyang
@ 2015-07-07  8:43 ` Wen Congyang
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 12/18] block: Allow references for backing files Wen Congyang
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Wen Congyang @ 2015-07-07  8:43 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jeff Cody, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang,
	zhanghailiang

When opening BDS, we need to create backup jobs for
image-fleecing.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Cc: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/Makefile.objs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index c34fd7c..f068666 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -22,10 +22,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
+block-obj-y += backup.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
-common-obj-y += backup.o
 
 iscsi.o-cflags     := $(LIBISCSI_CFLAGS)
 iscsi.o-libs       := $(LIBISCSI_LIBS)
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH COLO-BLOCK v8 12/18] block: Allow references for backing files
  2015-07-07  8:42 [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
                   ` (10 preceding siblings ...)
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 11/18] Allow creating backup jobs when opening BDS Wen Congyang
@ 2015-07-07  8:43 ` Wen Congyang
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 13/18] docs: block replication's description Wen Congyang
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Wen Congyang @ 2015-07-07  8:43 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang,
	zhanghailiang

Usage:
-drive file=xxx,id=Y, \
-drive file=xxxx,id=X,backing.backing_reference=Y

It will create such backing chain:
               {virtio-blk dev 'Y'}      {virtio-blk dev 'X'}
                         |                         |
                         |                         |
                         v                         v

    [base] <- [mid] <- ( Y )  <----------------- ( X )

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block.c               | 39 +++++++++++++++++++++++++++++++++++----
 include/block/block.h |  1 +
 2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 879ca75..f7192a3 100644
--- a/block.c
+++ b/block.c
@@ -1143,6 +1143,7 @@ out:
 }
 
 #define ALLOW_WRITE_BACKING_FILE    "allow-write-backing-file"
+#define BACKING_REFERENCE           "backing_reference"
 static QemuOptsList backing_file_opts = {
     .name = "backing_file",
     .head = QTAILQ_HEAD_INITIALIZER(backing_file_opts.head),
@@ -1152,6 +1153,11 @@ static QemuOptsList backing_file_opts = {
             .type = QEMU_OPT_BOOL,
             .help = "allow write to backing file",
         },
+        {
+            .name = BACKING_REFERENCE,
+            .type = QEMU_OPT_STRING,
+            .help = "reference to the exsiting BDS",
+        },
         { /* end of list */ }
     },
 };
@@ -1168,11 +1174,12 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
 {
     char *backing_filename = g_malloc0(PATH_MAX);
     int ret = 0;
-    BlockDriverState *backing_hd;
+    BlockDriverState *backing_hd = NULL;
     Error *local_err = NULL;
     QemuOpts *opts = NULL;
     bool child_rw = false;
     const BdrvChildRole *child_role = NULL;
+    const char *reference = NULL;
 
     if (bs->backing_hd != NULL) {
         QDECREF(options);
@@ -1195,9 +1202,10 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
         goto free_exit;
     }
     child_rw = qemu_opt_get_bool(opts, ALLOW_WRITE_BACKING_FILE, false);
+    reference = qemu_opt_get(opts, BACKING_REFERENCE);
     child_role = child_rw ? &child_backing_rw : &child_backing;
 
-    if (qdict_haskey(options, "file.filename")) {
+    if (qdict_haskey(options, "file.filename") || reference) {
         backing_filename[0] = '\0';
     } else if (bs->backing_file[0] == '\0' && qdict_size(options) == 0) {
         QDECREF(options);
@@ -1220,7 +1228,9 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
         goto free_exit;
     }
 
-    backing_hd = bdrv_new();
+    if (!reference) {
+        backing_hd = bdrv_new();
+    }
 
     if (bs->backing_format[0] != '\0' && !qdict_haskey(options, "driver")) {
         qdict_put(options, "driver", qstring_from_str(bs->backing_format));
@@ -1229,7 +1239,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
     assert(bs->backing_hd == NULL);
     ret = bdrv_open_inherit(&backing_hd,
                             *backing_filename ? backing_filename : NULL,
-                            NULL, options, 0, bs, child_role,
+                            reference, options, 0, bs, child_role,
                             NULL, &local_err);
     if (ret < 0) {
         bdrv_unref(backing_hd);
@@ -1240,12 +1250,30 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
         error_free(local_err);
         goto free_exit;
     }
+    if (reference) {
+        if (bdrv_op_is_blocked(backing_hd, BLOCK_OP_TYPE_BACKING_REFERENCE,
+                               errp)) {
+            ret = -EBUSY;
+            goto free_reference_exit;
+        }
+        if (backing_hd->blk && blk_disable_attach_dev(backing_hd->blk)) {
+            error_setg(errp, "backing_hd %s is used by the other device model",
+                       reference);
+            ret = -EBUSY;
+            goto free_reference_exit;
+        }
+    }
     bdrv_set_backing_hd(bs, backing_hd);
 
 free_exit:
     qemu_opts_del(opts);
     g_free(backing_filename);
     return ret;
+
+free_reference_exit:
+    bdrv_unref(backing_hd);
+    bs->open_flags |= BDRV_O_NO_BACKING;
+    goto free_exit;
 }
 
 /*
@@ -1899,6 +1927,9 @@ void bdrv_close(BlockDriverState *bs)
         if (bs->backing_hd) {
             BlockDriverState *backing_hd = bs->backing_hd;
             bdrv_set_backing_hd(bs, NULL);
+            if (backing_hd->blk) {
+                blk_enable_attach_dev(backing_hd->blk);
+            }
             bdrv_unref(backing_hd);
         }
         bs->drv->bdrv_close(bs);
diff --git a/include/block/block.h b/include/block/block.h
index cbe79bc..db52306 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -168,6 +168,7 @@ typedef enum BlockOpType {
     BLOCK_OP_TYPE_RESIZE,
     BLOCK_OP_TYPE_STREAM,
     BLOCK_OP_TYPE_REPLACE,
+    BLOCK_OP_TYPE_BACKING_REFERENCE,
     BLOCK_OP_TYPE_MAX,
 } BlockOpType;
 
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH COLO-BLOCK v8 13/18] docs: block replication's description
  2015-07-07  8:42 [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
                   ` (11 preceding siblings ...)
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 12/18] block: Allow references for backing files Wen Congyang
@ 2015-07-07  8:43 ` Wen Congyang
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 14/18] Add new block driver interfaces to control block replication Wen Congyang
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Wen Congyang @ 2015-07-07  8:43 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 docs/block-replication.txt | 182 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 182 insertions(+)
 create mode 100644 docs/block-replication.txt

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
new file mode 100644
index 0000000..13e004e
--- /dev/null
+++ b/docs/block-replication.txt
@@ -0,0 +1,182 @@
+Block replication
+----------------------------------------
+Copyright Fujitsu, Corp. 2015
+Copyright (c) 2015 Intel Corporation
+Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+Block replication is used for continuous checkpoints. It is designed
+for COLO (COurse-grain LOck-stepping) where the Secondary VM is running.
+It can also be applied for FT/HA (Fault-tolerance/High Assurance) scenario,
+where the Secondary VM is not running.
+
+This document gives an overview of block replication's design.
+
+== Background ==
+High availability solutions such as micro checkpoint and COLO will do
+consecutive checkpoints. The VM state of Primary VM and Secondary VM is
+identical right after a VM checkpoint, but becomes different as the VM
+executes till the next checkpoint. To support disk contents checkpoint,
+the modified disk contents in the Secondary VM must be buffered, and are
+only dropped at next checkpoint time. To reduce the network transportation
+effort at the time of checkpoint, the disk modification operations of
+Primary disk are asynchronously forwarded to the Secondary node.
+
+== Workflow ==
+The following is the image of block replication workflow:
+
+        +----------------------+            +------------------------+
+        |Primary Write Requests|            |Secondary Write Requests|
+        +----------------------+            +------------------------+
+                  |                                       |
+                  |                                      (4)
+                  |                                       V
+                  |                              /-------------\
+                  |      Copy and Forward        |             |
+                  |---------(1)----------+       | Disk Buffer |
+                  |                      |       |             |
+                  |                     (3)      \-------------/
+                  |                 speculative      ^
+                  |                write through    (2)
+                  |                      |           |
+                  V                      V           |
+           +--------------+           +----------------+
+           | Primary Disk |           | Secondary Disk |
+           +--------------+           +----------------+
+
+    1) Primary write requests will be copied and forwarded to Secondary
+       QEMU.
+    2) Before Primary write requests are written to Secondary disk, the
+       original sector content will be read from Secondary disk and
+       buffered in the Disk buffer, but it will not overwrite the existing
+       sector content(it could be from either "Secondary Write Requests" or
+       previous COW of "Primary Write Requests") in the Disk buffer.
+    3) Primary write requests will be written to Secondary disk.
+    4) Secondary write requests will be buffered in the Disk buffer and it
+       will overwrite the existing sector content in the buffer.
+
+== Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+
+         virtio-blk       ||
+             ^            ||                            .----------
+             |            ||                            | Secondary
+        1 Quorum          ||                            '----------
+         /      \         ||
+        /        \        ||
+   Primary    2 filter
+     disk         ^                                                             virtio-blk
+                  |                                                                  ^
+                3 NBD  ------->  3 NBD                                               |
+                client    ||     server                                          2 filter
+                          ||        ^                                                ^
+--------.                 ||        |                                                |
+Primary |                 ||  Secondary disk <--------- hidden-disk 5 <--------- active-disk 4
+--------'                 ||        |          backing        ^       backing
+                          ||        |                         |
+                          ||        |                         |
+                          ||        '-------------------------'
+                          ||           drive-backup sync=none
+
+1) The disk on the primary is represented by a block device with two
+children, providing replication between a primary disk and the host that
+runs the secondary VM. The read pattern for quorum can be extended to
+make the primary always read from the local disk instead of going through
+NBD.
+
+2) The new block filter(the name is replication) will control the block
+replication.
+
+3) The secondary disk receives writes from the primary VM through QEMU's
+embedded NBD server (speculative write-through).
+
+4) The disk on the secondary is represented by a custom block device
+(called active-disk). It should be an empty disk, and the format should
+support bdrv_make_empty() and backing file.
+
+5) The hidden-disk is created automatically. It buffers the original content
+that is modified by the primary VM. It should also be an empty disk, and
+the driver supports bdrv_make_empty() and backing file.
+
+== Failure Handling ==
+There are 6 internal errors when block replication is running:
+1. I/O error on primary disk
+2. Forwarding primary write requests failed
+3. Backup failed
+4. I/O error on secondary disk
+5. I/O error on active disk
+6. Making active disk or hidden disk empty failed
+In case 1 and 5, we just report the error to the disk layer. In case 2, 3,
+4 and 6, we just report block replication's error to FT/HA manager(which
+decides when to do a new checkpoint, when to do failover).
+There is one internal error when doing failover:
+1. Commiting the data in active disk/hidden disk to secondary disk failed
+We just to report this error to FT/HA manager.
+
+== New block driver interface ==
+We add three block driver interfaces to control block replication:
+a. bdrv_start_replication()
+   Start block replication, called in migration/checkpoint thread.
+   We must call bdrv_start_replication() in secondary QEMU before
+   calling bdrv_start_replication() in primary QEMU. The caller
+   must hold the I/O mutex lock if it is in migration/checkpoint
+   thread.
+b. bdrv_do_checkpoint()
+   This interface is called after all VM state is transferred to
+   Secondary QEMU. The Disk buffer will be dropped in this interface.
+   The caller must hold the I/O mutex lock if it is in migration/checkpoint
+   thread.
+c. bdrv_stop_replication()
+   It is called on failover. We will flush the Disk buffer into
+   Secondary Disk and stop block replication. The vm should be stopped
+   before calling it. The caller must hold the I/O mutex lock if it is
+   in migration/checkpoint thread.
+
+== Usage ==
+Primary:
+  -drive if=xxx,driver=quorum,read-pattern=fifo,id=disk1\
+         children.0.file.filename=1.raw,\
+         children.0.driver=raw,\
+
+  Run qmp command in primary qemu:
+    child_add disk1 child.driver=replication,child.mode=primary,\
+              child.file.host=xxx,child.file.port=xxx,\
+              child.file.driver=nbd,child.ignore-errors=on
+  Note:
+  1. There should be only one NBD Client for each primary disk.
+  2. host is the secondary physical machine's hostname or IP
+  3. Each disk must have its own export name.
+  4. It is all a single argument to -drive and child_add, and you should
+     ignore the leading whitespace.
+  5. The qmp command line must be run after running qmp command line in
+     secondary qemu.
+
+Secondary:
+  -drive if=none,driver=raw,file=1.raw,id=colo1 \
+  -drive if=xxx,driver=replication,mode=secondary,\
+         file.file.filename=active_disk.qcow2,\
+         file.driver=qcow2,\
+         file.backing.file.filename=hidden_disk.qcow2,\
+         file.backing.driver=qcow2,\
+         file.backing.allow-write-backing-file=on,\
+         file.backing.backing.backing_reference=colo1\
+
+  Then run qmp command in secondary qemu:
+    nbd-server-start host:port
+    nbd-server-add -w colo1
+
+  Note:
+  1. The export name in secondary QEMU command line is the secondary
+     disk's id.
+  2. The export name for the same disk must be the same
+  3. The qmp command nbd-server-start and nbd-server-add must be run
+     before running the qmp command migrate on primary QEMU
+  4. Don't use nbd-server-start's other options
+  5. Active disk, hidden disk and nbd target's length should be the
+     same.
+  6. It is better to put active disk and hidden disk in ramdisk.
+  7. It is all a single argument to -drive, and you should ignore
+     the leading whitespace.
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH COLO-BLOCK v8 14/18] Add new block driver interfaces to control block replication
  2015-07-07  8:42 [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
                   ` (12 preceding siblings ...)
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 13/18] docs: block replication's description Wen Congyang
@ 2015-07-07  8:43 ` Wen Congyang
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 15/18] skip nbd_target when starting " Wen Congyang
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Wen Congyang @ 2015-07-07  8:43 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Luiz Capitulino,
	Gonglei, Yang Hongyang, Michael Roth, zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c                   | 40 ++++++++++++++++++++++++++++++++++++++++
 include/block/block.h     |  5 +++++
 include/block/block_int.h | 14 ++++++++++++++
 qapi/block.json           | 16 ++++++++++++++++
 4 files changed, 75 insertions(+)

diff --git a/block.c b/block.c
index f7192a3..5778064 100644
--- a/block.c
+++ b/block.c
@@ -4329,3 +4329,43 @@ void bdrv_del_child(BlockDriverState *bs, BlockDriverState *child_bs,
 
     bs->drv->bdrv_del_child(bs, child_bs, errp);
 }
+
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+                            Error **errp)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (drv && drv->bdrv_start_replication) {
+        drv->bdrv_start_replication(bs, mode, errp);
+    } else if (bs->file) {
+        bdrv_start_replication(bs->file, mode, errp);
+    } else {
+        error_setg(errp, "this feature or command is not currently supported");
+    }
+}
+
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (drv && drv->bdrv_do_checkpoint) {
+        drv->bdrv_do_checkpoint(bs, errp);
+    } else if (bs->file) {
+        bdrv_do_checkpoint(bs->file, errp);
+    } else {
+        error_setg(errp, "this feature or command is not currently supported");
+    }
+}
+
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (drv && drv->bdrv_stop_replication) {
+        drv->bdrv_stop_replication(bs, failover, errp);
+    } else if (bs->file) {
+        bdrv_stop_replication(bs->file, failover, errp);
+    } else {
+        error_setg(errp, "this feature or command is not currently supported");
+    }
+}
diff --git a/include/block/block.h b/include/block/block.h
index db52306..1518ae8 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -615,4 +615,9 @@ void bdrv_add_child(BlockDriverState *bs, QDict *options, Error **errp);
 void bdrv_del_child(BlockDriverState *bs, BlockDriverState *child,
                     Error **errp);
 
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+                            Error **errp);
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 6fddea4..296cba0 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -293,6 +293,20 @@ struct BlockDriver {
     void (*bdrv_del_child)(BlockDriverState *bs, BlockDriverState *child,
                            Error **errp);
 
+    void (*bdrv_start_replication)(BlockDriverState *bs, ReplicationMode mode,
+                                   Error **errp);
+    /* Drop Disk buffer when doing checkpoint. */
+    void (*bdrv_do_checkpoint)(BlockDriverState *bs, Error **errp);
+    /*
+     * After failover, we should flush Disk buffer into secondary disk
+     * and stop block replication.
+     *
+     * If the guest is shutdown, we should drop Disk buffer and stop
+     * block representation.
+     */
+    void (*bdrv_stop_replication)(BlockDriverState *bs, bool failover,
+                                  Error **errp);
+
     QLIST_ENTRY(BlockDriver) list;
 };
 
diff --git a/qapi/block.json b/qapi/block.json
index aad645c..04dc4c2 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -40,6 +40,22 @@
   'data': ['auto', 'none', 'lba', 'large', 'rechs']}
 
 ##
+# @ReplicationMode
+#
+# An enumeration of replication modes.
+#
+# @unprotected: Replication is not started or after failover.
+#
+# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
+#
+# @secondary: Secondary mode, receive the vm's state from primary QEMU.
+#
+# Since: 2.4
+##
+{ 'enum' : 'ReplicationMode',
+  'data' : ['unprotected', 'primary', 'secondary']}
+
+##
 # @BlockdevSnapshotInternal
 #
 # @device: the name of the device to generate the snapshot from
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH COLO-BLOCK v8 15/18] skip nbd_target when starting block replication
  2015-07-07  8:42 [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
                   ` (13 preceding siblings ...)
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 14/18] Add new block driver interfaces to control block replication Wen Congyang
@ 2015-07-07  8:43 ` Wen Congyang
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 16/18] quorum: implement block driver interfaces for " Wen Congyang
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Wen Congyang @ 2015-07-07  8:43 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/block.c b/block.c
index 5778064..0a6691e 100644
--- a/block.c
+++ b/block.c
@@ -4335,6 +4335,10 @@ void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
 {
     BlockDriver *drv = bs->drv;
 
+    if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, NULL)) {
+        return;
+    }
+
     if (drv && drv->bdrv_start_replication) {
         drv->bdrv_start_replication(bs, mode, errp);
     } else if (bs->file) {
@@ -4348,6 +4352,10 @@ void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
 {
     BlockDriver *drv = bs->drv;
 
+    if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, NULL)) {
+        return;
+    }
+
     if (drv && drv->bdrv_do_checkpoint) {
         drv->bdrv_do_checkpoint(bs, errp);
     } else if (bs->file) {
@@ -4361,6 +4369,10 @@ void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
 {
     BlockDriver *drv = bs->drv;
 
+    if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, NULL)) {
+        return;
+    }
+
     if (drv && drv->bdrv_stop_replication) {
         drv->bdrv_stop_replication(bs, failover, errp);
     } else if (bs->file) {
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH COLO-BLOCK v8 16/18] quorum: implement block driver interfaces for block replication
  2015-07-07  8:42 [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
                   ` (14 preceding siblings ...)
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 15/18] skip nbd_target when starting " Wen Congyang
@ 2015-07-07  8:43 ` Wen Congyang
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 17/18] Implement new driver " Wen Congyang
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Wen Congyang @ 2015-07-07  8:43 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 block/quorum.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 2a45d0e..58238f7 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -88,6 +88,8 @@ typedef struct BDRVQuorumState {
                             */
 
     QuorumReadPattern read_pattern;
+
+    int replication_index; /* store which child supports block replication */
 } BDRVQuorumState;
 
 typedef struct QuorumAIOCB QuorumAIOCB;
@@ -1019,6 +1021,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     g_free(opened);
+    s->replication_index = -1;
     goto exit;
 
 close_exit:
@@ -1179,6 +1182,76 @@ static void quorum_refresh_filename(BlockDriverState *bs)
     bs->full_open_options = opts;
 }
 
+static void quorum_start_replication(BlockDriverState *bs, ReplicationMode mode,
+                                     Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int count = 0, i, index;
+    Error *local_err = NULL;
+
+    /*
+     * TODO: support REPLICATION_MODE_SECONDARY if we allow secondary
+     * QEMU becoming primary QEMU.
+     */
+    if (mode != REPLICATION_MODE_PRIMARY) {
+        error_setg(errp, "The replication mode for quorum should be 'primary'");
+        return;
+    }
+
+    if (s->read_pattern != QUORUM_READ_PATTERN_FIFO) {
+        error_setg(errp, "Block replication needs read pattern 'fifo'");
+        return;
+    }
+
+    for (i = 0; i < s->num_children; i++) {
+        bdrv_start_replication(s->bs[i], mode, &local_err);
+        if (local_err) {
+            error_free(local_err);
+            local_err = NULL;
+        } else {
+            count++;
+            index = i;
+        }
+    }
+
+    if (count == 0) {
+        error_setg(errp, "No child supports block replication");
+    } else if (count > 1) {
+        for (i = 0; i < s->num_children; i++) {
+            bdrv_stop_replication(s->bs[i], false, NULL);
+        }
+        error_setg(errp, "Too many children support block replication");
+    } else {
+        s->replication_index = index;
+    }
+}
+
+static void quorum_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+
+    if (s->replication_index < 0) {
+        error_setg(errp, "Block replication is not running");
+        return;
+    }
+
+    bdrv_do_checkpoint(s->bs[s->replication_index], errp);
+}
+
+static void quorum_stop_replication(BlockDriverState *bs, bool failover,
+                                    Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+
+    if (s->replication_index < 0) {
+        error_setg(errp, "Block replication is not running");
+        return;
+    }
+
+    bdrv_stop_replication(s->bs[s->replication_index], failover, errp);
+    s->replication_index = -1;
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name                        = "quorum",
     .protocol_name                      = "quorum",
@@ -1205,6 +1278,10 @@ static BlockDriver bdrv_quorum = {
 
     .is_filter                          = true,
     .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
+
+    .bdrv_start_replication             = quorum_start_replication,
+    .bdrv_do_checkpoint                 = quorum_do_checkpoint,
+    .bdrv_stop_replication              = quorum_stop_replication,
 };
 
 static void bdrv_quorum_init(void)
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH COLO-BLOCK v8 17/18] Implement new driver for block replication
  2015-07-07  8:42 [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
                   ` (15 preceding siblings ...)
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 16/18] quorum: implement block driver interfaces for " Wen Congyang
@ 2015-07-07  8:43 ` Wen Congyang
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 18/18] Add a new API to start/stop replication, do checkpoint to all BDSes Wen Congyang
  2015-07-28  4:01 ` [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
  18 siblings, 0 replies; 24+ messages in thread
From: Wen Congyang @ 2015-07-07  8:43 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block/Makefile.objs |   1 +
 block/replication.c | 443 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 444 insertions(+)
 create mode 100644 block/replication.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index f068666..84952b1 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -23,6 +23,7 @@ block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
 block-obj-y += backup.o
+block-obj-y += replication.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
diff --git a/block/replication.c b/block/replication.c
new file mode 100644
index 0000000..2124e2d
--- /dev/null
+++ b/block/replication.c
@@ -0,0 +1,443 @@
+#include "qemu-common.h"
+#include "block/block_int.h"
+#include "block/blockjob.h"
+#include "block/nbd.h"
+
+typedef struct BDRVReplicationState {
+    ReplicationMode mode;
+    int replication_state;
+    BlockDriverState *active_disk;
+    BlockDriverState *hidden_disk;
+    BlockDriverState *secondary_disk; /* nbd target */
+    int error;
+} BDRVReplicationState;
+
+enum {
+    BLOCK_REPLICATION_NONE,     /* block replication is not started */
+    BLOCK_REPLICATION_RUNNING,  /* block replication is running */
+    BLOCK_REPLICATION_DONE,     /* block replication is done(failover) */
+};
+
+#define COMMIT_CLUSTER_BITS 16
+#define COMMIT_CLUSTER_SIZE (1 << COMMIT_CLUSTER_BITS)
+#define COMMIT_SECTORS_PER_CLUSTER (COMMIT_CLUSTER_SIZE / BDRV_SECTOR_SIZE)
+
+static void replication_stop(BlockDriverState *bs, bool failover, Error **errp);
+
+#define REPLICATION_MODE        "mode"
+static QemuOptsList replication_runtime_opts = {
+    .name = "replication",
+    .head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
+    .desc = {
+        {
+            .name = REPLICATION_MODE,
+            .type = QEMU_OPT_STRING,
+        },
+        { /* end of list */ }
+    },
+};
+
+static int replication_open(BlockDriverState *bs, QDict *options,
+                            int flags, Error **errp)
+{
+    int ret;
+    BDRVReplicationState *s = bs->opaque;;
+    Error *local_err = NULL;
+    QemuOpts *opts = NULL;
+    const char *mode;
+
+    ret = -EINVAL;
+    opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (local_err) {
+        goto fail;
+    }
+
+    mode = qemu_opt_get(opts, REPLICATION_MODE);
+    if (!mode) {
+        error_setg(&local_err, "Missing the option mode");
+        goto fail;
+    }
+
+    if (!strcmp(mode, "primary")) {
+        s->mode = REPLICATION_MODE_PRIMARY;
+    } else if (!strcmp(mode, "secondary")) {
+        s->mode = REPLICATION_MODE_SECONDARY;
+    } else {
+        error_setg(&local_err,
+                   "The option mode's value should be primary or secondary");
+        goto fail;
+    }
+
+    return 0;
+
+fail:
+    qemu_opts_del(opts);
+    /* propagate error */
+    if (local_err) {
+        error_propagate(errp, local_err);
+    }
+    return ret;
+}
+
+static void replication_close(BlockDriverState *bs)
+{
+    BDRVReplicationState *s = bs->opaque;
+
+    if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
+        replication_stop(bs, false, NULL);
+    }
+}
+
+static int64_t replication_getlength(BlockDriverState *bs)
+{
+    return bdrv_getlength(bs->file);
+}
+
+static int replication_get_io_status(BDRVReplicationState *s)
+{
+    switch (s->replication_state) {
+    case BLOCK_REPLICATION_NONE:
+        return -EIO;
+    case BLOCK_REPLICATION_RUNNING:
+        return 0;
+    case BLOCK_REPLICATION_DONE:
+        return s->mode == REPLICATION_MODE_PRIMARY ? -EIO : 1;
+    default:
+        abort();
+    }
+}
+
+static int replication_return_value(BDRVReplicationState *s, int ret)
+{
+    if (s->mode == REPLICATION_MODE_SECONDARY) {
+        return ret;
+    }
+
+    if (ret < 0) {
+        s->error = ret;
+        ret = 0;
+    }
+
+    return ret;
+}
+
+static coroutine_fn int replication_co_readv(BlockDriverState *bs,
+                                             int64_t sector_num,
+                                             int remaining_sectors,
+                                             QEMUIOVector *qiov)
+{
+    BDRVReplicationState *s = bs->opaque;
+    int ret;
+
+    if (s->mode == REPLICATION_MODE_PRIMARY) {
+        /* We only use it to forward primary write requests */
+        return -EIO;
+    }
+
+    ret = replication_get_io_status(s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    /*
+     * After failover, because we don't commit active disk/hidden disk
+     * to secondary disk(nbd target), so we should read from active disk
+     * directly.
+     */
+    ret = bdrv_co_readv(bs->file, sector_num, remaining_sectors, qiov);
+    return replication_return_value(s, ret);
+}
+
+static coroutine_fn int replication_co_writev(BlockDriverState *bs,
+                                              int64_t sector_num,
+                                              int remaining_sectors,
+                                              QEMUIOVector *qiov)
+{
+    BDRVReplicationState *s = bs->opaque;
+    QEMUIOVector hd_qiov;
+    uint64_t bytes_done = 0;
+    BlockDriverState *top = bs->file;
+    BlockDriverState *base = s->secondary_disk;
+    BlockDriverState *target;
+    int ret, n;
+
+    ret = replication_get_io_status(s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (ret == 0) {
+        ret = bdrv_co_writev(bs->file, sector_num, remaining_sectors, qiov);
+        return replication_return_value(s, ret);
+    }
+
+    /*
+     * Only write to active disk if the sectors have
+     * already been allocated in active disk/hidden disk.
+     */
+    qemu_iovec_init(&hd_qiov, qiov->niov);
+    while (remaining_sectors > 0) {
+        ret = bdrv_is_allocated_above(top, base, sector_num,
+                                      remaining_sectors, &n);
+        if (ret < 0) {
+            return ret;
+        }
+
+        qemu_iovec_reset(&hd_qiov);
+        qemu_iovec_concat(&hd_qiov, qiov, bytes_done, n * 512);
+
+        target = ret ? top: base;
+        ret = bdrv_co_writev(target, sector_num, n, &hd_qiov);
+        if (ret < 0) {
+            return ret;
+        }
+
+        remaining_sectors -= n;
+        sector_num += n;
+        bytes_done += n * BDRV_SECTOR_SIZE;
+    }
+
+    return 0;
+}
+
+static coroutine_fn int replication_co_discard(BlockDriverState *bs,
+                                               int64_t sector_num,
+                                               int nb_sectors)
+{
+    BDRVReplicationState *s = bs->opaque;
+    int ret;
+
+    ret = replication_get_io_status(s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (ret == 1) {
+        /* It is secondary qemu and we are after failover */
+        ret = bdrv_co_discard(s->secondary_disk, sector_num, nb_sectors);
+        if (ret) {
+            return ret;
+        }
+    }
+
+    ret = bdrv_co_discard(bs->file, sector_num, nb_sectors);
+    return replication_return_value(s, ret);
+}
+
+static bool replication_recurse_is_first_non_filter(BlockDriverState *bs,
+                                                    BlockDriverState *candidate)
+{
+    return bdrv_recurse_is_first_non_filter(bs->file, candidate);
+}
+
+static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
+{
+    Error *local_err = NULL;
+    int ret;
+
+    if (!s->secondary_disk->job) {
+        error_setg(errp, "Backup job is cancelled unexpectedly");
+        return;
+    }
+
+    block_job_do_checkpoint(s->secondary_disk->job, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    ret = s->active_disk->drv->bdrv_make_empty(s->active_disk);
+    if (ret < 0) {
+        error_setg(errp, "Cannot make active disk empty");
+        return;
+    }
+
+    ret = s->hidden_disk->drv->bdrv_make_empty(s->hidden_disk);
+    if (ret < 0) {
+        error_setg(errp, "Cannot make hidden disk empty");
+        return;
+    }
+}
+
+static void backup_job_completed(void *opaque, int ret)
+{
+    BDRVReplicationState *s = opaque;
+
+    if (s->replication_state != BLOCK_REPLICATION_DONE) {
+        /* The backup job is cancelled unexpectedly */
+        s->error = -EIO;
+    }
+
+    bdrv_put_ref_bh_schedule(s->secondary_disk);
+}
+
+static void replication_start(BlockDriverState *bs, ReplicationMode mode,
+                              Error **errp)
+{
+    BDRVReplicationState *s = bs->opaque;
+    int64_t active_length, hidden_length, nbd_length;
+    AioContext *aio_context;
+    Error *local_err = NULL;
+
+    if (s->replication_state != BLOCK_REPLICATION_NONE) {
+        error_setg(errp, "Block replication is running or done");
+        return;
+    }
+
+    if (s->mode != mode) {
+        error_setg(errp, "Invalid parameter 'mode'");
+        return;
+    }
+
+    switch (s->mode) {
+    case REPLICATION_MODE_PRIMARY:
+        break;
+    case REPLICATION_MODE_SECONDARY:
+        s->active_disk = bs->file;
+        if (!bs->file->backing_hd) {
+            error_setg(errp, "Active disk doesn't have backing file");
+            return;
+        }
+
+        s->hidden_disk = s->active_disk->backing_hd;
+        if (!s->hidden_disk->backing_hd) {
+            error_setg(errp, "Hidden disk doesn't have backing file");
+            return;
+        }
+
+        s->secondary_disk = s->hidden_disk->backing_hd;
+        if (!s->secondary_disk->blk) {
+            error_setg(errp, "The secondary disk doesn't have block backend");
+            return;
+        }
+
+        /* verify the length */
+        active_length = bdrv_getlength(s->active_disk);
+        hidden_length = bdrv_getlength(s->hidden_disk);
+        nbd_length = bdrv_getlength(s->secondary_disk);
+        if (active_length < 0 || hidden_length < 0 || nbd_length < 0 ||
+            active_length != hidden_length || hidden_length != nbd_length) {
+            error_setg(errp, "active disk, hidden disk, nbd target's length"
+                       " are not the same");
+            return;
+        }
+
+        if (!s->active_disk->drv->bdrv_make_empty ||
+            !s->hidden_disk->drv->bdrv_make_empty) {
+            error_setg(errp,
+                       "active disk or hidden disk doesn't support make_empty");
+            return;
+        }
+
+        /* start backup job now */
+        bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
+                        s->active_disk->backing_blocker);
+        bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
+                        s->hidden_disk->backing_blocker);
+        bdrv_ref(s->hidden_disk);
+
+        aio_context = bdrv_get_aio_context(bs);
+        aio_context_acquire(aio_context);
+        bdrv_set_aio_context(s->secondary_disk, aio_context);
+        backup_start(s->secondary_disk, s->hidden_disk, 0,
+                     MIRROR_SYNC_MODE_NONE, NULL, BLOCKDEV_ON_ERROR_REPORT,
+                     BLOCKDEV_ON_ERROR_REPORT, backup_job_completed,
+                     s, &local_err);
+        aio_context_release(aio_context);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            bdrv_unref(s->hidden_disk);
+            return;
+        }
+        break;
+    default:
+        abort();
+    }
+
+    s->replication_state = BLOCK_REPLICATION_RUNNING;
+
+    if (s->mode == REPLICATION_MODE_SECONDARY) {
+        secondary_do_checkpoint(s, errp);
+    }
+
+    s->error = 0;
+}
+
+static void replication_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+    BDRVReplicationState *s = bs->opaque;
+
+    if (s->replication_state != BLOCK_REPLICATION_RUNNING) {
+        error_setg(errp, "Block replication is not running");
+        return;
+    }
+
+    if (s->error) {
+        error_setg(errp, "I/O error occurs");
+        return;
+    }
+
+    if (s->mode == REPLICATION_MODE_SECONDARY) {
+        secondary_do_checkpoint(s, errp);
+    }
+}
+
+static void replication_stop(BlockDriverState *bs, bool failover, Error **errp)
+{
+    BDRVReplicationState *s = bs->opaque;
+
+    if (s->replication_state != BLOCK_REPLICATION_RUNNING) {
+        error_setg(errp, "Block replication is not running");
+        return;
+    }
+
+    s->replication_state = BLOCK_REPLICATION_DONE;
+
+    switch (s->mode) {
+    case REPLICATION_MODE_PRIMARY:
+        break;
+    case REPLICATION_MODE_SECONDARY:
+        if (!failover) {
+            secondary_do_checkpoint(s, errp);
+            return;
+        }
+
+        if (s->secondary_disk->job) {
+            block_job_cancel(s->secondary_disk->job);
+        }
+        break;
+    default:
+        abort();
+    }
+}
+
+BlockDriver bdrv_replication = {
+    .format_name                = "replication",
+    .protocol_name              = "replication",
+    .instance_size              = sizeof(BDRVReplicationState),
+
+    .bdrv_open                  = replication_open,
+    .bdrv_close                 = replication_close,
+
+    .bdrv_getlength             = replication_getlength,
+    .bdrv_co_readv              = replication_co_readv,
+    .bdrv_co_writev             = replication_co_writev,
+    .bdrv_co_discard            = replication_co_discard,
+
+    .is_filter                  = true,
+    .bdrv_recurse_is_first_non_filter = replication_recurse_is_first_non_filter,
+
+    .bdrv_start_replication     = replication_start,
+    .bdrv_do_checkpoint         = replication_do_checkpoint,
+    .bdrv_stop_replication      = replication_stop,
+
+    .has_variable_length        = true,
+};
+
+static void bdrv_replication_init(void)
+{
+    bdrv_register(&bdrv_replication);
+}
+
+block_init(bdrv_replication_init);
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH COLO-BLOCK v8 18/18] Add a new API to start/stop replication, do checkpoint to all BDSes
  2015-07-07  8:42 [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
                   ` (16 preceding siblings ...)
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 17/18] Implement new driver " Wen Congyang
@ 2015-07-07  8:43 ` Wen Congyang
  2015-07-28  4:01 ` [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
  18 siblings, 0 replies; 24+ messages in thread
From: Wen Congyang @ 2015-07-07  8:43 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block.c               | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++
 include/block/block.h |  4 +++
 2 files changed, 72 insertions(+)

diff --git a/block.c b/block.c
index 0a6691e..43d175b 100644
--- a/block.c
+++ b/block.c
@@ -4381,3 +4381,71 @@ void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
         error_setg(errp, "this feature or command is not currently supported");
     }
 }
+
+void bdrv_start_replication_all(ReplicationMode mode, Error **errp)
+{
+    BlockDriverState *bs = NULL, *temp = NULL;
+    Error *local_err = NULL;
+
+    while ((bs = bdrv_next(bs))) {
+        if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) {
+            continue;
+        }
+
+        bdrv_start_replication(bs, mode, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            goto fail;
+        }
+    }
+
+    return;
+
+fail:
+    while ((temp = bdrv_next(temp)) && bs != temp) {
+        bdrv_stop_replication(temp, false, NULL);
+    }
+}
+
+void bdrv_do_checkpoint_all(Error **errp)
+{
+    BlockDriverState *bs = NULL;
+    Error *local_err = NULL;
+
+    while ((bs = bdrv_next(bs))) {
+        if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) {
+            continue;
+        }
+
+        bdrv_do_checkpoint(bs, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+}
+
+void bdrv_stop_replication_all(bool failover, Error **errp)
+{
+    BlockDriverState *bs = NULL;
+    Error *local_err = NULL;
+
+    while ((bs = bdrv_next(bs))) {
+        if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) {
+            continue;
+        }
+
+        bdrv_stop_replication(bs, failover, &local_err);
+        if (!errp) {
+            /*
+             * The caller doesn't care the result, they just
+             * want to stop all block's replication.
+             */
+            continue;
+        }
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+}
diff --git a/include/block/block.h b/include/block/block.h
index 1518ae8..e1251bd 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -620,4 +620,8 @@ void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
 void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
 void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp);
 
+void bdrv_start_replication_all(ReplicationMode mode, Error **errp);
+void bdrv_do_checkpoint_all(Error **errp);
+void bdrv_stop_replication_all(bool failover, Error **errp);
+
 #endif
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-BLOCK v8 09/18] Backup: clear all bitmap when doing block checkpoint
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 09/18] Backup: clear all bitmap when doing block checkpoint Wen Congyang
@ 2015-07-09  1:28   ` Dr. David Alan Gilbert
  2015-07-09  1:41     ` Wen Congyang
  0 siblings, 1 reply; 24+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-09  1:28 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block, Jeff Cody,
	Jiang Yunhong, Dong Eddie, qemu devel, Michael R. Hines,
	Max Reitz, Gonglei, Stefan Hajnoczi, Paolo Bonzini,
	Yang Hongyang

* Wen Congyang (wency@cn.fujitsu.com) wrote:
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Cc: Jeff Cody <jcody@redhat.com>
> ---
>  block/backup.c           | 13 +++++++++++++
>  blockjob.c               | 10 ++++++++++
>  include/block/blockjob.h | 12 ++++++++++++
>  3 files changed, 35 insertions(+)
> 
> diff --git a/block/backup.c b/block/backup.c
> index d3c7d9f..ebb8a88 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -211,11 +211,24 @@ static void backup_iostatus_reset(BlockJob *job)
>      bdrv_iostatus_reset(s->target);
>  }
>  
> +static void backup_do_checkpoint(BlockJob *job, Error **errp)
> +{
> +    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
> +
> +    if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
> +        error_setg(errp, "this feature or command is not currently supported");

You use this text in a few different errors in the block code; we've currently
got one test machine which is producing it and we haven't yet figured out
which one of the exit paths is doing it.  Please make each one unique and state
an identifier; e.g. "checkpoint on block backup in sync ... for blockjob ..."
(I'm not sure what exaclty makes sense for block jobs - but something like that).

Dave
  
> +        return;
> +    }
> +
> +    hbitmap_reset_all(backup_job->bitmap);
> +}
> +
>  static const BlockJobDriver backup_job_driver = {
>      .instance_size  = sizeof(BackupBlockJob),
>      .job_type       = BLOCK_JOB_TYPE_BACKUP,
>      .set_speed      = backup_set_speed,
>      .iostatus_reset = backup_iostatus_reset,
> +    .do_checkpoint  = backup_do_checkpoint,
>  };
>  
>  static BlockErrorAction backup_error_action(BackupBlockJob *job,
> diff --git a/blockjob.c b/blockjob.c
> index ec46fad..cb412d1 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -400,3 +400,13 @@ void block_job_defer_to_main_loop(BlockJob *job,
>  
>      qemu_bh_schedule(data->bh);
>  }
> +
> +void block_job_do_checkpoint(BlockJob *job, Error **errp)
> +{
> +    if (!job->driver->do_checkpoint) {
> +        error_setg(errp, "this feature or command is not currently supported");
> +        return;
> +    }
> +
> +    job->driver->do_checkpoint(job, errp);
> +}
> diff --git a/include/block/blockjob.h b/include/block/blockjob.h
> index 57d8ef1..b832dc3 100644
> --- a/include/block/blockjob.h
> +++ b/include/block/blockjob.h
> @@ -50,6 +50,9 @@ typedef struct BlockJobDriver {
>       * manually.
>       */
>      void (*complete)(BlockJob *job, Error **errp);
> +
> +    /** Optional callback for job types that support checkpoint. */
> +    void (*do_checkpoint)(BlockJob *job, Error **errp);
>  } BlockJobDriver;
>  
>  /**
> @@ -348,4 +351,13 @@ void block_job_defer_to_main_loop(BlockJob *job,
>                                    BlockJobDeferToMainLoopFn *fn,
>                                    void *opaque);
>  
> +/**
> + * block_job_do_checkpoint:
> + * @job: The job.
> + * @errp: Error object.
> + *
> + * Do block checkpoint on the specified job.
> + */
> +void block_job_do_checkpoint(BlockJob *job, Error **errp);
> +
>  #endif
> -- 
> 2.4.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-BLOCK v8 09/18] Backup: clear all bitmap when doing block checkpoint
  2015-07-09  1:28   ` Dr. David Alan Gilbert
@ 2015-07-09  1:41     ` Wen Congyang
  0 siblings, 0 replies; 24+ messages in thread
From: Wen Congyang @ 2015-07-09  1:41 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block, Jeff Cody,
	Jiang Yunhong, Dong Eddie, qemu devel, Michael R. Hines,
	Max Reitz, Gonglei, Stefan Hajnoczi, Paolo Bonzini,
	Yang Hongyang

On 07/09/2015 09:28 AM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Cc: Jeff Cody <jcody@redhat.com>
>> ---
>>  block/backup.c           | 13 +++++++++++++
>>  blockjob.c               | 10 ++++++++++
>>  include/block/blockjob.h | 12 ++++++++++++
>>  3 files changed, 35 insertions(+)
>>
>> diff --git a/block/backup.c b/block/backup.c
>> index d3c7d9f..ebb8a88 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -211,11 +211,24 @@ static void backup_iostatus_reset(BlockJob *job)
>>      bdrv_iostatus_reset(s->target);
>>  }
>>  
>> +static void backup_do_checkpoint(BlockJob *job, Error **errp)
>> +{
>> +    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
>> +
>> +    if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
>> +        error_setg(errp, "this feature or command is not currently supported");
> 
> You use this text in a few different errors in the block code; we've currently
> got one test machine which is producing it and we haven't yet figured out
> which one of the exit paths is doing it.  Please make each one unique and state
> an identifier; e.g. "checkpoint on block backup in sync ... for blockjob ..."
> (I'm not sure what exaclty makes sense for block jobs - but something like that).

OK, I will check all error messages.

Thanks
Wen Congyang

> 
> Dave
>   
>> +        return;
>> +    }
>> +
>> +    hbitmap_reset_all(backup_job->bitmap);
>> +}
>> +
>>  static const BlockJobDriver backup_job_driver = {
>>      .instance_size  = sizeof(BackupBlockJob),
>>      .job_type       = BLOCK_JOB_TYPE_BACKUP,
>>      .set_speed      = backup_set_speed,
>>      .iostatus_reset = backup_iostatus_reset,
>> +    .do_checkpoint  = backup_do_checkpoint,
>>  };
>>  
>>  static BlockErrorAction backup_error_action(BackupBlockJob *job,
>> diff --git a/blockjob.c b/blockjob.c
>> index ec46fad..cb412d1 100644
>> --- a/blockjob.c
>> +++ b/blockjob.c
>> @@ -400,3 +400,13 @@ void block_job_defer_to_main_loop(BlockJob *job,
>>  
>>      qemu_bh_schedule(data->bh);
>>  }
>> +
>> +void block_job_do_checkpoint(BlockJob *job, Error **errp)
>> +{
>> +    if (!job->driver->do_checkpoint) {
>> +        error_setg(errp, "this feature or command is not currently supported");
>> +        return;
>> +    }
>> +
>> +    job->driver->do_checkpoint(job, errp);
>> +}
>> diff --git a/include/block/blockjob.h b/include/block/blockjob.h
>> index 57d8ef1..b832dc3 100644
>> --- a/include/block/blockjob.h
>> +++ b/include/block/blockjob.h
>> @@ -50,6 +50,9 @@ typedef struct BlockJobDriver {
>>       * manually.
>>       */
>>      void (*complete)(BlockJob *job, Error **errp);
>> +
>> +    /** Optional callback for job types that support checkpoint. */
>> +    void (*do_checkpoint)(BlockJob *job, Error **errp);
>>  } BlockJobDriver;
>>  
>>  /**
>> @@ -348,4 +351,13 @@ void block_job_defer_to_main_loop(BlockJob *job,
>>                                    BlockJobDeferToMainLoopFn *fn,
>>                                    void *opaque);
>>  
>> +/**
>> + * block_job_do_checkpoint:
>> + * @job: The job.
>> + * @errp: Error object.
>> + *
>> + * Do block checkpoint on the specified job.
>> + */
>> +void block_job_do_checkpoint(BlockJob *job, Error **errp);
>> +
>>  #endif
>> -- 
>> 2.4.3
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints
  2015-07-07  8:42 [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
                   ` (17 preceding siblings ...)
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 18/18] Add a new API to start/stop replication, do checkpoint to all BDSes Wen Congyang
@ 2015-07-28  4:01 ` Wen Congyang
  18 siblings, 0 replies; 24+ messages in thread
From: Wen Congyang @ 2015-07-28  4:01 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Yang Hongyang

Ping...

Does anyone have time to review this patchset? The block replication is not only for COLO. All
other HA/FT can reuse it(for example, microcheckpoint).

Thanks
Wen Congyang

On 07/07/2015 04:42 PM, Wen Congyang wrote:
> Block replication is a very important feature which is used for
> continuous checkpoints(for example: COLO).
> 
> You can the detailed information about block replication from here:
> http://wiki.qemu.org/Features/BlockReplication
> 
> Usage:
> Please refer to docs/block-replication.txt
> 
> You can get the patch here:
> https://github.com/coloft/qemu/tree/wency/block-replication-v8
> 
> You can get the patch with framework here:
> https://github.com/coloft/qemu/tree/wency/colo_framework_v8
> 
> TODO:
> 1. Continuous block replication. It will be started after basic functions
>    are accepted.
> 
> Wen Congyang (18):
>   Add new block driver interface to add/delete a BDS's child
>   quorum: implement block driver interfaces add/delete a BDS's child
>   hmp: add monitor command to add/remove a child
>   introduce a new API qemu_opts_absorb_qdict_by_index()
>   quorum: allow ignoring child errors
>   introduce a new API to enable/disable attach device model
>   introduce a new API to check if blk is attached
>   block: make bdrv_put_ref_bh_schedule() as a public API
>   Backup: clear all bitmap when doing block checkpoint
>   allow writing to the backing file
>   Allow creating backup jobs when opening BDS
>   block: Allow references for backing files
>   docs: block replication's description
>   Add new block driver interfaces to control block replication
>   skip nbd_target when starting block replication
>   quorum: implement block driver interfaces for block replication
>   Implement new driver for block replication
>   Add a new API to start/stop replication, do checkpoint to all BDSes
> 
>  block.c                        | 266 ++++++++++++++++++++++++-
>  block/Makefile.objs            |   3 +-
>  block/backup.c                 |  13 ++
>  block/block-backend.c          |  33 +++
>  block/quorum.c                 | 244 ++++++++++++++++++++++-
>  block/replication.c            | 443 +++++++++++++++++++++++++++++++++++++++++
>  blockdev.c                     |  90 ++++++---
>  blockjob.c                     |  10 +
>  docs/block-replication.txt     | 182 +++++++++++++++++
>  hmp-commands.hx                |  28 +++
>  include/block/block.h          |  15 ++
>  include/block/block_int.h      |  19 ++
>  include/block/blockjob.h       |  12 ++
>  include/qemu/option.h          |   2 +
>  include/sysemu/block-backend.h |   3 +
>  include/sysemu/blockdev.h      |   2 +
>  qapi/block.json                |  16 ++
>  util/qemu-option.c             |  44 ++++
>  18 files changed, 1378 insertions(+), 47 deletions(-)
>  create mode 100644 block/replication.c
>  create mode 100644 docs/block-replication.txt
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-BLOCK v8 02/18] quorum: implement block driver interfaces add/delete a BDS's child
  2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 02/18] quorum: implement block driver interfaces " Wen Congyang
@ 2015-08-05 12:19   ` Alberto Garcia
  2015-08-06  3:15     ` Wen Congyang
  0 siblings, 1 reply; 24+ messages in thread
From: Alberto Garcia @ 2015-08-05 12:19 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini,
	Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang,
	zhanghailiang

On Tue 07 Jul 2015 10:43:00 AM CEST, Wen Congyang wrote:
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Cc: Alberto Garcia <berto@igalia.com>

Sorry for being so late with this, I was on holidays during July.

> +static void quorum_del_child(BlockDriverState *bs, BlockDriverState *child_bs,
> +                             Error **errp)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->num_children; i++) {
> +        if (s->bs[i] == child_bs) {
> +            break;
> +        }
> +    }
> +
> +    if (i == s->num_children) {
> +        error_setg(errp, "Invalid child");
> +        return;
> +    }
> +
> +    if (s->num_children <= s->threshold) {
> +        error_setg(errp, "Cannot remove any more child");
> +        return;
> +    }

I think that should be "Cannot remove any more children".

You could also make it a bit more explanatory by saying "The number of
children cannot be lower than the vote threshold" or something like
that.

> +    bdrv_drain(bs);
> +    /* We can safe remove this child now */
> +    memmove(&s->bs[i], &s->bs[i+1], (s->num_children - i - 1) * sizeof(void *));

s/safe/safely/

Apart from that, the code looks good to me, so

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-BLOCK v8 02/18] quorum: implement block driver interfaces add/delete a BDS's child
  2015-08-05 12:19   ` Alberto Garcia
@ 2015-08-06  3:15     ` Wen Congyang
  0 siblings, 0 replies; 24+ messages in thread
From: Wen Congyang @ 2015-08-06  3:15 UTC (permalink / raw)
  To: Alberto Garcia, qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini,
	Stefan Hajnoczi
  Cc: Kevin Wolf, zhanghailiang, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang

On 08/05/2015 08:19 PM, Alberto Garcia wrote:
> On Tue 07 Jul 2015 10:43:00 AM CEST, Wen Congyang wrote:
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Cc: Alberto Garcia <berto@igalia.com>
> 
> Sorry for being so late with this, I was on holidays during July.
> 
>> +static void quorum_del_child(BlockDriverState *bs, BlockDriverState *child_bs,
>> +                             Error **errp)
>> +{
>> +    BDRVQuorumState *s = bs->opaque;
>> +    int i;
>> +
>> +    for (i = 0; i < s->num_children; i++) {
>> +        if (s->bs[i] == child_bs) {
>> +            break;
>> +        }
>> +    }
>> +
>> +    if (i == s->num_children) {
>> +        error_setg(errp, "Invalid child");
>> +        return;
>> +    }
>> +
>> +    if (s->num_children <= s->threshold) {
>> +        error_setg(errp, "Cannot remove any more child");
>> +        return;
>> +    }
> 
> I think that should be "Cannot remove any more children".
> 
> You could also make it a bit more explanatory by saying "The number of
> children cannot be lower than the vote threshold" or something like
> that.
> 
>> +    bdrv_drain(bs);
>> +    /* We can safe remove this child now */
>> +    memmove(&s->bs[i], &s->bs[i+1], (s->num_children - i - 1) * sizeof(void *));
> 
> s/safe/safely/
> 
> Apart from that, the code looks good to me, so
> 
> Reviewed-by: Alberto Garcia <berto@igalia.com>

Thanks, I have send add/delete a quorum child separately. This patch is changed one line:
unref child_bs when it is removed.

I will update that patch and add your reviewed-by in that patchset.

Thanks
Wen Congyang

> 
> Berto
> .
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2015-08-06  3:16 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-07  8:42 [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang
2015-07-07  8:42 ` [Qemu-devel] [PATCH COLO-BLOCK v8 01/18] Add new block driver interface to add/delete a BDS's child Wen Congyang
2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 02/18] quorum: implement block driver interfaces " Wen Congyang
2015-08-05 12:19   ` Alberto Garcia
2015-08-06  3:15     ` Wen Congyang
2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 03/18] hmp: add monitor command to add/remove a child Wen Congyang
2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 04/18] introduce a new API qemu_opts_absorb_qdict_by_index() Wen Congyang
2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 05/18] quorum: allow ignoring child errors Wen Congyang
2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 06/18] introduce a new API to enable/disable attach device model Wen Congyang
2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 07/18] introduce a new API to check if blk is attached Wen Congyang
2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 08/18] block: make bdrv_put_ref_bh_schedule() as a public API Wen Congyang
2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 09/18] Backup: clear all bitmap when doing block checkpoint Wen Congyang
2015-07-09  1:28   ` Dr. David Alan Gilbert
2015-07-09  1:41     ` Wen Congyang
2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 10/18] allow writing to the backing file Wen Congyang
2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 11/18] Allow creating backup jobs when opening BDS Wen Congyang
2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 12/18] block: Allow references for backing files Wen Congyang
2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 13/18] docs: block replication's description Wen Congyang
2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 14/18] Add new block driver interfaces to control block replication Wen Congyang
2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 15/18] skip nbd_target when starting " Wen Congyang
2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 16/18] quorum: implement block driver interfaces for " Wen Congyang
2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 17/18] Implement new driver " Wen Congyang
2015-07-07  8:43 ` [Qemu-devel] [PATCH COLO-BLOCK v8 18/18] Add a new API to start/stop replication, do checkpoint to all BDSes Wen Congyang
2015-07-28  4:01 ` [Qemu-devel] [PATCH COLO-BLOCK v8 00/18] Block replication for continuous checkpoints Wen Congyang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.