[Qemu-devel] [PATCH v10 00/10] Block replication for continuous checkpoints

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH v10 00/10] Block replication for continuous checkpoints
@ 2015-09-25  6:17 Wen Congyang
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 01/10] allow writing to the backing file Wen Congyang
                   ` (10 more replies)
  0 siblings, 11 replies; 38+ messages in thread
From: Wen Congyang @ 2015-09-25  6:17 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Yang Hongyang

Block replication is a very important feature which is used for
continuous checkpoints(for example: COLO).

You can the detailed information about block replication from here:
http://wiki.qemu.org/Features/BlockReplication

Usage:
Please refer to docs/block-replication.txt

This patch series is based on the following patch series:
1. http://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg05514.html
2. http://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg04900.html

You can get the patch here:
https://github.com/coloft/qemu/tree/wency/block-replication-v10

You can get the patch with framework here:
https://github.com/coloft/qemu/tree/wency/colo_framework_v9.5

TODO:
1. Continuous block replication. It will be started after basic functions
   are accepted.

Changs Log:
V10:
1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
   reference.
2. Address the comments from Eric Blake
V9:
1. Update the error messages
2. Rebase to the newest qemu
3. Split child add/delete support. These patches are sent in another patchset.
V8:
1. Address Alberto Garcia's comments
V7:
1. Implement adding/removing quorum child. Remove the option non-connect.
2. Simplify the backing refrence option according to Stefan Hajnoczi's suggestion
V6:
1. Rebase to the newest qemu.
V5:
1. Address the comments from Gong Lei
2. Speed the failover up. The secondary vm can take over very quickly even
   if there are too many I/O requests.
V4:
1. Introduce a new driver replication to avoid touch nbd and qcow2.
V3:
1: use error_setg() instead of error_set()
2. Add a new block job API
3. Active disk, hidden disk and nbd target uses the same AioContext
4. Add a testcase to test new hbitmap API
V2:
1. Redesign the secondary qemu(use image-fleecing)
2. Use Error objects to return error message
3. Address the comments from Max Reitz and Eric Blake

Wen Congyang (10):
  allow writing to the backing file
  Backup: clear all bitmap when doing block checkpoint
  Allow creating backup jobs when opening BDS
  block: make bdrv_put_ref_bh_schedule() as a public API
  docs: block replication's description
  Add new block driver interfaces to control block replication
  quorum: implement block driver interfaces for block replication
  Implement new driver for block replication
  support replication driver in blockdev-add
  Add a new API to start/stop replication, do checkpoint to all BDSes

 block.c                    | 192 +++++++++++++++++-
 block/Makefile.objs        |   3 +-
 block/backup.c             |  14 ++
 block/quorum.c             |  77 ++++++++
 block/replication.c        | 471 +++++++++++++++++++++++++++++++++++++++++++++
 blockdev.c                 |  37 +---
 blockjob.c                 |  11 ++
 docs/block-replication.txt | 259 +++++++++++++++++++++++++
 include/block/block.h      |  10 +
 include/block/block_int.h  |  14 ++
 include/block/blockjob.h   |  12 ++
 qapi/block-core.json       |  34 +++-
 12 files changed, 1098 insertions(+), 36 deletions(-)
 create mode 100644 block/replication.c
 create mode 100644 docs/block-replication.txt

-- 
2.4.3

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v10 01/10] allow writing to the backing file
  2015-09-25  6:17 [Qemu-devel] [PATCH v10 00/10] Block replication for continuous checkpoints Wen Congyang
@ 2015-09-25  6:17 ` Wen Congyang
  2015-10-09 19:07   ` Eric Blake
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 02/10] Backup: clear all bitmap when doing block checkpoint Wen Congyang
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 38+ messages in thread
From: Wen Congyang @ 2015-09-25  6:17 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang,
	zhanghailiang

For block replication, we have such backing chain:
    secondary disk <-- hidden disk <-- active disk
secondary disk is top BDS(use bacing reference), so it can be opened in
read-write mode. But hidden disk is read only, and we need to write to
hidden disk(backup job will write data to it).

TODO: support opening backing file in read-write mode if the BDS is
created by QMP command blockdev-add.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 073d8d6..328c52f 100644
--- a/block.c
+++ b/block.c
@@ -738,6 +738,15 @@ static const BdrvChildRole child_backing = {
     .inherit_flags = bdrv_backing_flags,
 };
 
+static int bdrv_backing_rw_flags(int flags)
+{
+    return bdrv_backing_flags(flags) | BDRV_O_RDWR;
+}
+
+static const BdrvChildRole child_backing_rw = {
+    .inherit_flags = bdrv_backing_rw_flags,
+};
+
 static int bdrv_open_flags(BlockDriverState *bs, int flags)
 {
     int open_flags = flags | BDRV_O_CACHE_WB;
@@ -1150,6 +1159,20 @@ out:
     bdrv_refresh_limits(bs, NULL);
 }
 
+#define ALLOW_WRITE_BACKING_FILE    "allow-write-backing-file"
+static QemuOptsList backing_file_opts = {
+    .name = "backing_file",
+    .head = QTAILQ_HEAD_INITIALIZER(backing_file_opts.head),
+    .desc = {
+        {
+            .name = ALLOW_WRITE_BACKING_FILE,
+            .type = QEMU_OPT_BOOL,
+            .help = "allow writes to backing file",
+        },
+        { /* end of list */ }
+    },
+};
+
 /*
  * Opens the backing file for a BlockDriverState if not yet open
  *
@@ -1164,6 +1187,9 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
     int ret = 0;
     BlockDriverState *backing_hd;
     Error *local_err = NULL;
+    QemuOpts *opts = NULL;
+    bool child_rw = false;
+    const BdrvChildRole *child_role = NULL;
 
     if (bs->backing_hd != NULL) {
         QDECREF(options);
@@ -1176,6 +1202,18 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
     }
 
     bs->open_flags &= ~BDRV_O_NO_BACKING;
+
+    opts = qemu_opts_create(&backing_file_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (local_err) {
+        ret = -EINVAL;
+        error_propagate(errp, local_err);
+        QDECREF(options);
+        goto free_exit;
+    }
+    child_rw = qemu_opt_get_bool(opts, ALLOW_WRITE_BACKING_FILE, false);
+    child_role = child_rw ? &child_backing_rw : &child_backing;
+
     if (qdict_haskey(options, "file.filename")) {
         backing_filename[0] = '\0';
     } else if (bs->backing_file[0] == '\0' && qdict_size(options) == 0) {
@@ -1208,7 +1246,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
     assert(bs->backing_hd == NULL);
     ret = bdrv_open_inherit(&backing_hd,
                             *backing_filename ? backing_filename : NULL,
-                            NULL, options, 0, bs, &child_backing, &local_err);
+                            NULL, options, 0, bs, child_role, &local_err);
     if (ret < 0) {
         bdrv_unref(backing_hd);
         backing_hd = NULL;
@@ -1222,6 +1260,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
     bdrv_set_backing_hd(bs, backing_hd);
 
 free_exit:
+    qemu_opts_del(opts);
     g_free(backing_filename);
     return ret;
 }
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v10 02/10] Backup: clear all bitmap when doing block checkpoint
  2015-09-25  6:17 [Qemu-devel] [PATCH v10 00/10] Block replication for continuous checkpoints Wen Congyang
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 01/10] allow writing to the backing file Wen Congyang
@ 2015-09-25  6:17 ` Wen Congyang
  2015-10-09 19:09   ` Eric Blake
  2015-10-12 13:45   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 03/10] Allow creating backup jobs when opening BDS Wen Congyang
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 38+ messages in thread
From: Wen Congyang @ 2015-09-25  6:17 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
---
 block/backup.c           | 14 ++++++++++++++
 blockjob.c               | 11 +++++++++++
 include/block/blockjob.h | 12 ++++++++++++
 3 files changed, 37 insertions(+)

diff --git a/block/backup.c b/block/backup.c
index c61e4c3..5e5995e 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -214,11 +214,25 @@ static void backup_iostatus_reset(BlockJob *job)
     }
 }
 
+static void backup_do_checkpoint(BlockJob *job, Error **errp)
+{
+    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+
+    if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
+        error_setg(errp, "The backup job only supports block checkpoint in"
+                   " sync=none mode");
+        return;
+    }
+
+    hbitmap_reset_all(backup_job->bitmap);
+}
+
 static const BlockJobDriver backup_job_driver = {
     .instance_size  = sizeof(BackupBlockJob),
     .job_type       = BLOCK_JOB_TYPE_BACKUP,
     .set_speed      = backup_set_speed,
     .iostatus_reset = backup_iostatus_reset,
+    .do_checkpoint  = backup_do_checkpoint,
 };
 
 static BlockErrorAction backup_error_action(BackupBlockJob *job,
diff --git a/blockjob.c b/blockjob.c
index ca4be94..ea4c44a 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -405,3 +405,14 @@ void block_job_defer_to_main_loop(BlockJob *job,
 
     qemu_bh_schedule(data->bh);
 }
+
+void block_job_do_checkpoint(BlockJob *job, Error **errp)
+{
+    if (!job->driver->do_checkpoint) {
+        error_setg(errp, "The job %s doesn't support block checkpoint",
+                   BlockJobType_lookup[job->driver->job_type]);
+        return;
+    }
+
+    job->driver->do_checkpoint(job, errp);
+}
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index dd9d5e6..0b4f386 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -50,6 +50,9 @@ typedef struct BlockJobDriver {
      * manually.
      */
     void (*complete)(BlockJob *job, Error **errp);
+
+    /** Optional callback for job types that support checkpoint. */
+    void (*do_checkpoint)(BlockJob *job, Error **errp);
 } BlockJobDriver;
 
 /**
@@ -356,4 +359,13 @@ void block_job_defer_to_main_loop(BlockJob *job,
                                   BlockJobDeferToMainLoopFn *fn,
                                   void *opaque);
 
+/**
+ * block_job_do_checkpoint:
+ * @job: The job.
+ * @errp: Error object.
+ *
+ * Do block checkpoint on the specified job.
+ */
+void block_job_do_checkpoint(BlockJob *job, Error **errp);
+
 #endif
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v10 03/10] Allow creating backup jobs when opening BDS
  2015-09-25  6:17 [Qemu-devel] [PATCH v10 00/10] Block replication for continuous checkpoints Wen Congyang
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 01/10] allow writing to the backing file Wen Congyang
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 02/10] Backup: clear all bitmap when doing block checkpoint Wen Congyang
@ 2015-09-25  6:17 ` Wen Congyang
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 04/10] block: make bdrv_put_ref_bh_schedule() as a public API Wen Congyang
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 38+ messages in thread
From: Wen Congyang @ 2015-09-25  6:17 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang,
	zhanghailiang

When opening BDS, we need to create backup jobs for
image-fleecing.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
---
 block/Makefile.objs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 58ef2ef..fa05f37 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -22,10 +22,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
+block-obj-y += backup.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
-common-obj-y += backup.o
 
 iscsi.o-cflags     := $(LIBISCSI_CFLAGS)
 iscsi.o-libs       := $(LIBISCSI_LIBS)
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v10 04/10] block: make bdrv_put_ref_bh_schedule() as a public API
  2015-09-25  6:17 [Qemu-devel] [PATCH v10 00/10] Block replication for continuous checkpoints Wen Congyang
                   ` (2 preceding siblings ...)
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 03/10] Allow creating backup jobs when opening BDS Wen Congyang
@ 2015-09-25  6:17 ` Wen Congyang
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 05/10] docs: block replication's description Wen Congyang
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 38+ messages in thread
From: Wen Congyang @ 2015-09-25  6:17 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Yang Hongyang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 block.c               | 25 +++++++++++++++++++++++++
 blockdev.c            | 37 ++++++-------------------------------
 include/block/block.h |  1 +
 3 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/block.c b/block.c
index 328c52f..f9a985c 100644
--- a/block.c
+++ b/block.c
@@ -3597,6 +3597,31 @@ void bdrv_unref(BlockDriverState *bs)
     }
 }
 
+typedef struct {
+    QEMUBH *bh;
+    BlockDriverState *bs;
+} BDRVPutRefBH;
+
+static void bdrv_put_ref_bh(void *opaque)
+{
+    BDRVPutRefBH *s = opaque;
+
+    bdrv_unref(s->bs);
+    qemu_bh_delete(s->bh);
+    g_free(s);
+}
+
+/* Release a BDS reference in a BH */
+void bdrv_put_ref_bh_schedule(BlockDriverState *bs)
+{
+    BDRVPutRefBH *s;
+
+    s = g_new(BDRVPutRefBH, 1);
+    s->bh = qemu_bh_new(bdrv_put_ref_bh, s);
+    s->bs = bs;
+    qemu_bh_schedule(s->bh);
+}
+
 struct BdrvOpBlocker {
     Error *reason;
     QLIST_ENTRY(BdrvOpBlocker) list;
diff --git a/blockdev.c b/blockdev.c
index 3289cc3..11bc992 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -278,37 +278,6 @@ static void bdrv_format_print(void *opaque, const char *name)
     error_printf(" %s", name);
 }
 
-typedef struct {
-    QEMUBH *bh;
-    BlockDriverState *bs;
-} BDRVPutRefBH;
-
-static void bdrv_put_ref_bh(void *opaque)
-{
-    BDRVPutRefBH *s = opaque;
-
-    bdrv_unref(s->bs);
-    qemu_bh_delete(s->bh);
-    g_free(s);
-}
-
-/*
- * Release a BDS reference in a BH
- *
- * It is not safe to use bdrv_unref() from a callback function when the callers
- * still need the BlockDriverState.  In such cases we schedule a BH to release
- * the reference.
- */
-static void bdrv_put_ref_bh_schedule(BlockDriverState *bs)
-{
-    BDRVPutRefBH *s;
-
-    s = g_new(BDRVPutRefBH, 1);
-    s->bh = qemu_bh_new(bdrv_put_ref_bh, s);
-    s->bs = bs;
-    qemu_bh_schedule(s->bh);
-}
-
 static int parse_block_error_action(const char *buf, bool is_read, Error **errp)
 {
     if (!strcmp(buf, "ignore")) {
@@ -2534,6 +2503,12 @@ static void block_job_cb(void *opaque, int ret)
         block_job_event_completed(bs->job, msg);
     }
 
+
+    /*
+     * It is not safe to use bdrv_unref() from a callback function when the
+     * callers still need the BlockDriverState. In such cases we schedule
+     * a BH to release the reference.
+     */
     bdrv_put_ref_bh_schedule(bs);
 }
 
diff --git a/include/block/block.h b/include/block/block.h
index e4be19f..5154388 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -505,6 +505,7 @@ void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
 BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
                              BlockDriverState *child_bs,
                              const BdrvChildRole *child_role);
+void bdrv_put_ref_bh_schedule(BlockDriverState *bs);
 
 bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp);
 void bdrv_op_block(BlockDriverState *bs, BlockOpType op, Error *reason);
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v10 05/10] docs: block replication's description
  2015-09-25  6:17 [Qemu-devel] [PATCH v10 00/10] Block replication for continuous checkpoints Wen Congyang
                   ` (3 preceding siblings ...)
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 04/10] block: make bdrv_put_ref_bh_schedule() as a public API Wen Congyang
@ 2015-09-25  6:17 ` Wen Congyang
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 06/10] Add new block driver interfaces to control block replication Wen Congyang
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 38+ messages in thread
From: Wen Congyang @ 2015-09-25  6:17 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 docs/block-replication.txt | 259 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 259 insertions(+)
 create mode 100644 docs/block-replication.txt

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
new file mode 100644
index 0000000..eab62df
--- /dev/null
+++ b/docs/block-replication.txt
@@ -0,0 +1,259 @@
+Block replication
+----------------------------------------
+Copyright Fujitsu, Corp. 2015
+Copyright (c) 2015 Intel Corporation
+Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+Block replication is used for continuous checkpoints. It is designed
+for COLO (COurse-grain LOck-stepping) where the Secondary VM is running.
+It can also be applied for FT/HA (Fault-tolerance/High Assurance) scenario,
+where the Secondary VM is not running.
+
+This document gives an overview of block replication's design.
+
+== Background ==
+High availability solutions such as micro checkpoint and COLO will do
+consecutive checkpoints. The VM state of Primary VM and Secondary VM is
+identical right after a VM checkpoint, but becomes different as the VM
+executes till the next checkpoint. To support disk contents checkpoint,
+the modified disk contents in the Secondary VM must be buffered, and are
+only dropped at next checkpoint time. To reduce the network transportation
+effort at the time of checkpoint, the disk modification operations of
+Primary disk are asynchronously forwarded to the Secondary node.
+
+== Workflow ==
+The following is the image of block replication workflow:
+
+        +----------------------+            +------------------------+
+        |Primary Write Requests|            |Secondary Write Requests|
+        +----------------------+            +------------------------+
+                  |                                       |
+                  |                                      (4)
+                  |                                       V
+                  |                              /-------------\
+                  |      Copy and Forward        |             |
+                  |---------(1)----------+       | Disk Buffer |
+                  |                      |       |             |
+                  |                     (3)      \-------------/
+                  |                 speculative      ^
+                  |                write through    (2)
+                  |                      |           |
+                  V                      V           |
+           +--------------+           +----------------+
+           | Primary Disk |           | Secondary Disk |
+           +--------------+           +----------------+
+
+    1) Primary write requests will be copied and forwarded to Secondary
+       QEMU.
+    2) Before Primary write requests are written to Secondary disk, the
+       original sector content will be read from Secondary disk and
+       buffered in the Disk buffer, but it will not overwrite the existing
+       sector content (it could be from either "Secondary Write Requests" or
+       previous COW of "Primary Write Requests") in the Disk buffer.
+    3) Primary write requests will be written to Secondary disk.
+    4) Secondary write requests will be buffered in the Disk buffer and it
+       will overwrite the existing sector content in the buffer.
+
+== Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+
+         virtio-blk       ||
+             ^            ||                            .----------
+             |            ||                            | Secondary
+        1 Quorum          ||                            '----------
+         /      \         ||
+        /        \        ||
+   Primary    2 filter
+     disk         ^                                                             virtio-blk
+                  |                                                                  ^
+                3 NBD  ------->  3 NBD                                               |
+                client    ||     server                                          2 filter
+                          ||        ^                                                ^
+--------.                 ||        |                                                |
+Primary |                 ||  Secondary disk <--------- hidden-disk 5 <--------- active-disk 4
+--------'                 ||        |          backing        ^       backing
+                          ||        |                         |
+                          ||        |                         |
+                          ||        '-------------------------'
+                          ||           drive-backup sync=none 6
+
+1) The disk on the primary is represented by a block device with two
+children, providing replication between a primary disk and the host that
+runs the secondary VM. The read pattern for quorum can be extended to
+make the primary always read from the local disk instead of going through
+NBD.
+
+2) The new block filter (the name is replication) will control the block
+replication.
+
+3) The secondary disk receives writes from the primary VM through QEMU's
+embedded NBD server (speculative write-through).
+
+4) The disk on the secondary is represented by a custom block device
+(called active-disk). It should start as an empty disk, and the format
+should support bdrv_make_empty() and backing file.
+
+5) The hidden-disk is created automatically. It buffers the original content
+that is modified by the primary VM. It should also start as an empty disk,
+and the driver supports bdrv_make_empty() and backing file.
+
+6) The drive-backup job(sync=none) is run to allow hidden-disk to buffer
+any state that would otherwise be lost by the speculative write-through
+of the NBD server into the secondary disk. So before block replication,
+the primary disk and secondary disk should contain the same data.
+
+== Failure Handling ==
+There are 6 internal errors when block replication is running:
+1. I/O error on primary disk
+2. Forwarding primary write requests failed
+3. Backup failed
+4. I/O error on secondary disk
+5. I/O error on active disk
+6. Making active disk or hidden disk empty failed
+In case 1 and 5, we just report the error to the disk layer. In case 2, 3,
+4 and 6, we just report block replication's error to FT/HA manager (which
+decides when to do a new checkpoint, when to do failover).
+There is no internal error when doing failover.
+
+== New block driver interface ==
+We add three block driver interfaces to control block replication:
+a. bdrv_start_replication()
+   Start block replication, called in migration/checkpoint thread.
+   We must call bdrv_start_replication() in secondary QEMU before
+   calling bdrv_start_replication() in primary QEMU. The caller
+   must hold the I/O mutex lock if it is in migration/checkpoint
+   thread.
+b. bdrv_do_checkpoint()
+   This interface is called after all VM state is transferred to
+   Secondary QEMU. The Disk buffer will be dropped in this interface.
+   The caller must hold the I/O mutex lock if it is in migration/checkpoint
+   thread.
+c. bdrv_stop_replication()
+   It is called on failover. We will flush the Disk buffer into
+   Secondary Disk and stop block replication. The vm should be stopped
+   before calling it if you use this API to shutdown the guest, or other
+   things except failover. The caller must hold the I/O mutex lock if it is
+   in migration/checkpoint thread.
+
+== Usage ==
+Primary:
+  -drive if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1\
+         children.0.file.filename=1.raw,\
+         children.0.driver=raw,\
+
+  Run qmp command in primary qemu:
+    { 'execute': 'blockdev-add',
+      'arguments': {
+          'options': {
+              'driver': 'replication',
+              'mode': 'primary',
+              'node-name': 'nbd_client1',
+              'file': {
+                  'host': 'xxx',
+                  'port': 'xxx',
+                  'export': 'colo1',
+                  'driver': 'nbd'
+              }
+          }
+      }
+    }
+    { 'execute': 'x-child-add',
+      'arguments': {
+          'parent': 'colo1',
+          'child': 'nbd_client1'
+      }
+    }
+  Note:
+  1. There should be only one NBD Client for each primary disk.
+  2. host is the secondary physical machine's hostname or IP
+  3. Each disk must have its own export name.
+  4. It is all a single argument to -drive and you should ignore the
+     leading whitespace.
+  5. The qmp command line must be run after running qmp command line in
+     secondary qemu.
+
+Secondary:
+  -drive if=none,driver=raw,file=/dev/null,id=colo1 \
+  -drive if=xxx,driver=replication,mode=secondary,\
+         file.file.filename=active_disk.qcow2,\
+         file.driver=qcow2,\
+         file.backing.file.filename=hidden_disk.qcow2,\
+         file.backing.driver=qcow2,\
+         file.backing.allow-write-backing-file=on,\
+         file.backing.backing.file.filename=1.raw,\
+         file.backing.backing.driver=raw,
+         file.backing.backing.allow-write-backing-file=on,\
+         file.backing.backing.node-name=secondary-disk1
+
+  Then run qmp command in secondary qemu:
+    { 'execute': 'blockdev-remove-medium',
+      'arguments': {
+          'device': 'colo1'
+      }
+    }
+    { 'execute': 'blockdev-insert-medium',
+      'arguments': {
+          'device': 'colo1',
+          'node-name': 'secondary-disk1'
+      }
+    }
+    { 'execute': 'nbd-server-start',
+      'arguments': {
+          'addr': {
+              'type': inet',
+              'data': {
+                  'host': 'xxx',
+                  'port': 'xxx'
+              }
+          }
+      }
+    }
+    { 'execute': 'nbd-server-add',
+      'arguments': {
+          'device': 'colo1',
+          'writable': true
+      }
+    }
+
+  Note:
+  1. The export name in secondary QEMU command line is the secondary
+     disk's id.
+  2. The export name for the same disk must be the same
+  3. The qmp command nbd-server-start and nbd-server-add must be run
+     before running the qmp command migrate on primary QEMU
+  4. Active disk, hidden disk and nbd target's length should be the
+     same.
+  5. It is better to put active disk and hidden disk in ramdisk.
+  6. It is all a single argument to -drive, and you should ignore
+     the leading whitespace.
+
+After Failover:
+Primary:
+  The secondary host is down, so we should run the following qmp command
+  to remove the nbd child from the quorum:
+  { 'execute': 'child-del',
+    'arguments': {
+        'parent': 'colo1',
+        'child': 'nbd_client1'
+    }
+  }
+  Note: there is no qmp command to remove the blockdev now
+
+Secondary:
+  The primary host is down, so we should do the following thing:
+  { 'execute': 'nbd-server-stop'
+    'arguments': { }
+  }
+  { 'execute': 'blockdev-remove-medium'
+    'arguments': {
+        'device': 'colo1'
+    }
+  }
+
+TODO:
+1. Continuous block replication
+2. Shared disk
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v10 06/10] Add new block driver interfaces to control block replication
  2015-09-25  6:17 [Qemu-devel] [PATCH v10 00/10] Block replication for continuous checkpoints Wen Congyang
                   ` (4 preceding siblings ...)
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 05/10] docs: block replication's description Wen Congyang
@ 2015-09-25  6:17 ` Wen Congyang
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 07/10] quorum: implement block driver interfaces for " Wen Congyang
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 38+ messages in thread
From: Wen Congyang @ 2015-09-25  6:17 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Luiz Capitulino,
	Gonglei, Yang Hongyang, Michael Roth, zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c                   | 43 +++++++++++++++++++++++++++++++++++++++++++
 include/block/block.h     |  5 +++++
 include/block/block_int.h | 14 ++++++++++++++
 qapi/block-core.json      | 13 +++++++++++++
 4 files changed, 75 insertions(+)

diff --git a/block.c b/block.c
index f9a985c..5cb916b 100644
--- a/block.c
+++ b/block.c
@@ -4253,3 +4253,46 @@ void bdrv_del_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
 
     parent_bs->drv->bdrv_del_child(parent_bs, child_bs, errp);
 }
+
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+                            Error **errp)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (drv && drv->bdrv_start_replication) {
+        drv->bdrv_start_replication(bs, mode, errp);
+    } else if (bs->file) {
+        bdrv_start_replication(bs->file, mode, errp);
+    } else {
+        error_setg(errp, "The BDS %s doesn't support starting block"
+                   " replication", bs->filename);
+    }
+}
+
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (drv && drv->bdrv_do_checkpoint) {
+        drv->bdrv_do_checkpoint(bs, errp);
+    } else if (bs->file) {
+        bdrv_do_checkpoint(bs->file, errp);
+    } else {
+        error_setg(errp, "The BDS %s doesn't support block checkpoint",
+                   bs->filename);
+    }
+}
+
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (drv && drv->bdrv_stop_replication) {
+        drv->bdrv_stop_replication(bs, failover, errp);
+    } else if (bs->file) {
+        bdrv_stop_replication(bs->file, failover, errp);
+    } else {
+        error_setg(errp, "The BDS %s doesn't support stopping block"
+                   " replication", bs->filename);
+    }
+}
diff --git a/include/block/block.h b/include/block/block.h
index 5154388..40ef59f 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -611,4 +611,9 @@ void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
 void bdrv_del_child(BlockDriverState *parent, BlockDriverState *child,
                     Error **errp);
 
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+                            Error **errp);
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 636d0c9..ee4b8fa 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -293,6 +293,20 @@ struct BlockDriver {
     void (*bdrv_del_child)(BlockDriverState *parent, BlockDriverState *child,
                            Error **errp);
 
+    void (*bdrv_start_replication)(BlockDriverState *bs, ReplicationMode mode,
+                                   Error **errp);
+    /* Drop Disk buffer when doing checkpoint. */
+    void (*bdrv_do_checkpoint)(BlockDriverState *bs, Error **errp);
+    /*
+     * After failover, we should flush Disk buffer into secondary disk
+     * and stop block replication.
+     *
+     * If the guest is shutdown, we should drop Disk buffer and stop
+     * block representation.
+     */
+    void (*bdrv_stop_replication)(BlockDriverState *bs, bool failover,
+                                  Error **errp);
+
     QLIST_ENTRY(BlockDriver) list;
 };
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 000ae47..d5a177b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1797,6 +1797,19 @@
             '*read-pattern': 'QuorumReadPattern' } }
 
 ##
+# @ReplicationMode
+#
+# An enumeration of replication modes.
+#
+# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
+#
+# @secondary: Secondary mode, receive the vm's state from primary QEMU.
+#
+# Since: 2.5
+##
+{ 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
+
+##
 # @BlockdevOptions
 #
 # Options for creating a block device.
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v10 07/10] quorum: implement block driver interfaces for block replication
  2015-09-25  6:17 [Qemu-devel] [PATCH v10 00/10] Block replication for continuous checkpoints Wen Congyang
                   ` (5 preceding siblings ...)
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 06/10] Add new block driver interfaces to control block replication Wen Congyang
@ 2015-09-25  6:17 ` Wen Congyang
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 08/10] Implement new driver " Wen Congyang
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 38+ messages in thread
From: Wen Congyang @ 2015-09-25  6:17 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 block/quorum.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 111a57b..d647ab4 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -85,6 +85,8 @@ typedef struct BDRVQuorumState {
                             */
 
     QuorumReadPattern read_pattern;
+
+    int replication_index; /* store which child supports block replication */
 } BDRVQuorumState;
 
 typedef struct QuorumAIOCB QuorumAIOCB;
@@ -945,6 +947,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     g_free(opened);
+    s->replication_index = -1;
     goto exit;
 
 close_exit:
@@ -1093,6 +1096,76 @@ static void quorum_refresh_filename(BlockDriverState *bs)
     bs->full_open_options = opts;
 }
 
+static void quorum_start_replication(BlockDriverState *bs, ReplicationMode mode,
+                                     Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int count = 0, i, index;
+    Error *local_err = NULL;
+
+    /*
+     * TODO: support REPLICATION_MODE_SECONDARY if we allow secondary
+     * QEMU becoming primary QEMU.
+     */
+    if (mode != REPLICATION_MODE_PRIMARY) {
+        error_setg(errp, "The replication mode for quorum should be 'primary'");
+        return;
+    }
+
+    if (s->read_pattern != QUORUM_READ_PATTERN_FIFO) {
+        error_setg(errp, "Block replication needs read pattern 'fifo'");
+        return;
+    }
+
+    for (i = 0; i < s->num_children; i++) {
+        bdrv_start_replication(s->bs[i], mode, &local_err);
+        if (local_err) {
+            error_free(local_err);
+            local_err = NULL;
+        } else {
+            count++;
+            index = i;
+        }
+    }
+
+    if (count == 0) {
+        error_setg(errp, "No child supports block replication");
+    } else if (count > 1) {
+        for (i = 0; i < s->num_children; i++) {
+            bdrv_stop_replication(s->bs[i], false, NULL);
+        }
+        error_setg(errp, "Too many children support block replication");
+    } else {
+        s->replication_index = index;
+    }
+}
+
+static void quorum_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+
+    if (s->replication_index < 0) {
+        error_setg(errp, "Block replication is not running");
+        return;
+    }
+
+    bdrv_do_checkpoint(s->bs[s->replication_index], errp);
+}
+
+static void quorum_stop_replication(BlockDriverState *bs, bool failover,
+                                    Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+
+    if (s->replication_index < 0) {
+        error_setg(errp, "Block replication is not running");
+        return;
+    }
+
+    bdrv_stop_replication(s->bs[s->replication_index], failover, errp);
+    s->replication_index = -1;
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name                        = "quorum",
     .protocol_name                      = "quorum",
@@ -1119,6 +1192,10 @@ static BlockDriver bdrv_quorum = {
 
     .is_filter                          = true,
     .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
+
+    .bdrv_start_replication             = quorum_start_replication,
+    .bdrv_do_checkpoint                 = quorum_do_checkpoint,
+    .bdrv_stop_replication              = quorum_stop_replication,
 };
 
 static void bdrv_quorum_init(void)
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v10 08/10] Implement new driver for block replication
  2015-09-25  6:17 [Qemu-devel] [PATCH v10 00/10] Block replication for continuous checkpoints Wen Congyang
                   ` (6 preceding siblings ...)
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 07/10] quorum: implement block driver interfaces for " Wen Congyang
@ 2015-09-25  6:17 ` Wen Congyang
  2015-10-12 16:25   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
                     ` (2 more replies)
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 09/10] support replication driver in blockdev-add Wen Congyang
                   ` (2 subsequent siblings)
  10 siblings, 3 replies; 38+ messages in thread
From: Wen Congyang @ 2015-09-25  6:17 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block/Makefile.objs |   1 +
 block/replication.c | 471 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 472 insertions(+)
 create mode 100644 block/replication.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index fa05f37..94c1d03 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -23,6 +23,7 @@ block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
 block-obj-y += backup.o
+block-obj-y += replication.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
diff --git a/block/replication.c b/block/replication.c
new file mode 100644
index 0000000..813f610
--- /dev/null
+++ b/block/replication.c
@@ -0,0 +1,471 @@
+/*
+ * Replication Block filter
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 Intel Corporation
+ * Copyright (c) 2015 FUJITSU LIMITED
+ *
+ * Author:
+ *   Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu-common.h"
+#include "block/block_int.h"
+#include "block/blockjob.h"
+#include "block/nbd.h"
+
+typedef struct BDRVReplicationState {
+    ReplicationMode mode;
+    int replication_state;
+    BlockDriverState *active_disk;
+    BlockDriverState *hidden_disk;
+    BlockDriverState *secondary_disk;
+    int error;
+} BDRVReplicationState;
+
+enum {
+    BLOCK_REPLICATION_NONE,     /* block replication is not started */
+    BLOCK_REPLICATION_RUNNING,  /* block replication is running */
+    BLOCK_REPLICATION_DONE,     /* block replication is done(failover) */
+};
+
+#define COMMIT_CLUSTER_BITS 16
+#define COMMIT_CLUSTER_SIZE (1 << COMMIT_CLUSTER_BITS)
+#define COMMIT_SECTORS_PER_CLUSTER (COMMIT_CLUSTER_SIZE / BDRV_SECTOR_SIZE)
+
+static void replication_stop(BlockDriverState *bs, bool failover, Error **errp);
+
+#define REPLICATION_MODE        "mode"
+static QemuOptsList replication_runtime_opts = {
+    .name = "replication",
+    .head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
+    .desc = {
+        {
+            .name = REPLICATION_MODE,
+            .type = QEMU_OPT_STRING,
+        },
+        { /* end of list */ }
+    },
+};
+
+static int replication_open(BlockDriverState *bs, QDict *options,
+                            int flags, Error **errp)
+{
+    int ret;
+    BDRVReplicationState *s = bs->opaque;;
+    Error *local_err = NULL;
+    QemuOpts *opts = NULL;
+    const char *mode;
+
+    ret = -EINVAL;
+    opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (local_err) {
+        goto fail;
+    }
+
+    mode = qemu_opt_get(opts, REPLICATION_MODE);
+    if (!mode) {
+        error_setg(&local_err, "Missing the option mode");
+        goto fail;
+    }
+
+    if (!strcmp(mode, "primary")) {
+        s->mode = REPLICATION_MODE_PRIMARY;
+    } else if (!strcmp(mode, "secondary")) {
+        s->mode = REPLICATION_MODE_SECONDARY;
+    } else {
+        error_setg(&local_err,
+                   "The option mode's value should be primary or secondary");
+        goto fail;
+    }
+
+    ret = 0;
+
+fail:
+    qemu_opts_del(opts);
+    /* propagate error */
+    if (local_err) {
+        error_propagate(errp, local_err);
+    }
+    return ret;
+}
+
+static void replication_close(BlockDriverState *bs)
+{
+    BDRVReplicationState *s = bs->opaque;
+
+    if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
+        replication_stop(bs, false, NULL);
+    }
+}
+
+static int64_t replication_getlength(BlockDriverState *bs)
+{
+    return bdrv_getlength(bs->file);
+}
+
+static int replication_get_io_status(BDRVReplicationState *s)
+{
+    switch (s->replication_state) {
+    case BLOCK_REPLICATION_NONE:
+        return -EIO;
+    case BLOCK_REPLICATION_RUNNING:
+        return 0;
+    case BLOCK_REPLICATION_DONE:
+        return s->mode == REPLICATION_MODE_PRIMARY ? -EIO : 1;
+    default:
+        abort();
+    }
+}
+
+static int replication_return_value(BDRVReplicationState *s, int ret)
+{
+    if (s->mode == REPLICATION_MODE_SECONDARY) {
+        return ret;
+    }
+
+    if (ret < 0) {
+        s->error = ret;
+        ret = 0;
+    }
+
+    return ret;
+}
+
+static coroutine_fn int replication_co_readv(BlockDriverState *bs,
+                                             int64_t sector_num,
+                                             int remaining_sectors,
+                                             QEMUIOVector *qiov)
+{
+    BDRVReplicationState *s = bs->opaque;
+    int ret;
+
+    if (s->mode == REPLICATION_MODE_PRIMARY) {
+        /* We only use it to forward primary write requests */
+        return -EIO;
+    }
+
+    ret = replication_get_io_status(s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    /*
+     * After failover, because we don't commit active disk/hidden disk
+     * to secondary disk, so we should read from active disk directly.
+     */
+    ret = bdrv_co_readv(bs->file, sector_num, remaining_sectors, qiov);
+    return replication_return_value(s, ret);
+}
+
+static coroutine_fn int replication_co_writev(BlockDriverState *bs,
+                                              int64_t sector_num,
+                                              int remaining_sectors,
+                                              QEMUIOVector *qiov)
+{
+    BDRVReplicationState *s = bs->opaque;
+    QEMUIOVector hd_qiov;
+    uint64_t bytes_done = 0;
+    BlockDriverState *top = bs->file;
+    BlockDriverState *base = s->secondary_disk;
+    BlockDriverState *target;
+    int ret, n;
+
+    ret = replication_get_io_status(s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (ret == 0) {
+        ret = bdrv_co_writev(bs->file, sector_num, remaining_sectors, qiov);
+        return replication_return_value(s, ret);
+    }
+
+    /*
+     * Only write to active disk if the sectors have
+     * already been allocated in active disk/hidden disk.
+     */
+    qemu_iovec_init(&hd_qiov, qiov->niov);
+    while (remaining_sectors > 0) {
+        ret = bdrv_is_allocated_above(top, base, sector_num,
+                                      remaining_sectors, &n);
+        if (ret < 0) {
+            return ret;
+        }
+
+        qemu_iovec_reset(&hd_qiov);
+        qemu_iovec_concat(&hd_qiov, qiov, bytes_done, n * 512);
+
+        target = ret ? top: base;
+        ret = bdrv_co_writev(target, sector_num, n, &hd_qiov);
+        if (ret < 0) {
+            return ret;
+        }
+
+        remaining_sectors -= n;
+        sector_num += n;
+        bytes_done += n * BDRV_SECTOR_SIZE;
+    }
+
+    return 0;
+}
+
+static coroutine_fn int replication_co_discard(BlockDriverState *bs,
+                                               int64_t sector_num,
+                                               int nb_sectors)
+{
+    BDRVReplicationState *s = bs->opaque;
+    int ret;
+
+    ret = replication_get_io_status(s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (ret == 1) {
+        /* It is secondary qemu and we are after failover */
+        ret = bdrv_co_discard(s->secondary_disk, sector_num, nb_sectors);
+        if (ret) {
+            return ret;
+        }
+    }
+
+    ret = bdrv_co_discard(bs->file, sector_num, nb_sectors);
+    return replication_return_value(s, ret);
+}
+
+static bool replication_recurse_is_first_non_filter(BlockDriverState *bs,
+                                                    BlockDriverState *candidate)
+{
+    return bdrv_recurse_is_first_non_filter(bs->file, candidate);
+}
+
+static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
+{
+    Error *local_err = NULL;
+    int ret;
+
+    if (!s->secondary_disk->job) {
+        error_setg(errp, "Backup job is cancelled unexpectedly");
+        return;
+    }
+
+    block_job_do_checkpoint(s->secondary_disk->job, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    ret = s->active_disk->drv->bdrv_make_empty(s->active_disk);
+    if (ret < 0) {
+        error_setg(errp, "Cannot make active disk empty");
+        return;
+    }
+
+    ret = s->hidden_disk->drv->bdrv_make_empty(s->hidden_disk);
+    if (ret < 0) {
+        error_setg(errp, "Cannot make hidden disk empty");
+        return;
+    }
+}
+
+static void backup_job_completed(void *opaque, int ret)
+{
+    BDRVReplicationState *s = opaque;
+
+    if (s->replication_state != BLOCK_REPLICATION_DONE) {
+        /* The backup job is cancelled unexpectedly */
+        s->error = -EIO;
+    }
+
+    bdrv_op_block(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
+                  s->active_disk->backing_blocker);
+    bdrv_op_block(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
+                  s->hidden_disk->backing_blocker);
+
+    bdrv_put_ref_bh_schedule(s->secondary_disk);
+}
+
+static void replication_start(BlockDriverState *bs, ReplicationMode mode,
+                              Error **errp)
+{
+    BDRVReplicationState *s = bs->opaque;
+    int64_t active_length, hidden_length, disk_length;
+    AioContext *aio_context;
+    Error *local_err = NULL;
+
+    if (s->replication_state != BLOCK_REPLICATION_NONE) {
+        error_setg(errp, "Block replication is running or done");
+        return;
+    }
+
+    if (s->mode != mode) {
+        error_setg(errp, "The parameter mode's value is invalid, needs %d,"
+                   " but receives %d", s->mode, mode);
+        return;
+    }
+
+    switch (s->mode) {
+    case REPLICATION_MODE_PRIMARY:
+        break;
+    case REPLICATION_MODE_SECONDARY:
+        s->active_disk = bs->file;
+        if (!bs->file->backing_hd) {
+            error_setg(errp, "Active disk doesn't have backing file");
+            return;
+        }
+
+        s->hidden_disk = s->active_disk->backing_hd;
+        if (!s->hidden_disk->backing_hd) {
+            error_setg(errp, "Hidden disk doesn't have backing file");
+            return;
+        }
+
+        s->secondary_disk = s->hidden_disk->backing_hd;
+        if (!s->secondary_disk->blk) {
+            error_setg(errp, "The secondary disk doesn't have block backend");
+            return;
+        }
+
+        /* verify the length */
+        active_length = bdrv_getlength(s->active_disk);
+        hidden_length = bdrv_getlength(s->hidden_disk);
+        disk_length = bdrv_getlength(s->secondary_disk);
+        if (active_length < 0 || hidden_length < 0 || disk_length < 0 ||
+            active_length != hidden_length || hidden_length != disk_length) {
+            error_setg(errp, "active disk, hidden disk, secondary disk's length"
+                       " are not the same");
+            return;
+        }
+
+        if (!s->active_disk->drv->bdrv_make_empty ||
+            !s->hidden_disk->drv->bdrv_make_empty) {
+            error_setg(errp,
+                       "active disk or hidden disk doesn't support make_empty");
+            return;
+        }
+
+        /* start backup job now */
+        bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
+                        s->active_disk->backing_blocker);
+        bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
+                        s->hidden_disk->backing_blocker);
+        bdrv_ref(s->hidden_disk);
+
+        aio_context = bdrv_get_aio_context(bs);
+        aio_context_acquire(aio_context);
+        bdrv_set_aio_context(s->secondary_disk, aio_context);
+        backup_start(s->secondary_disk, s->hidden_disk, 0,
+                     MIRROR_SYNC_MODE_NONE, NULL, BLOCKDEV_ON_ERROR_REPORT,
+                     BLOCKDEV_ON_ERROR_REPORT, backup_job_completed,
+                     s, &local_err);
+        aio_context_release(aio_context);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            bdrv_op_block(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
+                          s->active_disk->backing_blocker);
+            bdrv_op_block(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
+                          s->hidden_disk->backing_blocker);
+            bdrv_unref(s->hidden_disk);
+            return;
+        }
+        break;
+    default:
+        abort();
+    }
+
+    s->replication_state = BLOCK_REPLICATION_RUNNING;
+
+    if (s->mode == REPLICATION_MODE_SECONDARY) {
+        secondary_do_checkpoint(s, errp);
+    }
+
+    s->error = 0;
+}
+
+static void replication_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+    BDRVReplicationState *s = bs->opaque;
+
+    if (s->replication_state != BLOCK_REPLICATION_RUNNING) {
+        error_setg(errp, "Block replication is not running");
+        return;
+    }
+
+    if (s->error) {
+        error_setg(errp, "I/O error occurs");
+        return;
+    }
+
+    if (s->mode == REPLICATION_MODE_SECONDARY) {
+        secondary_do_checkpoint(s, errp);
+    }
+}
+
+static void replication_stop(BlockDriverState *bs, bool failover, Error **errp)
+{
+    BDRVReplicationState *s = bs->opaque;
+
+    if (s->replication_state != BLOCK_REPLICATION_RUNNING) {
+        error_setg(errp, "Block replication is not running");
+        return;
+    }
+
+    s->replication_state = BLOCK_REPLICATION_DONE;
+
+    switch (s->mode) {
+    case REPLICATION_MODE_PRIMARY:
+        break;
+    case REPLICATION_MODE_SECONDARY:
+        if (!failover) {
+            /*
+             * The guest will be shutdown, and secondary disk is the
+             * same as the primary disk. Just make active disk and
+             * hidden disk empty.
+             */
+            secondary_do_checkpoint(s, errp);
+            return;
+        }
+
+        if (s->secondary_disk->job) {
+            block_job_cancel(s->secondary_disk->job);
+        }
+        break;
+    default:
+        abort();
+    }
+}
+
+BlockDriver bdrv_replication = {
+    .format_name                = "replication",
+    .protocol_name              = "replication",
+    .instance_size              = sizeof(BDRVReplicationState),
+
+    .bdrv_open                  = replication_open,
+    .bdrv_close                 = replication_close,
+
+    .bdrv_getlength             = replication_getlength,
+    .bdrv_co_readv              = replication_co_readv,
+    .bdrv_co_writev             = replication_co_writev,
+    .bdrv_co_discard            = replication_co_discard,
+
+    .is_filter                  = true,
+    .bdrv_recurse_is_first_non_filter = replication_recurse_is_first_non_filter,
+
+    .bdrv_start_replication     = replication_start,
+    .bdrv_do_checkpoint         = replication_do_checkpoint,
+    .bdrv_stop_replication      = replication_stop,
+
+    .has_variable_length        = true,
+};
+
+static void bdrv_replication_init(void)
+{
+    bdrv_register(&bdrv_replication);
+}
+
+block_init(bdrv_replication_init);
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v10 09/10] support replication driver in blockdev-add
  2015-09-25  6:17 [Qemu-devel] [PATCH v10 00/10] Block replication for continuous checkpoints Wen Congyang
                   ` (7 preceding siblings ...)
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 08/10] Implement new driver " Wen Congyang
@ 2015-09-25  6:17 ` Wen Congyang
  2015-10-09 14:40   ` Eric Blake
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 10/10] Add a new API to start/stop replication, do checkpoint to all BDSes Wen Congyang
  2015-10-07  6:39 ` [Qemu-devel] [PATCH v10 00/10] Block replication for continuous checkpoints Wen Congyang
  10 siblings, 1 reply; 38+ messages in thread
From: Wen Congyang @ 2015-09-25  6:17 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Yang Hongyang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 qapi/block-core.json | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index d5a177b..0907a72 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -219,7 +219,7 @@
 #       'qcow2', 'raw', 'tftp', 'vdi', 'vmdk', 'vpc', 'vvfat'
 #       2.2: 'archipelago' added, 'cow' dropped
 #       2.3: 'host_floppy' deprecated
-#       2.5: 'host_floppy' dropped
+#       2.5: 'host_floppy' dropped, 'replication' added
 #
 # @backing_file: #optional the name of the backing file (for copy-on-write)
 #
@@ -1375,6 +1375,7 @@
 # Drivers that are supported in block device operations.
 #
 # @host_device, @host_cdrom: Since 2.1
+# @replication: Since 2.5
 #
 # Since: 2.0
 ##
@@ -1382,8 +1383,8 @@
   'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop',
             'dmg', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device',
             'http', 'https', 'null-aio', 'null-co', 'parallels',
-            'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'tftp', 'vdi', 'vhdx',
-            'vmdk', 'vpc', 'vvfat' ] }
+            'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'replication',
+            'tftp', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
 
 ##
 # @BlockdevOptionsBase
@@ -1810,6 +1811,19 @@
 { 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
 
 ##
+# @BlockdevOptionsReplication
+#
+# Driver specific block device options for replication
+#
+# @mode: the replication mode
+#
+# Since: 2.5
+##
+{ 'struct': 'BlockdevOptionsReplication',
+  'base': 'BlockdevOptionsGenericFormat',
+  'data': { 'mode': 'ReplicationMode'  } }
+
+##
 # @BlockdevOptions
 #
 # Options for creating a block device.
@@ -1846,6 +1860,7 @@
       'quorum':     'BlockdevOptionsQuorum',
       'raw':        'BlockdevOptionsGenericFormat',
 # TODO rbd: Wait for structured options
+      'replication':'BlockdevOptionsReplication',
 # TODO sheepdog: Wait for structured options
 # TODO ssh: Should take InetSocketAddress for 'host'?
       'tftp':       'BlockdevOptionsFile',
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v10 10/10] Add a new API to start/stop replication, do checkpoint to all BDSes
  2015-09-25  6:17 [Qemu-devel] [PATCH v10 00/10] Block replication for continuous checkpoints Wen Congyang
                   ` (8 preceding siblings ...)
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 09/10] support replication driver in blockdev-add Wen Congyang
@ 2015-09-25  6:17 ` Wen Congyang
  2015-10-07  6:39 ` [Qemu-devel] [PATCH v10 00/10] Block replication for continuous checkpoints Wen Congyang
  10 siblings, 0 replies; 38+ messages in thread
From: Wen Congyang @ 2015-09-25  6:17 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block.c               | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++
 include/block/block.h |  4 +++
 2 files changed, 87 insertions(+)

diff --git a/block.c b/block.c
index 5cb916b..5891c4d 100644
--- a/block.c
+++ b/block.c
@@ -4296,3 +4296,86 @@ void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
                    " replication", bs->filename);
     }
 }
+
+void bdrv_start_replication_all(ReplicationMode mode, Error **errp)
+{
+    BlockDriverState *bs = NULL, *temp = NULL;
+    Error *local_err = NULL;
+
+    while ((bs = bdrv_next(bs))) {
+        if (!QLIST_EMPTY(&bs->parents)) {
+            /* It is not top BDS */
+            continue;
+        }
+
+        if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) {
+            continue;
+        }
+
+        bdrv_start_replication(bs, mode, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            goto fail;
+        }
+    }
+
+    return;
+
+fail:
+    while ((temp = bdrv_next(temp)) && bs != temp) {
+        bdrv_stop_replication(temp, false, NULL);
+    }
+}
+
+void bdrv_do_checkpoint_all(Error **errp)
+{
+    BlockDriverState *bs = NULL;
+    Error *local_err = NULL;
+
+    while ((bs = bdrv_next(bs))) {
+        if (!QLIST_EMPTY(&bs->parents)) {
+            /* It is not top BDS */
+            continue;
+        }
+
+        if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) {
+            continue;
+        }
+
+        bdrv_do_checkpoint(bs, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+}
+
+void bdrv_stop_replication_all(bool failover, Error **errp)
+{
+    BlockDriverState *bs = NULL;
+    Error *local_err = NULL;
+
+    while ((bs = bdrv_next(bs))) {
+        if (!QLIST_EMPTY(&bs->parents)) {
+            /* It is not top BDS */
+            continue;
+        }
+
+        if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) {
+            continue;
+        }
+
+        bdrv_stop_replication(bs, failover, &local_err);
+        if (!errp) {
+            /*
+             * The caller doesn't care the result, they just
+             * want to stop all block's replication.
+             */
+            continue;
+        }
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+}
diff --git a/include/block/block.h b/include/block/block.h
index 40ef59f..eb6a4a2 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -616,4 +616,8 @@ void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
 void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
 void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp);
 
+void bdrv_start_replication_all(ReplicationMode mode, Error **errp);
+void bdrv_do_checkpoint_all(Error **errp);
+void bdrv_stop_replication_all(bool failover, Error **errp);
+
 #endif
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v10 00/10] Block replication for continuous checkpoints
  2015-09-25  6:17 [Qemu-devel] [PATCH v10 00/10] Block replication for continuous checkpoints Wen Congyang
                   ` (9 preceding siblings ...)
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 10/10] Add a new API to start/stop replication, do checkpoint to all BDSes Wen Congyang
@ 2015-10-07  6:39 ` Wen Congyang
  2015-10-09 12:52   ` Stefan Hajnoczi
  10 siblings, 1 reply; 38+ messages in thread
From: Wen Congyang @ 2015-10-07  6:39 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Yang Hongyang

Ping...

On 09/25/2015 02:17 PM, Wen Congyang wrote:
> Block replication is a very important feature which is used for
> continuous checkpoints(for example: COLO).
> 
> You can the detailed information about block replication from here:
> http://wiki.qemu.org/Features/BlockReplication
> 
> Usage:
> Please refer to docs/block-replication.txt
> 
> This patch series is based on the following patch series:
> 1. http://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg05514.html
> 2. http://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg04900.html
> 
> You can get the patch here:
> https://github.com/coloft/qemu/tree/wency/block-replication-v10
> 
> You can get the patch with framework here:
> https://github.com/coloft/qemu/tree/wency/colo_framework_v9.5
> 
> TODO:
> 1. Continuous block replication. It will be started after basic functions
>    are accepted.
> 
> Changs Log:
> V10:
> 1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
>    reference.
> 2. Address the comments from Eric Blake
> V9:
> 1. Update the error messages
> 2. Rebase to the newest qemu
> 3. Split child add/delete support. These patches are sent in another patchset.
> V8:
> 1. Address Alberto Garcia's comments
> V7:
> 1. Implement adding/removing quorum child. Remove the option non-connect.
> 2. Simplify the backing refrence option according to Stefan Hajnoczi's suggestion
> V6:
> 1. Rebase to the newest qemu.
> V5:
> 1. Address the comments from Gong Lei
> 2. Speed the failover up. The secondary vm can take over very quickly even
>    if there are too many I/O requests.
> V4:
> 1. Introduce a new driver replication to avoid touch nbd and qcow2.
> V3:
> 1: use error_setg() instead of error_set()
> 2. Add a new block job API
> 3. Active disk, hidden disk and nbd target uses the same AioContext
> 4. Add a testcase to test new hbitmap API
> V2:
> 1. Redesign the secondary qemu(use image-fleecing)
> 2. Use Error objects to return error message
> 3. Address the comments from Max Reitz and Eric Blake
> 
> Wen Congyang (10):
>   allow writing to the backing file
>   Backup: clear all bitmap when doing block checkpoint
>   Allow creating backup jobs when opening BDS
>   block: make bdrv_put_ref_bh_schedule() as a public API
>   docs: block replication's description
>   Add new block driver interfaces to control block replication
>   quorum: implement block driver interfaces for block replication
>   Implement new driver for block replication
>   support replication driver in blockdev-add
>   Add a new API to start/stop replication, do checkpoint to all BDSes
> 
>  block.c                    | 192 +++++++++++++++++-
>  block/Makefile.objs        |   3 +-
>  block/backup.c             |  14 ++
>  block/quorum.c             |  77 ++++++++
>  block/replication.c        | 471 +++++++++++++++++++++++++++++++++++++++++++++
>  blockdev.c                 |  37 +---
>  blockjob.c                 |  11 ++
>  docs/block-replication.txt | 259 +++++++++++++++++++++++++
>  include/block/block.h      |  10 +
>  include/block/block_int.h  |  14 ++
>  include/block/blockjob.h   |  12 ++
>  qapi/block-core.json       |  34 +++-
>  12 files changed, 1098 insertions(+), 36 deletions(-)
>  create mode 100644 block/replication.c
>  create mode 100644 docs/block-replication.txt
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v10 00/10] Block replication for continuous checkpoints
  2015-10-07  6:39 ` [Qemu-devel] [PATCH v10 00/10] Block replication for continuous checkpoints Wen Congyang
@ 2015-10-09 12:52   ` Stefan Hajnoczi
  0 siblings, 0 replies; 38+ messages in thread
From: Stefan Hajnoczi @ 2015-10-09 12:52 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Yang Hongyang, Fam Zheng, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Michael R. Hines, Dr. David Alan Gilbert,
	Paolo Bonzini, Max Reitz

On Wed, Oct 07, 2015 at 02:39:51PM +0800, Wen Congyang wrote:
> Ping...

Added to my todo list.  Will review on Monday.

Stefan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v10 09/10] support replication driver in blockdev-add
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 09/10] support replication driver in blockdev-add Wen Congyang
@ 2015-10-09 14:40   ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2015-10-09 14:40 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini,
	Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Yang Hongyang

[-- Attachment #1: Type: text/plain, Size: 478 bytes --]

On 09/25/2015 12:17 AM, Wen Congyang wrote:
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---

Not sure if the commit message body should be empty, but no real
suggestions on what to put there.

>  qapi/block-core.json | 21 ++++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v10 01/10] allow writing to the backing file
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 01/10] allow writing to the backing file Wen Congyang
@ 2015-10-09 19:07   ` Eric Blake
  2015-10-30  5:49     ` Wen Congyang
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2015-10-09 19:07 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini,
	Stefan Hajnoczi
  Cc: Kevin Wolf, zhanghailiang, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang

[-- Attachment #1: Type: text/plain, Size: 1789 bytes --]

On 09/25/2015 12:17 AM, Wen Congyang wrote:
> For block replication, we have such backing chain:
>     secondary disk <-- hidden disk <-- active disk
> secondary disk is top BDS(use bacing reference), so it can be opened in

s/BDS(use bacing/BDS (use backing/

> read-write mode. But hidden disk is read only, and we need to write to
> hidden disk(backup job will write data to it).

s/disk(/disk (/

> 
> TODO: support opening backing file in read-write mode if the BDS is
> created by QMP command blockdev-add.
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  block.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 40 insertions(+), 1 deletion(-)
> 

I really don't like this patch.  We are able to automatically (re-)open
backing files for write during block-commit, without having to expose a
knob to the user then, so exposing a knob to the user here feels wrong.

> +#define ALLOW_WRITE_BACKING_FILE    "allow-write-backing-file"
> +static QemuOptsList backing_file_opts = {
> +    .name = "backing_file",
> +    .head = QTAILQ_HEAD_INITIALIZER(backing_file_opts.head),
> +    .desc = {
> +        {
> +            .name = ALLOW_WRITE_BACKING_FILE,
> +            .type = QEMU_OPT_BOOL,
> +            .help = "allow writes to backing file",
> +        },

And even if we DO need this knob (which I doubt), you need corresponding
documentation of the knob in qapi/block-core.json, since we are trying
to keep the command line and QMP in sync when it comes to adding new
options.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v10 02/10] Backup: clear all bitmap when doing block checkpoint
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 02/10] Backup: clear all bitmap when doing block checkpoint Wen Congyang
@ 2015-10-09 19:09   ` Eric Blake
  2015-10-12 13:45   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  1 sibling, 0 replies; 38+ messages in thread
From: Eric Blake @ 2015-10-09 19:09 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini,
	Stefan Hajnoczi
  Cc: Kevin Wolf, zhanghailiang, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang

[-- Attachment #1: Type: text/plain, Size: 508 bytes --]

On 09/25/2015 12:17 AM, Wen Congyang wrote:

s/bitmap/bitmaps/ in the subject line.

The subject line says "what", but you are missing a commit body that
says "why".

> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Reviewed-by: Jeff Cody <jcody@redhat.com>
> ---

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v10 02/10] Backup: clear all bitmap when doing block checkpoint
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 02/10] Backup: clear all bitmap when doing block checkpoint Wen Congyang
  2015-10-09 19:09   ` Eric Blake
@ 2015-10-12 13:45   ` Stefan Hajnoczi
  2015-10-13  9:13     ` Wen Congyang
  1 sibling, 1 reply; 38+ messages in thread
From: Stefan Hajnoczi @ 2015-10-12 13:45 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, Yang Hongyang,
	Dr. David Alan Gilbert

On Fri, Sep 25, 2015 at 02:17:30PM +0800, Wen Congyang wrote:
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Reviewed-by: Jeff Cody <jcody@redhat.com>
> ---
>  block/backup.c           | 14 ++++++++++++++
>  blockjob.c               | 11 +++++++++++
>  include/block/blockjob.h | 12 ++++++++++++
>  3 files changed, 37 insertions(+)
> 
> diff --git a/block/backup.c b/block/backup.c
> index c61e4c3..5e5995e 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -214,11 +214,25 @@ static void backup_iostatus_reset(BlockJob *job)
>      }
>  }
>  
> +static void backup_do_checkpoint(BlockJob *job, Error **errp)
> +{
> +    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
> +
> +    if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
> +        error_setg(errp, "The backup job only supports block checkpoint in"
> +                   " sync=none mode");
> +        return;
> +    }
> +
> +    hbitmap_reset_all(backup_job->bitmap);
> +}

Is this a fast way to stop and then start a new backup blockjob without
emitting block job lifecycle events?

Not sure the blockjob_do_checkpoint() interface is appropriate.  Is
there any other block job type that will implement .do_checkpoint()?

COLO block replication could call a public backup_do_checkpoint()
function.  That way the direct coupling between COLO and the backup
block job is obvious.  I'm not convinced a generic interface like
blockjob_do_checkpoint() makes sense since it's really not a generic
operation that makes sense for other block job types.

void backup_do_checkpoint(BlockJob *job, Error **errp)
{
    BackupBlockJob *s;

    if (job->driver != backup_job_driver) {
        error_setg(errp, "expected backup block job type for "
	           "checkpoint, got %d", job->driver->job_type);
        return;
    }

    s = container_of(job, BackupBlockJob, common);
    ...
}

Please also make the function name and documentation more specific.
Instead of "do" maybe this should be "pre" or "post" to indicate whether
this happens before or after the checkpoint commit.  What happens if
this function returns an error?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 08/10] Implement new driver " Wen Congyang
@ 2015-10-12 16:25   ` Stefan Hajnoczi
  2015-10-13  8:59     ` Wen Congyang
  2015-10-12 16:27   ` Stefan Hajnoczi
  2015-10-12 16:31   ` Stefan Hajnoczi
  2 siblings, 1 reply; 38+ messages in thread
From: Stefan Hajnoczi @ 2015-10-12 16:25 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, Yang Hongyang,
	Dr. David Alan Gilbert

On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
> +static void backup_job_completed(void *opaque, int ret)
> +{
> +    BDRVReplicationState *s = opaque;
> +
> +    if (s->replication_state != BLOCK_REPLICATION_DONE) {
> +        /* The backup job is cancelled unexpectedly */
> +        s->error = -EIO;
> +    }
> +
> +    bdrv_op_block(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
> +                  s->active_disk->backing_blocker);
> +    bdrv_op_block(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
> +                  s->hidden_disk->backing_blocker);
> +
> +    bdrv_put_ref_bh_schedule(s->secondary_disk);

Why is bdrv_put_ref_bh_schedule() necessary?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 08/10] Implement new driver " Wen Congyang
  2015-10-12 16:25   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2015-10-12 16:27   ` Stefan Hajnoczi
  2015-10-13  9:08     ` Wen Congyang
  2015-10-12 16:31   ` Stefan Hajnoczi
  2 siblings, 1 reply; 38+ messages in thread
From: Stefan Hajnoczi @ 2015-10-12 16:27 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, Yang Hongyang,
	Dr. David Alan Gilbert

On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
> +        /* start backup job now */
> +        bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
> +                        s->active_disk->backing_blocker);
> +        bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
> +                        s->hidden_disk->backing_blocker);

Why is it safe to unblock these operations?

Why do they have to be blocked for non-replication users?

Stefan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication
  2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 08/10] Implement new driver " Wen Congyang
  2015-10-12 16:25   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2015-10-12 16:27   ` Stefan Hajnoczi
@ 2015-10-12 16:31   ` Stefan Hajnoczi
  2015-10-13  9:09     ` Wen Congyang
  2 siblings, 1 reply; 38+ messages in thread
From: Stefan Hajnoczi @ 2015-10-12 16:31 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, Yang Hongyang,
	Dr. David Alan Gilbert

On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
> +static void replication_start(BlockDriverState *bs, ReplicationMode mode,
> +                              Error **errp)
> +{
> +    BDRVReplicationState *s = bs->opaque;
> +    int64_t active_length, hidden_length, disk_length;
> +    AioContext *aio_context;
> +    Error *local_err = NULL;
> +
> +    if (s->replication_state != BLOCK_REPLICATION_NONE) {
> +        error_setg(errp, "Block replication is running or done");
> +        return;
> +    }
> +
> +    if (s->mode != mode) {
> +        error_setg(errp, "The parameter mode's value is invalid, needs %d,"
> +                   " but receives %d", s->mode, mode);
> +        return;
> +    }
> +
> +    switch (s->mode) {
> +    case REPLICATION_MODE_PRIMARY:
> +        break;
> +    case REPLICATION_MODE_SECONDARY:
> +        s->active_disk = bs->file;
> +        if (!bs->file->backing_hd) {
> +            error_setg(errp, "Active disk doesn't have backing file");
> +            return;
> +        }
> +
> +        s->hidden_disk = s->active_disk->backing_hd;
> +        if (!s->hidden_disk->backing_hd) {
> +            error_setg(errp, "Hidden disk doesn't have backing file");
> +            return;
> +        }
> +
> +        s->secondary_disk = s->hidden_disk->backing_hd;
> +        if (!s->secondary_disk->blk) {
> +            error_setg(errp, "The secondary disk doesn't have block backend");
> +            return;
> +        }
...
> +        aio_context = bdrv_get_aio_context(bs);
> +        aio_context_acquire(aio_context);
> +        bdrv_set_aio_context(s->secondary_disk, aio_context);

Why is this bdrv_set_aio_context() call necessary?

Child BDS nodes are in the same AioContext as their parents.  Other
block jobs need something like this because they operate on a second BDS
which is not bs' backing file chain.  I think you have a different
situation here so it's not needed.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication
  2015-10-12 16:25   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2015-10-13  8:59     ` Wen Congyang
  2015-10-13  9:41       ` Fam Zheng
  0 siblings, 1 reply; 38+ messages in thread
From: Wen Congyang @ 2015-10-13  8:59 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, Yang Hongyang,
	Dr. David Alan Gilbert

On 10/13/2015 12:25 AM, Stefan Hajnoczi wrote:
> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
>> +static void backup_job_completed(void *opaque, int ret)
>> +{
>> +    BDRVReplicationState *s = opaque;
>> +
>> +    if (s->replication_state != BLOCK_REPLICATION_DONE) {
>> +        /* The backup job is cancelled unexpectedly */
>> +        s->error = -EIO;
>> +    }
>> +
>> +    bdrv_op_block(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
>> +                  s->active_disk->backing_blocker);
>> +    bdrv_op_block(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
>> +                  s->hidden_disk->backing_blocker);
>> +
>> +    bdrv_put_ref_bh_schedule(s->secondary_disk);
> 
> Why is bdrv_put_ref_bh_schedule() necessary?

It is copied from block_job_cb(). According to the comments in bdrv_put_ref_bh_schedule():
/*
 * Release a BDS reference in a BH
 *
 * It is not safe to use bdrv_unref() from a callback function when the callers
 * still need the BlockDriverState.  In such cases we schedule a BH to release
 * the reference.
 */

If the comment is right, I think it is necessary to call bdrv_put_ref_bh_schedule() here.
Because the job is created on the BDS s->secondary disk, backup_job_completed() is
called in block_job_completed(), and we will still use s->secondary_disk in block_job_release().

Thanks
Wen Congyang

> .
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication
  2015-10-12 16:27   ` Stefan Hajnoczi
@ 2015-10-13  9:08     ` Wen Congyang
  2015-10-14 14:27       ` Stefan Hajnoczi
  0 siblings, 1 reply; 38+ messages in thread
From: Wen Congyang @ 2015-10-13  9:08 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, Yang Hongyang,
	Dr. David Alan Gilbert

On 10/13/2015 12:27 AM, Stefan Hajnoczi wrote:
> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
>> +        /* start backup job now */
>> +        bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
>> +                        s->active_disk->backing_blocker);
>> +        bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
>> +                        s->hidden_disk->backing_blocker);
> 
> Why is it safe to unblock these operations?
> 
> Why do they have to be blocked for non-replication users?

hidden_disk and secondary disk are opened as backing file, so it is blocked for
non-replication users.
What can I do if I don't unblock it and want to do backup?

Thanks
Wen Congyang

> 
> Stefan
> .
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication
  2015-10-12 16:31   ` Stefan Hajnoczi
@ 2015-10-13  9:09     ` Wen Congyang
  0 siblings, 0 replies; 38+ messages in thread
From: Wen Congyang @ 2015-10-13  9:09 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, Yang Hongyang,
	Dr. David Alan Gilbert

On 10/13/2015 12:31 AM, Stefan Hajnoczi wrote:
> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
>> +static void replication_start(BlockDriverState *bs, ReplicationMode mode,
>> +                              Error **errp)
>> +{
>> +    BDRVReplicationState *s = bs->opaque;
>> +    int64_t active_length, hidden_length, disk_length;
>> +    AioContext *aio_context;
>> +    Error *local_err = NULL;
>> +
>> +    if (s->replication_state != BLOCK_REPLICATION_NONE) {
>> +        error_setg(errp, "Block replication is running or done");
>> +        return;
>> +    }
>> +
>> +    if (s->mode != mode) {
>> +        error_setg(errp, "The parameter mode's value is invalid, needs %d,"
>> +                   " but receives %d", s->mode, mode);
>> +        return;
>> +    }
>> +
>> +    switch (s->mode) {
>> +    case REPLICATION_MODE_PRIMARY:
>> +        break;
>> +    case REPLICATION_MODE_SECONDARY:
>> +        s->active_disk = bs->file;
>> +        if (!bs->file->backing_hd) {
>> +            error_setg(errp, "Active disk doesn't have backing file");
>> +            return;
>> +        }
>> +
>> +        s->hidden_disk = s->active_disk->backing_hd;
>> +        if (!s->hidden_disk->backing_hd) {
>> +            error_setg(errp, "Hidden disk doesn't have backing file");
>> +            return;
>> +        }
>> +
>> +        s->secondary_disk = s->hidden_disk->backing_hd;
>> +        if (!s->secondary_disk->blk) {
>> +            error_setg(errp, "The secondary disk doesn't have block backend");
>> +            return;
>> +        }
> ...
>> +        aio_context = bdrv_get_aio_context(bs);
>> +        aio_context_acquire(aio_context);
>> +        bdrv_set_aio_context(s->secondary_disk, aio_context);
> 
> Why is this bdrv_set_aio_context() call necessary?
> 
> Child BDS nodes are in the same AioContext as their parents.  Other
> block jobs need something like this because they operate on a second BDS
> which is not bs' backing file chain.  I think you have a different
> situation here so it's not needed.

I think you are right. I will check it and remove it.

Thanks
Wen Congyang

> .
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v10 02/10] Backup: clear all bitmap when doing block checkpoint
  2015-10-12 13:45   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2015-10-13  9:13     ` Wen Congyang
  2015-10-14 14:12       ` Stefan Hajnoczi
  0 siblings, 1 reply; 38+ messages in thread
From: Wen Congyang @ 2015-10-13  9:13 UTC (permalink / raw)
  To: Stefan Hajnoczi, Paolo Bonzini
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Yang Hongyang, Dr. David Alan Gilbert

On 10/12/2015 09:45 PM, Stefan Hajnoczi wrote:
> On Fri, Sep 25, 2015 at 02:17:30PM +0800, Wen Congyang wrote:
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Reviewed-by: Jeff Cody <jcody@redhat.com>
>> ---
>>  block/backup.c           | 14 ++++++++++++++
>>  blockjob.c               | 11 +++++++++++
>>  include/block/blockjob.h | 12 ++++++++++++
>>  3 files changed, 37 insertions(+)
>>
>> diff --git a/block/backup.c b/block/backup.c
>> index c61e4c3..5e5995e 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -214,11 +214,25 @@ static void backup_iostatus_reset(BlockJob *job)
>>      }
>>  }
>>  
>> +static void backup_do_checkpoint(BlockJob *job, Error **errp)
>> +{
>> +    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
>> +
>> +    if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
>> +        error_setg(errp, "The backup job only supports block checkpoint in"
>> +                   " sync=none mode");
>> +        return;
>> +    }
>> +
>> +    hbitmap_reset_all(backup_job->bitmap);
>> +}
> 
> Is this a fast way to stop and then start a new backup blockjob without
> emitting block job lifecycle events?
> 
> Not sure the blockjob_do_checkpoint() interface is appropriate.  Is
> there any other block job type that will implement .do_checkpoint()?

Currently, the answer is no.

> 
> COLO block replication could call a public backup_do_checkpoint()
> function.  That way the direct coupling between COLO and the backup
> block job is obvious.  I'm not convinced a generic interface like
> blockjob_do_checkpoint() makes sense since it's really not a generic
> operation that makes sense for other block job types.
> 
> void backup_do_checkpoint(BlockJob *job, Error **errp)
> {
>     BackupBlockJob *s;
> 
>     if (job->driver != backup_job_driver) {
>         error_setg(errp, "expected backup block job type for "
> 	           "checkpoint, got %d", job->driver->job_type);
>         return;
>     }
> 
>     s = container_of(job, BackupBlockJob, common);
>     ...
> }

In a older version, I implement it like this, but Paolo didn't like it.

> 
> Please also make the function name and documentation more specific.
> Instead of "do" maybe this should be "pre" or "post" to indicate whether
> this happens before or after the checkpoint commit.  What happens if

OK

> this function returns an error?

We just return this error to COLO, and COLO will do failover.

Thanks
Wen Congyang

> .
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication
  2015-10-13  8:59     ` Wen Congyang
@ 2015-10-13  9:41       ` Fam Zheng
  2015-10-13  9:46         ` Wen Congyang
  0 siblings, 1 reply; 38+ messages in thread
From: Fam Zheng @ 2015-10-13  9:41 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, zhanghailiang, qemu block, Stefan Hajnoczi,
	Jiang Yunhong, Dong Eddie, qemu devel, Michael R. Hines,
	Max Reitz, Gonglei, Stefan Hajnoczi, Paolo Bonzini,
	Yang Hongyang, Dr. David Alan Gilbert

On Tue, 10/13 16:59, Wen Congyang wrote:
> On 10/13/2015 12:25 AM, Stefan Hajnoczi wrote:
> > On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
> >> +static void backup_job_completed(void *opaque, int ret)
> >> +{
> >> +    BDRVReplicationState *s = opaque;
> >> +
> >> +    if (s->replication_state != BLOCK_REPLICATION_DONE) {
> >> +        /* The backup job is cancelled unexpectedly */
> >> +        s->error = -EIO;
> >> +    }
> >> +
> >> +    bdrv_op_block(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
> >> +                  s->active_disk->backing_blocker);
> >> +    bdrv_op_block(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
> >> +                  s->hidden_disk->backing_blocker);
> >> +
> >> +    bdrv_put_ref_bh_schedule(s->secondary_disk);
> > 
> > Why is bdrv_put_ref_bh_schedule() necessary?
> 
> It is copied from block_job_cb(). According to the comments in bdrv_put_ref_bh_schedule():
> /*
>  * Release a BDS reference in a BH
>  *
>  * It is not safe to use bdrv_unref() from a callback function when the callers
>  * still need the BlockDriverState.  In such cases we schedule a BH to release
>  * the reference.
>  */
> 
> If the comment is right, I think it is necessary to call bdrv_put_ref_bh_schedule() here.
> Because the job is created on the BDS s->secondary disk, backup_job_completed() is
> called in block_job_completed(), and we will still use s->secondary_disk in block_job_release().

Where is the matching bdrv_ref called?

Fam

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication
  2015-10-13  9:41       ` Fam Zheng
@ 2015-10-13  9:46         ` Wen Congyang
  2015-10-13 10:12           ` Fam Zheng
  0 siblings, 1 reply; 38+ messages in thread
From: Wen Congyang @ 2015-10-13  9:46 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, zhanghailiang, qemu block, Stefan Hajnoczi,
	Jiang Yunhong, Dong Eddie, qemu devel, Michael R. Hines,
	Max Reitz, Gonglei, Stefan Hajnoczi, Paolo Bonzini,
	Yang Hongyang, Dr. David Alan Gilbert

On 10/13/2015 05:41 PM, Fam Zheng wrote:
> On Tue, 10/13 16:59, Wen Congyang wrote:
>> On 10/13/2015 12:25 AM, Stefan Hajnoczi wrote:
>>> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
>>>> +static void backup_job_completed(void *opaque, int ret)
>>>> +{
>>>> +    BDRVReplicationState *s = opaque;
>>>> +
>>>> +    if (s->replication_state != BLOCK_REPLICATION_DONE) {
>>>> +        /* The backup job is cancelled unexpectedly */
>>>> +        s->error = -EIO;
>>>> +    }
>>>> +
>>>> +    bdrv_op_block(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
>>>> +                  s->active_disk->backing_blocker);
>>>> +    bdrv_op_block(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
>>>> +                  s->hidden_disk->backing_blocker);
>>>> +
>>>> +    bdrv_put_ref_bh_schedule(s->secondary_disk);
>>>
>>> Why is bdrv_put_ref_bh_schedule() necessary?
>>
>> It is copied from block_job_cb(). According to the comments in bdrv_put_ref_bh_schedule():
>> /*
>>  * Release a BDS reference in a BH
>>  *
>>  * It is not safe to use bdrv_unref() from a callback function when the callers
>>  * still need the BlockDriverState.  In such cases we schedule a BH to release
>>  * the reference.
>>  */
>>
>> If the comment is right, I think it is necessary to call bdrv_put_ref_bh_schedule() here.
>> Because the job is created on the BDS s->secondary disk, backup_job_completed() is
>> called in block_job_completed(), and we will still use s->secondary_disk in block_job_release().
> 
> Where is the matching bdrv_ref called?

It is in block_job_create()....

source: we call in bdrv_ref() in block_job_create(), and the user should unref it.
target: the user call bdrv_ref(), and we will unref it in the job

I don't know why we design it like this...

Thanks
Wen Congyang

> 
> Fam
> .
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication
  2015-10-13  9:46         ` Wen Congyang
@ 2015-10-13 10:12           ` Fam Zheng
  2015-10-14  0:55             ` Wen Congyang
  0 siblings, 1 reply; 38+ messages in thread
From: Fam Zheng @ 2015-10-13 10:12 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, zhanghailiang, qemu block, Stefan Hajnoczi,
	Jiang Yunhong, Dong Eddie, qemu devel, Michael R. Hines,
	Max Reitz, Gonglei, Stefan Hajnoczi, Paolo Bonzini,
	Yang Hongyang, Dr. David Alan Gilbert

On Tue, 10/13 17:46, Wen Congyang wrote:
> On 10/13/2015 05:41 PM, Fam Zheng wrote:
> > On Tue, 10/13 16:59, Wen Congyang wrote:
> >> On 10/13/2015 12:25 AM, Stefan Hajnoczi wrote:
> >>> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
> >>>> +static void backup_job_completed(void *opaque, int ret)
> >>>> +{
> >>>> +    BDRVReplicationState *s = opaque;
> >>>> +
> >>>> +    if (s->replication_state != BLOCK_REPLICATION_DONE) {
> >>>> +        /* The backup job is cancelled unexpectedly */
> >>>> +        s->error = -EIO;
> >>>> +    }
> >>>> +
> >>>> +    bdrv_op_block(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
> >>>> +                  s->active_disk->backing_blocker);
> >>>> +    bdrv_op_block(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
> >>>> +                  s->hidden_disk->backing_blocker);
> >>>> +
> >>>> +    bdrv_put_ref_bh_schedule(s->secondary_disk);
> >>>
> >>> Why is bdrv_put_ref_bh_schedule() necessary?
> >>
> >> It is copied from block_job_cb(). According to the comments in bdrv_put_ref_bh_schedule():
> >> /*
> >>  * Release a BDS reference in a BH
> >>  *
> >>  * It is not safe to use bdrv_unref() from a callback function when the callers
> >>  * still need the BlockDriverState.  In such cases we schedule a BH to release
> >>  * the reference.
> >>  */
> >>
> >> If the comment is right, I think it is necessary to call bdrv_put_ref_bh_schedule() here.
> >> Because the job is created on the BDS s->secondary disk, backup_job_completed() is
> >> called in block_job_completed(), and we will still use s->secondary_disk in block_job_release().
> > 
> > Where is the matching bdrv_ref called?
> 
> It is in block_job_create()....
> 
> source: we call in bdrv_ref() in block_job_create(), and the user should unref it.
> target: the user call bdrv_ref(), and we will unref it in the job
> 
> I don't know why we design it like this...
> 

Maybe it's better to unref it in block_job_release. Then we can simply drop
bdrv_put_ref_bh_schedule.

Fam

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication
  2015-10-13 10:12           ` Fam Zheng
@ 2015-10-14  0:55             ` Wen Congyang
  0 siblings, 0 replies; 38+ messages in thread
From: Wen Congyang @ 2015-10-14  0:55 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, zhanghailiang, qemu block, Stefan Hajnoczi,
	Jiang Yunhong, Dong Eddie, qemu devel, Michael R. Hines,
	Max Reitz, Gonglei, Stefan Hajnoczi, Paolo Bonzini,
	Yang Hongyang, Dr. David Alan Gilbert

On 10/13/2015 06:12 PM, Fam Zheng wrote:
> On Tue, 10/13 17:46, Wen Congyang wrote:
>> On 10/13/2015 05:41 PM, Fam Zheng wrote:
>>> On Tue, 10/13 16:59, Wen Congyang wrote:
>>>> On 10/13/2015 12:25 AM, Stefan Hajnoczi wrote:
>>>>> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
>>>>>> +static void backup_job_completed(void *opaque, int ret)
>>>>>> +{
>>>>>> +    BDRVReplicationState *s = opaque;
>>>>>> +
>>>>>> +    if (s->replication_state != BLOCK_REPLICATION_DONE) {
>>>>>> +        /* The backup job is cancelled unexpectedly */
>>>>>> +        s->error = -EIO;
>>>>>> +    }
>>>>>> +
>>>>>> +    bdrv_op_block(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
>>>>>> +                  s->active_disk->backing_blocker);
>>>>>> +    bdrv_op_block(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
>>>>>> +                  s->hidden_disk->backing_blocker);
>>>>>> +
>>>>>> +    bdrv_put_ref_bh_schedule(s->secondary_disk);
>>>>>
>>>>> Why is bdrv_put_ref_bh_schedule() necessary?
>>>>
>>>> It is copied from block_job_cb(). According to the comments in bdrv_put_ref_bh_schedule():
>>>> /*
>>>>  * Release a BDS reference in a BH
>>>>  *
>>>>  * It is not safe to use bdrv_unref() from a callback function when the callers
>>>>  * still need the BlockDriverState.  In such cases we schedule a BH to release
>>>>  * the reference.
>>>>  */
>>>>
>>>> If the comment is right, I think it is necessary to call bdrv_put_ref_bh_schedule() here.
>>>> Because the job is created on the BDS s->secondary disk, backup_job_completed() is
>>>> called in block_job_completed(), and we will still use s->secondary_disk in block_job_release().
>>>
>>> Where is the matching bdrv_ref called?
>>
>> It is in block_job_create()....
>>
>> source: we call in bdrv_ref() in block_job_create(), and the user should unref it.
>> target: the user call bdrv_ref(), and we will unref it in the job
>>
>> I don't know why we design it like this...
>>
> 
> Maybe it's better to unref it in block_job_release. Then we can simply drop
> bdrv_put_ref_bh_schedule.

I agree with it.

Thanks
Wen Congyang

> 
> Fam
> .
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v10 02/10] Backup: clear all bitmap when doing block checkpoint
  2015-10-13  9:13     ` Wen Congyang
@ 2015-10-14 14:12       ` Stefan Hajnoczi
  0 siblings, 0 replies; 38+ messages in thread
From: Stefan Hajnoczi @ 2015-10-14 14:12 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block,
	Stefan Hajnoczi, Jiang Yunhong, Dong Eddie, qemu devel,
	Michael R. Hines, Max Reitz, Gonglei, Paolo Bonzini,
	Yang Hongyang, Dr. David Alan Gilbert

On Tue, Oct 13, 2015 at 05:13:14PM +0800, Wen Congyang wrote:
> On 10/12/2015 09:45 PM, Stefan Hajnoczi wrote:
> > On Fri, Sep 25, 2015 at 02:17:30PM +0800, Wen Congyang wrote:
> >> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> >> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> >> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> >> Reviewed-by: Jeff Cody <jcody@redhat.com>
> >> ---
> >>  block/backup.c           | 14 ++++++++++++++
> >>  blockjob.c               | 11 +++++++++++
> >>  include/block/blockjob.h | 12 ++++++++++++
> >>  3 files changed, 37 insertions(+)
> >>
> >> diff --git a/block/backup.c b/block/backup.c
> >> index c61e4c3..5e5995e 100644
> >> --- a/block/backup.c
> >> +++ b/block/backup.c
> >> @@ -214,11 +214,25 @@ static void backup_iostatus_reset(BlockJob *job)
> >>      }
> >>  }
> >>  
> >> +static void backup_do_checkpoint(BlockJob *job, Error **errp)
> >> +{
> >> +    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
> >> +
> >> +    if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
> >> +        error_setg(errp, "The backup job only supports block checkpoint in"
> >> +                   " sync=none mode");
> >> +        return;
> >> +    }
> >> +
> >> +    hbitmap_reset_all(backup_job->bitmap);
> >> +}
> > 
> > Is this a fast way to stop and then start a new backup blockjob without
> > emitting block job lifecycle events?
> > 
> > Not sure the blockjob_do_checkpoint() interface is appropriate.  Is
> > there any other block job type that will implement .do_checkpoint()?
> 
> Currently, the answer is no.
> 
> > 
> > COLO block replication could call a public backup_do_checkpoint()
> > function.  That way the direct coupling between COLO and the backup
> > block job is obvious.  I'm not convinced a generic interface like
> > blockjob_do_checkpoint() makes sense since it's really not a generic
> > operation that makes sense for other block job types.
> > 
> > void backup_do_checkpoint(BlockJob *job, Error **errp)
> > {
> >     BackupBlockJob *s;
> > 
> >     if (job->driver != backup_job_driver) {
> >         error_setg(errp, "expected backup block job type for "
> > 	           "checkpoint, got %d", job->driver->job_type);
> >         return;
> >     }
> > 
> >     s = container_of(job, BackupBlockJob, common);
> >     ...
> > }
> 
> In a older version, I implement it like this, but Paolo didn't like it.

It's a question of taste.  In this case it seems to me that there is
really a direct coupling between COLO and the backup block job.  This
isn't really a generic interface that makes sense in other scenarios.
That's why I advocate for direct coupling instead of pretending this is
a generic interface.

I wish COLO could just stop the existing block job and start a new one
for each checkpoint.  In reality we probably don't want QMP events and
the full block job lifecycle for each checkpoint...  But anyway, I like
this approach because it does not require a new interface at all.

> > Please also make the function name and documentation more specific.
> > Instead of "do" maybe this should be "pre" or "post" to indicate whether
> > this happens before or after the checkpoint commit.  What happens if
> 
> OK
> 
> > this function returns an error?
> 
> We just return this error to COLO, and COLO will do failover.

Okay, I ask these questions because the information should be part of
the doc comment for this new interface.

Stefan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication
  2015-10-13  9:08     ` Wen Congyang
@ 2015-10-14 14:27       ` Stefan Hajnoczi
  2015-10-15  2:19         ` Wen Congyang
  0 siblings, 1 reply; 38+ messages in thread
From: Stefan Hajnoczi @ 2015-10-14 14:27 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block,
	Stefan Hajnoczi, Jeff Cody, Jiang Yunhong, Dong Eddie,
	qemu devel, Michael R. Hines, Max Reitz, Gonglei, Paolo Bonzini,
	Yang Hongyang, Dr. David Alan Gilbert

On Tue, Oct 13, 2015 at 05:08:17PM +0800, Wen Congyang wrote:
> On 10/13/2015 12:27 AM, Stefan Hajnoczi wrote:
> > On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
> >> +        /* start backup job now */
> >> +        bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
> >> +                        s->active_disk->backing_blocker);
> >> +        bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
> >> +                        s->hidden_disk->backing_blocker);
> > 
> > Why is it safe to unblock these operations?
> > 
> > Why do they have to be blocked for non-replication users?
> 
> hidden_disk and secondary disk are opened as backing file, so it is blocked for
> non-replication users.
> What can I do if I don't unblock it and want to do backup?

CCing Jeff Cody, block jobs maintainer

You need to explain why it is safe remove this protection.  We can't
merge code that may be unsafe.

I think we can investigate further by asking: when does QEMU code assume
the backing file is read-only?

I haven't checked but these cases come to mind:

Operations that move data between BDS in the backing chain (e.g. commit
and stream block jobs) will lose or overwrite data if the backing file
is being written to by another coroutine.

We need to prevent users from running these operations at the same time.

Also, accessing bs->backing_blocker is a layering violation.  No one
outside block.c:bdrv_set_backing_hd() is supposed to access this field.

Let's figure out the safety concerns first and then the
bs->backing_blocker access will probably be eliminated as part of the
solution.

Stefan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication
  2015-10-14 14:27       ` Stefan Hajnoczi
@ 2015-10-15  2:19         ` Wen Congyang
  2015-10-15 14:55           ` Stefan Hajnoczi
  0 siblings, 1 reply; 38+ messages in thread
From: Wen Congyang @ 2015-10-15  2:19 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block,
	Stefan Hajnoczi, Jeff Cody, Jiang Yunhong, Dong Eddie,
	qemu devel, Michael R. Hines, Max Reitz, Gonglei, Paolo Bonzini,
	Yang Hongyang, Dr. David Alan Gilbert

On 10/14/2015 10:27 PM, Stefan Hajnoczi wrote:
> On Tue, Oct 13, 2015 at 05:08:17PM +0800, Wen Congyang wrote:
>> On 10/13/2015 12:27 AM, Stefan Hajnoczi wrote:
>>> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
>>>> +        /* start backup job now */
>>>> +        bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
>>>> +                        s->active_disk->backing_blocker);
>>>> +        bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
>>>> +                        s->hidden_disk->backing_blocker);
>>>
>>> Why is it safe to unblock these operations?
>>>
>>> Why do they have to be blocked for non-replication users?
>>
>> hidden_disk and secondary disk are opened as backing file, so it is blocked for
>> non-replication users.
>> What can I do if I don't unblock it and want to do backup?
> 
> CCing Jeff Cody, block jobs maintainer
> 
> You need to explain why it is safe remove this protection.  We can't
> merge code that may be unsafe.
> 
> I think we can investigate further by asking: when does QEMU code assume
> the backing file is read-only?

The backing file is opened in read-only mode. I want to reopen it in read-write
mode here in the next version(So the patch 1 will be dropped)

> 
> I haven't checked but these cases come to mind:
> 
> Operations that move data between BDS in the backing chain (e.g. commit
> and stream block jobs) will lose or overwrite data if the backing file
> is being written to by another coroutine.
> 
> We need to prevent users from running these operations at the same time.

Yes, but qemu doesn't provide such API.

> 
> Also, accessing bs->backing_blocker is a layering violation.  No one
> outside block.c:bdrv_set_backing_hd() is supposed to access this field.

I agree with it.

Thanks
Wen Congyang

> 
> Let's figure out the safety concerns first and then the
> bs->backing_blocker access will probably be eliminated as part of the
> solution.
> 
> Stefan
> .
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication
  2015-10-15  2:19         ` Wen Congyang
@ 2015-10-15 14:55           ` Stefan Hajnoczi
  2015-10-16  0:52             ` Wen Congyang
  2015-10-16  2:22             ` Wen Congyang
  0 siblings, 2 replies; 38+ messages in thread
From: Stefan Hajnoczi @ 2015-10-15 14:55 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block,
	Stefan Hajnoczi, Jeff Cody, Jiang Yunhong, Dong Eddie,
	qemu devel, Michael R. Hines, Max Reitz, Gonglei, Paolo Bonzini,
	Yang Hongyang, Dr. David Alan Gilbert

On Thu, Oct 15, 2015 at 10:19:17AM +0800, Wen Congyang wrote:
> On 10/14/2015 10:27 PM, Stefan Hajnoczi wrote:
> > On Tue, Oct 13, 2015 at 05:08:17PM +0800, Wen Congyang wrote:
> >> On 10/13/2015 12:27 AM, Stefan Hajnoczi wrote:
> >>> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
> >>>> +        /* start backup job now */
> >>>> +        bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
> >>>> +                        s->active_disk->backing_blocker);
> >>>> +        bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
> >>>> +                        s->hidden_disk->backing_blocker);
> >>>
> >>> Why is it safe to unblock these operations?
> >>>
> >>> Why do they have to be blocked for non-replication users?
> >>
> >> hidden_disk and secondary disk are opened as backing file, so it is blocked for
> >> non-replication users.
> >> What can I do if I don't unblock it and want to do backup?
> > 
> > CCing Jeff Cody, block jobs maintainer
> > 
> > You need to explain why it is safe remove this protection.  We can't
> > merge code that may be unsafe.
> > 
> > I think we can investigate further by asking: when does QEMU code assume
> > the backing file is read-only?
> 
> The backing file is opened in read-only mode. I want to reopen it in read-write
> mode here in the next version(So the patch 1 will be dropped)
> 
> > 
> > I haven't checked but these cases come to mind:
> > 
> > Operations that move data between BDS in the backing chain (e.g. commit
> > and stream block jobs) will lose or overwrite data if the backing file
> > is being written to by another coroutine.
> > 
> > We need to prevent users from running these operations at the same time.
> 
> Yes, but qemu doesn't provide such API.

This series can't be merged unless it is safe.

Have you looked at op blockers and thought about how to prevent unsafe
operations?

Stefan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication
  2015-10-15 14:55           ` Stefan Hajnoczi
@ 2015-10-16  0:52             ` Wen Congyang
  2015-10-16  2:22             ` Wen Congyang
  1 sibling, 0 replies; 38+ messages in thread
From: Wen Congyang @ 2015-10-16  0:52 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block,
	Stefan Hajnoczi, Jeff Cody, Jiang Yunhong, Dong Eddie,
	qemu devel, Michael R. Hines, Max Reitz, Gonglei, Paolo Bonzini,
	Yang Hongyang, Dr. David Alan Gilbert

On 10/15/2015 10:55 PM, Stefan Hajnoczi wrote:
> On Thu, Oct 15, 2015 at 10:19:17AM +0800, Wen Congyang wrote:
>> On 10/14/2015 10:27 PM, Stefan Hajnoczi wrote:
>>> On Tue, Oct 13, 2015 at 05:08:17PM +0800, Wen Congyang wrote:
>>>> On 10/13/2015 12:27 AM, Stefan Hajnoczi wrote:
>>>>> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
>>>>>> +        /* start backup job now */
>>>>>> +        bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
>>>>>> +                        s->active_disk->backing_blocker);
>>>>>> +        bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
>>>>>> +                        s->hidden_disk->backing_blocker);
>>>>>
>>>>> Why is it safe to unblock these operations?
>>>>>
>>>>> Why do they have to be blocked for non-replication users?
>>>>
>>>> hidden_disk and secondary disk are opened as backing file, so it is blocked for
>>>> non-replication users.
>>>> What can I do if I don't unblock it and want to do backup?
>>>
>>> CCing Jeff Cody, block jobs maintainer
>>>
>>> You need to explain why it is safe remove this protection.  We can't
>>> merge code that may be unsafe.
>>>
>>> I think we can investigate further by asking: when does QEMU code assume
>>> the backing file is read-only?
>>
>> The backing file is opened in read-only mode. I want to reopen it in read-write
>> mode here in the next version(So the patch 1 will be dropped)
>>
>>>
>>> I haven't checked but these cases come to mind:
>>>
>>> Operations that move data between BDS in the backing chain (e.g. commit
>>> and stream block jobs) will lose or overwrite data if the backing file
>>> is being written to by another coroutine.
>>>
>>> We need to prevent users from running these operations at the same time.
>>
>> Yes, but qemu doesn't provide such API.
> 
> This series can't be merged unless it is safe.
> 
> Have you looked at op blockers and thought about how to prevent unsafe
> operations?

Hmm, only block jobs will write to the backing file? If so, op blockers can
prevent unsafe operations.

Thanks
Wen Congyang

> 
> Stefan
> .
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication
  2015-10-15 14:55           ` Stefan Hajnoczi
  2015-10-16  0:52             ` Wen Congyang
@ 2015-10-16  2:22             ` Wen Congyang
  2015-10-16 11:37               ` Stefan Hajnoczi
  1 sibling, 1 reply; 38+ messages in thread
From: Wen Congyang @ 2015-10-16  2:22 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block,
	Stefan Hajnoczi, Jeff Cody, Jiang Yunhong, Dong Eddie,
	qemu devel, Michael R. Hines, Max Reitz, Gonglei, Paolo Bonzini,
	Yang Hongyang, Dr. David Alan Gilbert

On 10/15/2015 10:55 PM, Stefan Hajnoczi wrote:
> On Thu, Oct 15, 2015 at 10:19:17AM +0800, Wen Congyang wrote:
>> On 10/14/2015 10:27 PM, Stefan Hajnoczi wrote:
>>> On Tue, Oct 13, 2015 at 05:08:17PM +0800, Wen Congyang wrote:
>>>> On 10/13/2015 12:27 AM, Stefan Hajnoczi wrote:
>>>>> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
>>>>>> +        /* start backup job now */
>>>>>> +        bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
>>>>>> +                        s->active_disk->backing_blocker);
>>>>>> +        bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
>>>>>> +                        s->hidden_disk->backing_blocker);
>>>>>
>>>>> Why is it safe to unblock these operations?
>>>>>
>>>>> Why do they have to be blocked for non-replication users?
>>>>
>>>> hidden_disk and secondary disk are opened as backing file, so it is blocked for
>>>> non-replication users.
>>>> What can I do if I don't unblock it and want to do backup?
>>>
>>> CCing Jeff Cody, block jobs maintainer
>>>
>>> You need to explain why it is safe remove this protection.  We can't
>>> merge code that may be unsafe.
>>>
>>> I think we can investigate further by asking: when does QEMU code assume
>>> the backing file is read-only?
>>
>> The backing file is opened in read-only mode. I want to reopen it in read-write
>> mode here in the next version(So the patch 1 will be dropped)
>>
>>>
>>> I haven't checked but these cases come to mind:
>>>
>>> Operations that move data between BDS in the backing chain (e.g. commit
>>> and stream block jobs) will lose or overwrite data if the backing file
>>> is being written to by another coroutine.
>>>
>>> We need to prevent users from running these operations at the same time.
>>
>> Yes, but qemu doesn't provide such API.
> 
> This series can't be merged unless it is safe.
> 
> Have you looked at op blockers and thought about how to prevent unsafe
> operations?

What about this solution:
1. unblock it in bdrv_set_backing_hd()
2. block it in qmp_block_commit(), qmp_block_stream(), qmp_block_backup()..., to
   prevent unsafe operations

Thanks
Wen Congyang

> 
> Stefan
> .
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication
  2015-10-16  2:22             ` Wen Congyang
@ 2015-10-16 11:37               ` Stefan Hajnoczi
  2015-10-27  2:57                 ` Wen Congyang
  0 siblings, 1 reply; 38+ messages in thread
From: Stefan Hajnoczi @ 2015-10-16 11:37 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block,
	Stefan Hajnoczi, Jeff Cody, Jiang Yunhong, Dong Eddie,
	qemu devel, Michael R. Hines, Max Reitz, Gonglei, Paolo Bonzini,
	Yang Hongyang, Dr. David Alan Gilbert

On Fri, Oct 16, 2015 at 10:22:05AM +0800, Wen Congyang wrote:
> On 10/15/2015 10:55 PM, Stefan Hajnoczi wrote:
> > On Thu, Oct 15, 2015 at 10:19:17AM +0800, Wen Congyang wrote:
> >> On 10/14/2015 10:27 PM, Stefan Hajnoczi wrote:
> >>> On Tue, Oct 13, 2015 at 05:08:17PM +0800, Wen Congyang wrote:
> >>>> On 10/13/2015 12:27 AM, Stefan Hajnoczi wrote:
> >>>>> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
> >>>>>> +        /* start backup job now */
> >>>>>> +        bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
> >>>>>> +                        s->active_disk->backing_blocker);
> >>>>>> +        bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
> >>>>>> +                        s->hidden_disk->backing_blocker);
> >>>>>
> >>>>> Why is it safe to unblock these operations?
> >>>>>
> >>>>> Why do they have to be blocked for non-replication users?
> >>>>
> >>>> hidden_disk and secondary disk are opened as backing file, so it is blocked for
> >>>> non-replication users.
> >>>> What can I do if I don't unblock it and want to do backup?
> >>>
> >>> CCing Jeff Cody, block jobs maintainer
> >>>
> >>> You need to explain why it is safe remove this protection.  We can't
> >>> merge code that may be unsafe.
> >>>
> >>> I think we can investigate further by asking: when does QEMU code assume
> >>> the backing file is read-only?
> >>
> >> The backing file is opened in read-only mode. I want to reopen it in read-write
> >> mode here in the next version(So the patch 1 will be dropped)
> >>
> >>>
> >>> I haven't checked but these cases come to mind:
> >>>
> >>> Operations that move data between BDS in the backing chain (e.g. commit
> >>> and stream block jobs) will lose or overwrite data if the backing file
> >>> is being written to by another coroutine.
> >>>
> >>> We need to prevent users from running these operations at the same time.
> >>
> >> Yes, but qemu doesn't provide such API.
> > 
> > This series can't be merged unless it is safe.
> > 
> > Have you looked at op blockers and thought about how to prevent unsafe
> > operations?
> 
> What about this solution:
> 1. unblock it in bdrv_set_backing_hd()
> 2. block it in qmp_block_commit(), qmp_block_stream(), qmp_block_backup()..., to
>    prevent unsafe operations

Come to think of it, currently QEMU only supports 1 block job per BDS.

This means that as long as COLO has a backup job running, no other block
jobs can interfere.

There still might be a risk with monitor commands like 'commit'.

Stefan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication
  2015-10-16 11:37               ` Stefan Hajnoczi
@ 2015-10-27  2:57                 ` Wen Congyang
  2015-10-27 14:41                   ` Stefan Hajnoczi
  0 siblings, 1 reply; 38+ messages in thread
From: Wen Congyang @ 2015-10-27  2:57 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block,
	Stefan Hajnoczi, Jeff Cody, Jiang Yunhong, Dong Eddie,
	qemu devel, Michael R. Hines, Max Reitz, Gonglei, Paolo Bonzini,
	Yang Hongyang, Dr. David Alan Gilbert

On 10/16/2015 07:37 PM, Stefan Hajnoczi wrote:
> On Fri, Oct 16, 2015 at 10:22:05AM +0800, Wen Congyang wrote:
>> On 10/15/2015 10:55 PM, Stefan Hajnoczi wrote:
>>> On Thu, Oct 15, 2015 at 10:19:17AM +0800, Wen Congyang wrote:
>>>> On 10/14/2015 10:27 PM, Stefan Hajnoczi wrote:
>>>>> On Tue, Oct 13, 2015 at 05:08:17PM +0800, Wen Congyang wrote:
>>>>>> On 10/13/2015 12:27 AM, Stefan Hajnoczi wrote:
>>>>>>> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
>>>>>>>> +        /* start backup job now */
>>>>>>>> +        bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
>>>>>>>> +                        s->active_disk->backing_blocker);
>>>>>>>> +        bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
>>>>>>>> +                        s->hidden_disk->backing_blocker);
>>>>>>>
>>>>>>> Why is it safe to unblock these operations?
>>>>>>>
>>>>>>> Why do they have to be blocked for non-replication users?
>>>>>>
>>>>>> hidden_disk and secondary disk are opened as backing file, so it is blocked for
>>>>>> non-replication users.
>>>>>> What can I do if I don't unblock it and want to do backup?
>>>>>
>>>>> CCing Jeff Cody, block jobs maintainer
>>>>>
>>>>> You need to explain why it is safe remove this protection.  We can't
>>>>> merge code that may be unsafe.
>>>>>
>>>>> I think we can investigate further by asking: when does QEMU code assume
>>>>> the backing file is read-only?
>>>>
>>>> The backing file is opened in read-only mode. I want to reopen it in read-write
>>>> mode here in the next version(So the patch 1 will be dropped)
>>>>
>>>>>
>>>>> I haven't checked but these cases come to mind:
>>>>>
>>>>> Operations that move data between BDS in the backing chain (e.g. commit
>>>>> and stream block jobs) will lose or overwrite data if the backing file
>>>>> is being written to by another coroutine.
>>>>>
>>>>> We need to prevent users from running these operations at the same time.
>>>>
>>>> Yes, but qemu doesn't provide such API.
>>>
>>> This series can't be merged unless it is safe.
>>>
>>> Have you looked at op blockers and thought about how to prevent unsafe
>>> operations?
>>
>> What about this solution:
>> 1. unblock it in bdrv_set_backing_hd()
>> 2. block it in qmp_block_commit(), qmp_block_stream(), qmp_block_backup()..., to
>>    prevent unsafe operations
> 
> Come to think of it, currently QEMU only supports 1 block job per BDS.
> 
> This means that as long as COLO has a backup job running, no other block
> jobs can interfere.
> 
> There still might be a risk with monitor commands like 'commit'.

What about this?
diff --git a/block.c b/block.c
index e9f40dc..b181d67 100644
--- a/block.c
+++ b/block.c
@@ -1162,6 +1162,24 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
     /* Otherwise we won't be able to commit due to check in bdrv_commit */
     bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
                     bs->backing_blocker);
+    /*
+     * We do backup in 3 ways:
+     * 1. drive backup
+     *    The target bs is new opened, and the source is top BDS
+     * 2. blockdev backup
+     *    Both the source and the target are top BDSes.
+     * 3. internal backup(used for block replication)
+     *    Both the source and the target are backing file
+     *
+     * In case 1, and 2, the backing file is neither the source nor
+     * the target.
+     * In case 3, we will block the top BDS, so there is only one block
+     * job for the top BDS and its backing chain.
+     */
+    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
+                    bs->backing_blocker);
+    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_TARGET,
+                    bs->backing_blocker);
 out:
     bdrv_refresh_limits(bs, NULL);
 }



> 
> Stefan
> .
> 

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication
  2015-10-27  2:57                 ` Wen Congyang
@ 2015-10-27 14:41                   ` Stefan Hajnoczi
  0 siblings, 0 replies; 38+ messages in thread
From: Stefan Hajnoczi @ 2015-10-27 14:41 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block,
	Stefan Hajnoczi, Jeff Cody, Jiang Yunhong, Dong Eddie,
	qemu devel, Michael R. Hines, Max Reitz, Gonglei, Paolo Bonzini,
	Yang Hongyang, Dr. David Alan Gilbert

On Tue, Oct 27, 2015 at 10:57:42AM +0800, Wen Congyang wrote:
> On 10/16/2015 07:37 PM, Stefan Hajnoczi wrote:
> > On Fri, Oct 16, 2015 at 10:22:05AM +0800, Wen Congyang wrote:
> >> On 10/15/2015 10:55 PM, Stefan Hajnoczi wrote:
> >>> On Thu, Oct 15, 2015 at 10:19:17AM +0800, Wen Congyang wrote:
> >>>> On 10/14/2015 10:27 PM, Stefan Hajnoczi wrote:
> >>>>> On Tue, Oct 13, 2015 at 05:08:17PM +0800, Wen Congyang wrote:
> >>>>>> On 10/13/2015 12:27 AM, Stefan Hajnoczi wrote:
> >>>>>>> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
> >>>>>>>> +        /* start backup job now */
> >>>>>>>> +        bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
> >>>>>>>> +                        s->active_disk->backing_blocker);
> >>>>>>>> +        bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
> >>>>>>>> +                        s->hidden_disk->backing_blocker);
> >>>>>>>
> >>>>>>> Why is it safe to unblock these operations?
> >>>>>>>
> >>>>>>> Why do they have to be blocked for non-replication users?
> >>>>>>
> >>>>>> hidden_disk and secondary disk are opened as backing file, so it is blocked for
> >>>>>> non-replication users.
> >>>>>> What can I do if I don't unblock it and want to do backup?
> >>>>>
> >>>>> CCing Jeff Cody, block jobs maintainer
> >>>>>
> >>>>> You need to explain why it is safe remove this protection.  We can't
> >>>>> merge code that may be unsafe.
> >>>>>
> >>>>> I think we can investigate further by asking: when does QEMU code assume
> >>>>> the backing file is read-only?
> >>>>
> >>>> The backing file is opened in read-only mode. I want to reopen it in read-write
> >>>> mode here in the next version(So the patch 1 will be dropped)
> >>>>
> >>>>>
> >>>>> I haven't checked but these cases come to mind:
> >>>>>
> >>>>> Operations that move data between BDS in the backing chain (e.g. commit
> >>>>> and stream block jobs) will lose or overwrite data if the backing file
> >>>>> is being written to by another coroutine.
> >>>>>
> >>>>> We need to prevent users from running these operations at the same time.
> >>>>
> >>>> Yes, but qemu doesn't provide such API.
> >>>
> >>> This series can't be merged unless it is safe.
> >>>
> >>> Have you looked at op blockers and thought about how to prevent unsafe
> >>> operations?
> >>
> >> What about this solution:
> >> 1. unblock it in bdrv_set_backing_hd()
> >> 2. block it in qmp_block_commit(), qmp_block_stream(), qmp_block_backup()..., to
> >>    prevent unsafe operations
> > 
> > Come to think of it, currently QEMU only supports 1 block job per BDS.
> > 
> > This means that as long as COLO has a backup job running, no other block
> > jobs can interfere.
> > 
> > There still might be a risk with monitor commands like 'commit'.
> 
> What about this?
> diff --git a/block.c b/block.c
> index e9f40dc..b181d67 100644
> --- a/block.c
> +++ b/block.c
> @@ -1162,6 +1162,24 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
>      /* Otherwise we won't be able to commit due to check in bdrv_commit */
>      bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
>                      bs->backing_blocker);
> +    /*
> +     * We do backup in 3 ways:
> +     * 1. drive backup
> +     *    The target bs is new opened, and the source is top BDS
> +     * 2. blockdev backup
> +     *    Both the source and the target are top BDSes.
> +     * 3. internal backup(used for block replication)
> +     *    Both the source and the target are backing file
> +     *
> +     * In case 1, and 2, the backing file is neither the source nor
> +     * the target.
> +     * In case 3, we will block the top BDS, so there is only one block
> +     * job for the top BDS and its backing chain.
> +     */
> +    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
> +                    bs->backing_blocker);

BLOCK_OP_TYPE_BACKUP_SOURCE does not modify the image so this should be
safe.

> +    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_TARGET,
> +                    bs->backing_blocker);

This one is trickier since it means write access, but
BLOCK_OP_TYPE_COMMIT_TARGET is already unblocked above.  At least it
should be no worse than allowing BLOCK_OP_TYPE_COMMIT_TARGET.

Jeff?

Stefan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v10 01/10] allow writing to the backing file
  2015-10-09 19:07   ` Eric Blake
@ 2015-10-30  5:49     ` Wen Congyang
  0 siblings, 0 replies; 38+ messages in thread
From: Wen Congyang @ 2015-10-30  5:49 UTC (permalink / raw)
  To: Eric Blake, qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini,
	Stefan Hajnoczi
  Cc: Kevin Wolf, zhanghailiang, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei, Yang Hongyang

On 10/10/2015 03:07 AM, Eric Blake wrote:
> On 09/25/2015 12:17 AM, Wen Congyang wrote:
>> For block replication, we have such backing chain:
>>     secondary disk <-- hidden disk <-- active disk
>> secondary disk is top BDS(use bacing reference), so it can be opened in
> 
> s/BDS(use bacing/BDS (use backing/
> 
>> read-write mode. But hidden disk is read only, and we need to write to
>> hidden disk(backup job will write data to it).
> 
> s/disk(/disk (/
> 
>>
>> TODO: support opening backing file in read-write mode if the BDS is
>> created by QMP command blockdev-add.
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> ---
>>  block.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 40 insertions(+), 1 deletion(-)
>>
> 
> I really don't like this patch.  We are able to automatically (re-)open
> backing files for write during block-commit, without having to expose a
> knob to the user then, so exposing a knob to the user here feels wrong.

I try to reopen the backing files for write when starting block replication.
I tested it, and it doesn't work.
Here is my usage:
command line:
-drive if=none,id=colo-disk1,file.filename=/data/images/kvm/suse/temp.img,driver=raw
-drive if=virtio,id=active-disk1,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/mnt/ramfs/active_disk.img,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.backing.file.filename=/data/images/kvm/suse/suse11_3.img,file.backing.backing.driver=raw,file.backing.backing.node-name=sdisk

{'execute': 'blockdev-remove-medium', 'arguments': {'device': 'colo-disk1'} }
{'execute': 'blockdev-insert-medium', 'arguments': {'device': 'colo-disk1', 'node-name': 'sdisk'} }
{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': '192.168.3.1', 'port': '8889'} } } }
{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk1', 'writable': true } }

All qmp command success, but the disk exported to nbd server is readonly even if I specify 
'writable': true in the QMP commad. The reason is that the BDS is readonly.

> 
>> +#define ALLOW_WRITE_BACKING_FILE    "allow-write-backing-file"
>> +static QemuOptsList backing_file_opts = {
>> +    .name = "backing_file",
>> +    .head = QTAILQ_HEAD_INITIALIZER(backing_file_opts.head),
>> +    .desc = {
>> +        {
>> +            .name = ALLOW_WRITE_BACKING_FILE,
>> +            .type = QEMU_OPT_BOOL,
>> +            .help = "allow writes to backing file",
>> +        },
> 
> And even if we DO need this knob (which I doubt), you need corresponding
> documentation of the knob in qapi/block-core.json, since we are trying
> to keep the command line and QMP in sync when it comes to adding new
> options.
> 

Yes, I know it, but I don't know how to do it. Update BlockdevOptions?

Thanks
Wen Congyang

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2015-10-30  5:50 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-25  6:17 [Qemu-devel] [PATCH v10 00/10] Block replication for continuous checkpoints Wen Congyang
2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 01/10] allow writing to the backing file Wen Congyang
2015-10-09 19:07   ` Eric Blake
2015-10-30  5:49     ` Wen Congyang
2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 02/10] Backup: clear all bitmap when doing block checkpoint Wen Congyang
2015-10-09 19:09   ` Eric Blake
2015-10-12 13:45   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2015-10-13  9:13     ` Wen Congyang
2015-10-14 14:12       ` Stefan Hajnoczi
2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 03/10] Allow creating backup jobs when opening BDS Wen Congyang
2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 04/10] block: make bdrv_put_ref_bh_schedule() as a public API Wen Congyang
2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 05/10] docs: block replication's description Wen Congyang
2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 06/10] Add new block driver interfaces to control block replication Wen Congyang
2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 07/10] quorum: implement block driver interfaces for " Wen Congyang
2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 08/10] Implement new driver " Wen Congyang
2015-10-12 16:25   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2015-10-13  8:59     ` Wen Congyang
2015-10-13  9:41       ` Fam Zheng
2015-10-13  9:46         ` Wen Congyang
2015-10-13 10:12           ` Fam Zheng
2015-10-14  0:55             ` Wen Congyang
2015-10-12 16:27   ` Stefan Hajnoczi
2015-10-13  9:08     ` Wen Congyang
2015-10-14 14:27       ` Stefan Hajnoczi
2015-10-15  2:19         ` Wen Congyang
2015-10-15 14:55           ` Stefan Hajnoczi
2015-10-16  0:52             ` Wen Congyang
2015-10-16  2:22             ` Wen Congyang
2015-10-16 11:37               ` Stefan Hajnoczi
2015-10-27  2:57                 ` Wen Congyang
2015-10-27 14:41                   ` Stefan Hajnoczi
2015-10-12 16:31   ` Stefan Hajnoczi
2015-10-13  9:09     ` Wen Congyang
2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 09/10] support replication driver in blockdev-add Wen Congyang
2015-10-09 14:40   ` Eric Blake
2015-09-25  6:17 ` [Qemu-devel] [PATCH v10 10/10] Add a new API to start/stop replication, do checkpoint to all BDSes Wen Congyang
2015-10-07  6:39 ` [Qemu-devel] [PATCH v10 00/10] Block replication for continuous checkpoints Wen Congyang
2015-10-09 12:52   ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.