From: Wen Congyang <wency@cn.fujitsu.com>
To: Fam Zheng <famz@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	Yang Hongyang <yanghy@cn.fujitsu.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	qemu block <qemu-block@nongnu.org>,
	Jiang Yunhong <yunhong.jiang@intel.com>,
	Dong Eddie <eddie.dong@intel.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	qemu devel <qemu-devel@nongnu.org>,
	Gonglei <arei.gonglei@huawei.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Max Reitz <mreitz@redhat.com>,
	zhanghailiang <zhang.zhanghailiang@huawei.com>
Subject: Re: [Qemu-devel] [RFC PATCH COLO v2 01/13] docs: block replication's description
Date: Fri, 3 Apr 2015 10:35:55 +0800
Message-ID: <551DFC8B.2020007@cn.fujitsu.com>
In-Reply-To: <20150326063145.GG14724@ad.nay.redhat.com>

On 03/26/2015 02:31 PM, Fam Zheng wrote:
> On Wed, 03/25 17:36, Wen Congyang wrote:
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> ---
>>  docs/block-replication.txt | 147 +++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 147 insertions(+)
>>  create mode 100644 docs/block-replication.txt
>>
>> diff --git a/docs/block-replication.txt b/docs/block-replication.txt
>> new file mode 100644
>> index 0000000..874ed8e
>> --- /dev/null
>> +++ b/docs/block-replication.txt
>> @@ -0,0 +1,147 @@
>> +Block replication
>> +----------------------------------------
>> +Copyright Fujitsu, Corp. 2015
>> +Copyright (c) 2015 Intel Corporation
>> +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
>> +
>> +This work is licensed under the terms of the GNU GPL, version 2 or later.
>> +See the COPYING file in the top-level directory.
>> +
>> +Block replication is used for continuous checkpoints. It is designed
>> +for COLO, where the Secondary VM is running. It can also be applied to
>> +FT/HA scenarios in which the Secondary VM is not running.
>> +
>> +This document gives an overview of block replication's design.
>> +
>> +== Background ==
>> +High availability solutions such as micro checkpointing and COLO take
>> +consecutive checkpoints. The VM state of the Primary VM and the Secondary
>> +VM is identical right after a VM checkpoint, but diverges as the VMs
>> +execute until the next checkpoint. To support checkpointing of disk
>> +contents, the modified disk contents in the Secondary VM must be
>> +buffered, and are dropped only at the next checkpoint. To reduce the
>> +amount of data sent over the network at checkpoint time, the disk
>> +modification operations of the Primary disk are asynchronously forwarded
>> +to the Secondary node.
>> +
>> +== Workflow ==
>> +The following diagram shows the block replication workflow:
>> +
>> +        +----------------------+            +------------------------+
>> +        |Primary Write Requests|            |Secondary Write Requests|
>> +        +----------------------+            +------------------------+
>> +                  |                                       |
>> +                  |                                      (4)
>> +                  |                                       V
>> +                  |                              /-------------\
>> +                  |      Copy and Forward        |             |
>> +                  |---------(1)----------+       | Disk Buffer |
>> +                  |                      |       |             |
>> +                  |                     (3)      \-------------/
>> +                  |                 speculative      ^
>> +                  |                write through    (2)
>> +                  |                      |           |
>> +                  V                      V           |
>> +           +--------------+           +----------------+
>> +           | Primary Disk |           | Secondary Disk |
>> +           +--------------+           +----------------+
>> +
>> +    1) Primary write requests will be copied and forwarded to Secondary
>> +       QEMU.
>> +    2) Before Primary write requests are written to Secondary disk, the
>> +       original sector content will be read from Secondary disk and
>> +       buffered in the Disk buffer, but it will not overwrite the existing
>> +       sector content in the Disk buffer.
> 
> Could you elaborate a bit about the "existing sector content" here? IIUC, it
> could be from either "Secondary Write Requests" or previous COW of "Primary
> Write Requests". Is that right?
> 
>> +    3) Primary write requests will be written to Secondary disk.
>> +    4) Secondary write requests will be buffered in the Disk buffer and it
>> +       will overwrite the existing sector content in the buffer.
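
To make steps 1) to 4) concrete, here is a minimal sketch of the Disk
buffer semantics as I read the description above. It is a toy in-memory
model, not QEMU code: the class name, the method names and the
secondary_read() helper are all hypothetical, and the checkpoint/failover
behaviour is taken from the interface section later in this document.

```python
class SecondaryDiskModel:
    """Toy model of the Secondary disk plus its Disk buffer (hypothetical names)."""

    def __init__(self, sectors):
        self.disk = list(sectors)   # Secondary disk contents, one entry per sector
        self.buffer = {}            # Disk buffer: sector number -> buffered content

    def primary_write(self, sector, data):
        # Step 2: save the original sector content before it is overwritten,
        # but never replace content that is already in the Disk buffer
        # (whether it came from an earlier copy or from a Secondary write).
        if sector not in self.buffer:
            self.buffer[sector] = self.disk[sector]
        # Step 3: speculative write-through to the Secondary disk.
        self.disk[sector] = data

    def secondary_write(self, sector, data):
        # Step 4: Secondary writes only touch the Disk buffer and overwrite
        # whatever is buffered for that sector.
        self.buffer[sector] = data

    def secondary_read(self, sector):
        # Assumption: the Secondary VM sees buffered content first and falls
        # back to the Secondary disk otherwise.
        return self.buffer.get(sector, self.disk[sector])

    def checkpoint(self):
        # At a checkpoint the Disk buffer is dropped; the Secondary disk
        # already holds the Primary's writes.
        self.buffer.clear()

    def failover(self):
        # On failover the Disk buffer is flushed into the Secondary disk.
        for sector, data in self.buffer.items():
            self.disk[sector] = data
        self.buffer.clear()
```

In this model the "existing sector content" in step 2 can come from either
a Secondary write or an earlier copy-on-write for a Primary write; in both
cases the buffered entry is kept and only the Secondary disk is updated.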
>> +
>> +== Architecture ==
>> +We are going to implement COLO block replication from basic building
>> +blocks that already exist in QEMU.
>> +
>> +         virtio-blk       ||
>> +             ^            ||                            .----------
>> +             |            ||                            | Secondary
>> +        1 Quorum          ||                            '----------
>> +         /      \         ||
>> +        /        \        ||
>> +   Primary      2 NBD  ------->  2 NBD
>> +     disk       client    ||     server                                         virtio-blk
>> +                          ||        ^                                                ^
>> +--------.                 ||        |                                                |
>> +Primary |                 ||  Secondary disk <--------- hidden-disk 4 <--------- active-disk 3
>> +--------'                 ||        |          backing        ^       backing
>> +                          ||        |                         |
>> +                          ||        |                         |
>> +                          ||        '-------------------------'
>> +                          ||           drive-backup sync=none
>> +
>> +1) The disk on the primary is represented by a block device with two
>> +children, providing replication between a primary disk and the host that
>> +runs the secondary VM. The read pattern for quorum can be extended to
>> +make the primary always read from the local disk instead of going through
>> +NBD.
>> +
>> +2) The secondary disk receives writes from the primary VM through QEMU's
>> +embedded NBD server (speculative write-through).
>> +
>> +3) The disk on the secondary is represented by a custom block device
>> +(called active-disk). It should be an empty disk, and the format should
>> +be qcow2.
>> +
>> +4) The hidden-disk is created automatically. It buffers the original content
>> +that is modified by the primary VM. It should also be an empty disk, and
>> +its driver must support bdrv_make_empty().
>> +
>> +== New block driver interface ==
>> +We add three block driver interfaces to control block replication:
>> +a. bdrv_start_replication()
>> +   Start block replication, called in migration/checkpoint thread.
>> +   We must call bdrv_start_replication() in secondary QEMU before
>> +   calling bdrv_start_replication() in primary QEMU.
>> +b. bdrv_do_checkpoint()
>> +   This interface is called after all VM state has been transferred to
>> +   Secondary QEMU. The Disk buffer is dropped in this interface.
>> +   The caller must hold the I/O mutex lock if it is in the
>> +   migration/checkpoint thread.
>> +c. bdrv_stop_replication()
>> +   It is called on failover. We flush the Disk buffer into the
>> +   Secondary Disk and stop block replication. The VM should be stopped
>> +   before calling it. The caller must hold the I/O mutex lock if it is
>> +   in the migration/checkpoint thread.
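
The ordering rules for the three interfaces can be sketched as a small
state check. This is only an illustration of the constraints listed above,
written in Python rather than C; the class and method names are
hypothetical and it does not reflect the real function signatures or the
locking.

```python
class ReplicationOrdering:
    """Toy check of the call ordering described above (hypothetical names)."""

    def __init__(self):
        self.started = {"primary": False, "secondary": False}
        self.log = []

    def start(self, side):
        if side not in self.started:
            raise ValueError("side must be 'primary' or 'secondary'")
        # Replication must be started on the secondary before the primary.
        if side == "primary" and not self.started["secondary"]:
            raise RuntimeError("start replication on the secondary first")
        self.started[side] = True
        self.log.append("start:" + side)

    def do_checkpoint(self, vm_state_transferred):
        if not all(self.started.values()):
            raise RuntimeError("replication is not running on both sides")
        # The checkpoint is taken only after all VM state has been transferred.
        if not vm_state_transferred:
            raise RuntimeError("VM state has not been transferred yet")
        self.log.append("checkpoint")

    def stop(self, vm_stopped):
        # The VM should be stopped before replication is stopped on failover.
        if not vm_stopped:
            raise RuntimeError("stop the VM before stopping replication")
        self.started = {"primary": False, "secondary": False}
        self.log.append("stop")
```

A valid sequence in this model is start("secondary"), start("primary"),
any number of do_checkpoint(True) calls, then stop(True) on failover.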
>> +
>> +== Usage ==
>> +Primary:
>> +  -drive if=xxx,driver=quorum,read-pattern=fifo,\
>> +         children.0.file.filename=1.raw,\
>> +         children.0.driver=raw,\
>> +         children.1.file.driver=nbd+colo,\
>> +         children.1.file.host=xxx,\
>> +         children.1.file.port=xxx,\
>> +         children.1.file.export=xxx,\
>> +         children.1.driver=raw,\
>> +         children.1.ignore-errors=on
>    ^^^^^^^^^
> Won't the leading spaces cause trouble? :)
> 
>> +  Note:
>> +  1. NBD Client should not be the first child of quorum.
>> +  2. There should be only one NBD Client.
>> +  3. host is the secondary physical machine's hostname or IP
>> +  4. Each disk must have its own export name.
>> +
>> +Secondary:
>> +  -drive if=none,driver=raw,file=1.raw,id=nbd_target1 \
>> +  -drive if=xxx,driver=qcow2+colo,file=active_disk.qcow2,export=xxx,\
>> +         backing_reference.drive_id=nbd_target1,\
>> +         backing_reference.hidden-disk.file.filename=hidden_disk.qcow2,\
>> +         backing_reference.hidden-disk.driver=qcow2,\
>> +         backing_reference.hidden-disk.allow-write-backing-file=on
>> +  Then run qmp command:
>> +    nbd_server_start host:port
> 
> s/nbd_server_start/nbd-server-start

The command name in qmp-commands.hx is nbd-server-start, but
(qemu) nbd-server-start 192.168.3.1:8889
unknown command: 'nbd-server-start'

What's wrong?

Thanks
Wen Congyang
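
For reference, the (qemu) prompt is the human monitor (HMP), where the
command is spelled with underscores, while nbd-server-start is the QMP
name and is sent as JSON over the QMP socket. A minimal sketch of the two
forms follows; the helper function names are hypothetical, and the
argument layout assumes the inet socket address form of the QMP command.

```python
import json


def nbd_server_start_hmp(host, port):
    # HMP form, typed at the (qemu) prompt.
    return "nbd_server_start {}:{}".format(host, port)


def nbd_server_start_qmp(host, port):
    # QMP form, sent as a JSON object over the QMP connection.
    # Assumption: an inet address with host and port given as strings.
    return json.dumps({
        "execute": "nbd-server-start",
        "arguments": {
            "addr": {
                "type": "inet",
                "data": {"host": host, "port": str(port)},
            }
        },
    })


if __name__ == "__main__":
    print(nbd_server_start_hmp("192.168.3.1", 8889))
    print(nbd_server_start_qmp("192.168.3.1", 8889))
```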

> 
> For a few more too.
> 
> 
>> +  Note:
>> +  1. The export name for the same disk must be the same in primary
>> +     and secondary QEMU command line
>> +  2. The qmp command nbd_server_start must be run before running the
>> +     qmp command migrate on primary QEMU
>> +  3. Don't use nbd_server_start's other options
>> +  4. Active disk, hidden disk and nbd target's length should be the
>> +     same.
>> +  5. It is better to put active disk and hidden disk in ramdisk.
> 
> Fam
> .
> 
