On 01/13/2016 02:18 AM, Changlong Xie wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
> ---
>  docs/block-replication.txt | 229 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 229 insertions(+)
>  create mode 100644 docs/block-replication.txt
> 
> diff --git a/docs/block-replication.txt b/docs/block-replication.txt
> new file mode 100644
> index 0000000..d1a231e
> --- /dev/null
> +++ b/docs/block-replication.txt
> @@ -0,0 +1,229 @@
> +Block replication
> +----------------------------------------
> +Copyright Fujitsu, Corp. 2015
> +Copyright (c) 2015 Intel Corporation
> +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.

Do you want to claim 2016 for any of this?

> +
> +This work is licensed under the terms of the GNU GPL, version 2 or later.
> +See the COPYING file in the top-level directory.
> +
> +Block replication is used for continuous checkpoints. It is designed
> +for COLO (COarse-grain LOck-stepping) where the Secondary VM is running.
> +It can also be applied to the FT/HA (Fault-Tolerance/High Availability)
> +scenario, where the Secondary VM is not running.
> +
> +This document gives an overview of block replication's design.
> +
> +== Background ==
> +High availability solutions such as micro checkpoint and COLO will do
> +consecutive checkpoints. The VM state of Primary VM and Secondary VM is

s/of Primary/of the Primary/

> +identical right after a VM checkpoint, but becomes different as the VM
> +executes until the next checkpoint. To support checkpointing disk contents,
> +the modified disk contents in the Secondary VM must be buffered, and are
> +only dropped at the next checkpoint. To reduce the network transportation
> +effort at the time of checkpoint, the disk modification operations of

s/at the time of/during a vmstate/
s/operations of/operations of the/

> +Primary disk are asynchronously forwarded to the Secondary node.
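
To make the checkpoint-cost argument concrete, here is a toy Python model
(hypothetical class and method names, not QEMU code, no real I/O): each
Primary write is applied locally and queued for the Secondary right away,
so at checkpoint time only the writes still in flight have to be sent.

```python
from collections import deque


class PrimaryForwardingModel:
    """Toy model (hypothetical, not QEMU code) of asynchronous forwarding."""

    def __init__(self):
        self.local = {}           # sector -> data on the Primary disk
        self.secondary = {}       # sector -> data on the Secondary disk
        self.in_flight = deque()  # forwarded writes not yet delivered

    def write(self, sector, data):
        # Apply locally and queue a copy for the Secondary immediately.
        self.local[sector] = data
        self.in_flight.append((sector, data))

    def transfer(self, n):
        # Simulate the network delivering up to n queued writes, in order.
        for _ in range(min(n, len(self.in_flight))):
            sector, data = self.in_flight.popleft()
            self.secondary[sector] = data

    def checkpoint(self):
        # Only the tail of the queue must be flushed at checkpoint time.
        pending = len(self.in_flight)
        self.transfer(pending)
        return pending
```

The checkpoint then pays only for the undelivered tail, not for every
sector dirtied since the previous checkpoint.
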
> +
> +== Workflow ==

> +== Architecture ==

> +
> +6) The drive-backup job(sync=none) is run to allow hidden-disk to buffer

Add a space before '(' in English prose: 'drive-backup job (sync=none)'.

> +any state that would otherwise be lost by the speculative write-through
> +of the NBD server into the secondary disk. Before block replication starts,
> +the primary disk and secondary disk should contain the same data.
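
A toy model might help readers follow item 6. The sketch below assumes
sector-granular dicts and hypothetical method names; it is not QEMU code.
It treats the drive-backup (sync=none) job as copy-before-write into the
hidden disk, so the Secondary VM, reading through active disk -> hidden
disk -> secondary disk, keeps seeing the old contents until the next
checkpoint drops both buffers.

```python
class SecondaryChainModel:
    """Toy model (hypothetical, not QEMU code) of the Secondary's chain."""

    def __init__(self, initial):
        self.secondary = dict(initial)  # written through by the NBD server
        self.hidden = {}  # old contents saved by the backup job (sync=none)
        self.active = {}  # the Secondary VM's own writes

    def nbd_write(self, sector, data):
        # Copy-before-write: save the old contents the first time a sector
        # is overwritten since the last checkpoint, then let the write in.
        if sector not in self.hidden:
            self.hidden[sector] = self.secondary.get(sector)
        self.secondary[sector] = data

    def vm_write(self, sector, data):
        self.active[sector] = data

    def vm_read(self, sector):
        # Reads walk the backing chain from the top down.
        for layer in (self.active, self.hidden, self.secondary):
            if sector in layer:
                return layer[sector]
        return None

    def checkpoint(self):
        # The Secondary VM now matches the Primary, so drop the buffers.
        self.active.clear()
        self.hidden.clear()
```
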
> +
> +== Failure Handling ==

> +== Usage ==
> +Primary:
> +  -drive if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1,\
> +         children.0.file.filename=1.raw,\
> +         children.0.driver=raw
> +
> +  Run the following qmp commands in the primary qemu:
> +    { 'execute': 'human-monitor-command',
> +      'arguments': {
> +          'command-line': 'drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=xxxx,file.port=xxxx,file.export=colo1,node-name=nbd_client1,if=none'

Eww. We shouldn't ever have to pack a command line as a single QMP
string that needs reparsing.  Instead, you should pass the information
as a nested QMP dictionary, something like:

'arguments': {
  'remote-command': { 'command': 'drive_add', 'name': 'buddy',
                      'driver': 'replication', 'mode': 'primary',
                      'file': { 'driver': 'nbd', 'host': 'xxxx',
  ... } } }
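
The packed string is only a mechanical flattening of such a dictionary,
which is why reparsing it buys nothing. As a rough illustration (a
hypothetical helper, covering only the key=value option list and not
drive_add's first positional argument), flattening the nested form
reproduces the option string from the patch:

```python
def flatten_opts(opts, prefix=""):
    """Flatten a nested dict into dotted key=value option syntax.

    Nested dicts become dotted keys (file.driver=nbd), booleans become
    on/off, and commas inside values are doubled, which is how QEMU's
    option parser escapes them.
    """
    parts = []
    for key, value in opts.items():
        name = prefix + key
        if isinstance(value, dict):
            nested = flatten_opts(value, name + ".")
            if nested:  # skip empty sub-dicts
                parts.append(nested)
        elif isinstance(value, bool):
            parts.append(f"{name}={'on' if value else 'off'}")
        else:
            parts.append(f"{name}=" + str(value).replace(",", ",,"))
    return ",".join(parts)


buddy_opts = {
    "driver": "replication", "mode": "primary",
    "file": {"driver": "nbd", "host": "xxxx", "port": "xxxx",
             "export": "colo1"},
    "node-name": "nbd_client1", "if": "none",
}
print("drive_add buddy " + flatten_opts(buddy_opts))
```
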

> +      }
> +    }
> +    { 'execute': 'x-blockdev-change',
> +      'arguments': {
> +          'parent': 'colo1',
> +          'node': 'nbd_client1'
> +      }
> +    }
> +  Note:
> +  1. There should be only one NBD Client for each primary disk.
> +  2. host is the secondary physical machine's hostname or IP address.
> +  3. Each disk must have its own export name.
> +  4. It is all a single argument to -drive and you should ignore the
> +     leading whitespace.
> +  5. These qmp commands must be run after the qmp commands in the
> +     secondary qemu.
> +
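
For note 5, it may help to show that QMP is newline-delimited JSON over
a socket, so the required ordering is simply the order of calls in
whatever drives the two QEMUs. A minimal Python sketch, assuming each
QEMU exposes QMP on a UNIX socket (socket paths and helper names are
hypothetical):

```python
import json
import socket


def qmp_execute(stream, command, arguments=None):
    """Send one QMP command on a text stream and return its "return" value.

    Asynchronous events (objects with an "event" key) are skipped; an
    "error" reply raises RuntimeError.
    """
    request = {"execute": command}
    if arguments is not None:
        request["arguments"] = arguments
    stream.write(json.dumps(request) + "\n")
    stream.flush()
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        reply = json.loads(line)
        if "event" in reply:
            continue
        if "error" in reply:
            raise RuntimeError(reply["error"].get("desc", "QMP error"))
        return reply["return"]


def qmp_connect(path):
    """Open a QMP session on a UNIX socket (path is hypothetical)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    stream = sock.makefile("rw", encoding="utf-8")
    json.loads(stream.readline())            # greeting: {"QMP": {...}}
    qmp_execute(stream, "qmp_capabilities")  # leave negotiation mode
    return stream
```

With one connection per QEMU, note 5 amounts to issuing nbd-server-start
and nbd-server-add on the secondary connection before human-monitor-command
and x-blockdev-change on the primary one.
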
> +Secondary:
> +  -drive if=none,driver=raw,file.filename=1.raw,id=colo1 \
> +  -drive if=xxx,driver=replication,mode=secondary,\
> +         file.file.filename=active_disk.qcow2,\
> +         file.driver=qcow2,\
> +         file.backing.file.filename=hidden_disk.qcow2,\
> +         file.backing.driver=qcow2,\
> +         file.backing.backing=colo1
> +
> +  Then run the following qmp commands in the secondary qemu:
> +    { 'execute': 'nbd-server-start',
> +      'arguments': {
> +          'addr': {
> +              'type': 'inet',
> +              'data': {
> +                  'host': 'xxx',
> +                  'port': 'xxx'
> +              }
> +          }
> +      }
> +    }
> +    { 'execute': 'nbd-server-add',
> +      'arguments': {
> +          'device': 'colo1',
> +          'writable': true
> +      }
> +    }
> +
> +  Note:
> +  1. The export name in the secondary QEMU command line is the secondary
> +     disk's id.
> +  2. The export name for the same disk must be the same in the primary
> +     and secondary QEMU.
> +  3. The qmp commands nbd-server-start and nbd-server-add must be run
> +     before running the qmp command migrate on the primary QEMU.
> +  4. The active disk, hidden disk and nbd target should have the same
> +     length.
> +  5. It is better to put the active disk and hidden disk in a ramdisk.
> +  6. It is all a single argument to -drive, and you should ignore
> +     the leading whitespace.
> +
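
Note 4 is easy to check mechanically: `qemu-img info --output=json`
reports each image's size in bytes as "virtual-size". A sketch, assuming
qemu-img is on PATH and using hypothetical helper names (for the NBD
target, inspect the image behind the secondary disk, 1.raw here):

```python
import json
import subprocess


def qemu_img_info(path):
    """Run `qemu-img info --output=json` on an image and return its stdout."""
    return subprocess.run(
        ["qemu-img", "info", "--output=json", path],
        capture_output=True, text=True, check=True,
    ).stdout


def virtual_size(info_json):
    """Extract the image size in bytes from qemu-img's JSON output."""
    return json.loads(info_json)["virtual-size"]


def same_length(*info_outputs):
    """True when at least one image is given and all report the same size."""
    return len({virtual_size(out) for out in info_outputs}) == 1


# Example (requires qemu-img and the image files):
# same_length(*(qemu_img_info(p) for p in
#               ("active_disk.qcow2", "hidden_disk.qcow2", "1.raw")))
```
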
> +After Failover:
> +Primary:
> +  The secondary host is down, so run the following qmp command
> +  to remove the nbd child from the quorum:
> +  { 'execute': 'x-blockdev-change',
> +    'arguments': {
> +        'parent': 'colo1',
> +        'child': 'children.1'
> +    }
> +  }
> +  Note: there is currently no qmp command to remove the blockdev.
> +
> +Secondary:
> +  The primary host is down, so run the following qmp command:
> +  { 'execute': 'nbd-server-stop' }
> +
> +TODO:
> +1. Continuous block replication
> +2. Shared disk
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org