References: <1452676712-24239-1-git-send-email-xiecl.fnst@cn.fujitsu.com> <1452676712-24239-6-git-send-email-xiecl.fnst@cn.fujitsu.com>
From: Eric Blake
Message-ID: <56B0C945.60100@redhat.com>
Date: Tue, 2 Feb 2016 08:20:37 -0700
In-Reply-To: <1452676712-24239-6-git-send-email-xiecl.fnst@cn.fujitsu.com>
Subject: Re: [Qemu-devel] [PATCH v14 5/8] docs: block replication's description
To: Changlong Xie, qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf, Stefan Hajnoczi
Cc: Gonglei, zhanghailiang, fnstml-hwcolo@cn.fujitsu.com

On 01/13/2016 02:18 AM, Changlong Xie wrote:
> From: Wen Congyang
>
> Signed-off-by: Wen Congyang
> Signed-off-by: zhanghailiang
> Signed-off-by: Gonglei
> Signed-off-by: Changlong Xie
> ---
> docs/block-replication.txt | 229 +++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 229 insertions(+)
> create mode 100644 docs/block-replication.txt
>
> diff --git a/docs/block-replication.txt b/docs/block-replication.txt
> new file mode 100644
> index 0000000..d1a231e
> --- /dev/null
> +++ b/docs/block-replication.txt
> @@ -0,0 +1,229 @@
> +Block replication
> +----------------------------------------
> +Copyright Fujitsu, Corp. 2015
> +Copyright (c) 2015 Intel Corporation
> +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.

Do you want to claim 2016 for any of this?

> +
> +This work is licensed under the terms of the GNU GPL, version 2 or later.
> +See the COPYING file in the top-level directory.
> +
> +Block replication is used for continuous checkpoints. It is designed
> +for COLO (COarse-grain LOck-stepping) where the Secondary VM is running.
> +It can also be applied for FT/HA (Fault-tolerance/High Assurance) scenario,
> +where the Secondary VM is not running.
> +
> +This document gives an overview of block replication's design.
> +
> +== Background ==
> +High availability solutions such as micro checkpoint and COLO will do
> +consecutive checkpoints. The VM state of Primary VM and Secondary VM is

s/of Primary/of the Primary/

> +identical right after a VM checkpoint, but becomes different as the VM
> +executes till the next checkpoint. To support disk contents checkpoint,
> +the modified disk contents in the Secondary VM must be buffered, and are
> +only dropped at next checkpoint time. To reduce the network transportation
> +effort at the time of checkpoint, the disk modification operations of

s/at the time of/during a vmstate/
s/operations of/operations of the/

> +Primary disk are asynchronously forwarded to the Secondary node.
> +
> +== Workflow ==
> +== Architecture ==
> +
> +6) The drive-backup job(sync=none) is run to allow hidden-disk to buffer

Space before ( in English description.

> +any state that would otherwise be lost by the speculative write-through
> +of the NBD server into the secondary disk. So before block replication,
> +the primary disk and secondary disk should contain the same data.
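To make item 6 and the Background paragraph concrete, here is a toy Python model of the Secondary-side chain from the usage example (active_disk.qcow2 -> hidden_disk.qcow2 -> secondary disk colo1). It is only a sketch under stated assumptions: blocks are list/dict entries, the sync=none backup job is reduced to copy-before-write of a block's first overwrite since the last checkpoint, both overlays are assumed to be emptied at each checkpoint, and the names (SecondaryChain, nbd_write, guest_write, ...) are hypothetical, not QEMU code.

```python
"""Toy model of the Secondary-side disk chain (illustration only, not QEMU code)."""


class SecondaryChain:
    """Hypothetical model: active -> hidden -> secondary, one entry per block."""

    def __init__(self, secondary_blocks):
        # Secondary disk: kept in sync with the Primary disk by the NBD server.
        self.secondary = list(secondary_blocks)
        # Hidden disk: checkpoint-time contents saved by the sync=none backup job.
        self.hidden = {}
        # Active disk: writes issued by the Secondary VM itself.
        self.active = {}

    def nbd_write(self, block, data):
        """Primary write forwarded over NBD, written through to the secondary disk."""
        # Copy-before-write: save the old contents only on the first
        # overwrite of this block since the last checkpoint.
        if block not in self.hidden:
            self.hidden[block] = self.secondary[block]
        self.secondary[block] = data

    def guest_write(self, block, data):
        """Secondary VM write: buffered in the active disk only."""
        self.active[block] = data

    def guest_read(self, block):
        """Secondary VM read: walk the backing chain active -> hidden -> secondary."""
        if block in self.active:
            return self.active[block]
        if block in self.hidden:
            return self.hidden[block]
        return self.secondary[block]

    def checkpoint(self):
        """Both VMs are identical again: drop the buffered state (assumption)."""
        self.active.clear()
        self.hidden.clear()


chain = SecondaryChain([b"A", b"B"])
chain.nbd_write(0, b"A2")   # Primary modifies block 0
chain.nbd_write(0, b"A3")   # second write: hidden still holds b"A"
chain.guest_write(1, b"X")  # Secondary VM diverges on block 1
print(chain.guest_read(0), chain.guest_read(1))  # b'A' b'X'
chain.checkpoint()
print(chain.guest_read(0), chain.guest_read(1))  # b'A3' b'B'
```

In this model the Secondary VM sees the last checkpoint plus its own writes, while the secondary disk already tracks the Primary, which is why both disks have to start out with the same data.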
> +
> +== Failure Handling ==
> +== Usage ==
> +Primary:
> +  -drive if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1,\
> +         children.0.file.filename=1.raw,\
> +         children.0.driver=raw
> +
> +  Run qmp command in primary qemu:
> +  { 'execute': 'human-monitor-command',
> +    'arguments': {
> +        'command-line': 'drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=xxxx,file.port=xxxx,file.export=colo1,node-name=nbd_client1,if=none'

Eww. We shouldn't ever have to pack a command line as a single QMP
string that needs reparsing. Instead, you should pass the information
as a nested QMP dictionary, something like:

'arguments': {
  'remote-command': {
    'command': 'drive_add', 'name': 'buddy',
    'driver': 'replication', 'mode': 'primary',
    'file': { 'driver': 'nbd', 'host': 'xxxx', ... }
  }
}

> +    }
> +  }
> +  { 'execute': 'x-blockdev-change',
> +    'arguments': {
> +        'parent': 'colo1',
> +        'node': 'nbd_client1'
> +    }
> +  }
> +  Note:
> +  1. There should be only one NBD Client for each primary disk.
> +  2. host is the secondary physical machine's hostname or IP
> +  3. Each disk must have its own export name.
> +  4. It is all a single argument to -drive and you should ignore the
> +     leading whitespace.
> +  5. The qmp command line must be run after running qmp command line in
> +     secondary qemu.
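For comparison, a small Python sketch of the reparsing that the packed string forces on the receiver, and of the nested dictionary it yields. Assumptions: the helper dotted_opts_to_dict is hypothetical, it ignores QEMU's ',,' escaping and duplicate keys, it keeps every value as a string, and 'remote-command' is only the shape suggested above, not an existing QMP argument.

```python
import json


def dotted_opts_to_dict(opts):
    """Split 'a=1,b.c=2' into {'a': '1', 'b': {'c': '2'}} (hypothetical helper).

    Assumes no ',,' escapes and no duplicate keys; values stay strings.
    """
    result = {}
    for item in opts.split(","):
        key, _, value = item.partition("=")
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            # Dotted prefixes such as 'file.' become nested dictionaries.
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


# Option string copied from the drive_add line quoted above.
packed = ("driver=replication,mode=primary,file.driver=nbd,file.host=xxxx,"
          "file.port=xxxx,file.export=colo1,node-name=nbd_client1,if=none")

# Nested form along the lines suggested above ('remote-command' is hypothetical).
arguments = {
    "remote-command": {
        "command": "drive_add",
        "name": "buddy",
        **dotted_opts_to_dict(packed),
    }
}
print(json.dumps(arguments, indent=2))
```

With the nested form, keys such as file.driver travel already structured in the JSON itself instead of being recovered by a second parser on the monitor side.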
> +
> +Secondary:
> +  -drive if=none,driver=raw,file.filename=1.raw,id=colo1 \
> +  -drive if=xxx,driver=replication,mode=secondary,\
> +         file.file.filename=active_disk.qcow2,\
> +         file.driver=qcow2,\
> +         file.backing.file.filename=hidden_disk.qcow2,\
> +         file.backing.driver=qcow2,\
> +         file.backing.backing=colo1
> +
> +  Then run qmp command in secondary qemu:
> +  { 'execute': 'nbd-server-start',
> +    'arguments': {
> +        'addr': {
> +            'type': 'inet',
> +            'data': {
> +                'host': 'xxx',
> +                'port': 'xxx'
> +            }
> +        }
> +    }
> +  }
> +  { 'execute': 'nbd-server-add',
> +    'arguments': {
> +        'device': 'colo1',
> +        'writable': true
> +    }
> +  }
> +
> +  Note:
> +  1. The export name in secondary QEMU command line is the secondary
> +     disk's id.
> +  2. The export name for the same disk must be the same
> +  3. The qmp command nbd-server-start and nbd-server-add must be run
> +     before running the qmp command migrate on primary QEMU
> +  4. Active disk, hidden disk and nbd target's length should be the
> +     same.
> +  5. It is better to put active disk and hidden disk in ramdisk.
> +  6. It is all a single argument to -drive, and you should ignore
> +     the leading whitespace.
> +
> +After Failover:
> +Primary:
> +  The secondary host is down, so we should run the following qmp command
> +  to remove the nbd child from the quorum:
> +  { 'execute': 'x-blockdev-change',
> +    'arguments': {
> +        'parent': 'colo1',
> +        'child': 'children.1'
> +    }
> +  }
> +  Note: there is no qmp command to remove the blockdev now
> +
> +Secondary:
> +  The primary host is down, so we should do the following thing:
> +  { 'execute': 'nbd-server-stop' }
> +
> +TODO:
> +1. Continuous block replication
> +2. Shared disk
>

-- 
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org