From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41864) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XRdfh-00018g-G3 for qemu-devel@nongnu.org; Wed, 10 Sep 2014 04:54:43 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XRdfa-0002Q3-G6 for qemu-devel@nongnu.org; Wed, 10 Sep 2014 04:54:37 -0400 Received: from mx1.redhat.com ([209.132.183.28]:1422) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XRdfa-0002PC-88 for qemu-devel@nongnu.org; Wed, 10 Sep 2014 04:54:30 -0400 Date: Wed, 10 Sep 2014 16:54:19 +0800 From: Fam Zheng Message-ID: <20140910085419.GB26233@fam-t430.nay.redhat.com> References: <1408723870-7826-1-git-send-email-benoit.canet@nodalink.com> <1408723870-7826-2-git-send-email-benoit.canet@nodalink.com> <20140825060424.GA17482@T430.nay.redhat.com> <20140825090611.GA18202@nodalink.com> <20140825093737.GA25434@T430.nay.redhat.com> <20140825121219.GB18202@nodalink.com> <20140826044204.GB2517@T430.nay.redhat.com> <20140826064554.GA13982@nodalink.com> <20140904204256.GF25226@stefanha-thinkpad.redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <20140904204256.GF25226@stefanha-thinkpad.redhat.com> Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH] block: Make op blockers recursive List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Stefan Hajnoczi Cc: kwolf@redhat.com, jcody@redhat.com, stefanha@redhat.com, =?iso-8859-1?Q?Beno=EEt?= Canet , qemu-devel@nongnu.org On Thu, 09/04 21:42, Stefan Hajnoczi wrote: > On Tue, Aug 26, 2014 at 06:45:54AM +0000, Beno=EEt Canet wrote: > > On Tue, Aug 26, 2014 at 12:42:04PM +0800, Fam Zheng wrote: > > > On Mon, 08/25 12:12, Beno=EEt Canet wrote: > > > > On Mon, Aug 25, 2014 at 05:37:37PM +0800, Fam Zheng wrote: > > > > > On Mon, 08/25 09:06, Beno=EEt Canet wrote: > > > > > > On Mon, Aug 25, 2014 at 02:04:24PM +0800, Fam Zheng wrote: > > > > > > > On Fri, 08/22 18:11, Beno=EEt Canet wrote: > > > > > > > > Since the block layer code is starting to modify the BDS = graph right in the > > > > > > > > middle of BDS chains (block-mirror's replace parameter fo= r example) QEMU needs > > > > > > > > to properly block and unblock whole BDS subtrees; recursi= on is a neat way to > > > > > > > > achieve this task. > > > > > > > >=20 > > > > > > > > This patch also takes care of modifying the op blockers u= sers. > > > > > > >=20 > > > > > > > Is this going to replace backing_blocker? > > > > > > >=20 > > > > > > > I think it is too general an approach to control the operat= ion properly, > > > > > > > because the op blocker may not work in the same way for all= types of BDS > > > > > > > connections. In other words, the choosing of op blockers a= re likely > > > > > > > conditional on graph edge types, that's why backing_blocker= was added here. For > > > > > > > example, A VMDK extent connection will probably need a diff= erent set of > > > > > > > blockers than bs->file connection. > > > > > > >=20 > > > > > > > So could you explain in which cases is the recursive blocki= ng/unblocking > > > > > > > useful? > > > > > >=20 > > > > > > It's designed for the new crop of block operations operating = on BDS located in > > > > > > the middle of the backing chain: Jeff's patches, intermediate= live streaming or > > > > > > intermediate mirroring. > > > > > > Recursively blocking BDS allows to do these operations safely. > > > > >=20 > > > > > Sorry I may be slow on this, but it's still not clear to me. > > > > >=20 > > > > > That doesn't immediately show how backing_blocker doesn't work.= These > > > > > operations are in the category of operations that update graph = topology, > > > > > meaning that they drop, add or swap some nodes in the middle of= the chain. It > > > > > is not safe because there are used by the other nodes, but they= are supposed to > > > > > be protected by backing_blocker. Could you be more specific? > > > >=20 > > > > I don't know particularly about the backing blocker case. > > > >=20 > > > > >=20 > > > > > I can think of something more than backing_hd: there are also l= ink types other > > > > > than backing_hd, for example ->file, (vmdk)->extents or (quorum= )->qcrs, etc. > > > > > They should be protected as well. > > > >=20 > > > > This patch takes cares of recursing everywhere. > > > >=20 > > > > I can give you an example for quorum. > > > >=20 > > > > If a streaming operation is running on a quorum block backend the= recursive > > > > blocking will help to block any operation done directly on any of= the children. > > >=20 > > > At what points should block layer recursively block/unblock the ope= rations in > > > this quorum case? > >=20 > > When the streaming starts it should block all the top bs children. > > So after when an operation tries to operate on a child of the top bs = it will be > > forbidden. > >=20 > > The beauty of it is that recursive blockers can easily replace regula= r blockers. >=20 > Let's think of a situation that recursive blockers protect but > backing_blocker does not: >=20 > a <- b <- c <- d >=20 > c is the backing file and is therefore protected by the op blocker. >=20 > The block-commit command works with node-names, however, so we can > manipulate any nodes in the graph, not just the topmost one. Try this: >=20 > block-commit d > block-commit b >=20 > I haven't checked yet but I suspect it will launch two block-commit job= s > on the same partial chain (that's a bad thing because it can lead to > corruption). 1) Does block-commit work with node-names already? In other words, is block-commit b possible now? I only see drive-mirror works with it, bu= t not drive-backup, block-mirror or block-commit. 2) Regardless of the answer to 1), I think we could use a similar approac= h as drive-backup here: split BLOCK_OP_TYPE_COMMIT to BLOCK_OP_TYPE_COMMIT_{SOURCE,TARGET}, and only unblock BLOCK_OP_TYPE_COMMIT_TARGET in bdrv_set_backing_hd. Unblocking BLOCK_OP_TYPE_COMMIT on backing_hd is wrong as long as we expo= se "d" to block-commit, with node-name. As the next step, let's think about what it takes to safely allow commit = d to b with: a <- b <- c <- d <- e <- f <- g I think the answer is: if the commit job checks and sets blockers on rang= e [d, b], it is safe. Because from e, f and g's points of view, the whole backing c= hain is always complete and working, just like how device model thinks of the = top node when it's being active committed nowadays. Based on that, we can even start commit on non-intersecting partial chain= s at the same time: block-commit g up to e block-commit d up to b , if we want to. That is only possible with a more specific blocking logic, instead of rec= ursive blocker. Fam >=20 > With recursive blockers, not just c but also a and b are protected. >=20 > To me, this demonstrates that recursive op blockers are safer than > non-recursive backing_blocker and will prevent real-world cases that > could lead to corruption. >=20 > BTW I'm fairly happy with the patch itself, but like Fam I'm still > thinking through different cases to convince myself the design is sound. >=20 > Stefan