From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:56207) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XRiit-0007Li-IW for qemu-devel@nongnu.org; Wed, 10 Sep 2014 10:18:21 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XRiin-0003Qo-DJ for qemu-devel@nongnu.org; Wed, 10 Sep 2014 10:18:15 -0400 Received: from dew.nodalink.com ([95.130.14.197]:43168) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XRiin-0003QF-3p for qemu-devel@nongnu.org; Wed, 10 Sep 2014 10:18:09 -0400 Date: Wed, 10 Sep 2014 14:18:06 +0000 From: =?iso-8859-1?Q?Beno=EEt?= Canet Message-ID: <20140910141806.GB30703@nodalink.com> References: <1408723870-7826-1-git-send-email-benoit.canet@nodalink.com> <1408723870-7826-2-git-send-email-benoit.canet@nodalink.com> <20140825060424.GA17482@T430.nay.redhat.com> <20140825090611.GA18202@nodalink.com> <20140825093737.GA25434@T430.nay.redhat.com> <20140825121219.GB18202@nodalink.com> <20140826044204.GB2517@T430.nay.redhat.com> <20140826064554.GA13982@nodalink.com> <20140904204256.GF25226@stefanha-thinkpad.redhat.com> <20140910085419.GB26233@fam-t430.nay.redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <20140910085419.GB26233@fam-t430.nay.redhat.com> Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH] block: Make op blockers recursive List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Fam Zheng Cc: kwolf@redhat.com, Stefan Hajnoczi , jcody@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com, =?iso-8859-1?Q?Beno=EEt?= Canet On Wed, Sep 10, 2014 at 04:54:19PM +0800, Fam Zheng wrote: > On Thu, 09/04 21:42, Stefan Hajnoczi wrote: > > On Tue, Aug 26, 2014 at 06:45:54AM +0000, Beno=EEt Canet wrote: > > > On Tue, Aug 26, 2014 at 12:42:04PM +0800, Fam Zheng wrote: > > > > On Mon, 08/25 12:12, Beno=EEt Canet wrote: > > > > > On Mon, Aug 25, 2014 at 05:37:37PM +0800, Fam Zheng wrote: > > > > > > On Mon, 08/25 09:06, Beno=EEt Canet wrote: > > > > > > > On Mon, Aug 25, 2014 at 02:04:24PM +0800, Fam Zheng wrote: > > > > > > > > On Fri, 08/22 18:11, Beno=EEt Canet wrote: > > > > > > > > > Since the block layer code is starting to modify the BD= S graph right in the > > > > > > > > > middle of BDS chains (block-mirror's replace parameter = for example) QEMU needs > > > > > > > > > to properly block and unblock whole BDS subtrees; recur= sion is a neat way to > > > > > > > > > achieve this task. > > > > > > > > >=20 > > > > > > > > > This patch also takes care of modifying the op blockers= users. > > > > > > > >=20 > > > > > > > > Is this going to replace backing_blocker? > > > > > > > >=20 > > > > > > > > I think it is too general an approach to control the oper= ation properly, > > > > > > > > because the op blocker may not work in the same way for a= ll types of BDS > > > > > > > > connections. In other words, the choosing of op blockers= are likely > > > > > > > > conditional on graph edge types, that's why backing_block= er was added here. For > > > > > > > > example, A VMDK extent connection will probably need a di= fferent set of > > > > > > > > blockers than bs->file connection. > > > > > > > >=20 > > > > > > > > So could you explain in which cases is the recursive bloc= king/unblocking > > > > > > > > useful? > > > > > > >=20 > > > > > > > It's designed for the new crop of block operations operatin= g on BDS located in > > > > > > > the middle of the backing chain: Jeff's patches, intermedia= te live streaming or > > > > > > > intermediate mirroring. > > > > > > > Recursively blocking BDS allows to do these operations safe= ly. > > > > > >=20 > > > > > > Sorry I may be slow on this, but it's still not clear to me. > > > > > >=20 > > > > > > That doesn't immediately show how backing_blocker doesn't wor= k. These > > > > > > operations are in the category of operations that update grap= h topology, > > > > > > meaning that they drop, add or swap some nodes in the middle = of the chain. It > > > > > > is not safe because there are used by the other nodes, but th= ey are supposed to > > > > > > be protected by backing_blocker. Could you be more specific? > > > > >=20 > > > > > I don't know particularly about the backing blocker case. > > > > >=20 > > > > > >=20 > > > > > > I can think of something more than backing_hd: there are also= link types other > > > > > > than backing_hd, for example ->file, (vmdk)->extents or (quor= um)->qcrs, etc. > > > > > > They should be protected as well. > > > > >=20 > > > > > This patch takes cares of recursing everywhere. > > > > >=20 > > > > > I can give you an example for quorum. > > > > >=20 > > > > > If a streaming operation is running on a quorum block backend t= he recursive > > > > > blocking will help to block any operation done directly on any = of the children. > > > >=20 > > > > At what points should block layer recursively block/unblock the o= perations in > > > > this quorum case? > > >=20 > > > When the streaming starts it should block all the top bs children. > > > So after when an operation tries to operate on a child of the top b= s it will be > > > forbidden. > > >=20 > > > The beauty of it is that recursive blockers can easily replace regu= lar blockers. > >=20 > > Let's think of a situation that recursive blockers protect but > > backing_blocker does not: > >=20 > > a <- b <- c <- d > >=20 > > c is the backing file and is therefore protected by the op blocker. > >=20 > > The block-commit command works with node-names, however, so we can > > manipulate any nodes in the graph, not just the topmost one. Try thi= s: > >=20 > > block-commit d > > block-commit b > >=20 > > I haven't checked yet but I suspect it will launch two block-commit j= obs > > on the same partial chain (that's a bad thing because it can lead to > > corruption). >=20 > 1) Does block-commit work with node-names already? In other words, is > block-commit b possible now? I only see drive-mirror works with it, = but not > drive-backup, block-mirror or block-commit. >=20 > 2) Regardless of the answer to 1), I think we could use a similar appro= ach as > drive-backup here: split BLOCK_OP_TYPE_COMMIT to > BLOCK_OP_TYPE_COMMIT_{SOURCE,TARGET}, and only unblock > BLOCK_OP_TYPE_COMMIT_TARGET in bdrv_set_backing_hd. I'll try to implement this while being at it. >=20 > Unblocking BLOCK_OP_TYPE_COMMIT on backing_hd is wrong as long as we ex= pose "d" > to block-commit, with node-name. >=20 > As the next step, let's think about what it takes to safely allow commi= t d to b > with: >=20 > a <- b <- c <- d <- e <- f <- g >=20 > I think the answer is: if the commit job checks and sets blockers on ra= nge [d, b], > it is safe. Because from e, f and g's points of view, the whole backing= chain > is always complete and working, just like how device model thinks of th= e top node > when it's being active committed nowadays. >=20 > Based on that, we can even start commit on non-intersecting partial cha= ins at > the same time: >=20 > block-commit g up to e > block-commit d up to b >=20 > , if we want to. >=20 > That is only possible with a more specific blocking logic, instead of r= ecursive > blocker. Here your advice is the same as Kevin's one: make op blocker work on rang= es. I'll change them to do so. Thanks for the input. Beno=EEt >=20 > Fam >=20 > >=20 > > With recursive blockers, not just c but also a and b are protected. > >=20 > > To me, this demonstrates that recursive op blockers are safer than > > non-recursive backing_blocker and will prevent real-world cases that > > could lead to corruption. > >=20 > > BTW I'm fairly happy with the patch itself, but like Fam I'm still > > thinking through different cases to convince myself the design is sou= nd. > >=20 > > Stefan >=20 >=20