From: Max Reitz
To: Kevin Wolf
Cc: Fam Zheng, qemu-devel@nongnu.org, berrange@redhat.com, John Snow,
 qemu-block@nongnu.org, rjones@redhat.com, Jeff Cody,
 Markus Armbruster, stefanha@redhat.com, den@openvz.org,
 pbonzini@redhat.com, eblake@redhat.com
Date: Mon, 24 Oct 2016 20:03:22 +0200
Subject: Re: [Qemu-devel] [PATCH v8 00/36] block: Image locking series
References: <1475237406-26917-1-git-send-email-famz@redhat.com>
 <20161024101111.GB4374@noname.redhat.com>
In-Reply-To: <20161024101111.GB4374@noname.redhat.com>

On 24.10.2016 12:11, Kevin Wolf wrote:
[...]
> Now, the big question is how to translate this into file locking. This
> could become a little tricky. I had a few thoughts involving another
> lock on byte 2, but none of them actually worked out so far, because
> what we want is essentially a lock that can be shared by readers, that
> can also be shared by writers, but not by readers and writers at the
> same time.

You can also share it between readers and writers, as long as everyone
can cope with volatile data.

I agree that it's very similar to the proposed op blocker style, but I
can't really come up with a meaningful translation either.

Maybe something like this (?): All readers who do not want the file to
be modified grab a shared lock on byte 1. All writers who can deal with
volatile data grab a shared lock on byte 2. Exclusive writers grab an
exclusive lock on bytes 1 and 2. Readers who can cope with volatile
data get no lock at all.

When opening, the first and second groups would always have to test
whether there is a lock on the other byte, respectively. E.g. sharing
writers would first grab an exclusive lock on byte 1, then the shared
lock on byte 2, and then release the exclusive lock again.

Would that work?
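In code, I imagine the open-time dance roughly like this (completely
untested sketch; I'm assuming Linux OFD locks so that the locks belong
to the open file description rather than the process, and all the
helper names are made up -- plain POSIX locks would behave the same
way here, just with the usual close()-drops-all-locks caveat):

#define _GNU_SOURCE
#include <fcntl.h>

/* Set (or clear) a non-blocking byte-range lock; returns 0 on
 * success, -1 if someone else holds a conflicting lock. */
static int lock_bytes(int fd, short type, off_t start, off_t len)
{
    struct flock fl = {
        .l_type   = type, /* F_RDLCK, F_WRLCK or F_UNLCK */
        .l_whence = SEEK_SET,
        .l_start  = start,
        .l_len    = len,
    };
    return fcntl(fd, F_OFD_SETLK, &fl);
}

/* Exclusive writer: exclusive lock on both bytes */
static int lock_exclusive_writer(int fd)
{
    return lock_bytes(fd, F_WRLCK, 1, 2);
}

/* Sharing writer: probe byte 1 exclusively (fails if a stable reader
 * or an exclusive writer is around), take the real lock -- shared on
 * byte 2 -- and then drop the probe lock again. */
static int lock_sharing_writer(int fd)
{
    if (lock_bytes(fd, F_WRLCK, 1, 1) < 0) {
        return -1;
    }
    if (lock_bytes(fd, F_RDLCK, 2, 1) < 0) {
        lock_bytes(fd, F_UNLCK, 1, 1);
        return -1;
    }
    return lock_bytes(fd, F_UNLCK, 1, 1);
}

/* Stable reader: the same dance, mirrored */
static int lock_stable_reader(int fd)
{
    if (lock_bytes(fd, F_WRLCK, 2, 1) < 0) {
        return -1;
    }
    if (lock_bytes(fd, F_RDLCK, 1, 1) < 0) {
        lock_bytes(fd, F_UNLCK, 2, 1);
        return -1;
    }
    return lock_bytes(fd, F_UNLCK, 2, 1);
}

/* Readers that can cope with volatile data take no lock at all. */

(Two sharing writers opening at the same time can of course collide on
the byte 1 probe, but since the probe is only held for an instant,
retrying should be good enough for that.)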
>> So as far as I understand it, those qdev properties should eventually
>> be changeable by the guest. And if that is so, the user probably
>> shouldn't touch them at all because the guest OS really knows whether
>> it can cope with volatile block devices or not, and it will tell qemu
>> if that is so.
>
> This is an interesting thought for PV devices, but it hasn't been part
> of the plan so far. I'm not sure if the guest OS block device drivers
> even have this information generally, because that's more a matter of
> whether a cluster file system is used on the block device or not.

>> That may be the reason why the qdev property doesn't make sense to me
>> at first glance: It does make sense for the guest OS to set this
>> property at that level, but it doesn't make sense for the user to do
>> so. Or maybe it does, but then the user is really asking for things
>> to break ("YES, I AM SURE THAT MY GUEST WILL BE COMPLETELY FINE WITH
>> FLOPPY DISKS JUST CHANGING RANDOMLY" -- but I guess we've all been at
>> that point ourselves...).
>>
>> So after having convinced myself twice now, having the qdev property
>> is probably correct.
>
> Essentially the whole reason is that we need the information and the
> guest doesn't tell us, so we need to ask someone who supposedly knows
> enough about the guest - which is the user.

Hm, OK, fair enough.

>> But that's where the flags come in again: The guest may be fine with
>> a shared lock or none at all. But what about the block layer? Say
>> you're using a qcow2 image; that will not like sharing at all,
>> independently of what the guest thinks. So the block layer needs to
>> have internal mechanisms to elevate the locks proposed by the guest
>> devices to e.g. exclusive ones. I don't think that is something
>> addressed by this series in this version. Maybe you'll still need the
>> flags for that.
>
> I'm not completely sure about the mechanism and whether it should
> involve flags, but yes, you need something to propagate the required
> locking mode down recursively (just like op blockers will need to be
> done - maybe we should really use the same infrastructure and only
> do file locking as its implementation of the lowest layer).

Sounds good to me, but that implies that we have the new op blockers
rather soon. ;-)
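Just to make sure we mean the same thing by "propagate the required
locking mode down recursively" -- roughly this, I suppose? (Stand-in
types and names, nothing of this exists in the tree, and error
handling is omitted.)

typedef enum {
    LOCK_NONE,      /* anyone may read and write */
    LOCK_SHARED,    /* shared among writers, data is volatile */
    LOCK_EXCLUSIVE, /* no sharing at all */
} LockMode;

typedef struct Node Node;
struct Node {
    LockMode own_requirement; /* e.g. qcow2: LOCK_EXCLUSIVE if writable */
    Node *file;               /* underlying node, NULL at the bottom */
    int fd;                   /* only meaningful at the bottom */
};

static void propagate_lock_mode(Node *node, LockMode required)
{
    /* A format driver may need more than its users do, so only ever
     * escalate, never weaken: */
    if (node->own_requirement > required) {
        required = node->own_requirement;
    }

    if (node->file) {
        propagate_lock_mode(node->file, required);
    } else if (required == LOCK_EXCLUSIVE) {
        /* Lowest layer: implement the mode, e.g. with the byte-range
         * locks sketched above */
        lock_exclusive_writer(node->fd);
    } else if (required == LOCK_SHARED) {
        lock_sharing_writer(node->fd);
    }
    /* LOCK_NONE: nothing to do */
}

The qdev property would then just feed the initial value of "required"
in at the top.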
> What you don't need, however, is user input. We already know that
> qcow2 doesn't like concurrent writes.
>
>> Final thing: Say you have a user who's crazy. Maybe they want to
>> debug something. Anyway, they have some qcow2 image attached to some
>> guest device, and they want to access that image simultaneously from
>> some other process; as I said, crazy, but there may be legitimate
>> uses for that, so we should allow it.
>>
>> Now first that user of course has to set the device's lock_mode to
>> shared or none, thus promising qemu that the guest will be fine with
>> volatile data. But of course qcow2 will override that lock mode,
>> because qcow2 doesn't care about the guest: It primarily cares about
>> its own metadata integrity. So at this point the user will need some
>> way to override qcow2's choice, too. Therefore, I think we also need
>> a per-node flag to override the locking mode.
>>
>> qcow2 would probably make this flag default to exclusive for its
>> underlying node, and the user would then have to override that flag
>> for exactly that underlying node.
>>
>> Therefore I think we still need a block layer property for the lock
>> mode. While I now agree that we do need a qdev property, I think that
>> alone won't be sufficient.
>
> Attaching a qcow2 image to two processes simply doesn't work unless
> both are read-only, so I disagree that we should allow it. Or can you
> really come up with a specific reasonable use case that requires this?

Yes, it's more fun if you know that all of your data can suddenly be
broken beyond repair. It gives you this great kick of excitement.

> Users crazy enough to want something like this patch qemu. Both of us
> have patched qemu for debugging guests before, and we didn't do things
> as crazy as writing to the same qcow2 image from multiple processes.

Now that you mention it, indeed I have not. I wonder why. I should give
it a try some day.

OK, agreed that we don't need a block layer flag. If someone later
finds out that we do for some weird reason, we can still add it.

Max