From: "Dr. David Alan Gilbert"
Date: Wed, 6 Jun 2018 12:37:21 +0100
To: Max Reitz
Cc: Michal Suchánek, Kevin Wolf, ehabkost@redhat.com, qemu-block@nongnu.org,
    "Michael S. Tsirkin", "Richard W.M. Jones", qemu-devel@nongnu.org,
    stefanha@redhat.com
Subject: Re: [Qemu-devel] storing machine data in qcow images?
Message-ID: <20180606113720.GA2661@work-vm>
In-Reply-To: <93233bff-604b-c891-90ce-64fe1eaaaab5@redhat.com>
References: <20180528183058.GG2209@redhat.com>
    <20180528183833.GJ4580@localhost.localdomain>
    <20180528212054.GH2209@redhat.com>
    <20180528212510.GC4660@redhat.com>
    <20180529064415.GA4756@localhost.localdomain>
    <2b3eef00-f326-c1e6-0e4b-b7602646eec4@redhat.com>
    <20180606123237.2235ae4a@kitsune.suse.cz>
    <20180606131929.44d0fd6b@kitsune.suse.cz>
    <93233bff-604b-c891-90ce-64fe1eaaaab5@redhat.com>

* Max Reitz (mreitz@redhat.com) wrote:
> On 2018-06-06 13:19, Michal Suchánek wrote:
> > On Wed, 6 Jun 2018 13:02:53 +0200
> > Max Reitz wrote:
> >
> >> On 2018-06-06 12:32, Michal Suchánek wrote:
> >>> On Tue, 29 May 2018 12:14:15 +0200
> >>> Max Reitz wrote:
> >>>
> >>>> On 2018-05-29 08:44, Kevin Wolf wrote:
> >>>>> On 28.05.2018 at 23:25, Richard W.M. Jones wrote:
> >>>>>> On Mon, May 28, 2018 at 10:20:54PM +0100, Richard W.M. Jones
> >>>>>> wrote:
> >>>>>>> On Mon, May 28, 2018 at 08:38:33PM +0200, Kevin Wolf wrote:
> >>>>>>>> Just accessing the image file within a tar archive is possible
> >>>>>>>> and we could write a block driver for that (I actually think we
> >>>>>>>> should do this), but it restricts you because certain
> >>>>>>>> operations like resizing aren't really possible in tar.
> >>>>>>>> Unfortunately, resizing is a really common operation for
> >>>>>>>> non-raw image formats.
> >>>>>>>
> >>>>>>> We do this already in virt-v2v (using file.offset and file.size
> >>>>>>> parameters in the raw driver).
> >>>>>>>
> >>>>>>> For virt-v2v we only need to read the source so resizing isn't
> >>>>>>> an issue. For most of the cases we're talking about the
> >>>>>>> downloaded image would also be a template / base image, so I
> >>>>>>> suppose only reading would be required too.
> >>>>>>>
> >>>>>>> I also wrote an nbdkit tar file driver (supports writes, but not
> >>>>>>> resizing).
> >>>>>>> https://manpages.debian.org/testing/nbdkit-plugin-perl/nbdkit-tar-plugin.1.en.html
> >>>>>>
> >>>>>> I should add that the other thorny issue with OVA files is that the
> >>>>>> metadata contains a checksum (SHA1 or SHA256) of the disk images.
> >>>>>> If you modify the disk images in-place in the tar file then you
> >>>>>> need to recalculate those.
> >>>>>
> >>>>> All of this means that OVA isn't really well suited to be used as
> >>>>> a native format for VM configuration + images. It's just for
> >>>>> sharing read-only images that are converted into another native
> >>>>> format before they are used.
> >>>>>
> >>>>> Which is probably fair for the use case it was made for, but means
> >>>>> that we need something else to solve our problem.
> >>>>
> >>>> Maybe we should first narrow down our problem. Maybe you have done
> >>>> that already, but I'm quite in the dark still.
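Richard's approach of reading a disk image in place inside an OVA (a plain tar archive), together with his point about the manifest checksums, can be sketched in Python. This is a minimal illustration under stated assumptions, not virt-v2v's actual code: it finds a member's data bytes with the standard `tarfile` module (relying on `TarInfo.offset_data`, which CPython sets when reading an archive), wraps that extent in a QEMU `json:` pseudo-filename that assumes the format driver can be stacked on the raw driver's `offset`/`size` options, and recomputes an OVF-manifest-style `SHA256(name)= digest` line. The helper names `member_extent`, `qemu_json_filename` and `manifest_line` are hypothetical.

```python
import hashlib
import json
import tarfile


def member_extent(ova_path, member_name):
    """Return (offset, size) of a member's data bytes inside an uncompressed tar."""
    with tarfile.open(ova_path, mode="r:") as tar:
        info = tar.getmember(member_name)
        if not info.isfile():
            raise ValueError(f"{member_name} is not a regular file")
        # offset_data points past the member's header(s) to its contents.
        return info.offset_data, info.size


def qemu_json_filename(ova_path, member_name, image_format="qcow2"):
    """Build a json: pseudo-filename exposing one tar member as a disk image.

    Assumption: the raw driver's offset/size options restrict the view to the
    member's bytes, and the image format driver sits on top of that view.
    """
    offset, size = member_extent(ova_path, member_name)
    spec = {
        "driver": image_format,
        "file": {
            "driver": "raw",
            "offset": offset,
            "size": size,
            "file": {"driver": "file", "filename": ova_path},
        },
    }
    return "json:" + json.dumps(spec)


def manifest_line(ova_path, member_name, algo="sha256"):
    """Recompute an OVF-manifest-style checksum line for one tar member."""
    offset, size = member_extent(ova_path, member_name)
    digest = hashlib.new(algo)
    with open(ova_path, "rb") as f:
        f.seek(offset)
        remaining = size
        while remaining:
            chunk = f.read(min(1 << 20, remaining))  # hash in 1 MiB pieces
            if not chunk:
                raise IOError("archive is truncated")
            digest.update(chunk)
            remaining -= len(chunk)
    return f"{algo.upper()}({member_name})= {digest.hexdigest()}"
```

Used read-only, this matches the template/base-image case Richard describes. Writing through the same extent would work, but resizing would not, and any in-place write leaves the manifest line stale until it is recomputed.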
> >>>>
> >>>> The original problem was that you need to supply a machine type to
> >>>> qemu, and that multiple common architectures now have multiple
> >>>> machine types and not necessarily all work with a single image. So
> >>>> far so good, but I have two issues here already:
> >>>>
> >>>> (1) How is qemu supposed to interpret that information? If it's
> >>>> stored in the image file, I don't see a nice way of retrieving it
> >>>> before the machine is initialized, at least not with qemu's current
> >>>> architecture. Once we support configuring qemu solely through QMP,
> >>>> sure, you can do a blockdev-add and then build the machine
> >>>> accordingly. But that is not here today, and I'm not sure this is
> >>>> a good idea either, because that would mean automagic defaults for
> >>>> the machine-building QMP commands derived from the blockdev-add
> >>>> earlier, which should get a plain "No". Also, having to use QMP to
> >>>> build your machine wouldn't make anything easier; at least not
> >>>> easier than just supplying a configuration file along with the
> >>>> image.
> >>>>
> >>>> (Building the magic into -blockdev might be less horrible, but such
> >>>> magic (adding block devices influences machine defaults) to me
> >>>> still doesn't seem worth not having to supply a config file along
> >>>> with the disk image.)
> >>>>
> >>>> (2) Again, I personally just really don't like saving such
> >>>> information in a disk image. One actual argument I can bring up
> >>>> for that distaste is this: Suppose you have multiple images
> >>>> attached to your VM. Now the VM wants to store the machine type.
> >>>> Where does it go? Into all of them? But some of those images may
> >>>> only contain data and might be intended to be shared between
> >>>> multiple VMs. So those shouldn't receive the mark. Only disks
> >>>> with binaries should receive them. But what if those binaries are
> >>>> just cross-compiled binaries for some other VM?
> >>>> Oh no, so not
> >>>> even binaries are a sure indicator... So I have no idea where the
> >>>> information is supposed to be stored. In any case, "the first
> >>>> image" just gets an outright "no" from me, and "all images" gets
> >>>> an "I don't think this is a good idea".
> >>>>
> >>>> Loading is fun, too. OK, so you attach multiple disk images to a
> >>>> VM. Oops, they have varying machine type information... Now
> >>>> what? Use the information from the first one? Definitely no.
> >>>> Just ignore all of the information in such a case and have the
> >>>> user supply the machine type again? Possible, but it seems weird
> >>>> to me that qemu would usually guess the machine type, but once you
> >>>> attach some random other image to it, it suddenly fails to do
> >>>> that. But maybe it's just me who thinks this is weird.
> >>>>
> >>>>
> >>>> OK, so let's go a step further. We have stored the machine type
> >>>> information in order to not have to supply a config file with the
> >>>> qcow2 image -- because if we did, it could just contain the machine
> >>>> type and that would be it.
> >>>>
> >>>> So to me it follows naturally that just storing the machine type
> >>>> doesn't make much sense if we cannot also store more VM
> >>>> configuration in a qcow2 file, because I don't see why you should
> >>>> be able to ship an image without a config file only if all you
> >>>> need to supply is a machine type. Often, you also need to supply
> >>>> how much memory the VM needs (which depends on the OS on the
> >>>> image) or what storage controller to use (does the OS have virtio
> >>>> drivers? (to be fair, it usually does, because you're supplying a
> >>>> VM image in the first place)).
> >>>>
> >>>> So I think if we decide to store the machine type, that is kind of
> >>>> a slippery slope and then there are good arguments for storing
> >>>> even more configuration options in the file, too. But I really,
> >>>> really don't like that.
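Max's loading scenario, where several attached images carry different machine-type hints, can be made concrete with a small and purely hypothetical resolution policy. QEMU has no such feature; `resolve_machine_type` and `MachineTypeConflict` are invented names for this sketch. An explicit user choice wins, images without a hint (such as shared data disks) are ignored, and conflicting hints produce an error instead of trusting "the first image".

```python
from typing import Dict, Iterable, List, Optional, Tuple


class MachineTypeConflict(Exception):
    """Raised when attached images disagree and no machine type was given."""


def resolve_machine_type(
    image_hints: Iterable[Tuple[str, Optional[str]]],
    user_choice: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical policy for choosing a machine type from per-image hints.

    image_hints holds (image_name, hint) pairs; hint is None for images that
    carry no machine-type information, such as shared data disks.
    """
    if user_choice is not None:
        return user_choice  # an explicit machine type always wins

    images_by_type: Dict[str, List[str]] = {}
    for name, hint in image_hints:
        if hint is not None:
            images_by_type.setdefault(hint, []).append(name)

    if not images_by_type:
        return None  # no hints at all: the caller falls back to its default
    if len(images_by_type) > 1:
        # Refuse to guess rather than picking whichever image came first.
        detail = "; ".join(
            f"{mtype}: {', '.join(names)}"
            for mtype, names in sorted(images_by_type.items())
        )
        raise MachineTypeConflict(f"attached images disagree on machine type ({detail})")
    return next(iter(images_by_type))
```

For example, `resolve_machine_type([("os.qcow2", "q35"), ("data.qcow2", None)])` returns `"q35"`, but adding `("other.qcow2", "pc")` to the list raises `MachineTypeConflict`. That is the behaviour Max finds weird: attaching one more image turns a working guess into a failure.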
> >>>>
> >>>> For one thing, I suspect it to get really ugly implementation-wise.
> >>>> Getting the machine type out of a disk image and actually
> >>>> interpreting it automatically is bad enough, but getting possibly
> >>>> everything out of it? It's not going to be any better.
> >>>>
> >>>> For another, how do we store the data? key-value seems wrong if we
> >>>> want to store everything. JSON might be fine. But eventually we
> >>>> just want basically a qemu configuration file in there, I would
> >>>> think (which may support JSON at some point?). So basically we
> >>>> would store the data as a binary blob and let the rest of qemu do
> >>>> its thing with it. But then please tell me why I fought so
> >>>> valiantly against storing random bitmaps in qcow2 files.
> >>>
> >>> Yes, I wonder. Why did you?
> >>
> >> That was mostly directed at Kevin.
> >>
> >> My reasoning was that a qcow2 file is a disk image. All data stored
> >> therein should be immediately associated with the stored data.
> >> Another reason was that from the perspective of qcow2 you don't lose
> >> anything by tying the bitmaps directly to that data; all we lost was
> >> the capability of storing bitmaps for unrelated raw files.
> >>
> >> (And the reasoning for that is "if you want features, use qcow2" --
> >> although R/W backing files may loosen that phrase.)
> >>
> >>>> I hate the idea of making qcow2 a random archive format.
> >>>
> >>> What's wrong with that?
> >>
> >> The fact that qcow2 isn't.
> >>
> >> From my perspective it would increase the format's complexity to a
> >> point where you could just create a new format altogether. Well,
> >> actually, all you do is design a filesystem (or reuse an existing
> >> one).
> >>
> >>>> We have tar for that.
> >>>
> >>> It does not support expanding the stored files.
> >>
> >> Nor does qcow2, because it does not support storing files at all.
> >
> > AFAICT from the previous discussion it already does allow storing
> > multiple data streams that can be changed independently so it basically
> > is an archive format or filesystem except the streams are not named nor
> > easily accessible separately outside of qemu.
>
> I don't quite understand what you are referring to. We have snapshots,
> we have bitmaps, yes, but all of that is related directly to the stored
> guest disk data.
>
> The only thing we currently have in qcow2 that is opaque is the VM state
> that can be stored in snapshots (and don't hold me responsible for that).
>
> >> Secondly, that completely depends on how you use it. You can freely
> >> expand the last file in the archive, for instance. Also I've seen
> >> people store files in chunks so they can indeed resize it.
> >>
> >> (I'm wondering if we could write a block driver that could provide
> >> such a chunk allocation transparently to qcow2... Note that a qcow2
> >> file does not need to be contiguous, so you could in theory indeed
> >> store the qcow2 file and its data in completely separate places in a
> >> tar file.)
> >
> > Which basically invents another new filesystem on top of tar for no
> > good reason. Especially when we already have support for a storage format
> > that is capable enough.
>
> No different from inventing a filesystem on top of qcow2.
>
> I don't think qcow2 is any more capable than tar.
>
> >> What I'm trying to get at is that qcow2 was not designed to be a
> >> container format for arbitrary files. If you want to make it such,
> >> I'm sure there are existing formats that work better.
> >
> > Such as?
>
> ext2?
>
> It seems to me that you want to make qcow2 a filesystem.
> Sure, the FS
> we'd end up with would probably be simpler than ext2, but I assume
> thanks to feature creep we'd eventually end up with a qcow2 format that
> is a worse FS than a real FS (especially performance-wise), but that is
> similarly complex.
>
> >>>> Unless I have got something terribly wrong (which is indeed a
> >>>> possibility!), to me this proposal means basically to turn qcow2
> >>>> into (1) a VM description format for qemu, and (2) to turn it into
> >>>> an archive format on the way.
> >>>
> >>> And if you go all the way you can store multiple disks along with
> >>> the VM definition so you can have the whole appliance in one file.
> >>> It conveniently solves the problem of synchronizing snapshots across
> >>> multiple disk images and the question where to store the machine
> >>> state if you want to suspend it.
> >>
> >> Yeah, but why make qcow2 that format? That's what I completely fail
> >> to understand.
> >>
> >> If you want to have a single VM description file that contains the VM
> >> configuration and some qcow2/raw/whatever files along with it for the
> >> guest disk data, sure, go ahead. But why does the format of the whole
> >> thing need to be qcow2?
> >
> > Because then qemu can access the disk data from the image directly
> > without any need for extraction, copying to a different file, etc.
>
> This does not explain why it needs to be qcow2. There is absolutely no
> reason why you couldn't use qcow2 files in-place inside of another file.

Because then we'd have to change the whole stack to take advantage of
that. Adding a feature into qcow2 means nothing else changes.

Dave

> Max
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
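Max's remark in the thread that people store files in a tar archive as chunks so they can still grow, and his note that a qcow2 file need not be contiguous, can be illustrated with a short Python sketch. The `<base>.partNNNN` member-naming convention, the 64 KiB chunk size and all helper names are assumptions made up for this example; it is not an existing QEMU block driver or nbdkit plugin. Growing the logical file appends new members using `tarfile`'s `"a"` mode (uncompressed archives only), so existing data never moves, and reads map a logical offset onto the scattered extents.

```python
import io
import tarfile
from typing import List, Tuple

# Hypothetical chunk size; reads use the real member sizes, so any size works.
CHUNK_SIZE = 64 * 1024


def chunk_name(base: str, index: int) -> str:
    """Hypothetical naming convention: disk.qcow2.part0000, .part0001, ..."""
    return f"{base}.part{index:04d}"


def append_chunks(tar_path: str, base: str, data: bytes, first_index: int = 0) -> int:
    """Grow the logical file by appending chunk members to the archive.

    Mode "a" writes after the last member of an uncompressed tar (creating the
    file if needed), so members already in the archive are never moved.
    Returns the next free chunk index.
    """
    index = first_index
    with tarfile.open(tar_path, mode="a") as tar:
        for start in range(0, len(data), CHUNK_SIZE):
            piece = data[start:start + CHUNK_SIZE]
            info = tarfile.TarInfo(chunk_name(base, index))
            info.size = len(piece)
            tar.addfile(info, io.BytesIO(piece))
            index += 1
    return index


def chunk_map(tar_path: str, base: str) -> List[Tuple[int, int, int]]:
    """Return (logical_start, archive_offset, size) for each chunk, in order."""
    prefix = base + ".part"
    with tarfile.open(tar_path, mode="r:") as tar:
        chunks = sorted(
            (int(m.name[len(prefix):]), m.offset_data, m.size)
            for m in tar.getmembers()
            if m.isfile() and m.name.startswith(prefix) and m.name[len(prefix):].isdigit()
        )
    extents, logical = [], 0
    for _index, archive_offset, size in chunks:
        extents.append((logical, archive_offset, size))
        logical += size
    return extents


def read_logical(tar_path: str, base: str, offset: int, length: int) -> bytes:
    """Read bytes [offset, offset + length) of the logical file across chunks."""
    out = bytearray()
    with open(tar_path, "rb") as f:
        for start, archive_offset, size in chunk_map(tar_path, base):
            lo, hi = max(offset, start), min(offset + length, start + size)
            if lo < hi:  # this chunk overlaps the requested range
                f.seek(archive_offset + (lo - start))
                out += f.read(hi - lo)
    return bytes(out)
```

A real driver built this way would also need in-place writes within a chunk and a persistent chunk ordering, which is the "filesystem on top of tar" that Michal objects to.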