Date: Wed, 24 Feb 2016 08:49:28 +0800
From: Fam Zheng
Message-ID: <20160224004928.GC749@ad.usersys.redhat.com>
References: <20160222142415.GG5387@noname.str.redhat.com> <20160223034050.GD26360@ad.usersys.redhat.com> <20160223174330.GF8176@noname.redhat.com>
In-Reply-To: <20160223174330.GF8176@noname.redhat.com>
Subject: Re: [Qemu-devel] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap
To: Kevin Wolf
Cc: Alberto Garcia, qemu-block@nongnu.org, jsnow@redhat.com, Peter Lieven, qemu-devel@nongnu.org, Markus Armbruster, vsementsov@parallels.com, Stefan Hajnoczi, "Denis V. Lunev", pbonzini@redhat.com, mreitz@redhat.com

On Tue, 02/23 18:43, Kevin Wolf wrote:
> On 23.02.2016 at 04:40, Fam Zheng wrote:
> > (I'm Cc'ing a few more people here just in case they have different visions
> > about raw image use cases.)
> >
> > On Mon, 02/22 15:24, Kevin Wolf wrote:
> > > On 26.01.2016 at 11:38, Fam Zheng wrote:
> > > > This series introduces a simple format to enable support of persistence of
> > > > block dirty bitmaps. Block dirty bitmap is the tool to achieve incremental
> > > > backup, and persistence of block dirty bitmap makes incremental backup possible
> > > > across VM shutdowns, where existing in-memory dirty bitmaps cannot survive.
> > > >
> > > > When a user creates a "persisted" dirty bitmap, the QBM driver will create a
> > > > binary file and synchronize it with the existing in-memory block dirty bitmap
> > > > (BdrvDirtyBitmap). When the VM is powered down, the binary file has all the
> > > > bits saved on disk, which will be loaded and used to initialize the in-memory
> > > > block dirty bitmap next time the guest is started.
> > > >
> > > > The idea of the format is to reuse as much existing infrastructure as possible
> > > > and avoid introducing complex data structures - it works with any image format,
> > > > by gluing together plain bitmap files with a json descriptor file. The
> > > > advantage of this approach over extending existing formats, such as qcow2, is
> > > > that the new feature is implemented by an orthogonal driver, in a format
> > > > agnostic way. This way, even raw images can have their persistent dirty
> > > > bitmaps. (And you will notice in this series, with a little forging to the
> > > > spec, raw images can also have backing files through a QBM overlay!)
> > > >
> > > > Rather than superseding it, this is intended to coexist with the
> > > > qcow2 bitmap extension that Vladimir is working on. The block driver interface
> > > > changes in this series also try to be generic and compatible for both drivers.
> > >
> > > So as I already told Fam last week, before we discuss any technical
> > > details here, we first need to discuss whether this is even the right
> > > thing to do. Currently I'm doubtful, as this is another attempt to
> > > introduce a new native image format in qemu.
> > >
> > > Let's recap the image formats and what we tell users about them today:
> > >
> > > * qcow2: This is the default choice for disk images. It gives you access
> > >   to all of the features in qemu at a good performance. If it doesn't
> > >   perform well in your case, we'll fix it.
> > >
> > > * raw: Use this when you need absolute performance and don't need any
> > >   features from an image format, so you want to get any complexity just
> > >   out of the way and pass requests as directly as possible from the
> > >   guest device to the host kernel.
> > >
> > > * Anything else: Only use them to convert into raw or qcow2.
> > >
> > > Now using bitmaps is clearly on the "features" side, which suggests that
> > > qcow2 is the format of choice for this. If you want to introduce a new
> > > format, you need to justify it with evidence that...
> > >
> > > 1. there is a relevant use case that qcow2 doesn't cover
> > > 2. qcow2 can't be fixed/enhanced to cover the use case
> > >
> > > The one thing that people have claimed in the past that qcow2 can't
> > > provide is enough performance. This is where QED tried to come in and
> > > promised a compromise between performance (then a bit faster than qcow2)
> > > and features (almost none, but supports backing files). We all know that
> > > it was a failure because you had to sacrifice features and still the
> > > idea that qcow2 couldn't be fixed was wrong, so today we have a QED
> > > driver that is much slower than qcow2 despite having fewer features.
> > >
> > > Now for QBM. First, let's have a look at the image format that it can be
> > > used with. qcow2 doesn't need it if we continue with Vladimir's
> > > extension. Other non-raw formats are only supposed to be used for
> > > conversion. The only thing that's really left is raw.
> >
> > Yes, I agree with this point.
> >
> > > Now adding a
> > > feature only for raw, as a compromise between features and performance,
> > > looks an awful lot like what QED tried. We don't want to go there.
> > >
> > > Even if we wanted to support persistent dirty bitmaps with raw images
> > > (which has to be discussed based on use cases), it's still questionable
> > > whether we need a new image format with JSON descriptor files instead of
> > > just raw bitmaps that can be added with a QMP command.
> > >
> >
> > I don't think a QMP interface alone is enough. In the persistent backup use case,
> > when starting a guest, a command line interface is more appropriate for continuing
> > the dirty tracking that was enabled at shutdown.
>
> Yes, I was sloppy. Maybe s/QMP command/runtime option/ gets closer.
>
> > I'd justify it in two parts, one is "why" and the other is "how".
> >
> > So, to answer why: the reason I worked on QBM is that I feel it is wrong to
> > leave raw behind. Ceph and LVM users use raw format. You could technically
> > use qcow2 with ceph but that is discouraged[1] or even refused by openstack[2].
> > We've seen qcow2 on top of LVs but that is not the dominant setup.
>
> Ceph is definitely a valid point. I think we agree that qcow2 can't
> provide what we need there today.
>
> The question is whether qcow2 can be extended to provide it. As we
> discussed last week internally and today on the call, a possible idea
> would be to extend qcow2 to act as the filter driver here, where all I/O
> is redirected to the backing file and only the bitmaps remain in the
> qcow2 layer.
>
> > The scope of "features" for which we tell users they have to use qcow2 should
> > be those that are format specific, not "block features" in general. Backing file,
> > internal/external snapshot, thin provisioning, compression and encryption are
> > all great examples of format features, whereas things including throttling,
> > statistics, migration, mirroring and backing up are IMHO not. Actually we
> > already support snapshotting a raw image, with a qcow2 overlay. We've even
> > implemented non-persistent incremental backup for raw today, through
> > drive-backup.
> > If we decide that qcow2 is the only possible format that can do
> > persistent backup, I'm not really a huge fan of that.
>
> Yeah, but that's just a feeling, not a use case.
>
> > Then "how"?
> >
> > Actually, I thought we could do it in a way similar to quorum. The quorum
> > driver works by specifying tediously long options. A snippet from
> > qemu-iotests to build a quorum driver with 3 children is like this:
> >
> >     quorum="driver=raw,file.driver=quorum,file.vote-threshold=2"
> >     quorum="$quorum,file.children.0.file.filename=$TEST_DIR/1.raw"
> >     quorum="$quorum,file.children.1.file.filename=$TEST_DIR/2.raw"
> >     quorum="$quorum,file.children.2.file.filename=$TEST_DIR/3.raw"
> >     quorum="$quorum,file.children.0.driver=raw"
> >     quorum="$quorum,file.children.1.driver=raw"
> >     quorum="$quorum,file.children.2.driver=raw"
> >
> > Though very repetitive, it is also very simple: all children are almost
> > symmetrical (identical in user data). The only thing the user or management tool
> > has to ensure is that the images have the same data.
>
> By the way, the repetitiveness would be greatly reduced if the test case
> were using the json: pseudo-protocol.
>
> > Unfortunately the logic is more complicated in a persistent incremental backup
> > scenario. Manual users will have to specify bitmap file names and
> > granularities, which they may no longer remember two weeks after they created
> > the bitmap and can easily get wrong. A management layer seems a must in this case, but the
> > interface we provide to it still feels way too low level.
> > Anyway, I do think
> > we can consider a "banana" (dummy name) driver for persistent bitmap management
> > which is structured like quorum:
> >
> >     banana="driver=raw,file.driver=banana,file.mode=synchronous"
> >     banana="$banana,file.image.file.filename=$TEST_IMG"
> >     banana="$banana,file.bitmaps.0.file.filename=$TEST_DIR/bm0.raw"
> >     banana="$banana,file.bitmaps.0.granularity=65536"
> >     banana="$banana,file.bitmaps.0.name=bm0"
> >     banana="$banana,file.bitmaps.1.file.filename=$TEST_DIR/bm1.raw"
> >     banana="$banana,file.bitmaps.1.granularity=1048576"
> >     banana="$banana,file.bitmaps.1.name=bm1"
> >     ...
> >
> > But we're merely inlining the information from the QBM JSON format into the command
> > line. IMO the two are only one step apart.
>
> It wasn't as clear to me before I read this explanation, but is the QBM
> on-disk file format really just reinventing qemu config files then?

I agree they look alike on the surface, but are qemu config files updated by
QEMU? An image is both read and, more importantly, written by the driver
following a definite format specification. I think that is fundamentally
different.

Fam
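
P.S. To make the json: remark concrete, the dotted quorum options quoted above map mechanically onto a nested structure. Below is a minimal Python sketch of that mapping, not anything taken from qemu-iotests. The helper names (parse_opts, nest), the "/tmp/t" stand-in for $TEST_DIR and the choice to keep every value as a string are assumptions made for illustration, and whether a given QEMU version accepts the resulting string unchanged is not something this sketch verifies.

```python
import json


def parse_opts(optstr):
    """Split 'a.b=c,d=e' into {'a.b': 'c', 'd': 'e'}.

    Assumption: no value contains a comma (QEMU escapes those as ',,').
    """
    return dict(item.split("=", 1) for item in optstr.split(","))


def nest(flat):
    """Build a nested tree from dotted keys; all-numeric levels become lists."""
    root = {}
    for key, value in flat.items():
        *parents, leaf = key.split(".")
        node = root
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return _listify(root)


def _listify(node):
    """Recursively turn dicts keyed only by digits into ordered lists."""
    if not isinstance(node, dict):
        return node
    node = {k: _listify(v) for k, v in node.items()}
    if node and all(k.isdigit() for k in node):
        return [node[k] for k in sorted(node, key=int)]
    return node


TEST_DIR = "/tmp/t"  # placeholder standing in for $TEST_DIR (assumption)

# Same options as the quoted quorum snippet, built in a loop.
quorum = "driver=raw,file.driver=quorum,file.vote-threshold=2"
for i in range(3):
    quorum += f",file.children.{i}.file.filename={TEST_DIR}/{i + 1}.raw"
    quorum += f",file.children.{i}.driver=raw"

tree = nest(parse_opts(quorum))
json_name = "json:" + json.dumps(tree)  # values stay strings (assumption)
print(json_name)
```

The three children collapse into one JSON array, which is where the repetition goes away. The same mapping would apply to the hypothetical "banana" options, with file.bitmaps.N becoming an array of bitmap entries.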