From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([140.186.70.92]:50643) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RzmIs-0000Z9-LN for qemu-devel@nongnu.org; Tue, 21 Feb 2012 04:46:40 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1RzmIm-00038f-Iq for qemu-devel@nongnu.org; Tue, 21 Feb 2012 04:46:34 -0500 Received: from mx1.redhat.com ([209.132.183.28]:9773) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RzmIm-00038V-CI for qemu-devel@nongnu.org; Tue, 21 Feb 2012 04:46:28 -0500 Message-ID: <4F4368BF.4040707@redhat.com> Date: Tue, 21 Feb 2012 10:49:51 +0100 From: Kevin Wolf MIME-Version: 1.0 References: <73865e0ce364c40e0eb65ec6b22b819d@mail.gmail.com> <4F31153E.9010205@codemonkey.ws> <4F311839.9030709@redhat.com> <4F311BBA.8000708@codemonkey.ws> <4F312FD3.5020206@zerto.com> <4F3137DB.1040503@redhat.com> <4F3139CE.4040103@zerto.com> <4F314798.8010009@redhat.com> <4F3211D0.3070502@zerto.com> <4F323875.1000000@redhat.com> <4F3244C2.1040604@zerto.com> <4F32489A.80307@redhat.com> <4F32788C.60904@zerto.com> <4F40FBD6.2000500@zerto.com> <4F425987.20103@redhat.com> <4F435DD2.8080600@redhat.com> <4F4360C7.5080806@redhat.com> In-Reply-To: <4F4360C7.5080806@redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] BlockDriverState stack and BlockListeners List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Paolo Bonzini Cc: =?UTF-8?B?16rXldee16gg15HXnyDXkNeV16g=?= , Stefan Hajnoczi , Jeff Cody , dlaor@redhat.com, qemu-devel@nongnu.org, Markus Armbruster , Zhi Yong Wu , Federico Simoncelli , Ori Mamluk , =?UTF-8?B?16LXldeT15Mg16fXk9ed?= , Yair Kuszpet Am 21.02.2012 10:15, schrieb Paolo Bonzini: > On 02/21/2012 10:03 AM, Kevin Wolf wrote: >> So let's check which features could make use of it: >> >> - Copy on read >> - I/O throttling >> - blkmirror for precopy storage migration >> - replication agent >> - Old style block migratiom > > More precisely, dirty bitmap handling. Yes, but is it used anywhere else? >> (btw, we should deprecate this) > > Yeah, but we need blkmirror to provide an alternative. Does block migration even work meanwhile without corrupting things left and right? >> - Maybe even bdrv_check_request and high watermark? However, they are >> not optional, so probably makes less sense. >> >> I think these are enough cases to justify it. Now, which operations do >> we need to intercept? >> >> - bdrv_co_read >> - bdrv_co_write >> - bdrv_drain (btw, we need a version for only one BDS) >> - Probably bdrv_co_discard as well > > Yes. > >> Anything I missed? > > bdrv_co_flush, I think. Right, we'll need bdrv_co_flush as well. >> Now the interesting question that comes to mind is: >> What is really the difference between the proposed BlockListener and a >> BlockDriver? Sure, a listener would implement much less functionality, >> but we also have BlockDrivers today that implement very few of the >> callbacks. > > The two differences that come to mind are: > > 1) BlockListener would be by-design coroutine based; I know we disagree > on this (you want to change raw to coroutines long term; I would like to > reintroduce some AIO fastpaths when there are no active listeners). I can't see a technical reason why a BlockListener could not be callback based. The only reason might be a "there are only coroutines" policy. But even then, just don't implement bdrv_aio_* like all other coroutine-based block drivers. > 2) BlockListener would be entirely an implementation detail, used in the > implementation of other commands. Depending on what you mean by command (presumably QMP commands?), I think I disagree. Management tools will want to start a VM with BlockListeners already applied (which would be possible via -blockdev). And on the other hand, protocols like file are entirely implementation details as well, and still they are BlockDrivers. > Third, perhaps the interface to BlockListener could be > bdrv_before/after_read. Cannot really say without writing one or two > BlockListeners whether this would be useful or a limitation. /* bdrv_before code here */ ret = bdrv_co_read(bs->file, ...); /* bdrv_after code here */ Should be pretty much equivalent, no? >> The main difference that I see is that the listeners stay always on top. >> For example, let's assume that if implemented a blkmirror driver in >> today's infrastructure, you would get a BlockDriverState stack like >> blkmirror -> qcow2 -> file. If you take a live snapshot now, you don't >> want to have the blkmirror applied to the old top-level image, which is >> now a read-only backing file. Instead, it should move to the new >> top-level image. > > Yes, that's because a BlockListener always applies to a > BlockDriverState, and live snapshots close+reopen the BDS but do not > delete/recreate it. Hm, that's an interesting angle to look at it. The reasoning makes sense to me (though I would reword it as a BlockListener belongs to a drive/block device rather than a BDS, which is an implementation detail). However, live snapshots can't close and reopen the BDS in the future, because the reopen could fail and you must not have closed the old image yet in this case. So what Jeff and I were looking into recently is to change this so that new top-level images are opened first without backing file and then the backing file relationship is created with the existing BDS. Of course, we stumbled across the thing that you're telling me here, that devices refer to the same BDS as before. So their content must be swapped, but some data like the device name (and now BlockListeners) stay with the top-level image instead. Which in turn reminds me of a discussion I had with Stefan a while ago, where we came to the conclusion that we need to separate the representation of an image file and a "view" of it which represents a block device (as a whole). One of the reasons then was that one qcow2 image could offer multiple views (one r/w view and for each snapshot a r/o one). I think the separation that this would require might actually be the same as the separation of things that stay top-level and that belong to the image file. Isn't it cool how everything is connected with everything? :-) >> So maybe we just need to extend the current BlockDriverState stack to >> distinguish "normal" and "always on top" BlockDrivers, where the latter >> would roughly correspond to BlockListeners? > > I would prefer having separate objects. Even if you do not count fields > related to throttling or copy-on-read or other tasks in the list above, > there are many fields in BDS that do not really apply to BlockListeners. > Backing files, device ops, encryption, even size. Having extra methods > is not a big problem, but unwanted data items smell... Most other block drivers use only little of them. We can try to clean up some of them (and making the separation described above would probably help with it), but BlockListeners aren't really different here from existing drivers. Kevin