From: Stefano Stabellini
Subject: Re: blktap: Sync with XCP, dropping zero-copy.
Date: Wed, 17 Nov 2010 12:35:59 +0000
In-Reply-To: <1289961649.11102.1115.camel@agari.van.xensource.com>
References: <1289604707-13378-1-git-send-email-daniel.stodden@citrix.com>
 <4CDDE0DA.2070303@goop.org> <1289620544.11102.373.camel@agari.van.xensource.com>
 <4CE17B80.7080606@goop.org> <1289898792.23890.214.camel@ramone>
 <1289961649.11102.1115.camel@agari.van.xensource.com>
To: Daniel Stodden
Cc: Jeremy Fitzhardinge, "Xen-devel@lists.xensource.com", Stefano Stabellini
List-Id: xen-devel@lists.xenproject.org

On Wed, 17 Nov 2010, Daniel Stodden wrote:
> I'm not against reducing code and effort. But in order to switch to a
> different base we would need a drop-in match for VHD and at least a good
> match for all the control machinery on which xen-sm presently depends.
> There's also a lot of investment in filter drivers etc.

I am hoping we don't actually have to rewrite all that code, but that we
would be able to refactor it to fit the qemu driver APIs.

> Then there is SM control, stuff like pause/unpause to get guests off the
> storage nodes for snapshot/coalesce, more recently calls for statistics
> and monitoring, tweaking some physical I/O details, etc. Used to be a
> bitch, nowadays it's somewhat simpler, but that's all stuff we
> completely depend on.

Upstream qemu has an RPC interface called QMP that can be used to issue
commands and retrieve information. For example, snapshots are already
supported by this interface. We need to support QMP one way or another,
even if only for the pure qemu emulation use case, so we might as well
exploit it.
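
Just to give an idea of what QMP looks like from a client's point of
view, here is a rough C sketch. The socket path and the -qmp option
used to start qemu are only examples; "qmp_capabilities" and
"query-block" are existing QMP commands, and error handling is kept to
a minimum.

/*
 * Rough sketch: talk QMP to a qemu started with something like
 * "-qmp unix:/var/run/qemu.qmp,server,nowait".  The socket path is
 * just an example.  qemu sends a JSON greeting on connect and refuses
 * commands until "qmp_capabilities" has been issued.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

static void roundtrip(int fd, const char *json)
{
    char buf[4096];
    ssize_t n;

    write(fd, json, strlen(json));
    n = read(fd, buf, sizeof(buf) - 1);   /* read the JSON reply */
    if (n > 0) {
        buf[n] = '\0';
        printf("%s", buf);
    }
}

int main(void)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, "/var/run/qemu.qmp", sizeof(addr.sun_path) - 1);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    roundtrip(fd, "");                                     /* greeting   */
    roundtrip(fd, "{ \"execute\": \"qmp_capabilities\" }\n");
    roundtrip(fd, "{ \"execute\": \"query-block\" }\n");   /* list disks */

    close(fd);
    return 0;
}

query-block is just an example here: the point is that statistics,
monitoring and control commands would all go over the same channel.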
> Moving blkback out of kernel space, into tapdisk, is predictable in size
> and complexity. Replacing tapdisks altogether would be quite a different
> story.

This is not a one-day project; we have time for this. I was thinking
about starting to use the qemu blkback (with the current qemu-xen) as a
fallback when blkback2 is not present, mainly to allow developers to
work with upstream 2.6.37. Meanwhile, in the next few months upstream
qemu should merge our patches, and then we could start using upstream
qemu for development with xen-unstable. Upstream qemu will offer much
better aio support and QMP. At that point we could start adding a VHD
driver to upstream qemu, and slowly everything else we need. By the
time 2.6.38/39 is out we could have a proper backend with VHD support.
What do you think?

> The remainder below isn't fully qualified, just random bits coming to my
> mind, assuming you're not talking about sharing code/libs and
> frameworks, but actual processes.
>
> 1st, what's the rationale with fully PV'd guests on xen? (That argument
> might not count if just taking qemu as the container process and
> stripping emulation for those.)

A PV guest needs qemu already for the console and framebuffer backends,
so it just fits into the current picture without modifications.

> Related, there's the question of memory footprint. Kernel blkback is
> extremely lightweight. Moving the datapath into userland can create
> headaches, especially on 32bit dom0s with lots of guests and disks on
> backends which used to be bare LUNs under blkback. That's a problem
> tapdisk has to face too, just wondering about the size of the issue in
> qemu.

The PV qemu (the qemu run for PV guests) is very different from the HVM
qemu: it does very little, only running the backends. I expect its
memory footprint to be really small.

> Related, Xapi depends a lot on dom0 plugs, where the datapath can be
> somewhat hairy when it comes to blocking I/O and resource allocation.

I am not sure what you mean here.

> Then there is sharing. Storage activation normally doesn't operate in a
> specific VM context. It presently doesn't even relate to a particular
> VBD, much less a VM. For qemu alone, putting storage virtualization into
> the same address space is an obvious choice. For Xen, enforcing that
> sounds like a step backward.

Having the backend in qemu and running it in a VM context are two
different things. Qemu is very flexible in this regard: with a one-line
change (or maybe just different command line options) you can have qemu
doing hardware emulation, PV backends, both, or only some PV backends.
You could have:

- 1 qemu doing hardware emulation and PV backends;
- 1 qemu doing hardware emulation and another one doing PV backends;
- 1 qemu doing hardware emulation, 1 qemu doing some PV backends and
  1 qemu doing the other PV backends;

and so on. The only thing that cannot be easily split at the moment is
the hardware emulation, but the backends are completely modular.

> From the shared-framework perspective, and the amount of code involved:
> The ring path alone is too small to consider, and the more difficult
> parts on top of that like state machines for write ordering and syncing
> etc are hard to share because they depend on the queue implementation
> and image driver interface.

Yeah, if you are thinking about refactoring blktap2 into libraries and
using them in both the stand-alone tapdisk case and the qemu case, it is
probably not worth it. In any case, in qemu they do not depend on the
image driver interface, because it is generic.

> Control might be a different story. As far as frontend/backend IPC via
> xenstore goes, right now I still feel like those backends could be
> managed by a single daemon, similar to what blktapctrl did (let's just
> make it stateless/restartable this time). I guess qemu processes run
> their xenstore trees already fine, but internally?

Yes, they do. There is a generic xen_backend interface that adds support
for Xen frontend/backend pairs. Using xen_backend, each qemu instance
listens to its own xenstore backend path.
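
To make that last point concrete, here is a small sketch of what
"listening to its own backend path" means in practice. "qdisk" is the
backend type qemu's PV disk backend registers under; the backend domain
(0), the frontend domid (1) and the token are made-up example values,
and this is obviously not qemu's code, just the shape of it.

/*
 * Sketch: follow a backend subtree in xenstore, the way xen_backend
 * does in qemu.  Path, domids and token are example values only.
 * Build with -lxenstore.
 */
#include <stdio.h>
#include <stdlib.h>
#include <xs.h>

int main(void)
{
    const char *path = "/local/domain/0/backend/qdisk/1";  /* example */
    struct xs_handle *xsh = xs_daemon_open();
    unsigned int num;
    char **event;

    if (!xsh) {
        perror("xs_daemon_open");
        return 1;
    }

    /* The watch fires once immediately and then on every change
     * anywhere below the path (new VBDs, state changes, ...). */
    if (!xs_watch(xsh, path, "qdisk-backend")) {
        fprintf(stderr, "xs_watch failed\n");
        return 1;
    }

    for (;;) {
        event = xs_read_watch(xsh, &num);    /* blocks */
        if (!event)
            break;
        printf("change under %s\n", event[XS_WATCH_PATH]);
        free(event);
    }

    xs_daemon_close(xsh);
    return 0;
}

xen_backend does essentially this, plus the frontend/backend state
handshake on top of it.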