* [Qemu-devel] Block layer roadmap on wiki
@ 2011-08-22 13:34 Stefan Hajnoczi
  2011-08-22 14:27 ` Ryan Harper
  2011-08-22 19:04 ` Anthony Liguori
  0 siblings, 2 replies; 10+ messages in thread
From: Stefan Hajnoczi @ 2011-08-22 13:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Christoph Hellwig

At KVM Forum Kevin, Christoph, and I had an opportunity to get
together for a Block Layer BoF.  We went through the recent "roadmap"
mailing list thread and touched on each proposed feature.

Here is the block layer roadmap wiki page:
http://wiki.qemu.org/BlockRoadmap

Kevin: I have moved the runtime WCE toggling to QEMU 1.0 since you
mentioned you want it for the next release.

My main take-away from the BoF was that integrating support for host
block devices and storage appliances will allow us to reduce the
amount of effort spent on image formats.  In order to make image
formats support the desired features and performance we end up
implementing much of the storage stack and file systems in userspace -
code that is duplicated and cannot take advantage of the existing
storage stack.

Storage management features are not just available in remote SAN and
NAS appliances anymore.  For local storage, btrfs has file-level
clones and thin-dev is significantly improving LVM snapshots.

Thin-dev is bringing a much more efficient and scalable snapshot model
to LVM.  This device-mapper feature will make LVM attractive for high
performance I/O without giving up snapshot and clone features.  It
also supports cloning off block devices that are not in the pool (e.g.
external storage, much like QEMU's backing files feature):
https://github.com/jthornber/linux-2.6/tree/thin-dev

This will not replace image formats overnight because image formats
are still widely used and will continue to be useful for
transferring and sharing disk images.  But focussing on the larger
storage stack where either local LVM, btrfs, or storage appliances do
the storage management means we exploit those options instead of
implementing equivalent functionality ourselves.  QEMU then runs with
plain old raw in more cases.

Stefan


* Re: [Qemu-devel] Block layer roadmap on wiki
  2011-08-22 13:34 [Qemu-devel] Block layer roadmap on wiki Stefan Hajnoczi
@ 2011-08-22 14:27 ` Ryan Harper
  2011-08-22 17:58   ` Stefan Hajnoczi
  2011-08-22 19:04 ` Anthony Liguori
  1 sibling, 1 reply; 10+ messages in thread
From: Ryan Harper @ 2011-08-22 14:27 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-devel, Christoph Hellwig

* Stefan Hajnoczi <stefanha@gmail.com> [2011-08-22 08:35]:
> At KVM Forum Kevin, Christoph, and I had an opportunity to get
> together for a Block Layer BoF.  We went through the recent "roadmap"
> mailing list thread and touched on each proposed feature.
> 
> Here is the block layer roadmap wiki page:
> http://wiki.qemu.org/BlockRoadmap
> 
> Kevin: I have moved the runtime WCE toggling to QEMU 1.0 since you
> mentioned you want it for the next release.
> 
> My main take-away from the BoF was that integrating support for host
> block devices and storage appliances will allow us to reduce the
> amount of effort spent on image formats.  In order to make image
> formats support the desired features and performance we end up
> implementing much of the storage stack and file systems in userspace -
> code that is duplicated and cannot take advantage of the existing
> storage stack.

+1

> 
> Storage management features are not just available in remote SAN and
> NAS appliances anymore.  For local storage, btrfs has file-level
> clones and thin-dev is significantly improving LVM snapshots.
> 
> Thin-dev is bringing a much more efficient and scalable snapshot model
> to LVM.  This device-mapper feature will make LVM attractive for high
> performance I/O without giving up snapshot and clone features.  It
> also supports cloning off block devices that are not in the pool (e.g.
> external storage, much like QEMU's backing files feature):
> https://github.com/jthornber/linux-2.6/tree/thin-dev
> 
> This will not replace image formats overnight because image formats
> are still widely used and will continue to be useful for
> transferring and sharing disk images.  But focussing on the larger

Any thoughts on how to make this easily usable for LVM?  If there were
a way to export/import between image files and LVM volumes, would that
be sufficient?  Does anything like this already exist?

> storage stack where either local LVM, btrfs, or storage appliances do
> the storage management means we exploit those options instead of
> implementing equivalent functionality ourselves.  QEMU then runs with
> plain old raw in more cases.
> 
> Stefan

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com


* Re: [Qemu-devel] Block layer roadmap on wiki
  2011-08-22 14:27 ` Ryan Harper
@ 2011-08-22 17:58   ` Stefan Hajnoczi
  0 siblings, 0 replies; 10+ messages in thread
From: Stefan Hajnoczi @ 2011-08-22 17:58 UTC (permalink / raw)
  To: Ryan Harper; +Cc: Kevin Wolf, qemu-devel, Christoph Hellwig

On Mon, Aug 22, 2011 at 09:27:12AM -0500, Ryan Harper wrote:
> * Stefan Hajnoczi <stefanha@gmail.com> [2011-08-22 08:35]:
> > At KVM Forum Kevin, Christoph, and I had an opportunity to get
> > together for a Block Layer BoF.  We went through the recent "roadmap"
> > mailing list thread and touched on each proposed feature.
> > 
> > Here is the block layer roadmap wiki page:
> > http://wiki.qemu.org/BlockRoadmap
> > 
> > Kevin: I have moved the runtime WCE toggling to QEMU 1.0 since you
> > mentioned you want it for the next release.
> > 
> > My main take-away from the BoF was that integrating support for host
> > block devices and storage appliances will allow us to reduce the
> > amount of effort spent on image formats.  In order to make image
> > formats support the desired features and performance we end up
> > implementing much of the storage stack and file systems in userspace -
> > code that is duplicated and cannot take advantage of the existing
> > storage stack.
> 
> +1
> 
> > 
> > Storage management features are not just available in remote SAN and
> > NAS appliances anymore.  For local storage, btrfs has file-level
> > clones and thin-dev is significantly improving LVM snapshots.
> > 
> > Thin-dev is bringing a much more efficient and scalable snapshot model
> > to LVM.  This device-mapper feature will make LVM attractive for high
> > performance I/O without giving up snapshot and clone features.  It
> > also supports cloning off block devices that are not in the pool (e.g.
> > external storage, much like QEMU's backing files feature):
> > https://github.com/jthornber/linux-2.6/tree/thin-dev
> > 
> > This will not replace image formats overnight because image formats
> > are still widely used and will continue to be useful for
> > transferring and sharing disk images.  But focussing on the larger
> 
> Any thoughts on how to make this easily usable for LVM?  If there were
> an export/import to/from file to LVM?  is that sufficient?  Anything
> like this in existence?

Forgot to mention a major advantage of a raw-oriented storage stack: we need
good support for raw + storage appliance anyway.  Users want to hook up their
SAN or NAS just like they can with other hypervisors.  Time spent on image
formats would be better spent fleshing out integration with LVM, btrfs, SAN,
NAS, and friends.

Back to import/export, it serves two purposes:
1. Efficient transport.  Uploading and downloading image files in a
   compact form that represents zero blocks efficiently and perhaps
   compresses data.
2. Compatibility with other hypervisors and external tools.  Here it's
   all about using a well-defined file format.

In order to pull off a raw-oriented storage stack we need to do
import/export well.  So this is an area where we have to focus.
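
To illustrate the "efficient transport" point, here is a minimal sketch of a
zero-aware copy: all-zero blocks are skipped by seeking, so the destination
can stay sparse on file systems that support holes.  The function name and
the block size are assumptions made for this example, not anything taken
from QEMU.

```python
import os

BLOCK_SIZE = 64 * 1024  # hypothetical chunk size chosen for illustration


def sparse_copy(src_path, dst_path, block_size=BLOCK_SIZE):
    """Copy src to dst, seeking over all-zero blocks instead of writing them.

    Returns the number of bytes actually written (non-zero blocks only).
    """
    zero_block = bytes(block_size)
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            if chunk == zero_block[:len(chunk)]:
                # Leave a hole: advance the offset without writing data.
                dst.seek(len(chunk), os.SEEK_CUR)
            else:
                dst.write(chunk)
                written += len(chunk)
        # Set the final length in case the image ends with a hole.
        dst.truncate(src.tell())
    return written
```

A real export format would also record which ranges are holes so that they
survive a transfer; this sketch only shows the local copy step.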

Image streaming is a good approach for import because it allows the VM
to start instantly (even before the image is fully imported).  A
qemu-nbd process serves up image data and we stream into a logical
volume.
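
As a rough model of why streaming lets the VM start before the import has
finished, the toy class below fetches blocks on demand when the guest reads
them and fills in the rest in the background.  In-memory buffers stand in for
the image served by qemu-nbd and for the logical volume; the class and method
names are hypothetical and guest writes are left out.

```python
class StreamingImport:
    """Toy model of streaming an image into a local volume."""

    def __init__(self, source, block_size=4):
        self.source = source              # stands in for the remote image
        self.block_size = block_size
        self.target = bytearray(len(source))  # stands in for the volume
        nblocks = (len(source) + block_size - 1) // block_size
        self.present = [False] * nblocks  # which blocks are already local

    def _populate(self, index):
        if not self.present[index]:
            start = index * self.block_size
            end = min(start + self.block_size, len(self.source))
            self.target[start:end] = self.source[start:end]
            self.present[index] = True

    def read(self, offset, length):
        """Guest read: fetch missing blocks on demand, then serve locally."""
        if length <= 0:
            return b""
        first = offset // self.block_size
        last = min((offset + length - 1) // self.block_size,
                   len(self.present) - 1)
        for index in range(first, last + 1):
            self._populate(index)
        return bytes(self.target[offset:offset + length])

    def stream_step(self):
        """Background copy: populate one missing block; False when done."""
        for index, done in enumerate(self.present):
            if not done:
                self._populate(index)
                return True
        return False
```

Once stream_step() returns False, every block is local and the source is no
longer needed.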

For export we can do a fuse file system that presents logical volumes as image
files.  That way existing applications can get at the data as if there were
real image files sitting on the file system.  Sequential read access is easy
for all formats, random read is more difficult but should be doable for most
formats (the exception would be stream compressed formats that are not designed
for random access).

So moving to a raw-oriented storage stack does not mean we get rid of
image formats.  We still need them but they are outside the critical I/O
path.  Their role is changed since we don't push features into the
formats anymore.

Side note: iSCSI vs NBD came up during the BoF.  Although NBD has not
seen maintenance or activity recently it's perfectly possible to build
on it.  The first feature we need is a flush command (so that NBD can do
non-O_DSYNC accesses for speed).  At that point we have a bare-bones
remote block protocol that can be used for migration and for connecting
up userspace image formats.  iSCSI is more complex and suited for
permanent storage, whereas NBD is simple but perhaps not a protocol we
want to access data over for a long period of time.
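
The benefit of a flush command can be sketched on the server side: writes go
through the host page cache without O_DSYNC, and a single fsync() makes all
of them durable when the client asks for it.  This is only an illustration
with hypothetical names, not the NBD wire protocol or qemu-nbd code.

```python
import os


class WriteBackExport:
    """Toy export that leaves writes in the page cache until flush()."""

    def __init__(self, path):
        # No O_DSYNC: individual writes may complete before reaching disk.
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        self.unflushed_writes = 0

    def write(self, offset, data):
        os.pwrite(self.fd, data, offset)
        self.unflushed_writes += 1

    def flush(self):
        """What a flush command would map to: one fsync for prior writes.

        Returns how many writes were covered by this flush.
        """
        os.fsync(self.fd)
        flushed = self.unflushed_writes
        self.unflushed_writes = 0
        return flushed

    def close(self):
        os.close(self.fd)
```

With O_DSYNC every write pays the cost of reaching stable storage; here that
cost is paid once per flush.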

Stefan


* Re: [Qemu-devel] Block layer roadmap on wiki
  2011-08-22 13:34 [Qemu-devel] Block layer roadmap on wiki Stefan Hajnoczi
  2011-08-22 14:27 ` Ryan Harper
@ 2011-08-22 19:04 ` Anthony Liguori
  2011-08-22 20:31   ` Stefan Hajnoczi
  1 sibling, 1 reply; 10+ messages in thread
From: Anthony Liguori @ 2011-08-22 19:04 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-devel, Christoph Hellwig

On 08/22/2011 08:34 AM, Stefan Hajnoczi wrote:
> At KVM Forum Kevin, Christoph, and I had an opportunity to get
> together for a Block Layer BoF.  We went through the recent "roadmap"
> mailing list thread and touched on each proposed feature.
>
> Here is the block layer roadmap wiki page:
> http://wiki.qemu.org/BlockRoadmap
>
> Kevin: I have moved the runtime WCE toggling to QEMU 1.0 since you
> mentioned you want it for the next release.
>
> My main take-away from the BoF was that integrating support for host
> block devices and storage appliances will allow us to reduce the
> amount of effort spent on image formats.  In order to make image
> formats support the desired features and performance we end up
> implementing much of the storage stack and file systems in userspace -
> code that is duplicated and cannot take advantage of the existing
> storage stack.

The flip side is that tighter integration either makes features hard to
consume or moves QEMU into a space it has not entered so far.  Many features
require root privileges to configure and have system-wide scope.  That's
not QEMU today.

In addition, it makes QEMU tied to a specific platform (most likely Linux).

None of this is especially bad I guess, but none of it is a simple problem.

You could certainly rm -rf block/* and still be able to accomplish much 
of what's done today but it would be extremely painful to do in 
practice.  We have to find a balance of not reinventing things and 
making sure that simple things are simple to do.

That may require tighter integration and more focus on the higher up 
pieces in the stack to really enable this.

Regards,

Anthony Liguori

>
> Storage management features are not just available in remote SAN and
> NAS appliances anymore.  For local storage, btrfs has file-level
> clones and thin-dev is significantly improving LVM snapshots.
>
> Thin-dev is bringing a much more efficient and scalable snapshot model
> to LVM.  This device-mapper feature will make LVM attractive for high
> performance I/O without giving up snapshot and clone features.  It
> also supports cloning off block devices that are not in the pool (e.g.
> external storage, much like QEMU's backing files feature):
> https://github.com/jthornber/linux-2.6/tree/thin-dev
>
> This will not replace image formats overnight because image formats
> are still widely used and will continue to be useful for
> transferring and sharing disk images.  But focussing on the larger
> storage stack where either local LVM, btrfs, or storage appliances do
> the storage management means we exploit those options instead of
> implementing equivalent functionality ourselves.  QEMU then runs with
> plain old raw in more cases.
>
> Stefan
>


* Re: [Qemu-devel] Block layer roadmap on wiki
  2011-08-22 19:04 ` Anthony Liguori
@ 2011-08-22 20:31   ` Stefan Hajnoczi
  2011-08-22 20:48     ` Ryan Harper
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Hajnoczi @ 2011-08-22 20:31 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Kevin Wolf, qemu-devel, Christoph Hellwig

On Mon, Aug 22, 2011 at 8:04 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 08/22/2011 08:34 AM, Stefan Hajnoczi wrote:
>>
>> At KVM Forum Kevin, Christoph, and I had an opportunity to get
>> together for a Block Layer BoF.  We went through the recent "roadmap"
>> mailing list thread and touched on each proposed feature.
>>
>> Here is the block layer roadmap wiki page:
>> http://wiki.qemu.org/BlockRoadmap
>>
>> Kevin: I have moved the runtime WCE toggling to QEMU 1.0 since you
>> mentioned you want it for the next release.
>>
>> My main take-away from the BoF was that integrating support for host
>> block devices and storage appliances will allow us to reduce the
>> amount of effort spent on image formats.  In order to make image
>> formats support the desired features and performance we end up
>> implementing much of the storage stack and file systems in userspace -
>> code that is duplicated and cannot take advantage of the existing
>> storage stack.
>
> The flip side is, tighter integration either makes features hard to consume
> or makes QEMU enter a space it currently hasn't.  Many features require root
> privileges to configure and a system-wide scope.  That's not QEMU today.

QEMU itself should be about emulation and virtualization.  Storage
management needs to be done outside of QEMU.  Today you can already
take an LVM snapshot - it happens outside of QEMU.  It's at the
libvirt level where different storage systems get abstracted (LVM,
directory, iSCSI, etc) and there is a single API/command set to invoke
management functions.  But even without libvirt you can do it
yourself, and I think this separation makes sense so that QEMU can be
focussed on running a single VM rather than managing storage.
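
For the "do it yourself" case, a small helper outside QEMU could look roughly
like the sketch below.  It only builds the lvcreate command line by default;
the volume names and size in the usage line are made up, and actually running
the command typically requires root.

```python
import subprocess


def lvm_snapshot_command(origin, name, size):
    """Build the lvcreate argv for a classic LVM snapshot of `origin`."""
    return ["lvcreate", "--snapshot", "--name", name, "--size", size, origin]


def take_lvm_snapshot(origin, name, size, dry_run=True):
    """Return the argv; run it only when dry_run is False."""
    argv = lvm_snapshot_command(origin, name, size)
    if not dry_run:
        subprocess.run(argv, check=True)  # typically needs root privileges
    return argv
```

For example, take_lvm_snapshot("/dev/vg0/guest1", "guest1-snap", "1G") returns
the command without touching the system.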

> In addition, it makes QEMU tied to a specific platform (most likely Linux).

QEMU will still work but certain features might not be available.  For
example, this is true today if you're using a storage appliance that
does deduplication - that's a feature you're getting on top of the
emulation/virtualization that QEMU does.  But it doesn't tie QEMU to a
particular platform.

> You could certainly rm -rf block/* and still be able to accomplish much of
> what's done today but it would be extremely painful to do in practice.  We
> have to find a balance of not reinventing things and making sure that simple
> things are simple to do.

We wouldn't rm -rf block/* because we still need qemu-nbd.  It
probably makes sense to keep what we have today.  I'm talking more
about a shift from writing our own image format to integrating
existing storage support.

> That may require tighter integration and more focus on the higher up pieces
> in the stack to really enable this.

Yes, exactly.  Much of it shouldn't be inside QEMU.

Stefan


* Re: [Qemu-devel] Block layer roadmap on wiki
  2011-08-22 20:31   ` Stefan Hajnoczi
@ 2011-08-22 20:48     ` Ryan Harper
  2011-08-22 21:01       ` Anthony Liguori
  0 siblings, 1 reply; 10+ messages in thread
From: Ryan Harper @ 2011-08-22 20:48 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-devel, Christoph Hellwig

* Stefan Hajnoczi <stefanha@gmail.com> [2011-08-22 15:32]:
> On Mon, Aug 22, 2011 at 8:04 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
> > On 08/22/2011 08:34 AM, Stefan Hajnoczi wrote:
> >>
> >> At KVM Forum Kevin, Christoph, and I had an opportunity to get
> >> together for a Block Layer BoF.  We went through the recent "roadmap"
> >> mailing list thread and touched on each proposed feature.
> >>
> >> Here is the block layer roadmap wiki page:
> >> http://wiki.qemu.org/BlockRoadmap
> >>
> >> Kevin: I have moved the runtime WCE toggling to QEMU 1.0 since you
> >> mentioned you want it for the next release.
> >>
> >> My main take-away from the BoF was that integrating support for host
> >> block devices and storage appliances will allow us to reduce the
> >> amount of effort spent on image formats.  In order to make image
> >> formats support the desired features and performance we end up
> >> implementing much of the storage stack and file systems in userspace -
> >> code that is duplicated and cannot take advantage of the existing
> >> storage stack.
> >
> > The flip side is, tighter integration either makes features hard to consume
> > or makes QEMU enter a space it currently hasn't.  Many features require root
> > privileges to configure and a system-wide scope.  That's not QEMU today.
> 
> QEMU itself should be about emulation and virtualization.  Storage
> management needs to be done outside of QEMU.  Today you can already
> take an LVM snapshot - it happens outside of QEMU.  It's at the
> libvirt level where different storage systems get abstracted (LVM,
> directory, iSCSI, etc) and there is a single API/command set to invoke
> management functions.  But even without libvirt you can do it
> yourself, and I think this separation makes sense so that QEMU can be
> focussed on running a single VM rather than managing storage.
> 
> > In addition, it makes QEMU tied to a specific platform (most likely Linux).
> 
> QEMU will still work but certain features might not be available.  For
> example, this is true today if you're using a storage appliance that
> does deduplication - that's a feature you're getting on top of the
> emulation/virtualization that QEMU does.  But it doesn't tie QEMU to a
> particular platform.
> 
> > You could certainly rm -rf block/* and still be able to accomplish much of
> > what's done today but it would be extremely painful to do in practice.  We
> > have to find a balance of not reinventing things and making sure that simple
> > things are simple to do.
> 
> We wouldn't rm -rf block/* because we still need qemu-nbd.  It
> probably makes sense to keep what we have today.  I'm talking more
> about a shift from writing our own image format to integrating
> existing storage support.

I think this is a key point.  While I do like the idea of keeping QEMU
focused on single VM, I think we don't help ourselves by not consuming
the hypervisor platform services and integrating/exploiting those
features to make using QEMU easier.

That said, it does mean that some things like system-wide config and
privs are hard and aren't strictly virtualization issues, but that
doesn't mean we can't integrate some sort of solution.

> 
> > That may require tighter integration and more focus on the higher up pieces
> > in the stack to really enable this.
> 
> Yes, exactly.  Much of it shouldn't be inside QEMU.
> 
> Stefan

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com


* Re: [Qemu-devel] Block layer roadmap on wiki
  2011-08-22 20:48     ` Ryan Harper
@ 2011-08-22 21:01       ` Anthony Liguori
  2011-08-23  7:59         ` Stefan Hajnoczi
  2011-08-23 11:25         ` Kevin Wolf
  0 siblings, 2 replies; 10+ messages in thread
From: Anthony Liguori @ 2011-08-22 21:01 UTC (permalink / raw)
  To: Ryan Harper; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Christoph Hellwig

On 08/22/2011 03:48 PM, Ryan Harper wrote:
> * Stefan Hajnoczi<stefanha@gmail.com>  [2011-08-22 15:32]:
>> We wouldn't rm -rf block/* because we still need qemu-nbd.  It
>> probably makes sense to keep what we have today.  I'm talking more
>> about a shift from writing our own image format to integrating
>> existing storage support.
>
> I think this is a key point.  While I do like the idea of keeping QEMU
> focused on single VM, I think we don't help ourselves by not consuming
> the hypervisor platform services and integrating/exploiting those
> features to make using QEMU easier.

Let's avoid the h-word here as it's not terribly relevant to the discussion.

Configuring block devices is fundamentally a privileged operation.  QEMU 
fundamentally is designed to be useful as an unprivileged user.

That's the trouble with something like LVM.  Only root can create LVM 
snapshots and it's an all-or-nothing security model.

If you want to get QEMU out of the snapshot business, you need a file 
system that's widely available that allows non-privileged users to take 
snapshots of individual files.

Regards,

Anthony Liguori


* Re: [Qemu-devel] Block layer roadmap on wiki
  2011-08-22 21:01       ` Anthony Liguori
@ 2011-08-23  7:59         ` Stefan Hajnoczi
  2011-08-23 11:25         ` Kevin Wolf
  1 sibling, 0 replies; 10+ messages in thread
From: Stefan Hajnoczi @ 2011-08-23  7:59 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Kevin Wolf, Ryan Harper, qemu-devel, Christoph Hellwig

On Mon, Aug 22, 2011 at 04:01:08PM -0500, Anthony Liguori wrote:
> On 08/22/2011 03:48 PM, Ryan Harper wrote:
> >* Stefan Hajnoczi<stefanha@gmail.com>  [2011-08-22 15:32]:
> >>We wouldn't rm -rf block/* because we still need qemu-nbd.  It
> >>probably makes sense to keep what we have today.  I'm talking more
> >>about a shift from writing our own image format to integrating
> >>existing storage support.
> >
> >I think this is a key point.  While I do like the idea of keeping QEMU
> >focused on single VM, I think we don't help ourselves by not consuming
> >the hypervisor platform services and integrating/exploiting those
> >features to make using QEMU easier.
> 
> Let's avoid the h-word here as it's not terribly relevant to the discussion.
> 
> Configuring block devices is fundamentally a privileged operation.
> QEMU fundamentally is designed to be useful as an unprivileged user.
> 
> That's the trouble with something like LVM.  Only root can create
> LVM snapshots and it's an all-or-nothing security model.
> 
> If you want to get QEMU out of the snapshot business, you need a
> file system that's widely available that allows non-privileged users
> to take snapshots of individual files.

I don't think we should remove qcow2 internal snapshots or
blockdev_snapshot.  But they have performance limitations where it makes
sense to start using existing storage support instead of reimplementing
efficient and scalable snapshots ourselves.

btrfs is maturing and its BTRFS_IOC_CLONE ioctl is unprivileged.  So we
can offer that option for unprivileged users.
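
A minimal sketch of what using the clone ioctl from an unprivileged process
might look like, with a plain copy as the fallback on file systems that do
not support it.  The numeric request value is an assumption: it is the
encoding of _IOW(0x94, 9, int) on x86 and most other Linux architectures, but
it is hard-coded here rather than taken from kernel headers.

```python
import fcntl
import shutil

# Assumed value of BTRFS_IOC_CLONE, i.e. _IOW(0x94, 9, int), on x86 and most
# other Linux architectures; some architectures encode ioctls differently.
BTRFS_IOC_CLONE = 0x40049409


def clone_or_copy(src_path, dst_path):
    """Try a file-level clone of src into dst; fall back to a full copy.

    Returns "clone" or "copy" depending on which path was taken.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # The ioctl is issued on the destination; the argument is the
            # source file descriptor.
            fcntl.ioctl(dst.fileno(), BTRFS_IOC_CLONE, src.fileno())
            return "clone"
        except OSError:
            shutil.copyfileobj(src, dst)
            return "copy"
```

On btrfs the clone shares extents with the source, so it completes without
copying data; elsewhere the helper degrades to an ordinary copy.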

Stefan


* Re: [Qemu-devel] Block layer roadmap on wiki
  2011-08-22 21:01       ` Anthony Liguori
  2011-08-23  7:59         ` Stefan Hajnoczi
@ 2011-08-23 11:25         ` Kevin Wolf
  2011-08-23 12:21           ` Stefan Hajnoczi
  1 sibling, 1 reply; 10+ messages in thread
From: Kevin Wolf @ 2011-08-23 11:25 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Stefan Hajnoczi, Ryan Harper, qemu-devel, Christoph Hellwig

Am 22.08.2011 23:01, schrieb Anthony Liguori:
> On 08/22/2011 03:48 PM, Ryan Harper wrote:
>> * Stefan Hajnoczi<stefanha@gmail.com>  [2011-08-22 15:32]:
>>> We wouldn't rm -rf block/* because we still need qemu-nbd.  It
>>> probably makes sense to keep what we have today.  I'm talking more
>>> about a shift from writing our own image format to integrating
>>> existing storage support.
>>
>> I think this is a key point.  While I do like the idea of keeping QEMU
>> focused on single VM, I think we don't help ourselves by not consuming
>> the hypervisor platform services and integrating/exploiting those
>> features to make using QEMU easier.
> 
> Let's avoid the h-word here as it's not terribly relevant to the discussion.
> 
> Configuring block devices is fundamentally a privileged operation.  QEMU 
> fundamentally is designed to be useful as an unprivileged user.
> 
> That's the trouble with something like LVM.  Only root can create LVM 
> snapshots and it's an all-or-nothing security model.
> 
> If you want to get QEMU out of the snapshot business, you need a file 
> system that's widely available that allows non-privileged users to take 
> snapshots of individual files.

I agree with you there (and it's interesting how different perceptions of
the BoF results can be ;-))

It's probably true that there are ways to do certain things on host
block devices and we should definitely support such use cases better
(where we means mostly the management layer, but we can possibly
integrate things into qemu like a file-btrfs or lvm_device backend that
supports snapshots or something).

It isn't for everyone, though, and this is why I tried to point out in
the BoF that image formats aren't going to go away and we still need
good support for them. Providing only raw for running VMs and declaring
the rest of the formats to be intended for import/export only doesn't work.

Kevin


* Re: [Qemu-devel] Block layer roadmap on wiki
  2011-08-23 11:25         ` Kevin Wolf
@ 2011-08-23 12:21           ` Stefan Hajnoczi
  0 siblings, 0 replies; 10+ messages in thread
From: Stefan Hajnoczi @ 2011-08-23 12:21 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Ryan Harper, qemu-devel, Christoph Hellwig

On Tue, Aug 23, 2011 at 12:25 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 22.08.2011 23:01, schrieb Anthony Liguori:
>> On 08/22/2011 03:48 PM, Ryan Harper wrote:
>>> * Stefan Hajnoczi<stefanha@gmail.com>  [2011-08-22 15:32]:
>>>> We wouldn't rm -rf block/* because we still need qemu-nbd.  It
>>>> probably makes sense to keep what we have today.  I'm talking more
>>>> about a shift from writing our own image format to integrating
>>>> existing storage support.
>>>
>>> I think this is a key point.  While I do like the idea of keeping QEMU
>>> focused on single VM, I think we don't help ourselves by not consuming
>>> the hypervisor platform services and integrating/exploiting those
>>> features to make using QEMU easier.
>>
>> Let's avoid the h-word here as it's not terribly relevant to the discussion.
>>
>> Configuring block devices is fundamentally a privileged operation.  QEMU
>> fundamentally is designed to be useful as an unprivileged user.
>>
>> That's the trouble with something like LVM.  Only root can create LVM
>> snapshots and it's an all-or-nothing security model.
>>
>> If you want to get QEMU out of the snapshot business, you need a file
>> system that's widely available that allows non-privileged users to take
>> snapshots of individual files.
>
> I agree with you there (and it's interesting how different perception of
> the BoF results can be ;-))
>
> It's probably true that there are ways to do certain things on host
> block devices and we should definitely support such use cases better
> (where we means mostly the management layer, but we can possibly
> integrate things into qemu like a file-btrfs or lvm_device backend that
> supports snapshots or something).
>
> It isn't for everyone, though, and this is why I tried to point out in
> the BoF that image formats aren't going to go away and we still need
> good support for them. Providing only raw for running VMs and declaring
> the rest of the formats to be intended for import/export only doesn't work.

I have said that block/*.c doesn't go away.  But we need to look at
exploiting storage features rather than reinventing them.

Snapshots are an example: we do not have a scalable snapshot mechanism
in QEMU.  External snapshots are inefficient when you build up
multiple levels (due to having to follow the backing file chain) and
when you delete a snapshot (due to copying data back into the backing
file).  Internal snapshots in qcow2 involve operations that traverse
the image metadata.  This traversal becomes a problem when image files
grow large (e.g. 1 TB and beyond) because the I/O required can take
more than 1 second which is problematic for taking snapshots while the
VM is running.
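
The cost of deep external snapshot chains can be seen in a toy model where
each snapshot adds a layer and a read walks down the chain until some layer
has the cluster.  This is a simplified illustration with hypothetical names,
not how block/*.c is structured.

```python
class Layer:
    """One image in a backing chain: a sparse map of cluster index -> data."""

    def __init__(self, backing=None):
        self.clusters = {}
        self.backing = backing

    def write(self, index, data):
        # Writes always land in the layer they are issued to.
        self.clusters[index] = data

    def read(self, index):
        """Return (data, layers_visited); unallocated clusters give None."""
        layer, visited = self, 0
        while layer is not None:
            visited += 1
            if index in layer.clusters:
                return layer.clusters[index], visited
            layer = layer.backing
        return None, visited


def take_external_snapshot(active):
    """Freeze `active` and return a new empty layer on top of it."""
    return Layer(backing=active)
```

Data written before N snapshots were taken costs N + 1 lookups to read from
the top layer, which is the chain-following overhead described above.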

There are known ways of doing better internal snapshots along the
lines of what ZFS, btrfs, and thin-dev do.  But that means redesigning
the image metadata and reimplementing these storage systems in
userspace.

What I'm suggesting is that we draw the line here.  Keep what we've
got and continue the optimizations that we have in the pipeline.  But
when we hit significant new features, work with existing storage
systems.  Why?  Because we need to support existing storage anyway and
therefore reinventing our own is not a good use of resources.

Stefan
