All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] live snapshot wiki updated
@ 2011-07-15 14:58 Jes Sorensen
  2011-07-18 14:08 ` Stefan Hajnoczi
  0 siblings, 1 reply; 55+ messages in thread
From: Jes Sorensen @ 2011-07-15 14:58 UTC (permalink / raw)
  To: QEMU Developers; +Cc: Stefan Hajnoczi

Hi,

I have been updating the live snapshot wiki for qemu to try and cover
the commands we will want for async snapshot handling too.

http://wiki.qemu.org/Features/Snapshots

Cheers,
Jes

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-15 14:58 [Qemu-devel] live snapshot wiki updated Jes Sorensen
@ 2011-07-18 14:08 ` Stefan Hajnoczi
  2011-07-19  7:24   ` Jes Sorensen
  0 siblings, 1 reply; 55+ messages in thread
From: Stefan Hajnoczi @ 2011-07-18 14:08 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: QEMU Developers, Stefan Hajnoczi

On Fri, Jul 15, 2011 at 3:58 PM, Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
> I have been updating the live snapshot wiki for qemu to try and cover
> the commands we will want for async snapshot handling too.
>
> http://wiki.qemu.org/Features/Snapshots

Regarding fd passing, do we even support SELinux today with backing files?

Stefan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-18 14:08 ` Stefan Hajnoczi
@ 2011-07-19  7:24   ` Jes Sorensen
  2011-07-19 13:23     ` Stefan Hajnoczi
  0 siblings, 1 reply; 55+ messages in thread
From: Jes Sorensen @ 2011-07-19  7:24 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: QEMU Developers, Stefan Hajnoczi

On 07/18/11 16:08, Stefan Hajnoczi wrote:
> On Fri, Jul 15, 2011 at 3:58 PM, Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
>> I have been updating the live snapshot wiki for qemu to try and cover
>> the commands we will want for async snapshot handling too.
>>
>> http://wiki.qemu.org/Features/Snapshots
> 
> Regarding fd passing, do we even support SELinux today with backing files?

Not sure I understand what you mean. The current code should be happy to
take an existing file or a raw device for the snapshot.

Jes

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-19  7:24   ` Jes Sorensen
@ 2011-07-19 13:23     ` Stefan Hajnoczi
  2011-07-19 13:27       ` Jes Sorensen
  0 siblings, 1 reply; 55+ messages in thread
From: Stefan Hajnoczi @ 2011-07-19 13:23 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: QEMU Developers, Stefan Hajnoczi

On Tue, Jul 19, 2011 at 8:24 AM, Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
> On 07/18/11 16:08, Stefan Hajnoczi wrote:
>> On Fri, Jul 15, 2011 at 3:58 PM, Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
>>> I have been updating the live snapshot wiki for qemu to try and cover
>>> the commands we will want for async snapshot handling too.
>>>
>>> http://wiki.qemu.org/Features/Snapshots
>>
>> Regarding fd passing, do we even support SELinux today with backing files?
>
> Not sure I understand what you mean. The current code should be happy to
> take an existing file or a raw device for the snapshot.

Sorry, I was off on a tangent.

I think today QEMU does not support opening image files with a backing
file purely using file descriptors.  We currently require the ability
to open files.

Stefan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-19 13:23     ` Stefan Hajnoczi
@ 2011-07-19 13:27       ` Jes Sorensen
  2011-07-19 13:58         ` Eric Blake
  0 siblings, 1 reply; 55+ messages in thread
From: Jes Sorensen @ 2011-07-19 13:27 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Eric Blake, QEMU Developers, Stefan Hajnoczi

On 07/19/11 15:23, Stefan Hajnoczi wrote:
> On Tue, Jul 19, 2011 at 8:24 AM, Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
>> On 07/18/11 16:08, Stefan Hajnoczi wrote:
>>> On Fri, Jul 15, 2011 at 3:58 PM, Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
>>>> I have been updating the live snapshot wiki for qemu to try and cover
>>>> the commands we will want for async snapshot handling too.
>>>>
>>>> http://wiki.qemu.org/Features/Snapshots
>>>
>>> Regarding fd passing, do we even support SELinux today with backing files?
>>
>> Not sure I understand what you mean. The current code should be happy to
>> take an existing file or a raw device for the snapshot.
> 
> Sorry, I was off on a tangent.
> 
> I think today QEMU does not support opening image files with a backing
> file purely using file descriptors.  We currently require the ability
> to open files.

I see what you mean - I don't actually know how that would work, since
the backing file specified in the front image will be a file name.

Eric, what happens if libvirt in an selinux environment tells QEMU to
launch using an image file that is backed by backing file(s)?

Cheers,
Jes

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-19 13:27       ` Jes Sorensen
@ 2011-07-19 13:58         ` Eric Blake
  2011-07-19 14:09           ` Jes Sorensen
  0 siblings, 1 reply; 55+ messages in thread
From: Eric Blake @ 2011-07-19 13:58 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: Stefan Hajnoczi, QEMU Developers, Stefan Hajnoczi

On 07/19/2011 07:27 AM, Jes Sorensen wrote:
> On 07/19/11 15:23, Stefan Hajnoczi wrote:
>> On Tue, Jul 19, 2011 at 8:24 AM, Jes Sorensen<Jes.Sorensen@redhat.com>  wrote:
>>> On 07/18/11 16:08, Stefan Hajnoczi wrote:
>>>> On Fri, Jul 15, 2011 at 3:58 PM, Jes Sorensen<Jes.Sorensen@redhat.com>  wrote:
>>>>> I have been updating the live snapshot wiki for qemu to try and cover
>>>>> the commands we will want for async snapshot handling too.
>>>>>
>>>>> http://wiki.qemu.org/Features/Snapshots
>>>>
>>>> Regarding fd passing, do we even support SELinux today with backing files?
>>>
>>> Not sure I understand what you mean. The current code should be happy to
>>> take an existing file or a raw device for the snapshot.
>>
>> Sorry, I was off on a tangent.
>>
>> I think today QEMU does not support opening image files with a backing
>> file purely using file descriptors.  We currently require the ability
>> to open files.
>
> I see what you mean - I don't actually know how that would work, since
> the backing file specified in the front image will be a file name.
>
> Eric, what happens if libvirt in an selinux environment tells QEMU to
> launch using an image file that is backed by backing file(s)?

Before starting qemu, libvirt first parses all the image files, to see 
if any of them have backing images.  For every qcow2 or qed image with a 
backing file, libvirt sets the SELinux context of both the qcow2 image 
and its backing file so that qemu will be able to successfully open() 
them.  But if any of those files reside on NFS, then it is not possible 
to label individual files, so it requires setting the SELinux bool 
virt_use_nfs, which thus gives qemu the power to open() arbitrary files 
on NFS, and you've lost security.

It would be nice if libvirt had a way to pass fds for every disk and 
backing file up front; then, SELinux can work around the lack of NFS 
per-file labelling by blocking open() in qemu.  In fact, this has 
already been proposed:

http://lists.gnu.org/archive/html/qemu-devel/2011-06/msg02072.html
http://lists.gnu.org/archive/html/qemu-devel/2011-06/msg01992.html

That thread mentioned both a command-line syntax for passing in fds for 
backing files, as well as an extension to the getfd monitor command to 
allow association of a runtime fd with a filename.

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-19 13:58         ` Eric Blake
@ 2011-07-19 14:09           ` Jes Sorensen
  2011-07-19 14:24             ` Eric Blake
  0 siblings, 1 reply; 55+ messages in thread
From: Jes Sorensen @ 2011-07-19 14:09 UTC (permalink / raw)
  To: Eric Blake; +Cc: Stefan Hajnoczi, QEMU Developers, Stefan Hajnoczi

On 07/19/11 15:58, Eric Blake wrote:
> On 07/19/2011 07:27 AM, Jes Sorensen wrote:
>> Eric, what happens if libvirt in an selinux environment tells QEMU to
>> launch using an image file that is backed by backing file(s)?
> 
> Before starting qemu, libvirt first parses all the image files, to see
> if any of them have backing images.  For every qcow2 or qed image with a
> backing file, libvirt sets the SELinux context of both the qcow2 image
> and its backing file so that qemu will be able to successfully open()
> them.  But if any of those files reside on NFS, then it is not possible
> to label individual files, so it requires setting the SELinux bool
> virt_use_nfs, which thus gives qemu the power to open() arbitrary files
> on NFS, and you've lost security.

Urgh, libvirt parsing image files is really unfortunate, it really
doesn't give me warm fuzzy feelings :( libvirt really should not know
about internals of image formats.

> It would be nice if libvirt had a way to pass fds for every disk and
> backing file up front; then, SELinux can work around the lack of NFS
> per-file labelling by blocking open() in qemu.  In fact, this has
> already been proposed:

A cleaner solution seems to have libvirt provide a call-back allowing
QEMU to call out and have libvirt open a file descriptor instead. This
way libvirt can validate it and open it for QEMU and pass it back.

If we cannot do something like this, I would prefer to have backing
files on NFS should simply not be supported when running in an selinux
setup.

Cheers,
Jes

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-19 14:09           ` Jes Sorensen
@ 2011-07-19 14:24             ` Eric Blake
  2011-07-19 14:30               ` Jes Sorensen
  0 siblings, 1 reply; 55+ messages in thread
From: Eric Blake @ 2011-07-19 14:24 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: libvir-list, Stefan Hajnoczi, QEMU Developers, Stefan Hajnoczi

[adding the libvir-list]

On 07/19/2011 08:09 AM, Jes Sorensen wrote:
> On 07/19/11 15:58, Eric Blake wrote:
>> On 07/19/2011 07:27 AM, Jes Sorensen wrote:
>>> Eric, what happens if libvirt in an selinux environment tells QEMU to
>>> launch using an image file that is backed by backing file(s)?
>>
>> Before starting qemu, libvirt first parses all the image files, to see
>> if any of them have backing images.  For every qcow2 or qed image with a
>> backing file, libvirt sets the SELinux context of both the qcow2 image
>> and its backing file so that qemu will be able to successfully open()
>> them.  But if any of those files reside on NFS, then it is not possible
>> to label individual files, so it requires setting the SELinux bool
>> virt_use_nfs, which thus gives qemu the power to open() arbitrary files
>> on NFS, and you've lost security.
>
> Urgh, libvirt parsing image files is really unfortunate, it really
> doesn't give me warm fuzzy feelings :( libvirt really should not know
> about internals of image formats.

But even if you add new features to qemu to avoid needing this in the 
future, it doesn't change the past - libvirt will always have to know 
how to parse image files understood by older qemu, and so as long as 
libvirt already knows how to do that parsing, we might as well take 
advantage of it.

Besides, I feel that having a well-documented file format, so that 
independent applications can both parse the same file with the same 
semantics by obeying the file format specification, is a good design goal.

>
>> It would be nice if libvirt had a way to pass fds for every disk and
>> backing file up front; then, SELinux can work around the lack of NFS
>> per-file labelling by blocking open() in qemu.  In fact, this has
>> already been proposed:
>
> A cleaner solution seems to have libvirt provide a call-back allowing
> QEMU to call out and have libvirt open a file descriptor instead. This
> way libvirt can validate it and open it for QEMU and pass it back.

Yes, that could probably be made to work with libvirt.

>
> If we cannot do something like this, I would prefer to have backing
> files on NFS should simply not be supported when running in an selinux
> setup.

As nice as that sentiment is, it will never fly, because it would be a 
regression in current behavior.  The whole reason that the virt_use_nfs 
SELinux bool exists is that some people are willing to make the partial 
security tradeoff.  Besides, the use of sVirt via SELinux is more than 
just open() protection - while the current virt_use_nfs bool makes NFS 
less secure than otherwise possible, it still gives some nice guarantees 
to the rest of the qemu process such as passthrough accesses to local 
pci devices.

Just because it is currently not as secure to mix NFS shared storage 
with backing files doesn't stop some people from wanting to do it [in 
fact, that's my current development setup - I use qcow2 images on NFS 
shared storage, keep SELinux enabled, and enable the virt_use_nfs bool]. 
  This discussion is about adding enhancements that make SELinux even 
more powerful when using NFS shared storage, by adding fd passing 
(whether libvirt parses in advance, or whether qemu raises an event and 
requires feedback from libvirt), and not about crippling the existing 
capability to use the virt_use_nfs selinux bool.

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-19 14:24             ` Eric Blake
@ 2011-07-19 14:30               ` Jes Sorensen
  2011-07-19 15:14                 ` Stefan Hajnoczi
                                   ` (2 more replies)
  0 siblings, 3 replies; 55+ messages in thread
From: Jes Sorensen @ 2011-07-19 14:30 UTC (permalink / raw)
  To: Eric Blake; +Cc: libvir-list, Stefan Hajnoczi, QEMU Developers, Stefan Hajnoczi

On 07/19/11 16:24, Eric Blake wrote:
> [adding the libvir-list]
> On 07/19/2011 08:09 AM, Jes Sorensen wrote:
>> Urgh, libvirt parsing image files is really unfortunate, it really
>> doesn't give me warm fuzzy feelings :( libvirt really should not know
>> about internals of image formats.
> 
> But even if you add new features to qemu to avoid needing this in the
> future, it doesn't change the past - libvirt will always have to know
> how to parse image files understood by older qemu, and so as long as
> libvirt already knows how to do that parsing, we might as well take
> advantage of it.

What has been done here in the past is plain wrong. Continuing to do it
isn't the right thing to do here.

> Besides, I feel that having a well-documented file format, so that
> independent applications can both parse the same file with the same
> semantics by obeying the file format specification, is a good design goal.

We all know that documentation is rarely uptodate, new features may not
get added and libvirt will never be able to keep up. The driver for a
file format belongs in QEMU and nowhere else.


>>> It would be nice if libvirt had a way to pass fds for every disk and
>>> backing file up front; then, SELinux can work around the lack of NFS
>>> per-file labelling by blocking open() in qemu.  In fact, this has
>>> already been proposed:
>>
>> A cleaner solution seems to have libvirt provide a call-back allowing
>> QEMU to call out and have libvirt open a file descriptor instead. This
>> way libvirt can validate it and open it for QEMU and pass it back.
> 
> Yes, that could probably be made to work with libvirt.

I am a little frustrated this approach wasn't taken up front instead of
the evil hack of having libvirt attempt to parse image files.

>> If we cannot do something like this, I would prefer to have backing
>> files on NFS should simply not be supported when running in an selinux
>> setup.
> 
> As nice as that sentiment is, it will never fly, because it would be a
> regression in current behavior.  The whole reason that the virt_use_nfs
> SELinux bool exists is that some people are willing to make the partial
> security tradeoff.  Besides, the use of sVirt via SELinux is more than
> just open() protection - while the current virt_use_nfs bool makes NFS
> less secure than otherwise possible, it still gives some nice guarantees
> to the rest of the qemu process such as passthrough accesses to local
> pci devices.

Well leaving things at status quo is not making it worse, it just leaves
an evil in place.

> Just because it is currently not as secure to mix NFS shared storage
> with backing files doesn't stop some people from wanting to do it [in
> fact, that's my current development setup - I use qcow2 images on NFS
> shared storage, keep SELinux enabled, and enable the virt_use_nfs bool].
>  This discussion is about adding enhancements that make SELinux even
> more powerful when using NFS shared storage, by adding fd passing
> (whether libvirt parses in advance, or whether qemu raises an event and
> requires feedback from libvirt), and not about crippling the existing
> capability to use the virt_use_nfs selinux bool.

I do not believe we should try and add extra interfaces to support
something which is inherently broken. This really boils down to whether
we should support fd passing for snapshots in the first place. If it is
to support the broken setup of libvirt parsing image files, then I am
totally against it, if we work on a proper solution that involves this
in some way, then we can discuss it.

Cheers,
Jes

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-19 14:30               ` Jes Sorensen
@ 2011-07-19 15:14                 ` Stefan Hajnoczi
  2011-07-19 16:46                   ` Daniel P. Berrange
  2011-07-19 16:14                 ` Anthony Liguori
  2011-07-19 16:47                 ` Daniel P. Berrange
  2 siblings, 1 reply; 55+ messages in thread
From: Stefan Hajnoczi @ 2011-07-19 15:14 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: libvir-list, Eric Blake, QEMU Developers, Stefan Hajnoczi

On Tue, Jul 19, 2011 at 3:30 PM, Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
> On 07/19/11 16:24, Eric Blake wrote:
>> [adding the libvir-list]
>> On 07/19/2011 08:09 AM, Jes Sorensen wrote:
>>> Urgh, libvirt parsing image files is really unfortunate, it really
>>> doesn't give me warm fuzzy feelings :( libvirt really should not know
>>> about internals of image formats.
>>
>> But even if you add new features to qemu to avoid needing this in the
>> future, it doesn't change the past - libvirt will always have to know
>> how to parse image files understood by older qemu, and so as long as
>> libvirt already knows how to do that parsing, we might as well take
>> advantage of it.
>
> What has been done here in the past is plain wrong. Continuing to do it
> isn't the right thing to do here.
>
>> Besides, I feel that having a well-documented file format, so that
>> independent applications can both parse the same file with the same
>> semantics by obeying the file format specification, is a good design goal.
>
> We all know that documentation is rarely uptodate, new features may not
> get added and libvirt will never be able to keep up. The driver for a
> file format belongs in QEMU and nowhere else.

It should be a goal to avoid dependencies in multiple layers of the
stack because it becomes are to add new features - they require
coordinated changes in multiple layers.  Having both QEMU and libvirt
know the internals of image files is such a multi-dependency.  If I
want to add a new format or change an existing format I have to touch
both layers.

For fd-passing perhaps we have an opportunity to use a callback
mechanism (QEMU request: filename -> libvirt response: fd) and do all
the image format parsing in QEMU.

Stefan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-19 14:30               ` Jes Sorensen
  2011-07-19 15:14                 ` Stefan Hajnoczi
@ 2011-07-19 16:14                 ` Anthony Liguori
  2011-07-20  8:25                   ` Jes Sorensen
  2011-07-20 13:50                   ` Cleber Rosa
  2011-07-19 16:47                 ` Daniel P. Berrange
  2 siblings, 2 replies; 55+ messages in thread
From: Anthony Liguori @ 2011-07-19 16:14 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: libvir-list, Stefan Hajnoczi, Eric Blake, QEMU Developers,
	Stefan Hajnoczi

On 07/19/2011 09:30 AM, Jes Sorensen wrote:
> On 07/19/11 16:24, Eric Blake wrote:
>> [adding the libvir-list]
>> On 07/19/2011 08:09 AM, Jes Sorensen wrote:
>>> Urgh, libvirt parsing image files is really unfortunate, it really
>>> doesn't give me warm fuzzy feelings :( libvirt really should not know
>>> about internals of image formats.
>>
>> But even if you add new features to qemu to avoid needing this in the
>> future, it doesn't change the past - libvirt will always have to know
>> how to parse image files understood by older qemu, and so as long as
>> libvirt already knows how to do that parsing, we might as well take
>> advantage of it.
>
> What has been done here in the past is plain wrong. Continuing to do it
> isn't the right thing to do here.
>
>> Besides, I feel that having a well-documented file format, so that
>> independent applications can both parse the same file with the same
>> semantics by obeying the file format specification, is a good design goal.
>
> We all know that documentation is rarely uptodate, new features may not
> get added and libvirt will never be able to keep up. The driver for a
> file format belongs in QEMU and nowhere else.
>
>
>>>> It would be nice if libvirt had a way to pass fds for every disk and
>>>> backing file up front; then, SELinux can work around the lack of NFS
>>>> per-file labelling by blocking open() in qemu.  In fact, this has
>>>> already been proposed:
>>>
>>> A cleaner solution seems to have libvirt provide a call-back allowing
>>> QEMU to call out and have libvirt open a file descriptor instead. This
>>> way libvirt can validate it and open it for QEMU and pass it back.
>>
>> Yes, that could probably be made to work with libvirt.
>
> I am a little frustrated this approach wasn't taken up front instead of
> the evil hack of having libvirt attempt to parse image files.
>
>>> If we cannot do something like this, I would prefer to have backing
>>> files on NFS should simply not be supported when running in an selinux
>>> setup.
>>
>> As nice as that sentiment is, it will never fly, because it would be a
>> regression in current behavior.  The whole reason that the virt_use_nfs
>> SELinux bool exists is that some people are willing to make the partial
>> security tradeoff.  Besides, the use of sVirt via SELinux is more than
>> just open() protection - while the current virt_use_nfs bool makes NFS
>> less secure than otherwise possible, it still gives some nice guarantees
>> to the rest of the qemu process such as passthrough accesses to local
>> pci devices.
>
> Well leaving things at status quo is not making it worse, it just leaves
> an evil in place.

NFS and SELinux is a fundamental problem with SELinux and NFS.  We can 
piss and moan as much as we want about it but it's reality.  SELinux 
fundamentally requires extended attributes.  By the time NFS adds 
extended attribute support, we'll all be flying around in hover cars.

As terrible as NFS is, people use it all of the time.

It would be nice if libvirt had the ability to make better use of DAC to 
support isolation.  The fact that MAC is the only way you can do 
isolation between guests is pretty unfortunate.  If I could assign 
specific UIDs to a guest and use that to enforce isolation, it would go 
a long ways to solving this problem.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-19 15:14                 ` Stefan Hajnoczi
@ 2011-07-19 16:46                   ` Daniel P. Berrange
  2011-07-20  7:30                     ` Markus Armbruster
                                       ` (2 more replies)
  0 siblings, 3 replies; 55+ messages in thread
From: Daniel P. Berrange @ 2011-07-19 16:46 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: libvir-list, Jes Sorensen, Eric Blake, QEMU Developers, Stefan Hajnoczi

On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote:
> On Tue, Jul 19, 2011 at 3:30 PM, Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
> > On 07/19/11 16:24, Eric Blake wrote:
> >> [adding the libvir-list]
> >> On 07/19/2011 08:09 AM, Jes Sorensen wrote:
> >>> Urgh, libvirt parsing image files is really unfortunate, it really
> >>> doesn't give me warm fuzzy feelings :( libvirt really should not know
> >>> about internals of image formats.
> >>
> >> But even if you add new features to qemu to avoid needing this in the
> >> future, it doesn't change the past - libvirt will always have to know
> >> how to parse image files understood by older qemu, and so as long as
> >> libvirt already knows how to do that parsing, we might as well take
> >> advantage of it.
> >
> > What has been done here in the past is plain wrong. Continuing to do it
> > isn't the right thing to do here.
> >
> >> Besides, I feel that having a well-documented file format, so that
> >> independent applications can both parse the same file with the same
> >> semantics by obeying the file format specification, is a good design goal.
> >
> > We all know that documentation is rarely uptodate, new features may not
> > get added and libvirt will never be able to keep up. The driver for a
> > file format belongs in QEMU and nowhere else.
> 
> It should be a goal to avoid dependencies in multiple layers of the
> stack because it becomes are to add new features - they require
> coordinated changes in multiple layers.  Having both QEMU and libvirt
> know the internals of image files is such a multi-dependency.  If I
> want to add a new format or change an existing format I have to touch
> both layers.
> 
> For fd-passing perhaps we have an opportunity to use a callback
> mechanism (QEMU request: filename -> libvirt response: fd) and do all
> the image format parsing in QEMU.

The reason why libvirt does the parsing of file headers to determine
backing files is to maintain the trust boundary. Everything run from
the exec() of QEMU onwards is considered untrusted code. So having
QEMU parsing the file headers & passing back open() requests to libvirt
is breaking the trust boundary.

NB, i'm not happy about libvirt having to have knowledge of file format
headers, but we needed something more efficient & reliable than invoking
qemu-img info & parsing the output. Ideally QEMU (or something else)
would provide a library libblockformat.so with stable APIs for at least
reading metadata about image formats. If it had APIs for image creation,
etc too that would be a bonus, but we're more or less ok spawning qemu-img
for those cases currently.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-19 14:30               ` Jes Sorensen
  2011-07-19 15:14                 ` Stefan Hajnoczi
  2011-07-19 16:14                 ` Anthony Liguori
@ 2011-07-19 16:47                 ` Daniel P. Berrange
  2011-07-20  8:26                   ` Jes Sorensen
                                     ` (2 more replies)
  2 siblings, 3 replies; 55+ messages in thread
From: Daniel P. Berrange @ 2011-07-19 16:47 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: libvir-list, Stefan Hajnoczi, Eric Blake, QEMU Developers,
	Stefan Hajnoczi

On Tue, Jul 19, 2011 at 04:30:19PM +0200, Jes Sorensen wrote:
> On 07/19/11 16:24, Eric Blake wrote:
> > [adding the libvir-list]
> > On 07/19/2011 08:09 AM, Jes Sorensen wrote:
> >> Urgh, libvirt parsing image files is really unfortunate, it really
> >> doesn't give me warm fuzzy feelings :( libvirt really should not know
> >> about internals of image formats.
> > 
> > But even if you add new features to qemu to avoid needing this in the
> > future, it doesn't change the past - libvirt will always have to know
> > how to parse image files understood by older qemu, and so as long as
> > libvirt already knows how to do that parsing, we might as well take
> > advantage of it.
> 
> What has been done here in the past is plain wrong. Continuing to do it
> isn't the right thing to do here.
> 
> > Besides, I feel that having a well-documented file format, so that
> > independent applications can both parse the same file with the same
> > semantics by obeying the file format specification, is a good design goal.
> 
> We all know that documentation is rarely uptodate, new features may not
> get added and libvirt will never be able to keep up. The driver for a
> file format belongs in QEMU and nowhere else.

This would be possible if QEMU to provide a libblockformat.so library
which allowed apps to extract metadata from file formats using a stable
API.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-19 16:46                   ` Daniel P. Berrange
@ 2011-07-20  7:30                     ` Markus Armbruster
  2011-07-20  8:23                     ` Jes Sorensen
  2011-07-20  9:50                     ` Kevin Wolf
  2 siblings, 0 replies; 55+ messages in thread
From: Markus Armbruster @ 2011-07-20  7:30 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Stefan Hajnoczi, libvir-list, Stefan Hajnoczi, QEMU Developers,
	Jes Sorensen, Eric Blake

"Daniel P. Berrange" <berrange@redhat.com> writes:

> On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote:
>> On Tue, Jul 19, 2011 at 3:30 PM, Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
>> > On 07/19/11 16:24, Eric Blake wrote:
>> >> [adding the libvir-list]
>> >> On 07/19/2011 08:09 AM, Jes Sorensen wrote:
>> >>> Urgh, libvirt parsing image files is really unfortunate, it really
>> >>> doesn't give me warm fuzzy feelings :( libvirt really should not know
>> >>> about internals of image formats.
>> >>
>> >> But even if you add new features to qemu to avoid needing this in the
>> >> future, it doesn't change the past - libvirt will always have to know
>> >> how to parse image files understood by older qemu, and so as long as
>> >> libvirt already knows how to do that parsing, we might as well take
>> >> advantage of it.
>> >
>> > What has been done here in the past is plain wrong. Continuing to do it
>> > isn't the right thing to do here.
>> >
>> >> Besides, I feel that having a well-documented file format, so that
>> >> independent applications can both parse the same file with the same
>> >> semantics by obeying the file format specification, is a good design goal.
>> >
>> > We all know that documentation is rarely uptodate, new features may not
>> > get added and libvirt will never be able to keep up. The driver for a
>> > file format belongs in QEMU and nowhere else.
>> 
>> It should be a goal to avoid dependencies in multiple layers of the
>> stack because it becomes are to add new features - they require
>> coordinated changes in multiple layers.  Having both QEMU and libvirt
>> know the internals of image files is such a multi-dependency.  If I
>> want to add a new format or change an existing format I have to touch
>> both layers.
>> 
>> For fd-passing perhaps we have an opportunity to use a callback
>> mechanism (QEMU request: filename -> libvirt response: fd) and do all
>> the image format parsing in QEMU.
>
> The reason why libvirt does the parsing of file headers to determine
> backing files is to maintain the trust boundary. Everything run from
> the exec() of QEMU onwards is considered untrusted code. So having
> QEMU parsing the file headers & passing back open() requests to libvirt
> is breaking the trust boundary.

Exactly.

The block drivers form a tree.  Each driver opens its children.  For
convenience, some of them have information on how to open their children
encoded in their image (e.g. COW backing images).  Others receive it as
configuration, encoded in their "filename" string (e.g. blkdebug).

Thus, we have direct control only over the root block driver.  We can
pass it fds easily.

We control the non-root block drivers only indirectly, through the block
drivers along the path from the root.  We don't have a way to pass them
fds.

If we had a way to construct the tree bottom-up, we could directly
control all the block drivers.

Requires knowing the number of children, and extracting child
configuration from images where applicable.

[...]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-19 16:46                   ` Daniel P. Berrange
  2011-07-20  7:30                     ` Markus Armbruster
@ 2011-07-20  8:23                     ` Jes Sorensen
  2011-07-20  9:36                       ` Daniel P. Berrange
  2011-07-20  9:50                     ` Kevin Wolf
  2 siblings, 1 reply; 55+ messages in thread
From: Jes Sorensen @ 2011-07-20  8:23 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: libvir-list, Stefan Hajnoczi, Eric Blake, QEMU Developers,
	Stefan Hajnoczi

On 07/19/11 18:46, Daniel P. Berrange wrote:
> On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote:
>> For fd-passing perhaps we have an opportunity to use a callback
>> mechanism (QEMU request: filename -> libvirt response: fd) and do all
>> the image format parsing in QEMU.
> 
> The reason why libvirt does the parsing of file headers to determine
> backing files is to maintain the trust boundary. Everything run from
> the exec() of QEMU onwards is considered untrusted code. So having
> QEMU parsing the file headers & passing back open() requests to libvirt
> is breaking the trust boundary.

Pardon, but I fail to see the issue here. If QEMU passes a filename back
to libvirt, libvirt still gets to make the decision whether or not it is
legitimate for QEMU to get that file descriptor or not. It doesn't
change anything wrt who actually opens the file, hence the 'trust' is
unchanged.

> NB, i'm not happy about libvirt having to have knowledge of file format
> headers, but we needed something more efficient & reliable than invoking
> qemu-img info & parsing the output. Ideally QEMU (or something else)
> would provide a library libblockformat.so with stable APIs for at least
> reading metadata about image formats. If it had APIs for image creation,
> etc too that would be a bonus, but we're more or less ok spawning qemu-img
> for those cases currently.

Even having a library for libvirt to link against is suboptimal here.
Two processes shouldn't be fighting over the internals of metadata, the
ownership of the metadata belongs solely with QEMU. In addition you have
the constant issue of dependencies there, hence if QEMU is updated and
it provides a newer block format library, it may prevent libvirt from
running forcing an update of libvirt as well. That is not acceptable for
development.

Cheers,
Jes

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-19 16:14                 ` Anthony Liguori
@ 2011-07-20  8:25                   ` Jes Sorensen
  2011-07-20 10:01                     ` Kevin Wolf
  2011-07-20 13:50                   ` Cleber Rosa
  1 sibling, 1 reply; 55+ messages in thread
From: Jes Sorensen @ 2011-07-20  8:25 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: libvir-list, Stefan Hajnoczi, Eric Blake, QEMU Developers,
	Stefan Hajnoczi

On 07/19/11 18:14, Anthony Liguori wrote:
>>> As nice as that sentiment is, it will never fly, because it would be a
>>> regression in current behavior.  The whole reason that the virt_use_nfs
>>> SELinux bool exists is that some people are willing to make the partial
>>> security tradeoff.  Besides, the use of sVirt via SELinux is more than
>>> just open() protection - while the current virt_use_nfs bool makes NFS
>>> less secure than otherwise possible, it still gives some nice guarantees
>>> to the rest of the qemu process such as passthrough accesses to local
>>> pci devices.
>>
>> Well leaving things at status quo is not making it worse, it just leaves
>> an evil in place.
> 
> NFS and SELinux is a fundamental problem with SELinux and NFS.  We can
> piss and moan as much as we want about it but it's reality.  SELinux
> fundamentally requires extended attributes.  By the time NFS adds
> extended attribute support, we'll all be flying around in hover cars.
> 
> As terrible as NFS is, people use it all of the time.
> 
> It would be nice if libvirt had the ability to make better use of DAC to
> support isolation.  The fact that MAC is the only way you can do
> isolation between guests is pretty unfortunate.  If I could assign
> specific UIDs to a guest and use that to enforce isolation, it would go
> a long ways to solving this problem.

Right, we're stuck with the two horros of NFS and selinux, so we need
something that gets around the problem. In a sane world we would simply
say 'no NFS, no selinux', but as you say that will never happen.

My suggestion of a callback mechanism where libvirt registers the
callback with QEMU for open() calls, allowing libvirt to perform the
open and return the open file descriptor would get around this problem.

Jes

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-19 16:47                 ` Daniel P. Berrange
@ 2011-07-20  8:26                   ` Jes Sorensen
  2011-07-20  9:38                     ` Daniel P. Berrange
  2011-07-20 14:35                   ` Anthony Liguori
  2011-07-21 18:56                   ` Michael Roth
  2 siblings, 1 reply; 55+ messages in thread
From: Jes Sorensen @ 2011-07-20  8:26 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: libvir-list, Stefan Hajnoczi, Eric Blake, QEMU Developers,
	Stefan Hajnoczi

On 07/19/11 18:47, Daniel P. Berrange wrote:
> On Tue, Jul 19, 2011 at 04:30:19PM +0200, Jes Sorensen wrote:
>> On 07/19/11 16:24, Eric Blake wrote:
>>> Besides, I feel that having a well-documented file format, so that
>>> independent applications can both parse the same file with the same
>>> semantics by obeying the file format specification, is a good design goal.
>>
>> We all know that documentation is rarely uptodate, new features may not
>> get added and libvirt will never be able to keep up. The driver for a
>> file format belongs in QEMU and nowhere else.
> 
> This would be possible if QEMU to provide a libblockformat.so library
> which allowed apps to extract metadata from file formats using a stable
> API.

There is no reason for libvirt or any external process to mess about
with the internals of image files. You have the same problem if the
image file is encrypted.

Jes

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-20  8:23                     ` Jes Sorensen
@ 2011-07-20  9:36                       ` Daniel P. Berrange
  2011-07-20 10:15                         ` [Qemu-devel] [libvirt] " Nicolas Sebrecht
       [not found]                         ` <4E27E5A2.2030208@redhat.com>
  0 siblings, 2 replies; 55+ messages in thread
From: Daniel P. Berrange @ 2011-07-20  9:36 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: libvir-list, Stefan Hajnoczi, Eric Blake, QEMU Developers,
	Stefan Hajnoczi

On Wed, Jul 20, 2011 at 10:23:12AM +0200, Jes Sorensen wrote:
> On 07/19/11 18:46, Daniel P. Berrange wrote:
> > On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote:
> >> For fd-passing perhaps we have an opportunity to use a callback
> >> mechanism (QEMU request: filename -> libvirt response: fd) and do all
> >> the image format parsing in QEMU.
> > 
> > The reason why libvirt does the parsing of file headers to determine
> > backing files is to maintain the trust boundary. Everything run from
> > the exec() of QEMU onwards is considered untrusted code. So having
> > QEMU parsing the file headers & passing back open() requests to libvirt
> > is breaking the trust boundary.
> 
> Pardon, but I fail to see the issue here. If QEMU passes a filename back
> to libvirt, libvirt still gets to make the decision whether or not it is
> legitimate for QEMU to get that file descriptor or not. It doesn't
> change anything wrt who actually opens the file, hence the 'trust' is
> unchanged.

To make the decision whether the filename from QEMU is valid, we have
to parse the master image header data to see if the filename actually
matches the backing file required by the image assigned to the guest.

> > NB, i'm not happy about libvirt having to have knowledge of file format
> > headers, but we needed something more efficient & reliable than invoking
> > qemu-img info & parsing the output. Ideally QEMU (or something else)
> > would provide a library libblockformat.so with stable APIs for at least
> > reading metadata about image formats. If it had APIs for image creation,
> > etc too that would be a bonus, but we're more or less ok spawning qemu-img
> > for those cases currently.
> 
> Even having a library for libvirt to link against is suboptimal here.
> Two processes shouldn't be fighting over the internals of metadata, the
> ownership of the metadata belongs solely with QEMU. In addition you have
> the constant issue of dependencies there, hence if QEMU is updated and
> it provides a newer block format library, it may prevent libvirt from
> running forcing an update of libvirt as well. That is not acceptable for
> development.

We're not fighting over the internals of metadata. We just need to know
ahead of launching QEMU, what backing files an image has & what format
they are in. We do that by reading at the metadata headers of the disk
images. We never attempt to write to the disk images. Either someone
provides a library todo that, or we write the probing code for each
file format in libvirt. Currently we have the latter.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-20  8:26                   ` Jes Sorensen
@ 2011-07-20  9:38                     ` Daniel P. Berrange
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel P. Berrange @ 2011-07-20  9:38 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: libvir-list, Stefan Hajnoczi, Eric Blake, QEMU Developers,
	Stefan Hajnoczi

On Wed, Jul 20, 2011 at 10:26:49AM +0200, Jes Sorensen wrote:
> On 07/19/11 18:47, Daniel P. Berrange wrote:
> > On Tue, Jul 19, 2011 at 04:30:19PM +0200, Jes Sorensen wrote:
> >> On 07/19/11 16:24, Eric Blake wrote:
> >>> Besides, I feel that having a well-documented file format, so that
> >>> independent applications can both parse the same file with the same
> >>> semantics by obeying the file format specification, is a good design goal.
> >>
> >> We all know that documentation is rarely uptodate, new features may not
> >> get added and libvirt will never be able to keep up. The driver for a
> >> file format belongs in QEMU and nowhere else.
> > 
> > This would be possible if QEMU to provide a libblockformat.so library
> > which allowed apps to extract metadata from file formats using a stable
> > API.
> 
> There is no reason for libvirt or any external process to mess about
> with the internals of image files. You have the same problem if the
> image file is encrypted.

Just repeating "libvirt doesn't need todo this" many times doesn't make
it true. I have described why we need to read the disk image to determine
its backing files ahead of QEMU being launched quite clearly.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-19 16:46                   ` Daniel P. Berrange
  2011-07-20  7:30                     ` Markus Armbruster
  2011-07-20  8:23                     ` Jes Sorensen
@ 2011-07-20  9:50                     ` Kevin Wolf
  2011-07-20 10:18                       ` Daniel P. Berrange
  2 siblings, 1 reply; 55+ messages in thread
From: Kevin Wolf @ 2011-07-20  9:50 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Stefan Hajnoczi, libvir-list, Stefan Hajnoczi, QEMU Developers,
	Jes Sorensen, Eric Blake

Am 19.07.2011 18:46, schrieb Daniel P. Berrange:
> On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote:
>> On Tue, Jul 19, 2011 at 3:30 PM, Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
>>> On 07/19/11 16:24, Eric Blake wrote:
>>>> [adding the libvir-list]
>>>> On 07/19/2011 08:09 AM, Jes Sorensen wrote:
>>>>> Urgh, libvirt parsing image files is really unfortunate, it really
>>>>> doesn't give me warm fuzzy feelings :( libvirt really should not know
>>>>> about internals of image formats.
>>>>
>>>> But even if you add new features to qemu to avoid needing this in the
>>>> future, it doesn't change the past - libvirt will always have to know
>>>> how to parse image files understood by older qemu, and so as long as
>>>> libvirt already knows how to do that parsing, we might as well take
>>>> advantage of it.
>>>
>>> What has been done here in the past is plain wrong. Continuing to do it
>>> isn't the right thing to do here.
>>>
>>>> Besides, I feel that having a well-documented file format, so that
>>>> independent applications can both parse the same file with the same
>>>> semantics by obeying the file format specification, is a good design goal.
>>>
>>> We all know that documentation is rarely uptodate, new features may not
>>> get added and libvirt will never be able to keep up. The driver for a
>>> file format belongs in QEMU and nowhere else.
>>
>> It should be a goal to avoid dependencies in multiple layers of the
>> stack because it becomes are to add new features - they require
>> coordinated changes in multiple layers.  Having both QEMU and libvirt
>> know the internals of image files is such a multi-dependency.  If I
>> want to add a new format or change an existing format I have to touch
>> both layers.
>>
>> For fd-passing perhaps we have an opportunity to use a callback
>> mechanism (QEMU request: filename -> libvirt response: fd) and do all
>> the image format parsing in QEMU.
> 
> The reason why libvirt does the parsing of file headers to determine
> backing files is to maintain the trust boundary. Everything run from
> the exec() of QEMU onwards is considered untrusted code. So having
> QEMU parsing the file headers & passing back open() requests to libvirt
> is breaking the trust boundary.
> 
> NB, i'm not happy about libvirt having to have knowledge of file format
> headers, but we needed something more efficient & reliable than invoking
> qemu-img info & parsing the output. 

What's the real problem with this approach? Parsing the data meant for
humans, from an interface that is potentially unstable? If this is it,
it should be easy enough to add a JSON output mode to qemu-img info.

> Ideally QEMU (or something else)
> would provide a library libblockformat.so with stable APIs for at least
> reading metadata about image formats. If it had APIs for image creation,
> etc too that would be a bonus, but we're more or less ok spawning qemu-img
> for those cases currently.

I'm afraid the block drivers have too many dependencies on the qemu core
for this to be an option without investing a lot of effort.

Kevin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-20  8:25                   ` Jes Sorensen
@ 2011-07-20 10:01                     ` Kevin Wolf
  2011-07-20 13:25                       ` Jes Sorensen
  0 siblings, 1 reply; 55+ messages in thread
From: Kevin Wolf @ 2011-07-20 10:01 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Stefan Hajnoczi, libvir-list, Stefan Hajnoczi, QEMU Developers,
	Eric Blake

Am 20.07.2011 10:25, schrieb Jes Sorensen:
> On 07/19/11 18:14, Anthony Liguori wrote:
>>>> As nice as that sentiment is, it will never fly, because it would be a
>>>> regression in current behavior.  The whole reason that the virt_use_nfs
>>>> SELinux bool exists is that some people are willing to make the partial
>>>> security tradeoff.  Besides, the use of sVirt via SELinux is more than
>>>> just open() protection - while the current virt_use_nfs bool makes NFS
>>>> less secure than otherwise possible, it still gives some nice guarantees
>>>> to the rest of the qemu process such as passthrough accesses to local
>>>> pci devices.
>>>
>>> Well leaving things at status quo is not making it worse, it just leaves
>>> an evil in place.
>>
>> NFS and SELinux is a fundamental problem with SELinux and NFS.  We can
>> piss and moan as much as we want about it but it's reality.  SELinux
>> fundamentally requires extended attributes.  By the time NFS adds
>> extended attribute support, we'll all be flying around in hover cars.
>>
>> As terrible as NFS is, people use it all of the time.
>>
>> It would be nice if libvirt had the ability to make better use of DAC to
>> support isolation.  The fact that MAC is the only way you can do
>> isolation between guests is pretty unfortunate.  If I could assign
>> specific UIDs to a guest and use that to enforce isolation, it would go
>> a long ways to solving this problem.
> 
> Right, we're stuck with the two horros of NFS and selinux, so we need
> something that gets around the problem. In a sane world we would simply
> say 'no NFS, no selinux', but as you say that will never happen.
> 
> My suggestion of a callback mechanism where libvirt registers the
> callback with QEMU for open() calls, allowing libvirt to perform the
> open and return the open file descriptor would get around this problem.

To me this sounds more like a problem than a solution. It basically
means that during an open (which may even be initiated by a monitor
command), you need monitor interaction. It basically means that open
becomes asynchronous, and requires clients to deal with that, which
sounds at least "interesting"... Also you have to add some magic to all
places opening something.

I think if libvirt wants qemu to use an fd instead of a file name, it
shouldn't pass a file name but an fd in the first place. Which means
that the two that we need are support for an fd: protocol (patches on
the list, need review), and a way for libvirt to override the backing
file of an image.

Kevin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [Qemu-devel] [libvirt] Re:  live snapshot wiki updated
  2011-07-20  9:36                       ` Daniel P. Berrange
@ 2011-07-20 10:15                         ` Nicolas Sebrecht
  2011-07-20 10:28                           ` Daniel P. Berrange
       [not found]                         ` <4E27E5A2.2030208@redhat.com>
  1 sibling, 1 reply; 55+ messages in thread
From: Nicolas Sebrecht @ 2011-07-20 10:15 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: libvir-list, Jes Sorensen, Nicolas Sebrecht, QEMU Developers,
	Stefan Hajnoczi

The 20/07/11, Daniel P. Berrange wrote:

> To make the decision whether the filename from QEMU is valid, we have
> to parse the master image header data to see if the filename actually
> matches the backing file required by the image assigned to the guest.

Actually, libvirt should not have to worry if the filename provided by
QEMU is valid. I think it should trust QEMU. If QEMU doesn't provide
information others can trust; it should be fixed at QEMU side.

> We're not fighting over the internals of metadata. We just need to know
> ahead of launching QEMU, what backing files an image has & what format
> they are in. We do that by reading at the metadata headers of the disk
> images. We never attempt to write to the disk images. Either someone
> provides a library todo that, or we write the probing code for each
> file format in libvirt. Currently we have the latter.

This is what I would call "fighting with QEMU internals". How do you
prevent from concurrency access and modifications? Ideally speacking
libvirt should be able to co-exist with foreign implementations, all
requesting QEMU.

-- 
Nicolas Sebrecht

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-20  9:50                     ` Kevin Wolf
@ 2011-07-20 10:18                       ` Daniel P. Berrange
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel P. Berrange @ 2011-07-20 10:18 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Stefan Hajnoczi, libvir-list, Stefan Hajnoczi, QEMU Developers,
	Jes Sorensen, Eric Blake

On Wed, Jul 20, 2011 at 11:50:53AM +0200, Kevin Wolf wrote:
> Am 19.07.2011 18:46, schrieb Daniel P. Berrange:
> > On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote:
> >> On Tue, Jul 19, 2011 at 3:30 PM, Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
> >>> On 07/19/11 16:24, Eric Blake wrote:
> >>>> [adding the libvir-list]
> >>>> On 07/19/2011 08:09 AM, Jes Sorensen wrote:
> >>>>> Urgh, libvirt parsing image files is really unfortunate, it really
> >>>>> doesn't give me warm fuzzy feelings :( libvirt really should not know
> >>>>> about internals of image formats.
> >>>>
> >>>> But even if you add new features to qemu to avoid needing this in the
> >>>> future, it doesn't change the past - libvirt will always have to know
> >>>> how to parse image files understood by older qemu, and so as long as
> >>>> libvirt already knows how to do that parsing, we might as well take
> >>>> advantage of it.
> >>>
> >>> What has been done here in the past is plain wrong. Continuing to do it
> >>> isn't the right thing to do here.
> >>>
> >>>> Besides, I feel that having a well-documented file format, so that
> >>>> independent applications can both parse the same file with the same
> >>>> semantics by obeying the file format specification, is a good design goal.
> >>>
> >>> We all know that documentation is rarely uptodate, new features may not
> >>> get added and libvirt will never be able to keep up. The driver for a
> >>> file format belongs in QEMU and nowhere else.
> >>
> >> It should be a goal to avoid dependencies in multiple layers of the
> >> stack because it becomes are to add new features - they require
> >> coordinated changes in multiple layers.  Having both QEMU and libvirt
> >> know the internals of image files is such a multi-dependency.  If I
> >> want to add a new format or change an existing format I have to touch
> >> both layers.
> >>
> >> For fd-passing perhaps we have an opportunity to use a callback
> >> mechanism (QEMU request: filename -> libvirt response: fd) and do all
> >> the image format parsing in QEMU.
> > 
> > The reason why libvirt does the parsing of file headers to determine
> > backing files is to maintain the trust boundary. Everything run from
> > the exec() of QEMU onwards is considered untrusted code. So having
> > QEMU parsing the file headers & passing back open() requests to libvirt
> > is breaking the trust boundary.
> > 
> > NB, i'm not happy about libvirt having to have knowledge of file format
> > headers, but we needed something more efficient & reliable than invoking
> > qemu-img info & parsing the output. 
> 
> What's the real problem with this approach? Parsing the data meant for
> humans, from an interface that is potentially unstable? If this is it,
> it should be easy enough to add a JSON output mode to qemu-img info.

It is a really heavyweight solution to have to spawn qemu-img every
time we need to access this data, when it can be done with a trivial
open+read+close sequence. In addition the output data format is not
entirely pleasant for machine reading (some fields only have data
rounded up to MB, not the raw byte count). Finally, we also wanted
to be able to extract some basic metdata about disk image formats on
non-QEMU hosts, for our storage management APIs which are used on Xen
or VMWare hosts where many of these same disk image formats are also
used. A JSON output mode would be helpful, but unfortunately can't
really address the other issues.

> > Ideally QEMU (or something else)
> > would provide a library libblockformat.so with stable APIs for at least
> > reading metadata about image formats. If it had APIs for image creation,
> > etc too that would be a bonus, but we're more or less ok spawning qemu-img
> > for those cases currently.
> 
> I'm afraid the block drivers have too many dependencies on the qemu core
> for this to be an option without investing a lot of effort.

That's why I sort of think there is value in having someone provide a
standalone  library API for querying some core set of block format
metadata. QEMU is but one project with virtual disk formats, there are
plenty of others out there in existance, so while reusing QEMU block
code would be nice, it isn't leading to any significant reduction in
copies of block format parsing code amongst all the virt projects in
existance.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [libvirt] Re:  live snapshot wiki updated
  2011-07-20 10:15                         ` [Qemu-devel] [libvirt] " Nicolas Sebrecht
@ 2011-07-20 10:28                           ` Daniel P. Berrange
  2011-07-20 11:40                             ` [Qemu-devel] [libvirt] " Stefan Hajnoczi
       [not found]                             ` <4E27E610.7090502@redhat.com>
  0 siblings, 2 replies; 55+ messages in thread
From: Daniel P. Berrange @ 2011-07-20 10:28 UTC (permalink / raw)
  To: Nicolas Sebrecht
  Cc: libvir-list, Jes Sorensen, QEMU Developers, Stefan Hajnoczi

On Wed, Jul 20, 2011 at 12:15:02PM +0200, Nicolas Sebrecht wrote:
> The 20/07/11, Daniel P. Berrange wrote:
> 
> > To make the decision whether the filename from QEMU is valid, we have
> > to parse the master image header data to see if the filename actually
> > matches the backing file required by the image assigned to the guest.
> 
> Actually, libvirt should not have to worry if the filename provided by
> QEMU is valid. I think it should trust QEMU. If QEMU doesn't provide
> information others can trust; it should be fixed at QEMU side.

The security goal of libvirt is to protect the host from a compromised
QEMU, therefore QEMU is, by definition, untrusted.

> > We're not fighting over the internals of metadata. We just need to know
> > ahead of launching QEMU, what backing files an image has & what format
> > they are in. We do that by reading at the metadata headers of the disk
> > images. We never attempt to write to the disk images. Either someone
> > provides a library todo that, or we write the probing code for each
> > file format in libvirt. Currently we have the latter.
> 
> This is what I would call "fighting with QEMU internals". How do you
> prevent from concurrency access and modifications? Ideally speacking
> libvirt should be able to co-exist with foreign implementations, all
> requesting QEMU.

QEMU is *not* yet running at the time libvirt reads the file metadata.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [libvirt]  live snapshot wiki updated
  2011-07-20 10:28                           ` Daniel P. Berrange
@ 2011-07-20 11:40                             ` Stefan Hajnoczi
       [not found]                             ` <4E27E610.7090502@redhat.com>
  1 sibling, 0 replies; 55+ messages in thread
From: Stefan Hajnoczi @ 2011-07-20 11:40 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: libvir-list, Jes Sorensen, QEMU Developers, Stefan Hajnoczi,
	Nicolas Sebrecht

On Wed, Jul 20, 2011 at 11:28 AM, Daniel P. Berrange
<berrange@redhat.com> wrote:
> On Wed, Jul 20, 2011 at 12:15:02PM +0200, Nicolas Sebrecht wrote:
>> The 20/07/11, Daniel P. Berrange wrote:
>>
>> > To make the decision whether the filename from QEMU is valid, we have
>> > to parse the master image header data to see if the filename actually
>> > matches the backing file required by the image assigned to the guest.
>>
>> Actually, libvirt should not have to worry if the filename provided by
>> QEMU is valid. I think it should trust QEMU. If QEMU doesn't provide
>> information others can trust; it should be fixed at QEMU side.
>
> The security goal of libvirt is to protect the host from a compromised
> QEMU, therefore QEMU is, by definition, untrusted.

This is a very reasonable goal.  QEMU is constantly dealing with the
untrusted guest.  The whole point of SELinux isolation of QEMU is to
contain any compromise to a single VM and reduce the capabilities of
that process to the minimum.

libvirt needs to help set the boundaries of what the QEMU process can do.

Stefan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-20 10:01                     ` Kevin Wolf
@ 2011-07-20 13:25                       ` Jes Sorensen
  2011-07-20 13:46                         ` Eric Blake
  2011-07-20 13:51                         ` Kevin Wolf
  0 siblings, 2 replies; 55+ messages in thread
From: Jes Sorensen @ 2011-07-20 13:25 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Stefan Hajnoczi, libvir-list, Stefan Hajnoczi, QEMU Developers,
	Eric Blake

On 07/20/11 12:01, Kevin Wolf wrote:
>> > Right, we're stuck with the two horros of NFS and selinux, so we need
>> > something that gets around the problem. In a sane world we would simply
>> > say 'no NFS, no selinux', but as you say that will never happen.
>> > 
>> > My suggestion of a callback mechanism where libvirt registers the
>> > callback with QEMU for open() calls, allowing libvirt to perform the
>> > open and return the open file descriptor would get around this problem.
> To me this sounds more like a problem than a solution. It basically
> means that during an open (which may even be initiated by a monitor
> command), you need monitor interaction. It basically means that open
> becomes asynchronous, and requires clients to deal with that, which
> sounds at least "interesting"... Also you have to add some magic to all
> places opening something.
> 
> I think if libvirt wants qemu to use an fd instead of a file name, it
> shouldn't pass a file name but an fd in the first place. Which means
> that the two that we need are support for an fd: protocol (patches on
> the list, need review), and a way for libvirt to override the backing
> file of an image.

The problem is that QEMU will find backing file file names inside the
images which it will be unable to open. How do you suggest we get around
that?

Jes

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-20 13:25                       ` Jes Sorensen
@ 2011-07-20 13:46                         ` Eric Blake
  2011-07-20 17:27                           ` Blue Swirl
  2011-07-20 13:51                         ` Kevin Wolf
  1 sibling, 1 reply; 55+ messages in thread
From: Eric Blake @ 2011-07-20 13:46 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Kevin Wolf, Stefan Hajnoczi, libvir-list, Stefan Hajnoczi,
	QEMU Developers

On 07/20/2011 07:25 AM, Jes Sorensen wrote:
>> I think if libvirt wants qemu to use an fd instead of a file name, it
>> shouldn't pass a file name but an fd in the first place. Which means
>> that the two that we need are support for an fd: protocol (patches on
>> the list, need review), and a way for libvirt to override the backing
>> file of an image.
>
> The problem is that QEMU will find backing file file names inside the
> images which it will be unable to open. How do you suggest we get around
> that?

We've already told you - qemu must have a way to be passed fds which are 
associated with names, and when a file refers to another backing file by 
name, then qemu falls back on its fd/name mapping to use the 
already-passed fd instead.  Which implies that someone else, either 
libvirt or a qemu-maintained libblockformat.so, needs to have a stable 
interface for parsing the backing file name out of an arbitrary qcow2 
file, and that this interface must work no matter how many other 
extensions are added to qcow2.

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-19 16:14                 ` Anthony Liguori
  2011-07-20  8:25                   ` Jes Sorensen
@ 2011-07-20 13:50                   ` Cleber Rosa
  2011-07-20 14:34                     ` Anthony Liguori
  1 sibling, 1 reply; 55+ messages in thread
From: Cleber Rosa @ 2011-07-20 13:50 UTC (permalink / raw)
  To: qemu-devel

On 07/19/2011 12:14 PM, Anthony Liguori wrote:
> On 07/19/2011 09:30 AM, Jes Sorensen wrote:
>> On 07/19/11 16:24, Eric Blake wrote:
>>> [adding the libvir-list]
>>> On 07/19/2011 08:09 AM, Jes Sorensen wrote:
>>>> Urgh, libvirt parsing image files is really unfortunate, it really
>>>> doesn't give me warm fuzzy feelings :( libvirt really should not know
>>>> about internals of image formats.
>>>
>>> But even if you add new features to qemu to avoid needing this in the
>>> future, it doesn't change the past - libvirt will always have to know
>>> how to parse image files understood by older qemu, and so as long as
>>> libvirt already knows how to do that parsing, we might as well take
>>> advantage of it.
>>
>> What has been done here in the past is plain wrong. Continuing to do it
>> isn't the right thing to do here.
>>
>>> Besides, I feel that having a well-documented file format, so that
>>> independent applications can both parse the same file with the same
>>> semantics by obeying the file format specification, is a good design 
>>> goal.
>>
>> We all know that documentation is rarely uptodate, new features may not
>> get added and libvirt will never be able to keep up. The driver for a
>> file format belongs in QEMU and nowhere else.
>>
>>
>>>>> It would be nice if libvirt had a way to pass fds for every disk and
>>>>> backing file up front; then, SELinux can work around the lack of NFS
>>>>> per-file labelling by blocking open() in qemu.  In fact, this has
>>>>> already been proposed:
>>>>
>>>> A cleaner solution seems to have libvirt provide a call-back allowing
>>>> QEMU to call out and have libvirt open a file descriptor instead. This
>>>> way libvirt can validate it and open it for QEMU and pass it back.
>>>
>>> Yes, that could probably be made to work with libvirt.
>>
>> I am a little frustrated this approach wasn't taken up front instead of
>> the evil hack of having libvirt attempt to parse image files.
>>
>>>> If we cannot do something like this, I would prefer to have backing
>>>> files on NFS should simply not be supported when running in an selinux
>>>> setup.
>>>
>>> As nice as that sentiment is, it will never fly, because it would be a
>>> regression in current behavior.  The whole reason that the virt_use_nfs
>>> SELinux bool exists is that some people are willing to make the partial
>>> security tradeoff.  Besides, the use of sVirt via SELinux is more than
>>> just open() protection - while the current virt_use_nfs bool makes NFS
>>> less secure than otherwise possible, it still gives some nice 
>>> guarantees
>>> to the rest of the qemu process such as passthrough accesses to local
>>> pci devices.
>>
>> Well leaving things at status quo is not making it worse, it just leaves
>> an evil in place.
>
> NFS and SELinux is a fundamental problem with SELinux and NFS.  We can 
> piss and moan as much as we want about it but it's reality.  SELinux 
> fundamentally requires extended attributes.  By the time NFS adds 
> extended attribute support, we'll all be flying around in hover cars.
>
> As terrible as NFS is, people use it all of the time.
>
> It would be nice if libvirt had the ability to make better use of DAC 
> to support isolation.  The fact that MAC is the only way you can do 
> isolation between guests is pretty unfortunate.  If I could assign 
> specific UIDs to a guest and use that to enforce isolation, it would 
> go a long ways to solving this problem.

Just as a reminder: with DAC, if a guest is compromised and somehow 
escalates to QEMU, it could disable its isolation (ie, by setting their 
own image files world readable). I guess we shouldn't try to fix the DAC 
model, but fix what's preventing us from fully using MAC, even though 
it's outside of QEMU.

CR.

>
> Regards,
>
> Anthony Liguori
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-20 13:25                       ` Jes Sorensen
  2011-07-20 13:46                         ` Eric Blake
@ 2011-07-20 13:51                         ` Kevin Wolf
  2011-07-20 17:20                           ` Blue Swirl
  2011-07-22  7:36                           ` Avi Kivity
  1 sibling, 2 replies; 55+ messages in thread
From: Kevin Wolf @ 2011-07-20 13:51 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Stefan Hajnoczi, libvir-list, Stefan Hajnoczi, QEMU Developers,
	Eric Blake

Am 20.07.2011 15:25, schrieb Jes Sorensen:
> On 07/20/11 12:01, Kevin Wolf wrote:
>>>> Right, we're stuck with the two horros of NFS and selinux, so we need
>>>> something that gets around the problem. In a sane world we would simply
>>>> say 'no NFS, no selinux', but as you say that will never happen.
>>>>
>>>> My suggestion of a callback mechanism where libvirt registers the
>>>> callback with QEMU for open() calls, allowing libvirt to perform the
>>>> open and return the open file descriptor would get around this problem.
>> To me this sounds more like a problem than a solution. It basically
>> means that during an open (which may even be initiated by a monitor
>> command), you need monitor interaction. It basically means that open
>> becomes asynchronous, and requires clients to deal with that, which
>> sounds at least "interesting"... Also you have to add some magic to all
>> places opening something.
>>
>> I think if libvirt wants qemu to use an fd instead of a file name, it
>> shouldn't pass a file name but an fd in the first place. Which means
>> that the two that we need are support for an fd: protocol (patches on
>> the list, need review), and a way for libvirt to override the backing
>> file of an image.
> 
> The problem is that QEMU will find backing file file names inside the
> images which it will be unable to open. How do you suggest we get around
> that?

This is the part with allowing libvirt to override the backing file. Of
course, this is not something that we can add with five lines of code,
it requires -blockdev.

Kevin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-20 13:50                   ` Cleber Rosa
@ 2011-07-20 14:34                     ` Anthony Liguori
  2011-07-20 18:34                       ` Cleber Rosa
  0 siblings, 1 reply; 55+ messages in thread
From: Anthony Liguori @ 2011-07-20 14:34 UTC (permalink / raw)
  To: Cleber Rosa; +Cc: qemu-devel

On 07/20/2011 08:50 AM, Cleber Rosa wrote:
> Just as a reminder: with DAC, if a guest is compromised and somehow
> escalates to QEMU, it could disable its isolation (ie, by setting their
> own image files world readable). I guess we shouldn't try to fix the DAC
> model, but fix what's preventing us from fully using MAC, even though
> it's outside of QEMU.

I don't see how a guest making its data world readable is a fundamental 
problem.

DAC is a fundamental part of the Unix design and is something that 
administrators understand very well.  I completely understand the value 
of MAC but to argue that we shouldn't present DAC as an option I think 
is fundamentally wrong.

Regards,

Anthony Liguori

>
> CR.
>
>>
>> Regards,
>>
>> Anthony Liguori
>>
>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-19 16:47                 ` Daniel P. Berrange
  2011-07-20  8:26                   ` Jes Sorensen
@ 2011-07-20 14:35                   ` Anthony Liguori
  2011-07-21 18:56                   ` Michael Roth
  2 siblings, 0 replies; 55+ messages in thread
From: Anthony Liguori @ 2011-07-20 14:35 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Stefan Hajnoczi, libvir-list, Jes Sorensen, QEMU Developers,
	Stefan Hajnoczi, Eric Blake

On 07/19/2011 11:47 AM, Daniel P. Berrange wrote:
> This would be possible if QEMU to provide a libblockformat.so library
> which allowed apps to extract metadata from file formats using a stable
> API.

I'm in 100% agreement that we need to provide the equivalent of a 
libblockformat.so down the road.

But the block layer needs some work before we can support a stable 
interface.

Regards,

Anthony Liguori

>
> Daniel

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-20 13:51                         ` Kevin Wolf
@ 2011-07-20 17:20                           ` Blue Swirl
  2011-07-20 17:41                             ` Eric Blake
       [not found]                             ` <4E27E280.2060306@redhat.com>
  2011-07-22  7:36                           ` Avi Kivity
  1 sibling, 2 replies; 55+ messages in thread
From: Blue Swirl @ 2011-07-20 17:20 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Stefan Hajnoczi, libvir-list, Jes Sorensen, QEMU Developers,
	Stefan Hajnoczi, Eric Blake

On Wed, Jul 20, 2011 at 4:51 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 20.07.2011 15:25, schrieb Jes Sorensen:
>> On 07/20/11 12:01, Kevin Wolf wrote:
>>>>> Right, we're stuck with the two horros of NFS and selinux, so we need
>>>>> something that gets around the problem. In a sane world we would simply
>>>>> say 'no NFS, no selinux', but as you say that will never happen.
>>>>>
>>>>> My suggestion of a callback mechanism where libvirt registers the
>>>>> callback with QEMU for open() calls, allowing libvirt to perform the
>>>>> open and return the open file descriptor would get around this problem.
>>> To me this sounds more like a problem than a solution. It basically
>>> means that during an open (which may even be initiated by a monitor
>>> command), you need monitor interaction. It basically means that open
>>> becomes asynchronous, and requires clients to deal with that, which
>>> sounds at least "interesting"... Also you have to add some magic to all
>>> places opening something.
>>>
>>> I think if libvirt wants qemu to use an fd instead of a file name, it
>>> shouldn't pass a file name but an fd in the first place. Which means
>>> that the two that we need are support for an fd: protocol (patches on
>>> the list, need review), and a way for libvirt to override the backing
>>> file of an image.
>>
>> The problem is that QEMU will find backing file file names inside the
>> images which it will be unable to open. How do you suggest we get around
>> that?
>
> This is the part with allowing libvirt to override the backing file. Of
> course, this is not something that we can add with five lines of code,
> it requires -blockdev.

There could still be some issues:
Let's have files A, B, C etc. with backing files AA etc. How would
libvirt know that when QEMU wants to write to file CA, this is because
it's needed to access C, or is it just trickery by a devious guest to
corrupt storage?

This could be handled so that instead of naming the backing file, QEMU
asks for a descriptor for the backing file by presenting the
descriptor to main file C, but I think the real solution is that
libvirt should handle the storage formats completely and it should
present QEMU with only a raw file like interface (read/write/seek) for
the data. Then any backing files would be handled within libvirt.
Performance could suffer, though.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-20 13:46                         ` Eric Blake
@ 2011-07-20 17:27                           ` Blue Swirl
  2011-07-20 17:47                             ` Eric Blake
  0 siblings, 1 reply; 55+ messages in thread
From: Blue Swirl @ 2011-07-20 17:27 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Stefan Hajnoczi, libvir-list, Jes Sorensen,
	QEMU Developers, Stefan Hajnoczi

On Wed, Jul 20, 2011 at 4:46 PM, Eric Blake <eblake@redhat.com> wrote:
> On 07/20/2011 07:25 AM, Jes Sorensen wrote:
>>>
>>> I think if libvirt wants qemu to use an fd instead of a file name, it
>>> shouldn't pass a file name but an fd in the first place. Which means
>>> that the two that we need are support for an fd: protocol (patches on
>>> the list, need review), and a way for libvirt to override the backing
>>> file of an image.
>>
>> The problem is that QEMU will find backing file file names inside the
>> images which it will be unable to open. How do you suggest we get around
>> that?
>
> We've already told you - qemu must have a way to be passed fds which are
> associated with names, and when a file refers to another backing file by
> name, then qemu falls back on its fd/name mapping to use the already-passed
> fd instead.  Which implies that someone else, either libvirt or a
> qemu-maintained libblockformat.so, needs to have a stable interface for
> parsing the backing file name out of an arbitrary qcow2 file, and that this
> interface must work no matter how many other extensions are added to qcow2.

I'd avoid any name based access in this case. If QEMU has write access
to main file, it could forge the backing file name in main file to
point to for example /etc/shadow and then request libvirt to perform
the opening.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-20 17:20                           ` Blue Swirl
@ 2011-07-20 17:41                             ` Eric Blake
  2011-07-20 18:00                               ` Blue Swirl
       [not found]                             ` <4E27E280.2060306@redhat.com>
  1 sibling, 1 reply; 55+ messages in thread
From: Eric Blake @ 2011-07-20 17:41 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Kevin Wolf, Stefan Hajnoczi, libvir-list, Jes Sorensen,
	QEMU Developers, Stefan Hajnoczi

On 07/20/2011 11:20 AM, Blue Swirl wrote:
> There could still be some issues:
> Let's have files A, B, C etc. with backing files AA etc. How would
> libvirt know that when QEMU wants to write to file CA, this is because
> it's needed to access C, or is it just trickery by a devious guest to
> corrupt storage?

The fix for CVE-2010-2238 already deals with this: if primary image C 
refers to backing file CA of raw format, but does not state what file 
format CA contains, then a malicious guest can modify the contents of CA 
to appear to be yet another qcow2 image.  At which point, if libvirt 
follows the backing file specified in CA, then yes, the malicious guest 
really can cause libvirt to expose arbitrary file CB for manipulation by 
the guest.  But that security hole was already plugged - by default, 
libvirt refuses to probe backing files parsed from qcow2 headers for 
file format, but instead requires the outer qcow2 header to also include 
the a file format designation for the backing file.  At which point, you 
then have a safe chain: if C refers to CA, then libvirt knows that both 
C and CA are essential to the storage presented by giving qemu the file 
name C, and the guest will already be modifying CA, but there is no 
storage corruption involved.

That is, as long as libvirt can already accurately read the chain of 
backing files from any starting point, then it can hand that entire 
chain of backing files (whether by the topmost file name as it does now, 
or whether by a series of fds as is being proposed) to qemu.

>
> This could be handled so that instead of naming the backing file, QEMU
> asks for a descriptor for the backing file by presenting the
> descriptor to main file C, but I think the real solution is that
> libvirt should handle the storage formats completely and it should
> present QEMU with only a raw file like interface (read/write/seek) for
> the data. Then any backing files would be handled within libvirt.
> Performance could suffer, though.

The monitor interface was not designed to throw the 
read()/write()/seek() burden back on libvirt, and indeed that would kill 
performance so it is a non-starter idea.  All we need for security is 
the open() burden to be shifted out of qemu and into libvirt.

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-20 17:27                           ` Blue Swirl
@ 2011-07-20 17:47                             ` Eric Blake
  2011-07-20 19:51                               ` Blue Swirl
  0 siblings, 1 reply; 55+ messages in thread
From: Eric Blake @ 2011-07-20 17:47 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Kevin Wolf, Stefan Hajnoczi, libvir-list, Jes Sorensen,
	QEMU Developers, Stefan Hajnoczi

On 07/20/2011 11:27 AM, Blue Swirl wrote:
>> We've already told you - qemu must have a way to be passed fds which are
>> associated with names, and when a file refers to another backing file by
>> name, then qemu falls back on its fd/name mapping to use the already-passed
>> fd instead.  Which implies that someone else, either libvirt or a
>> qemu-maintained libblockformat.so, needs to have a stable interface for
>> parsing the backing file name out of an arbitrary qcow2 file, and that this
>> interface must work no matter how many other extensions are added to qcow2.
>
> I'd avoid any name based access in this case. If QEMU has write access
> to main file, it could forge the backing file name in main file to
> point to for example /etc/shadow and then request libvirt to perform
> the opening.

Won't work.  Well, it might work within the context of a single qemu 
process.  But when that process ends, then libvirt would have to touch 
up the qcow2 headers of that file to replace the /etc/shadow name with 
the real backing file name, otherwise, the next time you restart 
qemu-img or a new qemu guest using the same image, the information has 
been lost, since the fd has been closed in the meantime.

We really _do_ need a way to give qemu both an fd and the name of the 
file that the fd is tied to.  On Linux, qemu could use /proc/self/fd to 
reconstruct the name from fd, but that's not portable to other OS.  And 
we've already discussed how in the libvirt model, that libvirt is deemed 
more secure than qemu.  Therefore, I think it is reasonable for qemu to 
make the assumptions that if it exposes a monitor command where the 
supervisor (libvirt or otherwise) can pass in both an fd and a file 
name, that either the supervisor is passing in correct information, or 
that the bug is in the supervisor and not in qemu if the supervisor 
passes in wrong information and things blow up.

And the snapshot_blkdev monitor command is a case where qemu needs to 
create a new qcow2 image on the fly, while referencing the name of an 
existing file.  What backing name do you put in the new qcow2 file 
unless you already have a name association for all fds already open for 
the existing backing file?

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-20 17:41                             ` Eric Blake
@ 2011-07-20 18:00                               ` Blue Swirl
  2011-07-20 18:17                                 ` Eric Blake
  0 siblings, 1 reply; 55+ messages in thread
From: Blue Swirl @ 2011-07-20 18:00 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Stefan Hajnoczi, libvir-list, Jes Sorensen,
	QEMU Developers, Stefan Hajnoczi

On Wed, Jul 20, 2011 at 8:41 PM, Eric Blake <eblake@redhat.com> wrote:
> On 07/20/2011 11:20 AM, Blue Swirl wrote:
>>
>> There could still be some issues:
>> Let's have files A, B, C etc. with backing files AA etc. How would
>> libvirt know that when QEMU wants to write to file CA, this is because
>> it's needed to access C, or is it just trickery by a devious guest to
>> corrupt storage?
>
> The fix for CVE-2010-2238 already deals with this: if primary image C refers
> to backing file CA of raw format, but does not state what file format CA
> contains, then a malicious guest can modify the contents of CA to appear to
> be yet another qcow2 image.  At which point, if libvirt follows the backing
> file specified in CA, then yes, the malicious guest really can cause libvirt
> to expose arbitrary file CB for manipulation by the guest.  But that
> security hole was already plugged - by default, libvirt refuses to probe
> backing files parsed from qcow2 headers for file format, but instead
> requires the outer qcow2 header to also include the a file format
> designation for the backing file.  At which point, you then have a safe
> chain: if C refers to CA, then libvirt knows that both C and CA are
> essential to the storage presented by giving qemu the file name C, and the
> guest will already be modifying CA, but there is no storage corruption
> involved.

But what if CA is accessed even if C is not? For example, QEMU opens C
(to determine CA and write new information about the path), closes it
and then requests CA?

> That is, as long as libvirt can already accurately read the chain of backing
> files from any starting point, then it can hand that entire chain of backing
> files (whether by the topmost file name as it does now, or whether by a
> series of fds as is being proposed) to qemu.
>
>>
>> This could be handled so that instead of naming the backing file, QEMU
>> asks for a descriptor for the backing file by presenting the
>> descriptor to main file C, but I think the real solution is that
>> libvirt should handle the storage formats completely and it should
>> present QEMU with only a raw file like interface (read/write/seek) for
>> the data. Then any backing files would be handled within libvirt.
>> Performance could suffer, though.
>
> The monitor interface was not designed to throw the read()/write()/seek()
> burden back on libvirt, and indeed that would kill performance so it is a
> non-starter idea.  All we need for security is the open() burden to be
> shifted out of qemu and into libvirt.

Obviously the interface should be faster than monitor, for example a
pair of sockets with some efficient protocol. Monitor could still be
used to set up these.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-20 18:00                               ` Blue Swirl
@ 2011-07-20 18:17                                 ` Eric Blake
  2011-07-20 20:01                                   ` Blue Swirl
  0 siblings, 1 reply; 55+ messages in thread
From: Eric Blake @ 2011-07-20 18:17 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Kevin Wolf, Stefan Hajnoczi, libvir-list, Jes Sorensen,
	QEMU Developers, Stefan Hajnoczi

On 07/20/2011 12:00 PM, Blue Swirl wrote:
>>> Let's have files A, B, C etc. with backing files AA etc. How would
>>> libvirt know that when QEMU wants to write to file CA, this is because
>>> it's needed to access C, or is it just trickery by a devious guest to
>>> corrupt storage?
>>
>> The fix for CVE-2010-2238 already deals with this: if primary image C refers
>> to backing file CA of raw format, but does not state what file format CA
>> contains, then a malicious guest can modify the contents of CA to appear to
>> be yet another qcow2 image.  At which point, if libvirt follows the backing
>> file specified in CA, then yes, the malicious guest really can cause libvirt
>> to expose arbitrary file CB for manipulation by the guest.  But that
>> security hole was already plugged - by default, libvirt refuses to probe
>> backing files parsed from qcow2 headers for file format, but instead
>> requires the outer qcow2 header to also include the a file format
>> designation for the backing file.  At which point, you then have a safe
>> chain: if C refers to CA, then libvirt knows that both C and CA are
>> essential to the storage presented by giving qemu the file name C, and the
>> guest will already be modifying CA, but there is no storage corruption
>> involved.
>
> But what if CA is accessed even if C is not? For example, QEMU opens C
> (to determine CA and write new information about the path), closes it
> and then requests CA?

Why is qemu trying to access CA?

Either because CA was mentioned as a backing file for C (in which case 
libvirt already knows about it, because either libvirt handed C to qemu 
at startup time after already parsing C's headers to learn that CA is a 
backing file, or because libvirt called the snapshot_blkdev command with 
the intent of having qemu populate CA with C as its backing file), or 
because qemu has a bug (in which case, libvirt should refuse the access 
to CA).

Libvirt is already perfectly capable of tracking all files that qemu 
might need to access, and whether it is qemu or libvirt that does the 
open() of those files, we can still have libvirt validate whether each 
request for a file makes sense given the context of all previous files 
in use from the time the qemu command line was invoked and across all 
monitor commands in the meantime.

On non-NFS solutions, where every file can have a SELinux label, then 
the security is then present by merely having libvirt relabel all such 
files to a unique label for that particular qemu process, and SELinux 
merely enforces that qemu cannot open() anything but what libvirt has 
already labeled.  And since libvirt already knows which files to label 
in the non-NFS scenario, it already knows which fds to pass in the NFS 
scenario, at which point the ability to prevent qemu from open()ing an 
NFS file is a security enhancement.

Your question about qemu wanting to use CA is thus answered 
independently of whether the fd management solution is solved by libvirt 
handing an fd for CA to qemu prior to any monitor command where qemu 
will then need to use CA, or whether qemu is taught to asynchronously 
ask libvirt to open an fd for CA on qemu's behalf.  The answer is that 
libvirt already tracks whether qemu should access CA, and just needs a 
way to enforce that knowledge.  The enforcement already exists for 
non-NFS via SELinux labels, and the proposal to add fd handling will 
expand that enforcement to also cover NFS.

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-20 14:34                     ` Anthony Liguori
@ 2011-07-20 18:34                       ` Cleber Rosa
  0 siblings, 0 replies; 55+ messages in thread
From: Cleber Rosa @ 2011-07-20 18:34 UTC (permalink / raw)
  To: qemu-devel

On 07/20/2011 10:34 AM, Anthony Liguori wrote:
> On 07/20/2011 08:50 AM, Cleber Rosa wrote:
>> Just as a reminder: with DAC, if a guest is compromised and somehow
>> escalates to QEMU, it could disable its isolation (ie, by setting their
>> own image files world readable). I guess we shouldn't try to fix the DAC
>> model, but fix what's preventing us from fully using MAC, even though
>> it's outside of QEMU.
>
> I don't see how a guest making its data world readable is a 
> fundamental problem.

Well, if we're discussing security models and how to provide the best 
isolation we can to VMs/QEMU instances, then a VM being able to read (or 
even write) data of another VM *is* a fundamental problem. "setting 
their own imagine files world readable" is just one example of how that 
could be accomplished.

>
> DAC is a fundamental part of the Unix design and is something that 
> administrators understand very well.

That's is a true sentence, but it does not make DAC the most appropriate 
solution here.

>   I completely understand the value of MAC but to argue that we 
> shouldn't present DAC as an option I think is fundamentally wrong.

I never said, and really don't think we shouldn't provide other security 
options/models, this is actually part of the well accepted "security in 
multiple layers" strategy.

I did assume, though, we were aiming for the best isolation level, and 
that is definitely MAC. DAC may indeed be good enough for some, but 
definitely not good enough for many others.

CR.

>
> Regards,
>
> Anthony Liguori
>
>>
>> CR.
>>
>>>
>>> Regards,
>>>
>>> Anthony Liguori
>>>
>>
>>
>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-20 17:47                             ` Eric Blake
@ 2011-07-20 19:51                               ` Blue Swirl
       [not found]                                 ` <4E27DE5D.5050502@redhat.com>
  0 siblings, 1 reply; 55+ messages in thread
From: Blue Swirl @ 2011-07-20 19:51 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Stefan Hajnoczi, libvir-list, Jes Sorensen,
	QEMU Developers, Stefan Hajnoczi

On Wed, Jul 20, 2011 at 8:47 PM, Eric Blake <eblake@redhat.com> wrote:
> On 07/20/2011 11:27 AM, Blue Swirl wrote:
>>>
>>> We've already told you - qemu must have a way to be passed fds which are
>>> associated with names, and when a file refers to another backing file by
>>> name, then qemu falls back on its fd/name mapping to use the
>>> already-passed
>>> fd instead.  Which implies that someone else, either libvirt or a
>>> qemu-maintained libblockformat.so, needs to have a stable interface for
>>> parsing the backing file name out of an arbitrary qcow2 file, and that
>>> this
>>> interface must work no matter how many other extensions are added to
>>> qcow2.
>>
>> I'd avoid any name based access in this case. If QEMU has write access
>> to main file, it could forge the backing file name in main file to
>> point to for example /etc/shadow and then request libvirt to perform
>> the opening.
>
> Won't work.  Well, it might work within the context of a single qemu
> process.  But when that process ends, then libvirt would have to touch up
> the qcow2 headers of that file to replace the /etc/shadow name with the real
> backing file name, otherwise, the next time you restart qemu-img or a new
> qemu guest using the same image, the information has been lost, since the fd
> has been closed in the meantime.

How would libvirt know to do this touch up?

> We really _do_ need a way to give qemu both an fd and the name of the file
> that the fd is tied to.  On Linux, qemu could use /proc/self/fd to
> reconstruct the name from fd, but that's not portable to other OS.  And
> we've already discussed how in the libvirt model, that libvirt is deemed
> more secure than qemu.  Therefore, I think it is reasonable for qemu to make
> the assumptions that if it exposes a monitor command where the supervisor
> (libvirt or otherwise) can pass in both an fd and a file name, that either
> the supervisor is passing in correct information, or that the bug is in the
> supervisor and not in qemu if the supervisor passes in wrong information and
> things blow up.

Yes, I'm not worried about QEMU getting confused by libvirt.

> And the snapshot_blkdev monitor command is a case where qemu needs to create
> a new qcow2 image on the fly, while referencing the name of an existing
> file.  What backing name do you put in the new qcow2 file unless you already
> have a name association for all fds already open for the existing backing
> file?

For backing file with original name of "/path/in/storage", QEMU could
present the fd and the backin path by requesting something like
"fd:12,/path/in/storage". The next file in chain "/path2/file" would
be "fd:12,fd:34,/path2/file". Or if possible, -fd 12 -backing
/path/in/storage with spaces and funny characters escaped etc.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-20 18:17                                 ` Eric Blake
@ 2011-07-20 20:01                                   ` Blue Swirl
  2011-07-20 20:10                                     ` Eric Blake
  0 siblings, 1 reply; 55+ messages in thread
From: Blue Swirl @ 2011-07-20 20:01 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Stefan Hajnoczi, libvir-list, Jes Sorensen,
	QEMU Developers, Stefan Hajnoczi

On Wed, Jul 20, 2011 at 9:17 PM, Eric Blake <eblake@redhat.com> wrote:
> On 07/20/2011 12:00 PM, Blue Swirl wrote:
>>>>
>>>> Let's have files A, B, C etc. with backing files AA etc. How would
>>>> libvirt know that when QEMU wants to write to file CA, this is because
>>>> it's needed to access C, or is it just trickery by a devious guest to
>>>> corrupt storage?
>>>
>>> The fix for CVE-2010-2238 already deals with this: if primary image C
>>> refers
>>> to backing file CA of raw format, but does not state what file format CA
>>> contains, then a malicious guest can modify the contents of CA to appear
>>> to
>>> be yet another qcow2 image.  At which point, if libvirt follows the
>>> backing
>>> file specified in CA, then yes, the malicious guest really can cause
>>> libvirt
>>> to expose arbitrary file CB for manipulation by the guest.  But that
>>> security hole was already plugged - by default, libvirt refuses to probe
>>> backing files parsed from qcow2 headers for file format, but instead
>>> requires the outer qcow2 header to also include the a file format
>>> designation for the backing file.  At which point, you then have a safe
>>> chain: if C refers to CA, then libvirt knows that both C and CA are
>>> essential to the storage presented by giving qemu the file name C, and
>>> the
>>> guest will already be modifying CA, but there is no storage corruption
>>> involved.
>>
>> But what if CA is accessed even if C is not? For example, QEMU opens C
>> (to determine CA and write new information about the path), closes it
>> and then requests CA?
>
> Why is qemu trying to access CA?
>
> Either because CA was mentioned as a backing file for C (in which case
> libvirt already knows about it, because either libvirt handed C to qemu at
> startup time after already parsing C's headers to learn that CA is a backing
> file, or because libvirt called the snapshot_blkdev command with the intent
> of having qemu populate CA with C as its backing file), or because qemu has
> a bug (in which case, libvirt should refuse the access to CA).

So no new backing files can be introduced by QEMU after it has started
without libvirt knowing it?

> Libvirt is already perfectly capable of tracking all files that qemu might
> need to access, and whether it is qemu or libvirt that does the open() of
> those files, we can still have libvirt validate whether each request for a
> file makes sense given the context of all previous files in use from the
> time the qemu command line was invoked and across all monitor commands in
> the meantime.
>
> On non-NFS solutions, where every file can have a SELinux label, then the
> security is then present by merely having libvirt relabel all such files to
> a unique label for that particular qemu process, and SELinux merely enforces
> that qemu cannot open() anything but what libvirt has already labeled.  And
> since libvirt already knows which files to label in the non-NFS scenario, it
> already knows which fds to pass in the NFS scenario, at which point the
> ability to prevent qemu from open()ing an NFS file is a security
> enhancement.
>
> Your question about qemu wanting to use CA is thus answered independently of
> whether the fd management solution is solved by libvirt handing an fd for CA
> to qemu prior to any monitor command where qemu will then need to use CA, or
> whether qemu is taught to asynchronously ask libvirt to open an fd for CA on
> qemu's behalf.  The answer is that libvirt already tracks whether qemu
> should access CA, and just needs a way to enforce that knowledge.  The
> enforcement already exists for non-NFS via SELinux labels, and the proposal
> to add fd handling will expand that enforcement to also cover NFS.

OK. I think fds would be useful internally in a privilege separation
mode in plain QEMU too.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-20 20:01                                   ` Blue Swirl
@ 2011-07-20 20:10                                     ` Eric Blake
  0 siblings, 0 replies; 55+ messages in thread
From: Eric Blake @ 2011-07-20 20:10 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Kevin Wolf, Stefan Hajnoczi, libvir-list, Jes Sorensen,
	QEMU Developers, Stefan Hajnoczi

On 07/20/2011 02:01 PM, Blue Swirl wrote:
>> Either because CA was mentioned as a backing file for C (in which case
>> libvirt already knows about it, because either libvirt handed C to qemu at
>> startup time after already parsing C's headers to learn that CA is a backing
>> file, or because libvirt called the snapshot_blkdev command with the intent
>> of having qemu populate CA with C as its backing file), or because qemu has
>> a bug (in which case, libvirt should refuse the access to CA).
>
> So no new backing files can be introduced by QEMU after it has started
> without libvirt knowing it?

No, you missed my point.  A new backing file can only be introduced by 
qemu after it has started by libvirt using a finite set of monitor 
commands.  These include disk hotplug (libvirt adds to the list of files 
known to be accessed by qemu, by reading the image headers of the new 
disk to be hot-plugged prior to issuing the monitor command), by disk 
hot-unplug (libvirt revokes the access to the files making up that disk, 
which it remembers from before the disk was added), and snapshot_blkdev 
(libvirt is explicitly requesting a new qcow2 file with the old file as 
the backing image, so it knows the new relationship of files to be 
accessed by that block device).  Since libvirt issued the monitor 
commands, libvirt always knows which files qemu should be trying to 
access when servicing those block devices to the guest.

>
> OK. I think fds would be useful internally in a privilege separation
> mode in plain QEMU too.

Yes, there's more than one reason to add fd support to all possible 
situations where qemu is currently resorting to open().

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [libvirt]  live snapshot wiki updated
       [not found]                                 ` <4E283554.4080903@redhat.com>
@ 2011-07-21 14:51                                   ` Eric Blake
  0 siblings, 0 replies; 55+ messages in thread
From: Eric Blake @ 2011-07-21 14:51 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: libvir-list, QEMU Developers, Stefan Hajnoczi

On 07/21/2011 08:19 AM, Jes Sorensen wrote:
> There is a difference between upgrading to pick up additional features
> and forced upgrade.
>
> It is not just a question of libvirt possibly reviewing a file format,
> it is also having two codebases that needs to be implemented and maintained.

Then give us a libblockid.so that provides a stable interface, and which 
is also used by qemu to manage QED, qcow3, ..., so that libvirt can rely 
on the stable library interface rather than having to learn new file 
formats all the time.

But yes, we already had to upgrade libvirt code to parse qed formats, 
and anyone wanting to use qed via libvirt has to do a lockstep upgrade 
of both qemu and libvirt, because no one has yet written a nice shared 
library that can do it on our behalf through a stable API.

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
       [not found]                           ` <4E28317D.9020502@redhat.com>
@ 2011-07-21 15:01                             ` Stefan Hajnoczi
  2011-07-21 19:42                               ` Blue Swirl
  2011-07-22  7:22                               ` Kevin Wolf
  0 siblings, 2 replies; 55+ messages in thread
From: Stefan Hajnoczi @ 2011-07-21 15:01 UTC (permalink / raw)
  To: Eric Blake; +Cc: libvir-list, Jes Sorensen, QEMU Developers, Stefan Hajnoczi

On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake <eblake@redhat.com> wrote:
> Thank you for persisting - you've found another hole that needs to be
> plugged.  It sounds like you are proposing that after a qemu process dies,
> that libvirt re-reads the qcow2 metadata headers, and validates that the
> backing file information has not changed in a manner unexpected by libvirt.
>  If it has, then the qemu process that just died was compromised to the
> point that restarting a new qemu process from the old image is now a
> security risk.  So this is _yet another_ security aspect that needs to be
> coded into libvirt as part of hardening sVirt.

The backing file information changes when image streaming completes.

Before: fedora.img <- my_vm.qed
After: my_vm.qed (fedora.img is no longer referenced)

The image streaming operation copies data out of fedora.img and
populates my_vm.qed.  When image streaming completes, the backing file
is no longer needed and my_vm.qed is updated to drop the backing file.

I think we need to design carefully to prevent QEMU and libvirt making
incorrect assumptions about who does what.  I really wish that all
this image file business was outside QEMU and libvirt - that we had a
separate storage management service which handled the details.  QEMU
would only do block device operations (no image format manipulation),
and libvirt would only delegate to the storage management service.
Today we seem to be sprinkling a little bit of storage management into
QEMU and a little bit into libvirt :(.

In that spirit it is much nicer to think of storage like a SAN
appliance where you have LUNs that you access as block devices.  It
also provides an API for snapshotting, cloning LUNs, etc.

Let's move to that model instead of worrying about how to spread
storage logic across QEMU and libvirt.

Stefan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-19 16:47                 ` Daniel P. Berrange
  2011-07-20  8:26                   ` Jes Sorensen
  2011-07-20 14:35                   ` Anthony Liguori
@ 2011-07-21 18:56                   ` Michael Roth
  2 siblings, 0 replies; 55+ messages in thread
From: Michael Roth @ 2011-07-21 18:56 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Stefan Hajnoczi, libvir-list, Jes Sorensen, QEMU Developers,
	Stefan Hajnoczi, Eric Blake

On 07/19/2011 11:47 AM, Daniel P. Berrange wrote:
> On Tue, Jul 19, 2011 at 04:30:19PM +0200, Jes Sorensen wrote:
>> On 07/19/11 16:24, Eric Blake wrote:
>>> [adding the libvir-list]
>>> On 07/19/2011 08:09 AM, Jes Sorensen wrote:
>>>> Urgh, libvirt parsing image files is really unfortunate, it really
>>>> doesn't give me warm fuzzy feelings :( libvirt really should not know
>>>> about internals of image formats.
>>>
>>> But even if you add new features to qemu to avoid needing this in the
>>> future, it doesn't change the past - libvirt will always have to know
>>> how to parse image files understood by older qemu, and so as long as
>>> libvirt already knows how to do that parsing, we might as well take
>>> advantage of it.
>>
>> What has been done here in the past is plain wrong. Continuing to do it
>> isn't the right thing to do here.
>>
>>> Besides, I feel that having a well-documented file format, so that
>>> independent applications can both parse the same file with the same
>>> semantics by obeying the file format specification, is a good design goal.
>>
>> We all know that documentation is rarely uptodate, new features may not
>> get added and libvirt will never be able to keep up. The driver for a
>> file format belongs in QEMU and nowhere else.
>
> This would be possible if QEMU to provide a libblockformat.so library
> which allowed apps to extract metadata from file formats using a stable
> API.

How wrong would it be to call out to qemu-img to handle this instead? 
Seems like a more stable interface (assuming the output of `qemu-img 
info` is treated as an API of sorts, or perhaps some other output mode 
is added to qemu-img that is considered stable).

>
> Daniel

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
       [not found]                             ` <4E27E280.2060306@redhat.com>
@ 2011-07-21 19:01                               ` Blue Swirl
  0 siblings, 0 replies; 55+ messages in thread
From: Blue Swirl @ 2011-07-21 19:01 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Stefan Hajnoczi, libvir-list, Jes Sorensen, QEMU Developers,
	Stefan Hajnoczi, Eric Blake

On Thu, Jul 21, 2011 at 11:25 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 20.07.2011 19:20, schrieb Blue Swirl:
>> On Wed, Jul 20, 2011 at 4:51 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> Am 20.07.2011 15:25, schrieb Jes Sorensen:
>>>> On 07/20/11 12:01, Kevin Wolf wrote:
>>>>>>> Right, we're stuck with the two horros of NFS and selinux, so we need
>>>>>>> something that gets around the problem. In a sane world we would simply
>>>>>>> say 'no NFS, no selinux', but as you say that will never happen.
>>>>>>>
>>>>>>> My suggestion of a callback mechanism where libvirt registers the
>>>>>>> callback with QEMU for open() calls, allowing libvirt to perform the
>>>>>>> open and return the open file descriptor would get around this problem.
>>>>> To me this sounds more like a problem than a solution. It basically
>>>>> means that during an open (which may even be initiated by a monitor
>>>>> command), you need monitor interaction. It basically means that open
>>>>> becomes asynchronous, and requires clients to deal with that, which
>>>>> sounds at least "interesting"... Also you have to add some magic to all
>>>>> places opening something.
>>>>>
>>>>> I think if libvirt wants qemu to use an fd instead of a file name, it
>>>>> shouldn't pass a file name but an fd in the first place. Which means
>>>>> that the two that we need are support for an fd: protocol (patches on
>>>>> the list, need review), and a way for libvirt to override the backing
>>>>> file of an image.
>>>>
>>>> The problem is that QEMU will find backing file file names inside the
>>>> images which it will be unable to open. How do you suggest we get around
>>>> that?
>>>
>>> This is the part with allowing libvirt to override the backing file. Of
>>> course, this is not something that we can add with five lines of code,
>>> it requires -blockdev.
>>
>> There could still be some issues:
>> Let's have files A, B, C etc. with backing files AA etc. How would
>> libvirt know that when QEMU wants to write to file CA, this is because
>> it's needed to access C, or is it just trickery by a devious guest to
>> corrupt storage?
>>
>> This could be handled so that instead of naming the backing file, QEMU
>> asks for a descriptor for the backing file by presenting the
>> descriptor to main file C,
>
> qemu shouldn't ask for anything. libvirt shouldn't give it a filename in
> the first place. It should do something like this:
>
> { "execute": "blockdev_add", "arguments"= {
>  "driver": "fd," "fd": "4", "backing-file": {
>    "driver": "fd," "fd": "5"
>  }
> }}
>
> And then qemu doesn't even have a reason to know that there is something
> called CA.

Yes, that's better.

>> but I think the real solution is that
>> libvirt should handle the storage formats completely and it should
>> present QEMU with only a raw file like interface (read/write/seek) for
>> the data. Then any backing files would be handled within libvirt.
>> Performance could suffer, though.
>
> I like your humour. :-)

Well, for some applications, security is more important than
performance or convenience.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
       [not found]                                 ` <4E27DE5D.5050502@redhat.com>
@ 2011-07-21 19:34                                   ` Blue Swirl
  0 siblings, 0 replies; 55+ messages in thread
From: Blue Swirl @ 2011-07-21 19:34 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Kevin Wolf, Stefan Hajnoczi, libvir-list, Stefan Hajnoczi,
	QEMU Developers, Eric Blake

On Thu, Jul 21, 2011 at 11:07 AM, Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
> On 07/20/11 21:51, Blue Swirl wrote:
>>> And the snapshot_blkdev monitor command is a case where qemu needs to create
>>> > a new qcow2 image on the fly, while referencing the name of an existing
>>> > file.  What backing name do you put in the new qcow2 file unless you already
>>> > have a name association for all fds already open for the existing backing
>>> > file?
>> For backing file with original name of "/path/in/storage", QEMU could
>> present the fd and the backin path by requesting something like
>> "fd:12,/path/in/storage". The next file in chain "/path2/file" would
>> be "fd:12,fd:34,/path2/file". Or if possible, -fd 12 -backing
>> /path/in/storage with spaces and funny characters escaped etc.
>
> Rather than trying to do this by mangling files on the disk, I would
> suggest we allow registering a call-back open function, which calls back
> into libvirt and requests it to open a given file. It can then do all
> it's security foo to decide whether or not to allow the file to be open.

Just to clarify: I was not proposing any mangling of the files.

> This is relatively clean and avoids the mess of relying on outside
> processes messing about in the images.
>
> Cheers,
> Jes
>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-21 15:01                             ` [Qemu-devel] " Stefan Hajnoczi
@ 2011-07-21 19:42                               ` Blue Swirl
  2011-07-22  5:06                                 ` Stefan Hajnoczi
  2011-07-22  7:22                               ` Kevin Wolf
  1 sibling, 1 reply; 55+ messages in thread
From: Blue Swirl @ 2011-07-21 19:42 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: libvir-list, Jes Sorensen, Eric Blake, QEMU Developers, Stefan Hajnoczi

On Thu, Jul 21, 2011 at 6:01 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake <eblake@redhat.com> wrote:
>> Thank you for persisting - you've found another hole that needs to be
>> plugged.  It sounds like you are proposing that after a qemu process dies,
>> that libvirt re-reads the qcow2 metadata headers, and validates that the
>> backing file information has not changed in a manner unexpected by libvirt.
>>  If it has, then the qemu process that just died was compromised to the
>> point that restarting a new qemu process from the old image is now a
>> security risk.  So this is _yet another_ security aspect that needs to be
>> coded into libvirt as part of hardening sVirt.
>
> The backing file information changes when image streaming completes.
>
> Before: fedora.img <- my_vm.qed
> After: my_vm.qed (fedora.img is no longer referenced)
>
> The image streaming operation copies data out of fedora.img and
> populates my_vm.qed.  When image streaming completes, the backing file
> is no longer needed and my_vm.qed is updated to drop the backing file.
>
> I think we need to design carefully to prevent QEMU and libvirt making
> incorrect assumptions about who does what.  I really wish that all
> this image file business was outside QEMU and libvirt - that we had a
> separate storage management service which handled the details.  QEMU
> would only do block device operations (no image format manipulation),
> and libvirt would only delegate to the storage management service.
> Today we seem to be sprinkling a little bit of storage management into
> QEMU and a little bit into libvirt :(.
>
> In that spirit it is much nicer to think of storage like a SAN
> appliance where you have LUNs that you access as block devices.  It
> also provides an API for snapshotting, cloning LUNs, etc.
>
> Let's move to that model instead of worrying about how to spread
> storage logic across QEMU and libvirt.

Would NBD protocol fit to this purpose, or is it too simple? Then
libvirt would handle the storage format completely and present an NBD
interface to QEMU (or give an fd to an external service) and QEMU
would not care about the storage format in this mode at all.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-21 19:42                               ` Blue Swirl
@ 2011-07-22  5:06                                 ` Stefan Hajnoczi
  2011-07-22 15:49                                   ` Blue Swirl
  0 siblings, 1 reply; 55+ messages in thread
From: Stefan Hajnoczi @ 2011-07-22  5:06 UTC (permalink / raw)
  To: Blue Swirl
  Cc: libvir-list, Jes Sorensen, Eric Blake, QEMU Developers, Stefan Hajnoczi

On Thu, Jul 21, 2011 at 8:42 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Thu, Jul 21, 2011 at 6:01 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake <eblake@redhat.com> wrote:
>>> Thank you for persisting - you've found another hole that needs to be
>>> plugged.  It sounds like you are proposing that after a qemu process dies,
>>> that libvirt re-reads the qcow2 metadata headers, and validates that the
>>> backing file information has not changed in a manner unexpected by libvirt.
>>>  If it has, then the qemu process that just died was compromised to the
>>> point that restarting a new qemu process from the old image is now a
>>> security risk.  So this is _yet another_ security aspect that needs to be
>>> coded into libvirt as part of hardening sVirt.
>>
>> The backing file information changes when image streaming completes.
>>
>> Before: fedora.img <- my_vm.qed
>> After: my_vm.qed (fedora.img is no longer referenced)
>>
>> The image streaming operation copies data out of fedora.img and
>> populates my_vm.qed.  When image streaming completes, the backing file
>> is no longer needed and my_vm.qed is updated to drop the backing file.
>>
>> I think we need to design carefully to prevent QEMU and libvirt making
>> incorrect assumptions about who does what.  I really wish that all
>> this image file business was outside QEMU and libvirt - that we had a
>> separate storage management service which handled the details.  QEMU
>> would only do block device operations (no image format manipulation),
>> and libvirt would only delegate to the storage management service.
>> Today we seem to be sprinkling a little bit of storage management into
>> QEMU and a little bit into libvirt :(.
>>
>> In that spirit it is much nicer to think of storage like a SAN
>> appliance where you have LUNs that you access as block devices.  It
>> also provides an API for snapshotting, cloning LUNs, etc.
>>
>> Let's move to that model instead of worrying about how to spread
>> storage logic across QEMU and libvirt.
>
> Would NBD protocol fit to this purpose, or is it too simple? Then
> libvirt would handle the storage format completely and present an NBD
> interface to QEMU (or give an fd to an external service) and QEMU
> would not care about the storage format in this mode at all.

NBD does not support flush (fdatasync).  Therefore it only supports
the slow cache=writethrough mode in a safe manner.

It would be neat to use virtio-blk as the interface because it can be
passed through to the guest.  The guest talks directly to the storage
management service without going through QEMU.  The trick is to do
something like vhost:
1. An ioeventfd for virtqueue (guest->host) kicks
2. An irqfd for host->guest kicks
3. Shared memory for vring and zero-copy data access

The storage management service provides a UNIX domain socket over
which fds can be passed to set up the vhost-like virtio-blk interface.

Moving the image format code into a separate program makes it possible
to safely write to a backing file while VMs are using it because the
storage service can be host-wide, not per-VM.  For example, streaming
a shared backing file over NFS while running VMs using copy-on-write
images.  If we ever want to do deduplication or other global
operations, then this approach is nice too.

To summarize:
The storage service manages image files including creation, deletion,
snapshotting, and actual I/O.  QEMU uses a vhost-like virtio-blk
interface and can pass it directly into the guest.  libvirt uses the
storage service API without needing to parse image files or keep track
of backing file relationships.

Stefan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-21 15:01                             ` [Qemu-devel] " Stefan Hajnoczi
  2011-07-21 19:42                               ` Blue Swirl
@ 2011-07-22  7:22                               ` Kevin Wolf
  2011-07-22  9:11                                 ` Stefan Hajnoczi
  1 sibling, 1 reply; 55+ messages in thread
From: Kevin Wolf @ 2011-07-22  7:22 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: libvir-list, Jes Sorensen, Eric Blake, QEMU Developers, Stefan Hajnoczi

Am 21.07.2011 17:01, schrieb Stefan Hajnoczi:
> On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake <eblake@redhat.com> wrote:
>> Thank you for persisting - you've found another hole that needs to be
>> plugged.  It sounds like you are proposing that after a qemu process dies,
>> that libvirt re-reads the qcow2 metadata headers, and validates that the
>> backing file information has not changed in a manner unexpected by libvirt.
>>  If it has, then the qemu process that just died was compromised to the
>> point that restarting a new qemu process from the old image is now a
>> security risk.  So this is _yet another_ security aspect that needs to be
>> coded into libvirt as part of hardening sVirt.
> 
> The backing file information changes when image streaming completes.
> 
> Before: fedora.img <- my_vm.qed
> After: my_vm.qed (fedora.img is no longer referenced)
> 
> The image streaming operation copies data out of fedora.img and
> populates my_vm.qed.  When image streaming completes, the backing file
> is no longer needed and my_vm.qed is updated to drop the backing file.
> 
> I think we need to design carefully to prevent QEMU and libvirt making
> incorrect assumptions about who does what.  I really wish that all
> this image file business was outside QEMU and libvirt - that we had a
> separate storage management service which handled the details.  QEMU
> would only do block device operations (no image format manipulation),
> and libvirt would only delegate to the storage management service.

And how do you implement that in a way that works on all platforms, and
without root privileges? I can't see this happen unless it stays
completely optional.

Kevin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-20 13:51                         ` Kevin Wolf
  2011-07-20 17:20                           ` Blue Swirl
@ 2011-07-22  7:36                           ` Avi Kivity
  2011-07-22  8:11                             ` Kevin Wolf
  1 sibling, 1 reply; 55+ messages in thread
From: Avi Kivity @ 2011-07-22  7:36 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Stefan Hajnoczi, libvir-list, Jes Sorensen, QEMU Developers,
	Stefan Hajnoczi, Eric Blake

On 07/20/2011 04:51 PM, Kevin Wolf wrote:
> >
> >  The problem is that QEMU will find backing file file names inside the
> >  images which it will be unable to open. How do you suggest we get around
> >  that?
>
> This is the part with allowing libvirt to override the backing file. Of
> course, this is not something that we can add with five lines of code,
> it requires -blockdev.

It can be done without blockdev.  Have a dictionary that translates 
filenames, and populate it from the command line (for a bonus, translate 
a filename to a file descriptor inherited from the caller or passed via 
the monitor).

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-22  7:36                           ` Avi Kivity
@ 2011-07-22  8:11                             ` Kevin Wolf
  2011-07-22 16:09                               ` Blue Swirl
  0 siblings, 1 reply; 55+ messages in thread
From: Kevin Wolf @ 2011-07-22  8:11 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Stefan Hajnoczi, libvir-list, Jes Sorensen, QEMU Developers,
	Stefan Hajnoczi, Eric Blake

Am 22.07.2011 09:36, schrieb Avi Kivity:
> On 07/20/2011 04:51 PM, Kevin Wolf wrote:
>>>
>>>  The problem is that QEMU will find backing file file names inside the
>>>  images which it will be unable to open. How do you suggest we get around
>>>  that?
>>
>> This is the part with allowing libvirt to override the backing file. Of
>> course, this is not something that we can add with five lines of code,
>> it requires -blockdev.
> 
> It can be done without blockdev.  Have a dictionary that translates 
> filenames, and populate it from the command line (for a bonus, translate 
> a filename to a file descriptor inherited from the caller or passed via 
> the monitor).

Sure, you can always add ugly hacks, but it isn't the right solution
that we want to use for all times. However, once we use it, it will show
up in the external API and we'll never get rid of it again.

Kevin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-22  7:22                               ` Kevin Wolf
@ 2011-07-22  9:11                                 ` Stefan Hajnoczi
  2011-07-22 16:05                                   ` Blue Swirl
  0 siblings, 1 reply; 55+ messages in thread
From: Stefan Hajnoczi @ 2011-07-22  9:11 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: libvir-list, Jes Sorensen, Eric Blake, QEMU Developers, Stefan Hajnoczi

On Fri, Jul 22, 2011 at 8:22 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 21.07.2011 17:01, schrieb Stefan Hajnoczi:
>> On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake <eblake@redhat.com> wrote:
>>> Thank you for persisting - you've found another hole that needs to be
>>> plugged.  It sounds like you are proposing that after a qemu process dies,
>>> that libvirt re-reads the qcow2 metadata headers, and validates that the
>>> backing file information has not changed in a manner unexpected by libvirt.
>>>  If it has, then the qemu process that just died was compromised to the
>>> point that restarting a new qemu process from the old image is now a
>>> security risk.  So this is _yet another_ security aspect that needs to be
>>> coded into libvirt as part of hardening sVirt.
>>
>> The backing file information changes when image streaming completes.
>>
>> Before: fedora.img <- my_vm.qed
>> After: my_vm.qed (fedora.img is no longer referenced)
>>
>> The image streaming operation copies data out of fedora.img and
>> populates my_vm.qed.  When image streaming completes, the backing file
>> is no longer needed and my_vm.qed is updated to drop the backing file.
>>
>> I think we need to design carefully to prevent QEMU and libvirt making
>> incorrect assumptions about who does what.  I really wish that all
>> this image file business was outside QEMU and libvirt - that we had a
>> separate storage management service which handled the details.  QEMU
>> would only do block device operations (no image format manipulation),
>> and libvirt would only delegate to the storage management service.
>
> And how do you implement that in a way that works on all platforms, and
> without root privileges? I can't see this happen unless it stays
> completely optional.

The cross-platform way would be an iSCSI target that understands image
formats.  But iSCSI requires copying when doing I/O and we can't pass
through virtio-blk.

I'm not sure I see the root privilege issue.

Stefan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-22  5:06                                 ` Stefan Hajnoczi
@ 2011-07-22 15:49                                   ` Blue Swirl
  0 siblings, 0 replies; 55+ messages in thread
From: Blue Swirl @ 2011-07-22 15:49 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: libvir-list, Jes Sorensen, Eric Blake, QEMU Developers, Stefan Hajnoczi

On Fri, Jul 22, 2011 at 8:06 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Thu, Jul 21, 2011 at 8:42 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>> On Thu, Jul 21, 2011 at 6:01 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>> On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake <eblake@redhat.com> wrote:
>>>> Thank you for persisting - you've found another hole that needs to be
>>>> plugged.  It sounds like you are proposing that after a qemu process dies,
>>>> that libvirt re-reads the qcow2 metadata headers, and validates that the
>>>> backing file information has not changed in a manner unexpected by libvirt.
>>>>  If it has, then the qemu process that just died was compromised to the
>>>> point that restarting a new qemu process from the old image is now a
>>>> security risk.  So this is _yet another_ security aspect that needs to be
>>>> coded into libvirt as part of hardening sVirt.
>>>
>>> The backing file information changes when image streaming completes.
>>>
>>> Before: fedora.img <- my_vm.qed
>>> After: my_vm.qed (fedora.img is no longer referenced)
>>>
>>> The image streaming operation copies data out of fedora.img and
>>> populates my_vm.qed.  When image streaming completes, the backing file
>>> is no longer needed and my_vm.qed is updated to drop the backing file.
>>>
>>> I think we need to design carefully to prevent QEMU and libvirt making
>>> incorrect assumptions about who does what.  I really wish that all
>>> this image file business was outside QEMU and libvirt - that we had a
>>> separate storage management service which handled the details.  QEMU
>>> would only do block device operations (no image format manipulation),
>>> and libvirt would only delegate to the storage management service.
>>> Today we seem to be sprinkling a little bit of storage management into
>>> QEMU and a little bit into libvirt :(.
>>>
>>> In that spirit it is much nicer to think of storage like a SAN
>>> appliance where you have LUNs that you access as block devices.  It
>>> also provides an API for snapshotting, cloning LUNs, etc.
>>>
>>> Let's move to that model instead of worrying about how to spread
>>> storage logic across QEMU and libvirt.
>>
>> Would NBD protocol fit to this purpose, or is it too simple? Then
>> libvirt would handle the storage format completely and present an NBD
>> interface to QEMU (or give an fd to an external service) and QEMU
>> would not care about the storage format in this mode at all.
>
> NBD does not support flush (fdatasync).  Therefore it only supports
> the slow cache=writethrough mode in a safe manner.

Maybe NBD could still be used in networked setups as a secondary alternative.

> It would be neat to use virtio-blk as the interface because it can be
> passed through to the guest.  The guest talks directly to the storage
> management service without going through QEMU.  The trick is to do
> something like vhost:
> 1. An ioeventfd for virtqueue (guest->host) kicks
> 2. An irqfd for host->guest kicks
> 3. Shared memory for vring and zero-copy data access
>
> The storage management service provides a UNIX domain socket over
> which fds can be passed to set up the vhost-like virtio-blk interface.
>
> Moving the image format code into a separate program makes it possible
> to safely write to a backing file while VMs are using it because the
> storage service can be host-wide, not per-VM.  For example, streaming
> a shared backing file over NFS while running VMs using copy-on-write
> images.  If we ever want to do deduplication or other global
> operations, then this approach is nice too.
>
> To summarize:
> The storage service manages image files including creation, deletion,
> snapshotting, and actual I/O.  QEMU uses a vhost-like virtio-blk
> interface and can pass it directly into the guest.  libvirt uses the
> storage service API without needing to parse image files or keep track
> of backing file relationships.

Excellent plan. If one day kernel provides builtin virtio-blk services
which can be passed via libvirt and QEMU to the guest, we'll even have
zero copy all the way.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-22  9:11                                 ` Stefan Hajnoczi
@ 2011-07-22 16:05                                   ` Blue Swirl
  0 siblings, 0 replies; 55+ messages in thread
From: Blue Swirl @ 2011-07-22 16:05 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Stefan Hajnoczi, libvir-list, Jes Sorensen,
	QEMU Developers, Eric Blake

On Fri, Jul 22, 2011 at 12:11 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Fri, Jul 22, 2011 at 8:22 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>> Am 21.07.2011 17:01, schrieb Stefan Hajnoczi:
>>> On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake <eblake@redhat.com> wrote:
>>>> Thank you for persisting - you've found another hole that needs to be
>>>> plugged.  It sounds like you are proposing that after a qemu process dies,
>>>> that libvirt re-reads the qcow2 metadata headers, and validates that the
>>>> backing file information has not changed in a manner unexpected by libvirt.
>>>>  If it has, then the qemu process that just died was compromised to the
>>>> point that restarting a new qemu process from the old image is now a
>>>> security risk.  So this is _yet another_ security aspect that needs to be
>>>> coded into libvirt as part of hardening sVirt.
>>>
>>> The backing file information changes when image streaming completes.
>>>
>>> Before: fedora.img <- my_vm.qed
>>> After: my_vm.qed (fedora.img is no longer referenced)
>>>
>>> The image streaming operation copies data out of fedora.img and
>>> populates my_vm.qed.  When image streaming completes, the backing file
>>> is no longer needed and my_vm.qed is updated to drop the backing file.
>>>
>>> I think we need to design carefully to prevent QEMU and libvirt making
>>> incorrect assumptions about who does what.  I really wish that all
>>> this image file business was outside QEMU and libvirt - that we had a
>>> separate storage management service which handled the details.  QEMU
>>> would only do block device operations (no image format manipulation),
>>> and libvirt would only delegate to the storage management service.
>>
>> And how do you implement that in a way that works on all platforms, and
>> without root privileges? I can't see this happen unless it stays
>> completely optional.
>
> The cross-platform way would be an iSCSI target that understands image
> formats.  But iSCSI requires copying when doing I/O and we can't pass
> through virtio-blk.

The guest could use iSCSI directly using the network interface without
virtio-blk. This setup wouldn't give max performance in local use but
it could also be useful in some networked setups and probably more
useful than NBD.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] live snapshot wiki updated
  2011-07-22  8:11                             ` Kevin Wolf
@ 2011-07-22 16:09                               ` Blue Swirl
  0 siblings, 0 replies; 55+ messages in thread
From: Blue Swirl @ 2011-07-22 16:09 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Stefan Hajnoczi, libvir-list, Jes Sorensen, QEMU Developers,
	Avi Kivity, Stefan Hajnoczi, Eric Blake

On Fri, Jul 22, 2011 at 11:11 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 22.07.2011 09:36, schrieb Avi Kivity:
>> On 07/20/2011 04:51 PM, Kevin Wolf wrote:
>>>>
>>>>  The problem is that QEMU will find backing file file names inside the
>>>>  images which it will be unable to open. How do you suggest we get around
>>>>  that?
>>>
>>> This is the part with allowing libvirt to override the backing file. Of
>>> course, this is not something that we can add with five lines of code,
>>> it requires -blockdev.
>>
>> It can be done without blockdev.  Have a dictionary that translates
>> filenames, and populate it from the command line (for a bonus, translate
>> a filename to a file descriptor inherited from the caller or passed via
>> the monitor).
>
> Sure, you can always add ugly hacks, but it isn't the right solution
> that we want to use for all times. However, once we use it, it will show
> up in the external API and we'll never get rid of it again.

Fully agree. This would also be a highly specific API for QCOW2 and
similar formats.

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2011-07-22 16:09 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-15 14:58 [Qemu-devel] live snapshot wiki updated Jes Sorensen
2011-07-18 14:08 ` Stefan Hajnoczi
2011-07-19  7:24   ` Jes Sorensen
2011-07-19 13:23     ` Stefan Hajnoczi
2011-07-19 13:27       ` Jes Sorensen
2011-07-19 13:58         ` Eric Blake
2011-07-19 14:09           ` Jes Sorensen
2011-07-19 14:24             ` Eric Blake
2011-07-19 14:30               ` Jes Sorensen
2011-07-19 15:14                 ` Stefan Hajnoczi
2011-07-19 16:46                   ` Daniel P. Berrange
2011-07-20  7:30                     ` Markus Armbruster
2011-07-20  8:23                     ` Jes Sorensen
2011-07-20  9:36                       ` Daniel P. Berrange
2011-07-20 10:15                         ` [Qemu-devel] [libvirt] " Nicolas Sebrecht
2011-07-20 10:28                           ` Daniel P. Berrange
2011-07-20 11:40                             ` [Qemu-devel] [libvirt] " Stefan Hajnoczi
     [not found]                             ` <4E27E610.7090502@redhat.com>
     [not found]                               ` <4E282DE6.1020603@redhat.com>
     [not found]                                 ` <4E283554.4080903@redhat.com>
2011-07-21 14:51                                   ` Eric Blake
     [not found]                         ` <4E27E5A2.2030208@redhat.com>
     [not found]                           ` <4E28317D.9020502@redhat.com>
2011-07-21 15:01                             ` [Qemu-devel] " Stefan Hajnoczi
2011-07-21 19:42                               ` Blue Swirl
2011-07-22  5:06                                 ` Stefan Hajnoczi
2011-07-22 15:49                                   ` Blue Swirl
2011-07-22  7:22                               ` Kevin Wolf
2011-07-22  9:11                                 ` Stefan Hajnoczi
2011-07-22 16:05                                   ` Blue Swirl
2011-07-20  9:50                     ` Kevin Wolf
2011-07-20 10:18                       ` Daniel P. Berrange
2011-07-19 16:14                 ` Anthony Liguori
2011-07-20  8:25                   ` Jes Sorensen
2011-07-20 10:01                     ` Kevin Wolf
2011-07-20 13:25                       ` Jes Sorensen
2011-07-20 13:46                         ` Eric Blake
2011-07-20 17:27                           ` Blue Swirl
2011-07-20 17:47                             ` Eric Blake
2011-07-20 19:51                               ` Blue Swirl
     [not found]                                 ` <4E27DE5D.5050502@redhat.com>
2011-07-21 19:34                                   ` Blue Swirl
2011-07-20 13:51                         ` Kevin Wolf
2011-07-20 17:20                           ` Blue Swirl
2011-07-20 17:41                             ` Eric Blake
2011-07-20 18:00                               ` Blue Swirl
2011-07-20 18:17                                 ` Eric Blake
2011-07-20 20:01                                   ` Blue Swirl
2011-07-20 20:10                                     ` Eric Blake
     [not found]                             ` <4E27E280.2060306@redhat.com>
2011-07-21 19:01                               ` Blue Swirl
2011-07-22  7:36                           ` Avi Kivity
2011-07-22  8:11                             ` Kevin Wolf
2011-07-22 16:09                               ` Blue Swirl
2011-07-20 13:50                   ` Cleber Rosa
2011-07-20 14:34                     ` Anthony Liguori
2011-07-20 18:34                       ` Cleber Rosa
2011-07-19 16:47                 ` Daniel P. Berrange
2011-07-20  8:26                   ` Jes Sorensen
2011-07-20  9:38                     ` Daniel P. Berrange
2011-07-20 14:35                   ` Anthony Liguori
2011-07-21 18:56                   ` Michael Roth

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.