All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] block-commit & dropping privs
@ 2015-03-27  9:07 Michael Tokarev
  2015-03-27 14:49 ` Eric Blake
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Tokarev @ 2015-03-27  9:07 UTC (permalink / raw)
  To: qemu-devel

Hello.

I tried to experiment with block-commit command, which propagates
changes accumulated in an overlay (qcow2) block image file back to
the base image file.

And immediately faced a problem.  All my VMs are run chrooted into
an empty dir and with low-priv user (using -runsa and -chroot options,
initially started as root).  Ofcourse this low-priv qemu process
can't open the base image anymore, because it doesn't have the
necessary permissions and because the base file is inaccessible
within the chroot.

So I wonder if we can avoid reopening the base img by always opening
it read-write (using a command-line option), does it make sense?

Or maybe there's some other possible solution to this, for example,
passing in a filedescriptor for the new base img over a unix socket?

Thanks,

/mjt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] block-commit & dropping privs
  2015-03-27  9:07 [Qemu-devel] block-commit & dropping privs Michael Tokarev
@ 2015-03-27 14:49 ` Eric Blake
  2015-03-27 15:36   ` Michael Tokarev
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Blake @ 2015-03-27 14:49 UTC (permalink / raw)
  To: Michael Tokarev, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1880 bytes --]

On 03/27/2015 03:07 AM, Michael Tokarev wrote:
> Hello.
> 
> I tried to experiment with block-commit command, which propagates
> changes accumulated in an overlay (qcow2) block image file back to
> the base image file.
> 
> And immediately faced a problem.  All my VMs are run chrooted into
> an empty dir and with low-priv user (using -runsa and -chroot options,
> initially started as root).  Ofcourse this low-priv qemu process
> can't open the base image anymore, because it doesn't have the
> necessary permissions and because the base file is inaccessible
> within the chroot.
> 
> So I wonder if we can avoid reopening the base img by always opening
> it read-write (using a command-line option), does it make sense?

It is already possible to open a file read-write on the command line, by
using -add-fd twice to put both a read-only and a read-write fd handle
to the same file in a single set N, then using -drive options to specify
/dev/fdset/N rather than the file name.  By that argument, I'm not sure
if adding any other command line options is necessary.

> 
> Or maybe there's some other possible solution to this, for example,
> passing in a filedescriptor for the new base img over a unix socket?

Hmm, we do allow QMP to pass in fds over Unix sockets already, but what
we don't have yet is the ability to associate one of those just-passed
fds to an existing open BDS (right now, you can only create a new fdset
as tied to a new BDS, but block-commit can only target an open BDS;
there is no way to pivot which BDS base is associated with another BDS).
 So maybe there is still room for improvement, especially since having a
QMP solution is nicer than requiring foresight to pass in an fdset from
the command line.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] block-commit & dropping privs
  2015-03-27 14:49 ` Eric Blake
@ 2015-03-27 15:36   ` Michael Tokarev
  2015-03-27 17:12     ` Eric Blake
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Tokarev @ 2015-03-27 15:36 UTC (permalink / raw)
  To: Eric Blake, qemu-devel

27.03.2015 17:49, Eric Blake wrote:
> On 03/27/2015 03:07 AM, Michael Tokarev wrote:
>> Hello.
>>
>> I tried to experiment with block-commit command, which propagates
>> changes accumulated in an overlay (qcow2) block image file back to
>> the base image file.
>>
>> And immediately faced a problem.  All my VMs are run chrooted into
>> an empty dir and with low-priv user (using -runsa and -chroot options,
>> initially started as root).  Ofcourse this low-priv qemu process
>> can't open the base image anymore, because it doesn't have the
>> necessary permissions and because the base file is inaccessible
>> within the chroot.
>>
>> So I wonder if we can avoid reopening the base img by always opening
>> it read-write (using a command-line option), does it make sense?
> 
> It is already possible to open a file read-write on the command line, by
> using -add-fd twice to put both a read-only and a read-write fd handle
> to the same file in a single set N, then using -drive options to specify
> /dev/fdset/N rather than the file name.  By that argument, I'm not sure
> if adding any other command line options is necessary.

How does fdSET work?  How to use it?  Will the BDS reopen work with an
fdset in an empty chroot?

Sorry I haven't seen this so far, and documentation is a bit vague.

I think I see how this should work, monitor_fdset_get_fd() will search
an FD with matching access mode flags...  Ok, so two fds in an fdset,
one ro and one rw.  And with that in mind, since qemu_open() checks if
the filename starts with /dev/fdset/, it should work inside a chroot.

Wonder how to specify cache mode, or should I open these with proper
O_DIRECT/O_SYNC/whatever?  It looks like it's possible to change O_DIRECT
at runtime but not O_SYNC.

And the more interesting question is how to do that from shell.

Oh well.

Thanks,

/mjt


>> Or maybe there's some other possible solution to this, for example,
>> passing in a filedescriptor for the new base img over a unix socket?
> 
> Hmm, we do allow QMP to pass in fds over Unix sockets already, but what
> we don't have yet is the ability to associate one of those just-passed
> fds to an existing open BDS (right now, you can only create a new fdset
> as tied to a new BDS, but block-commit can only target an open BDS;
> there is no way to pivot which BDS base is associated with another BDS).
>  So maybe there is still room for improvement, especially since having a
> QMP solution is nicer than requiring foresight to pass in an fdset from
> the command line.
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] block-commit & dropping privs
  2015-03-27 15:36   ` Michael Tokarev
@ 2015-03-27 17:12     ` Eric Blake
  2015-03-30 15:36       ` Kevin Wolf
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Blake @ 2015-03-27 17:12 UTC (permalink / raw)
  To: Michael Tokarev, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2959 bytes --]

On 03/27/2015 09:36 AM, Michael Tokarev wrote:
>> It is already possible to open a file read-write on the command line, by
>> using -add-fd twice to put both a read-only and a read-write fd handle
>> to the same file in a single set N, then using -drive options to specify
>> /dev/fdset/N rather than the file name.  By that argument, I'm not sure
>> if adding any other command line options is necessary.
> 
> How does fdSET work?  How to use it?  Will the BDS reopen work with an
> fdset in an empty chroot?

Basically, you can create an fdset that contains one or more fds.  Any
code in qemu that uses qemu_open() understands the magic pseudo-path of
/dev/fdset/N as redirecting the open to instead find the first fd in
that set that matches the requested permissions.  So if an fdset
contains both a read-only fd and a read-write fd, the set will dup() the
appropriate fd back to the caller of qemu_open.  Opening a drive is one
of the places already wired up to use qemu_open. fdset manipulations can
be done on both the initial command line (-add-fd along with open file
descriptor inheritance) and in QMP (add-fd over the Unix socket with
SCM_RIGHTS).  As the fdset has access to already-open fds, it can access
data even when open() will not succeed (such as in an empty chroot; but
ALSO in the case of NVSv3 which lacks persistent SELinux labeling to
affect open() but has no problem with per-fd labelling, and thus where
fdset is supposed to allow out-of-the-box sandboxing without the current
hack of 'setsebool virt_use_nfs off').

Note that fdsets have been in the code base for a couple of years now,
but that most of it is still on the theory side (it SHOULD work) and not
the practical side (libvirt isn't using them yet, although I want to
eventually get there), so it will be nice if you can post actual working
scenarios if you get it working.

> 
> Sorry I haven't seen this so far, and documentation is a bit vague.

Yeah, that's certainly the case (and patches are of course welcome).

> 
> I think I see how this should work, monitor_fdset_get_fd() will search
> an FD with matching access mode flags...  Ok, so two fds in an fdset,
> one ro and one rw.  And with that in mind, since qemu_open() checks if
> the filename starts with /dev/fdset/, it should work inside a chroot.

Yep.

> 
> Wonder how to specify cache mode, or should I open these with proper
> O_DIRECT/O_SYNC/whatever?  It looks like it's possible to change O_DIRECT
> at runtime but not O_SYNC.
> 
> And the more interesting question is how to do that from shell.

Redirections only get you so far in shell; you may need a wrapper C
program go get O_DIRECT and/or O_SYNC pre-set.  Then again, if you use
QMP and pass over the Unix socket, you need a C program anyways.

> 
> Oh well.

Good luck!

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] block-commit & dropping privs
  2015-03-27 17:12     ` Eric Blake
@ 2015-03-30 15:36       ` Kevin Wolf
  2015-04-01  9:26         ` Michael Tokarev
  0 siblings, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2015-03-30 15:36 UTC (permalink / raw)
  To: Eric Blake; +Cc: Michael Tokarev, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 983 bytes --]

Am 27.03.2015 um 18:12 hat Eric Blake geschrieben:
> On 03/27/2015 09:36 AM, Michael Tokarev wrote:
> > Wonder how to specify cache mode, or should I open these with proper
> > O_DIRECT/O_SYNC/whatever?  It looks like it's possible to change O_DIRECT
> > at runtime but not O_SYNC.
> > 
> > And the more interesting question is how to do that from shell.
> 
> Redirections only get you so far in shell; you may need a wrapper C
> program go get O_DIRECT and/or O_SYNC pre-set.  Then again, if you use
> QMP and pass over the Unix socket, you need a C program anyways.

O_DIRECT can be set with fcntl(), so qemu takes care of that. O_SYNC
is completely unused on Linux these days, so that shouldn't be a problem
either. (Other platforms use it as a misguided attempt of approximating
O_DIRECT. We should really error out instead.)

So if I'm not mistaken, just having one read-only and one read-write fd
should be enough for any configuration in practice.

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] block-commit & dropping privs
  2015-03-30 15:36       ` Kevin Wolf
@ 2015-04-01  9:26         ` Michael Tokarev
  2015-04-01  9:54           ` Michael Tokarev
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Tokarev @ 2015-04-01  9:26 UTC (permalink / raw)
  To: Kevin Wolf, Eric Blake; +Cc: qemu-devel, qemu-block

30.03.2015 18:36, Kevin Wolf wrote:
> Am 27.03.2015 um 18:12 hat Eric Blake geschrieben:
>> On 03/27/2015 09:36 AM, Michael Tokarev wrote:
>>> Wonder how to specify cache mode, or should I open these with proper
>>> O_DIRECT/O_SYNC/whatever?  It looks like it's possible to change O_DIRECT
>>> at runtime but not O_SYNC.
>>>
>>> And the more interesting question is how to do that from shell.
>>
>> Redirections only get you so far in shell; you may need a wrapper C
>> program go get O_DIRECT and/or O_SYNC pre-set.  Then again, if you use
>> QMP and pass over the Unix socket, you need a C program anyways.
> 
> O_DIRECT can be set with fcntl(), so qemu takes care of that. O_SYNC
> is completely unused on Linux these days, so that shouldn't be a problem
> either. (Other platforms use it as a misguided attempt of approximating
> O_DIRECT. We should really error out instead.)
> 
> So if I'm not mistaken, just having one read-only and one read-write fd
> should be enough for any configuration in practice.

Probably yes.  Except the thing doesn't actually work. ;)

When flushing changes to the base image, that base image needs to be
reopened.  So I should convince qemu that the base image of this qcow
file is /dev/fdset/foo, not the one recorded in the header.

Is it possible?

Thanks,

/mjt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] block-commit & dropping privs
  2015-04-01  9:26         ` Michael Tokarev
@ 2015-04-01  9:54           ` Michael Tokarev
  2015-04-01 12:34             ` Kevin Wolf
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Tokarev @ 2015-04-01  9:54 UTC (permalink / raw)
  To: Kevin Wolf, Eric Blake; +Cc: qemu-devel, qemu-block

01.04.2015 12:26, Michael Tokarev пишет:
> 30.03.2015 18:36, Kevin Wolf wrote:
>> Am 27.03.2015 um 18:12 hat Eric Blake geschrieben:
>>> On 03/27/2015 09:36 AM, Michael Tokarev wrote:
>>>> Wonder how to specify cache mode, or should I open these with proper
>>>> O_DIRECT/O_SYNC/whatever?  It looks like it's possible to change O_DIRECT
>>>> at runtime but not O_SYNC.
>>>>
>>>> And the more interesting question is how to do that from shell.
>>>
>>> Redirections only get you so far in shell; you may need a wrapper C
>>> program go get O_DIRECT and/or O_SYNC pre-set.  Then again, if you use
>>> QMP and pass over the Unix socket, you need a C program anyways.
>>
>> O_DIRECT can be set with fcntl(), so qemu takes care of that. O_SYNC
>> is completely unused on Linux these days, so that shouldn't be a problem

Do you mean it is unused in qemu or in kernel? :)

>> either. (Other platforms use it as a misguided attempt of approximating
>> O_DIRECT. We should really error out instead.)
>>
>> So if I'm not mistaken, just having one read-only and one read-write fd
>> should be enough for any configuration in practice.

Yes, O_DIRECT appears to work with fdsets.

> Probably yes.  Except the thing doesn't actually work. ;)
> 
> When flushing changes to the base image, that base image needs to be
> reopened.  So I should convince qemu that the base image of this qcow
> file is /dev/fdset/foo, not the one recorded in the header.

qemu-system-x86_64: -drive file=w7x64.qcow2,backing_file=/dev/fdset/2,if=none,id=dr,cache=none: \
 could not open disk image w7x64.qcow2: Block format 'qcow2' used by device 'dr' \
 doesn't support the option 'backing_file'

hmm?..

> Is it possible?

Thanks,

/mjt
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] block-commit & dropping privs
  2015-04-01  9:54           ` Michael Tokarev
@ 2015-04-01 12:34             ` Kevin Wolf
  2015-04-02 10:58               ` Michael Tokarev
  0 siblings, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2015-04-01 12:34 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: qemu-devel, qemu-block

Am 01.04.2015 um 11:54 hat Michael Tokarev geschrieben:
> 01.04.2015 12:26, Michael Tokarev пишет:
> > 30.03.2015 18:36, Kevin Wolf wrote:
> >> Am 27.03.2015 um 18:12 hat Eric Blake geschrieben:
> >>> On 03/27/2015 09:36 AM, Michael Tokarev wrote:
> >>>> Wonder how to specify cache mode, or should I open these with proper
> >>>> O_DIRECT/O_SYNC/whatever?  It looks like it's possible to change O_DIRECT
> >>>> at runtime but not O_SYNC.
> >>>>
> >>>> And the more interesting question is how to do that from shell.
> >>>
> >>> Redirections only get you so far in shell; you may need a wrapper C
> >>> program go get O_DIRECT and/or O_SYNC pre-set.  Then again, if you use
> >>> QMP and pass over the Unix socket, you need a C program anyways.
> >>
> >> O_DIRECT can be set with fcntl(), so qemu takes care of that. O_SYNC
> >> is completely unused on Linux these days, so that shouldn't be a problem
> 
> Do you mean it is unused in qemu or in kernel? :)

In qemu. We used to use O_DSYNC for cache=writethrough long ago, but
nowadays we just issue fdatasync() manually.

> >> either. (Other platforms use it as a misguided attempt of approximating
> >> O_DIRECT. We should really error out instead.)
> >>
> >> So if I'm not mistaken, just having one read-only and one read-write fd
> >> should be enough for any configuration in practice.
> 
> Yes, O_DIRECT appears to work with fdsets.
> 
> > Probably yes.  Except the thing doesn't actually work. ;)
> > 
> > When flushing changes to the base image, that base image needs to be
> > reopened.  So I should convince qemu that the base image of this qcow
> > file is /dev/fdset/foo, not the one recorded in the header.
> 
> qemu-system-x86_64: -drive file=w7x64.qcow2,backing_file=/dev/fdset/2,if=none,id=dr,cache=none: \
>  could not open disk image w7x64.qcow2: Block format 'qcow2' used by device 'dr' \
>  doesn't support the option 'backing_file'

Overriding the backing file should work like this:

    -drive file=...,backing.file.filename=/dev/fdset/2

Kevin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] block-commit & dropping privs
  2015-04-01 12:34             ` Kevin Wolf
@ 2015-04-02 10:58               ` Michael Tokarev
  2015-04-02 11:24                 ` Kevin Wolf
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Tokarev @ 2015-04-02 10:58 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block

01.04.2015 15:34, Kevin Wolf wrote:
[]
> Overriding the backing file should work like this:
> 
>     -drive file=...,backing.file.filename=/dev/fdset/2

Oh-ok, this works.  Sort of.

Because after performing commit (is there a difference between
commit hmp command and block-commit qmp command?  I used the
former so far), the new backing file name is recorded to the
qcow2 file header, and now, qemu-img can't operate on this
file anymore, telling me it can't access backing file due
to bad file descriptor.

So in order to finally commit the file I used qemu-system
again, running it like that specifying backing file, but
with -S option, and issued `commit' command again.

I noticed that I can't specify backing file for
qemu-img commit.

But overall, I think qemu-system should not modify backing
file name in this case.

When performing commit, does qemu mark the areas in the
overlay file as free after writing contents to the backing
file, or will these areas be written again by a subsequent
commit?  Somehow it smells like each next commit writes
more and more data and completes in more and more time.

Thanks,

/mjt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] block-commit & dropping privs
  2015-04-02 10:58               ` Michael Tokarev
@ 2015-04-02 11:24                 ` Kevin Wolf
  2015-04-02 12:04                   ` Michael Tokarev
  2015-04-03  3:59                   ` Jeff Cody
  0 siblings, 2 replies; 20+ messages in thread
From: Kevin Wolf @ 2015-04-02 11:24 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: jcody, qemu-devel, qemu-block

Am 02.04.2015 um 12:58 hat Michael Tokarev geschrieben:
> 01.04.2015 15:34, Kevin Wolf wrote:
> []
> > Overriding the backing file should work like this:
> > 
> >     -drive file=...,backing.file.filename=/dev/fdset/2
> 
> Oh-ok, this works.  Sort of.
> 
> Because after performing commit (is there a difference between
> commit hmp command and block-commit qmp command?  I used the
> former so far),

Yes, two completely different implementations. HMP 'commit' is a
non-live version that effectively pauses the VM while it runs.

Apparently the live version (which is QMP 'block-commit') isn't
available in HMP. Jeff, do you remember why?

qemu-img commit reuses the QMP code today.

> the new backing file name is recorded to the
> qcow2 file header, and now, qemu-img can't operate on this
> file anymore, telling me it can't access backing file due
> to bad file descriptor.

Right. That's why block-commit has an optional 'backing-file' argument
that contains the new backing file string to be written to the file.

> So in order to finally commit the file I used qemu-system
> again, running it like that specifying backing file, but
> with -S option, and issued `commit' command again.
> 
> I noticed that I can't specify backing file for
> qemu-img commit.
> 
> But overall, I think qemu-system should not modify backing
> file name in this case.

So you would leave the backing file with the data that you just
committed down one level in your backing file chain? Wouldn't that
defeat the whole purpose of committing?

> When performing commit, does qemu mark the areas in the
> overlay file as free after writing contents to the backing
> file, or will these areas be written again by a subsequent
> commit?  Somehow it smells like each next commit writes
> more and more data and completes in more and more time.

With qcow2 and qcow, the committed data is discarded with HMP 'commit'.
Other image formats keep the copy.

Live commit always keeps the data in the parent image, but it drops that
image from the backing file chain, so effectively it is dropped as well
(but you need to remove the file manually to reclaim the space).

Jeff, this sounds like we need to make some changes to unify this. That
might mean introducing an option that decides whether an image should be
dropped from the chain or emptied. Once live commit can provide the same
as HMP commit, we should change HMP commit to reuse the live commit
code.

Kevin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] block-commit & dropping privs
  2015-04-02 11:24                 ` Kevin Wolf
@ 2015-04-02 12:04                   ` Michael Tokarev
  2015-04-02 13:07                     ` Eric Blake
  2015-04-02 13:19                     ` Kevin Wolf
  2015-04-03  3:59                   ` Jeff Cody
  1 sibling, 2 replies; 20+ messages in thread
From: Michael Tokarev @ 2015-04-02 12:04 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, qemu-devel, qemu-block

02.04.2015 14:24, Kevin Wolf wrote:
[]
>> But overall, I think qemu-system should not modify backing
>> file name in this case.
> 
> So you would leave the backing file with the data that you just
> committed down one level in your backing file chain? Wouldn't that
> defeat the whole purpose of committing?

Um.  I don't think we understood each other.

I experimented with the "non-live" HMP commit command.  This
one effectively empties everything in the overlay file,
propagating it to the backing file, but keeps the (now
empty) overlay.  So from the stacking perspective nothing
has changed.  Yet, together with with propagation, it also
modifies the overlay file headers and writes a new name
of the backing file -- the one it currently uses, which,
in my case, is virtual /dev/fdset/foo.  It should keep
the original file name in there, such as win.raw, unless
explicitly asked to write a different name there.

If the stack chain were to be modified by commit command,
yes, the new name should be recorded ofcourse, such as
after rebase.  But since stack chain is not modified,
filename should not be modified either.

>> When performing commit, does qemu mark the areas in the
>> overlay file as free after writing contents to the backing
>> file, or will these areas be written again by a subsequent
>> commit?  Somehow it smells like each next commit writes
>> more and more data and completes in more and more time.
> 
> With qcow2 and qcow, the committed data is discarded with HMP 'commit'.
> Other image formats keep the copy.

Hm.  It is discarded, but the file isn't shrinked.  With "non-live"
commit I don't see a reason why it can't be shrinked too?

Thanks,

/mjt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] block-commit & dropping privs
  2015-04-02 12:04                   ` Michael Tokarev
@ 2015-04-02 13:07                     ` Eric Blake
  2015-04-03  4:28                       ` Jeff Cody
  2015-04-02 13:19                     ` Kevin Wolf
  1 sibling, 1 reply; 20+ messages in thread
From: Eric Blake @ 2015-04-02 13:07 UTC (permalink / raw)
  To: Michael Tokarev, Kevin Wolf; +Cc: jcody, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 6777 bytes --]

On 04/02/2015 06:04 AM, Michael Tokarev wrote:
> 02.04.2015 14:24, Kevin Wolf wrote:
> []
>>> But overall, I think qemu-system should not modify backing
>>> file name in this case.
>>
>> So you would leave the backing file with the data that you just
>> committed down one level in your backing file chain? Wouldn't that
>> defeat the whole purpose of committing?

No, because there are multiple reasons for committing.  And we recently
updated qemu-img to use QMP for multiple styles of commit:

1: merge data from the active layer to the immediate backing, but keep
the active layer intact (by freeing its sectors). Useful for saying "I
want to reset my point-in-time backup to now, but continue tracking the
delta since the point-in-time".
 starting with: A <- B <- C, with B created at point1, C created at
point2, and right now being point3. B contains point1..point2, C
contains point2..point3+
 qemu-img commit C
 ending with: A <- B <- C, B contains point1..point3, C contains
point3+; all three files still valid

2: merge data from the active layer to the immediate backing, and
discard the active layer (no need to waste time freeing its sectors).
Useful for saying "I no longer need my point-in-time backup, revert to
the smaller chain"
 starting with: A <- B <- C (as before)
 qemu-img commit -d C
 ending with: A <- B, B contains point1..point3+; C invalid the moment
something is written to B

3: merge data from the active layer to a specified backing (typically
more than one file) to really clean up a chain. Useful for saying "I
want to keep current state but discard all intermediate points in time".
 When crossing multiple backing files, you WILL cause corruption to the
intermediate files.  Therefore, this mode implies -d, and there is no
point wasting time in clearing out sectors in the discarded part of the
chain.
 starting with: A <- B <- C (as before)
 qemu-img commit -b A C
 ending with: A; A contains point3+, B is immediately invalid, C is
invalid the moment something else is written to A

4: offline-only: note that 'qemu-img commit' can ONLY commit the changes
of the filename argument, while QMP can commit any arbitrary point in a
chain (note that there is a difference between active commit and
intermediate commit). But qemu-img can be used to do intermediate
commit, by virtue of the fact that since the image is not in active use,
you can specify any intermediate file in the chain as the point to
commit, coupled with rewriting the backing file of the rest of the chain
to match. Requires two qemu-img commands, so might be worth adding sugar
to do it in one command the way QMP can do, if desired.
 starting with: A <- B <- C (as before)
 qemu-img commit -d B
 qemu-img rebase -u A C
 ending with: A <- C; A contains point2, C contains point2+

[Proof that committing to more than the immediate backing file was in
the thread on converting qemu-img to use QMP; although I'm somewhat
repeating that argument down below]

> 
> Um.  I don't think we understood each other.
> 
> I experimented with the "non-live" HMP commit command.  This
> one effectively empties everything in the overlay file,
> propagating it to the backing file, but keeps the (now
> empty) overlay.  So from the stacking perspective nothing
> has changed.  Yet, together with with propagation, it also
> modifies the overlay file headers and writes a new name
> of the backing file -- the one it currently uses, which,
> in my case, is virtual /dev/fdset/foo.  It should keep
> the original file name in there, such as win.raw, unless
> explicitly asked to write a different name there.

Bug #1 - HMP doesn't use QMP commit - and therefore it can only commit
the active layer. I argued above that qemu-img has the same limitation,
except that qemu-img can be used anywhere in the chain by using multiple
qemu-img commands

Bug #2 - HMP changes the backing file when it shouldn't (active commit
should never alter backing information if the active file doesn't change)

> 
> If the stack chain were to be modified by commit command,
> yes, the new name should be recorded ofcourse, such as
> after rebase.  But since stack chain is not modified,
> filename should not be modified either.
> 
>>> When performing commit, does qemu mark the areas in the
>>> overlay file as free after writing contents to the backing
>>> file, or will these areas be written again by a subsequent
>>> commit?  Somehow it smells like each next commit writes
>>> more and more data and completes in more and more time.
>>
>> With qcow2 and qcow, the committed data is discarded with HMP 'commit'.
>> Other image formats keep the copy.
> 
> Hm.  It is discarded, but the file isn't shrinked.  With "non-live"
> commit I don't see a reason why it can't be shrinked too?

Oh, you pointed out another difference.  I was going to say that QMP
active commit followed by aborting the transaction keeps the active
layer unchanged, but QMP does not yet expose a way to empty the active
layer.  That is, starting with these file contents (where - implies
deferring to the backing file):
B 1111
C --22

then qemu-img commit C leads to:
B 1122
C ----

while QMP active commit/abort leads to:
B 1122
C --22

at least both qemu-img commit -d C and QMP active commit/complete lead to:
B 1122
C useless

So we potentially have:

Bug #3 - QMP does not yet expose a way to empty the active layer

and if that is fixed, we need to be careful to not introduce:

Bug #4 - QMP must not allow emptying the active layer if the active
commit crosses more than one backing file but backing chain is not rewritten

To see why, you need merely understand why qemu-img -b implies -d:
     contents   guest would see
   A 1111       1111
A<-B -22-       1221
B<-C --33       1233

then commit C into A. If C is left untouched:
     contents   guest would see
   A 1233       1233
A<-B -22-       1223 # Oops, B is corrupted
B<-C --33       1233

and if C is emptied:
     contents   guest would see
   A 1233       1233
A<-B -22-       1223 # Oops, B is corrupted
B<-C ----       1223 # Oops, C is corrupted

Basically, once a commit crosses more than one file, all intermediate
files are useless and might as well be discarded. But you've pointed out
that by rewriting the backing file of C, we CAN make C still be
consistent and tracking the change since the commit:

     contents   guest would see
   A 1233       1233
A<-C ----       1233

Bug #5 - QMP commit doesn't offer a way to rewrite chain when committing
across multiple images


-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] block-commit & dropping privs
  2015-04-02 12:04                   ` Michael Tokarev
  2015-04-02 13:07                     ` Eric Blake
@ 2015-04-02 13:19                     ` Kevin Wolf
  2015-04-06 15:37                       ` Michael Tokarev
  1 sibling, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2015-04-02 13:19 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: jcody, qemu-devel, qemu-block

Am 02.04.2015 um 14:04 hat Michael Tokarev geschrieben:
> 02.04.2015 14:24, Kevin Wolf wrote:
> []
> >> But overall, I think qemu-system should not modify backing
> >> file name in this case.
> > 
> > So you would leave the backing file with the data that you just
> > committed down one level in your backing file chain? Wouldn't that
> > defeat the whole purpose of committing?
> 
> Um.  I don't think we understood each other.
> 
> I experimented with the "non-live" HMP commit command.  This
> one effectively empties everything in the overlay file,
> propagating it to the backing file, but keeps the (now
> empty) overlay.  So from the stacking perspective nothing
> has changed.  Yet, together with with propagation, it also
> modifies the overlay file headers and writes a new name
> of the backing file -- the one it currently uses, which,
> in my case, is virtual /dev/fdset/foo.  It should keep
> the original file name in there, such as win.raw, unless
> explicitly asked to write a different name there.
> 
> If the stack chain were to be modified by commit command,
> yes, the new name should be recorded ofcourse, such as
> after rebase.  But since stack chain is not modified,
> filename should not be modified either.

For the record, we discussed this on IRC:

Yes, we did talk past each other because HMP commit isn't supposed to
touch the backing file name at all, so I didn't expect such behaviour,
yet Michael saw it.

The reason is a bug in qcow2_update_header(). Instead of rewriting the
same value as is already in the image, it writes bs->backing_file to the
image. This was always the same as long as you couldn't override the
backing file name on the command line or in blockdev-add, but now it's
obviously wrong.

It would also rewrite the backing file name on other occasions such as
marking the image dirty with lazy_refcounts=on (i.e. on the first
write request).

> >> When performing commit, does qemu mark the areas in the
> >> overlay file as free after writing contents to the backing
> >> file, or will these areas be written again by a subsequent
> >> commit?  Somehow it smells like each next commit writes
> >> more and more data and completes in more and more time.
> > 
> > With qcow2 and qcow, the committed data is discarded with HMP 'commit'.
> > Other image formats keep the copy.
> 
> Hm.  It is discarded, but the file isn't shrinked.  With "non-live"
> commit I don't see a reason why it can't be shrinked too?

This would be a bug as well, but Michael double-checked and it does
shrink in fact.

Kevin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] block-commit & dropping privs
  2015-04-02 11:24                 ` Kevin Wolf
  2015-04-02 12:04                   ` Michael Tokarev
@ 2015-04-03  3:59                   ` Jeff Cody
  2015-04-07  9:18                     ` Kevin Wolf
  1 sibling, 1 reply; 20+ messages in thread
From: Jeff Cody @ 2015-04-03  3:59 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Michael Tokarev, qemu-devel, qemu-block

On Thu, Apr 02, 2015 at 01:24:02PM +0200, Kevin Wolf wrote:
> Am 02.04.2015 um 12:58 hat Michael Tokarev geschrieben:
> > 01.04.2015 15:34, Kevin Wolf wrote:
> > []
> > > Overriding the backing file should work like this:
> > > 
> > >     -drive file=...,backing.file.filename=/dev/fdset/2
> > 
> > Oh-ok, this works.  Sort of.
> > 
> > Because after performing commit (is there a difference between
> > commit hmp command and block-commit qmp command?  I used the
> > former so far),
> 
> Yes, two completely different implementations. HMP 'commit' is a
> non-live version that effectively pauses the VM while it runs.
> 
> Apparently the live version (which is QMP 'block-commit') isn't
> available in HMP. Jeff, do you remember why?
> 
> qemu-img commit reuses the QMP code today.
>

IIRC, it was because the live commit was, by definition, live.  And
the original HMP commit operated on the active layer (and the active
layer only), and the live commit originally did not.  So the HMP
commit provided functionality that was not in the QMP block-commit.

> > the new backing file name is recorded to the
> > qcow2 file header, and now, qemu-img can't operate on this
> > file anymore, telling me it can't access backing file due
> > to bad file descriptor.
> 
> Right. That's why block-commit has an optional 'backing-file' argument
> that contains the new backing file string to be written to the file.
> 
> > So in order to finally commit the file I used qemu-system
> > again, running it like that specifying backing file, but
> > with -S option, and issued `commit' command again.
> > 
> > I noticed that I can't specify backing file for
> > qemu-img commit.
> > 
> > But overall, I think qemu-system should not modify backing
> > file name in this case.
> 
> So you would leave the backing file with the data that you just
> committed down one level in your backing file chain? Wouldn't that
> defeat the whole purpose of committing?
> 
> > When performing commit, does qemu mark the areas in the
> > overlay file as free after writing contents to the backing
> > file, or will these areas be written again by a subsequent
> > commit?  Somehow it smells like each next commit writes
> > more and more data and completes in more and more time.
> 
> With qcow2 and qcow, the committed data is discarded with HMP 'commit'.
> Other image formats keep the copy.
> 
> Live commit always keeps the data in the parent image, but it drops that
> image from the backing file chain, so effectively it is dropped as well
> (but you need to remove the file manually to reclaim the space).
> 
> Jeff, this sounds like we need to make some changes to unify this. That
> might mean introducing an option that decides whether an image should be
> dropped from the chain or emptied.

Yes, I think this would be a good idea.  I can think of scenarios
where we would want to keep the original overlay around (i.e., we
still want a snapshot overlay, but just want to consolidate data).
But I can't think of any reason we would want to keep a stale
populated overlay file around.

> Once live commit can provide the same
> as HMP commit, we should change HMP commit to reuse the live commit
> code.
> 

Hmm. My concern here is there may be times we want the behavior that
HMP commit provides - faster offline active layer commit, rather than
a mirror-like operation.  I guess if we do want that behavior,
however, we could always introduce it as an option to block-commit.
So yes, let's provide an option to remove or empty the committed
overlays (applies to all overlays, if there are multiple overlays
being committed in the chain), and then have HMP commit reuse the live
commit code.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] block-commit & dropping privs
  2015-04-02 13:07                     ` Eric Blake
@ 2015-04-03  4:28                       ` Jeff Cody
  2015-04-03 19:49                         ` Eric Blake
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Cody @ 2015-04-03  4:28 UTC (permalink / raw)
  To: Eric Blake; +Cc: Kevin Wolf, Michael Tokarev, qemu-devel, qemu-block

On Thu, Apr 02, 2015 at 07:07:23AM -0600, Eric Blake wrote:
> On 04/02/2015 06:04 AM, Michael Tokarev wrote:
> > 02.04.2015 14:24, Kevin Wolf wrote:
> > []
> >>> But overall, I think qemu-system should not modify backing
> >>> file name in this case.
> >>
> >> So you would leave the backing file with the data that you just
> >> committed down one level in your backing file chain? Wouldn't that
> >> defeat the whole purpose of committing?
> 
> No, because there are multiple reasons for committing.  And we recently
> updated qemu-img to use QMP for multiple styles of commit:
> 
> 1: merge data from the active layer to the immediate backing, but keep
> the active layer intact (by freeing its sectors). Useful for saying "I
> want to reset my point-in-time backup to now, but continue tracking the
> delta since the point-in-time".
>  starting with: A <- B <- C, with B created at point1, C created at
> point2, and right now being point3. B contains point1..point2, C
> contains point2..point3+
>  qemu-img commit C
>  ending with: A <- B <- C, B contains point1..point3, C contains
> point3+; all three files still valid
> 
> 2: merge data from the active layer to the immediate backing, and
> discard the active layer (no need to waste time freeing its sectors).
> Useful for saying "I no longer need my point-in-time backup, revert to
> the smaller chain"
>  starting with: A <- B <- C (as before)
>  qemu-img commit -d C
>  ending with: A <- B, B contains point1..point3+; C invalid the moment
> something is written to B
> 
> 3: merge data from the active layer to a specified backing (typically
> more than one file) to really clean up a chain. Useful for saying "I
> want to keep current state but discard all intermediate points in time".
>  When crossing multiple backing files, you WILL cause corruption to the
> intermediate files.  Therefore, this mode implies -d, and there is no
> point wasting time in clearing out sectors in the discarded part of the
> chain.
>  starting with: A <- B <- C (as before)
>  qemu-img commit -b A C
>  ending with: A; A contains point3+, B is immediately invalid, C is
> invalid the moment something else is written to A
> 
> 4: offline-only: note that 'qemu-img commit' can ONLY commit the changes
> of the filename argument, while QMP can commit any arbitrary point in a
> chain (note that there is a difference between active commit and
> intermediate commit). But qemu-img can be used to do intermediate
> commit, by virtue of the fact that since the image is not in active use,
> you can specify any intermediate file in the chain as the point to
> commit, coupled with rewriting the backing file of the rest of the chain
> to match. Requires two qemu-img commands, so might be worth adding sugar
> to do it in one command the way QMP can do, if desired.
>  starting with: A <- B <- C (as before)
>  qemu-img commit -d B
>  qemu-img rebase -u A C
>  ending with: A <- C; A contains point2, C contains point2+
> 
> [Proof that committing to more than the immediate backing file was in
> the thread on converting qemu-img to use QMP; although I'm somewhat
> repeating that argument down below]
> 
> > 
> > Um.  I don't think we understood each other.
> > 
> > I experimented with the "non-live" HMP commit command.  This
> > one effectively empties everything in the overlay file,
> > propagating it to the backing file, but keeps the (now
> > empty) overlay.  So from the stacking perspective nothing
> > has changed.  Yet, together with with propagation, it also
> > modifies the overlay file headers and writes a new name
> > of the backing file -- the one it currently uses, which,
> > in my case, is virtual /dev/fdset/foo.  It should keep
> > the original file name in there, such as win.raw, unless
> > explicitly asked to write a different name there.
> 
> Bug #1 - HMP doesn't use QMP commit - and therefore it can only commit
> the active layer. I argued above that qemu-img has the same limitation,
> except that qemu-img can be used anywhere in the chain by using multiple
> qemu-img commands
> 
> Bug #2 - HMP changes the backing file when it shouldn't (active commit
> should never alter backing information if the active file doesn't change)
> 
> > 
> > If the stack chain were to be modified by commit command,
> > yes, the new name should be recorded ofcourse, such as
> > after rebase.  But since stack chain is not modified,
> > filename should not be modified either.
> > 
> >>> When performing commit, does qemu mark the areas in the
> >>> overlay file as free after writing contents to the backing
> >>> file, or will these areas be written again by a subsequent
> >>> commit?  Somehow it smells like each next commit writes
> >>> more and more data and completes in more and more time.
> >>
> >> With qcow2 and qcow, the committed data is discarded with HMP 'commit'.
> >> Other image formats keep the copy.
> > 
> > Hm.  It is discarded, but the file isn't shrinked.  With "non-live"
> > commit I don't see a reason why it can't be shrinked too?
> 
> Oh, you pointed out another difference.  I was going to say that QMP
> active commit followed by aborting the transaction keeps the active
> layer unchanged, but QMP does not yet expose a way to empty the active
> layer.  That is, starting with these file contents (where - implies
> deferring to the backing file):
> B 1111
> C --22
> 
> then qemu-img commit C leads to:
> B 1122
> C ----
> 
> while QMP active commit/abort leads to:
> B 1122
> C --22
> 
> at least both qemu-img commit -d C and QMP active commit/complete lead to:
> B 1122
> C useless
> 
> So we potentially have:
> 
> Bug #3 - QMP does not yet expose a way to empty the active layer
> 
> and if that is fixed, we need to be careful to not introduce:
> 
> Bug #4 - QMP must not allow emptying the active layer if the active
> commit crosses more than one backing file but backing chain is not rewritten
> 
> To see why, you need merely understand why qemu-img -b implies -d:
>      contents   guest would see
>    A 1111       1111
> A<-B -22-       1221
> B<-C --33       1233
> 
> then commit C into A. If C is left untouched:
>      contents   guest would see
>    A 1233       1233
> A<-B -22-       1223 # Oops, B is corrupted
> B<-C --33       1233
> 
> and if C is emptied:
>      contents   guest would see
>    A 1233       1233
> A<-B -22-       1223 # Oops, B is corrupted
> B<-C ----       1223 # Oops, C is corrupted
> 
> Basically, once a commit crosses more than one file, all intermediate
> files are useless and might as well be discarded. But you've pointed out
> that by rewriting the backing file of C, we CAN make C still be
> consistent and tracking the change since the commit:
>

Currently, when we do a commit we drop in the chain all the
invalidated intermediate images, and the committed image as well
(which is what introduced the bdrv_swap craziness):

[base] <--- [snapA] <--- [snapB] <--- [snapC]

Committing snapB down to the base:

[base] <--- [snapC]  

snapB and snapA are discarded.

In the active layer commit, the 'base' that is the recipient of data
becomes the new active layer, and we drop all the overlays above it.

If we allow emptying images, we need to either A) empty all images
that would have otherwise been dropped, or B) empty the current active 
layer, and drop the intermediates.

At first blush, have empty intermediates makes no sense.  But if we
consider multi-parent chains, as can be introduced with blockdev-add,
perhaps it might:

                                                 /-- [snapE]
                                                /
[base] <--- [snapA] <--- [snapB] <--- [snapC] <----- [snapD]


Say, for performance or cleanup reasons, we want to push snapC into
base.  This action invalidates neither snapE or snapD, in theory.

However, in current practice, we drop snapC, snapB, and snapA from
the chain. Then either snapE or snapD is now orphaned or worse,
depending from which "perspective" the block-commit was done.  But
if we just empty snapC, then everything automagically works even in
multi-parent chains:

                       /-- [snapE]
                      /
[base] <---  [snapC] <---- [snapD]
             (empty)

So I think it makes sense to provide an option even for the non-active
layer block commit case to empty the topmost committed overlay, while
dropping the other intermediates.

>      contents   guest would see
>    A 1233       1233
> A<-C ----       1233
> 
> Bug #5 - QMP commit doesn't offer a way to rewrite chain when committing
> across multiple images
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] block-commit & dropping privs
  2015-04-03  4:28                       ` Jeff Cody
@ 2015-04-03 19:49                         ` Eric Blake
  2015-04-03 19:57                           ` Jeff Cody
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Blake @ 2015-04-03 19:49 UTC (permalink / raw)
  To: Jeff Cody; +Cc: Kevin Wolf, Michael Tokarev, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 2912 bytes --]

On 04/02/2015 10:28 PM, Jeff Cody wrote:

>>
>> Basically, once a commit crosses more than one file, all intermediate
>> files are useless and might as well be discarded.

That's if you do a job-complete operation.  But if you do a job-abort
operation, the chain is left intact.  What we should probably add is a
way to do a job-abort operation that leaves the active file intact, but
which also simultaneously rewrites the backing file of the active image
to point back to the base image and skip over the intermediate files
that are now broken.

>>     But you've pointed out
>> that by rewriting the backing file of C, we CAN make C still be
>> consistent and tracking the change since the commit:
>>
> 
> Currently, when we do a commit we drop in the chain all the
> invalidated intermediate images, and the committed image as well
> (which is what introduced the bdrv_swap craziness):
> 
> [base] <--- [snapA] <--- [snapB] <--- [snapC]
> 
> Committing snapB down to the base:
> 
> [base] <--- [snapC]  
> 
> snapB and snapA are discarded.

Are we actually changing the backing file metadata of snapC when doing
this?  And if so, can management applications control the text being
written (so that it is absolute or relative as desired)?

> 
> In the active layer commit, the 'base' that is the recipient of data
> becomes the new active layer, and we drop all the overlays above it.
> 
> If we allow emptying images, we need to either A) empty all images
> that would have otherwise been dropped, or B) empty the current active 
> layer, and drop the intermediates.
> 
> At first blush, have empty intermediates makes no sense.  But if we
> consider multi-parent chains, as can be introduced with blockdev-add,
> perhaps it might:
> 
>                                                  /-- [snapE]
>                                                 /
> [base] <--- [snapA] <--- [snapB] <--- [snapC] <----- [snapD]
> 
> 
> Say, for performance or cleanup reasons, we want to push snapC into
> base.  This action invalidates neither snapE or snapD, in theory.
> 
> However, in current practice, we drop snapC, snapB, and snapA from
> the chain. Then either snapE or snapD is now orphaned or worse,
> depending from which "perspective" the block-commit was done.  But
> if we just empty snapC, then everything automagically works even in
> multi-parent chains:
> 
>                        /-- [snapE]
>                       /
> [base] <---  [snapC] <---- [snapD]
>              (empty)
> 
> So I think it makes sense to provide an option even for the non-active
> layer block commit case to empty the topmost committed overlay, while
> dropping the other intermediates.

At any rate, I'm glad I've got you thinking about it.

-- 
-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] block-commit & dropping privs
  2015-04-03 19:49                         ` Eric Blake
@ 2015-04-03 19:57                           ` Jeff Cody
  0 siblings, 0 replies; 20+ messages in thread
From: Jeff Cody @ 2015-04-03 19:57 UTC (permalink / raw)
  To: Eric Blake; +Cc: Kevin Wolf, Michael Tokarev, qemu-devel, qemu-block

On Fri, Apr 03, 2015 at 01:49:01PM -0600, Eric Blake wrote:
> On 04/02/2015 10:28 PM, Jeff Cody wrote:
> 
> >>
> >> Basically, once a commit crosses more than one file, all intermediate
> >> files are useless and might as well be discarded.
> 
> That's if you do a job-complete operation.  But if you do a job-abort
> operation, the chain is left intact.  What we should probably add is a
> way to do a job-abort operation that leaves the active file intact, but
> which also simultaneously rewrites the backing file of the active image
> to point back to the base image and skip over the intermediate files
> that are now broken.
> 
> >>     But you've pointed out
> >> that by rewriting the backing file of C, we CAN make C still be
> >> consistent and tracking the change since the commit:
> >>
> > 
> > Currently, when we do a commit we drop in the chain all the
> > invalidated intermediate images, and the committed image as well
> > (which is what introduced the bdrv_swap craziness):
> > 
> > [base] <--- [snapA] <--- [snapB] <--- [snapC]
> > 
> > Committing snapB down to the base:
> > 
> > [base] <--- [snapC]  
> > 
> > snapB and snapA are discarded.
> 
> Are we actually changing the backing file metadata of snapC when doing
> this?  And if so, can management applications control the text being
> written (so that it is absolute or relative as desired)?
> 

Yes, and yes: in the QMP block-commit command, there is the optional
argument "backing-file".  If provided, that exact string is written
into snapC as the backing filename.  If not provided, then we use the
"filename" member of the BDS for 'base' (e.g. base_bs->filename);


> > 
> > In the active layer commit, the 'base' that is the recipient of data
> > becomes the new active layer, and we drop all the overlays above it.
> > 
> > If we allow emptying images, we need to either A) empty all images
> > that would have otherwise been dropped, or B) empty the current active 
> > layer, and drop the intermediates.
> > 
> > At first blush, have empty intermediates makes no sense.  But if we
> > consider multi-parent chains, as can be introduced with blockdev-add,
> > perhaps it might:
> > 
> >                                                  /-- [snapE]
> >                                                 /
> > [base] <--- [snapA] <--- [snapB] <--- [snapC] <----- [snapD]
> > 
> > 
> > Say, for performance or cleanup reasons, we want to push snapC into
> > base.  This action invalidates neither snapE or snapD, in theory.
> > 
> > However, in current practice, we drop snapC, snapB, and snapA from
> > the chain. Then either snapE or snapD is now orphaned or worse,
> > depending from which "perspective" the block-commit was done.  But
> > if we just empty snapC, then everything automagically works even in
> > multi-parent chains:
> > 
> >                        /-- [snapE]
> >                       /
> > [base] <---  [snapC] <---- [snapD]
> >              (empty)
> > 
> > So I think it makes sense to provide an option even for the non-active
> > layer block commit case to empty the topmost committed overlay, while
> > dropping the other intermediates.
> 
> At any rate, I'm glad I've got you thinking about it.
> 
> -- 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] block-commit & dropping privs
  2015-04-02 13:19                     ` Kevin Wolf
@ 2015-04-06 15:37                       ` Michael Tokarev
  2015-04-07  9:24                         ` Kevin Wolf
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Tokarev @ 2015-04-06 15:37 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, qemu-devel, qemu-block

02.04.2015 16:19, Kevin Wolf wrote:
> Am 02.04.2015 um 14:04 hat Michael Tokarev geschrieben:
>> 02.04.2015 14:24, Kevin Wolf wrote:
>> []
>>>> But overall, I think qemu-system should not modify backing
>>>> file name in this case.
>>>
>>> So you would leave the backing file with the data that you just
>>> committed down one level in your backing file chain? Wouldn't that
>>> defeat the whole purpose of committing?
>>
>> Um.  I don't think we understood each other.
>>
>> I experimented with the "non-live" HMP commit command.  This
>> one effectively empties everything in the overlay file,
>> propagating it to the backing file, but keeps the (now
>> empty) overlay.  So from the stacking perspective nothing
>> has changed.  Yet, together with with propagation, it also
>> modifies the overlay file headers and writes a new name
>> of the backing file -- the one it currently uses, which,
>> in my case, is virtual /dev/fdset/foo.  It should keep
>> the original file name in there, such as win.raw, unless
>> explicitly asked to write a different name there.
>>
>> If the stack chain were to be modified by commit command,
>> yes, the new name should be recorded ofcourse, such as
>> after rebase.  But since stack chain is not modified,
>> filename should not be modified either.
> 
> For the record, we discussed this on IRC:
> 
> Yes, we did talk past each other because HMP commit isn't supposed to
> touch the backing file name at all, so I didn't expect such behaviour,
> yet Michael saw it.
> 
> The reason is a bug in qcow2_update_header(). Instead of rewriting the
> same value as is already in the image, it writes bs->backing_file to the
> image. This was always the same as long as you couldn't override the
> backing file name on the command line or in blockdev-add, but now it's
> obviously wrong.
> 
> It would also rewrite the backing file name on other occasions such as
> marking the image dirty with lazy_refcounts=on (i.e. on the first
> write request).

Kevin, did you have a chance to fix this prob (for 2.3)?
2.1 version does the right thing here.

>>>> When performing commit, does qemu mark the areas in the
>>>> overlay file as free after writing contents to the backing
>>>> file, or will these areas be written again by a subsequent
>>>> commit?  Somehow it smells like each next commit writes
>>>> more and more data and completes in more and more time.
>>>
>>> With qcow2 and qcow, the committed data is discarded with HMP 'commit'.
>>> Other image formats keep the copy.
>>
>> Hm.  It is discarded, but the file isn't shrinked.  With "non-live"
>> commit I don't see a reason why it can't be shrinked too?
> 
> This would be a bug as well, but Michael double-checked and it does
> shrink in fact.

Actually it WAS a bug.  Initially I tested with 2.1 version,
and there, neither `commit' HMP command nor `qemu-img commit'
command shrinks the overlay image, so each new commit re-writes
both the new data AND the old data, and never shrinks.  With
2.2 version it works as expected, except of the above problem
with rewriting the base file name.

Thanks,

/mjt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] block-commit & dropping privs
  2015-04-03  3:59                   ` Jeff Cody
@ 2015-04-07  9:18                     ` Kevin Wolf
  0 siblings, 0 replies; 20+ messages in thread
From: Kevin Wolf @ 2015-04-07  9:18 UTC (permalink / raw)
  To: Jeff Cody; +Cc: Michael Tokarev, qemu-devel, qemu-block

Am 03.04.2015 um 05:59 hat Jeff Cody geschrieben:
> On Thu, Apr 02, 2015 at 01:24:02PM +0200, Kevin Wolf wrote:
> > Am 02.04.2015 um 12:58 hat Michael Tokarev geschrieben:
> > > When performing commit, does qemu mark the areas in the
> > > overlay file as free after writing contents to the backing
> > > file, or will these areas be written again by a subsequent
> > > commit?  Somehow it smells like each next commit writes
> > > more and more data and completes in more and more time.
> > 
> > With qcow2 and qcow, the committed data is discarded with HMP 'commit'.
> > Other image formats keep the copy.
> > 
> > Live commit always keeps the data in the parent image, but it drops that
> > image from the backing file chain, so effectively it is dropped as well
> > (but you need to remove the file manually to reclaim the space).
> > 
> > Jeff, this sounds like we need to make some changes to unify this. That
> > might mean introducing an option that decides whether an image should be
> > dropped from the chain or emptied.
> 
> Yes, I think this would be a good idea.  I can think of scenarios
> where we would want to keep the original overlay around (i.e., we
> still want a snapshot overlay, but just want to consolidate data).
> But I can't think of any reason we would want to keep a stale
> populated overlay file around.

Well, I can think of a reason, I just don't know if it's relevant in
practice. Imagine a remote backing file where the local file serves
mostly as a cache, perhaps with copy-on-read enabled. Dropping all the
local content just to copy it again on the next access wouldn't be very
useful.

> > Once live commit can provide the same
> > as HMP commit, we should change HMP commit to reuse the live commit
> > code.
> > 
> 
> Hmm. My concern here is there may be times we want the behavior that
> HMP commit provides - faster offline active layer commit, rather than
> a mirror-like operation.  I guess if we do want that behavior,
> however, we could always introduce it as an option to block-commit.
> So yes, let's provide an option to remove or empty the committed
> overlays (applies to all overlays, if there are multiple overlays
> being committed in the chain), and then have HMP commit reuse the live
> commit code.

HMP commit should behave similar enough to vm_stop(), start commit block
job, nested event loop until commit completes, vm_start().

Except maybe if there is already a block job running. You wouldn't be
able to commit then until we support multiple block jobs per device.
(And once we do, you probably want to pause all of the block jobs while
committing to retain the original HMP behaviour.)

Kevin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] block-commit & dropping privs
  2015-04-06 15:37                       ` Michael Tokarev
@ 2015-04-07  9:24                         ` Kevin Wolf
  0 siblings, 0 replies; 20+ messages in thread
From: Kevin Wolf @ 2015-04-07  9:24 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: jcody, qemu-devel, qemu-block

Am 06.04.2015 um 17:37 hat Michael Tokarev geschrieben:
> 02.04.2015 16:19, Kevin Wolf wrote:
> > Am 02.04.2015 um 14:04 hat Michael Tokarev geschrieben:
> >> 02.04.2015 14:24, Kevin Wolf wrote:
> >> []
> >>>> But overall, I think qemu-system should not modify backing
> >>>> file name in this case.
> >>>
> >>> So you would leave the backing file with the data that you just
> >>> committed down one level in your backing file chain? Wouldn't that
> >>> defeat the whole purpose of committing?
> >>
> >> Um.  I don't think we understood each other.
> >>
> >> I experimented with the "non-live" HMP commit command.  This
> >> one effectively empties everything in the overlay file,
> >> propagating it to the backing file, but keeps the (now
> >> empty) overlay.  So from the stacking perspective nothing
> >> has changed.  Yet, together with with propagation, it also
> >> modifies the overlay file headers and writes a new name
> >> of the backing file -- the one it currently uses, which,
> >> in my case, is virtual /dev/fdset/foo.  It should keep
> >> the original file name in there, such as win.raw, unless
> >> explicitly asked to write a different name there.
> >>
> >> If the stack chain were to be modified by commit command,
> >> yes, the new name should be recorded ofcourse, such as
> >> after rebase.  But since stack chain is not modified,
> >> filename should not be modified either.
> > 
> > For the record, we discussed this on IRC:
> > 
> > Yes, we did talk past each other because HMP commit isn't supposed to
> > touch the backing file name at all, so I didn't expect such behaviour,
> > yet Michael saw it.
> > 
> > The reason is a bug in qcow2_update_header(). Instead of rewriting the
> > same value as is already in the image, it writes bs->backing_file to the
> > image. This was always the same as long as you couldn't override the
> > backing file name on the command line or in blockdev-add, but now it's
> > obviously wrong.
> > 
> > It would also rewrite the backing file name on other occasions such as
> > marking the image dirty with lazy_refcounts=on (i.e. on the first
> > write request).
> 
> Kevin, did you have a chance to fix this prob (for 2.3)?
> 2.1 version does the right thing here.

I started working on it on Thursday, but as both Good Friday and Easter
Monday are public holidays in Germany, it's not ready yet. I hope to get
it ready in time for -rc3.

2.1 does the "right" thing because it doesn't support bdrv_make_empty()
for qcow2 yet, which is the function that triggers the buggy header
rewrite. But without it, your image can't shrink.

> >>>> When performing commit, does qemu mark the areas in the
> >>>> overlay file as free after writing contents to the backing
> >>>> file, or will these areas be written again by a subsequent
> >>>> commit?  Somehow it smells like each next commit writes
> >>>> more and more data and completes in more and more time.
> >>>
> >>> With qcow2 and qcow, the committed data is discarded with HMP 'commit'.
> >>> Other image formats keep the copy.
> >>
> >> Hm.  It is discarded, but the file isn't shrinked.  With "non-live"
> >> commit I don't see a reason why it can't be shrinked too?
> > 
> > This would be a bug as well, but Michael double-checked and it does
> > shrink in fact.
> 
> Actually it WAS a bug.  Initially I tested with 2.1 version,
> and there, neither `commit' HMP command nor `qemu-img commit'
> command shrinks the overlay image, so each new commit re-writes
> both the new data AND the old data, and never shrinks.  With
> 2.2 version it works as expected, except of the above problem
> with rewriting the base file name.

I would call it a new feature rather than a bug fix, but yes, that's new
in 2.2 (see above).

Kevin

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2015-04-07  9:24 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-27  9:07 [Qemu-devel] block-commit & dropping privs Michael Tokarev
2015-03-27 14:49 ` Eric Blake
2015-03-27 15:36   ` Michael Tokarev
2015-03-27 17:12     ` Eric Blake
2015-03-30 15:36       ` Kevin Wolf
2015-04-01  9:26         ` Michael Tokarev
2015-04-01  9:54           ` Michael Tokarev
2015-04-01 12:34             ` Kevin Wolf
2015-04-02 10:58               ` Michael Tokarev
2015-04-02 11:24                 ` Kevin Wolf
2015-04-02 12:04                   ` Michael Tokarev
2015-04-02 13:07                     ` Eric Blake
2015-04-03  4:28                       ` Jeff Cody
2015-04-03 19:49                         ` Eric Blake
2015-04-03 19:57                           ` Jeff Cody
2015-04-02 13:19                     ` Kevin Wolf
2015-04-06 15:37                       ` Michael Tokarev
2015-04-07  9:24                         ` Kevin Wolf
2015-04-03  3:59                   ` Jeff Cody
2015-04-07  9:18                     ` Kevin Wolf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.