On 01/10/2018 10:19 AM, Richard Palethorpe wrote:
> Hello Eric & Peter,
> 
> Eric Blake writes:
> 
>> On 01/07/2018 06:23 AM, Richard Palethorpe wrote:
>>> Add QAPI wrapper functions for the existing snapshot functionality. These
>>> functions behave the same way as the HMP savevm, loadvm and delvm
>>> commands. This will allow applications, such as OpenQA, to programmatically
>>> revert the VM to a previous state with no dependence on HMP or qemu-img.
>>
>> That's already possible; libvirt uses QMP's human-monitor-command to
>> access these HMP commands programmatically.
> 
> That is what we are currently doing and is an improvement over
> programmatically using HMP, but I wanted to improve upon
> that. Occasionally saving or loading snapshots fails and it is not clear
> why.

A straightforward mapping of the existing HMP commands into QMP, without
any thought about the design, won't make the errors any clearer. My
argument is that any QMP interface for managing internal snapshots must
be well-designed, but since we discourage internal snapshots, no one
has been actively working on that design.

> 
>>
>> We've had discussions in the past about what it would take to have
>> specific QMP commands for these operations; the biggest problem is that
>> these commands promote the use of internal snapshots, and there are
>> enough performance and other issues with internal snapshots that we are
>> not yet ready to commit to a long-term interface for making their use
>> easier.  At this point, our recommendation is to prefer external
>> snapshots.
> 
> I don't think there are any issues with using external snapshots for the
> use case I have in mind, so long as they can contain the CPU & RAM
> state. All that is really required is a good clean way of reverting to a
> previous state (where the VM is running). I don't see in the
> documentation or code that blockdev-snapshot-* or any other command can
> save the CPU & RAM state to a file then load it along with a storage
> snapshot?

The trick is to time a blockdev external snapshot with a
migration-to-file as your point-in-time live snapshot; rollback then
means reverting to the right disk state before starting via incoming
migration from that saved CPU state.
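
As a rough sketch of that pairing (not necessarily the exact ordering
libvirt uses), the QMP commands involved could be assembled like this.
The device name and file paths are hypothetical, and the caller is
assumed to wait for the migration to finish before issuing the disk
snapshot:

```python
import json
import shlex


def live_snapshot_commands(device, overlay_path, state_path):
    """Return the QMP commands, in order, for one point-in-time snapshot.

    Assumption: the caller sends these one at a time and polls
    'query-migrate' until the migration completes before sending
    'blockdev-snapshot-sync'.
    """
    return [
        # Pause the guest so CPU/RAM and disk state stay consistent.
        {"execute": "stop"},
        # Save CPU + RAM state to a file through the exec: transport.
        {"execute": "migrate",
         "arguments": {"uri": f"exec:cat > {shlex.quote(state_path)}"}},
        # Freeze the current image and redirect new writes to an overlay.
        {"execute": "blockdev-snapshot-sync",
         "arguments": {"device": device,
                       "snapshot-file": overlay_path,
                       "format": "qcow2"}},
        # Resume the guest on top of the new overlay.
        {"execute": "cont"},
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in live_snapshot_commands("drive0", "/tmp/overlay1.qcow2",
                                      "/tmp/vmstate1"):
        print(json.dumps(cmd))
```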

Libvirt has managed to combine blockdev-snapshot (external snapshots)
plus migration to file in order to properly save CPU + RAM state to a
combination of files (admittedly, it is more files to manage than what
you get with a single internal snapshot that captures CPU + RAM + disk
state all into a single qcow2 image, but the building blocks allow more
flexibility).  What libvirt has not done well (yet) is rolling back to
a particular live snapshot; it has ended up documenting hacks where a
user has to manually massage their domain XML to point back to the
correct disk image when restoring state from a given CPU state.
But again, all the pieces are there, and it's more a matter of just
wiring them up correctly.
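
For illustration only, the manual rollback amounts to something like
the following sketch. It assumes the image frozen at snapshot time is
left untouched and a fresh overlay is created on top of it; the file
names are hypothetical, and the rest of the QEMU command line (not
shown) has to match the machine that saved the state:

```python
import shlex


def rollback_plan(frozen_image, new_overlay, state_path):
    """Return (qemu-img argv, extra QEMU argv) for rolling back.

    Assumption: 'frozen_image' is the qcow2 file that stopped receiving
    writes when the external snapshot was taken.
    """
    # New overlay backed by the frozen image, so that image is preserved.
    create_overlay = ["qemu-img", "create", "-f", "qcow2",
                      "-b", frozen_image, "-F", "qcow2", new_overlay]
    # QEMU, started with new_overlay as its disk, loads CPU + RAM state
    # from the saved file via incoming migration.
    incoming = ["-incoming", f"exec:cat {shlex.quote(state_path)}"]
    return create_overlay, incoming


if __name__ == "__main__":
    create, incoming = rollback_plan("/tmp/base.qcow2",
                                     "/tmp/rollback.qcow2",
                                     "/tmp/vmstate1")
    print(" ".join(create))
    print(" ".join(incoming))
```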

> 
> Perhaps instead we could use migrate with exec or fd URIs as Peter
> suggests. So we could do a differential migration to a file, then
> continue until the guest enters a bad state, then do an incoming
> migration to go back to the file.
> 
> I don't think that is ideal however, especially from a usability point
> of view. Probably many users approaching the problem will just choose to
> use loadvm/savevm.

As long as a management app hides the complexity, external disk
snapshots coupled with external CPU state saved by migration-to-file
are just as full-featured as internal snapshots.  Yes, it's not as
convenient at the low level as loadvm/savevm using internal snapshots,
but it is more powerful, better tested, and less error prone.


> 
> Savevm and loadvm are actually very easy to use in our particular use
> case. However using a transaction comprised of lower-level commands
> should not be too difficult either. Although for the common case it
> might be a good idea to have a simpler command which wraps the
> transaction. Whether it defaults to an internal location or requires an
> external file path is not a problem for me at least. The issue seems to
> be just adding a QMP command to take a snapshot of the CPU state,

That exists, in the form of 'migrate'.

> similar to the blockdev-snapshot commands, and allowing it to be part of
> a transaction.

Having 'migrate' be part of a block transaction is not implemented, but
given that libvirt has already figured out how to time the combination
of the two commands (block snapshots + migration to file) to get a
consistent snapshot, I'm not sure that there is any urgent need to
improve 'migrate' to work inside 'transaction'.
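
Timing the two commands by hand mostly comes down to waiting for the
migration to finish before touching the disks. A minimal sketch of that
wait, assuming a caller-supplied function (hypothetical here) that
issues 'query-migrate' and returns its reply as a dict:

```python
import time


def wait_for_migration(query_migrate, poll_interval=0.5, timeout=300.0):
    """Block until 'query-migrate' reports that migration completed.

    'query_migrate' is a hypothetical callable returning the "return"
    dict of a 'query-migrate' reply, e.g. {"status": "active"}.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = query_migrate().get("status")
        if status == "completed":
            return
        if status in ("failed", "cancelled"):
            raise RuntimeError(f"migration ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("migration did not complete in time")
        time.sleep(poll_interval)
```

Only once this returns would the management app take the external disk
snapshot and resume the guest.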

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org