* [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP
@ 2020-07-02 17:57 Daniel P. Berrangé
  2020-07-02 17:57 ` [PATCH 1/6] migration: improve error reporting of block driver state name Daniel P. Berrangé
                   ` (8 more replies)
  0 siblings, 9 replies; 47+ messages in thread
From: Daniel P. Berrangé @ 2020-07-02 17:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Peter Krempa, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini, Max Reitz

When QMP was first introduced some 10+ years ago now, the snapshot
related commands (savevm/loadvm/delvm) were not converted. This was
primarily because their implementation causes blocking of the thread
running the monitor commands. This was (and still is) considered
undesirable behaviour both in HMP and QMP.

In theory someone was supposed to fix this flaw at some point in the
past 10 years and bring them into the QMP world. Sadly, thus far it
hasn't happened, as people always had more important things to work
on. Enterprise apps were much more interested in external snapshots
than internal snapshots, since external snapshots offer many more
features.

Meanwhile users still want to use internal snapshots, as there is
a certain simplicity in having everything self-contained in one
image, even though it has limitations. Thus apps end up executing
savevm/loadvm/delvm via the "human-monitor-command" QMP passthrough
command.
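Concretely, the passthrough that apps send over the QMP socket wraps the
raw HMP command line in JSON. A minimal sketch of building such a message
(plain Python; the helper name is just for illustration):

```python
import json

def hmp_passthrough(command_line):
    """Build the QMP "human-monitor-command" message that apps use today
    to reach HMP-only commands such as savevm/loadvm/delvm."""
    return json.dumps({
        "execute": "human-monitor-command",
        "arguments": {"command-line": command_line},
    })

msg = hmp_passthrough("savevm snap1")
```

The reply comes back as an opaque string in "return", which is exactly
why error reporting and introspection suffer with this approach.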


IOW, the problematic blocking behaviour that was one of the reasons
for not having savevm/loadvm/delvm in QMP is experienced by applications
regardless. By not porting the commands to QMP due to one design flaw,
we've forced apps and users to suffer from the other design flaws of HMP
(bad error reporting, no strong type checking of args, no introspection)
for an additional 10 years. This feels rather sub-optimal :-(

In practice users don't appear to care strongly about the fact that these
commands block the VM while they run. I might have seen one bug report
about it, but it certainly isn't something that comes up as a frequent
topic except among us QEMU maintainers. Users do care about having
access to the snapshot feature.

Where I am seeing frequent complaints is wrt the use of OVMF combined
with snapshots, which has some serious pain points. This is getting worse
as the push to ditch legacy BIOS in favour of UEFI gains momentum both
across OS vendors and mgmt apps. Solving it requires new parameters to
the commands, but doing this in HMP is super unappealing.



After 10 years, I think it is time for us to be a little pragmatic about
our handling of snapshots commands. My desire is that libvirt should never
use "human-monitor-command" under any circumstances, because of the
inherent flaws in HMP as a protocol for machine consumption. If there
are flaws in QMP commands that's fine. If we fix them in future, we can
deprecate the current QMP commands and remove them not too long after,
without being locked in forever.


Thus in this series I'm proposing a direct 1-1 mapping of the existing
HMP commands for savevm/loadvm/delvm into QMP as a first step. This does
not solve the blocking thread problem, but it does eliminate the error
reporting, type checking and introspection problems inherent to HMP.
We're winning on 3 out of the 4 long term problems.

If someone can suggest an easy way to fix the thread blocking problem
too, I'd be interested to hear it. If it involves a major refactoring,
then I think users are better served by unlocking what look like easy
wins today.

With a QMP variant, we can reasonably deal with the problems related to OVMF:

 - The logic to pick which disk to store the vmstate in is not
   satisfactory.

   The first block driver state cannot be assumed to be the root disk
   image; it might be the OVMF varstore, and we don't want to store
   vmstate in there.

 - The logic to decide which disks must be snapshotted is hardwired
   to include all writable disks.

   Again with OVMF there might be a writable varstore, but this can be
   raw rather than qcow2 format, and thus unable to be snapshotted.
   While users might wish to snapshot their varstore, in some/many/most
   cases it is entirely unnecessary. Users are nonetheless blocked from
   snapshotting their VM due to this varstore.

These are solved by adding two parameters to the commands. The first is
a block device node name that identifies the image to store vmstate in,
and the second is a list of node names to exclude from snapshots.
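As a sketch of what such an invocation could look like on the wire — the
parameter names "vmstate" and "exclude" here are assumptions made for
illustration, not necessarily the final QAPI names:

```python
import json

def proposed_savevm(tag, vmstate_node, exclude_nodes):
    # Hypothetical wire form of the extended command: "vmstate" picks the
    # node that stores the VM state, "exclude" lists node names to skip.
    # Both parameter names are assumed for this sketch.
    return json.dumps({
        "execute": "savevm",
        "arguments": {
            "tag": tag,
            "vmstate": vmstate_node,
            "exclude": exclude_nodes,
        },
    })

msg = proposed_savevm("snap1", "root-disk", ["ovmf-varstore"])
```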

In the block code I've only dealt with node names for block devices, as
IIUC this is all that libvirt should need in the -blockdev world it now
lives in. IOW, I've made no attempt to cope with people wanting to use
these QMP commands in combination with -drive args.

I've done some minimal work in libvirt to start to make use of the new
commands to validate their functionality, but this isn't finished yet.

My ultimate goal is to make the GNOME Boxes maintainer happy again by
having internal snapshots work with OVMF:

  https://gitlab.gnome.org/GNOME/gnome-boxes/-/commit/c486da262f6566326fbcb5ef45c5f64048f16a6e

Daniel P. Berrangé (6):
  migration: improve error reporting of block driver state name
  migration: introduce savevm, loadvm, delvm QMP commands
  block: add ability to filter out blockdevs during snapshot
  block: allow specifying name of block device for vmstate storage
  migration: support excluding block devs in QMP snapshot commands
  migration: support picking vmstate disk in QMP snapshot commands

 block/monitor/block-hmp-cmds.c |  4 +-
 block/snapshot.c               | 68 +++++++++++++++++++------
 include/block/snapshot.h       | 21 +++++---
 include/migration/snapshot.h   | 10 +++-
 migration/savevm.c             | 71 +++++++++++++++++++-------
 monitor/hmp-cmds.c             | 20 ++------
 qapi/migration.json            | 91 ++++++++++++++++++++++++++++++++++
 replay/replay-snapshot.c       |  4 +-
 softmmu/vl.c                   |  2 +-
 9 files changed, 228 insertions(+), 63 deletions(-)

-- 
2.26.2




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 1/6] migration: improve error reporting of block driver state name
  2020-07-02 17:57 [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP Daniel P. Berrangé
@ 2020-07-02 17:57 ` Daniel P. Berrangé
  2020-07-02 18:36   ` Eric Blake
  2020-07-02 17:57 ` [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands Daniel P. Berrangé
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 47+ messages in thread
From: Daniel P. Berrangé @ 2020-07-02 17:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Peter Krempa, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini, Max Reitz

With blockdev, a BlockDriverState may not have a device name,
so using a node name is required as an alternative.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 migration/savevm.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index b979ea6e7f..72dbad95ed 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2648,7 +2648,7 @@ int save_snapshot(const char *name, Error **errp)
 
     if (!bdrv_all_can_snapshot(&bs)) {
         error_setg(errp, "Device '%s' is writable but does not support "
-                   "snapshots", bdrv_get_device_name(bs));
+                   "snapshots", bdrv_get_device_or_node_name(bs));
         return ret;
     }
 
@@ -2657,7 +2657,7 @@ int save_snapshot(const char *name, Error **errp)
         ret = bdrv_all_delete_snapshot(name, &bs1, errp);
         if (ret < 0) {
             error_prepend(errp, "Error while deleting snapshot on device "
-                          "'%s': ", bdrv_get_device_name(bs1));
+                          "'%s': ", bdrv_get_device_or_node_name(bs1));
             return ret;
         }
     }
@@ -2728,7 +2728,7 @@ int save_snapshot(const char *name, Error **errp)
     ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, &bs);
     if (ret < 0) {
         error_setg(errp, "Error while creating snapshot on '%s'",
-                   bdrv_get_device_name(bs));
+                   bdrv_get_device_or_node_name(bs));
         goto the_end;
     }
 
@@ -2846,14 +2846,14 @@ int load_snapshot(const char *name, Error **errp)
     if (!bdrv_all_can_snapshot(&bs)) {
         error_setg(errp,
                    "Device '%s' is writable but does not support snapshots",
-                   bdrv_get_device_name(bs));
+                   bdrv_get_device_or_node_name(bs));
         return -ENOTSUP;
     }
     ret = bdrv_all_find_snapshot(name, &bs);
     if (ret < 0) {
         error_setg(errp,
                    "Device '%s' does not have the requested snapshot '%s'",
-                   bdrv_get_device_name(bs), name);
+                   bdrv_get_device_or_node_name(bs), name);
         return ret;
     }
 
@@ -2882,7 +2882,7 @@ int load_snapshot(const char *name, Error **errp)
     ret = bdrv_all_goto_snapshot(name, &bs, errp);
     if (ret < 0) {
         error_prepend(errp, "Could not load snapshot '%s' on '%s': ",
-                      name, bdrv_get_device_name(bs));
+                      name, bdrv_get_device_or_node_name(bs));
         goto err_drain;
     }
 
-- 
2.26.2




* [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands
  2020-07-02 17:57 [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP Daniel P. Berrangé
  2020-07-02 17:57 ` [PATCH 1/6] migration: improve error reporting of block driver state name Daniel P. Berrangé
@ 2020-07-02 17:57 ` Daniel P. Berrangé
  2020-07-02 18:12   ` Eric Blake
  2020-07-03 17:22   ` Denis V. Lunev
  2020-07-02 17:57 ` [PATCH 3/6] block: add ability to filter out blockdevs during snapshot Daniel P. Berrangé
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 47+ messages in thread
From: Daniel P. Berrangé @ 2020-07-02 17:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Peter Krempa, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini, Max Reitz

savevm, loadvm and delvm are some of the few commands that have never
been converted to use QMP. The primary reason for this lack of
conversion is that they block execution of the thread for as long as
they run.

Despite this downside, however, libvirt and applications using libvirt
have used these commands for as long as QMP has existed, via the
"human-monitor-command" passthrough command. IOW, while it is clearly
desirable to be able to fix the blocking problem, this is not an
immediate obstacle to real world usage.

Meanwhile there is a need for other features which involve adding new
parameters to the commands. This is possible with HMP passthrough, but
it provides no reliable way for apps to introspect features, so using
QAPI modelling is highly desirable.

This patch thus introduces trivial savevm, loadvm, delvm commands
to QMP that are functionally identical to their HMP counterparts,
including the blocking problem.
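On the wire, the new native command is then a direct QMP equivalent of
the HMP one. Note that QMP places command parameters under an "arguments"
key; a minimal Python sketch of building the message:

```python
import json

def qmp_savevm(tag):
    # Native QMP savevm: a 1-1 mapping of the HMP command, with the
    # snapshot tag as the only parameter.
    return json.dumps({"execute": "savevm", "arguments": {"tag": tag}})

msg = qmp_savevm("my-snap")
```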

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 migration/savevm.c  | 27 ++++++++++++++++
 monitor/hmp-cmds.c  | 18 ++---------
 qapi/migration.json | 76 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 106 insertions(+), 15 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 72dbad95ed..53586a6406 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2943,3 +2943,30 @@ bool vmstate_check_only_migratable(const VMStateDescription *vmsd)
 
     return !(vmsd && vmsd->unmigratable);
 }
+
+void qmp_savevm(const char *tag, Error **errp)
+{
+    save_snapshot(tag, errp);
+}
+
+void qmp_loadvm(const char *tag, Error **errp)
+{
+    int saved_vm_running  = runstate_is_running();
+
+    vm_stop(RUN_STATE_RESTORE_VM);
+
+    if (load_snapshot(tag, errp) == 0 && saved_vm_running) {
+        vm_start();
+    }
+}
+
+void qmp_delvm(const char *tag, Error **errp)
+{
+    BlockDriverState *bs;
+
+    if (bdrv_all_delete_snapshot(tag, &bs, errp) < 0) {
+        error_prepend(errp,
+                      "deleting snapshot on device '%s': ",
+                      bdrv_get_device_or_node_name(bs));
+    }
+}
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 2b0b58a336..26a5a1a701 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1089,15 +1089,9 @@ void hmp_balloon(Monitor *mon, const QDict *qdict)
 
 void hmp_loadvm(Monitor *mon, const QDict *qdict)
 {
-    int saved_vm_running  = runstate_is_running();
-    const char *name = qdict_get_str(qdict, "name");
     Error *err = NULL;
 
-    vm_stop(RUN_STATE_RESTORE_VM);
-
-    if (load_snapshot(name, &err) == 0 && saved_vm_running) {
-        vm_start();
-    }
+    qmp_loadvm(qdict_get_str(qdict, "name"), &err);
     hmp_handle_error(mon, err);
 }
 
@@ -1105,21 +1099,15 @@ void hmp_savevm(Monitor *mon, const QDict *qdict)
 {
     Error *err = NULL;
 
-    save_snapshot(qdict_get_try_str(qdict, "name"), &err);
+    qmp_savevm(qdict_get_try_str(qdict, "name"), &err);
     hmp_handle_error(mon, err);
 }
 
 void hmp_delvm(Monitor *mon, const QDict *qdict)
 {
-    BlockDriverState *bs;
     Error *err = NULL;
-    const char *name = qdict_get_str(qdict, "name");
 
-    if (bdrv_all_delete_snapshot(name, &bs, &err) < 0) {
-        error_prepend(&err,
-                      "deleting snapshot on device '%s': ",
-                      bdrv_get_device_name(bs));
-    }
+    qmp_delvm(qdict_get_str(qdict, "name"), &err);
     hmp_handle_error(mon, err);
 }
 
diff --git a/qapi/migration.json b/qapi/migration.json
index d5000558c6..849de38fb0 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1621,3 +1621,79 @@
 ##
 { 'event': 'UNPLUG_PRIMARY',
   'data': { 'device-id': 'str' } }
+
+##
+# @savevm:
+#
+# Save a VM snapshot
+#
+# @tag: name of the snapshot to create. If it already
+# exists it will be replaced.
+#
+# Note that execution of the VM will be paused during the time
+# it takes to save the snapshot
+#
+# Returns: nothing
+#
+# Example:
+#
+# -> { "execute": "savevm",
+#      "arguments": {
+#         "tag": "my-snap"
+#      }
+#    }
+# <- { "return": { } }
+#
+# Since: 5.2
+##
+{ 'command': 'savevm',
+  'data': { 'tag': 'str' } }
+
+##
+# @loadvm:
+#
+# Load a VM snapshot
+#
+# @tag: name of the snapshot to load.
+#
+# Returns: nothing
+#
+# Example:
+#
+# -> { "execute": "loadvm",
+#      "arguments": {
+#         "tag": "my-snap"
+#      }
+#    }
+# <- { "return": { } }
+#
+# Since: 5.2
+##
+{ 'command': 'loadvm',
+  'data': { 'tag': 'str' } }
+
+##
+# @delvm:
+#
+# Delete a VM snapshot
+#
+# @tag: name of the snapshot to delete.
+#
+# Note that execution of the VM will be paused during the time
+# it takes to delete the snapshot
+#
+# Returns: nothing
+#
+# Example:
+#
+# -> { "execute": "delvm",
+#      "arguments": {
+#         "tag": "my-snap"
+#      }
+#    }
+# <- { "return": { } }
+#
+# Since: 5.2
+##
+{ 'command': 'delvm',
+  'data': { 'tag': 'str' } }
-- 
2.26.2




* [PATCH 3/6] block: add ability to filter out blockdevs during snapshot
  2020-07-02 17:57 [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP Daniel P. Berrangé
  2020-07-02 17:57 ` [PATCH 1/6] migration: improve error reporting of block driver state name Daniel P. Berrangé
  2020-07-02 17:57 ` [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands Daniel P. Berrangé
@ 2020-07-02 17:57 ` Daniel P. Berrangé
  2020-07-02 17:57 ` [PATCH 4/6] block: allow specifying name of block device for vmstate storage Daniel P. Berrangé
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 47+ messages in thread
From: Daniel P. Berrangé @ 2020-07-02 17:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Peter Krempa, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini, Max Reitz

When executing snapshot logic, currently blockdevs are filtered out if
they are read-only or if there is no media present. In order to support
snapshotting when UEFI firmware is in use, we need the ability to filter
out the blockdev used for the firmware vars store. This can be achieved
by having a list of node names that should be excluded from
consideration for snapshots.
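The filtering rule this patch implements can be sketched outside QEMU as
follows (a Python mock of the bdrv_all_snapshots_includes_bs logic, not
the actual C code):

```python
def snapshot_includes_node(node_name, inserted, read_only, exclude_names):
    # Mirrors the patch's logic: skip nodes with no medium present or
    # that are read-only, then skip any node named on the exclusion list.
    if not inserted or read_only:
        return False
    return node_name not in exclude_names
```

With this, a raw-format OVMF varstore can be excluded by name while the
remaining writable qcow2 disks are snapshotted as before.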

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 block/monitor/block-hmp-cmds.c |  4 ++--
 block/snapshot.c               | 41 ++++++++++++++++++++++------------
 include/block/snapshot.h       | 19 +++++++++-------
 migration/savevm.c             | 18 +++++++--------
 4 files changed, 49 insertions(+), 33 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 4c8c375172..0ee6e7a4be 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -899,7 +899,7 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
     ImageEntry *image_entry, *next_ie;
     SnapshotEntry *snapshot_entry;
 
-    bs = bdrv_all_find_vmstate_bs();
+    bs = bdrv_all_find_vmstate_bs(NULL);
     if (!bs) {
         monitor_printf(mon, "No available block device supports snapshots\n");
         return;
@@ -951,7 +951,7 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
     total = 0;
     for (i = 0; i < nb_sns; i++) {
         SnapshotEntry *next_sn;
-        if (bdrv_all_find_snapshot(sn_tab[i].name, &bs1) == 0) {
+        if (bdrv_all_find_snapshot(sn_tab[i].name, NULL, &bs1) == 0) {
             global_snapshots[total] = i;
             total++;
             QTAILQ_FOREACH(image_entry, &image_list, next) {
diff --git a/block/snapshot.c b/block/snapshot.c
index bd9fb01817..c8950b0b86 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -385,12 +385,22 @@ int bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState *bs,
     return ret;
 }
 
-static bool bdrv_all_snapshots_includes_bs(BlockDriverState *bs)
+static bool bdrv_all_snapshots_includes_bs(BlockDriverState *bs,
+                                           strList *exclude_bs)
 {
+    const char *node_name = bdrv_get_node_name(bs);
+
     if (!bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) {
         return false;
     }
 
+    while (exclude_bs) {
+        if (g_str_equal(node_name, exclude_bs->value)) {
+            return false;
+        }
+        exclude_bs = exclude_bs->next;
+    }
+
     /* Include all nodes that are either in use by a BlockBackend, or that
      * aren't attached to any node, but owned by the monitor. */
     return bdrv_has_blk(bs) || QLIST_EMPTY(&bs->parents);
@@ -400,7 +410,7 @@ static bool bdrv_all_snapshots_includes_bs(BlockDriverState *bs)
  * These functions will properly handle dataplane (take aio_context_acquire
  * when appropriate for appropriate block drivers) */
 
-bool bdrv_all_can_snapshot(BlockDriverState **first_bad_bs)
+bool bdrv_all_can_snapshot(strList *exclude_bs, BlockDriverState **first_bad_bs)
 {
     bool ok = true;
     BlockDriverState *bs;
@@ -410,7 +420,7 @@ bool bdrv_all_can_snapshot(BlockDriverState **first_bad_bs)
         AioContext *ctx = bdrv_get_aio_context(bs);
 
         aio_context_acquire(ctx);
-        if (bdrv_all_snapshots_includes_bs(bs)) {
+        if (bdrv_all_snapshots_includes_bs(bs, exclude_bs)) {
             ok = bdrv_can_snapshot(bs);
         }
         aio_context_release(ctx);
@@ -425,8 +435,8 @@ fail:
     return ok;
 }
 
-int bdrv_all_delete_snapshot(const char *name, BlockDriverState **first_bad_bs,
-                             Error **errp)
+int bdrv_all_delete_snapshot(const char *name, strList *exclude_bs,
+                             BlockDriverState **first_bad_bs, Error **errp)
 {
     int ret = 0;
     BlockDriverState *bs;
@@ -437,7 +447,7 @@ int bdrv_all_delete_snapshot(const char *name, BlockDriverState **first_bad_bs,
         AioContext *ctx = bdrv_get_aio_context(bs);
 
         aio_context_acquire(ctx);
-        if (bdrv_all_snapshots_includes_bs(bs) &&
+        if (bdrv_all_snapshots_includes_bs(bs, exclude_bs) &&
             bdrv_snapshot_find(bs, snapshot, name) >= 0)
         {
             ret = bdrv_snapshot_delete(bs, snapshot->id_str,
@@ -456,8 +466,8 @@ fail:
 }
 
 
-int bdrv_all_goto_snapshot(const char *name, BlockDriverState **first_bad_bs,
-                           Error **errp)
+int bdrv_all_goto_snapshot(const char *name, strList *exclude_bs,
+                           BlockDriverState **first_bad_bs, Error **errp)
 {
     int ret = 0;
     BlockDriverState *bs;
@@ -467,7 +477,7 @@ int bdrv_all_goto_snapshot(const char *name, BlockDriverState **first_bad_bs,
         AioContext *ctx = bdrv_get_aio_context(bs);
 
         aio_context_acquire(ctx);
-        if (bdrv_all_snapshots_includes_bs(bs)) {
+        if (bdrv_all_snapshots_includes_bs(bs, exclude_bs)) {
             ret = bdrv_snapshot_goto(bs, name, errp);
         }
         aio_context_release(ctx);
@@ -482,7 +492,8 @@ fail:
     return ret;
 }
 
-int bdrv_all_find_snapshot(const char *name, BlockDriverState **first_bad_bs)
+int bdrv_all_find_snapshot(const char *name, strList *exclude_bs,
+                           BlockDriverState **first_bad_bs)
 {
     QEMUSnapshotInfo sn;
     int err = 0;
@@ -493,7 +504,7 @@ int bdrv_all_find_snapshot(const char *name, BlockDriverState **first_bad_bs)
         AioContext *ctx = bdrv_get_aio_context(bs);
 
         aio_context_acquire(ctx);
-        if (bdrv_all_snapshots_includes_bs(bs)) {
+        if (bdrv_all_snapshots_includes_bs(bs, exclude_bs)) {
             err = bdrv_snapshot_find(bs, &sn, name);
         }
         aio_context_release(ctx);
@@ -511,6 +522,7 @@ fail:
 int bdrv_all_create_snapshot(QEMUSnapshotInfo *sn,
                              BlockDriverState *vm_state_bs,
                              uint64_t vm_state_size,
+                             strList *exclude_bs,
                              BlockDriverState **first_bad_bs)
 {
     int err = 0;
@@ -524,7 +536,7 @@ int bdrv_all_create_snapshot(QEMUSnapshotInfo *sn,
         if (bs == vm_state_bs) {
             sn->vm_state_size = vm_state_size;
             err = bdrv_snapshot_create(bs, sn);
-        } else if (bdrv_all_snapshots_includes_bs(bs)) {
+        } else if (bdrv_all_snapshots_includes_bs(bs, exclude_bs)) {
             sn->vm_state_size = 0;
             err = bdrv_snapshot_create(bs, sn);
         }
@@ -540,7 +552,7 @@ fail:
     return err;
 }
 
-BlockDriverState *bdrv_all_find_vmstate_bs(void)
+BlockDriverState *bdrv_all_find_vmstate_bs(strList *exclude_bs)
 {
     BlockDriverState *bs;
     BdrvNextIterator it;
@@ -550,7 +562,8 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void)
         bool found;
 
         aio_context_acquire(ctx);
-        found = bdrv_all_snapshots_includes_bs(bs) && bdrv_can_snapshot(bs);
+        found = bdrv_all_snapshots_includes_bs(bs, exclude_bs) &&
+            bdrv_can_snapshot(bs);
         aio_context_release(ctx);
 
         if (found) {
diff --git a/include/block/snapshot.h b/include/block/snapshot.h
index 2bfcd57578..f20986ca37 100644
--- a/include/block/snapshot.h
+++ b/include/block/snapshot.h
@@ -25,7 +25,7 @@
 #ifndef SNAPSHOT_H
 #define SNAPSHOT_H
 
-
+#include "qapi/qapi-builtin-types.h"
 
 #define SNAPSHOT_OPT_BASE       "snapshot."
 #define SNAPSHOT_OPT_ID         "snapshot.id"
@@ -76,17 +76,20 @@ int bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState *bs,
  * These functions will properly handle dataplane (take aio_context_acquire
  * when appropriate for appropriate block drivers */
 
-bool bdrv_all_can_snapshot(BlockDriverState **first_bad_bs);
-int bdrv_all_delete_snapshot(const char *name, BlockDriverState **first_bsd_bs,
-                             Error **errp);
-int bdrv_all_goto_snapshot(const char *name, BlockDriverState **first_bad_bs,
-                           Error **errp);
-int bdrv_all_find_snapshot(const char *name, BlockDriverState **first_bad_bs);
+bool bdrv_all_can_snapshot(strList *exclude_bs,
+                           BlockDriverState **first_bad_bs);
+int bdrv_all_delete_snapshot(const char *name, strList *exclude_bs,
+                             BlockDriverState **first_bsd_bs, Error **errp);
+int bdrv_all_goto_snapshot(const char *name, strList *exclude_bs,
+                           BlockDriverState **first_bad_bs, Error **errp);
+int bdrv_all_find_snapshot(const char *name, strList *exclude_bs,
+                           BlockDriverState **first_bad_bs);
 int bdrv_all_create_snapshot(QEMUSnapshotInfo *sn,
                              BlockDriverState *vm_state_bs,
                              uint64_t vm_state_size,
+                             strList *exclude_bs,
                              BlockDriverState **first_bad_bs);
 
-BlockDriverState *bdrv_all_find_vmstate_bs(void);
+BlockDriverState *bdrv_all_find_vmstate_bs(strList *exclude_bs);
 
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index 53586a6406..4251aa0dde 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2646,7 +2646,7 @@ int save_snapshot(const char *name, Error **errp)
         return ret;
     }
 
-    if (!bdrv_all_can_snapshot(&bs)) {
+    if (!bdrv_all_can_snapshot(NULL, &bs)) {
         error_setg(errp, "Device '%s' is writable but does not support "
                    "snapshots", bdrv_get_device_or_node_name(bs));
         return ret;
@@ -2654,7 +2654,7 @@ int save_snapshot(const char *name, Error **errp)
 
     /* Delete old snapshots of the same name */
     if (name) {
-        ret = bdrv_all_delete_snapshot(name, &bs1, errp);
+        ret = bdrv_all_delete_snapshot(name, NULL, &bs1, errp);
         if (ret < 0) {
             error_prepend(errp, "Error while deleting snapshot on device "
                           "'%s': ", bdrv_get_device_or_node_name(bs1));
@@ -2662,7 +2662,7 @@ int save_snapshot(const char *name, Error **errp)
         }
     }
 
-    bs = bdrv_all_find_vmstate_bs();
+    bs = bdrv_all_find_vmstate_bs(NULL);
     if (bs == NULL) {
         error_setg(errp, "No block device can accept snapshots");
         return ret;
@@ -2725,7 +2725,7 @@ int save_snapshot(const char *name, Error **errp)
     aio_context_release(aio_context);
     aio_context = NULL;
 
-    ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, &bs);
+    ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, NULL, &bs);
     if (ret < 0) {
         error_setg(errp, "Error while creating snapshot on '%s'",
                    bdrv_get_device_or_node_name(bs));
@@ -2843,13 +2843,13 @@ int load_snapshot(const char *name, Error **errp)
         return -EINVAL;
     }
 
-    if (!bdrv_all_can_snapshot(&bs)) {
+    if (!bdrv_all_can_snapshot(NULL, &bs)) {
         error_setg(errp,
                    "Device '%s' is writable but does not support snapshots",
                    bdrv_get_device_or_node_name(bs));
         return -ENOTSUP;
     }
-    ret = bdrv_all_find_snapshot(name, &bs);
+    ret = bdrv_all_find_snapshot(name, NULL, &bs);
     if (ret < 0) {
         error_setg(errp,
                    "Device '%s' does not have the requested snapshot '%s'",
@@ -2857,7 +2857,7 @@ int load_snapshot(const char *name, Error **errp)
         return ret;
     }
 
-    bs_vm_state = bdrv_all_find_vmstate_bs();
+    bs_vm_state = bdrv_all_find_vmstate_bs(NULL);
     if (!bs_vm_state) {
         error_setg(errp, "No block device supports snapshots");
         return -ENOTSUP;
@@ -2879,7 +2879,7 @@ int load_snapshot(const char *name, Error **errp)
     /* Flush all IO requests so they don't interfere with the new state.  */
     bdrv_drain_all_begin();
 
-    ret = bdrv_all_goto_snapshot(name, &bs, errp);
+    ret = bdrv_all_goto_snapshot(name, NULL, &bs, errp);
     if (ret < 0) {
         error_prepend(errp, "Could not load snapshot '%s' on '%s': ",
                       name, bdrv_get_device_or_node_name(bs));
@@ -2964,7 +2964,7 @@ void qmp_delvm(const char *tag, Error **errp)
 {
     BlockDriverState *bs;
 
-    if (bdrv_all_delete_snapshot(tag, &bs, errp) < 0) {
+    if (bdrv_all_delete_snapshot(tag, NULL, &bs, errp) < 0) {
         error_prepend(errp,
                       "deleting snapshot on device '%s': ",
                       bdrv_get_device_or_node_name(bs));
-- 
2.26.2




* [PATCH 4/6] block: allow specifying name of block device for vmstate storage
  2020-07-02 17:57 [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP Daniel P. Berrangé
                   ` (2 preceding siblings ...)
  2020-07-02 17:57 ` [PATCH 3/6] block: add ability to filter out blockdevs during snapshot Daniel P. Berrangé
@ 2020-07-02 17:57 ` Daniel P. Berrangé
  2020-07-02 17:57 ` [PATCH 5/6] migration: support excluding block devs in QMP snapshot commands Daniel P. Berrangé
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 47+ messages in thread
From: Daniel P. Berrangé @ 2020-07-02 17:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Peter Krempa, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini, Max Reitz

Currently the vmstate will be stored in the first block device that
supports snapshots. Historically this would have usually been the
root device, but with UEFI it might be the variable store. There
needs to be a way to override the choice of block device to store
the state in.
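The selection logic this moves toward can be sketched as (a Python mock
under stated assumptions, not QEMU code): if the user names a node, that
exact node must support snapshots; otherwise fall back to the first
snapshottable node, as before.

```python
def find_vmstate_node(nodes, vmstate_name=None):
    """nodes: list of (name, can_snapshot) pairs, in scan order.
    Returns the chosen node name, or raises ValueError."""
    if vmstate_name is not None:
        for name, can_snapshot in nodes:
            if name == vmstate_name:
                if not can_snapshot:
                    raise ValueError(
                        "block device '%s' cannot accept snapshots" % name)
                return name
        raise ValueError("No block device can accept snapshots")
    for name, can_snapshot in nodes:
        if can_snapshot:
            return name
    raise ValueError("No block device can accept snapshots")
```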

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 block/monitor/block-hmp-cmds.c |  2 +-
 block/snapshot.c               | 33 +++++++++++++++++++++++++++++----
 include/block/snapshot.h       |  4 +++-
 migration/savevm.c             |  6 ++----
 4 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 0ee6e7a4be..a698fac027 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -899,7 +899,7 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
     ImageEntry *image_entry, *next_ie;
     SnapshotEntry *snapshot_entry;
 
-    bs = bdrv_all_find_vmstate_bs(NULL);
+    bs = bdrv_all_find_vmstate_bs(NULL, NULL, NULL);
     if (!bs) {
         monitor_printf(mon, "No available block device supports snapshots\n");
         return;
diff --git a/block/snapshot.c b/block/snapshot.c
index c8950b0b86..4774a1890d 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -552,7 +552,9 @@ fail:
     return err;
 }
 
-BlockDriverState *bdrv_all_find_vmstate_bs(strList *exclude_bs)
+BlockDriverState *bdrv_all_find_vmstate_bs(const char *vmstate_bs,
+                                           strList *exclude_bs,
+                                           Error **errp)
 {
     BlockDriverState *bs;
     BdrvNextIterator it;
@@ -560,16 +562,39 @@ BlockDriverState *bdrv_all_find_vmstate_bs(strList *exclude_bs)
     for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
         AioContext *ctx = bdrv_get_aio_context(bs);
         bool found;
+        Error *err = NULL;
 
         aio_context_acquire(ctx);
-        found = bdrv_all_snapshots_includes_bs(bs, exclude_bs) &&
-            bdrv_can_snapshot(bs);
+        if (vmstate_bs == NULL) {
+            found = bdrv_all_snapshots_includes_bs(bs, exclude_bs) &&
+                bdrv_can_snapshot(bs);
+        } else {
+            if (g_str_equal(vmstate_bs, bdrv_get_node_name(bs))) {
+                found = bdrv_all_snapshots_includes_bs(bs, exclude_bs) &&
+                    bdrv_can_snapshot(bs);
+                if (!found) {
+                    error_setg(&err,
+                               "block device '%s' cannot accept snapshots",
+                               vmstate_bs);
+                }
+            } else {
+                found = false;
+            }
+        }
         aio_context_release(ctx);
 
-        if (found) {
+        if (found || err) {
             bdrv_next_cleanup(&it);
+            if (err) {
+                error_propagate(errp, err);
+                return NULL;
+            }
             break;
         }
     }
+
+    if (!bs) {
+        error_setg(errp, "No block device can accept snapshots");
+    }
     return bs;
 }
diff --git a/include/block/snapshot.h b/include/block/snapshot.h
index f20986ca37..ff627df5cb 100644
--- a/include/block/snapshot.h
+++ b/include/block/snapshot.h
@@ -90,6 +90,8 @@ int bdrv_all_create_snapshot(QEMUSnapshotInfo *sn,
                              strList *exclude_bs,
                              BlockDriverState **first_bad_bs);
 
-BlockDriverState *bdrv_all_find_vmstate_bs(strList *exclude_bs);
+BlockDriverState *bdrv_all_find_vmstate_bs(const char *vmstate_bs,
+                                           strList *exclude_bs,
+                                           Error **errp);
 
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index 4251aa0dde..b11c6a882d 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2662,9 +2662,8 @@ int save_snapshot(const char *name, Error **errp)
         }
     }
 
-    bs = bdrv_all_find_vmstate_bs(NULL);
+    bs = bdrv_all_find_vmstate_bs(NULL, NULL, errp);
     if (bs == NULL) {
-        error_setg(errp, "No block device can accept snapshots");
         return ret;
     }
     aio_context = bdrv_get_aio_context(bs);
@@ -2857,9 +2856,8 @@ int load_snapshot(const char *name, Error **errp)
         return ret;
     }
 
-    bs_vm_state = bdrv_all_find_vmstate_bs(NULL);
+    bs_vm_state = bdrv_all_find_vmstate_bs(NULL, NULL, errp);
     if (!bs_vm_state) {
-        error_setg(errp, "No block device supports snapshots");
         return -ENOTSUP;
     }
     aio_context = bdrv_get_aio_context(bs_vm_state);
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 5/6] migration: support excluding block devs in QMP snapshot commands
  2020-07-02 17:57 [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP Daniel P. Berrangé
                   ` (3 preceding siblings ...)
  2020-07-02 17:57 ` [PATCH 4/6] block: allow specifying name of block device for vmstate storage Daniel P. Berrangé
@ 2020-07-02 17:57 ` Daniel P. Berrangé
  2020-07-06 15:57   ` Kevin Wolf
  2020-07-02 17:57 ` [PATCH 6/6] migration: support picking vmstate disk " Daniel P. Berrangé
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 47+ messages in thread
From: Daniel P. Berrangé @ 2020-07-02 17:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Peter Krempa, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini, Max Reitz

This wires up support for a new "exclude" parameter to the QMP commands
for snapshots (savevm, loadvm, delvm). This parameter accepts a list of
block driver state node names.

One use case for this would be a VM using OVMF firmware where the
variables store is a raw disk image. Ideally the variable store would be
qcow2, allowing its contents to be included in the snapshot, but
typically today the variable store is raw. It is still useful to be able
to snapshot VMs using OVMF, even if the varstore is excluded, as the
main OS disk content is usually the stuff the user cares about.
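From a management application's point of view, the new parameter might
be driven like this (a hypothetical Python sketch; the helper name is
invented, but the payload shape follows the documentation examples in
the schema below):

```python
import json

def build_snapshot_cmd(command, tag, exclude=None):
    """Build a payload for the savevm/loadvm/delvm commands as proposed
    in this patch. 'exclude' is the new optional list of block driver
    state node names; omit it to get the pre-existing behaviour."""
    data = {"tag": tag}
    if exclude:
        data["exclude"] = list(exclude)
    return {"execute": command, "data": data}

# Snapshot everything except the raw (non-snapshottable) OVMF varstore:
cmd = build_snapshot_cmd("savevm", "my-snap", exclude=["pflash0-vars"])
print(json.dumps(cmd, sort_keys=True))
# -> {"data": {"exclude": ["pflash0-vars"], "tag": "my-snap"}, "execute": "savevm"}
```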

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 include/migration/snapshot.h |  6 ++++--
 migration/savevm.c           | 38 +++++++++++++++++++++---------------
 monitor/hmp-cmds.c           |  6 +++---
 qapi/migration.json          | 21 ++++++++++++++------
 replay/replay-snapshot.c     |  4 ++--
 softmmu/vl.c                 |  2 +-
 6 files changed, 47 insertions(+), 30 deletions(-)

diff --git a/include/migration/snapshot.h b/include/migration/snapshot.h
index c85b6ec75b..dffb84dbe5 100644
--- a/include/migration/snapshot.h
+++ b/include/migration/snapshot.h
@@ -15,7 +15,9 @@
 #ifndef QEMU_MIGRATION_SNAPSHOT_H
 #define QEMU_MIGRATION_SNAPSHOT_H
 
-int save_snapshot(const char *name, Error **errp);
-int load_snapshot(const char *name, Error **errp);
+#include "qapi/qapi-builtin-types.h"
+
+int save_snapshot(const char *name, strList *exclude, Error **errp);
+int load_snapshot(const char *name, strList *exclude, Error **errp);
 
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index b11c6a882d..4b040676f7 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2624,7 +2624,7 @@ int qemu_load_device_state(QEMUFile *f)
     return 0;
 }
 
-int save_snapshot(const char *name, Error **errp)
+int save_snapshot(const char *name, strList *exclude, Error **errp)
 {
     BlockDriverState *bs, *bs1;
     QEMUSnapshotInfo sn1, *sn = &sn1, old_sn1, *old_sn = &old_sn1;
@@ -2646,7 +2646,7 @@ int save_snapshot(const char *name, Error **errp)
         return ret;
     }
 
-    if (!bdrv_all_can_snapshot(NULL, &bs)) {
+    if (!bdrv_all_can_snapshot(exclude, &bs)) {
         error_setg(errp, "Device '%s' is writable but does not support "
                    "snapshots", bdrv_get_device_or_node_name(bs));
         return ret;
@@ -2654,7 +2654,7 @@ int save_snapshot(const char *name, Error **errp)
 
     /* Delete old snapshots of the same name */
     if (name) {
-        ret = bdrv_all_delete_snapshot(name, NULL, &bs1, errp);
+        ret = bdrv_all_delete_snapshot(name, exclude, &bs1, errp);
         if (ret < 0) {
             error_prepend(errp, "Error while deleting snapshot on device "
                           "'%s': ", bdrv_get_device_or_node_name(bs1));
@@ -2662,7 +2662,7 @@ int save_snapshot(const char *name, Error **errp)
         }
     }
 
-    bs = bdrv_all_find_vmstate_bs(NULL, NULL, errp);
+    bs = bdrv_all_find_vmstate_bs(NULL, exclude, errp);
     if (bs == NULL) {
         return ret;
     }
@@ -2724,7 +2724,7 @@ int save_snapshot(const char *name, Error **errp)
     aio_context_release(aio_context);
     aio_context = NULL;
 
-    ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, NULL, &bs);
+    ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, exclude, &bs);
     if (ret < 0) {
         error_setg(errp, "Error while creating snapshot on '%s'",
                    bdrv_get_device_or_node_name(bs));
@@ -2827,7 +2827,7 @@ void qmp_xen_load_devices_state(const char *filename, Error **errp)
     migration_incoming_state_destroy();
 }
 
-int load_snapshot(const char *name, Error **errp)
+int load_snapshot(const char *name, strList *exclude, Error **errp)
 {
     BlockDriverState *bs, *bs_vm_state;
     QEMUSnapshotInfo sn;
@@ -2842,13 +2842,13 @@ int load_snapshot(const char *name, Error **errp)
         return -EINVAL;
     }
 
-    if (!bdrv_all_can_snapshot(NULL, &bs)) {
+    if (!bdrv_all_can_snapshot(exclude, &bs)) {
         error_setg(errp,
                    "Device '%s' is writable but does not support snapshots",
                    bdrv_get_device_or_node_name(bs));
         return -ENOTSUP;
     }
-    ret = bdrv_all_find_snapshot(name, NULL, &bs);
+    ret = bdrv_all_find_snapshot(name, exclude, &bs);
     if (ret < 0) {
         error_setg(errp,
                    "Device '%s' does not have the requested snapshot '%s'",
@@ -2856,7 +2856,7 @@ int load_snapshot(const char *name, Error **errp)
         return ret;
     }
 
-    bs_vm_state = bdrv_all_find_vmstate_bs(NULL, NULL, errp);
+    bs_vm_state = bdrv_all_find_vmstate_bs(NULL, exclude, errp);
     if (!bs_vm_state) {
         return -ENOTSUP;
     }
@@ -2877,7 +2877,7 @@ int load_snapshot(const char *name, Error **errp)
     /* Flush all IO requests so they don't interfere with the new state.  */
     bdrv_drain_all_begin();
 
-    ret = bdrv_all_goto_snapshot(name, NULL, &bs, errp);
+    ret = bdrv_all_goto_snapshot(name, exclude, &bs, errp);
     if (ret < 0) {
         error_prepend(errp, "Could not load snapshot '%s' on '%s': ",
                       name, bdrv_get_device_or_node_name(bs));
@@ -2942,27 +2942,33 @@ bool vmstate_check_only_migratable(const VMStateDescription *vmsd)
     return !(vmsd && vmsd->unmigratable);
 }
 
-void qmp_savevm(const char *tag, Error **errp)
+void qmp_savevm(const char *tag,
+                bool has_exclude, strList *exclude,
+                Error **errp)
 {
-    save_snapshot(tag, errp);
+    save_snapshot(tag, exclude, errp);
 }
 
-void qmp_loadvm(const char *tag, Error **errp)
+void qmp_loadvm(const char *tag,
+                bool has_exclude, strList *exclude,
+                Error **errp)
 {
     int saved_vm_running  = runstate_is_running();
 
     vm_stop(RUN_STATE_RESTORE_VM);
 
-    if (load_snapshot(tag, errp) == 0 && saved_vm_running) {
+    if (load_snapshot(tag, exclude, errp) == 0 && saved_vm_running) {
         vm_start();
     }
 }
 
-void qmp_delvm(const char *tag, Error **errp)
+void qmp_delvm(const char *tag,
+               bool has_exclude, strList *exclude,
+               Error **errp)
 {
     BlockDriverState *bs;
 
-    if (bdrv_all_delete_snapshot(tag, NULL, &bs, errp) < 0) {
+    if (bdrv_all_delete_snapshot(tag, exclude, &bs, errp) < 0) {
         error_prepend(errp,
                       "deleting snapshot on device '%s': ",
                       bdrv_get_device_or_node_name(bs));
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 26a5a1a701..fcde649100 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1091,7 +1091,7 @@ void hmp_loadvm(Monitor *mon, const QDict *qdict)
 {
     Error *err = NULL;
 
-    qmp_loadvm(qdict_get_str(qdict, "name"), &err);
+    qmp_loadvm(qdict_get_str(qdict, "name"), false, NULL, &err);
     hmp_handle_error(mon, err);
 }
 
@@ -1099,7 +1099,7 @@ void hmp_savevm(Monitor *mon, const QDict *qdict)
 {
     Error *err = NULL;
 
-    qmp_savevm(qdict_get_try_str(qdict, "name"), &err);
+    qmp_savevm(qdict_get_try_str(qdict, "name"), false, NULL, &err);
     hmp_handle_error(mon, err);
 }
 
@@ -1107,7 +1107,7 @@ void hmp_delvm(Monitor *mon, const QDict *qdict)
 {
     Error *err = NULL;
 
-    qmp_delvm(qdict_get_str(qdict, "name"), &err);
+    qmp_delvm(qdict_get_str(qdict, "name"), false, NULL, &err);
     hmp_handle_error(mon, err);
 }
 
diff --git a/qapi/migration.json b/qapi/migration.json
index 849de38fb0..2388664077 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1629,6 +1629,7 @@
 #
 # @tag: name of the snapshot to create. If it already
 # exists it will be replaced.
+# @exclude: list of block device node names to exclude
 #
 # Note that execution of the VM will be paused during the time
 # it takes to save the snapshot
@@ -1639,7 +1640,8 @@
 #
 # -> { "execute": "savevm",
 #      "data": {
-#         "tag": "my-snap"
+#         "tag": "my-snap",
+#         "exclude": ["pflash0-vars"]
 #      }
 #    }
 # <- { "return": { } }
@@ -1647,7 +1649,8 @@
 # Since: 5.2
 ##
 { 'command': 'savevm',
-  'data': { 'tag': 'str' } }
+  'data': { 'tag': 'str',
+            '*exclude': ['str'] } }
 
 ##
 # @loadvm:
@@ -1655,6 +1658,7 @@
 # Load a VM snapshot
 #
 # @tag: name of the snapshot to load.
+# @exclude: list of block device node names to exclude
 #
 # Returns: nothing
 #
@@ -1662,7 +1666,8 @@
 #
 # -> { "execute": "loadvm",
 #      "data": {
-#         "tag": "my-snap"
+#         "tag": "my-snap",
+#         "exclude": ["pflash0-vars"]
 #      }
 #    }
 # <- { "return": { } }
@@ -1670,7 +1675,8 @@
 # Since: 5.2
 ##
 { 'command': 'loadvm',
-  'data': { 'tag': 'str' } }
+  'data': { 'tag': 'str',
+            '*exclude': ['str'] } }
 
 ##
 # @delvm:
@@ -1678,6 +1684,7 @@
 # Delete a VM snapshot
 #
 # @tag: name of the snapshot to delete.
+# @exclude: list of block device node names to exclude
 #
 # Note that execution of the VM will be paused during the time
 # it takes to delete the snapshot
@@ -1688,7 +1695,8 @@
 #
 # -> { "execute": "delvm",
 #      "data": {
-#         "tag": "my-snap"
+#         "tag": "my-snap",
+#         "exclude": ["pflash0-vars"]
 #      }
 #    }
 # <- { "return": { } }
@@ -1696,4 +1704,5 @@
 # Since: 5.2
 ##
 { 'command': 'delvm',
-  'data': { 'tag': 'str' } }
+  'data': { 'tag': 'str',
+            '*exclude': ['str'] } }
diff --git a/replay/replay-snapshot.c b/replay/replay-snapshot.c
index e26fa4c892..1351170c67 100644
--- a/replay/replay-snapshot.c
+++ b/replay/replay-snapshot.c
@@ -77,13 +77,13 @@ void replay_vmstate_init(void)
 
     if (replay_snapshot) {
         if (replay_mode == REPLAY_MODE_RECORD) {
-            if (save_snapshot(replay_snapshot, &err) != 0) {
+            if (save_snapshot(replay_snapshot, NULL, &err) != 0) {
                 error_report_err(err);
                 error_report("Could not create snapshot for icount record");
                 exit(1);
             }
         } else if (replay_mode == REPLAY_MODE_PLAY) {
-            if (load_snapshot(replay_snapshot, &err) != 0) {
+            if (load_snapshot(replay_snapshot, NULL, &err) != 0) {
                 error_report_err(err);
                 error_report("Could not load snapshot for icount replay");
                 exit(1);
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 3e15ee2435..f7c8be8c6a 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -4452,7 +4452,7 @@ void qemu_init(int argc, char **argv, char **envp)
     register_global_state();
     if (loadvm) {
         Error *local_err = NULL;
-        if (load_snapshot(loadvm, &local_err) < 0) {
+        if (load_snapshot(loadvm, NULL, &local_err) < 0) {
             error_report_err(local_err);
             autostart = 0;
             exit(1);
-- 
2.26.2




* [PATCH 6/6] migration: support picking vmstate disk in QMP snapshot commands
  2020-07-02 17:57 [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP Daniel P. Berrangé
                   ` (4 preceding siblings ...)
  2020-07-02 17:57 ` [PATCH 5/6] migration: support excluding block devs in QMP snapshot commands Daniel P. Berrangé
@ 2020-07-02 17:57 ` Daniel P. Berrangé
  2020-07-02 18:19   ` Eric Blake
  2020-07-02 18:53 ` [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP no-reply
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 47+ messages in thread
From: Daniel P. Berrangé @ 2020-07-02 17:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Peter Krempa, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini, Max Reitz

This wires up support for a new "vmstate" parameter to the QMP commands
for snapshots (savevm, loadvm). This parameter accepts a block driver
state node name.

One use case for this would be a VM using OVMF firmware where the
variables store is the first snapshottable disk image. The vmstate
snapshot usually wants to be stored in the primary root disk of the
VM, not the firmware varstore. Thus there needs to be a mechanism
to override the default choice of disk.
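The selection behaviour this gives bdrv_all_find_vmstate_bs() can be
sketched roughly as follows (illustrative Python, not QEMU code; the
node names and the (name, can_snapshot) pairs are invented stand-ins
for the real BlockDriverState iteration):

```python
def find_vmstate_node(nodes, vmstate=None, exclude=None):
    """Pick the node that stores the vmstate: if 'vmstate' is set, only
    that node qualifies (and it is an error if it cannot snapshot);
    otherwise the first snapshottable, non-excluded node wins."""
    exclude = set(exclude or [])
    for name, can_snapshot in nodes:
        if name in exclude:
            continue
        if vmstate is not None:
            if name != vmstate:
                continue
            if not can_snapshot:
                raise ValueError(
                    "block device '%s' cannot accept snapshots" % name)
        elif not can_snapshot:
            continue
        return name
    raise ValueError("No block device can accept snapshots")

nodes = [("pflash0-vars", False), ("disk0", True), ("disk1", True)]
print(find_vmstate_node(nodes))                   # -> disk0 (default choice)
print(find_vmstate_node(nodes, vmstate="disk1"))  # -> disk1 (overridden)
```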

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 include/migration/snapshot.h |  8 ++++++--
 migration/savevm.c           | 16 ++++++++++------
 monitor/hmp-cmds.c           |  6 ++++--
 qapi/migration.json          |  6 ++++++
 replay/replay-snapshot.c     |  4 ++--
 softmmu/vl.c                 |  2 +-
 6 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/include/migration/snapshot.h b/include/migration/snapshot.h
index dffb84dbe5..721147d3c1 100644
--- a/include/migration/snapshot.h
+++ b/include/migration/snapshot.h
@@ -17,7 +17,11 @@
 
 #include "qapi/qapi-builtin-types.h"
 
-int save_snapshot(const char *name, strList *exclude, Error **errp);
-int load_snapshot(const char *name, strList *exclude, Error **errp);
+int save_snapshot(const char *name,
+                  const char *vmstate, strList *exclude,
+                  Error **errp);
+int load_snapshot(const char *name,
+                  const char *vmstate, strList *exclude,
+                  Error **errp);
 
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index 4b040676f7..5fd593e475 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2624,7 +2624,8 @@ int qemu_load_device_state(QEMUFile *f)
     return 0;
 }
 
-int save_snapshot(const char *name, strList *exclude, Error **errp)
+int save_snapshot(const char *name, const char *vmstate,
+                  strList *exclude, Error **errp)
 {
     BlockDriverState *bs, *bs1;
     QEMUSnapshotInfo sn1, *sn = &sn1, old_sn1, *old_sn = &old_sn1;
@@ -2662,7 +2663,7 @@ int save_snapshot(const char *name, strList *exclude, Error **errp)
         }
     }
 
-    bs = bdrv_all_find_vmstate_bs(NULL, exclude, errp);
+    bs = bdrv_all_find_vmstate_bs(vmstate, exclude, errp);
     if (bs == NULL) {
         return ret;
     }
@@ -2827,7 +2828,8 @@ void qmp_xen_load_devices_state(const char *filename, Error **errp)
     migration_incoming_state_destroy();
 }
 
-int load_snapshot(const char *name, strList *exclude, Error **errp)
+int load_snapshot(const char *name, const char *vmstate,
+                  strList *exclude, Error **errp)
 {
     BlockDriverState *bs, *bs_vm_state;
     QEMUSnapshotInfo sn;
@@ -2856,7 +2858,7 @@ int load_snapshot(const char *name, strList *exclude, Error **errp)
         return ret;
     }
 
-    bs_vm_state = bdrv_all_find_vmstate_bs(NULL, exclude, errp);
+    bs_vm_state = bdrv_all_find_vmstate_bs(vmstate, exclude, errp);
     if (!bs_vm_state) {
         return -ENOTSUP;
     }
@@ -2943,13 +2945,15 @@ bool vmstate_check_only_migratable(const VMStateDescription *vmsd)
 }
 
 void qmp_savevm(const char *tag,
+                bool has_vmstate, const char *vmstate,
                 bool has_exclude, strList *exclude,
                 Error **errp)
 {
-    save_snapshot(tag, exclude, errp);
+    save_snapshot(tag, vmstate, exclude, errp);
 }
 
 void qmp_loadvm(const char *tag,
+                bool has_vmstate, const char *vmstate,
                 bool has_exclude, strList *exclude,
                 Error **errp)
 {
@@ -2957,7 +2961,7 @@ void qmp_loadvm(const char *tag,
 
     vm_stop(RUN_STATE_RESTORE_VM);
 
-    if (load_snapshot(tag, exclude, errp) == 0 && saved_vm_running) {
+    if (load_snapshot(tag, vmstate, exclude, errp) == 0 && saved_vm_running) {
         vm_start();
     }
 }
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index fcde649100..586676e179 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1091,7 +1091,8 @@ void hmp_loadvm(Monitor *mon, const QDict *qdict)
 {
     Error *err = NULL;
 
-    qmp_loadvm(qdict_get_str(qdict, "name"), false, NULL, &err);
+    qmp_loadvm(qdict_get_str(qdict, "name"),
+               false, NULL, false, NULL, &err);
     hmp_handle_error(mon, err);
 }
 
@@ -1099,7 +1100,8 @@ void hmp_savevm(Monitor *mon, const QDict *qdict)
 {
     Error *err = NULL;
 
-    qmp_savevm(qdict_get_try_str(qdict, "name"), false, NULL, &err);
+    qmp_savevm(qdict_get_try_str(qdict, "name"),
+               false, NULL, false, NULL, &err);
     hmp_handle_error(mon, err);
 }
 
diff --git a/qapi/migration.json b/qapi/migration.json
index 2388664077..91173f5b06 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1630,6 +1630,7 @@
 # @tag: name of the snapshot to create. If it already
 # exists it will be replaced.
 # @exclude: list of block device node names to exclude
+# @vmstate: block device node name to save vmstate to
 #
 # Note that execution of the VM will be paused during the time
 # it takes to save the snapshot
@@ -1641,6 +1642,7 @@
 # -> { "execute": "savevm",
 #      "data": {
 #         "tag": "my-snap",
+#         "vmstate": "disk0",
 #         "exclude": ["pflash0-vars"]
 #      }
 #    }
@@ -1650,6 +1652,7 @@
 ##
 { 'command': 'savevm',
   'data': { 'tag': 'str',
+            '*vmstate': 'str',
             '*exclude': ['str'] } }
 
 ##
@@ -1659,6 +1662,7 @@
 #
 # @tag: name of the snapshot to load.
 # @exclude: list of block device node names to exclude
+# @vmstate: block device node name to load vmstate from
 #
 # Returns: nothing
 #
@@ -1667,6 +1671,7 @@
 # -> { "execute": "loadvm",
 #      "data": {
 #         "tag": "my-snap",
+#         "vmstate": "disk0",
 #         "exclude": ["pflash0-vars"]
 #      }
 #    }
@@ -1676,6 +1681,7 @@
 ##
 { 'command': 'loadvm',
   'data': { 'tag': 'str',
+            '*vmstate': 'str',
             '*exclude': ['str'] } }
 
 ##
diff --git a/replay/replay-snapshot.c b/replay/replay-snapshot.c
index 1351170c67..f0f45a4f24 100644
--- a/replay/replay-snapshot.c
+++ b/replay/replay-snapshot.c
@@ -77,13 +77,13 @@ void replay_vmstate_init(void)
 
     if (replay_snapshot) {
         if (replay_mode == REPLAY_MODE_RECORD) {
-            if (save_snapshot(replay_snapshot, NULL, &err) != 0) {
+            if (save_snapshot(replay_snapshot, NULL, NULL, &err) != 0) {
                 error_report_err(err);
                 error_report("Could not create snapshot for icount record");
                 exit(1);
             }
         } else if (replay_mode == REPLAY_MODE_PLAY) {
-            if (load_snapshot(replay_snapshot, NULL, &err) != 0) {
+            if (load_snapshot(replay_snapshot, NULL, NULL, &err) != 0) {
                 error_report_err(err);
                 error_report("Could not load snapshot for icount replay");
                 exit(1);
diff --git a/softmmu/vl.c b/softmmu/vl.c
index f7c8be8c6a..fbaa326675 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -4452,7 +4452,7 @@ void qemu_init(int argc, char **argv, char **envp)
     register_global_state();
     if (loadvm) {
         Error *local_err = NULL;
-        if (load_snapshot(loadvm, NULL, &local_err) < 0) {
+        if (load_snapshot(loadvm, NULL, NULL, &local_err) < 0) {
             error_report_err(local_err);
             autostart = 0;
             exit(1);
-- 
2.26.2




* Re: [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands
  2020-07-02 17:57 ` [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands Daniel P. Berrangé
@ 2020-07-02 18:12   ` Eric Blake
  2020-07-02 18:24     ` Daniel P. Berrangé
  2020-07-03 17:22   ` Denis V. Lunev
  1 sibling, 1 reply; 47+ messages in thread
From: Eric Blake @ 2020-07-02 18:12 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Kevin Wolf, Peter Krempa, qemu-block, Juan Quintela,
	Dr. David Alan Gilbert, Markus Armbruster, Pavel Dovgalyuk,
	Paolo Bonzini, Max Reitz

On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:
> savevm, loadvm and delvm are some of the few commands that have never
> been converted to use QMP. The primary reason for this lack of
> conversion is that they block execution of the thread for as long as
> they run.
> 
> Despite this downside, however, libvirt and applications using libvirt
> have used these commands for as long as QMP has existed, via the
> "human-monitor-command" passthrough command. IOW, while it is clearly
> desirable to be able to fix the blocking problem, this is not an
> immediate obstacle to real world usage.
> 
> Meanwhile there is a need for other features which involve adding new
> parameters to the commands. This is possible with HMP passthrough, but
> it provides no reliable way for apps to introspect features, so using
> QAPI modelling is highly desirable.
> 
> This patch thus introduces trival savevm, loadvm, delvm commands

trivial

> to QMP that are functionally identical to the HMP counterpart, including
> the blocking problem.

Should we name them 'x-savevm', 'x-loadvm', 'x-delvm' to give ourselves 
room to change them when we DO solve the blocking issue?  Or will the 
solution of the blocking issue introduce new QMP commands, at which 
point we can add QMP deprecation markers on these commands to eventually 
retire them?

> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---

> +++ b/qapi/migration.json
> @@ -1621,3 +1621,79 @@
>   ##
>   { 'event': 'UNPLUG_PRIMARY',
>     'data': { 'device-id': 'str' } }
> +
> +##
> +# @savevm:
> +#
> +# Save a VM snapshot
> +#
> +# @tag: name of the snapshot to create. If it already
> +# exists it will be replaced.
> +#
> +# Note that execution of the VM will be paused during the time
> +# it takes to save the snapshot
> +#
> +# Returns: nothing
> +#
> +# Example:
> +#
> +# -> { "execute": "savevm",
> +#      "data": {
> +#         "tag": "my-snap"
> +#      }
> +#    }
> +# <- { "return": { } }
> +#
> +# Since: 5.2

I guess you are NOT trying to make 5.1 soft freeze next week?


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH 6/6] migration: support picking vmstate disk in QMP snapshot commands
  2020-07-02 17:57 ` [PATCH 6/6] migration: support picking vmstate disk " Daniel P. Berrangé
@ 2020-07-02 18:19   ` Eric Blake
  2020-07-03  8:37     ` Daniel P. Berrangé
  0 siblings, 1 reply; 47+ messages in thread
From: Eric Blake @ 2020-07-02 18:19 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Kevin Wolf, Peter Krempa, qemu-block, Juan Quintela,
	Dr. David Alan Gilbert, Markus Armbruster, Pavel Dovgalyuk,
	Paolo Bonzini, Max Reitz

On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:
> This wires up support for a new "vmstate" parameter to the QMP commands
> for snapshots (savevm, loadvm). This parameter accepts block driver
> state node name.
> 
> One use case for this would be a VM using OVMF firmware where the
> variables store is the first snapshottable disk image. The vmstate
> snapshot usually wants to be stored in the primary root disk of the
> VM, not the firmeware varstore. Thus there needs to be a mechanism

firmware

> to override the default choice of disk.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---

> +++ b/qapi/migration.json
> @@ -1630,6 +1630,7 @@
>   # @tag: name of the snapshot to create. If it already
>   # exists it will be replaced.
>   # @exclude: list of block device node names to exclude
> +# @vmstate: block device node name to save vmstate to
>   #
>   # Note that execution of the VM will be paused during the time
>   # it takes to save the snapshot
> @@ -1641,6 +1642,7 @@
>   # -> { "execute": "savevm",
>   #      "data": {
>   #         "tag": "my-snap",
> +#         "vmstate": "disk0",
>   #         "exclude": ["pflash0-vars"]
>   #      }
>   #    }
> @@ -1650,6 +1652,7 @@
>   ##
>   { 'command': 'savevm',
>     'data': { 'tag': 'str',
> +            '*vmstate': 'str',
>               '*exclude': ['str'] } }

During save, the list of block devices is obvious: everything that is 
not excluded.  But,

>   
>   ##
> @@ -1659,6 +1662,7 @@
>   #
>   # @tag: name of the snapshot to load.
>   # @exclude: list of block device node names to exclude
> +# @vmstate: block device node name to load vmstate from
>   #
>   # Returns: nothing
>   #
> @@ -1667,6 +1671,7 @@
>   # -> { "execute": "loadvm",
>   #      "data": {
>   #         "tag": "my-snap",
> +#         "vmstate": "disk0",
>   #         "exclude": ["pflash0-vars"]
>   #      }
>   #    }
> @@ -1676,6 +1681,7 @@
>   ##
>   { 'command': 'loadvm',
>     'data': { 'tag': 'str',
> +            '*vmstate': 'str',
>               '*exclude': ['str'] } }

...now that we support exclusion during saving, or even without 
exclusion but when the user has performed hotplug/unplug operations in 
the meantime from when the snapshot was created, isn't load better off 
listing all devices which SHOULD be restored, rather than excluding 
devices that should NOT be restored?  (After all, libvirt knows which 
disks existed at the time of the snapshot, which may be different than 
the set of disks that exist now even though we are throwing out the 
state now to go back to the state at the time of the snapshot)

Then there's the question of symmetry: if load needs an explicit list of 
blocks to load from (rather than the set of blocks that are currently 
associated with the machine), should save also take its list by positive 
inclusion rather than negative exclusion?

And why does delvm not need to specify which block is the vmstate? 
delvm is in the same boat as loadvm - the set of blocks involved at the 
time of the snapshot creation may be different than the set of blocks 
currently associated with the guest at the time you run load/delvm.
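The distinction between the two selection models can be sketched
abstractly (hypothetical helper names, not QEMU code): negative
selection silently tracks whatever is attached now, while positive
selection lets a mismatch with the snapshot-time topology be detected.

```python
def select_by_exclusion(current_nodes, exclude):
    # Negative selection (this series): everything attached now, minus
    # the exclusion list. A node that existed at snapshot time but has
    # since been unplugged is silently dropped.
    return [n for n in current_nodes if n not in set(exclude)]

def select_by_inclusion(current_nodes, include):
    # Positive selection (the alternative): act only on an explicit
    # list; a node missing from the current topology is an error.
    missing = [n for n in include if n not in current_nodes]
    if missing:
        raise ValueError("unknown nodes: %s" % ", ".join(missing))
    return list(include)

now = ["disk0", "pflash0-vars"]   # "disk1" was unplugged meanwhile
print(select_by_exclusion(now, ["pflash0-vars"]))  # -> ['disk0']
```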

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands
  2020-07-02 18:12   ` Eric Blake
@ 2020-07-02 18:24     ` Daniel P. Berrangé
  2020-07-03 15:49       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel P. Berrangé @ 2020-07-02 18:24 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Peter Krempa, qemu-block, Juan Quintela,
	Markus Armbruster, qemu-devel, Pavel Dovgalyuk, Paolo Bonzini,
	Max Reitz, Dr. David Alan Gilbert

On Thu, Jul 02, 2020 at 01:12:52PM -0500, Eric Blake wrote:
> On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:
> > savevm, loadvm and delvm are some of the few commands that have never
> > been converted to use QMP. The primary reason for this lack of
> > conversion is that they block execution of the thread for as long as
> > they run.
> > 
> > Despite this downside, however, libvirt and applications using libvirt
> > have used these commands for as long as QMP has existed, via the
> > "human-monitor-command" passthrough command. IOW, while it is clearly
> > desirable to be able to fix the blocking problem, this is not an
> > immediate obstacle to real world usage.
> > 
> > Meanwhile there is a need for other features which involve adding new
> > parameters to the commands. This is possible with HMP passthrough, but
> > it provides no reliable way for apps to introspect features, so using
> > QAPI modelling is highly desirable.
> > 
> > This patch thus introduces trival savevm, loadvm, delvm commands
> 
> trivial
> 
> > to QMP that are functionally identical to the HMP counterpart, including
> > the blocking problem.
> 
> Should we name them 'x-savevm', 'x-loadvm', 'x-delvm' to give ourselves room
> to change them when we DO solve the blocking issue?  Or will the solution of
> the blocking issue introduce new QMP commands, at which point we can add QMP
> deprecation markers on these commands to eventually retire them?

I was in two minds about this, so I'm open to arguments either way.

The primary goal is for libvirt to consume the APIs as soon as possible,
and generally libvirt doesn't want to do this if they are declared experimental
via an "x-" prefix. So that pushes me away from "x-".

If we don't have an "x-" prefix and want to make changes, we can add extra
parameters to trigger new behaviour in a backwards compatible manner. Or we can
simply deprecate these commands, deleting them 2 releases later, while adding
completely new commands.

If we think the proposed design will definitely need incompatible changes in
a very short time frame though, that would push towards "x-".

So IMHO the right answer largely depends on whether there is a credible
strategy to implement the ideal non-blocking solution in a reasonable amount
of time. I can't justify spending much time on this myself, but I'm willing
to consider & try proposals for solving the blocking problem if they're not
too complex / invasive.

I just don't want to end up having a "x-savevm" command for another 10 years,
waiting for a perfect solution that never arrives because people always have
higher priority items, as apps are clearly able to accept the blocking problem
if the alternative is no snapshots at all.


> > +
> > +##
> > +# @savevm:
> > +#
> > +# Save a VM snapshot
> > +#
> > +# @tag: name of the snapshot to create. If it already
> > +# exists it will be replaced.
> > +#
> > +# Note that execution of the VM will be paused during the time
> > +# it takes to save the snapshot
> > +#
> > +# Returns: nothing
> > +#
> > +# Example:
> > +#
> > +# -> { "execute": "savevm",
> > +#      "data": {
> > +#         "tag": "my-snap"
> > +#      }
> > +#    }
> > +# <- { "return": { } }
> > +#
> > +# Since: 5.2
> 
> I guess you are NOT trying to make 5.1 soft freeze next week?

Correct. It is unrealistic to consider this for soft freeze.

I'd really love to have a solution in 5.2 though, even if it doesn't
solve all our problems. Something that can at least unblock apps that
want to use OVMF with internal snapshots today.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] migration: improve error reporting of block driver state name
  2020-07-02 17:57 ` [PATCH 1/6] migration: improve error reporting of block driver state name Daniel P. Berrangé
@ 2020-07-02 18:36   ` Eric Blake
  2020-07-02 19:13     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 47+ messages in thread
From: Eric Blake @ 2020-07-02 18:36 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Kevin Wolf, Peter Krempa, qemu-block, Juan Quintela,
	Dr. David Alan Gilbert, Markus Armbruster, Pavel Dovgalyuk,
	Paolo Bonzini, Max Reitz

On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:
> With blockdev, a BlockDriverState may not have an device name,

s/an/a/

> so using a node name is required as an alternative.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   migration/savevm.c | 12 ++++++------
>   1 file changed, 6 insertions(+), 6 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP
  2020-07-02 17:57 [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP Daniel P. Berrangé
                   ` (5 preceding siblings ...)
  2020-07-02 17:57 ` [PATCH 6/6] migration: support picking vmstate disk " Daniel P. Berrangé
@ 2020-07-02 18:53 ` no-reply
  2020-07-02 19:07 ` no-reply
  2020-07-03 17:15 ` Denis V. Lunev
  8 siblings, 0 replies; 47+ messages in thread
From: no-reply @ 2020-07-02 18:53 UTC (permalink / raw)
  To: berrange
  Cc: kwolf, pkrempa, berrange, qemu-block, quintela, armbru,
	qemu-devel, dgilbert, pavel.dovgaluk, pbonzini, mreitz

Patchew URL: https://patchew.org/QEMU/20200702175754.2211821-1-berrange@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

 (qemu) info snapshots
 No available block device supports snapshots
 (qemu) loadvm snap0
-Error: No block device supports snapshots
+Error: No block device can accept snapshots
 (qemu) quit
 
 
---
 Testing: -blockdev driver=file,filename=TEST_DIR/t.IMGFMT,node-name=file
 QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) savevm snap0
-Error: Device '' is writable but does not support snapshots
+Error: Device 'file' is writable but does not support snapshots
 (qemu) info snapshots
 No available block device supports snapshots
 (qemu) loadvm snap0
-Error: Device '' is writable but does not support snapshots
+Error: Device 'file' is writable but does not support snapshots
 (qemu) quit
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
---
Not run: 259
Failures: 267
Failed 1 of 119 iotests
make: *** [check-tests/check-block.sh] Error 1
make: *** Waiting for unfinished jobs....
  TEST    check-qtest-aarch64: tests/qtest/qos-test
Traceback (most recent call last):
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=d415a115078246e4ab99c8a4d27787e7', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-vi4bosi3/src/docker-src.2020-07-02-14.36.29.6528:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=d415a115078246e4ab99c8a4d27787e7
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-vi4bosi3/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    16m29.279s
user    0m8.998s


The full log is available at
http://patchew.org/logs/20200702175754.2211821-1-berrange@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP
  2020-07-02 17:57 [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP Daniel P. Berrangé
                   ` (6 preceding siblings ...)
  2020-07-02 18:53 ` [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP no-reply
@ 2020-07-02 19:07 ` no-reply
  2020-07-03 17:15 ` Denis V. Lunev
  8 siblings, 0 replies; 47+ messages in thread
From: no-reply @ 2020-07-02 19:07 UTC (permalink / raw)
  To: berrange
  Cc: kwolf, pkrempa, berrange, qemu-block, quintela, armbru,
	qemu-devel, dgilbert, pavel.dovgaluk, pbonzini, mreitz

Patchew URL: https://patchew.org/QEMU/20200702175754.2211821-1-berrange@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

 (qemu) info snapshots
 No available block device supports snapshots
 (qemu) loadvm snap0
-Error: No block device supports snapshots
+Error: No block device can accept snapshots
 (qemu) quit
 
 
---
 Testing: -blockdev driver=file,filename=TEST_DIR/t.IMGFMT,node-name=file
 QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) savevm snap0
-Error: Device '' is writable but does not support snapshots
+Error: Device 'file' is writable but does not support snapshots
 (qemu) info snapshots
 No available block device supports snapshots
 (qemu) loadvm snap0
-Error: Device '' is writable but does not support snapshots
+Error: Device 'file' is writable but does not support snapshots
 (qemu) quit
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
---
Not run: 259
Failures: 267
Failed 1 of 119 iotests
make: *** [check-tests/check-block.sh] Error 1
make: *** Waiting for unfinished jobs....
  TEST    check-qtest-aarch64: tests/qtest/test-hmp
  TEST    check-qtest-aarch64: tests/qtest/qos-test
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=80ecabfdc3b44cdabd915b427d791afe', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-vlubojb7/src/docker-src.2020-07-02-14.52.56.31106:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=80ecabfdc3b44cdabd915b427d791afe
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-vlubojb7/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    14m49.668s
user    0m8.931s


The full log is available at
http://patchew.org/logs/20200702175754.2211821-1-berrange@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [PATCH 1/6] migration: improve error reporting of block driver state name
  2020-07-02 18:36   ` Eric Blake
@ 2020-07-02 19:13     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 47+ messages in thread
From: Dr. David Alan Gilbert @ 2020-07-02 19:13 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Peter Krempa, Daniel P. Berrangé,
	qemu-block, Juan Quintela, qemu-devel, Markus Armbruster,
	Pavel Dovgalyuk, Paolo Bonzini, Max Reitz

* Eric Blake (eblake@redhat.com) wrote:
> On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:
> > With blockdev, a BlockDriverState may not have an device name,
> 
> s/an/a/
> 
> > so using a node name is required as an alternative.
> > 
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> >   migration/savevm.c | 12 ++++++------
> >   1 file changed, 6 insertions(+), 6 deletions(-)
> > 
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>

Why don't you send this one to trivial?

Dave

> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 6/6] migration: support picking vmstate disk in QMP snapshot commands
  2020-07-02 18:19   ` Eric Blake
@ 2020-07-03  8:37     ` Daniel P. Berrangé
  0 siblings, 0 replies; 47+ messages in thread
From: Daniel P. Berrangé @ 2020-07-03  8:37 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Peter Krempa, qemu-block, Juan Quintela,
	Markus Armbruster, qemu-devel, Pavel Dovgalyuk, Paolo Bonzini,
	Max Reitz, Dr. David Alan Gilbert

On Thu, Jul 02, 2020 at 01:19:43PM -0500, Eric Blake wrote:
> On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:
> > This wires up support for a new "vmstate" parameter to the QMP commands
> > for snapshots (savevm, loadvm). This parameter accepts block driver
> > state node name.
> > 
> > One use case for this would be a VM using OVMF firmware where the
> > variables store is the first snapshottable disk image. The vmstate
> > snapshot usually wants to be stored in the primary root disk of the
> > VM, not the firmeware varstore. Thus there needs to be a mechanism
> 
> firmware
> 
> > to override the default choice of disk.
> > 
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> 
> > +++ b/qapi/migration.json
> > @@ -1630,6 +1630,7 @@
> >   # @tag: name of the snapshot to create. If it already
> >   # exists it will be replaced.
> >   # @exclude: list of block device node names to exclude
> > +# @vmstate: block device node name to save vmstate to
> >   #
> >   # Note that execution of the VM will be paused during the time
> >   # it takes to save the snapshot
> > @@ -1641,6 +1642,7 @@
> >   # -> { "execute": "savevm",
> >   #      "data": {
> >   #         "tag": "my-snap",
> > +#         "vmstate": "disk0",
> >   #         "exclude": ["pflash0-vars"]
> >   #      }
> >   #    }
> > @@ -1650,6 +1652,7 @@
> >   ##
> >   { 'command': 'savevm',
> >     'data': { 'tag': 'str',
> > +            '*vmstate': 'str',
> >               '*exclude': ['str'] } }
> 
> During save, the list of block devices is obvious: everything that is not
> excluded.  But,
> 
> >   ##
> > @@ -1659,6 +1662,7 @@
> >   #
> >   # @tag: name of the snapshot to load.
> >   # @exclude: list of block device node names to exclude
> > +# @vmstate: block device node name to load vmstate from
> >   #
> >   # Returns: nothing
> >   #
> > @@ -1667,6 +1671,7 @@
> >   # -> { "execute": "loadvm",
> >   #      "data": {
> >   #         "tag": "my-snap",
> > +#         "vmstate": "disk0",
> >   #         "exclude": ["pflash0-vars"]
> >   #      }
> >   #    }
> > @@ -1676,6 +1681,7 @@
> >   ##
> >   { 'command': 'loadvm',
> >     'data': { 'tag': 'str',
> > +            '*vmstate': 'str',
> >               '*exclude': ['str'] } }
> 
> ...now that we support exclusion during saving, or even without exclusion
> but when the user has performed hotplug/unplug operations in the meantime
> from when the snapshot was created, isn't load better off listing all
> devices which SHOULD be restored, rather than excluding devices that should
> NOT be restored?  (After all, libvirt knows which disks existed at the time
> of the snapshot, which may be different than the set of disks that exist now
> even though we are throwing out the state now to go back to the state at the
> time of the snapshot)

If the user has hotplugged / unplugged any devices, then I expect the
snapshot load to fail, because the vmstate will be referencing devices
that don't exist, or will be missing devices. In the same way, migration
will fail unless the target QEMU has the exact same device setup that was
first serialized into the vmstate.

In theory I guess you could have kept the device ABI the same but switched
out disk backends, but I question the sanity of doing that while you have
saved snapshots, unless you're preserving those snapshots in the new
images, in which case it will just work.

> Then there's the question of symmetry: if load needs an explicit list of
> blocks to load from (rather than the set of blocks that are currently
> associated with the machine), should save also take its list by positive
> inclusion rather than negative exclusion?

I choose exclusion because the normal case is that you want to snapshot
everything. You sometimes have a small number of exceptions, most notably
the OVMF varstore. IOW if you're currently relying on the default behaviour
of snapshotting everything, it is much easier to just exclude one image
than to switch to explicitly including everything. Essentially I can
just pass a static string associated with the varstore to be excluded,
instead of having to dynamically build up a list of everything.

I wouldn't mind supporting inclusion *and* exclusion, so users have the
choice of which approach to take.
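As a concrete sketch of what that looks like on the client side (this is
not code from the series): the node names "disk0" and "pflash0-vars" are
just the placeholders from the doc examples, and note that the on-the-wire
QMP envelope uses the key "arguments", even though the draft doc comments
show "data".

```python
import json

def savevm_cmd(tag, vmstate=None, exclude=None):
    """Build a QMP 'savevm' request per the schema proposed in this series.

    'vmstate' and 'exclude' are the optional parameters added in patch 6;
    omitting both keeps the default choice of disks.
    """
    args = {"tag": tag}
    if vmstate is not None:
        args["vmstate"] = vmstate
    if exclude:
        args["exclude"] = list(exclude)
    return {"execute": "savevm", "arguments": args}

# Snapshot everything except the OVMF varstore, storing vmstate in disk0
print(json.dumps(savevm_cmd("my-snap", vmstate="disk0",
                            exclude=["pflash0-vars"])))
```

A hypothetical "include" parameter would slot into the same envelope in
exactly the same way.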

> And why does delvm not need to specify which block is the vmstate? delvm is
> in the same boat as loadvm - the set of blocks involved at the time of the
> snapshot creation may be different than the set of blocks currently
> associated with the guest at the time you run load/delvm.

There's no code in delvm that needs to take any special action wrt
vmstate. It simply deletes snapshots from all the disks present.

> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands
  2020-07-02 18:24     ` Daniel P. Berrangé
@ 2020-07-03 15:49       ` Dr. David Alan Gilbert
  2020-07-03 16:02         ` Daniel P. Berrangé
  0 siblings, 1 reply; 47+ messages in thread
From: Dr. David Alan Gilbert @ 2020-07-03 15:49 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Kevin Wolf, Peter Krempa, qemu-block, Juan Quintela, qemu-devel,
	Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini, Max Reitz

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Jul 02, 2020 at 01:12:52PM -0500, Eric Blake wrote:
> > On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:
> > > savevm, loadvm and delvm are some of the few commands that have never
> > > been converted to use QMP. The primary reason for this lack of
> > > conversion is that they block execution of the thread for as long as
> > > they run.
> > > 
> > > Despite this downside, however, libvirt and applications using libvirt
> > > has used these commands for as long as QMP has existed, via the
> > > "human-monitor-command" passthrough command. IOW, while it is clearly
> > > desirable to be able to fix the blocking problem, this is not an
> > > immediate obstacle to real world usage.
> > > 
> > > Meanwhile there is a need for other features which involve adding new
> > > parameters to the commands. This is possible with HMP passthrough, but
> > > it provides no reliable way for apps to introspect features, so using
> > > QAPI modelling is highly desirable.
> > > 
> > > This patch thus introduces trival savevm, loadvm, delvm commands
> > 
> > trivial
> > 
> > > to QMP that are functionally identical to the HMP counterpart, including
> > > the blocking problem.
> > 
> > Should we name them 'x-savevm', 'x-loadvm', 'x-delvm' to give ourselves room
> > to change them when we DO solve the blocking issue?  Or will the solution of
> > the blocking issue introduce new QMP commands, at which point we can add QMP
> > deprecation markers on these commands to eventually retire them?
> 
> I was in two minds about this, so I'm open to arguments either way.
> 
> The primary goal is for libvirt to consume the APIs as soon as possible,
> and generally libvirt doesn't want todo this is they are declared experimental
> via a "x-" prefix. So that pushes me away from "x-".
> 
> If we don't have an "x-" prefix and want to make changes, we can add extra
> parameters to trigger new behaviour in backwards compatible manner. Or we can
> simply deprecate these commands, deleting them 2 releases later, while adding
> completely new commands.
> 
> If we think the prposed design will definitely need incompatible changes in
> a very short time frame though, that would push towards "x-".
> 
> So IMHO the right answer largely depends on whether there is a credible
> strategy to implement the ideal non-blocking solution in a reasonable amount
> of time. I can't justify spending much time on this myself, but I'm willing
> to consider & try proposals for solving the blocking problem if they're not
> too complex / invasive.

Remind me, what was the problem with just making a block: migration
channel, and then we can migrate to it?

Dave

> I just don't want to end up having a "x-savevm" command for another 10 years,
> waiting for a perfect solution that never arrives because people always have
> higher priority items, as apps are clearly able to accept the blocking problem
> if the alternative is no snapshots at all.
> 
> 
> > > +
> > > +##
> > > +# @savevm:
> > > +#
> > > +# Save a VM snapshot
> > > +#
> > > +# @tag: name of the snapshot to create. If it already
> > > +# exists it will be replaced.
> > > +#
> > > +# Note that execution of the VM will be paused during the time
> > > +# it takes to save the snapshot
> > > +#
> > > +# Returns: nothing
> > > +#
> > > +# Example:
> > > +#
> > > +# -> { "execute": "savevm",
> > > +#      "data": {
> > > +#         "tag": "my-snap"
> > > +#      }
> > > +#    }
> > > +# <- { "return": { } }
> > > +#
> > > +# Since: 5.2
> > 
> > I guess you are NOT trying to make 5.1 soft freeze next week?
> 
> Correct. It is unrealistic to consider this for soft freeze.
> 
> I'd really love to have a solution in 5.2 though, even if it doesn't
> solve all our problems. Something that can at least unblock apps that
> want to use OVMF with internal snapshots today.
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands
  2020-07-03 15:49       ` Dr. David Alan Gilbert
@ 2020-07-03 16:02         ` Daniel P. Berrangé
  2020-07-03 16:10           ` Dr. David Alan Gilbert
  2020-07-06 16:15           ` Kevin Wolf
  0 siblings, 2 replies; 47+ messages in thread
From: Daniel P. Berrangé @ 2020-07-03 16:02 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Peter Krempa, qemu-block, Juan Quintela, qemu-devel,
	Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini, Max Reitz

On Fri, Jul 03, 2020 at 04:49:33PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Thu, Jul 02, 2020 at 01:12:52PM -0500, Eric Blake wrote:
> > > On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:
> > > > savevm, loadvm and delvm are some of the few commands that have never
> > > > been converted to use QMP. The primary reason for this lack of
> > > > conversion is that they block execution of the thread for as long as
> > > > they run.
> > > > 
> > > > Despite this downside, however, libvirt and applications using libvirt
> > > > has used these commands for as long as QMP has existed, via the
> > > > "human-monitor-command" passthrough command. IOW, while it is clearly
> > > > desirable to be able to fix the blocking problem, this is not an
> > > > immediate obstacle to real world usage.
> > > > 
> > > > Meanwhile there is a need for other features which involve adding new
> > > > parameters to the commands. This is possible with HMP passthrough, but
> > > > it provides no reliable way for apps to introspect features, so using
> > > > QAPI modelling is highly desirable.
> > > > 
> > > > This patch thus introduces trival savevm, loadvm, delvm commands
> > > 
> > > trivial
> > > 
> > > > to QMP that are functionally identical to the HMP counterpart, including
> > > > the blocking problem.
> > > 
> > > Should we name them 'x-savevm', 'x-loadvm', 'x-delvm' to give ourselves room
> > > to change them when we DO solve the blocking issue?  Or will the solution of
> > > the blocking issue introduce new QMP commands, at which point we can add QMP
> > > deprecation markers on these commands to eventually retire them?
> > 
> > I was in two minds about this, so I'm open to arguments either way.
> > 
> > The primary goal is for libvirt to consume the APIs as soon as possible,
> > and generally libvirt doesn't want todo this is they are declared experimental
> > via a "x-" prefix. So that pushes me away from "x-".
> > 
> > If we don't have an "x-" prefix and want to make changes, we can add extra
> > parameters to trigger new behaviour in backwards compatible manner. Or we can
> > simply deprecate these commands, deleting them 2 releases later, while adding
> > completely new commands.
> > 
> > If we think the prposed design will definitely need incompatible changes in
> > a very short time frame though, that would push towards "x-".
> > 
> > So IMHO the right answer largely depends on whether there is a credible
> > strategy to implement the ideal non-blocking solution in a reasonable amount
> > of time. I can't justify spending much time on this myself, but I'm willing
> > to consider & try proposals for solving the blocking problem if they're not
> > too complex / invasive.
> 
> Remind me, what was the problem with just making a block: migration
> channel, and then we can migrate to it?

migration only does vmstate, not disks. The current blockdev commands
are all related to external snapshots, nothing for internal snapshots
AFAIK. So we still need commands to load/save internal snapshots of
the disk data in the qcow2 files.

So we could look at loadvm/savevm conceptually as syntax sugar above
a migration transport that targets disk images, plus a blockdev QMP command
that can do internal snapshots. Neither of these exists yet though, and they
feel like a significantly larger amount of work than using existing
functionality that is currently working.
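To make the comparison concrete, the passthrough that apps rely on today
looks roughly like this (a minimal sketch; "snap0" is just a placeholder
snapshot tag):

```python
import json

def hmp_passthrough(hmp_line):
    # "human-monitor-command" is the existing QMP escape hatch that
    # tunnels a raw HMP command string to the monitor; this is how apps
    # have been driving savevm/loadvm/delvm from QMP until now, with no
    # introspection or structured errors for the wrapped command.
    return {"execute": "human-monitor-command",
            "arguments": {"command-line": hmp_line}}

print(json.dumps(hmp_passthrough("savevm snap0")))
```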

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands
  2020-07-03 16:02         ` Daniel P. Berrangé
@ 2020-07-03 16:10           ` Dr. David Alan Gilbert
  2020-07-03 16:16             ` Daniel P. Berrangé
  2020-07-03 16:24             ` Peter Krempa
  2020-07-06 16:15           ` Kevin Wolf
  1 sibling, 2 replies; 47+ messages in thread
From: Dr. David Alan Gilbert @ 2020-07-03 16:10 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Kevin Wolf, Peter Krempa, qemu-block, Juan Quintela, qemu-devel,
	Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini, Max Reitz

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Fri, Jul 03, 2020 at 04:49:33PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Thu, Jul 02, 2020 at 01:12:52PM -0500, Eric Blake wrote:
> > > > On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:
> > > > > savevm, loadvm and delvm are some of the few commands that have never
> > > > > been converted to use QMP. The primary reason for this lack of
> > > > > conversion is that they block execution of the thread for as long as
> > > > > they run.
> > > > > 
> > > > > Despite this downside, however, libvirt and applications using libvirt
> > > > > has used these commands for as long as QMP has existed, via the
> > > > > "human-monitor-command" passthrough command. IOW, while it is clearly
> > > > > desirable to be able to fix the blocking problem, this is not an
> > > > > immediate obstacle to real world usage.
> > > > > 
> > > > > Meanwhile there is a need for other features which involve adding new
> > > > > parameters to the commands. This is possible with HMP passthrough, but
> > > > > it provides no reliable way for apps to introspect features, so using
> > > > > QAPI modelling is highly desirable.
> > > > > 
> > > > > This patch thus introduces trival savevm, loadvm, delvm commands
> > > > 
> > > > trivial
> > > > 
> > > > > to QMP that are functionally identical to the HMP counterpart, including
> > > > > the blocking problem.
> > > > 
> > > > Should we name them 'x-savevm', 'x-loadvm', 'x-delvm' to give ourselves room
> > > > to change them when we DO solve the blocking issue?  Or will the solution of
> > > > the blocking issue introduce new QMP commands, at which point we can add QMP
> > > > deprecation markers on these commands to eventually retire them?
> > > 
> > > I was in two minds about this, so I'm open to arguments either way.
> > > 
> > > The primary goal is for libvirt to consume the APIs as soon as possible,
> > > and generally libvirt doesn't want todo this is they are declared experimental
> > > via a "x-" prefix. So that pushes me away from "x-".
> > > 
> > > If we don't have an "x-" prefix and want to make changes, we can add extra
> > > parameters to trigger new behaviour in backwards compatible manner. Or we can
> > > simply deprecate these commands, deleting them 2 releases later, while adding
> > > completely new commands.
> > > 
> > > If we think the prposed design will definitely need incompatible changes in
> > > a very short time frame though, that would push towards "x-".
> > > 
> > > So IMHO the right answer largely depends on whether there is a credible
> > > strategy to implement the ideal non-blocking solution in a reasonable amount
> > > of time. I can't justify spending much time on this myself, but I'm willing
> > > to consider & try proposals for solving the blocking problem if they're not
> > > too complex / invasive.
> > 
> > Remind me, what was the problem with just making a block: migration
> > channel, and then we can migrate to it?
> 
> migration only does vmstate, not disks. The current blockdev commands
> are all related to external snapshots, nothing for internal snapshots
> AFAIK. So we still need commands to load/save internal snapshots of
> the disk data in the qcow2 files.
> 
> So we could look at loadvm/savevm conceptually as a syntax sugar above
> a migration transport that targets disk images, and blockdev QMP command
> that can do internal snapshots. Neither of these exist though and feel
> like a significantly larger amount of work than using existing functionality
> that is currently working.

I think that's what we should aim for; adding this wrapper isn't gaining
us that much without moving a bit towards that, so I would stick with the
x- prefix for now.

Dave

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands
  2020-07-03 16:10           ` Dr. David Alan Gilbert
@ 2020-07-03 16:16             ` Daniel P. Berrangé
  2020-07-03 16:22               ` Dr. David Alan Gilbert
  2020-07-03 16:24             ` Peter Krempa
  1 sibling, 1 reply; 47+ messages in thread
From: Daniel P. Berrangé @ 2020-07-03 16:16 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Peter Krempa, qemu-block, Juan Quintela, qemu-devel,
	Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini, Max Reitz

On Fri, Jul 03, 2020 at 05:10:12PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Fri, Jul 03, 2020 at 04:49:33PM +0100, Dr. David Alan Gilbert wrote:
> > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > On Thu, Jul 02, 2020 at 01:12:52PM -0500, Eric Blake wrote:
> > > > > On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:
> > > > > > savevm, loadvm and delvm are some of the few commands that have never
> > > > > > been converted to use QMP. The primary reason for this lack of
> > > > > > conversion is that they block execution of the thread for as long as
> > > > > > they run.
> > > > > > 
> > > > > > Despite this downside, however, libvirt and applications using libvirt
> > > > > > has used these commands for as long as QMP has existed, via the
> > > > > > "human-monitor-command" passthrough command. IOW, while it is clearly
> > > > > > desirable to be able to fix the blocking problem, this is not an
> > > > > > immediate obstacle to real world usage.
> > > > > > 
> > > > > > Meanwhile there is a need for other features which involve adding new
> > > > > > parameters to the commands. This is possible with HMP passthrough, but
> > > > > > it provides no reliable way for apps to introspect features, so using
> > > > > > QAPI modelling is highly desirable.
> > > > > > 
> > > > > > This patch thus introduces trival savevm, loadvm, delvm commands
> > > > > 
> > > > > trivial
> > > > > 
> > > > > > to QMP that are functionally identical to the HMP counterpart, including
> > > > > > the blocking problem.
> > > > > 
> > > > > Should we name them 'x-savevm', 'x-loadvm', 'x-delvm' to give ourselves room
> > > > > to change them when we DO solve the blocking issue?  Or will the solution of
> > > > > the blocking issue introduce new QMP commands, at which point we can add QMP
> > > > > deprecation markers on these commands to eventually retire them?
> > > > 
> > > > I was in two minds about this, so I'm open to arguments either way.
> > > > 
> > > > The primary goal is for libvirt to consume the APIs as soon as possible,
> > > > and generally libvirt doesn't want todo this is they are declared experimental
> > > > via a "x-" prefix. So that pushes me away from "x-".
> > > > 
> > > > If we don't have an "x-" prefix and want to make changes, we can add extra
> > > > parameters to trigger new behaviour in backwards compatible manner. Or we can
> > > > simply deprecate these commands, deleting them 2 releases later, while adding
> > > > completely new commands.
> > > > 
> > > > If we think the prposed design will definitely need incompatible changes in
> > > > a very short time frame though, that would push towards "x-".
> > > > 
> > > > So IMHO the right answer largely depends on whether there is a credible
> > > > strategy to implement the ideal non-blocking solution in a reasonable amount
> > > > of time. I can't justify spending much time on this myself, but I'm willing
> > > > to consider & try proposals for solving the blocking problem if they're not
> > > > too complex / invasive.
> > > 
> > > Remind me, what was the problem with just making a block: migration
> > > channel, and then we can migrate to it?
> > 
> > migration only does vmstate, not disks. The current blockdev commands
> > are all related to external snapshots, nothing for internal snapshots
> > AFAIK. So we still need commands to load/save internal snapshots of
> > the disk data in the qcow2 files.
> > 
> > So we could look at loadvm/savevm conceptually as a syntax sugar above
> > a migration transport that targets disk images, and blockdev QMP command
> > that can do internal snapshots. Neither of these exist though and feel
> > like a significantly larger amount of work than using existing functionality
> > that is currently working.
> 
> I think that's what we should aim for; adding this wrapper isn't gaining
> that much without moving a bit towards that; so I would stick with the
> x- for now.

The question is how much work that approach will be and whether anyone can
realistically commit to doing it. It looks like a much larger piece of
work on both the QEMU and libvirt sides. I don't want to see us stuck
with an x-savevm for years because no one has the resources to work on the
perfect solution. If we do get a perfect solution at some point in future,
we can still deprecate and then remove any "savevm" command we add to QMP.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands
  2020-07-03 16:16             ` Daniel P. Berrangé
@ 2020-07-03 16:22               ` Dr. David Alan Gilbert
  2020-07-03 16:49                 ` Daniel P. Berrangé
  0 siblings, 1 reply; 47+ messages in thread
From: Dr. David Alan Gilbert @ 2020-07-03 16:22 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Kevin Wolf, Peter Krempa, qemu-block, Juan Quintela, qemu-devel,
	Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini, Max Reitz

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Fri, Jul 03, 2020 at 05:10:12PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Fri, Jul 03, 2020 at 04:49:33PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > > On Thu, Jul 02, 2020 at 01:12:52PM -0500, Eric Blake wrote:
> > > > > > On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:
> > > > > > > savevm, loadvm and delvm are some of the few commands that have never
> > > > > > > been converted to use QMP. The primary reason for this lack of
> > > > > > > conversion is that they block execution of the thread for as long as
> > > > > > > they run.
> > > > > > > 
> > > > > > > Despite this downside, however, libvirt and applications using libvirt
> > > > > > > has used these commands for as long as QMP has existed, via the
> > > > > > > "human-monitor-command" passthrough command. IOW, while it is clearly
> > > > > > > desirable to be able to fix the blocking problem, this is not an
> > > > > > > immediate obstacle to real world usage.
> > > > > > > 
> > > > > > > Meanwhile there is a need for other features which involve adding new
> > > > > > > parameters to the commands. This is possible with HMP passthrough, but
> > > > > > > it provides no reliable way for apps to introspect features, so using
> > > > > > > QAPI modelling is highly desirable.
> > > > > > > 
> > > > > > > This patch thus introduces trival savevm, loadvm, delvm commands
> > > > > > 
> > > > > > trivial
> > > > > > 
> > > > > > > to QMP that are functionally identical to the HMP counterpart, including
> > > > > > > the blocking problem.
> > > > > > 
> > > > > > Should we name them 'x-savevm', 'x-loadvm', 'x-delvm' to give ourselves room
> > > > > > to change them when we DO solve the blocking issue?  Or will the solution of
> > > > > > the blocking issue introduce new QMP commands, at which point we can add QMP
> > > > > > deprecation markers on these commands to eventually retire them?
> > > > > 
> > > > > I was in two minds about this, so I'm open to arguments either way.
> > > > > 
> > > > > The primary goal is for libvirt to consume the APIs as soon as possible,
> > > > > and generally libvirt doesn't want todo this is they are declared experimental
> > > > > via a "x-" prefix. So that pushes me away from "x-".
> > > > > 
> > > > > If we don't have an "x-" prefix and want to make changes, we can add extra
> > > > > parameters to trigger new behaviour in backwards compatible manner. Or we can
> > > > > simply deprecate these commands, deleting them 2 releases later, while adding
> > > > > completely new commands.
> > > > > 
> > > > > If we think the prposed design will definitely need incompatible changes in
> > > > > a very short time frame though, that would push towards "x-".
> > > > > 
> > > > > So IMHO the right answer largely depends on whether there is a credible
> > > > > strategy to implement the ideal non-blocking solution in a reasonable amount
> > > > > of time. I can't justify spending much time on this myself, but I'm willing
> > > > > to consider & try proposals for solving the blocking problem if they're not
> > > > > too complex / invasive.
> > > > 
> > > > Remind me, what was the problem with just making a block: migration
> > > > channel, and then we can migrate to it?
> > > 
> > > migration only does vmstate, not disks. The current blockdev commands
> > > are all related to external snapshots, nothing for internal snapshots
> > > AFAIK. So we still need commands to load/save internal snapshots of
> > > the disk data in the qcow2 files.
> > > 
> > > So we could look at loadvm/savevm conceptually as a syntax sugar above
> > > a migration transport that targets disk images, and blockdev QMP command
> > > that can do internal snapshots. Neither of these exist though and feel
> > > like a significantly larger amount of work than using existing functionality
> > > that is currently working.
> > 
> > I think that's what we should aim for; adding this wrapper isn't gaining
> > that much without moving a bit towards that; so I would stick with the
> > x- for now.
> 
> The question is how much work that approach will be and whether anyone can
> realistically commit to doing that ? It looks like a much larger piece of
> work in both QEMU and libvirt side to do that. I don't want to see us stuck
> with a x-savevm for years because no one has resource to work on the perfect
> solution. If we did get a perfect solution at a point in future, we can
> still deprecate and then remove any "savevm" command we add to QMP.

I'd at least like to understand that we've got a worklist for it though.
We've already got qemu_fopen_bdrv - what's actually wrong with that? Is
it enough to do the migration to a stream (given a tiny amount of
syntax)?

Dave

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands
  2020-07-03 16:10           ` Dr. David Alan Gilbert
  2020-07-03 16:16             ` Daniel P. Berrangé
@ 2020-07-03 16:24             ` Peter Krempa
  2020-07-03 16:26               ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 47+ messages in thread
From: Peter Krempa @ 2020-07-03 16:24 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster, qemu-devel,
	Max Reitz, Pavel Dovgalyuk, Paolo Bonzini

On Fri, Jul 03, 2020 at 17:10:12 +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Fri, Jul 03, 2020 at 04:49:33PM +0100, Dr. David Alan Gilbert wrote:
> > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > On Thu, Jul 02, 2020 at 01:12:52PM -0500, Eric Blake wrote:
> > > > > On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:

[...]

> > > Remind me, what was the problem with just making a block: migration
> > > channel, and then we can migrate to it?
> > 
> > migration only does vmstate, not disks. The current blockdev commands
> > are all related to external snapshots, nothing for internal snapshots
> > AFAIK. So we still need commands to load/save internal snapshots of
> > the disk data in the qcow2 files.
> > 
> > So we could look at loadvm/savevm conceptually as a syntax sugar above
> > a migration transport that targets disk images, and blockdev QMP command
> > that can do internal snapshots. Neither of these exist though and feel
> > like a significantly larger amount of work than using existing functionality
> > that is currently working.
> 
> I think that's what we should aim for; adding this wrapper isn't gaining
> that much without moving a bit towards that; so I would stick with the
> x- for now.

Relying on the HMP variants is IMO even worse; error handling there is
terrible. I'd vote even for a straight wrapper without any logic at this
point. IMO it's just necessary to document that it's an intermediate
solution which WILL be deprecated and removed as soon as a suitable
replacement is in place.

Not doing anything is the argument we've been hearing for multiple years
now, and savevm/delvm/loadvm are now the only 3 commands libvirt still
uses via the HMP wrapper.

Since deprecation is now a thing, I think we can add a disposable
interface. In the end HMP will or will not need to remain anyway, and
the deprecation story there is IMO less clear.
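For reference, the HMP passthrough in question can be sketched on the
wire like this. The helper function and the snapshot name "snap1" are
purely illustrative, but "human-monitor-command" and its "command-line"
argument are the real QMP passthrough interface being used today:

```python
import json

def hmp_passthrough(command_line):
    # Build the QMP "human-monitor-command" wrapper message that apps
    # send today to reach HMP-only commands such as savevm/loadvm/delvm.
    return json.dumps({
        "execute": "human-monitor-command",
        "arguments": {"command-line": command_line},
    })

# The three commands still routed through the wrapper:
for cmd in ("savevm snap1", "loadvm snap1", "delvm snap1"):
    print(hmp_passthrough(cmd))
```

Note the command result comes back as a single opaque text string, which
is exactly why error reporting through this path is so poor.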



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands
  2020-07-03 16:24             ` Peter Krempa
@ 2020-07-03 16:26               ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 47+ messages in thread
From: Dr. David Alan Gilbert @ 2020-07-03 16:26 UTC (permalink / raw)
  To: Peter Krempa
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster, qemu-devel,
	Max Reitz, Pavel Dovgalyuk, Paolo Bonzini

* Peter Krempa (pkrempa@redhat.com) wrote:
> On Fri, Jul 03, 2020 at 17:10:12 +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Fri, Jul 03, 2020 at 04:49:33PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > > On Thu, Jul 02, 2020 at 01:12:52PM -0500, Eric Blake wrote:
> > > > > > On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:
> 
> [...]
> 
> > > > Remind me, what was the problem with just making a block: migration
> > > > channel, and then we can migrate to it?
> > > 
> > > migration only does vmstate, not disks. The current blockdev commands
> > > are all related to external snapshots, nothing for internal snapshots
> > > AFAIK. So we still need commands to load/save internal snapshots of
> > > the disk data in the qcow2 files.
> > > 
> > > So we could look at loadvm/savevm conceptually as a syntax sugar above
> > > a migration transport that targets disk images, and blockdev QMP command
> > > that can do internal snapshots. Neither of these exist though and feel
> > > like a significantly larger amount of work than using existing functionality
> > > that is currently working.
> > 
> > I think that's what we should aim for; adding this wrapper isn't gaining
> > that much without moving a bit towards that; so I would stick with the
> > x- for now.
> 
> Relying on the HMP variants is IMO even worse. Error handling is
> terrible there. I'd vote even for a straight wrapper without any logic
> at this point. IMO it's just necessary to document that it's an
> intermediate solution which WILL be deprecated and removed as soon as a
> suitable replacement is in place.
> 
> Not doing anything is the argument we hear for multiple years now and
> savevm/delvm/loadvm are now the only 3 commands used via the HMP wrapper
> in libvirt.
> 
> Since deprecation is now a thing I think we can add a disposable
> inteface. In the end HMP will or will not need to remain anyways and the
> deprecation there is IMO less clear.

Only if we come up with a list of what we actually need to do to
properly fix it; I'm not suggesting we have to do the work now, but we
should at least figure out what it involves.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands
  2020-07-03 16:22               ` Dr. David Alan Gilbert
@ 2020-07-03 16:49                 ` Daniel P. Berrangé
  2020-07-03 17:00                   ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel P. Berrangé @ 2020-07-03 16:49 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Peter Krempa, qemu-block, Juan Quintela,
	Markus Armbruster, qemu-devel, Pavel Dovgalyuk, Paolo Bonzini,
	Max Reitz

On Fri, Jul 03, 2020 at 05:22:46PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Fri, Jul 03, 2020 at 05:10:12PM +0100, Dr. David Alan Gilbert wrote:
> > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > On Fri, Jul 03, 2020 at 04:49:33PM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > > > On Thu, Jul 02, 2020 at 01:12:52PM -0500, Eric Blake wrote:
> > > > > > > On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:
> > > > > > > > savevm, loadvm and delvm are some of the few commands that have never
> > > > > > > > been converted to use QMP. The primary reason for this lack of
> > > > > > > > conversion is that they block execution of the thread for as long as
> > > > > > > > they run.
> > > > > > > > 
> > > > > > > > Despite this downside, however, libvirt and applications using libvirt
> > > > > > > > has used these commands for as long as QMP has existed, via the
> > > > > > > > "human-monitor-command" passthrough command. IOW, while it is clearly
> > > > > > > > desirable to be able to fix the blocking problem, this is not an
> > > > > > > > immediate obstacle to real world usage.
> > > > > > > > 
> > > > > > > > Meanwhile there is a need for other features which involve adding new
> > > > > > > > parameters to the commands. This is possible with HMP passthrough, but
> > > > > > > > it provides no reliable way for apps to introspect features, so using
> > > > > > > > QAPI modelling is highly desirable.
> > > > > > > > 
> > > > > > > > This patch thus introduces trival savevm, loadvm, delvm commands
> > > > > > > 
> > > > > > > trivial
> > > > > > > 
> > > > > > > > to QMP that are functionally identical to the HMP counterpart, including
> > > > > > > > the blocking problem.
> > > > > > > 
> > > > > > > Should we name them 'x-savevm', 'x-loadvm', 'x-delvm' to give ourselves room
> > > > > > > to change them when we DO solve the blocking issue?  Or will the solution of
> > > > > > > the blocking issue introduce new QMP commands, at which point we can add QMP
> > > > > > > deprecation markers on these commands to eventually retire them?
> > > > > > 
> > > > > > I was in two minds about this, so I'm open to arguments either way.
> > > > > > 
> > > > > > The primary goal is for libvirt to consume the APIs as soon as possible,
> > > > > > and generally libvirt doesn't want todo this is they are declared experimental
> > > > > > via a "x-" prefix. So that pushes me away from "x-".
> > > > > > 
> > > > > > If we don't have an "x-" prefix and want to make changes, we can add extra
> > > > > > parameters to trigger new behaviour in backwards compatible manner. Or we can
> > > > > > simply deprecate these commands, deleting them 2 releases later, while adding
> > > > > > completely new commands.
> > > > > > 
> > > > > > If we think the prposed design will definitely need incompatible changes in
> > > > > > a very short time frame though, that would push towards "x-".
> > > > > > 
> > > > > > So IMHO the right answer largely depends on whether there is a credible
> > > > > > strategy to implement the ideal non-blocking solution in a reasonable amount
> > > > > > of time. I can't justify spending much time on this myself, but I'm willing
> > > > > > to consider & try proposals for solving the blocking problem if they're not
> > > > > > too complex / invasive.
> > > > > 
> > > > > Remind me, what was the problem with just making a block: migration
> > > > > channel, and then we can migrate to it?
> > > > 
> > > > migration only does vmstate, not disks. The current blockdev commands
> > > > are all related to external snapshots, nothing for internal snapshots
> > > > AFAIK. So we still need commands to load/save internal snapshots of
> > > > the disk data in the qcow2 files.
> > > > 
> > > > So we could look at loadvm/savevm conceptually as a syntax sugar above
> > > > a migration transport that targets disk images, and blockdev QMP command
> > > > that can do internal snapshots. Neither of these exist though and feel
> > > > like a significantly larger amount of work than using existing functionality
> > > > that is currently working.
> > > 
> > > I think that's what we should aim for; adding this wrapper isn't gaining
> > > that much without moving a bit towards that; so I would stick with the
> > > x- for now.
> > 
> > The question is how much work that approach will be and whether anyone can
> > realistically commit to doing that ? It looks like a much larger piece of
> > work in both QEMU and libvirt side to do that. I don't want to see us stuck
> > with a x-savevm for years because no one has resource to work on the perfect
> > solution. If we did get a perfect solution at a point in future, we can
> > still deprecate and then remove any "savevm" command we add to QMP.
> 
> I'd at least like to understand that we've got a worklist for it though.
> We've already got qemu_fopen_bdrv - what's actually wrong with that - is
> that enough to do the migrate to a stream (given a tiny amount of
> syntax) ?

It is close. The migration code works with the QEMUFile layer, but in terms
of the monitor commands the current framework expects a QIOChannel based
QEMUFile. It would be possible to add new helpers to work with the bdrv
backed QEMUFile. The ideal, though, would be to create a QIOChannel impl
that is backed by a block device. At that point there would only be a
single QEMUFile impl, based on QIOChannel.

That would be another step closer to unlocking the ability to eliminate the
QEMUFile wrapper entirely. QEMUFile also does I/O buffering for performance;
that could be done in a QIOChannel layer as well, since buffering is a
concept useful for other QIOChannel users.

That's still only the vmstate part though, and a solution is needed for
the internal snapshot handling.
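To caricature the layering I mean (buffering pushed down into a channel
that is itself backed by block storage, rather than living in a
QEMUFile-style wrapper above it), here is a toy model. This is only an
analogy sketch, not QEMU code; every name in it is invented:

```python
import io

class BlockBackedChannel:
    # Toy stand-in for a QIOChannel implementation backed by a block
    # device: writes are buffered in the channel layer and only hit the
    # (pretend) device on flush.
    SECTOR = 512

    def __init__(self, nsectors=1024):
        self.dev = bytearray(nsectors * self.SECTOR)  # pretend block device
        self.buf = io.BytesIO()                       # channel-level buffer

    def write(self, data):
        # Buffering lives here, in the channel, so the layer above
        # (the migration stream writer) needs no wrapper of its own.
        self.buf.write(data)

    def flush(self):
        payload = self.buf.getvalue()
        self.dev[:len(payload)] = payload             # "write to disk"
        return len(payload)

ch = BlockBackedChannel()
ch.write(b"vmstate-header")
ch.write(b"...device state...")
n = ch.flush()
print(n)  # number of bytes landed on the toy 'device'
```

The point of the sketch is only that once the channel owns both the
buffering and the block backend, a single channel-based stream
implementation suffices for sockets and disks alike.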

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands
  2020-07-03 16:49                 ` Daniel P. Berrangé
@ 2020-07-03 17:00                   ` Dr. David Alan Gilbert
  2020-07-03 17:10                     ` Daniel P. Berrangé
  0 siblings, 1 reply; 47+ messages in thread
From: Dr. David Alan Gilbert @ 2020-07-03 17:00 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Kevin Wolf, Peter Krempa, qemu-block, Juan Quintela,
	Markus Armbruster, qemu-devel, Pavel Dovgalyuk, Paolo Bonzini,
	Max Reitz

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Fri, Jul 03, 2020 at 05:22:46PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Fri, Jul 03, 2020 at 05:10:12PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > > On Fri, Jul 03, 2020 at 04:49:33PM +0100, Dr. David Alan Gilbert wrote:
> > > > > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > > > > On Thu, Jul 02, 2020 at 01:12:52PM -0500, Eric Blake wrote:
> > > > > > > > On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:
> > > > > > > > > savevm, loadvm and delvm are some of the few commands that have never
> > > > > > > > > been converted to use QMP. The primary reason for this lack of
> > > > > > > > > conversion is that they block execution of the thread for as long as
> > > > > > > > > they run.
> > > > > > > > > 
> > > > > > > > > Despite this downside, however, libvirt and applications using libvirt
> > > > > > > > > has used these commands for as long as QMP has existed, via the
> > > > > > > > > "human-monitor-command" passthrough command. IOW, while it is clearly
> > > > > > > > > desirable to be able to fix the blocking problem, this is not an
> > > > > > > > > immediate obstacle to real world usage.
> > > > > > > > > 
> > > > > > > > > Meanwhile there is a need for other features which involve adding new
> > > > > > > > > parameters to the commands. This is possible with HMP passthrough, but
> > > > > > > > > it provides no reliable way for apps to introspect features, so using
> > > > > > > > > QAPI modelling is highly desirable.
> > > > > > > > > 
> > > > > > > > > This patch thus introduces trival savevm, loadvm, delvm commands
> > > > > > > > 
> > > > > > > > trivial
> > > > > > > > 
> > > > > > > > > to QMP that are functionally identical to the HMP counterpart, including
> > > > > > > > > the blocking problem.
> > > > > > > > 
> > > > > > > > Should we name them 'x-savevm', 'x-loadvm', 'x-delvm' to give ourselves room
> > > > > > > > to change them when we DO solve the blocking issue?  Or will the solution of
> > > > > > > > the blocking issue introduce new QMP commands, at which point we can add QMP
> > > > > > > > deprecation markers on these commands to eventually retire them?
> > > > > > > 
> > > > > > > I was in two minds about this, so I'm open to arguments either way.
> > > > > > > 
> > > > > > > The primary goal is for libvirt to consume the APIs as soon as possible,
> > > > > > > and generally libvirt doesn't want todo this is they are declared experimental
> > > > > > > via a "x-" prefix. So that pushes me away from "x-".
> > > > > > > 
> > > > > > > If we don't have an "x-" prefix and want to make changes, we can add extra
> > > > > > > parameters to trigger new behaviour in backwards compatible manner. Or we can
> > > > > > > simply deprecate these commands, deleting them 2 releases later, while adding
> > > > > > > completely new commands.
> > > > > > > 
> > > > > > > If we think the prposed design will definitely need incompatible changes in
> > > > > > > a very short time frame though, that would push towards "x-".
> > > > > > > 
> > > > > > > So IMHO the right answer largely depends on whether there is a credible
> > > > > > > strategy to implement the ideal non-blocking solution in a reasonable amount
> > > > > > > of time. I can't justify spending much time on this myself, but I'm willing
> > > > > > > to consider & try proposals for solving the blocking problem if they're not
> > > > > > > too complex / invasive.
> > > > > > 
> > > > > > Remind me, what was the problem with just making a block: migration
> > > > > > channel, and then we can migrate to it?
> > > > > 
> > > > > migration only does vmstate, not disks. The current blockdev commands
> > > > > are all related to external snapshots, nothing for internal snapshots
> > > > > AFAIK. So we still need commands to load/save internal snapshots of
> > > > > the disk data in the qcow2 files.
> > > > > 
> > > > > So we could look at loadvm/savevm conceptually as a syntax sugar above
> > > > > a migration transport that targets disk images, and blockdev QMP command
> > > > > that can do internal snapshots. Neither of these exist though and feel
> > > > > like a significantly larger amount of work than using existing functionality
> > > > > that is currently working.
> > > > 
> > > > I think that's what we should aim for; adding this wrapper isn't gaining
> > > > that much without moving a bit towards that; so I would stick with the
> > > > x- for now.
> > > 
> > > The question is how much work that approach will be and whether anyone can
> > > realistically commit to doing that ? It looks like a much larger piece of
> > > work in both QEMU and libvirt side to do that. I don't want to see us stuck
> > > with a x-savevm for years because no one has resource to work on the perfect
> > > solution. If we did get a perfect solution at a point in future, we can
> > > still deprecate and then remove any "savevm" command we add to QMP.
> > 
> > I'd at least like to understand that we've got a worklist for it though.
> > We've already got qemu_fopen_bdrv - what's actually wrong with that - is
> > that enough to do the migrate to a stream (given a tiny amount of
> > syntax) ?
> 
> It is close. The migration code works with the QEMUFile layer, but in terms
> of the monitor commands the current framework expects a QIOChannel based
> QEMUFile. It would be possible to add new helpers to work with the bdrv
> backed QEMUFile. The ideal would be to create a QIOChannel impl that is
> backed by a block device though. At that point there would only be a single
> QEMUFile impl based on QIOChannel. 
> 
> That would be another step closer to unlocking the ability to eliminate the
> QEMUFile wrapper entirely. QEMUFile does I/O buffering too for performance,
> that could be done in a QIOChannel layer too, as that's a concept useful
> for other QIOChannel users too.

There are some separate patches on the list to do buffering in bdrv for
vmsave, because apparently the QEMUFile ones don't play well with it.

> That's still only the vmstate part though, and a solution is needed for
> the internal snapshot handling.

Which bit is the bit that makes it block?

Dave

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands
  2020-07-03 17:00                   ` Dr. David Alan Gilbert
@ 2020-07-03 17:10                     ` Daniel P. Berrangé
  2020-07-03 17:26                       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel P. Berrangé @ 2020-07-03 17:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Peter Krempa, qemu-block, Juan Quintela,
	Markus Armbruster, qemu-devel, Pavel Dovgalyuk, Paolo Bonzini,
	Max Reitz

On Fri, Jul 03, 2020 at 06:00:50PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Fri, Jul 03, 2020 at 05:22:46PM +0100, Dr. David Alan Gilbert wrote:
> > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > On Fri, Jul 03, 2020 at 05:10:12PM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > > > On Fri, Jul 03, 2020 at 04:49:33PM +0100, Dr. David Alan Gilbert wrote:
> > > > > > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > > > > > On Thu, Jul 02, 2020 at 01:12:52PM -0500, Eric Blake wrote:
> > > > > > > > > On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:
> > > > > > > > > > savevm, loadvm and delvm are some of the few commands that have never
> > > > > > > > > > been converted to use QMP. The primary reason for this lack of
> > > > > > > > > > conversion is that they block execution of the thread for as long as
> > > > > > > > > > they run.
> > > > > > > > > > 
> > > > > > > > > > Despite this downside, however, libvirt and applications using libvirt
> > > > > > > > > > has used these commands for as long as QMP has existed, via the
> > > > > > > > > > "human-monitor-command" passthrough command. IOW, while it is clearly
> > > > > > > > > > desirable to be able to fix the blocking problem, this is not an
> > > > > > > > > > immediate obstacle to real world usage.
> > > > > > > > > > 
> > > > > > > > > > Meanwhile there is a need for other features which involve adding new
> > > > > > > > > > parameters to the commands. This is possible with HMP passthrough, but
> > > > > > > > > > it provides no reliable way for apps to introspect features, so using
> > > > > > > > > > QAPI modelling is highly desirable.
> > > > > > > > > > 
> > > > > > > > > > This patch thus introduces trival savevm, loadvm, delvm commands
> > > > > > > > > 
> > > > > > > > > trivial
> > > > > > > > > 
> > > > > > > > > > to QMP that are functionally identical to the HMP counterpart, including
> > > > > > > > > > the blocking problem.
> > > > > > > > > 
> > > > > > > > > Should we name them 'x-savevm', 'x-loadvm', 'x-delvm' to give ourselves room
> > > > > > > > > to change them when we DO solve the blocking issue?  Or will the solution of
> > > > > > > > > the blocking issue introduce new QMP commands, at which point we can add QMP
> > > > > > > > > deprecation markers on these commands to eventually retire them?
> > > > > > > > 
> > > > > > > > I was in two minds about this, so I'm open to arguments either way.
> > > > > > > > 
> > > > > > > > The primary goal is for libvirt to consume the APIs as soon as possible,
> > > > > > > > and generally libvirt doesn't want todo this is they are declared experimental
> > > > > > > > via a "x-" prefix. So that pushes me away from "x-".
> > > > > > > > 
> > > > > > > > If we don't have an "x-" prefix and want to make changes, we can add extra
> > > > > > > > parameters to trigger new behaviour in backwards compatible manner. Or we can
> > > > > > > > simply deprecate these commands, deleting them 2 releases later, while adding
> > > > > > > > completely new commands.
> > > > > > > > 
> > > > > > > > If we think the prposed design will definitely need incompatible changes in
> > > > > > > > a very short time frame though, that would push towards "x-".
> > > > > > > > 
> > > > > > > > So IMHO the right answer largely depends on whether there is a credible
> > > > > > > > strategy to implement the ideal non-blocking solution in a reasonable amount
> > > > > > > > of time. I can't justify spending much time on this myself, but I'm willing
> > > > > > > > to consider & try proposals for solving the blocking problem if they're not
> > > > > > > > too complex / invasive.
> > > > > > > 
> > > > > > > Remind me, what was the problem with just making a block: migration
> > > > > > > channel, and then we can migrate to it?
> > > > > > 
> > > > > > migration only does vmstate, not disks. The current blockdev commands
> > > > > > are all related to external snapshots, nothing for internal snapshots
> > > > > > AFAIK. So we still need commands to load/save internal snapshots of
> > > > > > the disk data in the qcow2 files.
> > > > > > 
> > > > > > So we could look at loadvm/savevm conceptually as a syntax sugar above
> > > > > > a migration transport that targets disk images, and blockdev QMP command
> > > > > > that can do internal snapshots. Neither of these exist though and feel
> > > > > > like a significantly larger amount of work than using existing functionality
> > > > > > that is currently working.
> > > > > 
> > > > > I think that's what we should aim for; adding this wrapper isn't gaining
> > > > > that much without moving a bit towards that; so I would stick with the
> > > > > x- for now.
> > > > 
> > > > The question is how much work that approach will be and whether anyone can
> > > > realistically commit to doing that ? It looks like a much larger piece of
> > > > work in both QEMU and libvirt side to do that. I don't want to see us stuck
> > > > with a x-savevm for years because no one has resource to work on the perfect
> > > > solution. If we did get a perfect solution at a point in future, we can
> > > > still deprecate and then remove any "savevm" command we add to QMP.
> > > 
> > > I'd at least like to understand that we've got a worklist for it though.
> > > We've already got qemu_fopen_bdrv - what's actually wrong with that - is
> > > that enough to do the migrate to a stream (given a tiny amount of
> > > syntax) ?
> > 
> > It is close. The migration code works with the QEMUFile layer, but in terms
> > of the monitor commands the current framework expects a QIOChannel based
> > QEMUFile. It would be possible to add new helpers to work with the bdrv
> > backed QEMUFile. The ideal would be to create a QIOChannel impl that is
> > backed by a block device though. At that point there would only be a single
> > QEMUFile impl based on QIOChannel. 
> > 
> > That would be another step closer to unlocking the ability to eliminate the
> > QEMUFile wrapper entirely. QEMUFile does I/O buffering too for performance,
> > that could be done in a QIOChannel layer too, as that's a concept useful
> > for other QIOChannel users too.
> 
> There's some separate patches on list to do buffering in bdrv for
> vmsave because apparently the qemufile ones don't play well with it.
> 
> > That's still only the vmstate part though, and a solution is needed for
> > the internal snapshot handling.
> 
> Which bit is the bit that makes it block?

The entire save_snapshot / load_snapshot methods, really. They do I/O
operations throughout, and this executes in the context of the thread running
the monitor, so their execution time is proportional to I/O time.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP
  2020-07-02 17:57 [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP Daniel P. Berrangé
                   ` (7 preceding siblings ...)
  2020-07-02 19:07 ` no-reply
@ 2020-07-03 17:15 ` Denis V. Lunev
  2020-07-03 17:22   ` Daniel P. Berrangé
  8 siblings, 1 reply; 47+ messages in thread
From: Denis V. Lunev @ 2020-07-03 17:15 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Kevin Wolf, Peter Krempa, qemu-block, Juan Quintela,
	Markus Armbruster, Dr. David Alan Gilbert, Pavel Dovgalyuk,
	Paolo Bonzini, Max Reitz

On 7/2/20 8:57 PM, Daniel P. Berrangé wrote:
> When QMP was first introduced some 10+ years ago now, the snapshot
> related commands (savevm/loadvm/delvm) were not converted. This was
> primarily because their implementation causes blocking of the thread
> running the monitor commands. This was (and still is) considered
> undesirable behaviour both in HMP and QMP.
>
> In theory someone was supposed to fix this flaw at some point in the
> past 10 years and bring them into the QMP world. Sadly, thus far it
> hasn't happened as people always had more important things to work
> on. Enterprise apps were much more interested in external snapshots
> than internal snapshots as they have many more features.
>
> Meanwhile users still want to use internal snapshots as there is
> a certainly simplicity in having everything self-contained in one
> image, even though it has limitations. Thus the apps that end up
> executing the savevm/loadvm/delvm via the "human-monitor-command"
> QMP command.
>
>
> IOW, the problematic blocking behaviour that was one of the reasons
> for not having savevm/loadvm/delvm in QMP is experienced by applications
> regardless. By not portting the commands to QMP due to one design flaw,
> we've forced apps and users to suffer from other design flaws of HMP (
> bad error reporting, strong type checking of args, no introspection) for
> an additional 10 years. This feels rather sub-optimal :-(
>
> In practice users don't appear to care strongly about the fact that these
> commands block the VM while they run. I might have seen one bug report
> about it, but it certainly isn't something that comes up as a frequent
> topic except among us QEMU maintainers. Users do care about having
> access to the snapshot feature.
>
> Where I am seeing frequent complaints is wrt the use of OVMF combined
> with snapshots which has some serious pain points. This is getting worse
> as the push to ditch legacy BIOS in favour of UEFI gain momentum both
> across OS vendors and mgmt apps. Solving it requires new parameters to
> the commands, but doing this in HMP is super unappealing.
>
>
>
> After 10 years, I think it is time for us to be a little pragmatic about
> our handling of snapshots commands. My desire is that libvirt should never
> use "human-monitor-command" under any circumstances, because of the
> inherant flaws in HMP as a protocol for machine consumption. If there
> are flaws in QMP commands that's fine. If we fix them in future, we can
> deprecate the current QMP commands and remove them not too long after,
> without being locked in forever.
>
>
> Thus in this series I'm proposing a direct 1-1 mapping of the existing
> HMP commands for savevm/loadvm/delvm into QMP as a first step. This does
> not solve the blocking thread problem, but it does eliminate the error
> reporting, type checking and introspection problems inherant to HMP.
> We're winning on 3 out of the 4 long term problems.
>
> If someone can suggest a easy way to fix the thread blocking problem
> too, I'd be interested to hear it. If it involves a major refactoring
> then I think user are better served by unlocking what look like easy
> wins today.
>
> With a QMP variant, we reasonably deal with the problems related to OVMF:
>
>  - The logic to pick which disk to store the vmstate in is not
>    satsifactory.
>
>    The first block driver state cannot be assumed to be the root disk
>    image, it might be OVMF varstore and we don't want to store vmstate
>    in there.
>
>  - The logic to decide which disks must be snapshotted is hardwired
>    to all disks which are writable
>
>    Again with OVMF there might be a writable varstore, but this can be
>    raw rather than qcow2 format, and thus unable to be snapshotted.
>    While users might wish to snapshot their varstore, in some/many/most
>    cases it is entirely uneccessary. Users are blocked from snapshotting
>    their VM though due to this varstore.
>
> These are solved by adding two parameters to the commands. The first is
> a block device node name that identifies the image to store vmstate in,
> and the second is a list of node names to exclude from snapshots.
>
> In the block code I've only dealt with node names for block devices, as
> IIUC, this is all that libvirt should need in the -blockdev world it now
> lives in. IOW, I've made not attempt to cope with people wanting to use
> these QMP commands in combination with -drive args.
>
> I've done some minimal work in libvirt to start to make use of the new
> commands to validate their functionality, but this isn't finished yet.
>
> My ultimate goal is to make the GNOME Boxes maintainer happy again by
> having internal snapshots work with OVMF:
>
>   https://gitlab.gnome.org/GNOME/gnome-boxes/-/commit/c486da262f6566326fbcb5ef45c5f64048f16a6e
>
> Daniel P. Berrangé (6):
>   migration: improve error reporting of block driver state name
>   migration: introduce savevm, loadvm, delvm QMP commands
>   block: add ability to filter out blockdevs during snapshot
>   block: allow specifying name of block device for vmstate storage
>   migration: support excluding block devs in QMP snapshot commands
>   migration: support picking vmstate disk in QMP snapshot commands
>
>  block/monitor/block-hmp-cmds.c |  4 +-
>  block/snapshot.c               | 68 +++++++++++++++++++------
>  include/block/snapshot.h       | 21 +++++---
>  include/migration/snapshot.h   | 10 +++-
>  migration/savevm.c             | 71 +++++++++++++++++++-------
>  monitor/hmp-cmds.c             | 20 ++------
>  qapi/migration.json            | 91 ++++++++++++++++++++++++++++++++++
>  replay/replay-snapshot.c       |  4 +-
>  softmmu/vl.c                   |  2 +-
>  9 files changed, 228 insertions(+), 63 deletions(-)
>
> -- 
> 2.26.2
>
>
>
I tried to work on this interface in 2016. At that time
we struggled with the idea that this QMP interface should
be ready to work asynchronously.

Write-protect userfaultfd has now been merged into vanilla Linux,
so it is time for an async savevm interface, which will also
bring async loadvm and some rework of state storing.

Thus I think that with the introduction of the QMP interface
we should at least run savevm not from the main
thread but in the background, with an event at the end.
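
To make that concrete, the client-visible flow could look something like
this (the event name and its payload below are purely illustrative, not a
settled design):

```
-> { "execute": "savevm", "arguments": { "tag": "snap1" } }
<- { "return": {} }
   ... guest keeps running while state is written in the background ...
<- { "event": "SAVEVM_COMPLETED",
     "data": { "tag": "snap1" },
     "timestamp": { "seconds": 1593795000, "microseconds": 0 } }
```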

Den


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP
  2020-07-03 17:15 ` Denis V. Lunev
@ 2020-07-03 17:22   ` Daniel P. Berrangé
  2020-07-03 17:29     ` Denis V. Lunev
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel P. Berrangé @ 2020-07-03 17:22 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: Kevin Wolf, Peter Krempa, qemu-block, Juan Quintela,
	Markus Armbruster, qemu-devel, Pavel Dovgalyuk, Paolo Bonzini,
	Max Reitz, Dr. David Alan Gilbert

On Fri, Jul 03, 2020 at 08:15:44PM +0300, Denis V. Lunev wrote:
> On 7/2/20 8:57 PM, Daniel P. Berrangé wrote:
> > When QMP was first introduced some 10+ years ago now, the snapshot
> > related commands (savevm/loadvm/delvm) were not converted. This was
> > primarily because their implementation causes blocking of the thread
> > running the monitor commands. This was (and still is) considered
> > undesirable behaviour both in HMP and QMP.
> >
> > In theory someone was supposed to fix this flaw at some point in the
> > past 10 years and bring them into the QMP world. Sadly, thus far it
> > hasn't happened as people always had more important things to work
> > on. Enterprise apps were much more interested in external snapshots
> > than internal snapshots as they have many more features.
> >
> > Meanwhile users still want to use internal snapshots as there is
> > a certainly simplicity in having everything self-contained in one
> > image, even though it has limitations. Thus the apps that end up
> > executing the savevm/loadvm/delvm via the "human-monitor-command"
> > QMP command.
> >
> >
> > IOW, the problematic blocking behaviour that was one of the reasons
> > for not having savevm/loadvm/delvm in QMP is experienced by applications
> > regardless. By not portting the commands to QMP due to one design flaw,
> > we've forced apps and users to suffer from other design flaws of HMP (
> > bad error reporting, strong type checking of args, no introspection) for
> > an additional 10 years. This feels rather sub-optimal :-(
> >
> > In practice users don't appear to care strongly about the fact that these
> > commands block the VM while they run. I might have seen one bug report
> > about it, but it certainly isn't something that comes up as a frequent
> > topic except among us QEMU maintainers. Users do care about having
> > access to the snapshot feature.
> >
> > Where I am seeing frequent complaints is wrt the use of OVMF combined
> > with snapshots which has some serious pain points. This is getting worse
> > as the push to ditch legacy BIOS in favour of UEFI gain momentum both
> > across OS vendors and mgmt apps. Solving it requires new parameters to
> > the commands, but doing this in HMP is super unappealing.
> >
> >
> >
> > After 10 years, I think it is time for us to be a little pragmatic about
> > our handling of snapshots commands. My desire is that libvirt should never
> > use "human-monitor-command" under any circumstances, because of the
> > inherant flaws in HMP as a protocol for machine consumption. If there
> > are flaws in QMP commands that's fine. If we fix them in future, we can
> > deprecate the current QMP commands and remove them not too long after,
> > without being locked in forever.
> >
> >
> > Thus in this series I'm proposing a direct 1-1 mapping of the existing
> > HMP commands for savevm/loadvm/delvm into QMP as a first step. This does
> > not solve the blocking thread problem, but it does eliminate the error
> > reporting, type checking and introspection problems inherant to HMP.
> > We're winning on 3 out of the 4 long term problems.
> >
> > If someone can suggest a easy way to fix the thread blocking problem
> > too, I'd be interested to hear it. If it involves a major refactoring
> > then I think user are better served by unlocking what look like easy
> > wins today.
> >
> > With a QMP variant, we reasonably deal with the problems related to OVMF:
> >
> >  - The logic to pick which disk to store the vmstate in is not
> >    satsifactory.
> >
> >    The first block driver state cannot be assumed to be the root disk
> >    image, it might be OVMF varstore and we don't want to store vmstate
> >    in there.
> >
> >  - The logic to decide which disks must be snapshotted is hardwired
> >    to all disks which are writable
> >
> >    Again with OVMF there might be a writable varstore, but this can be
> >    raw rather than qcow2 format, and thus unable to be snapshotted.
> >    While users might wish to snapshot their varstore, in some/many/most
> >    cases it is entirely uneccessary. Users are blocked from snapshotting
> >    their VM though due to this varstore.
> >
> > These are solved by adding two parameters to the commands. The first is
> > a block device node name that identifies the image to store vmstate in,
> > and the second is a list of node names to exclude from snapshots.
> >
> > In the block code I've only dealt with node names for block devices, as
> > IIUC, this is all that libvirt should need in the -blockdev world it now
> > lives in. IOW, I've made not attempt to cope with people wanting to use
> > these QMP commands in combination with -drive args.
> >
> > I've done some minimal work in libvirt to start to make use of the new
> > commands to validate their functionality, but this isn't finished yet.
> >
> > My ultimate goal is to make the GNOME Boxes maintainer happy again by
> > having internal snapshots work with OVMF:
> >
> >   https://gitlab.gnome.org/GNOME/gnome-boxes/-/commit/c486da262f6566326fbcb5ef45c5f64048f16a6e
> >
> > Daniel P. Berrangé (6):
> >   migration: improve error reporting of block driver state name
> >   migration: introduce savevm, loadvm, delvm QMP commands
> >   block: add ability to filter out blockdevs during snapshot
> >   block: allow specifying name of block device for vmstate storage
> >   migration: support excluding block devs in QMP snapshot commands
> >   migration: support picking vmstate disk in QMP snapshot commands
> >
> >  block/monitor/block-hmp-cmds.c |  4 +-
> >  block/snapshot.c               | 68 +++++++++++++++++++------
> >  include/block/snapshot.h       | 21 +++++---
> >  include/migration/snapshot.h   | 10 +++-
> >  migration/savevm.c             | 71 +++++++++++++++++++-------
> >  monitor/hmp-cmds.c             | 20 ++------
> >  qapi/migration.json            | 91 ++++++++++++++++++++++++++++++++++
> >  replay/replay-snapshot.c       |  4 +-
> >  softmmu/vl.c                   |  2 +-
> >  9 files changed, 228 insertions(+), 63 deletions(-)
> 
> I tried to work on this interface in 2016. At that time
> we struggled with the idea that this QMP interface should
> be ready to work asynchronously.
> 
> Write-protect userfaultfd has now been merged into vanilla Linux,
> so it is time for an async savevm interface, which will also
> bring async loadvm and some rework of state storing.
> 
> Thus I think that with the introduction of the QMP interface
> we should at least run savevm not from the main
> thread but in the background, with an event at the end.

spawning a thread in which to invoke save_snapshot() and load_snapshot()
is easy enough.  I'm not at all clear on what we need in the way of
mutex locking though, to make those methods safe to run in a thread
that isn't the main event loop.

Even with savevm/loadvm being blocking, we could introduce a QMP event
straight away, and document that users shouldn't assume the operation
is complete until they see the event. That would let us make the commands
non-blocking later with the same documented semantics.
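
Something along these lines in qapi/migration.json would be enough to
reserve those semantics now (the event name and members below are
placeholders to illustrate the idea, not a finished design):

```
##
# @SNAPSHOT_COMPLETED:
#
# Emitted when a savevm/loadvm/delvm operation has finished.
# Illustrative sketch only: the name and members are placeholders.
#
# @tag: name of the snapshot that was operated on
#
# @error: error message, only present if the operation failed
##
{ 'event': 'SNAPSHOT_COMPLETED',
  'data': { 'tag': 'str', '*error': 'str' } }
```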

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands
  2020-07-02 17:57 ` [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands Daniel P. Berrangé
  2020-07-02 18:12   ` Eric Blake
@ 2020-07-03 17:22   ` Denis V. Lunev
  1 sibling, 0 replies; 47+ messages in thread
From: Denis V. Lunev @ 2020-07-03 17:22 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Kevin Wolf, Peter Krempa, qemu-block, Juan Quintela,
	Markus Armbruster, Dr. David Alan Gilbert, Pavel Dovgalyuk,
	Paolo Bonzini, Max Reitz

On 7/2/20 8:57 PM, Daniel P. Berrangé wrote:
> savevm, loadvm and delvm are some of the few commands that have never
> been converted to use QMP. The primary reason for this lack of
> conversion is that they block execution of the thread for as long as
> they run.
>
> Despite this downside, however, libvirt and applications using libvirt
> has used these commands for as long as QMP has existed, via the
> "human-monitor-command" passthrough command. IOW, while it is clearly
> desirable to be able to fix the blocking problem, this is not an
> immediate obstacle to real world usage.
>
> Meanwhile there is a need for other features which involve adding new
> parameters to the commands. This is possible with HMP passthrough, but
> it provides no reliable way for apps to introspect features, so using
> QAPI modelling is highly desirable.
>
> This patch thus introduces trival savevm, loadvm, delvm commands
> to QMP that are functionally identical to the HMP counterpart, including
> the blocking problem.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  migration/savevm.c  | 27 ++++++++++++++++
>  monitor/hmp-cmds.c  | 18 ++---------
>  qapi/migration.json | 76 +++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 106 insertions(+), 15 deletions(-)
>
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 72dbad95ed..53586a6406 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2943,3 +2943,30 @@ bool vmstate_check_only_migratable(const VMStateDescription *vmsd)
>  
>      return !(vmsd && vmsd->unmigratable);
>  }
> +
> +void qmp_savevm(const char *tag, Error **errp)
> +{
> +    save_snapshot(tag, errp);
> +}
> +
> +void qmp_loadvm(const char *tag, Error **errp)
> +{
> +    int saved_vm_running  = runstate_is_running();
> +
> +    vm_stop(RUN_STATE_RESTORE_VM);
> +
> +    if (load_snapshot(tag, errp) == 0 && saved_vm_running) {
> +        vm_start();
> +    }
> +}
> +
> +void qmp_delvm(const char *tag, Error **errp)
> +{
> +    BlockDriverState *bs;
> +
> +    if (bdrv_all_delete_snapshot(tag, &bs, errp) < 0) {
> +        error_prepend(errp,
> +                      "deleting snapshot on device '%s': ",
> +                      bdrv_get_device_or_node_name(bs));
> +    }
> +}
> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> index 2b0b58a336..26a5a1a701 100644
> --- a/monitor/hmp-cmds.c
> +++ b/monitor/hmp-cmds.c
> @@ -1089,15 +1089,9 @@ void hmp_balloon(Monitor *mon, const QDict *qdict)
>  
>  void hmp_loadvm(Monitor *mon, const QDict *qdict)
>  {
> -    int saved_vm_running  = runstate_is_running();
> -    const char *name = qdict_get_str(qdict, "name");
>      Error *err = NULL;
>  
> -    vm_stop(RUN_STATE_RESTORE_VM);
> -
> -    if (load_snapshot(name, &err) == 0 && saved_vm_running) {
> -        vm_start();
> -    }
> +    qmp_loadvm(qdict_get_str(qdict, "name"), &err);
>      hmp_handle_error(mon, err);
>  }
>  
> @@ -1105,21 +1099,15 @@ void hmp_savevm(Monitor *mon, const QDict *qdict)
>  {
>      Error *err = NULL;
>  
> -    save_snapshot(qdict_get_try_str(qdict, "name"), &err);
> +    qmp_savevm(qdict_get_try_str(qdict, "name"), &err);
>      hmp_handle_error(mon, err);
>  }
>  
>  void hmp_delvm(Monitor *mon, const QDict *qdict)
>  {
> -    BlockDriverState *bs;
>      Error *err = NULL;
> -    const char *name = qdict_get_str(qdict, "name");
>  
> -    if (bdrv_all_delete_snapshot(name, &bs, &err) < 0) {
> -        error_prepend(&err,
> -                      "deleting snapshot on device '%s': ",
> -                      bdrv_get_device_name(bs));
> -    }
> +    qmp_delvm(qdict_get_str(qdict, "name"), &err);
>      hmp_handle_error(mon, err);
>  }
>  
> diff --git a/qapi/migration.json b/qapi/migration.json
> index d5000558c6..849de38fb0 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1621,3 +1621,79 @@
>  ##
>  { 'event': 'UNPLUG_PRIMARY',
>    'data': { 'device-id': 'str' } }
> +
> +##
> +# @savevm:
> +#
> +# Save a VM snapshot
> +#
> +# @tag: name of the snapshot to create. If it already
> +# exists it will be replaced.
> +#
> +# Note that execution of the VM will be paused during the time
> +# it takes to save the snapshot
> +#
> +# Returns: nothing
> +#
> +# Example:
> +#
> +# -> { "execute": "savevm",
> +#      "data": {
> +#         "tag": "my-snap"
> +#      }
> +#    }
> +# <- { "return": { } }
> +#
> +# Since: 5.2
> +##
> +{ 'command': 'savevm',
> +  'data': { 'tag': 'str' } }
> +
> +##
> +# @loadvm:
> +#
> +# Load a VM snapshot
> +#
> +# @tag: name of the snapshot to load.
> +#
> +# Returns: nothing
> +#
> +# Example:
> +#
> +# -> { "execute": "loadvm",
> +#      "data": {
> +#         "tag": "my-snap"
> +#      }
> +#    }
> +# <- { "return": { } }
> +#
> +# Since: 5.2
> +##
> +{ 'command': 'loadvm',
> +  'data': { 'tag': 'str' } }
> +
> +##
> +# @delvm:
> +#
> +# Delete a VM snapshot
> +#
> +# @tag: name of the snapshot to delete.
> +#
> +# Note that execution of the VM will be paused during the time
> +# it takes to delete the snapshot
> +#
> +# Returns: nothing
> +#
> +# Example:
> +#
> +# -> { "execute": "delvm",
> +#      "data": {
> +#         "tag": "my-snap"
> +#      }
> +#    }
> +# <- { "return": { } }
> +#
> +# Since: 5.2
> +##
> +{ 'command': 'delvm',
> +  'data': { 'tag': 'str' } }
The proposed interface is not good from an API design point of view.
The problem is that savevm and loadvm are VERY lengthy operations,
especially for a VM with a big amount of RAM. They can run
for hours, e.g. with 1 TB of RAM.
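
As a back-of-envelope illustration (the throughput figure is an assumption,
not a measurement), streaming 1 TiB of guest RAM to an image that sustains
200 MB/s already takes on the order of an hour and a half:

```python
# Illustrative estimate only: time to write 1 TiB of RAM to disk
# at an assumed sustained write rate of 200 MB/s.
ram_bytes = 1 * 1024**4          # 1 TiB of guest RAM
write_rate = 200 * 10**6         # assumed 200 MB/s to the image file
seconds = ram_bytes / write_rate
print(round(seconds / 3600, 1))  # hours
```

With slower storage, or RAM that keeps being dirtied, the real figure is
worse, which is why blocking the monitor for the whole duration hurts.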

This "feature" was present in the old legacy HMP interface, but
it should be fixed in the new, clean one.

Den


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands
  2020-07-03 17:10                     ` Daniel P. Berrangé
@ 2020-07-03 17:26                       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 47+ messages in thread
From: Dr. David Alan Gilbert @ 2020-07-03 17:26 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Kevin Wolf, Peter Krempa, qemu-block, Juan Quintela,
	Markus Armbruster, qemu-devel, Pavel Dovgalyuk, Paolo Bonzini,
	Max Reitz

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Fri, Jul 03, 2020 at 06:00:50PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Fri, Jul 03, 2020 at 05:22:46PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > > On Fri, Jul 03, 2020 at 05:10:12PM +0100, Dr. David Alan Gilbert wrote:
> > > > > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > > > > On Fri, Jul 03, 2020 at 04:49:33PM +0100, Dr. David Alan Gilbert wrote:
> > > > > > > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > > > > > > On Thu, Jul 02, 2020 at 01:12:52PM -0500, Eric Blake wrote:
> > > > > > > > > > On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:
> > > > > > > > > > > savevm, loadvm and delvm are some of the few commands that have never
> > > > > > > > > > > been converted to use QMP. The primary reason for this lack of
> > > > > > > > > > > conversion is that they block execution of the thread for as long as
> > > > > > > > > > > they run.
> > > > > > > > > > > 
> > > > > > > > > > > Despite this downside, however, libvirt and applications using libvirt
> > > > > > > > > > > has used these commands for as long as QMP has existed, via the
> > > > > > > > > > > "human-monitor-command" passthrough command. IOW, while it is clearly
> > > > > > > > > > > desirable to be able to fix the blocking problem, this is not an
> > > > > > > > > > > immediate obstacle to real world usage.
> > > > > > > > > > > 
> > > > > > > > > > > Meanwhile there is a need for other features which involve adding new
> > > > > > > > > > > parameters to the commands. This is possible with HMP passthrough, but
> > > > > > > > > > > it provides no reliable way for apps to introspect features, so using
> > > > > > > > > > > QAPI modelling is highly desirable.
> > > > > > > > > > > 
> > > > > > > > > > > This patch thus introduces trival savevm, loadvm, delvm commands
> > > > > > > > > > 
> > > > > > > > > > trivial
> > > > > > > > > > 
> > > > > > > > > > > to QMP that are functionally identical to the HMP counterpart, including
> > > > > > > > > > > the blocking problem.
> > > > > > > > > > 
> > > > > > > > > > Should we name them 'x-savevm', 'x-loadvm', 'x-delvm' to give ourselves room
> > > > > > > > > > to change them when we DO solve the blocking issue?  Or will the solution of
> > > > > > > > > > the blocking issue introduce new QMP commands, at which point we can add QMP
> > > > > > > > > > deprecation markers on these commands to eventually retire them?
> > > > > > > > > 
> > > > > > > > > I was in two minds about this, so I'm open to arguments either way.
> > > > > > > > > 
> > > > > > > > > The primary goal is for libvirt to consume the APIs as soon as possible,
> > > > > > > > > and generally libvirt doesn't want todo this is they are declared experimental
> > > > > > > > > via a "x-" prefix. So that pushes me away from "x-".
> > > > > > > > > 
> > > > > > > > > If we don't have an "x-" prefix and want to make changes, we can add extra
> > > > > > > > > parameters to trigger new behaviour in backwards compatible manner. Or we can
> > > > > > > > > simply deprecate these commands, deleting them 2 releases later, while adding
> > > > > > > > > completely new commands.
> > > > > > > > > 
> > > > > > > > > If we think the prposed design will definitely need incompatible changes in
> > > > > > > > > a very short time frame though, that would push towards "x-".
> > > > > > > > > 
> > > > > > > > > So IMHO the right answer largely depends on whether there is a credible
> > > > > > > > > strategy to implement the ideal non-blocking solution in a reasonable amount
> > > > > > > > > of time. I can't justify spending much time on this myself, but I'm willing
> > > > > > > > > to consider & try proposals for solving the blocking problem if they're not
> > > > > > > > > too complex / invasive.
> > > > > > > > 
> > > > > > > > Remind me, what was the problem with just making a block: migration
> > > > > > > > channel, and then we can migrate to it?
> > > > > > > 
> > > > > > > migration only does vmstate, not disks. The current blockdev commands
> > > > > > > are all related to external snapshots, nothing for internal snapshots
> > > > > > > AFAIK. So we still need commands to load/save internal snapshots of
> > > > > > > the disk data in the qcow2 files.
> > > > > > > 
> > > > > > > So we could look at loadvm/savevm conceptually as a syntax sugar above
> > > > > > > a migration transport that targets disk images, and blockdev QMP command
> > > > > > > that can do internal snapshots. Neither of these exist though and feel
> > > > > > > like a significantly larger amount of work than using existing functionality
> > > > > > > that is currently working.
> > > > > > 
> > > > > > I think that's what we should aim for; adding this wrapper isn't gaining
> > > > > > that much without moving a bit towards that; so I would stick with the
> > > > > > x- for now.
> > > > > 
> > > > > The question is how much work that approach will be and whether anyone can
> > > > > realistically commit to doing that ? It looks like a much larger piece of
> > > > > work in both QEMU and libvirt side to do that. I don't want to see us stuck
> > > > > with a x-savevm for years because no one has resource to work on the perfect
> > > > > solution. If we did get a perfect solution at a point in future, we can
> > > > > still deprecate and then remove any "savevm" command we add to QMP.
> > > > 
> > > > I'd at least like to understand that we've got a worklist for it though.
> > > > We've already got qemu_fopen_bdrv - what's actually wrong with that - is
> > > > that enough to do the migrate to a stream (given a tiny amount of
> > > > syntax) ?
> > > 
> > > It is close. The migration code works with the QEMUFile layer, but in terms
> > > of the monitor commands the current framework expects a QIOChannel based
> > > QEMUFile. It would be possible to add new helpers to work with the bdrv
> > > backed QEMUFile. The ideal would be to create a QIOChannel impl that is
> > > backed by a block device though. At that point there would only be a single
> > > QEMUFile impl based on QIOChannel. 
> > > 
> > > That would be another step closer to unlocking the ability to eliminate the
> > > QEMUFile wrapper entirely. QEMUFile does I/O buffering too for performance,
> > > that could be done in a QIOChannel layer too, as that's a concept useful
> > > for other QIOChannel users too.
> > 
> > There's some separate patches on list to do buffering in bdrv for
> > vmsave because apparently the qemufile ones don't play well with it.
> > 
> > > That's still only the vmstate part though, and a solution is needed for
> > > the internal snapshot handling.
> > 
> > Which bit is the bit that makes it block?
> 
> The entire save_snapshot / load_snapshot methods, really. They do I/O
> operations throughout, and this executes in the context of the thread running
> the monitor, so their execution time is proportional to I/O time.

Is there any reason for the migrationy bit to run synchronously then?

Could those snapshotting things be done with any of the block copy
processes that run asynchronously?

Dave


> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP
  2020-07-03 17:22   ` Daniel P. Berrangé
@ 2020-07-03 17:29     ` Denis V. Lunev
  2020-07-06 14:28       ` Daniel P. Berrangé
  2020-07-06 15:27       ` Kevin Wolf
  0 siblings, 2 replies; 47+ messages in thread
From: Denis V. Lunev @ 2020-07-03 17:29 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Kevin Wolf, Peter Krempa, qemu-block, Juan Quintela,
	Markus Armbruster, qemu-devel, Pavel Dovgalyuk, Paolo Bonzini,
	Max Reitz, Dr. David Alan Gilbert

On 7/3/20 8:22 PM, Daniel P. Berrangé wrote:
> On Fri, Jul 03, 2020 at 08:15:44PM +0300, Denis V. Lunev wrote:
>> On 7/2/20 8:57 PM, Daniel P. Berrangé wrote:
>>> When QMP was first introduced some 10+ years ago now, the snapshot
>>> related commands (savevm/loadvm/delvm) were not converted. This was
>>> primarily because their implementation causes blocking of the thread
>>> running the monitor commands. This was (and still is) considered
>>> undesirable behaviour both in HMP and QMP.
>>>
>>> In theory someone was supposed to fix this flaw at some point in the
>>> past 10 years and bring them into the QMP world. Sadly, thus far it
>>> hasn't happened as people always had more important things to work
>>> on. Enterprise apps were much more interested in external snapshots
>>> than internal snapshots as they have many more features.
>>>
>>> Meanwhile users still want to use internal snapshots as there is
>>> a certain simplicity in having everything self-contained in one
>>> image, even though it has limitations. Thus apps end up executing
>>> savevm/loadvm/delvm via the "human-monitor-command" QMP command.
>>>
>>>
>>> IOW, the problematic blocking behaviour that was one of the reasons
>>> for not having savevm/loadvm/delvm in QMP is experienced by applications
>>> regardless. By not porting the commands to QMP due to one design flaw,
>>> we've forced apps and users to suffer from other design flaws of HMP
>>> (bad error reporting, no strong type checking of args, no introspection)
>>> for an additional 10 years. This feels rather sub-optimal :-(
>>>
>>> In practice users don't appear to care strongly about the fact that these
>>> commands block the VM while they run. I might have seen one bug report
>>> about it, but it certainly isn't something that comes up as a frequent
>>> topic except among us QEMU maintainers. Users do care about having
>>> access to the snapshot feature.
>>>
>>> Where I am seeing frequent complaints is wrt the use of OVMF combined
>>> with snapshots which has some serious pain points. This is getting worse
>>> as the push to ditch legacy BIOS in favour of UEFI gains momentum both
>>> across OS vendors and mgmt apps. Solving it requires new parameters to
>>> the commands, but doing this in HMP is super unappealing.
>>>
>>>
>>>
>>> After 10 years, I think it is time for us to be a little pragmatic about
>>> our handling of snapshots commands. My desire is that libvirt should never
>>> use "human-monitor-command" under any circumstances, because of the
>>> inherent flaws in HMP as a protocol for machine consumption. If there
>>> are flaws in QMP commands that's fine. If we fix them in future, we can
>>> deprecate the current QMP commands and remove them not too long after,
>>> without being locked in forever.
>>>
>>>
>>> Thus in this series I'm proposing a direct 1-1 mapping of the existing
>>> HMP commands for savevm/loadvm/delvm into QMP as a first step. This does
>>> not solve the blocking thread problem, but it does eliminate the error
>>> reporting, type checking and introspection problems inherent to HMP.
>>> We're winning on 3 out of the 4 long term problems.
>>>
>>> If someone can suggest an easy way to fix the thread blocking problem
>>> too, I'd be interested to hear it. If it involves a major refactoring
>>> then I think users are better served by unlocking what look like easy
>>> wins today.
>>>
>>> With a QMP variant, we can reasonably deal with the problems related to OVMF:
>>>
>>>  - The logic to pick which disk to store the vmstate in is not
>>>    satisfactory.
>>>
>>>    The first block driver state cannot be assumed to be the root disk
>>>    image, it might be OVMF varstore and we don't want to store vmstate
>>>    in there.
>>>
>>>  - The logic to decide which disks must be snapshotted is hardwired
>>>    to all disks which are writable
>>>
>>>    Again with OVMF there might be a writable varstore, but this can be
>>>    raw rather than qcow2 format, and thus unable to be snapshotted.
>>>    While users might wish to snapshot their varstore, in some/many/most
>>>    cases it is entirely unnecessary. Yet users are blocked from
>>>    snapshotting their VM because of this varstore.
>>>
>>> These are solved by adding two parameters to the commands. The first is
>>> a block device node name that identifies the image to store vmstate in,
>>> and the second is a list of node names to exclude from snapshots.
>>>
>>> In the block code I've only dealt with node names for block devices, as
>>> IIUC, this is all that libvirt should need in the -blockdev world it now
>>> lives in. IOW, I've made no attempt to cope with people wanting to use
>>> these QMP commands in combination with -drive args.
>>>
>>> I've done some minimal work in libvirt to start to make use of the new
>>> commands to validate their functionality, but this isn't finished yet.
>>>
>>> My ultimate goal is to make the GNOME Boxes maintainer happy again by
>>> having internal snapshots work with OVMF:
>>>
>>>   https://gitlab.gnome.org/GNOME/gnome-boxes/-/commit/c486da262f6566326fbcb5ef45c5f64048f16a6e
>>>
>>> Daniel P. Berrangé (6):
>>>   migration: improve error reporting of block driver state name
>>>   migration: introduce savevm, loadvm, delvm QMP commands
>>>   block: add ability to filter out blockdevs during snapshot
>>>   block: allow specifying name of block device for vmstate storage
>>>   migration: support excluding block devs in QMP snapshot commands
>>>   migration: support picking vmstate disk in QMP snapshot commands
>>>
>>>  block/monitor/block-hmp-cmds.c |  4 +-
>>>  block/snapshot.c               | 68 +++++++++++++++++++------
>>>  include/block/snapshot.h       | 21 +++++---
>>>  include/migration/snapshot.h   | 10 +++-
>>>  migration/savevm.c             | 71 +++++++++++++++++++-------
>>>  monitor/hmp-cmds.c             | 20 ++------
>>>  qapi/migration.json            | 91 ++++++++++++++++++++++++++++++++++
>>>  replay/replay-snapshot.c       |  4 +-
>>>  softmmu/vl.c                   |  2 +-
>>>  9 files changed, 228 insertions(+), 63 deletions(-)
>> I tried to work on this interface in 2016. At that time we
>> struggled with the idea that this QMP interface should
>> be ready to work asynchronously.
>>
>> Write-protect userfaultfd has now been merged into vanilla Linux,
>> so it is time for an async savevm interface, which will also
>> bring async loadvm and some rework of state storing.
>>
>> Thus I think that with the introduction of the QMP interface
>> we should at least run savevm not from the main
>> thread but in the background, with an event at the end.
> spawning a thread in which to invoke save_snapshot() and load_snapshot()
> is easy enough.  I'm not at all clear on what we need in the way of
> mutex locking though, to make those methods safe to run in a thread
> that isn't the main event loop.

I am unsure that this is so easy. We need to be protected from other
operations coming through the QMP interface. Right now parallel
operations are not allowed.

> Even with savevm/loadvm being blocking, we could introduce a QMP event
> straight away, and document that users shouldn't assume the operation
> is complete until they see the event. That would let us make the commands
> non-blocking later with same documented semantics.
OK. Let us assume that you have added QMP savevm as proposed. It is
sync now. Sooner or later (I hope sooner) we will have to re-implement
this command with an async version, which will again bring an event
etc., and thus you will have to add compat layers to libvirt.

I think that it would be cleaner to start with an interface suitable for
further (coming) features rather than copy the obsolete implementation.
Yes, unfortunately, this is much more complex :(

Den


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP
  2020-07-03 17:29     ` Denis V. Lunev
@ 2020-07-06 14:28       ` Daniel P. Berrangé
  2020-07-06 16:07         ` Denis V. Lunev
  2020-07-06 15:27       ` Kevin Wolf
  1 sibling, 1 reply; 47+ messages in thread
From: Daniel P. Berrangé @ 2020-07-06 14:28 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: Kevin Wolf, Peter Krempa, qemu-block, Juan Quintela, qemu-devel,
	Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini, Max Reitz,
	Dr. David Alan Gilbert

On Fri, Jul 03, 2020 at 08:29:08PM +0300, Denis V. Lunev wrote:
> On 7/3/20 8:22 PM, Daniel P. Berrangé wrote:
> > On Fri, Jul 03, 2020 at 08:15:44PM +0300, Denis V. Lunev wrote:
> >> On 7/2/20 8:57 PM, Daniel P. Berrangé wrote:
> >>> [...]
> >> I have tried to work in this interface in 2016. That time
> >> we have struggled with the idea that this QMP interface should
> >> be ready to work asynchronously.
> >>
> >> Write-protect userfaultfd was merged into vanilla Linux
> >> thus it is time to async savevm interface, which will also
> >> bring async loadvm and some rework for state storing.
> >>
> >> Thus I think that with the introduction of the QMP interface
> >> we should at least run save VM not from the main
> >> thread but from the background with the event at the end.
> > spawning a thread in which to invoke save_snapshot() and load_snapshot()
> > is easy enough.  I'm not at all clear on what we need in the way of
> > mutex locking though, to make those methods safe to run in a thread
> > that isn't the main event loop.
> 
> I am unsure that this is so easy. We need to be protected from other
> operations
> coming through QMP interface. Right now parallel operations are not allowed.
> 
> > Even with savevm/loadvm being blocking, we could introduce a QMP event
> > straight away, and document that users shouldn't assume the operation
> > is complete until they see the event. That would let us make the commands
> > non-blocking later with same documented semantics.
> OK. Let us assume that you have added QMP savevm as proposed. It is
> sync now. Sooner or later (I hope sooner) we will have to re-implement
> this command with async version of the command, which will bring
> again event etc and thus you will have to add compat layers to the
> libvirt.
> 
> I think that it would be cleaner to start with the interface suitable for
> further (coming) features and not copy obsolete implementation.
> Yes, unfortunately, this is much more complex :(

QMP-ifying the current design gives us a quick win by removing the
dependency on HMP, for negligible effort. It further lets us unlock
a few extra useful features, again for negligible effort or maint
burden, since we already have pretty much all the code already for HMP.

Supporting internal snapshots is an explicit non-goal for Red Hat in terms
of the products we're using QEMU with, as they all use external snapshots.
Thus I'm very limited in time I can put into this feature myself. I am 
proposing this QMP-ification with my community hat on instead, as it looks
like a quick win to help out various apps today, but it limits the amount
of time I can justify spending on this.

If someone wants to work on a perfect design that's great, but I don't
want us to do nothing now, and continue to wait for a perfect solution
that has shown no sign of arriving in 10 years.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP
  2020-07-03 17:29     ` Denis V. Lunev
  2020-07-06 14:28       ` Daniel P. Berrangé
@ 2020-07-06 15:27       ` Kevin Wolf
  2020-07-06 15:29         ` Daniel P. Berrangé
  1 sibling, 1 reply; 47+ messages in thread
From: Kevin Wolf @ 2020-07-06 15:27 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: Peter Krempa, Daniel P. Berrangé,
	qemu-block, Juan Quintela, Markus Armbruster, qemu-devel,
	Pavel Dovgalyuk, Paolo Bonzini, Max Reitz,
	Dr. David Alan Gilbert

Am 03.07.2020 um 19:29 hat Denis V. Lunev geschrieben:
> On 7/3/20 8:22 PM, Daniel P. Berrangé wrote:
> > On Fri, Jul 03, 2020 at 08:15:44PM +0300, Denis V. Lunev wrote:
> >> On 7/2/20 8:57 PM, Daniel P. Berrangé wrote:
> >>> [...]
> >> I have tried to work in this interface in 2016. That time
> >> we have struggled with the idea that this QMP interface should
> >> be ready to work asynchronously.
> >>
> >> Write-protect userfaultfd was merged into vanilla Linux
> >> thus it is time to async savevm interface, which will also
> >> bring async loadvm and some rework for state storing.
> >>
> >> Thus I think that with the introduction of the QMP interface
> >> we should at least run save VM not from the main
> >> thread but from the background with the event at the end.
> > spawning a thread in which to invoke save_snapshot() and load_snapshot()
> > is easy enough.  I'm not at all clear on what we need in the way of
> > mutex locking though, to make those methods safe to run in a thread
> > that isn't the main event loop.
> 
> I am unsure that this is so easy. We need to be protected from other
> operations
> coming through QMP interface. Right now parallel operations are not allowed.
> 
> > Even with savevm/loadvm being blocking, we could introduce a QMP event
> > straight away, and document that users shouldn't assume the operation
> > is complete until they see the event. That would let us make the commands
> > non-blocking later with same documented semantics.
> OK. Let us assume that you have added QMP savevm as proposed. It is
> sync now. Sooner or later (I hope sooner) we will have to re-implement
> this command with async version of the command, which will bring
> again event etc and thus you will have to add compat layers to the
> libvirt.
> 
> I think that it would be cleaner to start with the interface suitable for
> further (coming) features and not copy obsolete implementation.
> Yes, unfortunately, this is much more complex :(

Should we make this a job (may or may not be a block job) that just
happens to block the VM and return completion immediately with the
simple implementation we can have today? Then moving it later to a
truly async operation mode should become transparent to the QMP client.
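Wire-level, the job-based variant suggested here might look something like the following QMP exchange. This is purely illustrative: the command name, arguments and exact job states were not settled at this point in the discussion, though the `JOB_STATUS_CHANGE` event is the one QMP already emits for block jobs:

```
-> { "execute": "snapshot-save",
     "arguments": { "job-id": "snapsave0",
                    "tag": "my-snap",
                    "vmstate": "disk0",
                    "devices": ["disk0", "disk1"] } }
<- { "return": {} }
<- { "event": "JOB_STATUS_CHANGE",
     "data": { "id": "snapsave0", "status": "running" } }
<- { "event": "JOB_STATUS_CHANGE",
     "data": { "id": "snapsave0", "status": "concluded" } }
```

A client would watch the events (or poll query-jobs) and treat the command as done only when the job concludes, which keeps the semantics stable if the implementation later stops blocking the VM.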

Kevin



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP
  2020-07-06 15:27       ` Kevin Wolf
@ 2020-07-06 15:29         ` Daniel P. Berrangé
  2020-07-06 15:50           ` Kevin Wolf
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel P. Berrangé @ 2020-07-06 15:29 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Krempa, Denis V. Lunev, qemu-block, Juan Quintela,
	qemu-devel, Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini,
	Max Reitz, Dr. David Alan Gilbert

On Mon, Jul 06, 2020 at 05:27:01PM +0200, Kevin Wolf wrote:
> Am 03.07.2020 um 19:29 hat Denis V. Lunev geschrieben:
> > On 7/3/20 8:22 PM, Daniel P. Berrangé wrote:
> > > On Fri, Jul 03, 2020 at 08:15:44PM +0300, Denis V. Lunev wrote:
> > >> On 7/2/20 8:57 PM, Daniel P. Berrangé wrote:
> > >>> When QMP was first introduced some 10+ years ago now, the snapshot
> > >>> related commands (savevm/loadvm/delvm) were not converted. This was
> > >>> primarily because their implementation causes blocking of the thread
> > >>> running the monitor commands. This was (and still is) considered
> > >>> undesirable behaviour both in HMP and QMP.
> > >>>
> > >>> In theory someone was supposed to fix this flaw at some point in the
> > >>> past 10 years and bring them into the QMP world. Sadly, thus far it
> > >>> hasn't happened as people always had more important things to work
> > >>> on. Enterprise apps were much more interested in external snapshots
> > >>> than internal snapshots as they have many more features.
> > >>>
> > >>> Meanwhile users still want to use internal snapshots as there is
> > >>> a certainly simplicity in having everything self-contained in one
> > >>> image, even though it has limitations. Thus the apps that end up
> > >>> executing the savevm/loadvm/delvm via the "human-monitor-command"
> > >>> QMP command.
> > >>>
> > >>>
> > >>> IOW, the problematic blocking behaviour that was one of the reasons
> > >>> for not having savevm/loadvm/delvm in QMP is experienced by applications
> > >>> regardless. By not portting the commands to QMP due to one design flaw,
> > >>> we've forced apps and users to suffer from other design flaws of HMP (
> > >>> bad error reporting, strong type checking of args, no introspection) for
> > >>> an additional 10 years. This feels rather sub-optimal :-(
> > >>>
> > >>> In practice users don't appear to care strongly about the fact that these
> > >>> commands block the VM while they run. I might have seen one bug report
> > >>> about it, but it certainly isn't something that comes up as a frequent
> > >>> topic except among us QEMU maintainers. Users do care about having
> > >>> access to the snapshot feature.
> > >>>
> > >>> Where I am seeing frequent complaints is wrt the use of OVMF combined
> > >>> with snapshots which has some serious pain points. This is getting worse
> > >>> as the push to ditch legacy BIOS in favour of UEFI gain momentum both
> > >>> across OS vendors and mgmt apps. Solving it requires new parameters to
> > >>> the commands, but doing this in HMP is super unappealing.
> > >>>
> > >>>
> > >>>
> > >>> After 10 years, I think it is time for us to be a little pragmatic about
> > >>> our handling of snapshots commands. My desire is that libvirt should never
> > >>> use "human-monitor-command" under any circumstances, because of the
> > >>> inherant flaws in HMP as a protocol for machine consumption. If there
> > >>> are flaws in QMP commands that's fine. If we fix them in future, we can
> > >>> deprecate the current QMP commands and remove them not too long after,
> > >>> without being locked in forever.
> > >>>
> > >>>
> > >>> Thus in this series I'm proposing a direct 1-1 mapping of the existing
> > >>> HMP commands for savevm/loadvm/delvm into QMP as a first step. This does
> > >>> not solve the blocking thread problem, but it does eliminate the error
> > >>> reporting, type checking and introspection problems inherant to HMP.
> > >>> We're winning on 3 out of the 4 long term problems.
> > >>>
> > >>> If someone can suggest a easy way to fix the thread blocking problem
> > >>> too, I'd be interested to hear it. If it involves a major refactoring
> > >>> then I think user are better served by unlocking what look like easy
> > >>> wins today.
> > >>>
> > >>> With a QMP variant, we can reasonably deal with the problems related to OVMF:
> > >>>
> > >>>  - The logic to pick which disk to store the vmstate in is not
> > >>>    satisfactory.
> > >>>
> > >>>    The first block driver state cannot be assumed to be the root disk
> > >>>    image, it might be the OVMF varstore and we don't want to store vmstate
> > >>>    in there.
> > >>>
> > >>>  - The logic to decide which disks must be snapshotted is hardwired
> > >>>    to all disks which are writable
> > >>>
> > >>>    Again with OVMF there might be a writable varstore, but this can be
> > >>>    raw rather than qcow2 format, and thus cannot be snapshotted.
> > >>>    While users might wish to snapshot their varstore, in some/many/most
> > >>>    cases it is entirely unnecessary. Users are nonetheless blocked from
> > >>>    snapshotting their VM due to this varstore.
> > >>>
> > >>> These are solved by adding two parameters to the commands. The first is
> > >>> a block device node name that identifies the image to store vmstate in,
> > >>> and the second is a list of node names to exclude from snapshots.
> > >>>
> > >>> In the block code I've only dealt with node names for block devices, as
> > >>> IIUC, this is all that libvirt should need in the -blockdev world it now
> > >>> lives in. IOW, I've made no attempt to cope with people wanting to use
> > >>> these QMP commands in combination with -drive args.
> > >>>
> > >>> I've done some minimal work in libvirt to start to make use of the new
> > >>> commands to validate their functionality, but this isn't finished yet.
> > >>>
> > >>> My ultimate goal is to make the GNOME Boxes maintainer happy again by
> > >>> having internal snapshots work with OVMF:
> > >>>
> > >>>   https://gitlab.gnome.org/GNOME/gnome-boxes/-/commit/c486da262f6566326fbcb5ef45c5f64048f16a6e
> > >>>
> > >>> Daniel P. Berrangé (6):
> > >>>   migration: improve error reporting of block driver state name
> > >>>   migration: introduce savevm, loadvm, delvm QMP commands
> > >>>   block: add ability to filter out blockdevs during snapshot
> > >>>   block: allow specifying name of block device for vmstate storage
> > >>>   migration: support excluding block devs in QMP snapshot commands
> > >>>   migration: support picking vmstate disk in QMP snapshot commands
> > >>>
> > >>>  block/monitor/block-hmp-cmds.c |  4 +-
> > >>>  block/snapshot.c               | 68 +++++++++++++++++++------
> > >>>  include/block/snapshot.h       | 21 +++++---
> > >>>  include/migration/snapshot.h   | 10 +++-
> > >>>  migration/savevm.c             | 71 +++++++++++++++++++-------
> > >>>  monitor/hmp-cmds.c             | 20 ++------
> > >>>  qapi/migration.json            | 91 ++++++++++++++++++++++++++++++++++
> > >>>  replay/replay-snapshot.c       |  4 +-
> > >>>  softmmu/vl.c                   |  2 +-
> > >>>  9 files changed, 228 insertions(+), 63 deletions(-)
> > >> I tried to work on this interface in 2016. At that time
> > >> we struggled with the idea that this QMP interface should
> > >> be ready to work asynchronously.
> > >>
> > >> Write-protect userfaultfd was merged into vanilla Linux,
> > >> so it is time for an async savevm interface, which will also
> > >> bring async loadvm and some rework of state storing.
> > >>
> > >> Thus I think that with the introduction of the QMP interface
> > >> we should at least run savevm not from the main
> > >> thread but in the background, with an event at the end.
> > > spawning a thread in which to invoke save_snapshot() and load_snapshot()
> > > is easy enough.  I'm not at all clear on what we need in the way of
> > > mutex locking though, to make those methods safe to run in a thread
> > > that isn't the main event loop.
> > 
> > I am unsure that this is so easy. We need to be protected from other
> > operations coming through the QMP interface. Right now parallel
> > operations are not allowed.
> > 
> > > Even with savevm/loadvm being blocking, we could introduce a QMP event
> > > straight away, and document that users shouldn't assume the operation
> > > is complete until they see the event. That would let us make the commands
> > > non-blocking later with the same documented semantics.
> > OK. Let us assume that you have added QMP savevm as proposed. It is
> > sync now. Sooner or later (I hope sooner) we will have to re-implement
> > this command with an async version, which will again bring
> > an event etc., and thus you will have to add compat layers to
> > libvirt.
> > 
> > I think that it would be cleaner to start with an interface suitable for
> > further (coming) features rather than copying the obsolete implementation.
> > Yes, unfortunately, this is much more complex :(
> 
> Should we make this a job (may or may not be a block job) that just
> happens to block the VM and return completion immediately with the
> simple implementation we can have today? Then moving it later to a
> truly async operation mode should become transparent to the QMP client.

What would making it a job / block job need from a QMP design POV?

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP
  2020-07-06 15:29         ` Daniel P. Berrangé
@ 2020-07-06 15:50           ` Kevin Wolf
  2020-07-06 16:03             ` Daniel P. Berrangé
  0 siblings, 1 reply; 47+ messages in thread
From: Kevin Wolf @ 2020-07-06 15:50 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Peter Krempa, Denis V. Lunev, qemu-block, Juan Quintela,
	qemu-devel, Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini,
	Max Reitz, Dr. David Alan Gilbert

Am 06.07.2020 um 17:29 hat Daniel P. Berrangé geschrieben:
> On Mon, Jul 06, 2020 at 05:27:01PM +0200, Kevin Wolf wrote:
> > Am 03.07.2020 um 19:29 hat Denis V. Lunev geschrieben:
> > > On 7/3/20 8:22 PM, Daniel P. Berrangé wrote:
> > > > On Fri, Jul 03, 2020 at 08:15:44PM +0300, Denis V. Lunev wrote:
> > > >> On 7/2/20 8:57 PM, Daniel P. Berrangé wrote:
> > > [...]
> > 
> > Should we make this a job (may or may not be a block job) that just
> > happens to block the VM and return completion immediately with the
> > simple implementation we can have today? Then moving it later to a
> > truly async operation mode should become transparent to the QMP client.
> 
> > What would making it a job / block job need from a QMP design POV?

The actual QMP syntax for the command wouldn't look much different (I
think just a new option 'job-id'), but the difference would be that it's
not documented as performing the whole action, but just starting the
job. The expectation would then be that it can be managed with the
job-* commands and that it emits the job status events.

This may sound complicated, but most of it is actually covered by the
generic job infrastructure.

The simplest job that we have is blockdev-create, which is implemented
in block/create.c (99 lines including the license header). I think this
would be a good model for our new case.
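
As a sketch, the client-visible flow might then look something like this
(hypothetical payloads: the 'job-id' option follows the suggestion above,
while the exact event sequence and the query-jobs fields are assumptions
based on the generic job infrastructure, not the actual proposed schema):

```json
-> { "execute": "savevm",
     "arguments": { "tag": "snap1", "job-id": "savevm0" } }
<- { "return": {} }
<- { "event": "JOB_STATUS_CHANGE",
     "data": { "id": "savevm0", "status": "running" } }
<- { "event": "JOB_STATUS_CHANGE",
     "data": { "id": "savevm0", "status": "concluded" } }
-> { "execute": "query-jobs" }
<- { "return": [ { "id": "savevm0", "status": "concluded",
                   "current-progress": 1, "total-progress": 1 } ] }
-> { "execute": "job-dismiss", "arguments": { "id": "savevm0" } }
<- { "return": {} }
```

Whether the command blocks internally or not, the client would only rely
on the job status events, which is what makes a later truly async
implementation transparent.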

Kevin




* Re: [PATCH 5/6] migration: support excluding block devs in QMP snapshot commands
  2020-07-02 17:57 ` [PATCH 5/6] migration: support excluding block devs in QMP snapshot commands Daniel P. Berrangé
@ 2020-07-06 15:57   ` Kevin Wolf
  2020-07-07  9:14     ` Daniel P. Berrangé
  0 siblings, 1 reply; 47+ messages in thread
From: Kevin Wolf @ 2020-07-06 15:57 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Peter Krempa, qemu-block, Juan Quintela, Markus Armbruster,
	qemu-devel, Pavel Dovgalyuk, Paolo Bonzini, Max Reitz,
	Dr. David Alan Gilbert

Am 02.07.2020 um 19:57 hat Daniel P. Berrangé geschrieben:
> This wires up support for a new "exclude" parameter to the QMP commands
> for snapshots (savevm, loadvm, delvm). This parameter accepts a list of
> block driver state node names.
> 
> One use case for this would be a VM using OVMF firmware where the
> variables store is a raw disk image. Ideally the variable store would be
> qcow2, allowing its contents to be included in the snapshot, but
> typically today the variable store is raw. It is still useful to be able
> to snapshot VMs using OVMF, even if the varstore is excluded, as the
> main OS disk content is usually the stuff the user cares about.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Wouldn't it be better to take an optional list of nodes to _include_
that just defaults to our current set of nodes?

The problem with excluding is that we already don't snapshot many nodes,
and the criteria to choose the right ones weren't entirely obvious, so
we just went with something that seemed to make the most sense. But the
management application may actually want to snapshot more nodes than we
cover by default.

I feel things become clearer and less surprising if the client just
tells us explicitly which images are supposed to be snapshotted instead
of adding exceptions on top of a default selection that we're already
not really sure about.
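
Concretely, the two shapes might compare like this (both hypothetical;
'exclude' is roughly what this series proposes, 'devices' a possible
name for an explicit include list that defaults to the current node
selection):

```json
-> { "execute": "savevm",
     "arguments": { "tag": "snap1",
                    "exclude": [ "ovmf-varstore" ] } }

-> { "execute": "savevm",
     "arguments": { "tag": "snap1",
                    "devices": [ "system-disk", "data-disk" ] } }
```

With the include form the client states the complete snapshot membership
itself, so QEMU's default selection heuristics never need to be
second-guessed.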

Kevin




* Re: [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP
  2020-07-06 15:50           ` Kevin Wolf
@ 2020-07-06 16:03             ` Daniel P. Berrangé
  2020-07-06 16:10               ` Denis V. Lunev
  2020-07-06 16:21               ` Kevin Wolf
  0 siblings, 2 replies; 47+ messages in thread
From: Daniel P. Berrangé @ 2020-07-06 16:03 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Krempa, Denis V. Lunev, qemu-block, Juan Quintela,
	qemu-devel, Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini,
	Max Reitz, Dr. David Alan Gilbert

On Mon, Jul 06, 2020 at 05:50:11PM +0200, Kevin Wolf wrote:
> Am 06.07.2020 um 17:29 hat Daniel P. Berrangé geschrieben:
> > On Mon, Jul 06, 2020 at 05:27:01PM +0200, Kevin Wolf wrote:
> > > Am 03.07.2020 um 19:29 hat Denis V. Lunev geschrieben:
> > > > On 7/3/20 8:22 PM, Daniel P. Berrangé wrote:
> > > > > On Fri, Jul 03, 2020 at 08:15:44PM +0300, Denis V. Lunev wrote:
> > > > >> On 7/2/20 8:57 PM, Daniel P. Berrangé wrote:
> > > > [...]
> > > 
> > > Should we make this a job (may or may not be a block job) that just
> > > happens to block the VM and return completion immediately with the
> > > simple implementation we can have today? Then moving it later to a
> > > truly async operation mode should become transparent to the QMP client.
> > 
> > What would making it a job / block job need from a QMP design POV ?
> 
> The actual QMP syntax for the command wouldn't look much different (I
> think just a new option 'job-id'), but the difference would be that it's
> not documented as performing the whole action, but just starting the
> job. The expectation would then be that it can be managed with the
> job-* commands and that it emits the job status events.
> 
> This may sound complicated, but most of it is actually covered by the
> generic job infrastructure.
> 
> The simplest job that we have is blockdev-create, which is implemented
> in block/create.c (99 lines including the license header). I think this
> would be a good model for our new case.

The QMP design and internal API looks simple enough, but I'm wondering
what implications come with the job infra wrt locking/thread safety. In
particular I see the "job_start" command runs the impl in a coroutine.
I can't tell if that's going to cause any interactions wrto the current
loadvm/savevm impl and its assumptions about blocking execution while
running.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP
  2020-07-06 14:28       ` Daniel P. Berrangé
@ 2020-07-06 16:07         ` Denis V. Lunev
  0 siblings, 0 replies; 47+ messages in thread
From: Denis V. Lunev @ 2020-07-06 16:07 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Kevin Wolf, Peter Krempa, qemu-block, Juan Quintela, qemu-devel,
	Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini, Max Reitz,
	Dr. David Alan Gilbert

On 7/6/20 5:28 PM, Daniel P. Berrangé wrote:
> On Fri, Jul 03, 2020 at 08:29:08PM +0300, Denis V. Lunev wrote:
>> On 7/3/20 8:22 PM, Daniel P. Berrangé wrote:
>>> On Fri, Jul 03, 2020 at 08:15:44PM +0300, Denis V. Lunev wrote:
>>>> On 7/2/20 8:57 PM, Daniel P. Berrangé wrote:
>>>>> When QMP was first introduced some 10+ years ago now, the snapshot
>>>>> related commands (savevm/loadvm/delvm) were not converted. This was
>>>>> primarily because their implementation causes blocking of the thread
>>>>> running the monitor commands. This was (and still is) considered
>>>>> undesirable behaviour both in HMP and QMP.
>>>>>
>>>>> In theory someone was supposed to fix this flaw at some point in the
>>>>> past 10 years and bring them into the QMP world. Sadly, thus far it
>>>>> hasn't happened as people always had more important things to work
>>>>> on. Enterprise apps were much more interested in external snapshots
>>>>> than internal snapshots as they have many more features.
>>>>>
>>>>> Meanwhile users still want to use internal snapshots as there is
>>>>> a certainly simplicity in having everything self-contained in one
>>>>> image, even though it has limitations. Thus the apps that end up
>>>>> executing the savevm/loadvm/delvm via the "human-monitor-command"
>>>>> QMP command.
>>>>>
>>>>>
>>>>> IOW, the problematic blocking behaviour that was one of the reasons
>>>>> for not having savevm/loadvm/delvm in QMP is experienced by applications
>>>>> regardless. By not portting the commands to QMP due to one design flaw,
>>>>> we've forced apps and users to suffer from other design flaws of HMP (
>>>>> bad error reporting, strong type checking of args, no introspection) for
>>>>> an additional 10 years. This feels rather sub-optimal :-(
>>>>>
>>>>> In practice users don't appear to care strongly about the fact that these
>>>>> commands block the VM while they run. I might have seen one bug report
>>>>> about it, but it certainly isn't something that comes up as a frequent
>>>>> topic except among us QEMU maintainers. Users do care about having
>>>>> access to the snapshot feature.
>>>>>
>>>>> Where I am seeing frequent complaints is wrt the use of OVMF combined
>>>>> with snapshots, which has some serious pain points. This is getting worse
>>>>> as the push to ditch legacy BIOS in favour of UEFI gains momentum both
>>>>> across OS vendors and mgmt apps. Solving it requires new parameters to
>>>>> the commands, but doing this in HMP is super unappealing.
>>>>>
>>>>>
>>>>>
>>>>> After 10 years, I think it is time for us to be a little pragmatic about
>>>>> our handling of snapshots commands. My desire is that libvirt should never
>>>>> use "human-monitor-command" under any circumstances, because of the
>>>>> inherent flaws in HMP as a protocol for machine consumption. If there
>>>>> are flaws in QMP commands that's fine. If we fix them in future, we can
>>>>> deprecate the current QMP commands and remove them not too long after,
>>>>> without being locked in forever.
>>>>>
>>>>>
>>>>> Thus in this series I'm proposing a direct 1-1 mapping of the existing
>>>>> HMP commands for savevm/loadvm/delvm into QMP as a first step. This does
>>>>> not solve the blocking thread problem, but it does eliminate the error
>>>>> reporting, type checking and introspection problems inherent to HMP.
>>>>> We're winning on 3 out of the 4 long term problems.
>>>>>
>>>>> If someone can suggest an easy way to fix the thread blocking problem
>>>>> too, I'd be interested to hear it. If it involves a major refactoring
>>>>> then I think users are better served by unlocking what look like easy
>>>>> wins today.
>>>>>
>>>>> With a QMP variant, we can reasonably deal with the problems related to OVMF:
>>>>>
>>>>>  - The logic to pick which disk to store the vmstate in is not
>>>>>    satisfactory.
>>>>>
>>>>>    The first block driver state cannot be assumed to be the root disk
>>>>>    image, it might be OVMF varstore and we don't want to store vmstate
>>>>>    in there.
>>>>>
>>>>>  - The logic to decide which disks must be snapshotted is hardwired
>>>>>    to all disks which are writable.
>>>>>
>>>>>    Again with OVMF there might be a writable varstore, but this can be
>>>>>    raw rather than qcow2 format, and thus unable to be snapshotted.
>>>>>    While users might wish to snapshot their varstore, in some/many/most
>>>>>    cases it is entirely unnecessary. Users are nonetheless blocked from
>>>>>    snapshotting their VM due to this varstore.
>>>>>
>>>>> These are solved by adding two parameters to the commands. The first is
>>>>> a block device node name that identifies the image to store vmstate in,
>>>>> and the second is a list of node names to exclude from snapshots.
>>>>>
>>>>> In the block code I've only dealt with node names for block devices, as
>>>>> IIUC, this is all that libvirt should need in the -blockdev world it now
>>>>> lives in. IOW, I've made no attempt to cope with people wanting to use
>>>>> these QMP commands in combination with -drive args.
>>>>>
>>>>> I've done some minimal work in libvirt to start to make use of the new
>>>>> commands to validate their functionality, but this isn't finished yet.
>>>>>
>>>>> My ultimate goal is to make the GNOME Boxes maintainer happy again by
>>>>> having internal snapshots work with OVMF:
>>>>>
>>>>>   https://gitlab.gnome.org/GNOME/gnome-boxes/-/commit/c486da262f6566326fbcb5ef45c5f64048f16a6e
>>>>>
>>>>> Daniel P. Berrangé (6):
>>>>>   migration: improve error reporting of block driver state name
>>>>>   migration: introduce savevm, loadvm, delvm QMP commands
>>>>>   block: add ability to filter out blockdevs during snapshot
>>>>>   block: allow specifying name of block device for vmstate storage
>>>>>   migration: support excluding block devs in QMP snapshot commands
>>>>>   migration: support picking vmstate disk in QMP snapshot commands
>>>>>
>>>>>  block/monitor/block-hmp-cmds.c |  4 +-
>>>>>  block/snapshot.c               | 68 +++++++++++++++++++------
>>>>>  include/block/snapshot.h       | 21 +++++---
>>>>>  include/migration/snapshot.h   | 10 +++-
>>>>>  migration/savevm.c             | 71 +++++++++++++++++++-------
>>>>>  monitor/hmp-cmds.c             | 20 ++------
>>>>>  qapi/migration.json            | 91 ++++++++++++++++++++++++++++++++++
>>>>>  replay/replay-snapshot.c       |  4 +-
>>>>>  softmmu/vl.c                   |  2 +-
>>>>>  9 files changed, 228 insertions(+), 63 deletions(-)
>>>> I tried to work on this interface in 2016. At that time
>>>> we struggled with the idea that this QMP interface should
>>>> be ready to work asynchronously.
>>>>
>>>> Write-protect userfaultfd was merged into vanilla Linux,
>>>> thus it is time for an async savevm interface, which will also
>>>> bring async loadvm and some rework of state storing.
>>>>
>>>> Thus I think that with the introduction of the QMP interface
>>>> we should at least run savevm not from the main
>>>> thread but in the background, with an event at the end.
>>> spawning a thread in which to invoke save_snapshot() and load_snapshot()
>>> is easy enough.  I'm not at all clear on what we need in the way of
>>> mutex locking though, to make those methods safe to run in a thread
>>> that isn't the main event loop.
>> I am unsure that this is so easy. We need to be protected from other
>> operations coming through the QMP interface. Right now parallel
>> operations are not allowed.
>>
>>> Even with savevm/loadvm being blocking, we could introduce a QMP event
>>> straight away, and document that users shouldn't assume the operation
>>> is complete until they see the event. That would let us make the commands
>>> non-blocking later with the same documented semantics.
>> OK. Let us assume that you have added QMP savevm as proposed. It is
>> sync now. Sooner or later (I hope sooner) we will have to re-implement
>> this command with an async version, which will again bring an
>> event etc., and thus you will have to add compat layers to
>> libvirt.
>>
>> I think that it would be cleaner to start with an interface suitable for
>> further (coming) features and not copy the obsolete implementation.
>> Yes, unfortunately, this is much more complex :(
> QMP-ifying the current design gives us a quick win by removing the
> dependency on HMP, for negligible effort. It further lets us unlock
> a few extra useful features, again for negligible effort or maint
> burden, since we already have pretty much all the code for HMP.
>
> Supporting internal snapshots is an explicit non-goal for Red Hat in terms
> of the products we're using QEMU with, as they all use external snapshots.
> Thus I'm very limited in time I can put into this feature myself. I am 
> proposing this QMP-ification with my community hat on instead, as it looks
> like a quick win to help out various apps today, but it limits the amount
> of time I can justify spending on this.
>
> If someone wants to work on a perfect design that's great, but I don't
> want us to do nothing now, and continue to wait for a perfect solution
> that has shown no sign of arriving in 10 years.
>
> Regards,
> Daniel
I understand this. My proposal may be a bit controversial, but affordable.
Let us keep the current synchronous implementation as-is, no changes
at all, but design the API in an async way. Thus the command

+{ 'command': 'savevm',
+  'data': { 'tag': 'str' } }

should be considered as "start the operation", while completion is
signaled via an event.
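
For illustration, a possible wire-level exchange under this proposal
might look like the following sketch (QMP arrow notation; the
SAVEVM_COMPLETED event name is purely illustrative, not an existing
QEMU event):

```
-> { "execute": "savevm",
     "arguments": { "tag": "snap1" } }
<- { "return": {} }
   ... command returns once the operation has started; with the current
       synchronous implementation it would simply block until done ...
<- { "event": "SAVEVM_COMPLETED",
     "data": { "tag": "snap1" },
     "timestamp": { "seconds": 1594050000, "microseconds": 0 } }
```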

In this case we will be able to separate the API part from the
implementation part. What do you think of this?

Den



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP
  2020-07-06 16:03             ` Daniel P. Berrangé
@ 2020-07-06 16:10               ` Denis V. Lunev
  2020-07-06 16:15                 ` Daniel P. Berrangé
  2020-07-06 16:21               ` Kevin Wolf
  1 sibling, 1 reply; 47+ messages in thread
From: Denis V. Lunev @ 2020-07-06 16:10 UTC (permalink / raw)
  To: Daniel P. Berrangé, Kevin Wolf
  Cc: Peter Krempa, qemu-block, Juan Quintela, qemu-devel,
	Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini, Max Reitz,
	Dr. David Alan Gilbert

On 7/6/20 7:03 PM, Daniel P. Berrangé wrote:
> On Mon, Jul 06, 2020 at 05:50:11PM +0200, Kevin Wolf wrote:
>> Am 06.07.2020 um 17:29 hat Daniel P. Berrangé geschrieben:
>>> On Mon, Jul 06, 2020 at 05:27:01PM +0200, Kevin Wolf wrote:
>>>> Am 03.07.2020 um 19:29 hat Denis V. Lunev geschrieben:
>>>>> On 7/3/20 8:22 PM, Daniel P. Berrangé wrote:
>>>>>> On Fri, Jul 03, 2020 at 08:15:44PM +0300, Denis V. Lunev wrote:
>>>>>>> [quoted cover letter and earlier discussion snipped]
>>>>> OK. Let us assume that you have added QMP savevm as proposed. It is
>>>>> sync now. Sooner or later (I hope sooner) we will have to re-implement
>>>>> this command with an async version, which will again bring an
>>>>> event etc., and thus you will have to add compat layers to
>>>>> libvirt.
>>>>>
>>>>> I think that it would be cleaner to start with an interface suitable for
>>>>> further (coming) features and not copy the obsolete implementation.
>>>>> Yes, unfortunately, this is much more complex :(
>>>> Should we make this a job (may or may not be a block job) that just
>>>> happens to block the VM and return completion immediately with the
>>>> simple implementation we can have today? Then moving it later to a
>>>> truly async operation mode should become transparent to the QMP client.
>>> What would making it a job / block job need from a QMP design POV ?
>> The actual QMP syntax for the command wouldn't look much different (I
>> think just a new option 'job-id'), but the difference would be that it's
>> not documented as performing the whole action, but just starting the
>> job. The expectation would then be that it can be managed with the
>> job-* commands and that it emits the job status events.
>>
>> This may sound complicated, but most of it is actually covered by the
>> generic job infrastructure.
>>
>> The simplest job that we have is blockdev-create, which is implemented
>> in block/create.c (99 lines including the license header). I think this
>> would be a good model for our new case.
This proposal looks perfect to me!

> The QMP design and internal API look simple enough, but I'm wondering
> what implications come with the job infra wrt locking/thread safety. In
> particular I see the "job_start" command runs the impl in a coroutine.
> I can't tell if that's going to cause any interactions wrt the current
> loadvm/savevm impl and its assumptions about blocking execution while
> running.
>
> Regards,
> Daniel
Right now we don't care. This is the API part. For the implementation part
the code remains as-is. In this case we just adapt libvirt to the new
approach while QEMU remains old. Converting QEMU to the new iface
is indeed a separate (much more complex) task.
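
As a sketch, the client-visible flow under the blockdev-create-style
job model Kevin describes could look like this (the savevm command name
and its tag argument follow this series' proposal, not an existing QEMU
interface; JOB_STATUS_CHANGE and job-dismiss are QEMU's generic job
machinery; intermediate job states are elided):

```
-> { "execute": "savevm",
     "arguments": { "job-id": "save0", "tag": "snap1" } }
<- { "return": {} }
<- { "event": "JOB_STATUS_CHANGE",
     "data": { "id": "save0", "status": "created" } }
<- { "event": "JOB_STATUS_CHANGE",
     "data": { "id": "save0", "status": "running" } }
   ... with today's synchronous implementation the VM would block
       here for the whole duration ...
<- { "event": "JOB_STATUS_CHANGE",
     "data": { "id": "save0", "status": "concluded" } }
-> { "execute": "job-dismiss", "arguments": { "id": "save0" } }
<- { "return": {} }
```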

Den


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP
  2020-07-06 16:10               ` Denis V. Lunev
@ 2020-07-06 16:15                 ` Daniel P. Berrangé
  0 siblings, 0 replies; 47+ messages in thread
From: Daniel P. Berrangé @ 2020-07-06 16:15 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: Kevin Wolf, Peter Krempa, qemu-block, Juan Quintela, qemu-devel,
	Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini, Max Reitz,
	Dr. David Alan Gilbert

On Mon, Jul 06, 2020 at 07:10:16PM +0300, Denis V. Lunev wrote:
> On 7/6/20 7:03 PM, Daniel P. Berrangé wrote:
> > On Mon, Jul 06, 2020 at 05:50:11PM +0200, Kevin Wolf wrote:
> >> Am 06.07.2020 um 17:29 hat Daniel P. Berrangé geschrieben:
> >>> On Mon, Jul 06, 2020 at 05:27:01PM +0200, Kevin Wolf wrote:
> >>>> Am 03.07.2020 um 19:29 hat Denis V. Lunev geschrieben:
> >>>>> On 7/3/20 8:22 PM, Daniel P. Berrangé wrote:
> >>>>>> On Fri, Jul 03, 2020 at 08:15:44PM +0300, Denis V. Lunev wrote:
> >>>>>>> [quoted cover letter and earlier discussion snipped]
> >>>> Should we make this a job (may or may not be a block job) that just
> >>>> happens to block the VM and return completion immediately with the
> >>>> simple implementation we can have today? Then moving it later to a
> >>>> truly async operation mode should become transparent to the QMP client.
> >>> What would making it a job / block job need from a QMP design POV ?
> >> The actual QMP syntax for the command wouldn't look much different (I
> >> think just a new option 'job-id'), but the difference would be that it's
> >> not documented as performing the whole action, but just starting the
> >> job. The expectation would then be that it can be managed with the
> >> job-* commands and that it emits the job status events.
> >>
> >> This may sound complicated, but most of it is actually covered by the
> >> generic job infrastructure.
> >>
> >> The simplest job that we have is blockdev-create, which is implemented
> >> in block/create.c (99 lines including the license header). I think this
> >> would be a good model for our new case.
> This proposal looks perfect to me!
> 
> > The QMP design and internal API look simple enough, but I'm wondering
> > what implications come with the job infra wrt locking/thread safety. In
> > particular I see the "job_start" command runs the impl in a coroutine.
> > I can't tell if that's going to cause any interactions wrt the current
> > loadvm/savevm impl and its assumptions about blocking execution while
> > running.
>
> > Right now we don't care. This is the API part. For the implementation part
> > the code remains as-is. In this case we just adapt libvirt to the new
> > approach while QEMU remains old. Converting QEMU to the new iface
> > is indeed a separate (much more complex) task.

If we're exposing a "job-id" in the QAPI schema though, applications are
going to expect the normal "job-XXX" commands to be functional.

We don't want to have a "job-id" that can't be used right now, and then
magically starts working later, because there'll be no way for apps to
introspect whether "job-id" works or not.

So if we expose job-id it needs to do something IMHO, otherwise we should
not expose job-id at all, and simply add it to the command later once it
does actually do something useful.
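
The introspection in question is done via query-qmp-schema: a client can
only discover whether a command accepts "job-id" by walking the schema
reply. A minimal sketch of that check, run against a hand-written
miniature schema fragment (illustrative only, not captured from a real
QEMU binary):

```python
def command_has_argument(schema, command, argument):
    """Walk a query-qmp-schema reply (a list of SchemaInfo dicts) and
    report whether `command` accepts a member named `argument`."""
    # Index every SchemaInfo entry by name for type lookups.
    by_name = {entry["name"]: entry for entry in schema}
    for entry in schema:
        if entry.get("meta-type") == "command" and entry["name"] == command:
            # The command's arguments live in its referenced object type.
            arg_type = by_name.get(entry["arg-type"], {})
            return any(m["name"] == argument
                       for m in arg_type.get("members", []))
    return False

# Hand-written miniature schema fragment for illustration only.
schema = [
    {"name": "savevm", "meta-type": "command",
     "arg-type": "q_obj_savevm-arg"},
    {"name": "q_obj_savevm-arg", "meta-type": "object",
     "members": [{"name": "tag", "type": "str"},
                 {"name": "job-id", "type": "str", "default": None}]},
]

print(command_has_argument(schema, "savevm", "job-id"))  # True
print(command_has_argument(schema, "savevm", "force"))   # False
```

This is exactly why an inert job-id would be indistinguishable from a
working one: the schema would advertise the member either way.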

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands
  2020-07-03 16:02         ` Daniel P. Berrangé
  2020-07-03 16:10           ` Dr. David Alan Gilbert
@ 2020-07-06 16:15           ` Kevin Wolf
  2020-07-07  6:38             ` Peter Krempa
  1 sibling, 1 reply; 47+ messages in thread
From: Kevin Wolf @ 2020-07-06 16:15 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Peter Krempa, qemu-block, Juan Quintela, qemu-devel,
	Markus Armbruster, Dr. David Alan Gilbert, Max Reitz,
	Pavel Dovgalyuk, Paolo Bonzini

Am 03.07.2020 um 18:02 hat Daniel P. Berrangé geschrieben:
> On Fri, Jul 03, 2020 at 04:49:33PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Thu, Jul 02, 2020 at 01:12:52PM -0500, Eric Blake wrote:
> > > > On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:
> > > > > savevm, loadvm and delvm are some of the few commands that have never
> > > > > been converted to use QMP. The primary reason for this lack of
> > > > > conversion is that they block execution of the thread for as long as
> > > > > they run.
> > > > > 
> > > > > Despite this downside, however, libvirt and applications using libvirt
> > > > > have used these commands for as long as QMP has existed, via the
> > > > > "human-monitor-command" passthrough command. IOW, while it is clearly
> > > > > desirable to be able to fix the blocking problem, this is not an
> > > > > immediate obstacle to real world usage.
> > > > > 
> > > > > Meanwhile there is a need for other features which involve adding new
> > > > > parameters to the commands. This is possible with HMP passthrough, but
> > > > > it provides no reliable way for apps to introspect features, so using
> > > > > QAPI modelling is highly desirable.
> > > > > 
> > > > > This patch thus introduces trival savevm, loadvm, delvm commands
> > > > 
> > > > trivial
> > > > 
> > > > > to QMP that are functionally identical to the HMP counterpart, including
> > > > > the blocking problem.
> > > > 
> > > > Should we name them 'x-savevm', 'x-loadvm', 'x-delvm' to give ourselves room
> > > > to change them when we DO solve the blocking issue?  Or will the solution of
> > > > the blocking issue introduce new QMP commands, at which point we can add QMP
> > > > deprecation markers on these commands to eventually retire them?
> > > 
> > > I was in two minds about this, so I'm open to arguments either way.
> > > 
> > > The primary goal is for libvirt to consume the APIs as soon as possible,
> > > and generally libvirt doesn't want to do this if they are declared experimental
> > > via an "x-" prefix. So that pushes me away from "x-".
> > > 
> > > If we don't have an "x-" prefix and want to make changes, we can add extra
> > > parameters to trigger new behaviour in backwards compatible manner. Or we can
> > > simply deprecate these commands, deleting them 2 releases later, while adding
> > > completely new commands.
> > > 
> > > If we think the proposed design will definitely need incompatible changes in
> > > a very short time frame though, that would push towards "x-".
> > > 
> > > So IMHO the right answer largely depends on whether there is a credible
> > > strategy to implement the ideal non-blocking solution in a reasonable amount
> > > of time. I can't justify spending much time on this myself, but I'm willing
> > > to consider & try proposals for solving the blocking problem if they're not
> > > too complex / invasive.
> > 
> > Remind me, what was the problem with just making a block: migration
> > channel, and then we can migrate to it?

For normal migration, the VM is still running and changing its state, so
you would have to wait until migration completes and then snapshot the
disk. This is possible, but it would store redundant data and the
snapshot would be at an arbitrary point after actually receiving the
snapshot command, which is probably not what users expect.

So what you really want is some kind of postcopy migration inside the
same process. Virtuozzo had a look at this some years ago and when the
discussion comes up again, they say they are still kind of interested,
though it's not their priority. I have never seen the actual code, but I
imagine it's not trivial (unless the migration code already supports
something like this today, but I'm not aware of another use case for it,
so probably not?)

> migration only does vmstate, not disks. The current blockdev commands
> are all related to external snapshots, nothing for internal snapshots
> AFAIK. So we still need commands to load/save internal snapshots of
> the disk data in the qcow2 files.
> 
> So we could look at loadvm/savevm conceptually as syntax sugar on top of
> a migration transport that targets disk images, plus a blockdev QMP command
> that can do internal snapshots. Neither of these exists though, and they feel
> like a significantly larger amount of work than using existing functionality
> that currently works.

There is blockdev-snapshot-internal-sync, which does a disk-only
snapshot of a single node. A snapshot of multiple nodes can be taken by
putting multiple blockdev-snapshot-internal-sync inside a 'transaction'
command.
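For reference, a multi-node disk-only internal snapshot via 'transaction'
looks roughly like this (the device names "disk0"/"disk1" and snapshot name
are made up for the example):

```json
{ "execute": "transaction",
  "arguments": {
    "actions": [
      { "type": "blockdev-snapshot-internal-sync",
        "data": { "device": "disk0", "name": "snap1" } },
      { "type": "blockdev-snapshot-internal-sync",
        "data": { "device": "disk1", "name": "snap1" } }
    ]
  }
}
```

The transaction ensures all per-node snapshots are taken atomically.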

If we want to build on top of this, we'd have to implement a
transactionable command for storing only the VM state to a specific
node. This would probably still be a long-running job.

So some of it does exist, but I'm torn on whether separating things into
different commands is actually a good idea or not.

Kevin



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP
  2020-07-06 16:03             ` Daniel P. Berrangé
  2020-07-06 16:10               ` Denis V. Lunev
@ 2020-07-06 16:21               ` Kevin Wolf
  2020-07-07  9:07                 ` Daniel P. Berrangé
  1 sibling, 1 reply; 47+ messages in thread
From: Kevin Wolf @ 2020-07-06 16:21 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Peter Krempa, Denis V. Lunev, qemu-block, Juan Quintela,
	qemu-devel, Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini,
	Max Reitz, Dr. David Alan Gilbert

On 06.07.2020 at 18:03, Daniel P. Berrangé wrote:
> On Mon, Jul 06, 2020 at 05:50:11PM +0200, Kevin Wolf wrote:
> > On 06.07.2020 at 17:29, Daniel P. Berrangé wrote:
> > > On Mon, Jul 06, 2020 at 05:27:01PM +0200, Kevin Wolf wrote:
> > > > On 03.07.2020 at 19:29, Denis V. Lunev wrote:
> > > > > On 7/3/20 8:22 PM, Daniel P. Berrangé wrote:
> > > > > > On Fri, Jul 03, 2020 at 08:15:44PM +0300, Denis V. Lunev wrote:
> > > > > >> On 7/2/20 8:57 PM, Daniel P. Berrangé wrote:
> > > > > >>> When QMP was first introduced some 10+ years ago now, the snapshot
> > > > > >>> related commands (savevm/loadvm/delvm) were not converted. This was
> > > > > >>> primarily because their implementation causes blocking of the thread
> > > > > >>> running the monitor commands. This was (and still is) considered
> > > > > >>> undesirable behaviour both in HMP and QMP.
> > > > > >>>
> > > > > >>> In theory someone was supposed to fix this flaw at some point in the
> > > > > >>> past 10 years and bring them into the QMP world. Sadly, thus far it
> > > > > >>> hasn't happened as people always had more important things to work
> > > > > >>> on. Enterprise apps were much more interested in external snapshots
> > > > > >>> than internal snapshots as they have many more features.
> > > > > >>>
> > > > > >>> Meanwhile users still want to use internal snapshots as there is
> > > > > >>> a certain simplicity in having everything self-contained in one
> > > > > >>> image, even though it has limitations. Thus apps end up
> > > > > >>> executing savevm/loadvm/delvm via the "human-monitor-command"
> > > > > >>> QMP command.
> > > > > >>>
> > > > > >>>
> > > > > >>> IOW, the problematic blocking behaviour that was one of the reasons
> > > > > >>> for not having savevm/loadvm/delvm in QMP is experienced by applications
> > > > > >>> regardless. By not porting the commands to QMP due to one design flaw,
> > > > > >>> we've forced apps and users to suffer from other design flaws of HMP
> > > > > >>> (bad error reporting, weak type checking of args, no introspection) for
> > > > > >>> an additional 10 years. This feels rather sub-optimal :-(
> > > > > >>>
> > > > > >>> In practice users don't appear to care strongly about the fact that these
> > > > > >>> commands block the VM while they run. I might have seen one bug report
> > > > > >>> about it, but it certainly isn't something that comes up as a frequent
> > > > > >>> topic except among us QEMU maintainers. Users do care about having
> > > > > >>> access to the snapshot feature.
> > > > > >>>
> > > > > >>> Where I am seeing frequent complaints is wrt the use of OVMF combined
> > > > > >>> with snapshots which has some serious pain points. This is getting worse
> > > > > >>> as the push to ditch legacy BIOS in favour of UEFI gains momentum both
> > > > > >>> across OS vendors and mgmt apps. Solving it requires new parameters to
> > > > > >>> the commands, but doing this in HMP is super unappealing.
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>> After 10 years, I think it is time for us to be a little pragmatic about
> > > > > >>> our handling of snapshots commands. My desire is that libvirt should never
> > > > > >>> use "human-monitor-command" under any circumstances, because of the
> > > > > >>> inherent flaws in HMP as a protocol for machine consumption. If there
> > > > > >>> are flaws in QMP commands that's fine. If we fix them in future, we can
> > > > > >>> deprecate the current QMP commands and remove them not too long after,
> > > > > >>> without being locked in forever.
> > > > > >>>
> > > > > >>>
> > > > > >>> Thus in this series I'm proposing a direct 1-1 mapping of the existing
> > > > > >>> HMP commands for savevm/loadvm/delvm into QMP as a first step. This does
> > > > > >>> not solve the blocking thread problem, but it does eliminate the error
> > > > > >>> reporting, type checking and introspection problems inherent to HMP.
> > > > > >>> We're winning on 3 out of the 4 long term problems.
> > > > > >>>
> > > > > >>> If someone can suggest an easy way to fix the thread blocking problem
> > > > > >>> too, I'd be interested to hear it. If it involves a major refactoring
> > > > > >>> then I think users are better served by unlocking what look like easy
> > > > > >>> wins today.
> > > > > >>>
> > > > > >>> With a QMP variant, we reasonably deal with the problems related to OVMF:
> > > > > >>>
> > > > > >>>  - The logic to pick which disk to store the vmstate in is not
> > > > > >>>    satisfactory.
> > > > > >>>
> > > > > >>>    The first block driver state cannot be assumed to be the root disk
> > > > > >>>    image, it might be OVMF varstore and we don't want to store vmstate
> > > > > >>>    in there.
> > > > > >>>
> > > > > >>>  - The logic to decide which disks must be snapshotted is hardwired
> > > > > >>>    to all disks which are writable
> > > > > >>>
> > > > > >>>    Again with OVMF there might be a writable varstore, but this can be
> > > > > >>>    raw rather than qcow2 format, and thus unable to be snapshotted.
> > > > > >>>    While users might wish to snapshot their varstore, in some/many/most
> > > > > >>>    cases it is entirely unnecessary. Users are blocked from snapshotting
> > > > > >>>    their VM though due to this varstore.
> > > > > >>>
> > > > > >>> These are solved by adding two parameters to the commands. The first is
> > > > > >>> a block device node name that identifies the image to store vmstate in,
> > > > > >>> and the second is a list of node names to exclude from snapshots.
> > > > > >>>
> > > > > >>> In the block code I've only dealt with node names for block devices, as
> > > > > >>> IIUC, this is all that libvirt should need in the -blockdev world it now
> > > > > >>> lives in. IOW, I've made no attempt to cope with people wanting to use
> > > > > >>> these QMP commands in combination with -drive args.
> > > > > >>>
> > > > > >>> I've done some minimal work in libvirt to start to make use of the new
> > > > > >>> commands to validate their functionality, but this isn't finished yet.
> > > > > >>>
> > > > > >>> My ultimate goal is to make the GNOME Boxes maintainer happy again by
> > > > > >>> having internal snapshots work with OVMF:
> > > > > >>>
> > > > > >>>   https://gitlab.gnome.org/GNOME/gnome-boxes/-/commit/c486da262f6566326fbcb5ef45c5f64048f16a6e
> > > > > >>>
> > > > > >>> Daniel P. Berrangé (6):
> > > > > >>>   migration: improve error reporting of block driver state name
> > > > > >>>   migration: introduce savevm, loadvm, delvm QMP commands
> > > > > >>>   block: add ability to filter out blockdevs during snapshot
> > > > > >>>   block: allow specifying name of block device for vmstate storage
> > > > > >>>   migration: support excluding block devs in QMP snapshot commands
> > > > > >>>   migration: support picking vmstate disk in QMP snapshot commands
> > > > > >>>
> > > > > >>>  block/monitor/block-hmp-cmds.c |  4 +-
> > > > > >>>  block/snapshot.c               | 68 +++++++++++++++++++------
> > > > > >>>  include/block/snapshot.h       | 21 +++++---
> > > > > >>>  include/migration/snapshot.h   | 10 +++-
> > > > > >>>  migration/savevm.c             | 71 +++++++++++++++++++-------
> > > > > >>>  monitor/hmp-cmds.c             | 20 ++------
> > > > > >>>  qapi/migration.json            | 91 ++++++++++++++++++++++++++++++++++
> > > > > >>>  replay/replay-snapshot.c       |  4 +-
> > > > > >>>  softmmu/vl.c                   |  2 +-
> > > > > >>>  9 files changed, 228 insertions(+), 63 deletions(-)
> > > > > >> I tried to work on this interface in 2016. At that time
> > > > > >> we struggled with the idea that this QMP interface should
> > > > > >> be ready to work asynchronously.
> > > > > >>
> > > > > >> Write-protect userfaultfd was merged into vanilla Linux,
> > > > > >> so it is time for an async savevm interface, which will also
> > > > > >> bring async loadvm and some rework of state storing.
> > > > > >>
> > > > > >> Thus I think that with the introduction of the QMP interface
> > > > > >> we should at least run savevm not from the main
> > > > > >> thread but in the background, with an event at the end.
> > > > > > spawning a thread in which to invoke save_snapshot() and load_snapshot()
> > > > > > is easy enough.  I'm not at all clear on what we need in the way of
> > > > > > mutex locking though, to make those methods safe to run in a thread
> > > > > > that isn't the main event loop.
> > > > > 
> > > > > I am unsure that this is so easy. We need to be protected from other
> > > > > operations coming through the QMP interface. Right now parallel
> > > > > operations are not allowed.
> > > > > 
> > > > > > Even with savevm/loadvm being blocking, we could introduce a QMP event
> > > > > > straight away, and document that users shouldn't assume the operation
> > > > > > is complete until they see the event. That would let us make the commands
> > > > > > non-blocking later with same documented semantics.
> > > > > OK. Let us assume that you have added QMP savevm as proposed. It is
> > > > > sync now. Sooner or later (I hope sooner) we will have to re-implement
> > > > > this command with an async version, which will again bring an
> > > > > event etc., and thus you will have to add compat layers to
> > > > > libvirt.
> > > > > 
> > > > > I think that it would be cleaner to start with an interface suitable for
> > > > > further (coming) features and not copy the obsolete implementation.
> > > > > Yes, unfortunately, this is much more complex :(
> > > > 
> > > > Should we make this a job (may or may not be a block job) that just
> > > > happens to block the VM and return completion immediately with the
> > > > simple implementation we can have today? Then moving it later to a
> > > > truly async operation mode should become transparent to the QMP client.
> > > 
> > > What would making it a job / block job need from a QMP design POV ?
> > 
> > The actual QMP syntax for the command wouldn't look much different (I
> > think just a new option 'job-id'), but the difference would be that it's
> > not documented as performing the whole action, but just starting the
> > job. The expectation would then be that it can be managed with the
> > job-* commands and that it emits the job status events.
> > 
> > This may sound complicated, but most of it is actually covered by the
> > generic job infrastructure.
> > 
> > The simplest job that we have is blockdev-create, which is implemented
> > in block/create.c (99 lines including the license header). I think this
> > would be a good model for our new case.
> 
> The QMP design and internal API looks simple enough, but I'm wondering
> what implications come with the job infra wrt locking/thread safety. In
> particular I see the "job_start" command runs the impl in a coroutine.
> I can't tell if that's going to cause any interactions wrt the current
> loadvm/savevm impl and its assumptions about blocking execution while
> running.

Yes, the job infrastructure is built on coroutines and we'd need to
check that this is safe. But both loadvm and savevm call both vm_stop()
and bdrv_drain_all_begin/end(), so not much should be going on in
parallel.

If this doesn't easily work out, there is still a simple solution for
our sync implementation with an async interface: Just leave coroutine
context immediately again by scheduling a BH that does the actual work.
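As a sketch of how the jobified interface discussed above might look on the
wire (the "savevm" command name and its arguments follow this series'
proposal and are illustrative, not a final API; the job-* commands and the
JOB_STATUS_CHANGE event are the existing generic job machinery):

```json
{ "execute": "savevm",
  "arguments": { "tag": "snap1", "job-id": "save0" } }
{ "return": {} }

{ "event": "JOB_STATUS_CHANGE",
  "data": { "id": "save0", "status": "concluded" } }

{ "execute": "query-jobs" }

{ "execute": "job-dismiss", "arguments": { "id": "save0" } }
```

The client checks query-jobs for an error on the concluded job before
dismissing it; whether the job blocks internally or runs truly async is
invisible at this layer.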

Kevin



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands
  2020-07-06 16:15           ` Kevin Wolf
@ 2020-07-07  6:38             ` Peter Krempa
  2020-07-07 10:33               ` Kevin Wolf
  0 siblings, 1 reply; 47+ messages in thread
From: Peter Krempa @ 2020-07-07  6:38 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Daniel P. Berrangé,
	qemu-block, Juan Quintela, qemu-devel, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, Pavel Dovgalyuk,
	Paolo Bonzini

On Mon, Jul 06, 2020 at 18:15:55 +0200, Kevin Wolf wrote:
> On 03.07.2020 at 18:02, Daniel P. Berrangé wrote:
> > On Fri, Jul 03, 2020 at 04:49:33PM +0100, Dr. David Alan Gilbert wrote:
> > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > On Thu, Jul 02, 2020 at 01:12:52PM -0500, Eric Blake wrote:
> > > > > On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:

[...]

> > migration only does vmstate, not disks. The current blockdev commands
> > are all related to external snapshots, nothing for internal snapshots
> > AFAIK. So we still need commands to load/save internal snapshots of
> > the disk data in the qcow2 files.
> > 
> > So we could look at loadvm/savevm conceptually as syntax sugar on top of
> > a migration transport that targets disk images, plus a blockdev QMP command
> > that can do internal snapshots. Neither of these exists though, and they feel
> > like a significantly larger amount of work than using existing functionality
> > that currently works.
> 
> There is blockdev-snapshot-internal-sync, which does a disk-only
> snapshot of a single node. A snapshot of multiple nodes can be taken by
> putting multiple blockdev-snapshot-internal-sync inside a 'transaction'
> command.

Libvirt never implemented support for disk-only internal snapshots as I
didn't think they were worth it. We also made a mistake by using
VIR_DOMAIN_SNAPSHOT_DISK_ONLY to switch to an external snapshot, so
while the XML can modify the snapshot to be internal, it's neither clear
nor user-friendly to force an internal disk-only snapshot.

> If we want to build on top of this, we'd have to implement a
> transactionable command for storing only the VM state to a specific
> node. This would probably still be a long-running job.

IMO we really want this for external snapshots too. Driving the
migration as a standard migration is really suboptimal, especially when
the user wants minimal downtime. Transactioning a post-copy-style
copy-on-write migration would simplify this a lot. I agree though that
this is a conversation for another time.



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP
  2020-07-06 16:21               ` Kevin Wolf
@ 2020-07-07  9:07                 ` Daniel P. Berrangé
  0 siblings, 0 replies; 47+ messages in thread
From: Daniel P. Berrangé @ 2020-07-07  9:07 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Krempa, Denis V. Lunev, qemu-block, Juan Quintela,
	qemu-devel, Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini,
	Max Reitz, Dr. David Alan Gilbert

On Mon, Jul 06, 2020 at 06:21:56PM +0200, Kevin Wolf wrote:
> On 06.07.2020 at 18:03, Daniel P. Berrangé wrote:
> > On Mon, Jul 06, 2020 at 05:50:11PM +0200, Kevin Wolf wrote:
> > > On 06.07.2020 at 17:29, Daniel P. Berrangé wrote:
> > > > On Mon, Jul 06, 2020 at 05:27:01PM +0200, Kevin Wolf wrote:
> > > > > On 03.07.2020 at 19:29, Denis V. Lunev wrote:
> > > > > > On 7/3/20 8:22 PM, Daniel P. Berrangé wrote:
> > > > > > > On Fri, Jul 03, 2020 at 08:15:44PM +0300, Denis V. Lunev wrote:
> > > > > > >> On 7/2/20 8:57 PM, Daniel P. Berrangé wrote:
[...]
> > > > > > >> I tried to work on this interface in 2016. At that time
> > > > > > >> we struggled with the idea that this QMP interface should
> > > > > > >> be ready to work asynchronously.
> > > > > > >>
> > > > > > >> Write-protect userfaultfd was merged into vanilla Linux,
> > > > > > >> so it is time for an async savevm interface, which will also
> > > > > > >> bring async loadvm and some rework of state storing.
> > > > > > >>
> > > > > > >> Thus I think that with the introduction of the QMP interface
> > > > > > >> we should at least run savevm not from the main
> > > > > > >> thread but in the background, with an event at the end.
> > > > > > > spawning a thread in which to invoke save_snapshot() and load_snapshot()
> > > > > > > is easy enough.  I'm not at all clear on what we need in the way of
> > > > > > > mutex locking though, to make those methods safe to run in a thread
> > > > > > > that isn't the main event loop.
> > > > > > 
> > > > > > I am unsure that this is so easy. We need to be protected from other
> > > > > > operations coming through the QMP interface. Right now parallel
> > > > > > operations are not allowed.
> > > > > > 
> > > > > > > Even with savevm/loadvm being blocking, we could introduce a QMP event
> > > > > > > straight away, and document that users shouldn't assume the operation
> > > > > > > is complete until they see the event. That would let us make the commands
> > > > > > > non-blocking later with same documented semantics.
> > > > > > OK. Let us assume that you have added QMP savevm as proposed. It is
> > > > > > sync now. Sooner or later (I hope sooner) we will have to re-implement
> > > > > > this command with an async version, which will again bring an
> > > > > > event etc., and thus you will have to add compat layers to
> > > > > > libvirt.
> > > > > > 
> > > > > > I think that it would be cleaner to start with an interface suitable for
> > > > > > further (coming) features and not copy the obsolete implementation.
> > > > > > Yes, unfortunately, this is much more complex :(
> > > > > 
> > > > > Should we make this a job (may or may not be a block job) that just
> > > > > happens to block the VM and return completion immediately with the
> > > > > simple implementation we can have today? Then moving it later to a
> > > > > truly async operation mode should become transparent to the QMP client.
> > > > 
> > > > What would making it a job / block job need from a QMP design POV ?
> > > 
> > > The actual QMP syntax for the command wouldn't look much different (I
> > > think just a new option 'job-id'), but the difference would be that it's
> > > not documented as performing the whole action, but just starting the
> > > job. The expectation would then be that it can be managed with the
> > > job-* commands and that it emits the job status events.
> > > 
> > > This may sound complicated, but most of it is actually covered by the
> > > generic job infrastructure.
> > > 
> > > The simplest job that we have is blockdev-create, which is implemented
> > > in block/create.c (99 lines including the license header). I think this
> > > would be a good model for our new case.
> > 
> > The QMP design and internal API looks simple enough, but I'm wondering
> > what implications come with the job infra wrt locking/thread safety. In
> > particular I see the "job_start" command runs the impl in a coroutine.
> > I can't tell if that's going to cause any interactions wrt the current
> > loadvm/savevm impl and its assumptions about blocking execution while
> > running.
> 
> Yes, the job infrastructure is built on coroutines and we'd need to
> check that this is safe. But both loadvm and savevm call both vm_stop()
> and bdrv_drain_all_begin/end(), so not much should be going on in
> parallel.
> 
> If this doesn't easily work out, there is still a simple solution for
> our sync implementation with an async interface: Just leave coroutine
> context immediately again by scheduling a BH that does the actual work.

Ok, I'm going to try out use of the job framework and post a v2.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 5/6] migration: support excluding block devs in QMP snapshot commands
  2020-07-06 15:57   ` Kevin Wolf
@ 2020-07-07  9:14     ` Daniel P. Berrangé
  2020-07-07 10:11       ` Kevin Wolf
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel P. Berrangé @ 2020-07-07  9:14 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Krempa, qemu-block, Juan Quintela, qemu-devel,
	Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini, Max Reitz,
	Dr. David Alan Gilbert

On Mon, Jul 06, 2020 at 05:57:08PM +0200, Kevin Wolf wrote:
> On 02.07.2020 at 19:57, Daniel P. Berrangé wrote:
> > This wires up support for a new "exclude" parameter to the QMP commands
> > for snapshots (savevm, loadvm, delvm). This parameter accepts a list of
> > block driver state node names.
> > 
> > One use case for this would be a VM using OVMF firmware where the
> > variables store is a raw disk image. Ideally the variable store would be
> > qcow2, allowing its contents to be included in the snapshot, but
> > typically today the variable store is raw. It is still useful to be able
> > to snapshot VMs using OVMF, even if the varstore is excluded, as the
> > main OS disk content is usually the stuff the user cares about.
> > 
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> 
> Wouldn't it be better to take an optional list of nodes to _include_
> that just defaults to our current set of nodes?
> 
> The problem with excluding is that we already don't snapshot many nodes,
> and the criteria to choose the right ones weren't entirely obvious, so
> we just went with something that seemed to make the most sense. But the
> management application may actually want to snapshot more nodes than we
> cover by default.
> 
> I feel things become clearer and less surprising if the client just
> tells us explicitly which images are supposed to be snapshotted instead
> of adding exceptions on top of a default selection that we're already
> not really sure about.

I thought that QEMU just excluded nodes which are not capable of being
snapshotted. By using exclusions, the mgmt apps don't have to know
what types of storage driver QEMU supports snapshots on; they just let
QEMU decide. It also felt simpler to use exclusions, as the normal case
would be to snapshot everything. Both inclusions and exclusions are easy
to implement in QEMU though.
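
The two proposed semantics can be contrasted with a toy sketch (illustrative only, not QEMU code; the node names are made up):

```python
def snapshot_set_exclude(default_nodes, exclude):
    # exclusion semantics: start from QEMU's default selection, drop names
    return [n for n in default_nodes if n not in set(exclude)]

def snapshot_set_include(default_nodes, include=None):
    # inclusion semantics: the client enumerates nodes explicitly;
    # omitting the list falls back to the default selection
    return list(default_nodes) if include is None else list(include)

default = ["os-disk", "data-disk", "ovmf-varstore"]
print(snapshot_set_exclude(default, ["ovmf-varstore"]))
print(snapshot_set_include(default, ["os-disk", "extra-node"]))
```

Note that only the inclusion form can name a node ("extra-node" here) that QEMU's default selection would not have picked, which is the core of Kevin's argument.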

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 5/6] migration: support excluding block devs in QMP snapshot commands
  2020-07-07  9:14     ` Daniel P. Berrangé
@ 2020-07-07 10:11       ` Kevin Wolf
  0 siblings, 0 replies; 47+ messages in thread
From: Kevin Wolf @ 2020-07-07 10:11 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Peter Krempa, qemu-block, Juan Quintela, qemu-devel,
	Markus Armbruster, Pavel Dovgalyuk, Paolo Bonzini, Max Reitz,
	Dr. David Alan Gilbert

On 07.07.2020 at 11:14, Daniel P. Berrangé wrote:
> On Mon, Jul 06, 2020 at 05:57:08PM +0200, Kevin Wolf wrote:
> > On 02.07.2020 at 19:57, Daniel P. Berrangé wrote:
> > > This wires up support for a new "exclude" parameter to the QMP commands
> > > for snapshots (savevm, loadvm, delvm). This parameter accepts a list of
> > > block driver state node names.
> > > 
> > > One use case for this would be a VM using OVMF firmware where the
> > > variables store is a raw disk image. Ideally the variable store would be
> > > qcow2, allowing its contents to be included in the snapshot, but
> > > typically today the variable store is raw. It is still useful to be able
> > > to snapshot VMs using OVMF, even if the varstore is excluded, as the
> > > main OS disk content is usually the stuff the user cares about.
> > > 
> > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > 
> > Wouldn't it be better to take an optional list of nodes to _include_
> > that just defaults to our current set of nodes?
> > 
> > The problem with excluding is that we already don't snapshot many nodes,
> > and the criteria to choose the right ones weren't entirely obvious, so
> > we just went with something that seemed to make the most sense. But the
> > management application may actually want to snapshot more nodes than we
> > cover by default.
> > 
> > I feel things become clearer and less surprising if the client just
> > tells us explicitly which images are supposed to be snapshotted instead
> > of adding exceptions on top of a default selection that we're already
> > not really sure about.
> 
> I thought that QEMU just excluded nodes which are not capable of being
> snapshotted.

No, QEMU tries to figure out which nodes must be snapshotted to capture
the current state, and if any of these nodes doesn't support snapshots,
the snapshot operation fails.

> By using exclusions, the mgmt apps don't have to know
> what types of storage driver QEMU supports snapshots on; they just let
> QEMU decide. It also felt simpler to use exclusions, as the normal case
> would be to snapshot everything. Both inclusions and exclusions are easy
> to implement in QEMU though.

The problem when going from device-based operation to node-based is that
you need to figure out which nodes actually contain something that the
user wants to snapshot. For example, you usually don't want to try
creating a snapshot on the protocol node when there is a format node on
top.

What do you do with nodes that aren't attached to the guest, but maybe
used as the backend of the NBD server? What if a node is both directly
attached to some user and there is another node on top of it? In these
non-trivial cases, the default is rather arbitrary because there really
isn't a single right answer.

What QEMU currently does is snapshotting every node that is opened
read-write and either attached to a BlockBackend (i.e. is used by a
device, NBD server, or I guess as the main node of a block job) or that
doesn't have any parents (i.e. the user added it explicitly, but didn't
use it yet).
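
That selection rule can be sketched as a predicate (a simplified model, not QEMU's actual data structures; node names are made up):

```python
def should_snapshot(node):
    # snapshot a node iff it is opened read-write AND it is either
    # attached to a BlockBackend or has no parent nodes
    return (not node["read_only"]) and (node["has_blk"] or not node["parents"])

nodes = [
    # format node used by a device: selected
    {"name": "fmt", "read_only": False, "has_blk": True, "parents": []},
    # protocol node underneath it: skipped (has a parent, no BlockBackend)
    {"name": "proto", "read_only": False, "has_blk": False, "parents": ["fmt"]},
    # node added explicitly but not used yet: selected (no parents)
    {"name": "spare", "read_only": False, "has_blk": False, "parents": []},
    # read-only node: never selected
    {"name": "ro-backing", "read_only": True, "has_blk": False, "parents": []},
]
print([n["name"] for n in nodes if should_snapshot(n)])  # ['fmt', 'spare']
```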

Come to think of it, the read-write condition may lead to surprising
results with dynamic auto-read-only... Good that file-posix doesn't
support snapshots and nothing else implements dynamic auto-read-only
yet.

Kevin




* Re: [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands
  2020-07-07  6:38             ` Peter Krempa
@ 2020-07-07 10:33               ` Kevin Wolf
  2020-07-07 10:41                 ` Peter Krempa
  0 siblings, 1 reply; 47+ messages in thread
From: Kevin Wolf @ 2020-07-07 10:33 UTC (permalink / raw)
  To: Peter Krempa
  Cc: Daniel P. Berrangé,
	qemu-block, Juan Quintela, qemu-devel, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, Pavel Dovgalyuk,
	Paolo Bonzini

On 07.07.2020 at 08:38, Peter Krempa wrote:
> On Mon, Jul 06, 2020 at 18:15:55 +0200, Kevin Wolf wrote:
> > On 03.07.2020 at 18:02, Daniel P. Berrangé wrote:
> > > On Fri, Jul 03, 2020 at 04:49:33PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > > On Thu, Jul 02, 2020 at 01:12:52PM -0500, Eric Blake wrote:
> > > > > > On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:
> 
> [...]
> 
> > > migration only does vmstate, not disks. The current blockdev commands
> > > are all related to external snapshots, nothing for internal snapshots
> > > AFAIK. So we still need commands to load/save internal snapshots of
> > > the disk data in the qcow2 files.
> > > 
> > > So we could look at loadvm/savevm conceptually as syntactic sugar above
> > > a migration transport that targets disk images, plus blockdev QMP commands
> > > that can do internal snapshots. Neither of these exists though, and both
> > > feel like a significantly larger amount of work than using existing
> > > functionality that is currently working.
> > 
> > There is blockdev-snapshot-internal-sync, which does a disk-only
> > snapshot of a single node. A snapshot of multiple nodes can be taken by
> > putting multiple blockdev-snapshot-internal-sync inside a 'transaction'
> > command.
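
Such a grouped disk-only internal snapshot might be issued like this (a sketch of the QMP payload using the documented 'transaction' and 'blockdev-snapshot-internal-sync' commands; the device names and snapshot tag are made up):

```python
import json

# One 'transaction' grouping two blockdev-snapshot-internal-sync actions,
# so both disks receive the internal snapshot atomically.
cmd = {
    "execute": "transaction",
    "arguments": {
        "actions": [
            {"type": "blockdev-snapshot-internal-sync",
             "data": {"device": "disk0", "name": "snap1"}},
            {"type": "blockdev-snapshot-internal-sync",
             "data": {"device": "disk1", "name": "snap1"}},
        ]
    },
}
print(json.dumps(cmd, indent=2))
```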
> 
> Libvirt never implemented support for disk-only internal snapshots as I
> didn't think they were worth it. We also made a mistake by using the
> VIR_DOMAIN_SNAPSHOT_DISK_ONLY flag to switch to an external snapshot, so
> while the XML can modify the snapshot to be internal, it's neither very
> clear nor user-friendly to force an internal disk-only snapshot.
> 
> > If we want to build on top of this, we'd have to implement a
> > transactionable command for storing only the VM state to a specific
> > node. This would probably still be a long-running job.
> 
> IMO we really want this also for external snapshots. Driving the
> migration as a standard migration is really suboptimal, especially when
> the user wants minimal downtime. Transactioning a post-copy style
> copy-on-write migration would simplify this a lot. I agree though that
> this is for a different conversation.

This is an interesting point actually. And while the implementation of
the post-copy style live snapshotting is for a different conversation, I
think the implications it has on the API are relevant for us now.

But even if we have an all-in-one snapshot job instead of a transaction
to group all the individual operations together, I think you could still
represent that by just specifying an empty list of nodes to be
snapshotted. (I feel this is another argument for passing the nodes to
include rather than nodes to exclude from the disk snapshot.)
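
Concretely, an all-in-one job with inclusion semantics might take a payload like this (a purely hypothetical shape; none of the command or parameter names here are an agreed-upon QEMU API):

```python
# Hypothetical all-in-one snapshot job: 'devices' enumerates the nodes
# to include in the disk snapshot, and an empty list means vmstate only.
snapshot_save = {
    "execute": "snapshot-save",          # assumed command name
    "arguments": {
        "job-id": "save0",
        "tag": "my-snap",
        "vmstate": "disk0",              # node receiving the vmstate
        "devices": [],                   # empty include list: no disk snapshot
    },
}
print(snapshot_save)
```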

Kevin




* Re: [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands
  2020-07-07 10:33               ` Kevin Wolf
@ 2020-07-07 10:41                 ` Peter Krempa
  0 siblings, 0 replies; 47+ messages in thread
From: Peter Krempa @ 2020-07-07 10:41 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Daniel P. Berrangé,
	qemu-block, Juan Quintela, qemu-devel, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, Pavel Dovgalyuk,
	Paolo Bonzini

On Tue, Jul 07, 2020 at 12:33:31 +0200, Kevin Wolf wrote:
> On 07.07.2020 at 08:38, Peter Krempa wrote:
> > On Mon, Jul 06, 2020 at 18:15:55 +0200, Kevin Wolf wrote:
> > > On 03.07.2020 at 18:02, Daniel P. Berrangé wrote:

[...]

> > IMO we really want this also for external snapshots. Driving the
> > migration as a standard migration is really suboptimal, especially
> > when the user wants minimal downtime. Transactioning a post-copy style
> > copy-on-write migration would simplify this a lot. I agree though that
> > this is for a different conversation.
> 
> This is an interesting point actually. And while the implementation of
> the post-copy style live snapshotting is for a different conversation, I
> think the implications it has on the API are relevant for us now.
> 
> But even if we have an all-in-one snapshot job instead of a transaction
> to group all the individual operations together, I think you could still
> represent that by just specifying an empty list of nodes to be
> snapshotted. (I feel this is another argument for passing the nodes to
> include rather than nodes to exclude from the disk snapshot.)

Definitely. From libvirt's POV it's IMO simpler and more future-proof to
enumerate everything rather than keep a database of what to skip.




end of thread, other threads:[~2020-07-07 10:41 UTC | newest]

Thread overview: 47+ messages
2020-07-02 17:57 [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP Daniel P. Berrangé
2020-07-02 17:57 ` [PATCH 1/6] migration: improve error reporting of block driver state name Daniel P. Berrangé
2020-07-02 18:36   ` Eric Blake
2020-07-02 19:13     ` Dr. David Alan Gilbert
2020-07-02 17:57 ` [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands Daniel P. Berrangé
2020-07-02 18:12   ` Eric Blake
2020-07-02 18:24     ` Daniel P. Berrangé
2020-07-03 15:49       ` Dr. David Alan Gilbert
2020-07-03 16:02         ` Daniel P. Berrangé
2020-07-03 16:10           ` Dr. David Alan Gilbert
2020-07-03 16:16             ` Daniel P. Berrangé
2020-07-03 16:22               ` Dr. David Alan Gilbert
2020-07-03 16:49                 ` Daniel P. Berrangé
2020-07-03 17:00                   ` Dr. David Alan Gilbert
2020-07-03 17:10                     ` Daniel P. Berrangé
2020-07-03 17:26                       ` Dr. David Alan Gilbert
2020-07-03 16:24             ` Peter Krempa
2020-07-03 16:26               ` Dr. David Alan Gilbert
2020-07-06 16:15           ` Kevin Wolf
2020-07-07  6:38             ` Peter Krempa
2020-07-07 10:33               ` Kevin Wolf
2020-07-07 10:41                 ` Peter Krempa
2020-07-03 17:22   ` Denis V. Lunev
2020-07-02 17:57 ` [PATCH 3/6] block: add ability to filter out blockdevs during snapshot Daniel P. Berrangé
2020-07-02 17:57 ` [PATCH 4/6] block: allow specifying name of block device for vmstate storage Daniel P. Berrangé
2020-07-02 17:57 ` [PATCH 5/6] migration: support excluding block devs in QMP snapshot commands Daniel P. Berrangé
2020-07-06 15:57   ` Kevin Wolf
2020-07-07  9:14     ` Daniel P. Berrangé
2020-07-07 10:11       ` Kevin Wolf
2020-07-02 17:57 ` [PATCH 6/6] migration: support picking vmstate disk " Daniel P. Berrangé
2020-07-02 18:19   ` Eric Blake
2020-07-03  8:37     ` Daniel P. Berrangé
2020-07-02 18:53 ` [PATCH 0/6] migration: bring savevm/loadvm/delvm over to QMP no-reply
2020-07-02 19:07 ` no-reply
2020-07-03 17:15 ` Denis V. Lunev
2020-07-03 17:22   ` Daniel P. Berrangé
2020-07-03 17:29     ` Denis V. Lunev
2020-07-06 14:28       ` Daniel P. Berrangé
2020-07-06 16:07         ` Denis V. Lunev
2020-07-06 15:27       ` Kevin Wolf
2020-07-06 15:29         ` Daniel P. Berrangé
2020-07-06 15:50           ` Kevin Wolf
2020-07-06 16:03             ` Daniel P. Berrangé
2020-07-06 16:10               ` Denis V. Lunev
2020-07-06 16:15                 ` Daniel P. Berrangé
2020-07-06 16:21               ` Kevin Wolf
2020-07-07  9:07                 ` Daniel P. Berrangé
