* [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP
From: Daniel P. Berrangé @ 2020-07-27 15:08 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Peter Krempa, Daniel P. Berrangé,
	Denis V. Lunev, qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, John Snow, Pavel Dovgalyuk,
	Paolo Bonzini

A followup to:

 v1: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg00866.html

When QMP was first introduced some 10+ years ago, the snapshot
related commands (savevm/loadvm/delvm) were not converted. This was
primarily because their implementation blocks the thread running the
monitor command for as long as they run. This was (and still is)
considered undesirable behaviour in both HMP and QMP.

In theory someone was supposed to fix this flaw at some point in the
past 10 years and bring these commands into the QMP world. Sadly, thus
far it hasn't happened, as people always had more important things to
work on. Enterprise apps were much more interested in external
snapshots than internal snapshots, since the former offer many more
features.

Meanwhile users still want to use internal snapshots, as there is a
certain simplicity in having everything self-contained in one image,
even though it has limitations. Thus apps end up executing
savevm/loadvm/delvm via the "human-monitor-command" QMP command
anyway.

IOW, the problematic blocking behaviour that was one of the reasons
for not having savevm/loadvm/delvm in QMP is experienced by
applications regardless. By not porting the commands to QMP due to one
design flaw, we've forced apps and users to suffer from the other
design flaws of HMP (bad error reporting, no strong type checking of
args, no introspection) for an additional 10 years. This feels rather
sub-optimal :-(

In practice users don't appear to care strongly about the fact that these
commands block the VM while they run. I might have seen one bug report
about it, but it certainly isn't something that comes up as a frequent
topic except among us QEMU maintainers. Users do care about having
access to the snapshot feature.

Where I am seeing frequent complaints is wrt the use of OVMF combined
with snapshots, which has some serious pain points. This is getting
worse as the push to ditch legacy BIOS in favour of UEFI gains
momentum across both OS vendors and mgmt apps. Solving it requires new
parameters to the commands, but doing this in HMP is super
unappealing.

After 10 years, I think it is time for us to be a little pragmatic
about our handling of the snapshot commands. My desire is that libvirt
should never use "human-monitor-command" under any circumstances,
because of the inherent flaws in HMP as a protocol for machine
consumption.

Thus in this series I'm proposing a fairly direct mapping of the
existing HMP commands for savevm/loadvm/delvm into QMP as a first
step. This does not solve the blocking thread problem, but it does put
in place a design using the jobs framework which can facilitate
solving it later. It does also solve the error reporting, type
checking and introspection problems inherent to HMP. So we're winning
on 3 out of the 4 problems, and pushing apps to a QMP design that will
let us solve the last remaining problem.

With a QMP variant, we can also reasonably deal with the problems
related to OVMF:

 - The logic to pick which disk to store the vmstate in is not
   satisfactory.

   The first block driver state cannot be assumed to be the root disk
   image; it might be the OVMF varstore, and we don't want to store
   vmstate in there.

 - The logic to decide which disks must be snapshotted is hardwired
   to all disks which are writable.

   Again with OVMF there might be a writable varstore, but this can be
   raw rather than qcow2 format, and thus unable to be snapshotted.
   While users might wish to snapshot their varstore, in some/many/most
   cases it is entirely unnecessary. Yet this varstore currently blocks
   users from snapshotting their VM at all.

These are solved by adding two parameters to the commands. The first is
a block device node name that identifies the image to store vmstate in,
and the second is a list of node names to include for the snapshots.
If the list of nodes isn't given, it falls back to the historical
behaviour of using all disks matching some undocumented criteria.
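
As a rough illustration of the intended shape (the argument names here
are only indicative and the node names are made up; patch 6 has the
real QAPI schema), a mgmt app could then do something like:

  -> { "execute": "snapshot-save",
       "arguments": { "tag": "snap0",
                      "vmstate": "root-disk",
                      "devices": [ "root-disk", "data-disk" ] } }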

In the block code I've only dealt with node names for block devices, as
IIUC, this is all that libvirt should need in the -blockdev world it now
lives in. IOW, I've made no attempt to cope with people wanting to use
these QMP commands in combination with -drive args.

I've done some minimal work in libvirt to start to make use of the new
commands to validate their functionality, but this isn't finished yet.

My ultimate goal is to make the GNOME Boxes maintainer happy again by
having internal snapshots work with OVMF:

  https://gitlab.gnome.org/GNOME/gnome-boxes/-/commit/c486da262f6566326fbcb5ef45c5f64048f16a6e

HELP NEEDED: this series starts to implement the approach that Kevin
suggested wrt the use of generic jobs.

When I try to actually run the code though it crashes:

ERROR:/home/berrange/src/virt/qemu/softmmu/cpus.c:1788:qemu_mutex_unlock_iothread: assertion failed: (qemu_mutex_iothread_locked())
Bail out! ERROR:/home/berrange/src/virt/qemu/softmmu/cpus.c:1788:qemu_mutex_unlock_iothread: assertion failed: (qemu_mutex_iothread_locked())

Obviously I've missed something related to locking, but I've no idea
what, so I'm sending this v2 simply as a way to solicit suggestions
for what I've messed up.

You can reproduce it with the I/O tests using "check -qcow2 310", which
gave this stack trace:

  Thread 5 (Thread 0x7fffe6e4c700 (LWP 3399011)):
  #0  futex_wait_cancelable (private=0, expected=0, futex_word=0x5555566a9fd8) at ../sysdeps/nptl/futex-internal.h:183
  #1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x555556227160 <qemu_global_mutex>, cond=0x5555566a9fb0) at pthread_cond_wait.c:508
  #2  __pthread_cond_wait (cond=cond@entry=0x5555566a9fb0, mutex=mutex@entry=0x555556227160 <qemu_global_mutex>) at pthread_cond_wait.c:638
  #3  0x0000555555ceb6cb in qemu_cond_wait_impl (cond=0x5555566a9fb0, mutex=0x555556227160 <qemu_global_mutex>, file=0x555555d44198 "/home/berrange/src/virt/qemu/softmmu/cpus.c", line=1145) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:174
  #4  0x0000555555931974 in qemu_wait_io_event (cpu=cpu@entry=0x555556685050) at /home/berrange/src/virt/qemu/softmmu/cpus.c:1145
  #5  0x0000555555933a89 in qemu_dummy_cpu_thread_fn (arg=arg@entry=0x555556685050) at /home/berrange/src/virt/qemu/softmmu/cpus.c:1241
  #6  0x0000555555ceb049 in qemu_thread_start (args=0x7fffe6e476f0) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:521
  #7  0x00007ffff4fdc432 in start_thread (arg=<optimized out>) at pthread_create.c:477
  #8  0x00007ffff4f0a9d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  
  Thread 4 (Thread 0x7fffe764d700 (LWP 3399010)):
  #0  0x00007ffff4effb6f in __GI___poll (fds=0x7fffdc006ec0, nfds=3, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
  #1  0x00007ffff7c1aace in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0
  #2  0x00007ffff7c1ae53 in g_main_loop_run () at /lib64/libglib-2.0.so.0
  #3  0x00005555559a9d81 in iothread_run (opaque=opaque@entry=0x55555632f200) at /home/berrange/src/virt/qemu/iothread.c:82
  #4  0x0000555555ceb049 in qemu_thread_start (args=0x7fffe76486f0) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:521
  #5  0x00007ffff4fdc432 in start_thread (arg=<optimized out>) at pthread_create.c:477
  #6  0x00007ffff4f0a9d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  
  Thread 3 (Thread 0x7fffe7e4e700 (LWP 3399009)):
  #0  0x00007ffff4fe5c58 in futex_abstimed_wait_cancelable (private=0, abstime=0x7fffe7e49650, clockid=0, expected=0, futex_word=0x5555562bf888) at ../sysdeps/nptl/futex-internal.h:320
  #1  do_futex_wait (sem=sem@entry=0x5555562bf888, abstime=abstime@entry=0x7fffe7e49650, clockid=0) at sem_waitcommon.c:112
  #2  0x00007ffff4fe5d83 in __new_sem_wait_slow (sem=sem@entry=0x5555562bf888, abstime=abstime@entry=0x7fffe7e49650, clockid=0) at sem_waitcommon.c:184
  #3  0x00007ffff4fe5e12 in sem_timedwait (sem=sem@entry=0x5555562bf888, abstime=abstime@entry=0x7fffe7e49650) at sem_timedwait.c:40
  #4  0x0000555555cebbdf in qemu_sem_timedwait (sem=sem@entry=0x5555562bf888, ms=ms@entry=10000) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:307
  #5  0x0000555555d03fa4 in worker_thread (opaque=opaque@entry=0x5555562bf810) at /home/berrange/src/virt/qemu/util/thread-pool.c:91
  #6  0x0000555555ceb049 in qemu_thread_start (args=0x7fffe7e496f0) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:521
  #7  0x00007ffff4fdc432 in start_thread (arg=<optimized out>) at pthread_create.c:477
  #8  0x00007ffff4f0a9d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  
  Thread 2 (Thread 0x7fffe8750700 (LWP 3399008)):
  #0  0x00007ffff4ed1871 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7fffe874b670, rem=0x7fffe874b680) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
  #1  0x00007ffff4ed71c7 in __GI___nanosleep (requested_time=<optimized out>, remaining=<optimized out>) at nanosleep.c:27
  #2  0x00007ffff7c462f7 in g_usleep () at /lib64/libglib-2.0.so.0
  #3  0x0000555555cf3fd0 in call_rcu_thread (opaque=opaque@entry=0x0) at /home/berrange/src/virt/qemu/util/rcu.c:250
  #4  0x0000555555ceb049 in qemu_thread_start (args=0x7fffe874b6f0) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:521
  #5  0x00007ffff4fdc432 in start_thread (arg=<optimized out>) at pthread_create.c:477
  #6  0x00007ffff4f0a9d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  
  Thread 1 (Thread 0x7fffe88abec0 (LWP 3398996)):
  #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
  #1  0x00007ffff4e2e895 in __GI_abort () at abort.c:79
  #2  0x00007ffff7be5b8c in g_assertion_message_expr.cold () at /lib64/libglib-2.0.so.0
  #3  0x00007ffff7c43a1f in g_assertion_message_expr () at /lib64/libglib-2.0.so.0
  #4  0x0000555555932da0 in qemu_mutex_unlock_iothread () at /home/berrange/src/virt/qemu/softmmu/cpus.c:1788
  #5  qemu_mutex_unlock_iothread () at /home/berrange/src/virt/qemu/softmmu/cpus.c:1786
  #6  0x0000555555cfeceb in os_host_main_loop_wait (timeout=26359275747000) at /home/berrange/src/virt/qemu/util/main-loop.c:232
  #7  main_loop_wait (nonblocking=nonblocking@entry=0) at /home/berrange/src/virt/qemu/util/main-loop.c:516
  #8  0x0000555555941f27 in qemu_main_loop () at /home/berrange/src/virt/qemu/softmmu/vl.c:1676
  #9  0x000055555581a18e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/berrange/src/virt/qemu/softmmu/main.c:49
  (gdb) 


Changed in v2:

 - Use new command names "snapshot-{load,save,delete}" to make it
   clear that these are different from the "savevm|loadvm|delvm"
   as they use the Job framework

 - Use an include list for block devs, not an exclude list

Daniel P. Berrangé (6):
  migration: improve error reporting of block driver state name
  block: push error reporting into bdrv_all_*_snapshot functions
  migration: stop returning errno from load_snapshot()
  block: add ability to specify list of blockdevs during snapshot
  block: allow specifying name of block device for vmstate storage
  migration: introduce snapshot-{save,load,delete} QMP commands

 block/monitor/block-hmp-cmds.c |   7 +-
 block/snapshot.c               | 167 +++++++++++++++++---------
 include/block/snapshot.h       |  19 +--
 include/migration/snapshot.h   |  10 +-
 migration/savevm.c             | 210 +++++++++++++++++++++++++++------
 monitor/hmp-cmds.c             |  11 +-
 qapi/job.json                  |   9 +-
 qapi/migration.json            | 112 ++++++++++++++++++
 replay/replay-snapshot.c       |   4 +-
 softmmu/vl.c                   |   2 +-
 tests/qemu-iotests/267.out     |  14 +--
 tests/qemu-iotests/310         | 125 ++++++++++++++++++++
 tests/qemu-iotests/310.out     |   0
 tests/qemu-iotests/group       |   1 +
 14 files changed, 562 insertions(+), 129 deletions(-)
 create mode 100755 tests/qemu-iotests/310
 create mode 100644 tests/qemu-iotests/310.out

-- 
2.26.2





* [PATCH v2 1/6] migration: improve error reporting of block driver state name
From: Daniel P. Berrangé @ 2020-07-27 15:08 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Peter Krempa, Daniel P. Berrangé,
	Denis V. Lunev, qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, John Snow, Pavel Dovgalyuk,
	Paolo Bonzini

With blockdev, a BlockDriverState may not have a device name,
so using a node name is required as an alternative.
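
The fallback semantics relied on are roughly the following (a sketch
only, not the actual QEMU helper implementation):

  /* Prefer the BlockBackend device name; when the node has no
   * device name (e.g. plain -blockdev usage) fall back to the
   * node name so error messages still identify the image. */
  static const char *device_or_node_name(BlockDriverState *bs)
  {
      const char *name = bdrv_get_device_name(bs);

      return (name && name[0]) ? name : bdrv_get_node_name(bs);
  }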

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 migration/savevm.c         | 12 ++++++------
 tests/qemu-iotests/267.out |  4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 45c9dd9d8a..cffee6cab7 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2655,7 +2655,7 @@ int save_snapshot(const char *name, Error **errp)
 
     if (!bdrv_all_can_snapshot(&bs)) {
         error_setg(errp, "Device '%s' is writable but does not support "
-                   "snapshots", bdrv_get_device_name(bs));
+                   "snapshots", bdrv_get_device_or_node_name(bs));
         return ret;
     }
 
@@ -2664,7 +2664,7 @@ int save_snapshot(const char *name, Error **errp)
         ret = bdrv_all_delete_snapshot(name, &bs1, errp);
         if (ret < 0) {
             error_prepend(errp, "Error while deleting snapshot on device "
-                          "'%s': ", bdrv_get_device_name(bs1));
+                          "'%s': ", bdrv_get_device_or_node_name(bs1));
             return ret;
         }
     }
@@ -2739,7 +2739,7 @@ int save_snapshot(const char *name, Error **errp)
     ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, &bs);
     if (ret < 0) {
         error_setg(errp, "Error while creating snapshot on '%s'",
-                   bdrv_get_device_name(bs));
+                   bdrv_get_device_or_node_name(bs));
         goto the_end;
     }
 
@@ -2857,14 +2857,14 @@ int load_snapshot(const char *name, Error **errp)
     if (!bdrv_all_can_snapshot(&bs)) {
         error_setg(errp,
                    "Device '%s' is writable but does not support snapshots",
-                   bdrv_get_device_name(bs));
+                   bdrv_get_device_or_node_name(bs));
         return -ENOTSUP;
     }
     ret = bdrv_all_find_snapshot(name, &bs);
     if (ret < 0) {
         error_setg(errp,
                    "Device '%s' does not have the requested snapshot '%s'",
-                   bdrv_get_device_name(bs), name);
+                   bdrv_get_device_or_node_name(bs), name);
         return ret;
     }
 
@@ -2893,7 +2893,7 @@ int load_snapshot(const char *name, Error **errp)
     ret = bdrv_all_goto_snapshot(name, &bs, errp);
     if (ret < 0) {
         error_prepend(errp, "Could not load snapshot '%s' on '%s': ",
-                      name, bdrv_get_device_name(bs));
+                      name, bdrv_get_device_or_node_name(bs));
         goto err_drain;
     }
 
diff --git a/tests/qemu-iotests/267.out b/tests/qemu-iotests/267.out
index d6d80c099f..215902b3ad 100644
--- a/tests/qemu-iotests/267.out
+++ b/tests/qemu-iotests/267.out
@@ -81,11 +81,11 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
 Testing: -blockdev driver=file,filename=TEST_DIR/t.IMGFMT,node-name=file
 QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) savevm snap0
-Error: Device '' is writable but does not support snapshots
+Error: Device 'file' is writable but does not support snapshots
 (qemu) info snapshots
 No available block device supports snapshots
 (qemu) loadvm snap0
-Error: Device '' is writable but does not support snapshots
+Error: Device 'file' is writable but does not support snapshots
 (qemu) quit
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
-- 
2.26.2




* [PATCH v2 2/6] block: push error reporting into bdrv_all_*_snapshot functions
From: Daniel P. Berrangé @ 2020-07-27 15:08 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Peter Krempa, Daniel P. Berrangé,
	Denis V. Lunev, qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, John Snow, Pavel Dovgalyuk,
	Paolo Bonzini

The bdrv_all_*_snapshot functions return a BlockDriverState pointer
for the invalid backend, which the callers then use to report an
error message. In some cases multiple callers are reporting the
same error message, but with slightly different text. In the future
there will be more error scenarios for some of these methods, which
will benefit from fine grained error message reporting. So it is
helpful to push error reporting down a level.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 block/monitor/block-hmp-cmds.c |  7 ++--
 block/snapshot.c               | 77 +++++++++++++++++-----------------
 include/block/snapshot.h       | 14 +++----
 migration/savevm.c             | 37 +++++-----------
 monitor/hmp-cmds.c             |  7 +---
 tests/qemu-iotests/267.out     | 10 ++---
 6 files changed, 65 insertions(+), 87 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 4c8c375172..9df11494d6 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -898,10 +898,11 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
 
     ImageEntry *image_entry, *next_ie;
     SnapshotEntry *snapshot_entry;
+    Error *err = NULL;
 
-    bs = bdrv_all_find_vmstate_bs();
+    bs = bdrv_all_find_vmstate_bs(&err);
     if (!bs) {
-        monitor_printf(mon, "No available block device supports snapshots\n");
+        error_report_err(err);
         return;
     }
     aio_context = bdrv_get_aio_context(bs);
@@ -951,7 +952,7 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
     total = 0;
     for (i = 0; i < nb_sns; i++) {
         SnapshotEntry *next_sn;
-        if (bdrv_all_find_snapshot(sn_tab[i].name, &bs1) == 0) {
+        if (bdrv_all_find_snapshot(sn_tab[i].name, NULL) == 0) {
             global_snapshots[total] = i;
             total++;
             QTAILQ_FOREACH(image_entry, &image_list, next) {
diff --git a/block/snapshot.c b/block/snapshot.c
index bd9fb01817..6839060622 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -400,14 +400,14 @@ static bool bdrv_all_snapshots_includes_bs(BlockDriverState *bs)
  * These functions will properly handle dataplane (take aio_context_acquire
  * when appropriate for appropriate block drivers) */
 
-bool bdrv_all_can_snapshot(BlockDriverState **first_bad_bs)
+bool bdrv_all_can_snapshot(Error **errp)
 {
-    bool ok = true;
     BlockDriverState *bs;
     BdrvNextIterator it;
 
     for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
         AioContext *ctx = bdrv_get_aio_context(bs);
+        bool ok;
 
         aio_context_acquire(ctx);
         if (bdrv_all_snapshots_includes_bs(bs)) {
@@ -415,26 +415,25 @@ bool bdrv_all_can_snapshot(BlockDriverState **first_bad_bs)
         }
         aio_context_release(ctx);
         if (!ok) {
+            error_setg(errp, "Device '%s' is writable but does not support "
+                       "snapshots", bdrv_get_device_or_node_name(bs));
             bdrv_next_cleanup(&it);
-            goto fail;
+            return false;
         }
     }
 
-fail:
-    *first_bad_bs = bs;
-    return ok;
+    return true;
 }
 
-int bdrv_all_delete_snapshot(const char *name, BlockDriverState **first_bad_bs,
-                             Error **errp)
+int bdrv_all_delete_snapshot(const char *name, Error **errp)
 {
-    int ret = 0;
     BlockDriverState *bs;
     BdrvNextIterator it;
     QEMUSnapshotInfo sn1, *snapshot = &sn1;
 
     for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
         AioContext *ctx = bdrv_get_aio_context(bs);
+        int ret;
 
         aio_context_acquire(ctx);
         if (bdrv_all_snapshots_includes_bs(bs) &&
@@ -445,26 +444,25 @@ int bdrv_all_delete_snapshot(const char *name, BlockDriverState **first_bad_bs,
         }
         aio_context_release(ctx);
         if (ret < 0) {
+            error_prepend(errp, "Could not delete snapshot '%s' on '%s': ",
+                          name, bdrv_get_device_or_node_name(bs));
             bdrv_next_cleanup(&it);
-            goto fail;
+            return -1;
         }
     }
 
-fail:
-    *first_bad_bs = bs;
-    return ret;
+    return 0;
 }
 
 
-int bdrv_all_goto_snapshot(const char *name, BlockDriverState **first_bad_bs,
-                           Error **errp)
+int bdrv_all_goto_snapshot(const char *name, Error **errp)
 {
-    int ret = 0;
     BlockDriverState *bs;
     BdrvNextIterator it;
 
     for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
         AioContext *ctx = bdrv_get_aio_context(bs);
+        int ret;
 
         aio_context_acquire(ctx);
         if (bdrv_all_snapshots_includes_bs(bs)) {
@@ -472,75 +470,75 @@ int bdrv_all_goto_snapshot(const char *name, BlockDriverState **first_bad_bs,
         }
         aio_context_release(ctx);
         if (ret < 0) {
+            error_prepend(errp, "Could not load snapshot '%s' on '%s': ",
+                          name, bdrv_get_device_or_node_name(bs));
             bdrv_next_cleanup(&it);
-            goto fail;
+            return -1;
         }
     }
 
-fail:
-    *first_bad_bs = bs;
-    return ret;
+    return 0;
 }
 
-int bdrv_all_find_snapshot(const char *name, BlockDriverState **first_bad_bs)
+int bdrv_all_find_snapshot(const char *name, Error **errp)
 {
     QEMUSnapshotInfo sn;
-    int err = 0;
     BlockDriverState *bs;
     BdrvNextIterator it;
 
     for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
         AioContext *ctx = bdrv_get_aio_context(bs);
+        int ret;
 
         aio_context_acquire(ctx);
         if (bdrv_all_snapshots_includes_bs(bs)) {
-            err = bdrv_snapshot_find(bs, &sn, name);
+            ret = bdrv_snapshot_find(bs, &sn, name);
         }
         aio_context_release(ctx);
-        if (err < 0) {
+        if (ret < 0) {
+            error_setg(errp, "Could not find snapshot '%s' on '%s'",
+                       name, bdrv_get_device_or_node_name(bs));
             bdrv_next_cleanup(&it);
-            goto fail;
+            return -1;
         }
     }
 
-fail:
-    *first_bad_bs = bs;
-    return err;
+    return 0;
 }
 
 int bdrv_all_create_snapshot(QEMUSnapshotInfo *sn,
                              BlockDriverState *vm_state_bs,
                              uint64_t vm_state_size,
-                             BlockDriverState **first_bad_bs)
+                             Error **errp)
 {
-    int err = 0;
     BlockDriverState *bs;
     BdrvNextIterator it;
 
     for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
         AioContext *ctx = bdrv_get_aio_context(bs);
+        int ret;
 
         aio_context_acquire(ctx);
         if (bs == vm_state_bs) {
             sn->vm_state_size = vm_state_size;
-            err = bdrv_snapshot_create(bs, sn);
+            ret = bdrv_snapshot_create(bs, sn);
         } else if (bdrv_all_snapshots_includes_bs(bs)) {
             sn->vm_state_size = 0;
-            err = bdrv_snapshot_create(bs, sn);
+            ret = bdrv_snapshot_create(bs, sn);
         }
         aio_context_release(ctx);
-        if (err < 0) {
+        if (ret < 0) {
+            error_setg(errp, "Could not create snapshot '%s' on '%s'",
+                       sn->name, bdrv_get_device_or_node_name(bs));
             bdrv_next_cleanup(&it);
-            goto fail;
+            return -1;
         }
     }
 
-fail:
-    *first_bad_bs = bs;
-    return err;
+    return 0;
 }
 
-BlockDriverState *bdrv_all_find_vmstate_bs(void)
+BlockDriverState *bdrv_all_find_vmstate_bs(Error **errp)
 {
     BlockDriverState *bs;
     BdrvNextIterator it;
@@ -558,5 +556,8 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void)
             break;
         }
     }
+    if (!bs) {
+        error_setg(errp, "No block device supports snapshots");
+    }
     return bs;
 }
diff --git a/include/block/snapshot.h b/include/block/snapshot.h
index 2bfcd57578..ba1528eee0 100644
--- a/include/block/snapshot.h
+++ b/include/block/snapshot.h
@@ -76,17 +76,15 @@ int bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState *bs,
  * These functions will properly handle dataplane (take aio_context_acquire
  * when appropriate for appropriate block drivers */
 
-bool bdrv_all_can_snapshot(BlockDriverState **first_bad_bs);
-int bdrv_all_delete_snapshot(const char *name, BlockDriverState **first_bsd_bs,
-                             Error **errp);
-int bdrv_all_goto_snapshot(const char *name, BlockDriverState **first_bad_bs,
-                           Error **errp);
-int bdrv_all_find_snapshot(const char *name, BlockDriverState **first_bad_bs);
+bool bdrv_all_can_snapshot(Error **errp);
+int bdrv_all_delete_snapshot(const char *name, Error **errp);
+int bdrv_all_goto_snapshot(const char *name, Error **errp);
+int bdrv_all_find_snapshot(const char *name, Error **errp);
 int bdrv_all_create_snapshot(QEMUSnapshotInfo *sn,
                              BlockDriverState *vm_state_bs,
                              uint64_t vm_state_size,
-                             BlockDriverState **first_bad_bs);
+                             Error **errp);
 
-BlockDriverState *bdrv_all_find_vmstate_bs(void);
+BlockDriverState *bdrv_all_find_vmstate_bs(Error **errp);
 
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index cffee6cab7..19259ef7c0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2633,7 +2633,7 @@ int qemu_load_device_state(QEMUFile *f)
 
 int save_snapshot(const char *name, Error **errp)
 {
-    BlockDriverState *bs, *bs1;
+    BlockDriverState *bs;
     QEMUSnapshotInfo sn1, *sn = &sn1, old_sn1, *old_sn = &old_sn1;
     int ret = -1, ret2;
     QEMUFile *f;
@@ -2653,25 +2653,19 @@ int save_snapshot(const char *name, Error **errp)
         return ret;
     }
 
-    if (!bdrv_all_can_snapshot(&bs)) {
-        error_setg(errp, "Device '%s' is writable but does not support "
-                   "snapshots", bdrv_get_device_or_node_name(bs));
+    if (!bdrv_all_can_snapshot(errp)) {
         return ret;
     }
 
     /* Delete old snapshots of the same name */
     if (name) {
-        ret = bdrv_all_delete_snapshot(name, &bs1, errp);
-        if (ret < 0) {
-            error_prepend(errp, "Error while deleting snapshot on device "
-                          "'%s': ", bdrv_get_device_or_node_name(bs1));
+        if (bdrv_all_delete_snapshot(name, errp) < 0) {
             return ret;
         }
     }
 
-    bs = bdrv_all_find_vmstate_bs();
+    bs = bdrv_all_find_vmstate_bs(errp);
     if (bs == NULL) {
-        error_setg(errp, "No block device can accept snapshots");
         return ret;
     }
     aio_context = bdrv_get_aio_context(bs);
@@ -2736,10 +2730,8 @@ int save_snapshot(const char *name, Error **errp)
     aio_context_release(aio_context);
     aio_context = NULL;
 
-    ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, &bs);
+    ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, errp);
     if (ret < 0) {
-        error_setg(errp, "Error while creating snapshot on '%s'",
-                   bdrv_get_device_or_node_name(bs));
         goto the_end;
     }
 
@@ -2841,7 +2833,7 @@ void qmp_xen_load_devices_state(const char *filename, Error **errp)
 
 int load_snapshot(const char *name, Error **errp)
 {
-    BlockDriverState *bs, *bs_vm_state;
+    BlockDriverState *bs_vm_state;
     QEMUSnapshotInfo sn;
     QEMUFile *f;
     int ret;
@@ -2854,23 +2846,16 @@ int load_snapshot(const char *name, Error **errp)
         return -EINVAL;
     }
 
-    if (!bdrv_all_can_snapshot(&bs)) {
-        error_setg(errp,
-                   "Device '%s' is writable but does not support snapshots",
-                   bdrv_get_device_or_node_name(bs));
+    if (!bdrv_all_can_snapshot(errp)) {
         return -ENOTSUP;
     }
-    ret = bdrv_all_find_snapshot(name, &bs);
+    ret = bdrv_all_find_snapshot(name, errp);
     if (ret < 0) {
-        error_setg(errp,
-                   "Device '%s' does not have the requested snapshot '%s'",
-                   bdrv_get_device_or_node_name(bs), name);
         return ret;
     }
 
-    bs_vm_state = bdrv_all_find_vmstate_bs();
+    bs_vm_state = bdrv_all_find_vmstate_bs(errp);
     if (!bs_vm_state) {
-        error_setg(errp, "No block device supports snapshots");
         return -ENOTSUP;
     }
     aio_context = bdrv_get_aio_context(bs_vm_state);
@@ -2890,10 +2875,8 @@ int load_snapshot(const char *name, Error **errp)
     /* Flush all IO requests so they don't interfere with the new state.  */
     bdrv_drain_all_begin();
 
-    ret = bdrv_all_goto_snapshot(name, &bs, errp);
+    ret = bdrv_all_goto_snapshot(name, errp);
     if (ret < 0) {
-        error_prepend(errp, "Could not load snapshot '%s' on '%s': ",
-                      name, bdrv_get_device_or_node_name(bs));
         goto err_drain;
     }
 
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index ae4b6a4246..52f7d322e1 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1111,15 +1111,10 @@ void hmp_savevm(Monitor *mon, const QDict *qdict)
 
 void hmp_delvm(Monitor *mon, const QDict *qdict)
 {
-    BlockDriverState *bs;
     Error *err = NULL;
     const char *name = qdict_get_str(qdict, "name");
 
-    if (bdrv_all_delete_snapshot(name, &bs, &err) < 0) {
-        error_prepend(&err,
-                      "deleting snapshot on device '%s': ",
-                      bdrv_get_device_name(bs));
-    }
+    bdrv_all_delete_snapshot(name, &err);
     hmp_handle_error(mon, err);
 }
 
diff --git a/tests/qemu-iotests/267.out b/tests/qemu-iotests/267.out
index 215902b3ad..c65cce893a 100644
--- a/tests/qemu-iotests/267.out
+++ b/tests/qemu-iotests/267.out
@@ -6,9 +6,9 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
 Testing:
 QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) savevm snap0
-Error: No block device can accept snapshots
+Error: No block device supports snapshots
 (qemu) info snapshots
-No available block device supports snapshots
+No block device supports snapshots
 (qemu) loadvm snap0
 Error: No block device supports snapshots
 (qemu) quit
@@ -22,7 +22,7 @@ QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) savevm snap0
 Error: Device 'none0' is writable but does not support snapshots
 (qemu) info snapshots
-No available block device supports snapshots
+No block device supports snapshots
 (qemu) loadvm snap0
 Error: Device 'none0' is writable but does not support snapshots
 (qemu) quit
@@ -58,7 +58,7 @@ QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) savevm snap0
 Error: Device 'virtio0' is writable but does not support snapshots
 (qemu) info snapshots
-No available block device supports snapshots
+No block device supports snapshots
 (qemu) loadvm snap0
 Error: Device 'virtio0' is writable but does not support snapshots
 (qemu) quit
@@ -83,7 +83,7 @@ QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) savevm snap0
 Error: Device 'file' is writable but does not support snapshots
 (qemu) info snapshots
-No available block device supports snapshots
+No block device supports snapshots
 (qemu) loadvm snap0
 Error: Device 'file' is writable but does not support snapshots
 (qemu) quit
-- 
2.26.2




* [PATCH v2 3/6] migration: stop returning errno from load_snapshot()
From: Daniel P. Berrangé @ 2020-07-27 15:08 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Peter Krempa, Daniel P. Berrangé,
	Denis V. Lunev, qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, John Snow, Pavel Dovgalyuk,
	Paolo Bonzini

None of the callers care about the errno value since there is a full
Error object populated. This gives consistency with save_snapshot()
which already just returns -1.
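
The calling convention this converges on is the usual errp one, e.g.
(a sketch of a caller, not a hunk from this patch):

  Error *err = NULL;

  if (load_snapshot(name, &err) < 0) {
      /* The errno value carried no extra information; the Error
       * object already holds the full human readable message. */
      error_report_err(err);
  }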

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 migration/savevm.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 19259ef7c0..6c4d80fc5a 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2843,20 +2843,20 @@ int load_snapshot(const char *name, Error **errp)
     if (!replay_can_snapshot()) {
         error_setg(errp, "Record/replay does not allow loading snapshot "
                    "right now. Try once more later.");
-        return -EINVAL;
+        return -1;
     }
 
     if (!bdrv_all_can_snapshot(errp)) {
-        return -ENOTSUP;
+        return -1;
     }
     ret = bdrv_all_find_snapshot(name, errp);
     if (ret < 0) {
-        return ret;
+        return -1;
     }
 
     bs_vm_state = bdrv_all_find_vmstate_bs(errp);
     if (!bs_vm_state) {
-        return -ENOTSUP;
+        return -1;
     }
     aio_context = bdrv_get_aio_context(bs_vm_state);
 
@@ -2865,11 +2865,11 @@ int load_snapshot(const char *name, Error **errp)
     ret = bdrv_snapshot_find(bs_vm_state, &sn, name);
     aio_context_release(aio_context);
     if (ret < 0) {
-        return ret;
+        return -1;
     } else if (sn.vm_state_size == 0) {
         error_setg(errp, "This is a disk-only snapshot. Revert to it "
                    " offline using qemu-img");
-        return -EINVAL;
+        return -1;
     }
 
     /* Flush all IO requests so they don't interfere with the new state.  */
@@ -2884,7 +2884,6 @@ int load_snapshot(const char *name, Error **errp)
     f = qemu_fopen_bdrv(bs_vm_state, 0);
     if (!f) {
         error_setg(errp, "Could not open VM state file");
-        ret = -EINVAL;
         goto err_drain;
     }
 
@@ -2900,14 +2899,14 @@ int load_snapshot(const char *name, Error **errp)
 
     if (ret < 0) {
         error_setg(errp, "Error %d while loading VM state", ret);
-        return ret;
+        return -1;
     }
 
     return 0;
 
 err_drain:
     bdrv_drain_all_end();
-    return ret;
+    return -1;
 }
 
 void vmstate_register_ram(MemoryRegion *mr, DeviceState *dev)
-- 
2.26.2




* [PATCH v2 4/6] block: add ability to specify list of blockdevs during snapshot
From: Daniel P. Berrangé @ 2020-07-27 15:08 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Peter Krempa, Daniel P. Berrangé,
	Denis V. Lunev, qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, John Snow, Pavel Dovgalyuk,
	Paolo Bonzini

When running snapshot operations, there are various rules for which
blockdevs are included/excluded. While this provides reasonable default
behaviour, there are scenarios that are not well handled by the default
logic. Some of the conditions do not have a single correct answer.

Thus there needs to be a way for the mgmt app to provide an explicit
list of blockdevs to perform snapshots across. This can be achieved
by passing a list of node names that should be used.
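
For illustration, a caller holding a set of node names (the names
below are made up) could build the QAPI strList and pass it down
roughly like this (a sketch, not part of this patch):

  strList *devices = NULL;
  strList **tail = &devices;
  const char *nodes[] = { "root-disk", "data-disk" };
  Error *err = NULL;

  for (size_t i = 0; i < G_N_ELEMENTS(nodes); i++) {
      strList *entry = g_new0(strList, 1);
      entry->value = g_strdup(nodes[i]);
      *tail = entry;
      tail = &entry->next;
  }

  if (!bdrv_all_can_snapshot(devices, &err)) {
      error_report_err(err);
  }
  qapi_free_strList(devices);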

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 block/monitor/block-hmp-cmds.c |  4 +--
 block/snapshot.c               | 48 ++++++++++++++++++++++------------
 include/block/snapshot.h       | 13 ++++-----
 migration/savevm.c             | 16 ++++++------
 monitor/hmp-cmds.c             |  2 +-
 5 files changed, 49 insertions(+), 34 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 9df11494d6..db76c43cc2 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -900,7 +900,7 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
     SnapshotEntry *snapshot_entry;
     Error *err = NULL;
 
-    bs = bdrv_all_find_vmstate_bs(&err);
+    bs = bdrv_all_find_vmstate_bs(NULL, &err);
     if (!bs) {
         error_report_err(err);
         return;
@@ -952,7 +952,7 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
     total = 0;
     for (i = 0; i < nb_sns; i++) {
         SnapshotEntry *next_sn;
-        if (bdrv_all_find_snapshot(sn_tab[i].name, NULL) == 0) {
+        if (bdrv_all_find_snapshot(sn_tab[i].name, NULL, NULL) == 0) {
             global_snapshots[total] = i;
             total++;
             QTAILQ_FOREACH(image_entry, &image_list, next) {
diff --git a/block/snapshot.c b/block/snapshot.c
index 6839060622..f2600a8c7f 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -385,22 +385,34 @@ int bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState *bs,
     return ret;
 }
 
-static bool bdrv_all_snapshots_includes_bs(BlockDriverState *bs)
+static bool bdrv_all_snapshots_includes_bs(BlockDriverState *bs,
+                                           strList *devices)
 {
-    if (!bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) {
+    if (devices) {
+        const char *node_name = bdrv_get_node_name(bs);
+        while (devices) {
+            if (g_str_equal(node_name, devices->value)) {
+                return true;
+            }
+            devices = devices->next;
+        }
         return false;
-    }
+    } else {
+        if (!bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) {
+            return false;
+        }
 
-    /* Include all nodes that are either in use by a BlockBackend, or that
-     * aren't attached to any node, but owned by the monitor. */
-    return bdrv_has_blk(bs) || QLIST_EMPTY(&bs->parents);
+        /* Include all nodes that are either in use by a BlockBackend, or that
+         * aren't attached to any node, but owned by the monitor. */
+        return bdrv_has_blk(bs) || QLIST_EMPTY(&bs->parents);
+    }
 }
 
 /* Group operations. All block drivers are involved.
  * These functions will properly handle dataplane (take aio_context_acquire
  * when appropriate for appropriate block drivers) */
 
-bool bdrv_all_can_snapshot(Error **errp)
+bool bdrv_all_can_snapshot(strList *devices, Error **errp)
 {
     BlockDriverState *bs;
     BdrvNextIterator it;
@@ -410,7 +422,7 @@ bool bdrv_all_can_snapshot(Error **errp)
         bool ok;
 
         aio_context_acquire(ctx);
-        if (bdrv_all_snapshots_includes_bs(bs)) {
+        if (bdrv_all_snapshots_includes_bs(bs, devices)) {
             ok = bdrv_can_snapshot(bs);
         }
         aio_context_release(ctx);
@@ -425,7 +437,7 @@ bool bdrv_all_can_snapshot(Error **errp)
     return true;
 }
 
-int bdrv_all_delete_snapshot(const char *name, Error **errp)
+int bdrv_all_delete_snapshot(const char *name, strList *devices, Error **errp)
 {
     BlockDriverState *bs;
     BdrvNextIterator it;
@@ -436,7 +448,7 @@ int bdrv_all_delete_snapshot(const char *name, Error **errp)
         int ret;
 
         aio_context_acquire(ctx);
-        if (bdrv_all_snapshots_includes_bs(bs) &&
+        if (bdrv_all_snapshots_includes_bs(bs, devices) &&
             bdrv_snapshot_find(bs, snapshot, name) >= 0)
         {
             ret = bdrv_snapshot_delete(bs, snapshot->id_str,
@@ -455,7 +467,7 @@ int bdrv_all_delete_snapshot(const char *name, Error **errp)
 }
 
 
-int bdrv_all_goto_snapshot(const char *name, Error **errp)
+int bdrv_all_goto_snapshot(const char *name, strList *devices, Error **errp)
 {
     BlockDriverState *bs;
     BdrvNextIterator it;
@@ -465,7 +477,7 @@ int bdrv_all_goto_snapshot(const char *name, Error **errp)
         int ret;
 
         aio_context_acquire(ctx);
-        if (bdrv_all_snapshots_includes_bs(bs)) {
+        if (bdrv_all_snapshots_includes_bs(bs, devices)) {
             ret = bdrv_snapshot_goto(bs, name, errp);
         }
         aio_context_release(ctx);
@@ -480,7 +492,7 @@ int bdrv_all_goto_snapshot(const char *name, Error **errp)
     return 0;
 }
 
-int bdrv_all_find_snapshot(const char *name, Error **errp)
+int bdrv_all_find_snapshot(const char *name, strList *devices, Error **errp)
 {
     QEMUSnapshotInfo sn;
     BlockDriverState *bs;
@@ -491,7 +503,7 @@ int bdrv_all_find_snapshot(const char *name, Error **errp)
         int ret;
 
         aio_context_acquire(ctx);
-        if (bdrv_all_snapshots_includes_bs(bs)) {
+        if (bdrv_all_snapshots_includes_bs(bs, devices)) {
             ret = bdrv_snapshot_find(bs, &sn, name);
         }
         aio_context_release(ctx);
@@ -509,6 +521,7 @@ int bdrv_all_find_snapshot(const char *name, Error **errp)
 int bdrv_all_create_snapshot(QEMUSnapshotInfo *sn,
                              BlockDriverState *vm_state_bs,
                              uint64_t vm_state_size,
+                             strList *devices,
                              Error **errp)
 {
     BlockDriverState *bs;
@@ -522,7 +535,7 @@ int bdrv_all_create_snapshot(QEMUSnapshotInfo *sn,
         if (bs == vm_state_bs) {
             sn->vm_state_size = vm_state_size;
             ret = bdrv_snapshot_create(bs, sn);
-        } else if (bdrv_all_snapshots_includes_bs(bs)) {
+        } else if (bdrv_all_snapshots_includes_bs(bs, devices)) {
             sn->vm_state_size = 0;
             ret = bdrv_snapshot_create(bs, sn);
         }
@@ -538,7 +551,7 @@ int bdrv_all_create_snapshot(QEMUSnapshotInfo *sn,
     return 0;
 }
 
-BlockDriverState *bdrv_all_find_vmstate_bs(Error **errp)
+BlockDriverState *bdrv_all_find_vmstate_bs(strList *devices, Error **errp)
 {
     BlockDriverState *bs;
     BdrvNextIterator it;
@@ -548,7 +561,8 @@ BlockDriverState *bdrv_all_find_vmstate_bs(Error **errp)
         bool found;
 
         aio_context_acquire(ctx);
-        found = bdrv_all_snapshots_includes_bs(bs) && bdrv_can_snapshot(bs);
+        found = bdrv_all_snapshots_includes_bs(bs, devices) &&
+            bdrv_can_snapshot(bs);
         aio_context_release(ctx);
 
         if (found) {
diff --git a/include/block/snapshot.h b/include/block/snapshot.h
index ba1528eee0..1c5b0705a9 100644
--- a/include/block/snapshot.h
+++ b/include/block/snapshot.h
@@ -25,7 +25,7 @@
 #ifndef SNAPSHOT_H
 #define SNAPSHOT_H
 
-
+#include "qapi/qapi-builtin-types.h"
 
 #define SNAPSHOT_OPT_BASE       "snapshot."
 #define SNAPSHOT_OPT_ID         "snapshot.id"
@@ -76,15 +76,16 @@ int bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState *bs,
  * These functions will properly handle dataplane (take aio_context_acquire
  * when appropriate for appropriate block drivers */
 
-bool bdrv_all_can_snapshot(Error **errp);
-int bdrv_all_delete_snapshot(const char *name, Error **errp);
-int bdrv_all_goto_snapshot(const char *name, Error **errp);
-int bdrv_all_find_snapshot(const char *name, Error **errp);
+bool bdrv_all_can_snapshot(strList *devices, Error **errp);
+int bdrv_all_delete_snapshot(const char *name, strList *devices, Error **errp);
+int bdrv_all_goto_snapshot(const char *name, strList *devices, Error **errp);
+int bdrv_all_find_snapshot(const char *name, strList *devices, Error **errp);
 int bdrv_all_create_snapshot(QEMUSnapshotInfo *sn,
                              BlockDriverState *vm_state_bs,
                              uint64_t vm_state_size,
+                             strList *devices,
                              Error **errp);
 
-BlockDriverState *bdrv_all_find_vmstate_bs(Error **errp);
+BlockDriverState *bdrv_all_find_vmstate_bs(strList *devices, Error **errp);
 
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index 6c4d80fc5a..cdc1f2f2d8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2653,18 +2653,18 @@ int save_snapshot(const char *name, Error **errp)
         return ret;
     }
 
-    if (!bdrv_all_can_snapshot(errp)) {
+    if (!bdrv_all_can_snapshot(NULL, errp)) {
         return ret;
     }
 
     /* Delete old snapshots of the same name */
     if (name) {
-        if (bdrv_all_delete_snapshot(name, errp) < 0) {
+        if (bdrv_all_delete_snapshot(name, NULL, errp) < 0) {
             return ret;
         }
     }
 
-    bs = bdrv_all_find_vmstate_bs(errp);
+    bs = bdrv_all_find_vmstate_bs(NULL, errp);
     if (bs == NULL) {
         return ret;
     }
@@ -2730,7 +2730,7 @@ int save_snapshot(const char *name, Error **errp)
     aio_context_release(aio_context);
     aio_context = NULL;
 
-    ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, errp);
+    ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, NULL, errp);
     if (ret < 0) {
         goto the_end;
     }
@@ -2846,15 +2846,15 @@ int load_snapshot(const char *name, Error **errp)
         return -1;
     }
 
-    if (!bdrv_all_can_snapshot(errp)) {
+    if (!bdrv_all_can_snapshot(NULL, errp)) {
         return -1;
     }
-    ret = bdrv_all_find_snapshot(name, errp);
+    ret = bdrv_all_find_snapshot(name, NULL, errp);
     if (ret < 0) {
         return -1;
     }
 
-    bs_vm_state = bdrv_all_find_vmstate_bs(errp);
+    bs_vm_state = bdrv_all_find_vmstate_bs(NULL, errp);
     if (!bs_vm_state) {
         return -1;
     }
@@ -2875,7 +2875,7 @@ int load_snapshot(const char *name, Error **errp)
     /* Flush all IO requests so they don't interfere with the new state.  */
     bdrv_drain_all_begin();
 
-    ret = bdrv_all_goto_snapshot(name, errp);
+    ret = bdrv_all_goto_snapshot(name, NULL, errp);
     if (ret < 0) {
         goto err_drain;
     }
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 52f7d322e1..90e717f0c4 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1114,7 +1114,7 @@ void hmp_delvm(Monitor *mon, const QDict *qdict)
     Error *err = NULL;
     const char *name = qdict_get_str(qdict, "name");
 
-    bdrv_all_delete_snapshot(name, &err);
+    bdrv_all_delete_snapshot(name, NULL, &err);
     hmp_handle_error(mon, err);
 }
 
-- 
2.26.2




* [PATCH v2 5/6] block: allow specifying name of block device for vmstate storage
From: Daniel P. Berrangé @ 2020-07-27 15:08 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Peter Krempa, Daniel P. Berrangé,
	Denis V. Lunev, qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, John Snow, Pavel Dovgalyuk,
	Paolo Bonzini

Currently the vmstate will be stored in the first block device that
supports snapshots. Historically this would have usually been the
root device, but with UEFI it might be the variable store. There
needs to be a way to override the choice of block device to store
the state in.
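
With the extra parameter a caller can pin the vmstate to an explicitly
named node, e.g. (the node name is purely illustrative):

  /* Passing NULL as the node name keeps the historical
   * "first snapshottable device" selection logic. */
  bs = bdrv_all_find_vmstate_bs("root-disk", NULL, errp);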

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 block/monitor/block-hmp-cmds.c |  2 +-
 block/snapshot.c               | 64 ++++++++++++++++++++++++++--------
 include/block/snapshot.h       |  4 ++-
 migration/savevm.c             |  4 +--
 4 files changed, 56 insertions(+), 18 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index db76c43cc2..81d1b52262 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -900,7 +900,7 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
     SnapshotEntry *snapshot_entry;
     Error *err = NULL;
 
-    bs = bdrv_all_find_vmstate_bs(NULL, &err);
+    bs = bdrv_all_find_vmstate_bs(NULL, NULL, &err);
     if (!bs) {
         error_report_err(err);
         return;
diff --git a/block/snapshot.c b/block/snapshot.c
index f2600a8c7f..b1ad70e278 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -551,27 +551,63 @@ int bdrv_all_create_snapshot(QEMUSnapshotInfo *sn,
     return 0;
 }
 
-BlockDriverState *bdrv_all_find_vmstate_bs(strList *devices, Error **errp)
+BlockDriverState *bdrv_all_find_vmstate_bs(const char *vmstate_bs,
+                                           strList *devices,
+                                           Error **errp)
 {
     BlockDriverState *bs;
     BdrvNextIterator it;
 
-    for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
-        AioContext *ctx = bdrv_get_aio_context(bs);
-        bool found;
+    if (vmstate_bs) {
+        bool usable = false;
+        for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
+            AioContext *ctx = bdrv_get_aio_context(bs);
+            bool match;
 
-        aio_context_acquire(ctx);
-        found = bdrv_all_snapshots_includes_bs(bs, devices) &&
-            bdrv_can_snapshot(bs);
-        aio_context_release(ctx);
+            aio_context_acquire(ctx);
+            if (g_str_equal(vmstate_bs, bdrv_get_node_name(bs))) {
+                match = true;
+                usable = bdrv_can_snapshot(bs);
+            }
+            aio_context_release(ctx);
+            if (match) {
+                bdrv_next_cleanup(&it);
+                break;
+            }
+        }
+        if (!bs) {
+            error_setg(errp,
+                       "block device '%s' does not exist",
+                       vmstate_bs);
+            return NULL;
+        }
 
-        if (found) {
-            bdrv_next_cleanup(&it);
-            break;
+        if (!usable) {
+            error_setg(errp,
+                       "block device '%s' does not support snapshots",
+                       vmstate_bs);
+            return NULL;
+        }
+    } else {
+        for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
+            AioContext *ctx = bdrv_get_aio_context(bs);
+            bool found;
+
+            aio_context_acquire(ctx);
+            found = bdrv_all_snapshots_includes_bs(bs, devices) &&
+                bdrv_can_snapshot(bs);
+            aio_context_release(ctx);
+
+            if (found) {
+                bdrv_next_cleanup(&it);
+                break;
+            }
+        }
+
+        if (!bs) {
+            error_setg(errp, "No block device supports snapshots");
+            return NULL;
         }
-    }
-    if (!bs) {
-        error_setg(errp, "No block device supports snapshots");
     }
     return bs;
 }
diff --git a/include/block/snapshot.h b/include/block/snapshot.h
index 1c5b0705a9..05550e5da1 100644
--- a/include/block/snapshot.h
+++ b/include/block/snapshot.h
@@ -86,6 +86,8 @@ int bdrv_all_create_snapshot(QEMUSnapshotInfo *sn,
                              strList *devices,
                              Error **errp);
 
-BlockDriverState *bdrv_all_find_vmstate_bs(strList *devices, Error **errp);
+BlockDriverState *bdrv_all_find_vmstate_bs(const char *vmstate_bs,
+                                           strList *devices,
+                                           Error **errp);
 
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index cdc1f2f2d8..1707fa30db 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2664,7 +2664,7 @@ int save_snapshot(const char *name, Error **errp)
         }
     }
 
-    bs = bdrv_all_find_vmstate_bs(NULL, errp);
+    bs = bdrv_all_find_vmstate_bs(NULL, NULL, errp);
     if (bs == NULL) {
         return ret;
     }
@@ -2854,7 +2854,7 @@ int load_snapshot(const char *name, Error **errp)
         return -1;
     }
 
-    bs_vm_state = bdrv_all_find_vmstate_bs(NULL, errp);
+    bs_vm_state = bdrv_all_find_vmstate_bs(NULL, NULL, errp);
     if (!bs_vm_state) {
         return -1;
     }
-- 
2.26.2




* [PATCH v2 6/6] migration: introduce snapshot-{save,load,delete} QMP commands
From: Daniel P. Berrangé @ 2020-07-27 15:08 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Peter Krempa, Daniel P. Berrangé,
	Denis V. Lunev, qemu-block, Juan Quintela, Markus Armbruster,
	Dr. David Alan Gilbert, Max Reitz, John Snow, Pavel Dovgalyuk,
	Paolo Bonzini

savevm, loadvm and delvm are some of the few HMP commands that have never
been converted to use QMP. The primary reason for this lack of conversion
is that they block execution of the thread for as long as they run.

Despite this downside, however, libvirt and applications using libvirt
have used these commands for as long as QMP has existed, via the
"human-monitor-command" passthrough command. IOW, while it is clearly
desirable to be able to fix the blocking problem, this is not an
immediate obstacle to real world usage.

Meanwhile there is a need for other features which involve adding new
parameters to the commands. This is possible with HMP passthrough, but
it provides no reliable way for apps to introspect features, so using
QAPI modelling is highly desirable.

This patch thus introduces new snapshot-{load,save,delete} commands to
QMP that are intended to replace the old HMP counterparts. The new
commands are given different names, because they will be using the new
QEMU job framework and thus will have diverging behaviour from the HMP
originals. It would thus be misleading to keep the same name.

While this design uses the generic job framework, the current impl is
still blocking. The intention is that the blocking problem will be
fixed later. Nonetheless applications using these new commands should
assume that they are asynchronous, and thus wait for the job status
change event to indicate completion.
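
A sketch of the expected interaction (argument and field names should
be checked against the QAPI schema added in this patch; the job id and
node names are made up, and intermediate job status transitions are
omitted):

  -> { "execute": "snapshot-save",
       "arguments": { "job-id": "snapsave0",
                      "tag": "snap0",
                      "vmstate": "root-disk",
                      "devices": [ "root-disk", "data-disk" ] } }
  <- { "return": {} }
  <- { "event": "JOB_STATUS_CHANGE",
       "data": { "id": "snapsave0", "status": "concluded" } }
  -> { "execute": "query-jobs" }
  <- { "return": [ { "id": "snapsave0", "type": "snapshot-save",
                     "status": "concluded" } ] }
  -> { "execute": "job-dismiss", "arguments": { "id": "snapsave0" } }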

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 include/migration/snapshot.h |  10 +-
 migration/savevm.c           | 172 +++++++++++++++++++++++++++++++++--
 monitor/hmp-cmds.c           |   4 +-
 qapi/job.json                |   9 +-
 qapi/migration.json          | 112 +++++++++++++++++++++++
 replay/replay-snapshot.c     |   4 +-
 softmmu/vl.c                 |   2 +-
 tests/qemu-iotests/310       | 125 +++++++++++++++++++++++++
 tests/qemu-iotests/310.out   |   0
 tests/qemu-iotests/group     |   1 +
 10 files changed, 421 insertions(+), 18 deletions(-)
 create mode 100755 tests/qemu-iotests/310
 create mode 100644 tests/qemu-iotests/310.out

diff --git a/include/migration/snapshot.h b/include/migration/snapshot.h
index c85b6ec75b..f2ed9d1e43 100644
--- a/include/migration/snapshot.h
+++ b/include/migration/snapshot.h
@@ -15,7 +15,13 @@
 #ifndef QEMU_MIGRATION_SNAPSHOT_H
 #define QEMU_MIGRATION_SNAPSHOT_H
 
-int save_snapshot(const char *name, Error **errp);
-int load_snapshot(const char *name, Error **errp);
+#include "qapi/qapi-builtin-types.h"
+
+int save_snapshot(const char *name,
+                  const char *vmstate, strList *devices,
+                  Error **errp);
+int load_snapshot(const char *name,
+                  const char *vmstate, strList *devices,
+                  Error **errp);
 
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index 1707fa30db..13c5a54aae 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -43,6 +43,8 @@
 #include "qapi/error.h"
 #include "qapi/qapi-commands-migration.h"
 #include "qapi/qapi-commands-misc.h"
+#include "qapi/clone-visitor.h"
+#include "qapi/qapi-builtin-visit.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/error-report.h"
 #include "sysemu/cpus.h"
@@ -2631,7 +2633,8 @@ int qemu_load_device_state(QEMUFile *f)
     return 0;
 }
 
-int save_snapshot(const char *name, Error **errp)
+int save_snapshot(const char *name, const char *vmstate,
+                  strList *devices, Error **errp)
 {
     BlockDriverState *bs;
     QEMUSnapshotInfo sn1, *sn = &sn1, old_sn1, *old_sn = &old_sn1;
@@ -2653,18 +2656,18 @@ int save_snapshot(const char *name, Error **errp)
         return ret;
     }
 
-    if (!bdrv_all_can_snapshot(NULL, errp)) {
+    if (!bdrv_all_can_snapshot(devices, errp)) {
         return ret;
     }
 
     /* Delete old snapshots of the same name */
     if (name) {
-        if (bdrv_all_delete_snapshot(name, NULL, errp) < 0) {
+        if (bdrv_all_delete_snapshot(name, devices, errp) < 0) {
             return ret;
         }
     }
 
-    bs = bdrv_all_find_vmstate_bs(NULL, NULL, errp);
+    bs = bdrv_all_find_vmstate_bs(vmstate, devices, errp);
     if (bs == NULL) {
         return ret;
     }
@@ -2730,7 +2733,7 @@ int save_snapshot(const char *name, Error **errp)
     aio_context_release(aio_context);
     aio_context = NULL;
 
-    ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, NULL, errp);
+    ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, devices, errp);
     if (ret < 0) {
         goto the_end;
     }
@@ -2831,7 +2834,8 @@ void qmp_xen_load_devices_state(const char *filename, Error **errp)
     migration_incoming_state_destroy();
 }
 
-int load_snapshot(const char *name, Error **errp)
+int load_snapshot(const char *name, const char *vmstate,
+                  strList *devices, Error **errp)
 {
     BlockDriverState *bs_vm_state;
     QEMUSnapshotInfo sn;
@@ -2846,15 +2850,15 @@ int load_snapshot(const char *name, Error **errp)
         return -1;
     }
 
-    if (!bdrv_all_can_snapshot(NULL, errp)) {
+    if (!bdrv_all_can_snapshot(devices, errp)) {
         return -1;
     }
-    ret = bdrv_all_find_snapshot(name, NULL, errp);
+    ret = bdrv_all_find_snapshot(name, devices, errp);
     if (ret < 0) {
         return -1;
     }
 
-    bs_vm_state = bdrv_all_find_vmstate_bs(NULL, NULL, errp);
+    bs_vm_state = bdrv_all_find_vmstate_bs(vmstate, devices, errp);
     if (!bs_vm_state) {
         return -1;
     }
@@ -2875,7 +2879,7 @@ int load_snapshot(const char *name, Error **errp)
     /* Flush all IO requests so they don't interfere with the new state.  */
     bdrv_drain_all_begin();
 
-    ret = bdrv_all_goto_snapshot(name, NULL, errp);
+    ret = bdrv_all_goto_snapshot(name, devices, errp);
     if (ret < 0) {
         goto err_drain;
     }
@@ -2936,3 +2940,151 @@ bool vmstate_check_only_migratable(const VMStateDescription *vmsd)
 
     return !(vmsd && vmsd->unmigratable);
 }
+
+typedef struct SnapshotJob {
+    Job common;
+    char *tag;
+    char *vmstate;
+    strList *devices;
+} SnapshotJob;
+
+static void qmp_snapshot_job_free(SnapshotJob *s)
+{
+    g_free(s->tag);
+    g_free(s->vmstate);
+    qapi_free_strList(s->devices);
+}
+
+static int coroutine_fn snapshot_load_job_run(Job *job, Error **errp)
+{
+    SnapshotJob *s = container_of(job, SnapshotJob, common);
+    int ret;
+    int saved_vm_running;
+
+    job_progress_set_remaining(&s->common, 1);
+
+    saved_vm_running = runstate_is_running();
+    vm_stop(RUN_STATE_RESTORE_VM);
+
+    ret = load_snapshot(s->tag, s->vmstate, s->devices, errp);
+    if (ret == 0 && saved_vm_running) {
+        vm_start();
+    }
+
+    job_progress_update(&s->common, 1);
+
+    qmp_snapshot_job_free(s);
+
+    return ret;
+}
+
+static int coroutine_fn snapshot_save_job_run(Job *job, Error **errp)
+{
+    SnapshotJob *s = container_of(job, SnapshotJob, common);
+    int ret;
+
+    job_progress_set_remaining(&s->common, 1);
+    ret = save_snapshot(s->tag, s->vmstate, s->devices, errp);
+    job_progress_update(&s->common, 1);
+
+    qmp_snapshot_job_free(s);
+
+    return ret;
+}
+
+static int coroutine_fn snapshot_delete_job_run(Job *job, Error **errp)
+{
+    SnapshotJob *s = container_of(job, SnapshotJob, common);
+    int ret;
+
+    job_progress_set_remaining(&s->common, 1);
+    ret = bdrv_all_delete_snapshot(s->tag, s->devices, errp);
+    job_progress_update(&s->common, 1);
+
+    qmp_snapshot_job_free(s);
+
+    return ret;
+}
+
+static const JobDriver snapshot_load_job_driver = {
+    .instance_size = sizeof(SnapshotJob),
+    .job_type      = JOB_TYPE_SNAPSHOT_LOAD,
+    .run           = snapshot_load_job_run,
+};
+
+static const JobDriver snapshot_save_job_driver = {
+    .instance_size = sizeof(SnapshotJob),
+    .job_type      = JOB_TYPE_SNAPSHOT_SAVE,
+    .run           = snapshot_save_job_run,
+};
+
+static const JobDriver snapshot_delete_job_driver = {
+    .instance_size = sizeof(SnapshotJob),
+    .job_type      = JOB_TYPE_SNAPSHOT_DELETE,
+    .run           = snapshot_delete_job_run,
+};
+
+
+void qmp_snapshot_save(const char *job_id,
+                       const char *tag,
+                       bool has_vmstate, const char *vmstate,
+                       bool has_devices, strList *devices,
+                       Error **errp)
+{
+    SnapshotJob *s;
+
+    s = job_create(job_id, &snapshot_save_job_driver, NULL,
+                   qemu_get_aio_context(), JOB_MANUAL_DISMISS,
+                   NULL, NULL, errp);
+    if (!s) {
+        return;
+    }
+
+    s->tag = g_strdup(tag);
+    s->vmstate = has_vmstate ? g_strdup(vmstate) : NULL;
+    s->devices = has_devices ? QAPI_CLONE(strList, devices) : NULL;
+
+    job_start(&s->common);
+}
+
+void qmp_snapshot_load(const char *job_id,
+                       const char *tag,
+                       bool has_vmstate, const char *vmstate,
+                       bool has_devices, strList *devices,
+                       Error **errp)
+{
+    SnapshotJob *s;
+
+    s = job_create(job_id, &snapshot_load_job_driver, NULL,
+                   qemu_get_aio_context(), JOB_MANUAL_DISMISS,
+                   NULL, NULL, errp);
+    if (!s) {
+        return;
+    }
+
+    s->tag = g_strdup(tag);
+    s->vmstate = has_vmstate ? g_strdup(vmstate) : NULL;
+    s->devices = has_devices ? QAPI_CLONE(strList, devices) : NULL;
+
+    job_start(&s->common);
+}
+
+void qmp_snapshot_delete(const char *job_id,
+                         const char *tag,
+                         bool has_devices, strList *devices,
+                         Error **errp)
+{
+    SnapshotJob *s;
+
+    s = job_create(job_id, &snapshot_delete_job_driver, NULL,
+                   qemu_get_aio_context(), JOB_MANUAL_DISMISS,
+                   NULL, NULL, errp);
+    if (!s) {
+        return;
+    }
+
+    s->tag = g_strdup(tag);
+    s->devices = has_devices ? QAPI_CLONE(strList, devices) : NULL;
+
+    job_start(&s->common);
+}
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 90e717f0c4..2b51273720 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1095,7 +1095,7 @@ void hmp_loadvm(Monitor *mon, const QDict *qdict)
 
     vm_stop(RUN_STATE_RESTORE_VM);
 
-    if (load_snapshot(name, &err) == 0 && saved_vm_running) {
+    if (load_snapshot(name, NULL, NULL, &err) == 0 && saved_vm_running) {
         vm_start();
     }
     hmp_handle_error(mon, err);
@@ -1105,7 +1105,7 @@ void hmp_savevm(Monitor *mon, const QDict *qdict)
 {
     Error *err = NULL;
 
-    save_snapshot(qdict_get_try_str(qdict, "name"), &err);
+    save_snapshot(qdict_get_try_str(qdict, "name"), NULL, NULL, &err);
     hmp_handle_error(mon, err);
 }
 
diff --git a/qapi/job.json b/qapi/job.json
index c48a0c3e34..6aa03a7aff 100644
--- a/qapi/job.json
+++ b/qapi/job.json
@@ -21,10 +21,17 @@
 #
 # @amend: image options amend job type, see "x-blockdev-amend" (since 5.1)
 #
+# @snapshot-load: snapshot load job type, see "loadvm" (since 5.2)
+#
+# @snapshot-save: snapshot save job type, see "savevm" (since 5.2)
+#
+# @snapshot-delete: snapshot delete job type, see "delvm" (since 5.2)
+#
 # Since: 1.7
 ##
 { 'enum': 'JobType',
-  'data': ['commit', 'stream', 'mirror', 'backup', 'create', 'amend'] }
+  'data': ['commit', 'stream', 'mirror', 'backup', 'create', 'amend',
+           'snapshot-load', 'snapshot-save', 'snapshot-delete'] }
 
 ##
 # @JobStatus:
diff --git a/qapi/migration.json b/qapi/migration.json
index d5000558c6..0ca556c158 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1621,3 +1621,115 @@
 ##
 { 'event': 'UNPLUG_PRIMARY',
   'data': { 'device-id': 'str' } }
+
+##
+# @snapshot-save:
+#
+# Save a VM snapshot
+#
+# @job-id: identifier for the newly created job
+# @tag: name of the snapshot to create. If it already
+# exists, it will be replaced.
+# @devices: list of block device node names to save a snapshot to
+# @vmstate: block device node name to save vmstate to
+#
+# Applications should not assume that the snapshot save is complete
+# when this command returns. Completion is indicated by the job
+# status. Clients can wait for the JOB_STATUS_CHANGE event.
+#
+# Note that the VM CPUs will be paused during the time it takes to
+# save the snapshot.
+#
+# Returns: nothing
+#
+# Example:
+#
+# -> { "execute": "snapshot-save",
+#      "data": {
+#         "job-id": "snapsave0",
+#         "tag": "my-snap",
+#         "vmstate": "disk0",
+#         "devices": ["disk0", "disk1"]
+#      }
+#    }
+# <- { "return": { } }
+#
+# Since: 5.2
+##
+{ 'command': 'snapshot-save',
+  'data': { 'job-id': 'str',
+            'tag': 'str',
+            '*vmstate': 'str',
+            '*devices': ['str'] } }
+
+##
+# @snapshot-load:
+#
+# Load a VM snapshot
+#
+# @job-id: identifier for the newly created job
+# @tag: name of the snapshot to load.
+# @devices: list of block device node names to load a snapshot from
+# @vmstate: block device node name to load vmstate from
+#
+# Applications should not assume that the snapshot load is complete
+# when this command returns. Completion is indicated by the job
+# status. Clients can wait for the JOB_STATUS_CHANGE event.
+#
+# Returns: nothing
+#
+# Example:
+#
+# -> { "execute": "snapshot-load",
+#      "data": {
+#         "job-id": "snapload0",
+#         "tag": "my-snap",
+#         "vmstate": "disk0",
+#         "devices": ["disk0", "disk1"]
+#      }
+#    }
+# <- { "return": { } }
+#
+# Since: 5.2
+##
+{ 'command': 'snapshot-load',
+  'data': { 'job-id': 'str',
+            'tag': 'str',
+            '*vmstate': 'str',
+            '*devices': ['str'] } }
+
+##
+# @snapshot-delete:
+#
+# Delete a VM snapshot
+#
+# @job-id: identifier for the newly created job
+# @tag: name of the snapshot to delete.
+# @devices: list of block device node names to delete a snapshot from
+#
+# Applications should not assume that the snapshot delete is complete
+# when this command returns. Completion is indicated by the job
+# status. Clients can wait for the JOB_STATUS_CHANGE event.
+#
+# Note that the VM CPUs will be paused during the time it takes to
+# delete the snapshot.
+#
+# Returns: nothing
+#
+# Example:
+#
+# -> { "execute": "snapshot-delete",
+#      "data": {
+#         "job-id": "snapdelete0",
+#         "tag": "my-snap",
+#         "devices": ["disk0", "disk1"]
+#      }
+#    }
+# <- { "return": { } }
+#
+# Since: 5.2
+##
+{ 'command': 'snapshot-delete',
+  'data': { 'job-id': 'str',
+            'tag': 'str',
+            '*devices': ['str'] } }
diff --git a/replay/replay-snapshot.c b/replay/replay-snapshot.c
index e26fa4c892..f0f45a4f24 100644
--- a/replay/replay-snapshot.c
+++ b/replay/replay-snapshot.c
@@ -77,13 +77,13 @@ void replay_vmstate_init(void)
 
     if (replay_snapshot) {
         if (replay_mode == REPLAY_MODE_RECORD) {
-            if (save_snapshot(replay_snapshot, &err) != 0) {
+            if (save_snapshot(replay_snapshot, NULL, NULL, &err) != 0) {
                 error_report_err(err);
                 error_report("Could not create snapshot for icount record");
                 exit(1);
             }
         } else if (replay_mode == REPLAY_MODE_PLAY) {
-            if (load_snapshot(replay_snapshot, &err) != 0) {
+            if (load_snapshot(replay_snapshot, NULL, NULL, &err) != 0) {
                 error_report_err(err);
                 error_report("Could not load snapshot for icount replay");
                 exit(1);
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 3416241557..9d2d38360a 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -4456,7 +4456,7 @@ void qemu_init(int argc, char **argv, char **envp)
     register_global_state();
     if (loadvm) {
         Error *local_err = NULL;
-        if (load_snapshot(loadvm, &local_err) < 0) {
+        if (load_snapshot(loadvm, NULL, NULL, &local_err) < 0) {
             error_report_err(local_err);
             autostart = 0;
             exit(1);
diff --git a/tests/qemu-iotests/310 b/tests/qemu-iotests/310
new file mode 100755
index 0000000000..b84b3a6dd6
--- /dev/null
+++ b/tests/qemu-iotests/310
@@ -0,0 +1,125 @@
+#!/usr/bin/env bash
+#
+# Test which nodes are involved in internal snapshots
+#
+# Copyright (C) 2019 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=berrange@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+status=1	# failure is the default!
+
+_cleanup()
+{
+    _cleanup_test_img
+    rm -f "$SOCK_DIR/nbd"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+_require_drivers copy-on-read
+
+# Internal snapshots are (currently) impossible with refcount_bits=1,
+# and generally impossible with external data files
+_unsupported_imgopts 'refcount_bits=1[^0-9]' data_file
+
+_require_devices virtio-blk
+
+do_run_qemu()
+{
+    echo Testing: "$@"
+    (
+        if ! test -t 0; then
+            while read cmd; do
+                echo $cmd
+            done
+        fi
+        echo quit
+    ) | $QEMU -nographic -qmp stdio -nodefaults "$@"
+    echo
+}
+
+run_qemu()
+{
+    do_run_qemu "$@" 2>&1 | _filter_testdir | _filter_qemu | _filter_hmp |
+        _filter_generated_node_ids | _filter_imgfmt | _filter_vmstate_size
+}
+
+size=128M
+
+run_test()
+{
+    if [ -n "$BACKING_FILE" ]; then
+        _make_test_img -b "$BACKING_FILE" -F $IMGFMT $size
+    else
+        _make_test_img $size
+    fi
+    run_qemu "$@" <<EOF | _filter_date
+{ "execute": "qmp_capabilities" }
+{ "execute": "snapshot-save",
+  "arguments": {
+    "job-id": "snapsave0",
+    "tag": "snap0",
+    "devices": ["disk1"]
+  } }
+{ "execute": "query-jobs" }
+{ "execute": "query-named-block-nodes" }
+{ "execute": "snapshot-load",
+  "arguments": {
+    "job-id": "snapload0",
+    "tag": "snap0",
+    "devices": ["disk1"]
+  } }
+{ "execute": "query-jobs" }
+{ "execute": "query-named-block-nodes" }
+{ "execute": "snapshot-delete",
+  "arguments": {
+    "job-id": "snapdel0",
+    "tag": "snap0",
+    "devices": ["disk1"]
+  } }
+{ "execute": "query-jobs" }
+{ "execute": "snapshot-delete",
+  "arguments": {
+    "job-id": "snapdel1",
+    "tag": "snap1",
+    "devices": ["disk1"]
+  } }
+{ "execute": "query-jobs" }
+{ "execute": "query-named-block-nodes" }
+{ "execute": "quit" }
+EOF
+}
+
+echo "Basic test"
+
+run_test -blockdev "{\"driver\":\"file\",\"filename\":\"$TEST_IMG\",\"node-name\":\"disk0\"}" \
+         -blockdev "{\"driver\":\"qcow2\",\"file\":\"disk0\",\"node-name\":\"disk1\"}"
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/310.out b/tests/qemu-iotests/310.out
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 1d0252e1f0..fb2a442e9f 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -290,6 +290,7 @@
 277 rw quick
 279 rw backing quick
 280 rw migration quick
+310 rw quick
 281 rw quick
 282 rw img quick
 283 auto quick
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/6] migration: improve error reporting of block driver state name
  2020-07-27 15:08 ` [PATCH v2 1/6] migration: improve error reporting of block driver state name Daniel P. Berrangé
@ 2020-08-12 10:19   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 23+ messages in thread
From: Dr. David Alan Gilbert @ 2020-08-12 10:19 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Kevin Wolf, Peter Krempa, Denis V. Lunev, qemu-block,
	Juan Quintela, Markus Armbruster, qemu-devel, Max Reitz,
	John Snow, Pavel Dovgalyuk, Paolo Bonzini

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> With blockdev, a BlockDriverState may not have a device name,
> so using a node name is required as an alternative.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/savevm.c         | 12 ++++++------
>  tests/qemu-iotests/267.out |  4 ++--
>  2 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 45c9dd9d8a..cffee6cab7 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2655,7 +2655,7 @@ int save_snapshot(const char *name, Error **errp)
>  
>      if (!bdrv_all_can_snapshot(&bs)) {
>          error_setg(errp, "Device '%s' is writable but does not support "
> -                   "snapshots", bdrv_get_device_name(bs));
> +                   "snapshots", bdrv_get_device_or_node_name(bs));
>          return ret;
>      }
>  
> @@ -2664,7 +2664,7 @@ int save_snapshot(const char *name, Error **errp)
>          ret = bdrv_all_delete_snapshot(name, &bs1, errp);
>          if (ret < 0) {
>              error_prepend(errp, "Error while deleting snapshot on device "
> -                          "'%s': ", bdrv_get_device_name(bs1));
> +                          "'%s': ", bdrv_get_device_or_node_name(bs1));
>              return ret;
>          }
>      }
> @@ -2739,7 +2739,7 @@ int save_snapshot(const char *name, Error **errp)
>      ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, &bs);
>      if (ret < 0) {
>          error_setg(errp, "Error while creating snapshot on '%s'",
> -                   bdrv_get_device_name(bs));
> +                   bdrv_get_device_or_node_name(bs));
>          goto the_end;
>      }
>  
> @@ -2857,14 +2857,14 @@ int load_snapshot(const char *name, Error **errp)
>      if (!bdrv_all_can_snapshot(&bs)) {
>          error_setg(errp,
>                     "Device '%s' is writable but does not support snapshots",
> -                   bdrv_get_device_name(bs));
> +                   bdrv_get_device_or_node_name(bs));
>          return -ENOTSUP;
>      }
>      ret = bdrv_all_find_snapshot(name, &bs);
>      if (ret < 0) {
>          error_setg(errp,
>                     "Device '%s' does not have the requested snapshot '%s'",
> -                   bdrv_get_device_name(bs), name);
> +                   bdrv_get_device_or_node_name(bs), name);
>          return ret;
>      }
>  
> @@ -2893,7 +2893,7 @@ int load_snapshot(const char *name, Error **errp)
>      ret = bdrv_all_goto_snapshot(name, &bs, errp);
>      if (ret < 0) {
>          error_prepend(errp, "Could not load snapshot '%s' on '%s': ",
> -                      name, bdrv_get_device_name(bs));
> +                      name, bdrv_get_device_or_node_name(bs));
>          goto err_drain;
>      }
>  
> diff --git a/tests/qemu-iotests/267.out b/tests/qemu-iotests/267.out
> index d6d80c099f..215902b3ad 100644
> --- a/tests/qemu-iotests/267.out
> +++ b/tests/qemu-iotests/267.out
> @@ -81,11 +81,11 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
>  Testing: -blockdev driver=file,filename=TEST_DIR/t.IMGFMT,node-name=file
>  QEMU X.Y.Z monitor - type 'help' for more information
>  (qemu) savevm snap0
> -Error: Device '' is writable but does not support snapshots
> +Error: Device 'file' is writable but does not support snapshots
>  (qemu) info snapshots
>  No available block device supports snapshots
>  (qemu) loadvm snap0
> -Error: Device '' is writable but does not support snapshots
> +Error: Device 'file' is writable but does not support snapshots
>  (qemu) quit
>  
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
> -- 
> 2.26.2
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/6] migration: stop returning errno from load_snapshot()
  2020-07-27 15:08 ` [PATCH v2 3/6] migration: stop returning errno from load_snapshot() Daniel P. Berrangé
@ 2020-08-12 10:21   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 23+ messages in thread
From: Dr. David Alan Gilbert @ 2020-08-12 10:21 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Kevin Wolf, Peter Krempa, Denis V. Lunev, qemu-block,
	Juan Quintela, Markus Armbruster, qemu-devel, Max Reitz,
	John Snow, Pavel Dovgalyuk, Paolo Bonzini

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> None of the callers care about the errno value since there is a full
> Error object populated. This gives consistency with save_snapshot()
> which already just returns -1.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

(Note this is true of snapshots only; in postcopy there are places we
care about the errno for recovery)

> ---
>  migration/savevm.c | 17 ++++++++---------
>  1 file changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 19259ef7c0..6c4d80fc5a 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2843,20 +2843,20 @@ int load_snapshot(const char *name, Error **errp)
>      if (!replay_can_snapshot()) {
>          error_setg(errp, "Record/replay does not allow loading snapshot "
>                     "right now. Try once more later.");
> -        return -EINVAL;
> +        return -1;
>      }
>  
>      if (!bdrv_all_can_snapshot(errp)) {
> -        return -ENOTSUP;
> +        return -1;
>      }
>      ret = bdrv_all_find_snapshot(name, errp);
>      if (ret < 0) {
> -        return ret;
> +        return -1;
>      }
>  
>      bs_vm_state = bdrv_all_find_vmstate_bs(errp);
>      if (!bs_vm_state) {
> -        return -ENOTSUP;
> +        return -1;
>      }
>      aio_context = bdrv_get_aio_context(bs_vm_state);
>  
> @@ -2865,11 +2865,11 @@ int load_snapshot(const char *name, Error **errp)
>      ret = bdrv_snapshot_find(bs_vm_state, &sn, name);
>      aio_context_release(aio_context);
>      if (ret < 0) {
> -        return ret;
> +        return -1;
>      } else if (sn.vm_state_size == 0) {
>          error_setg(errp, "This is a disk-only snapshot. Revert to it "
>                     " offline using qemu-img");
> -        return -EINVAL;
> +        return -1;
>      }
>  
>      /* Flush all IO requests so they don't interfere with the new state.  */
> @@ -2884,7 +2884,6 @@ int load_snapshot(const char *name, Error **errp)
>      f = qemu_fopen_bdrv(bs_vm_state, 0);
>      if (!f) {
>          error_setg(errp, "Could not open VM state file");
> -        ret = -EINVAL;
>          goto err_drain;
>      }
>  
> @@ -2900,14 +2899,14 @@ int load_snapshot(const char *name, Error **errp)
>  
>      if (ret < 0) {
>          error_setg(errp, "Error %d while loading VM state", ret);
> -        return ret;
> +        return -1;
>      }
>  
>      return 0;
>  
>  err_drain:
>      bdrv_drain_all_end();
> -    return ret;
> +    return -1;
>  }
>  
>  void vmstate_register_ram(MemoryRegion *mr, DeviceState *dev)
> -- 
> 2.26.2
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP
  2020-07-27 15:08 [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP Daniel P. Berrangé
                   ` (5 preceding siblings ...)
  2020-07-27 15:08 ` [PATCH v2 6/6] migration: introduce snapshot-{save, load, delete} QMP commands Daniel P. Berrangé
@ 2020-08-21 16:27 ` Daniel P. Berrangé
  2020-08-26 15:52 ` Markus Armbruster
  7 siblings, 0 replies; 23+ messages in thread
From: Daniel P. Berrangé @ 2020-08-21 16:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Peter Krempa, Denis V. Lunev, qemu-block,
	Juan Quintela, Markus Armbruster, Dr. David Alan Gilbert,
	Max Reitz, John Snow, Pavel Dovgalyuk, Paolo Bonzini

On Mon, Jul 27, 2020 at 04:08:37PM +0100, Daniel P. Berrangé wrote:
> A followup to:
> 
>  v1: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg00866.html

snip

> HELP NEEDED:  this series starts to implement the approach that Kevin
> suggested wrto use of generic jobs.
> 
> When I try to actually run the code though it crashes:
> 
> ERROR:/home/berrange/src/virt/qemu/softmmu/cpus.c:1788:qemu_mutex_unlock_iothread: assertion failed: (qemu_mutex_iothread_locked())
> Bail out! ERROR:/home/berrange/src/virt/qemu/softmmu/cpus.c:1788:qemu_mutex_unlock_iothread: assertion failed: (qemu_mutex_iothread_locked())
> 
> Obviously I've missed something related to locking, but I've no idea
> what, so I'm sending this v2 simply as a way to solicit suggestions
> for what I've messed up.

What I've found is

qmp_snapshot_save() is the QMP handler and runs in the main thread, so iothread
lock is held.


This calls job_create(), which ends up invoking snapshot_save_job_run
in a background coroutine, but IIUC the iothread lock is still held when
the coroutine starts.

This then invokes save_snapshot(), which invokes qemu_savevm_state().


This calls qemu_mutex_unlock_iothread() and then
qemu_savevm_state_setup().

Eventually something in the qcow2 code triggers qemu_coroutine_yield()
so control goes back to the main event loop thread.


The problem is that the iothread lock has been released, but the main
event loop thread is still expecting it to be held.

I've no idea how to go about solving this problem.


The save_snapshot() code, as written today, needs to run serialized with
everything else, but because the job framework has used a coroutine to
run it, we can switch back to the main event thread at any time.

I don't know how to force save_snapshot() to be serialized when using
the generic job framework.
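
For reference, one crude way to confirm where the invariant breaks (a
debugging sketch only, under the assumptions described above, not a
fix) is to assert the lock state around the save call in the job:

    static int coroutine_fn snapshot_save_job_run(Job *job, Error **errp)
    {
        SnapshotJob *s = container_of(job, SnapshotJob, common);
        int ret;

        /* The job coroutine should be entered with the BQL held */
        assert(qemu_mutex_iothread_locked());

        job_progress_set_remaining(&s->common, 1);
        ret = save_snapshot(s->tag, s->vmstate, s->devices, errp);

        /*
         * save_snapshot() -> qemu_savevm_state() drops and re-takes the
         * iothread lock; if the block layer yields in between, the main
         * loop runs with the lock released and trips its assertion
         * before control ever returns here.
         */
        assert(qemu_mutex_iothread_locked());

        job_progress_update(&s->common, 1);
        qmp_snapshot_job_free(s);
        return ret;
    }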


> 
> You can reproduce with I/O tests using "check -qcow2 310"  and it
> gave a stack:
> 
>   Thread 5 (Thread 0x7fffe6e4c700 (LWP 3399011)):
>   #0  futex_wait_cancelable (private=0, expected=0, futex_word=0x5555566a9fd8) at ../sysdeps/nptl/futex-internal.h:183
>   #1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x555556227160 <qemu_global_mutex>, cond=0x5555566a9fb0) at pthread_cond_wait.c:508
>   #2  __pthread_cond_wait (cond=cond@entry=0x5555566a9fb0, mutex=mutex@entry=0x555556227160 <qemu_global_mutex>) at pthread_cond_wait.c:638
>   #3  0x0000555555ceb6cb in qemu_cond_wait_impl (cond=0x5555566a9fb0, mutex=0x555556227160 <qemu_global_mutex>, file=0x555555d44198 "/home/berrange/src/virt/qemu/softmmu/cpus.c", line=1145) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:174
>   #4  0x0000555555931974 in qemu_wait_io_event (cpu=cpu@entry=0x555556685050) at /home/berrange/src/virt/qemu/softmmu/cpus.c:1145
>   #5  0x0000555555933a89 in qemu_dummy_cpu_thread_fn (arg=arg@entry=0x555556685050) at /home/berrange/src/virt/qemu/softmmu/cpus.c:1241
>   #6  0x0000555555ceb049 in qemu_thread_start (args=0x7fffe6e476f0) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:521
>   #7  0x00007ffff4fdc432 in start_thread (arg=<optimized out>) at pthread_create.c:477
>   #8  0x00007ffff4f0a9d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>   
>   Thread 4 (Thread 0x7fffe764d700 (LWP 3399010)):
>   #0  0x00007ffff4effb6f in __GI___poll (fds=0x7fffdc006ec0, nfds=3, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
>   #1  0x00007ffff7c1aace in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0
>   #2  0x00007ffff7c1ae53 in g_main_loop_run () at /lib64/libglib-2.0.so.0
>   #3  0x00005555559a9d81 in iothread_run (opaque=opaque@entry=0x55555632f200) at /home/berrange/src/virt/qemu/iothread.c:82
>   #4  0x0000555555ceb049 in qemu_thread_start (args=0x7fffe76486f0) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:521
>   #5  0x00007ffff4fdc432 in start_thread (arg=<optimized out>) at pthread_create.c:477
>   #6  0x00007ffff4f0a9d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>   
>   Thread 3 (Thread 0x7fffe7e4e700 (LWP 3399009)):
>   #0  0x00007ffff4fe5c58 in futex_abstimed_wait_cancelable (private=0, abstime=0x7fffe7e49650, clockid=0, expected=0, futex_word=0x5555562bf888) at ../sysdeps/nptl/futex-internal.h:320
>   #1  do_futex_wait (sem=sem@entry=0x5555562bf888, abstime=abstime@entry=0x7fffe7e49650, clockid=0) at sem_waitcommon.c:112
>   #2  0x00007ffff4fe5d83 in __new_sem_wait_slow (sem=sem@entry=0x5555562bf888, abstime=abstime@entry=0x7fffe7e49650, clockid=0) at sem_waitcommon.c:184
>   #3  0x00007ffff4fe5e12 in sem_timedwait (sem=sem@entry=0x5555562bf888, abstime=abstime@entry=0x7fffe7e49650) at sem_timedwait.c:40
>   #4  0x0000555555cebbdf in qemu_sem_timedwait (sem=sem@entry=0x5555562bf888, ms=ms@entry=10000) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:307
>   #5  0x0000555555d03fa4 in worker_thread (opaque=opaque@entry=0x5555562bf810) at /home/berrange/src/virt/qemu/util/thread-pool.c:91
>   #6  0x0000555555ceb049 in qemu_thread_start (args=0x7fffe7e496f0) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:521
>   #7  0x00007ffff4fdc432 in start_thread (arg=<optimized out>) at pthread_create.c:477
>   #8  0x00007ffff4f0a9d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>   
>   Thread 2 (Thread 0x7fffe8750700 (LWP 3399008)):
>   #0  0x00007ffff4ed1871 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7fffe874b670, rem=0x7fffe874b680) at ../sysdeps/unix/sysv/linux/--Type <RET> for more, q to quit, c to continue without paging--
>   clock_nanosleep.c:48
>   #1  0x00007ffff4ed71c7 in __GI___nanosleep (requested_time=<optimized out>, remaining=<optimized out>) at nanosleep.c:27
>   #2  0x00007ffff7c462f7 in g_usleep () at /lib64/libglib-2.0.so.0
>   #3  0x0000555555cf3fd0 in call_rcu_thread (opaque=opaque@entry=0x0) at /home/berrange/src/virt/qemu/util/rcu.c:250
>   #4  0x0000555555ceb049 in qemu_thread_start (args=0x7fffe874b6f0) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:521
>   #5  0x00007ffff4fdc432 in start_thread (arg=<optimized out>) at pthread_create.c:477
>   #6  0x00007ffff4f0a9d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>   
>   Thread 1 (Thread 0x7fffe88abec0 (LWP 3398996)):
>   #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
>   #1  0x00007ffff4e2e895 in __GI_abort () at abort.c:79
>   #2  0x00007ffff7be5b8c in g_assertion_message_expr.cold () at /lib64/libglib-2.0.so.0
>   #3  0x00007ffff7c43a1f in g_assertion_message_expr () at /lib64/libglib-2.0.so.0
>   #4  0x0000555555932da0 in qemu_mutex_unlock_iothread () at /home/berrange/src/virt/qemu/softmmu/cpus.c:1788
>   #5  qemu_mutex_unlock_iothread () at /home/berrange/src/virt/qemu/softmmu/cpus.c:1786
>   #6  0x0000555555cfeceb in os_host_main_loop_wait (timeout=26359275747000) at /home/berrange/src/virt/qemu/util/main-loop.c:232
>   #7  main_loop_wait (nonblocking=nonblocking@entry=0) at /home/berrange/src/virt/qemu/util/main-loop.c:516
>   #8  0x0000555555941f27 in qemu_main_loop () at /home/berrange/src/virt/qemu/softmmu/vl.c:1676
>   #9  0x000055555581a18e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/berrange/src/virt/qemu/softmmu/main.c:49
>   (gdb)

The coroutine trace to go with this is

(gdb) qemu coroutine 0x5555575b86b0
#0  qemu_coroutine_switch (from_=<optimized out>, to_=<optimized out>, action=action@entry=COROUTINE_YIELD)
    at /home/berrange/src/virt/qemu/util/coroutine-ucontext.c:303
#1  0x0000555555cfbccd in qemu_coroutine_yield () at /home/berrange/src/virt/qemu/util/qemu-coroutine.c:193
#2  0x0000555555d00d49 in thread_pool_submit_co (pool=0x5555562b8810, func=func@entry=0x555555c3a2d0 <handle_aiocb_rw>, arg=arg@entry=0x7fffe61ff200)
    at /home/berrange/src/virt/qemu/util/thread-pool.c:288
#3  0x0000555555c3c4db in raw_thread_pool_submit (arg=0x7fffe61ff200, func=0x555555c3a2d0 <handle_aiocb_rw>, bs=0x55555653c2d0)
    at /home/berrange/src/virt/qemu/block/file-posix.c:1975
#4  raw_co_prw (bs=0x55555653c2d0, offset=131072, bytes=65536, qiov=0x7fffe61ff560, type=1) at /home/berrange/src/virt/qemu/block/file-posix.c:2022
#5  0x0000555555c43131 in bdrv_driver_preadv (bs=0x55555653c2d0, offset=131072, bytes=65536, qiov=0x7fffe61ff560, qiov_offset=<optimized out>, flags=0)
    at /home/berrange/src/virt/qemu/block/io.c:1139
#6  0x0000555555c47484 in bdrv_aligned_preadv
    (req=req@entry=0x7fffe61ff440, offset=131072, bytes=65536, align=<optimized out>, qiov=0x7fffe61ff560, qiov_offset=0, flags=0, child=0x5555562b2a80, child=<optimized out>) at /home/berrange/src/virt/qemu/block/io.c:1515
#7  0x0000555555c48955 in bdrv_co_preadv_part
    (child=0x5555562b2a80, offset=<optimized out>, bytes=<optimized out>, qiov=<optimized out>, qiov_offset=<optimized out>, flags=0)
    at /home/berrange/src/virt/qemu/block/io.c:1757
#8  0x0000555555c48ef4 in bdrv_run_co (opaque=0x7fffe61ff540, entry=0x555555c48a60 <bdrv_rw_co_entry>, bs=0x55555653c2d0)
    at /home/berrange/src/virt/qemu/block/io.c:915
#9  bdrv_prwv_co (flags=0, is_write=false, qiov=0x7fffe61ff560, offset=131072, child=<optimized out>) at /home/berrange/src/virt/qemu/block/io.c:966
#10 bdrv_preadv (qiov=0x7fffe61ff560, offset=131072, child=<optimized out>) at /home/berrange/src/virt/qemu/block/io.c:1024
#11 bdrv_pread (child=<optimized out>, offset=offset@entry=131072, buf=<optimized out>, bytes=<optimized out>) at /home/berrange/src/virt/qemu/block/io.c:1041
#12 0x0000555555c1fa2a in qcow2_cache_do_get (bs=0x555556542e00, c=0x55555654b130, offset=131072, table=0x7fffe61ff630, read_from_disk=<optimized out>)
    at /home/berrange/src/virt/qemu/block/qcow2-cache.c:49
#13 0x0000555555c14206 in qcow2_get_refcount (bs=bs@entry=0x555556542e00, cluster_index=0, refcount=refcount@entry=0x7fffe61ff670)
    at /home/berrange/src/virt/qemu/block/qcow2-refcount.c:271
#14 0x0000555555c1450d in alloc_clusters_noref (bs=bs@entry=0x555556542e00, size=140737054242416, size@entry=24, max=max@entry=72057594037927935)
    at /home/berrange/src/virt/qemu/block/qcow2-refcount.c:981
#15 0x0000555555c15534 in qcow2_alloc_clusters (bs=bs@entry=0x555556542e00, size=size@entry=24) at /home/berrange/src/virt/qemu/block/qcow2-refcount.c:1013
#16 0x0000555555c1a1f3 in qcow2_grow_l1_table (bs=0x555556542e00, min_size=min_size@entry=3, exact_size=exact_size@entry=false)
    at /home/berrange/src/virt/qemu/block/qcow2-cluster.c:139
#17 0x0000555555c1a983 in get_cluster_table
    (bs=bs@entry=0x555556542e00, offset=offset@entry=1073741824, new_l2_slice=new_l2_slice@entry=0x7fffe61ff8b0, new_l2_index=new_l2_index@entry=0x7fffe61ff8a8)
    at /home/berrange/src/virt/qemu/block/qcow2-cluster.c:688
#18 0x0000555555c1bee9 in handle_copied
    (m=0x7fffe61ff968, bytes=<synthetic pointer>, host_offset=<synthetic pointer>, guest_offset=1073741824, bs=0x555556542e00)
    at /home/berrange/src/virt/qemu/block/qcow2-cluster.c:1185
#19 qcow2_alloc_cluster_offset
    (bs=bs@entry=0x555556542e00, offset=offset@entry=1073741824, bytes=bytes@entry=0x7fffe61ff95c, host_offset=host_offset@entry=0x7fffe61ff960, m=m@entry=0x7fffe61ff968) at /home/berrange/src/virt/qemu/block/qcow2-cluster.c:1570
--Type <RET> for more, q to quit, c to continue without paging--
#20 0x0000555555c0adbc in qcow2_co_pwritev_part (bs=0x555556542e00, offset=1073741824, bytes=281, qiov=0x7fffe61ffa30, qiov_offset=0, flags=<optimized out>)
    at /home/berrange/src/virt/qemu/block/qcow2.c:2577
#21 0x0000555555c439f1 in bdrv_co_rw_vmstate (bs=0x555556542e00, qiov=<optimized out>, pos=<optimized out>, is_read=<optimized out>)
    at /home/berrange/src/virt/qemu/block/io.c:2660
#22 0x0000555555c45ecc in bdrv_co_rw_vmstate_entry (opaque=0x7fffe61ff9f0) at /home/berrange/src/virt/qemu/block/io.c:2674
#23 bdrv_run_co (opaque=0x7fffe61ff9f0, entry=0x555555c43a50 <bdrv_co_rw_vmstate_entry>, bs=0x555556542e00) at /home/berrange/src/virt/qemu/block/io.c:915
#24 bdrv_rw_vmstate (is_read=false, pos=0, qiov=0x7fffe61ff9e0, bs=0x555556542e00) at /home/berrange/src/virt/qemu/block/io.c:2688
#25 bdrv_writev_vmstate (bs=bs@entry=0x555556542e00, qiov=qiov@entry=0x7fffe61ffa30, pos=pos@entry=0) at /home/berrange/src/virt/qemu/block/io.c:2707
#26 0x0000555555b43ac8 in block_writev_buffer (opaque=0x555556542e00, iov=<optimized out>, iovcnt=<optimized out>, pos=0, errp=<optimized out>)
    at /home/berrange/src/virt/qemu/migration/savevm.c:139
#27 0x0000555555b4e698 in qemu_fflush (f=f@entry=0x5555566ae980) at /home/berrange/src/virt/qemu/migration/qemu-file.c:232
#28 0x000055555586bdeb in ram_save_setup (f=0x5555566ae980, opaque=<optimized out>) at /home/berrange/src/virt/qemu/migration/ram.c:2513
#29 0x0000555555b45bd1 in qemu_savevm_state_setup (f=f@entry=0x5555566ae980) at /home/berrange/src/virt/qemu/migration/savevm.c:1178
#30 0x0000555555b48713 in qemu_savevm_state (errp=0x5555575ba280, f=0x5555566ae980) at /home/berrange/src/virt/qemu/migration/savevm.c:1535
#31 save_snapshot (name=<optimized out>, vmstate=<optimized out>, devices=0x555556e50ba0, errp=errp@entry=0x5555575ba280)
    at /home/berrange/src/virt/qemu/migration/savevm.c:2744
#32 0x0000555555b48a8b in snapshot_save_job_run (job=0x5555575ba200, errp=0x5555575ba280) at /home/berrange/src/virt/qemu/migration/savevm.c:3016
#33 0x0000555555bef632 in job_co_entry (opaque=0x5555575ba200) at /home/berrange/src/virt/qemu/job.c:908
#34 0x0000555555cf74a3 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at /home/berrange/src/virt/qemu/util/coroutine-ucontext.c:173
#35 0x00007ffff4e5a250 in __start_context () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
#36 0x00007fffffffc6a0 in  ()
#37 0x0000000000000000 in  ()


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP
  2020-07-27 15:08 [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP Daniel P. Berrangé
                   ` (6 preceding siblings ...)
  2020-08-21 16:27 ` [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP Daniel P. Berrangé
@ 2020-08-26 15:52 ` Markus Armbruster
  2020-08-26 18:28   ` Daniel P. Berrangé
  7 siblings, 1 reply; 23+ messages in thread
From: Markus Armbruster @ 2020-08-26 15:52 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Kevin Wolf, Peter Krempa, Denis V. Lunev, qemu-block,
	Juan Quintela, qemu-devel, Dr. David Alan Gilbert,
	Pavel Dovgalyuk, Paolo Bonzini, Max Reitz, John Snow

Sorry for taking so long to reply.

Daniel P. Berrangé <berrange@redhat.com> writes:

> A followup to:
>
>  v1: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg00866.html
>
> When QMP was first introduced some 10+ years ago now, the snapshot
> related commands (savevm/loadvm/delvm) were not converted. This was
> primarily because their implementation causes blocking of the thread
> running the monitor commands. This was (and still is) considered
> undesirable behaviour both in HMP and QMP.

One of several reasons.

> In theory someone was supposed to fix this flaw at some point in the
> past 10 years and bring them into the QMP world. Sadly, thus far it
> hasn't happened as people always had more important things to work
> on. Enterprise apps were much more interested in external snapshots
> than internal snapshots as they have many more features.

Several attempts have been made to bring the functionality to QMP.
Sadly, they went nowhere.

I posted an analysis of the issues in reply to one of the more serious
attempts:

    Message-ID: <87lh7l783q.fsf@blackfin.pond.sub.org>
    https://lists.nongnu.org/archive/html/qemu-devel/2016-01/msg03593.html

I'd like to hear your take on it.  I append the relevant part for your
convenience.  Perhaps your code is already close to what I describe
there.  I'm interested in where it falls short.

> Meanwhile users still want to use internal snapshots as there is
> a certainly simplicity in having everything self-contained in one
> image, even though it has limitations. Thus the apps that end up
> executing the savevm/loadvm/delvm via the "human-monitor-command"
> QMP command.
>
> IOW, the problematic blocking behaviour that was one of the reasons
> for not having savevm/loadvm/delvm in QMP is experienced by applications
> regardless. By not portting the commands to QMP due to one design flaw,
> we've forced apps and users to suffer from other design flaws of HMP (
> bad error reporting, strong type checking of args, no introspection) for
> an additional 10 years. This feels rather sub-optimal :-(
>
> In practice users don't appear to care strongly about the fact that these
> commands block the VM while they run. I might have seen one bug report
> about it, but it certainly isn't something that comes up as a frequent
> topic except among us QEMU maintainers. Users do care about having
> access to the snapshot feature.
>
> Where I am seeing frequent complaints is wrt the use of OVMF combined
> with snapshots which has some serious pain points. This is getting worse
> as the push to ditch legacy BIOS in favour of UEFI gain momentum both
> across OS vendors and mgmt apps. Solving it requires new parameters to
> the commands, but doing this in HMP is super unappealing.
>
> After 10 years, I think it is time for us to be a little pragmatic about
> our handling of snapshots commands. My desire is that libvirt should never
> use "human-monitor-command" under any circumstances, because of the
> inherant flaws in HMP as a protocol for machine consumption.
>
> Thus in this series I'm proposing a fairly direct mapping of the existing
> HMP commands for savevm/loadvm/delvm into QMP as a first step. This does
> not solve the blocking thread problem, but it does put in a place a
> design using the jobs framework which can facilitate solving it later.
> It does also solve the error reporting, type checking and introspection
> problems inherant to HMP. So we're winning on 3 out of the 4 problems,
> and pushed apps to a QMP design that will let us solve the last
> remaining problem.
>
> With a QMP variant, we reasonably deal with the problems related to OVMF:
>
>  - The logic to pick which disk to store the vmstate in is not
>    satsifactory.
>
>    The first block driver state cannot be assumed to be the root disk
>    image, it might be OVMF varstore and we don't want to store vmstate
>    in there.

Yes, this is one of the issues.  Glad you're addressing it.

>  - The logic to decide which disks must be snapshotted is hardwired
>    to all disks which are writable
>
>    Again with OVMF there might be a writable varstore, but this can be
>    raw rather than qcow2 format, and thus unable to be snapshotted.
>    While users might wish to snapshot their varstore, in some/many/most
>    cases it is entirely uneccessary. Users are blocked from snapshotting
>    their VM though due to this varstore.

Another one.  Glad again.

> These are solved by adding two parameters to the commands. The first is
> a block device node name that identifies the image to store vmstate in,
> and the second is a list of node names to include for the snapshots.
> If the list of nodes isn't given, it falls back to the historical
> behaviour of using all disks matching some undocumented criteria.
>
> In the block code I've only dealt with node names for block devices, as
> IIUC, this is all that libvirt should need in the -blockdev world it now
> lives in. IOW, I've made not attempt to cope with people wanting to use
> these QMP commands in combination with -drive args.
>
> I've done some minimal work in libvirt to start to make use of the new
> commands to validate their functionality, but this isn't finished yet.
>
> My ultimate goal is to make the GNOME Boxes maintainer happy again by
> having internal snapshots work with OVMF:
>
>   https://gitlab.gnome.org/GNOME/gnome-boxes/-/commit/c486da262f6566326fbcb5ef45c5f64048f16a6e
>
> HELP NEEDED:  this series starts to implement the approach that Kevin
> suggested wrto use of generic jobs.

Does this mean you're trying to use the jobs infrastructure?

> When I try to actually run the code though it crashes:
>
> ERROR:/home/berrange/src/virt/qemu/softmmu/cpus.c:1788:qemu_mutex_unlock_iothread: assertion failed: (qemu_mutex_iothread_locked())
> Bail out! ERROR:/home/berrange/src/virt/qemu/softmmu/cpus.c:1788:qemu_mutex_unlock_iothread: assertion failed: (qemu_mutex_iothread_locked())
>
> Obviously I've missed something related to locking, but I've no idea
> what, so I'm sending this v2 simply as a way to solicit suggestions
> for what I've messed up.
>
> You can reproduce with I/O tests using "check -qcow2 310"  and it
> gave a stack:
[...]

Relevant part of Message-ID: <87lh7l783q.fsf@blackfin.pond.sub.org>

If we can't make a sane QMP interface, I'd rather have no QMP interface.
However, I believe we *can* make a sane QMP interface if we put in the
design work.

The design work must start with a review of what we're trying to
accomplish, and how to fit it into the rest of the system.  Here's my
attempt.  Since my knowledge on snapshots is rather superficial, I'm
cc'ing Kevin for additional snapshot expertise.  Kevin, please correct
me when I talk nonsense.  I'm further cc'ing Eric and Peter for the
management layer perspective.

A point-in-time snapshot of a system consists of a snapshot of its
machine state and snapshots of its storage.  All the snapshots need to
be made at the same point in time for the result to be consistent.

Snapshots of read-only storage carry no information and are commonly
omitted.

Isolated storage snapshots can make sense, but snapshotting the machine
state without also snapshotting the machine's storage doesn't sound
useful to me.

Both storage and machine state snapshots come in two flavours: internal
and external.

External ones can be made with any block backend, but internal storage
snapshots work only with certain formats, notably qcow2.  QMP supports
both kinds of storage snapshots.
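
For concreteness, the two flavours as they look on the wire (a minimal
sketch; the device name, file path and snapshot name are made up):

    # external: create a new qcow2 overlay on top of the current image
    -> { "execute": "blockdev-snapshot-sync",
         "arguments": { "device": "disk0",
                        "snapshot-file": "/path/to/overlay.qcow2",
                        "format": "qcow2" } }

    # internal: add an entry to the qcow2 image's own snapshot table
    -> { "execute": "blockdev-snapshot-internal-sync",
         "arguments": { "device": "disk0", "name": "snap0" } }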

Both kinds of storage snapshots need exclusive access while they work.
They're relatively quick, but the delay could be noticable for large
internal snapshots, and perhaps for external snapshots on really slow
storage.

Internal machine state snapshots are currently only available via HMP's
savevm, which integrates internal machine state and storage snapshots.
This is non-live, i.e. the guest is stopped while the snapshot gets
saved.  I figure we could make it live if we really wanted to.  Another
instance of the emerging background job concept.

On the implementation level, QCOW2 can't currently store a machine state
snapshot without also storing a storage snapshot.  I guess we could
change this if we really wanted to.

External machine state snapshots are basically migrate to file.
Supported by QMP.

Live migration to file is possible, but currently wastes space, because
memory dirtied during migration gets saved multiple times.  Could be
fixed either by making migration update previously saved memory instead
of appending (beware, random I/O), or by compacting the file afterwards.

Non-live migration to file doesn't waste space that way.

To take multiple *consistent* snapshots, you have to bundle them up in a
transaction.  Transactions currently support only *storage* snapshots,
though.

Work-around for external machine state snapshot: migrate to file
(possibly live), leaving the guest stopped on completion, take storage
snapshots, resume guest.
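
Spelled out as a QMP sequence (a rough sketch; the state file path and
device name are invented):

    -> { "execute": "migrate",
         "arguments": { "uri": "exec:cat > /path/to/vmstate" } }
       ... wait for the migration to report "completed"; the guest is
           left paused ...
    -> { "execute": "transaction",
         "arguments": { "actions": [
             { "type": "blockdev-snapshot-internal-sync",
               "data": { "device": "disk0", "name": "snap0" } } ] } }
    -> { "execute": "cont" }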

You can combine internal and external storage snapshots with an external
machine state snapshot to get a mixed system snapshot.

You currently can't do that with an internal machine state snapshot: the
only way to take one is HMP savevm, which insists on internally
snapshotting all writable storage, and doesn't transact together with
external storage snapshots.

Except for the case "purely internal snapshot with just one writable
storage device", a system snapshot consists of multiple parts stored in
separate files.  Tying the parts together is a management problem.  QEMU
provides rudimentary management for purely internal snapshots, but it's
flawed: missing storage isn't detected, and additional storage can creep
in if snapshot tags or IDs happen to match.  I guess managing the parts
is better left to the management layer.

I figure a fully general QMP interface would let you perform a system
snapshot by combining storage snapshots of either kind with either kind
of machine state snapshot.

We already have most of the building blocks: we can take external and
internal storage snapshots, and combine them in transactions.

What's missing is transactionable machine state snapshots.

We know how to work around it for external machine state snapshots (see
above), but a transaction-based solution might be nicer.

Any solution for internal machine state snapshots in QMP should at least
try to fit into this.  Some restrictions are okay.  For instance, we
probably need to restrict internal machine state snapshots to piggyback
on an internal storage snapshot for now, so we don't have to dig up
QCOW2 just to get QMP support.

We can talk about more convenient interfaces for common special cases,
but I feel we need to design for the general case.  We don't have to
implement the completely general case right away, though.  As long as we
know where we want to go, incremental steps towards that goal are fine.

Can we create a transactionable QMP command to take an internal machine
state snapshot?

This would be like HMP savevm with the following differences:

* Separate parameters for tag and ID.  I'll have none of this
  overloading nonsense in QMP.

* Specify the destination block device.  I'll have none of this "pick a
  device in some magic, undocumented way" in QMP.

* Leave alone the other block devices.  Adding transaction actions to
  snapshot all the writable block devices to get a full system snapshot
  is the user's responsibility.
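
A rough QAPI sketch of what that could look like (the names below are
purely illustrative; nothing like this exists in the schema today):

    # hypothetical action: internal machine state snapshot stored in one
    # named block device, leaving every other device alone
    { 'struct': 'VmstateSnapshotInternal',
      'data': { 'device': 'str',    # destination block device
                'tag': 'str',       # snapshot tag
                '*id': 'str' } }    # optional ID, separate from the tag

    # ...wired up as a new TransactionAction branch alongside
    # 'blockdev-snapshot-internal-sync':
    #
    #     'vmstate-snapshot-internal-sync': 'VmstateSnapshotInternal'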

Limitations:

* No live internal machine snapshot, yet.

* The storage device taking the internal snapshot must also be
  internally snapshot for now.  In fact, the command does both
  (tolerable wart).

Open questions:

* Do we want the QMP command to delete existing snapshots with
  conflicting tag / ID, like HMP savevm does?  Or do we want it to fail
  the transaction?

* Do we need transactions for switching to a system snapshot, too?

Opinions?



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP
  2020-08-26 15:52 ` Markus Armbruster
@ 2020-08-26 18:28   ` Daniel P. Berrangé
  2020-08-26 18:34     ` Daniel P. Berrangé
  2020-08-27 11:04     ` Markus Armbruster
  0 siblings, 2 replies; 23+ messages in thread
From: Daniel P. Berrangé @ 2020-08-26 18:28 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Peter Krempa, Denis V. Lunev, qemu-block,
	Juan Quintela, qemu-devel, Dr. David Alan Gilbert,
	Pavel Dovgalyuk, Paolo Bonzini, Max Reitz, John Snow

On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote:
> Sorry for taking so long to reply.
> 
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > A followup to:
> >
> >  v1: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg00866.html
> >
> > When QMP was first introduced some 10+ years ago now, the snapshot
> > related commands (savevm/loadvm/delvm) were not converted. This was
> > primarily because their implementation causes blocking of the thread
> > running the monitor commands. This was (and still is) considered
> > undesirable behaviour both in HMP and QMP.
> 
> One of several reasons.
> 
> > In theory someone was supposed to fix this flaw at some point in the
> > past 10 years and bring them into the QMP world. Sadly, thus far it
> > hasn't happened as people always had more important things to work
> > on. Enterprise apps were much more interested in external snapshots
> > than internal snapshots as they have many more features.
> 
> Several attempts have been made to bring the functionality to QMP.
> Sadly, they went nowhere.
> 
> I posted an analysis of the issues in reply to one of the more serious
> attempts:
> 
>     Message-ID: <87lh7l783q.fsf@blackfin.pond.sub.org>
>     https://lists.nongnu.org/archive/html/qemu-devel/2016-01/msg03593.html
> 
> I'd like to hear your take on it.  I append the relevant part for your
> convenience.  Perhaps your code is already close to what I describe
> there.  I'm interested in where it falls short.
> 
> > Meanwhile users still want to use internal snapshots as there is
> > a certainly simplicity in having everything self-contained in one
> > image, even though it has limitations. Thus the apps that end up
> > executing the savevm/loadvm/delvm via the "human-monitor-command"
> > QMP command.
> >
> > IOW, the problematic blocking behaviour that was one of the reasons
> > for not having savevm/loadvm/delvm in QMP is experienced by applications
> > regardless. By not portting the commands to QMP due to one design flaw,
> > we've forced apps and users to suffer from other design flaws of HMP (
> > bad error reporting, strong type checking of args, no introspection) for
> > an additional 10 years. This feels rather sub-optimal :-(
> >
> > In practice users don't appear to care strongly about the fact that these
> > commands block the VM while they run. I might have seen one bug report
> > about it, but it certainly isn't something that comes up as a frequent
> > topic except among us QEMU maintainers. Users do care about having
> > access to the snapshot feature.
> >
> > Where I am seeing frequent complaints is wrt the use of OVMF combined
> > with snapshots which has some serious pain points. This is getting worse
> > as the push to ditch legacy BIOS in favour of UEFI gain momentum both
> > across OS vendors and mgmt apps. Solving it requires new parameters to
> > the commands, but doing this in HMP is super unappealing.
> >
> > After 10 years, I think it is time for us to be a little pragmatic about
> > our handling of snapshots commands. My desire is that libvirt should never
> > use "human-monitor-command" under any circumstances, because of the
> > inherant flaws in HMP as a protocol for machine consumption.
> >
> > Thus in this series I'm proposing a fairly direct mapping of the existing
> > HMP commands for savevm/loadvm/delvm into QMP as a first step. This does
> > not solve the blocking thread problem, but it does put in a place a
> > design using the jobs framework which can facilitate solving it later.
> > It does also solve the error reporting, type checking and introspection
> > problems inherant to HMP. So we're winning on 3 out of the 4 problems,
> > and pushed apps to a QMP design that will let us solve the last
> > remaining problem.
> >
> > With a QMP variant, we reasonably deal with the problems related to OVMF:
> >
> >  - The logic to pick which disk to store the vmstate in is not
> >    satsifactory.
> >
> >    The first block driver state cannot be assumed to be the root disk
> >    image, it might be OVMF varstore and we don't want to store vmstate
> >    in there.
> 
> Yes, this is one of the issues.  Glad you're addressing it.
> 
> >  - The logic to decide which disks must be snapshotted is hardwired
> >    to all disks which are writable
> >
> >    Again with OVMF there might be a writable varstore, but this can be
> >    raw rather than qcow2 format, and thus unable to be snapshotted.
> >    While users might wish to snapshot their varstore, in some/many/most
> >    cases it is entirely uneccessary. Users are blocked from snapshotting
> >    their VM though due to this varstore.
> 
> Another one.  Glad again.
> 
> > These are solved by adding two parameters to the commands. The first is
> > a block device node name that identifies the image to store vmstate in,
> > and the second is a list of node names to include for the snapshots.
> > If the list of nodes isn't given, it falls back to the historical
> > behaviour of using all disks matching some undocumented criteria.
> >
> > In the block code I've only dealt with node names for block devices, as
> > IIUC, this is all that libvirt should need in the -blockdev world it now
> > lives in. IOW, I've made not attempt to cope with people wanting to use
> > these QMP commands in combination with -drive args.
> >
> > I've done some minimal work in libvirt to start to make use of the new
> > commands to validate their functionality, but this isn't finished yet.
> >
> > My ultimate goal is to make the GNOME Boxes maintainer happy again by
> > having internal snapshots work with OVMF:
> >
> >   https://gitlab.gnome.org/GNOME/gnome-boxes/-/commit/c486da262f6566326fbcb5ef45c5f64048f16a6e
> >
> > HELP NEEDED:  this series starts to implement the approach that Kevin
> > suggested wrto use of generic jobs.
> 
> Does this mean you're trying to use the jobs infrastructure?

Yes, this is working now.


> Relevant part of Message-ID: <87lh7l783q.fsf@blackfin.pond.sub.org>
> 
> If we can't make a sane QMP interface, I'd rather have no QMP interface.

I strongly disagree with this. This absolutist position is why we
have made zero progress in 10+ years, leaving mgmt apps suffering
with HMP passthrough, as described above. 

This is a prime example of perfect being the enemy of good.

> However, I believe we *can* make a sane QMP interface if we put in the
> design work.

We should make a credible attempt at QMP design, but if perfection
isn't practical, we should nonetheless do *something* in QMP, even
if we find we need to deprecate & replace it later.

> The design work must start with a review of what we're trying to
> accomplish, and how to fit it into the rest of the system.  Here's my
> attempt.  Since my knowledge on snapshots is rather superficial, I'm
> cc'ing Kevin for additional snapshot expertise.  Kevin, please correct
> me when I talk nonsense.  I'm further cc'ing Eric and Peter for the
> management layer perspective.

The things I'm trying to accomplish are listed in the text above.
Primarily this is about fixing the ability to snapshot guests where
QEMU's heuristics for picking block devices are broken, i.e. explicitly
list the disks to snapshot and which one to store the vmstate in.
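
As a rough sketch (the command and argument names are the ones this
series currently proposes, so treat them as provisional rather than
final), the QMP usage would look something like:

  -> { "execute": "snapshot-save",
       "arguments": { "job-id": "snapsave0",
                      "tag": "my-snap",
                      "vmstate": "disk0",
                      "devices": ["disk0", "disk1"] } }
  <- { "return": {} }

i.e. the caller explicitly names the node that receives the vmstate
and lists exactly which nodes get an internal snapshot, instead of
relying on QEMU's current heuristics.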

Converting from HMP to QMP is essentially an enabler, because adding
new args to existing savevm/loadvm HMP commands is just too horrible
to contemplate, as mgmt apps have no sane way to probe HMP. That's
what QMP is for.

Essentially the goal is to fix the inability to use internal snapshots
with UEFI-based VMs, which is causing immediate pain for mgmt apps due
to the increased need to support UEFI.

> A point-in-time snapshot of a system consists of a snapshot of its
> machine state and snapshots of its storage.  All the snapshots need to
> be made at the same point in time for the result to be consistent.
> 
> Snapshots of read-only storage carry no information and are commonly
> omitted.
> 
> Isolated storage snapshots can make sense, but snapshotting the machine
> state without also snapshotting the machine's storage doesn't sound
> useful to me.
> 
> Both storage and machine state snapshots come in two flavours: internal
> and external.
> 
> External ones can be made with any block backend, but internal storage
> snapshots work only with certain formats, notably qcow2.  QMP supports
> both kinds of storage snapshots.
> 
> Both kinds of storage snapshots need exclusive access while they work.
> They're relatively quick, but the delay could be noticable for large
> internal snapshots, and perhaps for external snapshots on really slow
> storage.
> 
> Internal machine state snapshots are currently only available via HMP's
> savevm, which integrates internal machine state and storage snapshots.
> This is non-live, i.e. the guest is stopped while the snapshot gets
> saved.  I figure we could make it live if we really wanted to.  Another
> instance of the emerging background job concept.
> 
> On the implementation level, QCOW2 can't currently store a machine state
> snapshot without also storing a storage snapshot.  I guess we could
> change this if we really wanted to.
> 
> External machine state snapshots are basically migrate to file.
> Supported by QMP.
> 
> Live migration to file is possible, but currently wastes space, because
> memory dirtied during migration gets saved multiple times.  Could be
> fixed either by making migration update previously saved memory instead
> of appending (beware, random I/O), or by compacting the file afterwards.
> 
> Non-live migration to file doesn't waste space that way.
> 
> To take multiple *consistent* snapshots, you have to bundle them up in a
> transaction.  Transactions currently support only *storage* snapshots,
> though.
> 
> Work-around for external machine state snapshot: migrate to file
> (possibly live), leaving the guest sopped on completion, take storage
> snapshots, resume guest.
> 
> You can combine internal and external storage snapshots with an external
> machine state snapshot to get a mixed system snapshot.
> 
> You currently can't do that with an internal machine state snapshot: the
> only way to take one is HMP savevm, which insists on internally
> snapshotting all writable storage, and doesn't transact together with
> external storage snapshots.
> 
> Except for the case "purely internal snapshot with just one writable
> storage device", a system snapshot consists of multiple parts stored in
> separate files.  Tying the parts together is a management problem.  QEMU
> provides rudimentary management for purely internal snapshots, but it's
> flawed: missing storage isn't detected, and additional storage can creep
> in if snapshot tags or IDs happen to match.  I guess managing the parts
> is better left to the management layer.
> 
> I figure a fully general QMP interface would let you perform a system
> snapshot by combining storage snapshots of either kind with either kind
> of machine state snapshot.
> 
> We already have most of the building blocks: we can take external and
> internal storage snapshots, and combine them in transactions.
> 
> What's missing is transactionable machine state snapshots.
> 
> We know how to work around it for external machine state snapshots (see
> above), but a transaction-based solution might be nicer.
> 
> Any solution for internal machine state snapshots in QMP should at least
> try to fit into this.  Some restrictions are okay.  For instance, we
> probably need to restrict internal machine state snapshots to piggyback
> on an internal storage snapshot for now, so we don't have to dig up
> QCOW2 just to get QMP support.

From the POV of practicality, making a design that unifies internal
and external snapshots is something I'm considering out of scope.
It increases the design time burden, as well as implementation burden.
On my side, improving internal snapshots is a "spare time" project,
not something I can justify spending weeks or months on.

My goal is to implement something that is achievable in a short
amount of time and gets us out of the hole we've been in for 10
years, with minimal refactoring of the internal snapshot code aside
from fixing the critical limitations we have today around the choice
of disks to snapshot.

If someone later wants to come up with a grand unified design
for everything, that's fine; we can deprecate the new QMP commands
I'm proposing now.

> We can talk about more convenient interfaces for common special cases,
> but I feel we need to design for the general case.  We don't have to
> implement the completely general case right away, though.  As long as we
> know where we want to go, incremental steps towards that goal are fine.
> 
> Can we create a transactionable QMP command to take an internal machine
> state snapshot?
> 
> This would be like HMP savevm with the following differences:
> 
> * Separate parameters for tag and ID.  I'll have none of this
>   overloading nonsense in QMP.
> 
> * Specify the destination block device.  I'll have none of this "pick a
>   device in some magic, undocumented way" in QMP.
> 
> * Leave alone the other block devices.  Adding transaction actions to
>   snapshot all the writable block devices to get a full system snapshot
>   is the user's responsibility.
> 
> Limitations:
> 
> * No live internal machine snapshot, yet.
> 
> * The storage device taking the internal snapshot must also be
>   internally snapshot for now.  In fact, the command does both
>   (tolerable wart).
> 
> Open questions:
> 
> * Do we want the QMP command to delete existing snapshots with
>   conflicting tag / ID, like HMP savevm does?  Or do we want it to fail
>   the transaction?

The intent is for the QMP commands to operate exclusively on
'tags', and never consider "ID".

> * Do we need transactions for switching to a system snapshot, too?
> 
> Opinions?



Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP
  2020-08-26 18:28   ` Daniel P. Berrangé
@ 2020-08-26 18:34     ` Daniel P. Berrangé
  2020-08-27 11:06       ` Markus Armbruster
  2020-08-27 11:04     ` Markus Armbruster
  1 sibling, 1 reply; 23+ messages in thread
From: Daniel P. Berrangé @ 2020-08-26 18:34 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Peter Krempa, Denis V. Lunev, qemu-block,
	Juan Quintela, qemu-devel, Dr. David Alan Gilbert,
	Pavel Dovgalyuk, Paolo Bonzini, Max Reitz, John Snow

On Wed, Aug 26, 2020 at 07:28:24PM +0100, Daniel P. Berrangé wrote:
> On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote:
> > Open questions:
> > 
> > * Do we want the QMP command to delete existing snapshots with
> >   conflicting tag / ID, like HMP savevm does?  Or do we want it to fail
> >   the transaction?
> 
> The intent is for the QMP commands to operate exclusively on
> 'tags', and never consider "ID".

I forgot that even HMP ignores "ID" now and works exclusively in terms
of tags since:


  commit 6ca080453ea403959ccde661030ca16264acc181
  Author: Daniel Henrique Barboza <danielhb413@gmail.com>
  Date:   Wed Nov 7 11:09:58 2018 -0200

    block/snapshot.c: eliminate use of ID input in snapshot operations
    

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP
  2020-08-26 18:28   ` Daniel P. Berrangé
  2020-08-26 18:34     ` Daniel P. Berrangé
@ 2020-08-27 11:04     ` Markus Armbruster
  2020-08-27 11:34       ` Daniel P. Berrangé
  1 sibling, 1 reply; 23+ messages in thread
From: Markus Armbruster @ 2020-08-27 11:04 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Kevin Wolf, Peter Krempa, Denis V. Lunev, qemu-block,
	Juan Quintela, qemu-devel, Dr. David Alan Gilbert,
	Pavel Dovgalyuk, Paolo Bonzini, Max Reitz, John Snow

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote:
>> Sorry for taking so long to reply.
>> 
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>> > A followup to:
>> >
>> >  v1: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg00866.html
>> >
>> > When QMP was first introduced some 10+ years ago now, the snapshot
>> > related commands (savevm/loadvm/delvm) were not converted. This was
>> > primarily because their implementation causes blocking of the thread
>> > running the monitor commands. This was (and still is) considered
>> > undesirable behaviour both in HMP and QMP.
>> 
>> One of several reasons.
>> 
>> > In theory someone was supposed to fix this flaw at some point in the
>> > past 10 years and bring them into the QMP world. Sadly, thus far it
>> > hasn't happened as people always had more important things to work
>> > on. Enterprise apps were much more interested in external snapshots
>> > than internal snapshots as they have many more features.
>> 
>> Several attempts have been made to bring the functionality to QMP.
>> Sadly, they went nowhere.
>> 
>> I posted an analysis of the issues in reply to one of the more serious
>> attempts:
>> 
>>     Message-ID: <87lh7l783q.fsf@blackfin.pond.sub.org>
>>     https://lists.nongnu.org/archive/html/qemu-devel/2016-01/msg03593.html
>> 
>> I'd like to hear your take on it.  I append the relevant part for your
>> convenience.  Perhaps your code is already close to what I describe
>> there.  I'm interested in where it falls short.
>> 
>> > Meanwhile users still want to use internal snapshots as there is
>> > a certainly simplicity in having everything self-contained in one
>> > image, even though it has limitations. Thus the apps that end up
>> > executing the savevm/loadvm/delvm via the "human-monitor-command"
>> > QMP command.
>> >
>> > IOW, the problematic blocking behaviour that was one of the reasons
>> > for not having savevm/loadvm/delvm in QMP is experienced by applications
>> > regardless. By not portting the commands to QMP due to one design flaw,
>> > we've forced apps and users to suffer from other design flaws of HMP (
>> > bad error reporting, strong type checking of args, no introspection) for
>> > an additional 10 years. This feels rather sub-optimal :-(
>> >
>> > In practice users don't appear to care strongly about the fact that these
>> > commands block the VM while they run. I might have seen one bug report
>> > about it, but it certainly isn't something that comes up as a frequent
>> > topic except among us QEMU maintainers. Users do care about having
>> > access to the snapshot feature.
>> >
>> > Where I am seeing frequent complaints is wrt the use of OVMF combined
>> > with snapshots which has some serious pain points. This is getting worse
>> > as the push to ditch legacy BIOS in favour of UEFI gain momentum both
>> > across OS vendors and mgmt apps. Solving it requires new parameters to
>> > the commands, but doing this in HMP is super unappealing.
>> >
>> > After 10 years, I think it is time for us to be a little pragmatic about
>> > our handling of snapshots commands. My desire is that libvirt should never
>> > use "human-monitor-command" under any circumstances, because of the
>> > inherant flaws in HMP as a protocol for machine consumption.
>> >
>> > Thus in this series I'm proposing a fairly direct mapping of the existing
>> > HMP commands for savevm/loadvm/delvm into QMP as a first step. This does
>> > not solve the blocking thread problem, but it does put in a place a
>> > design using the jobs framework which can facilitate solving it later.
>> > It does also solve the error reporting, type checking and introspection
>> > problems inherant to HMP. So we're winning on 3 out of the 4 problems,
>> > and pushed apps to a QMP design that will let us solve the last
>> > remaining problem.
>> >
>> > With a QMP variant, we reasonably deal with the problems related to OVMF:
>> >
>> >  - The logic to pick which disk to store the vmstate in is not
>> >    satsifactory.
>> >
>> >    The first block driver state cannot be assumed to be the root disk
>> >    image, it might be OVMF varstore and we don't want to store vmstate
>> >    in there.
>> 
>> Yes, this is one of the issues.  Glad you're addressing it.
>> 
>> >  - The logic to decide which disks must be snapshotted is hardwired
>> >    to all disks which are writable
>> >
>> >    Again with OVMF there might be a writable varstore, but this can be
>> >    raw rather than qcow2 format, and thus unable to be snapshotted.
>> >    While users might wish to snapshot their varstore, in some/many/most
>> >    cases it is entirely uneccessary. Users are blocked from snapshotting
>> >    their VM though due to this varstore.
>> 
>> Another one.  Glad again.
>> 
>> > These are solved by adding two parameters to the commands. The first is
>> > a block device node name that identifies the image to store vmstate in,
>> > and the second is a list of node names to include for the snapshots.
>> > If the list of nodes isn't given, it falls back to the historical
>> > behaviour of using all disks matching some undocumented criteria.
>> >
>> > In the block code I've only dealt with node names for block devices, as
>> > IIUC, this is all that libvirt should need in the -blockdev world it now
>> > lives in. IOW, I've made not attempt to cope with people wanting to use
>> > these QMP commands in combination with -drive args.
>> >
>> > I've done some minimal work in libvirt to start to make use of the new
>> > commands to validate their functionality, but this isn't finished yet.
>> >
>> > My ultimate goal is to make the GNOME Boxes maintainer happy again by
>> > having internal snapshots work with OVMF:
>> >
>> >   https://gitlab.gnome.org/GNOME/gnome-boxes/-/commit/c486da262f6566326fbcb5ef45c5f64048f16a6e
>> >
>> > HELP NEEDED:  this series starts to implement the approach that Kevin
>> > suggested wrto use of generic jobs.
>> 
>> Does this mean you're trying to use the jobs infrastructure?
>
> Yes, this is working now.
>
>
>> Relevant part of Message-ID: <87lh7l783q.fsf@blackfin.pond.sub.org>
>> 
>> If we can't make a sane QMP interface, I'd rather have no QMP interface.
>
> I strongly disagree with this. This absolutist position is why we
> have made zero progress in 10+ years, leaving mgmt apps suffering
> with HMP passthrough, as described above. 
>
> This is a prime example of perfect being the enemy of good.

You seem to use a different version of 'sane' than I do :)

For me, a 'sane QMP interface' does not have to be perfect.  It should

* solve a problem people have (check)

* be non-magical (clearly feasible; just a bit more work than "slap QMP
  around the existing --- and less than sane --- HMP commands")

* be as general as we can make it, given the resources (this is a matter
  of thinking through alternatives and picking a reasonable one)

* allow compatible evolution towards a more perfect interface (this is a
  matter of thinking ahead, to reduce the risk of boxing ourselves into
  a corner)

>> However, I believe we *can* make a sane QMP interface if we put in the
>> design work.
>
> We should make a credible attempt at QMP design, but if perfection
> isn't practical, we should none the less do *something* in QMP, even
> if we find we need to deprecate & replace it later.

All the prior iterations failed at "attempt", i.e. long before coming
anywhere near "credible".

>> The design work must start with a review of what we're trying to
>> accomplish, and how to fit it into the rest of the system.  Here's my
>> attempt.  Since my knowledge on snapshots is rather superficial, I'm
>> cc'ing Kevin for additional snapshot expertise.  Kevin, please correct
>> me when I talk nonsense.  I'm further cc'ing Eric and Peter for the
>> management layer perspective.
>
> The things I'm trying to accomplish as listed in the text above.
> Primarily this is about fixing the ability to snapshot guests where
> QEMU's heuristics for picking block devices is broken. ie explicitly
> list the disks to snapshot and which to store vmstate in.
>
> Converting from HMP to QMP is esentially an enabler, because adding
> new args to existing savevm/loadvm HMP commands is just too horrible
> to contemplate, as mgmt apps have no sane way to probe HMP. That's
> what QMP is for.
>
> Essentially the goal is to fix the inability to use internal snapshots
> with UEFI based VMs that is causing immediate pain for mgmt apps due
> to increased need to support UEFI.

I think I understand your goal.  Now try to understand mine: I want to
keep QMP supportable.  I'm willing to accept less than perfect
additions, but not when it's less than perfect just because people can't
be bothered to think about any other solution than the first that comes
to mind, in this case a 1:1 port of the existing HMP commands.

You've clearly put in some thought.  That's why I'm not sending you the
canned "the exact same thing has been tried before, I said no then
because of $reasons, the reasons haven't changed, and neither has my no"
letter.

>> A point-in-time snapshot of a system consists of a snapshot of its
>> machine state and snapshots of its storage.  All the snapshots need to
>> be made at the same point in time for the result to be consistent.
>> 
>> Snapshots of read-only storage carry no information and are commonly
>> omitted.
>> 
>> Isolated storage snapshots can make sense, but snapshotting the machine
>> state without also snapshotting the machine's storage doesn't sound
>> useful to me.
>> 
>> Both storage and machine state snapshots come in two flavours: internal
>> and external.
>> 
>> External ones can be made with any block backend, but internal storage
>> snapshots work only with certain formats, notably qcow2.  QMP supports
>> both kinds of storage snapshots.
>> 
>> Both kinds of storage snapshots need exclusive access while they work.
>> They're relatively quick, but the delay could be noticable for large
>> internal snapshots, and perhaps for external snapshots on really slow
>> storage.
>> 
>> Internal machine state snapshots are currently only available via HMP's
>> savevm, which integrates internal machine state and storage snapshots.
>> This is non-live, i.e. the guest is stopped while the snapshot gets
>> saved.  I figure we could make it live if we really wanted to.  Another
>> instance of the emerging background job concept.
>> 
>> On the implementation level, QCOW2 can't currently store a machine state
>> snapshot without also storing a storage snapshot.  I guess we could
>> change this if we really wanted to.
>> 
>> External machine state snapshots are basically migrate to file.
>> Supported by QMP.
>> 
>> Live migration to file is possible, but currently wastes space, because
>> memory dirtied during migration gets saved multiple times.  Could be
>> fixed either by making migration update previously saved memory instead
>> of appending (beware, random I/O), or by compacting the file afterwards.
>> 
>> Non-live migration to file doesn't waste space that way.
>> 
>> To take multiple *consistent* snapshots, you have to bundle them up in a
>> transaction.  Transactions currently support only *storage* snapshots,
>> though.
>> 
>> Work-around for external machine state snapshot: migrate to file
>> (possibly live), leaving the guest sopped on completion, take storage
>> snapshots, resume guest.
>> 
>> You can combine internal and external storage snapshots with an external
>> machine state snapshot to get a mixed system snapshot.
>> 
>> You currently can't do that with an internal machine state snapshot: the
>> only way to take one is HMP savevm, which insists on internally
>> snapshotting all writable storage, and doesn't transact together with
>> external storage snapshots.
>> 
>> Except for the case "purely internal snapshot with just one writable
>> storage device", a system snapshot consists of multiple parts stored in
>> separate files.  Tying the parts together is a management problem.  QEMU
>> provides rudimentary management for purely internal snapshots, but it's
>> flawed: missing storage isn't detected, and additional storage can creep
>> in if snapshot tags or IDs happen to match.  I guess managing the parts
>> is better left to the management layer.
>> 
>> I figure a fully general QMP interface would let you perform a system
>> snapshot by combining storage snapshots of either kind with either kind
>> of machine state snapshot.
>> 
>> We already have most of the building blocks: we can take external and
>> internal storage snapshots, and combine them in transactions.
>> 
>> What's missing is transactionable machine state snapshots.
>> 
>> We know how to work around it for external machine state snapshots (see
>> above), but a transaction-based solution might be nicer.
>> 
>> Any solution for internal machine state snapshots in QMP should at least
>> try to fit into this.  Some restrictions are okay.  For instance, we
>> probably need to restrict internal machine state snapshots to piggyback
>> on an internal storage snapshot for now, so we don't have to dig up
>> QCOW2 just to get QMP support.
>
> From the POV of practicality, making a design that unifies internal
> and external snapshots is something I'm considering out of scope.
> It increases the design time burden, as well as implementation burden.
> On my side, improving internal snapshots is a "spare time" project,
> not something I can justify spending weeks or months on.

I'm not demanding a solution that unifies internal and external
snapshots.  I'm asking for a bit of serious thought on an interface that
could compatibly evolve there.  Hours, not weeks or months.

> My goal is to implement something that is achievable in a short
> amount of time that gets us out of the hole we've been in for 10
> years. Minimal refactoring of the internal snapshot code aside
> from fixing the critical limitations we have today around choice
> of disks to snapshot.
>
> If someone later wants to come up with a grand unified design
> for everything that's fine, we can deprecate the new QMP commands
> I'm proposing now.

Failing at coming up with an interface that has a reasonable chance to
be future-proof is okay.

Not even trying is not okay.

Having not tried for 10+ years doesn't make it okay.

I believe I already did much of the design work back in 2016.  Now I'd
like you to engage with it a bit deeper than just "perfect is the enemy
of good; now take my patches, please".

Specifically, I'd like you to think about a monolithic snapshot command
(internal snapshots only by design) vs. transaction of individual
snapshot commands (design is not restricted to internal snapshots, but
we may want to accept implementation restrictions).

We already have transactionable individual storage snapshot commands.
What's missing is a transactionable machine state snapshot command.

One restriction I'd readily accept at this time is "the machine state
snapshot must write to a QCOW2 that is also internally snapshot in the
same transaction".

Now explain to me why this is impractical.

>> We can talk about more convenient interfaces for common special cases,
>> but I feel we need to design for the general case.  We don't have to
>> implement the completely general case right away, though.  As long as we
>> know where we want to go, incremental steps towards that goal are fine.
>> 
>> Can we create a transactionable QMP command to take an internal machine
>> state snapshot?
>> 
>> This would be like HMP savevm with the following differences:
>> 
>> * Separate parameters for tag and ID.  I'll have none of this
>>   overloading nonsense in QMP.
>> 
>> * Specify the destination block device.  I'll have none of this "pick a
>>   device in some magic, undocumented way" in QMP.
>> 
>> * Leave alone the other block devices.  Adding transaction actions to
>>   snapshot all the writable block devices to get a full system snapshot
>>   is the user's responsibility.
>> 
>> Limitations:
>> 
>> * No live internal machine snapshot, yet.
>> 
>> * The storage device taking the internal snapshot must also be
>>   internally snapshot for now.  In fact, the command does both
>>   (tolerable wart).
>> 
>> Open questions:
>> 
>> * Do we want the QMP command to delete existing snapshots with
>>   conflicting tag / ID, like HMP savevm does?  Or do we want it to fail
>>   the transaction?
>
> The intent is for the QMP commands to operate exclusively on
> 'tags', and never consider "ID".
>
>> * Do we need transactions for switching to a system snapshot, too?
>> 
>> Opinions?
>
>
>
> Regards,
> Daniel



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP
  2020-08-26 18:34     ` Daniel P. Berrangé
@ 2020-08-27 11:06       ` Markus Armbruster
  2020-08-27 13:13         ` Kevin Wolf
  0 siblings, 1 reply; 23+ messages in thread
From: Markus Armbruster @ 2020-08-27 11:06 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Kevin Wolf, Peter Krempa, Denis V. Lunev, qemu-block,
	Juan Quintela, qemu-devel, Dr. David Alan Gilbert,
	Pavel Dovgalyuk, Paolo Bonzini, Max Reitz, John Snow

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Wed, Aug 26, 2020 at 07:28:24PM +0100, Daniel P. Berrangé wrote:
>> On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote:
>> > Open questions:
>> > 
>> > * Do we want the QMP command to delete existing snapshots with
>> >   conflicting tag / ID, like HMP savevm does?  Or do we want it to fail
>> >   the transaction?
>> 
>> The intent is for the QMP commands to operate exclusively on
>> 'tags', and never consider "ID".
>
> I forgot that even HMP ignores "ID" now and works exclusively in terms
> of tags since:
>
>
>   commit 6ca080453ea403959ccde661030ca16264acc181
>   Author: Daniel Henrique Barboza <danielhb413@gmail.com>
>   Date:   Wed Nov 7 11:09:58 2018 -0200
>
>     block/snapshot.c: eliminate use of ID input in snapshot operations

Almost a year after I sent the memo I quoted.  It's an incompatible
change, but nobody complained, and I'm glad we got this issue out of the
way.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP
  2020-08-27 11:04     ` Markus Armbruster
@ 2020-08-27 11:34       ` Daniel P. Berrangé
  2020-09-01 13:22         ` Markus Armbruster
  0 siblings, 1 reply; 23+ messages in thread
From: Daniel P. Berrangé @ 2020-08-27 11:34 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Peter Krempa, Denis V. Lunev, qemu-block,
	Juan Quintela, qemu-devel, Dr. David Alan Gilbert,
	Pavel Dovgalyuk, Paolo Bonzini, Max Reitz, John Snow

On Thu, Aug 27, 2020 at 01:04:43PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote:
> > From the POV of practicality, making a design that unifies internal
> > and external snapshots is something I'm considering out of scope.
> > It increases the design time burden, as well as implementation burden.
> > On my side, improving internal snapshots is a "spare time" project,
> > not something I can justify spending weeks or months on.
> 
> I'm not demanding a solution that unifies internal and external
> snapshots.  I'm asking for a bit of serious thought on an interface that
> could compatibly evolve there.  Hours, not weeks or months.
> 
> > My goal is to implement something that is achievable in a short
> > amount of time that gets us out of the hole we've been in for 10
> > years. Minimal refactoring of the internal snapshot code aside
> > from fixing the critical limitations we have today around choice
> > of disks to snapshot.
> >
> > If someone later wants to come up with a grand unified design
> > for everything that's fine, we can deprecate the new QMP commands
> > I'm proposing now.
> 
> Failing at coming up with an interface that has a reasonable chance to
> be future-proof is okay.
> 
> Not even trying is not okay.

This was raised in my v1 posting:

  https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01346.html

but the conclusion was that it was a non-trivial amount of extra
implementation work.


> Specifically, I'd like you to think about monolothic snapshot command
> (internal snapshots only by design) vs. transaction of individual
> snapshot commands (design is not restricted to internal snapshots, but
> we may want to accept implementation restrictions).
> 
> We already have transactionable individual storage snapshot commands.
> What's missing is a transactionable machine state snapshot command.

At a high level I consider what I've proposed as being higher level
syntax sugar vs a more generic future impl based on multiple commands
for snapshotting disk & state. I don't think I'd claim that it will
evolve to become the design you're suggesting here, as they are designs
from different levels in the stack. IOW, I think the two would ultimately
just exist in parallel. I don't think that's a real problem from a
maint POV, as the large burden from the monolithic snapshot code is
not the HMP/QMP interface, but rather the guts of the internal
impl in the migration/savevm.c and block/snapshot.c files. That code
will exist for as long as the HMP commands exist, and adding a QMP
interface doesn't make that situation worse unless we were intending
to drop the existing HMP commands. If we did change our minds though,
we can just deprecate the QMP command at any time we like.


> One restriction I'd readily accept at this time is "the machine state
> snapshot must write to a QCOW2 that is also internally snapshot in the
> same transaction".
> 
> Now explain to me why this is impractical.

The issues were described by Kevin here:

  https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg02057.html

Assuming the migration impl is actually possible to solve, there is
still the question of actually writing it. That's a non-trivial
amount of work someone has to find time for.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP
  2020-08-27 11:06       ` Markus Armbruster
@ 2020-08-27 13:13         ` Kevin Wolf
  2020-08-28  6:20           ` Markus Armbruster
  0 siblings, 1 reply; 23+ messages in thread
From: Kevin Wolf @ 2020-08-27 13:13 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Peter Krempa, Daniel P. Berrangé,
	Denis V. Lunev, qemu-block, Juan Quintela, qemu-devel,
	Dr. David Alan Gilbert, Pavel Dovgalyuk, Paolo Bonzini,
	Max Reitz, John Snow

Am 27.08.2020 um 13:06 hat Markus Armbruster geschrieben:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Wed, Aug 26, 2020 at 07:28:24PM +0100, Daniel P. Berrangé wrote:
> >> On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote:
> >> > Open questions:
> >> > 
> >> > * Do we want the QMP command to delete existing snapshots with
> >> >   conflicting tag / ID, like HMP savevm does?  Or do we want it to fail
> >> >   the transaction?
> >> 
> >> The intent is for the QMP commands to operate exclusively on
> >> 'tags', and never consider "ID".
> >
> > I forgot that even HMP ignores "ID" now and works exclusively in terms
> > of tags since:
> >
> >
> >   commit 6ca080453ea403959ccde661030ca16264acc181
> >   Author: Daniel Henrique Barboza <danielhb413@gmail.com>
> >   Date:   Wed Nov 7 11:09:58 2018 -0200
> >
> >     block/snapshot.c: eliminate use of ID input in snapshot operations
> 
> Almost a year after I sent the memo I quoted.  It's an incompatible
> change, but nobody complained, and I'm glad we got this issue out of the
> way.

FWIW, I would have ignored any complaint about incompatible changes in
HMP. It's not supposed to be a stable API, but a UI.

Kevin



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP
  2020-08-27 13:13         ` Kevin Wolf
@ 2020-08-28  6:20           ` Markus Armbruster
  2020-08-28  8:46             ` Kevin Wolf
  0 siblings, 1 reply; 23+ messages in thread
From: Markus Armbruster @ 2020-08-28  6:20 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Krempa, Daniel P. Berrangé,
	Denis V. Lunev, qemu-block, Juan Quintela, qemu-devel,
	Dr. David Alan Gilbert, Pavel Dovgalyuk, Paolo Bonzini,
	Max Reitz, John Snow

Kevin Wolf <kwolf@redhat.com> writes:

> Am 27.08.2020 um 13:06 hat Markus Armbruster geschrieben:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>> > On Wed, Aug 26, 2020 at 07:28:24PM +0100, Daniel P. Berrangé wrote:
>> >> On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote:
>> >> > Open questions:
>> >> > 
>> >> > * Do we want the QMP command to delete existing snapshots with
>> >> >   conflicting tag / ID, like HMP savevm does?  Or do we want it to fail
>> >> >   the transaction?
>> >> 
>> >> The intent is for the QMP commands to operate exclusively on
>> >> 'tags', and never consider "ID".
>> >
>> > I forgot that even HMP ignores "ID" now and works exclusively in terms
>> > of tags since:
>> >
>> >
>> >   commit 6ca080453ea403959ccde661030ca16264acc181
>> >   Author: Daniel Henrique Barboza <danielhb413@gmail.com>
>> >   Date:   Wed Nov 7 11:09:58 2018 -0200
>> >
>> >     block/snapshot.c: eliminate use of ID input in snapshot operations
>> 
>> Almost a year after I sent the memo I quoted.  It's an incompatible
>> change, but nobody complained, and I'm glad we got this issue out of the
>> way.
>
> FWIW, I would have ignored any complaint about incompatible changes in
> HMP. It's not supposed to be a stable API, but UI.

The iffy part is actually the loss of ability to access snapshots that
lack a name.  Complaints about that would have been valid, I think.
Fortunately, there have been none.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP
  2020-08-28  6:20           ` Markus Armbruster
@ 2020-08-28  8:46             ` Kevin Wolf
  2020-08-31  7:19               ` Markus Armbruster
  0 siblings, 1 reply; 23+ messages in thread
From: Kevin Wolf @ 2020-08-28  8:46 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Peter Krempa, Daniel P. Berrangé,
	Denis V. Lunev, qemu-block, Juan Quintela, qemu-devel,
	Dr. David Alan Gilbert, Pavel Dovgalyuk, Paolo Bonzini,
	Max Reitz, John Snow

Am 28.08.2020 um 08:20 hat Markus Armbruster geschrieben:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 27.08.2020 um 13:06 hat Markus Armbruster geschrieben:
> >> Daniel P. Berrangé <berrange@redhat.com> writes:
> >> 
> >> > On Wed, Aug 26, 2020 at 07:28:24PM +0100, Daniel P. Berrangé wrote:
> >> >> On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote:
> >> >> > Open questions:
> >> >> > 
> >> >> > * Do we want the QMP command to delete existing snapshots with
> >> >> >   conflicting tag / ID, like HMP savevm does?  Or do we want it to fail
> >> >> >   the transaction?
> >> >> 
> >> >> The intent is for the QMP commands to operate exclusively on
> >> >> 'tags', and never consider "ID".
> >> >
> >> > I forgot that even HMP ignores "ID" now and works exclusively in terms
> >> > of tags since:
> >> >
> >> >
> >> >   commit 6ca080453ea403959ccde661030ca16264acc181
> >> >   Author: Daniel Henrique Barboza <danielhb413@gmail.com>
> >> >   Date:   Wed Nov 7 11:09:58 2018 -0200
> >> >
> >> >     block/snapshot.c: eliminate use of ID input in snapshot operations
> >> 
> >> Almost a year after I sent the memo I quoted.  It's an incompatible
> >> change, but nobody complained, and I'm glad we got this issue out of the
> >> way.
> >
> > FWIW, I would have ignored any complaint about incompatible changes in
> > HMP. It's not supposed to be a stable API, but UI.
> 
> The iffy part is actually the loss of ability to access snapshots that
> lack a name.  Complaints about that would have been valid, I think.
> Fortunately, there have been none.

'loadvm ""' should do the trick for these, the same way you have to
use 'savevm ""' to create them in non-prehistoric versions of QEMU.
We stopped creating snapshots with empty names by default in 0.14, so
they are probably not very relevant any more. (Versioned machine types
go back "only" to 1.0, so good luck loading a snapshot from an older
version. And I wouldn't bet money either on a 1.0 snapshot still working
with that machine type...)
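
For completeness, that is simply the existing HMP commands invoked with
an empty tag, e.g.:

  (qemu) savevm ""
  (qemu) loadvm ""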

Kevin



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP
  2020-08-28  8:46             ` Kevin Wolf
@ 2020-08-31  7:19               ` Markus Armbruster
  0 siblings, 0 replies; 23+ messages in thread
From: Markus Armbruster @ 2020-08-31  7:19 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Krempa, Daniel P. Berrangé,
	Denis V. Lunev, qemu-block, Juan Quintela, qemu-devel,
	Dr. David Alan Gilbert, Pavel Dovgalyuk, Paolo Bonzini,
	Max Reitz, John Snow

Kevin Wolf <kwolf@redhat.com> writes:

> Am 28.08.2020 um 08:20 hat Markus Armbruster geschrieben:
>> Kevin Wolf <kwolf@redhat.com> writes:
>> 
>> > Am 27.08.2020 um 13:06 hat Markus Armbruster geschrieben:
>> >> Daniel P. Berrangé <berrange@redhat.com> writes:
>> >> 
>> >> > On Wed, Aug 26, 2020 at 07:28:24PM +0100, Daniel P. Berrangé wrote:
>> >> >> On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote:
>> >> >> > Open questions:
>> >> >> > 
>> >> >> > * Do we want the QMP command to delete existing snapshots with
>> >> >> >   conflicting tag / ID, like HMP savevm does?  Or do we want it to fail
>> >> >> >   the transaction?
>> >> >> 
>> >> >> The intent is for the QMP commands to operate exclusively on
>> >> >> 'tags', and never consider "ID".
>> >> >
>> >> > I forgot that even HMP ignores "ID" now and works exclusively in terms
>> >> > of tags since:
>> >> >
>> >> >
>> >> >   commit 6ca080453ea403959ccde661030ca16264acc181
>> >> >   Author: Daniel Henrique Barboza <danielhb413@gmail.com>
>> >> >   Date:   Wed Nov 7 11:09:58 2018 -0200
>> >> >
>> >> >     block/snapshot.c: eliminate use of ID input in snapshot operations
>> >> 
>> >> Almost a year after I sent the memo I quoted.  It's an incompatible
>> >> change, but nobody complained, and I'm glad we got this issue out of the
>> >> way.
>> >
>> > FWIW, I would have ignored any complaint about incompatible changes in
>> > HMP. It's not supposed to be a stable API, but UI.
>> 
>> The iffy part is actually the loss of ability to access snapshots that
>> lack a name.  Complaints about that would have been valid, I think.
>> Fortunately, there have been none.
>
> 'loadvm ""' should do the trick for these.

As long as you have at most one.

>                                            The same way as you have to
> use with 'savevm' to create them in non-prehistoric versions of QEMU.
> We stopped creating snapshots with empty names by default in 0.14, so
> they are probably not very relevant any more. (Versioned machine types
> go back "only" to 1.0, so good luck loading a snapshot from an older
> version. And I wouldn't bet money either on a 1.0 snapshot still working
> with that machine type...)

No argument.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP
  2020-08-27 11:34       ` Daniel P. Berrangé
@ 2020-09-01 13:22         ` Markus Armbruster
  2020-09-01 13:41           ` Daniel P. Berrangé
  2020-09-01 13:58           ` Kevin Wolf
  0 siblings, 2 replies; 23+ messages in thread
From: Markus Armbruster @ 2020-09-01 13:22 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Kevin Wolf, Peter Krempa, Denis V. Lunev, qemu-block,
	Juan Quintela, qemu-devel, Dr. David Alan Gilbert,
	Pavel Dovgalyuk, Paolo Bonzini, Max Reitz, John Snow

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Thu, Aug 27, 2020 at 01:04:43PM +0200, Markus Armbruster wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>> > On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote:
>> > From the POV of practicality, making a design that unifies internal
>> > and external snapshots is something I'm considering out of scope.
>> > It increases the design time burden, as well as implementation burden.
>> > On my side, improving internal snapshots is a "spare time" project,
>> > not something I can justify spending weeks or months on.
>> 
>> I'm not demanding a solution that unifies internal and external
>> snapshots.  I'm asking for a bit of serious thought on an interface that
>> could compatibly evolve there.  Hours, not weeks or months.
>> 
>> > My goal is to implement something that is achievable in a short
>> > amount of time that gets us out of the hole we've been in for 10
>> > years. Minimal refactoring of the internal snapshot code aside
>> > from fixing the critical limitations we have today around choice
>> > of disks to snapshot.
>> >
>> > If someone later wants to come up with a grand unified design
>> > for everything that's fine, we can deprecate the new QMP commands
>> > I'm proposing now.
>> 
>> Failing at coming up with an interface that has a reasonable chance to
>> be future-proof is okay.
>> 
>> Not even trying is not okay.
>
> This was raised in my v1 posting:
>
>   https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01346.html
>
> but the conclusion was that it was a non-trivial amount of extra
> implementation work

Thanks for the pointer.  I've now read that review thread.

>> Specifically, I'd like you to think about monolothic snapshot command
>> (internal snapshots only by design) vs. transaction of individual
>> snapshot commands (design is not restricted to internal snapshots, but
>> we may want to accept implementation restrictions).
>> 
>> We already have transactionable individual storage snapshot commands.
>> What's missing is a transactionable machine state snapshot command.
>
> At a high level I consider what I've proposed as being higher level
> syntax sugar vs a more generic future impl based on multiple commands
> for snapshotting disk & state. I don't think I'd claim that it will
> evolve to become the design you're suggesting here, as they are designs
> from different levels in the stack. IOW, I think the would ultimately
> just exist in parallel. I don't think that's a real problem from a
> maint POV, as the large burden from the monolithic snapshot code is
> not the HMP/QMP interface, but rather the guts of the internal
> impl in the migration/savevm.c and block/snapshot.c files. That code
> will exist for as long as the HMP commands exist, and adding a QMP
> interface doesn't make that situation worse unless we were intending
> to drop the existing HMP commands. If we did change our minds though,
> we can just deprecate the QMP command at any time we like.
>
>
>> One restriction I'd readily accept at this time is "the machine state
>> snapshot must write to a QCOW2 that is also internally snapshot in the
>> same transaction".
>> 
>> Now explain to me why this is impractical.
>
> The issues were described by Kevin here:
>
>   https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg02057.html
>
> Assuming the migration impl is actually possible to solve, there is
> still the question of actually writing it. That's a non-trivial
> amount of work someone has to find time for.

Kevin explained how the transactionable machine state snapshot command
should be made non-blocking: post-copy.

I don't dispute that creating such a post-copy snapshot is a non-trivial
task.  It is out of reach for you and me.  I didn't actually ask for it,
though.

You argue that providing a blocking snapshot in QMP is better than
nothing, and good enough for quite a few applications.  I agree!  I
blocked prior attempts at porting HMP's savevm/loadvm to QMP not because
they were blocking, but because they stuck to the HMP interface, and the
HMP interface is bonkers.  I would accept the restriction "snapshotting
machine state is blocking, i.e. it stops the machine."  As I wrote in
2016, "Limitations: No live internal machine snapshot, yet."

Aside: unless I'm mistaken, taking an internal block device snapshot is
also blocking, but unlike taking a machine state snapshot, it's fast
enough for the blocking not to matter.  That's the "sync" in
blockdev-snapshot-internal-sync.

I asked you to consider the interface design I proposed back in 2016.
You point out above that your interface is more high-level, and could be
turned into sugar for a low level interface.

If true, this means your proposal doesn't box us into a corner, which is
good.

Let me elaborate a bit on the desugaring, just to make sure we're on the
same page.  Please correct me where I'm talking nonsense.

snapshot-save creates a job that snapshots a set of block devices and the
machine state.  The snapshots are consistent, i.e. they are all taken at
the same point in time.  The block device snapshots are all internal.
The machine state snapshot is saved together with one of the (internal)
block device snapshots.

This is basically a transaction of blockdev-snapshot-internal-sync
(which exists) plus machine-snapshot-internal-sync (which doesn't exist)
wrapped in a job.

Likewise for snapshot-load, except that there, not even the command for
block snapshots exists.

I doubt creating the (transactionable, but blocking) low-level commands
is actually out of reach.  It's a matter of adding interfaces to parts
of the code you already got working.

I'm not demanding you do that, though.  As I said, my chief concerns are
keeping "bonkers" out of QMP, and not boxing us in needlessly.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP
  2020-09-01 13:22         ` Markus Armbruster
@ 2020-09-01 13:41           ` Daniel P. Berrangé
  2020-09-01 13:58           ` Kevin Wolf
  1 sibling, 0 replies; 23+ messages in thread
From: Daniel P. Berrangé @ 2020-09-01 13:41 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Peter Krempa, Denis V. Lunev, qemu-block,
	Juan Quintela, qemu-devel, Dr. David Alan Gilbert,
	Pavel Dovgalyuk, Paolo Bonzini, Max Reitz, John Snow

On Tue, Sep 01, 2020 at 03:22:24PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Thu, Aug 27, 2020 at 01:04:43PM +0200, Markus Armbruster wrote:
> >> Daniel P. Berrangé <berrange@redhat.com> writes:
> >> 
> >> > On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote:
> >> > From the POV of practicality, making a design that unifies internal
> >> > and external snapshots is something I'm considering out of scope.
> >> > It increases the design time burden, as well as implementation burden.
> >> > On my side, improving internal snapshots is a "spare time" project,
> >> > not something I can justify spending weeks or months on.
> >> 
> >> I'm not demanding a solution that unifies internal and external
> >> snapshots.  I'm asking for a bit of serious thought on an interface that
> >> could compatibly evolve there.  Hours, not weeks or months.
> >> 
> >> > My goal is to implement something that is achievable in a short
> >> > amount of time that gets us out of the hole we've been in for 10
> >> > years. Minimal refactoring of the internal snapshot code aside
> >> > from fixing the critical limitations we have today around choice
> >> > of disks to snapshot.
> >> >
> >> > If someone later wants to come up with a grand unified design
> >> > for everything that's fine, we can deprecate the new QMP commands
> >> > I'm proposing now.
> >> 
> >> Failing at coming up with an interface that has a reasonable chance to
> >> be future-proof is okay.
> >> 
> >> Not even trying is not okay.
> >
> > This was raised in my v1 posting:
> >
> >   https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01346.html
> >
> > but the conclusion was that it was a non-trivial amount of extra
> > implementation work
> 
> Thanks for the pointer.  I've now read that review thread.
> 
> >> Specifically, I'd like you to think about monolothic snapshot command
> >> (internal snapshots only by design) vs. transaction of individual
> >> snapshot commands (design is not restricted to internal snapshots, but
> >> we may want to accept implementation restrictions).
> >> 
> >> We already have transactionable individual storage snapshot commands.
> >> What's missing is a transactionable machine state snapshot command.
> >
> > At a high level I consider what I've proposed as being higher level
> > syntax sugar vs a more generic future impl based on multiple commands
> > for snapshotting disk & state. I don't think I'd claim that it will
> > evolve to become the design you're suggesting here, as they are designs
> > from different levels in the stack. IOW, I think the would ultimately
> > just exist in parallel. I don't think that's a real problem from a
> > maint POV, as the large burden from the monolithic snapshot code is
> > not the HMP/QMP interface, but rather the guts of the internal
> > impl in the migration/savevm.c and block/snapshot.c files. That code
> > will exist for as long as the HMP commands exist, and adding a QMP
> > interface doesn't make that situation worse unless we were intending
> > to drop the existing HMP commands. If we did change our minds though,
> > we can just deprecate the QMP command at any time we like.
> >
> >
> >> One restriction I'd readily accept at this time is "the machine state
> >> snapshot must write to a QCOW2 that is also internally snapshot in the
> >> same transaction".
> >> 
> >> Now explain to me why this is impractical.
> >
> > The issues were described by Kevin here:
> >
> >   https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg02057.html
> >
> > Assuming the migration impl is actually possible to solve, there is
> > still the question of actually writing it. That's a non-trivial
> > amount of work someone has to find time for.
> 
> Kevin explained how the transactionable machine state snapshot command
> should be made non-blocking: post-copy.
> 
> I don't dispute that creating such a post-copy snapshot is a non-trivial
> task.  It is out of reach for you and me.  I didn't actually ask for it,
> though.
> 
> You argue that providing a blocking snapshot in QMP is better than
> nothing, and good enough for quite a few applications.  I agree!  I
> blocked prior attempts at porting HMP's savevm/loadvm to QMP not because
> they were blocking, but because they stuck to the HMP interface, and the
> HMP interface is bonkers.  I would accept the restriction "snapshotting
> machine state is blocking, i.e. it stops the machine."  As I wrote in
> 2016, "Limitations: No live internal machine snapshot, yet."

FYI, when I documented the new QAPI commands I implemented, I chose to
*not* say that the snapshot is blocking. Instead I said:

  # Applications should not assume that the snapshot load is complete
  # when this command returns. Completion is indicated by the job
  # status. Clients can wait for the JOB_STATUS_CHANGE event. If the
  # job aborts, errors can be obtained via the 'query-jobs' command,
  # though.

The idea was that if we later fix these QAPI commands to not block, we
minimize the risk of breaking clients by discouraging them from assuming
the impl will always be blocking. IOW, clients should treat the commands
as asynchronous right now, even though they are not.

> Aside: unless I'm mistaken, taking an internal block device snapshot is
> also blocking, but unlike taking a machine state snapshot, it's fast
> enough for the blocking not to matter.  That's the "sync" in
> blockdev-snapshot-internal-sync.


> Let me elaborate a bit on the desugaring, just to make sure we're on the
> same page.  Please correct me where I'm talking nonsense.
> 
> snapshot-save creates a job that snapshots a set of block devices and the
> machine state.  The snapshots are consistent, i.e. they are all taken at
> the same point in time.  The block device snapshots are all internal.
> The machine state snapshot is saved together with one of the (internal)
> block device snapshots.

Yep

> This is basically a transaction of blockdev-snapshot-internal-sync
> (which exists) plus machine-snapshot-internal-sync (which doesn't exist)
> wrapped in a job.

Yes, except it isn't clear to me whether you can separate out the
functionality into two separate commands. It might be necessary
to save the vmstate at the same time as the disk snapshot. In such
a case, instead of machine-snapshot-internal-sync, we might end up
having an "include vmstate" bool option on blockdev-snapshot-internal-sync.
Either way though, we'd end up with a series of commands inside a
transaction.
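
Purely to illustrate the shape of that desugaring: the "transaction" and
"blockdev-snapshot-internal-sync" commands below exist today, while
"machine-snapshot-internal-sync" and its arguments are hypothetical, as
are the device and snapshot names:

  -> { "execute": "transaction",
       "arguments": {
         "actions": [
           { "type": "blockdev-snapshot-internal-sync",
             "data": { "device": "disk0", "name": "my-snapshot" } },
           { "type": "blockdev-snapshot-internal-sync",
             "data": { "device": "disk1", "name": "my-snapshot" } },
           { "type": "machine-snapshot-internal-sync",
             "data": { "device": "disk0", "name": "my-snapshot" } }
         ]
       }
     }
  <- { "return": {} }

The alternative would drop the third action and instead add the
"include vmstate" bool option I mention above to one of the
blockdev-snapshot-internal-sync actions. Either way the vmstate write
has to be coordinated with the disk snapshots, which is the part that
isn't obviously separable.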

> Likewise for snapshot-load, except there not even the command for block
> snapshots exists.
> 
> I doubt creating the (transactionable, but blocking) low-level commands
> is actually out of reach.  It's a matter of adding interfaces to parts
> of the code you already got working.

If we're splitting it up into one command per disk, and another command
for vmstate, then it will require refactoring the current migration/savevm.c
and block/snapshot.c code AFAICT, because it's all written around the idea
of processing all disks at the same time.

> I'm not demanding you do that, though.  As I said, my chief concerns are
> keeping "bonkers" out of QMP, and not boxing us in needlessly.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP
  2020-09-01 13:22         ` Markus Armbruster
  2020-09-01 13:41           ` Daniel P. Berrangé
@ 2020-09-01 13:58           ` Kevin Wolf
  1 sibling, 0 replies; 23+ messages in thread
From: Kevin Wolf @ 2020-09-01 13:58 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Peter Krempa, Daniel P. Berrangé,
	Denis V. Lunev, qemu-block, Juan Quintela, qemu-devel,
	Dr. David Alan Gilbert, Pavel Dovgalyuk, Paolo Bonzini,
	Max Reitz, John Snow

On 01.09.2020 at 15:22, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Thu, Aug 27, 2020 at 01:04:43PM +0200, Markus Armbruster wrote:
> >> Daniel P. Berrangé <berrange@redhat.com> writes:
> >> 
> >> > On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote:
> >> > From the POV of practicality, making a design that unifies internal
> >> > and external snapshots is something I'm considering out of scope.
> >> > It increases the design time burden, as well as implementation burden.
> >> > On my side, improving internal snapshots is a "spare time" project,
> >> > not something I can justify spending weeks or months on.
> >> 
> >> I'm not demanding a solution that unifies internal and external
> >> snapshots.  I'm asking for a bit of serious thought on an interface that
> >> could compatibly evolve there.  Hours, not weeks or months.
> >> 
> >> > My goal is to implement something that is achievable in a short
> >> > amount of time that gets us out of the hole we've been in for 10
> >> > years. Minimal refactoring of the internal snapshot code aside
> >> > from fixing the critical limitations we have today around choice
> >> > of disks to snapshot.
> >> >
> >> > If someone later wants to come up with a grand unified design
> >> > for everything that's fine, we can deprecate the new QMP commands
> >> > I'm proposing now.
> >> 
> >> Failing at coming up with an interface that has a reasonable chance to
> >> be future-proof is okay.
> >> 
> >> Not even trying is not okay.
> >
> > This was raised in my v1 posting:
> >
> >   https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01346.html
> >
> > but the conclusion was that it was a non-trivial amount of extra
> > implementation work
> 
> Thanks for the pointer.  I've now read that review thread.
> 
> >> Specifically, I'd like you to think about monolithic snapshot command
> >> (internal snapshots only by design) vs. transaction of individual
> >> snapshot commands (design is not restricted to internal snapshots, but
> >> we may want to accept implementation restrictions).
> >> 
> >> We already have transactionable individual storage snapshot commands.
> >> What's missing is a transactionable machine state snapshot command.
> >
> > At a high level I consider what I've proposed as being higher level
> > syntax sugar vs a more generic future impl based on multiple commands
> > for snapshotting disk & state. I don't think I'd claim that it will
> > evolve to become the design you're suggesting here, as they are designs
> > from different levels in the stack. IOW, I think they would ultimately
> > just exist in parallel. I don't think that's a real problem from a
> > maint POV, as the large burden from the monolithic snapshot code is
> > not the HMP/QMP interface, but rather the guts of the internal
> > impl in the migration/savevm.c and block/snapshot.c files. That code
> > will exist for as long as the HMP commands exist, and adding a QMP
> > interface doesn't make that situation worse unless we were intending
> > to drop the existing HMP commands. If we did change our minds though,
> > we can just deprecate the QMP command at any time we like.
> >
> >
> >> One restriction I'd readily accept at this time is "the machine state
> >> snapshot must write to a QCOW2 that is also internally snapshot in the
> >> same transaction".
> >> 
> >> Now explain to me why this is impractical.
> >
> > The issues were described by Kevin here:
> >
> >   https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg02057.html
> >
> > Assuming the migration impl is actually possible to solve, there is
> > still the question of actually writing it. That's a non-trivial
> > amount of work someone has to find time for.
> 
> Kevin explained how the transactionable machine state snapshot command
> should be made non-blocking: post-copy.
> 
> I don't dispute that creating such a post-copy snapshot is a non-trivial
> task.  It is out of reach for you and me.  I didn't actually ask for it,
> though.
> 
> You argue that providing a blocking snapshot in QMP is better than
> nothing, and good enough for quite a few applications.  I agree!  I
> blocked prior attempts at porting HMP's savevm/loadvm to QMP not because
> they were blocking, but because they stuck to the HMP interface, and the
> HMP interface is bonkers.  I would accept the restriction "snapshotting
> machine state is blocking, i.e. it stops the machine."  As I wrote in
> 2016, "Limitations: No live internal machine snapshot, yet."
> 
> Aside: unless I'm mistaken, taking an internal block device snapshot is
> also blocking, but unlike taking a machine state snapshot, it's fast
> enough for the blocking not to matter.  That's the "sync" in
> blockdev-snapshot-internal-sync.
> 
> I asked you to consider the interface design I proposed back in 2016.
> You point out above that your interface is more high-level, and could be
> turned into sugar for a low level interface.
> 
> If true, this means your proposal doesn't box us into a corner, which is
> good.
> 
> Let me elaborate a bit on the desugaring, just to make sure we're on the
> same page.  Please correct me where I'm talking nonsense.
> 
> snapshot-save creates a job that snapshots a set of block devices and the
> machine state.  The snapshots are consistent, i.e. they are all taken at
> the same point in time.  The block device snapshots are all internal.
> The machine state snapshot is saved together with one of the (internal)
> block device snapshots.
> 
> This is basically a transaction of blockdev-snapshot-internal-sync
> (which exists) plus machine-snapshot-internal-sync (which doesn't exist)
> wrapped in a job.
> 
> Likewise for snapshot-load, except there not even the command for block
> snapshots exists.
> 
> I doubt creating the (transactionable, but blocking) low-level commands
> is actually out of reach.  It's a matter of adding interfaces to parts
> of the code you already got working.
> 
> I'm not demanding you do that, though.  As I said, my chief concerns are
> keeping "bonkers" out of QMP, and not boxing us in needlessly.

I doubt this is as easy as it may seem at first sight. To remind
everyone, the way internal snapshots with VM state work today is:

1. Write VM state to VM state area inside the image (in qcow2, this is
   essentially the same as disk content, except at offsets higher than
   the image size).

2. Create the qcow2 snapshot, which means that the current disk content
   (including the VM state at higher offsets) becomes COW and the
   snapshotted copy becomes read-only.

3. Discard the VM state in the active layer again; we don't need it
   there.

The implication is that either 1. has to be completed before 2. can
happen, or 1. must be able to write into an already taken snapshot
rather than into the active layer (which in turn implies that the
snapshot creation in 2. must already have completed).

So a naive implementation with a transaction might not give the right
result. These are not independent operations; some ordering between them
is required.

I feel having a single block job that covers both parts gives us more
flexibility in how we want to synchronise writing the VM state and
taking the disk snapshot - or in your words, avoids boxing us into a
corner.

Kevin




end of thread

Thread overview: 23+ messages
2020-07-27 15:08 [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP Daniel P. Berrangé
2020-07-27 15:08 ` [PATCH v2 1/6] migration: improve error reporting of block driver state name Daniel P. Berrangé
2020-08-12 10:19   ` Dr. David Alan Gilbert
2020-07-27 15:08 ` [PATCH v2 2/6] block: push error reporting into bdrv_all_*_snapshot functions Daniel P. Berrangé
2020-07-27 15:08 ` [PATCH v2 3/6] migration: stop returning errno from load_snapshot() Daniel P. Berrangé
2020-08-12 10:21   ` Dr. David Alan Gilbert
2020-07-27 15:08 ` [PATCH v2 4/6] block: add ability to specify list of blockdevs during snapshot Daniel P. Berrangé
2020-07-27 15:08 ` [PATCH v2 5/6] block: allow specifying name of block device for vmstate storage Daniel P. Berrangé
2020-07-27 15:08 ` [PATCH v2 6/6] migration: introduce snapshot-{save, load, delete} QMP commands Daniel P. Berrangé
2020-08-21 16:27 ` [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP Daniel P. Berrangé
2020-08-26 15:52 ` Markus Armbruster
2020-08-26 18:28   ` Daniel P. Berrangé
2020-08-26 18:34     ` Daniel P. Berrangé
2020-08-27 11:06       ` Markus Armbruster
2020-08-27 13:13         ` Kevin Wolf
2020-08-28  6:20           ` Markus Armbruster
2020-08-28  8:46             ` Kevin Wolf
2020-08-31  7:19               ` Markus Armbruster
2020-08-27 11:04     ` Markus Armbruster
2020-08-27 11:34       ` Daniel P. Berrangé
2020-09-01 13:22         ` Markus Armbruster
2020-09-01 13:41           ` Daniel P. Berrangé
2020-09-01 13:58           ` Kevin Wolf
