* [PATCH V1 00/32] Live Update
@ 2020-07-30 15:14 Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 01/32] savevm: add vmstate handler iterators Steve Sistare
                   ` (35 more replies)
  0 siblings, 36 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Improve and extend the qemu functions that save and restore VM state so a
guest may be suspended and resumed with minimal pause time.  qemu may be
updated to a new version in between.

The first set of patches adds the cprsave and cprload commands to save and
restore VM state, and allow the host kernel to be updated and rebooted in
between.  The VM must create guest RAM in a persistent shared memory file,
such as /dev/dax0.0 or persistent /dev/shm PKRAM as proposed in
https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com/

cprsave stops the VCPUs and saves VM device state in a simple file, and
thus supports any type of guest image and block device.  The caller must
not modify the VM's block devices between cprsave and cprload.

cprsave and cprload support guests with vfio devices if the caller first
suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
The guest drivers' suspend methods flush outstanding requests and
re-initialize the devices, and thus there is no device state to save
and restore.

   1 savevm: add vmstate handler iterators
   2 savevm: VM handlers mode mask
   3 savevm: QMP command for cprsave
   4 savevm: HMP Command for cprsave
   5 savevm: QMP command for cprload
   6 savevm: HMP Command for cprload
   7 savevm: QMP command for cprinfo
   8 savevm: HMP command for cprinfo
   9 savevm: prevent cprsave if memory is volatile
  10 kvmclock: restore paused KVM clock
  11 cpu: disable ticks when suspended
  12 vl: pause option
  13 gdbstub: gdb support for suspended state

The next patches add a restart method that eliminates the persistent memory
constraint, and allows qemu to be updated across the restart, but does not
allow host reboot.  Anonymous memory segments used by the guest are
preserved across a re-exec of qemu, mapped at the same VA, via a proposed
madvise(MADV_DOEXEC) option in the Linux kernel.  See
https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yznaga@oracle.com/

  14 savevm: VMS_RESTART and cprsave restart
  15 vl: QEMU_START_FREEZE env var
  16 oslib: add qemu_clr_cloexec
  17 util: env var helpers
  18 osdep: import MADV_DOEXEC
  19 memory: ram_block_add cosmetic changes
  20 vl: add helper to request re-exec
  21 exec, memory: exec(3) to restart
  22 char: qio_channel_socket_accept reuse fd
  23 char: save/restore chardev socket fds
  24 ui: save/restore vnc socket fds
  25 char: save/restore chardev pty fds
  26 monitor: save/restore QMP negotiation status
  27 vhost: reset vhost devices upon cprsave
  28 char: restore terminal on restart
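
The restart method depends on descriptors (chardev sockets, vnc sockets,
ptys) staying open across the re-exec.  A minimal sketch of the POSIX
mechanism this builds on, not the code from patches 16 and 17: the helper
names clear_cloexec(), publish_fd() and lookup_fd() are hypothetical, and
handing the descriptor number to the new image through an environment
variable is an assumption made for illustration only.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: clear FD_CLOEXEC so that fd stays open across
 * exec(3).  Returns 0 on success, -1 on error (errno is set by fcntl).
 */
static int clear_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0 ? -1 : 0;
}

/*
 * Hypothetical helper: record fd under an environment variable chosen by
 * the caller, so the re-exec'd image can find the descriptor again.
 */
static int publish_fd(const char *name, int fd)
{
    char buf[16];

    assert(fd >= 0);
    snprintf(buf, sizeof(buf), "%d", fd);
    return setenv(name, buf, 1);
}

/* Hypothetical helper: return the recorded fd, or -1 if absent/malformed. */
static int lookup_fd(const char *name)
{
    const char *val = getenv(name);
    char *end;
    long fd;

    if (!val || !*val) {
        return -1;
    }
    fd = strtol(val, &end, 10);
    return (*end == '\0' && fd >= 0 && fd <= INT_MAX) ? (int)fd : -1;
}
```

A caller would run clear_cloexec() and publish_fd() on each descriptor it
wants to keep just before exec, and lookup_fd() early in the new image.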

The next patches extend the restart method to save and restore vfio-pci
state, eliminating the requirement for a guest agent.  The vfio container,
group, and device descriptors are preserved across the qemu re-exec.

  29 pci: export pci_update_mappings
  30 vfio-pci: save and restore
  31 vfio-pci: trace pci config
  32 vfio-pci: improved tracing

Here is an example of updating qemu from v4.2.0 to v4.2.1 using
"cprsave restart".  The software update is performed while the guest is
running to minimize downtime.

window 1				| window 2
					|
# qemu-system-x86_64 ... 		|
QEMU 4.2.0 monitor - type 'help' ...	|
(qemu) info status			|
VM status: running			|
					| # yum update qemu
(qemu) cprsave /tmp/qemu.sav restart	|
QEMU 4.2.1 monitor - type 'help' ...	|
(qemu) info status			|
VM status: paused (prelaunch)		|
(qemu) cprload /tmp/qemu.sav		|
(qemu) info status			|
VM status: running			|


Here is an example of updating the host kernel using "cprsave reboot".

window 1					| window 2
						|
# qemu-system-x86_64 ...mem-path=/dev/dax0.0 ...|
QEMU 4.2.1 monitor - type 'help' ...		|
(qemu) info status				|
VM status: running				|
						| # yum update kernel-uek
(qemu) cprsave /tmp/qemu.sav reboot		|
						|
# systemctl kexec				|
kexec_core: Starting new kernel			|
...						|
						|
# qemu-system-x86_64 ...mem-path=/dev/dax0.0 ...|
QEMU 4.2.1 monitor - type 'help' ...		|
(qemu) info status				|
VM status: paused (prelaunch)			|
(qemu) cprload /tmp/qemu.sav			|
(qemu) info status				|
VM status: running				|


Mark Kanda (5):
  char: qio_channel_socket_accept reuse fd
  char: save/restore chardev socket fds
  ui: save/restore vnc socket fds
  monitor: save/restore QMP negotiation status
  vhost: reset vhost devices upon cprsave

Steve Sistare (27):
  savevm: add vmstate handler iterators
  savevm: VM handlers mode mask
  savevm: QMP command for cprsave
  savevm: HMP Command for cprsave
  savevm: QMP command for cprload
  savevm: HMP Command for cprload
  savevm: QMP command for cprinfo
  savevm: HMP command for cprinfo
  savevm: prevent cprsave if memory is volatile
  kvmclock: restore paused KVM clock
  cpu: disable ticks when suspended
  vl: pause option
  gdbstub: gdb support for suspended state
  savevm: VMS_RESTART and cprsave restart
  vl: QEMU_START_FREEZE env var
  oslib: add qemu_clr_cloexec
  util: env var helpers
  osdep: import MADV_DOEXEC
  memory: ram_block_add cosmetic changes
  vl: add helper to request re-exec
  exec, memory: exec(3) to restart
  char: save/restore chardev pty fds
  char: restore terminal on restart
  pci: export pci_update_mappings
  vfio-pci: save and restore
  vfio-pci: trace pci config
  vfio-pci: improved tracing

 MAINTAINERS                    |   7 ++
 accel/kvm/kvm-all.c            |   8 +-
 accel/kvm/trace-events         |   3 +-
 chardev/char-pty.c             |  38 +++++--
 chardev/char-socket.c          |  35 ++++++
 chardev/char-stdio.c           |   7 ++
 chardev/char.c                 |  16 +++
 exec.c                         |  88 +++++++++++++--
 gdbstub.c                      |  11 +-
 hmp-commands.hx                |  46 ++++++++
 hw/i386/kvm/clock.c            |   6 +-
 hw/pci/msix.c                  |   1 +
 hw/pci/pci.c                   |  17 +--
 hw/pci/trace-events            |   5 +-
 hw/vfio/common.c               | 115 ++++++++++++++++----
 hw/vfio/pci.c                  | 179 ++++++++++++++++++++++++++++++-
 hw/vfio/platform.c             |   2 +-
 hw/vfio/trace-events           |  11 +-
 hw/virtio/vhost.c              |  12 +++
 include/chardev/char.h         |   8 ++
 include/exec/memory.h          |   4 +
 include/hw/pci/pci.h           |   2 +
 include/hw/vfio/vfio-common.h  |   4 +-
 include/io/channel-socket.h    |   3 +-
 include/migration/register.h   |   3 +
 include/migration/vmstate.h    |  11 ++
 include/monitor/hmp.h          |   3 +
 include/qemu/cutils.h          |   1 +
 include/qemu/env.h             |  31 ++++++
 include/qemu/osdep.h           |   8 ++
 include/sysemu/sysemu.h        |  10 ++
 io/channel-socket.c            |  12 ++-
 io/net-listener.c              |   4 +-
 migration/block.c              |   1 +
 migration/migration.c          |   4 +-
 migration/ram.c                |   1 +
 migration/savevm.c             | 237 ++++++++++++++++++++++++++++++++++++-----
 migration/savevm.h             |   4 +-
 monitor/hmp-cmds.c             |  28 +++++
 monitor/qmp-cmds.c             |  16 +++
 monitor/qmp.c                  |  42 ++++++++
 qapi/migration.json            |  35 ++++++
 qapi/pragma.json               |   1 +
 qemu-options.hx                |   9 ++
 scsi/qemu-pr-helper.c          |   2 +-
 softmmu/vl.c                   |  65 ++++++++++-
 tests/qtest/tpm-emu.c          |   2 +-
 tests/test-char.c              |   2 +-
 tests/test-io-channel-socket.c |   4 +-
 trace-events                   |   2 +
 ui/vnc.c                       | 153 +++++++++++++++++++++-----
 util/Makefile.objs             |   2 +-
 util/env.c                     | 132 +++++++++++++++++++++++
 util/oslib-posix.c             |   9 ++
 util/oslib-win32.c             |   4 +
 55 files changed, 1331 insertions(+), 135 deletions(-)
 create mode 100644 include/qemu/env.h
 create mode 100644 util/env.c

-- 
1.8.3.1




* [PATCH V1 01/32] savevm: add vmstate handler iterators
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 02/32] savevm: VM handlers mode mask Steve Sistare
                   ` (34 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Provide the SAVEVM_FOREACH and SAVEVM_FORALL macros to loop over all save
VM state handlers.  The former will filter handlers based on the operation
in the later patch "savevm: VM handlers mode mask".  The latter loops over
all handlers.

No functional change.
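
For readers unfamiliar with the pattern, here is a minimal standalone
sketch of wrapping a list walk in a FORALL macro and building a filtering
FOREACH on top of it.  It is not the QEMU code: struct handler, the
'enabled' flag, and all macro and function names are hypothetical
stand-ins, and a hand-rolled singly linked list replaces QTAILQ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SaveStateEntry; 'enabled' is an assumed filter. */
struct handler {
    const char *name;
    int enabled;
    struct handler *next;
};

static struct handler *handlers;    /* head of the list */

/* Visit every handler unconditionally. */
#define HANDLER_FORALL(h) \
    for ((h) = handlers; (h) != NULL; (h) = (h)->next)

/*
 * Visit only handlers that pass the filter.  The trailing 'if' binds to
 * the statement that follows the macro, so callers write a normal body.
 */
#define HANDLER_FOREACH(h) \
    HANDLER_FORALL(h)      \
        if ((h)->enabled)

/* Push a handler onto the front of the list. */
static void push_handler(struct handler *h)
{
    assert(h != NULL);
    h->next = handlers;
    handlers = h;
}

static int count_all(void)
{
    struct handler *h;
    int n = 0;

    HANDLER_FORALL(h) {
        n++;
    }
    return n;
}

static int count_enabled(void)
{
    struct handler *h;
    int n = 0;

    HANDLER_FOREACH(h) {
        n++;
    }
    return n;
}
```

One caveat of the trailing-'if' form: an 'else' written directly after the
loop body would pair with the hidden 'if', so the body should not be
followed by 'else'.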

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 migration/savevm.c | 57 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 38 insertions(+), 19 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 45c9dd9..a07fcad 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -266,6 +266,25 @@ static SaveState savevm_state = {
     .global_section_id = 0,
 };
 
+/*
+ * The FOREACH macros will filter handlers based on the current operation when
+ * additional conditions are added in a subsequent patch.
+ */
+
+#define SAVEVM_FOREACH(se, entry)                                    \
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry)                \
+
+#define SAVEVM_FOREACH_SAFE(se, entry, new_se)                       \
+    QTAILQ_FOREACH_SAFE(se, &savevm_state.handlers, entry, new_se)   \
+
+/* The FORALL macros unconditionally loop over all handlers. */
+
+#define SAVEVM_FORALL(se, entry)                                     \
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry)
+
+#define SAVEVM_FORALL_SAFE(se, entry, new_se)                        \
+    QTAILQ_FOREACH_SAFE(se, &savevm_state.handlers, entry, new_se)
+
 static bool should_validate_capability(int capability)
 {
     assert(capability >= 0 && capability < MIGRATION_CAPABILITY__MAX);
@@ -673,7 +692,7 @@ static uint32_t calculate_new_instance_id(const char *idstr)
     SaveStateEntry *se;
     uint32_t instance_id = 0;
 
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+    SAVEVM_FORALL(se, entry) {
         if (strcmp(idstr, se->idstr) == 0
             && instance_id <= se->instance_id) {
             instance_id = se->instance_id + 1;
@@ -689,7 +708,7 @@ static int calculate_compat_instance_id(const char *idstr)
     SaveStateEntry *se;
     int instance_id = 0;
 
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+    SAVEVM_FORALL(se, entry) {
         if (!se->compat) {
             continue;
         }
@@ -803,7 +822,7 @@ void unregister_savevm(VMStateIf *obj, const char *idstr, void *opaque)
     }
     pstrcat(id, sizeof(id), idstr);
 
-    QTAILQ_FOREACH_SAFE(se, &savevm_state.handlers, entry, new_se) {
+    SAVEVM_FORALL_SAFE(se, entry, new_se) {
         if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) {
             savevm_state_handler_remove(se);
             g_free(se->compat);
@@ -867,7 +886,7 @@ void vmstate_unregister(VMStateIf *obj, const VMStateDescription *vmsd,
 {
     SaveStateEntry *se, *new_se;
 
-    QTAILQ_FOREACH_SAFE(se, &savevm_state.handlers, entry, new_se) {
+    SAVEVM_FORALL_SAFE(se, entry, new_se) {
         if (se->vmsd == vmsd && se->opaque == opaque) {
             savevm_state_handler_remove(se);
             g_free(se->compat);
@@ -1119,7 +1138,7 @@ bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
 
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+    SAVEVM_FORALL(se, entry) {
         if (se->vmsd && se->vmsd->unmigratable) {
             error_setg(errp, "State blocked by non-migratable device '%s'",
                        se->idstr);
@@ -1145,7 +1164,7 @@ bool qemu_savevm_state_guest_unplug_pending(void)
 {
     SaveStateEntry *se;
 
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+    SAVEVM_FOREACH(se, entry) {
         if (se->vmsd && se->vmsd->dev_unplug_pending &&
             se->vmsd->dev_unplug_pending(se->opaque)) {
             return true;
@@ -1162,7 +1181,7 @@ void qemu_savevm_state_setup(QEMUFile *f)
     int ret;
 
     trace_savevm_state_setup();
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+    SAVEVM_FOREACH(se, entry) {
         if (!se->ops || !se->ops->save_setup) {
             continue;
         }
@@ -1193,7 +1212,7 @@ int qemu_savevm_state_resume_prepare(MigrationState *s)
 
     trace_savevm_state_resume_prepare();
 
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+    SAVEVM_FOREACH(se, entry) {
         if (!se->ops || !se->ops->resume_prepare) {
             continue;
         }
@@ -1223,7 +1242,7 @@ int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy)
     int ret = 1;
 
     trace_savevm_state_iterate();
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+    SAVEVM_FOREACH(se, entry) {
         if (!se->ops || !se->ops->save_live_iterate) {
             continue;
         }
@@ -1291,7 +1310,7 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
     SaveStateEntry *se;
     int ret;
 
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+    SAVEVM_FOREACH(se, entry) {
         if (!se->ops || !se->ops->save_live_complete_postcopy) {
             continue;
         }
@@ -1324,7 +1343,7 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
     SaveStateEntry *se;
     int ret;
 
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+    SAVEVM_FOREACH(se, entry) {
         if (!se->ops ||
             (in_postcopy && se->ops->has_postcopy &&
              se->ops->has_postcopy(se->opaque)) ||
@@ -1366,7 +1385,7 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
     vmdesc = qjson_new();
     json_prop_int(vmdesc, "page_size", qemu_target_page_size());
     json_start_array(vmdesc, "devices");
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+    SAVEVM_FOREACH(se, entry) {
 
         if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
             continue;
@@ -1476,7 +1495,7 @@ void qemu_savevm_state_pending(QEMUFile *f, uint64_t threshold_size,
     *res_postcopy_only = 0;
 
 
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+    SAVEVM_FOREACH(se, entry) {
         if (!se->ops || !se->ops->save_live_pending) {
             continue;
         }
@@ -1501,7 +1520,7 @@ void qemu_savevm_state_cleanup(void)
     }
 
     trace_savevm_state_cleanup();
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+    SAVEVM_FOREACH(se, entry) {
         if (se->ops && se->ops->save_cleanup) {
             se->ops->save_cleanup(se->opaque);
         }
@@ -1580,7 +1599,7 @@ int qemu_save_device_state(QEMUFile *f)
     }
     cpu_synchronize_all_states();
 
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+    SAVEVM_FOREACH(se, entry) {
         int ret;
 
         if (se->is_ram) {
@@ -1612,7 +1631,7 @@ static SaveStateEntry *find_se(const char *idstr, uint32_t instance_id)
 {
     SaveStateEntry *se;
 
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+    SAVEVM_FORALL(se, entry) {
         if (!strcmp(se->idstr, idstr) &&
             (instance_id == se->instance_id ||
              instance_id == se->alias_id))
@@ -2334,7 +2353,7 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
     }
 
     trace_qemu_loadvm_state_section_partend(section_id);
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+    SAVEVM_FOREACH(se, entry) {
         if (se->load_section_id == section_id) {
             break;
         }
@@ -2400,7 +2419,7 @@ static int qemu_loadvm_state_setup(QEMUFile *f)
     int ret;
 
     trace_loadvm_state_setup();
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+    SAVEVM_FOREACH(se, entry) {
         if (!se->ops || !se->ops->load_setup) {
             continue;
         }
@@ -2425,7 +2444,7 @@ void qemu_loadvm_state_cleanup(void)
     SaveStateEntry *se;
 
     trace_loadvm_state_cleanup();
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+    SAVEVM_FOREACH(se, entry) {
         if (se->ops && se->ops->load_cleanup) {
             se->ops->load_cleanup(se->opaque);
         }
-- 
1.8.3.1




* [PATCH V1 02/32] savevm: VM handlers mode mask
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 01/32] savevm: add vmstate handler iterators Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 03/32] savevm: QMP command for cprsave Steve Sistare
                   ` (33 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add a new mode argument to qemu_savevm_state() and qemu_loadvm_state() that
can customize the operation.  Define the VMS_MIGRATE and VMS_SNAPSHOT modes
for the existing live migration and snapshot capabilities.

Provide a mode mask for vmstate handlers.  A handler is only processed by
SAVEVM_FOREACH if its mask includes the savevm_state.mode.  Unmodified
handler declarations have a zero mask field, which implicitly enables the
handler for all modes.

No functional change for the VMS_MIGRATE and VMS_SNAPSHOT modes.
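
The selection rule can be sketched in isolation as follows.  This is an
illustration under assumptions, not the patch itself: the MODE_* names and
both functions are hypothetical, and the precondition on the current mode
is this sketch's own choice.

```c
#include <assert.h>

/* Each mode is a distinct bit, mirroring the values used in the patch. */
#define MODE_MIGRATE   (1U << 1)
#define MODE_SNAPSHOT  (1U << 2)
#define MODE_ALL       (~0U)

/* A zero mask means "not declared", which is treated as every mode. */
static unsigned effective_mask(unsigned declared_mask)
{
    return declared_mask ? declared_mask : MODE_ALL;
}

/* Nonzero if a handler with declared_mask runs in current_mode. */
static int handler_selected(unsigned declared_mask, unsigned current_mode)
{
    /* Sketch-only precondition: a mode must be set before filtering. */
    assert(current_mode != 0);
    return (effective_mask(declared_mask) & current_mode) != 0;
}
```

With a plain bitwise test, a current mode of zero would select no handler
at all, which is why the sketch insists that a mode is set first.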

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 include/migration/register.h |  3 +++
 include/migration/vmstate.h  |  9 ++++++++
 migration/migration.c        |  4 ++--
 migration/savevm.c           | 51 +++++++++++++++++++++++++++++++++++---------
 migration/savevm.h           |  4 +++-
 5 files changed, 58 insertions(+), 13 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index c1dcff0..c030a10 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -17,6 +17,9 @@
 #include "hw/vmstate-if.h"
 
 typedef struct SaveVMHandlers {
+    /* Mask of VMStateMode's that should use this handler */
+    unsigned mode_mask;
+
     /* This runs inside the iothread lock.  */
     SaveStateHandler *save_state;
 
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index f68ed7d..fa575f9 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -158,6 +158,12 @@ typedef enum {
     MIG_PRI_MAX,
 } MigrationPriority;
 
+typedef enum {
+    VMS_MIGRATE  = (1U << 1),
+    VMS_SNAPSHOT = (1U << 2),
+    VMS_MODE_ALL = ~0U
+} VMStateMode;
+
 struct VMStateField {
     const char *name;
     const char *err_hint;
@@ -182,6 +188,7 @@ struct VMStateDescription {
     int minimum_version_id;
     int minimum_version_id_old;
     MigrationPriority priority;
+    unsigned mode_mask;
     LoadStateHandler *load_state_old;
     int (*pre_load)(void *opaque);
     int (*post_load)(void *opaque, int version_id);
@@ -1215,4 +1222,6 @@ void vmstate_register_ram_global(struct MemoryRegion *memory);
 
 bool vmstate_check_only_migratable(const VMStateDescription *vmsd);
 
+void savevm_set_mode(VMStateMode mode);
+
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index 2ed9923..e3d0899 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -465,7 +465,7 @@ static void process_incoming_migration_co(void *opaque)
     postcopy_state_set(POSTCOPY_INCOMING_NONE);
     migrate_set_state(&mis->state, MIGRATION_STATUS_NONE,
                       MIGRATION_STATUS_ACTIVE);
-    ret = qemu_loadvm_state(mis->from_src_file);
+    ret = qemu_loadvm_state(mis->from_src_file, VMS_MIGRATE);
 
     ps = postcopy_state_get();
     trace_process_incoming_migration_co_end(ret, ps);
@@ -3414,7 +3414,7 @@ static void *migration_thread(void *opaque)
 
     object_ref(OBJECT(s));
     update_iteration_initial_status(s);
-
+    savevm_set_mode(VMS_MIGRATE);
     qemu_savevm_state_header(s->to_dst_file);
 
     /*
diff --git a/migration/savevm.c b/migration/savevm.c
index a07fcad..ce02b6b 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -256,6 +256,7 @@ typedef struct SaveState {
     const char *name;
     uint32_t target_page_bits;
     uint32_t caps_count;
+    VMStateMode mode;
     MigrationCapability *capabilities;
     QemuUUID uuid;
 } SaveState;
@@ -266,16 +267,15 @@ static SaveState savevm_state = {
     .global_section_id = 0,
 };
 
-/*
- * The FOREACH macros will filter handlers based on the current operation when
- * additional conditions are added in a subsequent patch.
- */
+/* The FOREACH macros filter handlers based on the current operation. */
 
 #define SAVEVM_FOREACH(se, entry)                                    \
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry)                \
+        if (savevm_state.mode & mode_mask(se))
 
 #define SAVEVM_FOREACH_SAFE(se, entry, new_se)                       \
     QTAILQ_FOREACH_SAFE(se, &savevm_state.handlers, entry, new_se)   \
+        if (savevm_state.mode & mode_mask(se))
 
 /* The FORALL macros unconditionally loop over all handlers. */
 
@@ -285,6 +285,33 @@ static SaveState savevm_state = {
 #define SAVEVM_FORALL_SAFE(se, entry, new_se)                        \
     QTAILQ_FOREACH_SAFE(se, &savevm_state.handlers, entry, new_se)
 
+/*
+ * Set the current mode to be used for filtering savevm handlers in
+ * SAVEVM_FOREACH.
+ */
+void savevm_set_mode(VMStateMode mode)
+{
+    savevm_state.mode = mode;
+}
+
+/*
+ * A savevm handler is selected in SAVEVM_FOREACH if its mask overlaps the
+ * current mode.  The mask is defined by either the new vmsd interface or the
+ * legacy ops interface.  If the mask is zero, it implicitly includes all modes.
+ */
+static inline unsigned mode_mask(SaveStateEntry *se)
+{
+    const VMStateDescription *vmsd = se->vmsd;
+    unsigned mask = 0;
+
+    if (vmsd) {
+        mask = vmsd->mode_mask;
+    } else if (se->ops) {
+        mask = se->ops->mode_mask;
+    }
+    return mask ? mask : VMS_MODE_ALL;
+}
+
 static bool should_validate_capability(int capability)
 {
     assert(capability >= 0 && capability < MIGRATION_CAPABILITY__MAX);
@@ -1527,12 +1554,14 @@ void qemu_savevm_state_cleanup(void)
     }
 }
 
-static int qemu_savevm_state(QEMUFile *f, Error **errp)
+static int qemu_savevm_state(QEMUFile *f, VMStateMode mode, Error **errp)
 {
     int ret;
     MigrationState *ms = migrate_get_current();
     MigrationStatus status;
 
+    savevm_set_mode(mode);
+
     if (migration_is_running(ms->state)) {
         error_setg(errp, QERR_MIGRATION_ACTIVE);
         return -EINVAL;
@@ -2557,13 +2586,14 @@ out:
     return ret;
 }
 
-int qemu_loadvm_state(QEMUFile *f)
+int qemu_loadvm_state(QEMUFile *f, VMStateMode mode)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
     Error *local_err = NULL;
     int ret;
 
-    if (qemu_savevm_state_blocked(&local_err)) {
+    if ((mode & (VMS_SNAPSHOT | VMS_MIGRATE)) &&
+        qemu_savevm_state_blocked(&local_err)) {
         error_report_err(local_err);
         return -EINVAL;
     }
@@ -2736,7 +2766,7 @@ int save_snapshot(const char *name, Error **errp)
         error_setg(errp, "Could not open VM state file");
         goto the_end;
     }
-    ret = qemu_savevm_state(f, errp);
+    ret = qemu_savevm_state(f, VMS_SNAPSHOT, errp);
     vm_state_size = qemu_ftell(f);
     ret2 = qemu_fclose(f);
     if (ret < 0) {
@@ -2785,6 +2815,7 @@ void qmp_xen_save_devices_state(const char *filename, bool has_live, bool live,
     int saved_vm_running;
     int ret;
 
+    savevm_set_mode(VMS_MIGRATE);
     if (!has_live) {
         /* live default to true so old version of Xen tool stack can have a
          * successfull live migration */
@@ -2850,7 +2881,7 @@ void qmp_xen_load_devices_state(const char *filename, Error **errp)
     f = qemu_fopen_channel_input(QIO_CHANNEL(ioc));
     object_unref(OBJECT(ioc));
 
-    ret = qemu_loadvm_state(f);
+    ret = qemu_loadvm_state(f, VMS_MIGRATE);
     qemu_fclose(f);
     if (ret < 0) {
         error_setg(errp, QERR_IO_ERROR);
@@ -2928,7 +2959,7 @@ int load_snapshot(const char *name, Error **errp)
     mis->from_src_file = f;
 
     aio_context_acquire(aio_context);
-    ret = qemu_loadvm_state(f);
+    ret = qemu_loadvm_state(f, VMS_SNAPSHOT);
     migration_incoming_state_destroy();
     aio_context_release(aio_context);
 
diff --git a/migration/savevm.h b/migration/savevm.h
index ba64a7e..4b7ce91 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -14,6 +14,8 @@
 #ifndef MIGRATION_SAVEVM_H
 #define MIGRATION_SAVEVM_H
 
+#include "migration/vmstate.h"
+
 #define QEMU_VM_FILE_MAGIC           0x5145564d
 #define QEMU_VM_FILE_VERSION_COMPAT  0x00000002
 #define QEMU_VM_FILE_VERSION         0x00000003
@@ -60,7 +62,7 @@ void qemu_savevm_send_colo_enable(QEMUFile *f);
 void qemu_savevm_live_state(QEMUFile *f);
 int qemu_save_device_state(QEMUFile *f);
 
-int qemu_loadvm_state(QEMUFile *f);
+int qemu_loadvm_state(QEMUFile *f, VMStateMode mode);
 void qemu_loadvm_state_cleanup(void);
 int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
 int qemu_load_device_state(QEMUFile *f);
-- 
1.8.3.1




* [PATCH V1 03/32] savevm: QMP command for cprsave
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 01/32] savevm: add vmstate handler iterators Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 02/32] savevm: VM handlers mode mask Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 16:12   ` Eric Blake
  2020-07-30 15:14 ` [PATCH V1 04/32] savevm: HMP Command " Steve Sistare
                   ` (32 subsequent siblings)
  35 siblings, 1 reply; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

To enable live reboot, provide the cprsave QMP command and the VMS_REBOOT
vmstate-saving operation, which saves the state of the virtual machine in a
simple file.

Syntax:
  {'command':'cprsave', 'data':{'file':'str', 'mode':'str'}}

  The mode argument must be 'reboot'.  Additional modes will be defined in
  the future.
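
  A minimal sketch of validating the mode string, assuming a hypothetical
  helper parse_cpr_mode() and a MODE_REBOOT constant that mirrors the bit
  this patch assigns to VMS_REBOOT.  Any string other than 'reboot' is
  rejected:

```c
#include <assert.h>
#include <string.h>

#define MODE_REBOOT (1U << 3)   /* same bit the patch gives VMS_REBOOT */

/*
 * Hypothetical helper: map the 'mode' string to a mode bit.
 * Returns 0 and stores the bit in *out on success, -1 for an unknown mode.
 * *out is left untouched on failure.
 */
static int parse_cpr_mode(const char *mode, unsigned *out)
{
    assert(mode != NULL && out != NULL);

    if (strcmp(mode, "reboot") == 0) {
        *out = MODE_REBOOT;
        return 0;
    }
    return -1;
}
```

  Returning an error code and leaving the output untouched keeps the
  caller's error path explicit when further modes are added.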

Unlike the savevm command, cprsave supports any type of guest image and
block device.  cprsave stops the VM so that guest ram and block devices are
not modified after state is saved.  Guest ram must be mapped to a persistent
memory file such as /dev/dax0.0.  The ram object vmstate handler and block
device handler do not apply to VMS_REBOOT, so restrict them to VMS_MIGRATE
or VMS_SNAPSHOT.  After cprsave completes successfully, qemu exits.

After issuing cprsave, the caller may update qemu, update the host kernel,
reboot, start qemu using the same arguments as the original process, and
issue the cprload command to restore the guest.  cprload is added by
subsequent patches.

If the caller suspends the guest instead of stopping the VM, such as by
issuing guest-suspend-ram to the qemu guest agent, then cprsave and cprload
support guests with vfio devices.  The guest drivers' suspend methods flush
outstanding requests and re-initialize the devices, and thus there is no
device state to save and restore.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Signed-off-by: Maran Wilson <maran.wilson@oracle.com>
---
 include/migration/vmstate.h |  1 +
 include/sysemu/sysemu.h     |  2 ++
 migration/block.c           |  1 +
 migration/ram.c             |  1 +
 migration/savevm.c          | 59 +++++++++++++++++++++++++++++++++++++++++++++
 monitor/qmp-cmds.c          |  6 +++++
 qapi/migration.json         | 14 +++++++++++
 7 files changed, 84 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index fa575f9..c58551a 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -161,6 +161,7 @@ typedef enum {
 typedef enum {
     VMS_MIGRATE  = (1U << 1),
     VMS_SNAPSHOT = (1U << 2),
+    VMS_REBOOT   = (1U << 3),
     VMS_MODE_ALL = ~0U
 } VMStateMode;
 
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 4b6a5c4..6fe86e6 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -24,6 +24,8 @@ extern bool machine_init_done;
 void qemu_add_machine_init_done_notifier(Notifier *notify);
 void qemu_remove_machine_init_done_notifier(Notifier *notify);
 
+void save_cpr_snapshot(const char *file, const char *mode, Error **errp);
+
 extern int autostart;
 
 typedef enum {
diff --git a/migration/block.c b/migration/block.c
index 737b649..a69accb 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -1023,6 +1023,7 @@ static SaveVMHandlers savevm_block_handlers = {
     .load_state = block_load,
     .save_cleanup = block_migration_cleanup,
     .is_active = block_is_active,
+    .mode_mask = VMS_MIGRATE | VMS_SNAPSHOT,
 };
 
 void blk_mig_init(void)
diff --git a/migration/ram.c b/migration/ram.c
index 76d4fee..f0d5d9f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3795,6 +3795,7 @@ static SaveVMHandlers savevm_ram_handlers = {
     .load_setup = ram_load_setup,
     .load_cleanup = ram_load_cleanup,
     .resume_prepare = ram_resume_prepare,
+    .mode_mask = VMS_MIGRATE | VMS_SNAPSHOT,
 };
 
 void ram_mig_init(void)
diff --git a/migration/savevm.c b/migration/savevm.c
index ce02b6b..ff1a46e 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2680,6 +2680,65 @@ int qemu_load_device_state(QEMUFile *f)
     return 0;
 }
 
+static QEMUFile *qf_file_open(const char *filename, int flags, int mode,
+                              Error **errp)
+{
+    QIOChannel *ioc;
+    int fd = qemu_open(filename, flags, mode);
+
+    if (fd < 0) {
+        error_setg_errno(errp, errno, "%s(%s)", __func__, filename);
+        return NULL;
+    }
+
+    ioc = QIO_CHANNEL(qio_channel_file_new_fd(fd));
+
+    if (flags & O_WRONLY) {
+        return qemu_fopen_channel_output(ioc);
+    }
+
+    return qemu_fopen_channel_input(ioc);
+}
+
+void save_cpr_snapshot(const char *file, const char *mode, Error **errp)
+{
+    int ret = 0;
+    QEMUFile *f;
+    VMStateMode op;
+
+    if (!strcmp(mode, "reboot")) {
+        op = VMS_REBOOT;
+    } else {
+        error_setg(errp, "cprsave: bad mode %s", mode);
+        return;
+    }
+
+    f = qf_file_open(file, O_CREAT | O_WRONLY | O_TRUNC, 0600, errp);
+    if (!f) {
+        return;
+    }
+
+    ret = global_state_store();
+    if (ret) {
+        error_setg(errp, "Error saving global state");
+        qemu_fclose(f);
+        return;
+    }
+
+    vm_stop(RUN_STATE_SAVE_VM);
+
+    ret = qemu_savevm_state(f, op, errp);
+    if ((ret < 0) && !*errp) {
+        error_setg(errp, "qemu_savevm_state failed");
+    }
+    qemu_fclose(f);
+
+    if (op == VMS_REBOOT) {
+        no_shutdown = 0;
+        qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+    }
+}
+
 int save_snapshot(const char *name, Error **errp)
 {
     BlockDriverState *bs, *bs1;
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index 864cbfa..9ec7b88 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -35,6 +35,7 @@
 #include "qapi/qapi-commands-machine.h"
 #include "qapi/qapi-commands-misc.h"
 #include "qapi/qapi-commands-ui.h"
+#include "qapi/qapi-commands-migration.h"
 #include "qapi/qmp/qerror.h"
 #include "hw/mem/memory-device.h"
 #include "hw/acpi/acpi_dev_interface.h"
@@ -161,6 +162,11 @@ void qmp_cont(Error **errp)
     }
 }
 
+void qmp_cprsave(const char *file, const char *mode, Error **errp)
+{
+    save_cpr_snapshot(file, mode, errp);
+}
+
 void qmp_system_wakeup(Error **errp)
 {
     if (!qemu_wakeup_suspend_enabled()) {
diff --git a/qapi/migration.json b/qapi/migration.json
index d500055..b61df1d 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1621,3 +1621,17 @@
 ##
 { 'event': 'UNPLUG_PRIMARY',
   'data': { 'device-id': 'str' } }
+
+##
+# @cprsave:
+#
+# Create a checkpoint of the virtual machine device state in @file.
+# Guest RAM and guest block device blocks are not saved.
+#
+# @file: name of checkpoint file
+# @mode: 'reboot' : checkpoint can be cprload'ed after a host kexec reboot.
+#
+# Since 5.0
+##
+{ 'command': 'cprsave', 'data': { 'file': 'str', 'mode': 'str' } }
+
-- 
1.8.3.1




* [PATCH V1 04/32] savevm: HMP Command for cprsave
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (2 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 03/32] savevm: QMP command for cprsave Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 05/32] savevm: QMP command for cprload Steve Sistare
                   ` (31 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Enable HMP access to the cprsave QMP command.

Usage: cprsave <filename> <mode>

Signed-off-by: Maran Wilson <maran.wilson@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hmp-commands.hx       | 18 ++++++++++++++++++
 include/monitor/hmp.h |  1 +
 monitor/hmp-cmds.c    | 10 ++++++++++
 3 files changed, 29 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 60f395c..c8defd9 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -354,6 +354,24 @@ SRST
 ERST
 
     {
+        .name       = "cprsave",
+        .args_type  = "file:s,mode:s",
+        .params     = "file 'reboot'",
+        .help       = "create a checkpoint of the VM in file",
+        .cmd        = hmp_cprsave,
+    },
+
+SRST
+``cprsave`` *file* *mode*
+  Stop VCPUs, create a checkpoint of the whole virtual machine and save it
+  in *file*.
+  If *mode* is 'reboot', the checkpoint can be cprload'ed after a host kexec
+  reboot.
+  exec() /usr/bin/qemu-exec if it exists, else exec /usr/bin/qemu-system-x86_64,
+  passing all the original command line arguments.  The VCPUs remain paused.
+ERST
+
+    {
         .name       = "delvm",
         .args_type  = "name:s",
         .params     = "tag",
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index c986cfd..af8ee23 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -59,6 +59,7 @@ void hmp_balloon(Monitor *mon, const QDict *qdict);
 void hmp_loadvm(Monitor *mon, const QDict *qdict);
 void hmp_savevm(Monitor *mon, const QDict *qdict);
 void hmp_delvm(Monitor *mon, const QDict *qdict);
+void hmp_cprsave(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_continue(Monitor *mon, const QDict *qdict);
 void hmp_migrate_incoming(Monitor *mon, const QDict *qdict);
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index ae4b6a4..59196ed 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1139,6 +1139,16 @@ void hmp_announce_self(Monitor *mon, const QDict *qdict)
     qapi_free_AnnounceParameters(params);
 }
 
+void hmp_cprsave(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+
+    qmp_cprsave(qdict_get_try_str(qdict, "file"),
+                qdict_get_try_str(qdict, "mode"),
+                &err);
+    hmp_handle_error(mon, err);
+}
+
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict)
 {
     qmp_migrate_cancel(NULL);
-- 
1.8.3.1




* [PATCH V1 05/32] savevm: QMP command for cprload
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (3 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 04/32] savevm: HMP Command " Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 16:14   ` Eric Blake
  2020-07-30 15:14 ` [PATCH V1 06/32] savevm: HMP Command " Steve Sistare
                   ` (30 subsequent siblings)
  35 siblings, 1 reply; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Provide the cprload QMP command.  The VM is created from the file produced
by the cprsave command.  Guest RAM is restored in-place from the shared
memory backend file, and guest block devices are used as is.  The contents
of such devices must not be modified between the cprsave and cprload
operations.  If the VM was running at cprsave time, then VM execution
resumes.

Syntax:
  {'command':'cprload', 'data':{'file':'str'}}
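
The run-state handling described above can be sketched in isolation as
follows.  This is only an illustrative model, not the QEMU code: the
Sketch* types and sketch_* functions are hypothetical stand-ins for
RunState, vm_start(), runstate_set() and the start_on_wake flag added to
softmmu/vl.c by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for QEMU's RunState values; not the real enum. */
typedef enum {
    SKETCH_RUNNING,
    SKETCH_PAUSED,
    SKETCH_SUSPENDED,
} SketchRunState;

typedef struct {
    SketchRunState current;   /* run state after cprload */
    bool vcpus_started;       /* stands in for vm_start() having run */
    bool start_on_wake;       /* defer vm_start() until the first wakeup */
} SketchVM;

/* Model the tail of load_cpr_snapshot(): resume, or restore the saved state. */
void sketch_restore_runstate(SketchVM *vm, SketchRunState saved)
{
    assert(vm != NULL);
    if (saved == SKETCH_RUNNING) {
        vm->current = SKETCH_RUNNING;
        vm->vcpus_started = true;
    } else {
        vm->current = saved;
        if (saved == SKETCH_SUSPENDED) {
            vm->start_on_wake = true;
        }
    }
}

/* Model qemu_system_wakeup_request(): the first wakeup does a full start. */
void sketch_wakeup(SketchVM *vm)
{
    assert(vm != NULL);
    if (vm->start_on_wake) {
        vm->start_on_wake = false;
        vm->vcpus_started = true;
    }
    vm->current = SKETCH_RUNNING;
}
```

In this model a guest that was suspended at cprsave time stays suspended
after cprload, and its VCPUs are started only on the first wakeup request.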

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Signed-off-by: Maran Wilson <maran.wilson@oracle.com>
---
 include/sysemu/sysemu.h |  2 ++
 migration/savevm.c      | 34 ++++++++++++++++++++++++++++++++++
 monitor/qmp-cmds.c      |  5 +++++
 qapi/migration.json     | 11 +++++++++++
 softmmu/vl.c            | 15 ++++++++++++++-
 5 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 6fe86e6..5360da5 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -25,6 +25,7 @@ void qemu_add_machine_init_done_notifier(Notifier *notify);
 void qemu_remove_machine_init_done_notifier(Notifier *notify);
 
 void save_cpr_snapshot(const char *file, const char *mode, Error **errp);
+void load_cpr_snapshot(const char *file, Error **errp);
 
 extern int autostart;
 
@@ -53,6 +54,7 @@ extern uint8_t *boot_splash_filedata;
 extern bool enable_mlock;
 extern bool enable_cpu_pm;
 extern QEMUClockType rtc_clock;
+extern int start_on_wake;
 
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
diff --git a/migration/savevm.c b/migration/savevm.c
index ff1a46e..1509173 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2948,6 +2948,40 @@ void qmp_xen_load_devices_state(const char *filename, Error **errp)
     migration_incoming_state_destroy();
 }
 
+void load_cpr_snapshot(const char *file, Error **errp)
+{
+    QEMUFile *f;
+    int ret;
+    RunState state;
+
+    if (runstate_is_running()) {
+        error_setg(errp, "cprload called for a running VM");
+        return;
+    }
+
+    f = qf_file_open(file, O_RDONLY, 0, errp);
+    if (!f) {
+        return;
+    }
+
+    ret = qemu_loadvm_state(f, VMS_REBOOT);
+    qemu_fclose(f);
+    if (ret < 0) {
+        error_setg(errp, "Error %d while loading VM state", ret);
+        return;
+    }
+
+    state = global_state_get_runstate();
+    if (state == RUN_STATE_RUNNING) {
+        vm_start();
+    } else {
+        runstate_set(state);
+        if (runstate_check(RUN_STATE_SUSPENDED)) {
+            start_on_wake = 1;
+        }
+    }
+}
+
 int load_snapshot(const char *name, Error **errp)
 {
     BlockDriverState *bs, *bs_vm_state;
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index 9ec7b88..81e6feb 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -167,6 +167,11 @@ void qmp_cprsave(const char *file, const char *mode, Error **errp)
     save_cpr_snapshot(file, mode, errp);
 }
 
+void qmp_cprload(const char *file, Error **errp)
+{
+    load_cpr_snapshot(file, errp);
+}
+
 void qmp_system_wakeup(Error **errp)
 {
     if (!qemu_wakeup_suspend_enabled()) {
diff --git a/qapi/migration.json b/qapi/migration.json
index b61df1d..ce4d32b 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1635,3 +1635,14 @@
 ##
 { 'command': 'cprsave', 'data': { 'file': 'str', 'mode': 'str' } }
 
+##
+# @cprload:
+#
+# Start virtual machine from checkpoint file that was created earlier using
+# the cprsave command.
+#
+# @file: name of checkpoint file
+#
+# Since 5.0
+##
+{ 'command': 'cprload', 'data': { 'file': 'str' } }
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 660537a..8478778 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -137,6 +137,7 @@ static time_t rtc_ref_start_datetime;
 static int rtc_realtime_clock_offset; /* used only with QEMU_CLOCK_REALTIME */
 static int rtc_host_datetime_offset = -1; /* valid & used only with
                                              RTC_BASE_DATETIME */
+int start_on_wake;
 QEMUClockType rtc_clock;
 int vga_interface_type = VGA_NONE;
 static DisplayOptions dpy;
@@ -602,6 +603,8 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_PRELAUNCH, RUN_STATE_RUNNING },
     { RUN_STATE_PRELAUNCH, RUN_STATE_FINISH_MIGRATE },
     { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
+    { RUN_STATE_PRELAUNCH, RUN_STATE_SUSPENDED },
+    { RUN_STATE_PRELAUNCH, RUN_STATE_PAUSED },
 
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_PAUSED },
@@ -1519,7 +1522,17 @@ void qemu_system_wakeup_request(WakeupReason reason, Error **errp)
     if (!(wakeup_reason_mask & (1 << reason))) {
         return;
     }
-    runstate_set(RUN_STATE_RUNNING);
+
+    /*
+     * Must call vm_start if it has never been called, to invoke the state
+     * change callbacks for the first time.
+     */
+    if (start_on_wake) {
+        start_on_wake = 0;
+        vm_start();
+    } else {
+        runstate_set(RUN_STATE_RUNNING);
+    }
     wakeup_reason = reason;
     qemu_notify_event();
 }
-- 
1.8.3.1




* [PATCH V1 06/32] savevm: HMP Command for cprload
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (4 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 05/32] savevm: QMP command for cprload Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 07/32] savevm: QMP command for cprinfo Steve Sistare
                   ` (29 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Enable HMP access to the cprload QMP command.

Usage: cprload <file>

Signed-off-by: Maran Wilson <maran.wilson@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hmp-commands.hx       | 13 +++++++++++++
 include/monitor/hmp.h |  1 +
 monitor/hmp-cmds.c    |  8 ++++++++
 3 files changed, 22 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index c8defd9..cb67150 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -372,6 +372,19 @@ SRST
 ERST
 
     {
+        .name       = "cprload",
+        .args_type  = "file:s",
+        .params     = "file",
+        .help       = "load VM checkpoint from file",
+        .cmd        = hmp_cprload,
+    },
+
+SRST
+``cprload`` *file*
+  Load a virtual machine from checkpoint file *file* and continue VCPUs.
+ERST
+
+    {
         .name       = "delvm",
         .args_type  = "name:s",
         .params     = "tag",
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index af8ee23..7b8cdfd 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -60,6 +60,7 @@ void hmp_loadvm(Monitor *mon, const QDict *qdict);
 void hmp_savevm(Monitor *mon, const QDict *qdict);
 void hmp_delvm(Monitor *mon, const QDict *qdict);
 void hmp_cprsave(Monitor *mon, const QDict *qdict);
+void hmp_cprload(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_continue(Monitor *mon, const QDict *qdict);
 void hmp_migrate_incoming(Monitor *mon, const QDict *qdict);
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 59196ed..ba95737 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1149,6 +1149,14 @@ void hmp_cprsave(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, err);
 }
 
+void hmp_cprload(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+
+    qmp_cprload(qdict_get_try_str(qdict, "file"), &err);
+    hmp_handle_error(mon, err);
+}
+
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict)
 {
     qmp_migrate_cancel(NULL);
-- 
1.8.3.1




* [PATCH V1 07/32] savevm: QMP command for cprinfo
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (5 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 06/32] savevm: HMP Command " Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 16:17   ` Eric Blake
  2020-07-30 15:14 ` [PATCH V1 08/32] savevm: HMP " Steve Sistare
                   ` (28 subsequent siblings)
  35 siblings, 1 reply; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Provide the cprinfo QMP command.  This returns a string with a space-
separated list of modes supported by cprsave, and can be used by clients
as a feature test to check if the running QEMU instance supports cprsave.

Syntax:
  {'command':'cprinfo', 'returns':'str'}
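
A client using cprinfo as a feature test has to look for a whole word in
the returned string.  A minimal sketch of such a check is shown below;
cpr_mode_supported() is a hypothetical client-side helper, not part of
QEMU, and it assumes modes are separated by single or repeated spaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if 'mode' appears as a whole word in the space-delimited
 * 'list' returned by cprinfo.  Hypothetical client-side helper.
 */
bool cpr_mode_supported(const char *list, const char *mode)
{
    size_t len;

    assert(list != NULL && mode != NULL);
    len = strlen(mode);
    if (len == 0) {
        return false;
    }

    while (*list) {
        size_t tok;

        /* Skip separators, then measure the next token. */
        list += strspn(list, " ");
        tok = strcspn(list, " ");
        if (tok == len && strncmp(list, mode, len) == 0) {
            return true;
        }
        list += tok;
    }
    return false;
}
```

Matching whole tokens rather than using strstr() avoids a false positive
when one mode name is a substring of another.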

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 monitor/qmp-cmds.c  | 5 +++++
 qapi/migration.json | 9 +++++++++
 qapi/pragma.json    | 1 +
 3 files changed, 15 insertions(+)

diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index 81e6feb..8c400e6 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -162,6 +162,11 @@ void qmp_cont(Error **errp)
     }
 }
 
+char *qmp_cprinfo(Error **errp)
+{
+    return g_strdup("reboot");
+}
+
 void qmp_cprsave(const char *file, const char *mode, Error **errp)
 {
     save_cpr_snapshot(file, mode, errp);
diff --git a/qapi/migration.json b/qapi/migration.json
index ce4d32b..8190b16 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1623,6 +1623,15 @@
   'data': { 'device-id': 'str' } }
 
 ##
+# @cprinfo:
+#
+# Return a space-delimited list of modes supported by the cprsave command
+#
+# Since 5.0
+##
+{ 'command': 'cprinfo', 'returns': 'str' }
+
+##
 # @cprsave:
 #
 # Create a checkpoint of the virtual machine device state in @file.
diff --git a/qapi/pragma.json b/qapi/pragma.json
index cffae27..43bdb39 100644
--- a/qapi/pragma.json
+++ b/qapi/pragma.json
@@ -5,6 +5,7 @@
 { 'pragma': {
     # Commands allowed to return a non-dictionary:
     'returns-whitelist': [
+        'cprinfo',
         'human-monitor-command',
         'qom-get',
         'query-migrate-cache-size',
-- 
1.8.3.1




* [PATCH V1 08/32] savevm: HMP command for cprinfo
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (6 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 07/32] savevm: QMP command for cprinfo Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 09/32] savevm: prevent cprsave if memory is volatile Steve Sistare
                   ` (27 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Enable HMP access to the cprinfo QMP command.

Usage: cprinfo

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hmp-commands.hx       | 13 +++++++++++++
 include/monitor/hmp.h |  1 +
 monitor/hmp-cmds.c    | 10 ++++++++++
 3 files changed, 24 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index cb67150..7517876 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -354,6 +354,19 @@ SRST
 ERST
 
     {
+        .name       = "cprinfo",
+        .args_type  = "",
+        .params     = "",
+        .help       = "return list of modes supported by cprsave",
+        .cmd        = hmp_cprinfo,
+    },
+
+SRST
+``cprinfo``
+  Return a space-delimited list of modes supported by cprsave.
+ERST
+
+    {
         .name       = "cprsave",
         .args_type  = "file:s,mode:s",
         .params     = "file 'reboot'",
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 7b8cdfd..919b9a9 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -59,6 +59,7 @@ void hmp_balloon(Monitor *mon, const QDict *qdict);
 void hmp_loadvm(Monitor *mon, const QDict *qdict);
 void hmp_savevm(Monitor *mon, const QDict *qdict);
 void hmp_delvm(Monitor *mon, const QDict *qdict);
+void hmp_cprinfo(Monitor *mon, const QDict *qdict);
 void hmp_cprsave(Monitor *mon, const QDict *qdict);
 void hmp_cprload(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index ba95737..2f6af07 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1139,6 +1139,16 @@ void hmp_announce_self(Monitor *mon, const QDict *qdict)
     qapi_free_AnnounceParameters(params);
 }
 
+void hmp_cprinfo(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    char *res = qmp_cprinfo(&err);
+
+    monitor_printf(mon, "%s\n", res);
+    g_free(res);
+    hmp_handle_error(mon, err);
+}
+
 void hmp_cprsave(Monitor *mon, const QDict *qdict)
 {
     Error *err = NULL;
-- 
1.8.3.1




* [PATCH V1 09/32] savevm: prevent cprsave if memory is volatile
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (7 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 08/32] savevm: HMP " Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 10/32] kvmclock: restore paused KVM clock Steve Sistare
                   ` (26 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

cprsave and cprload require that guest RAM be backed by an externally
visible shared file.  Check this in cprsave, and fail with an error that
names the first volatile memory region found.
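
The check can be modeled outside QEMU roughly as follows.  This is a sketch
under stated assumptions: SketchBlock is a hypothetical flattening of
RAMBlock and MemoryRegion into one struct, and sketch_find_volatile() only
mirrors the predicate used by qemu_ram_volatile() in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a RAMBlock plus its MemoryRegion. */
typedef struct {
    const char *name;    /* region name, may be NULL */
    bool is_ram;
    bool is_ram_device;
    bool is_rom;
    bool is_shared;      /* mapped shared rather than private */
    int fd;              /* backing file descriptor, -1 if anonymous */
} SketchBlock;

/* Return the first volatile block, or NULL if every block is preserved. */
const SketchBlock *sketch_find_volatile(const SketchBlock *blocks, size_t n)
{
    size_t i;

    assert(blocks != NULL || n == 0);
    for (i = 0; i < n; i++) {
        const SketchBlock *b = &blocks[i];
        /* Writable guest RAM, excluding the "pc.rom" shadow by name. */
        bool candidate = b->is_ram && !b->is_ram_device && !b->is_rom &&
                         (!b->name || strcmp(b->name, "pc.rom") != 0);

        if (candidate && (b->fd == -1 || !b->is_shared)) {
            return b;
        }
    }
    return NULL;
}
```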

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 exec.c                | 32 ++++++++++++++++++++++++++++++++
 include/exec/memory.h |  2 ++
 migration/savevm.c    |  4 ++++
 3 files changed, 38 insertions(+)

diff --git a/exec.c b/exec.c
index 6f381f9..02160e0 100644
--- a/exec.c
+++ b/exec.c
@@ -2726,6 +2726,38 @@ ram_addr_t qemu_ram_addr_from_host(void *ptr)
     return block->offset + offset;
 }
 
+/*
+ * Return true if any memory regions are writable and not backed by shared
+ * memory.  Exclude x86 option rom shadow "pc.rom" by name, even though it is
+ * writable.
+ */
+bool qemu_ram_volatile(Error **errp)
+{
+    RAMBlock *block;
+    MemoryRegion *mr;
+    bool ret = false;
+
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        mr = block->mr;
+        if (mr &&
+            memory_region_is_ram(mr) &&
+            !memory_region_is_ram_device(mr) &&
+            !memory_region_is_rom(mr) &&
+            (!mr->name || strcmp(mr->name, "pc.rom")) &&
+            (block->fd == -1 || !qemu_ram_is_shared(block))) {
+
+            error_setg(errp, "Memory region %s is volatile",
+                       memory_region_name(mr));
+            ret = true;
+            break;
+        }
+    }
+
+    rcu_read_unlock();
+    return ret;
+}
+
 /* Generate a debug exception if a watchpoint has been hit.  */
 void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
                           MemTxAttrs attrs, int flags, uintptr_t ra)
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 307e527..6aafbb0 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2519,6 +2519,8 @@ bool ram_block_discard_is_disabled(void);
  */
 bool ram_block_discard_is_required(void);
 
+bool qemu_ram_volatile(Error **errp);
+
 #endif
 
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index 1509173..f101039 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2713,6 +2713,10 @@ void save_cpr_snapshot(const char *file, const char *mode, Error **errp)
         return;
     }
 
+    if (op == VMS_REBOOT && qemu_ram_volatile(errp)) {
+        return;
+    }
+
     f = qf_file_open(file, O_CREAT | O_WRONLY | O_TRUNC, 0600, errp);
     if (!f) {
         return;
-- 
1.8.3.1




* [PATCH V1 10/32] kvmclock: restore paused KVM clock
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (8 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 09/32] savevm: prevent cprsave if memory is volatile Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 11/32] cpu: disable ticks when suspended Steve Sistare
                   ` (25 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

If the VM is paused when the KVM clock is serialized to a file, record
that the clock is valid, so the value will be reused rather than
overwritten after cprload with a new call to KVM_GET_CLOCK here:

kvmclock_vm_state_change()
    if (running)
        ...
    else
        if (s->clock_valid)
            return;         <-- instead, return here

        kvm_update_clock()
           kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data)  <-- overwritten
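
The effect of saving clock_valid can be illustrated with a small model.
This is a simplified sketch, not the code in hw/i386/kvm/clock.c: the
SketchClockState type and sketch_* functions are hypothetical, and the
kvm_clock_now argument stands in for the value KVM_GET_CLOCK would return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint64_t clock;
    bool clock_valid;
} SketchClockState;

/* Model of kvmclock_pre_save() with this patch: a paused VM's clock is valid. */
void sketch_pre_save(SketchClockState *s, bool vm_running)
{
    assert(s != NULL);
    if (!vm_running) {
        s->clock_valid = true;
    }
}

/* Model of the "else" (not running) branch of kvmclock_vm_state_change(). */
void sketch_on_stop(SketchClockState *s, uint64_t kvm_clock_now)
{
    assert(s != NULL);
    if (s->clock_valid) {
        return;                  /* reuse the restored value */
    }
    s->clock = kvm_clock_now;    /* stands in for KVM_GET_CLOCK */
    s->clock_valid = true;
}
```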

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hw/i386/kvm/clock.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 6428335..161991a 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -285,18 +285,22 @@ static int kvmclock_pre_save(void *opaque)
     if (!s->runstate_paused) {
         kvm_update_clock(s);
     }
+    if (!runstate_is_running()) {
+        s->clock_valid = true;
+    }
 
     return 0;
 }
 
 static const VMStateDescription kvmclock_vmsd = {
     .name = "kvmclock",
-    .version_id = 1,
+    .version_id = 2,
     .minimum_version_id = 1,
     .pre_load = kvmclock_pre_load,
     .pre_save = kvmclock_pre_save,
     .fields = (VMStateField[]) {
         VMSTATE_UINT64(clock, KVMClockState),
+        VMSTATE_BOOL_V(clock_valid, KVMClockState, 2),
         VMSTATE_END_OF_LIST()
     },
     .subsections = (const VMStateDescription * []) {
-- 
1.8.3.1




* [PATCH V1 11/32] cpu: disable ticks when suspended
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (9 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 10/32] kvmclock: restore paused KVM clock Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 12/32] vl: pause option Steve Sistare
                   ` (24 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

After cprload, the guest console misbehaves: you must type 8 characters
before any are echoed to the terminal.  QEMU was not sending interrupts
to the guest because the QEMU_CLOCK_VIRTUAL timers_state.cpu_clock_offset
was stale.  The offset is usually updated at cprsave time by the path

  save_cpr_snapshot()
    vm_stop()
      do_vm_stop()
        if (runstate_is_running())
          cpu_disable_ticks();
            timers_state.cpu_clock_offset = cpu_get_clock_locked();

However, if the guest is in RUN_STATE_SUSPENDED, then cpu_disable_ticks is
not called.  Further, the earlier transition to suspended in
qemu_system_suspend did not disable ticks.  To fix, call cpu_disable_ticks
from save_cpr_snapshot.
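
A toy model of the problem, for illustration only: SketchTimers and the
sketch_* functions are hypothetical, the host clock is passed in as a plain
integer, and only the clock offset (not the tick counter) is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

typedef struct {
    bool ticks_enabled;
    int64_t cpu_clock_offset;
} SketchTimers;

/* Virtual clock: the offset plus host time while ticking, else frozen. */
int64_t sketch_get_clock(const SketchTimers *t, int64_t host_now)
{
    assert(t != NULL);
    return t->ticks_enabled ? t->cpu_clock_offset + host_now
                            : t->cpu_clock_offset;
}

/* Freeze the virtual clock by folding host time into the offset. */
void sketch_disable_ticks(SketchTimers *t, int64_t host_now)
{
    assert(t != NULL);
    if (t->ticks_enabled) {
        t->cpu_clock_offset = sketch_get_clock(t, host_now);
        t->ticks_enabled = false;
    }
}

/* Model of the vm_stop() path: ticks are disabled only if the VM is running. */
void sketch_vm_stop(SketchTimers *t, bool running, int64_t host_now)
{
    if (running) {
        sketch_disable_ticks(t, host_now);
    }
}

/* Model of the fixed cprsave path: handle the suspended case explicitly. */
void sketch_cprsave_stop(SketchTimers *t, bool running, bool suspended,
                         int64_t host_now)
{
    if (suspended) {
        sketch_disable_ticks(t, host_now);
    }
    sketch_vm_stop(t, running, host_now);
}
```

In this model, a suspended guest that goes through sketch_vm_stop() alone
keeps the old offset, while sketch_cprsave_stop() records the frozen clock
value before state is saved.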

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 migration/savevm.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index f101039..00f493b 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2729,6 +2729,11 @@ void save_cpr_snapshot(const char *file, const char *mode, Error **errp)
         return;
     }
 
+    /* Update timers_state before saving.  Suspend did not do so. */
+    if (runstate_check(RUN_STATE_SUSPENDED)) {
+        cpu_disable_ticks();
+    }
+
     vm_stop(RUN_STATE_SAVE_VM);
 
     ret = qemu_savevm_state(f, op, errp);
-- 
1.8.3.1




* [PATCH V1 12/32] vl: pause option
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (10 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 11/32] cpu: disable ticks when suspended Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 16:20   ` Eric Blake
  2020-07-30 17:03   ` Alex Bennée
  2020-07-30 15:14 ` [PATCH V1 13/32] gdbstub: gdb support for suspended state Steve Sistare
                   ` (23 subsequent siblings)
  35 siblings, 2 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Provide the -pause command-line parameter and the QEMU_PAUSE environment
variable to briefly pause QEMU in main and allow a developer to attach gdb.
Useful when the developer does not invoke QEMU directly, such as when using
libvirt.

Usage:
  qemu -pause <seconds>
  or
  export QEMU_PAUSE=<seconds>
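
The patch converts the value with atoi(), which silently yields 0 for
malformed input.  A stricter conversion could look like the sketch below;
sketch_parse_pause() is a hypothetical helper shown only for illustration
and is not part of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse a non-negative decimal number of seconds.  Return false, leaving
 * *seconds untouched, if the string is empty, has trailing characters,
 * is negative, or is out of range.
 */
bool sketch_parse_pause(const char *arg, unsigned int *seconds)
{
    char *end;
    long val;

    assert(seconds != NULL);
    if (arg == NULL || *arg == '\0') {
        return false;
    }

    errno = 0;
    val = strtol(arg, &end, 10);
    if (errno != 0 || *end != '\0' || val < 0 || val > INT_MAX) {
        return false;
    }

    *seconds = (unsigned int)val;
    return true;
}
```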

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 qemu-options.hx |  9 +++++++++
 softmmu/vl.c    | 15 ++++++++++++++-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 708583b..8505cf2 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3668,6 +3668,15 @@ SRST
     option is experimental.
 ERST
 
+DEF("pause", HAS_ARG, QEMU_OPTION_pause, \
+    "-pause secs    Pause for secs seconds on entry to main.\n", QEMU_ARCH_ALL)
+
+SRST
+``-pause secs``
+    Pause for a number of seconds on entry to main.  Useful for attaching
+    a debugger after QEMU has been launched by some other entity.
+ERST
+
 DEF("S", 0, QEMU_OPTION_S, \
     "-S              freeze CPU at startup (use 'c' to start execution)\n",
     QEMU_ARCH_ALL)
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 8478778..951994f 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2844,7 +2844,7 @@ static void create_default_memdev(MachineState *ms, const char *path)
 
 void qemu_init(int argc, char **argv, char **envp)
 {
-    int i;
+    int i, seconds;
     int snapshot, linux_boot;
     const char *initrd_filename;
     const char *kernel_filename, *kernel_cmdline;
@@ -2882,6 +2882,13 @@ void qemu_init(int argc, char **argv, char **envp)
     QemuPluginList plugin_list = QTAILQ_HEAD_INITIALIZER(plugin_list);
     int mem_prealloc = 0; /* force preallocation of physical target memory */
 
+    if (getenv("QEMU_PAUSE")) {
+        seconds = atoi(getenv("QEMU_PAUSE"));
+        printf("Pausing %d seconds for debugger. QEMU PID is %d\n",
+               seconds, getpid());
+        sleep(seconds);
+    }
+
     os_set_line_buffering();
 
     error_init(argv[0]);
@@ -3204,6 +3211,12 @@ void qemu_init(int argc, char **argv, char **envp)
             case QEMU_OPTION_gdb:
                 add_device_config(DEV_GDB, optarg);
                 break;
+            case QEMU_OPTION_pause:
+                seconds = atoi(optarg);
+                printf("Pausing %d seconds for debugger. QEMU PID is %d\n",
+                            seconds, getpid());
+                sleep(seconds);
+                break;
             case QEMU_OPTION_L:
                 if (is_help_option(optarg)) {
                     list_data_dirs = true;
-- 
1.8.3.1




* [PATCH V1 13/32] gdbstub: gdb support for suspended state
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (11 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 12/32] vl: pause option Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 14/32] savevm: VMS_RESTART and cprsave restart Steve Sistare
                   ` (22 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Modify the gdb server so a continue command appears to resume execution
when in RUN_STATE_SUSPENDED.  The next gdb prompt is not printed, but
instruction fetch is not actually resumed.  While in this "fake" running
mode, a ctrl-C returns the user to the gdb prompt.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 gdbstub.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/gdbstub.c b/gdbstub.c
index f3a318c..2f0d9ff 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -461,7 +461,9 @@ static inline void gdb_continue(void)
 #else
     if (!runstate_needs_reset()) {
         trace_gdbstub_op_continue();
-        vm_start();
+        if (!runstate_check(RUN_STATE_SUSPENDED)) {
+            vm_start();
+        }
     }
 #endif
 }
@@ -490,7 +492,7 @@ static int gdb_continue_partial(char *newstates)
     int flag = 0;
 
     if (!runstate_needs_reset()) {
-        if (vm_prepare_start()) {
+        if (!runstate_check(RUN_STATE_SUSPENDED) && vm_prepare_start()) {
             return 0;
         }
 
@@ -2835,6 +2837,9 @@ static void gdb_read_byte(uint8_t ch)
         /* when the CPU is running, we cannot do anything except stop
            it when receiving a char */
         vm_stop(RUN_STATE_PAUSED);
+    } else if (runstate_check(RUN_STATE_SUSPENDED) && ch == 3) {
+        /* Received ctrl-c from gdb */
+        gdb_vm_state_change(0, 0, RUN_STATE_PAUSED);
     } else
 #endif
     {
@@ -3282,6 +3287,8 @@ static void gdb_sigterm_handler(int signal)
 {
     if (runstate_is_running()) {
         vm_stop(RUN_STATE_PAUSED);
+    } else if (runstate_check(RUN_STATE_SUSPENDED)) {
+        gdb_vm_state_change(0, 0, RUN_STATE_PAUSED);
     }
 }
 #endif
-- 
1.8.3.1




* [PATCH V1 14/32] savevm: VMS_RESTART and cprsave restart
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (12 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 13/32] gdbstub: gdb support for suspended state Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 16:22   ` Eric Blake
  2020-07-30 15:14 ` [PATCH V1 15/32] vl: QEMU_START_FREEZE env var Steve Sistare
                   ` (21 subsequent siblings)
  35 siblings, 1 reply; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add the VMS_RESTART variant of vmstate, for use when upgrading qemu in place
on the same host without a reboot.  Invoke it using:
  cprsave <filename> restart

VMS_RESTART supports guest RAM backed by private anonymous memory, whereas
VMS_REBOOT requires guest RAM to be backed by persistent shared memory.
Subsequent patches complete the VMS_RESTART implementation.
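
As an illustration of the mode handling, the string-to-bit mapping can be
sketched as below.  The bit values are the ones visible in the vmstate.h
hunk; sketch_parse_cpr_mode() is a hypothetical helper, not a function in
this series, and it returns 0 where the patch reports "bad mode".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mode bits as they appear in include/migration/vmstate.h in this series. */
enum {
    SKETCH_VMS_REBOOT  = (1U << 3),
    SKETCH_VMS_RESTART = (1U << 4),
};

/* Hypothetical helper: map a cprsave mode string to its bit, 0 if unknown. */
static unsigned sketch_parse_cpr_mode(const char *mode)
{
    assert(mode != NULL);
    if (!strcmp(mode, "reboot")) {
        return SKETCH_VMS_REBOOT;
    }
    if (!strcmp(mode, "restart")) {
        return SKETCH_VMS_RESTART;
    }
    return 0;
}
```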

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hmp-commands.hx             | 4 +++-
 include/migration/vmstate.h | 1 +
 migration/savevm.c          | 4 +++-
 monitor/qmp-cmds.c          | 2 +-
 qapi/migration.json         | 1 +
 5 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 7517876..11a2089 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -369,7 +369,7 @@ ERST
     {
         .name       = "cprsave",
         .args_type  = "file:s,mode:s",
-        .params     = "file 'reboot'",
+        .params     = "file 'restart'|'reboot'",
         .help       = "create a checkpoint of the VM in file",
         .cmd        = hmp_cprsave,
     },
@@ -380,6 +380,8 @@ SRST
   in *file*.
   If *mode* is 'reboot', the checkpoint can be cprload'ed after a host kexec
   reboot.
+  If *mode* is 'restart', the checkpoint can be cprload'ed after restarting
+  qemu.
   exec() /usr/bin/qemu-exec if it exists, else exec /usr/bin/qemu-system-x86_64,
   passing all the original command line arguments.  The VCPUs remain paused.
 ERST
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index c58551a..8239b84 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -162,6 +162,7 @@ typedef enum {
     VMS_MIGRATE  = (1U << 1),
     VMS_SNAPSHOT = (1U << 2),
     VMS_REBOOT   = (1U << 3),
+    VMS_RESTART  = (1U << 4),
     VMS_MODE_ALL = ~0U
 } VMStateMode;
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 00f493b..38cc63a 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2708,6 +2708,8 @@ void save_cpr_snapshot(const char *file, const char *mode, Error **errp)
 
     if (!strcmp(mode, "reboot")) {
         op = VMS_REBOOT;
+    } else if (!strcmp(mode, "restart")) {
+        op = VMS_RESTART;
     } else {
         error_setg(errp, "cprsave: bad mode %s", mode);
         return;
@@ -2973,7 +2975,7 @@ void load_cpr_snapshot(const char *file, Error **errp)
         return;
     }
 
-    ret = qemu_loadvm_state(f, VMS_REBOOT);
+    ret = qemu_loadvm_state(f, VMS_REBOOT | VMS_RESTART);
     qemu_fclose(f);
     if (ret < 0) {
         error_setg(errp, "Error %d while loading VM state", ret);
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index 8c400e6..8a74c6e 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -164,7 +164,7 @@ void qmp_cont(Error **errp)
 
 char *qmp_cprinfo(Error **errp)
 {
-    return g_strdup("reboot");
+    return g_strdup("reboot restart");
 }
 
 void qmp_cprsave(const char *file, const char *mode, Error **errp)
diff --git a/qapi/migration.json b/qapi/migration.json
index 8190b16..d22992b 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1639,6 +1639,7 @@
 #
 # @file: name of checkpoint file
 # @mode: 'reboot' : checkpoint can be cprload'ed after a host kexec reboot.
+#        'restart': checkpoint can be cprload'ed after restarting qemu.
 #
 # Since 5.0
 ##
-- 
1.8.3.1




* [PATCH V1 15/32] vl: QEMU_START_FREEZE env var
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (13 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 14/32] savevm: VMS_RESTART and cprsave restart Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 16/32] oslib: add qemu_clr_cloexec Steve Sistare
                   ` (20 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

For qemu upgrade and restart, we will re-exec() qemu with the same argv.
However, qemu must start in a paused state and wait for the cprload command,
and the original argv might not contain the -S option.  To avoid modifying
argv, provide the QEMU_START_FREEZE environment variable.  If
QEMU_START_FREEZE is set, then set autostart=0, like the -S option.
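
A minimal sketch of the one-shot flag pattern, for illustration only.  The
helper names are hypothetical; the patch itself open-codes the check in
qemu_init().  Note that the variable counts as set even when its value is
empty.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: report whether a one-shot flag variable is present,
 * and remove it so that a later re-exec does not inherit it by accident.
 */
static bool sketch_consume_env_flag(const char *name)
{
    assert(name != NULL && name[0] != '\0');
    if (getenv(name) == NULL) {
        return false;
    }
    unsetenv(name);
    return true;
}

/* Mirrors the hunk in qemu_init(): a set flag forces autostart off. */
static bool sketch_autostart(bool autostart)
{
    return sketch_consume_env_flag("QEMU_START_FREEZE") ? false : autostart;
}
```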

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 softmmu/vl.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/softmmu/vl.c b/softmmu/vl.c
index 951994f..7016e39 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -4501,6 +4501,11 @@ void qemu_init(int argc, char **argv, char **envp)
         exit(0);
     }
 
+    if (getenv("QEMU_START_FREEZE")) {
+        unsetenv("QEMU_START_FREEZE");
+        autostart = 0;
+    }
+
     if (incoming) {
         Error *local_err = NULL;
         qemu_start_incoming_migration(incoming, &local_err);
-- 
1.8.3.1




* [PATCH V1 16/32] oslib: add qemu_clr_cloexec
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (14 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 15/32] vl: QEMU_START_FREEZE env var Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 17/32] util: env var helpers Steve Sistare
                   ` (19 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
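This patch carries no description.  Judging from the diff, the new helper
clears FD_CLOEXEC so that a descriptor stays open across exec().  A sketch
of the same read-modify-write on the descriptor flags, returning an error
instead of asserting; sketch_set_cloexec() is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <fcntl.h>

/*
 * Hypothetical helper: set or clear FD_CLOEXEC on fd while leaving other
 * descriptor flags untouched.  Returns 0 on success, -1 with errno set.
 */
static int sketch_set_cloexec(int fd, bool on)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1) {
        return -1;
    }
    flags = on ? (flags | FD_CLOEXEC) : (flags & ~FD_CLOEXEC);
    return fcntl(fd, F_SETFD, flags) == -1 ? -1 : 0;
}
```
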
 include/qemu/osdep.h | 1 +
 util/oslib-posix.c   | 9 +++++++++
 util/oslib-win32.c   | 4 ++++
 3 files changed, 14 insertions(+)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 45c217a..bb28df1 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -551,6 +551,7 @@ static inline void qemu_timersub(const struct timeval *val1,
 #endif
 
 void qemu_set_cloexec(int fd);
+void qemu_clr_cloexec(int fd);
 
 /* Starting on QEMU 2.5, qemu_hw_version() returns "2.5+" by default
  * instead of QEMU_VERSION, so setting hw_version on MachineClass
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index d923674..28fee45 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -314,6 +314,15 @@ void qemu_set_cloexec(int fd)
     assert(f != -1);
 }
 
+void qemu_clr_cloexec(int fd)
+{
+    int f;
+    f = fcntl(fd, F_GETFD);
+    assert(f != -1);
+    f = fcntl(fd, F_SETFD, f & ~FD_CLOEXEC);
+    assert(f != -1);
+}
+
 /*
  * Creates a pipe with FD_CLOEXEC set on both file descriptors
  */
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index 7eedbe5..e5d0c7c 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -254,6 +254,10 @@ void qemu_set_cloexec(int fd)
 {
 }
 
+void qemu_clr_cloexec(int fd)
+{
+}
+
 /* Offset between 1/1/1601 and 1/1/1970 in 100 nanosec units */
 #define _W32_FT_OFFSET (116444736000000000ULL)
 
-- 
1.8.3.1




* [PATCH V1 17/32] util: env var helpers
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (15 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 16/32] oslib: add qemu_clr_cloexec Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 18/32] osdep: import MADV_DOEXEC Steve Sistare
                   ` (18 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add functions for saving file descriptors and RAM extents in the environment
via setenv, and for reading them back via getenv.
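
To illustrate the round trip, here is a standalone sketch for the fd case.
It uses the QEMU_FD_ prefix from the patch, but the sketch_* functions are
hypothetical and, unlike the patch, they reject over-long names and
malformed values rather than truncating or trusting them.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_FD_PREFIX "QEMU_FD_"   /* same prefix the patch defines */

/* Build "QEMU_FD_<name>" in var; returns 0, or -1 if it does not fit. */
static int sketch_fd_var(char *var, size_t size, const char *name)
{
    int n = snprintf(var, size, "%s%s", SKETCH_FD_PREFIX, name);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Record fd in the environment; returns 0 on success, -1 on failure. */
static int sketch_setenv_fd(const char *name, int fd)
{
    char var[80], val[16];

    assert(fd >= 0);
    if (sketch_fd_var(var, sizeof(var), name)) {
        return -1;
    }
    snprintf(val, sizeof(val), "%d", fd);
    return setenv(var, val, 1);
}

/* Read the fd back; returns -1 if the variable is absent or malformed. */
static int sketch_getenv_fd(const char *name)
{
    char var[80], *val, *end;
    long fd;

    if (sketch_fd_var(var, sizeof(var), name)) {
        return -1;
    }
    val = getenv(var);
    if (!val) {
        return -1;
    }
    fd = strtol(val, &end, 10);
    if (end == val || *end != '\0' || fd < 0 || fd > INT_MAX) {
        return -1;
    }
    return (int)fd;
}
```

Because the environment survives exec(), a value stored this way is visible
to the re-exec'ed process without any change to argv.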

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
---
 MAINTAINERS           |   7 +++
 include/qemu/cutils.h |   1 +
 include/qemu/env.h    |  31 ++++++++++++
 util/Makefile.objs    |   2 +-
 util/env.c            | 132 ++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 172 insertions(+), 1 deletion(-)
 create mode 100644 include/qemu/env.h
 create mode 100644 util/env.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 3395abd..8d377a7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3115,3 +3115,10 @@ Performance Tools and Tests
 M: Ahmed Karaman <ahmedkhaledkaraman@gmail.com>
 S: Maintained
 F: scripts/performance/
+
+Environment variable helpers
+M: Steve Sistare <steven.sistare@oracle.com>
+M: Mark Kanda <mark.kanda@oracle.com>
+S: Maintained
+F: include/qemu/env.h
+F: util/env.c
diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
index eb59852..d4c7d70 100644
--- a/include/qemu/cutils.h
+++ b/include/qemu/cutils.h
@@ -1,6 +1,7 @@
 #ifndef QEMU_CUTILS_H
 #define QEMU_CUTILS_H
 
+#include "qemu/env.h"
 /**
  * pstrcpy:
  * @buf: buffer to copy string into
diff --git a/include/qemu/env.h b/include/qemu/env.h
new file mode 100644
index 0000000..53cc121
--- /dev/null
+++ b/include/qemu/env.h
@@ -0,0 +1,31 @@
+/*
+ * Copyright (c) 2020 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_ENV_H
+#define QEMU_ENV_H
+
+#define FD_PREFIX "QEMU_FD_"
+#define ADDR_PREFIX "QEMU_ADDR_"
+#define LEN_PREFIX "QEMU_LEN_"
+#define BOOL_PREFIX "QEMU_BOOL_"
+
+typedef int (*walkenv_cb)(const char *name, const char *val, void *handle);
+
+bool getenv_ram(const char *name, void **addrp, size_t *lenp);
+void setenv_ram(const char *name, void *addr, size_t len);
+void unsetenv_ram(const char *name);
+int getenv_fd(const char *name);
+void setenv_fd(const char *name, int fd);
+void unsetenv_fd(const char *name);
+bool getenv_bool(const char *name);
+void setenv_bool(const char *name, bool val);
+void unsetenv_bool(const char *name);
+int walkenv(const char *prefix, walkenv_cb cb, void *handle);
+void printenv(void);
+
+#endif
diff --git a/util/Makefile.objs b/util/Makefile.objs
index cc5e371..d357932 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -1,4 +1,4 @@
-util-obj-y = osdep.o cutils.o unicode.o qemu-timer-common.o
+util-obj-y = osdep.o cutils.o unicode.o qemu-timer-common.o env.o
 util-obj-$(call lnot,$(CONFIG_ATOMIC64)) += atomic64.o
 util-obj-$(CONFIG_POSIX) += aio-posix.o
 util-obj-$(CONFIG_POSIX) += fdmon-poll.o
diff --git a/util/env.c b/util/env.c
new file mode 100644
index 0000000..0cc4a9f
--- /dev/null
+++ b/util/env.c
@@ -0,0 +1,132 @@
+/*
+ * Copyright (c) 2020 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/env.h"
+
+static uint64_t getenv_ulong(const char *prefix, const char *name, bool *found)
+{
+    char var[80], *val;
+    uint64_t res;
+
+    snprintf(var, sizeof(var), "%s%s", prefix, name);
+    val = getenv(var);
+    if (val) {
+        *found = true;
+        res = strtoull(val, NULL, 10);
+    } else {
+        *found = false;
+        res = 0;
+    }
+    return res;
+}
+
+static void setenv_ulong(const char *prefix, const char *name, uint64_t val)
+{
+    char var[80], val_str[80];
+    snprintf(var, sizeof(var), "%s%s", prefix, name);
+    snprintf(val_str, sizeof(val_str), "%"PRIu64, val);
+    setenv(var, val_str, 1);
+}
+
+static void unsetenv_ulong(const char *prefix, const char *name)
+{
+    char var[80];
+    snprintf(var, sizeof(var), "%s%s", prefix, name);
+    unsetenv(var);
+}
+
+bool getenv_ram(const char *name, void **addrp, size_t *lenp)
+{
+    bool found1, found2;
+    *addrp = (void *) getenv_ulong(ADDR_PREFIX, name, &found1);
+    *lenp = getenv_ulong(LEN_PREFIX, name, &found2);
+    assert(found1 == found2);
+    return found1;
+}
+
+void setenv_ram(const char *name, void *addr, size_t len)
+{
+    setenv_ulong(ADDR_PREFIX, name, (uint64_t)addr);
+    setenv_ulong(LEN_PREFIX, name, len);
+}
+
+void unsetenv_ram(const char *name)
+{
+    unsetenv_ulong(ADDR_PREFIX, name);
+    unsetenv_ulong(LEN_PREFIX, name);
+}
+
+int getenv_fd(const char *name)
+{
+    bool found;
+    int fd = getenv_ulong(FD_PREFIX, name, &found);
+    if (!found) {
+        fd = -1;
+    }
+    return fd;
+}
+
+void setenv_fd(const char *name, int fd)
+{
+    setenv_ulong(FD_PREFIX, name, fd);
+}
+
+void unsetenv_fd(const char *name)
+{
+    unsetenv_ulong(FD_PREFIX, name);
+}
+
+bool getenv_bool(const char *name)
+{
+    bool found;
+    bool val = getenv_ulong(BOOL_PREFIX, name, &found);
+    if (!found) {
+        val = false;
+    }
+    return val;
+}
+
+void setenv_bool(const char *name, bool val)
+{
+    setenv_ulong(BOOL_PREFIX, name, val);
+}
+
+void unsetenv_bool(const char *name)
+{
+    unsetenv_ulong(BOOL_PREFIX, name);
+}
+
+int walkenv(const char *prefix, walkenv_cb cb, void *handle)
+{
+    char *str, name[128];
+    char **envp = environ;
+    size_t prefix_len = strlen(prefix);
+
+    while (*envp) {
+        str = *envp++;
+        if (!strncmp(str, prefix, prefix_len)) {
+            char *val = strchr(str, '=');
+            str += prefix_len;
+            strncpy(name, str, val - str);
+            name[val - str] = 0;
+            if (cb(name, val + 1, handle)) {
+                return 1;
+            }
+        }
+    }
+    return 0;
+}
+
+void printenv(void)
+{
+    char **ptr = environ;
+    while (*ptr) {
+        puts(*ptr++);
+    }
+}
-- 
1.8.3.1




* [PATCH V1 18/32] osdep: import MADV_DOEXEC
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (16 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 17/32] util: env var helpers Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 19/32] memory: ram_block_add cosmetic changes Steve Sistare
                   ` (17 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Anonymous memory segments used by the guest are preserved across a re-exec
of qemu, mapped at the same VA, via a proposed madvise(MADV_DOEXEC) option
in the Linux kernel. For the madvise patches, see:

https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yznaga@oracle.com/
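
The fallback pattern used in osdep.h can be sketched in isolation as
follows.  The SKETCH_* names and sketch_madvise() are hypothetical
stand-ins for QEMU_MADV_* and qemu_madvise(); the assumption is that an
advice value the host headers do not define is mapped to an invalid marker
and rejected with EINVAL before the system call is reached.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#define SKETCH_MADV_INVALID (-1)

/* MADV_DOEXEC exists only with the proposed kernel patches applied. */
#ifdef MADV_DOEXEC
#define SKETCH_MADV_DOEXEC MADV_DOEXEC
#else
#define SKETCH_MADV_DOEXEC SKETCH_MADV_INVALID
#endif

/* Reject unsupported advice up front; otherwise defer to madvise(). */
static int sketch_madvise(void *addr, size_t len, int advice)
{
    if (advice == SKETCH_MADV_INVALID) {
        errno = EINVAL;
        return -1;
    }
    assert(addr != NULL || len == 0);
    return madvise(addr, len, advice);
}
```

A caller can therefore pass SKETCH_MADV_DOEXEC unconditionally and treat
failure as "not supported on this host".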

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 include/qemu/osdep.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index bb28df1..7ce555a 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -390,6 +390,11 @@ void qemu_anon_ram_free(void *ptr, size_t size);
 #else
 #define QEMU_MADV_REMOVE QEMU_MADV_INVALID
 #endif
+#ifdef MADV_DOEXEC
+#define QEMU_MADV_DOEXEC MADV_DOEXEC
+#else
+#define QEMU_MADV_DOEXEC QEMU_MADV_INVALID
+#endif
 
 #elif defined(CONFIG_POSIX_MADVISE)
 
@@ -403,6 +408,7 @@ void qemu_anon_ram_free(void *ptr, size_t size);
 #define QEMU_MADV_HUGEPAGE  QEMU_MADV_INVALID
 #define QEMU_MADV_NOHUGEPAGE  QEMU_MADV_INVALID
 #define QEMU_MADV_REMOVE QEMU_MADV_INVALID
+#define QEMU_MADV_DOEXEC  QEMU_MADV_INVALID
 
 #else /* no-op */
 
@@ -416,6 +422,7 @@ void qemu_anon_ram_free(void *ptr, size_t size);
 #define QEMU_MADV_HUGEPAGE  QEMU_MADV_INVALID
 #define QEMU_MADV_NOHUGEPAGE  QEMU_MADV_INVALID
 #define QEMU_MADV_REMOVE QEMU_MADV_INVALID
+#define QEMU_MADV_DOEXEC  QEMU_MADV_INVALID
 
 #endif
 
-- 
1.8.3.1




* [PATCH V1 19/32] memory: ram_block_add cosmetic changes
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (17 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 18/32] osdep: import MADV_DOEXEC Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 20/32] vl: add helper to request re-exec Steve Sistare
                   ` (16 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Massage the code to simplify the later patch "exec, memory: exec(3) to
restart".

No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 exec.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/exec.c b/exec.c
index 02160e0..359e437 100644
--- a/exec.c
+++ b/exec.c
@@ -2233,32 +2233,37 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
     RAMBlock *last_block = NULL;
     ram_addr_t old_ram_size, new_ram_size;
     Error *err = NULL;
+    const char *name;
+    void *addr;
+    size_t maxlen;
 
     old_ram_size = last_ram_page();
 
     qemu_mutex_lock_ramlist();
-    new_block->offset = find_ram_offset(new_block->max_length);
+    maxlen = new_block->max_length;
+    new_block->offset = find_ram_offset(maxlen);
 
     if (!new_block->host) {
         if (xen_enabled()) {
-            xen_ram_alloc(new_block->offset, new_block->max_length,
-                          new_block->mr, &err);
+            xen_ram_alloc(new_block->offset, maxlen, new_block->mr, &err);
             if (err) {
                 error_propagate(errp, err);
                 qemu_mutex_unlock_ramlist();
                 return;
             }
         } else {
-            new_block->host = phys_mem_alloc(new_block->max_length,
-                                             &new_block->mr->align, shared);
-            if (!new_block->host) {
+            name = memory_region_name(new_block->mr);
+            addr = phys_mem_alloc(maxlen, &new_block->mr->align, shared);
+
+            if (!addr) {
                 error_setg_errno(errp, errno,
                                  "cannot set up guest memory '%s'",
-                                 memory_region_name(new_block->mr));
+                                 name);
                 qemu_mutex_unlock_ramlist();
                 return;
             }
-            memory_try_enable_merging(new_block->host, new_block->max_length);
+            memory_try_enable_merging(addr, maxlen);
+            new_block->host = addr;
         }
     }
 
-- 
1.8.3.1




* [PATCH V1 20/32] vl: add helper to request re-exec
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (18 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 19/32] memory: ram_block_add cosmetic changes Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 21/32] exec, memory: exec(3) to restart Steve Sistare
                   ` (15 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add qemu_system_exec_request(), which sets a flag that the main loop tests
via qemu_exec_requested(), causing it to exit and re-exec qemu using the same
initial arguments.  If /usr/bin/qemu-exec exists and is executable, exec that
instead.  This is an optional site-specific trampoline that may alter the
environment before exec'ing the qemu binary.
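
The target selection can be sketched on its own as below.  Both function
names are hypothetical, and the helper path is passed in as a parameter
here purely so the logic can be exercised; the patch hard-codes it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Prefer the trampoline when it exists and is executable. */
static const char *sketch_pick_exec_target(const char *helper,
                                           const char *argv0)
{
    assert(argv0 != NULL);
    return (helper && access(helper, X_OK) == 0) ? helper : argv0;
}

/*
 * Re-exec with the original argument vector.  Returns only if execvp()
 * failed, in which case the result is -1 and errno holds the reason.
 */
static int sketch_reexec(const char *helper, char **argv)
{
    const char *bin = sketch_pick_exec_target(helper, argv[0]);

    execvp(bin, argv);
    return -1;
}
```

Since argv is passed through unchanged, a trampoline still sees the
original argv[0] and can locate the real binary from it.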

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 include/sysemu/sysemu.h |  1 +
 softmmu/vl.c            | 30 ++++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 5360da5..4dfc4ca 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -15,6 +15,7 @@ extern QemuUUID qemu_uuid;
 extern bool qemu_uuid_set;
 
 void qemu_add_data_dir(const char *path);
+void qemu_system_exec_request(void);
 
 void qemu_add_exit_notifier(Notifier *notify);
 void qemu_remove_exit_notifier(Notifier *notify);
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 7016e39..72f0e08 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -116,6 +116,7 @@
 
 #define MAX_VIRTIO_CONSOLES 1
 
+static char **argv_main;
 static const char *data_dir[16];
 static int data_dir_idx;
 const char *bios_name = NULL;
@@ -1296,6 +1297,7 @@ static ShutdownCause reset_requested;
 static ShutdownCause shutdown_requested;
 static int shutdown_signal;
 static pid_t shutdown_pid;
+static int exec_requested;
 static int powerdown_requested;
 static int debug_requested;
 static int suspend_requested;
@@ -1326,6 +1328,11 @@ static int qemu_shutdown_requested(void)
     return atomic_xchg(&shutdown_requested, SHUTDOWN_CAUSE_NONE);
 }
 
+static int qemu_exec_requested(void)
+{
+    return atomic_xchg(&exec_requested, 0);
+}
+
 static void qemu_kill_report(void)
 {
     if (!qtest_driver() && shutdown_signal) {
@@ -1582,6 +1589,13 @@ void qemu_system_shutdown_request(ShutdownCause reason)
     qemu_notify_event();
 }
 
+void qemu_system_exec_request(void)
+{
+    shutdown_requested = 1;
+    exec_requested = 1;
+    qemu_notify_event();
+}
+
 static void qemu_system_powerdown(void)
 {
     qapi_event_send_powerdown();
@@ -1617,6 +1631,16 @@ void qemu_system_debug_request(void)
     qemu_notify_event();
 }
 
+static void qemu_exec(void)
+{
+    const char *helper = "/usr/bin/qemu-exec";
+    const char *bin = !access(helper, X_OK) ? helper : argv_main[0];
+
+    execvp(bin, argv_main);
+    error_report("execvp failed, errno %d.", errno);
+    exit(1);
+}
+
 static bool main_loop_should_exit(void)
 {
     RunState r;
@@ -1637,6 +1661,11 @@ static bool main_loop_should_exit(void)
     }
     request = qemu_shutdown_requested();
     if (request) {
+
+        if (qemu_exec_requested()) {
+            qemu_exec();
+            /* not reached */
+        }
         qemu_kill_report();
         qemu_system_shutdown(request);
         if (no_shutdown) {
@@ -2891,6 +2920,7 @@ void qemu_init(int argc, char **argv, char **envp)
 
     os_set_line_buffering();
 
+    argv_main = argv;
     error_init(argv[0]);
     module_call_init(MODULE_INIT_TRACE);
 
-- 
1.8.3.1




* [PATCH V1 21/32] exec, memory: exec(3) to restart
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (19 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 20/32] vl: add helper to request re-exec Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 22/32] char: qio_channel_socket_accept reuse fd Steve Sistare
                   ` (14 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Use exec() to restart qemu to a potentially new version, while preserving
guest RAM.  The guest pauses briefly.

cprsave saves the address and length of RAM blocks to the environment via
setenv, tags the RAM with the new madvise(MADV_DOEXEC) option to preserve
it across exec, then exec()'s the (typically updated) qemu binary with the
original argv.

On qemu restart, ram_block_add() finds the env vars that describe preserved
RAM segments and does not reallocate them.
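
The lookup-before-allocate step can be sketched as below.  This is a
single-process illustration under loud assumptions: malloc() stands in for
phys_mem_alloc(), the sketch_* names are hypothetical, and the second
lookup plays the role of the re-exec'ed qemu.  In the real flow the
recorded address is only meaningful because MADV_DOEXEC kept the mapping
alive across exec.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static void sketch_set_u64(const char *prefix, const char *name, uint64_t v)
{
    char var[80], val[32];

    snprintf(var, sizeof(var), "%s%s", prefix, name);
    snprintf(val, sizeof(val), "%" PRIu64, v);
    setenv(var, val, 1);
}

static int sketch_get_u64(const char *prefix, const char *name, uint64_t *out)
{
    char var[80];
    const char *val;

    snprintf(var, sizeof(var), "%s%s", prefix, name);
    val = getenv(var);
    if (!val) {
        return 0;
    }
    *out = strtoull(val, NULL, 10);
    return 1;
}

/* Reuse a recorded block if the environment describes one, else allocate. */
static void *sketch_ram_block_host(const char *name, size_t maxlen)
{
    uint64_t addr, len;
    void *host;

    if (sketch_get_u64("QEMU_ADDR_", name, &addr) &&
        sketch_get_u64("QEMU_LEN_", name, &len)) {
        assert(len == maxlen);          /* layout must match the old qemu */
        return (void *)(uintptr_t)addr;
    }
    host = malloc(maxlen);              /* stand-in for phys_mem_alloc() */
    if (host) {
        sketch_set_u64("QEMU_ADDR_", name, (uint64_t)(uintptr_t)host);
        sketch_set_u64("QEMU_LEN_", name, maxlen);
    }
    return host;
}
```

Unlike the hunk in ram_block_add(), this sketch records the extent only
after a successful allocation.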

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 exec.c                | 36 ++++++++++++++++++++++++++++++++++--
 include/exec/memory.h |  2 ++
 migration/savevm.c    | 16 ++++++++++++++++
 3 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/exec.c b/exec.c
index 359e437..5473c09 100644
--- a/exec.c
+++ b/exec.c
@@ -2235,7 +2235,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
     Error *err = NULL;
     const char *name;
     void *addr;
-    size_t maxlen;
+    size_t len, maxlen;
 
     old_ram_size = last_ram_page();
 
@@ -2253,7 +2253,12 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
             }
         } else {
             name = memory_region_name(new_block->mr);
-            addr = phys_mem_alloc(maxlen, &new_block->mr->align, shared);
+            if (getenv_ram(name, &addr, &len)) {
+                assert(len == maxlen);
+            } else {
+                addr = phys_mem_alloc(maxlen, &new_block->mr->align, shared);
+                setenv_ram(name, addr, maxlen);
+            }
 
             if (!addr) {
                 error_setg_errno(errp, errno,
@@ -2499,6 +2504,8 @@ void qemu_ram_free(RAMBlock *block)
         return;
     }
 
+    unsetenv_ram(memory_region_name(block->mr));
+
     if (block->host) {
         ram_block_notify_remove(block->host, block->max_length);
     }
@@ -2763,6 +2770,31 @@ bool qemu_ram_volatile(Error **errp)
     return ret;
 }
 
+static int preserve_ram(const char *name, const char *val, void *handle)
+{
+    void *addr;
+    size_t len;
+    Error **errp = handle;
+
+    getenv_ram(name, &addr, &len);
+    if (qemu_madvise(addr, len, QEMU_MADV_DOEXEC)) {
+        error_setg_errno(errp, errno,
+                         "MADV_DOEXEC failed on memory region %s", name);
+        return 1;
+    }
+    return 0;
+}
+
+
+int qemu_preserve_ram(Error **errp)
+{
+    int ret;
+    qemu_mutex_lock_ramlist();
+    ret = walkenv(ADDR_PREFIX, preserve_ram, errp);
+    qemu_mutex_unlock_ramlist();
+    return ret;
+}
+
 /* Generate a debug exception if a watchpoint has been hit.  */
 void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
                           MemTxAttrs attrs, int flags, uintptr_t ra)
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 6aafbb0..e2d297d 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2521,6 +2521,8 @@ bool ram_block_discard_is_required(void);
 
 bool qemu_ram_volatile(Error **errp);
 
+int qemu_preserve_ram(Error **errp);
+
 #endif
 
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index 38cc63a..2902006 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2719,6 +2719,16 @@ void save_cpr_snapshot(const char *file, const char *mode, Error **errp)
         return;
     }
 
+    if (op == VMS_RESTART && QEMU_MADV_DOEXEC == QEMU_MADV_INVALID) {
+        error_setg(errp, "kernel does not support MADV_DOEXEC.");
+        return;
+    }
+
+    if (op == VMS_RESTART && xen_enabled()) {
+        error_setg(errp, "xen does not support cprsave restart");
+        return;
+    }
+
     f = qf_file_open(file, O_CREAT | O_WRONLY | O_TRUNC, 0600, errp);
     if (!f) {
         return;
@@ -2747,6 +2757,12 @@ void save_cpr_snapshot(const char *file, const char *mode, Error **errp)
     if (op == VMS_REBOOT) {
         no_shutdown = 0;
         qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+    } else if (op == VMS_RESTART) {
+        if (qemu_preserve_ram(errp)) {
+            return;
+        }
+        qemu_system_exec_request();
+        putenv((char *)"QEMU_START_FREEZE=");
     }
 }
 
-- 
1.8.3.1




* [PATCH V1 22/32] char: qio_channel_socket_accept reuse fd
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (20 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 21/32] exec, memory: exec(3) to restart Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 23/32] char: save/restore chardev socket fds Steve Sistare
                   ` (13 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

From: Mark Kanda <mark.kanda@oracle.com>

Add an fd argument to qio_channel_socket_accept.  If not -1, the channel
uses that fd instead of accepting a new socket connection.  All callers
pass -1 in this patch, so no functional change.
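
Stripped of the QIOChannel object handling, the control flow amounts to the
sketch below.  sketch_accept_or_reuse() is a hypothetical name and works on
raw descriptors rather than on channel objects.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Return a connected socket: the caller-supplied one if reuse_fd is not
 * -1, otherwise a freshly accepted connection from listen_fd.  Returns -1
 * with errno set if accept() fails for a reason other than EINTR.
 */
static int sketch_accept_or_reuse(int listen_fd, int reuse_fd)
{
    int fd;

    assert(reuse_fd >= -1);
    if (reuse_fd != -1) {
        return reuse_fd;            /* inherited connection, skip accept() */
    }
    do {
        fd = accept(listen_fd, NULL, NULL);
    } while (fd < 0 && errno == EINTR);
    return fd;
}
```

Note that a reused descriptor bypasses accept(), so no peer address is
filled in on that path.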

Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 include/io/channel-socket.h    |  3 ++-
 io/channel-socket.c            | 12 +++++++++---
 io/net-listener.c              |  4 ++--
 scsi/qemu-pr-helper.c          |  2 +-
 tests/qtest/tpm-emu.c          |  2 +-
 tests/test-char.c              |  2 +-
 tests/test-io-channel-socket.c |  4 ++--
 7 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
index 777ff59..0ffc560 100644
--- a/include/io/channel-socket.h
+++ b/include/io/channel-socket.h
@@ -248,6 +248,7 @@ qio_channel_socket_get_remote_address(QIOChannelSocket *ioc,
 /**
  * qio_channel_socket_accept:
  * @ioc: the socket channel object
+ * @reuse_fd: fd to reuse; -1 otherwise
  * @errp: pointer to a NULL-initialized error object
  *
  * If the socket represents a server, then this accepts
@@ -258,7 +259,7 @@ qio_channel_socket_get_remote_address(QIOChannelSocket *ioc,
  */
 QIOChannelSocket *
 qio_channel_socket_accept(QIOChannelSocket *ioc,
-                          Error **errp);
+                          int reuse_fd, Error **errp);
 
 
 #endif /* QIO_CHANNEL_SOCKET_H */
diff --git a/io/channel-socket.c b/io/channel-socket.c
index e1b4667..dde12bf 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -352,7 +352,7 @@ void qio_channel_socket_dgram_async(QIOChannelSocket *ioc,
 
 QIOChannelSocket *
 qio_channel_socket_accept(QIOChannelSocket *ioc,
-                          Error **errp)
+                          int reuse_fd, Error **errp)
 {
     QIOChannelSocket *cioc;
 
@@ -362,8 +362,14 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
 
  retry:
     trace_qio_channel_socket_accept(ioc);
-    cioc->fd = qemu_accept(ioc->fd, (struct sockaddr *)&cioc->remoteAddr,
-                           &cioc->remoteAddrLen);
+
+    if (reuse_fd != -1) {
+        cioc->fd = reuse_fd;
+    } else {
+        cioc->fd = qemu_accept(ioc->fd, (struct sockaddr *)&cioc->remoteAddr,
+                               &cioc->remoteAddrLen);
+    }
+
     if (cioc->fd < 0) {
         if (errno == EINTR) {
             goto retry;
diff --git a/io/net-listener.c b/io/net-listener.c
index 5d8a226..bbdea1e 100644
--- a/io/net-listener.c
+++ b/io/net-listener.c
@@ -45,7 +45,7 @@ static gboolean qio_net_listener_channel_func(QIOChannel *ioc,
     QIOChannelSocket *sioc;
 
     sioc = qio_channel_socket_accept(QIO_CHANNEL_SOCKET(ioc),
-                                     NULL);
+                                     -1, NULL);
     if (!sioc) {
         return TRUE;
     }
@@ -194,7 +194,7 @@ static gboolean qio_net_listener_wait_client_func(QIOChannel *ioc,
     QIOChannelSocket *sioc;
 
     sioc = qio_channel_socket_accept(QIO_CHANNEL_SOCKET(ioc),
-                                     NULL);
+                                     -1, NULL);
     if (!sioc) {
         return TRUE;
     }
diff --git a/scsi/qemu-pr-helper.c b/scsi/qemu-pr-helper.c
index 57ad830..0e6d683 100644
--- a/scsi/qemu-pr-helper.c
+++ b/scsi/qemu-pr-helper.c
@@ -800,7 +800,7 @@ static gboolean accept_client(QIOChannel *ioc, GIOCondition cond, gpointer opaqu
     PRHelperClient *prh;
 
     cioc = qio_channel_socket_accept(QIO_CHANNEL_SOCKET(ioc),
-                                     NULL);
+                                     -1, NULL);
     if (!cioc) {
         return TRUE;
     }
diff --git a/tests/qtest/tpm-emu.c b/tests/qtest/tpm-emu.c
index 2e8eb7b..19e5dab 100644
--- a/tests/qtest/tpm-emu.c
+++ b/tests/qtest/tpm-emu.c
@@ -83,7 +83,7 @@ void *tpm_emu_ctrl_thread(void *data)
     g_cond_signal(&s->data_cond);
 
     qio_channel_wait(QIO_CHANNEL(lioc), G_IO_IN);
-    ioc = QIO_CHANNEL(qio_channel_socket_accept(lioc, &error_abort));
+    ioc = QIO_CHANNEL(qio_channel_socket_accept(lioc, -1, &error_abort));
     g_assert(ioc);
 
     {
diff --git a/tests/test-char.c b/tests/test-char.c
index 614bdac..1bb6ae0 100644
--- a/tests/test-char.c
+++ b/tests/test-char.c
@@ -884,7 +884,7 @@ char_socket_client_server_thread(gpointer data)
     QIOChannelSocket *cioc;
 
 retry:
-    cioc = qio_channel_socket_accept(ioc, &error_abort);
+    cioc = qio_channel_socket_accept(ioc, -1, &error_abort);
     g_assert_nonnull(cioc);
 
     if (char_socket_ping_pong(QIO_CHANNEL(cioc), NULL) != 0) {
diff --git a/tests/test-io-channel-socket.c b/tests/test-io-channel-socket.c
index d43083a..0d410cf 100644
--- a/tests/test-io-channel-socket.c
+++ b/tests/test-io-channel-socket.c
@@ -75,7 +75,7 @@ static void test_io_channel_setup_sync(SocketAddress *listen_addr,
     qio_channel_set_delay(*src, false);
 
     qio_channel_wait(QIO_CHANNEL(lioc), G_IO_IN);
-    *dst = QIO_CHANNEL(qio_channel_socket_accept(lioc, &error_abort));
+    *dst = QIO_CHANNEL(qio_channel_socket_accept(lioc, -1, &error_abort));
     g_assert(*dst);
 
     test_io_channel_set_socket_bufs(*src, *dst);
@@ -143,7 +143,7 @@ static void test_io_channel_setup_async(SocketAddress *listen_addr,
     g_assert(!data.err);
 
     qio_channel_wait(QIO_CHANNEL(lioc), G_IO_IN);
-    *dst = QIO_CHANNEL(qio_channel_socket_accept(lioc, &error_abort));
+    *dst = QIO_CHANNEL(qio_channel_socket_accept(lioc, -1, &error_abort));
     g_assert(*dst);
 
     qio_channel_set_delay(*src, false);
-- 
1.8.3.1




* [PATCH V1 23/32] char: save/restore chardev socket fds
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (21 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 22/32] char: qio_channel_socket_accept reuse fd Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 24/32] ui: save/restore vnc " Steve Sistare
                   ` (12 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

From: Mark Kanda <mark.kanda@oracle.com>

Iterate through the character devices and save/restore the socket fds.

Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 chardev/char-socket.c   | 35 +++++++++++++++++++++++++++++++++++
 chardev/char.c          | 14 ++++++++++++++
 include/chardev/char.h  |  5 +++++
 include/sysemu/sysemu.h |  1 +
 migration/savevm.c      |  8 ++++++++
 5 files changed, 63 insertions(+)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index ef62dbf..e08e7e1 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -36,6 +36,8 @@
 #include "qapi/qapi-visit-sockets.h"
 
 #include "chardev/char-io.h"
+#include "sysemu/sysemu.h"
+#include "qemu/cutils.h"
 
 /***********************************************************/
 /* TCP Net console */
@@ -400,6 +402,7 @@ static void tcp_chr_free_connection(Chardev *chr)
     SocketChardev *s = SOCKET_CHARDEV(chr);
     int i;
 
+    unsetenv_fd(chr->label);
     if (s->read_msgfds_num) {
         for (i = 0; i < s->read_msgfds_num; i++) {
             close(s->read_msgfds[i]);
@@ -1375,6 +1378,9 @@ static void qmp_chardev_open_socket(Chardev *chr,
             return;
         }
     }
+
+    load_char_socket_fd(chr);
+
 }
 
 static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend,
@@ -1517,3 +1523,32 @@ static void register_types(void)
 }
 
 type_init(register_types);
+
+void save_char_socket_fd(Chardev *chr)
+{
+    SocketChardev *sockchar = SOCKET_CHARDEV(chr);
+
+    if (sockchar->sioc) {
+        setenv_fd(chr->label, sockchar->sioc->fd);
+    }
+}
+
+void load_char_socket_fd(Chardev *chr)
+{
+    SocketChardev *sockchar;
+    QIOChannelSocket *sioc;
+
+    int fd = getenv_fd(chr->label);
+
+    if (fd != -1) {
+        unsetenv_fd(chr->label);
+        sockchar = SOCKET_CHARDEV(chr);
+        sioc = qio_channel_socket_accept(*sockchar->listener->sioc, fd, NULL);
+        if (sioc) {
+            tcp_chr_accept(sockchar->listener, sioc, chr);
+        } else {
+            error_printf("error: could not restore socket for %s\n",
+                         chr->label);
+        }
+    }
+}
diff --git a/chardev/char.c b/chardev/char.c
index 77e7ec8..8fd54cc 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -34,6 +34,7 @@
 #include "qapi/qapi-commands-char.h"
 #include "qapi/qmp/qerror.h"
 #include "sysemu/replay.h"
+#include "sysemu/sysemu.h"
 #include "qemu/help_option.h"
 #include "qemu/module.h"
 #include "qemu/option.h"
@@ -1174,3 +1175,16 @@ static void register_types(void)
 }
 
 type_init(register_types);
+
+static int chardev_is_socket(Object *child, void *opaque)
+{
+    if (CHARDEV_IS_SOCKET(child)) {
+        save_char_socket_fd((Chardev *) child);
+    }
+    return 0;
+}
+
+void save_chardev_fds(void)
+{
+    object_child_foreach(get_chardevs_root(), chardev_is_socket, NULL);
+}
diff --git a/include/chardev/char.h b/include/chardev/char.h
index 00589a6..80a9cf8 100644
--- a/include/chardev/char.h
+++ b/include/chardev/char.h
@@ -250,6 +250,8 @@ int qemu_chr_wait_connected(Chardev *chr, Error **errp);
     object_dynamic_cast(OBJECT(chr), TYPE_CHARDEV_RINGBUF)
 #define CHARDEV_IS_PTY(chr) \
     object_dynamic_cast(OBJECT(chr), TYPE_CHARDEV_PTY)
+#define CHARDEV_IS_SOCKET(chr) \
+    object_dynamic_cast(OBJECT(chr), TYPE_CHARDEV_SOCKET)
 
 typedef struct ChardevClass {
     ObjectClass parent_class;
@@ -290,4 +292,7 @@ GSource *qemu_chr_timeout_add_ms(Chardev *chr, guint ms,
 /* console.c */
 void qemu_chr_parse_vc(QemuOpts *opts, ChardevBackend *backend, Error **errp);
 
+void save_char_socket_fd(Chardev *);
+void load_char_socket_fd(Chardev *);
+
 #endif
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 4dfc4ca..fa1a5c3 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -27,6 +27,7 @@ void qemu_remove_machine_init_done_notifier(Notifier *notify);
 
 void save_cpr_snapshot(const char *file, const char *mode, Error **errp);
 void load_cpr_snapshot(const char *file, Error **errp);
+void save_chardev_fds(void);
 
 extern int autostart;
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 2902006..81f38c4 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2700,6 +2700,12 @@ static QEMUFile *qf_file_open(const char *filename, int flags, int mode,
     return qemu_fopen_channel_input(ioc);
 }
 
+static int preserve_fd(const char *name, const char *val, void *handle)
+{
+    qemu_clr_cloexec(atoi(val));
+    return 0;
+}
+
 void save_cpr_snapshot(const char *file, const char *mode, Error **errp)
 {
     int ret = 0;
@@ -2761,6 +2767,8 @@ void save_cpr_snapshot(const char *file, const char *mode, Error **errp)
         if (qemu_preserve_ram(errp)) {
             return;
         }
+        save_chardev_fds();
+        walkenv(FD_PREFIX, preserve_fd, 0);
         qemu_system_exec_request();
         putenv((char *)"QEMU_START_FREEZE=");
     }
-- 
1.8.3.1




* [PATCH V1 24/32] ui: save/restore vnc socket fds
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (22 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 23/32] char: save/restore chardev socket fds Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-31  9:06   ` Daniel P. Berrangé
  2020-07-30 15:14 ` [PATCH V1 25/32] char: save/restore chardev pty fds Steve Sistare
                   ` (11 subsequent siblings)
  35 siblings, 1 reply; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

From: Mark Kanda <mark.kanda@oracle.com>

Iterate through the VNC displays and save/restore the socket fds.

Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 include/sysemu/sysemu.h |   2 +
 migration/savevm.c      |   3 +
 ui/vnc.c                | 153 +++++++++++++++++++++++++++++++++++++++---------
 3 files changed, 130 insertions(+), 28 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index fa1a5c3..3e7bfee 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -28,6 +28,8 @@ void qemu_remove_machine_init_done_notifier(Notifier *notify);
 void save_cpr_snapshot(const char *file, const char *mode, Error **errp);
 void load_cpr_snapshot(const char *file, Error **errp);
 void save_chardev_fds(void);
+void save_vnc_fds(void);
+void load_vnc_fds(void);
 
 extern int autostart;
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 81f38c4..35fafb7 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2768,6 +2768,7 @@ void save_cpr_snapshot(const char *file, const char *mode, Error **errp)
             return;
         }
         save_chardev_fds();
+        save_vnc_fds();
         walkenv(FD_PREFIX, preserve_fd, 0);
         qemu_system_exec_request();
         putenv((char *)"QEMU_START_FREEZE=");
@@ -3015,6 +3016,8 @@ void load_cpr_snapshot(const char *file, Error **errp)
             start_on_wake = 1;
         }
     }
+
+    load_vnc_fds();
 }
 
 int load_snapshot(const char *name, Error **errp)
diff --git a/ui/vnc.c b/ui/vnc.c
index f006aa1..947ddf5 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -50,6 +50,7 @@
 #include "qom/object_interfaces.h"
 #include "qemu/cutils.h"
 #include "io/dns-resolver.h"
+#include "sysemu/sysemu.h"
 
 #define VNC_REFRESH_INTERVAL_BASE GUI_REFRESH_INTERVAL_DEFAULT
 #define VNC_REFRESH_INTERVAL_INC  50
@@ -2214,28 +2215,34 @@ static void set_pixel_format(VncState *vs, int bits_per_pixel,
     graphic_hw_update(vs->vd->dcl.con);
 }
 
-static void pixel_format_message (VncState *vs) {
+/*
+ * reuse - true if we are using an existing (already initialized)
+ * connection to a vnc client
+ */
+static void pixel_format_message(VncState *vs, bool reuse)
+{
     char pad[3] = { 0, 0, 0 };
 
     vs->client_pf = qemu_default_pixelformat(32);
 
-    vnc_write_u8(vs, vs->client_pf.bits_per_pixel); /* bits-per-pixel */
-    vnc_write_u8(vs, vs->client_pf.depth); /* depth */
+    if (!reuse) {
+        vnc_write_u8(vs, vs->client_pf.bits_per_pixel); /* bits-per-pixel */
+        vnc_write_u8(vs, vs->client_pf.depth); /* depth */
 
 #ifdef HOST_WORDS_BIGENDIAN
-    vnc_write_u8(vs, 1);             /* big-endian-flag */
+        vnc_write_u8(vs, 1);             /* big-endian-flag */
 #else
-    vnc_write_u8(vs, 0);             /* big-endian-flag */
+        vnc_write_u8(vs, 0);             /* big-endian-flag */
 #endif
-    vnc_write_u8(vs, 1);             /* true-color-flag */
-    vnc_write_u16(vs, vs->client_pf.rmax);     /* red-max */
-    vnc_write_u16(vs, vs->client_pf.gmax);     /* green-max */
-    vnc_write_u16(vs, vs->client_pf.bmax);     /* blue-max */
-    vnc_write_u8(vs, vs->client_pf.rshift);    /* red-shift */
-    vnc_write_u8(vs, vs->client_pf.gshift);    /* green-shift */
-    vnc_write_u8(vs, vs->client_pf.bshift);    /* blue-shift */
-    vnc_write(vs, pad, 3);           /* padding */
-
+        vnc_write_u8(vs, 1);             /* true-color-flag */
+        vnc_write_u16(vs, vs->client_pf.rmax);     /* red-max */
+        vnc_write_u16(vs, vs->client_pf.gmax);     /* green-max */
+        vnc_write_u16(vs, vs->client_pf.bmax);     /* blue-max */
+        vnc_write_u8(vs, vs->client_pf.rshift);    /* red-shift */
+        vnc_write_u8(vs, vs->client_pf.gshift);    /* green-shift */
+        vnc_write_u8(vs, vs->client_pf.bshift);    /* blue-shift */
+        vnc_write(vs, pad, 3);           /* padding */
+    }
     vnc_hextile_set_pixel_conversion(vs, 0);
     vs->write_pixels = vnc_write_pixels_copy;
 }
@@ -2252,7 +2259,7 @@ static void vnc_colordepth(VncState *vs)
                                pixman_image_get_width(vs->vd->server),
                                pixman_image_get_height(vs->vd->server),
                                VNC_ENCODING_WMVi);
-        pixel_format_message(vs);
+        pixel_format_message(vs, false);
         vnc_unlock_output(vs);
         vnc_flush(vs);
     } else {
@@ -2420,7 +2427,8 @@ static int protocol_client_msg(VncState *vs, uint8_t *data, size_t len)
     return 0;
 }
 
-static int protocol_client_init(VncState *vs, uint8_t *data, size_t len)
+static int protocol_client_init_base(VncState *vs, uint8_t *data, size_t len,
+                                     bool reuse)
 {
     char buf[1024];
     VncShareMode mode;
@@ -2495,10 +2503,11 @@ static int protocol_client_init(VncState *vs, uint8_t *data, size_t len)
            pixman_image_get_height(vs->vd->server) >= 0);
     vs->client_width = pixman_image_get_width(vs->vd->server);
     vs->client_height = pixman_image_get_height(vs->vd->server);
-    vnc_write_u16(vs, vs->client_width);
-    vnc_write_u16(vs, vs->client_height);
-
-    pixel_format_message(vs);
+    if (!reuse) {
+        vnc_write_u16(vs, vs->client_width);
+        vnc_write_u16(vs, vs->client_height);
+    }
+    pixel_format_message(vs, reuse);
 
     if (qemu_name) {
         size = snprintf(buf, sizeof(buf), "QEMU (%s)", qemu_name);
@@ -2509,9 +2518,11 @@ static int protocol_client_init(VncState *vs, uint8_t *data, size_t len)
         size = snprintf(buf, sizeof(buf), "QEMU");
     }
 
-    vnc_write_u32(vs, size);
-    vnc_write(vs, buf, size);
-    vnc_flush(vs);
+    if (!reuse) {
+        vnc_write_u32(vs, size);
+        vnc_write(vs, buf, size);
+        vnc_flush(vs);
+    }
 
     vnc_client_cache_auth(vs);
     vnc_qmp_event(vs, QAPI_EVENT_VNC_INITIALIZED);
@@ -2521,6 +2532,11 @@ static int protocol_client_init(VncState *vs, uint8_t *data, size_t len)
     return 0;
 }
 
+static int protocol_client_init(VncState *vs, uint8_t *data, size_t len)
+{
+    return protocol_client_init_base(vs, data, len, false);
+}
+
 void start_client_init(VncState *vs)
 {
     vnc_read_when(vs, protocol_client_init, 1);
@@ -3012,8 +3028,12 @@ static void vnc_refresh(DisplayChangeListener *dcl)
     }
 }
 
+/*
+ * reuse - true if we are using an existing (already initialized)
+ * connection to a vnc client
+ */
 static void vnc_connect(VncDisplay *vd, QIOChannelSocket *sioc,
-                        bool skipauth, bool websocket)
+                        bool skipauth, bool websocket, bool reuse)
 {
     VncState *vs = g_new0(VncState, 1);
     bool first_client = QTAILQ_EMPTY(&vd->clients);
@@ -3109,10 +3129,15 @@ static void vnc_connect(VncDisplay *vd, QIOChannelSocket *sioc,
 
     graphic_hw_update(vd->dcl.con);
 
-    if (!vs->websocket) {
+    if ((!vs->websocket) && !reuse) {
         vnc_start_protocol(vs);
     }
 
+    if (reuse) {
+        uint8_t data[1] = {0};
+        (void) protocol_client_init_base(vs, data, sizeof(data), true);
+    }
+
     if (vd->num_connecting > vd->connections_limit) {
         QTAILQ_FOREACH(vs, &vd->clients, next) {
             if (vs->share_mode == VNC_SHARE_MODE_CONNECTING) {
@@ -3143,7 +3168,7 @@ static void vnc_listen_io(QIONetListener *listener,
     qio_channel_set_name(QIO_CHANNEL(cioc),
                          isWebsock ? "vnc-ws-server" : "vnc-server");
     qio_channel_set_delay(QIO_CHANNEL(cioc), false);
-    vnc_connect(vd, cioc, false, isWebsock);
+    vnc_connect(vd, cioc, false, isWebsock, false);
 }
 
 static const DisplayChangeListenerOps dcl_ops = {
@@ -3733,7 +3758,7 @@ static int vnc_display_connect(VncDisplay *vd,
     if (qio_channel_socket_connect_sync(sioc, saddr[0], errp) < 0) {
         return -1;
     }
-    vnc_connect(vd, sioc, false, false);
+    vnc_connect(vd, sioc, false, false, false);
     object_unref(OBJECT(sioc));
     return 0;
 }
@@ -4057,7 +4082,7 @@ void vnc_display_add_client(const char *id, int csock, bool skipauth)
     sioc = qio_channel_socket_new_fd(csock, NULL);
     if (sioc) {
         qio_channel_set_name(QIO_CHANNEL(sioc), "vnc-server");
-        vnc_connect(vd, sioc, skipauth, false);
+        vnc_connect(vd, sioc, skipauth, false, false);
         object_unref(OBJECT(sioc));
     }
 }
@@ -4117,3 +4142,75 @@ static void vnc_register_config(void)
     qemu_add_opts(&qemu_vnc_opts);
 }
 opts_init(vnc_register_config);
+
+void save_vnc_fds(void)
+{
+    VncDisplay *vd;
+    VncState *vs;
+    int disp_num = 0;
+    char name[40];
+
+    QTAILQ_FOREACH(vd, &vnc_displays, next) {
+        QTAILQ_FOREACH(vs, &vd->clients, next) {
+            if (vs->sioc) {
+                snprintf(name, sizeof(name), "%s_%d", vs->sioc->parent.name,
+                         disp_num);
+                setenv_fd(name, vs->sioc->fd);
+                break;
+            }
+        }
+        disp_num++;
+    }
+}
+
+static void set_vnc_fd(char *name, QIOChannelSocket *cioc, VncDisplay *vd,
+                       bool isWebsock)
+{
+    VncState *vs;
+    QIOChannelSocket *sioc;
+
+    int fd = getenv_fd(name);
+    if (fd != -1) {
+        sioc = qio_channel_socket_accept(cioc, fd, NULL);
+        if (sioc) {
+            unsetenv_fd(name);
+            qio_channel_set_name(QIO_CHANNEL(sioc),
+                                 isWebsock ? "vnc-ws-server" : "vnc-server");
+
+            qio_channel_set_delay(QIO_CHANNEL(sioc), false);
+            vnc_connect(vd, sioc, false, isWebsock, true);
+            object_unref(OBJECT(sioc));
+
+            /* force update on all clients */
+            QTAILQ_FOREACH(vs, &vd->clients, next) {
+                vs->update = VNC_STATE_UPDATE_FORCE;
+            }
+        } else {
+            error_printf("Could not restore vnc channel %s; "
+                     "client must reconnect.\n", name);
+        }
+    }
+}
+
+void load_vnc_fds(void)
+{
+    VncDisplay *vd;
+    QIOChannelSocket *cioc = NULL;
+    int disp_num = 0;
+    char name[40];
+
+    QTAILQ_FOREACH(vd, &vnc_displays, next) {
+        if (vd->listener) {
+            cioc = *vd->listener->sioc;
+            snprintf(name, sizeof(name), "vnc-server_%d", disp_num);
+            set_vnc_fd(name, cioc, vd, false);
+        }
+
+        if (vd->wslistener) {
+            cioc = *vd->wslistener->sioc;
+            snprintf(name, sizeof(name), "vnc-ws-server_%d", disp_num);
+            set_vnc_fd(name, cioc, vd, true);
+        }
+        disp_num++;
+    }
+}
-- 
1.8.3.1




* [PATCH V1 25/32] char: save/restore chardev pty fds
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (23 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 24/32] ui: save/restore vnc " Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 26/32] monitor: save/restore QMP negotiation status Steve Sistare
                   ` (10 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Save and restore pty descriptors across cprsave and cprload.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 chardev/char-pty.c     | 38 +++++++++++++++++++++++++++-----------
 chardev/char.c         |  2 ++
 include/chardev/char.h |  1 +
 3 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/chardev/char-pty.c b/chardev/char-pty.c
index 1cc501a..0785429 100644
--- a/chardev/char-pty.c
+++ b/chardev/char-pty.c
@@ -30,6 +30,7 @@
 #include "qemu/sockets.h"
 #include "qemu/error-report.h"
 #include "qemu/module.h"
+#include "qemu/cutils.h"
 #include "qemu/qemu-print.h"
 
 #include "chardev/char-io.h"
@@ -183,6 +184,16 @@ static void pty_chr_state(Chardev *chr, int connected)
     }
 }
 
+void save_char_pty_fd(Chardev *chr)
+{
+    PtyChardev *s = PTY_CHARDEV(chr);
+    QIOChannelFile *fioc = QIO_CHANNEL_FILE(s->ioc);
+
+    if (fioc) {
+        setenv_fd(chr->label, fioc->fd);
+    }
+}
+
 static void char_pty_finalize(Object *obj)
 {
     Chardev *chr = CHARDEV(obj);
@@ -204,18 +215,23 @@ static void char_pty_open(Chardev *chr,
     char pty_name[PATH_MAX];
     char *name;
 
-    master_fd = qemu_openpty_raw(&slave_fd, pty_name);
-    if (master_fd < 0) {
-        error_setg_errno(errp, errno, "Failed to create PTY");
-        return;
-    }
-
-    close(slave_fd);
-    qemu_set_nonblock(master_fd);
+    master_fd = getenv_fd(chr->label);
+    if (master_fd >= 0) {
+        unsetenv_fd(chr->label);
+        chr->filename = g_strdup_printf("pty:unknown");
+    } else {
+        master_fd = qemu_openpty_raw(&slave_fd, pty_name);
+        if (master_fd < 0) {
+            error_setg_errno(errp, errno, "Failed to create PTY");
+            return;
+        }
 
-    chr->filename = g_strdup_printf("pty:%s", pty_name);
-    qemu_printf("char device redirected to %s (label %s)\n",
-                pty_name, chr->label);
+        close(slave_fd);
+        qemu_set_nonblock(master_fd);
+        chr->filename = g_strdup_printf("pty:%s", pty_name);
+        qemu_printf("char device redirected to %s (label %s)\n",
+                    pty_name, chr->label);
+    }
 
     s = PTY_CHARDEV(chr);
     s->ioc = QIO_CHANNEL(qio_channel_file_new_fd(master_fd));
diff --git a/chardev/char.c b/chardev/char.c
index 8fd54cc..da75a04 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -1180,6 +1180,8 @@ static int chardev_is_socket(Object *child, void *opaque)
 {
     if (CHARDEV_IS_SOCKET(child)) {
         save_char_socket_fd((Chardev *) child);
+    } else if (CHARDEV_IS_PTY(child)) {
+        save_char_pty_fd((Chardev *) child);
     }
     return 0;
 }
diff --git a/include/chardev/char.h b/include/chardev/char.h
index 80a9cf8..c18bda8 100644
--- a/include/chardev/char.h
+++ b/include/chardev/char.h
@@ -294,5 +294,6 @@ void qemu_chr_parse_vc(QemuOpts *opts, ChardevBackend *backend, Error **errp);
 
 void save_char_socket_fd(Chardev *);
 void load_char_socket_fd(Chardev *);
+void save_char_pty_fd(Chardev *);
 
 #endif
-- 
1.8.3.1




* [PATCH V1 26/32] monitor: save/restore QMP negotiation status
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (24 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 25/32] char: save/restore chardev pty fds Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 27/32] vhost: reset vhost devices upon cprsave Steve Sistare
                   ` (9 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

From: Mark Kanda <mark.kanda@oracle.com>

Save and restore QMP compatibility negotiation status across cprsave and
cprload.
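
A one-shot flag of this kind needs three states: set to true, set to false,
and not set.  The sketch below shows one way to read and consume such a flag.
The demo_* names are hypothetical, and the signature of the series' own
getenv_bool() is not shown in this patch, so this is an assumption about its
shape.  The result is held in an int because a C bool cannot represent -1; any
nonzero value assigned to a bool becomes true.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical tri-state lookup: 1 if the variable is "1", 0 if it is
 * set to anything else, -1 if it is not set at all.
 */
static int demo_getenv_bool(const char *name)
{
    const char *val = getenv(name);

    if (!val) {
        return -1;
    }
    return strcmp(val, "1") == 0;
}

/* Consume the flag once: an absent variable counts as "not negotiated". */
static int demo_take_flag(const char *name)
{
    int ret = demo_getenv_bool(name);   /* int, so -1 stays distinguishable */

    if (ret == -1) {
        return 0;
    }
    unsetenv(name);
    return ret;
}
```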

Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 include/sysemu/sysemu.h |  1 +
 migration/savevm.c      |  1 +
 monitor/qmp.c           | 42 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 44 insertions(+)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 3e7bfee..c5b2f24 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -30,6 +30,7 @@ void load_cpr_snapshot(const char *file, Error **errp);
 void save_chardev_fds(void);
 void save_vnc_fds(void);
 void load_vnc_fds(void);
+void save_qmp_negotiation_status(void);
 
 extern int autostart;
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 35fafb7..225eaa6 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2770,6 +2770,7 @@ void save_cpr_snapshot(const char *file, const char *mode, Error **errp)
         save_chardev_fds();
         save_vnc_fds();
         walkenv(FD_PREFIX, preserve_fd, 0);
+        save_qmp_negotiation_status();
         qemu_system_exec_request();
         putenv((char *)"QEMU_START_FREEZE=");
     }
diff --git a/monitor/qmp.c b/monitor/qmp.c
index d433cea..9944ce5 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -33,6 +33,8 @@
 #include "qapi/qmp/qlist.h"
 #include "qapi/qmp/qstring.h"
 #include "trace.h"
+#include "qemu/env.h"
+#include "sysemu/sysemu.h"
 
 struct QMPRequest {
     /* Owner of the request */
@@ -398,6 +400,21 @@ static void monitor_qmp_setup_handlers_bh(void *opaque)
     monitor_list_append(&mon->common);
 }
 
+static void setenv_qmp(const char *name, bool val)
+{
+    setenv_bool(name, val);
+}
+
+static bool getenv_qmp(const char *name)
+{
+    int ret = getenv_bool(name);
+    if (ret != -1) {
+        unsetenv_bool(name);
+        return ret;
+    }
+    return false;
+}
+
 void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
 {
     MonitorQMP *mon = g_new0(MonitorQMP, 1);
@@ -438,4 +455,29 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
                                  NULL, &mon->common, NULL, true);
         monitor_list_append(&mon->common);
     }
+
+    /*
+     * If a chr->label qmp env var is true, this is a restored qmp
+     * connection with capabilities negotiated.
+     */
+    if (getenv_qmp(chr->label) == true) {
+        mon->commands = &qmp_commands;
+    }
+}
+
+void save_qmp_negotiation_status(void)
+{
+    Monitor *mon;
+    MonitorQMP *qmp_mon;
+
+    QTAILQ_FOREACH(mon, &mon_list, entry) {
+        if (!monitor_is_qmp(mon)) {
+            continue;
+        }
+
+        qmp_mon = container_of(mon, MonitorQMP, common);
+        if (qmp_mon->commands == &qmp_commands) {
+            setenv_qmp(mon->chr.chr->label, true);
+        }
+    }
 }
-- 
1.8.3.1




* [PATCH V1 27/32] vhost: reset vhost devices upon cprsave
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (25 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 26/32] monitor: save/restore QMP negotiation status Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 28/32] char: restore terminal on restart Steve Sistare
                   ` (8 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

From: Mark Kanda <mark.kanda@oracle.com>

A vhost device is implicitly preserved across re-exec because its fd is not
closed, and the value of the fd is specified on the command line for the
new qemu to find.  However, new qemu issues a VHOST_SET_OWNER ioctl,
which fails because the device already has an owner.  To fix, reset the
owner prior to exec.

Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hw/virtio/vhost.c       | 12 ++++++++++++
 include/sysemu/sysemu.h |  1 +
 migration/savevm.c      |  1 +
 3 files changed, 14 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 1a1384e..d065b53 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -29,6 +29,7 @@
 #include "sysemu/dma.h"
 #include "sysemu/tcg.h"
 #include "trace.h"
+#include "sysemu/sysemu.h"
 
 /* enabled until disconnected backend stabilizes */
 #define _VHOST_DEBUG 1
@@ -1773,3 +1774,14 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
 
     return -1;
 }
+
+void reset_vhost_devices(void)
+{
+    struct vhost_dev *dev;
+
+    QLIST_FOREACH(dev, &vhost_devices, entry) {
+        if (dev->vhost_ops->vhost_reset_device(dev) < 0) {
+            VHOST_OPS_DEBUG("vhost_reset_device failed");
+        }
+    }
+}
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index c5b2f24..e19c15b 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -30,6 +30,7 @@ void load_cpr_snapshot(const char *file, Error **errp);
 void save_chardev_fds(void);
 void save_vnc_fds(void);
 void load_vnc_fds(void);
+void reset_vhost_devices(void);
 void save_qmp_negotiation_status(void);
 
 extern int autostart;
diff --git a/migration/savevm.c b/migration/savevm.c
index 225eaa6..732dfb5 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2770,6 +2770,7 @@ void save_cpr_snapshot(const char *file, const char *mode, Error **errp)
         save_chardev_fds();
         save_vnc_fds();
         walkenv(FD_PREFIX, preserve_fd, 0);
+        reset_vhost_devices();
         save_qmp_negotiation_status();
         qemu_system_exec_request();
         putenv((char *)"QEMU_START_FREEZE=");
-- 
1.8.3.1




* [PATCH V1 28/32] char: restore terminal on restart
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (26 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 27/32] vhost: reset vhost devices upon cprsave Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 29/32] pci: export pci_update_mappings Steve Sistare
                   ` (7 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

If stdin is a char backend device, then restore the original stdin terminal
settings before re-exec'ing.  Otherwise, the new qemu sees the modified
settings as initial settings, and does not restore the true initial settings
when it exits.
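
As a rough illustration of the save/restore pattern described above, here is
a minimal sketch using plain POSIX termios.  The names sketch_term_init() and
sketch_term_exit() are hypothetical and stand in for qemu's own term_init()
and term_exit(), whose bodies are not part of this patch; the exact flags
cleared here are an assumption made only to have something to undo.

```c
#include <assert.h>
#include <stdbool.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical state; qemu keeps its own copy inside char-stdio.c. */
static struct termios sketch_saved_tty;
static bool sketch_tty_saved;

/* Remember the current settings of fd, then disable echo and line mode. */
static int sketch_term_init(int fd)
{
    struct termios raw;

    assert(fd >= 0);
    if (!isatty(fd) || tcgetattr(fd, &sketch_saved_tty) < 0) {
        return -1;                  /* not a terminal: nothing to save */
    }
    sketch_tty_saved = true;

    raw = sketch_saved_tty;
    raw.c_lflag &= ~(ECHO | ICANON);
    return tcsetattr(fd, TCSANOW, &raw);
}

/* Put back what sketch_term_init() saved.  Meant to run before exec. */
static int sketch_term_exit(int fd)
{
    assert(fd >= 0);
    if (!sketch_tty_saved) {
        return 0;                   /* nothing was modified */
    }
    sketch_tty_saved = false;
    return tcsetattr(fd, TCSANOW, &sketch_saved_tty);
}
```

If the restore step were skipped, a process started by exec would call its
own init routine, read the already modified settings, and treat them as the
ones to restore at exit, which is the problem this patch avoids.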

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 chardev/char-stdio.c   | 7 +++++++
 include/chardev/char.h | 2 ++
 migration/savevm.c     | 2 ++
 3 files changed, 11 insertions(+)

diff --git a/chardev/char-stdio.c b/chardev/char-stdio.c
index 82eaebc..6481d08 100644
--- a/chardev/char-stdio.c
+++ b/chardev/char-stdio.c
@@ -119,6 +119,13 @@ static void qemu_chr_open_stdio(Chardev *chr,
 }
 #endif
 
+void qemu_term_exit(void)
+{
+#ifndef _WIN32
+    term_exit();
+#endif
+}
+
 static void qemu_chr_parse_stdio(QemuOpts *opts, ChardevBackend *backend,
                                  Error **errp)
 {
diff --git a/include/chardev/char.h b/include/chardev/char.h
index c18bda8..5fd3ecc 100644
--- a/include/chardev/char.h
+++ b/include/chardev/char.h
@@ -296,4 +296,6 @@ void save_char_socket_fd(Chardev *);
 void load_char_socket_fd(Chardev *);
 void save_char_pty_fd(Chardev *);
 
+void qemu_term_exit(void);
+
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index 732dfb5..881dc13 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -32,6 +32,7 @@
 #include "migration.h"
 #include "migration/snapshot.h"
 #include "migration/vmstate.h"
+#include "chardev/char.h"
 #include "migration/misc.h"
 #include "migration/register.h"
 #include "migration/global_state.h"
@@ -2772,6 +2773,7 @@ void save_cpr_snapshot(const char *file, const char *mode, Error **errp)
         walkenv(FD_PREFIX, preserve_fd, 0);
         reset_vhost_devices();
         save_qmp_negotiation_status();
+        qemu_term_exit();
         qemu_system_exec_request();
         putenv((char *)"QEMU_START_FREEZE=");
     }
-- 
1.8.3.1




* [PATCH V1 29/32] pci: export pci_update_mappings
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (27 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 28/32] char: restore terminal on restart Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 30/32] vfio-pci: save and restore Steve Sistare
                   ` (6 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Allow pci_update_mappings to be called from other modules.
No change in functionality.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hw/pci/pci.c         | 3 +--
 include/hw/pci/pci.h | 1 +
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index de0fae1..7343e00 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -216,7 +216,6 @@ static const TypeInfo pcie_bus_info = {
 };
 
 static PCIBus *pci_find_bus_nr(PCIBus *bus, int bus_num);
-static void pci_update_mappings(PCIDevice *d);
 static void pci_irq_handler(void *opaque, int irq_num, int level);
 static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom, Error **);
 static void pci_del_option_rom(PCIDevice *pdev);
@@ -1316,7 +1315,7 @@ static pcibus_t pci_bar_address(PCIDevice *d,
     return new_addr;
 }
 
-static void pci_update_mappings(PCIDevice *d)
+void pci_update_mappings(PCIDevice *d)
 {
     PCIIORegion *r;
     int i;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index c1bf7d5..bd07c86 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -865,5 +865,6 @@ extern const VMStateDescription vmstate_pci_device;
 }
 
 MSIMessage pci_get_msi_message(PCIDevice *dev, int vector);
+void pci_update_mappings(PCIDevice *d);
 
 #endif
-- 
1.8.3.1




* [PATCH V1 30/32] vfio-pci: save and restore
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (28 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 29/32] pci: export pci_update_mappings Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-08-06 10:22   ` Jason Zeng
  2020-07-30 15:14 ` [PATCH V1 31/32] vfio-pci: trace pci config Steve Sistare
                   ` (5 subsequent siblings)
  35 siblings, 1 reply; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Enable vfio-pci devices to be saved and restored across an exec restart
of qemu.

At vfio creation time, save the value of vfio container, group, and device
descriptors in the environment.
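
For readers who have not seen the earlier patches, the idea of recording a
descriptor in the environment can be sketched roughly as follows.  This is
not the series' implementation of setenv_fd(), getenv_fd() and unsetenv_fd();
the sketch_* names and the SKETCH_FD_ prefix are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_FD_PREFIX "SKETCH_FD_"   /* hypothetical prefix */

/* Publish "name -> fd" as an environment variable so it survives exec. */
static int sketch_setenv_fd(const char *name, int fd)
{
    char key[128], val[16];

    assert(name && fd >= 0);
    snprintf(key, sizeof(key), SKETCH_FD_PREFIX "%s", name);
    snprintf(val, sizeof(val), "%d", fd);
    return setenv(key, val, 1);
}

/* Return the descriptor recorded for name, or -1 if there is none. */
static int sketch_getenv_fd(const char *name)
{
    char key[128], *end;
    const char *val;
    long fd;

    assert(name);
    snprintf(key, sizeof(key), SKETCH_FD_PREFIX "%s", name);
    val = getenv(key);
    if (!val || !*val) {
        return -1;
    }
    errno = 0;
    fd = strtol(val, &end, 10);
    if (errno || *end || fd < 0 || fd > INT_MAX) {
        return -1;                  /* malformed value: treat as absent */
    }
    return (int)fd;
}

/* Forget the descriptor, e.g. just before closing it. */
static void sketch_unsetenv_fd(const char *name)
{
    char key[128];

    assert(name);
    snprintf(key, sizeof(key), SKETCH_FD_PREFIX "%s", name);
    unsetenv(key);
}
```

A lookup that returns -1 means "open the device normally"; a non-negative
result means the descriptor was inherited and the device is being reused.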

In cprsave, save the msi message area as part of vfio-pci vmstate, and
clear the close-on-exec flag for the vfio descriptors.  The flag is not
cleared earlier because the descriptors should not persist across misc
fork and exec calls that may be performed during normal operation.
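
Clearing close-on-exec late, as described above, amounts to a small fcntl()
sequence.  The helper below is a sketch with a hypothetical name, not the
preserve_fd() function used by the series.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* Set or clear FD_CLOEXEC on fd.  Returns 0 on success, -1 on error. */
static int sketch_set_cloexec(int fd, bool on)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFD);
    if (flags < 0) {
        return -1;
    }
    flags = on ? (flags | FD_CLOEXEC) : (flags & ~FD_CLOEXEC);
    return fcntl(fd, F_SETFD, flags) < 0 ? -1 : 0;
}
```

Calling sketch_set_cloexec(fd, false) only immediately before the exec keeps
the descriptor from leaking into unrelated helper processes that qemu may
fork and exec during normal operation.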

On qemu restart, vfio_realize() finds the descriptor env vars, uses
the descriptors, and notes that the device is being reused.  Device and
iommu state is already configured, so operations in vfio_realize that
would modify the configuration are skipped for a reused device, including
vfio ioctls and writes to PCI configuration space.  The result is that
vfio_realize constructs qemu data structures that reflect the current
state of the device.  However, the reconstruction is not complete until
cprload is called, and vfio_pci_post_load uses the msi data to rebuild
interrupt structures and attach the interrupts to the new KVM instance.
Lastly, vfio device reset is suppressed when the VM is started.
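
The effect of the reused flag can be pictured with a toy model.  The struct
and functions below are hypothetical stand-ins, not qemu's VFIOContainer or
vfio_dma_map(); a counter takes the place of the real kernel ioctl.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a container. */
typedef struct SketchContainer {
    bool reused;        /* true between re-exec and the load step */
    int kernel_calls;   /* counts simulated ioctl() invocations */
} SketchContainer;

/* Map request: a no-op while the inherited kernel state is still valid. */
static int sketch_dma_map(SketchContainer *c)
{
    assert(c);
    if (c->reused) {
        return 0;       /* mapping already exists in the kernel */
    }
    c->kernel_calls++;  /* stands in for the real map ioctl */
    return 0;
}

/* Load step: from here on the device is treated like any other. */
static void sketch_post_load(SketchContainer *c)
{
    assert(c);
    c->reused = false;
}
```

Until the load step runs, only the in-process bookkeeping is rebuilt; the
kernel side is left untouched, which is what allows the device to keep
running across the exec.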

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hw/pci/pci.c                  |  4 ++
 hw/vfio/common.c              | 99 ++++++++++++++++++++++++++++++++++---------
 hw/vfio/pci.c                 | 79 ++++++++++++++++++++++++++++++++--
 hw/vfio/platform.c            |  2 +-
 include/hw/pci/pci.h          |  1 +
 include/hw/vfio/vfio-common.h |  4 +-
 migration/savevm.c            |  2 +-
 7 files changed, 163 insertions(+), 28 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 7343e00..c2e1509 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -291,6 +291,10 @@ static void pci_do_device_reset(PCIDevice *dev)
 {
     int r;
 
+    if (dev->reused) {
+        return;
+    }
+
     pci_device_deassert_intx(dev);
     assert(dev->irq_state == 0);
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 3335714..a51a093 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -37,6 +37,7 @@
 #include "sysemu/reset.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "qemu/cutils.h"
 
 VFIOGroupList vfio_group_list =
     QLIST_HEAD_INITIALIZER(vfio_group_list);
@@ -299,6 +300,10 @@ static int vfio_dma_unmap(VFIOContainer *container,
         .size = size,
     };
 
+    if (container->reused) {
+        return 0;
+    }
+
     while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
         /*
          * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c
@@ -336,6 +341,10 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
         .size = size,
     };
 
+    if (container->reused) {
+        return 0;
+    }
+
     if (!readonly) {
         map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
     }
@@ -1179,25 +1188,27 @@ static int vfio_init_container(VFIOContainer *container, int group_fd,
         return iommu_type;
     }
 
-    ret = ioctl(group_fd, VFIO_GROUP_SET_CONTAINER, &container->fd);
-    if (ret) {
-        error_setg_errno(errp, errno, "Failed to set group container");
-        return -errno;
-    }
+    if (!container->reused) {
+        ret = ioctl(group_fd, VFIO_GROUP_SET_CONTAINER, &container->fd);
+        if (ret) {
+            error_setg_errno(errp, errno, "Failed to set group container");
+            return -errno;
+        }
 
-    while (ioctl(container->fd, VFIO_SET_IOMMU, iommu_type)) {
-        if (iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
-            /*
-             * On sPAPR, despite the IOMMU subdriver always advertises v1 and
-             * v2, the running platform may not support v2 and there is no
-             * way to guess it until an IOMMU group gets added to the container.
-             * So in case it fails with v2, try v1 as a fallback.
-             */
-            iommu_type = VFIO_SPAPR_TCE_IOMMU;
-            continue;
+        while (ioctl(container->fd, VFIO_SET_IOMMU, iommu_type)) {
+            if (iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
+                /*
+                 * On sPAPR, despite the IOMMU subdriver always advertises v1
+                 * and v2, the running platform may not support v2 and there is
+                 * no way to guess it until an IOMMU group gets added to the
+                 * container. So in case it fails with v2, try v1 as a fallback.
+                 */
+                iommu_type = VFIO_SPAPR_TCE_IOMMU;
+                continue;
+            }
+            error_setg_errno(errp, errno, "Failed to set iommu for container");
+            return -errno;
         }
-        error_setg_errno(errp, errno, "Failed to set iommu for container");
-        return -errno;
     }
 
     container->iommu_type = iommu_type;
@@ -1210,6 +1221,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     VFIOContainer *container;
     int ret, fd;
     VFIOAddressSpace *space;
+    char name[40];
+    bool reused;
 
     space = vfio_get_address_space(as);
 
@@ -1254,7 +1267,13 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
         }
     }
 
-    fd = qemu_open("/dev/vfio/vfio", O_RDWR);
+    snprintf(name, sizeof(name), "vfio_container_%d", group->groupid);
+    fd = getenv_fd(name);
+    reused = (fd >= 0);
+    if (fd < 0) {
+        fd = qemu_open("/dev/vfio/vfio", O_RDWR);
+    }
+
     if (fd < 0) {
         error_setg_errno(errp, errno, "failed to open /dev/vfio/vfio");
         ret = -errno;
@@ -1272,6 +1291,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     container = g_malloc0(sizeof(*container));
     container->space = space;
     container->fd = fd;
+    container->cid = group->groupid;
+    container->reused = reused;
     container->error = NULL;
     QLIST_INIT(&container->giommu_list);
     QLIST_INIT(&container->hostwin_list);
@@ -1395,6 +1416,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
 
     container->initialized = true;
 
+    if (!reused) {
+        setenv_fd(name, fd);
+    }
+
     return 0;
 listener_release_exit:
     QLIST_REMOVE(group, container_next);
@@ -1418,6 +1443,7 @@ put_space_exit:
 static void vfio_disconnect_container(VFIOGroup *group)
 {
     VFIOContainer *container = group->container;
+    char name[40];
 
     QLIST_REMOVE(group, container_next);
     group->container = NULL;
@@ -1450,6 +1476,8 @@ static void vfio_disconnect_container(VFIOGroup *group)
         }
 
         trace_vfio_disconnect_container(container->fd);
+        snprintf(name, sizeof(name), "vfio_container_%d", container->cid);
+        unsetenv_fd(name);
         close(container->fd);
         g_free(container);
 
@@ -1462,6 +1490,7 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
     VFIOGroup *group;
     char path[32];
     struct vfio_group_status status = { .argsz = sizeof(status) };
+    bool reused;
 
     QLIST_FOREACH(group, &vfio_group_list, next) {
         if (group->groupid == groupid) {
@@ -1479,7 +1508,13 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
     group = g_malloc0(sizeof(*group));
 
     snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
-    group->fd = qemu_open(path, O_RDWR);
+
+    group->fd = getenv_fd(path);
+    reused = (group->fd >= 0);
+    if (group->fd < 0) {
+        group->fd = qemu_open(path, O_RDWR);
+    }
+
     if (group->fd < 0) {
         error_setg_errno(errp, errno, "failed to open %s", path);
         goto free_group_exit;
@@ -1513,6 +1548,10 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
 
     QLIST_INSERT_HEAD(&vfio_group_list, group, next);
 
+    if (!reused) {
+        setenv_fd(path, group->fd);
+    }
+
     return group;
 
 close_fd_exit:
@@ -1526,6 +1565,8 @@ free_group_exit:
 
 void vfio_put_group(VFIOGroup *group)
 {
+    char path[32];
+
     if (!group || !QLIST_EMPTY(&group->device_list)) {
         return;
     }
@@ -1537,6 +1578,8 @@ void vfio_put_group(VFIOGroup *group)
     vfio_disconnect_container(group);
     QLIST_REMOVE(group, next);
     trace_vfio_put_group(group->fd);
+    snprintf(path, sizeof(path), "/dev/vfio/%d", group->groupid);
+    unsetenv_fd(path);
     close(group->fd);
     g_free(group);
 
@@ -1546,12 +1589,18 @@ void vfio_put_group(VFIOGroup *group)
 }
 
 int vfio_get_device(VFIOGroup *group, const char *name,
-                    VFIODevice *vbasedev, Error **errp)
+                    VFIODevice *vbasedev, bool *reusedp, Error **errp)
 {
     struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
     int ret, fd;
+    bool reused;
+
+    fd = getenv_fd(name);
+    reused = (fd >= 0);
+    if (fd < 0) {
+        fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
+    }
 
-    fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
     if (fd < 0) {
         error_setg_errno(errp, errno, "error getting device from group %d",
                          group->groupid);
@@ -1601,6 +1650,13 @@ int vfio_get_device(VFIOGroup *group, const char *name,
                           dev_info.num_irqs);
 
     vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
+
+    if (!reused) {
+        setenv_fd(name, fd);
+    }
+    if (reusedp) {
+        *reusedp = reused;
+    }
     return 0;
 }
 
@@ -1612,6 +1668,7 @@ void vfio_put_base_device(VFIODevice *vbasedev)
     QLIST_REMOVE(vbasedev, next);
     vbasedev->group = NULL;
     trace_vfio_put_base_device(vbasedev->fd);
+    unsetenv_fd(vbasedev->name);
     close(vbasedev->fd);
 }
 
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 2e561c0..5743807 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -49,6 +49,7 @@
 
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
+static const VMStateDescription vfio_pci_vmstate;
 
 /*
  * Disabling BAR mmaping can be slow, but toggling it around INTx can
@@ -1585,6 +1586,14 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled)
     }
 }
 
+static void vfio_config_sync(VFIOPCIDevice *vdev, uint32_t offset, size_t len)
+{
+    if (pread(vdev->vbasedev.fd, vdev->pdev.config + offset, len,
+          vdev->config_offset + offset) != len) {
+        error_report("vfio_config_sync pread failed");
+    }
+}
+
 static void vfio_bar_prepare(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
@@ -1626,6 +1635,7 @@ static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
     char *name;
+    PCIDevice *pdev = &vdev->pdev;
 
     if (!bar->size) {
         return;
@@ -1646,6 +1656,9 @@ static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
     }
 
     pci_register_bar(&vdev->pdev, nr, bar->type, bar->mr);
+    if (pdev->reused) {
+        vfio_config_sync(vdev, pci_bar(pdev, nr), 8);
+    }
 }
 
 static void vfio_bars_register(VFIOPCIDevice *vdev)
@@ -2805,7 +2818,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
         goto error;
     }
 
-    ret = vfio_get_device(group, vdev->vbasedev.name, &vdev->vbasedev, errp);
+    ret = vfio_get_device(group, vdev->vbasedev.name, &vdev->vbasedev,
+                          &pdev->reused, errp);
     if (ret) {
         vfio_put_group(group);
         goto error;
@@ -2972,9 +2986,11 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
                                              vfio_intx_routing_notifier);
         vdev->irqchip_change_notifier.notify = vfio_irqchip_change;
         kvm_irqchip_add_change_notifier(&vdev->irqchip_change_notifier);
-        ret = vfio_intx_enable(vdev, errp);
-        if (ret) {
-            goto out_deregister;
+        if (!pdev->reused) {
+            ret = vfio_intx_enable(vdev, errp);
+            if (ret) {
+                goto out_deregister;
+            }
         }
     }
 
@@ -3017,6 +3033,11 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     vfio_register_req_notifier(vdev);
     vfio_setup_resetfn_quirk(vdev);
 
+    vfio_config_sync(vdev, pdev->msix_cap + PCI_MSIX_FLAGS, 2);
+    if (pdev->reused) {
+        pci_update_mappings(pdev);
+    }
+
     return;
 
 out_deregister:
@@ -3080,6 +3101,10 @@ static void vfio_pci_reset(DeviceState *dev)
 {
     VFIOPCIDevice *vdev = PCI_VFIO(dev);
 
+    if (vdev->pdev.reused) {
+        return;
+    }
+
     trace_vfio_pci_reset(vdev->vbasedev.name);
 
     vfio_pci_pre_reset(vdev);
@@ -3182,6 +3207,51 @@ static Property vfio_pci_dev_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static int vfio_pci_post_load(void *opaque, int version_id)
+{
+    int vector;
+    MSIMessage msg;
+    Error *err = 0;
+    VFIOPCIDevice *vdev = opaque;
+    PCIDevice *pdev = &vdev->pdev;
+
+    if (msix_enabled(pdev)) {
+        vfio_msix_enable(vdev);
+        pdev->msix_function_masked = false;
+
+        for (vector = 0; vector < vdev->pdev.msix_entries_nr; vector++) {
+            if (!msix_is_masked(pdev, vector)) {
+                msg = msix_get_message(pdev, vector);
+                vfio_msix_vector_use(pdev, vector, msg);
+            }
+        }
+
+    } else if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) {
+        vfio_intx_enable(vdev, &err);
+        if (err) {
+            error_report_err(err);
+        }
+    }
+
+    vdev->vbasedev.group->container->reused = false;
+    vdev->pdev.reused = false;
+
+    return 0;
+}
+
+static const VMStateDescription vfio_pci_vmstate = {
+    .name = "vfio-pci",
+    .unmigratable = 1,
+    .mode_mask = VMS_RESTART,
+    .version_id = 0,
+    .minimum_version_id = 0,
+    .post_load = vfio_pci_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_MSIX(pdev, VFIOPCIDevice),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -3189,6 +3259,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 
     dc->reset = vfio_pci_reset;
     device_class_set_props(dc, vfio_pci_dev_properties);
+    dc->vmsd = &vfio_pci_vmstate;
     dc->desc = "VFIO-based PCI device assignment";
     set_bit(DEVICE_CATEGORY_MISC, dc->categories);
     pdc->realize = vfio_realize;
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index ac2cefc..e6e1a5d 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -592,7 +592,7 @@ static int vfio_base_device_init(VFIODevice *vbasedev, Error **errp)
             return -EBUSY;
         }
     }
-    ret = vfio_get_device(group, vbasedev->name, vbasedev, errp);
+    ret = vfio_get_device(group, vbasedev->name, vbasedev, 0, errp);
     if (ret) {
         vfio_put_group(group);
         return ret;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index bd07c86..c926a24 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -358,6 +358,7 @@ struct PCIDevice {
 
     /* ID of standby device in net_failover pair */
     char *failover_pair_id;
+    bool reused;
 };
 
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index c78f3ff..4e2a332 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -73,6 +73,8 @@ typedef struct VFIOContainer {
     unsigned iommu_type;
     Error *error;
     bool initialized;
+    bool reused;
+    int cid;
     unsigned long pgsizes;
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
@@ -177,7 +179,7 @@ void vfio_reset_handler(void *opaque);
 VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp);
 void vfio_put_group(VFIOGroup *group);
 int vfio_get_device(VFIOGroup *group, const char *name,
-                    VFIODevice *vbasedev, Error **errp);
+                    VFIODevice *vbasedev, bool *reused, Error **errp);
 
 extern const MemoryRegionOps vfio_region_ops;
 typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
diff --git a/migration/savevm.c b/migration/savevm.c
index 881dc13..2606cf0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1568,7 +1568,7 @@ static int qemu_savevm_state(QEMUFile *f, VMStateMode mode, Error **errp)
         return -EINVAL;
     }
 
-    if (migrate_use_block()) {
+    if ((mode & (VMS_SNAPSHOT | VMS_MIGRATE)) && migrate_use_block()) {
         error_setg(errp, "Block migration and snapshots are incompatible");
         return -EINVAL;
     }
-- 
1.8.3.1




* [PATCH V1 31/32] vfio-pci: trace pci config
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (29 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 30/32] vfio-pci: save and restore Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 15:14 ` [PATCH V1 32/32] vfio-pci: improved tracing Steve Sistare
                   ` (4 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add new trace points trace_vfio_pci_config and trace_vfio_msix_table to dump
PCI config space and MSI data.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hw/vfio/pci.c        | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/vfio/trace-events |  2 ++
 2 files changed, 101 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5743807..f72e277 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2715,6 +2715,90 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
     vdev->req_enabled = false;
 }
 
+/* To limit output, trace only this many bytes of config. */
+#define CONFIG_LEN 512
+
+static void vfio_dump_config(const char *name, int fd, off_t offset)
+{
+    int i, j, n, config[CONFIG_LEN / 4];
+    char buf[128];
+    const char *fmt;
+    char *ptr = buf;
+    int *v = config;
+    int len = sizeof(buf) - 1;
+
+#ifdef CONFIG_TRACE_DTRACE
+    if (!QEMU_VFIO_PCI_CONFIG_ENABLED()) {
+        return;
+    }
+#endif
+
+    if (pread(fd, &config, sizeof(config), offset) < 0) {
+        perror("pread");
+        return;
+    }
+
+    trace_vfio_pci_config(name);
+
+    for (i = 0; i < CONFIG_LEN; i += 32, v += 8) {
+        n = snprintf(buf, sizeof(buf), "+%3d:", i);
+        ptr = buf + n;
+        len = sizeof(buf) - 1 - n;
+        for (j = 0; j < 8; j++) {
+            fmt = v[j] ?  " %08x" : " %8x";
+            n = snprintf(ptr, len, fmt, v[j]);
+            ptr += n;
+            len -= n;
+        }
+        *ptr = 0;   /* terminate in case of truncation above */
+        trace_vfio_pci_config(buf);
+    }
+}
+
+static void vfio_dump_config_vdev(VFIOPCIDevice *vdev)
+{
+    vfio_dump_config(vdev->vbasedev.name, vdev->vbasedev.fd,
+                     vdev->config_offset);
+}
+
+static void vfio_dump_msix_vdev(VFIOPCIDevice *vdev)
+{
+    int i;
+    int *ptr = (int *) vdev->pdev.msix_table;
+
+    for (i = 0; i < vdev->pdev.msix_entries_nr; i++, ptr += 4) {
+        trace_vfio_msix_table(vdev->vbasedev.name, i,
+                              ptr[0], ptr[1], ptr[2], ptr[3]);
+    }
+}
+
+static void vfio_diff_config(VFIOPCIDevice *vdev)
+{
+    int i;
+    unsigned char config[CONFIG_LEN];
+    int n = sizeof(config);
+    unsigned char *c1 = (unsigned char *)config;
+    unsigned char *c2 = (unsigned char *)vdev->pdev.config;
+    char buf[128];
+
+#ifdef CONFIG_TRACE_DTRACE
+    if (!QEMU_VFIO_PCI_CONFIG_ENABLED()) {
+        return;
+    }
+#endif
+
+    if (pread(vdev->vbasedev.fd, &config, n, vdev->config_offset) != n) {
+        error_report("vfio_diff_config pread failed");
+    }
+    for (i = 0; i < CONFIG_LEN; i++) {
+        if (c1[i] != c2[i]) {
+            snprintf(buf, sizeof(buf),
+                     "config mismatch at %d: %x vs %x", i, c1[i], c2[i]);
+            trace_vfio_pci_config(buf);
+        }
+    }
+}
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
     VFIOPCIDevice *vdev = PCI_VFIO(pdev);
@@ -3037,6 +3121,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     if (pdev->reused) {
         pci_update_mappings(pdev);
     }
+    vfio_diff_config(vdev);
+    vfio_dump_config_vdev(vdev);
+    vfio_dump_msix_vdev(vdev);
 
     return;
 
@@ -3207,6 +3294,15 @@ static Property vfio_pci_dev_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static int vfio_pci_pre_save(void *opaque)
+{
+    VFIOPCIDevice *vdev = opaque;
+
+    vfio_dump_config_vdev(vdev);
+    vfio_dump_msix_vdev(vdev);
+    return 0;
+}
+
 static int vfio_pci_post_load(void *opaque, int version_id)
 {
     int vector;
@@ -3226,6 +3322,8 @@ static int vfio_pci_post_load(void *opaque, int version_id)
             }
         }
 
+        vfio_dump_msix_vdev(vdev);
+
     } else if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) {
         vfio_intx_enable(vdev, &err);
         if (err) {
@@ -3246,6 +3344,7 @@ static const VMStateDescription vfio_pci_vmstate = {
     .version_id = 0,
     .minimum_version_id = 0,
     .post_load = vfio_pci_post_load,
+    .pre_save = vfio_pci_pre_save,
     .fields = (VMStateField[]) {
         VMSTATE_MSIX(pdev, VFIOPCIDevice),
         VMSTATE_END_OF_LIST()
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index b1ef55a..10d899c 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -47,6 +47,8 @@ vfio_pci_emulated_vendor_id(const char *name, uint16_t val) "%s 0x%04x"
 vfio_pci_emulated_device_id(const char *name, uint16_t val) "%s 0x%04x"
 vfio_pci_emulated_sub_vendor_id(const char *name, uint16_t val) "%s 0x%04x"
 vfio_pci_emulated_sub_device_id(const char *name, uint16_t val) "%s 0x%04x"
+vfio_msix_table(const char *name, int index, int x0, int x1, int x2, int x3) "%s MSI-X[%d] = { %x %x %x %x }"
+vfio_pci_config(const char *buf) "%s"
 
 # pci-quirks.c
 vfio_quirk_rom_blacklisted(const char *name, uint16_t vid, uint16_t did) "%s %04x:%04x"
-- 
1.8.3.1




* [PATCH V1 32/32] vfio-pci: improved tracing
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (30 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 31/32] vfio-pci: trace pci config Steve Sistare
@ 2020-07-30 15:14 ` Steve Sistare
  2020-07-30 16:52 ` [PATCH V1 00/32] Live Update Daniel P. Berrangé
                   ` (3 subsequent siblings)
  35 siblings, 0 replies; 66+ messages in thread
From: Steve Sistare @ 2020-07-30 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Steve Sistare, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Print more info for existing trace points:
  trace_kvm_irqchip_add_msi_route
  trace_pci_update_mappings_del
  trace_pci_update_mappings_add

Add new trace points:
  trace_kvm_irqchip_assign_irqfd
  trace_msix_table_mmio_write
  trace_vfio_dma_unmap
  trace_vfio_dma_map
  trace_vfio_region
  trace_vfio_descriptors
  trace_ram_block_add

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 accel/kvm/kvm-all.c    |  8 ++++++--
 accel/kvm/trace-events |  3 ++-
 exec.c                 |  3 +++
 hw/pci/msix.c          |  1 +
 hw/pci/pci.c           | 10 ++++++----
 hw/pci/trace-events    |  5 +++--
 hw/vfio/common.c       | 16 +++++++++++++++-
 hw/vfio/pci.c          |  1 +
 hw/vfio/trace-events   |  9 ++++++---
 trace-events           |  2 ++
 10 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 63ef6af..5511ea7 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -46,6 +46,7 @@
 #include "sysemu/reset.h"
 
 #include "hw/boards.h"
+#include "trace-root.h"
 
 /* This check must be after config-host.h is included */
 #ifdef CONFIG_EVENTFD
@@ -1670,7 +1671,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
     }
 
     trace_kvm_irqchip_add_msi_route(dev ? dev->name : (char *)"N/A",
-                                    vector, virq);
+                                    vector, virq, msg.address, msg.data);
 
     kvm_add_routing_entry(s, &kroute);
     kvm_arch_add_msi_route_post(&kroute, vector, dev);
@@ -1717,6 +1718,7 @@ static int kvm_irqchip_assign_irqfd(KVMState *s, EventNotifier *event,
 {
     int fd = event_notifier_get_fd(event);
     int rfd = resample ? event_notifier_get_fd(resample) : -1;
+    int ret;
 
     struct kvm_irqfd irqfd = {
         .fd = fd,
@@ -1758,7 +1760,9 @@ static int kvm_irqchip_assign_irqfd(KVMState *s, EventNotifier *event,
         return -ENOSYS;
     }
 
-    return kvm_vm_ioctl(s, KVM_IRQFD, &irqfd);
+    ret = kvm_vm_ioctl(s, KVM_IRQFD, &irqfd);
+    trace_kvm_irqchip_assign_irqfd(fd, virq, rfd, ret);
+    return ret;
 }
 
 int kvm_irqchip_add_adapter_route(KVMState *s, AdapterInfo *adapter)
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index a68eb66..67a01e6 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -9,7 +9,8 @@ kvm_device_ioctl(int fd, int type, void *arg) "dev fd %d, type 0x%x, arg %p"
 kvm_failed_reg_get(uint64_t id, const char *msg) "Warning: Unable to retrieve ONEREG %" PRIu64 " from KVM: %s"
 kvm_failed_reg_set(uint64_t id, const char *msg) "Warning: Unable to set ONEREG %" PRIu64 " to KVM: %s"
 kvm_irqchip_commit_routes(void) ""
-kvm_irqchip_add_msi_route(char *name, int vector, int virq) "dev %s vector %d virq %d"
+kvm_irqchip_add_msi_route(char *name, int vector, int virq, uint64_t addr, uint32_t data) "%s, vector %d, virq %d, msg {addr 0x%"PRIx64", data 0x%x}"
+kvm_irqchip_assign_irqfd(int fd, int virq, int rfd, int status) "(fd=%d, virq=%d, rfd=%d) KVM_IRQFD returns %d"
 kvm_irqchip_update_msi_route(int virq) "Updating MSI route virq=%d"
 kvm_irqchip_release_virq(int virq) "virq %d"
 kvm_set_ioeventfd_mmio(int fd, uint64_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%" PRIx64 " val=0x%x assign: %d size: %d match: %d"
diff --git a/exec.c b/exec.c
index 5473c09..dd99ee0 100644
--- a/exec.c
+++ b/exec.c
@@ -2319,6 +2319,9 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
         }
         ram_block_notify_add(new_block->host, new_block->max_length);
     }
+    trace_ram_block_add(new_block->host, new_block->max_length,
+                        memory_region_name(new_block->mr),
+                        new_block->mr->readonly ? "ro" : "rw");
 }
 
 #ifdef CONFIG_POSIX
diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index 67e34f3..65a2882 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -189,6 +189,7 @@ static void msix_table_mmio_write(void *opaque, hwaddr addr,
     int vector = addr / PCI_MSIX_ENTRY_SIZE;
     bool was_masked;
 
+    trace_msix_table_mmio_write(dev->name, addr, val, size);
     was_masked = msix_is_masked(dev, vector);
     pci_set_long(dev->msix_table + addr, val);
     msix_handle_mask_update(dev, vector, was_masked);
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index c2e1509..6142411 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1324,9 +1324,11 @@ void pci_update_mappings(PCIDevice *d)
     PCIIORegion *r;
     int i;
     pcibus_t new_addr;
+    const char *name;
 
     for(i = 0; i < PCI_NUM_REGIONS; i++) {
         r = &d->io_regions[i];
+        name = r->memory ? r->memory->name : "";
 
         /* this region isn't registered */
         if (!r->size)
@@ -1340,18 +1342,18 @@ void pci_update_mappings(PCIDevice *d)
 
         /* now do the real mapping */
         if (r->addr != PCI_BAR_UNMAPPED) {
-            trace_pci_update_mappings_del(d, pci_dev_bus_num(d),
+            trace_pci_update_mappings_del(d->name, pci_dev_bus_num(d),
                                           PCI_SLOT(d->devfn),
                                           PCI_FUNC(d->devfn),
-                                          i, r->addr, r->size);
+                                          i, r->addr, r->size, name);
             memory_region_del_subregion(r->address_space, r->memory);
         }
         r->addr = new_addr;
         if (r->addr != PCI_BAR_UNMAPPED) {
-            trace_pci_update_mappings_add(d, pci_dev_bus_num(d),
+            trace_pci_update_mappings_add(d->name, pci_dev_bus_num(d),
                                           PCI_SLOT(d->devfn),
                                           PCI_FUNC(d->devfn),
-                                          i, r->addr, r->size);
+                                          i, r->addr, r->size, name);
             memory_region_add_subregion_overlap(r->address_space,
                                                 r->addr, r->memory, 1);
         }
diff --git a/hw/pci/trace-events b/hw/pci/trace-events
index def4b39..6dd7015 100644
--- a/hw/pci/trace-events
+++ b/hw/pci/trace-events
@@ -1,8 +1,8 @@
 # See docs/devel/tracing.txt for syntax documentation.
 
 # pci.c
-pci_update_mappings_del(void *d, uint32_t bus, uint32_t slot, uint32_t func, int bar, uint64_t addr, uint64_t size) "d=%p %02x:%02x.%x %d,0x%"PRIx64"+0x%"PRIx64
-pci_update_mappings_add(void *d, uint32_t bus, uint32_t slot, uint32_t func, int bar, uint64_t addr, uint64_t size) "d=%p %02x:%02x.%x %d,0x%"PRIx64"+0x%"PRIx64
+pci_update_mappings_del(const char *dname, uint32_t bus, uint32_t slot, uint32_t func, int bar, uint64_t addr, uint64_t size, const char *name) "%s %02x:%02x.%x [%d] 0x%"PRIx64", 0x%"PRIx64"B \"%s\""
+pci_update_mappings_add(const char *dname, uint32_t bus, uint32_t slot, uint32_t func, int bar, uint64_t addr, uint64_t size, const char *name) "%s %02x:%02x.%x [%d] 0x%"PRIx64", 0x%"PRIx64"B \"%s\""
 
 # pci_host.c
 pci_cfg_read(const char *dev, unsigned devid, unsigned fnid, unsigned offs, unsigned val) "%s %02u:%u @0x%x -> 0x%x"
@@ -10,3 +10,4 @@ pci_cfg_write(const char *dev, unsigned devid, unsigned fnid, unsigned offs, uns
 
 # msix.c
 msix_write_config(char *name, bool enabled, bool masked) "dev %s enabled %d masked %d"
+msix_table_mmio_write(char *name, uint64_t addr, uint64_t val, unsigned size)  "(%s, @%"PRId64", 0x%"PRIx64", %dB)"
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index a51a093..23c8bf3 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -304,6 +304,8 @@ static int vfio_dma_unmap(VFIOContainer *container,
         return 0;
     }
 
+    trace_vfio_dma_unmap(container->fd, iova, size);
+
     while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
         /*
          * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c
@@ -327,6 +329,11 @@ static int vfio_dma_unmap(VFIOContainer *container,
         return -errno;
     }
 
+    if (unmap.size != size) {
+        error_printf("warn: VFIO_UNMAP_DMA(0x%lx, 0x%lx) only unmaps 0x%llx",
+                     iova, size, unmap.size);
+    }
+
     return 0;
 }
 
@@ -345,6 +352,9 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
         return 0;
     }
 
+    trace_vfio_dma_map(container->fd, iova, size, vaddr,
+                       (readonly ? "r" : "rw"));
+
     if (!readonly) {
         map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
     }
@@ -985,7 +995,8 @@ int vfio_region_mmap(VFIORegion *region)
         trace_vfio_region_mmap(memory_region_name(&region->mmaps[i].mem),
                                region->mmaps[i].offset,
                                region->mmaps[i].offset +
-                               region->mmaps[i].size - 1);
+                               region->mmaps[i].size - 1,
+                               region->mmaps[i].mmap);
     }
 
     return 0;
@@ -1696,6 +1707,9 @@ retry:
         goto retry;
     }
 
+    trace_vfio_region(vbasedev->name, index, (*info)->offset, (*info)->size,
+                      (*info)->cap_offset, (*info)->flags);
+
     return 0;
 }
 
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index f72e277..d74e078 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -41,6 +41,7 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "migration/blocker.h"
+#include "trace-root.h"
 
 #define TYPE_VFIO_PCI "vfio-pci"
 #define PCI_VFIO(obj)    OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 10d899c..83cd0a6 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -25,7 +25,7 @@ vfio_pci_size_rom(const char *name, int size) "%s ROM size 0x%x"
 vfio_vga_write(uint64_t addr, uint64_t data, int size) " (0x%"PRIx64", 0x%"PRIx64", %d)"
 vfio_vga_read(uint64_t addr, int size, uint64_t data) " (0x%"PRIx64", %d) = 0x%"PRIx64
 vfio_pci_read_config(const char *name, int addr, int len, int val) " (%s, @0x%x, len=0x%x) 0x%x"
-vfio_pci_write_config(const char *name, int addr, int val, int len) " (%s, @0x%x, 0x%x, len=0x%x)"
+vfio_pci_write_config(const char *name, int addr, int val, int len) "(%s, @0x%x, 0x%x, 0x%xB)"
 vfio_msi_setup(const char *name, int pos) "%s PCI MSI CAP @0x%x"
 vfio_msix_early_setup(const char *name, int pos, int table_bar, int offset, int entries) "%s PCI MSI-X CAP @0x%x, BAR %d, offset 0x%x, entries %d"
 vfio_check_pcie_flr(const char *name) "%s Supports FLR via PCIe cap"
@@ -37,7 +37,7 @@ vfio_pci_hot_reset_dep_devices(int domain, int bus, int slot, int function, int
 vfio_pci_hot_reset_result(const char *name, const char *result) "%s hot reset: %s"
 vfio_populate_device_config(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device %s config:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
 vfio_populate_device_get_irq_info_failure(const char *errstr) "VFIO_DEVICE_GET_IRQ_INFO failure: %s"
-vfio_realize(const char *name, int group_id) " (%s) group %d"
+vfio_realize(const char *name, int group_id) "(%s) group %d"
 vfio_mdev(const char *name, bool is_mdev) " (%s) is_mdev %d"
 vfio_add_ext_cap_dropped(const char *name, uint16_t cap, uint16_t offset) "%s 0x%x@0x%x"
 vfio_pci_reset(const char *name) " (%s)"
@@ -109,7 +109,7 @@ vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions,
 vfio_put_base_device(int fd) "close vdev->fd=%d"
 vfio_region_setup(const char *dev, int index, const char *name, unsigned long flags, unsigned long offset, unsigned long size) "Device %s, region %d \"%s\", flags: 0x%lx, offset: 0x%lx, size: 0x%lx"
 vfio_region_mmap_fault(const char *name, int index, unsigned long offset, unsigned long size, int fault) "Region %s mmaps[%d], [0x%lx - 0x%lx], fault: %d"
-vfio_region_mmap(const char *name, unsigned long offset, unsigned long end) "Region %s [0x%lx - 0x%lx]"
+vfio_region_mmap(const char *name, unsigned long offset, unsigned long end, void *addr) "%s [0x%lx - 0x%lx] maps to %p"
 vfio_region_exit(const char *name, int index) "Device %s, region %d"
 vfio_region_finalize(const char *name, int index) "Device %s, region %d"
 vfio_region_mmaps_set_enabled(const char *name, bool enabled) "Region %s mmaps enabled: %d"
@@ -117,6 +117,9 @@ vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Devic
 vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
 vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
 vfio_dma_unmap_overflow_workaround(void) ""
+vfio_dma_unmap(int fd, uint64_t iova, uint64_t size) "fd %d, iova 0x%"PRIx64", len 0x%"PRIx64
+vfio_dma_map(int fd, uint64_t iova, uint64_t size, void *addr, const char *access) "fd %d, iova 0x%"PRIx64", len 0x%"PRIx64", va %p, %s"
+vfio_region(const char *name, int index, uint64_t offset, uint64_t size, int cap_offset, int flags) "%s [%d]: +0x%"PRIx64", 0x%"PRIx64"B, cap +0x%x, flags 0x%x"
 
 # platform.c
 vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"
diff --git a/trace-events b/trace-events
index 42107eb..98589a4 100644
--- a/trace-events
+++ b/trace-events
@@ -107,6 +107,8 @@ qmp_job_complete(void *job) "job %p"
 qmp_job_finalize(void *job) "job %p"
 qmp_job_dismiss(void *job) "job %p"
 
+# exec.c
+ram_block_add(void *host, uint64_t maxlen, const char *name, const char *mode) "host=%p, maxlen=0x%"PRIx64", mr = {name=%s, %s}"
 
 ### Guest events, keep at bottom
 
-- 
1.8.3.1




* Re: [PATCH V1 03/32] savevm: QMP command for cprsave
  2020-07-30 15:14 ` [PATCH V1 03/32] savevm: QMP command for cprsave Steve Sistare
@ 2020-07-30 16:12   ` Eric Blake
  2020-07-30 17:52     ` Steven Sistare
  0 siblings, 1 reply; 66+ messages in thread
From: Eric Blake @ 2020-07-30 16:12 UTC (permalink / raw)
  To: Steve Sistare, qemu-devel
  Cc: Daniel P. Berrange, Juan Quintela, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Marc-André Lureau, Alex Bennée

On 7/30/20 10:14 AM, Steve Sistare wrote:
> To enable live reboot, provide the cprsave QMP command and the VMS_REBOOT
> vmstate-saving operation, which saves the state of the virtual machine in a
> simple file.
> 
> Syntax:
>    {'command':'cprsave', 'data':{'file':'str', 'mode':'str'}}
> 
>    The mode argument must be 'reboot'.  Additional modes will be defined in
>    the future.
> 

Focusing on just the UI:

> +++ b/qapi/migration.json
> @@ -1621,3 +1621,17 @@
>   ##
>   { 'event': 'UNPLUG_PRIMARY',
>     'data': { 'device-id': 'str' } }
> +
> +##
> +# @cprsave:
> +#
> +# Create a checkpoint of the virtual machine device state in @file.
> +# Guest RAM and guest block device blocks are not saved.
> +#
> +# @file: name of checkpoint file

Since you used qemu_open() in the code, this can include a 
'/dev/fdset/NNN' magic name for saving into a previously-passed-in file 
descriptor instead of directly opening a local file name.  That's a good 
thing, but I don't know if it needs explicit mention in the docs.

> +# @mode: 'reboot' : checkpoint can be cprload'ed after a host kexec reboot.
> +#
> +# Since 5.0

5.2 (you've missed 5.0 by a long shot, and even 5.1 is too late now).

> +##
> +{ 'command': 'cprsave', 'data': { 'file': 'str', 'mode': 'str' } }

'mode' should be an enum type, rather than an open-coded string:

{ 'enum': 'CprMode', 'data': ['reboot'] }
{ 'command': 'cprsave', 'data': {'file': 'str', 'mode': 'CprMode' } }
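
As a rough illustration of what an enum buys on the C side, here is a hand-written stand-in for a mode enum with a name table and a lookup. This is only a sketch: the names CprMode, CPR_MODE_REBOOT, CPR_MODE_MAX, cpr_mode_names and cpr_mode_parse are assumptions made for the example and are not taken from the patch or from generated QAPI code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a mode enum; not the generated QAPI type. */
typedef enum {
    CPR_MODE_REBOOT,
    CPR_MODE_MAX,               /* number of modes, also used as "invalid" */
} CprMode;

/* Name table indexed by enum value. */
static const char *const cpr_mode_names[CPR_MODE_MAX] = {
    [CPR_MODE_REBOOT] = "reboot",
};

/* Return the mode matching name, or CPR_MODE_MAX if name is unknown. */
static CprMode cpr_mode_parse(const char *name)
{
    if (name == NULL) {
        return CPR_MODE_MAX;
    }
    for (int i = 0; i < CPR_MODE_MAX; i++) {
        if (strcmp(name, cpr_mode_names[i]) == 0) {
            return (CprMode)i;
        }
    }
    return CPR_MODE_MAX;
}
```

With a real QAPI enum the validation would happen before the command handler runs, so the handler would not need a lookup like this at all.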

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH V1 05/32] savevm: QMP command for cprload
  2020-07-30 15:14 ` [PATCH V1 05/32] savevm: QMP command for cprload Steve Sistare
@ 2020-07-30 16:14   ` Eric Blake
  2020-07-30 18:00     ` Steven Sistare
  0 siblings, 1 reply; 66+ messages in thread
From: Eric Blake @ 2020-07-30 16:14 UTC (permalink / raw)
  To: Steve Sistare, qemu-devel
  Cc: Daniel P. Berrange, Juan Quintela, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Marc-André Lureau, Alex Bennée

On 7/30/20 10:14 AM, Steve Sistare wrote:
> Provide the cprload QMP command.  The VM is created from the file produced
> by the cprsave command.  Guest RAM is restored in-place from the shared
> memory backend file, and guest block devices are used as is.  The contents
> of such devices must not be modified between the cprsave and cprload
> operations.  If the VM was running at cprsave time, then VM execution
> resumes.

Is it always wise to unconditionally resume, or might this command need 
an additional optional knob that says what state (paused or running) to 
move into?

> 
> Syntax:
>    {'command':'cprload', 'data':{'file':'str'}}
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> Signed-off-by: Maran Wilson <maran.wilson@oracle.com>
> ---

> +++ b/qapi/migration.json
> @@ -1635,3 +1635,14 @@
>   ##
>   { 'command': 'cprsave', 'data': { 'file': 'str', 'mode': 'str' } }
>   
> +##
> +# @cprload:
> +#
> +# Start virtual machine from checkpoint file that was created earlier using
> +# the cprsave command.
> +#
> +# @file: name of checkpoint file
> +#
> +# Since 5.0

Another 5.2 instance. I'll quit pointing it out for the rest of the series.

> +##
> +{ 'command': 'cprload', 'data': { 'file': 'str' } }
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index 660537a..8478778 100644

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH V1 07/32] savevm: QMP command for cprinfo
  2020-07-30 15:14 ` [PATCH V1 07/32] savevm: QMP command for cprinfo Steve Sistare
@ 2020-07-30 16:17   ` Eric Blake
  2020-07-30 18:02     ` Steven Sistare
  0 siblings, 1 reply; 66+ messages in thread
From: Eric Blake @ 2020-07-30 16:17 UTC (permalink / raw)
  To: Steve Sistare, qemu-devel
  Cc: Daniel P. Berrange, Juan Quintela, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Marc-André Lureau, Alex Bennée

On 7/30/20 10:14 AM, Steve Sistare wrote:
> Provide the cprinfo QMP command.  This returns a string with a space-
> separated list of modes supported by cprsave, and can be used by clients
> as a feature test to check if the running QEMU instance supports cprsave.

When you've already got array support in the QMP language, why are you 
making the user parse a string into an array after the fact?

> 
> Syntax:
>    {'command':'cprinfo', 'returns':'str'}
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---

> +++ b/qapi/migration.json
> @@ -1623,6 +1623,15 @@
>     'data': { 'device-id': 'str' } }
>   
>   ##
> +# @cprinfo:
> +#
> +# Return a space-delimited list of modes supported by the cprsave command
> +#
> +# Since 5.0
> +##
> +{ 'command': 'cprinfo', 'returns': 'str' }

Returning a 'str' is non-extensible.  The fact that you had to edit the 
whitelist is proof that you should have done something better.  I recommend:

{ 'command': 'cprinfo', 'returns': { 'modes': [ 'CprMode' ] } }

using the CprMode enum I proposed earlier.
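
To make the cost of the string form concrete, this is roughly the tokenizing every client would have to write just to test for one mode. It is a sketch only; mode_list_contains is a hypothetical helper, not something in the patch or in QEMU.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical client-side helper: does the space-delimited list contain
 * exactly the token mode?  A plain strstr() would wrongly match prefixes. */
static bool mode_list_contains(const char *list, const char *mode)
{
    size_t want = strlen(mode);

    while (*list != '\0') {
        list += strspn(list, " ");          /* skip separators */
        size_t len = strcspn(list, " ");    /* length of this token */
        if (len > 0 && len == want && strncmp(list, mode, len) == 0) {
            return true;
        }
        list += len;
    }
    return false;
}
```

With an array return, the QMP client's JSON library does this splitting for free, and the schema stays extensible.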

> +
> +##
>   # @cprsave:
>   #
>   # Create a checkpoint of the virtual machine device state in @file.
> diff --git a/qapi/pragma.json b/qapi/pragma.json
> index cffae27..43bdb39 100644
> --- a/qapi/pragma.json
> +++ b/qapi/pragma.json
> @@ -5,6 +5,7 @@
>   { 'pragma': {
>       # Commands allowed to return a non-dictionary:
>       'returns-whitelist': [
> +        'cprinfo',

This should not be needed.  Design the return value correctly in the 
first place.

>           'human-monitor-command',
>           'qom-get',
>           'query-migrate-cache-size',
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH V1 12/32] vl: pause option
  2020-07-30 15:14 ` [PATCH V1 12/32] vl: pause option Steve Sistare
@ 2020-07-30 16:20   ` Eric Blake
  2020-07-30 18:11     ` Steven Sistare
  2020-07-30 17:03   ` Alex Bennée
  1 sibling, 1 reply; 66+ messages in thread
From: Eric Blake @ 2020-07-30 16:20 UTC (permalink / raw)
  To: Steve Sistare, qemu-devel
  Cc: Daniel P. Berrange, Juan Quintela, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Marc-André Lureau, Alex Bennée

On 7/30/20 10:14 AM, Steve Sistare wrote:
> Provide the -pause command-line parameter and the QEMU_PAUSE environment
> variable to briefly pause QEMU in main and allow a developer to attach gdb.
> Useful when the developer does not invoke QEMU directly, such as when using
> libvirt.

How would you set this option with libvirt?

It feels like you are trying to reinvent something that is already 
well-documented:

https://www.berrange.com/posts/2011/10/12/debugging-early-startup-of-kvm-with-gdb-when-launched-by-libvirtd/

> 
> Usage:
>    qemu -pause <seconds>
>    or
>    export QEMU_PAUSE=<seconds>
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>   qemu-options.hx |  9 +++++++++
>   softmmu/vl.c    | 15 ++++++++++++++-
>   2 files changed, 23 insertions(+), 1 deletion(-)

> @@ -3204,6 +3211,12 @@ void qemu_init(int argc, char **argv, char **envp)
>               case QEMU_OPTION_gdb:
>                   add_device_config(DEV_GDB, optarg);
>                   break;
> +            case QEMU_OPTION_pause:
> +                seconds = atoi(optarg);

atoi() cannot detect overflow.  You should never use it in robust 
parsing of untrusted input.
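
For illustration, a checked parse with plain strtol() could look like the sketch below. parse_pause_seconds is a hypothetical name, and the choice to reject negative values and trailing junk is an assumption about what the option should accept. QEMU also has its own strtol wrappers in util/cutils.c, which may be the better fit in-tree.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse a non-negative number of seconds.
 * Returns true and stores the value on success; returns false on empty
 * input, trailing garbage, negative values, or overflow. */
static bool parse_pause_seconds(const char *arg, unsigned int *seconds)
{
    char *end;
    long val;

    if (arg == NULL || *arg == '\0') {
        return false;
    }
    errno = 0;
    val = strtol(arg, &end, 10);
    if (errno == ERANGE || end == arg || *end != '\0') {
        return false;           /* overflow, no digits, or trailing junk */
    }
    if (val < 0 || val > INT_MAX) {
        return false;           /* out of range for sleep() */
    }
    *seconds = (unsigned int)val;
    return true;
}
```

Unlike atoi(), this lets the caller report an error instead of silently sleeping for 0 or for an undefined number of seconds.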

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH V1 14/32] savevm: VMS_RESTART and cprsave restart
  2020-07-30 15:14 ` [PATCH V1 14/32] savevm: VMS_RESTART and cprsave restart Steve Sistare
@ 2020-07-30 16:22   ` Eric Blake
  2020-07-30 18:14     ` Steven Sistare
  0 siblings, 1 reply; 66+ messages in thread
From: Eric Blake @ 2020-07-30 16:22 UTC (permalink / raw)
  To: Steve Sistare, qemu-devel
  Cc: Daniel P. Berrange, Juan Quintela, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Marc-André Lureau, Alex Bennée

On 7/30/20 10:14 AM, Steve Sistare wrote:
> Add the VMS_RESTART variant of vmstate, for use when upgrading qemu in place
> on the same host without a reboot.  Invoke it using:
>    cprsave <filename> restart
> 
> VMS_RESTART supports guest ram mapped by private anonymous memory, versus
> VMS_REBOOT which requires that guest ram be mapped by persistent shared
> memory.  Subsequent patches complete its implementation.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---

> +++ b/qapi/migration.json
> @@ -1639,6 +1639,7 @@
>   #
>   # @file: name of checkpoint file
>   # @mode: 'reboot' : checkpoint can be cprload'ed after a host kexec reboot.
> +#        'restart': checkpoint can be cprload'ed after restarting qemu.

This should be a modification to an enum type (the 'CprMode' type I 
suggested earlier in the series).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH V1 00/32] Live Update
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (31 preceding siblings ...)
  2020-07-30 15:14 ` [PATCH V1 32/32] vfio-pci: improved tracing Steve Sistare
@ 2020-07-30 16:52 ` Daniel P. Berrangé
  2020-07-30 18:48   ` Steven Sistare
  2020-07-30 17:15 ` Paolo Bonzini
                   ` (2 subsequent siblings)
  35 siblings, 1 reply; 66+ messages in thread
From: Daniel P. Berrangé @ 2020-07-30 16:52 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Juan Quintela, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Markus Armbruster, qemu-devel,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Marc-André Lureau, Alex Bennée, Dr. David Alan Gilbert

On Thu, Jul 30, 2020 at 08:14:04AM -0700, Steve Sistare wrote:
> Improve and extend the qemu functions that save and restore VM state so a
> guest may be suspended and resumed with minimal pause time.  qemu may be
> updated to a new version in between.
> 
> The first set of patches adds the cprsave and cprload commands to save and
> restore VM state, and allow the host kernel to be updated and rebooted in
> between.  The VM must create guest RAM in a persistent shared memory file,
> such as /dev/dax0.0 or persistant /dev/shm PKRAM as proposed in 
> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com/
> 
> cprsave stops the VCPUs and saves VM device state in a simple file, and
> thus supports any type of guest image and block device.  The caller must
> not modify the VM's block devices between cprsave and cprload.
> 
> cprsave and cprload support guests with vfio devices if the caller first
> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
> The guest drivers suspend methods flush outstanding requests and re-
> initialize the devices, and thus there is no device state to save and
> restore.
> 
>    1 savevm: add vmstate handler iterators
>    2 savevm: VM handlers mode mask
>    3 savevm: QMP command for cprsave
>    4 savevm: HMP Command for cprsave
>    5 savevm: QMP command for cprload
>    6 savevm: HMP Command for cprload
>    7 savevm: QMP command for cprinfo
>    8 savevm: HMP command for cprinfo
>    9 savevm: prevent cprsave if memory is volatile
>   10 kvmclock: restore paused KVM clock
>   11 cpu: disable ticks when suspended
>   12 vl: pause option
>   13 gdbstub: gdb support for suspended state
> 
> The next patches add a restart method that eliminates the persistent memory
> constraint, and allows qemu to be updated across the restart, but does not
> allow host reboot.  Anonymous memory segments used by the guest are
> preserved across a re-exec of qemu, mapped at the same VA, via a proposed
> madvise(MADV_DOEXEC) option in the Linux kernel.  See
> https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yznaga@oracle.com/
> 
>   14 savevm: VMS_RESTART and cprsave restart
>   15 vl: QEMU_START_FREEZE env var
>   16 oslib: add qemu_clr_cloexec
>   17 util: env var helpers
>   18 osdep: import MADV_DOEXEC
>   19 memory: ram_block_add cosmetic changes
>   20 vl: add helper to request re-exec
>   21 exec, memory: exec(3) to restart
>   22 char: qio_channel_socket_accept reuse fd
>   23 char: save/restore chardev socket fds
>   24 ui: save/restore vnc socket fds
>   25 char: save/restore chardev pty fds

Keeping FDs open across re-exec is a nice trick, but how are you dealing
with the state associated with them, most especially the TLS encryption
state? AFAIK, there's no way to serialize/deserialize the TLS state that
GNUTLS maintains, and the patches don't show any sign of dealing with
this. IOW it looks like while the FD will be preserved, any TLS session
running on it will fail.
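
For reference, the FD half of the trick is small. A minimal sketch, assuming a helper along the lines of the qemu_clr_cloexec named in patch 16 (clear_cloexec below is a hypothetical stand-in, not the code from the series):

```c
#include <fcntl.h>

/* Hypothetical helper: make fd survive a later exec() by clearing the
 * FD_CLOEXEC flag.  Returns 0 on success, -1 on error (errno is set by
 * fcntl()). */
static int clear_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) == -1 ? -1 : 0;
}
```

This only keeps the kernel object open. Any userspace state layered on the descriptor, such as a TLS session, lives in the old process image and is gone after the exec.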

I'm going to presume that you're probably just considering the TLS features
out of scope for your patch series.  It would be useful if you have any
info about this and other things you've considered out of scope for this
patch series.

I'm not seeing anything in the block layer about preserving open FDs, so
I presume you're just letting the block layer close and then re-open any
FDs it has?  This would have the side effect that any locks held on the
FDs are lost, so there's a potential race condition where another process
could acquire the lock and prevent the re-exec completing. That said this
is unavoidable, because the Linux kernel is completely broken wrt keeping
fcntl() locks held across a re-exec, always throwing away the locks if
more than 1 thread is running [1].

Regards,
Daniel

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1552621
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH V1 12/32] vl: pause option
  2020-07-30 15:14 ` [PATCH V1 12/32] vl: pause option Steve Sistare
  2020-07-30 16:20   ` Eric Blake
@ 2020-07-30 17:03   ` Alex Bennée
  2020-07-30 18:14     ` Steven Sistare
  1 sibling, 1 reply; 66+ messages in thread
From: Alex Bennée @ 2020-07-30 17:03 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Markus Armbruster,
	Juan Quintela, Dr. David Alan Gilbert, qemu-devel,
	Alex Williamson, Stefan Hajnoczi, Marc-André Lureau,
	Paolo Bonzini, Philippe Mathieu-Daudé


Steve Sistare <steven.sistare@oracle.com> writes:

> Provide the -pause command-line parameter and the QEMU_PAUSE environment
> variable to briefly pause QEMU in main and allow a developer to attach gdb.
> Useful when the developer does not invoke QEMU directly, such as when using
> libvirt.

How does this differ from -S?

>
> Usage:
>   qemu -pause <seconds>
>   or
>   export QEMU_PAUSE=<seconds>
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  qemu-options.hx |  9 +++++++++
>  softmmu/vl.c    | 15 ++++++++++++++-
>  2 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 708583b..8505cf2 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -3668,6 +3668,15 @@ SRST
>      option is experimental.
>  ERST
>  
> +DEF("pause", HAS_ARG, QEMU_OPTION_pause, \
> +    "-pause secs    Pause for secs seconds on entry to main.\n", QEMU_ARCH_ALL)
> +
> +SRST
> +``--pause secs``
> +    Pause for a number of seconds on entry to main.  Useful for attaching
> +    a debugger after QEMU has been launched by some other entity.
> +ERST
> +

It seems like having an option to race with the debugger is just asking
for trouble.

>  DEF("S", 0, QEMU_OPTION_S, \
>      "-S              freeze CPU at startup (use 'c' to start execution)\n",
>      QEMU_ARCH_ALL)
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index 8478778..951994f 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -2844,7 +2844,7 @@ static void create_default_memdev(MachineState *ms, const char *path)
>  
>  void qemu_init(int argc, char **argv, char **envp)
>  {
> -    int i;
> +    int i, seconds;
>      int snapshot, linux_boot;
>      const char *initrd_filename;
>      const char *kernel_filename, *kernel_cmdline;
> @@ -2882,6 +2882,13 @@ void qemu_init(int argc, char **argv, char **envp)
>      QemuPluginList plugin_list = QTAILQ_HEAD_INITIALIZER(plugin_list);
>      int mem_prealloc = 0; /* force preallocation of physical target memory */
>  
> +    if (getenv("QEMU_PAUSE")) {
> +        seconds = atoi(getenv("QEMU_PAUSE"));
> +        printf("Pausing %d seconds for debugger. QEMU PID is %d\n",
> +               seconds, getpid());
> +        sleep(seconds);
> +    }
> +
>      os_set_line_buffering();
>  
>      error_init(argv[0]);
> @@ -3204,6 +3211,12 @@ void qemu_init(int argc, char **argv, char **envp)
>              case QEMU_OPTION_gdb:
>                  add_device_config(DEV_GDB, optarg);
>                  break;
> +            case QEMU_OPTION_pause:
> +                seconds = atoi(optarg);
> +                printf("Pausing %d seconds for debugger. QEMU PID is %d\n",
> +                            seconds, getpid());
> +                sleep(seconds);
> +                break;
>              case QEMU_OPTION_L:
>                  if (is_help_option(optarg)) {
>                      list_data_dirs = true;


-- 
Alex Bennée



* Re: [PATCH V1 00/32] Live Update
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (32 preceding siblings ...)
  2020-07-30 16:52 ` [PATCH V1 00/32] Live Update Daniel P. Berrangé
@ 2020-07-30 17:15 ` Paolo Bonzini
  2020-07-30 19:09   ` Steven Sistare
  2020-07-30 17:49 ` Dr. David Alan Gilbert
  2020-08-04 18:18 ` Steven Sistare
  35 siblings, 1 reply; 66+ messages in thread
From: Paolo Bonzini @ 2020-07-30 17:15 UTC (permalink / raw)
  To: Steve Sistare, qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin,
	Philippe Mathieu-Daudé,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Stefan Hajnoczi, Marc-André Lureau,
	Alex Bennée

On 30/07/20 17:14, Steve Sistare wrote:
> The first set of patches adds the cprsave and cprload commands to save and
> restore VM state, and allow the host kernel to be updated and rebooted in
> between.  The VM must create guest RAM in a persistent shared memory file,
> such as /dev/dax0.0 or persistant /dev/shm PKRAM as proposed in 
> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com/
> 
> cprsave stops the VCPUs and saves VM device state in a simple file, and
> thus supports any type of guest image and block device.  The caller must
> not modify the VM's block devices between cprsave and cprload.

Stupid question, what does cpr stand for?  If it is checkpoint/restore,
please spell it out.  Also, how does the functionality compare to
xen-save-devices-state and xen-load-devices-state?

> cprsave and cprload support guests with vfio devices if the caller first
> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
> The guest drivers suspend methods flush outstanding requests and re-
> initialize the devices, and thus there is no device state to save and
> restore.

This probably should be allowed even for regular migration.  Can you
generalize the code as a separate series?

Thanks,

Paolo




* Re: [PATCH V1 00/32] Live Update
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (33 preceding siblings ...)
  2020-07-30 17:15 ` Paolo Bonzini
@ 2020-07-30 17:49 ` Dr. David Alan Gilbert
  2020-07-30 19:31   ` Steven Sistare
  2020-08-04 18:18 ` Steven Sistare
  35 siblings, 1 reply; 66+ messages in thread
From: Dr. David Alan Gilbert @ 2020-07-30 17:49 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Daniel P. Berrange, Juan Quintela, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, qemu-devel, Markus Armbruster,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Marc-André Lureau, Alex Bennée

* Steve Sistare (steven.sistare@oracle.com) wrote:
> Improve and extend the qemu functions that save and restore VM state so a
> guest may be suspended and resumed with minimal pause time.  qemu may be
> updated to a new version in between.

Nice.

> The first set of patches adds the cprsave and cprload commands to save and
> restore VM state, and allow the host kernel to be updated and rebooted in
> between.  The VM must create guest RAM in a persistent shared memory file,
> such as /dev/dax0.0 or persistant /dev/shm PKRAM as proposed in 
> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com/
> 
> cprsave stops the VCPUs and saves VM device state in a simple file, and
> thus supports any type of guest image and block device.  The caller must
> not modify the VM's block devices between cprsave and cprload.

can I ask why you don't just add a migration flag to skip the devices
you don't want, and then do a migrate to a file?
(i.e. migrate "exec:cat > afile")
We already have the 'x-ignore-shared' capability that's used for doing
RAM snapshots of VMs; primarily I think for being able to start a VM
from a RAM snapshot as a fast VM start trick.
(There's also a xen_save_devices that does something similar).
If you backed the RAM as you say, enabled x-ignore-shared and then did:

   migrate "exec:cat > afile"

and restarted the destination with:

    migrate_incoming "exec:cat afile"

what is different (except the later stuff about the vfio magic and
chardevs).

Dave

> cprsave and cprload support guests with vfio devices if the caller first
> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
> The guest drivers suspend methods flush outstanding requests and re-
> initialize the devices, and thus there is no device state to save and
> restore.
> 
>    1 savevm: add vmstate handler iterators
>    2 savevm: VM handlers mode mask
>    3 savevm: QMP command for cprsave
>    4 savevm: HMP Command for cprsave
>    5 savevm: QMP command for cprload
>    6 savevm: HMP Command for cprload
>    7 savevm: QMP command for cprinfo
>    8 savevm: HMP command for cprinfo
>    9 savevm: prevent cprsave if memory is volatile
>   10 kvmclock: restore paused KVM clock
>   11 cpu: disable ticks when suspended
>   12 vl: pause option
>   13 gdbstub: gdb support for suspended state
> 
> The next patches add a restart method that eliminates the persistent memory
> constraint, and allows qemu to be updated across the restart, but does not
> allow host reboot.  Anonymous memory segments used by the guest are
> preserved across a re-exec of qemu, mapped at the same VA, via a proposed
> madvise(MADV_DOEXEC) option in the Linux kernel.  See
> https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yznaga@oracle.com/
> 
>   14 savevm: VMS_RESTART and cprsave restart
>   15 vl: QEMU_START_FREEZE env var
>   16 oslib: add qemu_clr_cloexec
>   17 util: env var helpers
>   18 osdep: import MADV_DOEXEC
>   19 memory: ram_block_add cosmetic changes
>   20 vl: add helper to request re-exec
>   21 exec, memory: exec(3) to restart
>   22 char: qio_channel_socket_accept reuse fd
>   23 char: save/restore chardev socket fds
>   24 ui: save/restore vnc socket fds
>   25 char: save/restore chardev pty fds
>   26 monitor: save/restore QMP negotiation status
>   27 vhost: reset vhost devices upon cprsave
>   28 char: restore terminal on restart
> 
> The next patches extend the restart method to save and restore vfio-pci
> state, eliminating the requirement for a guest agent.  The vfio container,
> group, and device descriptors are preserved across the qemu re-exec.
> 
>   29 pci: export pci_update_mappings
>   30 vfio-pci: save and restore
>   31 vfio-pci: trace pci config
>   32 vfio-pci: improved tracing
> 
> Here is an example of updating qemu from v4.2.0 to v4.2.1 using 
> "cprsave restart".  The software update is performed while the guest is
> running to minimize downtime.
> 
> window 1				| window 2
> 					|
> # qemu-system-x86_64 ... 		|
> QEMU 4.2.0 monitor - type 'help' ...	|
> (qemu) info status			|
> VM status: running			|
> 					| # yum update qemu
> (qemu) cprsave /tmp/qemu.sav restart	|
> QEMU 4.2.1 monitor - type 'help' ...	|
> (qemu) info status			|
> VM status: paused (prelaunch)		|
> (qemu) cprload /tmp/qemu.sav		|
> (qemu) info status			|
> VM status: running			|
> 
> 
> Here is an example of updating the host kernel using "cprsave reboot"
> 
> window 1					| window 2
> 						|
> # qemu-system-x86_64 ...mem-path=/dev/dax0.0 ...|
> QEMU 4.2.1 monitor - type 'help' ...		|
> (qemu) info status				|
> VM status: running				|
> 						| # yum update kernel-uek
> (qemu) cprsave /tmp/qemu.sav reboot		|
> 						|
> # systemctl kexec				|
> kexec_core: Starting new kernel			|
> ...						|
> 						|
> # qemu-system-x86_64 ...mem-path=/dev/dax0.0 ...|
> QEMU 4.2.1 monitor - type 'help' ...		|
> (qemu) info status				|
> VM status: paused (prelaunch)			|
> (qemu) cprload /tmp/qemu.sav			|
> (qemu) info status				|
> VM status: running				|
> 
> 
> Mark Kanda (5):
>   char: qio_channel_socket_accept reuse fd
>   char: save/restore chardev socket fds
>   ui: save/restore vnc socket fds
>   monitor: save/restore QMP negotiation status
>   vhost: reset vhost devices upon cprsave
> 
> Steve Sistare (27):
>   savevm: add vmstate handler iterators
>   savevm: VM handlers mode mask
>   savevm: QMP command for cprsave
>   savevm: HMP Command for cprsave
>   savevm: QMP command for cprload
>   savevm: HMP Command for cprload
>   savevm: QMP command for cprinfo
>   savevm: HMP command for cprinfo
>   savevm: prevent cprsave if memory is volatile
>   kvmclock: restore paused KVM clock
>   cpu: disable ticks when suspended
>   vl: pause option
>   gdbstub: gdb support for suspended state
>   savevm: VMS_RESTART and cprsave restart
>   vl: QEMU_START_FREEZE env var
>   oslib: add qemu_clr_cloexec
>   util: env var helpers
>   osdep: import MADV_DOEXEC
>   memory: ram_block_add cosmetic changes
>   vl: add helper to request re-exec
>   exec, memory: exec(3) to restart
>   char: save/restore chardev pty fds
>   char: restore terminal on restart
>   pci: export pci_update_mappings
>   vfio-pci: save and restore
>   vfio-pci: trace pci config
>   vfio-pci: improved tracing
> 
>  MAINTAINERS                    |   7 ++
>  accel/kvm/kvm-all.c            |   8 +-
>  accel/kvm/trace-events         |   3 +-
>  chardev/char-pty.c             |  38 +++++--
>  chardev/char-socket.c          |  35 ++++++
>  chardev/char-stdio.c           |   7 ++
>  chardev/char.c                 |  16 +++
>  exec.c                         |  88 +++++++++++++--
>  gdbstub.c                      |  11 +-
>  hmp-commands.hx                |  46 ++++++++
>  hw/i386/kvm/clock.c            |   6 +-
>  hw/pci/msix.c                  |   1 +
>  hw/pci/pci.c                   |  17 +--
>  hw/pci/trace-events            |   5 +-
>  hw/vfio/common.c               | 115 ++++++++++++++++----
>  hw/vfio/pci.c                  | 179 ++++++++++++++++++++++++++++++-
>  hw/vfio/platform.c             |   2 +-
>  hw/vfio/trace-events           |  11 +-
>  hw/virtio/vhost.c              |  12 +++
>  include/chardev/char.h         |   8 ++
>  include/exec/memory.h          |   4 +
>  include/hw/pci/pci.h           |   2 +
>  include/hw/vfio/vfio-common.h  |   4 +-
>  include/io/channel-socket.h    |   3 +-
>  include/migration/register.h   |   3 +
>  include/migration/vmstate.h    |  11 ++
>  include/monitor/hmp.h          |   3 +
>  include/qemu/cutils.h          |   1 +
>  include/qemu/env.h             |  31 ++++++
>  include/qemu/osdep.h           |   8 ++
>  include/sysemu/sysemu.h        |  10 ++
>  io/channel-socket.c            |  12 ++-
>  io/net-listener.c              |   4 +-
>  migration/block.c              |   1 +
>  migration/migration.c          |   4 +-
>  migration/ram.c                |   1 +
>  migration/savevm.c             | 237 ++++++++++++++++++++++++++++++++++++-----
>  migration/savevm.h             |   4 +-
>  monitor/hmp-cmds.c             |  28 +++++
>  monitor/qmp-cmds.c             |  16 +++
>  monitor/qmp.c                  |  42 ++++++++
>  qapi/migration.json            |  35 ++++++
>  qapi/pragma.json               |   1 +
>  qemu-options.hx                |   9 ++
>  scsi/qemu-pr-helper.c          |   2 +-
>  softmmu/vl.c                   |  65 ++++++++++-
>  tests/qtest/tpm-emu.c          |   2 +-
>  tests/test-char.c              |   2 +-
>  tests/test-io-channel-socket.c |   4 +-
>  trace-events                   |   2 +
>  ui/vnc.c                       | 153 +++++++++++++++++++++-----
>  util/Makefile.objs             |   2 +-
>  util/env.c                     | 132 +++++++++++++++++++++++
>  util/oslib-posix.c             |   9 ++
>  util/oslib-win32.c             |   4 +
>  55 files changed, 1331 insertions(+), 135 deletions(-)
>  create mode 100644 include/qemu/env.h
>  create mode 100644 util/env.c
> 
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH V1 03/32] savevm: QMP command for cprsave
  2020-07-30 16:12   ` Eric Blake
@ 2020-07-30 17:52     ` Steven Sistare
  0 siblings, 0 replies; 66+ messages in thread
From: Steven Sistare @ 2020-07-30 17:52 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Daniel P. Berrange, Juan Quintela, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Marc-André Lureau, Alex Bennée

On 7/30/2020 12:12 PM, Eric Blake wrote:
> On 7/30/20 10:14 AM, Steve Sistare wrote:
>> To enable live reboot, provide the cprsave QMP command and the VMS_REBOOT
>> vmstate-saving operation, which saves the state of the virtual machine in a
>> simple file.
>>
>> Syntax:
>>    {'command':'cprsave', 'data':{'file':'str', 'mode':'str'}}
>>
>>    The mode argument must be 'reboot'.  Additional modes will be defined in
>>    the future.
>>
> 
> Focusing on just the UI:
> 
>> +++ b/qapi/migration.json
>> @@ -1621,3 +1621,17 @@
>>   ##
>>   { 'event': 'UNPLUG_PRIMARY',
>>     'data': { 'device-id': 'str' } }
>> +
>> +##
>> +# @cprsave:
>> +#
>> +# Create a checkpoint of the virtual machine device state in @file.
>> +# Guest RAM and guest block device blocks are not saved.
>> +#
>> +# @file: name of checkpoint file
> 
> Since you used qemu_open() in the code, this can include a '/dev/fdset/NNN' magic name for saving into a previously-passed-in file descriptor instead of directly opening a local file name.  That's a good thing, but I don't know if it needs explicit mention in the docs.

OK, I'll look for other uses of file and fdset in the docs and see if it fits naturally here.

>> +# @mode: 'reboot' : checkpoint can be cprload'ed after a host kexec reboot.
>> +#
>> +# Since 5.0
> 
> 5.2 (you've missed 5.0 by a long shot, and even 5.1 is too late now).

Yup!  Will fix here and in the other patches, thanks.

>> +##
>> +{ 'command': 'cprsave', 'data': { 'file': 'str', 'mode': 'str' } }
> 
> 'mode' should be an enum type, rather than an open-coded string:
> 
> { 'enum': 'CprMode', 'data': ['reboot'] }
> { 'command': 'cprsave', 'data': {'file': 'str', 'mode': 'CprMode' } }

Will do, thanks for the syntax.

- Steve
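
The enum suggestion above can be illustrated from the C side.  This is only
a sketch of what a closed set of modes buys over an open-coded string; the
names CprMode, CPR_MODE_REBOOT and cpr_mode_parse() are hypothetical and are
not the code that the QAPI generator would emit for this series.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C view of the proposed CprMode enum (illustrative names). */
typedef enum {
    CPR_MODE_REBOOT,
    CPR_MODE__MAX
} CprMode;

/* One name per enum value; adding a mode means adding one entry here. */
static const char *const cpr_mode_names[CPR_MODE__MAX] = {
    [CPR_MODE_REBOOT] = "reboot",
};

/* Return 0 and set *mode if name is a known mode, else return -1. */
int cpr_mode_parse(const char *name, CprMode *mode)
{
    if (name == NULL) {
        return -1;
    }
    for (int i = 0; i < CPR_MODE__MAX; i++) {
        if (strcmp(name, cpr_mode_names[i]) == 0) {
            *mode = (CprMode)i;
            return 0;
        }
    }
    return -1;
}
```

With an enum, an unknown mode is rejected in one place at the interface
boundary, and the command handler only ever sees a valid CprMode value.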



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH V1 05/32] savevm: QMP command for cprload
  2020-07-30 16:14   ` Eric Blake
@ 2020-07-30 18:00     ` Steven Sistare
  0 siblings, 0 replies; 66+ messages in thread
From: Steven Sistare @ 2020-07-30 18:00 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Daniel P. Berrange, Juan Quintela, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Marc-André Lureau, Alex Bennée

On 7/30/2020 12:14 PM, Eric Blake wrote:
> On 7/30/20 10:14 AM, Steve Sistare wrote:
>> Provide the cprload QMP command.  The VM is created from the file produced
>> by the cprsave command.  Guest RAM is restored in-place from the shared
>> memory backend file, and guest block devices are used as is.  The contents
>> of such devices must not be modified between the cprsave and cprload
>> operations.  If the VM was running at cprsave time, then VM execution
>> resumes.
> 
> Is it always wise to unconditionally resume, or might this command need an additional optional knob that says what state (paused or running) to move into?

This can already be done.  Issue a stop command before cprsave, then cprload will finish in a
paused state.

Also, cprsave re-execs and leaves the guest in a paused state.  One can send
device add commands, then send cprload which continues.

>> Syntax:
>>    {'command':'cprload', 'data':{'file':'str'}}
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> Signed-off-by: Maran Wilson <maran.wilson@oracle.com>
>> ---
> 
>> +++ b/qapi/migration.json
>> @@ -1635,3 +1635,14 @@
>>   ##
>>   { 'command': 'cprsave', 'data': { 'file': 'str', 'mode': 'str' } }
>>   +##
>> +# @cprload:
>> +#
>> +# Start virtual machine from checkpoint file that was created earlier using
>> +# the cprsave command.
>> +#
>> +# @file: name of checkpoint file
>> +#
>> +# Since 5.0
> 
> another 5.2 instance. I'll quit pointing it out for the rest of the series.

Will find and fix all, thanks.

- Steve



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH V1 07/32] savevm: QMP command for cprinfo
  2020-07-30 16:17   ` Eric Blake
@ 2020-07-30 18:02     ` Steven Sistare
  0 siblings, 0 replies; 66+ messages in thread
From: Steven Sistare @ 2020-07-30 18:02 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Daniel P. Berrange, Juan Quintela, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Marc-André Lureau, Alex Bennée

On 7/30/2020 12:17 PM, Eric Blake wrote:
> On 7/30/20 10:14 AM, Steve Sistare wrote:
>> Provide the cprinfo QMP command.  This returns a string with a space-
>> separated list of modes supported by cprsave, and can be used by clients
>> as a feature test to check if the running QEMU instance supports cprsave.
> 
> When you've already got array support in the QMP language, why are you making the user parse a string into an array after the fact?

Will fix as you suggest, thanks.  I had HMP on the brain - Steve

>> Syntax:
>>    {'command':'cprinfo', 'returns':'str'}
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
> 
>> +++ b/qapi/migration.json
>> @@ -1623,6 +1623,15 @@
>>     'data': { 'device-id': 'str' } }
>>     ##
>> +# @cprinfo:
>> +#
>> +# Return a space-delimited list of modes supported by the cprsave command
>> +#
>> +# Since 5.0
>> +##
>> +{ 'command': 'cprinfo', 'returns': 'str' }
> 
> Returning a 'str' is non-extensible.  The fact that you had to edit the whitelist is proof that you should have done something better.  I recommend:
> 
> { 'command': 'cprinfo', 'returns': { 'modes': [ 'CprMode' ] } }
> 
> using the CprMode enum I proposed earlier.
> 
>> +
>> +##
>>   # @cprsave:
>>   #
>>   # Create a checkpoint of the virtual machine device state in @file.
>> diff --git a/qapi/pragma.json b/qapi/pragma.json
>> index cffae27..43bdb39 100644
>> --- a/qapi/pragma.json
>> +++ b/qapi/pragma.json
>> @@ -5,6 +5,7 @@
>>   { 'pragma': {
>>       # Commands allowed to return a non-dictionary:
>>       'returns-whitelist': [
>> +        'cprinfo',
> 
> This should not be needed.  Design the return value correctly in the first place.
> 
>>           'human-monitor-command',
>>           'qom-get',
>>           'query-migrate-cache-size',
>>
> 
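
The objection to a space-separated string can be seen from the client side.
The sketch below shows the kind of tokenizing every client would need just to
ask "is mode X supported?"; cpr_mode_in_list() is a hypothetical helper, not
part of the series.  With an array return value this parsing disappears.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if mode appears as a whole space-separated token in list. */
bool cpr_mode_in_list(const char *list, const char *mode)
{
    size_t len = strlen(mode);
    const char *p = list;

    if (len == 0) {
        return false;
    }
    while (*p) {
        p += strspn(p, " ");              /* skip separators */
        size_t tok = strcspn(p, " ");     /* length of the next token */
        if (tok == len && strncmp(p, mode, len) == 0) {
            return true;
        }
        p += tok;
    }
    return false;
}
```

Note the whole-token comparison: a plain substring search would wrongly
report "rest" as supported when the list contains "restart".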


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH V1 12/32] vl: pause option
  2020-07-30 16:20   ` Eric Blake
@ 2020-07-30 18:11     ` Steven Sistare
  2020-07-31 10:07       ` Daniel P. Berrangé
  0 siblings, 1 reply; 66+ messages in thread
From: Steven Sistare @ 2020-07-30 18:11 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Daniel P. Berrange, Juan Quintela, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Marc-André Lureau, Alex Bennée

On 7/30/2020 12:20 PM, Eric Blake wrote:
> On 7/30/20 10:14 AM, Steve Sistare wrote:
>> Provide the -pause command-line parameter and the QEMU_PAUSE environment
>> variable to briefly pause QEMU in main and allow a developer to attach gdb.
>> Useful when the developer does not invoke QEMU directly, such as when using
>> libvirt.
> 
> How would you set this option with libvirt?

Add -pause in the qemu args in the xml.
 
> It feels like you are trying to reinvent something that is already well-documented:
> 
> https://www.berrange.com/posts/2011/10/12/debugging-early-startup-of-kvm-with-gdb-when-launched-by-libvirtd/

Too many steps to reach BINGO for my taste.  Easier is better.  Also, in our shop we start qemu 
in other ways, such as via services.

These new hooks helped me and my colleagues, and I hope others may also find them useful, 
but if not then we drop them.

>> Usage:
>>    qemu -pause <seconds>
>>    or
>>    export QEMU_PAUSE=<seconds>
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>   qemu-options.hx |  9 +++++++++
>>   softmmu/vl.c    | 15 ++++++++++++++-
>>   2 files changed, 23 insertions(+), 1 deletion(-)
> 
>> @@ -3204,6 +3211,12 @@ void qemu_init(int argc, char **argv, char **envp)
>>               case QEMU_OPTION_gdb:
>>                   add_device_config(DEV_GDB, optarg);
>>                   break;
>> +            case QEMU_OPTION_pause:
>> +                seconds = atoi(optarg);
> 
> atoi() cannot detect overflow.  You should never use it in robust parsing of untrusted input.

OK.

- Steve
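
For reference, the atoi() concern can be addressed with strtol() and explicit
checks.  This is a minimal standalone sketch using only the C library;
parse_pause_seconds() is a hypothetical name, and in-tree code would more
likely use one of the existing helpers declared in qemu/cutils.h.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a non-negative decimal number of seconds.
 * Returns 0 and stores the value in *out on success, -1 on any error.
 */
int parse_pause_seconds(const char *s, unsigned int *out)
{
    char *end;
    long val;

    if (s == NULL || *s == '\0') {
        return -1;                      /* missing argument */
    }
    errno = 0;
    val = strtol(s, &end, 10);
    if (errno == ERANGE) {
        return -1;                      /* overflow, which atoi() cannot report */
    }
    if (end == s || *end != '\0') {
        return -1;                      /* no digits, or trailing junk */
    }
    if (val < 0 || val > INT_MAX) {
        return -1;                      /* out of range for sleep() */
    }
    *out = (unsigned int)val;
    return 0;
}
```

The same function could serve both the -pause option and the QEMU_PAUSE
environment variable, so the two paths reject bad input identically.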





^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH V1 12/32] vl: pause option
  2020-07-30 17:03   ` Alex Bennée
@ 2020-07-30 18:14     ` Steven Sistare
  2020-07-31  9:44       ` Alex Bennée
  0 siblings, 1 reply; 66+ messages in thread
From: Steven Sistare @ 2020-07-30 18:14 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Markus Armbruster,
	Juan Quintela, Dr. David Alan Gilbert, qemu-devel,
	Alex Williamson, Stefan Hajnoczi, Marc-André Lureau,
	Paolo Bonzini, Philippe Mathieu-Daudé

On 7/30/2020 1:03 PM, Alex Bennée wrote:
> 
> Steve Sistare <steven.sistare@oracle.com> writes:
> 
>> Provide the -pause command-line parameter and the QEMU_PAUSE environment
>> variable to briefly pause QEMU in main and allow a developer to attach gdb.
>> Useful when the developer does not invoke QEMU directly, such as when using
>> libvirt.
> 
> How does this differ from -S?

The -S flag runs qemu to the main loop but does not start the guest.  Lots of code
that you may need to debug runs before you get there.

- Steve

>> Usage:
>>   qemu -pause <seconds>
>>   or
>>   export QEMU_PAUSE=<seconds>
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>  qemu-options.hx |  9 +++++++++
>>  softmmu/vl.c    | 15 ++++++++++++++-
>>  2 files changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 708583b..8505cf2 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -3668,6 +3668,15 @@ SRST
>>      option is experimental.
>>  ERST
>>  
>> +DEF("pause", HAS_ARG, QEMU_OPTION_pause, \
>> +    "-pause secs    Pause for secs seconds on entry to main.\n", QEMU_ARCH_ALL)
>> +
>> +SRST
>> +``--pause secs``
>> +    Pause for a number of seconds on entry to main.  Useful for attaching
>> +    a debugger after QEMU has been launched by some other entity.
>> +ERST
>> +
> 
> It seems like having an option to race with the debugger is just asking
> for trouble.
> 
>>  DEF("S", 0, QEMU_OPTION_S, \
>>      "-S              freeze CPU at startup (use 'c' to start execution)\n",
>>      QEMU_ARCH_ALL)
>> diff --git a/softmmu/vl.c b/softmmu/vl.c
>> index 8478778..951994f 100644
>> --- a/softmmu/vl.c
>> +++ b/softmmu/vl.c
>> @@ -2844,7 +2844,7 @@ static void create_default_memdev(MachineState *ms, const char *path)
>>  
>>  void qemu_init(int argc, char **argv, char **envp)
>>  {
>> -    int i;
>> +    int i, seconds;
>>      int snapshot, linux_boot;
>>      const char *initrd_filename;
>>      const char *kernel_filename, *kernel_cmdline;
>> @@ -2882,6 +2882,13 @@ void qemu_init(int argc, char **argv, char **envp)
>>      QemuPluginList plugin_list = QTAILQ_HEAD_INITIALIZER(plugin_list);
>>      int mem_prealloc = 0; /* force preallocation of physical target memory */
>>  
>> +    if (getenv("QEMU_PAUSE")) {
>> +        seconds = atoi(getenv("QEMU_PAUSE"));
>> +        printf("Pausing %d seconds for debugger. QEMU PID is %d\n",
>> +               seconds, getpid());
>> +        sleep(seconds);
>> +    }
>> +
>>      os_set_line_buffering();
>>  
>>      error_init(argv[0]);
>> @@ -3204,6 +3211,12 @@ void qemu_init(int argc, char **argv, char **envp)
>>              case QEMU_OPTION_gdb:
>>                  add_device_config(DEV_GDB, optarg);
>>                  break;
>> +            case QEMU_OPTION_pause:
>> +                seconds = atoi(optarg);
>> +                printf("Pausing %d seconds for debugger. QEMU PID is %d\n",
>> +                            seconds, getpid());
>> +                sleep(seconds);
>> +                break;
>>              case QEMU_OPTION_L:
>>                  if (is_help_option(optarg)) {
>>                      list_data_dirs = true;
> 
> 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH V1 14/32] savevm: VMS_RESTART and cprsave restart
  2020-07-30 16:22   ` Eric Blake
@ 2020-07-30 18:14     ` Steven Sistare
  0 siblings, 0 replies; 66+ messages in thread
From: Steven Sistare @ 2020-07-30 18:14 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Daniel P. Berrange, Juan Quintela, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Marc-André Lureau, Alex Bennée

On 7/30/2020 12:22 PM, Eric Blake wrote:
> On 7/30/20 10:14 AM, Steve Sistare wrote:
>> Add the VMS_RESTART variant of vmstate, for use when upgrading qemu in place
>> on the same host without a reboot.  Invoke it using:
>>    cprsave <filename> restart
>>
>> VMS_RESTART supports guest ram mapped by private anonymous memory, versus
>> VMS_REBOOT which requires that guest ram be mapped by persistent shared
>> memory.  Subsequent patches complete its implementation.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
> 
>> +++ b/qapi/migration.json
>> @@ -1639,6 +1639,7 @@
>>   #
>>   # @file: name of checkpoint file
>>   # @mode: 'reboot' : checkpoint can be cprload'ed after a host kexec reboot.
>> +#        'restart': checkpoint can be cprload'ed after restarting qemu.
> 
> This should be a modification to an enum type (the 'CprMode' type I suggested earlier in the series).

Will do - steve


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH V1 00/32] Live Update
  2020-07-30 16:52 ` [PATCH V1 00/32] Live Update Daniel P. Berrangé
@ 2020-07-30 18:48   ` Steven Sistare
  2020-07-31  8:53     ` Daniel P. Berrangé
  0 siblings, 1 reply; 66+ messages in thread
From: Steven Sistare @ 2020-07-30 18:48 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Juan Quintela, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Markus Armbruster, qemu-devel,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Marc-André Lureau, Alex Bennée, Dr. David Alan Gilbert

On 7/30/2020 12:52 PM, Daniel P. Berrangé wrote:
> On Thu, Jul 30, 2020 at 08:14:04AM -0700, Steve Sistare wrote:
>> Improve and extend the qemu functions that save and restore VM state so a
>> guest may be suspended and resumed with minimal pause time.  qemu may be
>> updated to a new version in between.
>>
>> The first set of patches adds the cprsave and cprload commands to save and
>> restore VM state, and allow the host kernel to be updated and rebooted in
>> between.  The VM must create guest RAM in a persistent shared memory file,
>> such as /dev/dax0.0 or persistent /dev/shm PKRAM as proposed in 
>> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com/
>>
>> cprsave stops the VCPUs and saves VM device state in a simple file, and
>> thus supports any type of guest image and block device.  The caller must
>> not modify the VM's block devices between cprsave and cprload.
>>
>> cprsave and cprload support guests with vfio devices if the caller first
>> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
>> The guest drivers suspend methods flush outstanding requests and re-
>> initialize the devices, and thus there is no device state to save and
>> restore.
>>
>>    1 savevm: add vmstate handler iterators
>>    2 savevm: VM handlers mode mask
>>    3 savevm: QMP command for cprsave
>>    4 savevm: HMP Command for cprsave
>>    5 savevm: QMP command for cprload
>>    6 savevm: HMP Command for cprload
>>    7 savevm: QMP command for cprinfo
>>    8 savevm: HMP command for cprinfo
>>    9 savevm: prevent cprsave if memory is volatile
>>   10 kvmclock: restore paused KVM clock
>>   11 cpu: disable ticks when suspended
>>   12 vl: pause option
>>   13 gdbstub: gdb support for suspended state
>>
>> The next patches add a restart method that eliminates the persistent memory
>> constraint, and allows qemu to be updated across the restart, but does not
>> allow host reboot.  Anonymous memory segments used by the guest are
>> preserved across a re-exec of qemu, mapped at the same VA, via a proposed
>> madvise(MADV_DOEXEC) option in the Linux kernel.  See
>> https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yznaga@oracle.com/
>>
>>   14 savevm: VMS_RESTART and cprsave restart
>>   15 vl: QEMU_START_FREEZE env var
>>   16 oslib: add qemu_clr_cloexec
>>   17 util: env var helpers
>>   18 osdep: import MADV_DOEXEC
>>   19 memory: ram_block_add cosmetic changes
>>   20 vl: add helper to request re-exec
>>   21 exec, memory: exec(3) to restart
>>   22 char: qio_channel_socket_accept reuse fd
>>   23 char: save/restore chardev socket fds
>>   24 ui: save/restore vnc socket fds
>>   25 char: save/restore chardev pty fds
> 
> Keeping FDs open across re-exec is a nice trick, but how are you dealing
> with the state associated with them, most especially the TLS encryption
> state ? AFAIK, there's no way to serialize/deserialize the TLS state that
> GNUTLS maintains, and the patches don't show any sign of dealing with
> this. IOW it looks like while the FD will be preserved, any TLS session
> running on it will fail.

I had not considered TLS.  If a non-qemu library maintains connection state, then
we won't be able to support it for live update until the library provides interfaces
to serialize the state.

For qemu objects, so far vmstate has been adequate to represent the devices with
descriptors that we preserve.

> I'm going to presume that you're probably just considering the TLS features
> out of scope for your patch series.  It would be useful if you have any
> info about this and other things you've considered out of scope for this
> patch series.

The descriptors covered in these patches are needed for our use case.  I realize
there are others that could perhaps be preserved, but we have not tried them.
Those descriptors are closed on exec as usual, and are reopened after exec. I
expect that we or others will support more over time.
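
As background on "closed on exec as usual": a descriptor survives exec() only
if its FD_CLOEXEC flag is clear.  The sketch below shows the standard fcntl()
sequence for clearing it.  The body of qemu_clr_cloexec from patch 16 is not
shown in this thread, so clear_cloexec() here is an assumption about what
such a helper might look like, not the series' implementation.

```c
#include <fcntl.h>

/*
 * Hypothetical helper: clear FD_CLOEXEC so that fd stays open across exec().
 * Returns 0 on success, -1 on error (errno is set by fcntl).
 */
int clear_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0) {
        return -1;
    }
    /* Preserve any other descriptor flags; drop only FD_CLOEXEC. */
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0 ? -1 : 0;
}
```

This keeps only the kernel descriptor alive.  Any userspace state layered on
top of it, such as the TLS session state raised above, still has to be saved
and restored separately.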

> I'm not seeing anything in the block layer about preserving open FDs, so
> I presume you're just letting the block layer close and then re-open any
> FDs it has ?  

Correct.

> This would have the side effect that any locks held on the
> FDs are lost, so there's a potential race condition where another process
> could acquire the lock and prevent the re-exec completing. That said this
> is unavoidable, because Linux kernel is completely broken wrt keeping
> fcntl() locks held across a re-exec, always throwing away the locks if
> more than 1 thread is running [1].

Ouch.

- Steve

> 
> Regards,
> Daniel
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1552621
> 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH V1 00/32] Live Update
  2020-07-30 17:15 ` Paolo Bonzini
@ 2020-07-30 19:09   ` Steven Sistare
  2020-07-30 21:39     ` Paolo Bonzini
  0 siblings, 1 reply; 66+ messages in thread
From: Steven Sistare @ 2020-07-30 19:09 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin,
	Philippe Mathieu-Daudé,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Stefan Hajnoczi, Marc-André Lureau,
	Alex Bennée

On 7/30/2020 1:15 PM, Paolo Bonzini wrote:
> On 30/07/20 17:14, Steve Sistare wrote:
>> The first set of patches adds the cprsave and cprload commands to save and
>> restore VM state, and allow the host kernel to be updated and rebooted in
>> between.  The VM must create guest RAM in a persistent shared memory file,
>> such as /dev/dax0.0 or persistent /dev/shm PKRAM as proposed in 
>> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com/
>>
>> cprsave stops the VCPUs and saves VM device state in a simple file, and
>> thus supports any type of guest image and block device.  The caller must
>> not modify the VM's block devices between cprsave and cprload.
> 
> Stupid question, what does cpr stand for?  If it is checkpoint/restore,

Checkpoint/restart.  An acronym from my HPC days.  I will spell it out.

> please spell it out.  Also, how does the functionality compare to
> xen-save-devices-state and xen-load-devices-state?

qmp_xen_save_devices_state serializes device state to a file which is loaded 
on the target for a live migration.  It performs some of the same actions
as cprsave/cprload but does not support live update-in-place.

>> cprsave and cprload support guests with vfio devices if the caller first
>> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
>> The guest drivers suspend methods flush outstanding requests and re-
>> initialize the devices, and thus there is no device state to save and
>> restore.
> 
> This probably should be allowed even for regular migration.  Can you
> generalize the code as a separate series?

Maybe.  I think that would be a distinct patch that ignores the vfio migration blocker 
if the state is suspended.  Plus a qemu agent call to do the suspend.  Needs more
thought.

- Steve

> 
> Thanks,
> 
> Paolo
> 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH V1 00/32] Live Update
  2020-07-30 17:49 ` Dr. David Alan Gilbert
@ 2020-07-30 19:31   ` Steven Sistare
  0 siblings, 0 replies; 66+ messages in thread
From: Steven Sistare @ 2020-07-30 19:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Daniel P. Berrange, Juan Quintela, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, qemu-devel, Markus Armbruster,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Marc-André Lureau, Alex Bennée


[-- Attachment #1: Type: text/plain, Size: 10222 bytes --]

On 7/30/2020 1:49 PM, Dr. David Alan Gilbert wrote:
> * Steve Sistare (steven.sistare@oracle.com) wrote:
>> Improve and extend the qemu functions that save and restore VM state so a
>> guest may be suspended and resumed with minimal pause time.  qemu may be
>> updated to a new version in between.
> 
> Nice.
> 
>> The first set of patches adds the cprsave and cprload commands to save and
>> restore VM state, and allow the host kernel to be updated and rebooted in
>> between.  The VM must create guest RAM in a persistent shared memory file,
>> such as /dev/dax0.0 or persistent /dev/shm PKRAM as proposed in 
>> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com/
>>
>> cprsave stops the VCPUs and saves VM device state in a simple file, and
>> thus supports any type of guest image and block device.  The caller must
>> not modify the VM's block devices between cprsave and cprload.
> 
> can I ask why you don't just add a migration flag to skip the devices
> you don't want, and then do a migrate to a file?
> (i.e. migrate "exec:cat > afile")
> We already have the 'x-ignore-shared' capability that's used for doing
> RAM snapshots of VMs; primarily I think for being able to start a VM
> from a RAM snapshot as a fast VM start trick.
> (There's also a xen_save_devices that does something similar).
> If you backed the RAM as you say, enabled x-ignore-shared and then did:
> 
>    migrate "exec:cat > afile"
> 
> and restarted the destination with:
> 
>     migrate_incoming "exec:cat afile"
> 
> what is different (except the later stuff about the vfio magic and
> chardevs).
> 
> Dave

Yes, I did consider whether to extend the migration syntax and implementation in
save_vmstate and load_vmstate, versus creating something new.  Those functions
handle stuff like bdrv snapshot, aio, and migration which are n/a for the cpr
use case, and the cpr functions handle state that is n/a for the migration case.
I judged that a single function handling both would be less readable and
maintainable.  At their core all these routines call qemu_loadvm_state() and
qemu_savevm_state().  The surrounding code is mostly different.
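
To illustrate how one shared core can serve several callers, here is a rough
sketch of selecting handlers by a mode mask.  It is not code from the series:
the DEMO_* names, the struct, and the helper are all hypothetical, and the real
VMS_* values and handler registration may look quite different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mode bits; the real VMS_* values are defined in the series. */
enum demo_mode {
    DEMO_MIGRATE = 1 << 0,
    DEMO_REBOOT  = 1 << 1,
    DEMO_RESTART = 1 << 2,
};

/* Hypothetical handler record, not QEMU's SaveStateEntry. */
struct demo_handler {
    const char *name;
    unsigned modes;        /* bitwise OR of demo_mode values it applies to */
};

/* Counts the handlers that would run for the requested mode mask. */
static size_t demo_count_selected(const struct demo_handler *h, size_t n,
                                  unsigned mask)
{
    size_t count = 0;

    assert(h != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (h[i].modes & mask) {
            count++;
        }
    }
    return count;
}
```

A loader that accepts either cpr mode would pass the OR of both bits, much as
load_cpr_snapshot() passes VMS_REBOOT | VMS_RESTART.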


Take a look at 
  savevm.c:save_vmstate()   vs   save_cpr_snapshot() attached
and
  savevm.c:load_vmstate()   vs   load_cpr_snapshot() attached

I attached the complete versions of the cpr functions because they are built up
over multiple patches in this series, thus hard to visualize in patch form.

- Steve

> 
>> cprsave and cprload support guests with vfio devices if the caller first
>> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
>> The guest drivers suspend methods flush outstanding requests and re-
>> initialize the devices, and thus there is no device state to save and
>> restore.
>>
>>    1 savevm: add vmstate handler iterators
>>    2 savevm: VM handlers mode mask
>>    3 savevm: QMP command for cprsave
>>    4 savevm: HMP Command for cprsave
>>    5 savevm: QMP command for cprload
>>    6 savevm: HMP Command for cprload
>>    7 savevm: QMP command for cprinfo
>>    8 savevm: HMP command for cprinfo
>>    9 savevm: prevent cprsave if memory is volatile
>>   10 kvmclock: restore paused KVM clock
>>   11 cpu: disable ticks when suspended
>>   12 vl: pause option
>>   13 gdbstub: gdb support for suspended state
>>
>> The next patches add a restart method that eliminates the persistent memory
>> constraint, and allows qemu to be updated across the restart, but does not
>> allow host reboot.  Anonymous memory segments used by the guest are
>> preserved across a re-exec of qemu, mapped at the same VA, via a proposed
>> madvise(MADV_DOEXEC) option in the Linux kernel.  See
>> https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yznaga@oracle.com/
>>
>>   14 savevm: VMS_RESTART and cprsave restart
>>   15 vl: QEMU_START_FREEZE env var
>>   16 oslib: add qemu_clr_cloexec
>>   17 util: env var helpers
>>   18 osdep: import MADV_DOEXEC
>>   19 memory: ram_block_add cosmetic changes
>>   20 vl: add helper to request re-exec
>>   21 exec, memory: exec(3) to restart
>>   22 char: qio_channel_socket_accept reuse fd
>>   23 char: save/restore chardev socket fds
>>   24 ui: save/restore vnc socket fds
>>   25 char: save/restore chardev pty fds
>>   26 monitor: save/restore QMP negotiation status
>>   27 vhost: reset vhost devices upon cprsave
>>   28 char: restore terminal on restart
>>
>> The next patches extend the restart method to save and restore vfio-pci
>> state, eliminating the requirement for a guest agent.  The vfio container,
>> group, and device descriptors are preserved across the qemu re-exec.
>>
>>   29 pci: export pci_update_mappings
>>   30 vfio-pci: save and restore
>>   31 vfio-pci: trace pci config
>>   32 vfio-pci: improved tracing
>>
>> Here is an example of updating qemu from v4.2.0 to v4.2.1 using 
>> "cprsave restart".  The software update is performed while the guest is
>> running to minimize downtime.
>>
>> window 1				| window 2
>> 					|
>> # qemu-system-x86_64 ... 		|
>> QEMU 4.2.0 monitor - type 'help' ...	|
>> (qemu) info status			|
>> VM status: running			|
>> 					| # yum update qemu
>> (qemu) cprsave /tmp/qemu.sav restart	|
>> QEMU 4.2.1 monitor - type 'help' ...	|
>> (qemu) info status			|
>> VM status: paused (prelaunch)		|
>> (qemu) cprload /tmp/qemu.sav		|
>> (qemu) info status			|
>> VM status: running			|
>>
>>
>> Here is an example of updating the host kernel using "cprsave reboot".
>>
>> window 1					| window 2
>> 						|
>> # qemu-system-x86_64 ...mem-path=/dev/dax0.0 ...|
>> QEMU 4.2.1 monitor - type 'help' ...		|
>> (qemu) info status				|
>> VM status: running				|
>> 						| # yum update kernel-uek
>> (qemu) cprsave /tmp/qemu.sav reboot		|
>> 						|
>> # systemctl kexec				|
>> kexec_core: Starting new kernel			|
>> ...						|
>> 						|
>> # qemu-system-x86_64 ...mem-path=/dev/dax0.0 ...|
>> QEMU 4.2.1 monitor - type 'help' ...		|
>> (qemu) info status				|
>> VM status: paused (prelaunch)			|
>> (qemu) cprload /tmp/qemu.sav			|
>> (qemu) info status				|
>> VM status: running				|
>>
>>
>> Mark Kanda (5):
>>   char: qio_channel_socket_accept reuse fd
>>   char: save/restore chardev socket fds
>>   ui: save/restore vnc socket fds
>>   monitor: save/restore QMP negotiation status
>>   vhost: reset vhost devices upon cprsave
>>
>> Steve Sistare (27):
>>   savevm: add vmstate handler iterators
>>   savevm: VM handlers mode mask
>>   savevm: QMP command for cprsave
>>   savevm: HMP Command for cprsave
>>   savevm: QMP command for cprload
>>   savevm: HMP Command for cprload
>>   savevm: QMP command for cprinfo
>>   savevm: HMP command for cprinfo
>>   savevm: prevent cprsave if memory is volatile
>>   kvmclock: restore paused KVM clock
>>   cpu: disable ticks when suspended
>>   vl: pause option
>>   gdbstub: gdb support for suspended state
>>   savevm: VMS_RESTART and cprsave restart
>>   vl: QEMU_START_FREEZE env var
>>   oslib: add qemu_clr_cloexec
>>   util: env var helpers
>>   osdep: import MADV_DOEXEC
>>   memory: ram_block_add cosmetic changes
>>   vl: add helper to request re-exec
>>   exec, memory: exec(3) to restart
>>   char: save/restore chardev pty fds
>>   char: restore terminal on restart
>>   pci: export pci_update_mappings
>>   vfio-pci: save and restore
>>   vfio-pci: trace pci config
>>   vfio-pci: improved tracing
>>
>>  MAINTAINERS                    |   7 ++
>>  accel/kvm/kvm-all.c            |   8 +-
>>  accel/kvm/trace-events         |   3 +-
>>  chardev/char-pty.c             |  38 +++++--
>>  chardev/char-socket.c          |  35 ++++++
>>  chardev/char-stdio.c           |   7 ++
>>  chardev/char.c                 |  16 +++
>>  exec.c                         |  88 +++++++++++++--
>>  gdbstub.c                      |  11 +-
>>  hmp-commands.hx                |  46 ++++++++
>>  hw/i386/kvm/clock.c            |   6 +-
>>  hw/pci/msix.c                  |   1 +
>>  hw/pci/pci.c                   |  17 +--
>>  hw/pci/trace-events            |   5 +-
>>  hw/vfio/common.c               | 115 ++++++++++++++++----
>>  hw/vfio/pci.c                  | 179 ++++++++++++++++++++++++++++++-
>>  hw/vfio/platform.c             |   2 +-
>>  hw/vfio/trace-events           |  11 +-
>>  hw/virtio/vhost.c              |  12 +++
>>  include/chardev/char.h         |   8 ++
>>  include/exec/memory.h          |   4 +
>>  include/hw/pci/pci.h           |   2 +
>>  include/hw/vfio/vfio-common.h  |   4 +-
>>  include/io/channel-socket.h    |   3 +-
>>  include/migration/register.h   |   3 +
>>  include/migration/vmstate.h    |  11 ++
>>  include/monitor/hmp.h          |   3 +
>>  include/qemu/cutils.h          |   1 +
>>  include/qemu/env.h             |  31 ++++++
>>  include/qemu/osdep.h           |   8 ++
>>  include/sysemu/sysemu.h        |  10 ++
>>  io/channel-socket.c            |  12 ++-
>>  io/net-listener.c              |   4 +-
>>  migration/block.c              |   1 +
>>  migration/migration.c          |   4 +-
>>  migration/ram.c                |   1 +
>>  migration/savevm.c             | 237 ++++++++++++++++++++++++++++++++++++-----
>>  migration/savevm.h             |   4 +-
>>  monitor/hmp-cmds.c             |  28 +++++
>>  monitor/qmp-cmds.c             |  16 +++
>>  monitor/qmp.c                  |  42 ++++++++
>>  qapi/migration.json            |  35 ++++++
>>  qapi/pragma.json               |   1 +
>>  qemu-options.hx                |   9 ++
>>  scsi/qemu-pr-helper.c          |   2 +-
>>  softmmu/vl.c                   |  65 ++++++++++-
>>  tests/qtest/tpm-emu.c          |   2 +-
>>  tests/test-char.c              |   2 +-
>>  tests/test-io-channel-socket.c |   4 +-
>>  trace-events                   |   2 +
>>  ui/vnc.c                       | 153 +++++++++++++++++++++-----
>>  util/Makefile.objs             |   2 +-
>>  util/env.c                     | 132 +++++++++++++++++++++++
>>  util/oslib-posix.c             |   9 ++
>>  util/oslib-win32.c             |   4 +
>>  55 files changed, 1331 insertions(+), 135 deletions(-)
>>  create mode 100644 include/qemu/env.h
>>  create mode 100644 util/env.c
>>
>> -- 
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

[-- Attachment #2: save_cpr_snapshot.c --]
[-- Type: text/plain, Size: 1947 bytes --]

void save_cpr_snapshot(const char *file, const char *mode, Error **errp)
{
    int ret = 0;
    QEMUFile *f;
    VMStateMode op;

    if (!strcmp(mode, "reboot")) {
        op = VMS_REBOOT;
    } else if (!strcmp(mode, "restart")) {
        op = VMS_RESTART;
    } else {
        error_setg(errp, "cprsave: bad mode %s", mode);
        return;
    }

    if (op == VMS_REBOOT && qemu_ram_volatile(errp)) {
        return;
    }

    if (op == VMS_RESTART && QEMU_MADV_DOEXEC == QEMU_MADV_INVALID) {
        error_setg(errp, "kernel does not support MADV_DOEXEC.");
        return;
    }

    if (op == VMS_RESTART && xen_enabled()) {
        error_setg(errp, "xen does not support cprsave restart");
        return;
    }

    f = qf_file_open(file, O_CREAT | O_WRONLY | O_TRUNC, 0600, errp);
    if (!f) {
        return;
    }

    ret = global_state_store();
    if (ret) {
        error_setg(errp, "Error saving global state");
        qemu_fclose(f);
        return;
    }

    /* Update timers_state before saving.  Suspend did not do so. */
    if (runstate_check(RUN_STATE_SUSPENDED)) {
        cpu_disable_ticks();
    }

    vm_stop(RUN_STATE_SAVE_VM);

    ret = qemu_savevm_state(f, op, errp);
    if ((ret < 0) && !*errp) {
        error_setg(errp, "qemu_savevm_state failed");
    }
    qemu_fclose(f);

    if (op == VMS_REBOOT) {
        no_shutdown = 0;
        qemu_system_shutdown_request();
    } else if (op == VMS_RESTART) {
        if (qemu_preserve_ram(errp)) {
            return;
        }
        save_chardev_fds();
        save_vnc_fds();
        save_named_fd("mntfd");          /* was received from qemu-cpr */
        save_named_fd("ctlfd");          /* was received from qemu-cpr */
        walkenv(FD_PREFIX, preserve_fd, 0);
        reset_vhost_devices();
        save_qmp_negotiation_status();
        qemu_term_exit();
        qemu_system_exec_request();
        putenv((char *)"QEMU_START_FREEZE=");
    }
}


[-- Attachment #3: load_cpr_snapshot.c --]
[-- Type: text/plain, Size: 758 bytes --]

void load_cpr_snapshot(const char *file, Error **errp)
{
    QEMUFile *f;
    int ret;
    RunState state;

    if (runstate_is_running()) {
        error_setg(errp, "cprload called for a running VM");
        return;
    }

    f = qf_file_open(file, O_RDONLY, 0, errp);
    if (!f) {
        return;
    }

    ret = qemu_loadvm_state(f, VMS_REBOOT | VMS_RESTART);
    qemu_fclose(f);
    if (ret < 0) {
        error_setg(errp, "Error %d while loading VM state", ret);
        return;
    }

    state = global_state_get_runstate();
    if (state == RUN_STATE_RUNNING) {
        vm_start();
    } else {
        runstate_set(state);
        if (runstate_check(RUN_STATE_SUSPENDED)) {
            start_on_wake = 1;
        }
    }

    load_vnc_fds();
}


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH V1 00/32] Live Update
  2020-07-30 19:09   ` Steven Sistare
@ 2020-07-30 21:39     ` Paolo Bonzini
  2020-07-31 19:22       ` Steven Sistare
  0 siblings, 1 reply; 66+ messages in thread
From: Paolo Bonzini @ 2020-07-30 21:39 UTC (permalink / raw)
  To: Steven Sistare, qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin,
	Philippe Mathieu-Daudé,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Stefan Hajnoczi, Marc-André Lureau,
	Alex Bennée

On 30/07/20 21:09, Steven Sistare wrote:
>> please spell it out.  Also, how does the functionality compare to
>> xen-save-devices-state and xen-load-devices-state?
>
> qmp_xen_save_devices_state serializes device state to a file which is loaded 
> on the target for a live migration.  It performs some of the same actions
> as cprsave/cprload but does not support live update-in-place.

So it is a subset, can code be reused across both?  Also, live migration
across versions is supported, so can you describe the special
update-in-place support more precisely?  I am confused about the use
cases, which require (or try) to keep file descriptors across re-exec,
which are for kexec, and so on.

>>> cprsave and cprload support guests with vfio devices if the caller first
>>> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
>>> The guest drivers suspend methods flush outstanding requests and re-
>>> initialize the devices, and thus there is no device state to save and
>>> restore.
>> This probably should be allowed even for regular migration.  Can you
>> generalize the code as a separate series?
>
> Maybe.  I think that would be a distinct patch that ignores the vfio migration blocker 
> if the state is suspended.  Plus a qemu agent call to do the suspend.  Needs more
> thought.

The agent already supports suspend, so that should be relatively easy.
Only the code to add/remove the VFIO migration blocker from a VM state
change notifier, or something like that, would be needed.

Paolo



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH V1 00/32] Live Update
  2020-07-30 18:48   ` Steven Sistare
@ 2020-07-31  8:53     ` Daniel P. Berrangé
  2020-07-31 15:27       ` Steven Sistare
  0 siblings, 1 reply; 66+ messages in thread
From: Daniel P. Berrangé @ 2020-07-31  8:53 UTC (permalink / raw)
  To: Steven Sistare
  Cc: Michael S. Tsirkin, Alex Bennée, Juan Quintela, qemu-devel,
	Markus Armbruster, Alex Williamson, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On Thu, Jul 30, 2020 at 02:48:44PM -0400, Steven Sistare wrote:
> On 7/30/2020 12:52 PM, Daniel P. Berrangé wrote:
> > On Thu, Jul 30, 2020 at 08:14:04AM -0700, Steve Sistare wrote:
> >> Improve and extend the qemu functions that save and restore VM state so a
> >> guest may be suspended and resumed with minimal pause time.  qemu may be
> >> updated to a new version in between.
> >>
> >> The first set of patches adds the cprsave and cprload commands to save and
> >> restore VM state, and allow the host kernel to be updated and rebooted in
> >> between.  The VM must create guest RAM in a persistent shared memory file,
> >> such as /dev/dax0.0 or persistent /dev/shm PKRAM as proposed in 
> >> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com/
> >>
> >> cprsave stops the VCPUs and saves VM device state in a simple file, and
> >> thus supports any type of guest image and block device.  The caller must
> >> not modify the VM's block devices between cprsave and cprload.
> >>
> >> cprsave and cprload support guests with vfio devices if the caller first
> >> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
> >> The guest drivers suspend methods flush outstanding requests and re-
> >> initialize the devices, and thus there is no device state to save and
> >> restore.
> >>
> >>    1 savevm: add vmstate handler iterators
> >>    2 savevm: VM handlers mode mask
> >>    3 savevm: QMP command for cprsave
> >>    4 savevm: HMP Command for cprsave
> >>    5 savevm: QMP command for cprload
> >>    6 savevm: HMP Command for cprload
> >>    7 savevm: QMP command for cprinfo
> >>    8 savevm: HMP command for cprinfo
> >>    9 savevm: prevent cprsave if memory is volatile
> >>   10 kvmclock: restore paused KVM clock
> >>   11 cpu: disable ticks when suspended
> >>   12 vl: pause option
> >>   13 gdbstub: gdb support for suspended state
> >>
> >> The next patches add a restart method that eliminates the persistent memory
> >> constraint, and allows qemu to be updated across the restart, but does not
> >> allow host reboot.  Anonymous memory segments used by the guest are
> >> preserved across a re-exec of qemu, mapped at the same VA, via a proposed
> >> madvise(MADV_DOEXEC) option in the Linux kernel.  See
> >> https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yznaga@oracle.com/
> >>
> >>   14 savevm: VMS_RESTART and cprsave restart
> >>   15 vl: QEMU_START_FREEZE env var
> >>   16 oslib: add qemu_clr_cloexec
> >>   17 util: env var helpers
> >>   18 osdep: import MADV_DOEXEC
> >>   19 memory: ram_block_add cosmetic changes
> >>   20 vl: add helper to request re-exec
> >>   21 exec, memory: exec(3) to restart
> >>   22 char: qio_channel_socket_accept reuse fd
> >>   23 char: save/restore chardev socket fds
> >>   24 ui: save/restore vnc socket fds
> >>   25 char: save/restore chardev pty fds
> > 
> > Keeping FDs open across re-exec is a nice trick, but how are you dealing
> > with the state associated with them, most especially the TLS encryption
> > state ? AFAIK, there's no way to serialize/deserialize the TLS state that
> > GNUTLS maintains, and the patches don't show any sign of dealing with
> > this. IOW it looks like while the FD will be preserved, any TLS session
> > running on it will fail.
> 
> I had not considered TLS.  If a non-qemu library maintains connection state, then
> we won't be able to support it for live update until the library provides interfaces
> to serialize the state.
> 
> For qemu objects, so far vmstate has been adequate to represent the devices with
> descriptors that we preserve.

My main concern about this series is that there is an implicit assumption
that QEMU is *not* configured with certain features that are not handled
If QEMU is using one of the unsupported features, I don't see anything in
the series which attempts to prevent the actions.

IOW, users can have an arbitrary QEMU config, attempt to use these new features,
the commands may well succeed, but the user is silently left with a broken QEMU.
Such silent failure modes are really undesirable as they'll lead to a never
ending stream of hard to diagnose bug reports for QEMU maintainers.

TLS is one example of this: the live upgrade will "succeed", but the TLS
connections will be totally non-functional.
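
As a rough sketch of the kind of guard I have in mind: each feature that
cannot survive the update registers a reason, and the save command refuses to
run while any reason is present.  All names here are hypothetical and none of
this is existing QEMU API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry of reasons why a live update must be refused. */
#define DEMO_MAX_BLOCKERS 16

static const char *demo_blockers[DEMO_MAX_BLOCKERS];
static size_t demo_num_blockers;

/* Registers a reason.  Returns 0 on success, -1 if the registry is full. */
static int demo_add_blocker(const char *reason)
{
    assert(reason != NULL);
    if (demo_num_blockers == DEMO_MAX_BLOCKERS) {
        return -1;
    }
    demo_blockers[demo_num_blockers++] = reason;
    return 0;
}

/* Removes the first blocker whose text matches; no-op if it is absent. */
static void demo_del_blocker(const char *reason)
{
    assert(reason != NULL);
    for (size_t i = 0; i < demo_num_blockers; i++) {
        if (strcmp(demo_blockers[i], reason) == 0) {
            demo_blockers[i] = demo_blockers[--demo_num_blockers];
            return;
        }
    }
}

/* Returns NULL if the update may proceed, else one blocking reason. */
static const char *demo_check_blockers(void)
{
    return demo_num_blockers ? demo_blockers[0] : NULL;
}
```

With something like this, a TLS-enabled channel would add a blocker when the
session is established, and the save path would report the returned reason as
an error instead of proceeding.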

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH V1 24/32] ui: save/restore vnc socket fds
  2020-07-30 15:14 ` [PATCH V1 24/32] ui: save/restore vnc " Steve Sistare
@ 2020-07-31  9:06   ` Daniel P. Berrangé
  2020-07-31 16:51     ` Steven Sistare
  0 siblings, 1 reply; 66+ messages in thread
From: Daniel P. Berrangé @ 2020-07-31  9:06 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Juan Quintela, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Markus Armbruster, qemu-devel,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Marc-André Lureau, Alex Bennée, Dr. David Alan Gilbert

On Thu, Jul 30, 2020 at 08:14:28AM -0700, Steve Sistare wrote:
> From: Mark Kanda <mark.kanda@oracle.com>
> 
> Iterate through the VNC displays and save/restore the socket fds.

This patch doesn't appear to do anything around the client state, so I
can't see how this will work in general.  E.g. QEMU is 1/2 way through
receiving a message from the client, and we trigger re-exec.

The new QEMU is going to start up considering the VNC client is in an
idle state, and will then read the 2nd 1/2 of the message off the
client socket. Everything will go rapidly downhill from there.
Or the reverse, the server has sent a message, but this outbound
message is still in the buffer and has only been partially sent on the
wire. We re-exec and now we've lost the unsent part of the buffer.
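
A minimal sketch of the check that seems to be missing, assuming per-client
counters of pending input and output bytes.  The struct and helpers are
hypothetical, not QEMU's VncState, and this covers only the buffering part of
the problem, not the negotiated protocol state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-client buffer accounting. */
struct demo_client {
    size_t in_pending;    /* bytes of a partially received message */
    size_t out_pending;   /* bytes queued but not yet written to the socket */
};

/* A client can be handed over only when no message straddles the re-exec. */
static bool demo_client_is_quiescent(const struct demo_client *c)
{
    assert(c != NULL);
    return c->in_pending == 0 && c->out_pending == 0;
}

/* Counts clients that would have to be drained or dropped first. */
static size_t demo_count_busy(const struct demo_client *clients, size_t n)
{
    size_t busy = 0;

    for (size_t i = 0; i < n; i++) {
        if (!demo_client_is_quiescent(&clients[i])) {
            busy++;
        }
    }
    return busy;
}
```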


> 
> Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  include/sysemu/sysemu.h |   2 +
>  migration/savevm.c      |   3 +
>  ui/vnc.c                | 153 +++++++++++++++++++++++++++++++++++++++---------
>  3 files changed, 130 insertions(+), 28 deletions(-)
> 
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index fa1a5c3..3e7bfee 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -28,6 +28,8 @@ void qemu_remove_machine_init_done_notifier(Notifier *notify);
>  void save_cpr_snapshot(const char *file, const char *mode, Error **errp);
>  void load_cpr_snapshot(const char *file, Error **errp);
>  void save_chardev_fds(void);
> +void save_vnc_fds(void);
> +void load_vnc_fds(void);
>  
>  extern int autostart;
>  
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 81f38c4..35fafb7 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2768,6 +2768,7 @@ void save_cpr_snapshot(const char *file, const char *mode, Error **errp)
>              return;
>          }
>          save_chardev_fds();
> +        save_vnc_fds();
>          walkenv(FD_PREFIX, preserve_fd, 0);
>          qemu_system_exec_request();
>          putenv((char *)"QEMU_START_FREEZE=");
> @@ -3015,6 +3016,8 @@ void load_cpr_snapshot(const char *file, Error **errp)
>              start_on_wake = 1;
>          }
>      }
> +
> +    load_vnc_fds();
>  }
>  
>  int load_snapshot(const char *name, Error **errp)
> diff --git a/ui/vnc.c b/ui/vnc.c
> index f006aa1..947ddf5 100644
> --- a/ui/vnc.c
> +++ b/ui/vnc.c
> @@ -50,6 +50,7 @@
>  #include "qom/object_interfaces.h"
>  #include "qemu/cutils.h"
>  #include "io/dns-resolver.h"
> +#include "sysemu/sysemu.h"
>  
>  #define VNC_REFRESH_INTERVAL_BASE GUI_REFRESH_INTERVAL_DEFAULT
>  #define VNC_REFRESH_INTERVAL_INC  50
> @@ -2214,28 +2215,34 @@ static void set_pixel_format(VncState *vs, int bits_per_pixel,
>      graphic_hw_update(vs->vd->dcl.con);
>  }
>  
> -static void pixel_format_message (VncState *vs) {
> +/*
> + * reuse - true if we are using an existing (already initialized)
> + * connection to a vnc client
> + */
> +static void pixel_format_message(VncState *vs, bool reuse)
> +{
>      char pad[3] = { 0, 0, 0 };
>  
>      vs->client_pf = qemu_default_pixelformat(32);
>  
> -    vnc_write_u8(vs, vs->client_pf.bits_per_pixel); /* bits-per-pixel */
> -    vnc_write_u8(vs, vs->client_pf.depth); /* depth */
> +    if (!reuse) {
> +        vnc_write_u8(vs, vs->client_pf.bits_per_pixel); /* bits-per-pixel */
> +        vnc_write_u8(vs, vs->client_pf.depth); /* depth */
>  
>  #ifdef HOST_WORDS_BIGENDIAN
> -    vnc_write_u8(vs, 1);             /* big-endian-flag */
> +        vnc_write_u8(vs, 1);             /* big-endian-flag */
>  #else
> -    vnc_write_u8(vs, 0);             /* big-endian-flag */
> +        vnc_write_u8(vs, 0);             /* big-endian-flag */
>  #endif
> -    vnc_write_u8(vs, 1);             /* true-color-flag */
> -    vnc_write_u16(vs, vs->client_pf.rmax);     /* red-max */
> -    vnc_write_u16(vs, vs->client_pf.gmax);     /* green-max */
> -    vnc_write_u16(vs, vs->client_pf.bmax);     /* blue-max */
> -    vnc_write_u8(vs, vs->client_pf.rshift);    /* red-shift */
> -    vnc_write_u8(vs, vs->client_pf.gshift);    /* green-shift */
> -    vnc_write_u8(vs, vs->client_pf.bshift);    /* blue-shift */
> -    vnc_write(vs, pad, 3);           /* padding */
> -
> +        vnc_write_u8(vs, 1);             /* true-color-flag */
> +        vnc_write_u16(vs, vs->client_pf.rmax);     /* red-max */
> +        vnc_write_u16(vs, vs->client_pf.gmax);     /* green-max */
> +        vnc_write_u16(vs, vs->client_pf.bmax);     /* blue-max */
> +        vnc_write_u8(vs, vs->client_pf.rshift);    /* red-shift */
> +        vnc_write_u8(vs, vs->client_pf.gshift);    /* green-shift */
> +        vnc_write_u8(vs, vs->client_pf.bshift);    /* blue-shift */
> +        vnc_write(vs, pad, 3);           /* padding */
> +    }
>      vnc_hextile_set_pixel_conversion(vs, 0);
>      vs->write_pixels = vnc_write_pixels_copy;
>  }
> @@ -2252,7 +2259,7 @@ static void vnc_colordepth(VncState *vs)
>                                 pixman_image_get_width(vs->vd->server),
>                                 pixman_image_get_height(vs->vd->server),
>                                 VNC_ENCODING_WMVi);
> -        pixel_format_message(vs);
> +        pixel_format_message(vs, false);
>          vnc_unlock_output(vs);
>          vnc_flush(vs);
>      } else {
> @@ -2420,7 +2427,8 @@ static int protocol_client_msg(VncState *vs, uint8_t *data, size_t len)
>      return 0;
>  }
>  
> -static int protocol_client_init(VncState *vs, uint8_t *data, size_t len)
> +static int protocol_client_init_base(VncState *vs, uint8_t *data, size_t len,
> +                                     bool reuse)
>  {
>      char buf[1024];
>      VncShareMode mode;
> @@ -2495,10 +2503,11 @@ static int protocol_client_init(VncState *vs, uint8_t *data, size_t len)
>             pixman_image_get_height(vs->vd->server) >= 0);
>      vs->client_width = pixman_image_get_width(vs->vd->server);
>      vs->client_height = pixman_image_get_height(vs->vd->server);
> -    vnc_write_u16(vs, vs->client_width);
> -    vnc_write_u16(vs, vs->client_height);
> -
> -    pixel_format_message(vs);
> +    if (!reuse) {
> +        vnc_write_u16(vs, vs->client_width);
> +        vnc_write_u16(vs, vs->client_height);
> +    }
> +    pixel_format_message(vs, reuse);
>  
>      if (qemu_name) {
>          size = snprintf(buf, sizeof(buf), "QEMU (%s)", qemu_name);
> @@ -2509,9 +2518,11 @@ static int protocol_client_init(VncState *vs, uint8_t *data, size_t len)
>          size = snprintf(buf, sizeof(buf), "QEMU");
>      }
>  
> -    vnc_write_u32(vs, size);
> -    vnc_write(vs, buf, size);
> -    vnc_flush(vs);
> +    if (!reuse) {
> +        vnc_write_u32(vs, size);
> +        vnc_write(vs, buf, size);
> +        vnc_flush(vs);
> +    }
>  
>      vnc_client_cache_auth(vs);
>      vnc_qmp_event(vs, QAPI_EVENT_VNC_INITIALIZED);
> @@ -2521,6 +2532,11 @@ static int protocol_client_init(VncState *vs, uint8_t *data, size_t len)
>      return 0;
>  }
>  
> +static int protocol_client_init(VncState *vs, uint8_t *data, size_t len)
> +{
> +    return protocol_client_init_base(vs, data, len, false);
> +}
> +
>  void start_client_init(VncState *vs)
>  {
>      vnc_read_when(vs, protocol_client_init, 1);
> @@ -3012,8 +3028,12 @@ static void vnc_refresh(DisplayChangeListener *dcl)
>      }
>  }
>  
> +/*
> + * reuse - true if we are using an existing (already initialized)
> + * connection to a vnc client
> + */
>  static void vnc_connect(VncDisplay *vd, QIOChannelSocket *sioc,
> -                        bool skipauth, bool websocket)
> +                        bool skipauth, bool websocket, bool reuse)
>  {
>      VncState *vs = g_new0(VncState, 1);
>      bool first_client = QTAILQ_EMPTY(&vd->clients);
> @@ -3109,10 +3129,15 @@ static void vnc_connect(VncDisplay *vd, QIOChannelSocket *sioc,
>  
>      graphic_hw_update(vd->dcl.con);
>  
> -    if (!vs->websocket) {
> +    if ((!vs->websocket) && !reuse) {
>          vnc_start_protocol(vs);
>      }
>  
> +    if (reuse) {
> +        uint8_t data[1] = {0};
> +        (void) protocol_client_init_base(vs, data, sizeof(data), true);
> +    }
> +
>      if (vd->num_connecting > vd->connections_limit) {
>          QTAILQ_FOREACH(vs, &vd->clients, next) {
>              if (vs->share_mode == VNC_SHARE_MODE_CONNECTING) {
> @@ -3143,7 +3168,7 @@ static void vnc_listen_io(QIONetListener *listener,
>      qio_channel_set_name(QIO_CHANNEL(cioc),
>                           isWebsock ? "vnc-ws-server" : "vnc-server");
>      qio_channel_set_delay(QIO_CHANNEL(cioc), false);
> -    vnc_connect(vd, cioc, false, isWebsock);
> +    vnc_connect(vd, cioc, false, isWebsock, false);
>  }
>  
>  static const DisplayChangeListenerOps dcl_ops = {
> @@ -3733,7 +3758,7 @@ static int vnc_display_connect(VncDisplay *vd,
>      if (qio_channel_socket_connect_sync(sioc, saddr[0], errp) < 0) {
>          return -1;
>      }
> -    vnc_connect(vd, sioc, false, false);
> +    vnc_connect(vd, sioc, false, false, false);
>      object_unref(OBJECT(sioc));
>      return 0;
>  }
> @@ -4057,7 +4082,7 @@ void vnc_display_add_client(const char *id, int csock, bool skipauth)
>      sioc = qio_channel_socket_new_fd(csock, NULL);
>      if (sioc) {
>          qio_channel_set_name(QIO_CHANNEL(sioc), "vnc-server");
> -        vnc_connect(vd, sioc, skipauth, false);
> +        vnc_connect(vd, sioc, skipauth, false, false);
>          object_unref(OBJECT(sioc));
>      }
>  }
> @@ -4117,3 +4142,75 @@ static void vnc_register_config(void)
>      qemu_add_opts(&qemu_vnc_opts);
>  }
>  opts_init(vnc_register_config);
> +
> +void save_vnc_fds(void)
> +{
> +    VncDisplay *vd;
> +    VncState *vs;
> +    int disp_num = 0;
> +    char name[40];
> +
> +    QTAILQ_FOREACH(vd, &vnc_displays, next) {
> +        QTAILQ_FOREACH(vs, &vd->clients, next) {
> +            if (vs->sioc) {
> +                snprintf(name, sizeof(name), "%s_%d", vs->sioc->parent.name,
> +                         disp_num);

'disp_num' is only updated by the outer loop. So if we have multiple
iterations of the inner loop, we'll have multiple FDs with the same
name that try to be stored. Presumably we'll lose all but the last.
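
One way to avoid the collision, as a minimal sketch, is to put a per-client
index into the name as well.  The helper below is hypothetical and only
illustrates the naming; the restore side would need to iterate over the same
indices.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: builds "<channel>_<display>_<client>" so that every
 * client of every display gets a distinct name.
 * Returns 0 on success, -1 if the name does not fit in buf.
 */
static int demo_vnc_fd_name(char *buf, size_t len, const char *channel,
                            int disp_num, int client_num)
{
    int n;

    assert(buf != NULL && channel != NULL);
    n = snprintf(buf, len, "%s_%d_%d", channel, disp_num, client_num);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```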

> +                setenv_fd(name, vs->sioc->fd);
> +                break;
> +            }
> +        }
> +        disp_num++;
> +    }
> +}
> +
> +static void set_vnc_fd(char *name, QIOChannelSocket *cioc, VncDisplay *vd,
> +                       bool isWebsock)
> +{
> +    VncState *vs;
> +    QIOChannelSocket *sioc;
> +
> +    int fd = getenv_fd(name);
> +    if (fd != -1) {
> +        sioc = qio_channel_socket_accept(cioc, fd, NULL);
> +        if (sioc) {
> +            unsetenv_fd(name);
> +            qio_channel_set_name(QIO_CHANNEL(sioc),
> +                                 isWebsock ? "vnc-ws-server" : "vnc-server");
> +
> +            qio_channel_set_delay(QIO_CHANNEL(sioc), false);
> +            vnc_connect(vd, sioc, false, isWebsock, true);
> +            object_unref(OBJECT(sioc));
> +
> +            /* force update on all clients */
> +            QTAILQ_FOREACH(vs, &vd->clients, next) {
> +                vs->update = VNC_STATE_UPDATE_FORCE;
> +            }
> +        } else {
> +            error_printf("Could not restore vnc channel %s; "
> +                     "client must reconnect.\n", name);
> +        }
> +    }
> +}
> +
> +void load_vnc_fds(void)
> +{
> +    VncDisplay *vd;
> +    QIOChannelSocket *cioc = NULL;
> +    int disp_num = 0;
> +    char name[40];
> +
> +    QTAILQ_FOREACH(vd, &vnc_displays, next) {
> +        if (vd->listener) {
> +            cioc = *vd->listener->sioc;
> +            snprintf(name, sizeof(name), "vnc-server_%d", disp_num);
> +            set_vnc_fd(name, cioc, vd, false);
> +        }
> +
> +        if (vd->wslistener) {
> +            cioc = *vd->wslistener->sioc;
> +            snprintf(name, sizeof(name), "vnc-ws-server_%d", disp_num);
> +            set_vnc_fd(name, cioc, vd, true);
> +        }
> +        disp_num++;

This only attempts to restore a single client for each listener,
despite trying (but failing) to save multiple clients.


In any case, as per my comment at the top of the patch, this whole
patch just looks broken as it is not doing anything with client
state.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH V1 12/32] vl: pause option
  2020-07-30 18:14     ` Steven Sistare
@ 2020-07-31  9:44       ` Alex Bennée
  0 siblings, 0 replies; 66+ messages in thread
From: Alex Bennée @ 2020-07-31  9:44 UTC (permalink / raw)
  To: Steven Sistare
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Markus Armbruster,
	Juan Quintela, Dr. David Alan Gilbert, qemu-devel,
	Alex Williamson, Stefan Hajnoczi, Marc-André Lureau,
	Paolo Bonzini, Philippe Mathieu-Daudé


Steven Sistare <steven.sistare@oracle.com> writes:

> On 7/30/2020 1:03 PM, Alex Bennée wrote:
>> 
>> Steve Sistare <steven.sistare@oracle.com> writes:
>> 
>>> Provide the -pause command-line parameter and the QEMU_PAUSE environment
>>> variable to briefly pause QEMU in main and allow a developer to attach gdb.
>>> Useful when the developer does not invoke QEMU directly, such as when using
>>> libvirt.
>> 
>> How does this differ from -S?
>
> The -S flag runs qemu to the main loop but does not start the guest.  Lots of code
> that you may need to debug runs before you get there.

Right - so this is for attaching a debugger to QEMU itself, not using
the gdbstub? Why isn't this a problem the calling entity can solve by
the way it invoked QEMU?

We have a similar sort of solution for debugging our testcases:

  https://wiki.qemu.org/Features/QTest#Using_debugging_tools_under_the_test_harness

I still think:

>>> +DEF("pause", HAS_ARG, QEMU_OPTION_pause, \
>>> +    "-pause secs    Pause for secs seconds on entry to main.\n", QEMU_ARCH_ALL)
>>> +
>>> +SRST
>>> +``--pause secs``
>>> +    Pause for a number of seconds on entry to main.  Useful for attaching
>>> +    a debugger after QEMU has been launched by some other entity.
>>> +ERST
>>> +
>> 
>> It seems like having an option to race with the debugger is just asking
>> for trouble.

this makes the option problematic.

-- 
Alex Bennée



* Re: [PATCH V1 12/32] vl: pause option
  2020-07-30 18:11     ` Steven Sistare
@ 2020-07-31 10:07       ` Daniel P. Berrangé
  2020-07-31 15:18         ` Steven Sistare
  0 siblings, 1 reply; 66+ messages in thread
From: Daniel P. Berrangé @ 2020-07-31 10:07 UTC (permalink / raw)
  To: Steven Sistare
  Cc: Michael S. Tsirkin, Juan Quintela, qemu-devel,
	Dr. David Alan Gilbert, Alex Bennée, Alex Williamson,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

On Thu, Jul 30, 2020 at 02:11:19PM -0400, Steven Sistare wrote:
> On 7/30/2020 12:20 PM, Eric Blake wrote:
> > On 7/30/20 10:14 AM, Steve Sistare wrote:
> >> Provide the -pause command-line parameter and the QEMU_PAUSE environment
> >> variable to briefly pause QEMU in main and allow a developer to attach gdb.
> >> Useful when the developer does not invoke QEMU directly, such as when using
> >> libvirt.
> > 
> > How would you set this option with libvirt?
> 
> Add -pause in the qemu args in the xml.
>  
> > It feels like you are trying to reinvent something that is already well-documented:
> > 
> > https://www.berrange.com/posts/2011/10/12/debugging-early-startup-of-kvm-with-gdb-when-launched-by-libvirtd/
> 
> Too many steps to reach BINGO for my taste.  Easier is better.  Also, in our shop we start qemu 
> in other ways, such as via services.


A "sleep" is a pretty crude & unreliable way to get into debugging
though. It is racy for a start, but also QEMU has a bunch of stuff
that runs via ELF constructors before main() even starts.

So I feel like the thing that starts QEMU is better placed to provide
a way in for debugging.

e.g. the service launcher can send SIGSTOP to the child process immediately
before the execve(qemu) call.

Now the user can attach with the debugger, allow execution to continue,
and has the ability to debug *everything* right from the ELF constructors
onwards into main() and all that follows.
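
A minimal sketch of such a launcher, assuming a POSIX system. The helper
name launch_stopped() is hypothetical and is not part of any existing
launcher; it only illustrates stopping the child just before the exec so
that a debugger can attach by pid.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical launcher helper: fork, stop the child with SIGSTOP just
 * before exec, and return once the child has been observed to stop.
 * Returns the child's pid, or -1 on failure.
 */
pid_t launch_stopped(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: stop here.  The pid is unchanged by the exec below. */
        raise(SIGSTOP);
        execvp(argv[0], argv);
        _exit(127);             /* only reached if the exec failed */
    }

    /* Parent: wait until the child reports that it has stopped. */
    int status;
    if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status)) {
        return -1;
    }
    return pid;
}
```

The caller would report the returned pid, and the process resumes once
something sends it SIGCONT, typically after the debugger has attached.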

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH V1 12/32] vl: pause option
  2020-07-31 10:07       ` Daniel P. Berrangé
@ 2020-07-31 15:18         ` Steven Sistare
  0 siblings, 0 replies; 66+ messages in thread
From: Steven Sistare @ 2020-07-31 15:18 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Michael S. Tsirkin, Juan Quintela, qemu-devel,
	Dr. David Alan Gilbert, Alex Bennée, Alex Williamson,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

On 7/31/2020 6:07 AM, Daniel P. Berrangé wrote:
> On Thu, Jul 30, 2020 at 02:11:19PM -0400, Steven Sistare wrote:
>> On 7/30/2020 12:20 PM, Eric Blake wrote:
>>> On 7/30/20 10:14 AM, Steve Sistare wrote:
>>>> Provide the -pause command-line parameter and the QEMU_PAUSE environment
>>>> variable to briefly pause QEMU in main and allow a developer to attach gdb.
>>>> Useful when the developer does not invoke QEMU directly, such as when using
>>>> libvirt.
>>>
>>> How would you set this option with libvirt?
>>
>> Add -pause in the qemu args in the xml.
>>  
>>> It feels like you are trying to reinvent something that is already well-documented:
>>>
>>> https://www.berrange.com/posts/2011/10/12/debugging-early-startup-of-kvm-with-gdb-when-launched-by-libvirtd/
>>
>> Too many steps to reach BINGO for my taste.  Easier is better.  Also, in our shop we start qemu 
>> in other ways, such as via services.
> 
> A "sleep" is a pretty crude & unreliable way to get into debugging
> though. It is racy for a start, but also QEMU has a bunch of stuff
> that runs via ELF constructors before main() even starts.
> 
> So I feel like the thing that starts QEMU is better placed to provide
> a way in for debugging.
> 
> eg the service launcher can send SIGSTOP to the child process immediately
> before the execve(qemu) call.
> 
> Now user can attach with the debugger, allow execution to continue,
> and has ability to debug *everything* right from the ELF constructors
> onwards into main() and all that follows.
> 
> Regards,
> Daniel

That is a nice solution for the launchers we can modify.
We could use your idea in place of the sleep in main,
    kill(getpid(), SIGSTOP);

Not quite as good as being able to debug the ELF constructors, but still helpful.
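
Roughly, as a sketch only: the helper name stop_for_debugger() is
hypothetical, and gating the stop on an environment variable (such as the
QEMU_PAUSE variable from this patch) is an assumption about how it might
be wired up.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Stop the current process if the named environment variable is set, so
 * that a debugger can attach.  Returns 1 once the process has been
 * resumed, or 0 if the variable was not set and nothing happened.
 */
int stop_for_debugger(const char *envname)
{
    assert(envname != NULL);

    if (getenv(envname) == NULL) {
        return 0;
    }
    /* Execution continues past this call only after a SIGCONT. */
    kill(getpid(), SIGSTOP);
    return 1;
}
```

Unlike a timed sleep, this does not race with the developer: the process
stays stopped until it is explicitly continued.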

- Steve



* Re: [PATCH V1 00/32] Live Update
  2020-07-31  8:53     ` Daniel P. Berrangé
@ 2020-07-31 15:27       ` Steven Sistare
  2020-07-31 15:52         ` Daniel P. Berrangé
  0 siblings, 1 reply; 66+ messages in thread
From: Steven Sistare @ 2020-07-31 15:27 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Michael S. Tsirkin, Alex Bennée, Juan Quintela, qemu-devel,
	Markus Armbruster, Alex Williamson, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On 7/31/2020 4:53 AM, Daniel P. Berrangé wrote:
> On Thu, Jul 30, 2020 at 02:48:44PM -0400, Steven Sistare wrote:
>> On 7/30/2020 12:52 PM, Daniel P. Berrangé wrote:
>>> On Thu, Jul 30, 2020 at 08:14:04AM -0700, Steve Sistare wrote:
>>>> Improve and extend the qemu functions that save and restore VM state so a
>>>> guest may be suspended and resumed with minimal pause time.  qemu may be
>>>> updated to a new version in between.
>>>>
>>>> The first set of patches adds the cprsave and cprload commands to save and
>>>> restore VM state, and allow the host kernel to be updated and rebooted in
>>>> between.  The VM must create guest RAM in a persistent shared memory file,
>>>> such as /dev/dax0.0 or persistant /dev/shm PKRAM as proposed in 
>>>> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com/
>>>>
>>>> cprsave stops the VCPUs and saves VM device state in a simple file, and
>>>> thus supports any type of guest image and block device.  The caller must
>>>> not modify the VM's block devices between cprsave and cprload.
>>>>
>>>> cprsave and cprload support guests with vfio devices if the caller first
>>>> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
>>>> The guest drivers suspend methods flush outstanding requests and re-
>>>> initialize the devices, and thus there is no device state to save and
>>>> restore.
>>>>
>>>>    1 savevm: add vmstate handler iterators
>>>>    2 savevm: VM handlers mode mask
>>>>    3 savevm: QMP command for cprsave
>>>>    4 savevm: HMP Command for cprsave
>>>>    5 savevm: QMP command for cprload
>>>>    6 savevm: HMP Command for cprload
>>>>    7 savevm: QMP command for cprinfo
>>>>    8 savevm: HMP command for cprinfo
>>>>    9 savevm: prevent cprsave if memory is volatile
>>>>   10 kvmclock: restore paused KVM clock
>>>>   11 cpu: disable ticks when suspended
>>>>   12 vl: pause option
>>>>   13 gdbstub: gdb support for suspended state
>>>>
>>>> The next patches add a restart method that eliminates the persistent memory
>>>> constraint, and allows qemu to be updated across the restart, but does not
>>>> allow host reboot.  Anonymous memory segments used by the guest are
>>>> preserved across a re-exec of qemu, mapped at the same VA, via a proposed
>>>> madvise(MADV_DOEXEC) option in the Linux kernel.  See
>>>> https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yznaga@oracle.com/
>>>>
>>>>   14 savevm: VMS_RESTART and cprsave restart
>>>>   15 vl: QEMU_START_FREEZE env var
>>>>   16 oslib: add qemu_clr_cloexec
>>>>   17 util: env var helpers
>>>>   18 osdep: import MADV_DOEXEC
>>>>   19 memory: ram_block_add cosmetic changes
>>>>   20 vl: add helper to request re-exec
>>>>   21 exec, memory: exec(3) to restart
>>>>   22 char: qio_channel_socket_accept reuse fd
>>>>   23 char: save/restore chardev socket fds
>>>>   24 ui: save/restore vnc socket fds
>>>>   25 char: save/restore chardev pty fds
>>>
>>> Keeping FDs open across re-exec is a nice trick, but how are you dealing
>>> with the state associated with them, most especially the TLS encryption
>>> state ? AFAIK, there's no way to serialize/deserialize the TLS state that
>>> GNUTLS maintains, and the patches don't show any sign of dealing with
>>> this. IOW it looks like while the FD will be preserved, any TLS session
>>> running on it will fail.
>>
>> I had not considered TLS.  If a non-qemu library maintains connection state, then
>> we won't be able to support it for live update until the library provides interfaces
>> to serialize the state.
>>
>> For qemu objects, so far vmstate has been adequate to represent the devices with
>> descriptors that we preserve.
> 
> My main concern about this series is that there is an implicit assumption
> that QEMU is *not* configured with certain features that are not handled
> If QEMU is using one of the unsupported features, I don't see anything in
> the series which attempts to prevent the actions.
> 
> IOW, users can have an arbitrary QEMU config, attempt to use these new features,
> the commands may well succeed, but the user is silently left with a broken QEMU.
> Such silent failure modes are really undesirable as they'll lead to a never
> ending stream of hard to diagnose bug reports for QEMU maintainers.
> 
> TLS is one example of this, the live upgrade  will "succeed", but the TLS
> connections will be totally non-functional.

I agree with all your points and would like to do better in this area.  Other than hunting for 
every use of a descriptor and either supporting it or blocking cpr, do you have any suggestions?
Thinking out loud, maybe we can gather all the fds that we support, then look for all fds in the
process, and block the cpr if we find an unrecognized fd.
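
A rough sketch of that check, assuming Linux and a readable /proc.  The
function name count_unrecognized_fds() and the idea of passing the
supported descriptors as a plain array are illustrative assumptions, not
code from the series.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Count the open descriptors of this process that are not in the
 * caller's list of known (supported) fds.  Linux-specific: walks
 * /proc/self/fd.  Returns -1 if the directory cannot be read.
 */
int count_unrecognized_fds(const int *known, size_t nknown)
{
    assert(known != NULL || nknown == 0);

    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL) {
        return -1;
    }

    int self = dirfd(dir);      /* the scan's own descriptor; ignored */
    int unknown = 0;
    struct dirent *de;

    while ((de = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(de->d_name, &end, 10);

        if (end == de->d_name || *end != '\0') {
            continue;           /* skips "." and ".." */
        }
        if (fd == self) {
            continue;
        }

        int found = 0;
        for (size_t i = 0; i < nknown; i++) {
            if (known[i] == fd) {
                found = 1;
                break;
            }
        }
        if (!found) {
            unknown++;
        }
    }
    closedir(dir);
    return unknown;
}
```

A non-zero result would block the cpr.  Note that stdin, stdout and
stderr would have to be in the known list too.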

- Steve



* Re: [PATCH V1 00/32] Live Update
  2020-07-31 15:27       ` Steven Sistare
@ 2020-07-31 15:52         ` Daniel P. Berrangé
  2020-07-31 17:20           ` Steven Sistare
  0 siblings, 1 reply; 66+ messages in thread
From: Daniel P. Berrangé @ 2020-07-31 15:52 UTC (permalink / raw)
  To: Steven Sistare
  Cc: Michael S. Tsirkin, Alex Bennée, Juan Quintela, qemu-devel,
	Markus Armbruster, Alex Williamson, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On Fri, Jul 31, 2020 at 11:27:45AM -0400, Steven Sistare wrote:
> On 7/31/2020 4:53 AM, Daniel P. Berrangé wrote:
> > On Thu, Jul 30, 2020 at 02:48:44PM -0400, Steven Sistare wrote:
> >> On 7/30/2020 12:52 PM, Daniel P. Berrangé wrote:
> >>> On Thu, Jul 30, 2020 at 08:14:04AM -0700, Steve Sistare wrote:
> >>>> Improve and extend the qemu functions that save and restore VM state so a
> >>>> guest may be suspended and resumed with minimal pause time.  qemu may be
> >>>> updated to a new version in between.
> >>>>
> >>>> The first set of patches adds the cprsave and cprload commands to save and
> >>>> restore VM state, and allow the host kernel to be updated and rebooted in
> >>>> between.  The VM must create guest RAM in a persistent shared memory file,
> >>>> such as /dev/dax0.0 or persistant /dev/shm PKRAM as proposed in 
> >>>> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com/
> >>>>
> >>>> cprsave stops the VCPUs and saves VM device state in a simple file, and
> >>>> thus supports any type of guest image and block device.  The caller must
> >>>> not modify the VM's block devices between cprsave and cprload.
> >>>>
> >>>> cprsave and cprload support guests with vfio devices if the caller first
> >>>> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
> >>>> The guest drivers suspend methods flush outstanding requests and re-
> >>>> initialize the devices, and thus there is no device state to save and
> >>>> restore.
> >>>>
> >>>>    1 savevm: add vmstate handler iterators
> >>>>    2 savevm: VM handlers mode mask
> >>>>    3 savevm: QMP command for cprsave
> >>>>    4 savevm: HMP Command for cprsave
> >>>>    5 savevm: QMP command for cprload
> >>>>    6 savevm: HMP Command for cprload
> >>>>    7 savevm: QMP command for cprinfo
> >>>>    8 savevm: HMP command for cprinfo
> >>>>    9 savevm: prevent cprsave if memory is volatile
> >>>>   10 kvmclock: restore paused KVM clock
> >>>>   11 cpu: disable ticks when suspended
> >>>>   12 vl: pause option
> >>>>   13 gdbstub: gdb support for suspended state
> >>>>
> >>>> The next patches add a restart method that eliminates the persistent memory
> >>>> constraint, and allows qemu to be updated across the restart, but does not
> >>>> allow host reboot.  Anonymous memory segments used by the guest are
> >>>> preserved across a re-exec of qemu, mapped at the same VA, via a proposed
> >>>> madvise(MADV_DOEXEC) option in the Linux kernel.  See
> >>>> https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yznaga@oracle.com/
> >>>>
> >>>>   14 savevm: VMS_RESTART and cprsave restart
> >>>>   15 vl: QEMU_START_FREEZE env var
> >>>>   16 oslib: add qemu_clr_cloexec
> >>>>   17 util: env var helpers
> >>>>   18 osdep: import MADV_DOEXEC
> >>>>   19 memory: ram_block_add cosmetic changes
> >>>>   20 vl: add helper to request re-exec
> >>>>   21 exec, memory: exec(3) to restart
> >>>>   22 char: qio_channel_socket_accept reuse fd
> >>>>   23 char: save/restore chardev socket fds
> >>>>   24 ui: save/restore vnc socket fds
> >>>>   25 char: save/restore chardev pty fds
> >>>
> >>> Keeping FDs open across re-exec is a nice trick, but how are you dealing
> >>> with the state associated with them, most especially the TLS encryption
> >>> state ? AFAIK, there's no way to serialize/deserialize the TLS state that
> >>> GNUTLS maintains, and the patches don't show any sign of dealing with
> >>> this. IOW it looks like while the FD will be preserved, any TLS session
> >>> running on it will fail.
> >>
> >> I had not considered TLS.  If a non-qemu library maintains connection state, then
> >> we won't be able to support it for live update until the library provides interfaces
> >> to serialize the state.
> >>
> >> For qemu objects, so far vmstate has been adequate to represent the devices with
> >> descriptors that we preserve.
> > 
> > My main concern about this series is that there is an implicit assumption
> > that QEMU is *not* configured with certain features that are not handled
> > If QEMU is using one of the unsupported features, I don't see anything in
> > the series which attempts to prevent the actions.
> > 
> > IOW, users can have an arbitrary QEMU config, attempt to use these new features,
> > the commands may well succeed, but the user is silently left with a broken QEMU.
> > Such silent failure modes are really undesirable as they'll lead to a never
> > ending stream of hard to diagnose bug reports for QEMU maintainers.
> > 
> > TLS is one example of this, the live upgrade  will "succeed", but the TLS
> > connections will be totally non-functional.
> 
> I agree with all your points and would like to do better in this area.  Other than hunting for 
> every use of a descriptor and either supporting it or blocking cpr, do you have any suggestions?
> Thinking out loud, maybe we can gather all the fds that we support, then look for all fds in the
> process, and block the cpr if we find an unrecognized fd.

There's no magic easy answer to this problem. Conceptually it is similar to
the problem of reliably migrating guest device state, but in this case we're
primarily concerned about the backends instead.

For migration we've got standardized interfaces that devices must implement
in order to correctly support migration serialization. There is also support
for devices to register migration "blockers" which prevent any use of the
migration feature when the device is present.

We lack this kind of concept for the backend, and that's what I think needs
to be tackled in a more thorough way.  There are quite a lot of backends,
but they're grouped into a reasonably small number of sets (UIs, chardevs,
blockdevs, net devs, etc). We need some standard interface that we can
plumb into all the backends, along with providing backends the ability to
block the re-exec. We should plumb the generic infrastructure into each of
the different types of backend, and make the default behaviour be to reject
the re-exec. Then we need to carefully consider specific backend impls
and allow the re-exec only in the very precise cases we can demonstrate
to be safe.

IOW, have a presumption that re-exec will *not* be permitted. Over time
we can make it work for an ever expanding set of use cases. 
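
To illustrate the default-deny idea, here is a small self-contained sketch.
None of these names exist in QEMU: the Backend struct, the reexec_safe
callback, and the registry functions are all hypothetical, and a real
interface would hang off the existing backend types instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BACKENDS 32

/* Hypothetical per-backend record; not a QEMU type. */
typedef struct Backend {
    const char *name;
    bool tls;   /* example property that makes re-exec unsafe */
    /* NULL means "never vetted", which is treated as a blocker. */
    bool (*reexec_safe)(const struct Backend *b);
} Backend;

static const Backend *backends[MAX_BACKENDS];
static size_t nbackends;

/* Add a backend to the registry; returns false if the table is full. */
bool backend_register(const Backend *b)
{
    assert(b != NULL && b->name != NULL);
    if (nbackends == MAX_BACKENDS) {
        return false;
    }
    backends[nbackends++] = b;
    return true;
}

/* Example vetting callback: a socket is only safe without TLS state. */
bool socket_reexec_safe(const Backend *b)
{
    return !b->tls;
}

/*
 * Return the name of the first backend that blocks re-exec, or NULL if
 * every registered backend has explicitly declared itself safe.
 */
const char *reexec_first_blocker(void)
{
    for (size_t i = 0; i < nbackends; i++) {
        const Backend *b = backends[i];
        if (b->reexec_safe == NULL || !b->reexec_safe(b)) {
            return b->name;
        }
    }
    return NULL;
}
```

The point of the design is that a backend nobody has looked at yet blocks
the re-exec by construction, rather than silently breaking afterwards.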


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH V1 24/32] ui: save/restore vnc socket fds
  2020-07-31  9:06   ` Daniel P. Berrangé
@ 2020-07-31 16:51     ` Steven Sistare
  0 siblings, 0 replies; 66+ messages in thread
From: Steven Sistare @ 2020-07-31 16:51 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Juan Quintela, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Markus Armbruster, qemu-devel,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Marc-André Lureau, Alex Bennée, Dr. David Alan Gilbert

On 7/31/2020 5:06 AM, Daniel P. Berrangé wrote:
> On Thu, Jul 30, 2020 at 08:14:28AM -0700, Steve Sistare wrote:
>> From: Mark Kanda <mark.kanda@oracle.com>
>>
>> Iterate through the VNC displays and save/restore the socket fds.
> 
> This patch doesn't appear to do anything around the client state, so I
> can't see how this will work in general.  eg QEMU is 1/2 way through
> receiving a message from the client, and we trigger re-exec.
> 
> The new QEMU is going to startup considering the VNC client is in an
> idle state, and will then read the 2nd 1/2 of the message off the
> client socket. Everything will go rapidly downhill from there.
> Or the reverse, the server has sent a message, but this outbound
> message is still in the buffer and only been partially sent on the
> wire. We re'exec and now we've lost the unsent part of the buffer.

Yes.  For partial messages in qemu object buffers, we need to add a draining phase
between exec-requested and exec, and complete all partial messages.

For kernel socket buffers, we should be OK.  If we are accurately preserving vnc
server state (which is the intent), then we can correctly respond to any client
requests that were sent to us pre-exec but read into qemu post-exec.

However, there is another icky issue with vnc.  It only works reliably with raw 
encoding.  Compressed streams accumulate state on the client side which we cannot
match on the server when we create a new zlib stream after exec.  The vnc protocol
defines a per-stream reset flag in the compression control word, which sounds like it
should reset zlib state, but it does not for tigervnc.  I have not tried other clients.

vnc is one of the trickier patches in this series.  It may be wisest to close the connection 
and require the client to reconnect.  The virtual framebuffer is preserved, so the same content 
will be shown after reconnect.

- Steve




* Re: [PATCH V1 00/32] Live Update
  2020-07-31 15:52         ` Daniel P. Berrangé
@ 2020-07-31 17:20           ` Steven Sistare
  0 siblings, 0 replies; 66+ messages in thread
From: Steven Sistare @ 2020-07-31 17:20 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Michael S. Tsirkin, Alex Bennée, Juan Quintela, qemu-devel,
	Markus Armbruster, Alex Williamson, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On 7/31/2020 11:52 AM, Daniel P. Berrangé wrote:
> On Fri, Jul 31, 2020 at 11:27:45AM -0400, Steven Sistare wrote:
>> On 7/31/2020 4:53 AM, Daniel P. Berrangé wrote:
>>> On Thu, Jul 30, 2020 at 02:48:44PM -0400, Steven Sistare wrote:
>>>> On 7/30/2020 12:52 PM, Daniel P. Berrangé wrote:
>>>>> On Thu, Jul 30, 2020 at 08:14:04AM -0700, Steve Sistare wrote:
>>>>>> Improve and extend the qemu functions that save and restore VM state so a
>>>>>> guest may be suspended and resumed with minimal pause time.  qemu may be
>>>>>> updated to a new version in between.
>>>>>>
>>>>>> The first set of patches adds the cprsave and cprload commands to save and
>>>>>> restore VM state, and allow the host kernel to be updated and rebooted in
>>>>>> between.  The VM must create guest RAM in a persistent shared memory file,
>>>>>> such as /dev/dax0.0 or persistant /dev/shm PKRAM as proposed in 
>>>>>> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com/
>>>>>>
>>>>>> cprsave stops the VCPUs and saves VM device state in a simple file, and
>>>>>> thus supports any type of guest image and block device.  The caller must
>>>>>> not modify the VM's block devices between cprsave and cprload.
>>>>>>
>>>>>> cprsave and cprload support guests with vfio devices if the caller first
>>>>>> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
>>>>>> The guest drivers suspend methods flush outstanding requests and re-
>>>>>> initialize the devices, and thus there is no device state to save and
>>>>>> restore.
>>>>>>
>>>>>>    1 savevm: add vmstate handler iterators
>>>>>>    2 savevm: VM handlers mode mask
>>>>>>    3 savevm: QMP command for cprsave
>>>>>>    4 savevm: HMP Command for cprsave
>>>>>>    5 savevm: QMP command for cprload
>>>>>>    6 savevm: HMP Command for cprload
>>>>>>    7 savevm: QMP command for cprinfo
>>>>>>    8 savevm: HMP command for cprinfo
>>>>>>    9 savevm: prevent cprsave if memory is volatile
>>>>>>   10 kvmclock: restore paused KVM clock
>>>>>>   11 cpu: disable ticks when suspended
>>>>>>   12 vl: pause option
>>>>>>   13 gdbstub: gdb support for suspended state
>>>>>>
>>>>>> The next patches add a restart method that eliminates the persistent memory
>>>>>> constraint, and allows qemu to be updated across the restart, but does not
>>>>>> allow host reboot.  Anonymous memory segments used by the guest are
>>>>>> preserved across a re-exec of qemu, mapped at the same VA, via a proposed
>>>>>> madvise(MADV_DOEXEC) option in the Linux kernel.  See
>>>>>> https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yznaga@oracle.com/
>>>>>>
>>>>>>   14 savevm: VMS_RESTART and cprsave restart
>>>>>>   15 vl: QEMU_START_FREEZE env var
>>>>>>   16 oslib: add qemu_clr_cloexec
>>>>>>   17 util: env var helpers
>>>>>>   18 osdep: import MADV_DOEXEC
>>>>>>   19 memory: ram_block_add cosmetic changes
>>>>>>   20 vl: add helper to request re-exec
>>>>>>   21 exec, memory: exec(3) to restart
>>>>>>   22 char: qio_channel_socket_accept reuse fd
>>>>>>   23 char: save/restore chardev socket fds
>>>>>>   24 ui: save/restore vnc socket fds
>>>>>>   25 char: save/restore chardev pty fds
>>>>>
>>>>> Keeping FDs open across re-exec is a nice trick, but how are you dealing
>>>>> with the state associated with them, most especially the TLS encryption
>>>>> state ? AFAIK, there's no way to serialize/deserialize the TLS state that
>>>>> GNUTLS maintains, and the patches don't show any sign of dealing with
>>>>> this. IOW it looks like while the FD will be preserved, any TLS session
>>>>> running on it will fail.
>>>>
>>>> I had not considered TLS.  If a non-qemu library maintains connection state, then
>>>> we won't be able to support it for live update until the library provides interfaces
>>>> to serialize the state.
>>>>
>>>> For qemu objects, so far vmstate has been adequate to represent the devices with
>>>> descriptors that we preserve.
>>>
>>> My main concern about this series is that there is an implicit assumption
>>> that QEMU is *not* configured with certain features that are not handled
>>> If QEMU is using one of the unsupported features, I don't see anything in
>>> the series which attempts to prevent the actions.
>>>
>>> IOW, users can have an arbitrary QEMU config, attempt to use these new features,
>>> the commands may well succeed, but the user is silently left with a broken QEMU.
>>> Such silent failure modes are really undesirable as they'll lead to a never
>>> ending stream of hard to diagnose bug reports for QEMU maintainers.
>>>
>>> TLS is one example of this, the live upgrade  will "succeed", but the TLS
>>> connections will be totally non-functional.
>>
>> I agree with all your points and would like to do better in this area.  Other than hunting for 
>> every use of a descriptor and either supporting it or blocking cpr, do you have any suggestions?
>> Thinking out loud, maybe we can gather all the fds that we support, then look for all fds in the
>> process, and block the cpr if we find an unrecognized fd.
> 
> There's no magic easy answer to this problem. Conceptually it is similar to
> the problem of reliably migrating guest device state, but in this case we're
> primarily concerned about the backends instead.
> 
> For migration we've got standardized interfaces that devices must implement
> in order to correctly support migration serialization. There is also support
> for devices to register migration "blockers" which prevent any use of the
> migration feature when the device is present.
> 
> We lack this kind of concept for the backend, and that's what I think needs
> to be tackled in a more thorough way.  There are quite a lot of backends,
> but they're grouped into a reasonably small number of sets (UIs, chardevs,
> blockdevs, net devs, etc). We need some standard interface that we can
> plumb into all the backends, along with providing backends the ability to
> block the re-exec. We should plumb the generic infrastructure into each of
> the different types of backend, and make the default behaviour be to reject
> the re-exec. Then we need to carefully consider specific backend impls
> and allow the re-exec only in the very precise cases we can demonstrate
> to be safe.
> 
> IOW, have a presumption that re-exec will *not* be permitted. Over time
> we can make it work for an ever expanding set of use cases. 

Actually, we could use the vmstate mode_mask field added in patch 2, and only allow the restart
mode for vmstate objects that have been vetted.  Currently an uninitialized mask (value 0)
enables the object for all modes, but we could change that.
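
As a sketch of the difference, with hypothetical mode bits (only the name
VMS_RESTART appears in this series; MODE_MIGRATE, MODE_REBOOT and
MODE_RESTART below are stand-ins, as are both function names):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode bits, for illustration only. */
enum {
    MODE_MIGRATE = 1 << 0,
    MODE_REBOOT  = 1 << 1,
    MODE_RESTART = 1 << 2,
};

/* Current behaviour: an uninitialized mask (0) enables every mode. */
bool mode_enabled_current(unsigned mode_mask, unsigned mode)
{
    assert(mode != 0);
    return mode_mask == 0 || (mode_mask & mode) != 0;
}

/*
 * Possible change: restart must be opted into explicitly, so an
 * uninitialized mask no longer enables it.  Other modes are unchanged.
 */
bool mode_enabled_vetted(unsigned mode_mask, unsigned mode)
{
    assert(mode != 0);
    if (mode == MODE_RESTART) {
        return (mode_mask & MODE_RESTART) != 0;
    }
    return mode_mask == 0 || (mode_mask & mode) != 0;
}
```

One consequence of this variant is that a vetted object which should stay
enabled for the other modes would need to list them in its mask as well.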

- Steve



* Re: [PATCH V1 00/32] Live Update
  2020-07-30 21:39     ` Paolo Bonzini
@ 2020-07-31 19:22       ` Steven Sistare
  0 siblings, 0 replies; 66+ messages in thread
From: Steven Sistare @ 2020-07-31 19:22 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin,
	Philippe Mathieu-Daudé,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Stefan Hajnoczi, Marc-André Lureau,
	Alex Bennée

On 7/30/2020 5:39 PM, Paolo Bonzini wrote:
> On 30/07/20 21:09, Steven Sistare wrote:
>>> please spell it out.  Also, how does the functionality compare to
>>> xen-save-devices-state and xen-load-devices-state?
>>
>> qmp_xen_save_devices_state serializes device state to a file which is loaded 
>> on the target for a live migration.  It performs some of the same actions
>> as cprsave/cprload but does not support live update-in-place.
> 
> So it is a subset, can code be reused across both?  

They use common subroutines, but their bodies check different conditions, so I
don't think merging would be an improvement.  We do provide a new helper 
qf_file_open() which could replace a handful of lines in both qmp_xen_save_devices_state 
and qmp_xen_load_devices_state.

> Also, live migration
> across versions is supported, so can you describe the special
> update-in-place support more precisely?  I am confused about the use
> cases, which require (or try) to keep file descriptors across re-exec,
> which are for kexec, and so on.

Sure. The first use case allows you to kexec reboot the host and update host
software and/or qemu.  It does not preserve descriptors, and guest ram must be
backed by persistent shared memory.  Guest pause time depends on host reboot
time, which can be seconds to tens of seconds.

The second case allows you to update qemu in place, but not update the host.
Guest ram can be in shared or anonymous memory.  We call madvise(MADV_DOEXEC)
to tell the kernel to preserve anon memory across the exec.  Open descriptors
are preserved.  Addresses and lengths of saved memory segments are saved in
the environment, and the values of descriptors are saved.  When new qemu
restarts, it finds those values in the environment and uses them when the
various objects are created.  Memory is not realloc'd, it is already present,
and the address and lengths are saved in the ram objects.  Guest pause time
is in the 100 to 200 msec range.  It is less resource intensive than live
migration, and is appropriate if your only goal is to update qemu, as opposed
to evacuating a host.
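
The descriptor part of that can be sketched as below.  This is not the
code from the series: save_fd_in_env() and load_fd_from_env() are
hypothetical names, and the real patches use their own helpers (see the
qemu_clr_cloexec and env var helper patches).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Clear FD_CLOEXEC so that fd stays open across exec, and record its
 * number in the environment under name.  Returns 0 on success, -1 on
 * failure.
 */
int save_fd_in_env(const char *name, int fd)
{
    assert(name != NULL);

    int flags = fcntl(fd, F_GETFD);
    if (flags < 0 || fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0) {
        return -1;
    }

    char val[16];
    snprintf(val, sizeof(val), "%d", fd);
    return setenv(name, val, 1);
}

/*
 * After exec: return the descriptor recorded under name, or -1 if the
 * variable is absent, malformed, or does not name an open descriptor.
 */
int load_fd_from_env(const char *name)
{
    assert(name != NULL);

    const char *val = getenv(name);
    if (val == NULL) {
        return -1;
    }

    char *end;
    long fd = strtol(val, &end, 10);
    if (end == val || *end != '\0' || fd < 0 || fd > INT_MAX) {
        return -1;
    }
    if (fcntl((int)fd, F_GETFD) < 0) {
        return -1;              /* not open in this process */
    }
    return (int)fd;
}
```

The old qemu would call the save function for each preserved descriptor
before the exec, and the new qemu would call the load function when it
creates the corresponding object.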

>>>> cprsave and cprload support guests with vfio devices if the caller first
>>>> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
>>>> The guest drivers suspend methods flush outstanding requests and re-
>>>> initialize the devices, and thus there is no device state to save and
>>>> restore.
>>> This probably should be allowed even for regular migration.  Can you
>>> generalize the code as a separate series?
>>
>> Maybe.  I think that would be a distinct patch that ignores the vfio migration blocker 
>> if the state is suspended.  Plus a qemu agent call to do the suspend.  Needs more
>> thought.
> 
> The agent already supports suspend, so that should be relatively easy.
> Only the code to add/remove the VFIO migration blocker from a VM state
> change notifier, or something like that, would be needed.

Yes, I have experimented with the guest's suspend method.

- Steve



* Re: [PATCH V1 00/32] Live Update
  2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
                   ` (34 preceding siblings ...)
  2020-07-30 17:49 ` Dr. David Alan Gilbert
@ 2020-08-04 18:18 ` Steven Sistare
  35 siblings, 0 replies; 66+ messages in thread
From: Steven Sistare @ 2020-08-04 18:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Alex Bennée,
	Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster,
	Alex Williamson, Stefan Hajnoczi, Marc-André Lureau,
	Paolo Bonzini, Philippe Mathieu-Daudé

Hi folks, any questions or comments on the vfio and pci changes in 
patch 30?  Or on the means of preserving anonymous memory and re-exec'ing 
in patches 14 - 21?

- Steve

On 7/30/2020 11:14 AM, Steve Sistare wrote:
> Improve and extend the qemu functions that save and restore VM state so a
> guest may be suspended and resumed with minimal pause time.  qemu may be
> updated to a new version in between.
> 
> The first set of patches adds the cprsave and cprload commands to save and
> restore VM state, and allow the host kernel to be updated and rebooted in
> between.  The VM must create guest RAM in a persistent shared memory file,
> such as /dev/dax0.0 or persistent /dev/shm PKRAM as proposed in
> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com/
> 
> cprsave stops the VCPUs and saves VM device state in a simple file, and
> thus supports any type of guest image and block device.  The caller must
> not modify the VM's block devices between cprsave and cprload.
> 
> cprsave and cprload support guests with vfio devices if the caller first
> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
> The guest drivers suspend methods flush outstanding requests and re-
> initialize the devices, and thus there is no device state to save and
> restore.
> 
>    1 savevm: add vmstate handler iterators
>    2 savevm: VM handlers mode mask
>    3 savevm: QMP command for cprsave
>    4 savevm: HMP Command for cprsave
>    5 savevm: QMP command for cprload
>    6 savevm: HMP Command for cprload
>    7 savevm: QMP command for cprinfo
>    8 savevm: HMP command for cprinfo
>    9 savevm: prevent cprsave if memory is volatile
>   10 kvmclock: restore paused KVM clock
>   11 cpu: disable ticks when suspended
>   12 vl: pause option
>   13 gdbstub: gdb support for suspended state
> 
> The next patches add a restart method that eliminates the persistent memory
> constraint, and allows qemu to be updated across the restart, but does not
> allow host reboot.  Anonymous memory segments used by the guest are
> preserved across a re-exec of qemu, mapped at the same VA, via a proposed
> madvise(MADV_DOEXEC) option in the Linux kernel.  See
> https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yznaga@oracle.com/
> 
>   14 savevm: VMS_RESTART and cprsave restart
>   15 vl: QEMU_START_FREEZE env var
>   16 oslib: add qemu_clr_cloexec
>   17 util: env var helpers
>   18 osdep: import MADV_DOEXEC
>   19 memory: ram_block_add cosmetic changes
>   20 vl: add helper to request re-exec
>   21 exec, memory: exec(3) to restart
>   22 char: qio_channel_socket_accept reuse fd
>   23 char: save/restore chardev socket fds
>   24 ui: save/restore vnc socket fds
>   25 char: save/restore chardev pty fds
>   26 monitor: save/restore QMP negotiation status
>   27 vhost: reset vhost devices upon cprsave
>   28 char: restore terminal on restart
> 
> The next patches extend the restart method to save and restore vfio-pci
> state, eliminating the requirement for a guest agent.  The vfio container,
> group, and device descriptors are preserved across the qemu re-exec.
> 
>   29 pci: export pci_update_mappings
>   30 vfio-pci: save and restore
>   31 vfio-pci: trace pci config
>   32 vfio-pci: improved tracing
> 
> Here is an example of updating qemu from v4.2.0 to v4.2.1 using 
> "cprload restart".  The software update is performed while the guest is
> running to minimize downtime.
> 
> window 1				| window 2
> 					|
> # qemu-system-x86_64 ... 		|
> QEMU 4.2.0 monitor - type 'help' ...	|
> (qemu) info status			|
> VM status: running			|
> 					| # yum update qemu
> (qemu) cprsave /tmp/qemu.sav restart	|
> QEMU 4.2.1 monitor - type 'help' ...	|
> (qemu) info status			|
> VM status: paused (prelaunch)		|
> (qemu) cprload /tmp/qemu.sav		|
> (qemu) info status			|
> VM status: running			|
> 
> 
> Here is an example of updating the host kernel using "cprload reboot"
> 
> window 1					| window 2
> 						|
> # qemu-system-x86_64 ...mem-path=/dev/dax0.0 ...|
> QEMU 4.2.1 monitor - type 'help' ...		|
> (qemu) info status				|
> VM status: running				|
> 						| # yum update kernel-uek
> (qemu) cprsave /tmp/qemu.sav reboot		|
> 						|
> # systemctl kexec				|
> kexec_core: Starting new kernel			|
> ...						|
> 						|
> # qemu-system-x86_64 ...mem-path=/dev/dax0.0 ...|
> QEMU 4.2.1 monitor - type 'help' ...		|
> (qemu) info status				|
> VM status: paused (prelaunch)			|
> (qemu) cprload /tmp/qemu.sav			|
> (qemu) info status				|
> VM status: running				|
> 
> 
> Mark Kanda (5):
>   char: qio_channel_socket_accept reuse fd
>   char: save/restore chardev socket fds
>   ui: save/restore vnc socket fds
>   monitor: save/restore QMP negotiation status
>   vhost: reset vhost devices upon cprsave
> 
> Steve Sistare (27):
>   savevm: add vmstate handler iterators
>   savevm: VM handlers mode mask
>   savevm: QMP command for cprsave
>   savevm: HMP Command for cprsave
>   savevm: QMP command for cprload
>   savevm: HMP Command for cprload
>   savevm: QMP command for cprinfo
>   savevm: HMP command for cprinfo
>   savevm: prevent cprsave if memory is volatile
>   kvmclock: restore paused KVM clock
>   cpu: disable ticks when suspended
>   vl: pause option
>   gdbstub: gdb support for suspended state
>   savevm: VMS_RESTART and cprsave restart
>   vl: QEMU_START_FREEZE env var
>   oslib: add qemu_clr_cloexec
>   util: env var helpers
>   osdep: import MADV_DOEXEC
>   memory: ram_block_add cosmetic changes
>   vl: add helper to request re-exec
>   exec, memory: exec(3) to restart
>   char: save/restore chardev pty fds
>   char: restore terminal on restart
>   pci: export pci_update_mappings
>   vfio-pci: save and restore
>   vfio-pci: trace pci config
>   vfio-pci: improved tracing
> 
>  MAINTAINERS                    |   7 ++
>  accel/kvm/kvm-all.c            |   8 +-
>  accel/kvm/trace-events         |   3 +-
>  chardev/char-pty.c             |  38 +++++--
>  chardev/char-socket.c          |  35 ++++++
>  chardev/char-stdio.c           |   7 ++
>  chardev/char.c                 |  16 +++
>  exec.c                         |  88 +++++++++++++--
>  gdbstub.c                      |  11 +-
>  hmp-commands.hx                |  46 ++++++++
>  hw/i386/kvm/clock.c            |   6 +-
>  hw/pci/msix.c                  |   1 +
>  hw/pci/pci.c                   |  17 +--
>  hw/pci/trace-events            |   5 +-
>  hw/vfio/common.c               | 115 ++++++++++++++++----
>  hw/vfio/pci.c                  | 179 ++++++++++++++++++++++++++++++-
>  hw/vfio/platform.c             |   2 +-
>  hw/vfio/trace-events           |  11 +-
>  hw/virtio/vhost.c              |  12 +++
>  include/chardev/char.h         |   8 ++
>  include/exec/memory.h          |   4 +
>  include/hw/pci/pci.h           |   2 +
>  include/hw/vfio/vfio-common.h  |   4 +-
>  include/io/channel-socket.h    |   3 +-
>  include/migration/register.h   |   3 +
>  include/migration/vmstate.h    |  11 ++
>  include/monitor/hmp.h          |   3 +
>  include/qemu/cutils.h          |   1 +
>  include/qemu/env.h             |  31 ++++++
>  include/qemu/osdep.h           |   8 ++
>  include/sysemu/sysemu.h        |  10 ++
>  io/channel-socket.c            |  12 ++-
>  io/net-listener.c              |   4 +-
>  migration/block.c              |   1 +
>  migration/migration.c          |   4 +-
>  migration/ram.c                |   1 +
>  migration/savevm.c             | 237 ++++++++++++++++++++++++++++++++++++-----
>  migration/savevm.h             |   4 +-
>  monitor/hmp-cmds.c             |  28 +++++
>  monitor/qmp-cmds.c             |  16 +++
>  monitor/qmp.c                  |  42 ++++++++
>  qapi/migration.json            |  35 ++++++
>  qapi/pragma.json               |   1 +
>  qemu-options.hx                |   9 ++
>  scsi/qemu-pr-helper.c          |   2 +-
>  softmmu/vl.c                   |  65 ++++++++++-
>  tests/qtest/tpm-emu.c          |   2 +-
>  tests/test-char.c              |   2 +-
>  tests/test-io-channel-socket.c |   4 +-
>  trace-events                   |   2 +
>  ui/vnc.c                       | 153 +++++++++++++++++++++-----
>  util/Makefile.objs             |   2 +-
>  util/env.c                     | 132 +++++++++++++++++++++++
>  util/oslib-posix.c             |   9 ++
>  util/oslib-win32.c             |   4 +
>  55 files changed, 1331 insertions(+), 135 deletions(-)
>  create mode 100644 include/qemu/env.h
>  create mode 100644 util/env.c
> 



* Re: [PATCH V1 30/32] vfio-pci: save and restore
  2020-07-30 15:14 ` [PATCH V1 30/32] vfio-pci: save and restore Steve Sistare
@ 2020-08-06 10:22   ` Jason Zeng
  2020-08-07 20:38     ` Steven Sistare
  0 siblings, 1 reply; 66+ messages in thread
From: Jason Zeng @ 2020-08-06 10:22 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Daniel P. Berrange, Juan Quintela, Markus Armbruster,
	Michael S. Tsirkin, qemu-devel, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	Alex Bennée

Hi Steve,

On Thu, Jul 30, 2020 at 08:14:34AM -0700, Steve Sistare wrote:
> @@ -3182,6 +3207,51 @@ static Property vfio_pci_dev_properties[] = {
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> +static int vfio_pci_post_load(void *opaque, int version_id)
> +{
> +    int vector;
> +    MSIMessage msg;
> +    Error *err = 0;
> +    VFIOPCIDevice *vdev = opaque;
> +    PCIDevice *pdev = &vdev->pdev;
> +
> +    if (msix_enabled(pdev)) {
> +        vfio_msix_enable(vdev);
> +        pdev->msix_function_masked = false;
> +
> +        for (vector = 0; vector < vdev->pdev.msix_entries_nr; vector++) {
> +            if (!msix_is_masked(pdev, vector)) {
> +                msg = msix_get_message(pdev, vector);
> +                vfio_msix_vector_use(pdev, vector, msg);
> +            }
> +        }

It looks to me like the MSIX re-init here may lose device IRQs and affect
device hardware state?

The re-init will cause the kernel vfio driver to connect the device
MSIX vectors to new eventfds and a new KVM instance.  But before that,
device IRQs will be routed to the previous eventfds.  It looks like these
IRQs will be lost.

And the re-init will make the device go through the procedure of
disabling MSIX, enabling INTX, and re-enabling MSIX and its vectors.
So if the device is active, its hardware state will be affected?


Thanks,
Jason

> +
> +    } else if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) {
> +        vfio_intx_enable(vdev, &err);
> +        if (err) {
> +            error_report_err(err);
> +        }
> +    }
> +
> +    vdev->vbasedev.group->container->reused = false;
> +    vdev->pdev.reused = false;
> +
> +    return 0;
> +}
> +
> +static const VMStateDescription vfio_pci_vmstate = {
> +    .name = "vfio-pci",
> +    .unmigratable = 1,
> +    .mode_mask = VMS_RESTART,
> +    .version_id = 0,
> +    .minimum_version_id = 0,
> +    .post_load = vfio_pci_post_load,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_MSIX(pdev, VFIOPCIDevice),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
> @@ -3189,6 +3259,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
>  
>      dc->reset = vfio_pci_reset;
>      device_class_set_props(dc, vfio_pci_dev_properties);
> +    dc->vmsd = &vfio_pci_vmstate;
>      dc->desc = "VFIO-based PCI device assignment";
>      set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>      pdc->realize = vfio_realize;
> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
> index ac2cefc..e6e1a5d 100644
> --- a/hw/vfio/platform.c
> +++ b/hw/vfio/platform.c
> @@ -592,7 +592,7 @@ static int vfio_base_device_init(VFIODevice *vbasedev, Error **errp)
>              return -EBUSY;
>          }
>      }
> -    ret = vfio_get_device(group, vbasedev->name, vbasedev, errp);
> +    ret = vfio_get_device(group, vbasedev->name, vbasedev, 0, errp);
>      if (ret) {
>          vfio_put_group(group);
>          return ret;
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index bd07c86..c926a24 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -358,6 +358,7 @@ struct PCIDevice {
>  
>      /* ID of standby device in net_failover pair */
>      char *failover_pair_id;
> +    bool reused;
>  };
>  
>  void pci_register_bar(PCIDevice *pci_dev, int region_num,
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index c78f3ff..4e2a332 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -73,6 +73,8 @@ typedef struct VFIOContainer {
>      unsigned iommu_type;
>      Error *error;
>      bool initialized;
> +    bool reused;
> +    int cid;
>      unsigned long pgsizes;
>      QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>      QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
> @@ -177,7 +179,7 @@ void vfio_reset_handler(void *opaque);
>  VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp);
>  void vfio_put_group(VFIOGroup *group);
>  int vfio_get_device(VFIOGroup *group, const char *name,
> -                    VFIODevice *vbasedev, Error **errp);
> +                    VFIODevice *vbasedev, bool *reused, Error **errp);
>  
>  extern const MemoryRegionOps vfio_region_ops;
>  typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 881dc13..2606cf0 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1568,7 +1568,7 @@ static int qemu_savevm_state(QEMUFile *f, VMStateMode mode, Error **errp)
>          return -EINVAL;
>      }
>  
> -    if (migrate_use_block()) {
> +    if ((mode & (VMS_SNAPSHOT | VMS_MIGRATE)) && migrate_use_block()) {
>          error_setg(errp, "Block migration and snapshots are incompatible");
>          return -EINVAL;
>      }
> -- 
> 1.8.3.1
> 
> 



* Re: [PATCH V1 30/32] vfio-pci: save and restore
  2020-08-06 10:22   ` Jason Zeng
@ 2020-08-07 20:38     ` Steven Sistare
  2020-08-10  3:50       ` Jason Zeng
  0 siblings, 1 reply; 66+ messages in thread
From: Steven Sistare @ 2020-08-07 20:38 UTC (permalink / raw)
  To: Jason Zeng
  Cc: Daniel P. Berrange, Juan Quintela, Markus Armbruster,
	Michael S. Tsirkin, qemu-devel, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	Alex Bennée

On 8/6/2020 6:22 AM, Jason Zeng wrote:
> Hi Steve,
> 
> On Thu, Jul 30, 2020 at 08:14:34AM -0700, Steve Sistare wrote:
>> @@ -3182,6 +3207,51 @@ static Property vfio_pci_dev_properties[] = {
>>      DEFINE_PROP_END_OF_LIST(),
>>  };
>>  
>> +static int vfio_pci_post_load(void *opaque, int version_id)
>> +{
>> +    int vector;
>> +    MSIMessage msg;
>> +    Error *err = 0;
>> +    VFIOPCIDevice *vdev = opaque;
>> +    PCIDevice *pdev = &vdev->pdev;
>> +
>> +    if (msix_enabled(pdev)) {
>> +        vfio_msix_enable(vdev);
>> +        pdev->msix_function_masked = false;
>> +
>> +        for (vector = 0; vector < vdev->pdev.msix_entries_nr; vector++) {
>> +            if (!msix_is_masked(pdev, vector)) {
>> +                msg = msix_get_message(pdev, vector);
>> +                vfio_msix_vector_use(pdev, vector, msg);
>> +            }
>> +        }
> 
> It looks to me MSIX re-init here may lose device IRQs and impact
> device hardware state?
> 
> The re-init will cause the kernel vfio driver to connect the device
> MSIX vectors to new eventfds and KVM instance. But before that, device
> IRQs will be routed to previous eventfd. Looks these IRQs will be lost.

Thanks Jason, that sounds like a problem.  I could try reading and saving an 
event from eventfd before shutdown, and injecting it into the eventfd after
restart, but that would be racy unless I disable interrupts.  Or, unconditionally
inject a spurious interrupt after restart to kick it, in case an interrupt 
was lost.

Do you have any other ideas?
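
To make the first idea concrete, here is a minimal sketch using plain
userland eventfds.  It is not code from the series, and the helper names
drain_events() and reinject_events() are hypothetical.  It assumes the
eventfd was created with EFD_NONBLOCK, and it does not close the race
noted above: an interrupt that arrives after the drain and before the
restart is still lost unless interrupts are disabled first.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: read and clear the pending count of a
 * non-blocking eventfd.  Stores 0 in *pending if nothing was pending.
 * Returns 0 on success, -1 on error.
 */
static int drain_events(int efd, uint64_t *pending)
{
    uint64_t count;
    ssize_t n = read(efd, &count, sizeof(count));

    if (n == (ssize_t)sizeof(count)) {
        *pending = count;
        return 0;
    }
    if (n < 0 && errno == EAGAIN) {
        *pending = 0;           /* counter was zero */
        return 0;
    }
    return -1;
}

/*
 * Hypothetical helper: add a previously saved count to an eventfd so
 * that its consumer is signalled.  Returns 0 on success, -1 on error.
 */
static int reinject_events(int efd, uint64_t count)
{
    if (count == 0) {
        return 0;               /* nothing to replay */
    }
    if (write(efd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return -1;
    }
    return 0;
}
```

The saved count would have to be carried across the restart, for example
in the saved state file.  The alternative of always injecting a spurious
interrupt corresponds to calling reinject_events() with a count of 1
unconditionally.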

> And the re-init will make the device go through the procedure of
> disabling MSIX, enabling INTX, and re-enabling MSIX and vectors.
> So if the device is active, its hardware state will be impacted?

Again thanks.  vfio_msix_enable() does indeed call vfio_disable_interrupts().
For a quick experiment, I deleted that call for the post_load code path, and
it seems to work fine, but I need to study it more.

- Steve
 
>> +
>> +    } else if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) {
>> +        vfio_intx_enable(vdev, &err);
>> +        if (err) {
>> +            error_report_err(err);
>> +        }
>> +    }
>> +
>> +    vdev->vbasedev.group->container->reused = false;
>> +    vdev->pdev.reused = false;
>> +
>> +    return 0;
>> +}
>> +
>> +static const VMStateDescription vfio_pci_vmstate = {
>> +    .name = "vfio-pci",
>> +    .unmigratable = 1,
>> +    .mode_mask = VMS_RESTART,
>> +    .version_id = 0,
>> +    .minimum_version_id = 0,
>> +    .post_load = vfio_pci_post_load,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_MSIX(pdev, VFIOPCIDevice),
>> +        VMSTATE_END_OF_LIST()
>> +    }
>> +};
>> +
>>  static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
>>  {
>>      DeviceClass *dc = DEVICE_CLASS(klass);
>> @@ -3189,6 +3259,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
>>  
>>      dc->reset = vfio_pci_reset;
>>      device_class_set_props(dc, vfio_pci_dev_properties);
>> +    dc->vmsd = &vfio_pci_vmstate;
>>      dc->desc = "VFIO-based PCI device assignment";
>>      set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>>      pdc->realize = vfio_realize;
>> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
>> index ac2cefc..e6e1a5d 100644
>> --- a/hw/vfio/platform.c
>> +++ b/hw/vfio/platform.c
>> @@ -592,7 +592,7 @@ static int vfio_base_device_init(VFIODevice *vbasedev, Error **errp)
>>              return -EBUSY;
>>          }
>>      }
>> -    ret = vfio_get_device(group, vbasedev->name, vbasedev, errp);
>> +    ret = vfio_get_device(group, vbasedev->name, vbasedev, 0, errp);
>>      if (ret) {
>>          vfio_put_group(group);
>>          return ret;
>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>> index bd07c86..c926a24 100644
>> --- a/include/hw/pci/pci.h
>> +++ b/include/hw/pci/pci.h
>> @@ -358,6 +358,7 @@ struct PCIDevice {
>>  
>>      /* ID of standby device in net_failover pair */
>>      char *failover_pair_id;
>> +    bool reused;
>>  };
>>  
>>  void pci_register_bar(PCIDevice *pci_dev, int region_num,
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index c78f3ff..4e2a332 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -73,6 +73,8 @@ typedef struct VFIOContainer {
>>      unsigned iommu_type;
>>      Error *error;
>>      bool initialized;
>> +    bool reused;
>> +    int cid;
>>      unsigned long pgsizes;
>>      QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>>      QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
>> @@ -177,7 +179,7 @@ void vfio_reset_handler(void *opaque);
>>  VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp);
>>  void vfio_put_group(VFIOGroup *group);
>>  int vfio_get_device(VFIOGroup *group, const char *name,
>> -                    VFIODevice *vbasedev, Error **errp);
>> +                    VFIODevice *vbasedev, bool *reused, Error **errp);
>>  
>>  extern const MemoryRegionOps vfio_region_ops;
>>  typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index 881dc13..2606cf0 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -1568,7 +1568,7 @@ static int qemu_savevm_state(QEMUFile *f, VMStateMode mode, Error **errp)
>>          return -EINVAL;
>>      }
>>  
>> -    if (migrate_use_block()) {
>> +    if ((mode & (VMS_SNAPSHOT | VMS_MIGRATE)) && migrate_use_block()) {
>>          error_setg(errp, "Block migration and snapshots are incompatible");
>>          return -EINVAL;
>>      }
>> -- 
>> 1.8.3.1
>>
>>



* Re: [PATCH V1 30/32] vfio-pci: save and restore
  2020-08-07 20:38     ` Steven Sistare
@ 2020-08-10  3:50       ` Jason Zeng
  0 siblings, 0 replies; 66+ messages in thread
From: Jason Zeng @ 2020-08-10  3:50 UTC (permalink / raw)
  To: Steven Sistare
  Cc: Daniel P. Berrange, Juan Quintela, Markus Armbruster,
	Michael S. Tsirkin, qemu-devel, Dr. David Alan Gilbert,
	Alex Williamson, Paolo Bonzini, Stefan Hajnoczi,
	Marc-André Lureau, Jason Zeng, Philippe Mathieu-Daudé,
	Alex Bennée

On Fri, Aug 07, 2020 at 04:38:12PM -0400, Steven Sistare wrote:
> On 8/6/2020 6:22 AM, Jason Zeng wrote:
> > Hi Steve,
> > 
> > On Thu, Jul 30, 2020 at 08:14:34AM -0700, Steve Sistare wrote:
> >> @@ -3182,6 +3207,51 @@ static Property vfio_pci_dev_properties[] = {
> >>      DEFINE_PROP_END_OF_LIST(),
> >>  };
> >>  
> >> +static int vfio_pci_post_load(void *opaque, int version_id)
> >> +{
> >> +    int vector;
> >> +    MSIMessage msg;
> >> +    Error *err = 0;
> >> +    VFIOPCIDevice *vdev = opaque;
> >> +    PCIDevice *pdev = &vdev->pdev;
> >> +
> >> +    if (msix_enabled(pdev)) {
> >> +        vfio_msix_enable(vdev);
> >> +        pdev->msix_function_masked = false;
> >> +
> >> +        for (vector = 0; vector < vdev->pdev.msix_entries_nr; vector++) {
> >> +            if (!msix_is_masked(pdev, vector)) {
> >> +                msg = msix_get_message(pdev, vector);
> >> +                vfio_msix_vector_use(pdev, vector, msg);
> >> +            }
> >> +        }
> > 
> > It looks to me MSIX re-init here may lose device IRQs and impact
> > device hardware state?
> > 
> > The re-init will cause the kernel vfio driver to connect the device
> > MSIX vectors to new eventfds and KVM instance. But before that, device
> > IRQs will be routed to previous eventfd. Looks these IRQs will be lost.
> 
> Thanks Jason, that sounds like a problem.  I could try reading and saving an 
> event from eventfd before shutdown, and injecting it into the eventfd after
> restart, but that would be racy unless I disable interrupts.  Or, unconditionally
> inject a spurious interrupt after restart to kick it, in case an interrupt 
> was lost.
> 
> Do you have any other ideas?

Maybe we can consider also handing over the eventfd file descriptors, or
even the KVM fds, to the new Qemu?

If the KVM fds can be preserved, we will just need to restore the Qemu
KVM-side state.  But I am not sure how complicated the implementation
would be.

If we only preserve the eventfd fds, we can attach the old eventfds to the
vfio devices.  But it looks like we may end up always injecting an
interrupt unconditionally, because the kernel KVM irqfd eventfd handling is
a bit different from normal userland eventfd read/write: it doesn't
decrease the counter in the eventfd context.  So if we read the eventfd
from the new Qemu, it looks like it will always have a non-zero counter,
which requires an interrupt injection.
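
For comparison, the normal userland eventfd behaviour referred to above
can be sketched as follows.  This only illustrates the userland side: an
eventfd polls as readable while its counter is non-zero, and a read()
returns the counter and resets it to zero.  It does not model the KVM
irqfd path.  The helper names are hypothetical.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: report whether an eventfd has a non-zero counter
 * without consuming it.  Returns 1 if pending, 0 if not, -1 on error.
 */
static int eventfd_has_pending(int efd)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    int ret = poll(&pfd, 1, 0);     /* timeout 0: do not block */

    if (ret < 0) {
        return -1;
    }
    return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/*
 * Hypothetical helper: consume the counter the way a normal userland
 * reader does.  On success the counter is reset to zero and its old
 * value is stored in *count.  Returns 0 on success, -1 on error.
 */
static int eventfd_consume(int efd, uint64_t *count)
{
    if (read(efd, count, sizeof(*count)) != (ssize_t)sizeof(*count)) {
        return -1;
    }
    return 0;
}
```

If the counter of a preserved eventfd is never decremented by its
consumer, a check like eventfd_has_pending() in the new Qemu would always
return 1, which is the situation described above.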

> 
> > And the re-init will make the device go through the procedure of
> > disabling MSIX, enabling INTX, and re-enabling MSIX and vectors.
> > So if the device is active, its hardware state will be impacted?
> 
> Again thanks.  vfio_msix_enable() does indeed call vfio_disable_interrupts().
> For a quick experiment, I deleted that call in for the post_load code path, and 
> it seems to work fine, but I need to study it more.

vfio_msix_vector_use() will also trigger this procedure in the kernel.

It looks like we shouldn't trigger any kernel vfio actions here?  Because we
preserve the vfio fds, their kernel state shouldn't be touched.  Here we
may only need to restore the Qemu state.  Re-connecting to the KVM instance
should be done automatically when we set up the KVM irqfds with the same
eventfds.

BTW, if I remember correctly, it is not enough to save only the MSIX state
in the snapshot.  We should also save the Qemu-side pci config space
cache to the snapshot, because Qemu's copy is not exactly the same as
the kernel's copy.  I encountered this before, but I don't remember which
field it was.

And another question: why don't we support MSI?  I see the code only
handles MSIX.

Thanks,
Jason


> 
> - Steve
>  
> >> +
> >> +    } else if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) {
> >> +        vfio_intx_enable(vdev, &err);
> >> +        if (err) {
> >> +            error_report_err(err);
> >> +        }
> >> +    }
> >> +
> >> +    vdev->vbasedev.group->container->reused = false;
> >> +    vdev->pdev.reused = false;
> >> +
> >> +    return 0;
> >> +}
> >> +
> >> +static const VMStateDescription vfio_pci_vmstate = {
> >> +    .name = "vfio-pci",
> >> +    .unmigratable = 1,
> >> +    .mode_mask = VMS_RESTART,
> >> +    .version_id = 0,
> >> +    .minimum_version_id = 0,
> >> +    .post_load = vfio_pci_post_load,
> >> +    .fields = (VMStateField[]) {
> >> +        VMSTATE_MSIX(pdev, VFIOPCIDevice),
> >> +        VMSTATE_END_OF_LIST()
> >> +    }
> >> +};
> >> +
> >>  static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
> >>  {
> >>      DeviceClass *dc = DEVICE_CLASS(klass);
> >> @@ -3189,6 +3259,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
> >>  
> >>      dc->reset = vfio_pci_reset;
> >>      device_class_set_props(dc, vfio_pci_dev_properties);
> >> +    dc->vmsd = &vfio_pci_vmstate;
> >>      dc->desc = "VFIO-based PCI device assignment";
> >>      set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> >>      pdc->realize = vfio_realize;
> >> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
> >> index ac2cefc..e6e1a5d 100644
> >> --- a/hw/vfio/platform.c
> >> +++ b/hw/vfio/platform.c
> >> @@ -592,7 +592,7 @@ static int vfio_base_device_init(VFIODevice *vbasedev, Error **errp)
> >>              return -EBUSY;
> >>          }
> >>      }
> >> -    ret = vfio_get_device(group, vbasedev->name, vbasedev, errp);
> >> +    ret = vfio_get_device(group, vbasedev->name, vbasedev, 0, errp);
> >>      if (ret) {
> >>          vfio_put_group(group);
> >>          return ret;
> >> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> >> index bd07c86..c926a24 100644
> >> --- a/include/hw/pci/pci.h
> >> +++ b/include/hw/pci/pci.h
> >> @@ -358,6 +358,7 @@ struct PCIDevice {
> >>  
> >>      /* ID of standby device in net_failover pair */
> >>      char *failover_pair_id;
> >> +    bool reused;
> >>  };
> >>  
> >>  void pci_register_bar(PCIDevice *pci_dev, int region_num,
> >> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> >> index c78f3ff..4e2a332 100644
> >> --- a/include/hw/vfio/vfio-common.h
> >> +++ b/include/hw/vfio/vfio-common.h
> >> @@ -73,6 +73,8 @@ typedef struct VFIOContainer {
> >>      unsigned iommu_type;
> >>      Error *error;
> >>      bool initialized;
> >> +    bool reused;
> >> +    int cid;
> >>      unsigned long pgsizes;
> >>      QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
> >>      QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
> >> @@ -177,7 +179,7 @@ void vfio_reset_handler(void *opaque);
> >>  VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp);
> >>  void vfio_put_group(VFIOGroup *group);
> >>  int vfio_get_device(VFIOGroup *group, const char *name,
> >> -                    VFIODevice *vbasedev, Error **errp);
> >> +                    VFIODevice *vbasedev, bool *reused, Error **errp);
> >>  
> >>  extern const MemoryRegionOps vfio_region_ops;
> >>  typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
> >> diff --git a/migration/savevm.c b/migration/savevm.c
> >> index 881dc13..2606cf0 100644
> >> --- a/migration/savevm.c
> >> +++ b/migration/savevm.c
> >> @@ -1568,7 +1568,7 @@ static int qemu_savevm_state(QEMUFile *f, VMStateMode mode, Error **errp)
> >>          return -EINVAL;
> >>      }
> >>  
> >> -    if (migrate_use_block()) {
> >> +    if ((mode & (VMS_SNAPSHOT | VMS_MIGRATE)) && migrate_use_block()) {
> >>          error_setg(errp, "Block migration and snapshots are incompatible");
> >>          return -EINVAL;
> >>      }
> >> -- 
> >> 1.8.3.1
> >>
> >>



Thread overview: 66+ messages
2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
2020-07-30 15:14 ` [PATCH V1 01/32] savevm: add vmstate handler iterators Steve Sistare
2020-07-30 15:14 ` [PATCH V1 02/32] savevm: VM handlers mode mask Steve Sistare
2020-07-30 15:14 ` [PATCH V1 03/32] savevm: QMP command for cprsave Steve Sistare
2020-07-30 16:12   ` Eric Blake
2020-07-30 17:52     ` Steven Sistare
2020-07-30 15:14 ` [PATCH V1 04/32] savevm: HMP Command " Steve Sistare
2020-07-30 15:14 ` [PATCH V1 05/32] savevm: QMP command for cprload Steve Sistare
2020-07-30 16:14   ` Eric Blake
2020-07-30 18:00     ` Steven Sistare
2020-07-30 15:14 ` [PATCH V1 06/32] savevm: HMP Command " Steve Sistare
2020-07-30 15:14 ` [PATCH V1 07/32] savevm: QMP command for cprinfo Steve Sistare
2020-07-30 16:17   ` Eric Blake
2020-07-30 18:02     ` Steven Sistare
2020-07-30 15:14 ` [PATCH V1 08/32] savevm: HMP " Steve Sistare
2020-07-30 15:14 ` [PATCH V1 09/32] savevm: prevent cprsave if memory is volatile Steve Sistare
2020-07-30 15:14 ` [PATCH V1 10/32] kvmclock: restore paused KVM clock Steve Sistare
2020-07-30 15:14 ` [PATCH V1 11/32] cpu: disable ticks when suspended Steve Sistare
2020-07-30 15:14 ` [PATCH V1 12/32] vl: pause option Steve Sistare
2020-07-30 16:20   ` Eric Blake
2020-07-30 18:11     ` Steven Sistare
2020-07-31 10:07       ` Daniel P. Berrangé
2020-07-31 15:18         ` Steven Sistare
2020-07-30 17:03   ` Alex Bennée
2020-07-30 18:14     ` Steven Sistare
2020-07-31  9:44       ` Alex Bennée
2020-07-30 15:14 ` [PATCH V1 13/32] gdbstub: gdb support for suspended state Steve Sistare
2020-07-30 15:14 ` [PATCH V1 14/32] savevm: VMS_RESTART and cprsave restart Steve Sistare
2020-07-30 16:22   ` Eric Blake
2020-07-30 18:14     ` Steven Sistare
2020-07-30 15:14 ` [PATCH V1 15/32] vl: QEMU_START_FREEZE env var Steve Sistare
2020-07-30 15:14 ` [PATCH V1 16/32] oslib: add qemu_clr_cloexec Steve Sistare
2020-07-30 15:14 ` [PATCH V1 17/32] util: env var helpers Steve Sistare
2020-07-30 15:14 ` [PATCH V1 18/32] osdep: import MADV_DOEXEC Steve Sistare
2020-07-30 15:14 ` [PATCH V1 19/32] memory: ram_block_add cosmetic changes Steve Sistare
2020-07-30 15:14 ` [PATCH V1 20/32] vl: add helper to request re-exec Steve Sistare
2020-07-30 15:14 ` [PATCH V1 21/32] exec, memory: exec(3) to restart Steve Sistare
2020-07-30 15:14 ` [PATCH V1 22/32] char: qio_channel_socket_accept reuse fd Steve Sistare
2020-07-30 15:14 ` [PATCH V1 23/32] char: save/restore chardev socket fds Steve Sistare
2020-07-30 15:14 ` [PATCH V1 24/32] ui: save/restore vnc " Steve Sistare
2020-07-31  9:06   ` Daniel P. Berrangé
2020-07-31 16:51     ` Steven Sistare
2020-07-30 15:14 ` [PATCH V1 25/32] char: save/restore chardev pty fds Steve Sistare
2020-07-30 15:14 ` [PATCH V1 26/32] monitor: save/restore QMP negotiation status Steve Sistare
2020-07-30 15:14 ` [PATCH V1 27/32] vhost: reset vhost devices upon cprsave Steve Sistare
2020-07-30 15:14 ` [PATCH V1 28/32] char: restore terminal on restart Steve Sistare
2020-07-30 15:14 ` [PATCH V1 29/32] pci: export pci_update_mappings Steve Sistare
2020-07-30 15:14 ` [PATCH V1 30/32] vfio-pci: save and restore Steve Sistare
2020-08-06 10:22   ` Jason Zeng
2020-08-07 20:38     ` Steven Sistare
2020-08-10  3:50       ` Jason Zeng
2020-07-30 15:14 ` [PATCH V1 31/32] vfio-pci: trace pci config Steve Sistare
2020-07-30 15:14 ` [PATCH V1 32/32] vfio-pci: improved tracing Steve Sistare
2020-07-30 16:52 ` [PATCH V1 00/32] Live Update Daniel P. Berrangé
2020-07-30 18:48   ` Steven Sistare
2020-07-31  8:53     ` Daniel P. Berrangé
2020-07-31 15:27       ` Steven Sistare
2020-07-31 15:52         ` Daniel P. Berrangé
2020-07-31 17:20           ` Steven Sistare
2020-07-30 17:15 ` Paolo Bonzini
2020-07-30 19:09   ` Steven Sistare
2020-07-30 21:39     ` Paolo Bonzini
2020-07-31 19:22       ` Steven Sistare
2020-07-30 17:49 ` Dr. David Alan Gilbert
2020-07-30 19:31   ` Steven Sistare
2020-08-04 18:18 ` Steven Sistare
