* [PATCH V4 00/25] Live Update
@ 2021-07-06 16:45 Steve Sistare
From: Steve Sistare @ 2021-07-06 16:45 UTC
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Provide the cprsave, cprexec, and cprload commands for live update.  These
save and restore VM state, with minimal guest pause time, so that qemu may
be updated to a new version in between.

cprsave stops the VM and saves vmstate to an ordinary file.  It supports any
type of guest image and block device, but the caller must not modify guest
block devices between cprsave and cprload.  It supports two modes: reboot
and restart.
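The file written by cprsave is the stream produced by qemu_save_device_state, and cprload (patch 02) rejects any file that does not begin with QEMU_VM_FILE_MAGIC followed by QEMU_VM_FILE_VERSION.  A management tool could perform the same check before issuing cprload.  The sketch below assumes the values 0x5145564d ("QEVM") and 3 from migration/savevm.c; the helper names are hypothetical and not part of the series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to match QEMU_VM_FILE_MAGIC / QEMU_VM_FILE_VERSION in savevm.c. */
#define VM_FILE_MAGIC   0x5145564dU   /* "QEVM" */
#define VM_FILE_VERSION 0x00000003U

/* Read one big-endian 32-bit word; return 0 on success, -1 on short read. */
static int read_be32(FILE *fp, uint32_t *out)
{
    unsigned char b[4];

    if (fread(b, 1, sizeof(b), fp) != sizeof(b)) {
        return -1;
    }
    *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
    return 0;
}

/* Return 1 if the file starts with the vmstate magic and version, else 0. */
static int is_vmstate_file(const char *path)
{
    uint32_t magic = 0, version = 0;
    int ok;
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "rb");
    if (!fp) {
        return 0;
    }
    ok = read_be32(fp, &magic) == 0 && read_be32(fp, &version) == 0 &&
         magic == VM_FILE_MAGIC && version == VM_FILE_VERSION;
    fclose(fp);   /* close on every path, including a bad header */
    return ok;
}
```

The check only validates the header; it says nothing about whether the device state inside matches the qemu binary that will load it.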

In reboot mode, the caller invokes cprsave and then terminates qemu.
The caller may then update the host kernel and system software and reboot.
The caller resumes the guest by running qemu with the same arguments as the
original process and invoking cprload.  To use this mode, guest ram must be
mapped to a persistent shared memory file such as /dev/dax0.0, or /dev/shm
PKRAM as proposed in https://lore.kernel.org/lkml/1617140178-8773-1-git-send-email-anthony.yznaga@oracle.com.

The reboot mode supports vfio devices if the caller first suspends the
guest, such as by issuing guest-suspend-ram to the qemu guest agent.  The
guest drivers' suspend methods flush outstanding requests and re-initialize
the devices, and thus there is no device state to save and restore.

Restart mode preserves the guest VM across a restart of the qemu process.
After cprsave, the caller passes qemu command-line arguments to cprexec,
which directly execs the new qemu binary.  The arguments must include -S
so new qemu starts in a paused state and waits for the cprload command.
The restart mode supports vfio devices by preserving the vfio container,
group, device, and event descriptors across the qemu re-exec, and by
updating DMA mapping virtual addresses using VFIO_DMA_UNMAP_FLAG_VADDR and
VFIO_DMA_MAP_FLAG_VADDR as defined in https://lore.kernel.org/kvm/1611939252-7240-1-git-send-email-steven.sistare@oracle.com/
and integrated in Linux kernel 5.12.
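Because cprexec execs exactly the command line it is given, a caller may want to confirm that the arguments include -S before issuing it, so that the new qemu waits for cprload.  A hypothetical pre-flight check (not part of the series), assuming argv[0] names the new binary:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical pre-flight check before cprexec: argv[0] names the new
 * qemu binary, and -S must appear so that it waits for cprload. */
static bool cprexec_args_ok(const char *const argv[], int argc)
{
    int i;

    assert(argc >= 0);
    if (argc < 1 || argv[0] == NULL || argv[0][0] == '\0') {
        return false;
    }
    for (i = 1; i < argc; i++) {
        if (argv[i] && strcmp(argv[i], "-S") == 0) {
            return true;
        }
    }
    return false;
}
```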

To use the restart mode, qemu must be started with the memfd-alloc option,
which allocates guest ram using memfd_create.  The memfd descriptor numbers
are saved in environment variables and the descriptors are kept open across
exec, after which the new process finds them in the environment and re-mmaps
them.  Hence guest ram is preserved in place,
albeit with new virtual addresses in the qemu process.
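The memfd hand-off described above can be sketched as follows, assuming Linux with glibc 2.27 or later (for memfd_create and MFD_CLOEXEC).  The environment variable name CPR_FD_pc_ram and all helper names are hypothetical; the series uses its own helpers (qemu_clr_cloexec and the util/env.c functions), whose interfaces are not reproduced here.  The old process clears FD_CLOEXEC and records the descriptor number; the new process parses that number and maps the same pages again.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical variable name; the series' own naming scheme may differ. */
#define RAM_FD_ENV "CPR_FD_pc_ram"

/* Clear FD_CLOEXEC so the descriptor stays open across exec. */
static int clr_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    return flags < 0 ? -1 : fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}

/* Old qemu: allocate guest RAM in a memfd and publish its fd number. */
static void *ram_alloc_for_restart(size_t size)
{
    char val[16];
    void *p;
    int fd;

    assert(size > 0);
    fd = memfd_create("pc.ram", MFD_CLOEXEC);
    if (fd < 0) {
        return NULL;
    }
    if (ftruncate(fd, (off_t)size) < 0 || clr_cloexec(fd) < 0) {
        close(fd);
        return NULL;
    }
    snprintf(val, sizeof(val), "%d", fd);
    if (setenv(RAM_FD_ENV, val, 1) < 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* New qemu (after exec): find the fd in the environment and re-mmap it.
 * The contents are unchanged, but the virtual address may differ. */
static void *ram_find_after_exec(size_t size)
{
    const char *val = getenv(RAM_FD_ENV);
    char *end;
    long fd;
    void *p;

    if (!val) {
        return NULL;
    }
    errno = 0;
    fd = strtol(val, &end, 10);
    if (errno || end == val || *end != '\0' || fd < 0) {
        return NULL;
    }
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, (int)fd, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

In a single process the exec can be simulated by unmapping the first mapping and calling ram_find_after_exec: data written through the old mapping is visible through the new one, because both map the same memfd pages.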

The caller resumes the guest by invoking cprload, which loads state from
the file.  If the VM was running at cprsave time, then VM execution resumes.
If the VM was suspended at cprsave time (reboot mode), then the caller must
issue a system_wakeup command to resume.
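These resume rules, together with the start_on_wake_requested flag that patch 02 adds to softmmu/runstate.c, can be modeled as a small state machine.  This is a simplified sketch with hypothetical names and only three run states; it illustrates that a guest suspended at cprsave time stays suspended after cprload, and that the first system_wakeup goes through vm_start so the state change callbacks run for the first time.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical run states; qemu's RunState has many more. */
typedef enum { ST_PRELAUNCH, ST_RUNNING, ST_SUSPENDED } State;

static State cur_state = ST_PRELAUNCH;  /* new qemu starts with -S */
static bool start_on_wake;              /* like start_on_wake_requested */
static int vm_start_calls;              /* times the start path ran */

static void model_vm_start(void)
{
    vm_start_calls++;                   /* would run state change callbacks */
    cur_state = ST_RUNNING;
}

/* cprload, after device state is loaded: restore the saved run state. */
static void model_cprload(State saved)
{
    assert(cur_state != ST_RUNNING);    /* cprload rejects a running VM */
    if (saved == ST_RUNNING) {
        model_vm_start();
    } else {
        cur_state = saved;
        if (cur_state == ST_SUSPENDED) {
            start_on_wake = true;
        }
    }
}

/* system_wakeup: the first wakeup after cprload must go through vm_start. */
static bool model_system_wakeup(void)
{
    if (cur_state != ST_SUSPENDED) {
        return false;                   /* qemu reports an error here */
    }
    if (start_on_wake) {
        start_on_wake = false;
        model_vm_start();
    } else {
        cur_state = ST_RUNNING;
    }
    return true;
}
```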

The first patches add reboot mode:
  - qemu_ram_volatile
  - cpr: reboot mode
  - cpr: QMP interfaces for reboot
  - cpr: HMP interfaces for reboot

The next patches add restart mode:
  - as_flat_walk
  - oslib: qemu_clr_cloexec
  - machine: memfd-alloc option
  - vl: add helper to request re-exec
  - string to strList
  - util: env var helpers
  - cpr: restart mode
  - cpr: QMP interfaces for restart
  - cpr: HMP interfaces for restart

The next patches add vfio support for restart mode:
  - pci: export functions for cpr
  - vfio-pci: refactor for cpr
  - vfio-pci: cpr part 1
  - vfio-pci: cpr part 2

The next patches preserve various descriptor-based backend devices across
cprexec:
  - vhost: reset vhost devices upon cprsave
  - hostmem-memfd: cpr support
  - chardev: cpr framework
  - chardev: cpr for simple devices
  - chardev: cpr for pty
  - chardev: cpr for sockets
  - cpr: only-cpr-capable option
  - simplify savevm

Here is an example of updating qemu from v4.2.0 to v4.2.1 using
restart mode.  The software update is performed while the guest is
running to minimize downtime.

window 1                                        | window 2
                                                |
# qemu-system-x86_64 ...                        |
QEMU 4.2.0 monitor - type 'help' ...            |
(qemu) info status                              |
VM status: running                              |
                                                | # yum update qemu
(qemu) cprsave /tmp/qemu.sav restart            |
(qemu) cprexec qemu-system-x86_64 -S ...        |
QEMU 4.2.1 monitor - type 'help' ...            |
(qemu) info status                              |
VM status: paused (prelaunch)                   |
(qemu) cprload /tmp/qemu.sav                    |
(qemu) info status                              |
VM status: running                              |


Here is an example of updating the host kernel using reboot mode.

window 1                                        | window 2
                                                |
# qemu-system-x86_64 ...mem-path=/dev/dax0.0 ...|
QEMU 4.2.1 monitor - type 'help' ...            |
(qemu) info status                              |
VM status: running                              |
                                                | # yum update kernel-uek
(qemu) cprsave /tmp/qemu.sav reboot             |
(qemu) quit                                     |
                                                |
# systemctl kexec                               |
kexec_core: Starting new kernel                 |
...                                             |
                                                |
# qemu-system-x86_64 -S mem-path=/dev/dax0.0 ...|
QEMU 4.2.1 monitor - type 'help' ...            |
(qemu) info status                              |
VM status: paused (prelaunch)                   |
(qemu) cprload /tmp/qemu.sav                    |
(qemu) info status                              |
VM status: running                              |

Changes from V1 to V2:
  - revert vmstate infrastructure changes
  - refactor cpr functions into new files
  - delete MADV_DOEXEC and use memfd + VFIO_DMA_UNMAP_FLAG_SUSPEND to
    preserve memory.
  - add framework to filter chardev's that support cpr
  - save and restore vfio eventfd's
  - modify cprinfo QMP interface
  - incorporate misc review feedback
  - remove unrelated and unneeded patches
  - refactor all patches into a shorter and easier to review series

Changes from V2 to V3:
  - rebase to qemu 6.0.0
  - use final definition of vfio ioctls (VFIO_DMA_UNMAP_FLAG_VADDR etc)
  - change memfd-alloc to a machine option
  - Use qio_channel_socket_new_fd instead of adding qio_channel_socket_new_fd
  - close monitor socket during cpr
  - fix a few unreported bugs
  - support memory-backend-memfd

Changes from V3 to V4:
  - split reboot mode into separate patches
  - add cprexec command
  - delete QEMU_START_FREEZE, argv_main, and /usr/bin/qemu-exec
  - add more checks for vfio and cpr compatibility, and recover after errors
  - save vfio pci config in vmstate
  - rename {setenv,getenv}_event_fd to {save,load}_event_fd
  - use qemu_strtol
  - change 6.0 references to 6.1
  - use strerror(), use EXIT_FAILURE, remove period from error messages
  - distribute MAINTAINERS additions to each patch

Steve Sistare (21):
  qemu_ram_volatile
  cpr: reboot mode
  as_flat_walk
  oslib: qemu_clr_cloexec
  machine: memfd-alloc option
  vl: add helper to request re-exec
  string to strList
  util: env var helpers
  cpr: restart mode
  cpr: QMP interfaces for restart
  cpr: HMP interfaces for restart
  pci: export functions for cpr
  vfio-pci: refactor for cpr
  vfio-pci: cpr part 1
  vfio-pci: cpr part 2
  hostmem-memfd: cpr support
  chardev: cpr framework
  chardev: cpr for simple devices
  chardev: cpr for pty
  cpr: only-cpr-capable option
  simplify savevm

Mark Kanda, Steve Sistare (4):
  cpr: QMP interfaces for reboot
  cpr: HMP interfaces for reboot
  vhost: reset vhost devices upon cprsave
  chardev: cpr for sockets

 MAINTAINERS                   |  12 +++
 backends/hostmem-memfd.c      |  21 ++--
 chardev/char-mux.c            |   1 +
 chardev/char-null.c           |   1 +
 chardev/char-pty.c            |  15 ++-
 chardev/char-serial.c         |   1 +
 chardev/char-socket.c         |  35 +++++++
 chardev/char-stdio.c          |   8 ++
 chardev/char.c                |  41 +++++++-
 gdbstub.c                     |   1 +
 hmp-commands.hx               |  62 ++++++++++++
 hw/core/machine.c             |  19 ++++
 hw/pci/msix.c                 |  20 ++--
 hw/pci/pci.c                  |   7 +-
 hw/vfio/common.c              |  78 ++++++++++++--
 hw/vfio/cpr.c                 | 154 ++++++++++++++++++++++++++++
 hw/vfio/meson.build           |   1 +
 hw/vfio/pci.c                 | 230 +++++++++++++++++++++++++++++++++++++++---
 hw/vfio/trace-events          |   1 +
 hw/virtio/vhost.c             |  11 ++
 include/chardev/char.h        |   6 ++
 include/exec/memory.h         |  25 +++++
 include/hw/boards.h           |   1 +
 include/hw/pci/msix.h         |   5 +
 include/hw/pci/pci.h          |   2 +
 include/hw/vfio/vfio-common.h |   8 ++
 include/hw/virtio/vhost.h     |   1 +
 include/migration/cpr.h       |  20 ++++
 include/monitor/hmp.h         |   4 +
 include/qemu/env.h            |  23 +++++
 include/qemu/osdep.h          |   1 +
 include/sysemu/runstate.h     |   2 +
 include/sysemu/sysemu.h       |   1 +
 linux-headers/linux/vfio.h    |  27 +++++
 migration/cpr.c               | 195 +++++++++++++++++++++++++++++++++++
 migration/meson.build         |   1 +
 migration/migration.c         |   5 +
 migration/savevm.c            |  21 ++--
 migration/savevm.h            |   2 +
 monitor/hmp-cmds.c            |  75 ++++++++++++--
 monitor/hmp.c                 |   3 +
 monitor/qmp-cmds.c            |  36 +++++++
 monitor/qmp.c                 |   3 +
 qapi/char.json                |   5 +-
 qapi/cpr.json                 |  88 ++++++++++++++++
 qapi/meson.build              |   1 +
 qapi/qapi-schema.json         |   1 +
 qemu-options.hx               |  39 ++++++-
 softmmu/globals.c             |   1 +
 softmmu/memory.c              |  48 +++++++++
 softmmu/physmem.c             |  49 +++++++--
 softmmu/runstate.c            |  58 ++++++++++-
 softmmu/vl.c                  |  14 ++-
 stubs/cpr.c                   |   3 +
 stubs/meson.build             |   1 +
 trace-events                  |   1 +
 util/env.c                    |  95 +++++++++++++++++
 util/meson.build              |   1 +
 util/oslib-posix.c            |   9 ++
 util/oslib-win32.c            |   4 +
 util/qemu-config.c            |   4 +
 61 files changed, 1525 insertions(+), 83 deletions(-)
 create mode 100644 hw/vfio/cpr.c
 create mode 100644 include/migration/cpr.h
 create mode 100644 include/qemu/env.h
 create mode 100644 migration/cpr.c
 create mode 100644 qapi/cpr.json
 create mode 100644 stubs/cpr.c
 create mode 100644 util/env.c

-- 
1.8.3.1




* [PATCH V4 01/25] qemu_ram_volatile
From: Steve Sistare @ 2021-07-06 16:45 UTC
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add a function that returns true if any ram_list block represents
volatile memory.
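The predicate that qemu_ram_volatile applies to each RAM block can be restated over a hypothetical flattened struct, which makes the reboot-mode requirement easy to test in isolation.  The struct and function names are illustrative only; in qemu the fields come from RAMBlock and its MemoryRegion.  Note that the check only requires writable RAM to be file-backed and shared; whether that file actually persists across the reboot is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened view of a RAMBlock and its MemoryRegion. */
struct ram_blk {
    const char *name;
    bool is_ram;          /* memory_region_is_ram() */
    bool is_ram_device;   /* memory_region_is_ram_device() */
    bool is_rom;          /* memory_region_is_rom() */
    int fd;               /* backing file descriptor, -1 if anonymous */
    bool shared;          /* qemu_ram_is_shared() */
};

/* Return the first writable guest-RAM block that is anonymous or not
 * mapped shared, i.e. one qemu_ram_volatile() would report; else NULL. */
static const struct ram_blk *first_volatile(const struct ram_blk *b, size_t n)
{
    size_t i;

    assert(b != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (b[i].is_ram && !b[i].is_ram_device && !b[i].is_rom &&
            (b[i].fd == -1 || !b[i].shared)) {
            return &b[i];
        }
    }
    return NULL;
}
```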

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 include/exec/memory.h |  8 ++++++++
 softmmu/memory.c      | 30 ++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 5728a68..e8b71d8 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2629,6 +2629,14 @@ bool ram_block_discard_is_disabled(void);
  */
 bool ram_block_discard_is_required(void);
 
+/**
+ * qemu_ram_volatile: return true if any memory regions are writable and not
+ * backed by shared memory.
+ *
+ * @errp: returned error message identifying the bad region.
+ */
+bool qemu_ram_volatile(Error **errp);
+
 #endif
 
 #endif
diff --git a/softmmu/memory.c b/softmmu/memory.c
index d4493ef..7197559 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2707,6 +2707,36 @@ void memory_global_dirty_log_stop(void)
     memory_global_dirty_log_do_stop();
 }
 
+/*
+ * Return true if any memory regions are writable and not backed by shared
+ * memory.
+ */
+bool qemu_ram_volatile(Error **errp)
+{
+    RAMBlock *block;
+    MemoryRegion *mr;
+    bool ret = false;
+
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        mr = block->mr;
+        if (mr &&
+            memory_region_is_ram(mr) &&
+            !memory_region_is_ram_device(mr) &&
+            !memory_region_is_rom(mr) &&
+            (block->fd == -1 || !qemu_ram_is_shared(block))) {
+
+            error_setg(errp, "Memory region %s is volatile",
+                       memory_region_name(mr));
+            ret = true;
+            break;
+        }
+    }
+
+    rcu_read_unlock();
+    return ret;
+}
+
 static void listener_add_address_space(MemoryListener *listener,
                                        AddressSpace *as)
 {
-- 
1.8.3.1




* [PATCH V4 02/25] cpr: reboot mode
From: Steve Sistare @ 2021-07-06 16:45 UTC
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Provide the cprsave and cprload functions for live update.  These save and
restore VM state, with minimal guest pause time, so that qemu may be updated
to a new version in between.

cprsave stops the VM and saves vmstate to an ordinary file.  It supports any
type of guest image and block device, but the caller must not modify guest
block devices between cprsave and cprload.

cprsave supports several modes, the first of which is reboot.  In this mode,
the caller invokes cprsave and then terminates qemu.  The caller may then
update the host kernel and system software and reboot.  The caller resumes
the guest by running qemu with the same arguments as the original process
and invoking cprload.  To use this mode, guest ram must be mapped to a
persistent shared memory file such as /dev/dax0.0 or /dev/shm PKRAM.

The reboot mode supports vfio devices if the caller first suspends the
guest, such as by issuing guest-suspend-ram to the qemu guest agent.  The
guest drivers' suspend methods flush outstanding requests and re-initialize
the devices, and thus there is no device state to save and restore.

cprload loads state from the file.  If the VM was running at cprsave time,
then VM execution resumes.  If the VM was suspended at cprsave time, then
the caller must issue a system_wakeup command to resume.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 MAINTAINERS               |   7 +++
 include/migration/cpr.h   |  17 ++++++
 include/sysemu/runstate.h |   1 +
 migration/cpr.c           | 149 ++++++++++++++++++++++++++++++++++++++++++++++
 migration/meson.build     |   1 +
 migration/savevm.h        |   2 +
 softmmu/runstate.c        |  21 ++++++-
 7 files changed, 197 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/cpr.h
 create mode 100644 migration/cpr.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 36055f1..d337446 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2843,6 +2843,13 @@ F: net/colo*
 F: net/filter-rewriter.c
 F: net/filter-mirror.c
 
+CPR
+M: Steve Sistare <steven.sistare@oracle.com>
+M: Mark Kanda <mark.kanda@oracle.com>
+S: Maintained
+F: include/migration/cpr.h
+F: migration/cpr.c
+
 Record/replay
 M: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
 R: Paolo Bonzini <pbonzini@redhat.com>
diff --git a/include/migration/cpr.h b/include/migration/cpr.h
new file mode 100644
index 0000000..bffee19
--- /dev/null
+++ b/include/migration/cpr.h
@@ -0,0 +1,17 @@
+/*
+ * Copyright (c) 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef MIGRATION_CPR_H
+#define MIGRATION_CPR_H
+
+#include "qapi/qapi-types-cpr.h"
+
+void cprsave(const char *file, CprMode mode, Error **errp);
+void cprexec(strList *args, Error **errp);
+void cprload(const char *file, Error **errp);
+
+#endif
diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index a535691..ed4b735 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -51,6 +51,7 @@ void qemu_system_reset_request(ShutdownCause reason);
 void qemu_system_suspend_request(void);
 void qemu_register_suspend_notifier(Notifier *notifier);
 bool qemu_wakeup_suspend_enabled(void);
+void qemu_system_start_on_wake_request(void);
 void qemu_system_wakeup_request(WakeupReason reason, Error **errp);
 void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
 void qemu_register_wakeup_notifier(Notifier *notifier);
diff --git a/migration/cpr.c b/migration/cpr.c
new file mode 100644
index 0000000..c5bad8a
--- /dev/null
+++ b/migration/cpr.c
@@ -0,0 +1,149 @@
+/*
+ * Copyright (c) 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "monitor/monitor.h"
+#include "migration.h"
+#include "migration/snapshot.h"
+#include "chardev/char.h"
+#include "migration/misc.h"
+#include "migration/cpr.h"
+#include "migration/global_state.h"
+#include "qemu-file-channel.h"
+#include "qemu-file.h"
+#include "savevm.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
+#include "qemu/error-report.h"
+#include "io/channel-buffer.h"
+#include "io/channel-file.h"
+#include "sysemu/cpu-timers.h"
+#include "sysemu/runstate.h"
+#include "sysemu/runstate-action.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/replay.h"
+#include "sysemu/xen.h"
+#include "hw/vfio/vfio-common.h"
+#include "hw/virtio/vhost.h"
+
+QEMUFile *qf_file_open(const char *path, int flags, int mode,
+                       const char *name, Error **errp)
+{
+    QIOChannelFile *fioc;
+    QIOChannel *ioc;
+    QEMUFile *f;
+
+    if (flags & O_RDWR) {
+        error_setg(errp, "qf_file_open %s: O_RDWR not supported", path);
+        return NULL;
+    }
+
+    fioc = qio_channel_file_new_path(path, flags, mode, errp);
+    if (!fioc) {
+        return NULL;
+    }
+
+    ioc = QIO_CHANNEL(fioc);
+    qio_channel_set_name(ioc, name);
+    f = (flags & O_WRONLY) ? qemu_fopen_channel_output(ioc) :
+                             qemu_fopen_channel_input(ioc);
+    object_unref(OBJECT(fioc));
+    return f;
+}
+
+void cprsave(const char *file, CprMode mode, Error **errp)
+{
+    int ret;
+    QEMUFile *f;
+    int saved_vm_running = runstate_is_running();
+
+    if (mode == CPR_MODE_REBOOT && qemu_ram_volatile(errp)) {
+        return;
+    }
+
+    if (migrate_colo_enabled()) {
+        error_setg(errp, "cprsave does not support x-colo");
+        return;
+    }
+
+    if (replay_mode != REPLAY_MODE_NONE) {
+        error_setg(errp, "cprsave does not support replay");
+        return;
+    }
+
+    f = qf_file_open(file, O_CREAT | O_WRONLY | O_TRUNC, 0600, "cprsave", errp);
+    if (!f) {
+        return;
+    }
+
+    if (global_state_store()) {
+        error_setg(errp, "Error saving global state");
+        qemu_fclose(f);
+        return;
+    }
+    if (runstate_check(RUN_STATE_SUSPENDED)) {
+        /* Update timers_state before saving.  Suspend did not do so. */
+        cpu_disable_ticks();
+    }
+    vm_stop(RUN_STATE_SAVE_VM);
+
+    ret = qemu_save_device_state(f);
+    qemu_fclose(f);
+    if (ret < 0) {
+        error_setg(errp, "Error %d while saving VM state", ret);
+        goto err;
+    }
+
+    goto done;
+
+err:
+    if (saved_vm_running) {
+        vm_start();
+    }
+done:
+    return;
+}
+
+void cprload(const char *file, Error **errp)
+{
+    QEMUFile *f;
+    int ret;
+    RunState state;
+
+    if (runstate_is_running()) {
+        error_setg(errp, "cprload called for a running VM");
+        return;
+    }
+
+    f = qf_file_open(file, O_RDONLY, 0, "cprload", errp);
+    if (!f) {
+        return;
+    }
+
+    if (qemu_get_be32(f) != QEMU_VM_FILE_MAGIC ||
+        qemu_get_be32(f) != QEMU_VM_FILE_VERSION) {
+        error_setg(errp, "%s is not a vmstate file", file);
+        return;
+    }
+
+    ret = qemu_load_device_state(f);
+    qemu_fclose(f);
+    if (ret < 0) {
+        error_setg(errp, "Error %d while loading VM state", ret);
+        return;
+    }
+
+    state = global_state_get_runstate();
+    if (state == RUN_STATE_RUNNING) {
+        vm_start();
+    } else {
+        runstate_set(state);
+        if (runstate_check(RUN_STATE_SUSPENDED)) {
+            qemu_system_start_on_wake_request();
+        }
+    }
+}
diff --git a/migration/meson.build b/migration/meson.build
index 3ecedce..c756374 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -15,6 +15,7 @@ softmmu_ss.add(files(
   'channel.c',
   'colo-failover.c',
   'colo.c',
+  'cpr.c',
   'exec.c',
   'fd.c',
   'global_state.c',
diff --git a/migration/savevm.h b/migration/savevm.h
index 6461342..ce5d710 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -67,5 +67,7 @@ int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
 int qemu_load_device_state(QEMUFile *f);
 int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
         bool in_postcopy, bool inactivate_disks);
+QEMUFile *qf_file_open(const char *path, int flags, int mode,
+                       const char *name, Error **errp);
 
 #endif
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index ce8977c..ccb35cb 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -115,6 +115,8 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_PRELAUNCH, RUN_STATE_RUNNING },
     { RUN_STATE_PRELAUNCH, RUN_STATE_FINISH_MIGRATE },
     { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
+    { RUN_STATE_PRELAUNCH, RUN_STATE_SUSPENDED },
+    { RUN_STATE_PRELAUNCH, RUN_STATE_PAUSED },
 
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_PAUSED },
@@ -334,6 +336,7 @@ void vm_state_notify(bool running, RunState state)
     }
 }
 
+static bool start_on_wake_requested;
 static ShutdownCause reset_requested;
 static ShutdownCause shutdown_requested;
 static int shutdown_signal;
@@ -561,6 +564,11 @@ void qemu_register_suspend_notifier(Notifier *notifier)
     notifier_list_add(&suspend_notifiers, notifier);
 }
 
+void qemu_system_start_on_wake_request(void)
+{
+    start_on_wake_requested = true;
+}
+
 void qemu_system_wakeup_request(WakeupReason reason, Error **errp)
 {
     trace_system_wakeup_request(reason);
@@ -573,7 +581,18 @@ void qemu_system_wakeup_request(WakeupReason reason, Error **errp)
     if (!(wakeup_reason_mask & (1 << reason))) {
         return;
     }
-    runstate_set(RUN_STATE_RUNNING);
+
+    /*
+     * Must call vm_start if it has never been called, to invoke the state
+     * change callbacks for the first time.
+     */
+    if (start_on_wake_requested) {
+        start_on_wake_requested = false;
+        vm_start();
+    } else {
+        runstate_set(RUN_STATE_RUNNING);
+    }
+
     wakeup_reason = reason;
     qemu_notify_event();
 }
-- 
1.8.3.1




* [PATCH V4 03/25] cpr: QMP interfaces for reboot
From: Steve Sistare @ 2021-07-06 16:45 UTC
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

cprsave calls cprsave().  Syntax:
  { 'enum': 'CprMode', 'data': [ 'reboot' ] }
  { 'command': 'cprsave', 'data': { 'file': 'str', 'mode': 'CprMode' } }

cprload calls cprload().  Syntax:
  { 'command': 'cprload', 'data': { 'file': 'str' } }

cprinfo returns a list of supported modes.  Syntax:
  { 'struct': 'CprInfo', 'data': { 'modes': [ 'CprMode' ] } }
  { 'command': 'cprinfo', 'returns': 'CprInfo' }
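A QMP client sends these commands as JSON objects of the form { "execute": ..., "arguments": ... }, after the usual qmp_capabilities handshake.  The sketch below formats a cprsave request; the helper name is hypothetical, and it rejects rather than escapes quotes and backslashes, so a real client should build the message with a JSON library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a QMP cprsave request into buf.  Returns the message length,
 * or -1 if an argument needs JSON escaping or buf is too small. */
static int qmp_cprsave_request(char *buf, size_t len,
                               const char *file, const char *mode)
{
    int n;

    assert(buf != NULL && file != NULL && mode != NULL);
    if (strpbrk(file, "\"\\") || strpbrk(mode, "\"\\")) {
        return -1;                      /* sketch: no escaping support */
    }
    n = snprintf(buf, len,
                 "{ \"execute\": \"cprsave\", "
                 "\"arguments\": { \"file\": \"%s\", \"mode\": \"%s\" } }",
                 file, mode);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```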

Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 MAINTAINERS           |  1 +
 monitor/qmp-cmds.c    | 31 +++++++++++++++++++++
 qapi/cpr.json         | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi/meson.build      |  1 +
 qapi/qapi-schema.json |  1 +
 5 files changed, 108 insertions(+)
 create mode 100644 qapi/cpr.json

diff --git a/MAINTAINERS b/MAINTAINERS
index d337446..d4d1e9d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2849,6 +2849,7 @@ M: Mark Kanda <mark.kanda@oracle.com>
 S: Maintained
 F: include/migration/cpr.h
 F: migration/cpr.c
+F: qapi/cpr.json
 
 Record/replay
 M: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index f7d64a6..1128604 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -37,9 +37,11 @@
 #include "qapi/qapi-commands-machine.h"
 #include "qapi/qapi-commands-misc.h"
 #include "qapi/qapi-commands-ui.h"
+#include "qapi/qapi-commands-cpr.h"
 #include "qapi/qmp/qerror.h"
 #include "hw/mem/memory-device.h"
 #include "hw/acpi/acpi_dev_interface.h"
+#include "migration/cpr.h"
 
 NameInfo *qmp_query_name(Error **errp)
 {
@@ -153,6 +155,35 @@ void qmp_cont(Error **errp)
     }
 }
 
+CprInfo *qmp_cprinfo(Error **errp)
+{
+    CprInfo *cprinfo;
+    CprModeList *mode, *mode_list = NULL;
+    CprMode i;
+
+    cprinfo = g_malloc0(sizeof(*cprinfo));
+
+    for (i = 0; i < CPR_MODE__MAX; i++) {
+        mode = g_malloc0(sizeof(*mode));
+        mode->value = i;
+        mode->next = mode_list;
+        mode_list = mode;
+    }
+
+    cprinfo->modes = mode_list;
+    return cprinfo;
+}
+
+void qmp_cprsave(const char *file, CprMode mode, Error **errp)
+{
+    cprsave(file, mode, errp);
+}
+
+void qmp_cprload(const char *file, Error **errp)
+{
+    cprload(file, errp);
+}
+
 void qmp_system_wakeup(Error **errp)
 {
     if (!qemu_wakeup_suspend_enabled()) {
diff --git a/qapi/cpr.json b/qapi/cpr.json
new file mode 100644
index 0000000..b6fdc89
--- /dev/null
+++ b/qapi/cpr.json
@@ -0,0 +1,74 @@
+# -*- Mode: Python -*-
+#
+# Copyright (c) 2021 Oracle and/or its affiliates.
+#
+# This work is licensed under the terms of the GNU GPL, version 2.
+# See the COPYING file in the top-level directory.
+
+##
+# = CPR
+##
+
+{ 'include': 'common.json' }
+
+##
+# @CprMode:
+#
+# @reboot: checkpoint can be cprload'ed after a host kexec reboot.
+#
+# Since: 6.1
+##
+{ 'enum': 'CprMode',
+  'data': [ 'reboot' ] }
+
+
+##
+# @CprInfo:
+#
+# @modes: @CprMode list
+#
+# Since: 6.1
+##
+{ 'struct': 'CprInfo',
+  'data': { 'modes': [ 'CprMode' ] } }
+
+##
+# @cprinfo:
+#
+# Returns the modes supported by @cprsave.
+#
+# Returns: @CprInfo
+#
+# Since: 6.1
+#
+##
+{ 'command': 'cprinfo',
+  'returns': 'CprInfo' }
+
+##
+# @cprsave:
+#
+# Create a checkpoint of the virtual machine device state in @file.
+# Guest RAM and guest block device blocks are not saved.
+#
+# @file: name of checkpoint file
+# @mode: @CprMode mode
+#
+# Since: 6.1
+##
+{ 'command': 'cprsave',
+  'data': { 'file': 'str',
+            'mode': 'CprMode' } }
+
+##
+# @cprload:
+#
+# Start virtual machine from checkpoint file that was created earlier using
+# the cprsave command.
+#
+# @file: name of checkpoint file
+#
+# Since: 6.1
+##
+{ 'command': 'cprload',
+  'data': { 'file': 'str' } }
diff --git a/qapi/meson.build b/qapi/meson.build
index 376f4ce..7e7c48a 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -26,6 +26,7 @@ qapi_all_modules = [
   'common',
   'compat',
   'control',
+  'cpr',
   'crypto',
   'dump',
   'error',
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index 4912b97..001d790 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -77,6 +77,7 @@
 { 'include': 'ui.json' }
 { 'include': 'authz.json' }
 { 'include': 'migration.json' }
+{ 'include': 'cpr.json' }
 { 'include': 'transaction.json' }
 { 'include': 'trace.json' }
 { 'include': 'compat.json' }
-- 
1.8.3.1




* [PATCH V4 04/25] cpr: HMP interfaces for reboot
From: Steve Sistare @ 2021-07-06 16:45 UTC
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

cprsave <file> <mode>
  Call cprsave().
  Arguments:
    file: save vmstate to this file name
    mode: must be "reboot"

cprload <file>
  Call cprload().
  Arguments:
    file: load vmstate from this file name

cprinfo
  Print to stdout a space-delimited list of modes supported by cprsave.
  Arguments: none
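A script or management tool driving HMP might check the cprinfo output before choosing a cprsave mode.  A hypothetical helper (not part of the series) that matches a mode as a whole word in that space-delimited list, tolerating a trailing newline:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if 'mode' appears as a whole word in 'list', the
 * space-delimited output of HMP cprinfo (e.g. "reboot restart"). */
static bool cpr_mode_listed(const char *list, const char *mode)
{
    size_t mlen;
    const char *p;

    assert(list != NULL && mode != NULL);
    mlen = strlen(mode);
    if (mlen == 0) {
        return false;
    }
    for (p = strstr(list, mode); p != NULL; p = strstr(p + 1, mode)) {
        bool starts = (p == list) || p[-1] == ' ';
        bool ends = p[mlen] == '\0' || p[mlen] == ' ' || p[mlen] == '\n';

        if (starts && ends) {
            return true;
        }
    }
    return false;
}
```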

Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hmp-commands.hx       | 44 ++++++++++++++++++++++++++++++++++++++++++++
 include/monitor/hmp.h |  3 +++
 monitor/hmp-cmds.c    | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 95 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 435c591..015faae 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -351,6 +351,50 @@ SRST
 ERST
 
     {
+        .name       = "cprinfo",
+        .args_type  = "",
+        .params     = "",
+        .help       = "return list of modes supported by cprsave",
+        .cmd        = hmp_cprinfo,
+    },
+
+SRST
+``cprinfo``
+Return a space-delimited list of modes supported by cprsave.
+ERST
+
+    {
+        .name       = "cprsave",
+        .args_type  = "file:s,mode:s",
+        .params     = "file 'reboot'",
+        .help       = "create a checkpoint of the VM in file",
+        .cmd        = hmp_cprsave,
+    },
+
+SRST
+``cprsave`` *file* *mode*
+Pause the VCPUs, create a checkpoint of the whole virtual machine,
+and save it in *file*.
+If *mode* is 'reboot', the checkpoint remains valid after a host kexec
+reboot, and guest ram must be backed by persistent shared memory.  To
+resume from the checkpoint, issue the quit command, reboot the system,
+and issue the cprload command.
+ERST
+
+    {
+        .name       = "cprload",
+        .args_type  = "file:s",
+        .params     = "file",
+        .help       = "load VM checkpoint from file",
+        .cmd        = hmp_cprload,
+    },
+
+SRST
+``cprload`` *file*
+Load a virtual machine from checkpoint file *file* and continue VCPUs.
+ERST
+
+    {
         .name       = "delvm",
         .args_type  = "name:s",
         .params     = "tag",
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 605d572..e4ebdf1 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -58,6 +58,9 @@ void hmp_balloon(Monitor *mon, const QDict *qdict);
 void hmp_loadvm(Monitor *mon, const QDict *qdict);
 void hmp_savevm(Monitor *mon, const QDict *qdict);
 void hmp_delvm(Monitor *mon, const QDict *qdict);
+void hmp_cprinfo(Monitor *mon, const QDict *qdict);
+void hmp_cprsave(Monitor *mon, const QDict *qdict);
+void hmp_cprload(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_continue(Monitor *mon, const QDict *qdict);
 void hmp_migrate_incoming(Monitor *mon, const QDict *qdict);
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 0ad5b77..e115a23 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -33,6 +33,7 @@
 #include "qapi/qapi-commands-block.h"
 #include "qapi/qapi-commands-char.h"
 #include "qapi/qapi-commands-control.h"
+#include "qapi/qapi-commands-cpr.h"
 #include "qapi/qapi-commands-machine.h"
 #include "qapi/qapi-commands-migration.h"
 #include "qapi/qapi-commands-misc.h"
@@ -1173,6 +1174,53 @@ void hmp_announce_self(Monitor *mon, const QDict *qdict)
     qapi_free_AnnounceParameters(params);
 }
 
+void hmp_cprinfo(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    CprInfo *cprinfo;
+    CprModeList *mode;
+
+    cprinfo = qmp_cprinfo(&err);
+    if (err) {
+        goto out;
+    }
+
+    for (mode = cprinfo->modes; mode; mode = mode->next) {
+        monitor_printf(mon, "%s ", CprMode_str(mode->value));
+    }
+
+out:
+    hmp_handle_error(mon, err);
+    qapi_free_CprInfo(cprinfo);
+}
+
+void hmp_cprsave(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    const char *mode;
+    int val;
+
+    mode = qdict_get_try_str(qdict, "mode");
+    val = qapi_enum_parse(&CprMode_lookup, mode, -1, &err);
+
+    if (val == -1) {
+        goto out;
+    }
+
+    qmp_cprsave(qdict_get_try_str(qdict, "file"), val, &err);
+
+out:
+    hmp_handle_error(mon, err);
+}
+
+void hmp_cprload(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+
+    qmp_cprload(qdict_get_try_str(qdict, "file"), &err);
+    hmp_handle_error(mon, err);
+}
+
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict)
 {
     qmp_migrate_cancel(NULL);
-- 
1.8.3.1



* [PATCH V4 05/25] as_flat_walk
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (3 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 04/25] cpr: HMP " Steve Sistare
@ 2021-07-06 16:45 ` Steve Sistare
  2021-07-06 16:45 ` [PATCH V4 06/25] oslib: qemu_clr_cloexec Steve Sistare
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-07-06 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add an iterator over the sections of a flattened address space.
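
The callback contract can be modeled outside QEMU.  In this minimal sketch, SectionSketch stands in for MemoryRegionSection and walk_sections() for as_flat_walk(); all names are hypothetical and the Error ** argument is omitted.  As in the patch, the walk stops at the first non-zero callback result and returns it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one MemoryRegionSection of a flat view. */
typedef struct {
    uint64_t start;
    uint64_t size;
    int is_ram;
} SectionSketch;

typedef int (*section_walk_cb)(const SectionSketch *s, void *handle);

/* Visit sections in order; stop at the first non-zero callback result
 * and return it, else return 0. */
int walk_sections(const SectionSketch *secs, size_t n,
                  section_walk_cb func, void *handle)
{
    for (size_t i = 0; i < n; i++) {
        int ret = func(&secs[i], handle);

        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Callback: add the size of each RAM section to *(uint64_t *)handle. */
int sum_ram(const SectionSketch *s, void *handle)
{
    if (s->is_ram) {
        *(uint64_t *)handle += s->size;
    }
    return 0;
}

/* Callback: count visits in *(int *)handle; fail on a non-RAM section. */
int require_ram(const SectionSketch *s, void *handle)
{
    ++*(int *)handle;
    return s->is_ram ? 0 : -1;
}
```

The handle pointer carries caller state (an accumulator here), which is how a real callback would collect per-section information without globals.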

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 include/exec/memory.h | 17 +++++++++++++++++
 softmmu/memory.c      | 18 ++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index e8b71d8..d87c059 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2003,6 +2003,23 @@ bool memory_region_present(MemoryRegion *container, hwaddr addr);
  */
 bool memory_region_is_mapped(MemoryRegion *mr);
 
+typedef int (*qemu_flat_walk_cb)(MemoryRegionSection *s,
+                                 void *handle,
+                                 Error **errp);
+
+/**
+ * as_flat_walk: walk the ranges in the address space flat view and call @func
+ * for each.  Return 0 if every call returns 0; otherwise stop at the first
+ * non-zero value returned by @func and return it, with a message in @errp.
+ *
+ * @as: target address space
+ * @func: callback function
+ * @handle: passed to @func
+ * @errp: passed to @func
+ */
+int as_flat_walk(AddressSpace *as, qemu_flat_walk_cb func,
+                 void *handle, Error **errp);
+
 /**
  * memory_region_find: translate an address/size relative to a
  * MemoryRegion into a #MemoryRegionSection.
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7197559..b2d5092 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2570,6 +2570,24 @@ bool memory_region_is_mapped(MemoryRegion *mr)
     return mr->container ? true : false;
 }
 
+int as_flat_walk(AddressSpace *as, qemu_flat_walk_cb func,
+                 void *handle, Error **errp)
+{
+    FlatView *view = address_space_get_flatview(as);
+    FlatRange *fr;
+    int ret = 0;
+
+    FOR_EACH_FLAT_RANGE(fr, view) {
+        MemoryRegionSection section = section_from_flat_range(fr, view);
+        ret = func(&section, handle, errp);
+        if (ret) {
+            break;
+        }
+    }
+    flatview_unref(view);
+    return ret;
+}
+
 /* Same as memory_region_find, but it does not add a reference to the
  * returned region.  It must be called from an RCU critical section.
  */
-- 
1.8.3.1



* [PATCH V4 06/25] oslib: qemu_clr_cloexec
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (4 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 05/25] as_flat_walk Steve Sistare
@ 2021-07-06 16:45 ` Steve Sistare
  2021-07-06 16:45 ` [PATCH V4 07/25] machine: memfd-alloc option Steve Sistare
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-07-06 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Define qemu_clr_cloexec, analogous to qemu_set_cloexec.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 include/qemu/osdep.h | 1 +
 util/oslib-posix.c   | 9 +++++++++
 util/oslib-win32.c   | 4 ++++
 3 files changed, 14 insertions(+)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index cb2a07e..de06e60 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -577,6 +577,7 @@ static inline void qemu_timersub(const struct timeval *val1,
 #endif
 
 void qemu_set_cloexec(int fd);
+void qemu_clr_cloexec(int fd);
 
 /* Starting on QEMU 2.5, qemu_hw_version() returns "2.5+" by default
  * instead of QEMU_VERSION, so setting hw_version on MachineClass
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 36820fe..ac9229d 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -317,6 +317,15 @@ void qemu_set_cloexec(int fd)
     assert(f != -1);
 }
 
+void qemu_clr_cloexec(int fd)
+{
+    int f;
+    f = fcntl(fd, F_GETFD);
+    assert(f != -1);
+    f = fcntl(fd, F_SETFD, f & ~FD_CLOEXEC);
+    assert(f != -1);
+}
+
 /*
  * Creates a pipe with FD_CLOEXEC set on both file descriptors
  */
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index f68b801..b5c53b3 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -251,6 +251,10 @@ void qemu_set_cloexec(int fd)
 {
 }
 
+void qemu_clr_cloexec(int fd)
+{
+}
+
 /* Offset between 1/1/1601 and 1/1/1970 in 100 nanosec units */
 #define _W32_FT_OFFSET (116444736000000000ULL)
 
-- 
1.8.3.1



* [PATCH V4 07/25] machine: memfd-alloc option
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (5 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 06/25] oslib: qemu_clr_cloexec Steve Sistare
@ 2021-07-06 16:45 ` Steve Sistare
  2021-07-06 16:45 ` [PATCH V4 08/25] vl: add helper to request re-exec Steve Sistare
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-07-06 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Allocate anonymous memory using memfd_create if the memfd-alloc machine
option is set.
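
The underlying Linux mechanism can be sketched directly.  Create a memfd, size it with ftruncate(), and map it MAP_SHARED, so the memory is reachable through a file descriptor rather than only through a private anonymous mapping.  This is a minimal sketch assuming Linux with glibc 2.27 or later.  memfd_alloc() is a hypothetical helper, not QEMU's qemu_memfd_create()/file_ram_alloc() path, and it ignores alignment and hugepages.

```c
#define _GNU_SOURCE             /* for memfd_create() in glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Allocate size bytes of shared memory backed by a memfd.  On success,
 * return the mapping and store the descriptor in *fdp; on failure,
 * return NULL.
 */
void *memfd_alloc(const char *name, size_t size, int *fdp)
{
    int fd = memfd_create(name, MFD_CLOEXEC);
    void *addr;

    if (fd < 0) {
        return NULL;
    }
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return NULL;
    }
    addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fdp = fd;
    return addr;
}
```

Because the pages belong to the memfd, any other mapping of the same descriptor observes the same contents.  This includes a mapping made by a process that inherits the descriptor, and the restart mode later in this series relies on that property.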

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hw/core/machine.c   | 19 +++++++++++++++++++
 include/hw/boards.h |  1 +
 qemu-options.hx     |  5 +++++
 softmmu/physmem.c   | 41 ++++++++++++++++++++++++++++++++---------
 trace-events        |  1 +
 util/qemu-config.c  |  4 ++++
 6 files changed, 62 insertions(+), 9 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 40def78..3ce5303 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -375,6 +375,20 @@ static void machine_set_mem_merge(Object *obj, bool value, Error **errp)
     ms->mem_merge = value;
 }
 
+static bool machine_get_memfd_alloc(Object *obj, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    return ms->memfd_alloc;
+}
+
+static void machine_set_memfd_alloc(Object *obj, bool value, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    ms->memfd_alloc = value;
+}
+
 static bool machine_get_usb(Object *obj, Error **errp)
 {
     MachineState *ms = MACHINE(obj);
@@ -858,6 +872,11 @@ static void machine_class_init(ObjectClass *oc, void *data)
     object_class_property_set_description(oc, "mem-merge",
         "Enable/disable memory merge support");
 
+    object_class_property_add_bool(oc, "memfd-alloc",
+        machine_get_memfd_alloc, machine_set_memfd_alloc);
+    object_class_property_set_description(oc, "memfd-alloc",
+        "Enable/disable allocating anonymous memory using memfd_create");
+
     object_class_property_add_bool(oc, "usb",
         machine_get_usb, machine_set_usb);
     object_class_property_set_description(oc, "usb",
diff --git a/include/hw/boards.h b/include/hw/boards.h
index ad6c8fd..dceb7f7 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -305,6 +305,7 @@ struct MachineState {
     char *dt_compatible;
     bool dump_guest_core;
     bool mem_merge;
+    bool memfd_alloc;
     bool usb;
     bool usb_disabled;
     char *firmware;
diff --git a/qemu-options.hx b/qemu-options.hx
index fd21002..3392ac0 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -30,6 +30,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
     "                vmport=on|off|auto controls emulation of vmport (default: auto)\n"
     "                dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
     "                mem-merge=on|off controls memory merge support (default: on)\n"
+    "                memfd-alloc=on|off controls allocating anonymous memory using memfd_create (default: off)\n"
     "                aes-key-wrap=on|off controls support for AES key wrapping (default=on)\n"
     "                dea-key-wrap=on|off controls support for DEA key wrapping (default=on)\n"
     "                suppress-vmdesc=on|off disables self-describing migration (default=off)\n"
@@ -76,6 +77,10 @@ SRST
         supported by the host, de-duplicates identical memory pages
         among VMs instances (enabled by default).
 
+    ``memfd-alloc=on|off``
+        Enables or disables allocation of anonymous memory using memfd_create
+        (disabled by default).
+
     ``aes-key-wrap=on|off``
         Enables or disables AES key wrapping support on s390-ccw hosts.
         This feature controls whether AES wrapping keys will be created
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 85034d9..695aa10 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -67,6 +67,7 @@
 
 #include "qemu/pmem.h"
 
+#include "qemu/memfd.h"
 #include "migration/vmstate.h"
 
 #include "qemu/range.h"
@@ -1931,35 +1932,57 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
 {
     RAMBlock *block;
     RAMBlock *last_block = NULL;
+    struct MemoryRegion *mr = new_block->mr;
     ram_addr_t old_ram_size, new_ram_size;
     Error *err = NULL;
+    const char *name;
+    void *addr = 0;
+    size_t maxlen;
+    MachineState *ms = MACHINE(qdev_get_machine());
 
     old_ram_size = last_ram_page();
 
     qemu_mutex_lock_ramlist();
-    new_block->offset = find_ram_offset(new_block->max_length);
+    maxlen = new_block->max_length;
+    new_block->offset = find_ram_offset(maxlen);
 
     if (!new_block->host) {
         if (xen_enabled()) {
-            xen_ram_alloc(new_block->offset, new_block->max_length,
-                          new_block->mr, &err);
+            xen_ram_alloc(new_block->offset, maxlen, new_block->mr, &err);
             if (err) {
                 error_propagate(errp, err);
                 qemu_mutex_unlock_ramlist();
                 return;
             }
         } else {
-            new_block->host = qemu_anon_ram_alloc(new_block->max_length,
-                                                  &new_block->mr->align,
-                                                  shared);
-            if (!new_block->host) {
+            name = memory_region_name(new_block->mr);
+            if (ms->memfd_alloc) {
+                int mfd = -1;          /* placeholder until next patch */
+                mr->align = QEMU_VMALLOC_ALIGN;
+                if (mfd < 0) {
+                    mfd = qemu_memfd_create(name, maxlen + mr->align,
+                                            0, 0, 0, &err);
+                    if (mfd < 0) {
+                        return;
+                    }
+                }
+                new_block->flags |= RAM_SHARED;
+                addr = file_ram_alloc(new_block, maxlen, mfd,
+                                      false, false, 0, errp);
+                trace_anon_memfd_alloc(name, maxlen, addr, mfd);
+            } else {
+                addr = qemu_anon_ram_alloc(maxlen, &mr->align, shared);
+            }
+
+            if (!addr) {
                 error_setg_errno(errp, errno,
                                  "cannot set up guest memory '%s'",
-                                 memory_region_name(new_block->mr));
+                                 name);
                 qemu_mutex_unlock_ramlist();
                 return;
             }
-            memory_try_enable_merging(new_block->host, new_block->max_length);
+            memory_try_enable_merging(addr, maxlen);
+            new_block->host = addr;
         }
     }
 
diff --git a/trace-events b/trace-events
index ac7cef9..99e8208 100644
--- a/trace-events
+++ b/trace-events
@@ -40,6 +40,7 @@ ram_block_discard_range(const char *rbname, void *hva, size_t length, bool need_
 # accel/tcg/cputlb.c
 memory_notdirty_write_access(uint64_t vaddr, uint64_t ram_addr, unsigned size) "0x%" PRIx64 " ram_addr 0x%" PRIx64 " size %u"
 memory_notdirty_set_dirty(uint64_t vaddr) "0x%" PRIx64
+anon_memfd_alloc(const char *name, size_t size, void *ptr, int fd) "%s size %zu ptr %p fd %d"
 
 # gdbstub.c
 gdbstub_op_start(const char *device) "Starting gdbstub using device %s"
diff --git a/util/qemu-config.c b/util/qemu-config.c
index 670bd6e..135ec3b 100644
--- a/util/qemu-config.c
+++ b/util/qemu-config.c
@@ -205,6 +205,10 @@ static QemuOptsList machine_opts = {
             .type = QEMU_OPT_BOOL,
             .help = "enable/disable memory merge support",
         },{
+            .name = "memfd-alloc",
+            .type = QEMU_OPT_BOOL,
+            .help = "enable/disable memfd_create for anonymous memory",
+        },{
             .name = "usb",
             .type = QEMU_OPT_BOOL,
             .help = "Set on/off to enable/disable usb",
-- 
1.8.3.1



* [PATCH V4 08/25] vl: add helper to request re-exec
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (6 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 07/25] machine: memfd-alloc option Steve Sistare
@ 2021-07-06 16:45 ` Steve Sistare
  2021-07-06 16:45 ` [PATCH V4 09/25] string to strList Steve Sistare
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-07-06 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add a qemu_system_exec_request() hook that causes the main loop to exit and
re-exec qemu using the specified arguments.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 include/sysemu/runstate.h |  1 +
 softmmu/runstate.c        | 37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index ed4b735..e1ae7e5 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -57,6 +57,7 @@ void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
 void qemu_register_wakeup_notifier(Notifier *notifier);
 void qemu_register_wakeup_support(void);
 void qemu_system_shutdown_request(ShutdownCause reason);
+void qemu_system_exec_request(strList *args);
 void qemu_system_powerdown_request(void);
 void qemu_register_powerdown_notifier(Notifier *notifier);
 void qemu_register_shutdown_notifier(Notifier *notifier);
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index ccb35cb..554dfc6 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -354,6 +354,7 @@ static NotifierList wakeup_notifiers =
 static NotifierList shutdown_notifiers =
     NOTIFIER_LIST_INITIALIZER(shutdown_notifiers);
 static uint32_t wakeup_reason_mask = ~(1 << QEMU_WAKEUP_REASON_NONE);
+static char **exec_argv;
 
 ShutdownCause qemu_shutdown_requested_get(void)
 {
@@ -370,6 +371,11 @@ static int qemu_shutdown_requested(void)
     return qatomic_xchg(&shutdown_requested, SHUTDOWN_CAUSE_NONE);
 }
 
+static int qemu_exec_requested(void)
+{
+    return exec_argv != NULL;
+}
+
 static void qemu_kill_report(void)
 {
     if (!qtest_driver() && shutdown_signal) {
@@ -644,6 +650,32 @@ void qemu_system_shutdown_request(ShutdownCause reason)
     qemu_notify_event();
 }
 
+static char **make_argv(strList *args)
+{
+    strList *arg;
+    char **argv;
+    int n = 1, i = 0;
+
+    for (arg = args; arg != NULL; arg = arg->next) {
+        n++;
+    }
+
+    argv = g_malloc(n * sizeof(char *));
+    for (arg = args; arg != NULL; arg = arg->next) {
+        argv[i++] = g_strdup(arg->value);
+    }
+    argv[i] = NULL;
+
+    return argv;
+}
+
+void qemu_system_exec_request(strList *args)
+{
+    exec_argv = make_argv(args);
+    shutdown_requested = 1;
+    qemu_notify_event();
+}
+
 static void qemu_system_powerdown(void)
 {
     qapi_event_send_powerdown();
@@ -692,6 +724,11 @@ static bool main_loop_should_exit(void)
     }
     request = qemu_shutdown_requested();
     if (request) {
+
+        if (qemu_exec_requested()) {
+            execvp(exec_argv[0], exec_argv);
+            error_setg_errno(&error_fatal, errno, "execvp failed");
+        }
         qemu_kill_report();
         qemu_system_shutdown(request);
         if (shutdown_action == SHUTDOWN_ACTION_PAUSE) {
-- 
1.8.3.1



* [PATCH V4 09/25] string to strList
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (7 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 08/25] vl: add helper to request re-exec Steve Sistare
@ 2021-07-06 16:45 ` Steve Sistare
  2021-07-06 16:45 ` [PATCH V4 10/25] util: env var helpers Steve Sistare
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-07-06 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Generalize strList_from_comma_list to take any delimiter character.
No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 monitor/hmp-cmds.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index e115a23..f902dcf 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -71,21 +71,21 @@ void hmp_handle_error(Monitor *mon, Error *err)
 }
 
 /*
- * Produce a strList from a comma separated list.
- * A NULL or empty input string return NULL.
+ * Produce a strList from a character delimited string.
+ * A NULL or empty input string returns NULL.
  */
-static strList *strList_from_comma_list(const char *in)
+static strList *strList_from_string(const char *in, char delim)
 {
     strList *res = NULL;
     strList **tail = &res;
 
     while (in && in[0]) {
-        char *comma = strchr(in, ',');
+        char *next = strchr(in, delim);
         char *value;
 
-        if (comma) {
-            value = g_strndup(in, comma - in);
-            in = comma + 1; /* skip the , */
+        if (next) {
+            value = g_strndup(in, next - in);
+            in = next + 1; /* skip the delim */
         } else {
             value = g_strdup(in);
             in = NULL;
@@ -1166,7 +1166,7 @@ void hmp_announce_self(Monitor *mon, const QDict *qdict)
                                             migrate_announce_params());
 
     qapi_free_strList(params->interfaces);
-    params->interfaces = strList_from_comma_list(interfaces_str);
+    params->interfaces = strList_from_string(interfaces_str, ',');
     params->has_interfaces = params->interfaces != NULL;
     params->id = g_strdup(id);
     params->has_id = !!params->id;
-- 
1.8.3.1



* [PATCH V4 10/25] util: env var helpers
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (8 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 09/25] string to strList Steve Sistare
@ 2021-07-06 16:45 ` Steve Sistare
  2021-07-06 16:45 ` [PATCH V4 11/25] cpr: restart mode Steve Sistare
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-07-06 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add functions for saving fd's and other values in the environment via
setenv, and for reading them back via getenv.
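
The round trip these helpers implement can be sketched with only setenv(), getenv() and strtol().  env_save_fd() and env_load_fd() are hypothetical names, and the sketch rejects values that are empty, have trailing characters, or fall outside the int range.  Names such as "pc.ram" are accepted because setenv() only forbids names that are empty or contain '='.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Same prefix value the patch uses for saved descriptors. */
#define FD_ENV_PREFIX "QEMU_FD_"

/* Store fd as the decimal value of QEMU_FD_<name>; returns 0 or -1. */
int env_save_fd(const char *name, int fd)
{
    char var[80], val[16];

    snprintf(var, sizeof(var), "%s%s", FD_ENV_PREFIX, name);
    snprintf(val, sizeof(val), "%d", fd);
    return setenv(var, val, 1);
}

/* Return the fd stored under name, or -1 if missing or malformed. */
int env_load_fd(const char *name)
{
    char var[80];
    char *end;
    const char *val;
    long fd;

    snprintf(var, sizeof(var), "%s%s", FD_ENV_PREFIX, name);
    val = getenv(var);
    if (!val || !*val) {
        return -1;
    }
    errno = 0;
    fd = strtol(val, &end, 10);
    if (errno || *end || fd < 0 || fd > INT_MAX) {
        return -1;
    }
    return (int)fd;
}
```

Because the environment is inherited across execve(), a value saved this way before exec can be read back by the new process image.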

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 MAINTAINERS        |  2 ++
 include/qemu/env.h | 23 +++++++++++++
 util/env.c         | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 util/meson.build   |  1 +
 4 files changed, 121 insertions(+)
 create mode 100644 include/qemu/env.h
 create mode 100644 util/env.c

diff --git a/MAINTAINERS b/MAINTAINERS
index d4d1e9d..b844f3f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2850,6 +2850,8 @@ S: Maintained
 F: include/migration/cpr.h
 F: migration/cpr.c
 F: qapi/cpr.json
+F: include/qemu/env.h
+F: util/env.c
 
 Record/replay
 M: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
diff --git a/include/qemu/env.h b/include/qemu/env.h
new file mode 100644
index 0000000..3dad503
--- /dev/null
+++ b/include/qemu/env.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_ENV_H
+#define QEMU_ENV_H
+
+#define FD_PREFIX "QEMU_FD_"
+
+typedef int (*walkenv_cb)(const char *name, const char *val, void *handle);
+
+int getenv_fd(const char *name);
+void setenv_fd(const char *name, int fd);
+void unsetenv_fd(const char *name);
+void unsetenv_fdv(const char *fmt, ...);
+int walkenv(const char *prefix, walkenv_cb cb, void *handle);
+void printenv(void);
+
+#endif
diff --git a/util/env.c b/util/env.c
new file mode 100644
index 0000000..863678d
--- /dev/null
+++ b/util/env.c
@@ -0,0 +1,95 @@
+/*
+ * Copyright (c) 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/cutils.h"
+#include "qemu/env.h"
+
+static uint64_t getenv_ulong(const char *prefix, const char *name, int *err)
+{
+    char var[80], *val;
+    uint64_t res = 0;
+
+    snprintf(var, sizeof(var), "%s%s", prefix, name);
+    val = getenv(var);
+    if (val) {
+        *err = qemu_strtoul(val, NULL, 10, &res);
+    } else {
+        *err = -ENOENT;
+    }
+    return res;
+}
+
+static void setenv_ulong(const char *prefix, const char *name, uint64_t val)
+{
+    char var[80], val_str[80];
+    snprintf(var, sizeof(var), "%s%s", prefix, name);
+    snprintf(val_str, sizeof(val_str), "%"PRIu64, val);
+    setenv(var, val_str, 1);
+}
+
+static void unsetenv_ulong(const char *prefix, const char *name)
+{
+    char var[80];
+    snprintf(var, sizeof(var), "%s%s", prefix, name);
+    unsetenv(var);
+}
+
+int getenv_fd(const char *name)
+{
+    int err;
+    int fd = getenv_ulong(FD_PREFIX, name, &err);
+    return err ? -1 : fd;
+}
+
+void setenv_fd(const char *name, int fd)
+{
+    setenv_ulong(FD_PREFIX, name, fd);
+}
+
+void unsetenv_fd(const char *name)
+{
+    unsetenv_ulong(FD_PREFIX, name);
+}
+
+void unsetenv_fdv(const char *fmt, ...)
+{
+    va_list args;
+    char buf[80];
+    va_start(args, fmt);
+    vsnprintf(buf, sizeof(buf), fmt, args);
+    va_end(args);
+}
+
+int walkenv(const char *prefix, walkenv_cb cb, void *handle)
+{
+    char *str, name[128];
+    char **envp = environ;
+    size_t prefix_len = strlen(prefix);
+
+    while (*envp) {
+        str = *envp++;
+        if (!strncmp(str, prefix, prefix_len)) {
+            char *val = strchr(str, '=');
+            str += prefix_len;
+            strncpy(name, str, val - str);
+            name[val - str] = 0;
+            if (cb(name, val + 1, handle)) {
+                return 1;
+            }
+        }
+    }
+    return 0;
+}
+
+void printenv(void)
+{
+    char **ptr = environ;
+    while (*ptr) {
+        puts(*ptr++);
+    }
+}
diff --git a/util/meson.build b/util/meson.build
index 510765c..d2d90cc 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -22,6 +22,7 @@ util_ss.add(files('host-utils.c'))
 util_ss.add(files('bitmap.c', 'bitops.c'))
 util_ss.add(files('fifo8.c'))
 util_ss.add(files('cacheinfo.c', 'cacheflush.c'))
+util_ss.add(files('env.c'))
 util_ss.add(files('error.c', 'qemu-error.c'))
 util_ss.add(files('qemu-print.c'))
 util_ss.add(files('id.c'))
-- 
1.8.3.1



* [PATCH V4 11/25] cpr: restart mode
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (9 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 10/25] util: env var helpers Steve Sistare
@ 2021-07-06 16:45 ` Steve Sistare
  2021-07-06 16:45 ` [PATCH V4 12/25] cpr: QMP interfaces for restart Steve Sistare
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-07-06 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Provide the cprsave restart mode, which preserves the guest VM across a
restart of the qemu process.  After cprsave, the caller passes qemu
command-line arguments to cprexec, which directly exec's the new qemu
binary.  The arguments must include -S so new qemu starts in a paused state.
The caller resumes the guest by calling cprload.

To use the restart mode, qemu must be started with the memfd-alloc machine
option.  The memfd's are saved to the environment and kept open across exec,
after which they are found from the environment and re-mmap'd.  Hence guest
ram is preserved in place, albeit with new virtual addresses in the qemu
process.

Support for vfio devices in restart mode is added by a subsequent patch.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 migration/cpr.c   | 21 +++++++++++++++++++++
 softmmu/physmem.c |  6 +++++-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/migration/cpr.c b/migration/cpr.c
index c5bad8a..fb57dec 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -29,6 +29,7 @@
 #include "sysemu/xen.h"
 #include "hw/vfio/vfio-common.h"
 #include "hw/virtio/vhost.h"
+#include "qemu/env.h"
 
 QEMUFile *qf_file_open(const char *path, int flags, int mode,
                               const char *name, Error **errp)
@@ -108,6 +109,26 @@ done:
     return;
 }
 
+static int preserve_fd(const char *name, const char *val, void *handle)
+{
+    qemu_clr_cloexec(atoi(val));
+    return 0;
+}
+
+void cprexec(strList *args, Error **errp)
+{
+    if (xen_enabled()) {
+        error_setg(errp, "xen does not support cprexec");
+        return;
+    }
+    if (!runstate_check(RUN_STATE_SAVE_VM)) {
+        error_setg(errp, "runstate is not save-vm");
+        return;
+    }
+    walkenv(FD_PREFIX, preserve_fd, 0);
+    qemu_system_exec_request(args);
+}
+
 void cprload(const char *file, Error **errp)
 {
     QEMUFile *f;
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 695aa10..b79f408 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -68,6 +68,7 @@
 #include "qemu/pmem.h"
 
 #include "qemu/memfd.h"
+#include "qemu/env.h"
 #include "migration/vmstate.h"
 
 #include "qemu/range.h"
@@ -1957,7 +1958,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
         } else {
             name = memory_region_name(new_block->mr);
             if (ms->memfd_alloc) {
-                int mfd = -1;          /* placeholder until next patch */
+                int mfd = getenv_fd(name);
                 mr->align = QEMU_VMALLOC_ALIGN;
                 if (mfd < 0) {
                     mfd = qemu_memfd_create(name, maxlen + mr->align,
@@ -1965,7 +1966,9 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
                     if (mfd < 0) {
                         return;
                     }
+                    setenv_fd(name, mfd);
                 }
+                qemu_clr_cloexec(mfd);
                 new_block->flags |= RAM_SHARED;
                 addr = file_ram_alloc(new_block, maxlen, mfd,
                                       false, false, 0, errp);
@@ -2214,6 +2217,7 @@ void qemu_ram_free(RAMBlock *block)
     }
 
     qemu_mutex_lock_ramlist();
+    unsetenv_fd(memory_region_name(block->mr));
     QLIST_REMOVE_RCU(block, next);
     ram_list.mru_block = NULL;
     /* Write list before version */
-- 
1.8.3.1
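The descriptor hand-off used in this patch can be tried in isolation. The following is a minimal sketch, assuming each preserved descriptor is recorded as a decimal number in an environment variable named by a fixed prefix plus the object name. The series' own `setenv_fd`, `getenv_fd`, and `FD_PREFIX` live in `qemu/env.h`, which is not shown in this excerpt, and may differ. `SKETCH_FD_PREFIX` and the `sketch_*` functions are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical prefix; the series defines its own FD_PREFIX. */
#define SKETCH_FD_PREFIX "SKETCH_FD_"

/* Build "<prefix><name>" into buf; returns -1 if it does not fit. */
static int sketch_env_name(char *buf, size_t len, const char *name)
{
    int n = snprintf(buf, len, "%s%s", SKETCH_FD_PREFIX, name);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Record fd under name so that the post-exec process can find it. */
static int sketch_setenv_fd(const char *name, int fd)
{
    char var[256], val[16];

    if (sketch_env_name(var, sizeof(var), name) < 0) {
        return -1;
    }
    snprintf(val, sizeof(val), "%d", fd);
    return setenv(var, val, 1);
}

/* Return the fd recorded under name, or -1 if absent or malformed. */
static int sketch_getenv_fd(const char *name)
{
    char var[256];
    const char *val;
    char *end;
    long fd;

    if (sketch_env_name(var, sizeof(var), name) < 0) {
        return -1;
    }
    val = getenv(var);
    if (!val || !*val) {
        return -1;
    }
    fd = strtol(val, &end, 10);
    if (*end != '\0' || fd < 0 || fd > INT_MAX) {
        return -1;
    }
    return (int)fd;
}

/* Clear FD_CLOEXEC so the descriptor survives execve(). */
static int sketch_clr_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}
```

The lookup, then create, then record order mirrors `ram_block_add` above, and `sketch_clr_cloexec` plays the role of `preserve_fd`, which `cprexec` applies to every recorded descriptor just before exec.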




* [PATCH V4 12/25] cpr: QMP interfaces for restart
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (10 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 11/25] cpr: restart mode Steve Sistare
@ 2021-07-06 16:45 ` Steve Sistare
  2021-07-06 16:45 ` [PATCH V4 13/25] cpr: HMP " Steve Sistare
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-07-06 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

The cprexec QMP command calls cprexec().  Syntax:
  { 'command': 'cprexec', 'data': { 'argv': [ 'str' ] } }

Add the restart mode:
  { 'enum': 'CprMode', 'data': [ 'reboot', 'restart' ] }

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 monitor/qmp-cmds.c |  5 +++++
 qapi/cpr.json      | 16 +++++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index 1128604..7326f7d 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -179,6 +179,11 @@ void qmp_cprsave(const char *file, CprMode mode, Error **errp)
     cprsave(file, mode, errp);
 }
 
+void qmp_cprexec(strList *args, Error **errp)
+{
+    cprexec(args, errp);
+}
+
 void qmp_cprload(const char *file, Error **errp)
 {
     cprload(file, errp);
diff --git a/qapi/cpr.json b/qapi/cpr.json
index b6fdc89..2467e48 100644
--- a/qapi/cpr.json
+++ b/qapi/cpr.json
@@ -16,10 +16,12 @@
 #
 # @reboot: checkpoint can be cprload'ed after a host kexec reboot.
 #
+# @restart: checkpoint can be cprload'ed after restarting qemu.
+#
 # Since: 6.1
 ##
 { 'enum': 'CprMode',
-  'data': [ 'reboot' ] }
+  'data': [ 'reboot', 'restart' ] }
 
 
 ##
@@ -61,6 +63,18 @@
             'mode': 'CprMode' } }
 
 ##
+# @cprexec:
+#
+# Restart qemu by directly exec'ing the command line in @argv.
+#
+# @argv: arguments to exec
+#
+# Since: 6.1
+##
+{ 'command': 'cprexec',
+  'data': { 'argv': [ 'str' ] } }
+
+##
 # @cprload:
 #
 # Start virtual machine from checkpoint file that was created earlier using
-- 
1.8.3.1
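A QMP client sends this command as a JSON object whose `arguments` member follows the schema above. Below is a minimal sketch of a client formatting that request by hand, assuming the argument strings are plain ASCII or UTF-8. `build_cprexec_cmd` and the `json_append*` helpers are hypothetical, not QEMU code, and they escape only quotes, backslashes, and control characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append n bytes of s at *pos, keeping buf NUL-terminated; -1 on overflow. */
static int json_append(char *buf, size_t len, size_t *pos,
                       const char *s, size_t n)
{
    if (*pos + n >= len) {
        return -1;
    }
    memcpy(buf + *pos, s, n);
    *pos += n;
    buf[*pos] = '\0';
    return 0;
}

/* Append s as a JSON string literal. */
static int json_append_str(char *buf, size_t len, size_t *pos, const char *s)
{
    if (json_append(buf, len, pos, "\"", 1)) {
        return -1;
    }
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        char esc[8];
        int n;

        if (c == '"' || c == '\\') {
            esc[0] = '\\';
            esc[1] = (char)c;
            n = 2;
        } else if (c < 0x20) {
            n = snprintf(esc, sizeof(esc), "\\u%04x", (unsigned int)c);
        } else {
            esc[0] = (char)c;
            n = 1;
        }
        if (json_append(buf, len, pos, esc, (size_t)n)) {
            return -1;
        }
    }
    return json_append(buf, len, pos, "\"", 1);
}

/* Format a cprexec request; returns its length, or -1 if buf is too small. */
static int build_cprexec_cmd(char *buf, size_t len,
                             const char *const *args, int nargs)
{
    static const char head[] =
        "{\"execute\": \"cprexec\", \"arguments\": {\"argv\": [";
    static const char tail[] = "]}}";
    size_t pos = 0;

    if (len == 0) {
        return -1;
    }
    buf[0] = '\0';
    if (json_append(buf, len, &pos, head, sizeof(head) - 1)) {
        return -1;
    }
    for (int i = 0; i < nargs; i++) {
        if (i > 0 && json_append(buf, len, &pos, ", ", 2)) {
            return -1;
        }
        if (json_append_str(buf, len, &pos, args[i])) {
            return -1;
        }
    }
    if (json_append(buf, len, &pos, tail, sizeof(tail) - 1)) {
        return -1;
    }
    return (int)pos;
}
```

Per the cover letter, the new command line should include `-S` so that the guest does not run before `cprload`.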




* [PATCH V4 13/25] cpr: HMP interfaces for restart
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (11 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 12/25] cpr: QMP interfaces for restart Steve Sistare
@ 2021-07-06 16:45 ` Steve Sistare
  2021-07-06 16:45 ` [PATCH V4 14/25] pci: export functions for cpr Steve Sistare
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-07-06 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

cprsave <file> <mode>
  mode may be "restart"

cprexec <command>
  Call cprexec().
  Arguments:
    command : command line to execute, with space-separated arguments

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hmp-commands.hx       | 20 +++++++++++++++++++-
 include/monitor/hmp.h |  1 +
 monitor/hmp-cmds.c    | 11 +++++++++++
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 015faae..1066fe7 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -366,7 +366,7 @@ ERST
     {
         .name       = "cprsave",
         .args_type  = "file:s,mode:s",
-        .params     = "file 'reboot'",
+        .params     = "file 'restart'|'reboot'",
         .help       = "create a checkpoint of the VM in file",
         .cmd        = hmp_cprsave,
     },
@@ -379,6 +379,24 @@ If *mode* is 'reboot', the checkpoint remains valid after a host kexec
 reboot, and guest ram must be backed by persistent shared memory.  To
 resume from the checkpoint, issue the quit command, reboot the system,
 and issue the cprload command.
+
+If *mode* is 'restart', the checkpoint remains valid after restarting qemu,
+and guest ram must be allocated with the memfd-alloc machine option.  To
+resume from the checkpoint, issue the cprexec command to restart qemu, then
+issue the cprload command.
+ERST
+
+    {
+        .name       = "cprexec",
+        .args_type  = "command:S",
+        .params     = "command",
+        .help       = "Restart qemu by directly exec'ing command",
+        .cmd        = hmp_cprexec,
+    },
+
+SRST
+``cprexec`` *command*
+Restart qemu by directly exec'ing *command*, replacing the qemu process.
 ERST
 
     {
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index e4ebdf1..c26d0d7 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -60,6 +60,7 @@ void hmp_savevm(Monitor *mon, const QDict *qdict);
 void hmp_delvm(Monitor *mon, const QDict *qdict);
 void hmp_cprinfo(Monitor *mon, const QDict *qdict);
 void hmp_cprsave(Monitor *mon, const QDict *qdict);
+void hmp_cprexec(Monitor *mon, const QDict *qdict);
 void hmp_cprload(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_continue(Monitor *mon, const QDict *qdict);
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index f902dcf..adffe80 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1213,6 +1213,17 @@ out:
     hmp_handle_error(mon, err);
 }
 
+void hmp_cprexec(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    const char *command = qdict_get_try_str(qdict, "command");
+    strList *args = strList_from_string(command, ' ');
+
+    qmp_cprexec(args, &err);
+    qapi_free_strList(args);
+    hmp_handle_error(mon, err);
+}
+
 void hmp_cprload(Monitor *mon, const QDict *qdict)
 {
     Error *err = NULL;
-- 
1.8.3.1
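Because `hmp_cprexec` splits *command* on single spaces, there is no quoting, so an argument that itself contains a space cannot be passed through HMP; the QMP `argv` list has no such limitation. A minimal sketch of that kind of splitting, producing a NULL-terminated vector of the shape `execv` expects. `split_in_place` is hypothetical; it skips empty tokens, and whether QEMU's `strList_from_string` does the same is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split s in place on delim, storing token pointers in vec and
 * NULL-terminating it.  Empty tokens (repeated delimiters) are skipped.
 * Returns the token count, or -1 if vec (max entries) is too small.
 */
static int split_in_place(char *s, char delim, char **vec, int max)
{
    int n = 0;
    char *p = s;

    assert(max >= 1);
    while (*p) {
        char *end;

        if (*p == delim) {
            p++;
            continue;
        }
        if (n >= max - 1) {
            return -1;          /* no room left for the NULL terminator */
        }
        vec[n++] = p;
        end = strchr(p, delim);
        if (!end) {
            break;
        }
        *end = '\0';
        p = end + 1;
    }
    vec[n] = NULL;
    return n;
}
```

The second test case below shows the consequence: `'my vm'` arrives as two separate arguments, quote characters included.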




* [PATCH V4 14/25] pci: export functions for cpr
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (12 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 13/25] cpr: HMP " Steve Sistare
@ 2021-07-06 16:45 ` Steve Sistare
  2021-07-06 16:45 ` [PATCH V4 15/25] vfio-pci: refactor " Steve Sistare
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-07-06 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Export msix_is_pending, msix_init_vector_notifiers, and pci_update_mappings
for use by cpr.  No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hw/pci/msix.c         | 20 ++++++++++++++------
 hw/pci/pci.c          |  3 +--
 include/hw/pci/msix.h |  5 +++++
 include/hw/pci/pci.h  |  1 +
 4 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index ae9331c..73f4259 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -64,7 +64,7 @@ static uint8_t *msix_pending_byte(PCIDevice *dev, int vector)
     return dev->msix_pba + vector / 8;
 }
 
-static int msix_is_pending(PCIDevice *dev, int vector)
+int msix_is_pending(PCIDevice *dev, unsigned int vector)
 {
     return *msix_pending_byte(dev, vector) & msix_pending_mask(vector);
 }
@@ -579,6 +579,17 @@ static void msix_unset_notifier_for_vector(PCIDevice *dev, unsigned int vector)
     dev->msix_vector_release_notifier(dev, vector);
 }
 
+void msix_init_vector_notifiers(PCIDevice *dev,
+                                MSIVectorUseNotifier use_notifier,
+                                MSIVectorReleaseNotifier release_notifier,
+                                MSIVectorPollNotifier poll_notifier)
+{
+    assert(use_notifier && release_notifier);
+    dev->msix_vector_use_notifier = use_notifier;
+    dev->msix_vector_release_notifier = release_notifier;
+    dev->msix_vector_poll_notifier = poll_notifier;
+}
+
 int msix_set_vector_notifiers(PCIDevice *dev,
                               MSIVectorUseNotifier use_notifier,
                               MSIVectorReleaseNotifier release_notifier,
@@ -586,11 +597,8 @@ int msix_set_vector_notifiers(PCIDevice *dev,
 {
     int vector, ret;
 
-    assert(use_notifier && release_notifier);
-
-    dev->msix_vector_use_notifier = use_notifier;
-    dev->msix_vector_release_notifier = release_notifier;
-    dev->msix_vector_poll_notifier = poll_notifier;
+    msix_init_vector_notifiers(dev, use_notifier, release_notifier,
+                               poll_notifier);
 
     if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
         (MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 8f35e13..e08d981 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -226,7 +226,6 @@ static const TypeInfo pcie_bus_info = {
 };
 
 static PCIBus *pci_find_bus_nr(PCIBus *bus, int bus_num);
-static void pci_update_mappings(PCIDevice *d);
 static void pci_irq_handler(void *opaque, int irq_num, int level);
 static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom, Error **);
 static void pci_del_option_rom(PCIDevice *pdev);
@@ -1335,7 +1334,7 @@ static pcibus_t pci_bar_address(PCIDevice *d,
     return new_addr;
 }
 
-static void pci_update_mappings(PCIDevice *d)
+void pci_update_mappings(PCIDevice *d)
 {
     PCIIORegion *r;
     int i;
diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
index 4c4a60c..46606cf 100644
--- a/include/hw/pci/msix.h
+++ b/include/hw/pci/msix.h
@@ -32,6 +32,7 @@ int msix_present(PCIDevice *dev);
 bool msix_is_masked(PCIDevice *dev, unsigned vector);
 void msix_set_pending(PCIDevice *dev, unsigned vector);
 void msix_clr_pending(PCIDevice *dev, int vector);
+int msix_is_pending(PCIDevice *dev, unsigned vector);
 
 int msix_vector_use(PCIDevice *dev, unsigned vector);
 void msix_vector_unuse(PCIDevice *dev, unsigned vector);
@@ -41,6 +42,10 @@ void msix_notify(PCIDevice *dev, unsigned vector);
 
 void msix_reset(PCIDevice *dev);
 
+void msix_init_vector_notifiers(PCIDevice *dev,
+                                MSIVectorUseNotifier use_notifier,
+                                MSIVectorReleaseNotifier release_notifier,
+                                MSIVectorPollNotifier poll_notifier);
 int msix_set_vector_notifiers(PCIDevice *dev,
                               MSIVectorUseNotifier use_notifier,
                               MSIVectorReleaseNotifier release_notifier,
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 6be4e0c..bef3e49 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -902,5 +902,6 @@ extern const VMStateDescription vmstate_pci_device;
 }
 
 MSIMessage pci_get_msi_message(PCIDevice *dev, int vector);
+void pci_update_mappings(PCIDevice *d);
 
 #endif
-- 
1.8.3.1
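`msix_is_pending` reads one bit of the MSI-X Pending Bit Array: `msix_pending_byte` selects byte `vector / 8` of `msix_pba`, and the mask selects a bit within that byte. A minimal sketch of the same indexing over a plain byte array, assuming that `msix_pending_mask` (not shown in this hunk) returns `1 << (vector % 8)`. The `pba_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit within the selected byte; assumed to match msix_pending_mask(). */
static uint8_t pba_mask(unsigned int vector)
{
    return (uint8_t)(1u << (vector % 8));
}

/* Test the pending bit for vector. */
static bool pba_is_pending(const uint8_t *pba, unsigned int vector)
{
    return (pba[vector / 8] & pba_mask(vector)) != 0;
}

/* Set the pending bit for vector. */
static void pba_set_pending(uint8_t *pba, unsigned int vector)
{
    pba[vector / 8] |= pba_mask(vector);
}

/* Clear the pending bit for vector. */
static void pba_clr_pending(uint8_t *pba, unsigned int vector)
{
    pba[vector / 8] &= (uint8_t)~pba_mask(vector);
}
```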




* [PATCH V4 15/25] vfio-pci: refactor for cpr
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (13 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 14/25] pci: export functions for cpr Steve Sistare
@ 2021-07-06 16:45 ` Steve Sistare
  2021-07-06 16:45 ` [PATCH V4 16/25] vfio-pci: cpr part 1 Steve Sistare
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-07-06 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Export vfio_address_spaces and vfio_listener_skipped_section.
Add optional eventfd arg to vfio_add_kvm_msi_virq.
Refactor vector use into a helper vfio_vector_init.
All for use by cpr in a subsequent patch.  No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hw/vfio/common.c              |  4 ++--
 hw/vfio/pci.c                 | 41 ++++++++++++++++++++++++++++++-----------
 include/hw/vfio/vfio-common.h |  3 +++
 3 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ae5654f..9220e64 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -42,7 +42,7 @@
 
 VFIOGroupList vfio_group_list =
     QLIST_HEAD_INITIALIZER(vfio_group_list);
-static QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
+VFIOAddressSpaceList vfio_address_spaces =
     QLIST_HEAD_INITIALIZER(vfio_address_spaces);
 
 #ifdef CONFIG_KVM
@@ -534,7 +534,7 @@ static int vfio_host_win_del(VFIOContainer *container, hwaddr min_iova,
     return -1;
 }
 
-static bool vfio_listener_skipped_section(MemoryRegionSection *section)
+bool vfio_listener_skipped_section(MemoryRegionSection *section)
 {
     return (!memory_region_is_ram(section->mr) &&
             !memory_region_is_iommu(section->mr)) ||
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5c65aa0..bc656e9 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -410,8 +410,19 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
     return ret;
 }
 
+/* Create new or reuse existing eventfd */
+static int vfio_event_notifier_init(EventNotifier *e, int eventfd)
+{
+    if (eventfd < 0) {
+        return event_notifier_init(e, 0);
+    }
+
+    event_notifier_init_fd(e, eventfd);
+    return 0;
+}
+
 static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
-                                  int vector_n, bool msix)
+                                  int vector_n, bool msix, int eventfd)
 {
     int virq;
 
@@ -419,7 +430,7 @@ static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
         return;
     }
 
-    if (event_notifier_init(&vector->kvm_interrupt, 0)) {
+    if (vfio_event_notifier_init(&vector->kvm_interrupt, eventfd)) {
         return;
     }
 
@@ -455,6 +466,20 @@ static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg,
     kvm_irqchip_commit_routes(kvm_state);
 }
 
+static void vfio_vector_init(VFIOPCIDevice *vdev, int nr, int eventfd)
+{
+    VFIOMSIVector *vector = &vdev->msi_vectors[nr];
+    PCIDevice *pdev = &vdev->pdev;
+
+    vector->vdev = vdev;
+    vector->virq = -1;
+    if (vfio_event_notifier_init(&vector->interrupt, eventfd)) {
+        error_report("vfio: Error: event_notifier_init failed");
+    }
+    vector->use = true;
+    msix_vector_use(pdev, nr);
+}
+
 static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
                                    MSIMessage *msg, IOHandler *handler)
 {
@@ -467,13 +492,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
     vector = &vdev->msi_vectors[nr];
 
     if (!vector->use) {
-        vector->vdev = vdev;
-        vector->virq = -1;
-        if (event_notifier_init(&vector->interrupt, 0)) {
-            error_report("vfio: Error: event_notifier_init failed");
-        }
-        vector->use = true;
-        msix_vector_use(pdev, nr);
+        vfio_vector_init(vdev, nr, -1);
     }
 
     qemu_set_fd_handler(event_notifier_get_fd(&vector->interrupt),
@@ -491,7 +510,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
         }
     } else {
         if (msg) {
-            vfio_add_kvm_msi_virq(vdev, vector, nr, true);
+            vfio_add_kvm_msi_virq(vdev, vector, nr, true, -1);
         }
     }
 
@@ -641,7 +660,7 @@ retry:
          * Attempt to enable route through KVM irqchip,
          * default to userspace handling if unavailable.
          */
-        vfio_add_kvm_msi_virq(vdev, vector, i, false);
+        vfio_add_kvm_msi_virq(vdev, vector, i, false, -1);
     }
 
     /* Set interrupt type prior to possible interrupts */
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 6141162..00acb85 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -204,6 +204,8 @@ int vfio_get_device(VFIOGroup *group, const char *name,
 extern const MemoryRegionOps vfio_region_ops;
 typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
 extern VFIOGroupList vfio_group_list;
+typedef QLIST_HEAD(, VFIOAddressSpace) VFIOAddressSpaceList;
+extern VFIOAddressSpaceList vfio_address_spaces;
 
 bool vfio_mig_active(void);
 int64_t vfio_mig_bytes_transferred(void);
@@ -222,6 +224,7 @@ struct vfio_info_cap_header *
 vfio_get_device_info_cap(struct vfio_device_info *info, uint16_t id);
 #endif
 extern const MemoryListener vfio_prereg_listener;
+bool vfio_listener_skipped_section(MemoryRegionSection *section);
 
 int vfio_spapr_create_window(VFIOContainer *container,
                              MemoryRegionSection *section,
-- 
1.8.3.1
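`vfio_event_notifier_init` either creates a new eventfd or wraps a descriptor passed in by the caller, so a descriptor inherited from the pre-exec process can be reused instead of replaced. A minimal Linux sketch of the same create-or-adopt pattern. `SketchNotifier` and `sketch_notifier_init` are hypothetical stand-ins for QEMU's `EventNotifier` API, and the eventfd flags are this sketch's own choice.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical stand-in for QEMU's EventNotifier: just holds an fd. */
typedef struct {
    int fd;
} SketchNotifier;

/* Create a new eventfd when fd < 0; otherwise adopt the given fd. */
static int sketch_notifier_init(SketchNotifier *e, int fd)
{
    if (fd < 0) {
        fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
        if (fd < 0) {
            return -1;
        }
    }
    e->fd = fd;
    return 0;
}
```

Both notifiers in the test below share one kernel counter, so a write through one is visible through the other.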




* [PATCH V4 16/25] vfio-pci: cpr part 1
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (14 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 15/25] vfio-pci: refactor " Steve Sistare
@ 2021-07-06 16:45 ` Steve Sistare
  2021-07-06 16:45 ` [PATCH V4 17/25] vfio-pci: cpr part 2 Steve Sistare
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-07-06 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Enable vfio-pci devices to be saved and restored across an exec restart
of qemu.

At vfio creation time, save the values of the vfio container, group, and device
descriptors in the environment.
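Each descriptor is looked up in the environment first and opened only if absent; for containers and devices, the result of that lookup becomes the `reused` flag that later gates ioctls and config-space writes. A minimal sketch of that pattern, assuming the variable is keyed by the device path (as `vfio_get_group` keys on `/dev/vfio/%d`; the real `getenv_fd` may add a prefix) and holds a decimal fd. `open_or_reuse` is hypothetical. It opens with `O_CLOEXEC`, consistent with the flag being cleared only at cprsave/cprexec time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Return a descriptor for path: the one recorded in the environment if
 * present (setting *reused), else a freshly opened one, which is then
 * recorded so that a later exec can find it.
 */
static int open_or_reuse(const char *path, int flags, bool *reused)
{
    const char *val = getenv(path);
    char buf[16];
    int fd;

    if (val && *val) {
        *reused = true;
        return atoi(val);
    }
    *reused = false;
    fd = open(path, flags | O_CLOEXEC);
    if (fd < 0) {
        return -1;
    }
    snprintf(buf, sizeof(buf), "%d", fd);
    if (setenv(path, buf, 1) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```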

In cprsave and cprexec, suspend the use of virtual addresses in DMA
mappings with VFIO_DMA_UNMAP_FLAG_VADDR, because guest ram will be remapped
at a different VA after exec.  DMA to already-mapped pages continues.  Save
the msi message area as part of vfio-pci vmstate, save the interrupt and
notifier eventfds in the environment, and clear the close-on-exec flag
for the vfio descriptors.  The flag is not cleared earlier because the
descriptors should not persist across miscellaneous fork and exec calls
that may be performed during normal operation.
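`vfio_cprsave` applies this suspend to every container and, if one fails, restores the vaddrs of the containers it had already suspended. A minimal sketch of that all-or-nothing pattern over an array, assuming per-item suspend/resume operations. `Item`, `item_suspend`, `item_resume`, and `suspend_all` are hypothetical stand-ins for the containers, `vfio_dma_unmap_vaddr_all`, and the `vfio_region_remap` walk. The capability pre-check done by the patch is omitted. The unwind stops at the item that failed, because that one was never suspended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool suspended;
    bool fail_suspend;  /* test hook: make suspend fail for this item */
} Item;

static int item_suspend(Item *it)
{
    if (it->fail_suspend) {
        return -1;
    }
    it->suspended = true;
    return 0;
}

static void item_resume(Item *it)
{
    it->suspended = false;
}

/*
 * Suspend every item; on the first failure, resume only the items
 * before the failing one and return -1.
 */
static int suspend_all(Item *items, size_t n)
{
    size_t i, j;

    for (i = 0; i < n; i++) {
        if (item_suspend(&items[i]) < 0) {
            for (j = 0; j < i; j++) {
                item_resume(&items[j]);
            }
            return -1;
        }
    }
    return 0;
}
```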

On qemu restart, vfio_realize() finds the descriptor env vars, uses
the descriptors, and notes that the device is being reused.  Device and
iommu state is already configured, so operations in vfio_realize that
would modify the configuration are skipped for a reused device, including
vfio ioctls and writes to PCI configuration space.  The result is that
vfio_realize constructs qemu data structures that reflect the current
state of the device.  However, the reconstruction is not complete until
cprload is called. cprload loads the msi data and finds eventfds in the
environment.  It rebuilds vector data structures and attaches the
interrupts to the new KVM instance.  cprload then walks the flattened
ranges of the vfio_address_spaces and calls VFIO_IOMMU_MAP_DMA with
VFIO_DMA_MAP_FLAG_VADDR to inform the kernel of the new VAs.  Lastly, it
starts the VM and suppresses
vfio device reset.
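`vfio_region_remap` trims each flat section to whole target pages before re-registering its vaddr. It rounds the start up to a page boundary, drops the bytes skipped by that rounding from the length, truncates the length to a page multiple, and advances the host address by the same rounding. A minimal sketch of that arithmetic, assuming 4 KiB target pages. `SKETCH_PAGE_SIZE`, `RemapRange`, and `remap_range` are hypothetical, and the guard for a section shorter than the rounding is an addition of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

typedef struct {
    uint64_t iova;      /* page-aligned start in the address space */
    uint64_t size;      /* whole pages only */
    uint64_t vaddr_off; /* offset into the RAM block's host mapping */
} RemapRange;

/*
 * offset:     section start within the address space
 * region_off: section start within its memory region
 * size:       section length in bytes
 */
static RemapRange remap_range(uint64_t offset, uint64_t region_off,
                              uint64_t size)
{
    RemapRange r;
    uint64_t roundup;

    r.iova = (offset + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK;
    roundup = r.iova - offset;
    /* Guard (not in the patch): a too-short section maps nothing. */
    r.size = size > roundup ? (size - roundup) & SKETCH_PAGE_MASK : 0;
    r.vaddr_off = region_off + roundup;
    return r;
}
```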

This functionality is delivered by 2 patches for clarity.  Part 2 adds
eventfd and vector support.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 MAINTAINERS                   |   1 +
 hw/pci/pci.c                  |   4 ++
 hw/vfio/common.c              |  69 +++++++++++++++++--
 hw/vfio/cpr.c                 | 154 ++++++++++++++++++++++++++++++++++++++++++
 hw/vfio/meson.build           |   1 +
 hw/vfio/pci.c                 |  66 +++++++++++++++++-
 hw/vfio/trace-events          |   1 +
 include/hw/pci/pci.h          |   1 +
 include/hw/vfio/vfio-common.h |   5 ++
 include/migration/cpr.h       |   3 +
 linux-headers/linux/vfio.h    |  27 ++++++++
 migration/cpr.c               |  20 ++++++
 12 files changed, 344 insertions(+), 8 deletions(-)
 create mode 100644 hw/vfio/cpr.c

diff --git a/MAINTAINERS b/MAINTAINERS
index b844f3f..b69bbf5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2847,6 +2847,7 @@ CPR
 M: Steve Sistare <steven.sistare@oracle.com>
 M: Mark Kanda <mark.kanda@oracle.com>
 S: Maintained
+F: hw/vfio/cpr.c
 F: include/migration/cpr.h
 F: migration/cpr.c
 F: qapi/cpr.json
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e08d981..27019ca 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -308,6 +308,10 @@ static void pci_do_device_reset(PCIDevice *dev)
 {
     int r;
 
+    if (dev->reused) {
+        return;
+    }
+
     pci_device_deassert_intx(dev);
     assert(dev->irq_state == 0);
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 9220e64..40c882f 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -31,6 +31,7 @@
 #include "exec/memory.h"
 #include "exec/ram_addr.h"
 #include "hw/hw.h"
+#include "qemu/env.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "qemu/range.h"
@@ -440,6 +441,10 @@ static int vfio_dma_unmap(VFIOContainer *container,
         return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
     }
 
+    if (container->reused) {
+        return 0;
+    }
+
     while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
         /*
          * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c
@@ -463,6 +468,11 @@ static int vfio_dma_unmap(VFIOContainer *container,
         return -errno;
     }
 
+    if (unmap.size != size) {
+        warn_report("VFIO_IOMMU_UNMAP_DMA(0x%" PRIx64 ", 0x%" PRIx64
+                    ") only unmaps 0x%llx", iova, (uint64_t)size, unmap.size);
+    }
+
     return 0;
 }
 
@@ -477,6 +487,10 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
         .size = size,
     };
 
+    if (container->reused) {
+        return 0;
+    }
+
     if (!readonly) {
         map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
     }
@@ -1603,6 +1617,10 @@ static int vfio_init_container(VFIOContainer *container, int group_fd,
     if (iommu_type < 0) {
         return iommu_type;
     }
+    if (container->reused) {
+        container->iommu_type = iommu_type;
+        return 0;
+    }
 
     ret = ioctl(group_fd, VFIO_GROUP_SET_CONTAINER, &container->fd);
     if (ret) {
@@ -1703,6 +1721,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
 {
     VFIOContainer *container;
     int ret, fd;
+    bool reused;
+    char name[40];
     VFIOAddressSpace *space;
 
     space = vfio_get_address_space(as);
@@ -1739,16 +1759,31 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
         return ret;
     }
 
+    snprintf(name, sizeof(name), "vfio_container_for_group_%d", group->groupid);
+    fd = getenv_fd(name);
+    reused = (fd >= 0);
+
     QLIST_FOREACH(container, &space->containers, next) {
-        if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
-            group->container = container;
-            QLIST_INSERT_HEAD(&container->group_list, group, container_next);
+        if (container->fd == fd ||
+            !ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
+            break;
+        }
+    }
+
+    if (container) {
+        group->container = container;
+        QLIST_INSERT_HEAD(&container->group_list, group, container_next);
+        if (!reused) {
             vfio_kvm_device_add_group(group);
-            return 0;
+            setenv_fd(name, container->fd);
         }
+        return 0;
+    }
+
+    if (!reused) {
+        fd = qemu_open_old("/dev/vfio/vfio", O_RDWR);
     }
 
-    fd = qemu_open_old("/dev/vfio/vfio", O_RDWR);
     if (fd < 0) {
         error_setg_errno(errp, errno, "failed to open /dev/vfio/vfio");
         ret = -errno;
@@ -1766,6 +1801,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     container = g_malloc0(sizeof(*container));
     container->space = space;
     container->fd = fd;
+    container->reused = reused;
     container->error = NULL;
     container->dirty_pages_supported = false;
     QLIST_INIT(&container->giommu_list);
@@ -1893,6 +1929,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     }
 
     container->initialized = true;
+    setenv_fd(name, fd);
 
     return 0;
 listener_release_exit:
@@ -1920,6 +1957,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
 
     QLIST_REMOVE(group, container_next);
     group->container = NULL;
+    unsetenv_fdv("vfio_container_for_group_%d", group->groupid);
 
     /*
      * Explicitly release the listener first before unset container,
@@ -1978,7 +2016,12 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
     group = g_malloc0(sizeof(*group));
 
     snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
-    group->fd = qemu_open_old(path, O_RDWR);
+
+    group->fd = getenv_fd(path);
+    if (group->fd < 0) {
+        group->fd = qemu_open_old(path, O_RDWR);
+    }
+
     if (group->fd < 0) {
         error_setg_errno(errp, errno, "failed to open %s", path);
         goto free_group_exit;
@@ -2012,6 +2055,8 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
 
     QLIST_INSERT_HEAD(&vfio_group_list, group, next);
 
+    setenv_fd(path, group->fd);
+
     return group;
 
 close_fd_exit:
@@ -2036,6 +2081,7 @@ void vfio_put_group(VFIOGroup *group)
     vfio_disconnect_container(group);
     QLIST_REMOVE(group, next);
     trace_vfio_put_group(group->fd);
+    unsetenv_fdv("/dev/vfio/%d", group->groupid);
     close(group->fd);
     g_free(group);
 
@@ -2049,8 +2095,14 @@ int vfio_get_device(VFIOGroup *group, const char *name,
 {
     struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
     int ret, fd;
+    bool reused;
+
+    fd = getenv_fd(name);
+    reused = (fd >= 0);
+    if (!reused) {
+        fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
+    }
 
-    fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
     if (fd < 0) {
         error_setg_errno(errp, errno, "error getting device from group %d",
                          group->groupid);
@@ -2095,6 +2147,8 @@ int vfio_get_device(VFIOGroup *group, const char *name,
     vbasedev->num_irqs = dev_info.num_irqs;
     vbasedev->num_regions = dev_info.num_regions;
     vbasedev->flags = dev_info.flags;
+    vbasedev->reused = reused;
+    setenv_fd(name, fd);
 
     trace_vfio_get_device(name, dev_info.flags, dev_info.num_regions,
                           dev_info.num_irqs);
@@ -2111,6 +2165,7 @@ void vfio_put_base_device(VFIODevice *vbasedev)
     QLIST_REMOVE(vbasedev, next);
     vbasedev->group = NULL;
     trace_vfio_put_base_device(vbasedev->fd);
+    unsetenv_fd(vbasedev->name);
     close(vbasedev->fd);
 }
 
diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
new file mode 100644
index 0000000..28f8a76
--- /dev/null
+++ b/hw/vfio/cpr.c
@@ -0,0 +1,154 @@
+/*
+ * Copyright (c) 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include <sys/ioctl.h>
+#include <linux/vfio.h>
+#include "hw/vfio/vfio-common.h"
+#include "sysemu/kvm.h"
+#include "qapi/error.h"
+#include "trace.h"
+
+static int
+vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
+{
+    struct vfio_iommu_type1_dma_unmap unmap = {
+        .argsz = sizeof(unmap),
+        .flags = VFIO_DMA_UNMAP_FLAG_VADDR | VFIO_DMA_UNMAP_FLAG_ALL,
+        .iova = 0,
+        .size = 0,
+    };
+    if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
+        error_setg_errno(errp, errno, "vfio_dma_unmap_vaddr_all");
+        return -errno;
+    }
+    return 0;
+}
+
+static int vfio_dma_map_vaddr(VFIOContainer *container, hwaddr iova,
+                              ram_addr_t size, void *vaddr,
+                              Error **errp)
+{
+    struct vfio_iommu_type1_dma_map map = {
+        .argsz = sizeof(map),
+        .flags = VFIO_DMA_MAP_FLAG_VADDR,
+        .vaddr = (__u64)(uintptr_t)vaddr,
+        .iova = iova,
+        .size = size,
+    };
+    if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map)) {
+        error_setg_errno(errp, errno,
+                         "vfio_dma_map_vaddr(iova %" PRIu64 ", size %" PRIu64
+                         ", va %p)", iova, (uint64_t)size, vaddr);
+        return -errno;
+    }
+    return 0;
+}
+
+static int
+vfio_region_remap(MemoryRegionSection *section, void *handle, Error **errp)
+{
+    MemoryRegion *mr = section->mr;
+    VFIOContainer *container = handle;
+    const char *name = memory_region_name(mr);
+    ram_addr_t size = int128_get64(section->size);
+    hwaddr offset, iova, roundup;
+    void *vaddr;
+
+    if (vfio_listener_skipped_section(section) || memory_region_is_iommu(mr)) {
+        return 0;
+    }
+
+    offset = section->offset_within_address_space;
+    iova = TARGET_PAGE_ALIGN(offset);
+    roundup = iova - offset;
+    size = (size - roundup) & TARGET_PAGE_MASK;
+    vaddr = memory_region_get_ram_ptr(mr) +
+            section->offset_within_region + roundup;
+
+    trace_vfio_region_remap(name, container->fd, iova, iova + size - 1, vaddr);
+    return vfio_dma_map_vaddr(container, iova, size, vaddr, errp);
+}
+
+bool vfio_cpr_capable(VFIOContainer *container, Error **errp)
+{
+    if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UPDATE_VADDR) ||
+        !ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UNMAP_ALL)) {
+        error_setg(errp, "VFIO container does not support VFIO_UPDATE_VADDR "
+                         "or VFIO_UNMAP_ALL");
+        return false;
+    } else {
+        return true;
+    }
+}
+
+int vfio_cprsave(Error **errp)
+{
+    VFIOAddressSpace *space, *last_space;
+    VFIOContainer *container, *last_container;
+
+    QLIST_FOREACH(space, &vfio_address_spaces, list) {
+        QLIST_FOREACH(container, &space->containers, next) {
+            if (!vfio_cpr_capable(container, errp)) {
+                return 1;
+            }
+        }
+    }
+
+    QLIST_FOREACH(space, &vfio_address_spaces, list) {
+        QLIST_FOREACH(container, &space->containers, next) {
+            if (vfio_dma_unmap_vaddr_all(container, errp)) {
+                goto unwind;
+            }
+        }
+    }
+    return 0;
+
+unwind:
+    last_space = space;
+    last_container = container;
+    QLIST_FOREACH(space, &vfio_address_spaces, list) {
+        QLIST_FOREACH(container, &space->containers, next) {
+            Error *err = NULL;
+
+            if (space == last_space && container == last_container) {
+                return 1;
+            }
+            if (as_flat_walk(space->as, vfio_region_remap, container, &err)) {
+                error_prepend(errp, "%s", error_get_pretty(err));
+                error_free(err);
+            }
+        }
+    }
+    return 1;
+}
+
+int vfio_cprload(Error **errp)
+{
+    VFIOAddressSpace *space;
+    VFIOContainer *container;
+    VFIOGroup *group;
+    VFIODevice *vbasedev;
+
+    QLIST_FOREACH(space, &vfio_address_spaces, list) {
+        QLIST_FOREACH(container, &space->containers, next) {
+            if (!vfio_cpr_capable(container, errp)) {
+                return 1;
+            }
+            container->reused = false;
+            if (as_flat_walk(space->as, vfio_region_remap, container, errp)) {
+                return 1;
+            }
+        }
+    }
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            vbasedev->reused = false;
+        }
+    }
+    return 0;
+}
diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index da9af29..e247b2b 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -5,6 +5,7 @@ vfio_ss.add(files(
   'migration.c',
 ))
 vfio_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files(
+  'cpr.c',
   'display.c',
   'pci-quirks.c',
   'pci.c',
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index bc656e9..f3a9361 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -29,6 +29,8 @@
 #include "hw/qdev-properties.h"
 #include "hw/qdev-properties-system.h"
 #include "migration/vmstate.h"
+#include "migration/cpr.h"
+#include "qemu/env.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
@@ -1657,6 +1659,7 @@ static void vfio_bars_prepare(VFIOPCIDevice *vdev)
 static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
+    PCIDevice *pdev = &vdev->pdev;
     char *name;
 
     if (!bar->size) {
@@ -1677,7 +1680,7 @@ static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
         }
     }
 
-    pci_register_bar(&vdev->pdev, nr, bar->type, bar->mr);
+    pci_register_bar(pdev, nr, bar->type, bar->mr);
 }
 
 static void vfio_bars_register(VFIOPCIDevice *vdev)
@@ -2889,6 +2892,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
         vfio_put_group(group);
         goto error;
     }
+    pdev->reused = vdev->vbasedev.reused;
 
     vfio_populate_device(vdev, &err);
     if (err) {
@@ -3158,6 +3162,10 @@ static void vfio_pci_reset(DeviceState *dev)
 {
     VFIOPCIDevice *vdev = VFIO_PCI(dev);
 
+    if (vdev->pdev.reused) {
+        return;
+    }
+
     trace_vfio_pci_reset(vdev->vbasedev.name);
 
     vfio_pci_pre_reset(vdev);
@@ -3265,6 +3273,61 @@ static Property vfio_pci_dev_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static void vfio_merge_config(VFIOPCIDevice *vdev)
+{
+    PCIDevice *pdev = &vdev->pdev;
+    int size = MIN(pci_config_size(pdev), vdev->config_size);
+    uint8_t *phys_config = g_malloc(size);
+    uint32_t mask;
+    int ret, i;
+
+    ret = pread(vdev->vbasedev.fd, phys_config, size, vdev->config_offset);
+    if (ret < size) {
+        ret = ret < 0 ? errno : EFAULT;
+        error_report("failed to read device config space: %s", strerror(ret));
+        return;
+    }
+
+    for (i = 0; i < size; i++) {
+        mask = vdev->emulated_config_bits[i];
+        pdev->config[i] = (pdev->config[i] & mask) | (phys_config[i] & ~mask);
+    }
+
+    g_free(phys_config);
+}
+
+static int vfio_pci_post_load(void *opaque, int version_id)
+{
+    VFIOPCIDevice *vdev = opaque;
+    PCIDevice *pdev = &vdev->pdev;
+    bool enabled;
+
+    vfio_merge_config(vdev);
+
+    pdev->reused = false;
+    enabled = pci_get_word(pdev->config + PCI_COMMAND) & PCI_COMMAND_MASTER;
+    memory_region_set_enabled(&pdev->bus_master_enable_region, enabled);
+
+    return 0;
+}
+
+static bool vfio_pci_needed(void *opaque)
+{
+    return cpr_mode() == CPR_MODE_RESTART;
+}
+
+static const VMStateDescription vfio_pci_vmstate = {
+    .name = "vfio-pci",
+    .unmigratable = 1,
+    .version_id = 0,
+    .minimum_version_id = 0,
+    .post_load = vfio_pci_post_load,
+    .needed = vfio_pci_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -3272,6 +3335,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 
     dc->reset = vfio_pci_reset;
     device_class_set_props(dc, vfio_pci_dev_properties);
+    dc->vmsd = &vfio_pci_vmstate;
     dc->desc = "VFIO-based PCI device assignment";
     set_bit(DEVICE_CATEGORY_MISC, dc->categories);
     pdc->realize = vfio_realize;
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 079f53a..0f8b166 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -118,6 +118,7 @@ vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Devic
 vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
 vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
 vfio_dma_unmap_overflow_workaround(void) ""
+vfio_region_remap(const char *name, int fd, uint64_t iova_start, uint64_t iova_end, void *vaddr) "%s fd %d 0x%"PRIx64" - 0x%"PRIx64" [%p]"
 
 # platform.c
 vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index bef3e49..add7f46 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -360,6 +360,7 @@ struct PCIDevice {
     /* ID of standby device in net_failover pair */
     char *failover_pair_id;
     uint32_t acpi_index;
+    bool reused;
 };
 
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 00acb85..b46d850 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -85,6 +85,7 @@ typedef struct VFIOContainer {
     Error *error;
     bool initialized;
     bool dirty_pages_supported;
+    bool reused;
     uint64_t dirty_pgsizes;
     uint64_t max_dirty_bitmap_size;
     unsigned long pgsizes;
@@ -124,6 +125,7 @@ typedef struct VFIODevice {
     bool no_mmap;
     bool ram_block_discard_allowed;
     bool enable_migration;
+    bool reused;
     VFIODeviceOps *ops;
     unsigned int num_irqs;
     unsigned int num_regions;
@@ -200,6 +202,9 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp);
 void vfio_put_group(VFIOGroup *group);
 int vfio_get_device(VFIOGroup *group, const char *name,
                     VFIODevice *vbasedev, Error **errp);
+int vfio_cprsave(Error **errp);
+int vfio_cprload(Error **errp);
+bool vfio_cpr_capable(VFIOContainer *container, Error **errp);
 
 extern const MemoryRegionOps vfio_region_ops;
 typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
diff --git a/include/migration/cpr.h b/include/migration/cpr.h
index bffee19..1ea5046 100644
--- a/include/migration/cpr.h
+++ b/include/migration/cpr.h
@@ -10,6 +10,9 @@
 
 #include "qapi/qapi-types-cpr.h"
 
+#define CPR_MODE_NONE ((CprMode)(-1))
+
+CprMode cpr_mode(void);
 void cprsave(const char *file, CprMode mode, Error **errp);
 void cprexec(strList *args, Error **errp);
 void cprload(const char *file, Error **errp);
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 609099e..bc3a66e 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -46,6 +46,12 @@
  */
 #define VFIO_NOIOMMU_IOMMU		8
 
+/* Supports VFIO_DMA_UNMAP_FLAG_ALL */
+#define VFIO_UNMAP_ALL                        9
+
+/* Supports VFIO DMA map and unmap with the VADDR flag */
+#define VFIO_UPDATE_VADDR              10
+
 /*
  * The IOCTL interface is designed for extensibility by embedding the
  * structure length (argsz) and flags into structures passed between
@@ -1074,12 +1080,22 @@ struct vfio_iommu_type1_info_dma_avail {
  *
  * Map process virtual addresses to IO virtual addresses using the
  * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
+ *
+ * If flags & VFIO_DMA_MAP_FLAG_VADDR, record the new base vaddr for iova, and
+ * unblock translation of host virtual addresses in the iova range.  The vaddr
+ * must have previously been invalidated with VFIO_DMA_UNMAP_FLAG_VADDR.  To
+ * maintain memory consistency within the user application, the updated vaddr
+ * must address the same memory object as originally mapped.  Failure to do so
+ * will result in user memory corruption and/or device misbehavior.  iova and
+ * size must match those in the original MAP_DMA call.  Protection is not
+ * changed, and the READ & WRITE flags must be 0.
  */
 struct vfio_iommu_type1_dma_map {
 	__u32	argsz;
 	__u32	flags;
 #define VFIO_DMA_MAP_FLAG_READ (1 << 0)		/* readable from device */
 #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
+#define VFIO_DMA_MAP_FLAG_VADDR (1 << 2)
 	__u64	vaddr;				/* Process virtual address */
 	__u64	iova;				/* IO virtual address */
 	__u64	size;				/* Size of mapping (bytes) */
@@ -1102,6 +1118,7 @@ struct vfio_bitmap {
  * field.  No guarantee is made to the user that arbitrary unmaps of iova
  * or size different from those used in the original mapping call will
  * succeed.
+ *
  * VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP should be set to get the dirty bitmap
  * before unmapping IO virtual addresses. When this flag is set, the user must
  * provide a struct vfio_bitmap in data[]. User must provide zero-allocated
@@ -1111,11 +1128,21 @@ struct vfio_bitmap {
  * indicates that the page at that offset from iova is dirty. A Bitmap of the
  * pages in the range of unmapped size is returned in the user-provided
  * vfio_bitmap.data.
+ *
+ * If flags & VFIO_DMA_UNMAP_FLAG_ALL, unmap all addresses.  iova and size
+ * must be 0.  This cannot be combined with the get-dirty-bitmap flag.
+ *
+ * If flags & VFIO_DMA_UNMAP_FLAG_VADDR, do not unmap, but invalidate host
+ * virtual addresses in the iova range.  Tasks that attempt to translate an
+ * iova's vaddr will block.  DMA to already-mapped pages continues.  This
+ * cannot be combined with the get-dirty-bitmap flag.
  */
 struct vfio_iommu_type1_dma_unmap {
 	__u32	argsz;
 	__u32	flags;
 #define VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP (1 << 0)
+#define VFIO_DMA_UNMAP_FLAG_ALL              (1 << 1)
+#define VFIO_DMA_UNMAP_FLAG_VADDR            (1 << 2)
 	__u64	iova;				/* IO virtual address */
 	__u64	size;				/* Size of mapping (bytes) */
 	__u8    data[];
diff --git a/migration/cpr.c b/migration/cpr.c
index fb57dec..578466c 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -31,6 +31,13 @@
 #include "hw/virtio/vhost.h"
 #include "qemu/env.h"
 
+static CprMode cpr_active_mode = CPR_MODE_NONE;
+
+CprMode cpr_mode(void)
+{
+    return cpr_active_mode;
+}
+
 QEMUFile *qf_file_open(const char *path, int flags, int mode,
                               const char *name, Error **errp)
 {
@@ -92,6 +99,7 @@ void cprsave(const char *file, CprMode mode, Error **errp)
     }
     vm_stop(RUN_STATE_SAVE_VM);
 
+    cpr_active_mode = mode;
     ret = qemu_save_device_state(f);
     qemu_fclose(f);
     if (ret < 0) {
@@ -105,6 +113,7 @@ err:
     if (saved_vm_running) {
         vm_start();
     }
+    cpr_active_mode = CPR_MODE_NONE;
 done:
     return;
 }
@@ -125,6 +134,13 @@ void cprexec(strList *args, Error **errp)
         error_setg(errp, "runstate is not save-vm");
         return;
     }
+    if (cpr_active_mode != CPR_MODE_RESTART) {
+        error_setg(errp, "cprexec requires cprsave with restart mode");
+        return;
+    }
+    if (vfio_cprsave(errp)) {
+        return;
+    }
     walkenv(FD_PREFIX, preserve_fd, 0);
     qemu_system_exec_request(args);
 }
@@ -158,6 +174,10 @@ void cprload(const char *file, Error **errp)
         return;
     }
 
+    if (vfio_cprload(errp)) {
+        return;
+    }
+
     state = global_state_get_runstate();
     if (state == RUN_STATE_RUNNING) {
         vm_start();
-- 
1.8.3.1
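The linux-headers/linux/vfio.h hunk above documents a two-step VADDR protocol. Before exec, host virtual addresses are invalidated with VFIO_DMA_UNMAP_FLAG_VADDR (optionally together with VFIO_DMA_UNMAP_FLAG_ALL). After exec, they are re-registered with VFIO_DMA_MAP_FLAG_VADDR. The protocol also places several restrictions on the arguments. The sketch below encodes only those documented rules as two hypothetical validation helpers, demo_unmap_ok() and demo_vaddr_remap_ok(). It does not talk to the kernel. The DEMO_* flag values are copied from the hunk rather than taken from an installed header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in the vfio.h hunk (hypothetical DEMO_ names). */
#define DEMO_DMA_MAP_FLAG_READ                (1u << 0)
#define DEMO_DMA_MAP_FLAG_WRITE               (1u << 1)
#define DEMO_DMA_MAP_FLAG_VADDR               (1u << 2)
#define DEMO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP  (1u << 0)
#define DEMO_DMA_UNMAP_FLAG_ALL               (1u << 1)
#define DEMO_DMA_UNMAP_FLAG_VADDR             (1u << 2)

struct demo_range {
    uint64_t iova;
    uint64_t size;
};

/*
 * Unmap rules from the header comment: FLAG_ALL requires iova == size == 0,
 * and neither FLAG_ALL nor FLAG_VADDR may be combined with the dirty bitmap.
 * A real caller would then issue ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, ...).
 */
bool demo_unmap_ok(uint32_t flags, uint64_t iova, uint64_t size)
{
    bool dirty = flags & DEMO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP;

    if ((flags & DEMO_DMA_UNMAP_FLAG_ALL) && (dirty || iova || size)) {
        return false;
    }
    if ((flags & DEMO_DMA_UNMAP_FLAG_VADDR) && dirty) {
        return false;
    }
    return true;
}

/*
 * Remap rules: the vaddr must already have been invalidated, READ and WRITE
 * must be 0 (this sketch accepts only FLAG_VADDR), and iova/size must match
 * the original mapping.  "Same memory object" cannot be checked here.
 */
bool demo_vaddr_remap_ok(uint32_t flags, struct demo_range orig,
                         struct demo_range req, bool invalidated)
{
    if (flags != DEMO_DMA_MAP_FLAG_VADDR || !invalidated) {
        return false;
    }
    return orig.iova == req.iova && orig.size == req.size;
}
```

This corresponds to the sequence in vfio_cprsave() and vfio_cprload(). cprsave invalidates every container's vaddrs. cprload walks the flat address space and re-registers each range, with the same iova and size, at the new process's virtual address.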




* [PATCH V4 17/25] vfio-pci: cpr part 2
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (15 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 16/25] vfio-pci: cpr part 1 Steve Sistare
@ 2021-07-06 16:45 ` Steve Sistare
  2021-07-06 16:45 ` [PATCH V4 18/25] vhost: reset vhost devices upon cprsave Steve Sistare
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-07-06 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Finish cpr for vfio-pci by preserving eventfds and MSI/MSI-X vector state.
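The eventfds are handed to the new qemu through environment variables. save_event_fd() builds a key of the form "<device>_<name>_<nr>" and records the fd with setenv_fd(). load_event_fd() rebuilds the key and calls getenv_fd(), and callers treat a negative result as "not preserved". The sketch below shows the same idea with plain POSIX calls. demo_save_fd() and demo_load_fd() are hypothetical stand-ins. The series' own setenv_fd()/getenv_fd() helpers may encode the key differently, for example with the FD_PREFIX that cprexec() walks. The series also appears to clear close-on-exec later, in cprexec(), rather than at save time as done here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Build "<device>_<name>_<nr>", the key shape save_event_fd() uses. */
static void demo_fd_key(char *buf, size_t len, const char *dev,
                        const char *name, int nr)
{
    snprintf(buf, len, "%s_%s_%d", dev, name, nr);
}

/* Record fd in the environment and keep it open across exec(). */
int demo_save_fd(const char *dev, const char *name, int nr, int fd)
{
    char key[256], val[16];
    int flags;

    if (fd < 0) {
        return -1;
    }
    /* Clear close-on-exec so the descriptor survives exec(). */
    flags = fcntl(fd, F_GETFD);
    if (flags < 0 || fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0) {
        return -1;
    }
    demo_fd_key(key, sizeof(key), dev, name, nr);
    snprintf(val, sizeof(val), "%d", fd);
    return setenv(key, val, 1);
}

/* Return the recorded fd, or -1 if nothing valid was recorded. */
int demo_load_fd(const char *dev, const char *name, int nr)
{
    char key[256];
    const char *val;
    char *end;
    long fd;

    demo_fd_key(key, sizeof(key), dev, name, nr);
    val = getenv(key);
    if (!val) {
        return -1;
    }
    errno = 0;
    fd = strtol(val, &end, 10);
    if (errno || *end != '\0' || fd < 0) {
        return -1;
    }
    return (int)fd;
}
```

After exec, the environment and the open descriptor are both inherited. A lookup such as demo_load_fd("0000:00:04.0", "interrupt", 3) therefore returns the same fd number that the old process saved, and the device can be re-armed without a new VFIO_DEVICE_SET_IRQS.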

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hw/vfio/pci.c | 118 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 116 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index f3a9361..492ff31 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2655,6 +2655,27 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
     vfio_put_base_device(&vdev->vbasedev);
 }
 
+static void save_event_fd(VFIOPCIDevice *vdev, const char *name, int nr,
+                            EventNotifier *ev)
+{
+    char envname[256];
+    int fd = event_notifier_get_fd(ev);
+    const char *vfname = vdev->vbasedev.name;
+
+    if (fd >= 0) {
+        snprintf(envname, sizeof(envname), "%s_%s_%d", vfname, name, nr);
+        setenv_fd(envname, fd);
+    }
+}
+
+static int load_event_fd(VFIOPCIDevice *vdev, const char *name, int nr)
+{
+    char envname[256];
+    const char *vfname = vdev->vbasedev.name;
+    snprintf(envname, sizeof(envname), "%s_%s_%d", vfname, name, nr);
+    return getenv_fd(envname);
+}
+
 static void vfio_err_notifier_handler(void *opaque)
 {
     VFIOPCIDevice *vdev = opaque;
@@ -2686,7 +2707,13 @@ static void vfio_err_notifier_handler(void *opaque)
 static void vfio_register_err_notifier(VFIOPCIDevice *vdev)
 {
     Error *err = NULL;
-    int32_t fd;
+    int32_t fd = load_event_fd(vdev, "err", 0);
+
+    if (fd >= 0) {
+        event_notifier_init_fd(&vdev->err_notifier, fd);
+        qemu_set_fd_handler(fd, vfio_err_notifier_handler, NULL, vdev);
+        return;
+    }
 
     if (!vdev->pci_aer) {
         return;
@@ -2747,7 +2774,14 @@ static void vfio_register_req_notifier(VFIOPCIDevice *vdev)
     struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info),
                                       .index = VFIO_PCI_REQ_IRQ_INDEX };
     Error *err = NULL;
-    int32_t fd;
+    int32_t fd = load_event_fd(vdev, "req", 0);
+
+    if (fd >= 0) {
+        event_notifier_init_fd(&vdev->req_notifier, fd);
+        qemu_set_fd_handler(fd, vfio_req_notifier_handler, NULL, vdev);
+        vdev->req_enabled = true;
+        return;
+    }
 
     if (!(vdev->features & VFIO_FEATURE_ENABLE_REQ)) {
         return;
@@ -3296,14 +3330,91 @@ static void vfio_merge_config(VFIOPCIDevice *vdev)
     g_free(phys_config);
 }
 
+static int vfio_pci_pre_save(void *opaque)
+{
+    VFIOPCIDevice *vdev = opaque;
+    PCIDevice *pdev = &vdev->pdev;
+    int i;
+
+    if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) {
+        error_report("%s: cpr does not support vfio-pci INTX",
+                     vdev->vbasedev.name);
+    }
+
+    for (i = 0; i < vdev->nr_vectors; i++) {
+        VFIOMSIVector *vector = &vdev->msi_vectors[i];
+        if (vector->use) {
+            save_event_fd(vdev, "interrupt", i, &vector->interrupt);
+            if (vector->virq >= 0) {
+                save_event_fd(vdev, "kvm_interrupt", i,
+                                &vector->kvm_interrupt);
+            }
+        }
+    }
+    save_event_fd(vdev, "err", 0, &vdev->err_notifier);
+    save_event_fd(vdev, "req", 0, &vdev->req_notifier);
+    return 0;
+}
+
+static void vfio_claim_vectors(VFIOPCIDevice *vdev, int nr_vectors, bool msix)
+{
+    int i, fd;
+    bool pending = false;
+    PCIDevice *pdev = &vdev->pdev;
+
+    vdev->nr_vectors = nr_vectors;
+    vdev->msi_vectors = g_new0(VFIOMSIVector, nr_vectors);
+    vdev->interrupt = msix ? VFIO_INT_MSIX : VFIO_INT_MSI;
+
+    for (i = 0; i < nr_vectors; i++) {
+        VFIOMSIVector *vector = &vdev->msi_vectors[i];
+
+        fd = load_event_fd(vdev, "interrupt", i);
+        if (fd >= 0) {
+            vfio_vector_init(vdev, i, fd);
+            qemu_set_fd_handler(fd, vfio_msi_interrupt, NULL, vector);
+        }
+
+        fd = load_event_fd(vdev, "kvm_interrupt", i);
+        if (fd >= 0) {
+            vfio_add_kvm_msi_virq(vdev, vector, i, msix, fd);
+        }
+
+        if (msix && msix_is_pending(pdev, i) && msix_is_masked(pdev, i)) {
+            set_bit(i, vdev->msix->pending);
+            pending = true;
+        }
+    }
+
+    if (msix) {
+        memory_region_set_enabled(&pdev->msix_pba_mmio, pending);
+    }
+}
+
 static int vfio_pci_post_load(void *opaque, int version_id)
 {
     VFIOPCIDevice *vdev = opaque;
     PCIDevice *pdev = &vdev->pdev;
+    int nr_vectors;
     bool enabled;
 
     vfio_merge_config(vdev);
 
+    if (msix_enabled(pdev)) {
+        nr_vectors = vdev->msix->entries;
+        vfio_claim_vectors(vdev, nr_vectors, true);
+        msix_init_vector_notifiers(pdev, vfio_msix_vector_use,
+                                   vfio_msix_vector_release, NULL);
+
+    } else if (msi_enabled(pdev)) {
+        nr_vectors = msi_nr_vectors_allocated(pdev);
+        vfio_claim_vectors(vdev, nr_vectors, false);
+
+    } else if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) {
+        error_report("%s: cpr does not support vfio-pci INTX",
+                     vdev->vbasedev.name);
+    }
+
     pdev->reused = false;
     enabled = pci_get_word(pdev->config + PCI_COMMAND) & PCI_COMMAND_MASTER;
     memory_region_set_enabled(&pdev->bus_master_enable_region, enabled);
@@ -3322,8 +3433,11 @@ static const VMStateDescription vfio_pci_vmstate = {
     .version_id = 0,
     .minimum_version_id = 0,
     .post_load = vfio_pci_post_load,
+    .pre_save = vfio_pci_pre_save,
     .needed = vfio_pci_needed,
     .fields = (VMStateField[]) {
+        VMSTATE_PCI_DEVICE(pdev, VFIOPCIDevice),
+        VMSTATE_MSIX_TEST(pdev, VFIOPCIDevice, vfio_msix_present),
         VMSTATE_END_OF_LIST()
     }
 };
-- 
1.8.3.1




* [PATCH V4 18/25] vhost: reset vhost devices upon cprsave
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (16 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 17/25] vfio-pci: cpr part 2 Steve Sistare
@ 2021-07-06 16:45 ` Steve Sistare
  2021-07-13 15:47   ` Michael S. Tsirkin
  2021-07-06 16:45 ` [PATCH V4 19/25] hostmem-memfd: cpr support Steve Sistare
                   ` (7 subsequent siblings)
  25 siblings, 1 reply; 29+ messages in thread
From: Steve Sistare @ 2021-07-06 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

A vhost device is implicitly preserved across re-exec because its fd is not
closed, and the value of the fd is specified on the command line for the
new qemu to find.  However, new qemu issues a VHOST_SET_OWNER ioctl, which
fails because the device already has an owner.  To fix, reset the owner
prior to exec.
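The ordering matters because exec gives the process a new address space. The sketch below is a simplified, hypothetical model of the ownership rules, assuming the kernel behaves as follows. VHOST_SET_OWNER fails with -EBUSY on a device that already has an owner. VHOST_RESET_OWNER succeeds only for the current owner (-EPERM otherwise). Under these rules, only the old image can release the device. The struct and the demo_* functions are illustrative and are not QEMU or kernel code. demo_reset_all() mirrors the shape of vhost_dev_reset_all(): it reports a failure and continues with the next device.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical model of one vhost device; owner 0 means "no owner". */
struct demo_vhost {
    const char *name;
    int owner;          /* stands in for the owning process's mm */
};

/* Models VHOST_SET_OWNER: a device with an owner is busy. */
int demo_set_owner(struct demo_vhost *d, int who)
{
    if (d->owner != 0) {
        return -EBUSY;
    }
    d->owner = who;
    return 0;
}

/* Models VHOST_RESET_OWNER: only the current owner may reset. */
int demo_reset_owner(struct demo_vhost *d, int who)
{
    if (d->owner != who) {
        return -EPERM;
    }
    d->owner = 0;
    return 0;
}

/* Like vhost_dev_reset_all(): reset each device, report failures, go on. */
int demo_reset_all(struct demo_vhost *devs, int n, int who)
{
    int failures = 0;

    for (int i = 0; i < n; i++) {
        if (demo_reset_owner(&devs[i], who) < 0) {
            fprintf(stderr, "reset of %s failed\n", devs[i].name);
            failures++;
        }
    }
    return failures;
}
```

In this model, id 1 is the process before exec and id 2 is the process after exec. Without the reset, id 2 gets -EBUSY from set_owner and -EPERM from reset_owner. With the reset issued by id 1, id 2 can take ownership normally.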

Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hw/virtio/vhost.c         | 11 +++++++++++
 include/hw/virtio/vhost.h |  1 +
 migration/cpr.c           |  1 +
 3 files changed, 13 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index e2163a0..8c0c9c3 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1820,6 +1820,17 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
     hdev->vdev = NULL;
 }
 
+void vhost_dev_reset_all(void)
+{
+    struct vhost_dev *dev;
+
+    QLIST_FOREACH(dev, &vhost_devices, entry) {
+        if (dev->vhost_ops->vhost_reset_device(dev) < 0) {
+            VHOST_OPS_DEBUG("vhost_reset_device failed");
+        }
+    }
+}
+
 int vhost_net_set_backend(struct vhost_dev *hdev,
                           struct vhost_vring_file *file)
 {
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 4a8bc75..71704d4 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -106,6 +106,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
 void vhost_dev_cleanup(struct vhost_dev *hdev);
 int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
+void vhost_dev_reset_all(void);
 int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
 
diff --git a/migration/cpr.c b/migration/cpr.c
index 578466c..6333988 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -142,6 +142,7 @@ void cprexec(strList *args, Error **errp)
         return;
     }
     walkenv(FD_PREFIX, preserve_fd, 0);
+    vhost_dev_reset_all();
     qemu_system_exec_request(args);
 }
 
-- 
1.8.3.1




* [PATCH V4 19/25] hostmem-memfd: cpr support
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (17 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 18/25] vhost: reset vhost devices upon cprsave Steve Sistare
@ 2021-07-06 16:45 ` Steve Sistare
  2021-07-06 16:45 ` [PATCH V4 20/25] chardev: cpr framework Steve Sistare
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-07-06 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Preserve memory-backend-memfd memory objects during cpr.
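The patch turns memfd_backend_memory_alloc() into a get-or-create. It first looks for an fd recorded under the backend's name. Only if none exists does it create a new memfd and record it. The sketch below shows that pattern with memfd_create(2). It assumes glibc 2.27 or later. The environment key is simply the name, which is an assumption, since the series' getenv_fd()/setenv_fd() may add a prefix. demo_memfd_get_or_create() is hypothetical. It omits the hugetlb and sealing options that qemu_memfd_create() handles.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Return the memfd recorded under 'name', or create one of 'size' bytes and
 * record it.  memfd_create() is called without MFD_CLOEXEC, so the fd stays
 * open across exec and the new process maps the same guest memory.
 */
int demo_memfd_get_or_create(const char *name, off_t size)
{
    const char *val = getenv(name);
    char buf[16];
    int fd;

    if (val) {
        return atoi(val);   /* previously recorded fd (no validation here) */
    }
    fd = memfd_create(name, 0);
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, size) < 0) {
        close(fd);
        return -1;
    }
    snprintf(buf, sizeof(buf), "%d", fd);
    if (setenv(name, buf, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The second call with the same name returns the first call's fd instead of allocating new memory. After exec, this is what lets the new qemu attach to the existing guest RAM.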

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 backends/hostmem-memfd.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
index 69b0ae3..3503c89 100644
--- a/backends/hostmem-memfd.c
+++ b/backends/hostmem-memfd.c
@@ -15,6 +15,7 @@
 #include "sysemu/sysemu.h"
 #include "qom/object_interfaces.h"
 #include "qemu/memfd.h"
+#include "qemu/env.h"
 #include "qemu/module.h"
 #include "qapi/error.h"
 #include "qom/object.h"
@@ -36,23 +37,25 @@ static void
 memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 {
     HostMemoryBackendMemfd *m = MEMORY_BACKEND_MEMFD(backend);
-    char *name;
-    int fd;
+    char *name = host_memory_backend_get_name(backend);
+    int fd = getenv_fd(name);
 
     if (!backend->size) {
         error_setg(errp, "can't create backend with size 0");
         return;
     }
 
-    fd = qemu_memfd_create(TYPE_MEMORY_BACKEND_MEMFD, backend->size,
-                           m->hugetlb, m->hugetlbsize, m->seal ?
-                           F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL : 0,
-                           errp);
-    if (fd == -1) {
-        return;
+    if (fd < 0) {
+        fd = qemu_memfd_create(TYPE_MEMORY_BACKEND_MEMFD, backend->size,
+                               m->hugetlb, m->hugetlbsize, m->seal ?
+                               F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL : 0,
+                               errp);
+        if (fd == -1) {
+            return;
+        }
+        setenv_fd(name, fd);
     }
 
-    name = host_memory_backend_get_name(backend);
     memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend),
                                    name, backend->size,
                                    backend->share, fd, 0, errp);
-- 
1.8.3.1




* [PATCH V4 20/25] chardev: cpr framework
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (18 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 19/25] hostmem-memfd: cpr support Steve Sistare
@ 2021-07-06 16:45 ` Steve Sistare
  2021-07-06 16:45 ` [PATCH V4 21/25] chardev: cpr for simple devices Steve Sistare
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-07-06 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add QEMU_CHAR_FEATURE_CPR for devices that support cpr.  Add the chardev
close-on-cpr option for devices that can be closed on cpr and reopened
after exec.  cpr is allowed only if every chardev in the configuration
either has QEMU_CHAR_FEATURE_CPR or has close-on-cpr set.
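The capability rule is an all-or-nothing scan. qemu_chr_cpr_capable() runs chr_cpr_capable() over every chardev through object_child_foreach(). That iteration stops as soon as a callback returns nonzero, so the first chardev that has neither the feature bit nor close-on-cpr blocks cprexec. The sketch below restates the check over a plain array. struct demo_chardev and demo_first_cpr_blocker() are hypothetical stand-ins for the Chardev fields the patch reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the Chardev fields the check looks at. */
struct demo_chardev {
    const char *label;
    bool feature_cpr;   /* backend set QEMU_CHAR_FEATURE_CPR */
    bool close_on_cpr;  /* user passed close-on-cpr=on */
};

/*
 * Return the index of the first chardev that can be neither preserved nor
 * closed across cpr, or -1 if cpr may proceed.  Stopping at the first
 * offender mirrors object_child_foreach() stopping on a nonzero return.
 */
int demo_first_cpr_blocker(const struct demo_chardev *devs, int n)
{
    for (int i = 0; i < n; i++) {
        if (!devs[i].feature_cpr && !devs[i].close_on_cpr) {
            fprintf(stderr, "chardev %s is not capable of cpr\n",
                    devs[i].label);
            return i;
        }
    }
    return -1;
}
```

A configuration with a cpr-capable monitor socket and a serial chardev opened with close-on-cpr=on passes. Adding a plain pty chardev makes the scan stop at that entry.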

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 chardev/char.c         | 41 ++++++++++++++++++++++++++++++++++++++---
 include/chardev/char.h |  5 +++++
 migration/cpr.c        |  3 +++
 qapi/char.json         |  5 ++++-
 qemu-options.hx        | 26 ++++++++++++++++++++++----
 5 files changed, 72 insertions(+), 8 deletions(-)

diff --git a/chardev/char.c b/chardev/char.c
index 398f09d..596d4f9 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -37,6 +37,7 @@
 #include "qemu/help_option.h"
 #include "qemu/module.h"
 #include "qemu/option.h"
+#include "qemu/env.h"
 #include "qemu/id.h"
 #include "qemu/coroutine.h"
 #include "qemu/yank.h"
@@ -240,6 +241,9 @@ static void qemu_char_open(Chardev *chr, ChardevBackend *backend,
     ChardevClass *cc = CHARDEV_GET_CLASS(chr);
     /* Any ChardevCommon member would work */
     ChardevCommon *common = backend ? backend->u.null.data : NULL;
+    char fdname[40];
+
+    chr->close_on_cpr = (common && common->close_on_cpr);
 
     if (common && common->has_logfile) {
         int flags = O_WRONLY | O_CREAT;
@@ -249,7 +253,14 @@ static void qemu_char_open(Chardev *chr, ChardevBackend *backend,
         } else {
             flags |= O_TRUNC;
         }
-        chr->logfd = qemu_open_old(common->logfile, flags, 0666);
+        snprintf(fdname, sizeof(fdname), "%s_log", chr->label);
+        chr->logfd = getenv_fd(fdname);
+        if (chr->logfd < 0) {
+            chr->logfd = qemu_open_old(common->logfile, flags, 0666);
+            if (chr->logfd >= 0 && !chr->close_on_cpr) {
+                setenv_fd(fdname, chr->logfd);
+            }
+        }
         if (chr->logfd < 0) {
             error_setg_errno(errp, errno,
                              "Unable to open logfile %s",
@@ -301,11 +312,12 @@ static void char_finalize(Object *obj)
     if (chr->be) {
         chr->be->chr = NULL;
     }
-    g_free(chr->filename);
-    g_free(chr->label);
     if (chr->logfd != -1) {
         close(chr->logfd);
+        unsetenv_fdv("%s_log", chr->label);
     }
+    g_free(chr->filename);
+    g_free(chr->label);
     qemu_mutex_destroy(&chr->chr_write_lock);
 }
 
@@ -505,6 +517,8 @@ void qemu_chr_parse_common(QemuOpts *opts, ChardevCommon *backend)
 
     backend->has_logappend = true;
     backend->logappend = qemu_opt_get_bool(opts, "logappend", false);
+
+    backend->close_on_cpr = qemu_opt_get_bool(opts, "close-on-cpr", false);
 }
 
 static const ChardevClass *char_get_class(const char *driver, Error **errp)
@@ -940,6 +954,9 @@ QemuOptsList qemu_chardev_opts = {
         },{
             .name = "abstract",
             .type = QEMU_OPT_BOOL,
+        },{
+            .name = "close-on-cpr",
+            .type = QEMU_OPT_BOOL,
 #endif
         },
         { /* end of list */ }
@@ -1207,6 +1224,24 @@ GSource *qemu_chr_timeout_add_ms(Chardev *chr, guint ms,
     return source;
 }
 
+static int chr_cpr_capable(Object *obj, void *opaque)
+{
+    Chardev *chr = (Chardev *)obj;
+    Error **errp = opaque;
+
+    if (qemu_chr_has_feature(chr, QEMU_CHAR_FEATURE_CPR) || chr->close_on_cpr) {
+        return 0;
+    }
+    error_setg(errp, "chardev %s -> %s is not capable of cpr",
+               chr->label, chr->filename);
+    return 1;
+}
+
+bool qemu_chr_cpr_capable(Error **errp)
+{
+    return !object_child_foreach(get_chardevs_root(), chr_cpr_capable, errp);
+}
+
 void qemu_chr_cleanup(void)
 {
     object_unparent(get_chardevs_root());
diff --git a/include/chardev/char.h b/include/chardev/char.h
index 7c0444f..e488ad1 100644
--- a/include/chardev/char.h
+++ b/include/chardev/char.h
@@ -50,6 +50,8 @@ typedef enum {
     /* Whether the gcontext can be changed after calling
      * qemu_chr_be_update_read_handlers() */
     QEMU_CHAR_FEATURE_GCONTEXT,
+    /* Whether the device supports cpr */
+    QEMU_CHAR_FEATURE_CPR,
 
     QEMU_CHAR_FEATURE_LAST,
 } ChardevFeature;
@@ -67,6 +69,7 @@ struct Chardev {
     int be_open;
     /* used to coordinate the chardev-change special-case: */
     bool handover_yank_instance;
+    bool close_on_cpr;
     GSource *gsource;
     GMainContext *gcontext;
     DECLARE_BITMAP(features, QEMU_CHAR_FEATURE_LAST);
@@ -291,4 +294,6 @@ void resume_mux_open(void);
 /* console.c */
 void qemu_chr_parse_vc(QemuOpts *opts, ChardevBackend *backend, Error **errp);
 
+bool qemu_chr_cpr_capable(Error **errp);
+
 #endif
diff --git a/migration/cpr.c b/migration/cpr.c
index 6333988..feff97f 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -138,6 +138,9 @@ void cprexec(strList *args, Error **errp)
         error_setg(errp, "cprexec requires cprsave with restart mode");
         return;
     }
+    if (!qemu_chr_cpr_capable(errp)) {
+        return;
+    }
     if (vfio_cprsave(errp)) {
         return;
     }
diff --git a/qapi/char.json b/qapi/char.json
index 6413970..194d88c 100644
--- a/qapi/char.json
+++ b/qapi/char.json
@@ -204,12 +204,15 @@
 # @logfile: The name of a logfile to save output
 # @logappend: true to append instead of truncate
 #             (default to false to truncate)
+# @close-on-cpr: if true, close the device's fd on cprsave (default: false)
+#                (since 6.1)
 #
 # Since: 2.6
 ##
 { 'struct': 'ChardevCommon',
   'data': { '*logfile': 'str',
-            '*logappend': 'bool' } }
+            '*logappend': 'bool',
+            '*close-on-cpr': 'bool' } }
 
 ##
 # @ChardevFile:
diff --git a/qemu-options.hx b/qemu-options.hx
index 3392ac0..ef2d24a 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3071,43 +3071,57 @@ DEFHEADING(Character device options:)
 
 DEF("chardev", HAS_ARG, QEMU_OPTION_chardev,
     "-chardev help\n"
-    "-chardev null,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "-chardev null,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off][,close-on-cpr=on|off]\n"
     "-chardev socket,id=id[,host=host],port=port[,to=to][,ipv4=on|off][,ipv6=on|off][,nodelay=on|off][,reconnect=seconds]\n"
     "         [,server=on|off][,wait=on|off][,telnet=on|off][,websocket=on|off][,reconnect=seconds][,mux=on|off]\n"
-    "         [,logfile=PATH][,logappend=on|off][,tls-creds=ID][,tls-authz=ID] (tcp)\n"
+    "         [,logfile=PATH][,logappend=on|off][,tls-creds=ID][,tls-authz=ID][,close-on-cpr=on|off] (tcp)\n"
     "-chardev socket,id=id,path=path[,server=on|off][,wait=on|off][,telnet=on|off][,websocket=on|off][,reconnect=seconds]\n"
-    "         [,mux=on|off][,logfile=PATH][,logappend=on|off][,abstract=on|off][,tight=on|off] (unix)\n"
+    "         [,mux=on|off][,logfile=PATH][,logappend=on|off][,abstract=on|off][,tight=on|off][,close-on-cpr=on|off] (unix)\n"
     "-chardev udp,id=id[,host=host],port=port[,localaddr=localaddr]\n"
     "         [,localport=localport][,ipv4=on|off][,ipv6=on|off][,mux=on|off]\n"
-    "         [,logfile=PATH][,logappend=on|off]\n"
+    "         [,logfile=PATH][,logappend=on|off][,close-on-cpr=on|off]\n"
     "-chardev msmouse,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev vc,id=id[[,width=width][,height=height]][[,cols=cols][,rows=rows]]\n"
     "         [,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev ringbuf,id=id[,size=size][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev file,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev pipe,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
 #ifdef _WIN32
     "-chardev console,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
     "-chardev serial,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
 #else
     "-chardev pty,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev stdio,id=id[,mux=on|off][,signal=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
 #endif
 #ifdef CONFIG_BRLAPI
     "-chardev braille,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
 #endif
 #if defined(__linux__) || defined(__sun__) || defined(__FreeBSD__) \
         || defined(__NetBSD__) || defined(__OpenBSD__) || defined(__DragonFly__)
     "-chardev serial,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev tty,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
 #endif
 #if defined(__linux__) || defined(__FreeBSD__) || defined(__DragonFly__)
     "-chardev parallel,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev parport,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
 #endif
 #if defined(CONFIG_SPICE)
     "-chardev spicevmc,id=id,name=name[,debug=debug][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev spiceport,id=id,name=name[,debug=debug][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
 #endif
     , QEMU_ARCH_ALL
 )
@@ -3182,6 +3196,10 @@ The general form of a character device option is:
     ``logappend`` option controls whether the log file will be truncated
     or appended to when opened.
 
+    Every backend supports the ``close-on-cpr`` option.  If on, the
+    device's descriptor is closed during cprsave and reopened after exec.
+    This is useful for devices that do not support cpr.
+
 The available backends are:
 
 ``-chardev null,id=id``
-- 
1.8.3.1
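The close-on-cpr option described above decides what happens to a backend's descriptor at cprexec time: it is either closed or inherited by the new qemu. The chardev framework code that acts on the flag is not shown here. The sketch below models the choice as the descriptor's FD_CLOEXEC flag, which is an assumption on our part, although the series does add a qemu_clr_cloexec helper. sketch_prepare_fd_for_exec and sketch_fd_survives_exec are hypothetical names, not QEMU functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not from the patch: a descriptor is either closed
 * by the kernel during execve() (close_on_cpr == true) or inherited by the
 * new process image (close_on_cpr == false).
 * Returns -1 on error.
 */
static int sketch_prepare_fd_for_exec(int fd, bool close_on_cpr)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFD);
    if (flags < 0) {
        return -1;
    }
    if (close_on_cpr) {
        flags |= FD_CLOEXEC;    /* closed automatically by execve() */
    } else {
        flags &= ~FD_CLOEXEC;   /* stays open in the new qemu image */
    }
    return fcntl(fd, F_SETFD, flags);
}

/* Returns true if fd would still be open after execve(). */
static bool sketch_fd_survives_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    return flags >= 0 && !(flags & FD_CLOEXEC);
}
```

Keeping a descriptor open is only half the job. The new process also has to learn which number belongs to which device, and the later patches in this series record that in the environment.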



* [PATCH V4 21/25] chardev: cpr for simple devices
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (19 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 20/25] chardev: cpr framework Steve Sistare
@ 2021-07-06 16:45 ` Steve Sistare
  2021-07-06 16:45 ` [PATCH V4 22/25] chardev: cpr for pty Steve Sistare
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-07-06 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Set QEMU_CHAR_FEATURE_CPR for devices that trivially support cpr.
char-stdio is slightly less trivial.  Allow the gdb server by
closing it on exec.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 chardev/char-mux.c     | 1 +
 chardev/char-null.c    | 1 +
 chardev/char-serial.c  | 1 +
 chardev/char-stdio.c   | 8 ++++++++
 gdbstub.c              | 1 +
 include/chardev/char.h | 1 +
 migration/cpr.c        | 1 +
 7 files changed, 14 insertions(+)

diff --git a/chardev/char-mux.c b/chardev/char-mux.c
index 72beef2..af74eaf 100644
--- a/chardev/char-mux.c
+++ b/chardev/char-mux.c
@@ -337,6 +337,7 @@ static void qemu_chr_open_mux(Chardev *chr,
      */
     *be_opened = muxes_opened;
     qemu_chr_fe_init(&d->chr, drv, errp);
+    qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_CPR);
 }
 
 static void qemu_chr_parse_mux(QemuOpts *opts, ChardevBackend *backend,
diff --git a/chardev/char-null.c b/chardev/char-null.c
index 1c6a290..02acaff 100644
--- a/chardev/char-null.c
+++ b/chardev/char-null.c
@@ -32,6 +32,7 @@ static void null_chr_open(Chardev *chr,
                           Error **errp)
 {
     *be_opened = false;
+    qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_CPR);
 }
 
 static void char_null_class_init(ObjectClass *oc, void *data)
diff --git a/chardev/char-serial.c b/chardev/char-serial.c
index 7c3d84a..b585085 100644
--- a/chardev/char-serial.c
+++ b/chardev/char-serial.c
@@ -274,6 +274,7 @@ static void qmp_chardev_open_serial(Chardev *chr,
     qemu_set_nonblock(fd);
     tty_serial_init(fd, 115200, 'N', 8, 1);
 
+    qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_CPR);
     qemu_chr_open_fd(chr, fd, fd);
 }
 #endif /* __linux__ || __sun__ */
diff --git a/chardev/char-stdio.c b/chardev/char-stdio.c
index 403da30..9410c16 100644
--- a/chardev/char-stdio.c
+++ b/chardev/char-stdio.c
@@ -114,9 +114,17 @@ static void qemu_chr_open_stdio(Chardev *chr,
 
     stdio_allow_signal = !opts->has_signal || opts->signal;
     qemu_chr_set_echo_stdio(chr, false);
+    qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_CPR);
 }
 #endif
 
+void qemu_term_exit(void)
+{
+#ifndef _WIN32
+    term_exit();
+#endif
+}
+
 static void qemu_chr_parse_stdio(QemuOpts *opts, ChardevBackend *backend,
                                  Error **errp)
 {
diff --git a/gdbstub.c b/gdbstub.c
index 054665e..fdbf531 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -3540,6 +3540,7 @@ int gdbserver_start(const char *device)
         mon_chr = gdbserver_state.mon_chr;
         reset_gdbserver_state();
     }
+    mon_chr->close_on_cpr = true;
 
     create_processes(&gdbserver_state);
 
diff --git a/include/chardev/char.h b/include/chardev/char.h
index e488ad1..96e5570 100644
--- a/include/chardev/char.h
+++ b/include/chardev/char.h
@@ -295,5 +295,6 @@ void resume_mux_open(void);
 void qemu_chr_parse_vc(QemuOpts *opts, ChardevBackend *backend, Error **errp);
 
 bool qemu_chr_cpr_capable(Error **errp);
+void qemu_term_exit(void);
 
 #endif
diff --git a/migration/cpr.c b/migration/cpr.c
index feff97f..4600d8c 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -146,6 +146,7 @@ void cprexec(strList *args, Error **errp)
     }
     walkenv(FD_PREFIX, preserve_fd, 0);
     vhost_dev_reset_all();
+    qemu_term_exit();
     qemu_system_exec_request(args);
 }
 
-- 
1.8.3.1
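Patch 21 makes cprexec call qemu_term_exit() just before the exec request, so that a stdio chardev's terminal settings are restored before the new binary starts. Presumably this matters because the new process treats whatever settings it inherits as the "original" ones. That is our reading and is not stated in the patch. The save/restore pairing can be sketched with POSIX termios. The sketch_* names are hypothetical and only approximate what term_exit() in char-stdio.c does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <termios.h>

static struct termios sketch_old_tty;   /* settings captured at startup */
static bool sketch_have_old_tty;

/* Capture the terminal settings once. Fails harmlessly on a non-tty fd. */
static int sketch_term_init(int fd)
{
    assert(fd >= 0);
    if (tcgetattr(fd, &sketch_old_tty) != 0) {
        return -1;
    }
    sketch_have_old_tty = true;
    return 0;
}

/*
 * Restore the captured settings, e.g. just before exec, so that the next
 * process image starts from the original (non-raw) terminal mode.
 */
static int sketch_term_exit(int fd)
{
    if (!sketch_have_old_tty) {
        return 0;               /* nothing captured, nothing to undo */
    }
    return tcsetattr(fd, TCSANOW, &sketch_old_tty);
}
```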



* [PATCH V4 22/25] chardev: cpr for pty
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (20 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 21/25] chardev: cpr for simple devices Steve Sistare
@ 2021-07-06 16:45 ` Steve Sistare
  2021-07-06 16:45 ` [PATCH V4 23/25] chardev: cpr for sockets Steve Sistare
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-07-06 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Save and restore pty descriptors across cprsave and cprload.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 chardev/char-pty.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/chardev/char-pty.c b/chardev/char-pty.c
index a2d1e7c..c91151d 100644
--- a/chardev/char-pty.c
+++ b/chardev/char-pty.c
@@ -30,6 +30,7 @@
 #include "qemu/sockets.h"
 #include "qemu/error-report.h"
 #include "qemu/module.h"
+#include "qemu/env.h"
 #include "qemu/qemu-print.h"
 
 #include "chardev/char-io.h"
@@ -191,6 +192,7 @@ static void char_pty_finalize(Object *obj)
     Chardev *chr = CHARDEV(obj);
     PtyChardev *s = PTY_CHARDEV(obj);
 
+    unsetenv_fd(chr->label);
     pty_chr_state(chr, 0);
     object_unref(OBJECT(s->ioc));
     pty_chr_timer_cancel(s);
@@ -207,19 +209,28 @@ static void char_pty_open(Chardev *chr,
     char pty_name[PATH_MAX];
     char *name;
 
+    master_fd = getenv_fd(chr->label);
+    if (master_fd >= 0) {
+        chr->filename = g_strdup("pty:unknown");
+        goto have_fd;
+    }
+
     master_fd = qemu_openpty_raw(&slave_fd, pty_name);
     if (master_fd < 0) {
         error_setg_errno(errp, errno, "Failed to create PTY");
         return;
     }
-
+    if (!chr->close_on_cpr) {
+        setenv_fd(chr->label, master_fd);
+    }
     close(slave_fd);
     qemu_set_nonblock(master_fd);
-
     chr->filename = g_strdup_printf("pty:%s", pty_name);
     qemu_printf("char device redirected to %s (label %s)\n",
                 pty_name, chr->label);
 
+have_fd:
+    qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_CPR);
     s = PTY_CHARDEV(chr);
     s->ioc = QIO_CHANNEL(qio_channel_file_new_fd(master_fd));
     name = g_strdup_printf("chardev-pty-%s", chr->label);
-- 
1.8.3.1
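With this patch, char_pty_open first looks up a descriptor recorded under the chardev label. It calls qemu_openpty_raw only if none is found, and records the new master fd unless close-on-cpr is set. The pty name cannot be recovered from an inherited fd, so the filename becomes "pty:unknown". The sketch below mimics that lookup-or-create flow with plain setenv/getenv. The variable prefix and all sketch_* names are hypothetical stand-ins for the series' setenv_fd/getenv_fd/unsetenv_fd helpers, whose exact encoding is not shown here. /dev/null stands in for the pty master.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical prefix. The series' real variable naming is not shown. */
#define SKETCH_FD_PREFIX "SKETCH_FD_"

/* Record fd under name in the environment, which exec passes along. */
static int sketch_setenv_fd(const char *name, int fd)
{
    char var[256], val[16];

    assert(fd >= 0);
    snprintf(var, sizeof(var), SKETCH_FD_PREFIX "%s", name);
    snprintf(val, sizeof(val), "%d", fd);
    return setenv(var, val, 1);
}

/* Return the fd recorded under name, or -1 if none (or malformed). */
static int sketch_getenv_fd(const char *name)
{
    char var[256];
    const char *val;
    char *end;
    long fd;

    snprintf(var, sizeof(var), SKETCH_FD_PREFIX "%s", name);
    val = getenv(var);
    if (!val) {
        return -1;
    }
    errno = 0;
    fd = strtol(val, &end, 10);
    if (errno || end == val || *end || fd < 0 || fd > INT_MAX) {
        return -1;
    }
    return (int)fd;
}

static void sketch_unsetenv_fd(const char *name)
{
    char var[256];

    snprintf(var, sizeof(var), SKETCH_FD_PREFIX "%s", name);
    unsetenv(var);
}

/*
 * Lookup-or-create, as in the reworked char_pty_open(): reuse an inherited
 * fd if one was recorded, otherwise open a new one and record it unless
 * the device is close-on-cpr. open() without O_CLOEXEC leaves the fd
 * inheritable across exec.
 */
static int sketch_open_or_reuse(const char *label, bool close_on_cpr)
{
    int fd = sketch_getenv_fd(label);

    if (fd >= 0) {
        return fd;              /* inherited across exec */
    }
    fd = open("/dev/null", O_RDWR);
    if (fd >= 0 && !close_on_cpr) {
        sketch_setenv_fd(label, fd);
    }
    return fd;
}
```

The matching unsetenv_fd call in char_pty_finalize keeps a stale entry from being picked up by a device created later with the same label.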



* [PATCH V4 23/25] chardev: cpr for sockets
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (21 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 22/25] chardev: cpr for pty Steve Sistare
@ 2021-07-06 16:45 ` Steve Sistare
  2021-07-06 16:45 ` [PATCH V4 24/25] cpr: only-cpr-capable option Steve Sistare
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-07-06 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Save accepted socket fds in the environment before cprsave, and look for
fds in the environment after cprload.  Reject cprexec if a socket enables
the TLS or websocket option.  Allow a monitor socket by closing it on exec.

Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 chardev/char-socket.c | 31 +++++++++++++++++++++++++++++++
 monitor/hmp.c         |  3 +++
 monitor/qmp.c         |  3 +++
 3 files changed, 37 insertions(+)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index daa89fe..110f263 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -27,7 +27,9 @@
 #include "io/channel-socket.h"
 #include "io/channel-tls.h"
 #include "io/channel-websock.h"
+#include "qemu/env.h"
 #include "io/net-listener.h"
+#include "qemu/env.h"
 #include "qemu/error-report.h"
 #include "qemu/module.h"
 #include "qemu/option.h"
@@ -414,6 +416,7 @@ static void tcp_chr_free_connection(Chardev *chr)
     SocketChardev *s = SOCKET_CHARDEV(chr);
     int i;
 
+    unsetenv_fd(chr->label);
     if (s->read_msgfds_num) {
         for (i = 0; i < s->read_msgfds_num; i++) {
             close(s->read_msgfds[i]);
@@ -976,6 +979,10 @@ static void tcp_chr_accept(QIONetListener *listener,
                                QIO_CHANNEL(cioc));
     }
     tcp_chr_new_client(chr, cioc);
+
+    if (s->sioc && !chr->close_on_cpr) {
+        setenv_fd(chr->label, s->sioc->fd);
+    }
 }
 
 
@@ -1231,6 +1238,24 @@ static gboolean socket_reconnect_timeout(gpointer opaque)
     return false;
 }
 
+static void load_char_socket_fd(Chardev *chr, Error **errp)
+{
+    SocketChardev *sockchar = SOCKET_CHARDEV(chr);
+    QIOChannelSocket *sioc;
+    int fd = getenv_fd(chr->label);
+
+    if (fd != -1) {
+        sockchar = SOCKET_CHARDEV(chr);
+        sioc = qio_channel_socket_new_fd(fd, errp);
+        if (sioc) {
+            tcp_chr_accept(sockchar->listener, sioc, chr);
+            object_unref(OBJECT(sioc));
+        } else {
+            error_setg(errp, "could not restore socket for %s",
+                       chr->label);
+        }
+    }
+}
 
 static int qmp_chardev_open_socket_server(Chardev *chr,
                                           bool is_telnet,
@@ -1441,6 +1466,10 @@ static void qmp_chardev_open_socket(Chardev *chr,
     }
     s->registered_yank = true;
 
+    if (!s->tls_creds && !s->is_websock) {
+        qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_CPR);
+    }
+
     /* be isn't opened until we get a connection */
     *be_opened = false;
 
@@ -1456,6 +1485,8 @@ static void qmp_chardev_open_socket(Chardev *chr,
             return;
         }
     }
+
+    load_char_socket_fd(chr, errp);
 }
 
 static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend,
diff --git a/monitor/hmp.c b/monitor/hmp.c
index 6c0b33a..63700b3 100644
--- a/monitor/hmp.c
+++ b/monitor/hmp.c
@@ -1451,4 +1451,7 @@ void monitor_init_hmp(Chardev *chr, bool use_readline, Error **errp)
     qemu_chr_fe_set_handlers(&mon->common.chr, monitor_can_read, monitor_read,
                              monitor_event, NULL, &mon->common, NULL, true);
     monitor_list_append(&mon->common);
+
+    /* monitor cannot yet be preserved across cpr */
+    chr->close_on_cpr = true;
 }
diff --git a/monitor/qmp.c b/monitor/qmp.c
index 2b0308f..495d68f 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -531,4 +531,7 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
                                  NULL, &mon->common, NULL, true);
         monitor_list_append(&mon->common);
     }
+
+    /* Monitor cannot yet be preserved across cpr */
+    chr->close_on_cpr = true;
 }
-- 
1.8.3.1
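load_char_socket_fd passes any fd found in the environment directly to qio_channel_socket_new_fd and then to tcp_chr_accept. If an explicit sanity check were wanted first, a minimal sketch could confirm that the inherited number still refers to a stream socket. sketch_is_stream_socket is hypothetical and is not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>

/*
 * Hypothetical check, not taken from the patch: before adopting an fd
 * found in the environment, confirm that it is still a stream socket.
 */
static bool sketch_is_stream_socket(int fd)
{
    int type = 0;
    socklen_t len = sizeof(type);

    if (fd < 0) {
        return false;
    }
    /* Fails with ENOTSOCK for pipes, files, or ttys, and EBADF if closed. */
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) != 0) {
        return false;
    }
    assert(len == sizeof(type));   /* SO_TYPE reports an int */
    return type == SOCK_STREAM;
}
```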



* [PATCH V4 24/25] cpr: only-cpr-capable option
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (22 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 23/25] chardev: cpr for sockets Steve Sistare
@ 2021-07-06 16:45 ` Steve Sistare
  2021-07-06 16:45 ` [PATCH V4 25/25] simplify savevm Steve Sistare
  2021-07-07 12:31 ` [PATCH V4 00/25] Live Update Marc-André Lureau
  25 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-07-06 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add the only-cpr-capable option, which causes qemu to exit with an error
if any devices that are not capable of cpr are added.  This guarantees that
a cprexec operation will not fail with an unsupported device error.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 MAINTAINERS             |  1 +
 chardev/char-socket.c   |  4 ++++
 hw/vfio/common.c        |  5 +++++
 hw/vfio/pci.c           |  5 +++++
 include/sysemu/sysemu.h |  1 +
 migration/migration.c   |  5 +++++
 qemu-options.hx         |  8 ++++++++
 softmmu/globals.c       |  1 +
 softmmu/physmem.c       |  4 ++++
 softmmu/vl.c            | 14 +++++++++++++-
 stubs/cpr.c             |  3 +++
 stubs/meson.build       |  1 +
 12 files changed, 51 insertions(+), 1 deletion(-)
 create mode 100644 stubs/cpr.c

diff --git a/MAINTAINERS b/MAINTAINERS
index b69bbf5..25b5bc3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2853,6 +2853,7 @@ F: migration/cpr.c
 F: qapi/cpr.json
 F: include/qemu/env.h
 F: util/env.c
+F: stubs/cpr.c
 
 Record/replay
 M: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 110f263..b8c75ff 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -40,6 +40,7 @@
 
 #include "chardev/char-io.h"
 #include "qom/object.h"
+#include "sysemu/sysemu.h"
 
 /***********************************************************/
 /* TCP Net console */
@@ -1468,6 +1469,9 @@ static void qmp_chardev_open_socket(Chardev *chr,
 
     if (!s->tls_creds && !s->is_websock) {
         qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_CPR);
+    } else if (only_cpr_capable) {
+        error_setg(errp, "socket %s is not cpr capable due to %s option",
+                   chr->label, (s->tls_creds ? "TLS" : "websocket"));
     }
 
     /* be isn't opened until we get a connection */
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 40c882f..09d5e6e 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -37,6 +37,7 @@
 #include "qemu/range.h"
 #include "sysemu/kvm.h"
 #include "sysemu/reset.h"
+#include "sysemu/sysemu.h"
 #include "trace.h"
 #include "qapi/error.h"
 #include "migration/migration.h"
@@ -1601,6 +1602,10 @@ static int vfio_get_iommu_type(VFIOContainer *container,
 
     for (i = 0; i < ARRAY_SIZE(iommu_types); i++) {
         if (ioctl(container->fd, VFIO_CHECK_EXTENSION, iommu_types[i])) {
+            if (only_cpr_capable && !vfio_cpr_capable(container, errp)) {
+                error_prepend(errp, "only-cpr-capable is specified: ");
+                return -EINVAL;
+            }
             return iommu_types[i];
         }
     }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 492ff31..706fc0b 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -266,6 +266,11 @@ static int vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
 
     if (!pin) {
         return 0;
+    } else if (only_cpr_capable) {
+        error_setg(errp,
+                   "%s: vfio-pci INTX is not compatible with -only-cpr-capable",
+                   vdev->vbasedev.name);
+        return -1;
     }
 
     vfio_disable_interrupts(vdev);
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 8fae667..6241c20 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -9,6 +9,7 @@
 /* vl.c */
 
 extern int only_migratable;
+extern bool only_cpr_capable;
 extern const char *qemu_name;
 extern QemuUUID qemu_uuid;
 extern bool qemu_uuid_set;
diff --git a/migration/migration.c b/migration/migration.c
index 8ca0341..181c8d5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1262,6 +1262,11 @@ static bool migrate_caps_check(bool *cap_list,
         }
     }
 
+    if (cap_list[MIGRATION_CAPABILITY_X_COLO] && only_cpr_capable) {
+        error_setg(errp, "x-colo is not compatible with -only-cpr-capable");
+        return false;
+    }
+
     return true;
 }
 
diff --git a/qemu-options.hx b/qemu-options.hx
index ef2d24a..f1b372b 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4257,6 +4257,14 @@ SRST
     an unmigratable state.
 ERST
 
+DEF("only-cpr-capable", 0, QEMU_OPTION_only_cpr_capable, \
+    "-only-cpr-capable    allow only cpr capable devices\n", QEMU_ARCH_ALL)
+SRST
+``-only-cpr-capable``
+    Only allow cpr capable devices, which guarantees that cprsave will not
+    fail with an unsupported device error.
+ERST
+
 DEF("nodefaults", 0, QEMU_OPTION_nodefaults, \
     "-nodefaults     don't create default devices\n", QEMU_ARCH_ALL)
 SRST
diff --git a/softmmu/globals.c b/softmmu/globals.c
index 7d0fc81..a18fd8d 100644
--- a/softmmu/globals.c
+++ b/softmmu/globals.c
@@ -59,6 +59,7 @@ int boot_menu;
 bool boot_strict;
 uint8_t *boot_splash_filedata;
 int only_migratable; /* turn it off unless user states otherwise */
+bool only_cpr_capable;
 int icount_align_option;
 
 /* The bytes in qemu_uuid are in the order specified by RFC4122, _not_ in the
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index b79f408..04e3603 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1973,6 +1973,10 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
                 addr = file_ram_alloc(new_block, maxlen, mfd,
                                       false, false, 0, errp);
                 trace_anon_memfd_alloc(name, maxlen, addr, mfd);
+            } else if (only_cpr_capable) {
+                error_setg(errp,
+                    "only-cpr-capable requires -machine memfd-alloc=on");
+                return;
             } else {
                 addr = qemu_anon_ram_alloc(maxlen, &mr->align, shared);
             }
diff --git a/softmmu/vl.c b/softmmu/vl.c
index aadb526..78f8829 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2589,6 +2589,10 @@ void qmp_x_exit_preconfig(Error **errp)
     qemu_create_cli_devices();
     qemu_machine_creation_done();
 
+    if (only_cpr_capable && !qemu_chr_cpr_capable(errp)) {
+        ;    /* not reached due to error_fatal */
+    }
+
     if (loadvm) {
         Error *local_err = NULL;
         if (!load_snapshot(loadvm, NULL, false, NULL, &local_err)) {
@@ -2598,7 +2602,12 @@ void qmp_x_exit_preconfig(Error **errp)
         }
     }
     if (replay_mode != REPLAY_MODE_NONE) {
-        replay_vmstate_init();
+        if (only_cpr_capable) {
+            error_setg(errp, "replay is not compatible with -only-cpr-capable");
+            /* not reached due to error_fatal */
+        } else {
+            replay_vmstate_init();
+        }
     }
 
     if (incoming) {
@@ -3339,6 +3348,9 @@ void qemu_init(int argc, char **argv, char **envp)
             case QEMU_OPTION_only_migratable:
                 only_migratable = 1;
                 break;
+            case QEMU_OPTION_only_cpr_capable:
+                only_cpr_capable = true;
+                break;
             case QEMU_OPTION_nodefaults:
                 has_defaults = 0;
                 break;
diff --git a/stubs/cpr.c b/stubs/cpr.c
new file mode 100644
index 0000000..aaa189e
--- /dev/null
+++ b/stubs/cpr.c
@@ -0,0 +1,3 @@
+#include "qemu/osdep.h"
+
+bool only_cpr_capable;
diff --git a/stubs/meson.build b/stubs/meson.build
index be6f6d6..2003c77 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -5,6 +5,7 @@ stub_ss.add(files('blk-exp-close-all.c'))
 stub_ss.add(files('blockdev-close-all-bdrv-states.c'))
 stub_ss.add(files('change-state-handler.c'))
 stub_ss.add(files('cmos.c'))
+stub_ss.add(files('cpr.c'))
 stub_ss.add(files('cpu-get-clock.c'))
 stub_ss.add(files('cpus-get-virtual-clock.c'))
 stub_ss.add(files('qemu-timer-notify-cb.c'))
-- 
1.8.3.1
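The only-cpr-capable patch applies a single pattern in several places: TLS/websocket sockets, vfio-pci INTx, anonymous guest RAM, x-colo, and replay. When the global flag is set, a feature that cannot survive cpr is rejected at the moment it is created, rather than later when cprsave or cprexec runs. A reduced sketch of that pattern follows. The struct, flag, and function names are hypothetical and stand in for the per-subsystem checks in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for only_cpr_capable in softmmu/globals.c. */
static bool sketch_only_cpr_capable;

struct sketch_device {
    const char *name;
    bool cpr_capable;
};

/*
 * Hypothetical add-device hook: with the option on, refuse a device that
 * cannot survive cpr, so the error surfaces at startup. Returns 0 on
 * success and -1 (with a message in err) on rejection.
 */
static int sketch_device_add(const struct sketch_device *dev,
                             char *err, size_t errlen)
{
    assert(dev && err && errlen > 0);
    if (sketch_only_cpr_capable && !dev->cpr_capable) {
        snprintf(err, errlen, "%s is not compatible with -only-cpr-capable",
                 dev->name);
        return -1;
    }
    err[0] = '\0';
    return 0;
}
```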



* [PATCH V4 25/25] simplify savevm
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (23 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 24/25] cpr: only-cpr-capable option Steve Sistare
@ 2021-07-06 16:45 ` Steve Sistare
  2021-07-07 12:31 ` [PATCH V4 00/25] Live Update Marc-André Lureau
  25 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-07-06 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Use qf_file_open to simplify a few functions in savevm.c.
No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 migration/savevm.c | 21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 52e2d72..d02bce2 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2904,8 +2904,9 @@ bool save_snapshot(const char *name, bool overwrite, const char *vmstate,
 void qmp_xen_save_devices_state(const char *filename, bool has_live, bool live,
                                 Error **errp)
 {
+    const char *ioc_name = "migration-xen-save-state";
+    int flags = O_WRONLY | O_CREAT | O_TRUNC;
     QEMUFile *f;
-    QIOChannelFile *ioc;
     int saved_vm_running;
     int ret;
 
@@ -2919,14 +2920,10 @@ void qmp_xen_save_devices_state(const char *filename, bool has_live, bool live,
     vm_stop(RUN_STATE_SAVE_VM);
     global_state_store_running();
 
-    ioc = qio_channel_file_new_path(filename, O_WRONLY | O_CREAT | O_TRUNC,
-                                    0660, errp);
-    if (!ioc) {
+    f = qf_file_open(filename, flags, 0660, ioc_name, errp);
+    if (!f) {
         goto the_end;
     }
-    qio_channel_set_name(QIO_CHANNEL(ioc), "migration-xen-save-state");
-    f = qemu_fopen_channel_output(QIO_CHANNEL(ioc));
-    object_unref(OBJECT(ioc));
     ret = qemu_save_device_state(f);
     if (ret < 0 || qemu_fclose(f) < 0) {
         error_setg(errp, QERR_IO_ERROR);
@@ -2954,8 +2951,8 @@ void qmp_xen_save_devices_state(const char *filename, bool has_live, bool live,
 
 void qmp_xen_load_devices_state(const char *filename, Error **errp)
 {
+    const char *ioc_name = "migration-xen-load-state";
     QEMUFile *f;
-    QIOChannelFile *ioc;
     int ret;
 
     /* Guest must be paused before loading the device state; the RAM state
@@ -2967,14 +2964,10 @@ void qmp_xen_load_devices_state(const char *filename, Error **errp)
     }
     vm_stop(RUN_STATE_RESTORE_VM);
 
-    ioc = qio_channel_file_new_path(filename, O_RDONLY | O_BINARY, 0, errp);
-    if (!ioc) {
+    f = qf_file_open(filename, O_RDONLY | O_BINARY, 0, ioc_name, errp);
+    if (!f) {
         return;
     }
-    qio_channel_set_name(QIO_CHANNEL(ioc), "migration-xen-load-state");
-    f = qemu_fopen_channel_input(QIO_CHANNEL(ioc));
-    object_unref(OBJECT(ioc));
-
     ret = qemu_loadvm_state(f);
     qemu_fclose(f);
     if (ret < 0) {
-- 
1.8.3.1



* Re: [PATCH V4 00/25] Live Update
  2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
                   ` (24 preceding siblings ...)
  2021-07-06 16:45 ` [PATCH V4 25/25] simplify savevm Steve Sistare
@ 2021-07-07 12:31 ` Marc-André Lureau
  2021-07-07 12:36   ` Steven Sistare
  25 siblings, 1 reply; 29+ messages in thread
From: Marc-André Lureau @ 2021-07-07 12:31 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

Hi Steve

On Tue, Jul 6, 2021 at 8:58 PM Steve Sistare <steven.sistare@oracle.com>
wrote:

> Provide the cprsave, cprexec, and cprload commands for live update.  These
> save and restore VM state, with minimal guest pause time, so that qemu may
> be updated to a new version in between.
>
> cprsave stops the VM and saves vmstate to an ordinary file.  It supports any
> type of guest image and block device, but the caller must not modify guest
> block devices between cprsave and cprload.  It supports two modes: reboot
> and restart.
>
> In reboot mode, the caller invokes cprsave and then terminates qemu.
> The caller may then update the host kernel and system software and reboot.
> The caller resumes the guest by running qemu with the same arguments as the
> original process and invoking cprload.  To use this mode, guest ram must be
> mapped to a persistent shared memory file such as /dev/dax0.0, or /dev/shm
> PKRAM as proposed in
> https://lore.kernel.org/lkml/1617140178-8773-1-git-send-email-anthony.yznaga@oracle.com.
>
> The reboot mode supports vfio devices if the caller first suspends the
> guest, such as by issuing guest-suspend-ram to the qemu guest agent.  The
> guest drivers' suspend methods flush outstanding requests and re-initialize
> the devices, and thus there is no device state to save and restore.
>
> Restart mode preserves the guest VM across a restart of the qemu process.
> After cprsave, the caller passes qemu command-line arguments to cprexec,
> which directly execs the new qemu binary.  The arguments must include -S
> so new qemu starts in a paused state and waits for the cprload command.
> The restart mode supports vfio devices by preserving the vfio container,
> group, device, and event descriptors across the qemu re-exec, and by
> updating DMA mapping virtual addresses using VFIO_DMA_UNMAP_FLAG_VADDR and
> VFIO_DMA_MAP_FLAG_VADDR as defined in
> https://lore.kernel.org/kvm/1611939252-7240-1-git-send-email-steven.sistare@oracle.com/
> and integrated in Linux kernel 5.12.
>
> To use the restart mode, qemu must be started with the memfd-alloc option,
> which allocates guest ram using memfd_create.  The memfds are saved to
> the environment and kept open across exec, after which they are found from
> the environment and re-mmap'd.  Hence guest ram is preserved in place,
> albeit with new virtual addresses in the qemu process.
>
> The caller resumes the guest by invoking cprload, which loads state from
> the file.  If the VM was running at cprsave time, then VM execution resumes.
> If the VM was suspended at cprsave time (reboot mode), then the caller must
> issue a system_wakeup command to resume.
>
> The first patches add reboot mode:
>   - qemu_ram_volatile
>   - cpr: reboot mode
>   - cpr: QMP interfaces for reboot
>   - cpr: HMP interfaces for reboot
>
> The next patches add restart mode:
>   - as_flat_walk
>   - oslib: qemu_clr_cloexec
>   - machine: memfd-alloc option
>   - vl: add helper to request re-exec
>   - string to strList
>   - util: env var helpers
>   - cpr: restart mode
>   - cpr: QMP interfaces for restart
>   - cpr: HMP interfaces for restart
>
> The next patches add vfio support for restart mode:
>   - pci: export functions for cpr
>   - vfio-pci: refactor for cpr
>   - vfio-pci: cpr part 1
>   - vfio-pci: cpr part 2
>
> The next patches preserve various descriptor-based backend devices across
> cprexec:
>   - vhost: reset vhost devices upon cprsave
>   - hostmem-memfd: cpr support
>   - chardev: cpr framework
>   - chardev: cpr for simple devices
>   - chardev: cpr for pty
>   - chardev: cpr for sockets
>   - cpr: only-cpr-capable option
>   - simplify savevm
>
> Here is an example of updating qemu from v4.2.0 to v4.2.1 using
> restart mode.  The software update is performed while the guest is
> running to minimize downtime.
>
> window 1                                        | window 2
>                                                 |
> # qemu-system-x86_64 ...                        |
> QEMU 4.2.0 monitor - type 'help' ...            |
> (qemu) info status                              |
> VM status: running                              |
>                                                 | # yum update qemu
> (qemu) cprsave /tmp/qemu.sav restart            |
> (qemu) cprexec qemu-system-x86_64 -S ...        |
> QEMU 4.2.1 monitor - type 'help' ...            |
> (qemu) info status                              |
> VM status: paused (prelaunch)                   |
> (qemu) cprload /tmp/qemu.sav                    |
> (qemu) info status                              |
> VM status: running                              |
>
>
> Here is an example of updating the host kernel using reboot mode.
>
> window 1                                        | window 2
>                                                 |
> # qemu-system-x86_64 ...mem-path=/dev/dax0.0 ...|
> QEMU 4.2.1 monitor - type 'help' ...            |
> (qemu) info status                              |
> VM status: running                              |
>                                                 | # yum update kernel-uek
> (qemu) cprsave /tmp/qemu.sav reboot             |
> (qemu) quit                                     |
>                                                 |
> # systemctl kexec                               |
> kexec_core: Starting new kernel                 |
> ...                                             |
>                                                 |
> # qemu-system-x86_64 -S mem-path=/dev/dax0.0 ...|
> QEMU 4.2.1 monitor - type 'help' ...            |
> (qemu) info status                              |
> VM status: paused (prelaunch)                   |
> (qemu) cprload /tmp/qemu.sav                    |
> (qemu) info status                              |
> VM status: running                              |
>
> Changes from V1 to V2:
>   - revert vmstate infrastructure changes
>   - refactor cpr functions into new files
>   - delete MADV_DOEXEC and use memfd + VFIO_DMA_UNMAP_FLAG_SUSPEND to
>     preserve memory.
>   - add framework to filter chardevs that support cpr
>   - save and restore vfio eventfd's
>   - modify cprinfo QMP interface
>   - incorporate misc review feedback
>   - remove unrelated and unneeded patches
>   - refactor all patches into a shorter and easier to review series
>
> Changes from V2 to V3:
>   - rebase to qemu 6.0.0
>   - use final definition of vfio ioctls (VFIO_DMA_UNMAP_FLAG_VADDR etc)
>   - change memfd-alloc to a machine option
>   - use qio_channel_socket_new_fd instead of adding qio_channel_socket_new_fd
>   - close monitor socket during cpr
>   - fix a few unreported bugs
>   - support memory-backend-memfd
>
> Changes from V3 to V4:
>   - split reboot mode into separate patches
>   - add cprexec command
>   - delete QEMU_START_FREEZE, argv_main, and /usr/bin/qemu-exec
>   - add more checks for vfio and cpr compatibility, and recover after errors
>   - save vfio pci config in vmstate
>   - rename {setenv,getenv}_event_fd to {save,load}_event_fd
>   - use qemu_strtol
>   - change 6.0 references to 6.1
>   - use strerror(), use EXIT_FAILURE, remove period from error messages
>   - distribute MAINTAINERS additions to each patch
>
> Steve Sistare (21):
>   qemu_ram_volatile
>   cpr: reboot mode
>   as_flat_walk
>   oslib: qemu_clr_cloexec
>   machine: memfd-alloc option
>   vl: add helper to request re-exec
>   string to strList
>   util: env var helpers
>   cpr: restart mode
>   cpr: QMP interfaces for restart
>   cpr: HMP interfaces for restart
>   pci: export functions for cpr
>   vfio-pci: refactor for cpr
>   vfio-pci: cpr part 1
>   vfio-pci: cpr part 2
>   hostmem-memfd: cpr support
>   chardev: cpr framework
>   chardev: cpr for simple devices
>   chardev: cpr for pty
>   cpr: only-cpr-capable option
>   simplify savevm
>
> Mark Kanda, Steve Sistare (4):
>   cpr: QMP interfaces for reboot
>   cpr: HMP interfaces for reboot
>   vhost: reset vhost devices upon cprsave
>   chardev: cpr for sockets
>
>  MAINTAINERS                   |  12 +++
>  backends/hostmem-memfd.c      |  21 ++--
>  chardev/char-mux.c            |   1 +
>  chardev/char-null.c           |   1 +
>  chardev/char-pty.c            |  15 ++-
>  chardev/char-serial.c         |   1 +
>  chardev/char-socket.c         |  35 +++++++
>  chardev/char-stdio.c          |   8 ++
>  chardev/char.c                |  41 +++++++-
>  gdbstub.c                     |   1 +
>  hmp-commands.hx               |  62 ++++++++++++
>  hw/core/machine.c             |  19 ++++
>  hw/pci/msix.c                 |  20 ++--
>  hw/pci/pci.c                  |   7 +-
>  hw/vfio/common.c              |  78 ++++++++++++--
>  hw/vfio/cpr.c                 | 154 ++++++++++++++++++++++++++++
>  hw/vfio/meson.build           |   1 +
>  hw/vfio/pci.c                 | 230 +++++++++++++++++++++++++++++++++++++++---
>  hw/vfio/trace-events          |   1 +
>  hw/virtio/vhost.c             |  11 ++
>  include/chardev/char.h        |   6 ++
>  include/exec/memory.h         |  25 +++++
>  include/hw/boards.h           |   1 +
>  include/hw/pci/msix.h         |   5 +
>  include/hw/pci/pci.h          |   2 +
>  include/hw/vfio/vfio-common.h |   8 ++
>  include/hw/virtio/vhost.h     |   1 +
>  include/migration/cpr.h       |  20 ++++
>  include/monitor/hmp.h         |   4 +
>  include/qemu/env.h            |  23 +++++
>  include/qemu/osdep.h          |   1 +
>  include/sysemu/runstate.h     |   2 +
>  include/sysemu/sysemu.h       |   1 +
>  linux-headers/linux/vfio.h    |  27 +++++
>  migration/cpr.c               | 195 +++++++++++++++++++++++++++++++++++
>  migration/meson.build         |   1 +
>  migration/migration.c         |   5 +
>  migration/savevm.c            |  21 ++--
>  migration/savevm.h            |   2 +
>  monitor/hmp-cmds.c            |  75 ++++++++++++--
>  monitor/hmp.c                 |   3 +
>  monitor/qmp-cmds.c            |  36 +++++++
>  monitor/qmp.c                 |   3 +
>  qapi/char.json                |   5 +-
>  qapi/cpr.json                 |  88 ++++++++++++++++
>  qapi/meson.build              |   1 +
>  qapi/qapi-schema.json         |   1 +
>  qemu-options.hx               |  39 ++++++-
>  softmmu/globals.c             |   1 +
>  softmmu/memory.c              |  48 +++++++++
>  softmmu/physmem.c             |  49 +++++++--
>  softmmu/runstate.c            |  58 ++++++++++-
>  softmmu/vl.c                  |  14 ++-
>  stubs/cpr.c                   |   3 +
>  stubs/meson.build             |   1 +
>  trace-events                  |   1 +
>  util/env.c                    |  95 +++++++++++++++++
>  util/meson.build              |   1 +
>  util/oslib-posix.c            |   9 ++
>  util/oslib-win32.c            |   4 +
>  util/qemu-config.c            |   4 +
>  61 files changed, 1525 insertions(+), 83 deletions(-)
>  create mode 100644 hw/vfio/cpr.c
>  create mode 100644 include/migration/cpr.h
>  create mode 100644 include/qemu/env.h
>  create mode 100644 migration/cpr.c
>  create mode 100644 qapi/cpr.json
>  create mode 100644 stubs/cpr.c
>  create mode 100644 util/env.c
>
> --
> 1.8.3.1
>
>
>
It doesn't apply on master, could you rebase and resend?
thanks


-- 
Marc-André Lureau


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V4 00/25] Live Update
  2021-07-07 12:31 ` [PATCH V4 00/25] Live Update Marc-André Lureau
@ 2021-07-07 12:36   ` Steven Sistare
  0 siblings, 0 replies; 29+ messages in thread
From: Steven Sistare @ 2021-07-07 12:36 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

On 7/7/2021 8:31 AM, Marc-André Lureau wrote:
> Hi Steve
> 
> On Tue, Jul 6, 2021 at 8:58 PM Steve Sistare <steven.sistare@oracle.com> wrote:
> 
>[...] 
> It doesn't apply on master, could you rebase and resend?
> thanks
> -- 
> Marc-André Lureau

Will do, sorry about that - steve



* Re: [PATCH V4 18/25] vhost: reset vhost devices upon cprsave
  2021-07-06 16:45 ` [PATCH V4 18/25] vhost: reset vhost devices upon cprsave Steve Sistare
@ 2021-07-13 15:47   ` Michael S. Tsirkin
  0 siblings, 0 replies; 29+ messages in thread
From: Michael S. Tsirkin @ 2021-07-13 15:47 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Daniel P. Berrange, Juan Quintela, Jason Zeng, Alex Bennée,
	qemu-devel, Eric Blake, Dr. David Alan Gilbert, Alex Williamson,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

On Tue, Jul 06, 2021 at 09:45:49AM -0700, Steve Sistare wrote:
> A vhost device is implicitly preserved across re-exec because its fd is not
> closed, and the value of the fd is specified on the command line for the
> new qemu to find.  However, the new qemu issues a VHOST_SET_OWNER ioctl,
> which fails because the device already has an owner.  To fix, reset the
> owner prior to exec.
> 
> Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>

Makes sense superficially.

Acked-by: Michael S. Tsirkin <mst@redhat.com>

I didn't really read the series deeply enough to judge the rest of it.


> ---
>  hw/virtio/vhost.c         | 11 +++++++++++
>  include/hw/virtio/vhost.h |  1 +
>  migration/cpr.c           |  1 +
>  3 files changed, 13 insertions(+)
> 
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index e2163a0..8c0c9c3 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1820,6 +1820,17 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
>      hdev->vdev = NULL;
>  }
>  
> +void vhost_dev_reset_all(void)
> +{
> +    struct vhost_dev *dev;
> +
> +    QLIST_FOREACH(dev, &vhost_devices, entry) {
> +        if (dev->vhost_ops->vhost_reset_device(dev) < 0) {
> +            VHOST_OPS_DEBUG("vhost_reset_device failed");
> +        }
> +    }
> +}
> +
>  int vhost_net_set_backend(struct vhost_dev *hdev,
>                            struct vhost_vring_file *file)
>  {
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 4a8bc75..71704d4 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -106,6 +106,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
>  void vhost_dev_cleanup(struct vhost_dev *hdev);
>  int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
>  void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
> +void vhost_dev_reset_all(void);
>  int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>  void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>  
> diff --git a/migration/cpr.c b/migration/cpr.c
> index 578466c..6333988 100644
> --- a/migration/cpr.c
> +++ b/migration/cpr.c
> @@ -142,6 +142,7 @@ void cprexec(strList *args, Error **errp)
>          return;
>      }
>      walkenv(FD_PREFIX, preserve_fd, 0);
> +    vhost_dev_reset_all();
>      qemu_system_exec_request(args);
>  }
>  
> -- 
> 1.8.3.1
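
The new vhost_dev_reset_all() walks the global vhost_devices list and calls each device's backend vhost_reset_device hook. On failure it emits a VHOST_OPS_DEBUG message and moves on, so one failing device does not stop the loop. The standalone sketch below mirrors that shape with the <sys/queue.h> LIST macros, which QEMU's QLIST macros resemble. The struct and function names are hypothetical. Unlike the patch, which only logs, the sketch returns a failure count so the behavior can be checked.

```c
#include <stdio.h>
#include <sys/queue.h>

struct dev;

/* Hypothetical per-backend operations table, playing the role of VhostOps. */
struct dev_ops {
    int (*reset_device)(struct dev *d);
};

struct dev {
    const char *name;
    const struct dev_ops *ops;
    LIST_ENTRY(dev) entry;
};

static LIST_HEAD(dev_list, dev) devices = LIST_HEAD_INITIALIZER(devices);

/* Reset every registered device; report each failure but keep going. */
static int dev_reset_all(void)
{
    struct dev *d;
    int failures = 0;

    LIST_FOREACH(d, &devices, entry) {
        if (d->ops->reset_device(d) < 0) {
            fprintf(stderr, "%s: reset_device failed\n", d->name);
            failures++;
        }
    }
    return failures;
}
```
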




end of thread, other threads:[~2021-07-13 15:49 UTC | newest]

Thread overview: 29+ messages
2021-07-06 16:45 [PATCH V4 00/25] Live Update Steve Sistare
2021-07-06 16:45 ` [PATCH V4 01/25] qemu_ram_volatile Steve Sistare
2021-07-06 16:45 ` [PATCH V4 02/25] cpr: reboot mode Steve Sistare
2021-07-06 16:45 ` [PATCH V4 03/25] cpr: QMP interfaces for reboot Steve Sistare
2021-07-06 16:45 ` [PATCH V4 04/25] cpr: HMP " Steve Sistare
2021-07-06 16:45 ` [PATCH V4 05/25] as_flat_walk Steve Sistare
2021-07-06 16:45 ` [PATCH V4 06/25] oslib: qemu_clr_cloexec Steve Sistare
2021-07-06 16:45 ` [PATCH V4 07/25] machine: memfd-alloc option Steve Sistare
2021-07-06 16:45 ` [PATCH V4 08/25] vl: add helper to request re-exec Steve Sistare
2021-07-06 16:45 ` [PATCH V4 09/25] string to strList Steve Sistare
2021-07-06 16:45 ` [PATCH V4 10/25] util: env var helpers Steve Sistare
2021-07-06 16:45 ` [PATCH V4 11/25] cpr: restart mode Steve Sistare
2021-07-06 16:45 ` [PATCH V4 12/25] cpr: QMP interfaces for restart Steve Sistare
2021-07-06 16:45 ` [PATCH V4 13/25] cpr: HMP " Steve Sistare
2021-07-06 16:45 ` [PATCH V4 14/25] pci: export functions for cpr Steve Sistare
2021-07-06 16:45 ` [PATCH V4 15/25] vfio-pci: refactor " Steve Sistare
2021-07-06 16:45 ` [PATCH V4 16/25] vfio-pci: cpr part 1 Steve Sistare
2021-07-06 16:45 ` [PATCH V4 17/25] vfio-pci: cpr part 2 Steve Sistare
2021-07-06 16:45 ` [PATCH V4 18/25] vhost: reset vhost devices upon cprsave Steve Sistare
2021-07-13 15:47   ` Michael S. Tsirkin
2021-07-06 16:45 ` [PATCH V4 19/25] hostmem-memfd: cpr support Steve Sistare
2021-07-06 16:45 ` [PATCH V4 20/25] chardev: cpr framework Steve Sistare
2021-07-06 16:45 ` [PATCH V4 21/25] chardev: cpr for simple devices Steve Sistare
2021-07-06 16:45 ` [PATCH V4 22/25] chardev: cpr for pty Steve Sistare
2021-07-06 16:45 ` [PATCH V4 23/25] chardev: cpr for sockets Steve Sistare
2021-07-06 16:45 ` [PATCH V4 24/25] cpr: only-cpr-capable option Steve Sistare
2021-07-06 16:45 ` [PATCH V4 25/25] simplify savevm Steve Sistare
2021-07-07 12:31 ` [PATCH V4 00/25] Live Update Marc-André Lureau
2021-07-07 12:36   ` Steven Sistare
