* [PATCH V5 00/25] Live Update
@ 2021-07-07 17:20 Steve Sistare
  2021-07-07 17:20 ` [PATCH V5 01/25] qemu_ram_volatile Steve Sistare
                   ` (24 more replies)
  0 siblings, 25 replies; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Provide the cprsave, cprexec, and cprload commands for live update.  These
save and restore VM state, with minimal guest pause time, so that qemu may
be updated to a new version in between.

cprsave stops the VM and saves vmstate to an ordinary file.  It supports any
type of guest image and block device, but the caller must not modify guest
block devices between cprsave and cprload.  It supports two modes: reboot
and restart.

In reboot mode, the caller invokes cprsave and then terminates qemu.
The caller may then update the host kernel and system software and reboot.
The caller resumes the guest by running qemu with the same arguments as the
original process and invoking cprload.  To use this mode, guest ram must be
mapped to a persistent shared memory file such as /dev/dax0.0, or /dev/shm
PKRAM as proposed in https://lore.kernel.org/lkml/1617140178-8773-1-git-send-email-anthony.yznaga@oracle.com.

The reboot mode supports vfio devices if the caller first suspends the
guest, such as by issuing guest-suspend-ram to the qemu guest agent.  The
guest drivers' suspend methods flush outstanding requests and re-initialize
the devices, and thus there is no device state to save and restore.

Restart mode preserves the guest VM across a restart of the qemu process.
After cprsave, the caller passes qemu command-line arguments to cprexec,
which directly exec's the new qemu binary.  The arguments must include -S
so new qemu starts in a paused state and waits for the cprload command.
The restart mode supports vfio devices by preserving the vfio container,
group, device, and event descriptors across the qemu re-exec, and by
updating DMA mapping virtual addresses using VFIO_DMA_UNMAP_FLAG_VADDR and
VFIO_DMA_MAP_FLAG_VADDR as defined in https://lore.kernel.org/kvm/1611939252-7240-1-git-send-email-steven.sistare@oracle.com/
and integrated in Linux kernel 5.12.

To use the restart mode, qemu must be started with the memfd-alloc option,
which allocates guest ram using memfd_create.  The memfd's are saved to
the environment and kept open across exec, after which they are found from
the environment and re-mmap'd.  Hence guest ram is preserved in place,
albeit with new virtual addresses in the qemu process.
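
As a rough illustration of the mechanism described above (not the code from
this series), the sketch below creates a memfd, clears close-on-exec, records
the descriptor number in an environment variable, and later maps the same
memory again from that variable.  The variable name SKETCH_RAM_FD and all
helper names are hypothetical; the series uses its own env helpers and
qemu_clr_cloexec.  The exec itself is left out so the sketch stays
self-contained.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define RAM_FD_ENV "SKETCH_RAM_FD"   /* hypothetical variable name */

/* Map size bytes of the memfd as shared memory; NULL on failure. */
static void *ram_map(int fd, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Allocate "guest ram" from a memfd and return its first mapping. */
static void *ram_create(size_t size, int *fd_out)
{
    int fd = memfd_create("guest-ram", MFD_CLOEXEC);
    void *p;

    if (fd < 0) {
        return NULL;
    }
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return NULL;
    }
    p = ram_map(fd, size);
    if (!p) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return p;
}

/* Keep fd open across exec and publish its number in the environment. */
static int ram_save_fd(int fd)
{
    char buf[16];
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0 || fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0) {
        return -1;
    }
    snprintf(buf, sizeof(buf), "%d", fd);
    return setenv(RAM_FD_ENV, buf, 1);
}

/* Find the fd from the environment and map the same pages again. */
static void *ram_restore(size_t size)
{
    const char *val = getenv(RAM_FD_ENV);
    char *end;
    long fd;

    if (!val) {
        return NULL;
    }
    fd = strtol(val, &end, 10);
    if (end == val || *end != '\0' || fd < 0) {
        return NULL;
    }
    return ram_map((int)fd, size);
}
```

In a real restart, ram_restore() would run in the new process after exec.
The mapping address differs from the original one, but it refers to the same
pages, which is why the DMA mapping virtual addresses must be updated for
vfio.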

The caller resumes the guest by invoking cprload, which loads state from
the file.  If the VM was running at cprsave time, then VM execution resumes.
If the VM was suspended at cprsave time (reboot mode), then the caller must
issue a system_wakeup command to resume.

The first patches add reboot mode:
  - qemu_ram_volatile
  - cpr: reboot mode
  - cpr: QMP interfaces for reboot
  - cpr: HMP interfaces for reboot

The next patches add restart mode:
  - as_flat_walk
  - oslib: qemu_clr_cloexec
  - machine: memfd-alloc option
  - vl: add helper to request re-exec
  - string to strList
  - util: env var helpers
  - cpr: restart mode
  - cpr: QMP interfaces for restart
  - cpr: HMP interfaces for restart

The next patches add vfio support for restart mode:
  - pci: export functions for cpr
  - vfio-pci: refactor for cpr
  - vfio-pci: cpr part 1
  - vfio-pci: cpr part 2

The next patches preserve various descriptor-based backend devices across
cprexec:
  - vhost: reset vhost devices upon cprsave
  - hostmem-memfd: cpr support
  - chardev: cpr framework
  - chardev: cpr for simple devices
  - chardev: cpr for pty
  - chardev: cpr for sockets
  - cpr: only-cpr-capable option
  - simplify savevm

Here is an example of updating qemu from v4.2.0 to v4.2.1 using
restart mode.  The software update is performed while the guest is
running to minimize downtime.

window 1                                        | window 2
                                                |
# qemu-system-x86_64 ...                        |
QEMU 4.2.0 monitor - type 'help' ...            |
(qemu) info status                              |
VM status: running                              |
                                                | # yum update qemu
(qemu) cprsave /tmp/qemu.sav restart            |
(qemu) cprexec qemu-system-x86_64 -S ...        |
QEMU 4.2.1 monitor - type 'help' ...            |
(qemu) info status                              |
VM status: paused (prelaunch)                   |
(qemu) cprload /tmp/qemu.sav                    |
(qemu) info status                              |
VM status: running                              |


Here is an example of updating the host kernel using reboot mode.

window 1                                        | window 2
                                                |
# qemu-system-x86_64 ...mem-path=/dev/dax0.0 ...|
QEMU 4.2.1 monitor - type 'help' ...            |
(qemu) info status                              |
VM status: running                              |
                                                | # yum update kernel-uek
(qemu) cprsave /tmp/qemu.sav reboot             |
(qemu) quit                                     |
                                                |
# systemctl kexec                               |
kexec_core: Starting new kernel                 |
...                                             |
                                                |
# qemu-system-x86_64 -S mem-path=/dev/dax0.0 ...|
QEMU 4.2.1 monitor - type 'help' ...            |
(qemu) info status                              |
VM status: paused (prelaunch)                   |
(qemu) cprload /tmp/qemu.sav                    |
(qemu) info status                              |
VM status: running                              |

Changes from V1 to V2:
  - revert vmstate infrastructure changes
  - refactor cpr functions into new files
  - delete MADV_DOEXEC and use memfd + VFIO_DMA_UNMAP_FLAG_SUSPEND to
    preserve memory.
  - add framework to filter chardev's that support cpr
  - save and restore vfio eventfd's
  - modify cprinfo QMP interface
  - incorporate misc review feedback
  - remove unrelated and unneeded patches
  - refactor all patches into a shorter and easier to review series

Changes from V2 to V3:
  - rebase to qemu 6.0.0
  - use final definition of vfio ioctls (VFIO_DMA_UNMAP_FLAG_VADDR etc)
  - change memfd-alloc to a machine option
  - Use qio_channel_socket_new_fd instead of adding qio_channel_socket_new_fd
  - close monitor socket during cpr
  - fix a few unreported bugs
  - support memory-backend-memfd

Changes from V3 to V4:
  - split reboot mode into separate patches
  - add cprexec command
  - delete QEMU_START_FREEZE, argv_main, and /usr/bin/qemu-exec
  - add more checks for vfio and cpr compatibility, and recover after errors
  - save vfio pci config in vmstate
  - rename {setenv,getenv}_event_fd to {save,load}_event_fd
  - use qemu_strtol
  - change 6.0 references to 6.1
  - use strerror(), use EXIT_FAILURE, remove period from error messages
  - distribute MAINTAINERS additions to each patch

Changes from V4 to V5:
  - rebase to master

Steve Sistare (21):
  qemu_ram_volatile
  cpr: reboot mode
  as_flat_walk
  oslib: qemu_clr_cloexec
  machine: memfd-alloc option
  vl: add helper to request re-exec
  string to strList
  util: env var helpers
  cpr: restart mode
  cpr: QMP interfaces for restart
  cpr: HMP interfaces for restart
  pci: export functions for cpr
  vfio-pci: refactor for cpr
  vfio-pci: cpr part 1
  vfio-pci: cpr part 2
  hostmem-memfd: cpr support
  chardev: cpr framework
  chardev: cpr for simple devices
  chardev: cpr for pty
  cpr: only-cpr-capable option
  simplify savevm

Mark Kanda, Steve Sistare (4):
  cpr: QMP interfaces for reboot
  cpr: HMP interfaces for reboot
  vhost: reset vhost devices upon cprsave
  chardev: cpr for sockets

 MAINTAINERS                   |  12 +++
 backends/hostmem-memfd.c      |  21 ++--
 chardev/char-mux.c            |   1 +
 chardev/char-null.c           |   1 +
 chardev/char-pty.c            |  15 ++-
 chardev/char-serial.c         |   1 +
 chardev/char-socket.c         |  35 +++++++
 chardev/char-stdio.c          |   8 ++
 chardev/char.c                |  41 +++++++-
 gdbstub.c                     |   1 +
 hmp-commands.hx               |  62 ++++++++++++
 hw/core/machine.c             |  19 ++++
 hw/pci/msix.c                 |  20 ++--
 hw/pci/pci.c                  |   7 +-
 hw/vfio/common.c              |  78 ++++++++++++--
 hw/vfio/cpr.c                 | 154 ++++++++++++++++++++++++++++
 hw/vfio/meson.build           |   1 +
 hw/vfio/pci.c                 | 230 +++++++++++++++++++++++++++++++++++++++---
 hw/vfio/trace-events          |   1 +
 hw/virtio/vhost.c             |  11 ++
 include/chardev/char.h        |   6 ++
 include/exec/memory.h         |  25 +++++
 include/hw/boards.h           |   1 +
 include/hw/pci/msix.h         |   5 +
 include/hw/pci/pci.h          |   2 +
 include/hw/vfio/vfio-common.h |   8 ++
 include/hw/virtio/vhost.h     |   1 +
 include/migration/cpr.h       |  20 ++++
 include/monitor/hmp.h         |   4 +
 include/qemu/env.h            |  23 +++++
 include/qemu/osdep.h          |   1 +
 include/sysemu/runstate.h     |   2 +
 include/sysemu/sysemu.h       |   1 +
 linux-headers/linux/vfio.h    |   6 ++
 migration/cpr.c               | 195 +++++++++++++++++++++++++++++++++++
 migration/meson.build         |   1 +
 migration/migration.c         |   5 +
 migration/savevm.c            |  21 ++--
 migration/savevm.h            |   2 +
 monitor/hmp-cmds.c            |  75 ++++++++++++--
 monitor/hmp.c                 |   3 +
 monitor/qmp-cmds.c            |  36 +++++++
 monitor/qmp.c                 |   3 +
 qapi/char.json                |   5 +-
 qapi/cpr.json                 |  88 ++++++++++++++++
 qapi/meson.build              |   1 +
 qapi/qapi-schema.json         |   1 +
 qemu-options.hx               |  39 ++++++-
 softmmu/globals.c             |   1 +
 softmmu/memory.c              |  48 +++++++++
 softmmu/physmem.c             |  51 ++++++++--
 softmmu/runstate.c            |  58 ++++++++++-
 softmmu/vl.c                  |  14 ++-
 stubs/cpr.c                   |   3 +
 stubs/meson.build             |   1 +
 trace-events                  |   1 +
 util/env.c                    |  95 +++++++++++++++++
 util/meson.build              |   1 +
 util/oslib-posix.c            |   9 ++
 util/oslib-win32.c            |   4 +
 util/qemu-config.c            |   4 +
 61 files changed, 1506 insertions(+), 83 deletions(-)
 create mode 100644 hw/vfio/cpr.c
 create mode 100644 include/migration/cpr.h
 create mode 100644 include/qemu/env.h
 create mode 100644 migration/cpr.c
 create mode 100644 qapi/cpr.json
 create mode 100644 stubs/cpr.c
 create mode 100644 util/env.c

-- 
1.8.3.1




* [PATCH V5 01/25] qemu_ram_volatile
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-08 12:01   ` Marc-André Lureau
  2021-07-07 17:20 ` [PATCH V5 02/25] cpr: reboot mode Steve Sistare
                   ` (23 subsequent siblings)
  24 siblings, 1 reply; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add a function that returns true if any ram_list block represents
volatile memory.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 include/exec/memory.h |  8 ++++++++
 softmmu/memory.c      | 30 ++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index b116f7c..7ad63f8 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2649,6 +2649,14 @@ bool ram_block_discard_is_disabled(void);
  */
 bool ram_block_discard_is_required(void);
 
+/**
+ * qemu_ram_volatile: return true if any memory regions are writable and not
+ * backed by shared memory.
+ *
+ * @errp: returned error message identifying the bad region.
+ */
+bool qemu_ram_volatile(Error **errp);
+
 #endif
 
 #endif
diff --git a/softmmu/memory.c b/softmmu/memory.c
index f016151..e9536bc 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2714,6 +2714,36 @@ void memory_global_dirty_log_stop(void)
     memory_global_dirty_log_do_stop();
 }
 
+/*
+ * Return true if any memory regions are writable and not backed by shared
+ * memory.
+ */
+bool qemu_ram_volatile(Error **errp)
+{
+    RAMBlock *block;
+    MemoryRegion *mr;
+    bool ret = false;
+
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        mr = block->mr;
+        if (mr &&
+            memory_region_is_ram(mr) &&
+            !memory_region_is_ram_device(mr) &&
+            !memory_region_is_rom(mr) &&
+            (block->fd == -1 || !qemu_ram_is_shared(block))) {
+
+            error_setg(errp, "Memory region %s is volatile",
+                       memory_region_name(mr));
+            ret = true;
+            break;
+        }
+    }
+
+    rcu_read_unlock();
+    return ret;
+}
+
 static void listener_add_address_space(MemoryListener *listener,
                                        AddressSpace *as)
 {
-- 
1.8.3.1




* [PATCH V5 02/25] cpr: reboot mode
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
  2021-07-07 17:20 ` [PATCH V5 01/25] qemu_ram_volatile Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-08 12:25   ` Marc-André Lureau
  2021-08-04 15:48   ` Eric Blake
  2021-07-07 17:20 ` [PATCH V5 03/25] cpr: QMP interfaces for reboot Steve Sistare
                   ` (22 subsequent siblings)
  24 siblings, 2 replies; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Provide the cprsave and cprload functions for live update.  These save and
restore VM state, with minimal guest pause time, so that qemu may be updated
to a new version in between.

cprsave stops the VM and saves vmstate to an ordinary file.  It supports any
type of guest image and block device, but the caller must not modify guest
block devices between cprsave and cprload.

cprsave supports several modes, the first of which is reboot.  In this mode,
the caller invokes cprsave and then terminates qemu.  The caller may then
update the host kernel and system software and reboot.  The caller resumes
the guest by running qemu with the same arguments as the original process
and invoking cprload.  To use this mode, guest ram must be mapped to a
persistent shared memory file such as /dev/dax0.0 or /dev/shm PKRAM.

The reboot mode supports vfio devices if the caller first suspends the
guest, such as by issuing guest-suspend-ram to the qemu guest agent.  The
guest drivers' suspend methods flush outstanding requests and re-initialize
the devices, and thus there is no device state to save and restore.

cprload loads state from the file.  If the VM was running at cprsave time,
then VM execution resumes.  If the VM was suspended at cprsave time, then
the caller must issue a system_wakeup command to resume.
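
Before loading, cprload checks that the file begins with the vmstate magic
and version.  A standalone sketch of that kind of header check follows.  It
uses plain stdio rather than QEMUFile, and the constant values 0x5145564d and
3 are assumed to match QEMU_VM_FILE_MAGIC and QEMU_VM_FILE_VERSION; the
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define VM_FILE_MAGIC   0x5145564dU   /* "QEVM"; assumed value */
#define VM_FILE_VERSION 0x00000003U   /* assumed value */

/* Read one big-endian 32-bit value; 0 on success, -1 on a short read. */
static int read_be32(FILE *f, uint32_t *out)
{
    unsigned char b[4];

    if (fread(b, 1, sizeof(b), f) != sizeof(b)) {
        return -1;
    }
    *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
    return 0;
}

/* Return 1 if path starts with the expected magic and version, else 0. */
static int is_vmstate_file(const char *path)
{
    FILE *f = fopen(path, "rb");
    uint32_t magic, version;
    int ok;

    if (!f) {
        return 0;
    }
    ok = read_be32(f, &magic) == 0 && magic == VM_FILE_MAGIC &&
         read_be32(f, &version) == 0 && version == VM_FILE_VERSION;
    fclose(f);   /* closed on every path, including a mismatch */
    return ok;
}
```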

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 MAINTAINERS               |   7 +++
 include/migration/cpr.h   |  17 ++++++
 include/sysemu/runstate.h |   1 +
 migration/cpr.c           | 149 ++++++++++++++++++++++++++++++++++++++++++++++
 migration/meson.build     |   1 +
 migration/savevm.h        |   2 +
 softmmu/runstate.c        |  21 ++++++-
 7 files changed, 197 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/cpr.h
 create mode 100644 migration/cpr.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 684142e..c3573aa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2858,6 +2858,13 @@ F: net/colo*
 F: net/filter-rewriter.c
 F: net/filter-mirror.c
 
+CPR
+M: Steve Sistare <steven.sistare@oracle.com>
+M: Mark Kanda <mark.kanda@oracle.com>
+S: Maintained
+F: include/migration/cpr.h
+F: migration/cpr.c
+
 Record/replay
 M: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
 R: Paolo Bonzini <pbonzini@redhat.com>
diff --git a/include/migration/cpr.h b/include/migration/cpr.h
new file mode 100644
index 0000000..bffee19
--- /dev/null
+++ b/include/migration/cpr.h
@@ -0,0 +1,17 @@
+/*
+ * Copyright (c) 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef MIGRATION_CPR_H
+#define MIGRATION_CPR_H
+
+#include "qapi/qapi-types-cpr.h"
+
+void cprsave(const char *file, CprMode mode, Error **errp);
+void cprexec(strList *args, Error **errp);
+void cprload(const char *file, Error **errp);
+
+#endif
diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index a535691..ed4b735 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -51,6 +51,7 @@ void qemu_system_reset_request(ShutdownCause reason);
 void qemu_system_suspend_request(void);
 void qemu_register_suspend_notifier(Notifier *notifier);
 bool qemu_wakeup_suspend_enabled(void);
+void qemu_system_start_on_wake_request(void);
 void qemu_system_wakeup_request(WakeupReason reason, Error **errp);
 void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
 void qemu_register_wakeup_notifier(Notifier *notifier);
diff --git a/migration/cpr.c b/migration/cpr.c
new file mode 100644
index 0000000..c5bad8a
--- /dev/null
+++ b/migration/cpr.c
@@ -0,0 +1,149 @@
+/*
+ * Copyright (c) 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "monitor/monitor.h"
+#include "migration.h"
+#include "migration/snapshot.h"
+#include "chardev/char.h"
+#include "migration/misc.h"
+#include "migration/cpr.h"
+#include "migration/global_state.h"
+#include "qemu-file-channel.h"
+#include "qemu-file.h"
+#include "savevm.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
+#include "qemu/error-report.h"
+#include "io/channel-buffer.h"
+#include "io/channel-file.h"
+#include "sysemu/cpu-timers.h"
+#include "sysemu/runstate.h"
+#include "sysemu/runstate-action.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/replay.h"
+#include "sysemu/xen.h"
+#include "hw/vfio/vfio-common.h"
+#include "hw/virtio/vhost.h"
+
+QEMUFile *qf_file_open(const char *path, int flags, int mode,
+                       const char *name, Error **errp)
+{
+    QIOChannelFile *fioc;
+    QIOChannel *ioc;
+    QEMUFile *f;
+
+    if (flags & O_RDWR) {
+        error_setg(errp, "qf_file_open %s: O_RDWR not supported", path);
+        return NULL;
+    }
+
+    fioc = qio_channel_file_new_path(path, flags, mode, errp);
+    if (!fioc) {
+        return NULL;
+    }
+
+    ioc = QIO_CHANNEL(fioc);
+    qio_channel_set_name(ioc, name);
+    f = (flags & O_WRONLY) ? qemu_fopen_channel_output(ioc) :
+                             qemu_fopen_channel_input(ioc);
+    object_unref(OBJECT(fioc));
+    return f;
+}
+
+void cprsave(const char *file, CprMode mode, Error **errp)
+{
+    int ret;
+    QEMUFile *f;
+    int saved_vm_running = runstate_is_running();
+
+    if (mode == CPR_MODE_REBOOT && qemu_ram_volatile(errp)) {
+        return;
+    }
+
+    if (migrate_colo_enabled()) {
+        error_setg(errp, "error: cprsave does not support x-colo");
+        return;
+    }
+
+    if (replay_mode != REPLAY_MODE_NONE) {
+        error_setg(errp, "error: cprsave does not support replay");
+        return;
+    }
+
+    f = qf_file_open(file, O_CREAT | O_WRONLY | O_TRUNC, 0600, "cprsave", errp);
+    if (!f) {
+        return;
+    }
+
+    if (global_state_store()) {
+        error_setg(errp, "Error saving global state");
+        qemu_fclose(f);
+        return;
+    }
+    if (runstate_check(RUN_STATE_SUSPENDED)) {
+        /* Update timers_state before saving.  Suspend did not do so. */
+        cpu_disable_ticks();
+    }
+    vm_stop(RUN_STATE_SAVE_VM);
+
+    ret = qemu_save_device_state(f);
+    qemu_fclose(f);
+    if (ret < 0) {
+        error_setg(errp, "Error %d while saving VM state", ret);
+        goto err;
+    }
+
+    goto done;
+
+err:
+    if (saved_vm_running) {
+        vm_start();
+    }
+done:
+    return;
+}
+
+void cprload(const char *file, Error **errp)
+{
+    QEMUFile *f;
+    int ret;
+    RunState state;
+
+    if (runstate_is_running()) {
+        error_setg(errp, "cprload called for a running VM");
+        return;
+    }
+
+    f = qf_file_open(file, O_RDONLY, 0, "cprload", errp);
+    if (!f) {
+        return;
+    }
+
+    if (qemu_get_be32(f) != QEMU_VM_FILE_MAGIC ||
+        qemu_get_be32(f) != QEMU_VM_FILE_VERSION) {
+        error_setg(errp, "error: %s is not a vmstate file", file);
+        return;
+    }
+
+    ret = qemu_load_device_state(f);
+    qemu_fclose(f);
+    if (ret < 0) {
+        error_setg(errp, "Error %d while loading VM state", ret);
+        return;
+    }
+
+    state = global_state_get_runstate();
+    if (state == RUN_STATE_RUNNING) {
+        vm_start();
+    } else {
+        runstate_set(state);
+        if (runstate_check(RUN_STATE_SUSPENDED)) {
+            qemu_system_start_on_wake_request();
+        }
+    }
+}
diff --git a/migration/meson.build b/migration/meson.build
index f8714dc..fd59281 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -15,6 +15,7 @@ softmmu_ss.add(files(
   'channel.c',
   'colo-failover.c',
   'colo.c',
+  'cpr.c',
   'exec.c',
   'fd.c',
   'global_state.c',
diff --git a/migration/savevm.h b/migration/savevm.h
index 6461342..ce5d710 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -67,5 +67,7 @@ int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
 int qemu_load_device_state(QEMUFile *f);
 int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
         bool in_postcopy, bool inactivate_disks);
+QEMUFile *qf_file_open(const char *path, int flags, int mode,
+                       const char *name, Error **errp);
 
 #endif
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index 10d9b73..7fe4967 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -115,6 +115,8 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_PRELAUNCH, RUN_STATE_RUNNING },
     { RUN_STATE_PRELAUNCH, RUN_STATE_FINISH_MIGRATE },
     { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
+    { RUN_STATE_PRELAUNCH, RUN_STATE_SUSPENDED },
+    { RUN_STATE_PRELAUNCH, RUN_STATE_PAUSED },
 
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_PAUSED },
@@ -335,6 +337,7 @@ void vm_state_notify(bool running, RunState state)
     }
 }
 
+static bool start_on_wake_requested;
 static ShutdownCause reset_requested;
 static ShutdownCause shutdown_requested;
 static int shutdown_signal;
@@ -562,6 +565,11 @@ void qemu_register_suspend_notifier(Notifier *notifier)
     notifier_list_add(&suspend_notifiers, notifier);
 }
 
+void qemu_system_start_on_wake_request(void)
+{
+    start_on_wake_requested = true;
+}
+
 void qemu_system_wakeup_request(WakeupReason reason, Error **errp)
 {
     trace_system_wakeup_request(reason);
@@ -574,7 +582,18 @@ void qemu_system_wakeup_request(WakeupReason reason, Error **errp)
     if (!(wakeup_reason_mask & (1 << reason))) {
         return;
     }
-    runstate_set(RUN_STATE_RUNNING);
+
+    /*
+     * Must call vm_start if it has never been called, to invoke the state
+     * change callbacks for the first time.
+     */
+    if (start_on_wake_requested) {
+        start_on_wake_requested = false;
+        vm_start();
+    } else {
+        runstate_set(RUN_STATE_RUNNING);
+    }
+
     wakeup_reason = reason;
     qemu_notify_event();
 }
-- 
1.8.3.1




* [PATCH V5 03/25] cpr: QMP interfaces for reboot
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
  2021-07-07 17:20 ` [PATCH V5 01/25] qemu_ram_volatile Steve Sistare
  2021-07-07 17:20 ` [PATCH V5 02/25] cpr: reboot mode Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-08 13:27   ` Marc-André Lureau
  2021-08-04 15:48   ` Eric Blake
  2021-07-07 17:20 ` [PATCH V5 04/25] cpr: HMP " Steve Sistare
                   ` (21 subsequent siblings)
  24 siblings, 2 replies; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

cprsave calls cprsave().  Syntax:
  { 'enum': 'CprMode', 'data': [ 'reboot' ] }
  { 'command': 'cprsave', 'data': { 'file': 'str', 'mode': 'CprMode' } }

cprload calls cprload().  Syntax:
  { 'command': 'cprload', 'data': { 'file': 'str' } }

cprinfo returns a list of supported modes.  Syntax:
  { 'struct': 'CprInfo', 'data': { 'modes': [ 'CprMode' ] } }
  { 'command': 'cprinfo', 'returns': 'CprInfo' }
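
For a client that drives these commands over a QMP socket, the wire form is
the usual { "execute": ..., "arguments": ... } object.  The sketch below only
formats the request strings; the helper names are hypothetical, and strings
that would need JSON escaping are rejected rather than escaped to keep it
short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if s can be placed in a JSON string without escaping. */
static int json_plain(const char *s)
{
    for (; *s; s++) {
        if (*s == '"' || *s == '\\' || (unsigned char)*s < 0x20) {
            return 0;
        }
    }
    return 1;
}

/* Format a cprsave request; returns its length, or -1 on error. */
static int qmp_format_cprsave(char *buf, size_t len,
                              const char *file, const char *mode)
{
    int n;

    if (!json_plain(file) || !json_plain(mode)) {
        return -1;
    }
    n = snprintf(buf, len,
                 "{\"execute\": \"cprsave\", \"arguments\": "
                 "{\"file\": \"%s\", \"mode\": \"%s\"}}", file, mode);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Format a cprload request; returns its length, or -1 on error. */
static int qmp_format_cprload(char *buf, size_t len, const char *file)
{
    int n;

    if (!json_plain(file)) {
        return -1;
    }
    n = snprintf(buf, len,
                 "{\"execute\": \"cprload\", \"arguments\": "
                 "{\"file\": \"%s\"}}", file);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

A caller would write the resulting string to the monitor socket after the
QMP capabilities negotiation and then read the reply.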

Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 MAINTAINERS           |  1 +
 monitor/qmp-cmds.c    | 31 +++++++++++++++++++++
 qapi/cpr.json         | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi/meson.build      |  1 +
 qapi/qapi-schema.json |  1 +
 5 files changed, 108 insertions(+)
 create mode 100644 qapi/cpr.json

diff --git a/MAINTAINERS b/MAINTAINERS
index c3573aa..c48dd37 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2864,6 +2864,7 @@ M: Mark Kanda <mark.kanda@oracle.com>
 S: Maintained
 F: include/migration/cpr.h
 F: migration/cpr.c
+F: qapi/cpr.json
 
 Record/replay
 M: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index f7d64a6..1128604 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -37,9 +37,11 @@
 #include "qapi/qapi-commands-machine.h"
 #include "qapi/qapi-commands-misc.h"
 #include "qapi/qapi-commands-ui.h"
+#include "qapi/qapi-commands-cpr.h"
 #include "qapi/qmp/qerror.h"
 #include "hw/mem/memory-device.h"
 #include "hw/acpi/acpi_dev_interface.h"
+#include "migration/cpr.h"
 
 NameInfo *qmp_query_name(Error **errp)
 {
@@ -153,6 +155,35 @@ void qmp_cont(Error **errp)
     }
 }
 
+CprInfo *qmp_cprinfo(Error **errp)
+{
+    CprInfo *cprinfo;
+    CprModeList *mode, *mode_list = NULL;
+    CprMode i;
+
+    cprinfo = g_malloc0(sizeof(*cprinfo));
+
+    for (i = 0; i < CPR_MODE__MAX; i++) {
+        mode = g_malloc0(sizeof(*mode));
+        mode->value = i;
+        mode->next = mode_list;
+        mode_list = mode;
+    }
+
+    cprinfo->modes = mode_list;
+    return cprinfo;
+}
+
+void qmp_cprsave(const char *file, CprMode mode, Error **errp)
+{
+    cprsave(file, mode, errp);
+}
+
+void qmp_cprload(const char *file, Error **errp)
+{
+    cprload(file, errp);
+}
+
 void qmp_system_wakeup(Error **errp)
 {
     if (!qemu_wakeup_suspend_enabled()) {
diff --git a/qapi/cpr.json b/qapi/cpr.json
new file mode 100644
index 0000000..b6fdc89
--- /dev/null
+++ b/qapi/cpr.json
@@ -0,0 +1,74 @@
+# -*- Mode: Python -*-
+#
+# Copyright (c) 2021 Oracle and/or its affiliates.
+#
+# This work is licensed under the terms of the GNU GPL, version 2.
+# See the COPYING file in the top-level directory.
+
+##
+# = CPR
+##
+
+{ 'include': 'common.json' }
+
+##
+# @CprMode:
+#
+# @reboot: checkpoint can be cprload'ed after a host kexec reboot.
+#
+# Since: 6.1
+##
+{ 'enum': 'CprMode',
+  'data': [ 'reboot' ] }
+
+
+##
+# @CprInfo:
+#
+# @modes: @CprMode list
+#
+# Since: 6.1
+##
+{ 'struct': 'CprInfo',
+  'data': { 'modes': [ 'CprMode' ] } }
+
+##
+# @cprinfo:
+#
+# Returns the modes supported by @cprsave.
+#
+# Returns: @CprInfo
+#
+# Since: 6.1
+#
+##
+{ 'command': 'cprinfo',
+  'returns': 'CprInfo' }
+
+##
+# @cprsave:
+#
+# Create a checkpoint of the virtual machine device state in @file.
+# Guest RAM and guest block device blocks are not saved.
+#
+# @file: name of checkpoint file
+# @mode: @CprMode mode
+#
+# Since: 6.1
+##
+{ 'command': 'cprsave',
+  'data': { 'file': 'str',
+            'mode': 'CprMode' } }
+
+##
+# @cprload:
+#
+# Start virtual machine from checkpoint file that was created earlier using
+# the cprsave command.
+#
+# @file: name of checkpoint file
+#
+# Since: 6.1
+##
+{ 'command': 'cprload',
+  'data': { 'file': 'str' } }
diff --git a/qapi/meson.build b/qapi/meson.build
index 376f4ce..7e7c48a 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -26,6 +26,7 @@ qapi_all_modules = [
   'common',
   'compat',
   'control',
+  'cpr',
   'crypto',
   'dump',
   'error',
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index 4912b97..001d790 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -77,6 +77,7 @@
 { 'include': 'ui.json' }
 { 'include': 'authz.json' }
 { 'include': 'migration.json' }
+{ 'include': 'cpr.json' }
 { 'include': 'transaction.json' }
 { 'include': 'trace.json' }
 { 'include': 'compat.json' }
-- 
1.8.3.1




* [PATCH V5 04/25] cpr: HMP interfaces for reboot
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (2 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 03/25] cpr: QMP interfaces for reboot Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-28  4:55   ` Zheng Chuan
  2021-07-07 17:20 ` [PATCH V5 05/25] as_flat_walk Steve Sistare
                   ` (20 subsequent siblings)
  24 siblings, 1 reply; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

cprsave <file> <mode>
  Call cprsave().
  Arguments:
    file : save vmstate to this file name
    mode : must be "reboot"

cprload <file>
  Call cprload().
  Arguments:
    file : load vmstate from this file name

cprinfo
  Print to stdout a space-delimited list of modes supported by cprsave.
  Arguments: none

Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hmp-commands.hx       | 44 ++++++++++++++++++++++++++++++++++++++++++++
 include/monitor/hmp.h |  3 +++
 monitor/hmp-cmds.c    | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 95 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 8e45bce..11827ae 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -351,6 +351,50 @@ SRST
 ERST
 
     {
+        .name       = "cprinfo",
+        .args_type  = "",
+        .params     = "",
+        .help       = "return list of modes supported by cprsave",
+        .cmd        = hmp_cprinfo,
+    },
+
+SRST
+``cprinfo``
+Return a space-delimited list of modes supported by cprsave.
+ERST
+
+    {
+        .name       = "cprsave",
+        .args_type  = "file:s,mode:s",
+        .params     = "file 'reboot'",
+        .help       = "create a checkpoint of the VM in file",
+        .cmd        = hmp_cprsave,
+    },
+
+SRST
+``cprsave`` *file* *mode*
+Pause the VCPUs,
+create a checkpoint of the whole virtual machine, and save it in *file*.
+If *mode* is 'reboot', the checkpoint remains valid after a host kexec
+reboot, and guest ram must be backed by persistent shared memory.  To
+resume from the checkpoint, issue the quit command, reboot the system,
+and issue the cprload command.
+ERST
+
+    {
+        .name       = "cprload",
+        .args_type  = "file:s",
+        .params     = "file",
+        .help       = "load VM checkpoint from file",
+        .cmd        = hmp_cprload,
+    },
+
+SRST
+``cprload`` *file*
+Load a virtual machine from checkpoint file *file* and continue VCPUs.
+ERST
+
+    {
         .name       = "delvm",
         .args_type  = "name:s",
         .params     = "tag",
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 3baa105..98bb775 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -58,6 +58,9 @@ void hmp_balloon(Monitor *mon, const QDict *qdict);
 void hmp_loadvm(Monitor *mon, const QDict *qdict);
 void hmp_savevm(Monitor *mon, const QDict *qdict);
 void hmp_delvm(Monitor *mon, const QDict *qdict);
+void hmp_cprinfo(Monitor *mon, const QDict *qdict);
+void hmp_cprsave(Monitor *mon, const QDict *qdict);
+void hmp_cprload(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_continue(Monitor *mon, const QDict *qdict);
 void hmp_migrate_incoming(Monitor *mon, const QDict *qdict);
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 0942027..8e80581 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -33,6 +33,7 @@
 #include "qapi/qapi-commands-block.h"
 #include "qapi/qapi-commands-char.h"
 #include "qapi/qapi-commands-control.h"
+#include "qapi/qapi-commands-cpr.h"
 #include "qapi/qapi-commands-machine.h"
 #include "qapi/qapi-commands-migration.h"
 #include "qapi/qapi-commands-misc.h"
@@ -1177,6 +1178,53 @@ void hmp_announce_self(Monitor *mon, const QDict *qdict)
     qapi_free_AnnounceParameters(params);
 }
 
+void hmp_cprinfo(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    CprInfo *cprinfo;
+    CprModeList *mode;
+
+    cprinfo = qmp_cprinfo(&err);
+    if (err) {
+        goto out;
+    }
+
+    for (mode = cprinfo->modes; mode; mode = mode->next) {
+        monitor_printf(mon, "%s ", CprMode_str(mode->value));
+    }
+
+out:
+    hmp_handle_error(mon, err);
+    qapi_free_CprInfo(cprinfo);
+}
+
+void hmp_cprsave(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    const char *mode;
+    int val;
+
+    mode = qdict_get_try_str(qdict, "mode");
+    val = qapi_enum_parse(&CprMode_lookup, mode, -1, &err);
+
+    if (val == -1) {
+        goto out;
+    }
+
+    qmp_cprsave(qdict_get_try_str(qdict, "file"), val, &err);
+
+out:
+    hmp_handle_error(mon, err);
+}
+
+void hmp_cprload(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+
+    qmp_cprload(qdict_get_try_str(qdict, "file"), &err);
+    hmp_handle_error(mon, err);
+}
+
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict)
 {
     qmp_migrate_cancel(NULL);
-- 
1.8.3.1




* [PATCH V5 05/25] as_flat_walk
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (3 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 04/25] cpr: HMP " Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-08 13:49   ` Marc-André Lureau
  2021-07-07 17:20 ` [PATCH V5 06/25] oslib: qemu_clr_cloexec Steve Sistare
                   ` (19 subsequent siblings)
  24 siblings, 1 reply; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add an iterator over the sections of a flattened address space.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 include/exec/memory.h | 17 +++++++++++++++++
 softmmu/memory.c      | 18 ++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 7ad63f8..a030aef 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2023,6 +2023,23 @@ bool memory_region_present(MemoryRegion *container, hwaddr addr);
  */
 bool memory_region_is_mapped(MemoryRegion *mr);
 
+typedef int (*qemu_flat_walk_cb)(MemoryRegionSection *s,
+                                 void *handle,
+                                 Error **errp);
+
+/**
+ * as_flat_walk: walk the ranges in the address space flat view and call @func
+ * for each.  Return 0 on success, else return non-zero with a message in
+ * @errp.
+ *
+ * @as: target address space
+ * @func: callback function
+ * @handle: passed to @func
+ * @errp: passed to @func
+ */
+int as_flat_walk(AddressSpace *as, qemu_flat_walk_cb func,
+                 void *handle, Error **errp);
+
 /**
  * memory_region_find: translate an address/size relative to a
  * MemoryRegion into a #MemoryRegionSection.
diff --git a/softmmu/memory.c b/softmmu/memory.c
index e9536bc..1ec1e25 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2577,6 +2577,24 @@ bool memory_region_is_mapped(MemoryRegion *mr)
     return mr->container ? true : false;
 }
 
+int as_flat_walk(AddressSpace *as, qemu_flat_walk_cb func,
+                 void *handle, Error **errp)
+{
+    FlatView *view = address_space_get_flatview(as);
+    FlatRange *fr;
+    int ret;
+
+    FOR_EACH_FLAT_RANGE(fr, view) {
+        MemoryRegionSection section = section_from_flat_range(fr, view);
+        ret = func(&section, handle, errp);
+        if (ret) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
 /* Same as memory_region_find, but it does not add a reference to the
  * returned region.  It must be called from an RCU critical section.
  */
-- 
1.8.3.1
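
The iterator contract in the patch above (call @func for each range and stop
at the first non-zero return) can be sketched outside QEMU roughly as follows.
This is a toy model, not the patch's implementation: ToyRange, toy_walk_cb,
toy_flat_walk and sum_sizes are hypothetical names, a plain array stands in
for the FlatView, and the Error ** argument is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a MemoryRegionSection: one contiguous range. */
typedef struct ToyRange {
    uint64_t start;
    uint64_t size;
} ToyRange;

/* Same shape as qemu_flat_walk_cb, minus the Error ** argument. */
typedef int (*toy_walk_cb)(const ToyRange *r, void *handle);

/* Call func on each range; return its value at the first non-zero result. */
int toy_flat_walk(const ToyRange *ranges, size_t n,
                  toy_walk_cb func, void *handle)
{
    assert(func != NULL);

    for (size_t i = 0; i < n; i++) {
        int ret = func(&ranges[i], handle);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Example callback: accumulate sizes, and fail on an empty range. */
int sum_sizes(const ToyRange *r, void *handle)
{
    uint64_t *total = handle;

    if (r->size == 0) {
        return -1;
    }
    *total += r->size;
    return 0;
}
```

The handle pointer lets the callback carry state (here a running total)
without globals, and an early non-zero return leaves that state describing
only the ranges visited so far.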




* [PATCH V5 06/25] oslib: qemu_clr_cloexec
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (4 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 05/25] as_flat_walk Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-08 13:58   ` Marc-André Lureau
  2021-07-07 17:20 ` [PATCH V5 07/25] machine: memfd-alloc option Steve Sistare
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Define qemu_clr_cloexec, analogous to qemu_set_cloexec.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 include/qemu/osdep.h | 1 +
 util/oslib-posix.c   | 9 +++++++++
 util/oslib-win32.c   | 4 ++++
 3 files changed, 14 insertions(+)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index c91a78b..3d6a6ca 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -637,6 +637,7 @@ static inline void qemu_timersub(const struct timeval *val1,
 #endif
 
 void qemu_set_cloexec(int fd);
+void qemu_clr_cloexec(int fd);
 
 /* Starting on QEMU 2.5, qemu_hw_version() returns "2.5+" by default
  * instead of QEMU_VERSION, so setting hw_version on MachineClass
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index e8bdb02..97577f1 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -309,6 +309,15 @@ void qemu_set_cloexec(int fd)
     assert(f != -1);
 }
 
+void qemu_clr_cloexec(int fd)
+{
+    int f;
+    f = fcntl(fd, F_GETFD);
+    assert(f != -1);
+    f = fcntl(fd, F_SETFD, f & ~FD_CLOEXEC);
+    assert(f != -1);
+}
+
 /*
  * Creates a pipe with FD_CLOEXEC set on both file descriptors
  */
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index af559ef..46e94d9 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -265,6 +265,10 @@ void qemu_set_cloexec(int fd)
 {
 }
 
+void qemu_clr_cloexec(int fd)
+{
+}
+
 /* Offset between 1/1/1601 and 1/1/1970 in 100 nanosec units */
 #define _W32_FT_OFFSET (116444736000000000ULL)
 
-- 
1.8.3.1
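
As a standalone illustration of the POSIX half of the patch above, the sketch
below clears FD_CLOEXEC with the same F_GETFD/F_SETFD sequence and adds a
small query helper so the effect can be checked.  The names clr_cloexec and
has_cloexec are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Clear FD_CLOEXEC while leaving any other descriptor flags untouched. */
void clr_cloexec(int fd)
{
    int f = fcntl(fd, F_GETFD);

    assert(f != -1);
    f = fcntl(fd, F_SETFD, f & ~FD_CLOEXEC);
    assert(f != -1);
}

/* Return 1 if FD_CLOEXEC is set on fd, 0 if it is clear. */
int has_cloexec(int fd)
{
    int f = fcntl(fd, F_GETFD);

    assert(f != -1);
    return (f & FD_CLOEXEC) != 0;
}
```

Reading the flags first and masking out only FD_CLOEXEC keeps the helper
correct even if other descriptor flags are defined on the platform.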




* [PATCH V5 07/25] machine: memfd-alloc option
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (5 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 06/25] oslib: qemu_clr_cloexec Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-08 14:20   ` Marc-André Lureau
  2021-07-07 17:20 ` [PATCH V5 08/25] vl: add helper to request re-exec Steve Sistare
                   ` (17 subsequent siblings)
  24 siblings, 1 reply; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Allocate anonymous memory using memfd_create if the memfd-alloc machine
option is set.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hw/core/machine.c   | 19 +++++++++++++++++++
 include/hw/boards.h |  1 +
 qemu-options.hx     |  5 +++++
 softmmu/physmem.c   | 42 +++++++++++++++++++++++++++++++++---------
 trace-events        |  1 +
 util/qemu-config.c  |  4 ++++
 6 files changed, 63 insertions(+), 9 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 57c18f9..f0656a8 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -383,6 +383,20 @@ static void machine_set_mem_merge(Object *obj, bool value, Error **errp)
     ms->mem_merge = value;
 }
 
+static bool machine_get_memfd_alloc(Object *obj, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    return ms->memfd_alloc;
+}
+
+static void machine_set_memfd_alloc(Object *obj, bool value, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    ms->memfd_alloc = value;
+}
+
 static bool machine_get_usb(Object *obj, Error **errp)
 {
     MachineState *ms = MACHINE(obj);
@@ -917,6 +931,11 @@ static void machine_class_init(ObjectClass *oc, void *data)
     object_class_property_set_description(oc, "mem-merge",
         "Enable/disable memory merge support");
 
+    object_class_property_add_bool(oc, "memfd-alloc",
+        machine_get_memfd_alloc, machine_set_memfd_alloc);
+    object_class_property_set_description(oc, "memfd-alloc",
+        "Enable/disable allocating anonymous memory using memfd_create");
+
     object_class_property_add_bool(oc, "usb",
         machine_get_usb, machine_set_usb);
     object_class_property_set_description(oc, "usb",
diff --git a/include/hw/boards.h b/include/hw/boards.h
index accd6ef..299e1ca 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -305,6 +305,7 @@ struct MachineState {
     char *dt_compatible;
     bool dump_guest_core;
     bool mem_merge;
+    bool memfd_alloc;
     bool usb;
     bool usb_disabled;
     char *firmware;
diff --git a/qemu-options.hx b/qemu-options.hx
index 8965dab..fa53734 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -30,6 +30,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
     "                vmport=on|off|auto controls emulation of vmport (default: auto)\n"
     "                dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
     "                mem-merge=on|off controls memory merge support (default: on)\n"
+    "                memfd-alloc=on|off controls allocating anonymous memory using memfd_create (default: off)\n"
     "                aes-key-wrap=on|off controls support for AES key wrapping (default=on)\n"
     "                dea-key-wrap=on|off controls support for DEA key wrapping (default=on)\n"
     "                suppress-vmdesc=on|off disables self-describing migration (default=off)\n"
@@ -76,6 +77,10 @@ SRST
         supported by the host, de-duplicates identical memory pages
         among VMs instances (enabled by default).
 
+    ``memfd-alloc=on|off``
+        Enables or disables allocation of anonymous memory using
+        memfd_create (disabled by default).
+
     ``aes-key-wrap=on|off``
         Enables or disables AES key wrapping support on s390-ccw hosts.
         This feature controls whether AES wrapping keys will be created
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 9b171c9..b149250 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -64,6 +64,7 @@
 
 #include "qemu/pmem.h"
 
+#include "qemu/memfd.h"
 #include "migration/vmstate.h"
 
 #include "qemu/range.h"
@@ -1960,35 +1961,58 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
     const bool shared = qemu_ram_is_shared(new_block);
     RAMBlock *block;
     RAMBlock *last_block = NULL;
+    struct MemoryRegion *mr = new_block->mr;
     ram_addr_t old_ram_size, new_ram_size;
     Error *err = NULL;
+    const char *name;
+    void *addr = 0;
+    size_t maxlen;
+    MachineState *ms = MACHINE(qdev_get_machine());
 
     old_ram_size = last_ram_page();
 
     qemu_mutex_lock_ramlist();
-    new_block->offset = find_ram_offset(new_block->max_length);
+    maxlen = new_block->max_length;
+    new_block->offset = find_ram_offset(maxlen);
 
     if (!new_block->host) {
         if (xen_enabled()) {
-            xen_ram_alloc(new_block->offset, new_block->max_length,
-                          new_block->mr, &err);
+            xen_ram_alloc(new_block->offset, maxlen, new_block->mr, &err);
             if (err) {
                 error_propagate(errp, err);
                 qemu_mutex_unlock_ramlist();
                 return;
             }
         } else {
-            new_block->host = qemu_anon_ram_alloc(new_block->max_length,
-                                                  &new_block->mr->align,
-                                                  shared, noreserve);
-            if (!new_block->host) {
+            name = memory_region_name(new_block->mr);
+            if (ms->memfd_alloc) {
+                int mfd = -1;          /* placeholder until next patch */
+                mr->align = QEMU_VMALLOC_ALIGN;
+                if (mfd < 0) {
+                    mfd = qemu_memfd_create(name, maxlen + mr->align,
+                                            0, 0, 0, &err);
+                    if (mfd < 0) {
+                        return;
+                    }
+                }
+                new_block->flags |= RAM_SHARED;
+                addr = file_ram_alloc(new_block, maxlen, mfd,
+                                      false, false, 0, errp);
+                trace_anon_memfd_alloc(name, maxlen, addr, mfd);
+            } else {
+                addr = qemu_anon_ram_alloc(maxlen, &mr->align,
+                                           shared, noreserve);
+            }
+
+            if (!addr) {
                 error_setg_errno(errp, errno,
                                  "cannot set up guest memory '%s'",
-                                 memory_region_name(new_block->mr));
+                                 name);
                 qemu_mutex_unlock_ramlist();
                 return;
             }
-            memory_try_enable_merging(new_block->host, new_block->max_length);
+            memory_try_enable_merging(addr, maxlen);
+            new_block->host = addr;
         }
     }
 
diff --git a/trace-events b/trace-events
index 765fe25..6dbcd0e 100644
--- a/trace-events
+++ b/trace-events
@@ -40,6 +40,7 @@ ram_block_discard_range(const char *rbname, void *hva, size_t length, bool need_
 # accel/tcg/cputlb.c
 memory_notdirty_write_access(uint64_t vaddr, uint64_t ram_addr, unsigned size) "0x%" PRIx64 " ram_addr 0x%" PRIx64 " size %u"
 memory_notdirty_set_dirty(uint64_t vaddr) "0x%" PRIx64
+anon_memfd_alloc(const char *name, size_t size, void *ptr, int fd) "%s size %zu ptr %p fd %d"
 
 # gdbstub.c
 gdbstub_op_start(const char *device) "Starting gdbstub using device %s"
diff --git a/util/qemu-config.c b/util/qemu-config.c
index 84ee6dc..6162b4d 100644
--- a/util/qemu-config.c
+++ b/util/qemu-config.c
@@ -207,6 +207,10 @@ static QemuOptsList machine_opts = {
             .type = QEMU_OPT_BOOL,
             .help = "enable/disable memory merge support",
         },{
+            .name = "memfd-alloc",
+            .type = QEMU_OPT_BOOL,
+            .help = "enable/disable memfd_create for anonymous memory",
+        },{
             .name = "usb",
             .type = QEMU_OPT_BOOL,
             .help = "Set on/off to enable/disable usb",
-- 
1.8.3.1
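
For readers unfamiliar with memfd-backed memory, the idea behind the
memfd-alloc option can be sketched as below.  This is only an illustration
under stated assumptions: it requires Linux with a glibc that provides
memfd_create(), memfd_ram_alloc is a hypothetical name, and the patch itself
goes through qemu_memfd_create() and file_ram_alloc() rather than calling
these system interfaces directly.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Allocate len bytes of anonymous memory backed by a memfd and map it shared.
 * Returns the mapping, or MAP_FAILED on error.  On success the descriptor is
 * stored in *fd_out so the same pages can be mapped again later.
 */
void *memfd_ram_alloc(const char *name, size_t len, int *fd_out)
{
    void *addr;
    int fd;

    assert(fd_out != NULL);

    fd = memfd_create(name, 0);
    if (fd < 0) {
        return MAP_FAILED;
    }
    if (ftruncate(fd, len) < 0) {
        close(fd);
        return MAP_FAILED;
    }
    addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return MAP_FAILED;
    }
    *fd_out = fd;
    return addr;
}
```

Because the mapping is MAP_SHARED and the descriptor stays open, a second
mmap() of the same descriptor sees the same contents.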




* [PATCH V5 08/25] vl: add helper to request re-exec
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (6 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 07/25] machine: memfd-alloc option Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-08 14:31   ` Marc-André Lureau
  2021-07-07 17:20 ` [PATCH V5 09/25] string to strList Steve Sistare
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add a qemu_system_exec_request() hook that causes the main loop to exit and
re-exec qemu using the specified arguments.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 include/sysemu/runstate.h |  1 +
 softmmu/runstate.c        | 37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index ed4b735..e1ae7e5 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -57,6 +57,7 @@ void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
 void qemu_register_wakeup_notifier(Notifier *notifier);
 void qemu_register_wakeup_support(void);
 void qemu_system_shutdown_request(ShutdownCause reason);
+void qemu_system_exec_request(strList *args);
 void qemu_system_powerdown_request(void);
 void qemu_register_powerdown_notifier(Notifier *notifier);
 void qemu_register_shutdown_notifier(Notifier *notifier);
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index 7fe4967..8474a01 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -355,6 +355,7 @@ static NotifierList wakeup_notifiers =
 static NotifierList shutdown_notifiers =
     NOTIFIER_LIST_INITIALIZER(shutdown_notifiers);
 static uint32_t wakeup_reason_mask = ~(1 << QEMU_WAKEUP_REASON_NONE);
+static char **exec_argv;
 
 ShutdownCause qemu_shutdown_requested_get(void)
 {
@@ -371,6 +372,11 @@ static int qemu_shutdown_requested(void)
     return qatomic_xchg(&shutdown_requested, SHUTDOWN_CAUSE_NONE);
 }
 
+static int qemu_exec_requested(void)
+{
+    return exec_argv != NULL;
+}
+
 static void qemu_kill_report(void)
 {
     if (!qtest_driver() && shutdown_signal) {
@@ -645,6 +651,32 @@ void qemu_system_shutdown_request(ShutdownCause reason)
     qemu_notify_event();
 }
 
+static char **make_argv(strList *args)
+{
+    strList *arg;
+    char **argv;
+    int n = 1, i = 0;
+
+    for (arg = args; arg != NULL; arg = arg->next) {
+        n++;
+    }
+
+    argv = g_malloc(n * sizeof(char *));
+    for (arg = args; arg != NULL; arg = arg->next) {
+        argv[i++] = g_strdup(arg->value);
+    }
+    argv[i] = NULL;
+
+    return argv;
+}
+
+void qemu_system_exec_request(strList *args)
+{
+    exec_argv = make_argv(args);
+    shutdown_requested = 1;
+    qemu_notify_event();
+}
+
 static void qemu_system_powerdown(void)
 {
     qapi_event_send_powerdown();
@@ -693,6 +725,11 @@ static bool main_loop_should_exit(void)
     }
     request = qemu_shutdown_requested();
     if (request) {
+
+        if (qemu_exec_requested()) {
+            execvp(exec_argv[0], exec_argv);
+            error_setg_errno(&error_fatal, errno, "execvp failed");
+        }
         qemu_kill_report();
         qemu_system_shutdown(request);
         if (shutdown_action == SHUTDOWN_ACTION_PAUSE) {
-- 
1.8.3.1




* [PATCH V5 09/25] string to strList
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (7 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 08/25] vl: add helper to request re-exec Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-08 14:37   ` Marc-André Lureau
  2021-07-07 17:20 ` [PATCH V5 10/25] util: env var helpers Steve Sistare
                   ` (15 subsequent siblings)
  24 siblings, 1 reply; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Rename strList_from_comma_list to strList_from_string and generalize it to
take any delimiter character.  No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 monitor/hmp-cmds.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 8e80581..a56f83c 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -71,21 +71,21 @@ void hmp_handle_error(Monitor *mon, Error *err)
 }
 
 /*
- * Produce a strList from a comma separated list.
- * A NULL or empty input string return NULL.
+ * Produce a strList from a character delimited string.
+ * A NULL or empty input string returns NULL.
  */
-static strList *strList_from_comma_list(const char *in)
+static strList *strList_from_string(const char *in, char delim)
 {
     strList *res = NULL;
     strList **tail = &res;
 
     while (in && in[0]) {
-        char *comma = strchr(in, ',');
+        char *next = strchr(in, delim);
         char *value;
 
-        if (comma) {
-            value = g_strndup(in, comma - in);
-            in = comma + 1; /* skip the , */
+        if (next) {
+            value = g_strndup(in, next - in);
+            in = next + 1; /* skip the delim */
         } else {
             value = g_strdup(in);
             in = NULL;
@@ -1170,7 +1170,7 @@ void hmp_announce_self(Monitor *mon, const QDict *qdict)
                                             migrate_announce_params());
 
     qapi_free_strList(params->interfaces);
-    params->interfaces = strList_from_comma_list(interfaces_str);
+    params->interfaces = strList_from_string(interfaces_str, ',');
     params->has_interfaces = params->interfaces != NULL;
     params->id = g_strdup(id);
     params->has_id = !!params->id;
-- 
1.8.3.1
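
The splitting logic of the patch above can be exercised without QEMU using a
sketch such as the following.  StrItem and str_items_from_string are
hypothetical names, with malloc/memcpy standing in for g_strndup and
QAPI_LIST_APPEND; the loop structure mirrors the patched function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QAPI's strList. */
typedef struct StrItem {
    struct StrItem *next;
    char *value;
} StrItem;

/* Split in at each delim; a NULL or empty input returns NULL. */
StrItem *str_items_from_string(const char *in, char delim)
{
    StrItem *res = NULL;
    StrItem **tail = &res;

    while (in && in[0]) {
        const char *next = strchr(in, delim);
        size_t len = next ? (size_t)(next - in) : strlen(in);
        StrItem *item = malloc(sizeof(*item));

        assert(item != NULL);
        item->value = malloc(len + 1);
        assert(item->value != NULL);
        memcpy(item->value, in, len);
        item->value[len] = '\0';
        item->next = NULL;

        *tail = item;                  /* append, preserving input order */
        tail = &item->next;
        in = next ? next + 1 : NULL;   /* skip the delimiter */
    }

    return res;
}
```

As in the original, a trailing delimiter does not produce an empty final
element, while two adjacent delimiters produce an empty string between them.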




* [PATCH V5 10/25] util: env var helpers
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (8 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 09/25] string to strList Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-08 15:10   ` Marc-André Lureau
  2021-07-07 17:20 ` [PATCH V5 11/25] cpr: restart mode Steve Sistare
                   ` (14 subsequent siblings)
  24 siblings, 1 reply; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add functions for saving fd's and other values in the environment via
setenv, and for reading them back via getenv.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 MAINTAINERS        |  2 ++
 include/qemu/env.h | 23 +++++++++++++
 util/env.c         | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 util/meson.build   |  1 +
 4 files changed, 121 insertions(+)
 create mode 100644 include/qemu/env.h
 create mode 100644 util/env.c

diff --git a/MAINTAINERS b/MAINTAINERS
index c48dd37..8647a97 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2865,6 +2865,8 @@ S: Maintained
 F: include/migration/cpr.h
 F: migration/cpr.c
 F: qapi/cpr.json
+F: include/qemu/env.h
+F: util/env.c
 
 Record/replay
 M: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
diff --git a/include/qemu/env.h b/include/qemu/env.h
new file mode 100644
index 0000000..3dad503
--- /dev/null
+++ b/include/qemu/env.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_ENV_H
+#define QEMU_ENV_H
+
+#define FD_PREFIX "QEMU_FD_"
+
+typedef int (*walkenv_cb)(const char *name, const char *val, void *handle);
+
+int getenv_fd(const char *name);
+void setenv_fd(const char *name, int fd);
+void unsetenv_fd(const char *name);
+void unsetenv_fdv(const char *fmt, ...);
+int walkenv(const char *prefix, walkenv_cb cb, void *handle);
+void printenv(void);
+
+#endif
diff --git a/util/env.c b/util/env.c
new file mode 100644
index 0000000..863678d
--- /dev/null
+++ b/util/env.c
@@ -0,0 +1,95 @@
+/*
+ * Copyright (c) 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/cutils.h"
+#include "qemu/env.h"
+
+static uint64_t getenv_ulong(const char *prefix, const char *name, int *err)
+{
+    char var[80], *val;
+    uint64_t res = 0;
+
+    snprintf(var, sizeof(var), "%s%s", prefix, name);
+    val = getenv(var);
+    if (val) {
+        *err = qemu_strtoul(val, NULL, 10, &res);
+    } else {
+        *err = -ENOENT;
+    }
+    return res;
+}
+
+static void setenv_ulong(const char *prefix, const char *name, uint64_t val)
+{
+    char var[80], val_str[80];
+    snprintf(var, sizeof(var), "%s%s", prefix, name);
+    snprintf(val_str, sizeof(val_str), "%"PRIu64, val);
+    setenv(var, val_str, 1);
+}
+
+static void unsetenv_ulong(const char *prefix, const char *name)
+{
+    char var[80];
+    snprintf(var, sizeof(var), "%s%s", prefix, name);
+    unsetenv(var);
+}
+
+int getenv_fd(const char *name)
+{
+    int err;
+    int fd = getenv_ulong(FD_PREFIX, name, &err);
+    return err ? -1 : fd;
+}
+
+void setenv_fd(const char *name, int fd)
+{
+    setenv_ulong(FD_PREFIX, name, fd);
+}
+
+void unsetenv_fd(const char *name)
+{
+    unsetenv_ulong(FD_PREFIX, name);
+}
+
+void unsetenv_fdv(const char *fmt, ...)
+{
+    va_list args;
+    char buf[80];
+    va_start(args, fmt);
+    vsnprintf(buf, sizeof(buf), fmt, args);
+    va_end(args);
+}
+
+int walkenv(const char *prefix, walkenv_cb cb, void *handle)
+{
+    char *str, name[128];
+    char **envp = environ;
+    size_t prefix_len = strlen(prefix);
+
+    while (*envp) {
+        str = *envp++;
+        if (!strncmp(str, prefix, prefix_len)) {
+            char *val = strchr(str, '=');
+            str += prefix_len;
+            strncpy(name, str, val - str);
+            name[val - str] = 0;
+            if (cb(name, val + 1, handle)) {
+                return 1;
+            }
+        }
+    }
+    return 0;
+}
+
+void printenv(void)
+{
+    char **ptr = environ;
+    while (*ptr) {
+        puts(*ptr++);
+    }
+}
diff --git a/util/meson.build b/util/meson.build
index 0ffd7f4..5e8097a 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -23,6 +23,7 @@ util_ss.add(files('host-utils.c'))
 util_ss.add(files('bitmap.c', 'bitops.c'))
 util_ss.add(files('fifo8.c'))
 util_ss.add(files('cacheinfo.c', 'cacheflush.c'))
+util_ss.add(files('env.c'))
 util_ss.add(files('error.c', 'qemu-error.c'))
 util_ss.add(files('qemu-print.c'))
 util_ss.add(files('id.c'))
-- 
1.8.3.1
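
The set/get/unset round trip that the helpers above provide might look like
this as a standalone sketch.  All toy_* names and TOY_FD_PREFIX are
hypothetical, and strtol replaces qemu_strtoul.  Note that unsetenv_fdv as
posted formats the name into buf but never passes it on; the sketch assumes
the intent is to forward the formatted name to the unset helper.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#define TOY_FD_PREFIX "QEMU_FD_"

/* Record fd in the environment variable TOY_FD_PREFIX<name>. */
void toy_setenv_fd(const char *name, int fd)
{
    char var[80], val[16];

    assert(fd >= 0);
    snprintf(var, sizeof(var), "%s%s", TOY_FD_PREFIX, name);
    snprintf(val, sizeof(val), "%d", fd);
    setenv(var, val, 1);
}

/* Return the fd recorded for name, or -1 if absent or malformed. */
int toy_getenv_fd(const char *name)
{
    char var[80], *val, *end;
    long fd;

    snprintf(var, sizeof(var), "%s%s", TOY_FD_PREFIX, name);
    val = getenv(var);
    if (!val || !*val) {
        return -1;
    }
    errno = 0;
    fd = strtol(val, &end, 10);
    if (errno || *end != '\0' || fd < 0 || fd > INT_MAX) {
        return -1;
    }
    return (int)fd;
}

/* Remove the variable recorded for name, if any. */
void toy_unsetenv_fd(const char *name)
{
    char var[80];

    snprintf(var, sizeof(var), "%s%s", TOY_FD_PREFIX, name);
    unsetenv(var);
}

/* Format a name, then unset it (assumed intent of unsetenv_fdv). */
void toy_unsetenv_fdv(const char *fmt, ...)
{
    va_list args;
    char name[80];

    va_start(args, fmt);
    vsnprintf(name, sizeof(name), fmt, args);
    va_end(args);
    toy_unsetenv_fd(name);
}
```

Storing the descriptor number as decimal text keeps the value readable in the
environment and lets the lookup reject anything that is not a clean
non-negative integer.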




* [PATCH V5 11/25] cpr: restart mode
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (9 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 10/25] util: env var helpers Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-08 15:43   ` Marc-André Lureau
  2021-07-07 17:20 ` [PATCH V5 12/25] cpr: QMP interfaces for restart Steve Sistare
                   ` (13 subsequent siblings)
  24 siblings, 1 reply; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Provide the cprsave restart mode, which preserves the guest VM across a
restart of the qemu process.  After cprsave, the caller passes qemu
command-line arguments to cprexec, which directly exec's the new qemu
binary.  The arguments must include -S so new qemu starts in a paused state.
The caller resumes the guest by calling cprload.

To use the restart mode, qemu must be started with the memfd-alloc machine
option.  The memfd's are saved to the environment and kept open across exec,
after which they are found from the environment and re-mmap'd.  Hence guest
ram is preserved in place, albeit with new virtual addresses in the qemu
process.

A subsequent patch adds vfio device support to restart mode.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 migration/cpr.c   | 21 +++++++++++++++++++++
 softmmu/physmem.c |  6 +++++-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/migration/cpr.c b/migration/cpr.c
index c5bad8a..fb57dec 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -29,6 +29,7 @@
 #include "sysemu/xen.h"
 #include "hw/vfio/vfio-common.h"
 #include "hw/virtio/vhost.h"
+#include "qemu/env.h"
 
 QEMUFile *qf_file_open(const char *path, int flags, int mode,
                               const char *name, Error **errp)
@@ -108,6 +109,26 @@ done:
     return;
 }
 
+static int preserve_fd(const char *name, const char *val, void *handle)
+{
+    qemu_clr_cloexec(atoi(val));
+    return 0;
+}
+
+void cprexec(strList *args, Error **errp)
+{
+    if (xen_enabled()) {
+        error_setg(errp, "xen does not support cprexec");
+        return;
+    }
+    if (!runstate_check(RUN_STATE_SAVE_VM)) {
+        error_setg(errp, "runstate is not save-vm");
+        return;
+    }
+    walkenv(FD_PREFIX, preserve_fd, 0);
+    qemu_system_exec_request(args);
+}
+
 void cprload(const char *file, Error **errp)
 {
     QEMUFile *f;
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index b149250..8a65ef7 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -65,6 +65,7 @@
 #include "qemu/pmem.h"
 
 #include "qemu/memfd.h"
+#include "qemu/env.h"
 #include "migration/vmstate.h"
 
 #include "qemu/range.h"
@@ -1986,7 +1987,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
         } else {
             name = memory_region_name(new_block->mr);
             if (ms->memfd_alloc) {
-                int mfd = -1;          /* placeholder until next patch */
+                int mfd = getenv_fd(name);
                 mr->align = QEMU_VMALLOC_ALIGN;
                 if (mfd < 0) {
                     mfd = qemu_memfd_create(name, maxlen + mr->align,
@@ -1994,7 +1995,9 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
                     if (mfd < 0) {
                         return;
                     }
+                    setenv_fd(name, mfd);
                 }
+                qemu_clr_cloexec(mfd);
                 new_block->flags |= RAM_SHARED;
                 addr = file_ram_alloc(new_block, maxlen, mfd,
                                       false, false, 0, errp);
@@ -2246,6 +2249,7 @@ void qemu_ram_free(RAMBlock *block)
     }
 
     qemu_mutex_lock_ramlist();
+    unsetenv_fd(memory_region_name(block->mr));
     QLIST_REMOVE_RCU(block, next);
     ram_list.mru_block = NULL;
     /* Write list before version */
-- 
1.8.3.1




* [PATCH V5 12/25] cpr: QMP interfaces for restart
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (10 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 11/25] cpr: restart mode Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-08 15:49   ` Marc-André Lureau
  2021-08-04 16:00   ` Eric Blake
  2021-07-07 17:20 ` [PATCH V5 13/25] cpr: HMP " Steve Sistare
                   ` (12 subsequent siblings)
  24 siblings, 2 replies; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

The cprexec QMP command calls cprexec().  Syntax:
  { 'command': 'cprexec', 'data': { 'argv': [ 'str' ] } }

Add the restart mode:
  { 'enum': 'CprMode', 'data': [ 'reboot', 'restart' ] }

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 monitor/qmp-cmds.c |  5 +++++
 qapi/cpr.json      | 16 +++++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index 1128604..7326f7d 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -179,6 +179,11 @@ void qmp_cprsave(const char *file, CprMode mode, Error **errp)
     cprsave(file, mode, errp);
 }
 
+void qmp_cprexec(strList *args, Error **errp)
+{
+    cprexec(args, errp);
+}
+
 void qmp_cprload(const char *file, Error **errp)
 {
     cprload(file, errp);
diff --git a/qapi/cpr.json b/qapi/cpr.json
index b6fdc89..2467e48 100644
--- a/qapi/cpr.json
+++ b/qapi/cpr.json
@@ -16,10 +16,12 @@
 #
 # @reboot: checkpoint can be cprload'ed after a host kexec reboot.
 #
+# @restart: checkpoint can be cprload'ed after restarting qemu.
+#
 # Since: 6.1
 ##
 { 'enum': 'CprMode',
-  'data': [ 'reboot' ] }
+  'data': [ 'reboot', 'restart' ] }
 
 
 ##
@@ -61,6 +63,18 @@
             'mode': 'CprMode' } }
 
 ##
+# @cprexec:
+#
+# Restart qemu.
+#
+# @argv: arguments to exec
+#
+# Since: 6.1
+##
+{ 'command': 'cprexec',
+  'data': { 'argv': [ 'str' ] } }
+
+##
 # @cprload:
 #
 # Start virtual machine from checkpoint file that was created earlier using
-- 
1.8.3.1




* [PATCH V5 13/25] cpr: HMP interfaces for restart
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (11 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 12/25] cpr: QMP interfaces for restart Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-28  4:56   ` Zheng Chuan
  2021-07-07 17:20 ` [PATCH V5 14/25] pci: export functions for cpr Steve Sistare
                   ` (11 subsequent siblings)
  24 siblings, 1 reply; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

cprsave <file> <mode>
  mode may now be "restart" as well as "reboot"

cprexec <command>
  Call cprexec().
  Arguments:
    command : command line to execute, with space-separated arguments
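
For illustration only, the splitting of <command> on spaces can be sketched
as below.  This is not the strList_from_string() helper used by the patch;
split_args() and free_args() are hypothetical names, and the choice to skip
empty tokens and to ignore quoting is an assumption of the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated argument vector built by split_args(). */
void free_args(char **argv)
{
    if (argv) {
        for (char **p = argv; *p; p++) {
            free(*p);
        }
        free(argv);
    }
}

/*
 * Hypothetical helper: split 'command' on spaces into a heap-allocated,
 * NULL-terminated vector.  Runs of spaces are treated as one separator;
 * quoting and escaping are not handled.  Returns NULL if allocation fails.
 */
char **split_args(const char *command)
{
    size_t n = 0, cap = 4;
    char **argv;
    const char *p = command;

    assert(command != NULL);
    argv = malloc(cap * sizeof(*argv));
    if (!argv) {
        return NULL;
    }
    argv[0] = NULL;

    while (*p) {
        const char *start;
        size_t len;

        while (*p == ' ') {             /* skip separators */
            p++;
        }
        if (!*p) {
            break;
        }
        start = p;
        while (*p && *p != ' ') {       /* scan one token */
            p++;
        }
        len = (size_t)(p - start);

        if (n + 2 > cap) {              /* room for token and terminator */
            char **tmp = realloc(argv, 2 * cap * sizeof(*argv));
            if (!tmp) {
                free_args(argv);
                return NULL;
            }
            argv = tmp;
            cap *= 2;
        }
        argv[n] = malloc(len + 1);
        if (!argv[n]) {
            free_args(argv);
            return NULL;
        }
        memcpy(argv[n], start, len);
        argv[n][len] = '\0';
        argv[++n] = NULL;
    }
    return argv;
}
```

Because the split is on spaces only, an argument that itself contains a
space cannot be expressed through this form of the command.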

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hmp-commands.hx       | 20 +++++++++++++++++++-
 include/monitor/hmp.h |  1 +
 monitor/hmp-cmds.c    | 11 +++++++++++
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 11827ae..d956405 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -366,7 +366,7 @@ ERST
     {
         .name       = "cprsave",
         .args_type  = "file:s,mode:s",
-        .params     = "file 'reboot'",
+        .params     = "file 'restart'|'reboot'",
         .help       = "create a checkpoint of the VM in file",
         .cmd        = hmp_cprsave,
     },
@@ -379,6 +379,24 @@ If *mode* is 'reboot', the checkpoint remains valid after a host kexec
 reboot, and guest ram must be backed by persistant shared memory.  To
 resume from the checkpoint, issue the quit command, reboot the system,
 and issue the cprload command.
+
+If *mode* is 'restart', the checkpoint remains valid after restarting qemu,
+and guest ram must be allocated with the memfd-alloc machine option.  To
+resume from the checkpoint, issue the cprexec command to restart, and issue
+the cprload command.
+ERST
+
+    {
+        .name       = "cprexec",
+        .args_type  = "command:S",
+        .params     = "command",
+        .help       = "Restart qemu by directly exec'ing command",
+        .cmd        = hmp_cprexec,
+    },
+
+SRST
+``cprexec`` *command*
+Restart qemu by directly exec'ing *command*, replacing the qemu process.
 ERST
 
     {
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 98bb775..ffc5eb1 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -60,6 +60,7 @@ void hmp_savevm(Monitor *mon, const QDict *qdict);
 void hmp_delvm(Monitor *mon, const QDict *qdict);
 void hmp_cprinfo(Monitor *mon, const QDict *qdict);
 void hmp_cprsave(Monitor *mon, const QDict *qdict);
+void hmp_cprexec(Monitor *mon, const QDict *qdict);
 void hmp_cprload(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_continue(Monitor *mon, const QDict *qdict);
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index a56f83c..163564e 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1217,6 +1217,17 @@ out:
     hmp_handle_error(mon, err);
 }
 
+void hmp_cprexec(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    const char *command = qdict_get_try_str(qdict, "command");
+    strList *args = strList_from_string(command, ' ');
+
+    qmp_cprexec(args, &err);
+    qapi_free_strList(args);
+    hmp_handle_error(mon, err);
+}
+
 void hmp_cprload(Monitor *mon, const QDict *qdict)
 {
     Error *err = NULL;
-- 
1.8.3.1




* [PATCH V5 14/25] pci: export functions for cpr
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (12 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 13/25] cpr: HMP " Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-07 17:20 ` [PATCH V5 15/25] vfio-pci: refactor " Steve Sistare
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Export msix_is_pending, msix_init_vector_notifiers, and pci_update_mappings
for use by cpr.  No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hw/pci/msix.c         | 20 ++++++++++++++------
 hw/pci/pci.c          |  3 +--
 include/hw/pci/msix.h |  5 +++++
 include/hw/pci/pci.h  |  1 +
 4 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index ae9331c..73f4259 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -64,7 +64,7 @@ static uint8_t *msix_pending_byte(PCIDevice *dev, int vector)
     return dev->msix_pba + vector / 8;
 }
 
-static int msix_is_pending(PCIDevice *dev, int vector)
+int msix_is_pending(PCIDevice *dev, unsigned int vector)
 {
     return *msix_pending_byte(dev, vector) & msix_pending_mask(vector);
 }
@@ -579,6 +579,17 @@ static void msix_unset_notifier_for_vector(PCIDevice *dev, unsigned int vector)
     dev->msix_vector_release_notifier(dev, vector);
 }
 
+void msix_init_vector_notifiers(PCIDevice *dev,
+                                MSIVectorUseNotifier use_notifier,
+                                MSIVectorReleaseNotifier release_notifier,
+                                MSIVectorPollNotifier poll_notifier)
+{
+    assert(use_notifier && release_notifier);
+    dev->msix_vector_use_notifier = use_notifier;
+    dev->msix_vector_release_notifier = release_notifier;
+    dev->msix_vector_poll_notifier = poll_notifier;
+}
+
 int msix_set_vector_notifiers(PCIDevice *dev,
                               MSIVectorUseNotifier use_notifier,
                               MSIVectorReleaseNotifier release_notifier,
@@ -586,11 +597,8 @@ int msix_set_vector_notifiers(PCIDevice *dev,
 {
     int vector, ret;
 
-    assert(use_notifier && release_notifier);
-
-    dev->msix_vector_use_notifier = use_notifier;
-    dev->msix_vector_release_notifier = release_notifier;
-    dev->msix_vector_poll_notifier = poll_notifier;
+    msix_init_vector_notifiers(dev, use_notifier, release_notifier,
+                               poll_notifier);
 
     if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
         (MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 377084f..2590898 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -225,7 +225,6 @@ static const TypeInfo pcie_bus_info = {
 };
 
 static PCIBus *pci_find_bus_nr(PCIBus *bus, int bus_num);
-static void pci_update_mappings(PCIDevice *d);
 static void pci_irq_handler(void *opaque, int irq_num, int level);
 static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom, Error **);
 static void pci_del_option_rom(PCIDevice *pdev);
@@ -1334,7 +1333,7 @@ static pcibus_t pci_bar_address(PCIDevice *d,
     return new_addr;
 }
 
-static void pci_update_mappings(PCIDevice *d)
+void pci_update_mappings(PCIDevice *d)
 {
     PCIIORegion *r;
     int i;
diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
index 4c4a60c..46606cf 100644
--- a/include/hw/pci/msix.h
+++ b/include/hw/pci/msix.h
@@ -32,6 +32,7 @@ int msix_present(PCIDevice *dev);
 bool msix_is_masked(PCIDevice *dev, unsigned vector);
 void msix_set_pending(PCIDevice *dev, unsigned vector);
 void msix_clr_pending(PCIDevice *dev, int vector);
+int msix_is_pending(PCIDevice *dev, unsigned vector);
 
 int msix_vector_use(PCIDevice *dev, unsigned vector);
 void msix_vector_unuse(PCIDevice *dev, unsigned vector);
@@ -41,6 +42,10 @@ void msix_notify(PCIDevice *dev, unsigned vector);
 
 void msix_reset(PCIDevice *dev);
 
+void msix_init_vector_notifiers(PCIDevice *dev,
+                                MSIVectorUseNotifier use_notifier,
+                                MSIVectorReleaseNotifier release_notifier,
+                                MSIVectorPollNotifier poll_notifier);
 int msix_set_vector_notifiers(PCIDevice *dev,
                               MSIVectorUseNotifier use_notifier,
                               MSIVectorReleaseNotifier release_notifier,
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 6be4e0c..bef3e49 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -902,5 +902,6 @@ extern const VMStateDescription vmstate_pci_device;
 }
 
 MSIMessage pci_get_msi_message(PCIDevice *dev, int vector);
+void pci_update_mappings(PCIDevice *d);
 
 #endif
-- 
1.8.3.1




* [PATCH V5 15/25] vfio-pci: refactor for cpr
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (13 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 14/25] pci: export functions for cpr Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-07 17:20 ` [PATCH V5 16/25] vfio-pci: cpr part 1 Steve Sistare
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Export vfio_address_spaces and vfio_listener_skipped_section.
Add optional eventfd arg to vfio_add_kvm_msi_virq.
Refactor vector use into a helper vfio_vector_init.
All for use by cpr in a subsequent patch.  No functional change.
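
The "create new or reuse existing eventfd" choice can be sketched outside
of QEMU as follows.  This assumes Linux eventfd(2); struct notifier and
notifier_init() are hypothetical stand-ins for EventNotifier and
vfio_event_notifier_init, and the flags passed to eventfd() are a choice
made for the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical stand-in for an event notifier: it only wraps an fd. */
struct notifier {
    int fd;
};

/*
 * If 'inherited_fd' is negative, create a fresh eventfd.  Otherwise adopt
 * the descriptor that was passed in, for example one preserved across exec.
 * Returns 0 on success, -1 if a new eventfd could not be created.
 */
int notifier_init(struct notifier *n, int inherited_fd)
{
    assert(n != NULL);
    if (inherited_fd < 0) {
        n->fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
        return n->fd < 0 ? -1 : 0;
    }
    n->fd = inherited_fd;
    return 0;
}
```

Callers that never preserve descriptors pass -1 and see the old behavior,
which matches the "No functional change" claim for this patch.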

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hw/vfio/common.c              |  4 ++--
 hw/vfio/pci.c                 | 41 ++++++++++++++++++++++++++++++-----------
 include/hw/vfio/vfio-common.h |  3 +++
 3 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ae5654f..9220e64 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -42,7 +42,7 @@
 
 VFIOGroupList vfio_group_list =
     QLIST_HEAD_INITIALIZER(vfio_group_list);
-static QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
+VFIOAddressSpaceList vfio_address_spaces =
     QLIST_HEAD_INITIALIZER(vfio_address_spaces);
 
 #ifdef CONFIG_KVM
@@ -534,7 +534,7 @@ static int vfio_host_win_del(VFIOContainer *container, hwaddr min_iova,
     return -1;
 }
 
-static bool vfio_listener_skipped_section(MemoryRegionSection *section)
+bool vfio_listener_skipped_section(MemoryRegionSection *section)
 {
     return (!memory_region_is_ram(section->mr) &&
             !memory_region_is_iommu(section->mr)) ||
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index ab4077a..9fc12bc 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -409,8 +409,19 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
     return ret;
 }
 
+/* Create new or reuse existing eventfd */
+static int vfio_event_notifier_init(EventNotifier *e, int eventfd)
+{
+    if (eventfd < 0) {
+        return event_notifier_init(e, 0);
+    }
+
+    event_notifier_init_fd(e, eventfd);
+    return 0;
+}
+
 static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
-                                  int vector_n, bool msix)
+                                  int vector_n, bool msix, int eventfd)
 {
     int virq;
 
@@ -418,7 +429,7 @@ static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
         return;
     }
 
-    if (event_notifier_init(&vector->kvm_interrupt, 0)) {
+    if (vfio_event_notifier_init(&vector->kvm_interrupt, eventfd)) {
         return;
     }
 
@@ -454,6 +465,20 @@ static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg,
     kvm_irqchip_commit_routes(kvm_state);
 }
 
+static void vfio_vector_init(VFIOPCIDevice *vdev, int nr, int eventfd)
+{
+    VFIOMSIVector *vector = &vdev->msi_vectors[nr];
+    PCIDevice *pdev = &vdev->pdev;
+
+    vector->vdev = vdev;
+    vector->virq = -1;
+    if (vfio_event_notifier_init(&vector->interrupt, eventfd)) {
+        error_report("vfio: Error: event_notifier_init failed");
+    }
+    vector->use = true;
+    msix_vector_use(pdev, nr);
+}
+
 static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
                                    MSIMessage *msg, IOHandler *handler)
 {
@@ -466,13 +491,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
     vector = &vdev->msi_vectors[nr];
 
     if (!vector->use) {
-        vector->vdev = vdev;
-        vector->virq = -1;
-        if (event_notifier_init(&vector->interrupt, 0)) {
-            error_report("vfio: Error: event_notifier_init failed");
-        }
-        vector->use = true;
-        msix_vector_use(pdev, nr);
+        vfio_vector_init(vdev, nr, -1);
     }
 
     qemu_set_fd_handler(event_notifier_get_fd(&vector->interrupt),
@@ -490,7 +509,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
         }
     } else {
         if (msg) {
-            vfio_add_kvm_msi_virq(vdev, vector, nr, true);
+            vfio_add_kvm_msi_virq(vdev, vector, nr, true, -1);
         }
     }
 
@@ -640,7 +659,7 @@ retry:
          * Attempt to enable route through KVM irqchip,
          * default to userspace handling if unavailable.
          */
-        vfio_add_kvm_msi_virq(vdev, vector, i, false);
+        vfio_add_kvm_msi_virq(vdev, vector, i, false, -1);
     }
 
     /* Set interrupt type prior to possible interrupts */
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 6141162..00acb85 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -204,6 +204,8 @@ int vfio_get_device(VFIOGroup *group, const char *name,
 extern const MemoryRegionOps vfio_region_ops;
 typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
 extern VFIOGroupList vfio_group_list;
+typedef QLIST_HEAD(, VFIOAddressSpace) VFIOAddressSpaceList;
+extern VFIOAddressSpaceList vfio_address_spaces;
 
 bool vfio_mig_active(void);
 int64_t vfio_mig_bytes_transferred(void);
@@ -222,6 +224,7 @@ struct vfio_info_cap_header *
 vfio_get_device_info_cap(struct vfio_device_info *info, uint16_t id);
 #endif
 extern const MemoryListener vfio_prereg_listener;
+bool vfio_listener_skipped_section(MemoryRegionSection *section);
 
 int vfio_spapr_create_window(VFIOContainer *container,
                              MemoryRegionSection *section,
-- 
1.8.3.1




* [PATCH V5 16/25] vfio-pci: cpr part 1
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (14 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 15/25] vfio-pci: refactor " Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-16 17:45   ` Alex Williamson
  2021-07-28  4:56   ` Zheng Chuan
  2021-07-07 17:20 ` [PATCH V5 17/25] vfio-pci: cpr part 2 Steve Sistare
                   ` (8 subsequent siblings)
  24 siblings, 2 replies; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Enable vfio-pci devices to be saved and restored across an exec restart
of qemu.

At vfio creation time, save the values of the vfio container, group, and
device descriptors in the environment.
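
A minimal sketch of keeping a descriptor number in the environment, so that
it survives exec, is shown here.  It is not the qemu/env.h implementation:
env_set_fd(), env_get_fd(), env_unset_fd() and the SKETCH_FD_ prefix are
hypothetical, chosen only to mirror the setenv_fd/getenv_fd/unsetenv_fd
calls that appear in the diff.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_FD_PREFIX "SKETCH_FD_"   /* hypothetical variable prefix */

/* Build the environment variable name for a named descriptor. */
static void fd_env_key(char *buf, size_t len, const char *name)
{
    snprintf(buf, len, "%s%s", SKETCH_FD_PREFIX, name);
}

/* Record 'fd' under 'name'.  Returns 0 on success, -1 on failure. */
int env_set_fd(const char *name, int fd)
{
    char key[128], val[16];

    assert(name != NULL);
    fd_env_key(key, sizeof(key), name);
    snprintf(val, sizeof(val), "%d", fd);
    return setenv(key, val, 1);
}

/* Return the descriptor recorded under 'name', or -1 if there is none. */
int env_get_fd(const char *name)
{
    char key[128];
    const char *val;

    assert(name != NULL);
    fd_env_key(key, sizeof(key), name);
    val = getenv(key);
    return val ? atoi(val) : -1;
}

/* Forget the descriptor recorded under 'name'. */
void env_unset_fd(const char *name)
{
    char key[128];

    assert(name != NULL);
    fd_env_key(key, sizeof(key), name);
    unsetenv(key);
}
```

A lookup that returns -1 tells the caller to open the device as usual, which
is how the diff distinguishes a fresh start from a reused descriptor.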

In cprsave and cprexec, suspend the use of virtual addresses in DMA
mappings with VFIO_DMA_UNMAP_FLAG_VADDR, because guest ram will be remapped
at a different VA after exec.  DMA to already-mapped pages continues.  Save
the msi message area as part of vfio-pci vmstate, save the interrupt and
notifier eventfd's in the environment, and clear the close-on-exec flag
for the vfio descriptors.  The flag is not cleared earlier because the
descriptors should not persist across miscellaneous fork and exec calls
that may be performed during normal operation.
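
Clearing the close-on-exec flag just before exec can be sketched with plain
fcntl(2).  clear_cloexec() and set_cloexec() are hypothetical names used for
illustration; the patch itself calls qemu_clr_cloexec, whose body is not
shown in this message.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Clear FD_CLOEXEC so that 'fd' stays open across exec.  0 on success. */
int clear_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}

/* Set FD_CLOEXEC so that 'fd' is closed by exec.  0 on success. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
```

Keeping the flag set during normal operation and clearing it only on the
cprexec path is what prevents the descriptors from leaking into unrelated
child processes.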

On qemu restart, vfio_realize() finds the descriptor env vars, uses
the descriptors, and notes that the device is being reused.  Device and
iommu state is already configured, so operations in vfio_realize that
would modify the configuration are skipped for a reused device, including
vfio ioctl's and writes to PCI configuration space.  The result is that
vfio_realize constructs qemu data structures that reflect the current
state of the device.  However, the reconstruction is not complete until
cprload is called.  cprload loads the msi data and finds eventfds in the
environment.  It rebuilds vector data structures and attaches the
interrupts to the new KVM instance.  cprload then walks the flattened
ranges of the vfio_address_spaces and issues VFIO_IOMMU_MAP_DMA with
VFIO_DMA_MAP_FLAG_VADDR to inform the kernel of the new VA's.  Lastly, it
starts the VM and suppresses vfio device reset.
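
Before each range is handed back to the kernel, vfio_region_remap in this
patch rounds the start up to a page boundary and trims the size to whole
pages.  That arithmetic can be sketched in isolation as below.  The 4 KiB
page size, struct dma_range, and align_section() are assumptions of the
sketch, and the guard against a section smaller than the rounding is an
addition that the patch does not contain.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL                     /* assumed page size */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Page-aligned DMA range derived from an address-space section. */
struct dma_range {
    uint64_t iova;   /* section start, rounded up to a page boundary */
    uint64_t skip;   /* bytes dropped from the front by the rounding */
    uint64_t size;   /* remaining length, truncated to whole pages */
};

struct dma_range align_section(uint64_t offset, uint64_t size)
{
    struct dma_range r;

    assert(SKETCH_PAGE_SIZE != 0);
    r.iova = (offset + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK;
    r.skip = r.iova - offset;
    /* Guard added for the sketch: avoid unsigned underflow. */
    r.size = size > r.skip ? (size - r.skip) & SKETCH_PAGE_MASK : 0;
    return r;
}
```

The new virtual address for the range would then be the region's host
pointer plus the offset within the region plus r.skip, as in the diff.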

This functionality is split into two patches for clarity.  Part 2 adds
eventfd and vector support.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 MAINTAINERS                   |   1 +
 hw/pci/pci.c                  |   4 ++
 hw/vfio/common.c              |  69 +++++++++++++++++--
 hw/vfio/cpr.c                 | 154 ++++++++++++++++++++++++++++++++++++++++++
 hw/vfio/meson.build           |   1 +
 hw/vfio/pci.c                 |  66 +++++++++++++++++-
 hw/vfio/trace-events          |   1 +
 include/hw/pci/pci.h          |   1 +
 include/hw/vfio/vfio-common.h |   5 ++
 include/migration/cpr.h       |   3 +
 linux-headers/linux/vfio.h    |   6 ++
 migration/cpr.c               |  20 ++++++
 12 files changed, 323 insertions(+), 8 deletions(-)
 create mode 100644 hw/vfio/cpr.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 8647a97..58479db 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2862,6 +2862,7 @@ CPR
 M: Steve Sistare <steven.sistare@oracle.com>
 M: Mark Kanda <mark.kanda@oracle.com>
 S: Maintained
+F: hw/vfio/cpr.c
 F: include/migration/cpr.h
 F: migration/cpr.c
 F: qapi/cpr.json
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 2590898..fa4a439 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -307,6 +307,10 @@ static void pci_do_device_reset(PCIDevice *dev)
 {
     int r;
 
+    if (dev->reused) {
+        return;
+    }
+
     pci_device_deassert_intx(dev);
     assert(dev->irq_state == 0);
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 9220e64..40c882f 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -31,6 +31,7 @@
 #include "exec/memory.h"
 #include "exec/ram_addr.h"
 #include "hw/hw.h"
+#include "qemu/env.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "qemu/range.h"
@@ -440,6 +441,10 @@ static int vfio_dma_unmap(VFIOContainer *container,
         return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
     }
 
+    if (container->reused) {
+        return 0;
+    }
+
     while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
         /*
          * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c
@@ -463,6 +468,11 @@ static int vfio_dma_unmap(VFIOContainer *container,
         return -errno;
     }
 
+    if (unmap.size != size) {
+        warn_report("VFIO_UNMAP_DMA(0x%" PRIx64 ", 0x%" PRIx64 ") only unmaps "
+                    "0x%llx", (uint64_t)iova, (uint64_t)size, unmap.size);
+    }
+
     return 0;
 }
 
@@ -477,6 +487,10 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
         .size = size,
     };
 
+    if (container->reused) {
+        return 0;
+    }
+
     if (!readonly) {
         map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
     }
@@ -1603,6 +1617,10 @@ static int vfio_init_container(VFIOContainer *container, int group_fd,
     if (iommu_type < 0) {
         return iommu_type;
     }
+    if (container->reused) {
+        container->iommu_type = iommu_type;
+        return 0;
+    }
 
     ret = ioctl(group_fd, VFIO_GROUP_SET_CONTAINER, &container->fd);
     if (ret) {
@@ -1703,6 +1721,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
 {
     VFIOContainer *container;
     int ret, fd;
+    bool reused;
+    char name[40];
     VFIOAddressSpace *space;
 
     space = vfio_get_address_space(as);
@@ -1739,16 +1759,31 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
         return ret;
     }
 
+    snprintf(name, sizeof(name), "vfio_container_for_group_%d", group->groupid);
+    fd = getenv_fd(name);
+    reused = (fd >= 0);
+
     QLIST_FOREACH(container, &space->containers, next) {
-        if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
-            group->container = container;
-            QLIST_INSERT_HEAD(&container->group_list, group, container_next);
+        if (container->fd == fd ||
+            !ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
+            break;
+        }
+    }
+
+    if (container) {
+        group->container = container;
+        QLIST_INSERT_HEAD(&container->group_list, group, container_next);
+        if (!reused) {
             vfio_kvm_device_add_group(group);
-            return 0;
+            setenv_fd(name, container->fd);
         }
+        return 0;
+    }
+
+    if (!reused) {
+        fd = qemu_open_old("/dev/vfio/vfio", O_RDWR);
     }
 
-    fd = qemu_open_old("/dev/vfio/vfio", O_RDWR);
     if (fd < 0) {
         error_setg_errno(errp, errno, "failed to open /dev/vfio/vfio");
         ret = -errno;
@@ -1766,6 +1801,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     container = g_malloc0(sizeof(*container));
     container->space = space;
     container->fd = fd;
+    container->reused = reused;
     container->error = NULL;
     container->dirty_pages_supported = false;
     QLIST_INIT(&container->giommu_list);
@@ -1893,6 +1929,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     }
 
     container->initialized = true;
+    setenv_fd(name, fd);
 
     return 0;
 listener_release_exit:
@@ -1920,6 +1957,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
 
     QLIST_REMOVE(group, container_next);
     group->container = NULL;
+    unsetenv_fdv("vfio_container_for_group_%d", group->groupid);
 
     /*
      * Explicitly release the listener first before unset container,
@@ -1978,7 +2016,12 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
     group = g_malloc0(sizeof(*group));
 
     snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
-    group->fd = qemu_open_old(path, O_RDWR);
+
+    group->fd = getenv_fd(path);
+    if (group->fd < 0) {
+        group->fd = qemu_open_old(path, O_RDWR);
+    }
+
     if (group->fd < 0) {
         error_setg_errno(errp, errno, "failed to open %s", path);
         goto free_group_exit;
@@ -2012,6 +2055,8 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
 
     QLIST_INSERT_HEAD(&vfio_group_list, group, next);
 
+    setenv_fd(path, group->fd);
+
     return group;
 
 close_fd_exit:
@@ -2036,6 +2081,7 @@ void vfio_put_group(VFIOGroup *group)
     vfio_disconnect_container(group);
     QLIST_REMOVE(group, next);
     trace_vfio_put_group(group->fd);
+    unsetenv_fdv("/dev/vfio/%d", group->groupid);
     close(group->fd);
     g_free(group);
 
@@ -2049,8 +2095,14 @@ int vfio_get_device(VFIOGroup *group, const char *name,
 {
     struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
     int ret, fd;
+    bool reused;
+
+    fd = getenv_fd(name);
+    reused = (fd >= 0);
+    if (!reused) {
+        fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
+    }
 
-    fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
     if (fd < 0) {
         error_setg_errno(errp, errno, "error getting device from group %d",
                          group->groupid);
@@ -2095,6 +2147,8 @@ int vfio_get_device(VFIOGroup *group, const char *name,
     vbasedev->num_irqs = dev_info.num_irqs;
     vbasedev->num_regions = dev_info.num_regions;
     vbasedev->flags = dev_info.flags;
+    vbasedev->reused = reused;
+    setenv_fd(name, fd);
 
     trace_vfio_get_device(name, dev_info.flags, dev_info.num_regions,
                           dev_info.num_irqs);
@@ -2111,6 +2165,7 @@ void vfio_put_base_device(VFIODevice *vbasedev)
     QLIST_REMOVE(vbasedev, next);
     vbasedev->group = NULL;
     trace_vfio_put_base_device(vbasedev->fd);
+    unsetenv_fd(vbasedev->name);
     close(vbasedev->fd);
 }
 
diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
new file mode 100644
index 0000000..28f8a76
--- /dev/null
+++ b/hw/vfio/cpr.c
@@ -0,0 +1,154 @@
+/*
+ * Copyright (c) 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include <sys/ioctl.h>
+#include <linux/vfio.h>
+#include "hw/vfio/vfio-common.h"
+#include "sysemu/kvm.h"
+#include "qapi/error.h"
+#include "trace.h"
+
+static int
+vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
+{
+    struct vfio_iommu_type1_dma_unmap unmap = {
+        .argsz = sizeof(unmap),
+        .flags = VFIO_DMA_UNMAP_FLAG_VADDR | VFIO_DMA_UNMAP_FLAG_ALL,
+        .iova = 0,
+        .size = 0,
+    };
+    if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
+        error_setg_errno(errp, errno, "vfio_dma_unmap_vaddr_all");
+        return -errno;
+    }
+    return 0;
+}
+
+static int vfio_dma_map_vaddr(VFIOContainer *container, hwaddr iova,
+                              ram_addr_t size, void *vaddr,
+                              Error **errp)
+{
+    struct vfio_iommu_type1_dma_map map = {
+        .argsz = sizeof(map),
+        .flags = VFIO_DMA_MAP_FLAG_VADDR,
+        .vaddr = (__u64)(uintptr_t)vaddr,
+        .iova = iova,
+        .size = size,
+    };
+    if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map)) {
+        error_setg_errno(errp, errno, "vfio_dma_map_vaddr(iova %" PRIu64
+                         ", size %" PRIu64 ", va %p)",
+                         (uint64_t)iova, (uint64_t)size, vaddr);
+        return -errno;
+    }
+    return 0;
+}
+
+static int
+vfio_region_remap(MemoryRegionSection *section, void *handle, Error **errp)
+{
+    MemoryRegion *mr = section->mr;
+    VFIOContainer *container = handle;
+    const char *name = memory_region_name(mr);
+    ram_addr_t size = int128_get64(section->size);
+    hwaddr offset, iova, roundup;
+    void *vaddr;
+
+    if (vfio_listener_skipped_section(section) || memory_region_is_iommu(mr)) {
+        return 0;
+    }
+
+    offset = section->offset_within_address_space;
+    iova = TARGET_PAGE_ALIGN(offset);
+    roundup = iova - offset;
+    size = (size - roundup) & TARGET_PAGE_MASK;
+    vaddr = memory_region_get_ram_ptr(mr) +
+            section->offset_within_region + roundup;
+
+    trace_vfio_region_remap(name, container->fd, iova, iova + size - 1, vaddr);
+    return vfio_dma_map_vaddr(container, iova, size, vaddr, errp);
+}
+
+bool vfio_cpr_capable(VFIOContainer *container, Error **errp)
+{
+    if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UPDATE_VADDR) ||
+        !ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UNMAP_ALL)) {
+        error_setg(errp, "VFIO container does not support VFIO_UPDATE_VADDR "
+                         "or VFIO_UNMAP_ALL");
+        return false;
+    } else {
+        return true;
+    }
+}
+
+int vfio_cprsave(Error **errp)
+{
+    VFIOAddressSpace *space, *last_space;
+    VFIOContainer *container, *last_container;
+
+    QLIST_FOREACH(space, &vfio_address_spaces, list) {
+        QLIST_FOREACH(container, &space->containers, next) {
+            if (!vfio_cpr_capable(container, errp)) {
+                return 1;
+            }
+        }
+    }
+
+    QLIST_FOREACH(space, &vfio_address_spaces, list) {
+        QLIST_FOREACH(container, &space->containers, next) {
+            if (vfio_dma_unmap_vaddr_all(container, errp)) {
+                goto unwind;
+            }
+        }
+    }
+    return 0;
+
+unwind:
+    last_space = space;
+    last_container = container;
+    QLIST_FOREACH(space, &vfio_address_spaces, list) {
+        QLIST_FOREACH(container, &space->containers, next) {
+            Error *err = NULL;
+
+            if (space == last_space && container == last_container) {
+                break;
+            }
+            if (as_flat_walk(space->as, vfio_region_remap, container, &err)) {
+                error_prepend(errp, "%s", error_get_pretty(err));
+                error_free(err);
+            }
+        }
+    }
+    return 1;
+}
+
+int vfio_cprload(Error **errp)
+{
+    VFIOAddressSpace *space;
+    VFIOContainer *container;
+    VFIOGroup *group;
+    VFIODevice *vbasedev;
+
+    QLIST_FOREACH(space, &vfio_address_spaces, list) {
+        QLIST_FOREACH(container, &space->containers, next) {
+            if (!vfio_cpr_capable(container, errp)) {
+                return 1;
+            }
+            container->reused = false;
+            if (as_flat_walk(space->as, vfio_region_remap, container, errp)) {
+                return 1;
+            }
+        }
+    }
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            vbasedev->reused = false;
+        }
+    }
+    return 0;
+}
diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index da9af29..e247b2b 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -5,6 +5,7 @@ vfio_ss.add(files(
   'migration.c',
 ))
 vfio_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files(
+  'cpr.c',
   'display.c',
   'pci-quirks.c',
   'pci.c',
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 9fc12bc..0f5c542 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -29,6 +29,8 @@
 #include "hw/qdev-properties.h"
 #include "hw/qdev-properties-system.h"
 #include "migration/vmstate.h"
+#include "migration/cpr.h"
+#include "qemu/env.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
@@ -1656,6 +1658,7 @@ static void vfio_bars_prepare(VFIOPCIDevice *vdev)
 static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
+    PCIDevice *pdev = &vdev->pdev;
     char *name;
 
     if (!bar->size) {
@@ -1676,7 +1679,7 @@ static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
         }
     }
 
-    pci_register_bar(&vdev->pdev, nr, bar->type, bar->mr);
+    pci_register_bar(pdev, nr, bar->type, bar->mr);
 }
 
 static void vfio_bars_register(VFIOPCIDevice *vdev)
@@ -2888,6 +2891,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
         vfio_put_group(group);
         goto error;
     }
+    pdev->reused = vdev->vbasedev.reused;
 
     vfio_populate_device(vdev, &err);
     if (err) {
@@ -3157,6 +3161,10 @@ static void vfio_pci_reset(DeviceState *dev)
 {
     VFIOPCIDevice *vdev = VFIO_PCI(dev);
 
+    if (vdev->pdev.reused) {
+        return;
+    }
+
     trace_vfio_pci_reset(vdev->vbasedev.name);
 
     vfio_pci_pre_reset(vdev);
@@ -3264,6 +3272,61 @@ static Property vfio_pci_dev_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static void vfio_merge_config(VFIOPCIDevice *vdev)
+{
+    PCIDevice *pdev = &vdev->pdev;
+    int size = MIN(pci_config_size(pdev), vdev->config_size);
+    uint8_t *phys_config = g_malloc(size);
+    uint32_t mask;
+    int ret, i;
+
+    ret = pread(vdev->vbasedev.fd, phys_config, size, vdev->config_offset);
+    if (ret < size) {
+        ret = ret < 0 ? errno : EFAULT;
+        error_report("failed to read device config space: %s", strerror(ret));
+        goto out;
+    }
+
+    for (i = 0; i < size; i++) {
+        mask = vdev->emulated_config_bits[i];
+        pdev->config[i] = (pdev->config[i] & mask) | (phys_config[i] & ~mask);
+    }
+out:
+    g_free(phys_config);
+}
+
+static int vfio_pci_post_load(void *opaque, int version_id)
+{
+    VFIOPCIDevice *vdev = opaque;
+    PCIDevice *pdev = &vdev->pdev;
+    bool enabled;
+
+    vfio_merge_config(vdev);
+
+    pdev->reused = false;
+    enabled = pci_get_word(pdev->config + PCI_COMMAND) & PCI_COMMAND_MASTER;
+    memory_region_set_enabled(&pdev->bus_master_enable_region, enabled);
+
+    return 0;
+}
+
+static bool vfio_pci_needed(void *opaque)
+{
+    return cpr_mode() == CPR_MODE_RESTART;
+}
+
+static const VMStateDescription vfio_pci_vmstate = {
+    .name = "vfio-pci",
+    .unmigratable = 1,
+    .version_id = 0,
+    .minimum_version_id = 0,
+    .post_load = vfio_pci_post_load,
+    .needed = vfio_pci_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -3271,6 +3334,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 
     dc->reset = vfio_pci_reset;
     device_class_set_props(dc, vfio_pci_dev_properties);
+    dc->vmsd = &vfio_pci_vmstate;
     dc->desc = "VFIO-based PCI device assignment";
     set_bit(DEVICE_CATEGORY_MISC, dc->categories);
     pdc->realize = vfio_realize;
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 0ef1b5f..63dd0fe 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -118,6 +118,7 @@ vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Devic
 vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
 vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
 vfio_dma_unmap_overflow_workaround(void) ""
+vfio_region_remap(const char *name, int fd, uint64_t iova_start, uint64_t iova_end, void *vaddr) "%s fd %d 0x%"PRIx64" - 0x%"PRIx64" [%p]"
 
 # platform.c
 vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index bef3e49..add7f46 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -360,6 +360,7 @@ struct PCIDevice {
     /* ID of standby device in net_failover pair */
     char *failover_pair_id;
     uint32_t acpi_index;
+    bool reused;
 };
 
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 00acb85..b46d850 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -85,6 +85,7 @@ typedef struct VFIOContainer {
     Error *error;
     bool initialized;
     bool dirty_pages_supported;
+    bool reused;
     uint64_t dirty_pgsizes;
     uint64_t max_dirty_bitmap_size;
     unsigned long pgsizes;
@@ -124,6 +125,7 @@ typedef struct VFIODevice {
     bool no_mmap;
     bool ram_block_discard_allowed;
     bool enable_migration;
+    bool reused;
     VFIODeviceOps *ops;
     unsigned int num_irqs;
     unsigned int num_regions;
@@ -200,6 +202,9 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp);
 void vfio_put_group(VFIOGroup *group);
 int vfio_get_device(VFIOGroup *group, const char *name,
                     VFIODevice *vbasedev, Error **errp);
+int vfio_cprsave(Error **errp);
+int vfio_cprload(Error **errp);
+bool vfio_cpr_capable(VFIOContainer *container, Error **errp);
 
 extern const MemoryRegionOps vfio_region_ops;
 typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
diff --git a/include/migration/cpr.h b/include/migration/cpr.h
index bffee19..1ea5046 100644
--- a/include/migration/cpr.h
+++ b/include/migration/cpr.h
@@ -10,6 +10,9 @@
 
 #include "qapi/qapi-types-cpr.h"
 
+#define CPR_MODE_NONE ((CprMode)(-1))
+
+CprMode cpr_mode(void);
 void cprsave(const char *file, CprMode mode, Error **errp);
 void cprexec(strList *args, Error **errp);
 void cprload(const char *file, Error **errp);
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index e680594..48a02c0 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -52,6 +52,12 @@
 /* Supports the vaddr flag for DMA map and unmap */
 #define VFIO_UPDATE_VADDR		10
 
+/* Supports VFIO_DMA_UNMAP_FLAG_ALL */
+#define VFIO_UNMAP_ALL                        9
+
+/* Supports VFIO DMA map and unmap with the VADDR flag */
+#define VFIO_UPDATE_VADDR              10
+
 /*
  * The IOCTL interface is designed for extensibility by embedding the
  * structure length (argsz) and flags into structures passed between
diff --git a/migration/cpr.c b/migration/cpr.c
index fb57dec..578466c 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -31,6 +31,13 @@
 #include "hw/virtio/vhost.h"
 #include "qemu/env.h"
 
+static CprMode cpr_active_mode = CPR_MODE_NONE;
+
+CprMode cpr_mode(void)
+{
+    return cpr_active_mode;
+}
+
 QEMUFile *qf_file_open(const char *path, int flags, int mode,
                               const char *name, Error **errp)
 {
@@ -92,6 +99,7 @@ void cprsave(const char *file, CprMode mode, Error **errp)
     }
     vm_stop(RUN_STATE_SAVE_VM);
 
+    cpr_active_mode = mode;
     ret = qemu_save_device_state(f);
     qemu_fclose(f);
     if (ret < 0) {
@@ -105,6 +113,7 @@ err:
     if (saved_vm_running) {
         vm_start();
     }
+    cpr_active_mode = CPR_MODE_NONE;
 done:
     return;
 }
@@ -125,6 +134,13 @@ void cprexec(strList *args, Error **errp)
         error_setg(errp, "runstate is not save-vm");
         return;
     }
+    if (cpr_active_mode != CPR_MODE_RESTART) {
+        error_setg(errp, "cprexec requires cprsave with restart mode");
+        return;
+    }
+    if (vfio_cprsave(errp)) {
+        return;
+    }
     walkenv(FD_PREFIX, preserve_fd, 0);
     qemu_system_exec_request(args);
 }
@@ -158,6 +174,10 @@ void cprload(const char *file, Error **errp)
         return;
     }
 
+    if (vfio_cprload(errp)) {
+        return;
+    }
+
     state = global_state_get_runstate();
     if (state == RUN_STATE_RUNNING) {
         vm_start();
-- 
1.8.3.1




* [PATCH V5 17/25] vfio-pci: cpr part 2
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (15 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 16/25] vfio-pci: cpr part 1 Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-16 20:51   ` Alex Williamson
  2021-07-07 17:20 ` [PATCH V5 18/25] vhost: reset vhost devices upon cprsave Steve Sistare
                   ` (7 subsequent siblings)
  24 siblings, 1 reply; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Finish cpr for vfio-pci by preserving eventfds and vector state.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hw/vfio/pci.c | 118 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 116 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 0f5c542..07bd360 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2654,6 +2654,27 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
     vfio_put_base_device(&vdev->vbasedev);
 }
 
+static void save_event_fd(VFIOPCIDevice *vdev, const char *name, int nr,
+                            EventNotifier *ev)
+{
+    char envname[256];
+    int fd = event_notifier_get_fd(ev);
+    const char *vfname = vdev->vbasedev.name;
+
+    if (fd >= 0) {
+        snprintf(envname, sizeof(envname), "%s_%s_%d", vfname, name, nr);
+        setenv_fd(envname, fd);
+    }
+}
+
+static int load_event_fd(VFIOPCIDevice *vdev, const char *name, int nr)
+{
+    char envname[256];
+    const char *vfname = vdev->vbasedev.name;
+    snprintf(envname, sizeof(envname), "%s_%s_%d", vfname, name, nr);
+    return getenv_fd(envname);
+}
+
 static void vfio_err_notifier_handler(void *opaque)
 {
     VFIOPCIDevice *vdev = opaque;
@@ -2685,7 +2706,13 @@ static void vfio_err_notifier_handler(void *opaque)
 static void vfio_register_err_notifier(VFIOPCIDevice *vdev)
 {
     Error *err = NULL;
-    int32_t fd;
+    int32_t fd = load_event_fd(vdev, "err", 0);
+
+    if (fd >= 0) {
+        event_notifier_init_fd(&vdev->err_notifier, fd);
+        qemu_set_fd_handler(fd, vfio_err_notifier_handler, NULL, vdev);
+        return;
+    }
 
     if (!vdev->pci_aer) {
         return;
@@ -2746,7 +2773,14 @@ static void vfio_register_req_notifier(VFIOPCIDevice *vdev)
     struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info),
                                       .index = VFIO_PCI_REQ_IRQ_INDEX };
     Error *err = NULL;
-    int32_t fd;
+    int32_t fd = load_event_fd(vdev, "req", 0);
+
+    if (fd >= 0) {
+        event_notifier_init_fd(&vdev->req_notifier, fd);
+        qemu_set_fd_handler(fd, vfio_req_notifier_handler, NULL, vdev);
+        vdev->req_enabled = true;
+        return;
+    }
 
     if (!(vdev->features & VFIO_FEATURE_ENABLE_REQ)) {
         return;
@@ -3295,14 +3329,91 @@ static void vfio_merge_config(VFIOPCIDevice *vdev)
     g_free(phys_config);
 }
 
+static int vfio_pci_pre_save(void *opaque)
+{
+    VFIOPCIDevice *vdev = opaque;
+    PCIDevice *pdev = &vdev->pdev;
+    int i;
+
+    if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) {
+        error_report("%s: cpr does not support vfio-pci INTX",
+                     vdev->vbasedev.name);
+    }
+
+    for (i = 0; i < vdev->nr_vectors; i++) {
+        VFIOMSIVector *vector = &vdev->msi_vectors[i];
+        if (vector->use) {
+            save_event_fd(vdev, "interrupt", i, &vector->interrupt);
+            if (vector->virq >= 0) {
+                save_event_fd(vdev, "kvm_interrupt", i,
+                                &vector->kvm_interrupt);
+            }
+        }
+    }
+    save_event_fd(vdev, "err", 0, &vdev->err_notifier);
+    save_event_fd(vdev, "req", 0, &vdev->req_notifier);
+    return 0;
+}
+
+static void vfio_claim_vectors(VFIOPCIDevice *vdev, int nr_vectors, bool msix)
+{
+    int i, fd;
+    bool pending = false;
+    PCIDevice *pdev = &vdev->pdev;
+
+    vdev->nr_vectors = nr_vectors;
+    vdev->msi_vectors = g_new0(VFIOMSIVector, nr_vectors);
+    vdev->interrupt = msix ? VFIO_INT_MSIX : VFIO_INT_MSI;
+
+    for (i = 0; i < nr_vectors; i++) {
+        VFIOMSIVector *vector = &vdev->msi_vectors[i];
+
+        fd = load_event_fd(vdev, "interrupt", i);
+        if (fd >= 0) {
+            vfio_vector_init(vdev, i, fd);
+            qemu_set_fd_handler(fd, vfio_msi_interrupt, NULL, vector);
+        }
+
+        fd = load_event_fd(vdev, "kvm_interrupt", i);
+        if (fd >= 0) {
+            vfio_add_kvm_msi_virq(vdev, vector, i, msix, fd);
+        }
+
+        if (msix && msix_is_pending(pdev, i) && msix_is_masked(pdev, i)) {
+            set_bit(i, vdev->msix->pending);
+            pending = true;
+        }
+    }
+
+    if (msix) {
+        memory_region_set_enabled(&pdev->msix_pba_mmio, pending);
+    }
+}
+
 static int vfio_pci_post_load(void *opaque, int version_id)
 {
     VFIOPCIDevice *vdev = opaque;
     PCIDevice *pdev = &vdev->pdev;
+    int nr_vectors;
     bool enabled;
 
     vfio_merge_config(vdev);
 
+    if (msix_enabled(pdev)) {
+        nr_vectors = vdev->msix->entries;
+        vfio_claim_vectors(vdev, nr_vectors, true);
+        msix_init_vector_notifiers(pdev, vfio_msix_vector_use,
+                                   vfio_msix_vector_release, NULL);
+
+    } else if (msi_enabled(pdev)) {
+        nr_vectors = msi_nr_vectors_allocated(pdev);
+        vfio_claim_vectors(vdev, nr_vectors, false);
+
+    } else if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) {
+        error_report("%s: cpr does not support vfio-pci INTX",
+                     vdev->vbasedev.name);
+    }
+
     pdev->reused = false;
     enabled = pci_get_word(pdev->config + PCI_COMMAND) & PCI_COMMAND_MASTER;
     memory_region_set_enabled(&pdev->bus_master_enable_region, enabled);
@@ -3321,8 +3432,11 @@ static const VMStateDescription vfio_pci_vmstate = {
     .version_id = 0,
     .minimum_version_id = 0,
     .post_load = vfio_pci_post_load,
+    .pre_save = vfio_pci_pre_save,
     .needed = vfio_pci_needed,
     .fields = (VMStateField[]) {
+        VMSTATE_PCI_DEVICE(pdev, VFIOPCIDevice),
+        VMSTATE_MSIX_TEST(pdev, VFIOPCIDevice, vfio_msix_present),
         VMSTATE_END_OF_LIST()
     }
 };
-- 
1.8.3.1




* [PATCH V5 18/25] vhost: reset vhost devices upon cprsave
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (16 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 17/25] vfio-pci: cpr part 2 Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-07 17:20 ` [PATCH V5 19/25] hostmem-memfd: cpr support Steve Sistare
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

A vhost device is implicitly preserved across re-exec because its fd is not
closed, and the value of the fd is specified on the command line for the
new qemu to find.  However, new qemu issues a VHOST_SET_OWNER ioctl,
which fails because the device already has an owner.  To fix, reset the
owner prior to exec.

Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hw/virtio/vhost.c         | 11 +++++++++++
 include/hw/virtio/vhost.h |  1 +
 migration/cpr.c           |  1 +
 3 files changed, 13 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index e8f85a5..3934178 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1832,6 +1832,17 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
     hdev->vdev = NULL;
 }
 
+void vhost_dev_reset_all(void)
+{
+    struct vhost_dev *dev;
+
+    QLIST_FOREACH(dev, &vhost_devices, entry) {
+        if (dev->vhost_ops->vhost_reset_device(dev) < 0) {
+            VHOST_OPS_DEBUG("vhost_reset_device failed");
+        }
+    }
+}
+
 int vhost_net_set_backend(struct vhost_dev *hdev,
                           struct vhost_vring_file *file)
 {
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 045d0fd..facdfc2 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -108,6 +108,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
 void vhost_dev_cleanup(struct vhost_dev *hdev);
 int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
+void vhost_dev_reset_all(void);
 int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
 
diff --git a/migration/cpr.c b/migration/cpr.c
index 578466c..6333988 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -142,6 +142,7 @@ void cprexec(strList *args, Error **errp)
         return;
     }
     walkenv(FD_PREFIX, preserve_fd, 0);
+    vhost_dev_reset_all();
     qemu_system_exec_request(args);
 }
 
-- 
1.8.3.1




* [PATCH V5 19/25] hostmem-memfd: cpr support
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (17 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 18/25] vhost: reset vhost devices upon cprsave Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-07 17:20 ` [PATCH V5 20/25] chardev: cpr framework Steve Sistare
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Preserve memory-backend-memfd memory objects during cpr.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
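The change below amounts to "look the fd up by backend name, and only create
a new memfd if none was inherited".  A minimal standalone sketch of that
pattern follows.  It is an illustration under stated assumptions, not the
QEMU code: sketch_memfd_get_or_create() is a hypothetical name, the
environment encoding is simplified to a bare decimal fd number, and
memfd_create() is invoked through syscall() only so that the sketch does not
depend on feature-test macros.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical: reuse the memfd recorded in the environment under name,
 * or create one of the given size and record it for a later exec.
 */
static int sketch_memfd_get_or_create(const char *name, size_t size)
{
    const char *val = getenv(name);
    char buf[16];
    int fd;

    assert(name && size > 0);
    if (val) {
        /* Inherited from the previous process: contents are preserved. */
        return atoi(val);
    }

    /* Flags 0, so no MFD_CLOEXEC: the fd stays open across exec. */
    fd = (int)syscall(SYS_memfd_create, name, 0U);
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return -1;
    }
    snprintf(buf, sizeof(buf), "%d", fd);
    if (setenv(name, buf, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A second call with the same name returns the same descriptor, so data
written through the first one is still there, which is the property the
patch relies on for guest RAM.
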
 backends/hostmem-memfd.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
index 3fc85c3..f6c193b 100644
--- a/backends/hostmem-memfd.c
+++ b/backends/hostmem-memfd.c
@@ -14,6 +14,7 @@
 #include "sysemu/hostmem.h"
 #include "qom/object_interfaces.h"
 #include "qemu/memfd.h"
+#include "qemu/env.h"
 #include "qemu/module.h"
 #include "qapi/error.h"
 #include "qom/object.h"
@@ -36,23 +37,25 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 {
     HostMemoryBackendMemfd *m = MEMORY_BACKEND_MEMFD(backend);
     uint32_t ram_flags;
-    char *name;
-    int fd;
+    char *name = host_memory_backend_get_name(backend);
+    int fd = getenv_fd(name);
 
     if (!backend->size) {
         error_setg(errp, "can't create backend with size 0");
         return;
     }
 
-    fd = qemu_memfd_create(TYPE_MEMORY_BACKEND_MEMFD, backend->size,
-                           m->hugetlb, m->hugetlbsize, m->seal ?
-                           F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL : 0,
-                           errp);
-    if (fd == -1) {
-        return;
+    if (fd < 0) {
+        fd = qemu_memfd_create(TYPE_MEMORY_BACKEND_MEMFD, backend->size,
+                               m->hugetlb, m->hugetlbsize, m->seal ?
+                               F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL : 0,
+                               errp);
+        if (fd == -1) {
+            return;
+        }
+        setenv_fd(name, fd);
     }
 
-    name = host_memory_backend_get_name(backend);
     ram_flags = backend->share ? RAM_SHARED : 0;
     ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
     memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
-- 
1.8.3.1




* [PATCH V5 20/25] chardev: cpr framework
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (18 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 19/25] hostmem-memfd: cpr support Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-08 16:03   ` Marc-André Lureau
  2021-07-07 17:20 ` [PATCH V5 21/25] chardev: cpr for simple devices Steve Sistare
                   ` (4 subsequent siblings)
  24 siblings, 1 reply; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add QEMU_CHAR_FEATURE_CPR for devices that support cpr.
Add the chardev close_on_cpr option for devices that can be closed on cpr
and reopened after exec.
cpr is allowed only if either QEMU_CHAR_FEATURE_CPR or close_on_cpr is set
for all chardevs in the configuration.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
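The rule stated above (every chardev must either advertise
QEMU_CHAR_FEATURE_CPR or have close_on_cpr set) can be pictured with the
following standalone sketch.  The struct and function names are hypothetical
simplifications for illustration only.  The real check in this patch walks
the chardev object tree with object_child_foreach() and reports through
Error **errp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a chardev for this sketch only. */
struct sketch_chardev {
    const char *label;
    bool feature_cpr;    /* backend set QEMU_CHAR_FEATURE_CPR */
    bool close_on_cpr;   /* user passed close-on-cpr=on */
};

/*
 * Return NULL if every chardev may take part in cpr, otherwise the first
 * chardev that blocks it.
 */
static const struct sketch_chardev *
sketch_first_cpr_blocker(const struct sketch_chardev *devs, size_t n)
{
    size_t i;

    assert(devs || n == 0);
    for (i = 0; i < n; i++) {
        if (!devs[i].feature_cpr && !devs[i].close_on_cpr) {
            return &devs[i];
        }
    }
    return NULL;
}
```

Returning the offending entry rather than a bare boolean mirrors the patch's
choice to name the chardev in the error message.
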
 chardev/char.c         | 41 ++++++++++++++++++++++++++++++++++++++---
 include/chardev/char.h |  5 +++++
 migration/cpr.c        |  3 +++
 qapi/char.json         |  5 ++++-
 qemu-options.hx        | 26 ++++++++++++++++++++++----
 5 files changed, 72 insertions(+), 8 deletions(-)

diff --git a/chardev/char.c b/chardev/char.c
index d959eec..f10fb94 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -36,6 +36,7 @@
 #include "qemu/help_option.h"
 #include "qemu/module.h"
 #include "qemu/option.h"
+#include "qemu/env.h"
 #include "qemu/id.h"
 #include "qemu/coroutine.h"
 #include "qemu/yank.h"
@@ -239,6 +240,9 @@ static void qemu_char_open(Chardev *chr, ChardevBackend *backend,
     ChardevClass *cc = CHARDEV_GET_CLASS(chr);
     /* Any ChardevCommon member would work */
     ChardevCommon *common = backend ? backend->u.null.data : NULL;
+    char fdname[40];
+
+    chr->close_on_cpr = (common && common->close_on_cpr);
 
     if (common && common->has_logfile) {
         int flags = O_WRONLY | O_CREAT;
@@ -248,7 +252,14 @@ static void qemu_char_open(Chardev *chr, ChardevBackend *backend,
         } else {
             flags |= O_TRUNC;
         }
-        chr->logfd = qemu_open_old(common->logfile, flags, 0666);
+        snprintf(fdname, sizeof(fdname), "%s_log", chr->label);
+        chr->logfd = getenv_fd(fdname);
+        if (chr->logfd < 0) {
+            chr->logfd = qemu_open_old(common->logfile, flags, 0666);
+            if (!chr->close_on_cpr) {
+                setenv_fd(fdname, chr->logfd);
+            }
+        }
         if (chr->logfd < 0) {
             error_setg_errno(errp, errno,
                              "Unable to open logfile %s",
@@ -300,11 +311,12 @@ static void char_finalize(Object *obj)
     if (chr->be) {
         chr->be->chr = NULL;
     }
-    g_free(chr->filename);
-    g_free(chr->label);
     if (chr->logfd != -1) {
         close(chr->logfd);
+        unsetenv_fdv("%s_log", chr->label);
     }
+    g_free(chr->filename);
+    g_free(chr->label);
     qemu_mutex_destroy(&chr->chr_write_lock);
 }
 
@@ -504,6 +516,8 @@ void qemu_chr_parse_common(QemuOpts *opts, ChardevCommon *backend)
 
     backend->has_logappend = true;
     backend->logappend = qemu_opt_get_bool(opts, "logappend", false);
+
+    backend->close_on_cpr = qemu_opt_get_bool(opts, "close-on-cpr", false);
 }
 
 static const ChardevClass *char_get_class(const char *driver, Error **errp)
@@ -945,6 +959,9 @@ QemuOptsList qemu_chardev_opts = {
         },{
             .name = "abstract",
             .type = QEMU_OPT_BOOL,
+        },{
+            .name = "close-on-cpr",
+            .type = QEMU_OPT_BOOL,
 #endif
         },
         { /* end of list */ }
@@ -1212,6 +1229,24 @@ GSource *qemu_chr_timeout_add_ms(Chardev *chr, guint ms,
     return source;
 }
 
+static int chr_cpr_capable(Object *obj, void *opaque)
+{
+    Chardev *chr = (Chardev *)obj;
+    Error **errp = opaque;
+
+    if (qemu_chr_has_feature(chr, QEMU_CHAR_FEATURE_CPR) || chr->close_on_cpr) {
+        return 0;
+    }
+    error_setg(errp, "error: chardev %s -> %s is not capable of cpr",
+               chr->label, chr->filename);
+    return 1;
+}
+
+bool qemu_chr_cpr_capable(Error **errp)
+{
+    return !object_child_foreach(get_chardevs_root(), chr_cpr_capable, errp);
+}
+
 void qemu_chr_cleanup(void)
 {
     object_unparent(get_chardevs_root());
diff --git a/include/chardev/char.h b/include/chardev/char.h
index 7c0444f..e488ad1 100644
--- a/include/chardev/char.h
+++ b/include/chardev/char.h
@@ -50,6 +50,8 @@ typedef enum {
     /* Whether the gcontext can be changed after calling
      * qemu_chr_be_update_read_handlers() */
     QEMU_CHAR_FEATURE_GCONTEXT,
+    /* Whether the device supports cpr */
+    QEMU_CHAR_FEATURE_CPR,
 
     QEMU_CHAR_FEATURE_LAST,
 } ChardevFeature;
@@ -67,6 +69,7 @@ struct Chardev {
     int be_open;
     /* used to coordinate the chardev-change special-case: */
     bool handover_yank_instance;
+    bool close_on_cpr;
     GSource *gsource;
     GMainContext *gcontext;
     DECLARE_BITMAP(features, QEMU_CHAR_FEATURE_LAST);
@@ -291,4 +294,6 @@ void resume_mux_open(void);
 /* console.c */
 void qemu_chr_parse_vc(QemuOpts *opts, ChardevBackend *backend, Error **errp);
 
+bool qemu_chr_cpr_capable(Error **errp);
+
 #endif
diff --git a/migration/cpr.c b/migration/cpr.c
index 6333988..feff97f 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -138,6 +138,9 @@ void cprexec(strList *args, Error **errp)
         error_setg(errp, "cprexec requires cprsave with restart mode");
         return;
     }
+    if (!qemu_chr_cpr_capable(errp)) {
+        return;
+    }
     if (vfio_cprsave(errp)) {
         return;
     }
diff --git a/qapi/char.json b/qapi/char.json
index adf2685..5efaf59 100644
--- a/qapi/char.json
+++ b/qapi/char.json
@@ -204,12 +204,15 @@
 # @logfile: The name of a logfile to save output
 # @logappend: true to append instead of truncate
 #             (default to false to truncate)
+# @close-on-cpr: if true, close device's fd on cprsave. defaults to false.
+#                since 6.1.
 #
 # Since: 2.6
 ##
 { 'struct': 'ChardevCommon',
   'data': { '*logfile': 'str',
-            '*logappend': 'bool' } }
+            '*logappend': 'bool',
+            '*close-on-cpr': 'bool' } }
 
 ##
 # @ChardevFile:
diff --git a/qemu-options.hx b/qemu-options.hx
index fa53734..d5ff45f 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3134,43 +3134,57 @@ DEFHEADING(Character device options:)
 
 DEF("chardev", HAS_ARG, QEMU_OPTION_chardev,
     "-chardev help\n"
-    "-chardev null,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "-chardev null,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off][,close-on-cpr=on|off]\n"
     "-chardev socket,id=id[,host=host],port=port[,to=to][,ipv4=on|off][,ipv6=on|off][,nodelay=on|off][,reconnect=seconds]\n"
     "         [,server=on|off][,wait=on|off][,telnet=on|off][,websocket=on|off][,reconnect=seconds][,mux=on|off]\n"
-    "         [,logfile=PATH][,logappend=on|off][,tls-creds=ID][,tls-authz=ID] (tcp)\n"
+    "         [,logfile=PATH][,logappend=on|off][,tls-creds=ID][,tls-authz=ID][,close-on-cpr=on|off] (tcp)\n"
     "-chardev socket,id=id,path=path[,server=on|off][,wait=on|off][,telnet=on|off][,websocket=on|off][,reconnect=seconds]\n"
-    "         [,mux=on|off][,logfile=PATH][,logappend=on|off][,abstract=on|off][,tight=on|off] (unix)\n"
+    "         [,mux=on|off][,logfile=PATH][,logappend=on|off][,abstract=on|off][,tight=on|off][,close-on-cpr=on|off] (unix)\n"
     "-chardev udp,id=id[,host=host],port=port[,localaddr=localaddr]\n"
     "         [,localport=localport][,ipv4=on|off][,ipv6=on|off][,mux=on|off]\n"
-    "         [,logfile=PATH][,logappend=on|off]\n"
+    "         [,logfile=PATH][,logappend=on|off][,close-on-cpr=on|off]\n"
     "-chardev msmouse,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev vc,id=id[[,width=width][,height=height]][[,cols=cols][,rows=rows]]\n"
     "         [,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev ringbuf,id=id[,size=size][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev file,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev pipe,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
 #ifdef _WIN32
     "-chardev console,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
     "-chardev serial,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
 #else
     "-chardev pty,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev stdio,id=id[,mux=on|off][,signal=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
 #endif
 #ifdef CONFIG_BRLAPI
     "-chardev braille,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
 #endif
 #if defined(__linux__) || defined(__sun__) || defined(__FreeBSD__) \
         || defined(__NetBSD__) || defined(__OpenBSD__) || defined(__DragonFly__)
     "-chardev serial,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev tty,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
 #endif
 #if defined(__linux__) || defined(__FreeBSD__) || defined(__DragonFly__)
     "-chardev parallel,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev parport,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
 #endif
 #if defined(CONFIG_SPICE)
     "-chardev spicevmc,id=id,name=name[,debug=debug][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev spiceport,id=id,name=name[,debug=debug][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
 #endif
     , QEMU_ARCH_ALL
 )
@@ -3245,6 +3259,10 @@ The general form of a character device option is:
     ``logappend`` option controls whether the log file will be truncated
     or appended to when opened.
 
+    Every backend supports the ``close-on-cpr`` option.  If on, the
+    device's descriptor is closed during cprsave, and reopened after exec.
+    This is useful for devices that do not support cpr.
+
 The available backends are:
 
 ``-chardev null,id=id``
-- 
1.8.3.1




* [PATCH V5 21/25] chardev: cpr for simple devices
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (19 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 20/25] chardev: cpr framework Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-07 17:20 ` [PATCH V5 22/25] chardev: cpr for pty Steve Sistare
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Set QEMU_CHAR_FEATURE_CPR for devices that trivially support cpr.
char-stdio is slightly less trivial.  Allow the gdb server by
closing it on exec.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 chardev/char-mux.c     | 1 +
 chardev/char-null.c    | 1 +
 chardev/char-serial.c  | 1 +
 chardev/char-stdio.c   | 8 ++++++++
 gdbstub.c              | 1 +
 include/chardev/char.h | 1 +
 migration/cpr.c        | 1 +
 7 files changed, 14 insertions(+)

diff --git a/chardev/char-mux.c b/chardev/char-mux.c
index 5baf419..bf7bad9 100644
--- a/chardev/char-mux.c
+++ b/chardev/char-mux.c
@@ -336,6 +336,7 @@ static void qemu_chr_open_mux(Chardev *chr,
      */
     *be_opened = muxes_opened;
     qemu_chr_fe_init(&d->chr, drv, errp);
+    qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_CPR);
 }
 
 static void qemu_chr_parse_mux(QemuOpts *opts, ChardevBackend *backend,
diff --git a/chardev/char-null.c b/chardev/char-null.c
index 1c6a290..02acaff 100644
--- a/chardev/char-null.c
+++ b/chardev/char-null.c
@@ -32,6 +32,7 @@ static void null_chr_open(Chardev *chr,
                           Error **errp)
 {
     *be_opened = false;
+    qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_CPR);
 }
 
 static void char_null_class_init(ObjectClass *oc, void *data)
diff --git a/chardev/char-serial.c b/chardev/char-serial.c
index 7c3d84a..b585085 100644
--- a/chardev/char-serial.c
+++ b/chardev/char-serial.c
@@ -274,6 +274,7 @@ static void qmp_chardev_open_serial(Chardev *chr,
     qemu_set_nonblock(fd);
     tty_serial_init(fd, 115200, 'N', 8, 1);
 
+    qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_CPR);
     qemu_chr_open_fd(chr, fd, fd);
 }
 #endif /* __linux__ || __sun__ */
diff --git a/chardev/char-stdio.c b/chardev/char-stdio.c
index 403da30..9410c16 100644
--- a/chardev/char-stdio.c
+++ b/chardev/char-stdio.c
@@ -114,9 +114,17 @@ static void qemu_chr_open_stdio(Chardev *chr,
 
     stdio_allow_signal = !opts->has_signal || opts->signal;
     qemu_chr_set_echo_stdio(chr, false);
+    qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_CPR);
 }
 #endif
 
+void qemu_term_exit(void)
+{
+#ifndef _WIN32
+    term_exit();
+#endif
+}
+
 static void qemu_chr_parse_stdio(QemuOpts *opts, ChardevBackend *backend,
                                  Error **errp)
 {
diff --git a/gdbstub.c b/gdbstub.c
index 52bde5b..b014b52 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -3534,6 +3534,7 @@ int gdbserver_start(const char *device)
         mon_chr = gdbserver_state.mon_chr;
         reset_gdbserver_state();
     }
+    mon_chr->close_on_cpr = true;
 
     create_processes(&gdbserver_state);
 
diff --git a/include/chardev/char.h b/include/chardev/char.h
index e488ad1..96e5570 100644
--- a/include/chardev/char.h
+++ b/include/chardev/char.h
@@ -295,5 +295,6 @@ void resume_mux_open(void);
 void qemu_chr_parse_vc(QemuOpts *opts, ChardevBackend *backend, Error **errp);
 
 bool qemu_chr_cpr_capable(Error **errp);
+void qemu_term_exit(void);
 
 #endif
diff --git a/migration/cpr.c b/migration/cpr.c
index feff97f..4600d8c 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -146,6 +146,7 @@ void cprexec(strList *args, Error **errp)
     }
     walkenv(FD_PREFIX, preserve_fd, 0);
     vhost_dev_reset_all();
+    qemu_term_exit();
     qemu_system_exec_request(args);
 }
 
-- 
1.8.3.1




* [PATCH V5 22/25] chardev: cpr for pty
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (20 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 21/25] chardev: cpr for simple devices Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-07 17:20 ` [PATCH V5 23/25] chardev: cpr for sockets Steve Sistare
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Save and restore pty descriptors across cprsave and cprload.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 chardev/char-pty.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/chardev/char-pty.c b/chardev/char-pty.c
index a2d1e7c..c91151d 100644
--- a/chardev/char-pty.c
+++ b/chardev/char-pty.c
@@ -30,6 +30,7 @@
 #include "qemu/sockets.h"
 #include "qemu/error-report.h"
 #include "qemu/module.h"
+#include "qemu/env.h"
 #include "qemu/qemu-print.h"
 
 #include "chardev/char-io.h"
@@ -191,6 +192,7 @@ static void char_pty_finalize(Object *obj)
     Chardev *chr = CHARDEV(obj);
     PtyChardev *s = PTY_CHARDEV(obj);
 
+    unsetenv_fd(chr->label);
     pty_chr_state(chr, 0);
     object_unref(OBJECT(s->ioc));
     pty_chr_timer_cancel(s);
@@ -207,19 +209,28 @@ static void char_pty_open(Chardev *chr,
     char pty_name[PATH_MAX];
     char *name;
 
+    master_fd = getenv_fd(chr->label);
+    if (master_fd >= 0) {
+        chr->filename = g_strdup_printf("pty:unknown");
+        goto have_fd;
+    }
+
     master_fd = qemu_openpty_raw(&slave_fd, pty_name);
     if (master_fd < 0) {
         error_setg_errno(errp, errno, "Failed to create PTY");
         return;
     }
-
+    if (!chr->close_on_cpr) {
+        setenv_fd(chr->label, master_fd);
+    }
     close(slave_fd);
     qemu_set_nonblock(master_fd);
-
     chr->filename = g_strdup_printf("pty:%s", pty_name);
     qemu_printf("char device redirected to %s (label %s)\n",
                 pty_name, chr->label);
 
+have_fd:
+    qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_CPR);
     s = PTY_CHARDEV(chr);
     s->ioc = QIO_CHANNEL(qio_channel_file_new_fd(master_fd));
     name = g_strdup_printf("chardev-pty-%s", chr->label);
-- 
1.8.3.1




* [PATCH V5 23/25] chardev: cpr for sockets
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (21 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 22/25] chardev: cpr for pty Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-29  4:04   ` Zheng Chuan
  2021-07-07 17:20 ` [PATCH V5 24/25] cpr: only-cpr-capable option Steve Sistare
  2021-07-07 17:20 ` [PATCH V5 25/25] simplify savevm Steve Sistare
  24 siblings, 1 reply; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Save accepted socket fds in the environment before cprsave, and look for
fds in the environment after cprload.  Reject cprexec if a socket enables
the TLS or websocket option.  Allow a monitor socket by closing it on exec.

Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
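A minimal standalone sketch of the idea described above: record an open
descriptor in the environment so that the process image started by exec can
find it again.  This is not the series' implementation (getenv_fd, setenv_fd
and unsetenv_fd come from qemu/env.h, which is not shown here).  The
demo_* names and the DEMO_FD_ prefix are hypothetical.  The sketch also clears
FD_CLOEXEC up front for brevity, whereas the series handles preservation at
exec time via walkenv(FD_PREFIX, preserve_fd, 0) in cprexec.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define DEMO_FD_PREFIX "DEMO_FD_"   /* hypothetical variable-name prefix */

/* Build the environment key for a device label, e.g. "DEMO_FD_chr0". */
static void demo_fd_key(char *key, size_t len, const char *name)
{
    assert(name != NULL);
    snprintf(key, len, DEMO_FD_PREFIX "%s", name);
}

/* Record fd under name and make sure it stays open across exec. */
static int demo_setenv_fd(const char *name, int fd)
{
    char key[128], val[16];
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0 || fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0) {
        return -1;
    }
    demo_fd_key(key, sizeof(key), name);
    snprintf(val, sizeof(val), "%d", fd);
    return setenv(key, val, 1);
}

/* Return the descriptor recorded under name, or -1 if there is none. */
static int demo_getenv_fd(const char *name)
{
    char key[128];
    const char *val;
    char *end;
    long fd;

    demo_fd_key(key, sizeof(key), name);
    val = getenv(key);
    if (!val) {
        return -1;
    }
    fd = strtol(val, &end, 10);
    if (end == val || *end != '\0' || fd < 0 || fd > INT_MAX) {
        return -1;
    }
    return (int)fd;
}

/* Forget the descriptor, e.g. when the connection is torn down. */
static void demo_unsetenv_fd(const char *name)
{
    char key[128];

    demo_fd_key(key, sizeof(key), name);
    unsetenv(key);
}
```

The lookup returning -1 for "nothing saved" mirrors how the patch below
tests the result of getenv_fd(chr->label) before falling back to the normal
open path.
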
 chardev/char-socket.c | 31 +++++++++++++++++++++++++++++++
 monitor/hmp.c         |  3 +++
 monitor/qmp.c         |  3 +++
 3 files changed, 37 insertions(+)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index d0fb545..dc9da8c 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -27,7 +27,9 @@
 #include "io/channel-socket.h"
 #include "io/channel-tls.h"
 #include "io/channel-websock.h"
+#include "qemu/env.h"
 #include "io/net-listener.h"
+#include "qemu/env.h"
 #include "qemu/error-report.h"
 #include "qemu/module.h"
 #include "qemu/option.h"
@@ -414,6 +416,7 @@ static void tcp_chr_free_connection(Chardev *chr)
     SocketChardev *s = SOCKET_CHARDEV(chr);
     int i;
 
+    unsetenv_fd(chr->label);
     if (s->read_msgfds_num) {
         for (i = 0; i < s->read_msgfds_num; i++) {
             close(s->read_msgfds[i]);
@@ -976,6 +979,10 @@ static void tcp_chr_accept(QIONetListener *listener,
                                QIO_CHANNEL(cioc));
     }
     tcp_chr_new_client(chr, cioc);
+
+    if (s->sioc && !chr->close_on_cpr) {
+        setenv_fd(chr->label, s->sioc->fd);
+    }
 }
 
 
@@ -1231,6 +1238,24 @@ static gboolean socket_reconnect_timeout(gpointer opaque)
     return false;
 }
 
+static void load_char_socket_fd(Chardev *chr, Error **errp)
+{
+    SocketChardev *sockchar = SOCKET_CHARDEV(chr);
+    QIOChannelSocket *sioc;
+    int fd = getenv_fd(chr->label);
+
+    if (fd != -1) {
+        sockchar = SOCKET_CHARDEV(chr);
+        sioc = qio_channel_socket_new_fd(fd, errp);
+        if (sioc) {
+            tcp_chr_accept(sockchar->listener, sioc, chr);
+            object_unref(OBJECT(sioc));
+        } else {
+            error_setg(errp, "error: could not restore socket for %s",
+                       chr->label);
+        }
+    }
+}
 
 static int qmp_chardev_open_socket_server(Chardev *chr,
                                           bool is_telnet,
@@ -1435,6 +1460,10 @@ static void qmp_chardev_open_socket(Chardev *chr,
     }
     s->registered_yank = true;
 
+    if (!s->tls_creds && !s->is_websock) {
+        qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_CPR);
+    }
+
     /* be isn't opened until we get a connection */
     *be_opened = false;
 
@@ -1450,6 +1479,8 @@ static void qmp_chardev_open_socket(Chardev *chr,
             return;
         }
     }
+
+    load_char_socket_fd(chr, errp);
 }
 
 static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend,
diff --git a/monitor/hmp.c b/monitor/hmp.c
index 6c0b33a..63700b3 100644
--- a/monitor/hmp.c
+++ b/monitor/hmp.c
@@ -1451,4 +1451,7 @@ void monitor_init_hmp(Chardev *chr, bool use_readline, Error **errp)
     qemu_chr_fe_set_handlers(&mon->common.chr, monitor_can_read, monitor_read,
                              monitor_event, NULL, &mon->common, NULL, true);
     monitor_list_append(&mon->common);
+
+    /* Monitor cannot yet be preserved across cpr */
+    chr->close_on_cpr = true;
 }
diff --git a/monitor/qmp.c b/monitor/qmp.c
index 092c527..21a90bf 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -535,4 +535,7 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
                                  NULL, &mon->common, NULL, true);
         monitor_list_append(&mon->common);
     }
+
+    /* Monitor cannot yet be preserved across cpr */
+    chr->close_on_cpr = true;
 }
-- 
1.8.3.1




* [PATCH V5 24/25] cpr: only-cpr-capable option
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (22 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 23/25] chardev: cpr for sockets Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  2021-07-07 17:20 ` [PATCH V5 25/25] simplify savevm Steve Sistare
  24 siblings, 0 replies; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add the only-cpr-capable option, which causes qemu to exit with an error
if any devices that are not capable of cpr are added.  This guarantees that
a cprexec operation will not fail with an unsupported device error.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
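A rough standalone sketch of the policy described above, under stated
assumptions: each device carries a "cpr capable" flag (analogous to
QEMU_CHAR_FEATURE_CPR) and a close-on-cpr flag, and a device that is closed
on cpr is treated as acceptable.  The DemoDev type and the demo_* functions
are hypothetical and only illustrate the "fail at startup rather than at
cprexec time" shape.  The real checks are spread over the files in the
diffstat below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical device record, reduced to the two properties that matter. */
typedef struct DemoDev {
    const char *label;
    bool cpr_capable;    /* device state can be preserved across cpr */
    bool close_on_cpr;   /* device is closed and reopened instead */
} DemoDev;

/* Return true if every device is acceptable; else describe the first offender. */
static bool demo_all_cpr_capable(const DemoDev *devs, size_t n,
                                 char *err, size_t errlen)
{
    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!devs[i].cpr_capable && !devs[i].close_on_cpr) {
            snprintf(err, errlen, "chardev %s is not cpr capable",
                     devs[i].label);
            return false;
        }
    }
    return true;
}

/* With the strict option set, reject the configuration before the VM starts. */
static int demo_startup(bool only_cpr_capable, const DemoDev *devs, size_t n,
                        char *err, size_t errlen)
{
    if (only_cpr_capable && !demo_all_cpr_capable(devs, n, err, errlen)) {
        return -1;
    }
    return 0;
}
```

Without the strict option the same configuration starts normally, and an
unsupported device would only be reported later, when cpr is attempted.
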
 MAINTAINERS             |  1 +
 chardev/char-socket.c   |  4 ++++
 hw/vfio/common.c        |  5 +++++
 hw/vfio/pci.c           |  5 +++++
 include/sysemu/sysemu.h |  1 +
 migration/migration.c   |  5 +++++
 qemu-options.hx         |  8 ++++++++
 softmmu/globals.c       |  1 +
 softmmu/physmem.c       |  5 +++++
 softmmu/vl.c            | 14 +++++++++++++-
 stubs/cpr.c             |  3 +++
 stubs/meson.build       |  1 +
 12 files changed, 52 insertions(+), 1 deletion(-)
 create mode 100644 stubs/cpr.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 58479db..06fabd6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2868,6 +2868,7 @@ F: migration/cpr.c
 F: qapi/cpr.json
 F: include/qemu/env.h
 F: util/env.c
+F: stubs/cpr.c
 
 Record/replay
 M: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index dc9da8c..c11ec80 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -40,6 +40,7 @@
 
 #include "chardev/char-io.h"
 #include "qom/object.h"
+#include "sysemu/sysemu.h"
 
 /***********************************************************/
 /* TCP Net console */
@@ -1462,6 +1463,9 @@ static void qmp_chardev_open_socket(Chardev *chr,
 
     if (!s->tls_creds && !s->is_websock) {
         qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_CPR);
+    } else if (only_cpr_capable) {
+        error_setg(errp, "error: socket %s is not cpr capable due to %s option",
+                   chr->label, (s->tls_creds ? "TLS" : "websocket"));
     }
 
     /* be isn't opened until we get a connection */
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 40c882f..09d5e6e 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -37,6 +37,7 @@
 #include "qemu/range.h"
 #include "sysemu/kvm.h"
 #include "sysemu/reset.h"
+#include "sysemu/sysemu.h"
 #include "trace.h"
 #include "qapi/error.h"
 #include "migration/migration.h"
@@ -1601,6 +1602,10 @@ static int vfio_get_iommu_type(VFIOContainer *container,
 
     for (i = 0; i < ARRAY_SIZE(iommu_types); i++) {
         if (ioctl(container->fd, VFIO_CHECK_EXTENSION, iommu_types[i])) {
+            if (only_cpr_capable && !vfio_cpr_capable(container, errp)) {
+                error_prepend(errp, "only-cpr-capable is specified: ");
+                return -EINVAL;
+            }
             return iommu_types[i];
         }
     }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 07bd360..f179086 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -265,6 +265,11 @@ static int vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
 
     if (!pin) {
         return 0;
+    } else if (only_cpr_capable) {
+        error_setg(errp,
+                   "%s: vfio-pci INTX is not compatible with -only-cpr-capable",
+                   vdev->vbasedev.name);
+        return -1;
     }
 
     vfio_disable_interrupts(vdev);
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 8fae667..6241c20 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -9,6 +9,7 @@
 /* vl.c */
 
 extern int only_migratable;
+extern bool only_cpr_capable;
 extern const char *qemu_name;
 extern QemuUUID qemu_uuid;
 extern bool qemu_uuid_set;
diff --git a/migration/migration.c b/migration/migration.c
index 5ff7ba9..63fcd2e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1247,6 +1247,11 @@ static bool migrate_caps_check(bool *cap_list,
         }
     }
 
+    if (cap_list[MIGRATION_CAPABILITY_X_COLO] && only_cpr_capable) {
+        error_setg(errp, "x-colo is not compatible with -only-cpr-capable");
+        return false;
+    }
+
     return true;
 }
 
diff --git a/qemu-options.hx b/qemu-options.hx
index d5ff45f..153dfe8 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4323,6 +4323,14 @@ SRST
     an unmigratable state.
 ERST
 
+DEF("only-cpr-capable", 0, QEMU_OPTION_only_cpr_capable, \
+    "-only-cpr-capable    allow only cpr capable devices\n", QEMU_ARCH_ALL)
+SRST
+``-only-cpr-capable``
+    Only allow cpr capable devices, which guarantees that cprsave will not
+    fail with an unsupported device error.
+ERST
+
 DEF("nodefaults", 0, QEMU_OPTION_nodefaults, \
     "-nodefaults     don't create default devices\n", QEMU_ARCH_ALL)
 SRST
diff --git a/softmmu/globals.c b/softmmu/globals.c
index 7d0fc81..a18fd8d 100644
--- a/softmmu/globals.c
+++ b/softmmu/globals.c
@@ -59,6 +59,7 @@ int boot_menu;
 bool boot_strict;
 uint8_t *boot_splash_filedata;
 int only_migratable; /* turn it off unless user states otherwise */
+bool only_cpr_capable;
 int icount_align_option;
 
 /* The bytes in qemu_uuid are in the order specified by RFC4122, _not_ in the
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 8a65ef7..54f9072 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -46,6 +46,7 @@
 #include "sysemu/dma.h"
 #include "sysemu/hostmem.h"
 #include "sysemu/hw_accel.h"
+#include "sysemu/sysemu.h"
 #include "sysemu/xen-mapcache.h"
 #include "trace/trace-root.h"
 
@@ -2002,6 +2003,10 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
                 addr = file_ram_alloc(new_block, maxlen, mfd,
                                       false, false, 0, errp);
                 trace_anon_memfd_alloc(name, maxlen, addr, mfd);
+            } else if (only_cpr_capable) {
+                error_setg(errp,
+                    "only-cpr-capable requires -machine memfd-alloc=on");
+                return;
             } else {
                 addr = qemu_anon_ram_alloc(maxlen, &mr->align,
                                            shared, noreserve);
diff --git a/softmmu/vl.c b/softmmu/vl.c
index a50c857..9012385 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2665,6 +2665,10 @@ void qmp_x_exit_preconfig(Error **errp)
     qemu_create_cli_devices();
     qemu_machine_creation_done();
 
+    if (only_cpr_capable && !qemu_chr_cpr_capable(errp)) {
+        ;    /* not reached due to error_fatal */
+    }
+
     if (loadvm) {
         Error *local_err = NULL;
         if (!load_snapshot(loadvm, NULL, false, NULL, &local_err)) {
@@ -2674,7 +2678,12 @@ void qmp_x_exit_preconfig(Error **errp)
         }
     }
     if (replay_mode != REPLAY_MODE_NONE) {
-        replay_vmstate_init();
+        if (only_cpr_capable) {
+            error_setg(errp, "replay is not compatible with -only-cpr-capable");
+            /* not reached due to error_fatal */
+        } else {
+            replay_vmstate_init();
+        }
     }
 
     if (incoming) {
@@ -3428,6 +3437,9 @@ void qemu_init(int argc, char **argv, char **envp)
             case QEMU_OPTION_only_migratable:
                 only_migratable = 1;
                 break;
+            case QEMU_OPTION_only_cpr_capable:
+                only_cpr_capable = true;
+                break;
             case QEMU_OPTION_nodefaults:
                 has_defaults = 0;
                 break;
diff --git a/stubs/cpr.c b/stubs/cpr.c
new file mode 100644
index 0000000..aaa189e
--- /dev/null
+++ b/stubs/cpr.c
@@ -0,0 +1,3 @@
+#include "qemu/osdep.h"
+
+bool only_cpr_capable;
diff --git a/stubs/meson.build b/stubs/meson.build
index 2e79ff9..04fada0 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -5,6 +5,7 @@ stub_ss.add(files('blk-exp-close-all.c'))
 stub_ss.add(files('blockdev-close-all-bdrv-states.c'))
 stub_ss.add(files('change-state-handler.c'))
 stub_ss.add(files('cmos.c'))
+stub_ss.add(files('cpr.c'))
 stub_ss.add(files('cpu-get-clock.c'))
 stub_ss.add(files('cpus-get-virtual-clock.c'))
 stub_ss.add(files('qemu-timer-notify-cb.c'))
-- 
1.8.3.1




* [PATCH V5 25/25] simplify savevm
  2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
                   ` (23 preceding siblings ...)
  2021-07-07 17:20 ` [PATCH V5 24/25] cpr: only-cpr-capable option Steve Sistare
@ 2021-07-07 17:20 ` Steve Sistare
  24 siblings, 0 replies; 74+ messages in thread
From: Steve Sistare @ 2021-07-07 17:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Eric Blake, Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Use qf_file_open to simplify a few functions in savevm.c.
No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 migration/savevm.c | 21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 72848b9..ba5250d 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2901,8 +2901,9 @@ bool save_snapshot(const char *name, bool overwrite, const char *vmstate,
 void qmp_xen_save_devices_state(const char *filename, bool has_live, bool live,
                                 Error **errp)
 {
+    const char *ioc_name = "migration-xen-save-state";
+    int flags = O_WRONLY | O_CREAT | O_TRUNC;
     QEMUFile *f;
-    QIOChannelFile *ioc;
     int saved_vm_running;
     int ret;
 
@@ -2916,14 +2917,10 @@ void qmp_xen_save_devices_state(const char *filename, bool has_live, bool live,
     vm_stop(RUN_STATE_SAVE_VM);
     global_state_store_running();
 
-    ioc = qio_channel_file_new_path(filename, O_WRONLY | O_CREAT | O_TRUNC,
-                                    0660, errp);
-    if (!ioc) {
+    f = qf_file_open(filename, flags, 0660, ioc_name, errp);
+    if (!f) {
         goto the_end;
     }
-    qio_channel_set_name(QIO_CHANNEL(ioc), "migration-xen-save-state");
-    f = qemu_fopen_channel_output(QIO_CHANNEL(ioc));
-    object_unref(OBJECT(ioc));
     ret = qemu_save_device_state(f);
     if (ret < 0 || qemu_fclose(f) < 0) {
         error_setg(errp, QERR_IO_ERROR);
@@ -2951,8 +2948,8 @@ void qmp_xen_save_devices_state(const char *filename, bool has_live, bool live,
 
 void qmp_xen_load_devices_state(const char *filename, Error **errp)
 {
+    const char *ioc_name = "migration-xen-load-state";
     QEMUFile *f;
-    QIOChannelFile *ioc;
     int ret;
 
     /* Guest must be paused before loading the device state; the RAM state
@@ -2964,14 +2961,10 @@ void qmp_xen_load_devices_state(const char *filename, Error **errp)
     }
     vm_stop(RUN_STATE_RESTORE_VM);
 
-    ioc = qio_channel_file_new_path(filename, O_RDONLY | O_BINARY, 0, errp);
-    if (!ioc) {
+    f = qf_file_open(filename, O_RDONLY | O_BINARY, 0, ioc_name, errp);
+    if (!f) {
         return;
     }
-    qio_channel_set_name(QIO_CHANNEL(ioc), "migration-xen-load-state");
-    f = qemu_fopen_channel_input(QIO_CHANNEL(ioc));
-    object_unref(OBJECT(ioc));
-
     ret = qemu_loadvm_state(f);
     qemu_fclose(f);
     if (ret < 0) {
-- 
1.8.3.1




* Re: [PATCH V5 01/25] qemu_ram_volatile
  2021-07-07 17:20 ` [PATCH V5 01/25] qemu_ram_volatile Steve Sistare
@ 2021-07-08 12:01   ` Marc-André Lureau
  2021-07-12 17:06     ` Steven Sistare
  0 siblings, 1 reply; 74+ messages in thread
From: Marc-André Lureau @ 2021-07-08 12:01 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster


Hi

On Wed, Jul 7, 2021 at 9:35 PM Steve Sistare <steven.sistare@oracle.com>
wrote:

> Add a function that returns true if any ram_list block represents
> volatile memory.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  include/exec/memory.h |  8 ++++++++
>  softmmu/memory.c      | 30 ++++++++++++++++++++++++++++++
>  2 files changed, 38 insertions(+)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index b116f7c..7ad63f8 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -2649,6 +2649,14 @@ bool ram_block_discard_is_disabled(void);
>   */
>  bool ram_block_discard_is_required(void);
>
> +/**
> + * qemu_ram_volatile: return true if any memory regions are writable and not
> + * backed by shared memory.
> + *
> + * @errp: returned error message identifying the bad region.
> + */
> +bool qemu_ram_volatile(Error **errp);
>

Usually, bool-valued functions that take an Error **errp argument return true
on success. If this one deviates from that recommendation, it should at least
be documented.

Also, we have a preference for using _is_ in the function name for such
tests.
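
For illustration only, a standalone sketch of that convention.  The Error
type here is a simplified stand-in for QEMU's, and DemoBlock,
demo_error_set and demo_ram_is_persistent are hypothetical names, not a
proposed replacement for the function in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for QEMU's Error object; not the real type. */
typedef struct Error {
    char msg[128];
} Error;

/* Hypothetical description of a RAM block, reduced to what the check needs. */
typedef struct DemoBlock {
    const char *name;
    bool shared;
} DemoBlock;

/* Fill *errp once; a NULL errp means the caller does not want details. */
static void demo_error_set(Error **errp, const char *region)
{
    if (!errp) {
        return;
    }
    assert(*errp == NULL);              /* never overwrite an earlier error */
    *errp = calloc(1, sizeof(Error));
    if (*errp) {
        snprintf((*errp)->msg, sizeof((*errp)->msg),
                 "Memory region %s is volatile", region);
    }
}

/*
 * Predicate with "_is_" in its name.  Returns true on success (every block
 * is shared) and false with *errp set on failure.
 */
static bool demo_ram_is_persistent(const DemoBlock *blocks, size_t n,
                                   Error **errp)
{
    for (size_t i = 0; i < n; i++) {
        if (!blocks[i].shared) {
            demo_error_set(errp, blocks[i].name);
            return false;
        }
    }
    return true;
}
```

A caller can then write "if (!demo_ram_is_persistent(blocks, n, errp))
return;" and the false branch always coincides with errp being set.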

> +
>  #endif
>
>  #endif
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index f016151..e9536bc 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -2714,6 +2714,36 @@ void memory_global_dirty_log_stop(void)
>      memory_global_dirty_log_do_stop();
>  }
>
> +/*
> + * Return true if any memory regions are writable and not backed by shared
> + * memory.
> + */
>

Let's not duplicate API comments.

> +bool qemu_ram_volatile(Error **errp)
> +{
> +    RAMBlock *block;
> +    MemoryRegion *mr;
> +    bool ret = false;
> +
> +    rcu_read_lock();
>

RCU_READ_LOCK_GUARD()
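
As a rough standalone sketch of why a scope guard helps here: the lock is
released on every exit path, so the "ret = true; break;" pattern and the
explicit unlock become unnecessary.  This does not use QEMU's macro.  It
assumes GCC or Clang (the cleanup attribute), and the demo_* names and
DEMO_LOCK_GUARD are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock standing in for the RCU read lock. */
static int demo_lock_depth;

static int demo_guard_acquire(void)
{
    demo_lock_depth++;
    return 0;
}

/* Runs automatically when the guard variable goes out of scope. */
static void demo_guard_release(int *unused)
{
    (void)unused;
    demo_lock_depth--;
}

/* One guard per scope: take the lock now, drop it when the scope ends. */
#define DEMO_LOCK_GUARD() \
    int demo_guard_ __attribute__((cleanup(demo_guard_release), unused)) = \
        demo_guard_acquire()

/* Early returns are safe because the guard unlocks on every exit path. */
static bool demo_any_negative(const int *v, int n)
{
    DEMO_LOCK_GUARD();

    assert(demo_lock_depth == 1);       /* lock is held inside the scope */
    for (int i = 0; i < n; i++) {
        if (v[i] < 0) {
            return true;
        }
    }
    return false;
}
```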


> +    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
>

RAMBLOCK_FOREACH() should do.

Or rather use the qemu_ram_foreach_block() helper.


> +        mr = block->mr;
> +        if (mr &&
> +            memory_region_is_ram(mr) &&
> +            !memory_region_is_ram_device(mr) &&
> +            !memory_region_is_rom(mr) &&
> +            (block->fd == -1 || !qemu_ram_is_shared(block))) {
> +
> +            error_setg(errp, "Memory region %s is volatile",
> +                       memory_region_name(mr));
> +            ret = true;
> +            break;
> +        }
> +    }
> +
> +    rcu_read_unlock();
> +    return ret;
> +}
> +
>  static void listener_add_address_space(MemoryListener *listener,
>                                         AddressSpace *as)
>  {
> --
> 1.8.3.1
>
>
>

-- 
Marc-André Lureau



* Re: [PATCH V5 02/25] cpr: reboot mode
  2021-07-07 17:20 ` [PATCH V5 02/25] cpr: reboot mode Steve Sistare
@ 2021-07-08 12:25   ` Marc-André Lureau
  2021-07-12 17:07     ` Steven Sistare
  2021-08-04 15:48   ` Eric Blake
  1 sibling, 1 reply; 74+ messages in thread
From: Marc-André Lureau @ 2021-07-08 12:25 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster


Hi

On Wed, Jul 7, 2021 at 9:45 PM Steve Sistare <steven.sistare@oracle.com>
wrote:

> Provide the cprsave and cprload functions for live update.  These save and
> restore VM state, with minimal guest pause time, so that qemu may be updated
> to a new version in between.
>
> cprsave stops the VM and saves vmstate to an ordinary file.  It supports any
> type of guest image and block device, but the caller must not modify guest
> block devices between cprsave and cprload.
>
> cprsave supports several modes, the first of which is reboot.  In this mode,
> the caller invokes cprsave and then terminates qemu.  The caller may then
> update the host kernel and system software and reboot.  The caller resumes
> the guest by running qemu with the same arguments as the original process
> and invoking cprload.  To use this mode, guest ram must be mapped to a
> persistent shared memory file such as /dev/dax0.0 or /dev/shm PKRAM.
>
> The reboot mode supports vfio devices if the caller first suspends the
> guest, such as by issuing guest-suspend-ram to the qemu guest agent.  The
> guest drivers' suspend methods flush outstanding requests and re-initialize
> the devices, and thus there is no device state to save and restore.
>
> cprload loads state from the file.  If the VM was running at cprsave time,
> then VM execution resumes.  If the VM was suspended at cprsave time, then
> the caller must issue a system_wakeup command to resume.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  MAINTAINERS               |   7 +++
>  include/migration/cpr.h   |  17 ++++++
>  include/sysemu/runstate.h |   1 +
>  migration/cpr.c           | 149 ++++++++++++++++++++++++++++++++++++++++++++++
>  migration/meson.build     |   1 +
>  migration/savevm.h        |   2 +
>  softmmu/runstate.c        |  21 ++++++-
>  7 files changed, 197 insertions(+), 1 deletion(-)
>  create mode 100644 include/migration/cpr.h
>  create mode 100644 migration/cpr.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 684142e..c3573aa 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2858,6 +2858,13 @@ F: net/colo*
>  F: net/filter-rewriter.c
>  F: net/filter-mirror.c
>
> +CPR
> +M: Steve Sistare <steven.sistare@oracle.com>
> +M: Mark Kanda <mark.kanda@oracle.com>
> +S: Maintained
> +F: include/migration/cpr.h
> +F: migration/cpr.c
> +
>  Record/replay
>  M: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
>  R: Paolo Bonzini <pbonzini@redhat.com>
> diff --git a/include/migration/cpr.h b/include/migration/cpr.h
> new file mode 100644
> index 0000000..bffee19
> --- /dev/null
> +++ b/include/migration/cpr.h
> @@ -0,0 +1,17 @@
> +/*
> + * Copyright (c) 2021 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef MIGRATION_CPR_H
> +#define MIGRATION_CPR_H
> +
> +#include "qapi/qapi-types-cpr.h"
> +
> +void cprsave(const char *file, CprMode mode, Error **errp);
>

I'd rather use "path" or "filename".

> +void cprexec(strList *args, Error **errp);
> +void cprload(const char *file, Error **errp);
>

same

It's recommended to return a bool value TRUE for success.
(see include/qapi/error.h)

> +
> +#endif
> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
> index a535691..ed4b735 100644
> --- a/include/sysemu/runstate.h
> +++ b/include/sysemu/runstate.h
> @@ -51,6 +51,7 @@ void qemu_system_reset_request(ShutdownCause reason);
>  void qemu_system_suspend_request(void);
>  void qemu_register_suspend_notifier(Notifier *notifier);
>  bool qemu_wakeup_suspend_enabled(void);
> +void qemu_system_start_on_wake_request(void);
>

I suggest introducing the function in a preliminary commit.

Also for consistency with the rest of symbols, use "wakeup".

>  void qemu_system_wakeup_request(WakeupReason reason, Error **errp);
>  void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
>  void qemu_register_wakeup_notifier(Notifier *notifier);
> diff --git a/migration/cpr.c b/migration/cpr.c
> new file mode 100644
> index 0000000..c5bad8a
> --- /dev/null
> +++ b/migration/cpr.c
> @@ -0,0 +1,149 @@
> +/*
> + * Copyright (c) 2021 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "monitor/monitor.h"
> +#include "migration.h"
> +#include "migration/snapshot.h"
> +#include "chardev/char.h"
> +#include "migration/misc.h"
> +#include "migration/cpr.h"
> +#include "migration/global_state.h"
> +#include "qemu-file-channel.h"
> +#include "qemu-file.h"
> +#include "savevm.h"
> +#include "qapi/error.h"
> +#include "qapi/qmp/qerror.h"
> +#include "qemu/error-report.h"
> +#include "io/channel-buffer.h"
> +#include "io/channel-file.h"
> +#include "sysemu/cpu-timers.h"
> +#include "sysemu/runstate.h"
> +#include "sysemu/runstate-action.h"
> +#include "sysemu/sysemu.h"
> +#include "sysemu/replay.h"
> +#include "sysemu/xen.h"
> +#include "hw/vfio/vfio-common.h"
> +#include "hw/virtio/vhost.h"
> +
> +QEMUFile *qf_file_open(const char *path, int flags, int mode,
> +                              const char *name, Error **errp)
>

None of our functions has a qf_ prefix. We are not very consistent with
QEMUFile function names, but I suggest spelling it out as qemu_file_open().

Also, it should probably be in migration/qemu-file.c.

> +{
>

I'd use ERRP_GUARD() in every function with an errp argument.

> +    QIOChannelFile *fioc;
>

Let's not miss an opportunity to use g_autoptr:
    g_autoptr(QIOChannelFile) fioc = NULL;

> +    QIOChannel *ioc;
> +    QEMUFile *f;
> +
> +    if (flags & O_RDWR) {
> +        error_setg(errp, "qf_file_open %s: O_RDWR not supported", path);
> +        return 0;
> +    }
> +
> +    fioc = qio_channel_file_new_path(path, flags, mode, errp);
> +    if (!fioc) {
> +        return 0;
> +    }
> +
> +    ioc = QIO_CHANNEL(fioc);
> +    qio_channel_set_name(ioc, name);
> +    f = (flags & O_WRONLY) ? qemu_fopen_channel_output(ioc) :
> +                             qemu_fopen_channel_input(ioc);
> +    object_unref(OBJECT(fioc));
>

With g_autoptr, this can be removed and the value returned directly.

> +    return f;
> +}
> +
> +void cprsave(const char *file, CprMode mode, Error **errp)
> +{
> +    int ret;
> +    QEMUFile *f;
> +    int saved_vm_running = runstate_is_running();
> +
> +    if (mode == CPR_MODE_REBOOT && qemu_ram_volatile(errp)) {
> +        return;
> +    }
> +
> +    if (migrate_colo_enabled()) {
> +        error_setg(errp, "error: cprsave does not support x-colo");
>

Remove the "error: " prefix from the message.

> +        return;
> +    }
> +
> +    if (replay_mode != REPLAY_MODE_NONE) {
> +        error_setg(errp, "error: cprsave does not support replay");
>

same

> +        return;
> +    }
> +
> +    f = qf_file_open(file, O_CREAT | O_WRONLY | O_TRUNC, 0600, "cprsave", errp);
> +    if (!f) {
> +        return;
> +    }
> +
> +    if (global_state_store()) {
> +        error_setg(errp, "Error saving global state");
> +        qemu_fclose(f);
> +        return;
> +    }
>

Could this be called before opening the cprsave file?

> +    if (runstate_check(RUN_STATE_SUSPENDED)) {
> +        /* Update timers_state before saving.  Suspend did not so do. */
> +        cpu_disable_ticks();
> +    }
> +    vm_stop(RUN_STATE_SAVE_VM);
> +
> +    ret = qemu_save_device_state(f);
> +    qemu_fclose(f);
> +    if (ret < 0) {
> +        error_setg(errp, "Error %d while saving VM state", ret);
> +        goto err;
>

The goto and labels are needless here; the error path can restart the VM and
return directly.


> +    }
> +
> +    goto done;
> +
> +err:
> +    if (saved_vm_running) {
> +        vm_start();
> +    }
> +done:
> +    return;
> +}
> +
> +void cprload(const char *file, Error **errp)
> +{
> +    QEMUFile *f;
> +    int ret;
> +    RunState state;
> +
> +    if (runstate_is_running()) {
> +        error_setg(errp, "cprload called for a running VM");
> +        return;
> +    }
> +
> +    f = qf_file_open(file, O_RDONLY, 0, "cprload", errp);
> +    if (!f) {
> +        return;
> +    }
> +
> +    if (qemu_get_be32(f) != QEMU_VM_FILE_MAGIC ||
> +        qemu_get_be32(f) != QEMU_VM_FILE_VERSION) {
> +        error_setg(errp, "error: %s is not a vmstate file", file);
>

f is leaked on this error path.
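
One way to avoid this kind of leak is to funnel every exit through a single
close. The sketch below shows the pattern with a plain stdio FILE rather than
a QEMUFile; sketch_load(), sketch_read_be32() and SKETCH_MAGIC are
illustrative names and not part of the patch.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative magic value: the bytes 'Q' 'E' 'V' 'M', big-endian. */
#define SKETCH_MAGIC 0x5145564dU

/* Read one big-endian 32-bit value; returns 0 on success, -1 on short read. */
static int sketch_read_be32(FILE *f, uint32_t *out)
{
    unsigned char b[4];

    if (fread(b, 1, sizeof(b), f) != sizeof(b)) {
        return -1;
    }
    *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
    return 0;
}

/*
 * Takes ownership of f.  The file is closed on every path, including
 * the bad-magic path, so there is no early return that skips fclose().
 */
static int sketch_load(FILE *f)
{
    uint32_t magic;
    int ret = 0;

    if (sketch_read_be32(f, &magic) < 0 || magic != SKETCH_MAGIC) {
        ret = -1;
    }
    /* ... loading device state would go here when ret == 0 ... */

    fclose(f);
    return ret;
}
```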

> +        return;
> +    }
> +
> +    ret = qemu_load_device_state(f);
> +    qemu_fclose(f);
> +    if (ret < 0) {
> +        error_setg(errp, "Error %d while loading VM state", ret);
> +        return;
> +    }
> +
> +    state = global_state_get_runstate();
> +    if (state == RUN_STATE_RUNNING) {
> +        vm_start();
> +    } else {
> +        runstate_set(state);
> +        if (runstate_check(RUN_STATE_SUSPENDED)) {
> +            qemu_system_start_on_wake_request();
> +        }
> +    }
> +}
> diff --git a/migration/meson.build b/migration/meson.build
> index f8714dc..fd59281 100644
> --- a/migration/meson.build
> +++ b/migration/meson.build
> @@ -15,6 +15,7 @@ softmmu_ss.add(files(
>    'channel.c',
>    'colo-failover.c',
>    'colo.c',
> +  'cpr.c',
>    'exec.c',
>    'fd.c',
>    'global_state.c',
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 6461342..ce5d710 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -67,5 +67,7 @@ int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
>  int qemu_load_device_state(QEMUFile *f);
>  int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>          bool in_postcopy, bool inactivate_disks);
> +QEMUFile *qf_file_open(const char *path, int flags, int mode,
> +                       const char *name, Error **errp);
>
>  #endif
> diff --git a/softmmu/runstate.c b/softmmu/runstate.c
> index 10d9b73..7fe4967 100644
> --- a/softmmu/runstate.c
> +++ b/softmmu/runstate.c
> @@ -115,6 +115,8 @@ static const RunStateTransition runstate_transitions_def[] = {
>      { RUN_STATE_PRELAUNCH, RUN_STATE_RUNNING },
>      { RUN_STATE_PRELAUNCH, RUN_STATE_FINISH_MIGRATE },
>      { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
> +    { RUN_STATE_PRELAUNCH, RUN_STATE_SUSPENDED },
> +    { RUN_STATE_PRELAUNCH, RUN_STATE_PAUSED },
>
>      { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
>      { RUN_STATE_FINISH_MIGRATE, RUN_STATE_PAUSED },
> @@ -335,6 +337,7 @@ void vm_state_notify(bool running, RunState state)
>      }
>  }
>
> +static bool start_on_wake_requested;
>  static ShutdownCause reset_requested;
>  static ShutdownCause shutdown_requested;
>  static int shutdown_signal;
> @@ -562,6 +565,11 @@ void qemu_register_suspend_notifier(Notifier *notifier)
>      notifier_list_add(&suspend_notifiers, notifier);
>  }
>
> +void qemu_system_start_on_wake_request(void)
> +{
> +    start_on_wake_requested = true;
> +}
> +
>  void qemu_system_wakeup_request(WakeupReason reason, Error **errp)
>  {
>      trace_system_wakeup_request(reason);
> @@ -574,7 +582,18 @@ void qemu_system_wakeup_request(WakeupReason reason, Error **errp)
>      if (!(wakeup_reason_mask & (1 << reason))) {
>          return;
>      }
> -    runstate_set(RUN_STATE_RUNNING);
> +
> +    /*
> +     * Must call vm_start if it has never been called, to invoke the state
> +     * change callbacks for the first time.
> +     */
> +    if (start_on_wake_requested) {
> +        start_on_wake_requested = false;
> +        vm_start();
> +    } else {
> +        runstate_set(RUN_STATE_RUNNING);
> +    }
> +
>      wakeup_reason = reason;
>      qemu_notify_event();
>  }
> --
> 1.8.3.1
>
>
>

-- 
Marc-André Lureau


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V5 03/25] cpr: QMP interfaces for reboot
  2021-07-07 17:20 ` [PATCH V5 03/25] cpr: QMP interfaces for reboot Steve Sistare
@ 2021-07-08 13:27   ` Marc-André Lureau
  2021-07-12 17:07     ` Steven Sistare
  2021-08-04 15:48   ` Eric Blake
  1 sibling, 1 reply; 74+ messages in thread
From: Marc-André Lureau @ 2021-07-08 13:27 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 5460 bytes --]

Hi

On Wed, Jul 7, 2021 at 9:28 PM Steve Sistare <steven.sistare@oracle.com>
wrote:

> cprsave calls cprsave().  Syntax:
>   { 'enum': 'CprMode', 'data': [ 'reboot' ] }
>   { 'command': 'cprsave', 'data': { 'file': 'str', 'mode': 'CprMode' } }
>
> cprload calls cprload().  Syntax:
>   { 'command': 'cprload', 'data': { 'file': 'str' } }
>
> cprinfo returns a list of supported modes.  Syntax:
>   { 'struct': 'CprInfo', 'data': { 'modes': [ 'CprMode' ] } }
>   { 'command': 'cprinfo', 'returns': 'CprInfo' }
>

It may not be necessary, we may instead rely on query-qmp-schema
introspection.


> Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  MAINTAINERS           |  1 +
>  monitor/qmp-cmds.c    | 31 +++++++++++++++++++++
>  qapi/cpr.json         | 74
> +++++++++++++++++++++++++++++++++++++++++++++++++++
>  qapi/meson.build      |  1 +
>  qapi/qapi-schema.json |  1 +
>  5 files changed, 108 insertions(+)
>  create mode 100644 qapi/cpr.json
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c3573aa..c48dd37 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2864,6 +2864,7 @@ M: Mark Kanda <mark.kanda@oracle.com>
>  S: Maintained
>  F: include/migration/cpr.h
>  F: migration/cpr.c
> +F: qapi/cpr.json
>
>  Record/replay
>  M: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
> diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
> index f7d64a6..1128604 100644
> --- a/monitor/qmp-cmds.c
> +++ b/monitor/qmp-cmds.c
> @@ -37,9 +37,11 @@
>  #include "qapi/qapi-commands-machine.h"
>  #include "qapi/qapi-commands-misc.h"
>  #include "qapi/qapi-commands-ui.h"
> +#include "qapi/qapi-commands-cpr.h"
>  #include "qapi/qmp/qerror.h"
>  #include "hw/mem/memory-device.h"
>  #include "hw/acpi/acpi_dev_interface.h"
> +#include "migration/cpr.h"
>
>  NameInfo *qmp_query_name(Error **errp)
>  {
> @@ -153,6 +155,35 @@ void qmp_cont(Error **errp)
>      }
>  }
>
> +CprInfo *qmp_cprinfo(Error **errp)
> +{
> +    CprInfo *cprinfo;
> +    CprModeList *mode, *mode_list = NULL;
> +    CprMode i;
> +
> +    cprinfo = g_malloc0(sizeof(*cprinfo));
> +
> +    for (i = 0; i < CPR_MODE__MAX; i++) {
> +        mode = g_malloc0(sizeof(*mode));
> +        mode->value = i;
> +        mode->next = mode_list;
> +        mode_list = mode;
> +    }
> +
> +    cprinfo->modes = mode_list;
> +    return cprinfo;
> +}
> +
> +void qmp_cprsave(const char *file, CprMode mode, Error **errp)
> +{
> +    cprsave(file, mode, errp);
> +}
> +
> +void qmp_cprload(const char *file, Error **errp)
> +{
> +    cprload(file, errp);
> +}
> +
>  void qmp_system_wakeup(Error **errp)
>  {
>      if (!qemu_wakeup_suspend_enabled()) {
> diff --git a/qapi/cpr.json b/qapi/cpr.json
> new file mode 100644
> index 0000000..b6fdc89
> --- /dev/null
> +++ b/qapi/cpr.json
> @@ -0,0 +1,74 @@
> +# -*- Mode: Python -*-
> +#
> +# Copyright (c) 2021 Oracle and/or its affiliates.
> +#
> +# This work is licensed under the terms of the GNU GPL, version 2.
> +# See the COPYING file in the top-level directory.
> +
> +##
> +# = CPR
>

Please spell it out in the doc at least (it's not obvious, I had to search
for the meaning in list archives ;).

> +##
> +
> +{ 'include': 'common.json' }
> +
> +##
> +# @CprMode:
> +#
> +# @reboot: checkpoint can be cprload'ed after a host kexec reboot.
> +#
> +# Since: 6.1
> +##
> +{ 'enum': 'CprMode',
> +  'data': [ 'reboot' ] }
> +
> +
> +##
> +# @CprInfo:
> +#
> +# @modes: @CprMode list
> +#
> +# Since: 6.1
> +##
> +{ 'struct': 'CprInfo',
> +  'data': { 'modes': [ 'CprMode' ] } }
> +
> +##
> +# @cprinfo:
> +#
> +# Returns the modes supported by @cprsave.
> +#
> +# Returns: @CprInfo
> +#
> +# Since: 6.1
> +#
> +##
> +{ 'command': 'cprinfo',
> +  'returns': 'CprInfo' }
> +
> +##
> +# @cprsave:
> +#
> +# Create a checkpoint of the virtual machine device state in @file.
> +# Guest RAM and guest block device blocks are not saved.
> +#
>

It would be worth highlighting the differences with snapshot-save/load.

I guess it would make sense to consider this as an extension/variant to
those commands.


> +# @file: name of checkpoint file
> +# @mode: @CprMode mode
> +#
> +# Since: 6.1
> +##
> +{ 'command': 'cprsave',
> +  'data': { 'file': 'str',
> +            'mode': 'CprMode' } }
> +
> +##
> +# @cprload:
> +#
> +# Start virtual machine from checkpoint file that was created earlier
> using
> +# the cprsave command.
> +#
> +# @file: name of checkpoint file
> +#
> +# Since: 6.1
> +##
> +{ 'command': 'cprload',
> +  'data': { 'file': 'str' } }
> diff --git a/qapi/meson.build b/qapi/meson.build
> index 376f4ce..7e7c48a 100644
> --- a/qapi/meson.build
> +++ b/qapi/meson.build
> @@ -26,6 +26,7 @@ qapi_all_modules = [
>    'common',
>    'compat',
>    'control',
> +  'cpr',
>    'crypto',
>    'dump',
>    'error',
> diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
> index 4912b97..001d790 100644
> --- a/qapi/qapi-schema.json
> +++ b/qapi/qapi-schema.json
> @@ -77,6 +77,7 @@
>  { 'include': 'ui.json' }
>  { 'include': 'authz.json' }
>  { 'include': 'migration.json' }
> +{ 'include': 'cpr.json' }
>  { 'include': 'transaction.json' }
>  { 'include': 'trace.json' }
>  { 'include': 'compat.json' }
> --
> 1.8.3.1
>
>
>

-- 
Marc-André Lureau


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V5 05/25] as_flat_walk
  2021-07-07 17:20 ` [PATCH V5 05/25] as_flat_walk Steve Sistare
@ 2021-07-08 13:49   ` Marc-André Lureau
  2021-07-12 17:07     ` Steven Sistare
  0 siblings, 1 reply; 74+ messages in thread
From: Marc-André Lureau @ 2021-07-08 13:49 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 2670 bytes --]

Hi

On Wed, Jul 7, 2021 at 9:28 PM Steve Sistare <steven.sistare@oracle.com>
wrote:

> Add an iterator over the sections of a flattened address space.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  include/exec/memory.h | 17 +++++++++++++++++
>  softmmu/memory.c      | 18 ++++++++++++++++++
>  2 files changed, 35 insertions(+)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 7ad63f8..a030aef 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -2023,6 +2023,23 @@ bool memory_region_present(MemoryRegion *container, hwaddr addr);
>   */
>  bool memory_region_is_mapped(MemoryRegion *mr);
>
> +typedef int (*qemu_flat_walk_cb)(MemoryRegionSection *s,
> +                                 void *handle,
> +                                 Error **errp);
>

Please document the callback type, especially returned values. (see for
example flatview_cb)

Usually, the user pointer is called "opaque".

Could it be named memory_region_section_cb instead?
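
To illustrate the kind of documentation and naming I have in mind, here is a
rough standalone sketch. SketchSection stands in for MemoryRegionSection, and
every sketch_* name is hypothetical; the return-value contract shown (0 to
continue, non-zero to stop and propagate) is the one the patch's
as_flat_walk() already follows.

```c
#include <stddef.h>

/* Hypothetical stand-in for MemoryRegionSection. */
typedef struct SketchSection {
    unsigned long offset;
    unsigned long size;
} SketchSection;

/**
 * sketch_section_cb: callback invoked once per section
 *
 * @s: the current section
 * @opaque: caller-supplied pointer, passed through unchanged
 *
 * Returns: 0 to continue the walk, non-zero to stop it.  A non-zero
 * value is returned to the caller of the iterator.
 */
typedef int (*sketch_section_cb)(const SketchSection *s, void *opaque);

/* Call @func for each of the @n sections; stop at the first non-zero result. */
static int sketch_for_each_section(const SketchSection *secs, size_t n,
                                   sketch_section_cb func, void *opaque)
{
    for (size_t i = 0; i < n; i++) {
        int ret = func(&secs[i], opaque);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Example callback: accumulate sizes, reject zero-sized sections. */
static int sketch_sum_sizes(const SketchSection *s, void *opaque)
{
    unsigned long *total = opaque;

    if (s->size == 0) {
        return -1;
    }
    *total += s->size;
    return 0;
}
```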

> +
> +/**
> + * as_flat_walk: walk the ranges in the address space flat view and call @func
> + * for each.  Return 0 on success, else return non-zero with a message in
> + * @errp.
>

I suggest the name address_space_flat_for_each_section().



> + *
> + * @as: target address space
> + * @func: callback function
> + * @handle: passed to @func
>

opaque

> + * @errp: passed to @func
> + */
> +int as_flat_walk(AddressSpace *as, qemu_flat_walk_cb func,
> +                 void *handle, Error **errp);
> +
>  /**
>   * memory_region_find: translate an address/size relative to a
>   * MemoryRegion into a #MemoryRegionSection.
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index e9536bc..1ec1e25 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -2577,6 +2577,24 @@ bool memory_region_is_mapped(MemoryRegion *mr)
>      return mr->container ? true : false;
>  }
>
> +int as_flat_walk(AddressSpace *as, qemu_flat_walk_cb func,
> +                 void *handle, Error **errp)
> +{
> +    FlatView *view = address_space_get_flatview(as);
> +    FlatRange *fr;
> +    int ret;
> +
> +    FOR_EACH_FLAT_RANGE(fr, view) {
> +        MemoryRegionSection section = section_from_flat_range(fr, view);
> +        ret = func(&section, handle, errp);
> +        if (ret) {
> +            return ret;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
>  /* Same as memory_region_find, but it does not add a reference to the
>   * returned region.  It must be called from an RCU critical section.
>   */
> --
> 1.8.3.1
>
>
>

-- 
Marc-André Lureau


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V5 06/25] oslib: qemu_clr_cloexec
  2021-07-07 17:20 ` [PATCH V5 06/25] oslib: qemu_clr_cloexec Steve Sistare
@ 2021-07-08 13:58   ` Marc-André Lureau
  2021-07-12 17:07     ` Steven Sistare
  0 siblings, 1 reply; 74+ messages in thread
From: Marc-André Lureau @ 2021-07-08 13:58 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 2080 bytes --]

Hi

On Wed, Jul 7, 2021 at 9:33 PM Steve Sistare <steven.sistare@oracle.com>
wrote:

> Define qemu_clr_cloexec, analogous to qemu_set_cloexec.
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  include/qemu/osdep.h | 1 +
>  util/oslib-posix.c   | 9 +++++++++
>  util/oslib-win32.c   | 4 ++++
>  3 files changed, 14 insertions(+)
>
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index c91a78b..3d6a6ca 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -637,6 +637,7 @@ static inline void qemu_timersub(const struct timeval *val1,
>  #endif
>
>  void qemu_set_cloexec(int fd);
> +void qemu_clr_cloexec(int fd);
>

I wish we had a single function to set or unset the flag, tbh (_clr_ isn't
as readable to me).
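
Something along these lines, as a rough sketch only. sketch_set_cloexec() is
a hypothetical name, not an existing QEMU helper, and unlike the current
qemu_set_cloexec() it reports failure to the caller instead of asserting.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Set (enable == true) or clear (enable == false) FD_CLOEXEC on fd.
 * Returns 0 on success, -1 on failure with errno set by fcntl().
 */
static int sketch_set_cloexec(int fd, bool enable)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1) {
        return -1;
    }
    if (enable) {
        flags |= FD_CLOEXEC;
    } else {
        flags &= ~FD_CLOEXEC;
    }
    return fcntl(fd, F_SETFD, flags) == -1 ? -1 : 0;
}
```

The bool parameter keeps a single entry point and leaves the decision about
whether a failure is fatal to the caller.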

>  /* Starting on QEMU 2.5, qemu_hw_version() returns "2.5+" by default
>   * instead of QEMU_VERSION, so setting hw_version on MachineClass
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index e8bdb02..97577f1 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -309,6 +309,15 @@ void qemu_set_cloexec(int fd)
>      assert(f != -1);
>  }
>
> +void qemu_clr_cloexec(int fd)
> +{
> +    int f;
> +    f = fcntl(fd, F_GETFD);
> +    assert(f != -1);
> +    f = fcntl(fd, F_SETFD, f & ~FD_CLOEXEC);
> +    assert(f != -1);
> +}
>

(Using assert() may not be very judicious for calls that we intend to make
at run time, but that's the way it is so far.)

> +
>  /*
>   * Creates a pipe with FD_CLOEXEC set on both file descriptors
>   */
> diff --git a/util/oslib-win32.c b/util/oslib-win32.c
> index af559ef..46e94d9 100644
> --- a/util/oslib-win32.c
> +++ b/util/oslib-win32.c
> @@ -265,6 +265,10 @@ void qemu_set_cloexec(int fd)
>  {
>  }
>
> +void qemu_clr_cloexec(int fd)
> +{
> +}
> +
>  /* Offset between 1/1/1601 and 1/1/1970 in 100 nanosec units */
>  #define _W32_FT_OFFSET (116444736000000000ULL)
>
> --
> 1.8.3.1
>
>
>

-- 
Marc-André Lureau


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V5 07/25] machine: memfd-alloc option
  2021-07-07 17:20 ` [PATCH V5 07/25] machine: memfd-alloc option Steve Sistare
@ 2021-07-08 14:20   ` Marc-André Lureau
  2021-07-12 17:07     ` Steven Sistare
  0 siblings, 1 reply; 74+ messages in thread
From: Marc-André Lureau @ 2021-07-08 14:20 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 8555 bytes --]

Hi

On Wed, Jul 7, 2021 at 9:39 PM Steve Sistare <steven.sistare@oracle.com>
wrote:

> Allocate anonymous memory using memfd_create if the memfd-alloc machine
> option is set.
>

Nice, I'd suggest you send this patch separately. (we had discussions about
an option like this several times)


> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  hw/core/machine.c   | 19 +++++++++++++++++++
>  include/hw/boards.h |  1 +
>  qemu-options.hx     |  5 +++++
>  softmmu/physmem.c   | 42 +++++++++++++++++++++++++++++++++---------
>  trace-events        |  1 +
>  util/qemu-config.c  |  4 ++++
>  6 files changed, 63 insertions(+), 9 deletions(-)
>
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 57c18f9..f0656a8 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -383,6 +383,20 @@ static void machine_set_mem_merge(Object *obj, bool value, Error **errp)
>      ms->mem_merge = value;
>  }
>
> +static bool machine_get_memfd_alloc(Object *obj, Error **errp)
> +{
> +    MachineState *ms = MACHINE(obj);
> +
> +    return ms->memfd_alloc;
> +}
> +
> +static void machine_set_memfd_alloc(Object *obj, bool value, Error **errp)
> +{
> +    MachineState *ms = MACHINE(obj);
> +
> +    ms->memfd_alloc = value;
> +}
> +
>  static bool machine_get_usb(Object *obj, Error **errp)
>  {
>      MachineState *ms = MACHINE(obj);
> @@ -917,6 +931,11 @@ static void machine_class_init(ObjectClass *oc, void *data)
>      object_class_property_set_description(oc, "mem-merge",
>          "Enable/disable memory merge support");
>
> +    object_class_property_add_bool(oc, "memfd-alloc",
> +        machine_get_memfd_alloc, machine_set_memfd_alloc);
> +    object_class_property_set_description(oc, "memfd-alloc",
> +        "Enable/disable allocating anonymous memory using memfd_create");
> +
>      object_class_property_add_bool(oc, "usb",
>          machine_get_usb, machine_set_usb);
>      object_class_property_set_description(oc, "usb",
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index accd6ef..299e1ca 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -305,6 +305,7 @@ struct MachineState {
>      char *dt_compatible;
>      bool dump_guest_core;
>      bool mem_merge;
> +    bool memfd_alloc;
>      bool usb;
>      bool usb_disabled;
>      char *firmware;
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 8965dab..fa53734 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -30,6 +30,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>      "                vmport=on|off|auto controls emulation of vmport (default: auto)\n"
>      "                dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
>      "                mem-merge=on|off controls memory merge support (default: on)\n"
> +    "                memfd-alloc=on|off controls allocating anonymous memory using memfd_create (default: off)\n"
>      "                aes-key-wrap=on|off controls support for AES key wrapping (default=on)\n"
>      "                dea-key-wrap=on|off controls support for DEA key wrapping (default=on)\n"
>      "                suppress-vmdesc=on|off disables self-describing migration (default=off)\n"
> @@ -76,6 +77,10 @@ SRST
>          supported by the host, de-duplicates identical memory pages
>          among VMs instances (enabled by default).
>
> +    ``memfd-alloc=on|off``
> +        Enables or disables allocation of anonymous memory using memfd_create.
> +        (disabled by default).
> +
>      ``aes-key-wrap=on|off``
>          Enables or disables AES key wrapping support on s390-ccw hosts.
>          This feature controls whether AES wrapping keys will be created
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index 9b171c9..b149250 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -64,6 +64,7 @@
>
>  #include "qemu/pmem.h"
>
> +#include "qemu/memfd.h"
>  #include "migration/vmstate.h"
>
>  #include "qemu/range.h"
> @@ -1960,35 +1961,58 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>      const bool shared = qemu_ram_is_shared(new_block);
>      RAMBlock *block;
>      RAMBlock *last_block = NULL;
> +    struct MemoryRegion *mr = new_block->mr;
>      ram_addr_t old_ram_size, new_ram_size;
>      Error *err = NULL;
> +    const char *name;
> +    void *addr = 0;
> +    size_t maxlen;
> +    MachineState *ms = MACHINE(qdev_get_machine());
>
>      old_ram_size = last_ram_page();
>
>      qemu_mutex_lock_ramlist();
> -    new_block->offset = find_ram_offset(new_block->max_length);
> +    maxlen = new_block->max_length;
> +    new_block->offset = find_ram_offset(maxlen);
>
>      if (!new_block->host) {
>          if (xen_enabled()) {
> -            xen_ram_alloc(new_block->offset, new_block->max_length,
> -                          new_block->mr, &err);
> +            xen_ram_alloc(new_block->offset, maxlen, new_block->mr, &err);
>              if (err) {
>                  error_propagate(errp, err);
>                  qemu_mutex_unlock_ramlist();
>                  return;
>              }
>          } else {
> -            new_block->host = qemu_anon_ram_alloc(new_block->max_length,
> -                                                  &new_block->mr->align,
> -                                                  shared, noreserve);
> -            if (!new_block->host) {
> +            name = memory_region_name(new_block->mr);
> +            if (ms->memfd_alloc) {
> +                int mfd = -1;          /* placeholder until next patch */
> +                mr->align = QEMU_VMALLOC_ALIGN;
> +                if (mfd < 0) {
> +                    mfd = qemu_memfd_create(name, maxlen + mr->align,
> +                                            0, 0, 0, &err);
> +                    if (mfd < 0) {
> +                        return;
> +                    }
> +                }
> +                new_block->flags |= RAM_SHARED;
>

I wonder if ram_backend_memory_alloc() shouldn't be updated to reflect that
the memory backend is "share" = true. And I would say so in the doc as well.


> +                addr = file_ram_alloc(new_block, maxlen, mfd,
> +                                      false, false, 0, errp);
> +                trace_anon_memfd_alloc(name, maxlen, addr, mfd);
> +            } else {
> +                addr = qemu_anon_ram_alloc(maxlen, &mr->align,
> +                                           shared, noreserve);
> +            }
> +
> +            if (!addr) {
>                  error_setg_errno(errp, errno,
>                                   "cannot set up guest memory '%s'",
> -                                 memory_region_name(new_block->mr));
> +                                 name);
>                  qemu_mutex_unlock_ramlist();
>                  return;
>              }
> -            memory_try_enable_merging(new_block->host, new_block->max_length);
> +            memory_try_enable_merging(addr, maxlen);
> +            new_block->host = addr;
>          }
>      }
>
> diff --git a/trace-events b/trace-events
> index 765fe25..6dbcd0e 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -40,6 +40,7 @@ ram_block_discard_range(const char *rbname, void *hva, size_t length, bool need_
>  # accel/tcg/cputlb.c
>  memory_notdirty_write_access(uint64_t vaddr, uint64_t ram_addr, unsigned size) "0x%" PRIx64 " ram_addr 0x%" PRIx64 " size %u"
>  memory_notdirty_set_dirty(uint64_t vaddr) "0x%" PRIx64
> +anon_memfd_alloc(const char *name, size_t size, void *ptr, int fd) "%s size %zu ptr %p fd %d"
>
>  # gdbstub.c
>  gdbstub_op_start(const char *device) "Starting gdbstub using device %s"
> diff --git a/util/qemu-config.c b/util/qemu-config.c
> index 84ee6dc..6162b4d 100644
> --- a/util/qemu-config.c
> +++ b/util/qemu-config.c
> @@ -207,6 +207,10 @@ static QemuOptsList machine_opts = {
>              .type = QEMU_OPT_BOOL,
>              .help = "enable/disable memory merge support",
>          },{
> +            .name = "memfd-alloc",
> +            .type = QEMU_OPT_BOOL,
> +            .help = "enable/disable memfd_create for anonymous memory",
> +        },{
>              .name = "usb",
>              .type = QEMU_OPT_BOOL,
>              .help = "Set on/off to enable/disable usb",
> --
> 1.8.3.1
>
>
>

-- 
Marc-André Lureau


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V5 08/25] vl: add helper to request re-exec
  2021-07-07 17:20 ` [PATCH V5 08/25] vl: add helper to request re-exec Steve Sistare
@ 2021-07-08 14:31   ` Marc-André Lureau
  2021-07-12 17:07     ` Steven Sistare
  0 siblings, 1 reply; 74+ messages in thread
From: Marc-André Lureau @ 2021-07-08 14:31 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 3567 bytes --]

Hi

On Wed, Jul 7, 2021 at 9:46 PM Steve Sistare <steven.sistare@oracle.com>
wrote:

> Add a qemu_system_exec_request() hook that causes the main loop to exit and
> re-exec qemu using the specified arguments.
>

I assume it works OK with -sandbox on,spawn=allow?


> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  include/sysemu/runstate.h |  1 +
>  softmmu/runstate.c        | 37 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 38 insertions(+)
>
> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
> index ed4b735..e1ae7e5 100644
> --- a/include/sysemu/runstate.h
> +++ b/include/sysemu/runstate.h
> @@ -57,6 +57,7 @@ void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
>  void qemu_register_wakeup_notifier(Notifier *notifier);
>  void qemu_register_wakeup_support(void);
>  void qemu_system_shutdown_request(ShutdownCause reason);
> +void qemu_system_exec_request(strList *args);
>  void qemu_system_powerdown_request(void);
>  void qemu_register_powerdown_notifier(Notifier *notifier);
>  void qemu_register_shutdown_notifier(Notifier *notifier);
> diff --git a/softmmu/runstate.c b/softmmu/runstate.c
> index 7fe4967..8474a01 100644
> --- a/softmmu/runstate.c
> +++ b/softmmu/runstate.c
> @@ -355,6 +355,7 @@ static NotifierList wakeup_notifiers =
>  static NotifierList shutdown_notifiers =
>      NOTIFIER_LIST_INITIALIZER(shutdown_notifiers);
>  static uint32_t wakeup_reason_mask = ~(1 << QEMU_WAKEUP_REASON_NONE);
> +static char **exec_argv;
>
>  ShutdownCause qemu_shutdown_requested_get(void)
>  {
> @@ -371,6 +372,11 @@ static int qemu_shutdown_requested(void)
>      return qatomic_xchg(&shutdown_requested, SHUTDOWN_CAUSE_NONE);
>  }
>
> +static int qemu_exec_requested(void)
> +{
> +    return exec_argv != NULL;
> +}
> +
>  static void qemu_kill_report(void)
>  {
>      if (!qtest_driver() && shutdown_signal) {
> @@ -645,6 +651,32 @@ void qemu_system_shutdown_request(ShutdownCause reason)
>      qemu_notify_event();
>  }
>
> +static char **make_argv(strList *args)
>

I'd suggest making it a generic strv_from_strList() function. It should take
a const argument too.


> +{
> +    strList *arg;
> +    char **argv;
> +    int n = 1, i = 0;
> +
> +    for (arg = args; arg != NULL; arg = arg->next) {
> +        n++;
> +    }
>

We could use a QAPI_LIST_LENGTH() in qapi/util.h
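
Roughly what I mean, as a standalone sketch. SketchStrList mirrors the
next/value layout of strList, and the sketch_* helpers are hypothetical; in
QEMU this would use g_new()/g_strdup() rather than the libc calls used here
to keep the example self-contained.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in with the same shape as QAPI's strList. */
typedef struct SketchStrList {
    struct SketchStrList *next;
    char *value;
} SketchStrList;

/* Count the elements of a list; an empty (NULL) list has length 0. */
static size_t sketch_list_length(const SketchStrList *list)
{
    size_t n = 0;

    for (; list; list = list->next) {
        n++;
    }
    return n;
}

/* Free a NULL-terminated string vector and all of its strings. */
static void sketch_strv_free(char **strv)
{
    if (!strv) {
        return;
    }
    for (char **p = strv; *p; p++) {
        free(*p);
    }
    free(strv);
}

/*
 * Build a NULL-terminated vector of copies of the list values.
 * The list itself is not modified.  Returns NULL on allocation failure.
 */
static char **sketch_strv_from_list(const SketchStrList *list)
{
    size_t n = sketch_list_length(list);
    size_t i = 0;
    char **strv = calloc(n + 1, sizeof(*strv));  /* last slot stays NULL */

    if (!strv) {
        return NULL;
    }
    for (; list; list = list->next) {
        size_t len = strlen(list->value) + 1;

        strv[i] = malloc(len);
        if (!strv[i]) {
            sketch_strv_free(strv);
            return NULL;
        }
        memcpy(strv[i], list->value, len);
        i++;
    }
    return strv;
}
```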

> +
> +    argv = g_malloc(n * sizeof(char *));
> +    for (arg = args; arg != NULL; arg = arg->next) {
> +        argv[i++] = g_strdup(arg->value);
> +    }
> +    argv[i] = NULL;
> +
> +    return argv;
> +}
> +
> +void qemu_system_exec_request(strList *args)
>

args should be const, and some documentation would help.

> +{
> +    exec_argv = make_argv(args);
> +    shutdown_requested = 1;
> +    qemu_notify_event();
> +}
> +
>  static void qemu_system_powerdown(void)
>  {
>      qapi_event_send_powerdown();
> @@ -693,6 +725,11 @@ static bool main_loop_should_exit(void)
>      }
>      request = qemu_shutdown_requested();
>      if (request) {
> +
> +        if (qemu_exec_requested()) {
> +            execvp(exec_argv[0], exec_argv);
> +            error_setg_errno(&error_fatal, errno, "execvp failed");
>

Can this be handled more gracefully instead? For example, g_strfreev() the
argv and report an error?
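
As a sketch of the shape I have in mind, using plain libc so it stands alone.
sketch_try_exec() and sketch_free_argv() are hypothetical names; in QEMU the
cleanup would be g_strfreev() and the report would go through the usual error
API instead of stderr.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Free a NULL-terminated, heap-allocated argument vector. */
static void sketch_free_argv(char **argv)
{
    if (!argv) {
        return;
    }
    for (char **p = argv; *p; p++) {
        free(*p);
    }
    free(argv);
}

/*
 * Try to exec argv[0].  Takes ownership of argv.
 * On success this never returns.  On failure it reports the error,
 * frees argv, and returns a negative errno value so the caller can
 * keep running instead of exiting.
 */
static int sketch_try_exec(char **argv)
{
    int saved_errno;

    execvp(argv[0], argv);

    /* execvp() returns only if it failed. */
    saved_errno = errno;
    fprintf(stderr, "execvp %s failed: %s\n", argv[0], strerror(saved_errno));
    sketch_free_argv(argv);
    return -saved_errno;
}
```

The caller would also have to clear its pending exec request after a failure,
which this sketch does not show.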


> +        }
>          qemu_kill_report();
>          qemu_system_shutdown(request);
>          if (shutdown_action == SHUTDOWN_ACTION_PAUSE) {
> --
> 1.8.3.1
>
>
>

-- 
Marc-André Lureau


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V5 09/25] string to strList
  2021-07-07 17:20 ` [PATCH V5 09/25] string to strList Steve Sistare
@ 2021-07-08 14:37   ` Marc-André Lureau
  0 siblings, 0 replies; 74+ messages in thread
From: Marc-André Lureau @ 2021-07-08 14:37 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 2090 bytes --]

On Wed, Jul 7, 2021 at 9:36 PM Steve Sistare <steven.sistare@oracle.com>
wrote:

> Generalize strList_from_comma_list to take any delimiter character.
> No functional change.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

> ---
>  monitor/hmp-cmds.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> index 8e80581..a56f83c 100644
> --- a/monitor/hmp-cmds.c
> +++ b/monitor/hmp-cmds.c
> @@ -71,21 +71,21 @@ void hmp_handle_error(Monitor *mon, Error *err)
>  }
>
>  /*
> - * Produce a strList from a comma separated list.
> - * A NULL or empty input string return NULL.
> + * Produce a strList from a character delimited string.
> + * A NULL or empty input string returns NULL.
>   */
> -static strList *strList_from_comma_list(const char *in)
> +static strList *strList_from_string(const char *in, char delim)
>  {
>      strList *res = NULL;
>      strList **tail = &res;
>
>      while (in && in[0]) {
> -        char *comma = strchr(in, ',');
> +        char *next = strchr(in, delim);
>          char *value;
>
> -        if (comma) {
> -            value = g_strndup(in, comma - in);
> -            in = comma + 1; /* skip the , */
> +        if (next) {
> +            value = g_strndup(in, next - in);
> +            in = next + 1; /* skip the delim */
>          } else {
>              value = g_strdup(in);
>              in = NULL;
> @@ -1170,7 +1170,7 @@ void hmp_announce_self(Monitor *mon, const QDict
> *qdict)
>                                              migrate_announce_params());
>
>      qapi_free_strList(params->interfaces);
> -    params->interfaces = strList_from_comma_list(interfaces_str);
> +    params->interfaces = strList_from_string(interfaces_str, ',');
>      params->has_interfaces = params->interfaces != NULL;
>      params->id = g_strdup(id);
>      params->has_id = !!params->id;
> --
> 1.8.3.1
>
>
>

-- 
Marc-André Lureau

[-- Attachment #2: Type: text/html, Size: 3040 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V5 10/25] util: env var helpers
  2021-07-07 17:20 ` [PATCH V5 10/25] util: env var helpers Steve Sistare
@ 2021-07-08 15:10   ` Marc-André Lureau
  2021-07-12 19:19     ` Steven Sistare
  0 siblings, 1 reply; 74+ messages in thread
From: Marc-André Lureau @ 2021-07-08 15:10 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 6008 bytes --]

Hi

On Wed, Jul 7, 2021 at 9:30 PM Steve Sistare <steven.sistare@oracle.com>
wrote:

> Add functions for saving fd's and other values in the environment via
> setenv, and for reading them back via getenv.
>
>
I understand that the rest of the series will rely on environment variables
to associate and recover the child-passed FDs, but I am not really
convinced that it is a good idea.

Environment variables have a number of issues that we may run into down
the road: namespacing, size limits, concurrency, observability, etc. I wonder
whether the VMState could have a section describing the FDs to recover. Or
maybe just use another shared memory region?

Some comments below. These new utils could also have some unit tests.

> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  MAINTAINERS        |  2 ++
>  include/qemu/env.h | 23 +++++++++++++
>  util/env.c         | 95
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  util/meson.build   |  1 +
>  4 files changed, 121 insertions(+)
>  create mode 100644 include/qemu/env.h
>  create mode 100644 util/env.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c48dd37..8647a97 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2865,6 +2865,8 @@ S: Maintained
>  F: include/migration/cpr.h
>  F: migration/cpr.c
>  F: qapi/cpr.json
> +F: include/qemu/env.h
> +F: util/env.c
>
>  Record/replay
>  M: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
> diff --git a/include/qemu/env.h b/include/qemu/env.h
> new file mode 100644
> index 0000000..3dad503
> --- /dev/null
> +++ b/include/qemu/env.h
> @@ -0,0 +1,23 @@
> +/*
> + * Copyright (c) 2021 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef QEMU_ENV_H
> +#define QEMU_ENV_H
> +
> +#define FD_PREFIX "QEMU_FD_"
> +
> +typedef int (*walkenv_cb)(const char *name, const char *val, void
> *handle);
> +
> +int getenv_fd(const char *name);
> +void setenv_fd(const char *name, int fd);
> +void unsetenv_fd(const char *name);
> +void unsetenv_fdv(const char *fmt, ...);
> +int walkenv(const char *prefix, walkenv_cb cb, void *handle);
> +void printenv(void);
>

Please use a qemu_ prefix; that avoids potential confusion with system
libraries.

> +
> +#endif
> diff --git a/util/env.c b/util/env.c
> new file mode 100644
> index 0000000..863678d
> --- /dev/null
> +++ b/util/env.c
> @@ -0,0 +1,95 @@
> +/*
> + * Copyright (c) 2021 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/cutils.h"
> +#include "qemu/env.h"
> +
> +static uint64_t getenv_ulong(const char *prefix, const char *name, int
> *err)
> +{
> +    char var[80], *val;
> +    uint64_t res = 0;
> +
> +    snprintf(var, sizeof(var), "%s%s", prefix, name);
>

No check for success / truncation...

Please use g_autofree char *var = g_strdup_printf(...) instead.
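
The point about truncation can be illustrated without GLib. The sketch
below sizes the buffer from the formatted length, so a long name cannot be
silently cut off the way it can with a fixed char var[80]. env_var_name()
is a hypothetical helper written for this example; g_strdup_printf() would
give the same effect in QEMU code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Build "<prefix><name>" in a heap buffer sized to fit, so no input
 * can be truncated. Returns NULL on failure; the caller frees the result.
 */
static char *env_var_name(const char *prefix, const char *name)
{
    int len;
    char *var;

    assert(prefix != NULL && name != NULL);
    len = snprintf(NULL, 0, "%s%s", prefix, name); /* measure only */
    if (len < 0) {
        return NULL;
    }
    var = malloc((size_t)len + 1);
    if (var == NULL) {
        return NULL;
    }
    snprintf(var, (size_t)len + 1, "%s%s", prefix, name);
    return var;
}
```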

> +    val = getenv(var);
>

For consistency, I'd use g_getenv()

> +    if (val) {
> +        *err = qemu_strtoul(val, NULL, 10, &res);
> +    } else {
> +        *err = -ENOENT;
> +    }
> +    return res;
> +}
> +
> +static void setenv_ulong(const char *prefix, const char *name, uint64_t
> val)
> +{
> +    char var[80], val_str[80];
> +    snprintf(var, sizeof(var), "%s%s", prefix, name);
> +    snprintf(val_str, sizeof(val_str), "%"PRIu64, val);
>

g_strdup_printf

> +    setenv(var, val_str, 1);
>

g_setenv(), and return error value (or assert() if that makes more sense)
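
A sketch of the "return error value" variant, assuming a POSIX setenv().
env_set_u64() is a hypothetical name; the original helper returns void and
ignores the result of setenv().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for setenv() */
#endif
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Store an unsigned 64-bit value in the environment as decimal text.
 * Returns 0 on success or -errno if setenv() rejects the request.
 */
static int env_set_u64(const char *var, uint64_t val)
{
    char val_str[21]; /* 20 digits for UINT64_MAX plus the terminator */

    assert(var != NULL);
    snprintf(val_str, sizeof(val_str), "%" PRIu64, val);
    if (setenv(var, val_str, 1) != 0) {
        return -errno;
    }
    return 0;
}
```

Whether to propagate the error or assert depends on whether callers can do
anything useful when the environment cannot be updated.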

> +}
> +
> +static void unsetenv_ulong(const char *prefix, const char *name)
> +{
> +    char var[80];
> +    snprintf(var, sizeof(var), "%s%s", prefix, name);
>

g_strdup_printf


> +    unsetenv(var);
>

g_unsetenv

> +}
> +
> +int getenv_fd(const char *name)
> +{
> +    int err;
> +    int fd = getenv_ulong(FD_PREFIX, name, &err);
>

I'd try to use qemu_parse_fd() instead.
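
For illustration, strict fd parsing of the kind suggested above might look
like the following standalone sketch. parse_fd_str() is a hypothetical
helper and is not claimed to match qemu_parse_fd() exactly; it shows the
checks that the uint64_t-to-int conversion in getenv_fd() skips (trailing
characters, negative values, values that do not fit in an int).

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a decimal file descriptor number.
 * Returns the fd, or -1 if the string is empty, has trailing characters,
 * is negative, or does not fit in an int.
 */
static int parse_fd_str(const char *s)
{
    char *end;
    long val;

    assert(s != NULL);
    errno = 0;
    val = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' ||
        val < 0 || val > INT_MAX) {
        return -1;
    }
    return (int)val;
}
```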

> +    return err ? -1 : fd;
> +}
> +
> +void setenv_fd(const char *name, int fd)
> +{
>

Maybe check fd >= 0 ?

> +    setenv_ulong(FD_PREFIX, name, fd);
> +}
> +
> +void unsetenv_fd(const char *name)
> +{
> +    unsetenv_ulong(FD_PREFIX, name);
> +}
> +
> +void unsetenv_fdv(const char *fmt, ...)
> +{
> +    va_list args;
> +    char buf[80];
> +    va_start(args, fmt);
> +    vsnprintf(buf, sizeof(buf), fmt, args);
> +    va_end(args);
>

That seems to be a leftover.

> +}
> +
> +int walkenv(const char *prefix, walkenv_cb cb, void *handle)
> +{
> +    char *str, name[128];
> +    char **envp = environ;
> +    size_t prefix_len = strlen(prefix);
> +
> +    while (*envp) {
> +        str = *envp++;
> +        if (!strncmp(str, prefix, prefix_len)) {
> +            char *val = strchr(str, '=');
> +            str += prefix_len;
> +            strncpy(name, str, val - str);
>

g_strndup() to avoid potential buffer overflow.
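
A sketch of the walk with the name copied into a buffer sized to fit,
rather than into a fixed name[128]. This is plain C with hypothetical
names (walk_env_prefix(), walk_cb, count_entry()); in QEMU the copy would
be g_strndup(). It also skips entries without '=', which the original loop
would dereference as a NULL pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;

typedef int (*walk_cb)(const char *name, const char *val, void *opaque);

/*
 * Call cb for every "<prefix><name>=<val>" entry in the environment.
 * Returns 1 if cb stopped the walk, -1 on allocation failure, else 0.
 */
static int walk_env_prefix(const char *prefix, walk_cb cb, void *opaque)
{
    size_t prefix_len;

    assert(prefix != NULL && cb != NULL);
    prefix_len = strlen(prefix);

    for (char **envp = environ; envp != NULL && *envp != NULL; envp++) {
        const char *str = *envp;
        const char *eq;
        size_t name_len;
        char *name;
        int stop;

        if (strncmp(str, prefix, prefix_len) != 0) {
            continue;
        }
        str += prefix_len;
        eq = strchr(str, '=');
        if (eq == NULL) {
            continue; /* malformed entry, nothing to report */
        }
        name_len = (size_t)(eq - str);
        name = malloc(name_len + 1); /* sized to fit, no fixed limit */
        if (name == NULL) {
            return -1;
        }
        memcpy(name, str, name_len);
        name[name_len] = '\0';
        stop = cb(name, eq + 1, opaque);
        free(name);
        if (stop) {
            return 1;
        }
    }
    return 0;
}

/* Example callback: count the matching entries. */
static int count_entry(const char *name, const char *val, void *opaque)
{
    (void)name;
    (void)val;
    ++*(int *)opaque;
    return 0;
}
```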

> +            name[val - str] = 0;
> +            if (cb(name, val + 1, handle)) {
> +                return 1;
> +            }
> +        }
> +    }
> +    return 0;
> +}
> +
> +void printenv(void)
> +{
> +    char **ptr = environ;
> +    while (*ptr) {
> +        puts(*ptr++);
> +    }
> +}
>

Is this really useful? I doubt it.



> diff --git a/util/meson.build b/util/meson.build
> index 0ffd7f4..5e8097a 100644
> --- a/util/meson.build
> +++ b/util/meson.build
> @@ -23,6 +23,7 @@ util_ss.add(files('host-utils.c'))
>  util_ss.add(files('bitmap.c', 'bitops.c'))
>  util_ss.add(files('fifo8.c'))
>  util_ss.add(files('cacheinfo.c', 'cacheflush.c'))
> +util_ss.add(files('env.c'))
>  util_ss.add(files('error.c', 'qemu-error.c'))
>  util_ss.add(files('qemu-print.c'))
>  util_ss.add(files('id.c'))
> --
> 1.8.3.1
>
>
>

-- 
Marc-André Lureau

[-- Attachment #2: Type: text/html, Size: 9762 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V5 11/25] cpr: restart mode
  2021-07-07 17:20 ` [PATCH V5 11/25] cpr: restart mode Steve Sistare
@ 2021-07-08 15:43   ` Marc-André Lureau
  2021-07-08 15:54     ` Marc-André Lureau
  0 siblings, 1 reply; 74+ messages in thread
From: Marc-André Lureau @ 2021-07-08 15:43 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 4147 bytes --]

Hi

On Wed, Jul 7, 2021 at 9:31 PM Steve Sistare <steven.sistare@oracle.com>
wrote:

> Provide the cprsave restart mode, which preserves the guest VM across a
> restart of the qemu process.  After cprsave, the caller passes qemu
> command-line arguments to cprexec, which directly exec's the new qemu
> binary.  The arguments must include -S so new qemu starts in a paused
> state.
> The caller resumes the guest by calling cprload.
>
> To use the restart mode, qemu must be started with the memfd-alloc machine
> option.  The memfd's are saved to the environment and kept open across
> exec,
> after which they are found from the environment and re-mmap'd.  Hence guest
> ram is preserved in place, albeit with new virtual addresses in the qemu
> process.
>
> The restart mode supports vfio devices in a subsequent patch.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>

What's the plan to make it work with -object memory-backend-memfd -machine
memory-backend? (or memory-backend-file, I guess that should work?)

There should be some extra checks before accepting cprexec() on a
misconfigured VM.

> ---
>  migration/cpr.c   | 21 +++++++++++++++++++++
>  softmmu/physmem.c |  6 +++++-
>  2 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/migration/cpr.c b/migration/cpr.c
> index c5bad8a..fb57dec 100644
> --- a/migration/cpr.c
> +++ b/migration/cpr.c
> @@ -29,6 +29,7 @@
>  #include "sysemu/xen.h"
>  #include "hw/vfio/vfio-common.h"
>  #include "hw/virtio/vhost.h"
> +#include "qemu/env.h"
>
>  QEMUFile *qf_file_open(const char *path, int flags, int mode,
>                                const char *name, Error **errp)
> @@ -108,6 +109,26 @@ done:
>      return;
>  }
>
> +static int preserve_fd(const char *name, const char *val, void *handle)
> +{
> +    qemu_clr_cloexec(atoi(val));
> +    return 0;
> +}
> +
> +void cprexec(strList *args, Error **errp)
> +{
> +    if (xen_enabled()) {
> +        error_setg(errp, "xen does not support cprexec");
> +        return;
> +    }
> +    if (!runstate_check(RUN_STATE_SAVE_VM)) {
> +        error_setg(errp, "runstate is not save-vm");
> +        return;
> +    }
> +    walkenv(FD_PREFIX, preserve_fd, 0);


I am not convinced that relying on environment variables here is the best
thing to do.

> +    qemu_system_exec_request(args);
> +}
> +
>  void cprload(const char *file, Error **errp)
>  {
>      QEMUFile *f;
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index b149250..8a65ef7 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -65,6 +65,7 @@
>  #include "qemu/pmem.h"
>
>  #include "qemu/memfd.h"
> +#include "qemu/env.h"
>  #include "migration/vmstate.h"
>
>  #include "qemu/range.h"
> @@ -1986,7 +1987,7 @@ static void ram_block_add(RAMBlock *new_block, Error
> **errp)
>          } else {
>              name = memory_region_name(new_block->mr);
>              if (ms->memfd_alloc) {
>


> -                int mfd = -1;          /* placeholder until next patch */
> +                int mfd = getenv_fd(name);
>                  mr->align = QEMU_VMALLOC_ALIGN;
>                  if (mfd < 0) {
>                      mfd = qemu_memfd_create(name, maxlen + mr->align,
> @@ -1994,7 +1995,9 @@ static void ram_block_add(RAMBlock *new_block, Error
> **errp)
>                      if (mfd < 0) {
>                          return;
>                      }
> +                    setenv_fd(name, mfd);
>                  }
> +                qemu_clr_cloexec(mfd);
>

Why clear it now, and on exec again?
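
For context on the question above: clearing close-on-exec is a single flag
update on the descriptor, so repeating it before exec changes nothing. A
minimal sketch, assuming qemu_clr_cloexec() from this series is roughly a
wrapper around fcntl(); clear_cloexec() is a hypothetical name used only
here.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Clear FD_CLOEXEC so the descriptor stays open across exec().
 * Returns 0 on success or -errno on failure. A second call on the same
 * descriptor is a no-op, so one call is sufficient.
 */
static int clear_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0) {
        return -errno;
    }
    if (!(flags & FD_CLOEXEC)) {
        return 0; /* already inheritable */
    }
    if (fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0) {
        return -errno;
    }
    return 0;
}
```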

>                  new_block->flags |= RAM_SHARED;
>                  addr = file_ram_alloc(new_block, maxlen, mfd,
>                                        false, false, 0, errp);
> @@ -2246,6 +2249,7 @@ void qemu_ram_free(RAMBlock *block)
>      }
>
>      qemu_mutex_lock_ramlist();
> +    unsetenv_fd(memory_region_name(block->mr));
>      QLIST_REMOVE_RCU(block, next);
>      ram_list.mru_block = NULL;
>      /* Write list before version */
> --
> 1.8.3.1
>
>
>

-- 
Marc-André Lureau

[-- Attachment #2: Type: text/html, Size: 5880 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V5 12/25] cpr: QMP interfaces for restart
  2021-07-07 17:20 ` [PATCH V5 12/25] cpr: QMP interfaces for restart Steve Sistare
@ 2021-07-08 15:49   ` Marc-André Lureau
  2021-07-12 19:19     ` Steven Sistare
  2021-08-04 16:00   ` Eric Blake
  1 sibling, 1 reply; 74+ messages in thread
From: Marc-André Lureau @ 2021-07-08 15:49 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 1904 bytes --]

Hi

On Wed, Jul 7, 2021 at 9:33 PM Steve Sistare <steven.sistare@oracle.com>
wrote:

> cprexec calls cprexec().  Syntax:
>   { 'command': 'cprexec', 'data': { 'argv': [ 'str' ] } }
>
> Add the restart mode:
>   { 'enum': 'CprMode', 'data': [ 'reboot', 'restart' ] }
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  monitor/qmp-cmds.c |  5 +++++
>  qapi/cpr.json      | 16 +++++++++++++++-
>  2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
> index 1128604..7326f7d 100644
> --- a/monitor/qmp-cmds.c
> +++ b/monitor/qmp-cmds.c
> @@ -179,6 +179,11 @@ void qmp_cprsave(const char *file, CprMode mode,
> Error **errp)
>      cprsave(file, mode, errp);
>  }
>
> +void qmp_cprexec(strList *args, Error **errp)
> +{
> +    cprexec(args, errp);
> +}
> +
>  void qmp_cprload(const char *file, Error **errp)
>  {
>      cprload(file, errp);
> diff --git a/qapi/cpr.json b/qapi/cpr.json
> index b6fdc89..2467e48 100644
> --- a/qapi/cpr.json
> +++ b/qapi/cpr.json
> @@ -16,10 +16,12 @@
>  #
>  # @reboot: checkpoint can be cprload'ed after a host kexec reboot.
>  #
> +# @restart: checkpoint can be cprload'ed after restarting qemu.
> +#
>  # Since: 6.1
>  ##
>  { 'enum': 'CprMode',
> -  'data': [ 'reboot' ] }
> +  'data': [ 'reboot', 'restart' ] }
>
>
>  ##
> @@ -61,6 +63,18 @@
>              'mode': 'CprMode' } }
>
>  ##
> +# @cprexec:
> +#
> +# Restart qemu.
> +#
> +# @argv: arguments to exec
>

Why is it not called cpr-restart then? Why does it take the whole argv?
Could argv be made optional?

> +#
> +# Since: 6.1
> +##
> +{ 'command': 'cprexec',
> +  'data': { 'argv': [ 'str' ] } }
> +
> +##
>  # @cprload:
>  #
>  # Start virtual machine from checkpoint file that was created earlier
> using
> --
> 1.8.3.1
>
>
>

-- 
Marc-André Lureau

[-- Attachment #2: Type: text/html, Size: 2898 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V5 11/25] cpr: restart mode
  2021-07-08 15:43   ` Marc-André Lureau
@ 2021-07-08 15:54     ` Marc-André Lureau
  2021-07-12 19:19       ` Steven Sistare
  0 siblings, 1 reply; 74+ messages in thread
From: Marc-André Lureau @ 2021-07-08 15:54 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 4656 bytes --]

On Thu, Jul 8, 2021 at 7:43 PM Marc-André Lureau <marcandre.lureau@gmail.com>
wrote:

> Hi
>
> On Wed, Jul 7, 2021 at 9:31 PM Steve Sistare <steven.sistare@oracle.com>
> wrote:
>
>> Provide the cprsave restart mode, which preserves the guest VM across a
>> restart of the qemu process.  After cprsave, the caller passes qemu
>> command-line arguments to cprexec, which directly exec's the new qemu
>> binary.  The arguments must include -S so new qemu starts in a paused
>> state.
>> The caller resumes the guest by calling cprload.
>>
>> To use the restart mode, qemu must be started with the memfd-alloc machine
>> option.  The memfd's are saved to the environment and kept open across
>> exec,
>> after which they are found from the environment and re-mmap'd.  Hence
>> guest
>> ram is preserved in place, albeit with new virtual addresses in the qemu
>> process.
>>
>> The restart mode supports vfio devices in a subsequent patch.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>
>
> What's the plan to make it work with -object memory-backend-memfd -machine
> memory-backend? (or memory-backend-file, I guess that should work?)
>
>
This seems to be addressed in some way by the later "hostmem-memfd: cpr
support" patch. IMHO it is worth mentioning that in the commit message and
moving the two patches closer together. The checks should be added anyway
for unsupported configurations.


> There should be some extra checks before accepting cprexec() on a
> misconfigured VM.
>
>> ---
>>  migration/cpr.c   | 21 +++++++++++++++++++++
>>  softmmu/physmem.c |  6 +++++-
>>  2 files changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/migration/cpr.c b/migration/cpr.c
>> index c5bad8a..fb57dec 100644
>> --- a/migration/cpr.c
>> +++ b/migration/cpr.c
>> @@ -29,6 +29,7 @@
>>  #include "sysemu/xen.h"
>>  #include "hw/vfio/vfio-common.h"
>>  #include "hw/virtio/vhost.h"
>> +#include "qemu/env.h"
>>
>>  QEMUFile *qf_file_open(const char *path, int flags, int mode,
>>                                const char *name, Error **errp)
>> @@ -108,6 +109,26 @@ done:
>>      return;
>>  }
>>
>> +static int preserve_fd(const char *name, const char *val, void *handle)
>> +{
>> +    qemu_clr_cloexec(atoi(val));
>> +    return 0;
>> +}
>> +
>> +void cprexec(strList *args, Error **errp)
>> +{
>> +    if (xen_enabled()) {
>> +        error_setg(errp, "xen does not support cprexec");
>> +        return;
>> +    }
>> +    if (!runstate_check(RUN_STATE_SAVE_VM)) {
>> +        error_setg(errp, "runstate is not save-vm");
>> +        return;
>> +    }
>> +    walkenv(FD_PREFIX, preserve_fd, 0);
>
>
> I am not convinced that relying on environment variables here is the best
> thing to do.
>
>> +    qemu_system_exec_request(args);
>> +}
>> +
>>  void cprload(const char *file, Error **errp)
>>  {
>>      QEMUFile *f;
>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>> index b149250..8a65ef7 100644
>> --- a/softmmu/physmem.c
>> +++ b/softmmu/physmem.c
>> @@ -65,6 +65,7 @@
>>  #include "qemu/pmem.h"
>>
>>  #include "qemu/memfd.h"
>> +#include "qemu/env.h"
>>  #include "migration/vmstate.h"
>>
>>  #include "qemu/range.h"
>> @@ -1986,7 +1987,7 @@ static void ram_block_add(RAMBlock *new_block,
>> Error **errp)
>>          } else {
>>              name = memory_region_name(new_block->mr);
>>              if (ms->memfd_alloc) {
>>
>
>
>> -                int mfd = -1;          /* placeholder until next patch */
>> +                int mfd = getenv_fd(name);
>>                  mr->align = QEMU_VMALLOC_ALIGN;
>>                  if (mfd < 0) {
>>                      mfd = qemu_memfd_create(name, maxlen + mr->align,
>> @@ -1994,7 +1995,9 @@ static void ram_block_add(RAMBlock *new_block,
>> Error **errp)
>>                      if (mfd < 0) {
>>                          return;
>>                      }
>> +                    setenv_fd(name, mfd);
>>                  }
>> +                qemu_clr_cloexec(mfd);
>>
>
> Why clear it now, and on exec again?
>
>>                  new_block->flags |= RAM_SHARED;
>>                  addr = file_ram_alloc(new_block, maxlen, mfd,
>>                                        false, false, 0, errp);
>> @@ -2246,6 +2249,7 @@ void qemu_ram_free(RAMBlock *block)
>>      }
>>
>>      qemu_mutex_lock_ramlist();
>> +    unsetenv_fd(memory_region_name(block->mr));
>>      QLIST_REMOVE_RCU(block, next);
>>      ram_list.mru_block = NULL;
>>      /* Write list before version */
>> --
>> 1.8.3.1
>>
>>
>>
>
> --
> Marc-André Lureau
>


-- 
Marc-André Lureau

[-- Attachment #2: Type: text/html, Size: 6860 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V5 20/25] chardev: cpr framework
  2021-07-07 17:20 ` [PATCH V5 20/25] chardev: cpr framework Steve Sistare
@ 2021-07-08 16:03   ` Marc-André Lureau
  2021-07-12 19:20     ` Steven Sistare
  0 siblings, 1 reply; 74+ messages in thread
From: Marc-André Lureau @ 2021-07-08 16:03 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 11284 bytes --]

Hi

On Wed, Jul 7, 2021 at 9:37 PM Steve Sistare <steven.sistare@oracle.com>
wrote:

> Add QEMU_CHAR_FEATURE_CPR for devices that support cpr.
> Add the chardev close_on_cpr option for devices that can be closed on cpr
> and reopened after exec.
> cpr is allowed only if either QEMU_CHAR_FEATURE_CPR or close_on_cpr is set
> for all chardevs in the configuration.
>

Why not do the right thing by default?

Could use some tests in tests/unit/test-char.c


> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  chardev/char.c         | 41 ++++++++++++++++++++++++++++++++++++++---
>  include/chardev/char.h |  5 +++++
>  migration/cpr.c        |  3 +++
>  qapi/char.json         |  5 ++++-
>  qemu-options.hx        | 26 ++++++++++++++++++++++----
>  5 files changed, 72 insertions(+), 8 deletions(-)
>
> diff --git a/chardev/char.c b/chardev/char.c
> index d959eec..f10fb94 100644
> --- a/chardev/char.c
> +++ b/chardev/char.c
> @@ -36,6 +36,7 @@
>  #include "qemu/help_option.h"
>  #include "qemu/module.h"
>  #include "qemu/option.h"
> +#include "qemu/env.h"
>  #include "qemu/id.h"
>  #include "qemu/coroutine.h"
>  #include "qemu/yank.h"
> @@ -239,6 +240,9 @@ static void qemu_char_open(Chardev *chr,
> ChardevBackend *backend,
>      ChardevClass *cc = CHARDEV_GET_CLASS(chr);
>      /* Any ChardevCommon member would work */
>      ChardevCommon *common = backend ? backend->u.null.data : NULL;
> +    char fdname[40];
>

Please use g_autofree char *fdname = NULL; and g_strdup_printf()

> +
> +    chr->close_on_cpr = (common && common->close_on_cpr);
>
>      if (common && common->has_logfile) {
>          int flags = O_WRONLY | O_CREAT;
> @@ -248,7 +252,14 @@ static void qemu_char_open(Chardev *chr,
> ChardevBackend *backend,
>          } else {
>              flags |= O_TRUNC;
>          }
> -        chr->logfd = qemu_open_old(common->logfile, flags, 0666);
> +        snprintf(fdname, sizeof(fdname), "%s_log", chr->label);
> +        chr->logfd = getenv_fd(fdname);
> +        if (chr->logfd < 0) {
> +            chr->logfd = qemu_open_old(common->logfile, flags, 0666);
> +            if (!chr->close_on_cpr) {
> +                setenv_fd(fdname, chr->logfd);
> +            }
> +        }
>          if (chr->logfd < 0) {
>              error_setg_errno(errp, errno,
>                               "Unable to open logfile %s",
> @@ -300,11 +311,12 @@ static void char_finalize(Object *obj)
>      if (chr->be) {
>          chr->be->chr = NULL;
>      }
> -    g_free(chr->filename);
> -    g_free(chr->label);
>      if (chr->logfd != -1) {
>          close(chr->logfd);
> +        unsetenv_fdv("%s_log", chr->label);
>      }
> +    g_free(chr->filename);
> +    g_free(chr->label);
>      qemu_mutex_destroy(&chr->chr_write_lock);
>  }
>
> @@ -504,6 +516,8 @@ void qemu_chr_parse_common(QemuOpts *opts,
> ChardevCommon *backend)
>
>      backend->has_logappend = true;
>      backend->logappend = qemu_opt_get_bool(opts, "logappend", false);
> +
> +    backend->close_on_cpr = qemu_opt_get_bool(opts, "close-on-cpr",
> false);
>

If set to true and the backend doesn't implement the CPR feature, it should
raise an error.

>  }
>
>  static const ChardevClass *char_get_class(const char *driver, Error
> **errp)
> @@ -945,6 +959,9 @@ QemuOptsList qemu_chardev_opts = {
>          },{
>              .name = "abstract",
>              .type = QEMU_OPT_BOOL,
> +        },{
> +            .name = "close-on-cpr",
> +            .type = QEMU_OPT_BOOL,
>  #endif
>          },
>          { /* end of list */ }
> @@ -1212,6 +1229,24 @@ GSource *qemu_chr_timeout_add_ms(Chardev *chr,
> guint ms,
>      return source;
>  }
>
> +static int chr_cpr_capable(Object *obj, void *opaque)
> +{
> +    Chardev *chr = (Chardev *)obj;
> +    Error **errp = opaque;
> +
> +    if (qemu_chr_has_feature(chr, QEMU_CHAR_FEATURE_CPR) ||
> chr->close_on_cpr) {
>

That'd be easy to misuse. A chardev should always have to support the CPR
feature explicitly (even if close_on_cpr is set).


> +        return 0;
> +    }
> +    error_setg(errp, "error: chardev %s -> %s is not capable of cpr",
> +               chr->label, chr->filename);
> +    return 1;
> +}
> +
> +bool qemu_chr_cpr_capable(Error **errp)
> +{
> +    return !object_child_foreach(get_chardevs_root(), chr_cpr_capable,
> errp);
> +}
> +
>  void qemu_chr_cleanup(void)
>  {
>      object_unparent(get_chardevs_root());
> diff --git a/include/chardev/char.h b/include/chardev/char.h
> index 7c0444f..e488ad1 100644
> --- a/include/chardev/char.h
> +++ b/include/chardev/char.h
> @@ -50,6 +50,8 @@ typedef enum {
>      /* Whether the gcontext can be changed after calling
>       * qemu_chr_be_update_read_handlers() */
>      QEMU_CHAR_FEATURE_GCONTEXT,
> +    /* Whether the device supports cpr */
> +    QEMU_CHAR_FEATURE_CPR,
>
>      QEMU_CHAR_FEATURE_LAST,
>  } ChardevFeature;
> @@ -67,6 +69,7 @@ struct Chardev {
>      int be_open;
>      /* used to coordinate the chardev-change special-case: */
>      bool handover_yank_instance;
> +    bool close_on_cpr;
>      GSource *gsource;
>      GMainContext *gcontext;
>      DECLARE_BITMAP(features, QEMU_CHAR_FEATURE_LAST);
> @@ -291,4 +294,6 @@ void resume_mux_open(void);
>  /* console.c */
>  void qemu_chr_parse_vc(QemuOpts *opts, ChardevBackend *backend, Error
> **errp);
>
> +bool qemu_chr_cpr_capable(Error **errp);
> +
>  #endif
> diff --git a/migration/cpr.c b/migration/cpr.c
> index 6333988..feff97f 100644
> --- a/migration/cpr.c
> +++ b/migration/cpr.c
> @@ -138,6 +138,9 @@ void cprexec(strList *args, Error **errp)
>          error_setg(errp, "cprexec requires cprsave with restart mode");
>          return;
>      }
> +    if (!qemu_chr_cpr_capable(errp)) {
> +        return;
> +    }
>      if (vfio_cprsave(errp)) {
>          return;
>      }
> diff --git a/qapi/char.json b/qapi/char.json
> index adf2685..5efaf59 100644
> --- a/qapi/char.json
> +++ b/qapi/char.json
> @@ -204,12 +204,15 @@
>  # @logfile: The name of a logfile to save output
>  # @logappend: true to append instead of truncate
>  #             (default to false to truncate)
> +# @close-on-cpr: if true, close device's fd on cprsave. defaults to false.
> +#                since 6.1.
>  #
>  # Since: 2.6
>  ##
>  { 'struct': 'ChardevCommon',
>    'data': { '*logfile': 'str',
> -            '*logappend': 'bool' } }
> +            '*logappend': 'bool',
> +            '*close-on-cpr': 'bool' } }
>
>  ##
>  # @ChardevFile:
> diff --git a/qemu-options.hx b/qemu-options.hx
> index fa53734..d5ff45f 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -3134,43 +3134,57 @@ DEFHEADING(Character device options:)
>
>  DEF("chardev", HAS_ARG, QEMU_OPTION_chardev,
>      "-chardev help\n"
> -    "-chardev null,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> +    "-chardev
> null,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off][,close-on-cpr=on|off]\n"
>      "-chardev
> socket,id=id[,host=host],port=port[,to=to][,ipv4=on|off][,ipv6=on|off][,nodelay=on|off][,reconnect=seconds]\n"
>      "
>  [,server=on|off][,wait=on|off][,telnet=on|off][,websocket=on|off][,reconnect=seconds][,mux=on|off]\n"
> -    "
>  [,logfile=PATH][,logappend=on|off][,tls-creds=ID][,tls-authz=ID] (tcp)\n"
> +    "
>  [,logfile=PATH][,logappend=on|off][,tls-creds=ID][,tls-authz=ID][,close-on-cpr=on|off]
> (tcp)\n"
>      "-chardev
> socket,id=id,path=path[,server=on|off][,wait=on|off][,telnet=on|off][,websocket=on|off][,reconnect=seconds]\n"
> -    "
>  [,mux=on|off][,logfile=PATH][,logappend=on|off][,abstract=on|off][,tight=on|off]
> (unix)\n"
> +    "
>  [,mux=on|off][,logfile=PATH][,logappend=on|off][,abstract=on|off][,tight=on|off][,close-on-cpr=on|off]
> (unix)\n"
>      "-chardev udp,id=id[,host=host],port=port[,localaddr=localaddr]\n"
>      "
>  [,localport=localport][,ipv4=on|off][,ipv6=on|off][,mux=on|off]\n"
> -    "         [,logfile=PATH][,logappend=on|off]\n"
> +    "         [,logfile=PATH][,logappend=on|off][,close-on-cpr=on|off]\n"
>      "-chardev
> msmouse,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> +    "         [,close-on-cpr=on|off]\n"
>      "-chardev
> vc,id=id[[,width=width][,height=height]][[,cols=cols][,rows=rows]]\n"
>      "         [,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> +    "         [,close-on-cpr=on|off]\n"
>      "-chardev
> ringbuf,id=id[,size=size][,logfile=PATH][,logappend=on|off]\n"
> +    "         [,close-on-cpr=on|off]\n"
>      "-chardev
> file,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> +    "         [,close-on-cpr=on|off]\n"
>      "-chardev
> pipe,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> +    "         [,close-on-cpr=on|off]\n"
>  #ifdef _WIN32
>      "-chardev
> console,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>      "-chardev
> serial,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>  #else
>      "-chardev pty,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> +    "         [,close-on-cpr=on|off]\n"
>      "-chardev
> stdio,id=id[,mux=on|off][,signal=on|off][,logfile=PATH][,logappend=on|off]\n"
> +    "         [,close-on-cpr=on|off]\n"
>  #endif
>  #ifdef CONFIG_BRLAPI
>      "-chardev
> braille,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> +    "         [,close-on-cpr=on|off]\n"
>  #endif
>  #if defined(__linux__) || defined(__sun__) || defined(__FreeBSD__) \
>          || defined(__NetBSD__) || defined(__OpenBSD__) ||
> defined(__DragonFly__)
>      "-chardev
> serial,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> +    "         [,close-on-cpr=on|off]\n"
>      "-chardev
> tty,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> +    "         [,close-on-cpr=on|off]\n"
>  #endif
>  #if defined(__linux__) || defined(__FreeBSD__) || defined(__DragonFly__)
>      "-chardev parallel,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> +    "         [,close-on-cpr=on|off]\n"
>      "-chardev parport,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> +    "         [,close-on-cpr=on|off]\n"
>  #endif
>  #if defined(CONFIG_SPICE)
>      "-chardev spicevmc,id=id,name=name[,debug=debug][,logfile=PATH][,logappend=on|off]\n"
> +    "         [,close-on-cpr=on|off]\n"
>      "-chardev spiceport,id=id,name=name[,debug=debug][,logfile=PATH][,logappend=on|off]\n"
> +    "         [,close-on-cpr=on|off]\n"
>  #endif
>      , QEMU_ARCH_ALL
>  )
> @@ -3245,6 +3259,10 @@ The general form of a character device option is:
>      ``logappend`` option controls whether the log file will be truncated
>      or appended to when opened.
>
> +    Every backend supports the ``close-on-cpr`` option.  If on, the
> +    device's descriptor is closed during cprsave, and reopened after exec.
> +    This is useful for devices that do not support cpr.
> +
>  The available backends are:
>
>  ``-chardev null,id=id``
> --
> 1.8.3.1
>
>
>

-- 
Marc-André Lureau



* Re: [PATCH V5 01/25] qemu_ram_volatile
  2021-07-08 12:01   ` Marc-André Lureau
@ 2021-07-12 17:06     ` Steven Sistare
  0 siblings, 0 replies; 74+ messages in thread
From: Steven Sistare @ 2021-07-12 17:06 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

Will do for all comments - steve

On 7/8/2021 8:01 AM, Marc-André Lureau wrote:
> Hi
> 
> On Wed, Jul 7, 2021 at 9:35 PM Steve Sistare <steven.sistare@oracle.com> wrote:
> 
>     Add a function that returns true if any ram_list block represents
>     volatile memory.
> 
>     Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>     ---
>      include/exec/memory.h |  8 ++++++++
>      softmmu/memory.c      | 30 ++++++++++++++++++++++++++++++
>      2 files changed, 38 insertions(+)
> 
>     diff --git a/include/exec/memory.h b/include/exec/memory.h
>     index b116f7c..7ad63f8 100644
>     --- a/include/exec/memory.h
>     +++ b/include/exec/memory.h
>     @@ -2649,6 +2649,14 @@ bool ram_block_discard_is_disabled(void);
>       */
>      bool ram_block_discard_is_required(void);
> 
>     +/**
>     + * qemu_ram_volatile: return true if any memory regions are writable and not
>     + * backed by shared memory.
>     + *
>     + * @errp: returned error message identifying the bad region.
>     + */
>     +bool qemu_ram_volatile(Error **errp);
> 
> 
> Usually, bool-value functions with an error return true on success. If it deviates from the recommendation, it should at least be documented.
> 
> Also, we have a preference for using _is_ in the function name for such tests.
> 
>     +
>      #endif
> 
>      #endif
>     diff --git a/softmmu/memory.c b/softmmu/memory.c
>     index f016151..e9536bc 100644
>     --- a/softmmu/memory.c
>     +++ b/softmmu/memory.c
>     @@ -2714,6 +2714,36 @@ void memory_global_dirty_log_stop(void)
>          memory_global_dirty_log_do_stop();
>      }
> 
>     +/*
>     + * Return true if any memory regions are writable and not backed by shared
>     + * memory.
>     + */
> 
> 
> Let's not duplicate API comments.
> 
>     +bool qemu_ram_volatile(Error **errp)
>     +{
>     +    RAMBlock *block;
>     +    MemoryRegion *mr;
>     +    bool ret = false;
>     +
>     +    rcu_read_lock();
> 
> 
> RCU_READ_LOCK_GUARD()
>  
> 
>     +    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> 
> 
> RAMBLOCK_FOREACH() should do.
> 
> Or rather use the qemu_ram_foreach_block() helper.
> 
> 
>     +        mr = block->mr;
>     +        if (mr &&
>     +            memory_region_is_ram(mr) &&
>     +            !memory_region_is_ram_device(mr) &&
>     +            !memory_region_is_rom(mr) &&
>     +            (block->fd == -1 || !qemu_ram_is_shared(block))) {
>     +
>     +            error_setg(errp, "Memory region %s is volatile",
>     +                       memory_region_name(mr));
>     +            ret = true;
>     +            break;
>     +        }
>     +    }
>     +
>     +    rcu_read_unlock();
>     +    return ret;
>     +}
>     +
>      static void listener_add_address_space(MemoryListener *listener,
>                                             AddressSpace *as)
>      {
>     -- 
>     1.8.3.1
> 
> 
> 
> 
> -- 
> Marc-André Lureau
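
The return-value convention raised in the review (a bool function taking an error argument returns true on success) can be illustrated outside QEMU. The following is a minimal standalone sketch, not the patch's code: `SketchError`, `SketchBlock`, and `sketch_ram_is_persistent` are hypothetical stand-ins invented for this example, and the check only loosely mirrors the `fd == -1 || !shared` test in the quoted patch. It also follows the `_is_` naming preference mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an error object; QEMU's Error is richer. */
typedef struct SketchError {
    char msg[128];
} SketchError;

/* Hypothetical stand-in for a RAM block; not QEMU's RAMBlock. */
typedef struct SketchBlock {
    const char *name;
    bool writable_ram;  /* plain writable RAM (not ROM, not a RAM device) */
    int fd;             /* backing file descriptor, or -1 if anonymous */
    bool shared;        /* mapped shared rather than private */
} SketchBlock;

/*
 * Return true on success, i.e. when no block is volatile.
 * On failure, return false and describe the offending block in *err
 * (if err is non-NULL).
 */
static bool sketch_ram_is_persistent(const SketchBlock *blocks, size_t n,
                                     SketchError *err)
{
    for (size_t i = 0; i < n; i++) {
        const SketchBlock *b = &blocks[i];

        if (b->writable_ram && (b->fd == -1 || !b->shared)) {
            if (err) {
                snprintf(err->msg, sizeof(err->msg),
                         "Memory region %s is volatile", b->name);
            }
            return false;
        }
    }
    return true;
}
```

With this shape a caller can write `if (!sketch_ram_is_persistent(blocks, n, &err)) { return; }`, which reads the same way as other true-on-success helpers.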



* Re: [PATCH V5 02/25] cpr: reboot mode
  2021-07-08 12:25   ` Marc-André Lureau
@ 2021-07-12 17:07     ` Steven Sistare
  0 siblings, 0 replies; 74+ messages in thread
From: Steven Sistare @ 2021-07-12 17:07 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

Will do for all - steve

On 7/8/2021 8:25 AM, Marc-André Lureau wrote:
> Hi
> 
> On Wed, Jul 7, 2021 at 9:45 PM Steve Sistare <steven.sistare@oracle.com> wrote:
> 
>     Provide the cprsave and cprload functions for live update.  These save and
>     restore VM state, with minimal guest pause time, so that qemu may be updated
>     to a new version in between.
> 
>     cprsave stops the VM and saves vmstate to an ordinary file.  It supports any
>     type of guest image and block device, but the caller must not modify guest
>     block devices between cprsave and cprload.
> 
>     cprsave supports several modes, the first of which is reboot.  In this mode,
>     the caller invokes cprsave and then terminates qemu.  The caller may then
>     update the host kernel and system software and reboot.  The caller resumes
>     the guest by running qemu with the same arguments as the original process
>     and invoking cprload.  To use this mode, guest ram must be mapped to a
>     persistent shared memory file such as /dev/dax0.0 or /dev/shm PKRAM.
> 
>     The reboot mode supports vfio devices if the caller first suspends the
>     guest, such as by issuing guest-suspend-ram to the qemu guest agent.  The
>     guest drivers' suspend methods flush outstanding requests and re-initialize
>     the devices, and thus there is no device state to save and restore.
> 
>     cprload loads state from the file.  If the VM was running at cprsave time,
>     then VM execution resumes.  If the VM was suspended at cprsave time, then
>     the caller must issue a system_wakeup command to resume.
> 
>     Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>     ---
>      MAINTAINERS               |   7 +++
>      include/migration/cpr.h   |  17 ++++++
>      include/sysemu/runstate.h |   1 +
>      migration/cpr.c           | 149 ++++++++++++++++++++++++++++++++++++++++++++++
>      migration/meson.build     |   1 +
>      migration/savevm.h        |   2 +
>      softmmu/runstate.c        |  21 ++++++-
>      7 files changed, 197 insertions(+), 1 deletion(-)
>      create mode 100644 include/migration/cpr.h
>      create mode 100644 migration/cpr.c
> 
>     diff --git a/MAINTAINERS b/MAINTAINERS
>     index 684142e..c3573aa 100644
>     --- a/MAINTAINERS
>     +++ b/MAINTAINERS
>     @@ -2858,6 +2858,13 @@ F: net/colo*
>      F: net/filter-rewriter.c
>      F: net/filter-mirror.c
> 
>     +CPR
>     +M: Steve Sistare <steven.sistare@oracle.com>
>     +M: Mark Kanda <mark.kanda@oracle.com>
>     +S: Maintained
>     +F: include/migration/cpr.h
>     +F: migration/cpr.c
>     +
>      Record/replay
>      M: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
>      R: Paolo Bonzini <pbonzini@redhat.com>
>     diff --git a/include/migration/cpr.h b/include/migration/cpr.h
>     new file mode 100644
>     index 0000000..bffee19
>     --- /dev/null
>     +++ b/include/migration/cpr.h
>     @@ -0,0 +1,17 @@
>     +/*
>     + * Copyright (c) 2021 Oracle and/or its affiliates.
>     + *
>     + * This work is licensed under the terms of the GNU GPL, version 2.
>     + * See the COPYING file in the top-level directory.
>     + */
>     +
>     +#ifndef MIGRATION_CPR_H
>     +#define MIGRATION_CPR_H
>     +
>     +#include "qapi/qapi-types-cpr.h"
>     +
>     +void cprsave(const char *file, CprMode mode, Error **errp);
> 
> 
> I'd rather use "path" or "filename".
> 
>     +void cprexec(strList *args, Error **errp);
>     +void cprload(const char *file, Error **errp);
> 
> 
> same
> 
> It's recommended to return a bool value TRUE for success.
> (see include/qapi/error.h)
> 
>     +
>     +#endif
>     diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
>     index a535691..ed4b735 100644
>     --- a/include/sysemu/runstate.h
>     +++ b/include/sysemu/runstate.h
>     @@ -51,6 +51,7 @@ void qemu_system_reset_request(ShutdownCause reason);
>      void qemu_system_suspend_request(void);
>      void qemu_register_suspend_notifier(Notifier *notifier);
>      bool qemu_wakeup_suspend_enabled(void);
>     +void qemu_system_start_on_wake_request(void);
> 
> 
> I suggest introducing the function in a preliminary commit.
> 
> Also for consistency with the rest of symbols, use "wakeup".
> 
>      void qemu_system_wakeup_request(WakeupReason reason, Error **errp);
>      void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
>      void qemu_register_wakeup_notifier(Notifier *notifier);
>     diff --git a/migration/cpr.c b/migration/cpr.c
>     new file mode 100644
>     index 0000000..c5bad8a
>     --- /dev/null
>     +++ b/migration/cpr.c
>     @@ -0,0 +1,149 @@
>     +/*
>     + * Copyright (c) 2021 Oracle and/or its affiliates.
>     + *
>     + * This work is licensed under the terms of the GNU GPL, version 2.
>     + * See the COPYING file in the top-level directory.
>     + */
>     +
>     +#include "qemu/osdep.h"
>     +#include "monitor/monitor.h"
>     +#include "migration.h"
>     +#include "migration/snapshot.h"
>     +#include "chardev/char.h"
>     +#include "migration/misc.h"
>     +#include "migration/cpr.h"
>     +#include "migration/global_state.h"
>     +#include "qemu-file-channel.h"
>     +#include "qemu-file.h"
>     +#include "savevm.h"
>     +#include "qapi/error.h"
>     +#include "qapi/qmp/qerror.h"
>     +#include "qemu/error-report.h"
>     +#include "io/channel-buffer.h"
>     +#include "io/channel-file.h"
>     +#include "sysemu/cpu-timers.h"
>     +#include "sysemu/runstate.h"
>     +#include "sysemu/runstate-action.h"
>     +#include "sysemu/sysemu.h"
>     +#include "sysemu/replay.h"
>     +#include "sysemu/xen.h"
>     +#include "hw/vfio/vfio-common.h"
>     +#include "hw/virtio/vhost.h"
>     +
>     +QEMUFile *qf_file_open(const char *path, int flags, int mode,
>     +                              const char *name, Error **errp)
> 
> 
> None of our functions have a qf_ prefix. We are not very consistent with QEMUFile functions, but I suggest spelling it out as qemu_file_open().
> 
> Also, it should probably be in migration/qemu-file.c.
> 
>     +{
> 
> 
> I'd ERRP_GUARD on every function with an errp argument.
> 
>     +    QIOChannelFile *fioc;
> 
> 
> Let's not miss an opportunity to use g_auto
>     g_autoptr(QIOChannelFile) fioc = NULL;
> 
>     +    QIOChannel *ioc;
>     +    QEMUFile *f;
>     +
>     +    if (flags & O_RDWR) {
>     +        error_setg(errp, "qf_file_open %s: O_RDWR not supported", path);
>     +        return 0;
>     +    }
>     +
>     +    fioc = qio_channel_file_new_path(path, flags, mode, errp);
>     +    if (!fioc) {
>     +        return 0;
>     +    }
>     +
>     +    ioc = QIO_CHANNEL(fioc);
>     +    qio_channel_set_name(ioc, name);
>     +    f = (flags & O_WRONLY) ? qemu_fopen_channel_output(ioc) :
>     +                             qemu_fopen_channel_input(ioc);
> 
>     +    object_unref(OBJECT(fioc));
> 
> With g_auto, this can be removed and the value returned directly.
> 
>     +    return f;
>     +}
>     +
>     +void cprsave(const char *file, CprMode mode, Error **errp)
>     +{
>     +    int ret;
>     +    QEMUFile *f;
>     +    int saved_vm_running = runstate_is_running();
>     +
>     +    if (mode == CPR_MODE_REBOOT && qemu_ram_volatile(errp)) {
>     +        return;
>     +    }
>     +
>     +    if (migrate_colo_enabled()) {
>     +        error_setg(errp, "error: cprsave does not support x-colo");
> 
> 
> Remove error:
> 
>     +        return;
>     +    }
>     +
>     +    if (replay_mode != REPLAY_MODE_NONE) {
>     +        error_setg(errp, "error: cprsave does not support replay");
> 
> 
> same
> 
>     +        return;
>     +    }
>     +
>     +    f = qf_file_open(file, O_CREAT | O_WRONLY | O_TRUNC, 0600, "cprsave", errp);
>     +    if (!f) {
>     +        return;
>     +    }
>     +
>     +    if (global_state_store()) {
>     +        error_setg(errp, "Error saving global state");
>     +        qemu_fclose(f);
>     +        return;
>     +    }
> 
> 
> Could be called before opening cprsave file?
> 
>     +    if (runstate_check(RUN_STATE_SUSPENDED)) {
>     +        /* Update timers_state before saving.  Suspend did not do so. */
>     +        cpu_disable_ticks();
>     +    }
>     +    vm_stop(RUN_STATE_SAVE_VM);
>     +
>     +    ret = qemu_save_device_state(f);
>     +    qemu_fclose(f);
>     +    if (ret < 0) {
>     +        error_setg(errp, "Error %d while saving VM state", ret);
>     +        goto err;
> 
> 
> Needless goto / labels.
>  
> 
>     +    }
>     +
>     +    goto done;
>     +
>     +err:
>     +    if (saved_vm_running) {
>     +        vm_start();
>     +    }
>     +done:
>     +    return;
>     +}
>     +
>     +void cprload(const char *file, Error **errp)
>     +{
>     +    QEMUFile *f;
>     +    int ret;
>     +    RunState state;
>     +
>     +    if (runstate_is_running()) {
>     +        error_setg(errp, "cprload called for a running VM");
>     +        return;
>     +    }
>     +
>     +    f = qf_file_open(file, O_RDONLY, 0, "cprload", errp);
>     +    if (!f) {
>     +        return;
>     +    }
>     +
>     +    if (qemu_get_be32(f) != QEMU_VM_FILE_MAGIC ||
>     +        qemu_get_be32(f) != QEMU_VM_FILE_VERSION) {
>     +        error_setg(errp, "error: %s is not a vmstate file", file);
> 
> 
> f is leaked
> 
>     +        return;
>     +    }
>     +
>     +    ret = qemu_load_device_state(f);
>     +    qemu_fclose(f);
>     +    if (ret < 0) {
>     +        error_setg(errp, "Error %d while loading VM state", ret);
>     +        return;
>     +    }
>     +
>     +    state = global_state_get_runstate();
>     +    if (state == RUN_STATE_RUNNING) {
>     +        vm_start();
>     +    } else {
>     +        runstate_set(state);
>     +        if (runstate_check(RUN_STATE_SUSPENDED)) {
>     +            qemu_system_start_on_wake_request();
>     +        }
>     +    }
>     +}
>     diff --git a/migration/meson.build b/migration/meson.build
>     index f8714dc..fd59281 100644
>     --- a/migration/meson.build
>     +++ b/migration/meson.build
>     @@ -15,6 +15,7 @@ softmmu_ss.add(files(
>        'channel.c',
>        'colo-failover.c',
>        'colo.c',
>     +  'cpr.c',
>        'exec.c',
>        'fd.c',
>        'global_state.c',
>     diff --git a/migration/savevm.h b/migration/savevm.h
>     index 6461342..ce5d710 100644
>     --- a/migration/savevm.h
>     +++ b/migration/savevm.h
>     @@ -67,5 +67,7 @@ int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
>      int qemu_load_device_state(QEMUFile *f);
>      int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>              bool in_postcopy, bool inactivate_disks);
>     +QEMUFile *qf_file_open(const char *path, int flags, int mode,
>     +                       const char *name, Error **errp);
> 
>      #endif
>     diff --git a/softmmu/runstate.c b/softmmu/runstate.c
>     index 10d9b73..7fe4967 100644
>     --- a/softmmu/runstate.c
>     +++ b/softmmu/runstate.c
>     @@ -115,6 +115,8 @@ static const RunStateTransition runstate_transitions_def[] = {
>          { RUN_STATE_PRELAUNCH, RUN_STATE_RUNNING },
>          { RUN_STATE_PRELAUNCH, RUN_STATE_FINISH_MIGRATE },
>          { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
>     +    { RUN_STATE_PRELAUNCH, RUN_STATE_SUSPENDED },
>     +    { RUN_STATE_PRELAUNCH, RUN_STATE_PAUSED },
> 
>          { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
>          { RUN_STATE_FINISH_MIGRATE, RUN_STATE_PAUSED },
>     @@ -335,6 +337,7 @@ void vm_state_notify(bool running, RunState state)
>          }
>      }
> 
>     +static bool start_on_wake_requested;
>      static ShutdownCause reset_requested;
>      static ShutdownCause shutdown_requested;
>      static int shutdown_signal;
>     @@ -562,6 +565,11 @@ void qemu_register_suspend_notifier(Notifier *notifier)
>          notifier_list_add(&suspend_notifiers, notifier);
>      }
> 
>     +void qemu_system_start_on_wake_request(void)
>     +{
>     +    start_on_wake_requested = true;
>     +}
>     +
>      void qemu_system_wakeup_request(WakeupReason reason, Error **errp)
>      {
>          trace_system_wakeup_request(reason);
>     @@ -574,7 +582,18 @@ void qemu_system_wakeup_request(WakeupReason reason, Error **errp)
>          if (!(wakeup_reason_mask & (1 << reason))) {
>              return;
>          }
>     -    runstate_set(RUN_STATE_RUNNING);
>     +
>     +    /*
>     +     * Must call vm_start if it has never been called, to invoke the state
>     +     * change callbacks for the first time.
>     +     */
>     +    if (start_on_wake_requested) {
>     +        start_on_wake_requested = false;
>     +        vm_start();
>     +    } else {
>     +        runstate_set(RUN_STATE_RUNNING);
>     +    }
>     +
>          wakeup_reason = reason;
>          qemu_notify_event();
>      }
>     -- 
>     1.8.3.1
> 
> 
> 
> 
> -- 
> Marc-André Lureau
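
The "f is leaked" comment on cprload concerns the early return after the magic/version check, which skips qemu_fclose(). A minimal standalone sketch of one way to structure such a check so the file is closed on every path follows. It uses plain stdio rather than QEMUFile, and `sketch_check_header`, `SKETCH_MAGIC`, and `SKETCH_VERSION` are hypothetical names and illustrative values chosen for this example, not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative header values for this sketch only. */
#define SKETCH_MAGIC   0x5145564dU
#define SKETCH_VERSION 3U

/* Read one big-endian 32-bit value; return false on a short read. */
static bool sketch_read_be32(FILE *f, uint32_t *out)
{
    unsigned char b[4];

    if (fread(b, 1, sizeof(b), f) != sizeof(b)) {
        return false;
    }
    *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
    return true;
}

/*
 * Return true if path starts with the expected magic and version.
 * On failure, return false and write a message into errbuf.
 * The file is closed on every path once it has been opened.
 */
static bool sketch_check_header(const char *path, char *errbuf, size_t errlen)
{
    uint32_t magic = 0, version = 0;
    bool ok = false;
    FILE *f = fopen(path, "rb");

    if (!f) {
        snprintf(errbuf, errlen, "cannot open %s", path);
        return false;
    }

    if (sketch_read_be32(f, &magic) && sketch_read_be32(f, &version) &&
        magic == SKETCH_MAGIC && version == SKETCH_VERSION) {
        ok = true;
    } else {
        snprintf(errbuf, errlen, "%s is not a vmstate file", path);
    }

    fclose(f);  /* single close shared by the success and failure paths */
    return ok;
}
```

Funnelling both outcomes through one fclose() also avoids the goto/label pair flagged as needless in cprsave.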



* Re: [PATCH V5 03/25] cpr: QMP interfaces for reboot
  2021-07-08 13:27   ` Marc-André Lureau
@ 2021-07-12 17:07     ` Steven Sistare
  0 siblings, 0 replies; 74+ messages in thread
From: Steven Sistare @ 2021-07-12 17:07 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

Will do for all.  Good idea on schema introspection. - steve

On 7/8/2021 9:27 AM, Marc-André Lureau wrote:
> Hi
> 
> On Wed, Jul 7, 2021 at 9:28 PM Steve Sistare <steven.sistare@oracle.com> wrote:
> 
>     cprsave calls cprsave().  Syntax:
>       { 'enum': 'CprMode', 'data': [ 'reboot' ] }
>       { 'command': 'cprsave', 'data': { 'file': 'str', 'mode': 'CprMode' } }
> 
>     cprload calls cprload().  Syntax:
>       { 'command': 'cprload', 'data': { 'file': 'str' } }
> 
>     cprinfo returns a list of supported modes.  Syntax:
>       { 'struct': 'CprInfo', 'data': { 'modes': [ 'CprMode' ] } }
>       { 'command': 'cprinfo', 'returns': 'CprInfo' }
> 
> 
> It may not be necessary, we may instead rely on query-qmp-schema introspection.
> 
> 
>     Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
>     Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>     ---
>      MAINTAINERS           |  1 +
>      monitor/qmp-cmds.c    | 31 +++++++++++++++++++++
>      qapi/cpr.json         | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++
>      qapi/meson.build      |  1 +
>      qapi/qapi-schema.json |  1 +
>      5 files changed, 108 insertions(+)
>      create mode 100644 qapi/cpr.json
> 
>     diff --git a/MAINTAINERS b/MAINTAINERS
>     index c3573aa..c48dd37 100644
>     --- a/MAINTAINERS
>     +++ b/MAINTAINERS
>     @@ -2864,6 +2864,7 @@ M: Mark Kanda <mark.kanda@oracle.com>
>      S: Maintained
>      F: include/migration/cpr.h
>      F: migration/cpr.c
>     +F: qapi/cpr.json
> 
>      Record/replay
>      M: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
>     diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
>     index f7d64a6..1128604 100644
>     --- a/monitor/qmp-cmds.c
>     +++ b/monitor/qmp-cmds.c
>     @@ -37,9 +37,11 @@
>      #include "qapi/qapi-commands-machine.h"
>      #include "qapi/qapi-commands-misc.h"
>      #include "qapi/qapi-commands-ui.h"
>     +#include "qapi/qapi-commands-cpr.h"
>      #include "qapi/qmp/qerror.h"
>      #include "hw/mem/memory-device.h"
>      #include "hw/acpi/acpi_dev_interface.h"
>     +#include "migration/cpr.h"
> 
>      NameInfo *qmp_query_name(Error **errp)
>      {
>     @@ -153,6 +155,35 @@ void qmp_cont(Error **errp)
>          }
>      }
> 
>     +CprInfo *qmp_cprinfo(Error **errp)
>     +{
>     +    CprInfo *cprinfo;
>     +    CprModeList *mode, *mode_list = NULL;
>     +    CprMode i;
>     +
>     +    cprinfo = g_malloc0(sizeof(*cprinfo));
>     +
>     +    for (i = 0; i < CPR_MODE__MAX; i++) {
>     +        mode = g_malloc0(sizeof(*mode));
>     +        mode->value = i;
>     +        mode->next = mode_list;
>     +        mode_list = mode;
>     +    }
>     +
>     +    cprinfo->modes = mode_list;
>     +    return cprinfo;
>     +}
>     +
>     +void qmp_cprsave(const char *file, CprMode mode, Error **errp)
>     +{
>     +    cprsave(file, mode, errp);
>     +}
>     +
>     +void qmp_cprload(const char *file, Error **errp)
>     +{
>     +    cprload(file, errp);
>     +}
>     +
>      void qmp_system_wakeup(Error **errp)
>      {
>          if (!qemu_wakeup_suspend_enabled()) {
>     diff --git a/qapi/cpr.json b/qapi/cpr.json
>     new file mode 100644
>     index 0000000..b6fdc89
>     --- /dev/null
>     +++ b/qapi/cpr.json
>     @@ -0,0 +1,74 @@
>     +# -*- Mode: Python -*-
>     +#
>     +# Copyright (c) 2021 Oracle and/or its affiliates.
>     +#
>     +# This work is licensed under the terms of the GNU GPL, version 2.
>     +# See the COPYING file in the top-level directory.
>     +
>     +##
>     +# = CPR
> 
> 
> Please spell it out in the doc at least (it's not obvious, I had to search for the meaning in list archives ;).
> 
>     +##
>     +
>     +{ 'include': 'common.json' }
>     +
>     +##
>     +# @CprMode:
>     +#
>     +# @reboot: checkpoint can be cprload'ed after a host kexec reboot.
>     +#
>     +# Since: 6.1
>     +##
>     +{ 'enum': 'CprMode',
>     +  'data': [ 'reboot' ] }
>     +
>     +
>     +##
>     +# @CprInfo:
>     +#
>     +# @modes: @CprMode list
>     +#
>     +# Since: 6.1
>     +##
>     +{ 'struct': 'CprInfo',
>     +  'data': { 'modes': [ 'CprMode' ] } }
>     +
>     +##
>     +# @cprinfo:
>     +#
>     +# Returns the modes supported by @cprsave.
>     +#
>     +# Returns: @CprInfo
>     +#
>     +# Since: 6.1
>     +#
>     +##
>     +{ 'command': 'cprinfo',
>     +  'returns': 'CprInfo' }
>     +
>     +##
>     +# @cprsave:
>     +#
>     +# Create a checkpoint of the virtual machine device state in @file.
>     +# Guest RAM and guest block device blocks are not saved.
>     +#
> 
> 
> It would be worth highlighting the differences with snapshot-save/load.
> 
> I guess it would make sense to consider this as an extension/variant to those commands.
>  
> 
>     +# @file: name of checkpoint file
>     +# @mode: @CprMode mode
>     +#
>     +# Since: 6.1
>     +##
>     +{ 'command': 'cprsave',
>     +  'data': { 'file': 'str',
>     +            'mode': 'CprMode' } }
>     +
>     +##
>     +# @cprload:
>     +#
>     +# Start virtual machine from checkpoint file that was created earlier using
>     +# the cprsave command.
>     +#
>     +# @file: name of checkpoint file
>     +#
>     +# Since: 6.1
>     +##
>     +{ 'command': 'cprload',
>     +  'data': { 'file': 'str' } }
>     diff --git a/qapi/meson.build b/qapi/meson.build
>     index 376f4ce..7e7c48a 100644
>     --- a/qapi/meson.build
>     +++ b/qapi/meson.build
>     @@ -26,6 +26,7 @@ qapi_all_modules = [
>        'common',
>        'compat',
>        'control',
>     +  'cpr',
>        'crypto',
>        'dump',
>        'error',
>     diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
>     index 4912b97..001d790 100644
>     --- a/qapi/qapi-schema.json
>     +++ b/qapi/qapi-schema.json
>     @@ -77,6 +77,7 @@
>      { 'include': 'ui.json' }
>      { 'include': 'authz.json' }
>      { 'include': 'migration.json' }
>     +{ 'include': 'cpr.json' }
>      { 'include': 'transaction.json' }
>      { 'include': 'trace.json' }
>      { 'include': 'compat.json' }
>     -- 
>     1.8.3.1
> 
> 
> 
> 
> -- 
> Marc-André Lureau



* Re: [PATCH V5 05/25] as_flat_walk
  2021-07-08 13:49   ` Marc-André Lureau
@ 2021-07-12 17:07     ` Steven Sistare
  0 siblings, 0 replies; 74+ messages in thread
From: Steven Sistare @ 2021-07-12 17:07 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

Will do for all - steve

On 7/8/2021 9:49 AM, Marc-André Lureau wrote:
> Hi
> 
> On Wed, Jul 7, 2021 at 9:28 PM Steve Sistare <steven.sistare@oracle.com> wrote:
> 
>     Add an iterator over the sections of a flattened address space.
> 
>     Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>     ---
>      include/exec/memory.h | 17 +++++++++++++++++
>      softmmu/memory.c      | 18 ++++++++++++++++++
>      2 files changed, 35 insertions(+)
> 
>     diff --git a/include/exec/memory.h b/include/exec/memory.h
>     index 7ad63f8..a030aef 100644
>     --- a/include/exec/memory.h
>     +++ b/include/exec/memory.h
>     @@ -2023,6 +2023,23 @@ bool memory_region_present(MemoryRegion *container, hwaddr addr);
>       */
>      bool memory_region_is_mapped(MemoryRegion *mr);
> 
>     +typedef int (*qemu_flat_walk_cb)(MemoryRegionSection *s,
>     +                                 void *handle,
>     +                                 Error **errp);
> 
> 
> Please document the callback type, especially returned values. (see for example flatview_cb)
> 
> Usually, the user pointer is called "opaque".
> 
> Could it be named memory_region_section_cb instead ?
> 
>     +
>     +/**
>     + * as_flat_walk: walk the ranges in the address space flat view and call @func
>     + * for each.  Return 0 on success, else return non-zero with a message in
>     + * @errp.
> 
> 
> Suggest address_space_flat_for_each_section() name ?
> 
>  
> 
>     + *
>     + * @as: target address space
>     + * @func: callback function
>     + * @handle: passed to @func
> 
> 
> opaque
> 
>     + * @errp: passed to @func
>     + */
>     +int as_flat_walk(AddressSpace *as, qemu_flat_walk_cb func,
>     +                 void *handle, Error **errp);
>     +
>      /**
>       * memory_region_find: translate an address/size relative to a
>       * MemoryRegion into a #MemoryRegionSection.
>     diff --git a/softmmu/memory.c b/softmmu/memory.c
>     index e9536bc..1ec1e25 100644
>     --- a/softmmu/memory.c
>     +++ b/softmmu/memory.c
>     @@ -2577,6 +2577,24 @@ bool memory_region_is_mapped(MemoryRegion *mr)
>          return mr->container ? true : false;
>      }
> 
>     +int as_flat_walk(AddressSpace *as, qemu_flat_walk_cb func,
>     +                 void *handle, Error **errp)
>     +{
>     +    FlatView *view = address_space_get_flatview(as);
>     +    FlatRange *fr;
>     +    int ret;
>     +
>     +    FOR_EACH_FLAT_RANGE(fr, view) {
>     +        MemoryRegionSection section = section_from_flat_range(fr, view);
>     +        ret = func(&section, handle, errp);
>     +        if (ret) {
>     +            return ret;
>     +        }
>     +    }
>     +
>     +    return 0;
>     +}
>     +
>      /* Same as memory_region_find, but it does not add a reference to the
>       * returned region.  It must be called from an RCU critical section.
>       */
>     -- 
>     1.8.3.1
> 
> 
> 
> 
> -- 
> Marc-André Lureau
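
The review asks for the callback type to be documented, in particular its return values, and for the user pointer to be called "opaque". A minimal standalone sketch of that shape follows. It walks a plain array instead of a FlatView, and `SketchSection`, `sketch_section_cb`, and `sketch_for_each_section` are hypothetical names made up for this example, not QEMU APIs.

```c
#include <stddef.h>

/* Hypothetical stand-in for a memory region section. */
typedef struct SketchSection {
    unsigned long offset;
    unsigned long size;
} SketchSection;

/*
 * Callback invoked once per section.
 *
 * @s: the current section
 * @opaque: caller-supplied pointer, passed through unchanged
 *
 * Return 0 to continue the walk, or non-zero to stop it; a non-zero
 * value is returned to the caller of sketch_for_each_section().
 */
typedef int (*sketch_section_cb)(const SketchSection *s, void *opaque);

/* Call func for each section; stop at the first non-zero return. */
static int sketch_for_each_section(const SketchSection *sections, size_t n,
                                   sketch_section_cb func, void *opaque)
{
    for (size_t i = 0; i < n; i++) {
        int ret = func(&sections[i], opaque);

        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Example callback: accumulate section sizes into *opaque. */
static int sketch_sum_sizes(const SketchSection *s, void *opaque)
{
    unsigned long *total = opaque;

    *total += s->size;
    return 0;
}

/* Example callback: stop with -1 at the first section above a limit. */
static int sketch_stop_at_large(const SketchSection *s, void *opaque)
{
    const unsigned long *limit = opaque;

    return s->size > *limit ? -1 : 0;
}
```

Propagating the callback's non-zero value unchanged matches what the quoted as_flat_walk() loop does with `ret`.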



* Re: [PATCH V5 06/25] oslib: qemu_clr_cloexec
  2021-07-08 13:58   ` Marc-André Lureau
@ 2021-07-12 17:07     ` Steven Sistare
  0 siblings, 0 replies; 74+ messages in thread
From: Steven Sistare @ 2021-07-12 17:07 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

On 7/8/2021 9:58 AM, Marc-André Lureau wrote:
> Hi
> 
> On Wed, Jul 7, 2021 at 9:33 PM Steve Sistare <steven.sistare@oracle.com> wrote:
> 
>     Define qemu_clr_cloexec, analogous to qemu_set_cloexec.
> 
>     Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>     Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>     ---
>      include/qemu/osdep.h | 1 +
>      util/oslib-posix.c   | 9 +++++++++
>      util/oslib-win32.c   | 4 ++++
>      3 files changed, 14 insertions(+)
> 
>     diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
>     index c91a78b..3d6a6ca 100644
>     --- a/include/qemu/osdep.h
>     +++ b/include/qemu/osdep.h
>     @@ -637,6 +637,7 @@ static inline void qemu_timersub(const struct timeval *val1,
>      #endif
> 
>      void qemu_set_cloexec(int fd);
>     +void qemu_clr_cloexec(int fd);
> 
> 
> I wish we would have a single function to set or unset, tbh. (as _clr_ isn't as readable to me)

I would rather not replace the existing qemu_set_cloexec calls, but I will expand clr to clear.

>      /* Starting on QEMU 2.5, qemu_hw_version() returns "2.5+" by default
>       * instead of QEMU_VERSION, so setting hw_version on MachineClass
>     diff --git a/util/oslib-posix.c b/util/oslib-posix.c
>     index e8bdb02..97577f1 100644
>     --- a/util/oslib-posix.c
>     +++ b/util/oslib-posix.c
>     @@ -309,6 +309,15 @@ void qemu_set_cloexec(int fd)
>          assert(f != -1);
>      }
> 
>     +void qemu_clr_cloexec(int fd)
>     +{
>     +    int f;
>     +    f = fcntl(fd, F_GETFD);
>     +    assert(f != -1);
>     +    f = fcntl(fd, F_SETFD, f & ~FD_CLOEXEC);
>     +    assert(f != -1);
>     +}
> 
> 
> (asserting() may not be very judicious for calls that we intend to make during running time, but that's the way it is so far)

yep.

- Steve

>     +
>      /*
>       * Creates a pipe with FD_CLOEXEC set on both file descriptors
>       */
>     diff --git a/util/oslib-win32.c b/util/oslib-win32.c
>     index af559ef..46e94d9 100644
>     --- a/util/oslib-win32.c
>     +++ b/util/oslib-win32.c
>     @@ -265,6 +265,10 @@ void qemu_set_cloexec(int fd)
>      {
>      }
> 
>     +void qemu_clr_cloexec(int fd)
>     +{
>     +}
>     +
>      /* Offset between 1/1/1601 and 1/1/1970 in 100 nanosec units */
>      #define _W32_FT_OFFSET (116444736000000000ULL)
> 
>     -- 
>     1.8.3.1
> 
> 
> 
> 
> -- 
> Marc-André Lureau
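
For illustration only, the two points raised in this review (one function to set or clear the flag, and reporting failure instead of asserting) could look like the POSIX sketch below. `sketch_fd_set_cloexec` is a hypothetical name; the thread's stated plan is to keep qemu_set_cloexec and add a separate "clear" function, so this is not what the series does.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Set (on == true) or clear (on == false) FD_CLOEXEC on fd.
 * Return 0 on success, or -1 with errno set by fcntl() on failure.
 */
static int sketch_fd_set_cloexec(int fd, bool on)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1) {
        return -1;
    }
    flags = on ? (flags | FD_CLOEXEC) : (flags & ~FD_CLOEXEC);
    return fcntl(fd, F_SETFD, flags) == -1 ? -1 : 0;
}
```

Clearing the flag is what lets a descriptor stay open across exec, which is the property restart mode relies on when the new qemu binary is exec'ed.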



* Re: [PATCH V5 07/25] machine: memfd-alloc option
  2021-07-08 14:20   ` Marc-André Lureau
@ 2021-07-12 17:07     ` Steven Sistare
  2021-07-12 17:45       ` Marc-André Lureau
  0 siblings, 1 reply; 74+ messages in thread
From: Steven Sistare @ 2021-07-12 17:07 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

On 7/8/2021 10:20 AM, Marc-André Lureau wrote:
> Hi
> 
> On Wed, Jul 7, 2021 at 9:39 PM Steve Sistare <steven.sistare@oracle.com> wrote:
> 
>     Allocate anonymous memory using memfd_create if the memfd-alloc machine
>     option is set.
> 
> 
> Nice, I'd suggest you send this patch separately. (we had discussions about an option like this several times)

I would like to keep it with this series to make sure it meets our needs as the patches are
reviewed and evolve.  We can always push it solo later if the series stalls.

>     Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>     ---
>      hw/core/machine.c   | 19 +++++++++++++++++++
>      include/hw/boards.h |  1 +
>      qemu-options.hx     |  5 +++++
>      softmmu/physmem.c   | 42 +++++++++++++++++++++++++++++++++---------
>      trace-events        |  1 +
>      util/qemu-config.c  |  4 ++++
>      6 files changed, 63 insertions(+), 9 deletions(-)
> 
>     diff --git a/hw/core/machine.c b/hw/core/machine.c
>     index 57c18f9..f0656a8 100644
>     --- a/hw/core/machine.c
>     +++ b/hw/core/machine.c
>     @@ -383,6 +383,20 @@ static void machine_set_mem_merge(Object *obj, bool value, Error **errp)
>          ms->mem_merge = value;
>      }
> 
>     +static bool machine_get_memfd_alloc(Object *obj, Error **errp)
>     +{
>     +    MachineState *ms = MACHINE(obj);
>     +
>     +    return ms->memfd_alloc;
>     +}
>     +
>     +static void machine_set_memfd_alloc(Object *obj, bool value, Error **errp)
>     +{
>     +    MachineState *ms = MACHINE(obj);
>     +
>     +    ms->memfd_alloc = value;
>     +}
>     +
>      static bool machine_get_usb(Object *obj, Error **errp)
>      {
>          MachineState *ms = MACHINE(obj);
>     @@ -917,6 +931,11 @@ static void machine_class_init(ObjectClass *oc, void *data)
>          object_class_property_set_description(oc, "mem-merge",
>              "Enable/disable memory merge support");
> 
>     +    object_class_property_add_bool(oc, "memfd-alloc",
>     +        machine_get_memfd_alloc, machine_set_memfd_alloc);
>     +    object_class_property_set_description(oc, "memfd-alloc",
>     +        "Enable/disable allocating anonymous memory using memfd_create");
>     +
>          object_class_property_add_bool(oc, "usb",
>              machine_get_usb, machine_set_usb);
>          object_class_property_set_description(oc, "usb",
>     diff --git a/include/hw/boards.h b/include/hw/boards.h
>     index accd6ef..299e1ca 100644
>     --- a/include/hw/boards.h
>     +++ b/include/hw/boards.h
>     @@ -305,6 +305,7 @@ struct MachineState {
>          char *dt_compatible;
>          bool dump_guest_core;
>          bool mem_merge;
>     +    bool memfd_alloc;
>          bool usb;
>          bool usb_disabled;
>          char *firmware;
>     diff --git a/qemu-options.hx b/qemu-options.hx
>     index 8965dab..fa53734 100644
>     --- a/qemu-options.hx
>     +++ b/qemu-options.hx
>     @@ -30,6 +30,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>          "                vmport=on|off|auto controls emulation of vmport (default: auto)\n"
>          "                dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
>          "                mem-merge=on|off controls memory merge support (default: on)\n"
>     +    "                memfd-alloc=on|off controls allocating anonymous memory using memfd_create (default: off)\n"
>          "                aes-key-wrap=on|off controls support for AES key wrapping (default=on)\n"
>          "                dea-key-wrap=on|off controls support for DEA key wrapping (default=on)\n"
>          "                suppress-vmdesc=on|off disables self-describing migration (default=off)\n"
>     @@ -76,6 +77,10 @@ SRST
>              supported by the host, de-duplicates identical memory pages
>              among VMs instances (enabled by default).
> 
>     +    ``memfd-alloc=on|off``
>     +        Enables or disables allocation of anonymous memory using memfd_create.
>     +        (disabled by default).
>     +
>          ``aes-key-wrap=on|off``
>              Enables or disables AES key wrapping support on s390-ccw hosts.
>              This feature controls whether AES wrapping keys will be created
>     diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>     index 9b171c9..b149250 100644
>     --- a/softmmu/physmem.c
>     +++ b/softmmu/physmem.c
>     @@ -64,6 +64,7 @@
> 
>      #include "qemu/pmem.h"
> 
>     +#include "qemu/memfd.h"
>      #include "migration/vmstate.h"
> 
>      #include "qemu/range.h"
>     @@ -1960,35 +1961,58 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>          const bool shared = qemu_ram_is_shared(new_block);
>          RAMBlock *block;
>          RAMBlock *last_block = NULL;
>     +    struct MemoryRegion *mr = new_block->mr;
>          ram_addr_t old_ram_size, new_ram_size;
>          Error *err = NULL;
>     +    const char *name;
>     +    void *addr = 0;
>     +    size_t maxlen;
>     +    MachineState *ms = MACHINE(qdev_get_machine());
> 
>          old_ram_size = last_ram_page();
> 
>          qemu_mutex_lock_ramlist();
>     -    new_block->offset = find_ram_offset(new_block->max_length);
>     +    maxlen = new_block->max_length;
>     +    new_block->offset = find_ram_offset(maxlen);
> 
>          if (!new_block->host) {
>              if (xen_enabled()) {
>     -            xen_ram_alloc(new_block->offset, new_block->max_length,
>     -                          new_block->mr, &err);
>     +            xen_ram_alloc(new_block->offset, maxlen, new_block->mr, &err);
>                  if (err) {
>                      error_propagate(errp, err);
>                      qemu_mutex_unlock_ramlist();
>                      return;
>                  }
>              } else {
>     -            new_block->host = qemu_anon_ram_alloc(new_block->max_length,
>     -                                                  &new_block->mr->align,
>     -                                                  shared, noreserve);
>     -            if (!new_block->host) {
>     +            name = memory_region_name(new_block->mr);
>     +            if (ms->memfd_alloc) {
>     +                int mfd = -1;          /* placeholder until next patch */
>     +                mr->align = QEMU_VMALLOC_ALIGN;
>     +                if (mfd < 0) {
>     +                    mfd = qemu_memfd_create(name, maxlen + mr->align,
>     +                                            0, 0, 0, &err);
>     +                    if (mfd < 0) {
>     +                        return;
>     +                    }
>     +                }
>     +                new_block->flags |= RAM_SHARED;
> 
> 
> I wonder if ram_backend_memory_alloc() shouldn't be updated to reflect that the memory backend is "share" = true. 

It already does this:
  ram_flags = backend->share ? RAM_SHARED : 0;
Did you have something else in mind?

> And I would say so in the doc as well.

Will do.

- Steve
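
For readers unfamiliar with the technique the patch description names, the following is a minimal sketch of allocating shared "anonymous" memory through memfd_create, assuming Linux with glibc 2.27 or later.  The function alloc_shared_memfd is hypothetical and is not the qemu_memfd_create/file_ram_alloc path used in the patch; passing flags of 0 is also an assumption of the sketch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for memfd_create() in <sys/mman.h> */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a memfd of len bytes and map it MAP_SHARED.
 * On success returns the mapping and stores the descriptor in *fd_out.
 * On failure returns NULL.
 */
static void *alloc_shared_memfd(const char *name, size_t len, int *fd_out)
{
    void *addr;
    int fd;

    assert(name != NULL && fd_out != NULL);

    fd = memfd_create(name, 0);
    if (fd < 0) {
        return NULL;
    }
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return NULL;
    }
    /* MAP_SHARED makes stores visible through the fd, not only the mapping */
    addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return addr;
}
```

Because the mapping is always MAP_SHARED, memory obtained this way is shared regardless of what the caller asked for, which is the point behind the RAM_SHARED discussion above.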

>     +                addr = file_ram_alloc(new_block, maxlen, mfd,
>     +                                      false, false, 0, errp);
>     +                trace_anon_memfd_alloc(name, maxlen, addr, mfd);
>     +            } else {
>     +                addr = qemu_anon_ram_alloc(maxlen, &mr->align,
>     +                                           shared, noreserve);
>     +            }
>     +
>     +            if (!addr) {
>                      error_setg_errno(errp, errno,
>                                       "cannot set up guest memory '%s'",
>     -                                 memory_region_name(new_block->mr));
>     +                                 name);
>                      qemu_mutex_unlock_ramlist();
>                      return;
>                  }
>     -            memory_try_enable_merging(new_block->host, new_block->max_length);
>     +            memory_try_enable_merging(addr, maxlen);
>     +            new_block->host = addr;
>              }
>          }
> 
>     diff --git a/trace-events b/trace-events
>     index 765fe25..6dbcd0e 100644
>     --- a/trace-events
>     +++ b/trace-events
>     @@ -40,6 +40,7 @@ ram_block_discard_range(const char *rbname, void *hva, size_t length, bool need_
>      # accel/tcg/cputlb.c
>      memory_notdirty_write_access(uint64_t vaddr, uint64_t ram_addr, unsigned size) "0x%" PRIx64 " ram_addr 0x%" PRIx64 " size %u"
>      memory_notdirty_set_dirty(uint64_t vaddr) "0x%" PRIx64
>     +anon_memfd_alloc(const char *name, size_t size, void *ptr, int fd) "%s size %zu ptr %p fd %d"
> 
>      # gdbstub.c
>      gdbstub_op_start(const char *device) "Starting gdbstub using device %s"
>     diff --git a/util/qemu-config.c b/util/qemu-config.c
>     index 84ee6dc..6162b4d 100644
>     --- a/util/qemu-config.c
>     +++ b/util/qemu-config.c
>     @@ -207,6 +207,10 @@ static QemuOptsList machine_opts = {
>                  .type = QEMU_OPT_BOOL,
>                  .help = "enable/disable memory merge support",
>              },{
>     +            .name = "memfd-alloc",
>     +            .type = QEMU_OPT_BOOL,
>     +            .help = "enable/disable memfd_create for anonymous memory",
>     +        },{
>                  .name = "usb",
>                  .type = QEMU_OPT_BOOL,
>                  .help = "Set on/off to enable/disable usb",
>     -- 
>     1.8.3.1
> 
> 
> 
> 
> -- 
> Marc-André Lureau



* Re: [PATCH V5 08/25] vl: add helper to request re-exec
  2021-07-08 14:31   ` Marc-André Lureau
@ 2021-07-12 17:07     ` Steven Sistare
  0 siblings, 0 replies; 74+ messages in thread
From: Steven Sistare @ 2021-07-12 17:07 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

On 7/8/2021 10:31 AM, Marc-André Lureau wrote:
> Hi
> 
> On Wed, Jul 7, 2021 at 9:46 PM Steve Sistare <steven.sistare@oracle.com> wrote:
> 
>     Add a qemu_system_exec_request() hook that causes the main loop to exit and
>     re-exec qemu using the specified arguments.
> 
> 
> I assume it works ok with -sandbox on,spawn=allow ?

Yes, I tested that.

Will do for all below.

- Steve
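
The review comments quoted below ask for a generic list-to-argv conversion and for argv to be freed if exec fails.  A minimal sketch of both pieces in plain C follows.  struct str_node is a hypothetical stand-in for QEMU's strList, and strv_from_list/strv_free are illustrative names, not existing QEMU or GLib functions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a singly linked list of strings */
struct str_node {
    struct str_node *next;
    const char *value;
};

/* Free a NULL-terminated string vector and every string in it */
static void strv_free(char **argv)
{
    if (!argv) {
        return;
    }
    for (char **p = argv; *p; p++) {
        free(*p);
    }
    free(argv);
}

/* Copy a list into a newly allocated NULL-terminated vector, or NULL on OOM */
static char **strv_from_list(const struct str_node *args)
{
    const struct str_node *a;
    size_t n = 0, i = 0;
    char **argv;

    for (a = args; a; a = a->next) {
        n++;
    }
    /* calloc zeroes the vector, so it is NULL-terminated at every step */
    argv = calloc(n + 1, sizeof(*argv));
    if (!argv) {
        return NULL;
    }
    for (a = args; a; a = a->next) {
        size_t len = strlen(a->value) + 1;

        argv[i] = malloc(len);
        if (!argv[i]) {
            strv_free(argv);
            return NULL;
        }
        memcpy(argv[i], a->value, len);
        i++;
    }
    return argv;
}
```

A caller could pass the result to execvp() and, since execvp() returns only on failure, call strv_free() and report the error at that point instead of exiting.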

>     Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>     ---
>      include/sysemu/runstate.h |  1 +
>      softmmu/runstate.c        | 37 +++++++++++++++++++++++++++++++++++++
>      2 files changed, 38 insertions(+)
> 
>     diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
>     index ed4b735..e1ae7e5 100644
>     --- a/include/sysemu/runstate.h
>     +++ b/include/sysemu/runstate.h
>     @@ -57,6 +57,7 @@ void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
>      void qemu_register_wakeup_notifier(Notifier *notifier);
>      void qemu_register_wakeup_support(void);
>      void qemu_system_shutdown_request(ShutdownCause reason);
>     +void qemu_system_exec_request(strList *args);
>      void qemu_system_powerdown_request(void);
>      void qemu_register_powerdown_notifier(Notifier *notifier);
>      void qemu_register_shutdown_notifier(Notifier *notifier);
>     diff --git a/softmmu/runstate.c b/softmmu/runstate.c
>     index 7fe4967..8474a01 100644
>     --- a/softmmu/runstate.c
>     +++ b/softmmu/runstate.c
>     @@ -355,6 +355,7 @@ static NotifierList wakeup_notifiers =
>      static NotifierList shutdown_notifiers =
>          NOTIFIER_LIST_INITIALIZER(shutdown_notifiers);
>      static uint32_t wakeup_reason_mask = ~(1 << QEMU_WAKEUP_REASON_NONE);
>     +static char **exec_argv;
> 
>      ShutdownCause qemu_shutdown_requested_get(void)
>      {
>     @@ -371,6 +372,11 @@ static int qemu_shutdown_requested(void)
>          return qatomic_xchg(&shutdown_requested, SHUTDOWN_CAUSE_NONE);
>      }
> 
>     +static int qemu_exec_requested(void)
>     +{
>     +    return exec_argv != NULL;
>     +}
>     +
>      static void qemu_kill_report(void)
>      {
>          if (!qtest_driver() && shutdown_signal) {
>     @@ -645,6 +651,32 @@ void qemu_system_shutdown_request(ShutdownCause reason)
>          qemu_notify_event();
>      }
> 
>     +static char **make_argv(strList *args)
> 
> 
> I'd suggest making it a generic strv_from_strList() function. Take const as argument too.
>  
> 
>     +{
>     +    strList *arg;
>     +    char **argv;
>     +    int n = 1, i = 0;
>     +
>     +    for (arg = args; arg != NULL; arg = arg->next) {
>     +        n++;
>     +    }
> 
> 
> We could use a QAPI_LIST_LENGTH() in qapi/util.h
> 
>     +
>     +    argv = g_malloc(n * sizeof(char *));
>     +    for (arg = args; arg != NULL; arg = arg->next) {
>     +        argv[i++] = g_strdup(arg->value);
>     +    }
>     +    argv[i] = NULL;
>     +
>     +    return argv;
>     +}
>     +
>     +void qemu_system_exec_request(strList *args)
> 
> 
> const args, and documentation could help.
> 
>     +{
>     +    exec_argv = make_argv(args);
>     +    shutdown_requested = 1;
>     +    qemu_notify_event();
>     +}
>     +
>      static void qemu_system_powerdown(void)
>      {
>          qapi_event_send_powerdown();
>     @@ -693,6 +725,11 @@ static bool main_loop_should_exit(void)
>          }
>          request = qemu_shutdown_requested();
>          if (request) {
>     +
>     +        if (qemu_exec_requested()) {
>     +            execvp(exec_argv[0], exec_argv);
>     +            error_setg_errno(&error_fatal, errno, "execvp failed");
> 
> 
> Can this be handled more gracefully instead?
> 
> g_strfreev the argv and report an error?
>  
> 
>     +        }
>              qemu_kill_report();
>              qemu_system_shutdown(request);
>              if (shutdown_action == SHUTDOWN_ACTION_PAUSE) {
>     -- 
>     1.8.3.1
> 
> 
> 
> 
> -- 
> Marc-André Lureau



* Re: [PATCH V5 07/25] machine: memfd-alloc option
  2021-07-12 17:07     ` Steven Sistare
@ 2021-07-12 17:45       ` Marc-André Lureau
  0 siblings, 0 replies; 74+ messages in thread
From: Marc-André Lureau @ 2021-07-12 17:45 UTC (permalink / raw)
  To: Steven Sistare
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

Hi

On Mon, Jul 12, 2021 at 9:07 PM Steven Sistare <steven.sistare@oracle.com>
wrote:

> On 7/8/2021 10:20 AM, Marc-André Lureau wrote:
> > Hi
> >
> > On Wed, Jul 7, 2021 at 9:39 PM Steve Sistare <steven.sistare@oracle.com> wrote:
> >
> >     Allocate anonymous memory using memfd_create if the memfd-alloc machine
> >     option is set.
> >
> >
> > Nice, I'd suggest you send this patch separately. (we had discussions about an option like this several times)
>
> I would like to keep it with this series to make sure it meets our needs as the patches are
> reviewed and evolve.  We can always push it solo later if the series stalls.
>
> >     +                new_block->flags |= RAM_SHARED;
> >
> >
> > I wonder if ram_backend_memory_alloc() shouldn't be updated to reflect that the memory backend is "share" = true.
>
> It already does this:
>   ram_flags = backend->share ? RAM_SHARED : 0;
> Did you have something else in mind?
>

I mean the backend->share value should be updated, as it's always
RAM_SHARED.



-- 
Marc-André Lureau


* Re: [PATCH V5 10/25] util: env var helpers
  2021-07-08 15:10   ` Marc-André Lureau
@ 2021-07-12 19:19     ` Steven Sistare
  2021-07-12 19:36       ` Marc-André Lureau
  0 siblings, 1 reply; 74+ messages in thread
From: Steven Sistare @ 2021-07-12 19:19 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

On 7/8/2021 11:10 AM, Marc-André Lureau wrote:
> Hi
> 
> On Wed, Jul 7, 2021 at 9:30 PM Steve Sistare <steven.sistare@oracle.com> wrote:
> 
>     Add functions for saving fd's and other values in the environment via
>     setenv, and for reading them back via getenv.
> 
> 
> I understand that the rest of the series will rely on environment variables to associate and recover the child-passed FDs, but I am not really convinced that it is a good idea.
> 
> Environment variables have a number of issues that we may encounter down the road: namespace, limits, concurrency, observability, etc. I wonder if the VMState couldn't have a section about the FDs to recover. Or maybe just another shared memory region?

They also have some advantages.  Their post-exec values can be observed via /proc/$pid/environ,
and modified values can be observed by calling printenv() in a debugger.  They are naturally carried
across exec, with no external file to create and potentially lose.  Lastly, libc already provides
set and get functions (setenv, getenv), so the additional layered code is small and simple.  The
number of variables is small, and I would rather not over-engineer an alternate solution until the
environment proves inadequate.  The limits on environment size are huge on Linux.  The limits are
smaller on Windows, but that is just one of multiple issues to be addressed to support live update
on Windows.

For the alternatives, shared memory is no more observable (maybe less) and also has no concurrency
protection.  VMstate does not help because the descriptors are needed before the vmstate file
is opened.
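
As a rough illustration of the mechanism being debated, the sketch below saves a descriptor number in the environment and reads it back, checking for truncation and parse errors along the lines of the review comments.  It assumes a POSIX libc.  The names save_fd and load_fd are hypothetical; only the QEMU_FD_ prefix is taken from the patch.

```c
#define _POSIX_C_SOURCE 200809L /* for setenv() */
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#define FD_PREFIX "QEMU_FD_"    /* prefix used by the patch */

/* Build "QEMU_FD_<name>" in var; returns 0, or -1 if it does not fit */
static int fd_var_name(char *var, size_t size, const char *name)
{
    int n = snprintf(var, size, "%s%s", FD_PREFIX, name);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Hypothetical: record fd under name so it can be found again after exec */
static int save_fd(const char *name, int fd)
{
    char var[128], val[16];

    assert(name != NULL);
    if (fd < 0 || fd_var_name(var, sizeof(var), name) < 0) {
        return -1;
    }
    snprintf(val, sizeof(val), "%d", fd);
    return setenv(var, val, 1);
}

/* Hypothetical: return the fd recorded under name, or -1 if absent/invalid */
static int load_fd(const char *name)
{
    char var[128], *end;
    const char *val;
    long v;

    assert(name != NULL);
    if (fd_var_name(var, sizeof(var), name) < 0) {
        return -1;
    }
    val = getenv(var);
    if (!val) {
        return -1;
    }
    errno = 0;
    v = strtol(val, &end, 10);
    if (errno || end == val || *end != '\0' || v < 0 || v > INT_MAX) {
        return -1;
    }
    return (int)v;
}
```

The descriptor itself survives exec only if FD_CLOEXEC is clear on it; the environment variable merely tells the new process which number to look for.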
 
> Some comments below. These new utils could also have some unit tests.

OK.

>     Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>     ---
>      MAINTAINERS        |  2 ++
>      include/qemu/env.h | 23 +++++++++++++
>      util/env.c         | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>      util/meson.build   |  1 +
>      4 files changed, 121 insertions(+)
>      create mode 100644 include/qemu/env.h
>      create mode 100644 util/env.c
> 
>     diff --git a/MAINTAINERS b/MAINTAINERS
>     index c48dd37..8647a97 100644
>     --- a/MAINTAINERS
>     +++ b/MAINTAINERS
>     @@ -2865,6 +2865,8 @@ S: Maintained
>      F: include/migration/cpr.h
>      F: migration/cpr.c
>      F: qapi/cpr.json
>     +F: include/qemu/env.h
>     +F: util/env.c
> 
>      Record/replay
>     M: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
>     diff --git a/include/qemu/env.h b/include/qemu/env.h
>     new file mode 100644
>     index 0000000..3dad503
>     --- /dev/null
>     +++ b/include/qemu/env.h
>     @@ -0,0 +1,23 @@
>     +/*
>     + * Copyright (c) 2021 Oracle and/or its affiliates.
>     + *
>     + * This work is licensed under the terms of the GNU GPL, version 2.
>     + * See the COPYING file in the top-level directory.
>     + *
>     + */
>     +
>     +#ifndef QEMU_ENV_H
>     +#define QEMU_ENV_H
>     +
>     +#define FD_PREFIX "QEMU_FD_"
>     +
>     +typedef int (*walkenv_cb)(const char *name, const char *val, void *handle);
>     +
>     +int getenv_fd(const char *name);
>     +void setenv_fd(const char *name, int fd);
>     +void unsetenv_fd(const char *name);
>     +void unsetenv_fdv(const char *fmt, ...);
>     +int walkenv(const char *prefix, walkenv_cb cb, void *handle);
>     +void printenv(void);
> 
> 
> Please use qemu prefix, that avoids potential confusion with system libraries.
> 
>     +
>     +#endif
>     diff --git a/util/env.c b/util/env.c
>     new file mode 100644
>     index 0000000..863678d
>     --- /dev/null
>     +++ b/util/env.c
>     @@ -0,0 +1,95 @@
>     +/*
>     + * Copyright (c) 2021 Oracle and/or its affiliates.
>     + *
>     + * This work is licensed under the terms of the GNU GPL, version 2.
>     + * See the COPYING file in the top-level directory.
>     + */
>     +
>     +#include "qemu/osdep.h"
>     +#include "qemu/cutils.h"
>     +#include "qemu/env.h"
>     +
>     +static uint64_t getenv_ulong(const char *prefix, const char *name, int *err)
>     +{
>     +    char var[80], *val;
>     +    uint64_t res = 0;
>     +
>     +    snprintf(var, sizeof(var), "%s%s", prefix, name);
> 
> 
> No check for success / truncation...
> 
> Please use g_autofree char *var = g_strdup_printf()..
> 
>     +    val = getenv(var);
> 
> 
> For consistency, I'd use g_getenv()
> 
>     +    if (val) {
>     +        *err = qemu_strtoul(val, NULL, 10, &res);
>     +    } else {
>     +        *err = -ENOENT;
>     +    }
>     +    return res;
>     +}
>     +
>     +static void setenv_ulong(const char *prefix, const char *name, uint64_t val)
>     +{
>     +    char var[80], val_str[80];
>     +    snprintf(var, sizeof(var), "%s%s", prefix, name);
>     +    snprintf(val_str, sizeof(val_str), "%"PRIu64, val);
> 
> 
> g_strdup_printf
> 
>     +    setenv(var, val_str, 1);
> 
> 
> g_setenv(), and return error value (or assert() if that makes more sense)
> 
>     +}
>     +
>     +static void unsetenv_ulong(const char *prefix, const char *name)
>     +{
>     +    char var[80];
>     +    snprintf(var, sizeof(var), "%s%s", prefix, name);
> 
> 
> g_strdup_printf
>  
> 
>     +    unsetenv(var);
> 
> 
> g_unsetenv
> 
>     +}
>     +
>     +int getenv_fd(const char *name)
>     +{
>     +    int err;
>     +    int fd = getenv_ulong(FD_PREFIX, name, &err);
> 
> 
> I'd try to use qemu_parse_fd() instead.
> 
>     +    return err ? -1 : fd;
>     +}
>     +
>     +void setenv_fd(const char *name, int fd)
>     +{
> 
> 
> Maybe check fd >= 0 ?
> 
>     +    setenv_ulong(FD_PREFIX, name, fd);
>     +}
>     +
>     +void unsetenv_fd(const char *name)
>     +{
>     +    unsetenv_ulong(FD_PREFIX, name);
>     +}
>     +
>     +void unsetenv_fdv(const char *fmt, ...)
>     +{
>     +    va_list args;
>     +    char buf[80];
>     +    va_start(args, fmt);
>     +    vsnprintf(buf, sizeof(buf), fmt, args);
>     +    va_end(args);
> 
> 
> That seems to be a leftover.

It is called in the subsequent vfio cpr patches.

>     +}
>     +
>     +int walkenv(const char *prefix, walkenv_cb cb, void *handle)
> 
>     +{
>     +    char *str, name[128];
>     +    char **envp = environ;
>     +    size_t prefix_len = strlen(prefix);
>     +
>     +    while (*envp) {
>     +        str = *envp++;
>     +        if (!strncmp(str, prefix, prefix_len)) {
> 
>     +            char *val = strchr(str, '=');
>     +            str += prefix_len;
>     +            strncpy(name, str, val - str);
> 
> 
> g_strndup() to avoid potential buffer overflow.
> 
>     +            name[val - str] = 0;
>     +            if (cb(name, val + 1, handle)) {
>     +                return 1;
>     +            }
>     +        }
>     +    }
>     +    return 0;
>     +}
>     +
>     +void printenv(void)
>     +{
>     +    char **ptr = environ;
>     +    while (*ptr) {
>     +        puts(*ptr++);
>     +    }
>     +}
> 
> 
> Is this really useful? I doubt it.

I call it from gdb for debugging, but I can delete it and cast g_listenv() instead:
  print *(((char ** (*)(void))g_listenv)())@100

Will do on the rest.
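
To illustrate the two safety points raised above (the unchecked snprintf
truncation, and the unbounded strncpy in walkenv), here is a minimal
libc-only sketch.  The names env_var_name, env_walk, env_cb and count_cb are
hypothetical and are not part of the patch; the next revision would
presumably use the glib helpers suggested in the review instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Hypothetical callback type, mirroring walkenv_cb from the patch. */
typedef int (*env_cb)(const char *name, const char *val, void *handle);

/*
 * Build "<prefix><name>" into buf.  Returns 0 on success, or
 * -ENAMETOOLONG if the result would not fit (snprintf truncation).
 */
int env_var_name(char *buf, size_t len, const char *prefix, const char *name)
{
    int n;

    assert(buf && prefix && name);
    n = snprintf(buf, len, "%s%s", prefix, name);
    if (n < 0 || (size_t)n >= len) {
        return -ENAMETOOLONG;
    }
    return 0;
}

/*
 * Call cb for each environment entry whose name starts with prefix.
 * Names that do not fit in the local buffer are skipped rather than
 * copied past its end.  Returns 1 if a callback stopped the walk, else 0.
 */
int env_walk(const char *prefix, env_cb cb, void *handle)
{
    char name[128];
    size_t prefix_len = strlen(prefix);
    char **envp;

    for (envp = environ; *envp; envp++) {
        const char *str = *envp;
        const char *eq;
        size_t name_len;

        if (strncmp(str, prefix, prefix_len)) {
            continue;
        }
        str += prefix_len;
        eq = strchr(str, '=');
        if (!eq) {
            continue;               /* malformed entry, no '=' */
        }
        name_len = (size_t)(eq - str);
        if (name_len >= sizeof(name)) {
            continue;               /* would overflow name[] */
        }
        memcpy(name, str, name_len);
        name[name_len] = '\0';
        if (cb(name, eq + 1, handle)) {
            return 1;
        }
    }
    return 0;
}

/* Example callback: counts matches through an int passed as handle. */
int count_cb(const char *name, const char *val, void *handle)
{
    (void)name;
    (void)val;
    ++*(int *)handle;
    return 0;
}
```

Skipping over-long names is only one possible policy; returning an error to
the caller would be equally reasonable.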

- Steve

>     diff --git a/util/meson.build b/util/meson.build
>     index 0ffd7f4..5e8097a 100644
>     --- a/util/meson.build
>     +++ b/util/meson.build
>     @@ -23,6 +23,7 @@ util_ss.add(files('host-utils.c'))
>      util_ss.add(files('bitmap.c', 'bitops.c'))
>      util_ss.add(files('fifo8.c'))
>      util_ss.add(files('cacheinfo.c', 'cacheflush.c'))
>     +util_ss.add(files('env.c'))
>      util_ss.add(files('error.c', 'qemu-error.c'))
>      util_ss.add(files('qemu-print.c'))
>      util_ss.add(files('id.c'))
>     -- 
>     1.8.3.1
> 
> 
> 
> 
> -- 
> Marc-André Lureau



* Re: [PATCH V5 11/25] cpr: restart mode
  2021-07-08 15:54     ` Marc-André Lureau
@ 2021-07-12 19:19       ` Steven Sistare
  0 siblings, 0 replies; 74+ messages in thread
From: Steven Sistare @ 2021-07-12 19:19 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

On 7/8/2021 11:54 AM, Marc-André Lureau wrote:
> On Thu, Jul 8, 2021 at 7:43 PM Marc-André Lureau <marcandre.lureau@gmail.com <mailto:marcandre.lureau@gmail.com>> wrote:
> 
>     Hi
> 
>     On Wed, Jul 7, 2021 at 9:31 PM Steve Sistare <steven.sistare@oracle.com <mailto:steven.sistare@oracle.com>> wrote:
> 
>         Provide the cprsave restart mode, which preserves the guest VM across a
>         restart of the qemu process.  After cprsave, the caller passes qemu
>         command-line arguments to cprexec, which directly exec's the new qemu
>         binary.  The arguments must include -S so new qemu starts in a paused state.
>         The caller resumes the guest by calling cprload.
> 
>         To use the restart mode, qemu must be started with the memfd-alloc machine
>         option.  The memfd's are saved to the environment and kept open across exec,
>         after which they are found from the environment and re-mmap'd.  Hence guest
>         ram is preserved in place, albeit with new virtual addresses in the qemu
>         process.
> 
>         The restart mode supports vfio devices in a subsequent patch.
> 
>         Signed-off-by: Steve Sistare <steven.sistare@oracle.com <mailto:steven.sistare@oracle.com>>
> 
> 
>     What's the plan to make it work with -object memory-backend-memfd -machine memory-backend?
>     (or memory-backend-file, I guess that should work?)
> 
> 
> It seems to be addressed in some way in a later "hostmem-memfd: cpr support" patch. 

Correct, but in both cases you also need the memfd-alloc machine option so that misc small 
segments are preserved.  For some discussion see:
  https://lore.kernel.org/qemu-devel/YKPEWicpOeh3yo5%2F@stefanha-x1.localdomain/

> Imho it's worth mentioning in the commit message, and reorganizing the patches so they are closer together.

OK.

> And the checks should be added anyway for unsupported configurations.

The only-cpr-capable option in the next to last patch performs those checks.

>     There should be some extra checks before accepting cprexec() on a misconfigured VM.
> 
>         ---
>          migration/cpr.c   | 21 +++++++++++++++++++++
>          softmmu/physmem.c |  6 +++++-
>          2 files changed, 26 insertions(+), 1 deletion(-)
> 
>         diff --git a/migration/cpr.c b/migration/cpr.c
>         index c5bad8a..fb57dec 100644
>         --- a/migration/cpr.c
>         +++ b/migration/cpr.c
>         @@ -29,6 +29,7 @@
>          #include "sysemu/xen.h"
>          #include "hw/vfio/vfio-common.h"
>          #include "hw/virtio/vhost.h"
>         +#include "qemu/env.h"
> 
>          QEMUFile *qf_file_open(const char *path, int flags, int mode,
>                                        const char *name, Error **errp)
>         @@ -108,6 +109,26 @@ done:
>              return;
>          }
> 
>         +static int preserve_fd(const char *name, const char *val, void *handle)
>         +{
>         +    qemu_clr_cloexec(atoi(val));
>         +    return 0;
>         +}
>         +
>         +void cprexec(strList *args, Error **errp)
>         +{
>         +    if (xen_enabled()) {
>         +        error_setg(errp, "xen does not support cprexec");
>         +        return;
>         +    }
>         +    if (!runstate_check(RUN_STATE_SAVE_VM)) {
>         +        error_setg(errp, "runstate is not save-vm");
>         +        return;
>         +    }
>         +    walkenv(FD_PREFIX, preserve_fd, 0);
> 
> 
>     I am  not convinced that relying on environment variables here is the best thing to do.
> 
>         +    qemu_system_exec_request(args);
>         +}
>         +
>          void cprload(const char *file, Error **errp)
>          {
>              QEMUFile *f;
>         diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>         index b149250..8a65ef7 100644
>         --- a/softmmu/physmem.c
>         +++ b/softmmu/physmem.c
>         @@ -65,6 +65,7 @@
>          #include "qemu/pmem.h"
> 
>          #include "qemu/memfd.h"
>         +#include "qemu/env.h"
>          #include "migration/vmstate.h"
> 
>          #include "qemu/range.h"
>         @@ -1986,7 +1987,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>                  } else {
>                      name = memory_region_name(new_block->mr);
>                      if (ms->memfd_alloc) {
> 
> 
> 
>         -                int mfd = -1;          /* placeholder until next patch */
>         +                int mfd = getenv_fd(name);
>                          mr->align = QEMU_VMALLOC_ALIGN;
>                          if (mfd < 0) {
>                              mfd = qemu_memfd_create(name, maxlen + mr->align,
>         @@ -1994,7 +1995,9 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>                              if (mfd < 0) {
>                                  return;
>                              }
>         +                    setenv_fd(name, mfd);
>                          }
>         +                qemu_clr_cloexec(mfd);
> 
> 
>     Why clear it now, and on exec again?

That's a bug, thanks.  This should be qemu_set_cloexec(), so the mfd is closed for
any misc fork/exec calls prior to cprexec.
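
A minimal sketch of the intended descriptor lifecycle, using plain fcntl()
rather than the qemu_set_cloexec()/qemu_clr_cloexec() wrappers.  The names
fd_set_cloexec and fd_is_cloexec are hypothetical, for illustration only:
the memfd is marked close-on-exec when it is created, so unrelated fork/exec
calls do not inherit it, and the flag is cleared only just before cprexec.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Set (on != 0) or clear (on == 0) FD_CLOEXEC.  Returns 0, or -1 on error. */
int fd_set_cloexec(int fd, int on)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFD);
    if (flags < 0) {
        return -1;
    }
    flags = on ? (flags | FD_CLOEXEC) : (flags & ~FD_CLOEXEC);
    return fcntl(fd, F_SETFD, flags) < 0 ? -1 : 0;
}

/* Returns 1 if fd is closed on exec, 0 if it survives exec, -1 on error. */
int fd_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    return flags < 0 ? -1 : !!(flags & FD_CLOEXEC);
}
```

In ram_block_add this would correspond to fd_set_cloexec(mfd, 1), with
preserve_fd doing fd_set_cloexec(fd, 0) for each saved descriptor at cprexec
time.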

- Steve


>                          new_block->flags |= RAM_SHARED;
>                          addr = file_ram_alloc(new_block, maxlen, mfd,
>                                                false, false, 0, errp);
>         @@ -2246,6 +2249,7 @@ void qemu_ram_free(RAMBlock *block)
>              }
> 
>              qemu_mutex_lock_ramlist();
>         +    unsetenv_fd(memory_region_name(block->mr));
>              QLIST_REMOVE_RCU(block, next);
>              ram_list.mru_block = NULL;
>              /* Write list before version */
>         -- 
>         1.8.3.1
> 
> 
> 
> 
>     -- 
>     Marc-André Lureau
> 
> 
> 
> -- 
> Marc-André Lureau



* Re: [PATCH V5 12/25] cpr: QMP interfaces for restart
  2021-07-08 15:49   ` Marc-André Lureau
@ 2021-07-12 19:19     ` Steven Sistare
  0 siblings, 0 replies; 74+ messages in thread
From: Steven Sistare @ 2021-07-12 19:19 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

On 7/8/2021 11:49 AM, Marc-André Lureau wrote:
> Hi
> 
> On Wed, Jul 7, 2021 at 9:33 PM Steve Sistare <steven.sistare@oracle.com <mailto:steven.sistare@oracle.com>> wrote:
> 
>     cprexec calls cprexec().  Syntax:
>       { 'command': 'cprexec', 'data': { 'argv': [ 'str' ] } }
> 
>     Add the restart mode:
>       { 'enum': 'CprMode', 'data': [ 'reboot', 'restart' ] }
> 
>     Signed-off-by: Steve Sistare <steven.sistare@oracle.com <mailto:steven.sistare@oracle.com>>
>     ---
>      monitor/qmp-cmds.c |  5 +++++
>      qapi/cpr.json      | 16 +++++++++++++++-
>      2 files changed, 20 insertions(+), 1 deletion(-)
> 
>     diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
>     index 1128604..7326f7d 100644
>     --- a/monitor/qmp-cmds.c
>     +++ b/monitor/qmp-cmds.c
>     @@ -179,6 +179,11 @@ void qmp_cprsave(const char *file, CprMode mode, Error **errp)
>          cprsave(file, mode, errp);
>      }
> 
>     +void qmp_cprexec(strList *args, Error **errp)
>     +{
>     +    cprexec(args, errp);
>     +}
>     +
>      void qmp_cprload(const char *file, Error **errp)
>      {
>          cprload(file, errp);
>     diff --git a/qapi/cpr.json b/qapi/cpr.json
>     index b6fdc89..2467e48 100644
>     --- a/qapi/cpr.json
>     +++ b/qapi/cpr.json
>     @@ -16,10 +16,12 @@
>      #
>      # @reboot: checkpoint can be cprload'ed after a host kexec reboot.
>      #
>     +# @restart: checkpoint can be cprload'ed after restarting qemu.
>     +#
>      # Since: 6.1
>      ##
>      { 'enum': 'CprMode',
>     -  'data': [ 'reboot' ] }
>     +  'data': [ 'reboot', 'restart' ] }
> 
> 
>      ##
>     @@ -61,6 +63,18 @@
>                  'mode': 'CprMode' } }
> 
>      ##
>     +# @cprexec:
>     +#
>     +# Restart qemu.
>     +#
>     +# @argv: arguments to exec
> 
> 
> Why is it not then called cpr-restart ? 

I'll change the description.  exec is the key aspect to convey.

> Why does it take the whole argv?

It takes the whole argv because the caller may provide a prefix command to
modify the process context before executing qemu.  We do that.

> Could argv be made optional?

If argv is omitted, I could exec the qemu binary with no args, but I don't think that 
would be useful.  It may even be confusing, if the caller has a bug and passes no args;
qemu would start and do nothing, rather than fail with an "exec failed" message.
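
As a sketch of why a mandatory argv is simple to handle, the list can be
flattened into a NULL-terminated array for execvp(), with an empty list
rejected up front.  struct str_node and list_to_argv are hypothetical
stand-ins (for QEMU's strList and whatever qemu_system_exec_request does
internally), not code from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for QEMU's strList. */
struct str_node {
    struct str_node *next;
    char *value;
};

/*
 * Flatten a list into a NULL-terminated argv array suitable for execvp().
 * Returns NULL if the list is empty or allocation fails.  The caller frees
 * the array; the strings remain owned by the list.
 */
char **list_to_argv(const struct str_node *args)
{
    const struct str_node *p;
    size_t n = 0, i = 0;
    char **argv;

    for (p = args; p; p = p->next) {
        n++;
    }
    if (n == 0) {
        return NULL;                /* nothing to exec: report an error */
    }
    argv = calloc(n + 1, sizeof(*argv));
    if (!argv) {
        return NULL;
    }
    for (p = args; p; p = p->next) {
        argv[i++] = p->value;
    }
    assert(i == n);                 /* argv[n] is already NULL from calloc */
    return argv;
}
```

With a prefix command, argv[0] is simply that command and the qemu binary
appears later in the list, which is why the whole vector is passed.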

- Steve

>     +#
>     +# Since: 6.1
>     +##
>     +{ 'command': 'cprexec',
>     +  'data': { 'argv': [ 'str' ] } }
>     +
>     +##
>      # @cprload:
>      #
>      # Start virtual machine from checkpoint file that was created earlier using
>     -- 
>     1.8.3.1
> 
> 
> 
> 
> -- 
> Marc-André Lureau



* Re: [PATCH V5 20/25] chardev: cpr framework
  2021-07-08 16:03   ` Marc-André Lureau
@ 2021-07-12 19:20     ` Steven Sistare
  2021-07-12 19:49       ` Marc-André Lureau
  0 siblings, 1 reply; 74+ messages in thread
From: Steven Sistare @ 2021-07-12 19:20 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

On 7/8/2021 12:03 PM, Marc-André Lureau wrote:
> Hi
> 
> On Wed, Jul 7, 2021 at 9:37 PM Steve Sistare <steven.sistare@oracle.com <mailto:steven.sistare@oracle.com>> wrote:
> 
>     Add QEMU_CHAR_FEATURE_CPR for devices that support cpr.
>     Add the chardev close_on_cpr option for devices that can be closed on cpr
>     and reopened after exec.
>     cpr is allowed only if either QEMU_CHAR_FEATURE_CPR or close_on_cpr is set
>     for all chardevs in the configuration.
> 
> 
> Why not do the right thing by default?

Char devices with buffering in the qemu process do not support cpr, as there is no general mechanism 
for saving and restoring the buffer and synchronizing that with device operation.  In theory vmstate 
could provide that mechanism, but sync'ing the device with vmstate operations would be non-trivial, 
as every device handles it differently, and I did not tackle it.  However, some very useful devices 
do not buffer, and do support cpr, so I introduce QEMU_CHAR_FEATURE_CPR to identify them.  CPR support
can be incrementally added to more devices in the future via this mechanism.

> Could use some tests in tests/unit/test-char.c

OK, I'll check it out.  I have deferred adding unit tests until I get more buy-in on the patch series.

>     Signed-off-by: Steve Sistare <steven.sistare@oracle.com <mailto:steven.sistare@oracle.com>>
>     ---
>      chardev/char.c         | 41 ++++++++++++++++++++++++++++++++++++++---
>      include/chardev/char.h |  5 +++++
>      migration/cpr.c        |  3 +++
>      qapi/char.json         |  5 ++++-
>      qemu-options.hx        | 26 ++++++++++++++++++++++----
>      5 files changed, 72 insertions(+), 8 deletions(-)
> 
>     diff --git a/chardev/char.c b/chardev/char.c
>     index d959eec..f10fb94 100644
>     --- a/chardev/char.c
>     +++ b/chardev/char.c
>     @@ -36,6 +36,7 @@
>      #include "qemu/help_option.h"
>      #include "qemu/module.h"
>      #include "qemu/option.h"
>     +#include "qemu/env.h"
>      #include "qemu/id.h"
>      #include "qemu/coroutine.h"
>      #include "qemu/yank.h"
>     @@ -239,6 +240,9 @@ static void qemu_char_open(Chardev *chr, ChardevBackend *backend,
>          ChardevClass *cc = CHARDEV_GET_CLASS(chr);
>          /* Any ChardevCommon member would work */
>          ChardevCommon *common = backend ? backend->u.null.data : NULL;
>     +    char fdname[40];
> 
> 
> Please use g_autoptr char *fdname = NULL; & g_strdup_printf()

Will do.  
(the glib functions are new to me, and my fingers do not automatically type them).

>     +
>     +    chr->close_on_cpr = (common && common->close_on_cpr);
> 
>          if (common && common->has_logfile) {
>              int flags = O_WRONLY | O_CREAT;
>     @@ -248,7 +252,14 @@ static void qemu_char_open(Chardev *chr, ChardevBackend *backend,
>              } else {
>                  flags |= O_TRUNC;
>              }
>     -        chr->logfd = qemu_open_old(common->logfile, flags, 0666);
>     +        snprintf(fdname, sizeof(fdname), "%s_log", chr->label);
>     +        chr->logfd = getenv_fd(fdname);
>     +        if (chr->logfd < 0) {
>     +            chr->logfd = qemu_open_old(common->logfile, flags, 0666);
>     +            if (!chr->close_on_cpr) {
>     +                setenv_fd(fdname, chr->logfd);
>     +            }
>     +        }
>              if (chr->logfd < 0) {
>                  error_setg_errno(errp, errno,
>                                   "Unable to open logfile %s",
>     @@ -300,11 +311,12 @@ static void char_finalize(Object *obj)
>          if (chr->be) {
>              chr->be->chr = NULL;
>          }
>     -    g_free(chr->filename);
>     -    g_free(chr->label);
>          if (chr->logfd != -1) {
>              close(chr->logfd);
>     +        unsetenv_fdv("%s_log", chr->label);
>          }
>     +    g_free(chr->filename);
>     +    g_free(chr->label);
>          qemu_mutex_destroy(&chr->chr_write_lock);
>      }
> 
>     @@ -504,6 +516,8 @@ void qemu_chr_parse_common(QemuOpts *opts, ChardevCommon *backend)
> 
>          backend->has_logappend = true;
>          backend->logappend = qemu_opt_get_bool(opts, "logappend", false);
>     +
>     +    backend->close_on_cpr = qemu_opt_get_bool(opts, "close-on-cpr", false);
> 
> 
> If set to true and the backend doesn't implement the CPR feature, it should raise an error.

Setting to true is the workaround for missing CPR support, so that cpr may still be performed.  
The device will be reopened post exec.  That is not as nice as transparently preserving the device, 
but is nicer than disallowing cpr because some device(s) of many do not support it.

>      }
> 
>      static const ChardevClass *char_get_class(const char *driver, Error **errp)
>     @@ -945,6 +959,9 @@ QemuOptsList qemu_chardev_opts = {
>              },{
>                  .name = "abstract",
>                  .type = QEMU_OPT_BOOL,
>     +        },{
>     +            .name = "close-on-cpr",
>     +            .type = QEMU_OPT_BOOL,
>      #endif
>              },
>              { /* end of list */ }
>     @@ -1212,6 +1229,24 @@ GSource *qemu_chr_timeout_add_ms(Chardev *chr, guint ms,
>          return source;
>      }
> 
>     +static int chr_cpr_capable(Object *obj, void *opaque)
>     +{
>     +    Chardev *chr = (Chardev *)obj;
>     +    Error **errp = opaque;
>     +
>     +    if (qemu_chr_has_feature(chr, QEMU_CHAR_FEATURE_CPR) || chr->close_on_cpr) {
> 
> 
> That'd be easy to misuse. Chardev should always explicitly support CPR feature (even if close_on_cpr is set)

Given my explanation at top, does this make sense now?

- Steve


>     +        return 0;
>     +    }
>     +    error_setg(errp, "error: chardev %s -> %s is not capable of cpr",
>     +               chr->label, chr->filename);
>     +    return 1;
>     +}
>     +
>     +bool qemu_chr_cpr_capable(Error **errp)
>     +{
>     +    return !object_child_foreach(get_chardevs_root(), chr_cpr_capable, errp);
>     +}
>     +
>      void qemu_chr_cleanup(void)
>      {
>          object_unparent(get_chardevs_root());
>     diff --git a/include/chardev/char.h b/include/chardev/char.h
>     index 7c0444f..e488ad1 100644
>     --- a/include/chardev/char.h
>     +++ b/include/chardev/char.h
>     @@ -50,6 +50,8 @@ typedef enum {
>          /* Whether the gcontext can be changed after calling
>           * qemu_chr_be_update_read_handlers() */
>          QEMU_CHAR_FEATURE_GCONTEXT,
>     +    /* Whether the device supports cpr */
>     +    QEMU_CHAR_FEATURE_CPR,
> 
>          QEMU_CHAR_FEATURE_LAST,
>      } ChardevFeature;
>     @@ -67,6 +69,7 @@ struct Chardev {
>          int be_open;
>          /* used to coordinate the chardev-change special-case: */
>          bool handover_yank_instance;
>     +    bool close_on_cpr;
>          GSource *gsource;
>          GMainContext *gcontext;
>          DECLARE_BITMAP(features, QEMU_CHAR_FEATURE_LAST);
>     @@ -291,4 +294,6 @@ void resume_mux_open(void);
>      /* console.c */
>      void qemu_chr_parse_vc(QemuOpts *opts, ChardevBackend *backend, Error **errp);
> 
>     +bool qemu_chr_cpr_capable(Error **errp);
>     +
>      #endif
>     diff --git a/migration/cpr.c b/migration/cpr.c
>     index 6333988..feff97f 100644
>     --- a/migration/cpr.c
>     +++ b/migration/cpr.c
>     @@ -138,6 +138,9 @@ void cprexec(strList *args, Error **errp)
>              error_setg(errp, "cprexec requires cprsave with restart mode");
>              return;
>          }
>     +    if (!qemu_chr_cpr_capable(errp)) {
>     +        return;
>     +    }
>          if (vfio_cprsave(errp)) {
>              return;
>          }
>     diff --git a/qapi/char.json b/qapi/char.json
>     index adf2685..5efaf59 100644
>     --- a/qapi/char.json
>     +++ b/qapi/char.json
>     @@ -204,12 +204,15 @@
>      # @logfile: The name of a logfile to save output
>      # @logappend: true to append instead of truncate
>      #             (default to false to truncate)
>     +# @close-on-cpr: if true, close device's fd on cprsave. defaults to false.
>     +#                since 6.1.
>      #
>      # Since: 2.6
>      ##
>      { 'struct': 'ChardevCommon',
>        'data': { '*logfile': 'str',
>     -            '*logappend': 'bool' } }
>     +            '*logappend': 'bool',
>     +            '*close-on-cpr': 'bool' } }
> 
>      ##
>      # @ChardevFile:
>     diff --git a/qemu-options.hx b/qemu-options.hx
>     index fa53734..d5ff45f 100644
>     --- a/qemu-options.hx
>     +++ b/qemu-options.hx
>     @@ -3134,43 +3134,57 @@ DEFHEADING(Character device options:)
> 
>      DEF("chardev", HAS_ARG, QEMU_OPTION_chardev,
>          "-chardev help\n"
>     -    "-chardev null,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     +    "-chardev null,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off][,close-on-cpr=on|off]\n"
>          "-chardev socket,id=id[,host=host],port=port[,to=to][,ipv4=on|off][,ipv6=on|off][,nodelay=on|off][,reconnect=seconds]\n"
>          "         [,server=on|off][,wait=on|off][,telnet=on|off][,websocket=on|off][,reconnect=seconds][,mux=on|off]\n"
>     -    "         [,logfile=PATH][,logappend=on|off][,tls-creds=ID][,tls-authz=ID] (tcp)\n"
>     +    "         [,logfile=PATH][,logappend=on|off][,tls-creds=ID][,tls-authz=ID][,close-on-cpr=on|off] (tcp)\n"
>          "-chardev socket,id=id,path=path[,server=on|off][,wait=on|off][,telnet=on|off][,websocket=on|off][,reconnect=seconds]\n"
>     -    "         [,mux=on|off][,logfile=PATH][,logappend=on|off][,abstract=on|off][,tight=on|off] (unix)\n"
>     +    "         [,mux=on|off][,logfile=PATH][,logappend=on|off][,abstract=on|off][,tight=on|off][,close-on-cpr=on|off] (unix)\n"
>          "-chardev udp,id=id[,host=host],port=port[,localaddr=localaddr]\n"
>          "         [,localport=localport][,ipv4=on|off][,ipv6=on|off][,mux=on|off]\n"
>     -    "         [,logfile=PATH][,logappend=on|off]\n"
>     +    "         [,logfile=PATH][,logappend=on|off][,close-on-cpr=on|off]\n"
>          "-chardev msmouse,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     +    "         [,close-on-cpr=on|off]\n"
>          "-chardev vc,id=id[[,width=width][,height=height]][[,cols=cols][,rows=rows]]\n"
>          "         [,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     +    "         [,close-on-cpr=on|off]\n"
>          "-chardev ringbuf,id=id[,size=size][,logfile=PATH][,logappend=on|off]\n"
>     +    "         [,close-on-cpr=on|off]\n"
>          "-chardev file,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     +    "         [,close-on-cpr=on|off]\n"
>          "-chardev pipe,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     +    "         [,close-on-cpr=on|off]\n"
>      #ifdef _WIN32
>          "-chardev console,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>          "-chardev serial,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>      #else
>          "-chardev pty,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     +    "         [,close-on-cpr=on|off]\n"
>          "-chardev stdio,id=id[,mux=on|off][,signal=on|off][,logfile=PATH][,logappend=on|off]\n"
>     +    "         [,close-on-cpr=on|off]\n"
>      #endif
>      #ifdef CONFIG_BRLAPI
>          "-chardev braille,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     +    "         [,close-on-cpr=on|off]\n"
>      #endif
>      #if defined(__linux__) || defined(__sun__) || defined(__FreeBSD__) \
>              || defined(__NetBSD__) || defined(__OpenBSD__) || defined(__DragonFly__)
>          "-chardev serial,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     +    "         [,close-on-cpr=on|off]\n"
>          "-chardev tty,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     +    "         [,close-on-cpr=on|off]\n"
>      #endif
>      #if defined(__linux__) || defined(__FreeBSD__) || defined(__DragonFly__)
>          "-chardev parallel,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     +    "         [,close-on-cpr=on|off]\n"
>          "-chardev parport,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     +    "         [,close-on-cpr=on|off]\n"
>      #endif
>      #if defined(CONFIG_SPICE)
>          "-chardev spicevmc,id=id,name=name[,debug=debug][,logfile=PATH][,logappend=on|off]\n"
>     +    "         [,close-on-cpr=on|off]\n"
>          "-chardev spiceport,id=id,name=name[,debug=debug][,logfile=PATH][,logappend=on|off]\n"
>     +    "         [,close-on-cpr=on|off]\n"
>      #endif
>          , QEMU_ARCH_ALL
>      )
>     @@ -3245,6 +3259,10 @@ The general form of a character device option is:
>          ``logappend`` option controls whether the log file will be truncated
>          or appended to when opened.
> 
>     +    Every backend supports the ``close-on-cpr`` option.  If on, the
>     +    device's descriptor is closed during cprsave, and reopened after exec.
>     +    This is useful for devices that do not support cpr.
>     +
>      The available backends are:
> 
>      ``-chardev null,id=id``
>     -- 
>     1.8.3.1
> 
> 
> 
> 
> -- 
> Marc-André Lureau



* Re: [PATCH V5 10/25] util: env var helpers
  2021-07-12 19:19     ` Steven Sistare
@ 2021-07-12 19:36       ` Marc-André Lureau
  2021-07-13 16:15         ` Steven Sistare
  0 siblings, 1 reply; 74+ messages in thread
From: Marc-André Lureau @ 2021-07-12 19:36 UTC (permalink / raw)
  To: Steven Sistare
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster


Hi

On Mon, Jul 12, 2021 at 11:19 PM Steven Sistare <steven.sistare@oracle.com>
wrote:

> On 7/8/2021 11:10 AM, Marc-André Lureau wrote:
> > Hi
> >
> > On Wed, Jul 7, 2021 at 9:30 PM Steve Sistare <steven.sistare@oracle.com
> <mailto:steven.sistare@oracle.com>> wrote:
> >
> >     Add functions for saving fd's and other values in the environment via
> >     setenv, and for reading them back via getenv.
> >
> >
> > I understand that the rest of the series will rely on environment
> variables to associate and recover the child-passed FDs, but I am not
> really convinced that it is a good idea.
> >
> > Environment variables have a number of issues that we may encounter down
> the road: namespace, limits, concurrency, observability etc.. I wonder if
> the VMState couldn't have a section about the FD to recover. Or maybe just
> another shared memory region?
>
> They also have some advantages.  Their post-exec value can be observed via
> /proc/$pid/environ,
> and modified values can be observed by calling printenv() in a debugger.
> They are naturally carried
> across exec, with no external file to create and potentially lose.
> Lastly, libcs already defines
> put and get methods, so the additional layered code is small and simple.
> The number of variables
> is small, and I would rather not over-engineer an alternate solution until
> the env proves
> inadequate.  The limits on env size are huge on Linux.  The limits are
> smaller on Windows, but
> that is just one of multiple issues to be addressed to support live update
> on windows.
>
> For the alternatives, shared memory is no more observable (maybe less) and
> also has no concurrency
> protection.  VMstate does not help because the descriptors are needed
> before the vmstate file
> is opened.
>

Why does it need to be "observable" from outside the process?

I meant memory to be shared between the qemu instances (without concurrency
etc).

You would only need that memory fd to be passed as argument to the next
qemu instance, to restore the rest of the contexts/fds I suppose.

I think we need to do this right, as it may have consequences for future
updates. It's effectively a kind of protocol. We have better chances to
handle different versions correctly by reusing VMState imho.


> > Some comments below. These new utils could also have some unit tests.
>
> OK.
>
> >     Signed-off-by: Steve Sistare <steven.sistare@oracle.com <mailto:
> steven.sistare@oracle.com>>
> >     ---
> >      MAINTAINERS        |  2 ++
> >      include/qemu/env.h | 23 +++++++++++++
> >      util/env.c         | 95
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >      util/meson.build   |  1 +
> >      4 files changed, 121 insertions(+)
> >      create mode 100644 include/qemu/env.h
> >      create mode 100644 util/env.c
> >
> >     diff --git a/MAINTAINERS b/MAINTAINERS
> >     index c48dd37..8647a97 100644
> >     --- a/MAINTAINERS
> >     +++ b/MAINTAINERS
> >     @@ -2865,6 +2865,8 @@ S: Maintained
> >      F: include/migration/cpr.h
> >      F: migration/cpr.c
> >      F: qapi/cpr.json
> >     +F: include/qemu/env.h
> >     +F: util/env.c
> >
> >      Record/replay
> >      M: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru <mailto:
> pavel.dovgaluk@ispras.ru>>
> >     diff --git a/include/qemu/env.h b/include/qemu/env.h
> >     new file mode 100644
> >     index 0000000..3dad503
> >     --- /dev/null
> >     +++ b/include/qemu/env.h
> >     @@ -0,0 +1,23 @@
> >     +/*
> >     + * Copyright (c) 2021 Oracle and/or its affiliates.
> >     + *
> >     + * This work is licensed under the terms of the GNU GPL, version 2.
> >     + * See the COPYING file in the top-level directory.
> >     + *
> >     + */
> >     +
> >     +#ifndef QEMU_ENV_H
> >     +#define QEMU_ENV_H
> >     +
> >     +#define FD_PREFIX "QEMU_FD_"
> >     +
> >     +typedef int (*walkenv_cb)(const char *name, const char *val, void
> *handle);
> >     +
> >     +int getenv_fd(const char *name);
> >     +void setenv_fd(const char *name, int fd);
> >     +void unsetenv_fd(const char *name);
> >     +void unsetenv_fdv(const char *fmt, ...);
> >     +int walkenv(const char *prefix, walkenv_cb cb, void *handle);
> >     +void printenv(void);
> >
> >
> > Please use qemu prefix, that avoids potential confusion with system
> libraries.
> >
> >     +
> >     +#endif
> >     diff --git a/util/env.c b/util/env.c
> >     new file mode 100644
> >     index 0000000..863678d
> >     --- /dev/null
> >     +++ b/util/env.c
> >     @@ -0,0 +1,95 @@
> >     +/*
> >     + * Copyright (c) 2021 Oracle and/or its affiliates.
> >     + *
> >     + * This work is licensed under the terms of the GNU GPL, version 2.
> >     + * See the COPYING file in the top-level directory.
> >     + */
> >     +
> >     +#include "qemu/osdep.h"
> >     +#include "qemu/cutils.h"
> >     +#include "qemu/env.h"
> >     +
> >     +static uint64_t getenv_ulong(const char *prefix, const char *name,
> int *err)
> >     +{
> >     +    char var[80], *val;
> >     +    uint64_t res = 0;
> >     +
> >     +    snprintf(var, sizeof(var), "%s%s", prefix, name);
> >
> >
> > No check for success / truncation...
> >
> > Please use g_autofree char *var = g_strdup_printf()..
> >
> >     +    val = getenv(var);
> >
> >
> > For consistency, I'd use g_getenv()
> >
> >     +    if (val) {
> >     +        *err = qemu_strtoul(val, NULL, 10, &res);
> >     +    } else {
> >     +        *err = -ENOENT;
> >     +    }
> >     +    return res;
> >     +}
> >     +
> >     +static void setenv_ulong(const char *prefix, const char *name,
> uint64_t val)
> >     +{
> >     +    char var[80], val_str[80];
> >     +    snprintf(var, sizeof(var), "%s%s", prefix, name);
> >     +    snprintf(val_str, sizeof(val_str), "%"PRIu64, val);
> >
> >
> > g_strdup_printf
> >
> >     +    setenv(var, val_str, 1);
> >
> >
> > g_setenv(), and return error value (or assert() if that makes more sense)
> >
> >     +}
> >     +
> >     +static void unsetenv_ulong(const char *prefix, const char *name)
> >     +{
> >     +    char var[80];
> >     +    snprintf(var, sizeof(var), "%s%s", prefix, name);
> >
> >
> > g_strdup_printf
> >
> >
> >     +    unsetenv(var);
> >
> >
> > g_unsetenv
> >
> >     +}
> >     +
> >     +int getenv_fd(const char *name)
> >     +{
> >     +    int err;
> >     +    int fd = getenv_ulong(FD_PREFIX, name, &err);
> >
> >
> > I'd try to use qemu_parse_fd() instead.
> >
> >     +    return err ? -1 : fd;
> >     +}
> >     +
> >     +void setenv_fd(const char *name, int fd)
> >     +{
> >
> >
> > Maybe check fd >= 0 ?
> >
> >     +    setenv_ulong(FD_PREFIX, name, fd);
> >     +}
> >     +
> >     +void unsetenv_fd(const char *name)
> >     +{
> >     +    unsetenv_ulong(FD_PREFIX, name);
> >     +}
> >     +
> >     +void unsetenv_fdv(const char *fmt, ...)
> >     +{
> >     +    va_list args;
> >     +    char buf[80];
> >     +    va_start(args, fmt);
> >     +    vsnprintf(buf, sizeof(buf), fmt, args);
> >     +    va_end(args);
> >
> >
> > That seems to be a leftover.
>
> It is called in the subsequent vfio cpr patches.
>
> >     +}
> >     +
> >     +int walkenv(const char *prefix, walkenv_cb cb, void *handle)
> >
> >     +{
> >     +    char *str, name[128];
> >     +    char **envp = environ;
> >     +    size_t prefix_len = strlen(prefix);
> >     +
> >     +    while (*envp) {
> >     +        str = *envp++;
> >     +        if (!strncmp(str, prefix, prefix_len)) {
> >
> >     +            char *val = strchr(str, '=');
> >     +            str += prefix_len;
> >     +            strncpy(name, str, val - str);
> >
> >
> > g_strndup() to avoid potential buffer overflow.
> >
> >     +            name[val - str] = 0;
> >     +            if (cb(name, val + 1, handle)) {
> >     +                return 1;
> >     +            }
> >     +        }
> >     +    }
> >     +    return 0;
> >     +}
> >     +
> >     +void printenv(void)
> >     +{
> >     +    char **ptr = environ;
> >     +    while (*ptr) {
> >     +        puts(*ptr++);
> >     +    }
> >     +}
> >
> >
> > Is this really useful? I doubt it.
>
> I call it from gdb for debugging, but I can delete it and cast g_listenv()
> instead:
>   print *(((char ** (*)(void))g_listenv)())@100
>

Or just *environ@N ?

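On the g_strndup() comment for walkenv() above: a minimal sketch of a prefix walk that sizes the name buffer per entry instead of copying into a fixed name[128]. It is written in plain C with malloc() so that it stands alone; in QEMU the copy would presumably be the suggested g_strndup(). The names walk_env_prefix, walk_cb, walk_stats and record_match are hypothetical, and the environment array is passed in explicitly rather than read from environ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type, modelled on walkenv_cb from the patch. */
typedef int (*walk_cb)(const char *name, const char *value, void *opaque);

/*
 * Walk a NULL-terminated array of "NAME=value" strings and call cb for
 * every entry whose name starts with prefix.  The part of the name after
 * the prefix is copied into a heap buffer sized to fit, so a long name
 * cannot overflow anything.
 * Returns 1 if cb stopped the walk, -1 on allocation failure, else 0.
 */
int walk_env_prefix(char **envp, const char *prefix, walk_cb cb, void *opaque)
{
    assert(envp != NULL && prefix != NULL && cb != NULL);
    size_t prefix_len = strlen(prefix);

    for (; *envp; envp++) {
        const char *str = *envp;
        const char *eq = strchr(str, '=');

        /* Skip malformed entries and entries that do not match. */
        if (!eq || strncmp(str, prefix, prefix_len) != 0) {
            continue;
        }
        /* Skip entries where '=' falls inside the matched prefix. */
        if ((size_t)(eq - str) < prefix_len) {
            continue;
        }

        const char *name_start = str + prefix_len;
        size_t name_len = (size_t)(eq - name_start);
        char *name = malloc(name_len + 1);
        if (!name) {
            return -1;
        }
        memcpy(name, name_start, name_len);
        name[name_len] = '\0';

        int stop = cb(name, eq + 1, opaque);
        free(name);
        if (stop) {
            return 1;
        }
    }
    return 0;
}

/* Example callback: counts matches and tracks the longest name seen. */
struct walk_stats {
    int matches;
    size_t longest_name;
};

int record_match(const char *name, const char *value, void *opaque)
{
    struct walk_stats *s = opaque;
    (void)value;
    s->matches++;
    if (strlen(name) > s->longest_name) {
        s->longest_name = strlen(name);
    }
    return 0;
}
```

Unlike the strncpy() version, this also tolerates an entry with no '=' at all, which would otherwise make val NULL.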

> Will do on the rest.
>
> - Steve
>
> >     diff --git a/util/meson.build b/util/meson.build
> >     index 0ffd7f4..5e8097a 100644
> >     --- a/util/meson.build
> >     +++ b/util/meson.build
> >     @@ -23,6 +23,7 @@ util_ss.add(files('host-utils.c'))
> >      util_ss.add(files('bitmap.c', 'bitops.c'))
> >      util_ss.add(files('fifo8.c'))
> >      util_ss.add(files('cacheinfo.c', 'cacheflush.c'))
> >     +util_ss.add(files('env.c'))
> >      util_ss.add(files('error.c', 'qemu-error.c'))
> >      util_ss.add(files('qemu-print.c'))
> >      util_ss.add(files('id.c'))
> >     --
> >     1.8.3.1
> >
> >
> >
> >
> > --
> > Marc-André Lureau
>


-- 
Marc-André Lureau

* Re: [PATCH V5 20/25] chardev: cpr framework
  2021-07-12 19:20     ` Steven Sistare
@ 2021-07-12 19:49       ` Marc-André Lureau
  2021-07-13 14:34         ` Steven Sistare
  0 siblings, 1 reply; 74+ messages in thread
From: Marc-André Lureau @ 2021-07-12 19:49 UTC (permalink / raw)
  To: Steven Sistare
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

Hi

On Mon, Jul 12, 2021 at 11:20 PM Steven Sistare <steven.sistare@oracle.com>
wrote:

> On 7/8/2021 12:03 PM, Marc-André Lureau wrote:
> > Hi
> >
> > On Wed, Jul 7, 2021 at 9:37 PM Steve Sistare <steven.sistare@oracle.com> wrote:
> >
> >     Add QEMU_CHAR_FEATURE_CPR for devices that support cpr.
> >     Add the chardev close_on_cpr option for devices that can be closed on cpr
> >     and reopened after exec.
> >     cpr is allowed only if either QEMU_CHAR_FEATURE_CPR or close_on_cpr is set
> >     for all chardevs in the configuration.
> >
> >
> > Why not do the right thing by default?
>
> Char devices with buffering in the qemu process do not support cpr, as there is no general mechanism
> for saving and restoring the buffer and synchronizing that with device operation.  In theory vmstate
> could provide that mechanism, but sync'ing the device with vmstate operations would be non-trivial,
> as every device handles it differently, and I did not tackle it.  However, some very useful devices
> do not buffer, and do support cpr, so I introduce QEMU_CHAR_FEATURE_CPR to identify them.  CPR support
> can be incrementally added to more devices in the future via this mechanism.
>
> > Could use some tests in tests/unit/test-char.c
>
> OK, I'll check it out.  I have deferred adding unit tests until I get more
> buy in on the patch series.
>

I understand :) Tbh, I have no clue if you are close to acceptance. (too
late for 6.1 anyway, you can already update the docs)


> >     Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> >     ---
> >      chardev/char.c         | 41 ++++++++++++++++++++++++++++++++++++++---
> >      include/chardev/char.h |  5 +++++
> >      migration/cpr.c        |  3 +++
> >      qapi/char.json         |  5 ++++-
> >      qemu-options.hx        | 26 ++++++++++++++++++++++----
> >      5 files changed, 72 insertions(+), 8 deletions(-)
> >
> >     diff --git a/chardev/char.c b/chardev/char.c
> >     index d959eec..f10fb94 100644
> >     --- a/chardev/char.c
> >     +++ b/chardev/char.c
> >     @@ -36,6 +36,7 @@
> >      #include "qemu/help_option.h"
> >      #include "qemu/module.h"
> >      #include "qemu/option.h"
> >     +#include "qemu/env.h"
> >      #include "qemu/id.h"
> >      #include "qemu/coroutine.h"
> >      #include "qemu/yank.h"
> >     @@ -239,6 +240,9 @@ static void qemu_char_open(Chardev *chr, ChardevBackend *backend,
> >          ChardevClass *cc = CHARDEV_GET_CLASS(chr);
> >          /* Any ChardevCommon member would work */
> >          ChardevCommon *common = backend ? backend->u.null.data : NULL;
> >     +    char fdname[40];
> >
> >
> > Please use g_autofree char *fdname = NULL; & g_strdup_printf()
>
> Will do.
> (the GLib functions are new to me, and my fingers do not automatically
> type them).
>
> >     +
> >     +    chr->close_on_cpr = (common && common->close_on_cpr);
> >
> >          if (common && common->has_logfile) {
> >              int flags = O_WRONLY | O_CREAT;
> >     @@ -248,7 +252,14 @@ static void qemu_char_open(Chardev *chr, ChardevBackend *backend,
> >              } else {
> >                  flags |= O_TRUNC;
> >              }
> >     -        chr->logfd = qemu_open_old(common->logfile, flags, 0666);
> >     +        snprintf(fdname, sizeof(fdname), "%s_log", chr->label);
> >     +        chr->logfd = getenv_fd(fdname);
> >     +        if (chr->logfd < 0) {
> >     +            chr->logfd = qemu_open_old(common->logfile, flags, 0666);
> >     +            if (!chr->close_on_cpr) {
> >     +                setenv_fd(fdname, chr->logfd);
> >     +            }
> >     +        }
> >              if (chr->logfd < 0) {
> >                  error_setg_errno(errp, errno,
> >                                   "Unable to open logfile %s",
> >     @@ -300,11 +311,12 @@ static void char_finalize(Object *obj)
> >          if (chr->be) {
> >              chr->be->chr = NULL;
> >          }
> >     -    g_free(chr->filename);
> >     -    g_free(chr->label);
> >          if (chr->logfd != -1) {
> >              close(chr->logfd);
> >     +        unsetenv_fdv("%s_log", chr->label);
> >          }
> >     +    g_free(chr->filename);
> >     +    g_free(chr->label);
> >          qemu_mutex_destroy(&chr->chr_write_lock);
> >      }
> >
> >     @@ -504,6 +516,8 @@ void qemu_chr_parse_common(QemuOpts *opts, ChardevCommon *backend)
> >
> >          backend->has_logappend = true;
> >          backend->logappend = qemu_opt_get_bool(opts, "logappend", false);
> >     +
> >     +    backend->close_on_cpr = qemu_opt_get_bool(opts, "close-on-cpr", false);
> >
> >
> > If set to true and the backend doesn't implement the CPR feature, it should raise an error.
>
> Setting to true is the workaround for missing CPR support, so that cpr may
> still be performed.
> The device will be reopened post exec.  That is not as nice as
> transparently preserving the device,
> but is nicer than disallowing cpr because some device(s) of many do not
> support it.
>

ok, "reopen-on-cpr" would be more descriptive then.


> >      }
> >
> >      static const ChardevClass *char_get_class(const char *driver, Error **errp)
> >     @@ -945,6 +959,9 @@ QemuOptsList qemu_chardev_opts = {
> >              },{
> >                  .name = "abstract",
> >                  .type = QEMU_OPT_BOOL,
> >     +        },{
> >     +            .name = "close-on-cpr",
> >     +            .type = QEMU_OPT_BOOL,
> >      #endif
> >              },
> >              { /* end of list */ }
> >     @@ -1212,6 +1229,24 @@ GSource *qemu_chr_timeout_add_ms(Chardev *chr, guint ms,
> >          return source;
> >      }
> >
> >     +static int chr_cpr_capable(Object *obj, void *opaque)
> >     +{
> >     +    Chardev *chr = (Chardev *)obj;
> >     +    Error **errp = opaque;
> >     +
> >     +    if (qemu_chr_has_feature(chr, QEMU_CHAR_FEATURE_CPR) || chr->close_on_cpr) {
> >
> >
> > That'd be easy to misuse. Chardev should always explicitly support CPR feature (even if close_on_cpr is set)
>
> Given my explanation at top, does this make sense now?
>

I think I understand the purpose, but it feels quite adventurous to rely on
this behaviour by default, even if the feature flag is set. Could it
require both FEATURE_CPR && reopen-on-cpr?


> - Steve
>
>
> >     +        return 0;
> >     +    }
> >     +    error_setg(errp, "error: chardev %s -> %s is not capable of cpr",
> >     +               chr->label, chr->filename);
> >     +    return 1;
> >     +}
> >     +
> >     +bool qemu_chr_cpr_capable(Error **errp)
> >     +{
> >     +    return !object_child_foreach(get_chardevs_root(), chr_cpr_capable, errp);
> >     +}
> >     +
> >      void qemu_chr_cleanup(void)
> >      {
> >          object_unparent(get_chardevs_root());
> >     diff --git a/include/chardev/char.h b/include/chardev/char.h
> >     index 7c0444f..e488ad1 100644
> >     --- a/include/chardev/char.h
> >     +++ b/include/chardev/char.h
> >     @@ -50,6 +50,8 @@ typedef enum {
> >          /* Whether the gcontext can be changed after calling
> >           * qemu_chr_be_update_read_handlers() */
> >          QEMU_CHAR_FEATURE_GCONTEXT,
> >     +    /* Whether the device supports cpr */
> >     +    QEMU_CHAR_FEATURE_CPR,
> >
> >          QEMU_CHAR_FEATURE_LAST,
> >      } ChardevFeature;
> >     @@ -67,6 +69,7 @@ struct Chardev {
> >          int be_open;
> >          /* used to coordinate the chardev-change special-case: */
> >          bool handover_yank_instance;
> >     +    bool close_on_cpr;
> >          GSource *gsource;
> >          GMainContext *gcontext;
> >          DECLARE_BITMAP(features, QEMU_CHAR_FEATURE_LAST);
> >     @@ -291,4 +294,6 @@ void resume_mux_open(void);
> >      /* console.c */
> >      void qemu_chr_parse_vc(QemuOpts *opts, ChardevBackend *backend, Error **errp);
> >
> >     +bool qemu_chr_cpr_capable(Error **errp);
> >     +
> >      #endif
> >     diff --git a/migration/cpr.c b/migration/cpr.c
> >     index 6333988..feff97f 100644
> >     --- a/migration/cpr.c
> >     +++ b/migration/cpr.c
> >     @@ -138,6 +138,9 @@ void cprexec(strList *args, Error **errp)
> >              error_setg(errp, "cprexec requires cprsave with restart mode");
> >              return;
> >          }
> >     +    if (!qemu_chr_cpr_capable(errp)) {
> >     +        return;
> >     +    }
> >          if (vfio_cprsave(errp)) {
> >              return;
> >          }
> >     diff --git a/qapi/char.json b/qapi/char.json
> >     index adf2685..5efaf59 100644
> >     --- a/qapi/char.json
> >     +++ b/qapi/char.json
> >     @@ -204,12 +204,15 @@
> >      # @logfile: The name of a logfile to save output
> >      # @logappend: true to append instead of truncate
> >      #             (default to false to truncate)
> >     +# @close-on-cpr: if true, close device's fd on cprsave. defaults to false.
> >     +#                since 6.1.
> >      #
> >      # Since: 2.6
> >      ##
> >      { 'struct': 'ChardevCommon',
> >        'data': { '*logfile': 'str',
> >     -            '*logappend': 'bool' } }
> >     +            '*logappend': 'bool',
> >     +            '*close-on-cpr': 'bool' } }
> >
> >      ##
> >      # @ChardevFile:
> >     diff --git a/qemu-options.hx b/qemu-options.hx
> >     index fa53734..d5ff45f 100644
> >     --- a/qemu-options.hx
> >     +++ b/qemu-options.hx
> >     @@ -3134,43 +3134,57 @@ DEFHEADING(Character device options:)
> >
> >      DEF("chardev", HAS_ARG, QEMU_OPTION_chardev,
> >          "-chardev help\n"
> >     -    "-chardev null,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> >     +    "-chardev null,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off][,close-on-cpr=on|off]\n"
> >          "-chardev socket,id=id[,host=host],port=port[,to=to][,ipv4=on|off][,ipv6=on|off][,nodelay=on|off][,reconnect=seconds]\n"
> >          "         [,server=on|off][,wait=on|off][,telnet=on|off][,websocket=on|off][,reconnect=seconds][,mux=on|off]\n"
> >     -    "         [,logfile=PATH][,logappend=on|off][,tls-creds=ID][,tls-authz=ID] (tcp)\n"
> >     +    "         [,logfile=PATH][,logappend=on|off][,tls-creds=ID][,tls-authz=ID][,close-on-cpr=on|off] (tcp)\n"
> >          "-chardev socket,id=id,path=path[,server=on|off][,wait=on|off][,telnet=on|off][,websocket=on|off][,reconnect=seconds]\n"
> >     -    "         [,mux=on|off][,logfile=PATH][,logappend=on|off][,abstract=on|off][,tight=on|off] (unix)\n"
> >     +    "         [,mux=on|off][,logfile=PATH][,logappend=on|off][,abstract=on|off][,tight=on|off][,close-on-cpr=on|off] (unix)\n"
> >          "-chardev udp,id=id[,host=host],port=port[,localaddr=localaddr]\n"
> >          "         [,localport=localport][,ipv4=on|off][,ipv6=on|off][,mux=on|off]\n"
> >     -    "         [,logfile=PATH][,logappend=on|off]\n"
> >     +    "         [,logfile=PATH][,logappend=on|off][,close-on-cpr=on|off]\n"
> >          "-chardev msmouse,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> >     +    "         [,close-on-cpr=on|off]\n"
> >          "-chardev vc,id=id[[,width=width][,height=height]][[,cols=cols][,rows=rows]]\n"
> >          "         [,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> >     +    "         [,close-on-cpr=on|off]\n"
> >          "-chardev ringbuf,id=id[,size=size][,logfile=PATH][,logappend=on|off]\n"
> >     +    "         [,close-on-cpr=on|off]\n"
> >          "-chardev file,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> >     +    "         [,close-on-cpr=on|off]\n"
> >          "-chardev pipe,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> >     +    "         [,close-on-cpr=on|off]\n"
> >      #ifdef _WIN32
> >          "-chardev console,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> >          "-chardev serial,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> >      #else
> >          "-chardev pty,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> >     +    "         [,close-on-cpr=on|off]\n"
> >          "-chardev stdio,id=id[,mux=on|off][,signal=on|off][,logfile=PATH][,logappend=on|off]\n"
> >     +    "         [,close-on-cpr=on|off]\n"
> >      #endif
> >      #ifdef CONFIG_BRLAPI
> >          "-chardev braille,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> >     +    "         [,close-on-cpr=on|off]\n"
> >      #endif
> >      #if defined(__linux__) || defined(__sun__) || defined(__FreeBSD__) \
> >              || defined(__NetBSD__) || defined(__OpenBSD__) || defined(__DragonFly__)
> >          "-chardev serial,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> >     +    "         [,close-on-cpr=on|off]\n"
> >          "-chardev tty,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> >     +    "         [,close-on-cpr=on|off]\n"
> >      #endif
> >      #if defined(__linux__) || defined(__FreeBSD__) || defined(__DragonFly__)
> >          "-chardev parallel,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> >     +    "         [,close-on-cpr=on|off]\n"
> >          "-chardev parport,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
> >     +    "         [,close-on-cpr=on|off]\n"
> >      #endif
> >      #if defined(CONFIG_SPICE)
> >          "-chardev spicevmc,id=id,name=name[,debug=debug][,logfile=PATH][,logappend=on|off]\n"
> >     +    "         [,close-on-cpr=on|off]\n"
> >          "-chardev spiceport,id=id,name=name[,debug=debug][,logfile=PATH][,logappend=on|off]\n"
> >     +    "         [,close-on-cpr=on|off]\n"
> >      #endif
> >          , QEMU_ARCH_ALL
> >      )
> >     @@ -3245,6 +3259,10 @@ The general form of a character device option is:
> >          ``logappend`` option controls whether the log file will be truncated
> >          or appended to when opened.
> >
> >     +    Every backend supports the ``close-on-cpr`` option.  If on, the
> >     +    device's descriptor is closed during cprsave, and reopened after exec.
> >     +    This is useful for devices that do not support cpr.
> >     +
> >      The available backends are:
> >
> >      ``-chardev null,id=id``
> >     --
> >     1.8.3.1
> >
> >
> >
> >
> > --
> > Marc-André Lureau
>


-- 
Marc-André Lureau

* Re: [PATCH V5 20/25] chardev: cpr framework
  2021-07-12 19:49       ` Marc-André Lureau
@ 2021-07-13 14:34         ` Steven Sistare
  0 siblings, 0 replies; 74+ messages in thread
From: Steven Sistare @ 2021-07-13 14:34 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

On 7/12/2021 3:49 PM, Marc-André Lureau wrote:
> Hi
> 
> On Mon, Jul 12, 2021 at 11:20 PM Steven Sistare <steven.sistare@oracle.com> wrote:
> 
>     On 7/8/2021 12:03 PM, Marc-André Lureau wrote:
>     > Hi
>     >
>     > On Wed, Jul 7, 2021 at 9:37 PM Steve Sistare <steven.sistare@oracle.com> wrote:
>     >
>     >     Add QEMU_CHAR_FEATURE_CPR for devices that support cpr.
>     >     Add the chardev close_on_cpr option for devices that can be closed on cpr
>     >     and reopened after exec.
>     >     cpr is allowed only if either QEMU_CHAR_FEATURE_CPR or close_on_cpr is set
>     >     for all chardevs in the configuration.
>     >
>     >
>     > Why not do the right thing by default?
> 
>     Char devices with buffering in the qemu process do not support cpr, as there is no general mechanism
>     for saving and restoring the buffer and synchronizing that with device operation.  In theory vmstate
>     could provide that mechanism, but sync'ing the device with vmstate operations would be non-trivial,
>     as every device handles it differently, and I did not tackle it.  However, some very useful devices
>     do not buffer, and do support cpr, so I introduce QEMU_CHAR_FEATURE_CPR to identify them.  CPR support
>     can be incrementally added to more devices in the future via this mechanism.
> 
>     > Could use some tests in tests/unit/test-char.c
> 
>     OK, I'll check it out.  I have deferred adding unit tests until I get more buy in on the patch series.
> 
> 
> I understand :) Tbh, I have no clue if you are close to acceptance. (too late for 6.1 anyway, you can already update the docs)
> 
> 
>     >     Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>     >     ---
>     >      chardev/char.c         | 41 ++++++++++++++++++++++++++++++++++++++---
>     >      include/chardev/char.h |  5 +++++
>     >      migration/cpr.c        |  3 +++
>     >      qapi/char.json         |  5 ++++-
>     >      qemu-options.hx        | 26 ++++++++++++++++++++++----
>     >      5 files changed, 72 insertions(+), 8 deletions(-)
>     >
>     >     diff --git a/chardev/char.c b/chardev/char.c
>     >     index d959eec..f10fb94 100644
>     >     --- a/chardev/char.c
>     >     +++ b/chardev/char.c
>     >     @@ -36,6 +36,7 @@
>     >      #include "qemu/help_option.h"
>     >      #include "qemu/module.h"
>     >      #include "qemu/option.h"
>     >     +#include "qemu/env.h"
>     >      #include "qemu/id.h"
>     >      #include "qemu/coroutine.h"
>     >      #include "qemu/yank.h"
>     >     @@ -239,6 +240,9 @@ static void qemu_char_open(Chardev *chr, ChardevBackend *backend,
>     >          ChardevClass *cc = CHARDEV_GET_CLASS(chr);
>     >          /* Any ChardevCommon member would work */
>     >          ChardevCommon *common = backend ? backend->u.null.data : NULL;
>     >     +    char fdname[40];
>     >
>     >
>     > Please use g_autofree char *fdname = NULL; & g_strdup_printf()
> 
>     Will do.
>     (the GLib functions are new to me, and my fingers do not automatically type them).
> 
>     >     +
>     >     +    chr->close_on_cpr = (common && common->close_on_cpr);
>     >
>     >          if (common && common->has_logfile) {
>     >              int flags = O_WRONLY | O_CREAT;
>     >     @@ -248,7 +252,14 @@ static void qemu_char_open(Chardev *chr, ChardevBackend *backend,
>     >              } else {
>     >                  flags |= O_TRUNC;
>     >              }
>     >     -        chr->logfd = qemu_open_old(common->logfile, flags, 0666);
>     >     +        snprintf(fdname, sizeof(fdname), "%s_log", chr->label);
>     >     +        chr->logfd = getenv_fd(fdname);
>     >     +        if (chr->logfd < 0) {
>     >     +            chr->logfd = qemu_open_old(common->logfile, flags, 0666);
>     >     +            if (!chr->close_on_cpr) {
>     >     +                setenv_fd(fdname, chr->logfd);
>     >     +            }
>     >     +        }
>     >              if (chr->logfd < 0) {
>     >                  error_setg_errno(errp, errno,
>     >                                   "Unable to open logfile %s",
>     >     @@ -300,11 +311,12 @@ static void char_finalize(Object *obj)
>     >          if (chr->be) {
>     >              chr->be->chr = NULL;
>     >          }
>     >     -    g_free(chr->filename);
>     >     -    g_free(chr->label);
>     >          if (chr->logfd != -1) {
>     >              close(chr->logfd);
>     >     +        unsetenv_fdv("%s_log", chr->label);
>     >          }
>     >     +    g_free(chr->filename);
>     >     +    g_free(chr->label);
>     >          qemu_mutex_destroy(&chr->chr_write_lock);
>     >      }
>     >
>     >     @@ -504,6 +516,8 @@ void qemu_chr_parse_common(QemuOpts *opts, ChardevCommon *backend)
>     >
>     >          backend->has_logappend = true;
>     >          backend->logappend = qemu_opt_get_bool(opts, "logappend", false);
>     >     +
>     >     +    backend->close_on_cpr = qemu_opt_get_bool(opts, "close-on-cpr", false);
>     >
>     >
>     > If set to true and the backend doesn't implement the CPR feature, it should raise an error.
> 
>     Setting to true is the workaround for missing CPR support, so that cpr may still be performed. 
>     The device will be reopened post exec.  That is not as nice as transparently preserving the device,
>     but is nicer than disallowing cpr because some device(s) of many do not support it.
> 
> 
> ok, "reopen-on-cpr" would be more descriptive then.
> 
> 
>     >      }
>     >
>     >      static const ChardevClass *char_get_class(const char *driver, Error **errp)
>     >     @@ -945,6 +959,9 @@ QemuOptsList qemu_chardev_opts = {
>     >              },{
>     >                  .name = "abstract",
>     >                  .type = QEMU_OPT_BOOL,
>     >     +        },{
>     >     +            .name = "close-on-cpr",
>     >     +            .type = QEMU_OPT_BOOL,
>     >      #endif
>     >              },
>     >              { /* end of list */ }
>     >     @@ -1212,6 +1229,24 @@ GSource *qemu_chr_timeout_add_ms(Chardev *chr, guint ms,
>     >          return source;
>     >      }
>     >
>     >     +static int chr_cpr_capable(Object *obj, void *opaque)
>     >     +{
>     >     +    Chardev *chr = (Chardev *)obj;
>     >     +    Error **errp = opaque;
>     >     +
>     >     +    if (qemu_chr_has_feature(chr, QEMU_CHAR_FEATURE_CPR) || chr->close_on_cpr) {
>     >
>     > That'd be easy to misuse. Chardev should always explicitly support CPR feature (even if close_on_cpr is set)
> 
>     Given my explanation at top, does this make sense now? 
> 
> I think I understand the purpose, but it feels quite adventurous to rely on this behaviour by default, even if the feature flag is set. Could it require both FEATURE_CPR && reopen-on-cpr?

I'm not following your point.  Let me elaborate in case my intent is not clear.  I use your
suggested name reopen-on-cpr, but the functionality is otherwise the same:

  reopen-on-cpr is false by default for all devices, and QEMU_CHAR_FEATURE_CPR is true only for
  devices that operate properly when their descriptor is preserved across exec.

  cpr is blocked if any device exists that does not support the feature and is not marked as
  reopen-on-cpr.  The user must explicitly add reopen-on-cpr=on to the device parameters to allow
  cpr to proceed for such a device.

  For devices that do support QEMU_CHAR_FEATURE_CPR, the user may still set reopen-on-cpr=on
  to ignore the feature and recreate the device in its initial state after exec.
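The rule described in the three paragraphs above can be sketched as a small standalone predicate. This is only an illustration under stated assumptions: struct chardev_cpr_info and the three function names are hypothetical stand-ins, not QEMU types, and the two booleans model QEMU_CHAR_FEATURE_CPR and reopen-on-cpr respectively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two per-chardev facts discussed above. */
struct chardev_cpr_info {
    const char *label;
    bool has_feature_cpr;   /* backend sets QEMU_CHAR_FEATURE_CPR */
    bool reopen_on_cpr;     /* user passed reopen-on-cpr=on */
};

/* A device permits cpr if its fd can be preserved or if it will be reopened. */
bool chardev_permits_cpr(const struct chardev_cpr_info *c)
{
    assert(c != NULL);
    return c->has_feature_cpr || c->reopen_on_cpr;
}

/* True if the descriptor is carried across exec rather than reopened. */
bool chardev_preserves_fd(const struct chardev_cpr_info *c)
{
    assert(c != NULL);
    return c->has_feature_cpr && !c->reopen_on_cpr;
}

/* Return the first device that blocks cpr, or NULL if every device permits it. */
const struct chardev_cpr_info *
first_cpr_blocker(const struct chardev_cpr_info *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!chardev_permits_cpr(&devs[i])) {
            return &devs[i];
        }
    }
    return NULL;
}
```

A device with neither flag set is the only kind that blocks cpr; a device with both set is allowed and is reopened, since the user setting takes precedence over the feature.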

Are you suggesting that the user should opt in to using QEMU_CHAR_FEATURE_CPR, such as via
a more expressive flag?

  -chardev ...[,cpr=on|off|reopen]
      on:     allow cpr iff the device has QEMU_CHAR_FEATURE_CPR
      off:    do not allow cpr
      reopen: allow cpr.  reopen the device after exec.

  chr_cpr_capable(Object *obj, void *opaque)
    if ((qemu_chr_has_feature(chr, QEMU_CHAR_FEATURE_CPR) && chr->cpr == ON) ||
        chr->cpr == REOPEN)
            return 0;   /* cpr is allowed */
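
For concreteness, a standalone sketch of how the tri-state cpr=on|off|reopen option might be parsed and checked. Nothing here exists in the series: the enum, parse_cpr_mode() and cpr_allowed() are hypothetical names, and the feature test is reduced to a boolean argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical tri-state for a per-chardev "cpr" option. */
enum chardev_cpr_mode {
    CPR_MODE_OFF,       /* do not allow cpr */
    CPR_MODE_ON,        /* allow cpr iff the backend has the CPR feature */
    CPR_MODE_REOPEN,    /* allow cpr; reopen the device after exec */
    CPR_MODE_INVALID,   /* unrecognized option value */
};

/* Map the option string to a mode. */
enum chardev_cpr_mode parse_cpr_mode(const char *s)
{
    assert(s != NULL);
    if (strcmp(s, "on") == 0) {
        return CPR_MODE_ON;
    }
    if (strcmp(s, "off") == 0) {
        return CPR_MODE_OFF;
    }
    if (strcmp(s, "reopen") == 0) {
        return CPR_MODE_REOPEN;
    }
    return CPR_MODE_INVALID;
}

/* Same test as the pseudocode above: feature plus opt-in, or reopen. */
bool cpr_allowed(bool has_feature_cpr, enum chardev_cpr_mode mode)
{
    switch (mode) {
    case CPR_MODE_ON:
        return has_feature_cpr;
    case CPR_MODE_REOPEN:
        return true;
    default:
        return false;
    }
}
```

With this shape, a device that has the feature but was left at cpr=off blocks cpr, which is the opt-in behaviour being asked about.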

- Steve

>     >     +        return 0;
>     >     +    }
>     >     +    error_setg(errp, "error: chardev %s -> %s is not capable of cpr",
>     >     +               chr->label, chr->filename);
>     >     +    return 1;
>     >     +}
>     >     +
>     >     +bool qemu_chr_cpr_capable(Error **errp)
>     >     +{
>     >     +    return !object_child_foreach(get_chardevs_root(), chr_cpr_capable, errp);
>     >     +}
>     >     +
>     >      void qemu_chr_cleanup(void)
>     >      {
>     >          object_unparent(get_chardevs_root());
>     >     diff --git a/include/chardev/char.h b/include/chardev/char.h
>     >     index 7c0444f..e488ad1 100644
>     >     --- a/include/chardev/char.h
>     >     +++ b/include/chardev/char.h
>     >     @@ -50,6 +50,8 @@ typedef enum {
>     >          /* Whether the gcontext can be changed after calling
>     >           * qemu_chr_be_update_read_handlers() */
>     >          QEMU_CHAR_FEATURE_GCONTEXT,
>     >     +    /* Whether the device supports cpr */
>     >     +    QEMU_CHAR_FEATURE_CPR,
>     >
>     >          QEMU_CHAR_FEATURE_LAST,
>     >      } ChardevFeature;
>     >     @@ -67,6 +69,7 @@ struct Chardev {
>     >          int be_open;
>     >          /* used to coordinate the chardev-change special-case: */
>     >          bool handover_yank_instance;
>     >     +    bool close_on_cpr;
>     >          GSource *gsource;
>     >          GMainContext *gcontext;
>     >          DECLARE_BITMAP(features, QEMU_CHAR_FEATURE_LAST);
>     >     @@ -291,4 +294,6 @@ void resume_mux_open(void);
>     >      /* console.c */
>     >      void qemu_chr_parse_vc(QemuOpts *opts, ChardevBackend *backend, Error **errp);
>     >
>     >     +bool qemu_chr_cpr_capable(Error **errp);
>     >     +
>     >      #endif
>     >     diff --git a/migration/cpr.c b/migration/cpr.c
>     >     index 6333988..feff97f 100644
>     >     --- a/migration/cpr.c
>     >     +++ b/migration/cpr.c
>     >     @@ -138,6 +138,9 @@ void cprexec(strList *args, Error **errp)
>     >              error_setg(errp, "cprexec requires cprsave with restart mode");
>     >              return;
>     >          }
>     >     +    if (!qemu_chr_cpr_capable(errp)) {
>     >     +        return;
>     >     +    }
>     >          if (vfio_cprsave(errp)) {
>     >              return;
>     >          }
>     >     diff --git a/qapi/char.json b/qapi/char.json
>     >     index adf2685..5efaf59 100644
>     >     --- a/qapi/char.json
>     >     +++ b/qapi/char.json
>     >     @@ -204,12 +204,15 @@
>     >      # @logfile: The name of a logfile to save output
>     >      # @logappend: true to append instead of truncate
>     >      #             (default to false to truncate)
>     >     +# @close-on-cpr: if true, close device's fd on cprsave. defaults to false.
>     >     +#                since 6.1.
>     >      #
>     >      # Since: 2.6
>     >      ##
>     >      { 'struct': 'ChardevCommon',
>     >        'data': { '*logfile': 'str',
>     >     -            '*logappend': 'bool' } }
>     >     +            '*logappend': 'bool',
>     >     +            '*close-on-cpr': 'bool' } }
>     >
>     >      ##
>     >      # @ChardevFile:
>     >     diff --git a/qemu-options.hx b/qemu-options.hx
>     >     index fa53734..d5ff45f 100644
>     >     --- a/qemu-options.hx
>     >     +++ b/qemu-options.hx
>     >     @@ -3134,43 +3134,57 @@ DEFHEADING(Character device options:)
>     >
>     >      DEF("chardev", HAS_ARG, QEMU_OPTION_chardev,
>     >          "-chardev help\n"
>     >     -    "-chardev null,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     >     +    "-chardev null,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off][,close-on-cpr=on|off]\n"
>     >          "-chardev socket,id=id[,host=host],port=port[,to=to][,ipv4=on|off][,ipv6=on|off][,nodelay=on|off][,reconnect=seconds]\n"
>     >          "         [,server=on|off][,wait=on|off][,telnet=on|off][,websocket=on|off][,reconnect=seconds][,mux=on|off]\n"
>     >     -    "         [,logfile=PATH][,logappend=on|off][,tls-creds=ID][,tls-authz=ID] (tcp)\n"
>     >     +    "         [,logfile=PATH][,logappend=on|off][,tls-creds=ID][,tls-authz=ID][,close-on-cpr=on|off] (tcp)\n"
>     >          "-chardev socket,id=id,path=path[,server=on|off][,wait=on|off][,telnet=on|off][,websocket=on|off][,reconnect=seconds]\n"
>     >     -    "         [,mux=on|off][,logfile=PATH][,logappend=on|off][,abstract=on|off][,tight=on|off] (unix)\n"
>     >     +    "         [,mux=on|off][,logfile=PATH][,logappend=on|off][,abstract=on|off][,tight=on|off][,close-on-cpr=on|off] (unix)\n"
>     >          "-chardev udp,id=id[,host=host],port=port[,localaddr=localaddr]\n"
>     >          "         [,localport=localport][,ipv4=on|off][,ipv6=on|off][,mux=on|off]\n"
>     >     -    "         [,logfile=PATH][,logappend=on|off]\n"
>     >     +    "         [,logfile=PATH][,logappend=on|off][,close-on-cpr=on|off]\n"
>     >          "-chardev msmouse,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     >     +    "         [,close-on-cpr=on|off]\n"
>     >          "-chardev vc,id=id[[,width=width][,height=height]][[,cols=cols][,rows=rows]]\n"
>     >          "         [,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     >     +    "         [,close-on-cpr=on|off]\n"
>     >          "-chardev ringbuf,id=id[,size=size][,logfile=PATH][,logappend=on|off]\n"
>     >     +    "         [,close-on-cpr=on|off]\n"
>     >          "-chardev file,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     >     +    "         [,close-on-cpr=on|off]\n"
>     >          "-chardev pipe,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     >     +    "         [,close-on-cpr=on|off]\n"
>     >      #ifdef _WIN32
>     >          "-chardev console,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     >          "-chardev serial,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     >      #else
>     >          "-chardev pty,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     >     +    "         [,close-on-cpr=on|off]\n"
>     >          "-chardev stdio,id=id[,mux=on|off][,signal=on|off][,logfile=PATH][,logappend=on|off]\n"
>     >     +    "         [,close-on-cpr=on|off]\n"
>     >      #endif
>     >      #ifdef CONFIG_BRLAPI
>     >          "-chardev braille,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     >     +    "         [,close-on-cpr=on|off]\n"
>     >      #endif
>     >      #if defined(__linux__) || defined(__sun__) || defined(__FreeBSD__) \
>     >              || defined(__NetBSD__) || defined(__OpenBSD__) || defined(__DragonFly__)
>     >          "-chardev serial,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     >     +    "         [,close-on-cpr=on|off]\n"
>     >          "-chardev tty,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     >     +    "         [,close-on-cpr=on|off]\n"
>     >      #endif
>     >      #if defined(__linux__) || defined(__FreeBSD__) || defined(__DragonFly__)
>     >          "-chardev parallel,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     >     +    "         [,close-on-cpr=on|off]\n"
>     >          "-chardev parport,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>     >     +    "         [,close-on-cpr=on|off]\n"
>     >      #endif
>     >      #if defined(CONFIG_SPICE)
>     >          "-chardev spicevmc,id=id,name=name[,debug=debug][,logfile=PATH][,logappend=on|off]\n"
>     >     +    "         [,close-on-cpr=on|off]\n"
>     >          "-chardev spiceport,id=id,name=name[,debug=debug][,logfile=PATH][,logappend=on|off]\n"
>     >     +    "         [,close-on-cpr=on|off]\n"
>     >      #endif
>     >          , QEMU_ARCH_ALL
>     >      )
>     >     @@ -3245,6 +3259,10 @@ The general form of a character device option is:
>     >          ``logappend`` option controls whether the log file will be truncated
>     >          or appended to when opened.
>     >
>     >     +    Every backend supports the ``close-on-cpr`` option.  If on, the
>     >     +    device's descriptor is closed during cprsave, and reopened after exec.
>     >     +    This is useful for devices that do not support cpr.
>     >     +
>     >      The available backends are:
>     >
>     >      ``-chardev null,id=id``
>     >     --
>     >     1.8.3.1
>     >
>     >
>     >
>     >
>     > --
>     > Marc-André Lureau
> 
> 
> 
> -- 
> Marc-André Lureau



* Re: [PATCH V5 10/25] util: env var helpers
  2021-07-12 19:36       ` Marc-André Lureau
@ 2021-07-13 16:15         ` Steven Sistare
  0 siblings, 0 replies; 74+ messages in thread
From: Steven Sistare @ 2021-07-13 16:15 UTC (permalink / raw)
  To: Marc-André Lureau, Dr. David Alan Gilbert
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin, QEMU,
	Markus Armbruster, Alex Williamson, Stefan Hajnoczi,
	Paolo Bonzini, Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée

On 7/12/2021 3:36 PM, Marc-André Lureau wrote:
> Hi
> 
> On Mon, Jul 12, 2021 at 11:19 PM Steven Sistare <steven.sistare@oracle.com> wrote:
> 
>     On 7/8/2021 11:10 AM, Marc-André Lureau wrote:
>     > Hi
>     >
>     > On Wed, Jul 7, 2021 at 9:30 PM Steve Sistare <steven.sistare@oracle.com> wrote:
>     >
>     >     Add functions for saving fd's and other values in the environment via
>     >     setenv, and for reading them back via getenv.
>     >
>     >
>     > I understand that the rest of the series will rely on environment variables to associate and recover the child-passed FDs, but I am not really convinced that it is a good idea.
>     >
>     > Environment variables have a number of issues that we may encounter down the road: namespace, limits, concurrency, observability, etc. I wonder if the VMState couldn't have a section about the FD to recover. Or maybe just another shared memory region?
> 
>     They also have some advantages.  Their post-exec value can be observed via /proc/$pid/environ,
>     and modified values can be observed by calling printenv() in a debugger.  They are naturally carried
>     across exec, with no external file to create and potentially lose.  Lastly, libc already defines
>     put and get methods, so the additional layered code is small and simple.  The number of variables
>     is small, and I would rather not over-engineer an alternate solution until the env proves
>     inadequate.  The limits on env size are huge on Linux.  The limits are smaller on Windows, but
>     that is just one of multiple issues to be addressed to support live update on Windows.
> 
>     For the alternatives, shared memory is no more observable (maybe less) and also has no concurrency
>     protection.  VMstate does not help because the descriptors are needed before the vmstate file
>     is opened.
>  
> Why does it need to be "observable" from outside the process?
> 
> I meant memory to be shared between the qemu instances (without concurrency etc).
> 
> You would only need that memory fd to be passed as argument to the next qemu instance, to restore the rest of the contexts/fds I suppose.
> 
> I think we need to do this right, as it may have consequences for future updates. It's effectively a kind of protocol. We have better chances to handle different versions correctly by reusing VMState imho.

OK, I yield.  David also does not like using env vars here.  I'll define
accessors that manipulate a QLIST of struct {int fd, char *name}, create a
vmstate struct to describe it using VMSTATE_QLIST_V, and serialize to a memfd.

Sound OK?

- Steve
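
As a rough illustration of the approach proposed above, a name-to-fd list can
be written into a memfd whose descriptor stays open across exec.  This is only
a sketch under stated assumptions, not code from any version of the series: it
assumes Linux with glibc 2.27 or later for memfd_create(), it uses a flat
fixed-size record instead of QEMU's QLIST and VMState machinery, and the names
FdEntry, fd_list_save, fd_list_load, and fd_list_find are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define FD_NAME_MAX 64

/* Hypothetical fixed-size record: one preserved descriptor and its name. */
typedef struct FdEntry {
    int32_t fd;
    char name[FD_NAME_MAX];
} FdEntry;

/*
 * Write n entries into a new memfd and return its descriptor, or -1.
 * The memfd is created without MFD_CLOEXEC so it stays open across exec.
 */
static int fd_list_save(const FdEntry *list, uint32_t n)
{
    int mfd = memfd_create("cpr-fds", 0);
    if (mfd < 0) {
        return -1;
    }
    size_t len = n * sizeof(*list);
    if (write(mfd, &n, sizeof(n)) != (ssize_t)sizeof(n) ||
        write(mfd, list, len) != (ssize_t)len) {
        close(mfd);
        return -1;
    }
    return mfd;
}

/* Read back up to max entries; return the count, or -1 on error. */
static int fd_list_load(int mfd, FdEntry *list, uint32_t max)
{
    uint32_t n;
    if (pread(mfd, &n, sizeof(n), 0) != (ssize_t)sizeof(n) || n > max) {
        return -1;
    }
    size_t len = n * sizeof(*list);
    if (pread(mfd, list, len, sizeof(n)) != (ssize_t)len) {
        return -1;
    }
    return (int)n;
}

/* Look up a descriptor by name; return -1 if absent. */
static int fd_list_find(const FdEntry *list, int n, const char *name)
{
    for (int i = 0; i < n; i++) {
        if (strcmp(list[i].name, name) == 0) {
            return list[i].fd;
        }
    }
    return -1;
}
```

With this layout the new qemu would only need the memfd's number (for example
as a command-line argument, as suggested earlier in the thread) to recover
every other preserved descriptor by name.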



* Re: [PATCH V5 16/25] vfio-pci: cpr part 1
  2021-07-07 17:20 ` [PATCH V5 16/25] vfio-pci: cpr part 1 Steve Sistare
@ 2021-07-16 17:45   ` Alex Williamson
  2021-07-19 17:43     ` Steven Sistare
  2021-07-28  4:56   ` Zheng Chuan
  1 sibling, 1 reply; 74+ messages in thread
From: Alex Williamson @ 2021-07-16 17:45 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, qemu-devel, Eric Blake,
	Dr. David Alan Gilbert, Stefan Hajnoczi, Marc-André Lureau,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Markus Armbruster

On Wed,  7 Jul 2021 10:20:25 -0700
Steve Sistare <steven.sistare@oracle.com> wrote:
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 9220e64..40c882f 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -31,6 +31,7 @@
>  #include "exec/memory.h"
>  #include "exec/ram_addr.h"
>  #include "hw/hw.h"
> +#include "qemu/env.h"
>  #include "qemu/error-report.h"
>  #include "qemu/main-loop.h"
>  #include "qemu/range.h"
> @@ -440,6 +441,10 @@ static int vfio_dma_unmap(VFIOContainer *container,
>          return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
>      }
>  
> +    if (container->reused) {
> +        return 0;
> +    }
> +
>      while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
>          /*
>           * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c
> @@ -463,6 +468,11 @@ static int vfio_dma_unmap(VFIOContainer *container,
>          return -errno;
>      }
>  
> +    if (unmap.size != size) {
> +        warn_report("VFIO_UNMAP_DMA(0x%lx, 0x%lx) only unmaps 0x%llx",
> +                     iova, size, unmap.size);
> +    }
> +

I'm a tad nervous that we have paths that can trigger this, the ioctl
certainly supports that we can call it across multiple mappings and the
size returned is the sum of the previously mapped ranges that were
unmapped.  See for instance vfio_listener_region_del()'s use of this
function.


>      return 0;
>  }
>  
> @@ -477,6 +487,10 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
>          .size = size,
>      };
>  
> +    if (container->reused) {
> +        return 0;
> +    }
> +
>      if (!readonly) {
>          map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
>      }
> @@ -1603,6 +1617,10 @@ static int vfio_init_container(VFIOContainer *container, int group_fd,
>      if (iommu_type < 0) {
>          return iommu_type;
>      }
> +    if (container->reused) {
> +        container->iommu_type = iommu_type;
> +        return 0;
> +    }

How would this handle the case where SPAPR_TCE_v2 falls back to
SPAPR_TCE (v1)?


>  
>      ret = ioctl(group_fd, VFIO_GROUP_SET_CONTAINER, &container->fd);
>      if (ret) {
...
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 9fc12bc..0f5c542 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3264,6 +3272,61 @@ static Property vfio_pci_dev_properties[] = {
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> +static void vfio_merge_config(VFIOPCIDevice *vdev)
> +{
> +    PCIDevice *pdev = &vdev->pdev;
> +    int size = MIN(pci_config_size(pdev), vdev->config_size);
> +    uint8_t *phys_config = g_malloc(size);
> +    uint32_t mask;
> +    int ret, i;
> +
> +    ret = pread(vdev->vbasedev.fd, phys_config, size, vdev->config_offset);
> +    if (ret < size) {
> +        ret = ret < 0 ? errno : EFAULT;

Leaks phys_config

> +        error_report("failed to read device config space: %s", strerror(ret));
> +        return;
> +    }
> +
> +    for (i = 0; i < size; i++) {
> +        mask = vdev->emulated_config_bits[i];
> +        pdev->config[i] = (pdev->config[i] & mask) | (phys_config[i] & ~mask);
> +    }
> +
> +    g_free(phys_config);
> +}
> +
> +static int vfio_pci_post_load(void *opaque, int version_id)
> +{
> +    VFIOPCIDevice *vdev = opaque;
> +    PCIDevice *pdev = &vdev->pdev;
> +    bool enabled;
> +
> +    vfio_merge_config(vdev);
> +
> +    pdev->reused = false;
> +    enabled = pci_get_word(pdev->config + PCI_COMMAND) & PCI_COMMAND_MASTER;
> +    memory_region_set_enabled(&pdev->bus_master_enable_region, enabled);

This seems generic to any PCI device, I'm surprised we need to do it
explicitly.  Thanks,

Alex

> +
> +    return 0;
> +}
> +




* Re: [PATCH V5 17/25] vfio-pci: cpr part 2
  2021-07-07 17:20 ` [PATCH V5 17/25] vfio-pci: cpr part 2 Steve Sistare
@ 2021-07-16 20:51   ` Alex Williamson
  2021-07-19 17:44     ` Steven Sistare
  0 siblings, 1 reply; 74+ messages in thread
From: Alex Williamson @ 2021-07-16 20:51 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, qemu-devel, Eric Blake,
	Dr. David Alan Gilbert, Stefan Hajnoczi, Marc-André Lureau,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Markus Armbruster

On Wed,  7 Jul 2021 10:20:26 -0700
Steve Sistare <steven.sistare@oracle.com> wrote:

> Finish cpr for vfio-pci by preserving eventfd's and vector state.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  hw/vfio/pci.c | 118 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 116 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 0f5c542..07bd360 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
...
> @@ -3295,14 +3329,91 @@ static void vfio_merge_config(VFIOPCIDevice *vdev)
>      g_free(phys_config);
>  }
>  
> +static int vfio_pci_pre_save(void *opaque)
> +{
> +    VFIOPCIDevice *vdev = opaque;
> +    PCIDevice *pdev = &vdev->pdev;
> +    int i;
> +
> +    if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) {
> +        error_report("%s: cpr does not support vfio-pci INTX",
> +                     vdev->vbasedev.name);
> +    }

You're not only not supporting INTx, but devices that support INTx, so
this only works on VFs.  Why?  Is this just out of scope or is there
something fundamentally difficult about it?

This makes me suspect there's a gap in INTx routing setup if it's more
than just another eventfd to store and setup.  If we hot-add a device
using INTx after cpr restart, are we going to find problems?  Thanks,

Alex




* Re: [PATCH V5 16/25] vfio-pci: cpr part 1
  2021-07-16 17:45   ` Alex Williamson
@ 2021-07-19 17:43     ` Steven Sistare
  0 siblings, 0 replies; 74+ messages in thread
From: Steven Sistare @ 2021-07-19 17:43 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, qemu-devel, Eric Blake,
	Dr. David Alan Gilbert, Stefan Hajnoczi, Marc-André Lureau,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Markus Armbruster

On 7/16/2021 1:45 PM, Alex Williamson wrote:
> On Wed,  7 Jul 2021 10:20:25 -0700
> Steve Sistare <steven.sistare@oracle.com> wrote:
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 9220e64..40c882f 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -31,6 +31,7 @@
>>  #include "exec/memory.h"
>>  #include "exec/ram_addr.h"
>>  #include "hw/hw.h"
>> +#include "qemu/env.h"
>>  #include "qemu/error-report.h"
>>  #include "qemu/main-loop.h"
>>  #include "qemu/range.h"
>> @@ -440,6 +441,10 @@ static int vfio_dma_unmap(VFIOContainer *container,
>>          return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
>>      }
>>  
>> +    if (container->reused) {
>> +        return 0;
>> +    }
>> +
>>      while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
>>          /*
>>           * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c
>> @@ -463,6 +468,11 @@ static int vfio_dma_unmap(VFIOContainer *container,
>>          return -errno;
>>      }
>>  
>> +    if (unmap.size != size) {
>> +        warn_report("VFIO_UNMAP_DMA(0x%lx, 0x%lx) only unmaps 0x%llx",
>> +                     iova, size, unmap.size);
>> +    }
>> +
> 
> I'm a tad nervous that we have paths that can trigger this, the ioctl
> certainly supports that we can call it across multiple mappings and the
> size returned is the sum of the previously mapped ranges that were
> unmapped.  See for instance vfio_listener_region_del()'s use of this
> function.

OK, I'll remove the warning.

>>      return 0;
>>  }
>>  
>> @@ -477,6 +487,10 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
>>          .size = size,
>>      };
>>  
>> +    if (container->reused) {
>> +        return 0;
>> +    }
>> +
>>      if (!readonly) {
>>          map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
>>      }
>> @@ -1603,6 +1617,10 @@ static int vfio_init_container(VFIOContainer *container, int group_fd,
>>      if (iommu_type < 0) {
>>          return iommu_type;
>>      }
>> +    if (container->reused) {
>> +        container->iommu_type = iommu_type;
>> +        return 0;
>> +    }
> 
> How would this handle the case where SPAPR_TCE_v2 falls back to
> SPAPR_TCE (v1)?

I am assuming that if SPAPR supports live update, it will be supported by the V2
interface and not by V1.  That works well here because reused will always be false
for V1.

If we cannot make that assumption, then this needs work.  Regardless, the qemu SPAPR
code will need additional changes to support live update.

>>      ret = ioctl(group_fd, VFIO_GROUP_SET_CONTAINER, &container->fd);
>>      if (ret) {
> ...
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 9fc12bc..0f5c542 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3264,6 +3272,61 @@ static Property vfio_pci_dev_properties[] = {
>>      DEFINE_PROP_END_OF_LIST(),
>>  };
>>  
>> +static void vfio_merge_config(VFIOPCIDevice *vdev)
>> +{
>> +    PCIDevice *pdev = &vdev->pdev;
>> +    int size = MIN(pci_config_size(pdev), vdev->config_size);
>> +    uint8_t *phys_config = g_malloc(size);
>> +    uint32_t mask;
>> +    int ret, i;
>> +
>> +    ret = pread(vdev->vbasedev.fd, phys_config, size, vdev->config_offset);
>> +    if (ret < size) {
>> +        ret = ret < 0 ? errno : EFAULT;
> 
> Leaks phys_config

Will fix, thanks.

>> +        error_report("failed to read device config space: %s", strerror(ret));
>> +        return;
>> +    }
>> +
>> +    for (i = 0; i < size; i++) {
>> +        mask = vdev->emulated_config_bits[i];
>> +        pdev->config[i] = (pdev->config[i] & mask) | (phys_config[i] & ~mask);
>> +    }
>> +
>> +    g_free(phys_config);
>> +}
>> +
>> +static int vfio_pci_post_load(void *opaque, int version_id)
>> +{
>> +    VFIOPCIDevice *vdev = opaque;
>> +    PCIDevice *pdev = &vdev->pdev;
>> +    bool enabled;
>> +
>> +    vfio_merge_config(vdev);
>> +
>> +    pdev->reused = false;
>> +    enabled = pci_get_word(pdev->config + PCI_COMMAND) & PCI_COMMAND_MASTER;
>> +    memory_region_set_enabled(&pdev->bus_master_enable_region, enabled);
> 
> This seems generic to any PCI device, I'm surprised we need to do it
> explicitly.  Thanks,

This is a remnant from before I added VMSTATE_PCI_DEVICE to vfio_pci_vmstate.
I will delete it, thanks.

- Steve



* Re: [PATCH V5 17/25] vfio-pci: cpr part 2
  2021-07-16 20:51   ` Alex Williamson
@ 2021-07-19 17:44     ` Steven Sistare
  2021-07-19 18:10       ` Alex Williamson
  0 siblings, 1 reply; 74+ messages in thread
From: Steven Sistare @ 2021-07-19 17:44 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, qemu-devel, Eric Blake,
	Dr. David Alan Gilbert, Stefan Hajnoczi, Marc-André Lureau,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Markus Armbruster

On 7/16/2021 4:51 PM, Alex Williamson wrote:
> On Wed,  7 Jul 2021 10:20:26 -0700
> Steve Sistare <steven.sistare@oracle.com> wrote:
> 
>> Finish cpr for vfio-pci by preserving eventfd's and vector state.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>  hw/vfio/pci.c | 118 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 116 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 0f5c542..07bd360 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
> ...
>> @@ -3295,14 +3329,91 @@ static void vfio_merge_config(VFIOPCIDevice *vdev)
>>      g_free(phys_config);
>>  }
>>  
>> +static int vfio_pci_pre_save(void *opaque)
>> +{
>> +    VFIOPCIDevice *vdev = opaque;
>> +    PCIDevice *pdev = &vdev->pdev;
>> +    int i;
>> +
>> +    if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) {
>> +        error_report("%s: cpr does not support vfio-pci INTX",
>> +                     vdev->vbasedev.name);
>> +    }
> 
> You're not only not supporting INTx, but devices that support INTx, so
> this only works on VFs.  Why?  Is this just out of scope or is there
> something fundamentally difficult about it?
> 
> This makes me suspect there's a gap in INTx routing setup if it's more
> than just another eventfd to store and setup.  If we hot-add a device
> using INTx after cpr restart, are we going to find problems?  Thanks,

It could be supported, but requires more code (several event fd's plus other state in VFIOINTx
to save and restore) for a case that does not seem very useful (a directly assigned device that
only supports INTx ?). 

Hot add of such a device after cpr restart is allowed and works.  The next cpr restart operation
would fail with an error message without harming the guest.  However, I should add a check
to prevent the device from being added if only-cpr-capable is specified, in device_set_realized,
like check_only_migratable.

- Steve



* Re: [PATCH V5 17/25] vfio-pci: cpr part 2
  2021-07-19 17:44     ` Steven Sistare
@ 2021-07-19 18:10       ` Alex Williamson
  2021-07-19 18:38         ` Steven Sistare
  0 siblings, 1 reply; 74+ messages in thread
From: Alex Williamson @ 2021-07-19 18:10 UTC (permalink / raw)
  To: Steven Sistare
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, qemu-devel, Eric Blake,
	Dr. David Alan Gilbert, Stefan Hajnoczi, Marc-André Lureau,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Markus Armbruster

On Mon, 19 Jul 2021 13:44:08 -0400
Steven Sistare <steven.sistare@oracle.com> wrote:

> On 7/16/2021 4:51 PM, Alex Williamson wrote:
> > On Wed,  7 Jul 2021 10:20:26 -0700
> > Steve Sistare <steven.sistare@oracle.com> wrote:
> >   
> >> Finish cpr for vfio-pci by preserving eventfd's and vector state.
> >>
> >> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> >> ---
> >>  hw/vfio/pci.c | 118 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >>  1 file changed, 116 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >> index 0f5c542..07bd360 100644
> >> --- a/hw/vfio/pci.c
> >> +++ b/hw/vfio/pci.c  
> > ...  
> >> @@ -3295,14 +3329,91 @@ static void vfio_merge_config(VFIOPCIDevice *vdev)
> >>      g_free(phys_config);
> >>  }
> >>  
> >> +static int vfio_pci_pre_save(void *opaque)
> >> +{
> >> +    VFIOPCIDevice *vdev = opaque;
> >> +    PCIDevice *pdev = &vdev->pdev;
> >> +    int i;
> >> +
> >> +    if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) {
> >> +        error_report("%s: cpr does not support vfio-pci INTX",
> >> +                     vdev->vbasedev.name);
> >> +    }  
> > 
> > You're not only not supporting INTx, but devices that support INTx, so
> > this only works on VFs.  Why?  Is this just out of scope or is there
> > something fundamentally difficult about it?
> > 
> > This makes me suspect there's a gap in INTx routing setup if it's more
> > than just another eventfd to store and setup.  If we hot-add a device
> > using INTx after cpr restart, are we going to find problems?  Thanks,  
> 
> It could be supported, but requires more code (several event fd's plus other state in VFIOINTx
> to save and restore) for a case that does not seem very useful (a directly assigned device that
> only supports INTx ?). 

It's not testing that the device *only* supports INTx, it's testing
that the device supports INTx _at_all_.  That effectively means this
excludes anything other than an SR-IOV VF.  There are plenty of valid
and useful cases of assigning PFs, most of which support INTx even if
we don't expect that's their primary operational mode.  Thanks,

Alex
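
The distinction drawn above can be made concrete with the standard config
header fields.  A nonzero Interrupt Pin register only says the function
implements INTx at all, which is what the quoted check tests.  Whether INTx is
currently usable is a narrower question.  The sketch below is an assumption-laden
illustration, not a proposal for the patch: the register offsets are the
standard ones from linux/pci_regs.h, the helper names are hypothetical, and the
thread does not settle whether the narrower test is the right condition for cpr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offsets and bits from the PCI configuration header (linux/pci_regs.h). */
#define PCI_COMMAND               0x04
#define PCI_COMMAND_INTX_DISABLE  0x400
#define PCI_INTERRUPT_PIN         0x3d

/* True if the function implements an INTx pin at all (INTA#..INTD#). */
static bool pci_has_intx_pin(const uint8_t *config)
{
    return config[PCI_INTERRUPT_PIN] != 0;
}

/*
 * True if a pin exists and the INTx Disable bit in the command register
 * is clear.  This is a narrower, hypothetical test, not what the quoted
 * patch checks.
 */
static bool pci_intx_enabled(const uint8_t *config)
{
    uint16_t cmd = config[PCI_COMMAND] | (config[PCI_COMMAND + 1] << 8);

    return pci_has_intx_pin(config) && !(cmd & PCI_COMMAND_INTX_DISABLE);
}
```

A PF running in MSI or MSI-X mode would typically pass the first test and fail
the second, which is the population the quoted check currently excludes.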




* Re: [PATCH V5 17/25] vfio-pci: cpr part 2
  2021-07-19 18:10       ` Alex Williamson
@ 2021-07-19 18:38         ` Steven Sistare
  2021-07-28  4:56           ` Zheng Chuan
  0 siblings, 1 reply; 74+ messages in thread
From: Steven Sistare @ 2021-07-19 18:38 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, qemu-devel, Eric Blake,
	Dr. David Alan Gilbert, Stefan Hajnoczi, Marc-André Lureau,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Markus Armbruster

On 7/19/2021 2:10 PM, Alex Williamson wrote:
> On Mon, 19 Jul 2021 13:44:08 -0400
> Steven Sistare <steven.sistare@oracle.com> wrote:
> 
>> On 7/16/2021 4:51 PM, Alex Williamson wrote:
>>> On Wed,  7 Jul 2021 10:20:26 -0700
>>> Steve Sistare <steven.sistare@oracle.com> wrote:
>>>   
>>>> Finish cpr for vfio-pci by preserving eventfd's and vector state.
>>>>
>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>> ---
>>>>  hw/vfio/pci.c | 118 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>>  1 file changed, 116 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>> index 0f5c542..07bd360 100644
>>>> --- a/hw/vfio/pci.c
>>>> +++ b/hw/vfio/pci.c  
>>> ...  
>>>> @@ -3295,14 +3329,91 @@ static void vfio_merge_config(VFIOPCIDevice *vdev)
>>>>      g_free(phys_config);
>>>>  }
>>>>  
>>>> +static int vfio_pci_pre_save(void *opaque)
>>>> +{
>>>> +    VFIOPCIDevice *vdev = opaque;
>>>> +    PCIDevice *pdev = &vdev->pdev;
>>>> +    int i;
>>>> +
>>>> +    if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) {
>>>> +        error_report("%s: cpr does not support vfio-pci INTX",
>>>> +                     vdev->vbasedev.name);
>>>> +    }  
>>>
>>> You're not only not supporting INTx, but devices that support INTx, so
>>> this only works on VFs.  Why?  Is this just out of scope or is there
>>> something fundamentally difficult about it?
>>>
>>> This makes me suspect there's a gap in INTx routing setup if it's more
>>> than just another eventfd to store and setup.  If we hot-add a device
>>> using INTx after cpr restart, are we going to find problems?  Thanks,  
>>
>> It could be supported, but requires more code (several event fd's plus other state in VFIOINTx
>> to save and restore) for a case that does not seem very useful (a directly assigned device that
>> only supports INTx ?). 
> 
> It's not testing that the device *only* supports INTx, it's testing
> that the device supports INTx _at_all_.  That effectively means this
> excludes anything other than an SR-IOV VF.  There are plenty of valid
> and useful cases of assigning PFs, most of which support INTx even if
> we don't expect that's their primary operational mode.  Thanks,

OK, I'll look into it.  If this proves problematic, how do you feel about deferring
INTx support to a later patch?

- Steve



* Re: [PATCH V5 04/25] cpr: HMP interfaces for reboot
  2021-07-07 17:20 ` [PATCH V5 04/25] cpr: HMP " Steve Sistare
@ 2021-07-28  4:55   ` Zheng Chuan
  0 siblings, 0 replies; 74+ messages in thread
From: Zheng Chuan @ 2021-07-28  4:55 UTC (permalink / raw)
  To: Steve Sistare, qemu-devel
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin,
	Dr. David Alan Gilbert, Markus Armbruster, Alex Williamson,
	Paolo Bonzini, Stefan Hajnoczi, Marc-André Lureau,
	Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée

Hi

On 2021/7/8 1:20, Steve Sistare wrote:
> cprsave <file> <mode>
>   Call cprsave().
>   Arguments:
>     file : save vmstate to this file name
>     mode: must be "reboot"
> 
> cprload <file>
>   Call cprload().
>   Arguments:
>     file : load vmstate from this file name
> 
> cprinfo
>   Print to stdout a space-delimited list of modes supported by cprsave.
>   Arguments: none
> 
> Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  hmp-commands.hx       | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  include/monitor/hmp.h |  3 +++
>  monitor/hmp-cmds.c    | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 95 insertions(+)
> 
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index 8e45bce..11827ae 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -351,6 +351,50 @@ SRST
>  ERST
>  
>      {
> +        .name       = "cprinfo",
> +        .args_type  = "",
> +        .params     = "",
> +        .help       = "return list of modes supported by cprsave",
> +        .cmd        = hmp_cprinfo,
> +    },
> +
> +SRST
> +``cprinfo``
> +Return a space-delimited list of modes supported by cprsave.
> +ERST
> +
> +    {
> +        .name       = "cprsave",
> +        .args_type  = "file:s,mode:s",
> +        .params     = "file 'reboot'",
> +        .help       = "create a checkpoint of the VM in file",
> +        .cmd        = hmp_cprsave,
> +    },
> +
> +SRST
> +``cprsave`` *file* *mode*
> +Pause the VCPUs,
> +create a checkpoint of the whole virtual machine, and save it in *file*.
> +If *mode* is 'reboot', the checkpoint remains valid after a host kexec
> +reboot, and guest ram must be backed by persistant shared memory.  To

Should be persistent.

> +resume from the checkpoint, issue the quit command, reboot the system,
> +and issue the cprload command.
> +ERST
> +
> +    {
> +        .name       = "cprload",
> +        .args_type  = "file:s",
> +        .params     = "file",
> +        .help       = "load VM checkpoint from file",
> +        .cmd        = hmp_cprload,
> +    },
> +
> +SRST
> +``cprload`` *file*
> +Load a virtual machine from checkpoint file *file* and continue VCPUs.
> +ERST
> +
> +    {
>          .name       = "delvm",
>          .args_type  = "name:s",
>          .params     = "tag",
> diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
> index 3baa105..98bb775 100644
> --- a/include/monitor/hmp.h
> +++ b/include/monitor/hmp.h
> @@ -58,6 +58,9 @@ void hmp_balloon(Monitor *mon, const QDict *qdict);
>  void hmp_loadvm(Monitor *mon, const QDict *qdict);
>  void hmp_savevm(Monitor *mon, const QDict *qdict);
>  void hmp_delvm(Monitor *mon, const QDict *qdict);
> +void hmp_cprinfo(Monitor *mon, const QDict *qdict);
> +void hmp_cprsave(Monitor *mon, const QDict *qdict);
> +void hmp_cprload(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_continue(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_incoming(Monitor *mon, const QDict *qdict);
> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> index 0942027..8e80581 100644
> --- a/monitor/hmp-cmds.c
> +++ b/monitor/hmp-cmds.c
> @@ -33,6 +33,7 @@
>  #include "qapi/qapi-commands-block.h"
>  #include "qapi/qapi-commands-char.h"
>  #include "qapi/qapi-commands-control.h"
> +#include "qapi/qapi-commands-cpr.h"
>  #include "qapi/qapi-commands-machine.h"
>  #include "qapi/qapi-commands-migration.h"
>  #include "qapi/qapi-commands-misc.h"
> @@ -1177,6 +1178,53 @@ void hmp_announce_self(Monitor *mon, const QDict *qdict)
>      qapi_free_AnnounceParameters(params);
>  }
>  
> +void hmp_cprinfo(Monitor *mon, const QDict *qdict)
> +{
> +    Error *err = NULL;
> +    CprInfo *cprinfo;
> +    CprModeList *mode;
> +
> +    cprinfo = qmp_cprinfo(&err);
> +    if (err) {
> +        goto out;
> +    }
> +
> +    for (mode = cprinfo->modes; mode; mode = mode->next) {
> +        monitor_printf(mon, "%s ", CprMode_str(mode->value));
> +    }
> +
> +out:
> +    hmp_handle_error(mon, err);
> +    qapi_free_CprInfo(cprinfo);
> +}
> +
> +void hmp_cprsave(Monitor *mon, const QDict *qdict)
> +{
> +    Error *err = NULL;
> +    const char *mode;
> +    int val;
> +
> +    mode = qdict_get_try_str(qdict, "mode");
> +    val = qapi_enum_parse(&CprMode_lookup, mode, -1, &err);
> +
> +    if (val == -1) {
> +        goto out;
> +    }
> +
> +    qmp_cprsave(qdict_get_try_str(qdict, "file"), val, &err);
> +
> +out:
> +    hmp_handle_error(mon, err);
> +}
> +
> +void hmp_cprload(Monitor *mon, const QDict *qdict)
> +{
> +    Error *err = NULL;
> +
> +    qmp_cprload(qdict_get_try_str(qdict, "file"), &err);
> +    hmp_handle_error(mon, err);
> +}
> +
>  void hmp_migrate_cancel(Monitor *mon, const QDict *qdict)
>  {
>      qmp_migrate_cancel(NULL);
> 

-- 
Regards.
Chuan



* Re: [PATCH V5 13/25] cpr: HMP interfaces for restart
  2021-07-07 17:20 ` [PATCH V5 13/25] cpr: HMP " Steve Sistare
@ 2021-07-28  4:56   ` Zheng Chuan
  0 siblings, 0 replies; 74+ messages in thread
From: Zheng Chuan @ 2021-07-28  4:56 UTC (permalink / raw)
  To: Steve Sistare, qemu-devel
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin,
	Dr. David Alan Gilbert, Markus Armbruster, Alex Williamson,
	Paolo Bonzini, Stefan Hajnoczi, Marc-André Lureau,
	Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée

Hi

On 2021/7/8 1:20, Steve Sistare wrote:
> cprsave <file> <mode>
>   mode may be "restart"
> 
> cprexec <command>
>   Call cprexec().
>   Arguments:
>     command : command line to execute, with space-separated arguments
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  hmp-commands.hx       | 20 +++++++++++++++++++-
>  include/monitor/hmp.h |  1 +
>  monitor/hmp-cmds.c    | 11 +++++++++++
>  3 files changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index 11827ae..d956405 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -366,7 +366,7 @@ ERST
>      {
>          .name       = "cprsave",
>          .args_type  = "file:s,mode:s",
> -        .params     = "file 'reboot'",
> +        .params     = "file 'restart'|'reboot'",
>          .help       = "create a checkpoint of the VM in file",
>          .cmd        = hmp_cprsave,
>      },
> @@ -379,6 +379,24 @@ If *mode* is 'reboot', the checkpoint remains valid after a host kexec
>  reboot, and guest ram must be backed by persistant shared memory.  To
Same here: this should be "persistent".
>  resume from the checkpoint, issue the quit command, reboot the system,
>  and issue the cprload command.
> +
> +If *mode* is 'restart', the checkpoint remains valid after restarting qemu,
> +and guest ram must be allocated with the memfd-alloc machine option.  To
> +resume from the checkpoint, issue the cprexec command to restart, and issue
> +the cprload command.
> +ERST
> +
> +    {
> +        .name       = "cprexec",
> +        .args_type  = "command:S",
> +        .params     = "command",
> +        .help       = "Restart qemu by directly exec'ing command",
> +        .cmd        = hmp_cprexec,
> +    },
> +
> +SRST
> +``cprexec`` *command*
> +Restart qemu by directly exec'ing *command*, replacing the qemu process.
>  ERST
>  
>      {
> diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
> index 98bb775..ffc5eb1 100644
> --- a/include/monitor/hmp.h
> +++ b/include/monitor/hmp.h
> @@ -60,6 +60,7 @@ void hmp_savevm(Monitor *mon, const QDict *qdict);
>  void hmp_delvm(Monitor *mon, const QDict *qdict);
>  void hmp_cprinfo(Monitor *mon, const QDict *qdict);
>  void hmp_cprsave(Monitor *mon, const QDict *qdict);
> +void hmp_cprexec(Monitor *mon, const QDict *qdict);
>  void hmp_cprload(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_continue(Monitor *mon, const QDict *qdict);
> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> index a56f83c..163564e 100644
> --- a/monitor/hmp-cmds.c
> +++ b/monitor/hmp-cmds.c
> @@ -1217,6 +1217,17 @@ out:
>      hmp_handle_error(mon, err);
>  }
>  
> +void hmp_cprexec(Monitor *mon, const QDict *qdict)
> +{
> +    Error *err = NULL;
> +    const char *command = qdict_get_try_str(qdict, "command");
> +    strList *args = strList_from_string(command, ' ');
> +
> +    qmp_cprexec(args, &err);
> +    qapi_free_strList(args);
> +    hmp_handle_error(mon, err);
> +}
> +
>  void hmp_cprload(Monitor *mon, const QDict *qdict)
>  {
>      Error *err = NULL;
> 
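
A note on the space-separated splitting in hmp_cprexec() above.  The
strList_from_string() helper is not shown in this message, so the sketch
below is only an assumption about what a plain split on ' ' would do; the
demo_* functions are hypothetical and skip empty tokens, which the real
helper may or may not do.  The point it illustrates is that quoting is not
interpreted, so an argument containing a space ends up as two arguments.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split cmd on spaces into a NULL-terminated, heap-allocated array.
 * Hypothetical stand-in for strList_from_string(command, ' ').
 * Quotes are NOT interpreted.  Free the result with demo_free_args().
 */
char **demo_split_args(const char *cmd, size_t *count)
{
    size_t cap = 4, n = 0;
    char **argv = malloc(cap * sizeof(*argv));

    assert(argv);
    while (*cmd) {
        size_t len = strcspn(cmd, " ");   /* length of the next token */

        if (len > 0) {
            if (n + 2 > cap) {            /* keep room for the NULL slot */
                cap *= 2;
                argv = realloc(argv, cap * sizeof(*argv));
                assert(argv);
            }
            argv[n] = malloc(len + 1);
            assert(argv[n]);
            memcpy(argv[n], cmd, len);
            argv[n][len] = '\0';
            n++;
        }
        cmd += len;
        if (*cmd == ' ') {
            cmd++;                        /* step over the separator */
        }
    }
    argv[n] = NULL;
    *count = n;
    return argv;
}

/* Release an array returned by demo_split_args(). */
void demo_free_args(char **argv)
{
    for (size_t i = 0; argv[i]; i++) {
        free(argv[i]);
    }
    free(argv);
}
```

Under that assumption, a command such as: qemu -name "my vm" would reach
exec as four arguments rather than three, so callers would need to avoid
arguments with embedded spaces.
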

-- 
Regards.
Chuan



* Re: [PATCH V5 16/25] vfio-pci: cpr part 1
  2021-07-07 17:20 ` [PATCH V5 16/25] vfio-pci: cpr part 1 Steve Sistare
  2021-07-16 17:45   ` Alex Williamson
@ 2021-07-28  4:56   ` Zheng Chuan
  2021-07-30 12:50     ` Steven Sistare
  1 sibling, 1 reply; 74+ messages in thread
From: Zheng Chuan @ 2021-07-28  4:56 UTC (permalink / raw)
  To: Steve Sistare, qemu-devel
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin,
	Dr. David Alan Gilbert, Markus Armbruster, Alex Williamson,
	Paolo Bonzini, Stefan Hajnoczi, Marc-André Lureau,
	Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée

Hi

On 2021/7/8 1:20, Steve Sistare wrote:
> Enable vfio-pci devices to be saved and restored across an exec restart
> of qemu.
> 
> At vfio creation time, save the value of vfio container, group, and device
> descriptors in the environment.
> 
> In cprsave and cprexec, suspend the use of virtual addresses in DMA
> mappings with VFIO_DMA_UNMAP_FLAG_VADDR, because guest ram will be remapped
> at a different VA after exec.  DMA to already-mapped pages continues.  Save
> the msi message area as part of vfio-pci vmstate, save the interrupt and
> notifier eventfd's in the environment, and clear the close-on-exec flag
> for the vfio descriptors.  The flag is not cleared earlier because the
> descriptors should not persist across miscellaneous fork and exec calls
> that may be performed during normal operation.
> 
> On qemu restart, vfio_realize() finds the descriptor env vars, uses
> the descriptors, and notes that the device is being reused.  Device and
> iommu state is already configured, so operations in vfio_realize that
> would modify the configuration are skipped for a reused device, including
> vfio ioctl's and writes to PCI configuration space.  The result is that
> vfio_realize constructs qemu data structures that reflect the current
> state of the device.  However, the reconstruction is not complete until
> cprload is called. cprload loads the msi data and finds eventfds in the
> environment.  It rebuilds vector data structures and attaches the
> interrupts to the new KVM instance.  cprload then walks the flattened
> ranges of the vfio_address_spaces and calls VFIO_DMA_MAP_FLAG_VADDR to
> inform the kernel of the new VA's.  Lastly, it starts the VM and suppresses
> vfio device reset.
> 
> This functionality is delivered by 2 patches for clarity.  Part 2 adds
> eventfd and vector support.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  MAINTAINERS                   |   1 +
>  hw/pci/pci.c                  |   4 ++
>  hw/vfio/common.c              |  69 +++++++++++++++++--
>  hw/vfio/cpr.c                 | 154 ++++++++++++++++++++++++++++++++++++++++++
>  hw/vfio/meson.build           |   1 +
>  hw/vfio/pci.c                 |  66 +++++++++++++++++-
>  hw/vfio/trace-events          |   1 +
>  include/hw/pci/pci.h          |   1 +
>  include/hw/vfio/vfio-common.h |   5 ++
>  include/migration/cpr.h       |   3 +
>  linux-headers/linux/vfio.h    |   6 ++
>  migration/cpr.c               |  20 ++++++
>  12 files changed, 323 insertions(+), 8 deletions(-)
>  create mode 100644 hw/vfio/cpr.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 8647a97..58479db 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2862,6 +2862,7 @@ CPR
>  M: Steve Sistare <steven.sistare@oracle.com>
>  M: Mark Kanda <mark.kanda@oracle.com>
>  S: Maintained
> +F: hw/vfio/cpr.c
>  F: include/migration/cpr.h
>  F: migration/cpr.c
>  F: qapi/cpr.json
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 2590898..fa4a439 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -307,6 +307,10 @@ static void pci_do_device_reset(PCIDevice *dev)
>  {
>      int r;
>  
> +    if (dev->reused) {
> +        return;
> +    }
> +
>      pci_device_deassert_intx(dev);
>      assert(dev->irq_state == 0);
>  
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 9220e64..40c882f 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -31,6 +31,7 @@
>  #include "exec/memory.h"
>  #include "exec/ram_addr.h"
>  #include "hw/hw.h"
> +#include "qemu/env.h"
>  #include "qemu/error-report.h"
>  #include "qemu/main-loop.h"
>  #include "qemu/range.h"
> @@ -440,6 +441,10 @@ static int vfio_dma_unmap(VFIOContainer *container,
>          return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
>      }
>  
> +    if (container->reused) {
> +        return 0;
> +    }
> +
>      while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
>          /*
>           * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c
> @@ -463,6 +468,11 @@ static int vfio_dma_unmap(VFIOContainer *container,
>          return -errno;
>      }
>  
> +    if (unmap.size != size) {
> +        warn_report("VFIO_UNMAP_DMA(0x%lx, 0x%lx) only unmaps 0x%llx",
> +                     iova, size, unmap.size);
> +    }
> +
>      return 0;
>  }
>  
> @@ -477,6 +487,10 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
>          .size = size,
>      };
>  
> +    if (container->reused) {
> +        return 0;
> +    }
> +
>      if (!readonly) {
>          map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
>      }
> @@ -1603,6 +1617,10 @@ static int vfio_init_container(VFIOContainer *container, int group_fd,
>      if (iommu_type < 0) {
>          return iommu_type;
>      }
> +    if (container->reused) {
> +        container->iommu_type = iommu_type;
> +        return 0;
> +    }
>  
>      ret = ioctl(group_fd, VFIO_GROUP_SET_CONTAINER, &container->fd);
>      if (ret) {
> @@ -1703,6 +1721,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>  {
>      VFIOContainer *container;
>      int ret, fd;
> +    bool reused;
> +    char name[40];
>      VFIOAddressSpace *space;
>  
>      space = vfio_get_address_space(as);
> @@ -1739,16 +1759,31 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>          return ret;
>      }
>  
> +    snprintf(name, sizeof(name), "vfio_container_for_group_%d", group->groupid);
> +    fd = getenv_fd(name);
> +    reused = (fd >= 0);
> +
>      QLIST_FOREACH(container, &space->containers, next) {
> -        if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
> -            group->container = container;
> -            QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> +        if (container->fd == fd ||
> +            !ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
> +            break;
> +        }
> +    }
> +
> +    if (container) {
> +        group->container = container;
> +        QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> +        if (!reused) {
>              vfio_kvm_device_add_group(group);
> -            return 0;
> +            setenv_fd(name, container->fd);
>          }
> +        return 0;
> +    }
> +
> +    if (!reused) {
> +        fd = qemu_open_old("/dev/vfio/vfio", O_RDWR);
>      }
>  
> -    fd = qemu_open_old("/dev/vfio/vfio", O_RDWR);
>      if (fd < 0) {
>          error_setg_errno(errp, errno, "failed to open /dev/vfio/vfio");
>          ret = -errno;
> @@ -1766,6 +1801,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>      container = g_malloc0(sizeof(*container));
>      container->space = space;
>      container->fd = fd;
> +    container->reused = reused;
>      container->error = NULL;
>      container->dirty_pages_supported = false;
>      QLIST_INIT(&container->giommu_list);
> @@ -1893,6 +1929,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>      }
>  
>      container->initialized = true;
> +    setenv_fd(name, fd);
>  
>      return 0;
>  listener_release_exit:
> @@ -1920,6 +1957,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
>  
>      QLIST_REMOVE(group, container_next);
>      group->container = NULL;
> +    unsetenv_fdv("vfio_container_for_group_%d", group->groupid);
>  
>      /*
>       * Explicitly release the listener first before unset container,
> @@ -1978,7 +2016,12 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
>      group = g_malloc0(sizeof(*group));
>  
>      snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
> -    group->fd = qemu_open_old(path, O_RDWR);
> +
> +    group->fd = getenv_fd(path);
> +    if (group->fd < 0) {
> +        group->fd = qemu_open_old(path, O_RDWR);
> +    }
> +
>      if (group->fd < 0) {
>          error_setg_errno(errp, errno, "failed to open %s", path);
>          goto free_group_exit;
> @@ -2012,6 +2055,8 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
>  
>      QLIST_INSERT_HEAD(&vfio_group_list, group, next);
>  
> +    setenv_fd(path, group->fd);
> +
>      return group;
>  
>  close_fd_exit:
> @@ -2036,6 +2081,7 @@ void vfio_put_group(VFIOGroup *group)
>      vfio_disconnect_container(group);
>      QLIST_REMOVE(group, next);
>      trace_vfio_put_group(group->fd);
> +    unsetenv_fdv("/dev/vfio/%d", group->groupid);
>      close(group->fd);
>      g_free(group);
>  
> @@ -2049,8 +2095,14 @@ int vfio_get_device(VFIOGroup *group, const char *name,
>  {
>      struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
>      int ret, fd;
> +    bool reused;
> +
> +    fd = getenv_fd(name);
> +    reused = (fd >= 0);
> +    if (!reused) {
> +        fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
> +    }
>  
> -    fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
>      if (fd < 0) {
>          error_setg_errno(errp, errno, "error getting device from group %d",
>                           group->groupid);
> @@ -2095,6 +2147,8 @@ int vfio_get_device(VFIOGroup *group, const char *name,
>      vbasedev->num_irqs = dev_info.num_irqs;
>      vbasedev->num_regions = dev_info.num_regions;
>      vbasedev->flags = dev_info.flags;
> +    vbasedev->reused = reused;
> +    setenv_fd(name, fd);
>  
>      trace_vfio_get_device(name, dev_info.flags, dev_info.num_regions,
>                            dev_info.num_irqs);
> @@ -2111,6 +2165,7 @@ void vfio_put_base_device(VFIODevice *vbasedev)
>      QLIST_REMOVE(vbasedev, next);
>      vbasedev->group = NULL;
>      trace_vfio_put_base_device(vbasedev->fd);
> +    unsetenv_fd(vbasedev->name);
>      close(vbasedev->fd);
>  }
>  
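
The hunks above hand descriptors to the new qemu through the environment,
and the commit message says the close-on-exec flag is cleared only at
cprexec time.  A minimal sketch of that idea follows.  The DEMO_FD_ prefix
and the demo_* functions are hypothetical; the real getenv_fd(),
setenv_fd() and preserve_fd() come from other patches in the series and
may differ in naming and error handling.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Record fd in the environment under a hypothetical DEMO_FD_<name> key. */
int demo_setenv_fd(const char *name, int fd)
{
    char key[128], val[16];

    snprintf(key, sizeof(key), "DEMO_FD_%s", name);
    snprintf(val, sizeof(val), "%d", fd);
    return setenv(key, val, 1);
}

/* Return the fd recorded for name, or -1 if none was recorded. */
int demo_getenv_fd(const char *name)
{
    char key[128];
    const char *val;

    snprintf(key, sizeof(key), "DEMO_FD_%s", name);
    val = getenv(key);
    return val ? atoi(val) : -1;
}

/* Clear FD_CLOEXEC so that fd stays open across a later exec. */
int demo_preserve_fd(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}
```

Deferring demo_preserve_fd() until just before the exec matches the stated
rationale: descriptors keep close-on-exec during normal operation, so they
do not leak into unrelated fork and exec calls.
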
> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
> new file mode 100644
> index 0000000..28f8a76
> --- /dev/null
> +++ b/hw/vfio/cpr.c
> @@ -0,0 +1,154 @@
> +/*
> + * Copyright (c) 2021 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include <sys/ioctl.h>
> +#include <linux/vfio.h>
> +#include "hw/vfio/vfio-common.h"
> +#include "sysemu/kvm.h"
> +#include "qapi/error.h"
> +#include "trace.h"
> +
> +static int
> +vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
> +{
> +    struct vfio_iommu_type1_dma_unmap unmap = {
> +        .argsz = sizeof(unmap),
> +        .flags = VFIO_DMA_UNMAP_FLAG_VADDR | VFIO_DMA_UNMAP_FLAG_ALL,
> +        .iova = 0,
> +        .size = 0,
> +    };
> +    if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
> +        error_setg_errno(errp, errno, "vfio_dma_unmap_vaddr_all");
> +        return -errno;
> +    }
> +    return 0;
> +}
> +
> +static int vfio_dma_map_vaddr(VFIOContainer *container, hwaddr iova,
> +                              ram_addr_t size, void *vaddr,
> +                              Error **errp)
> +{
> +    struct vfio_iommu_type1_dma_map map = {
> +        .argsz = sizeof(map),
> +        .flags = VFIO_DMA_MAP_FLAG_VADDR,
> +        .vaddr = (__u64)(uintptr_t)vaddr,
> +        .iova = iova,
> +        .size = size,
> +    };
> +    if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map)) {
> +        error_setg_errno(errp, errno,
> +                         "vfio_dma_map_vaddr(iova %lu, size %ld, va %p)",
> +                         iova, size, vaddr);
> +        return -errno;
> +    }
> +    return 0;
> +}
> +
> +static int
> +vfio_region_remap(MemoryRegionSection *section, void *handle, Error **errp)
> +{
> +    MemoryRegion *mr = section->mr;
> +    VFIOContainer *container = handle;
> +    const char *name = memory_region_name(mr);
> +    ram_addr_t size = int128_get64(section->size);
> +    hwaddr offset, iova, roundup;
> +    void *vaddr;
> +
> +    if (vfio_listener_skipped_section(section) || memory_region_is_iommu(mr)) {
> +        return 0;
> +    }
> +
> +    offset = section->offset_within_address_space;
> +    iova = TARGET_PAGE_ALIGN(offset);
> +    roundup = iova - offset;
> +    size = (size - roundup) & TARGET_PAGE_MASK;
> +    vaddr = memory_region_get_ram_ptr(mr) +
> +            section->offset_within_region + roundup;
> +
> +    trace_vfio_region_remap(name, container->fd, iova, iova + size - 1, vaddr);
> +    return vfio_dma_map_vaddr(container, iova, size, vaddr, errp);
> +}
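
The alignment arithmetic in vfio_region_remap() above is easier to check
with numbers.  The sketch below repeats the same three steps on plain
integers, assuming 4 KiB target pages; the DEMO_* macros and demo_trim()
are hypothetical names, not QEMU's TARGET_PAGE_* definitions.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL                  /* assumption: 4 KiB pages */
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))
#define DEMO_PAGE_ALIGN(a) (((a) + DEMO_PAGE_SIZE - 1) & DEMO_PAGE_MASK)

struct demo_range {
    uint64_t iova;      /* start rounded up to a page boundary */
    uint64_t roundup;   /* bytes skipped at the start */
    uint64_t size;      /* length rounded down to whole pages */
};

/*
 * Trim [offset, offset + size) to the whole pages it contains.
 * Like the quoted code, this assumes size >= roundup.
 */
struct demo_range demo_trim(uint64_t offset, uint64_t size)
{
    struct demo_range r;

    r.iova = DEMO_PAGE_ALIGN(offset);
    r.roundup = r.iova - offset;
    r.size = (size - r.roundup) & DEMO_PAGE_MASK;
    return r;
}
```

For a section at offset 0x1800 with size 0x3000, this gives iova 0x2000,
roundup 0x800 and size 0x2000, that is, the two whole pages inside the
section.
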
> +
> +bool vfio_cpr_capable(VFIOContainer *container, Error **errp)
> +{
> +    if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UPDATE_VADDR) ||
> +        !ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UNMAP_ALL)) {
> +        error_setg(errp, "VFIO container does not support VFIO_UPDATE_VADDR "
> +                         "or VFIO_UNMAP_ALL");
> +        return false;
> +    } else {
> +        return true;
> +    }
> +}
> +
> +int vfio_cprsave(Error **errp)
> +{
> +    VFIOAddressSpace *space, *last_space;
> +    VFIOContainer *container, *last_container;
> +
> +    QLIST_FOREACH(space, &vfio_address_spaces, list) {
> +        QLIST_FOREACH(container, &space->containers, next) {
> +            if (!vfio_cpr_capable(container, errp)) {
> +                return 1;
> +            }
> +        }
> +    }
> +
> +    QLIST_FOREACH(space, &vfio_address_spaces, list) {
> +        QLIST_FOREACH(container, &space->containers, next) {
> +            if (vfio_dma_unmap_vaddr_all(container, errp)) {
> +                goto unwind;
> +            }
> +        }
> +    }
> +    return 0;
> +
> +unwind:
> +    last_space = space;
> +    last_container = container;
> +    QLIST_FOREACH(space, &vfio_address_spaces, list) {
> +        QLIST_FOREACH(container, &space->containers, next) {
> +            Error *err;
> +
> +            if (space == last_space && container == last_container) {
> +                break;
> +            }
> +            if (as_flat_walk(space->as, vfio_region_remap, container, &err)) {
> +                error_prepend(errp, "%s", error_get_pretty(err));
> +                error_free(err);
> +            }
> +        }
> +    }
> +    return 1;
> +}
> +
> +int vfio_cprload(Error **errp)
> +{
> +    VFIOAddressSpace *space;
> +    VFIOContainer *container;
> +    VFIOGroup *group;
> +    VFIODevice *vbasedev;
> +
> +    QLIST_FOREACH(space, &vfio_address_spaces, list) {
> +        QLIST_FOREACH(container, &space->containers, next) {
> +            if (!vfio_cpr_capable(container, errp)) {
> +                return 1;
> +            }
> +            container->reused = false;
> +            if (as_flat_walk(space->as, vfio_region_remap, container, errp)) {
> +                return 1;
> +            }
> +        }
> +    }
> +    QLIST_FOREACH(group, &vfio_group_list, next) {
> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +            vbasedev->reused = false;
> +        }
> +    }
> +    return 0;
> +}
> diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
> index da9af29..e247b2b 100644
> --- a/hw/vfio/meson.build
> +++ b/hw/vfio/meson.build
> @@ -5,6 +5,7 @@ vfio_ss.add(files(
>    'migration.c',
>  ))
>  vfio_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files(
> +  'cpr.c',
>    'display.c',
>    'pci-quirks.c',
>    'pci.c',
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 9fc12bc..0f5c542 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -29,6 +29,8 @@
>  #include "hw/qdev-properties.h"
>  #include "hw/qdev-properties-system.h"
>  #include "migration/vmstate.h"
> +#include "migration/cpr.h"
> +#include "qemu/env.h"
>  #include "qemu/error-report.h"
>  #include "qemu/main-loop.h"
>  #include "qemu/module.h"
> @@ -1656,6 +1658,7 @@ static void vfio_bars_prepare(VFIOPCIDevice *vdev)
>  static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
>  {
>      VFIOBAR *bar = &vdev->bars[nr];
> +    PCIDevice *pdev = &vdev->pdev;
>      char *name;
>  
>      if (!bar->size) {
> @@ -1676,7 +1679,7 @@ static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
>          }
>      }
>  
> -    pci_register_bar(&vdev->pdev, nr, bar->type, bar->mr);
> +    pci_register_bar(pdev, nr, bar->type, bar->mr);
>  }
>  
>  static void vfio_bars_register(VFIOPCIDevice *vdev)
> @@ -2888,6 +2891,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>          vfio_put_group(group);
>          goto error;
>      }
> +    pdev->reused = vdev->vbasedev.reused;
>  
>      vfio_populate_device(vdev, &err);
>      if (err) {
> @@ -3157,6 +3161,10 @@ static void vfio_pci_reset(DeviceState *dev)
>  {
>      VFIOPCIDevice *vdev = VFIO_PCI(dev);
>  
> +    if (vdev->pdev.reused) {
> +        return;
> +    }
> +
>      trace_vfio_pci_reset(vdev->vbasedev.name);
>  
>      vfio_pci_pre_reset(vdev);
> @@ -3264,6 +3272,61 @@ static Property vfio_pci_dev_properties[] = {
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> +static void vfio_merge_config(VFIOPCIDevice *vdev)
> +{
> +    PCIDevice *pdev = &vdev->pdev;
> +    int size = MIN(pci_config_size(pdev), vdev->config_size);
> +    uint8_t *phys_config = g_malloc(size);
> +    uint32_t mask;
> +    int ret, i;
> +
> +    ret = pread(vdev->vbasedev.fd, phys_config, size, vdev->config_offset);
> +    if (ret < size) {
> +        ret = ret < 0 ? errno : EFAULT;
> +        error_report("failed to read device config space: %s", strerror(ret));
> +        return;
> +    }
> +
> +    for (i = 0; i < size; i++) {
> +        mask = vdev->emulated_config_bits[i];
> +        pdev->config[i] = (pdev->config[i] & mask) | (phys_config[i] & ~mask);
> +    }
> +
> +    g_free(phys_config);
> +}
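
The per-byte merge in vfio_merge_config() above keeps emulated bits from
qemu's copy and takes every other bit from the device.  A minimal
standalone sketch of that bit selection follows; demo_merge_config() and
its parameter names are hypothetical and operate on plain arrays rather
than on VFIOPCIDevice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * For each byte, keep the bits set in emulated_bits[i] from emulated[i]
 * and take all remaining bits from physical[i].
 */
void demo_merge_config(uint8_t *emulated, const uint8_t *physical,
                       const uint8_t *emulated_bits, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        uint8_t mask = emulated_bits[i];

        emulated[i] = (uint8_t)((emulated[i] & mask) |
                                (physical[i] & (uint8_t)~mask));
    }
}
```

With emulated byte 0xAA, physical byte 0x55 and mask 0xF0, the result is
0xA5: the high nibble comes from the emulated copy and the low nibble from
the device.
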
> +
> +static int vfio_pci_post_load(void *opaque, int version_id)
> +{
> +    VFIOPCIDevice *vdev = opaque;
> +    PCIDevice *pdev = &vdev->pdev;
> +    bool enabled;
> +
> +    vfio_merge_config(vdev);
> +
> +    pdev->reused = false;
> +    enabled = pci_get_word(pdev->config + PCI_COMMAND) & PCI_COMMAND_MASTER;
> +    memory_region_set_enabled(&pdev->bus_master_enable_region, enabled);
> +
> +    return 0;
> +}
> +
> +static bool vfio_pci_needed(void *opaque)
> +{
> +    return cpr_mode() == CPR_MODE_RESTART;
> +}
> +
> +static const VMStateDescription vfio_pci_vmstate = {
> +    .name = "vfio-pci",
> +    .unmigratable = 1,
> +    .version_id = 0,
> +    .minimum_version_id = 0,
> +    .post_load = vfio_pci_post_load,
> +    .needed = vfio_pci_needed,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
> @@ -3271,6 +3334,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
>  
>      dc->reset = vfio_pci_reset;
>      device_class_set_props(dc, vfio_pci_dev_properties);
> +    dc->vmsd = &vfio_pci_vmstate;
>      dc->desc = "VFIO-based PCI device assignment";
>      set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>      pdc->realize = vfio_realize;
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 0ef1b5f..63dd0fe 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -118,6 +118,7 @@ vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Devic
>  vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
>  vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
>  vfio_dma_unmap_overflow_workaround(void) ""
> +vfio_region_remap(const char *name, int fd, uint64_t iova_start, uint64_t iova_end, void *vaddr) "%s fd %d 0x%"PRIx64" - 0x%"PRIx64" [%p]"
>  
>  # platform.c
>  vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index bef3e49..add7f46 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -360,6 +360,7 @@ struct PCIDevice {
>      /* ID of standby device in net_failover pair */
>      char *failover_pair_id;
>      uint32_t acpi_index;
> +    bool reused;
>  };
>  
>  void pci_register_bar(PCIDevice *pci_dev, int region_num,
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 00acb85..b46d850 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -85,6 +85,7 @@ typedef struct VFIOContainer {
>      Error *error;
>      bool initialized;
>      bool dirty_pages_supported;
> +    bool reused;
>      uint64_t dirty_pgsizes;
>      uint64_t max_dirty_bitmap_size;
>      unsigned long pgsizes;
> @@ -124,6 +125,7 @@ typedef struct VFIODevice {
>      bool no_mmap;
>      bool ram_block_discard_allowed;
>      bool enable_migration;
> +    bool reused;
>      VFIODeviceOps *ops;
>      unsigned int num_irqs;
>      unsigned int num_regions;
> @@ -200,6 +202,9 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp);
>  void vfio_put_group(VFIOGroup *group);
>  int vfio_get_device(VFIOGroup *group, const char *name,
>                      VFIODevice *vbasedev, Error **errp);
> +int vfio_cprsave(Error **errp);
> +int vfio_cprload(Error **errp);
> +bool vfio_cpr_capable(VFIOContainer *container, Error **errp);
>  
>  extern const MemoryRegionOps vfio_region_ops;
>  typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
> diff --git a/include/migration/cpr.h b/include/migration/cpr.h
> index bffee19..1ea5046 100644
> --- a/include/migration/cpr.h
> +++ b/include/migration/cpr.h
> @@ -10,6 +10,9 @@
>  
>  #include "qapi/qapi-types-cpr.h"
>  
> +#define CPR_MODE_NONE ((CprMode)(-1))
> +
> +CprMode cpr_mode(void);
>  void cprsave(const char *file, CprMode mode, Error **errp);
>  void cprexec(strList *args, Error **errp);
>  void cprload(const char *file, Error **errp);
> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> index e680594..48a02c0 100644
> --- a/linux-headers/linux/vfio.h
> +++ b/linux-headers/linux/vfio.h
> @@ -52,6 +52,12 @@
>  /* Supports the vaddr flag for DMA map and unmap */
>  #define VFIO_UPDATE_VADDR		10
>  
> +/* Supports VFIO_DMA_UNMAP_FLAG_ALL */
> +#define VFIO_UNMAP_ALL                        9
> +
> +/* Supports VFIO DMA map and unmap with the VADDR flag */
> +#define VFIO_UPDATE_VADDR              10
> +
>  /*
>   * The IOCTL interface is designed for extensibility by embedding the
>   * structure length (argsz) and flags into structures passed between
> diff --git a/migration/cpr.c b/migration/cpr.c
> index fb57dec..578466c 100644
> --- a/migration/cpr.c
> +++ b/migration/cpr.c
> @@ -31,6 +31,13 @@
>  #include "hw/virtio/vhost.h"
>  #include "qemu/env.h"
>  
> +static CprMode cpr_active_mode = CPR_MODE_NONE;
> +
> +CprMode cpr_mode(void)
> +{
> +    return cpr_active_mode;
> +}
> +
>  QEMUFile *qf_file_open(const char *path, int flags, int mode,
>                                const char *name, Error **errp)
>  {
> @@ -92,6 +99,7 @@ void cprsave(const char *file, CprMode mode, Error **errp)
>      }
>      vm_stop(RUN_STATE_SAVE_VM);
>  
> +    cpr_active_mode = mode;
>      ret = qemu_save_device_state(f);
>      qemu_fclose(f);
>      if (ret < 0) {
> @@ -105,6 +113,7 @@ err:
>      if (saved_vm_running) {
>          vm_start();
>      }
> +    cpr_active_mode = CPR_MODE_NONE;
>  done:
>      return;
>  }
> @@ -125,6 +134,13 @@ void cprexec(strList *args, Error **errp)
>          error_setg(errp, "runstate is not save-vm");
>          return;
>      }
> +    if (cpr_active_mode != CPR_MODE_RESTART) {
> +        error_setg(errp, "cprexec requires cprsave with restart mode");
> +        return;
> +    }
> +    if (vfio_cprsave(errp)) {
> +        return;
> +    }
>      walkenv(FD_PREFIX, preserve_fd, 0);
>      qemu_system_exec_request(args);
>  }
> @@ -158,6 +174,10 @@ void cprload(const char *file, Error **errp)
>          return;
>      }
>  
> +    if (vfio_cprload(errp)) {
> +        return;
> +    }
> +
This will fail to compile for targets without vfio support, such as m68k.
Maybe the vfio_cprsave() and vfio_cprload() calls should be guarded by CONFIG_VFIO.
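
One possible shape for such a guard is sketched below.  It is only an
illustration: DEMO_CONFIG_VFIO, DemoError and the demo_* functions are
hypothetical stand-ins for CONFIG_VFIO, Error and the functions in this
patch, and whether an #ifdef in the header or a file under stubs/ fits
QEMU better is a call for the maintainers.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoError DemoError;   /* opaque stand-in for QEMU's Error */

#ifdef DEMO_CONFIG_VFIO
/* Real implementations live in the vfio code when it is built. */
int demo_vfio_cprsave(DemoError **errp);
int demo_vfio_cprload(DemoError **errp);
#else
/* No vfio in this build: nothing to save or restore, so report success. */
static inline int demo_vfio_cprsave(DemoError **errp)
{
    (void)errp;
    return 0;
}

static inline int demo_vfio_cprload(DemoError **errp)
{
    (void)errp;
    return 0;
}
#endif

/* The caller stays free of #ifdefs, as in the cprload() hunk above. */
int demo_cprload(DemoError **errp)
{
    if (demo_vfio_cprload(errp)) {
        return 1;
    }
    return 0;
}
```

With the stubs in place the call sites in cprexec() and cprload() would not
need to change.
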

>      state = global_state_get_runstate();
>      if (state == RUN_STATE_RUNNING) {
>          vm_start();
> 

-- 
Regards.
Chuan



* Re: [PATCH V5 17/25] vfio-pci: cpr part 2
  2021-07-19 18:38         ` Steven Sistare
@ 2021-07-28  4:56           ` Zheng Chuan
  2021-07-30 12:52             ` Steven Sistare
  0 siblings, 1 reply; 74+ messages in thread
From: Zheng Chuan @ 2021-07-28  4:56 UTC (permalink / raw)
  To: Steven Sistare, Alex Williamson
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin,
	qemu-devel, Dr. David Alan Gilbert, Paolo Bonzini,
	Stefan Hajnoczi, Marc-André Lureau, Daniel P. Berrange,
	Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

Hi

On 2021/7/20 2:38, Steven Sistare wrote:
> On 7/19/2021 2:10 PM, Alex Williamson wrote:
>> On Mon, 19 Jul 2021 13:44:08 -0400
>> Steven Sistare <steven.sistare@oracle.com> wrote:
>>
>>> On 7/16/2021 4:51 PM, Alex Williamson wrote:
>>>> On Wed,  7 Jul 2021 10:20:26 -0700
>>>> Steve Sistare <steven.sistare@oracle.com> wrote:
>>>>   
>>>>> Finish cpr for vfio-pci by preserving eventfd's and vector state.
>>>>>
>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>>> ---
>>>>>  hw/vfio/pci.c | 118 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>>>  1 file changed, 116 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>>> index 0f5c542..07bd360 100644
>>>>> --- a/hw/vfio/pci.c
>>>>> +++ b/hw/vfio/pci.c  
>>>> ...  
>>>>> @@ -3295,14 +3329,91 @@ static void vfio_merge_config(VFIOPCIDevice  
>>>> *vdev)  
>>>>>      g_free(phys_config);
>>>>>  }
>>>>>  
>>>>> +static int vfio_pci_pre_save(void *opaque)
>>>>> +{
>>>>> +    VFIOPCIDevice *vdev = opaque;
>>>>> +    PCIDevice *pdev = &vdev->pdev;
>>>>> +    int i;
>>>>> +
>>>>> +    if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) {
>>>>> +        error_report("%s: cpr does not support vfio-pci INTX",
>>>>> +                     vdev->vbasedev.name);
>>>>> +    }  
>>>>
>>>> You're not only not supporting INTx, but devices that support INTx, so
>>>> this only works on VFs.  Why?  Is this just out of scope or is there
>>>> something fundamentally difficult about it?
>>>>
>>>> This makes me suspect there's a gap in INTx routing setup if it's more
>>>> than just another eventfd to store and setup.  If we hot-add a device
>>>> using INTx after cpr restart, are we going to find problems?  Thanks,  
>>>
>>> It could be supported, but requires more code (several event fd's plus other state in VFIOINTx
>>> to save and restore) for a case that does not seem very useful (a directly assigned device that
>>> only supports INTx ?). 
>>
>> It's not testing that the device *only* supports INTx, it's testing
>> that the device supports INTx _at_all_.  That effectively means this
>> excludes anything other than an SR-IOV VF.  There are plenty of valid
>> and useful cases of assigning PFs, most of which support INTx even if
>> we don't expect that's their primary operational mode.  Thanks,
> 
> OK, I'll look into it.  If this proves problematic, how do you feel about deferring
> INTx support to a later patch?
> 
I am curious: does cpr restart mode work for GPU passthrough?
> - Steve
> 
> .
> 
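
For context on the INTx discussion quoted above: the check in
vfio_pci_pre_save() reads the Interrupt Pin register, which is non-zero
for any function that can use INTx at all, whether or not MSI or MSI-X is
what it actually uses.  A minimal sketch of that test on a raw config
space buffer follows; demo_has_intx() is a hypothetical helper, not a QEMU
function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PCI_INTERRUPT_PIN 0x3d   /* same offset as PCI_INTERRUPT_PIN */

/*
 * True if the function advertises a legacy interrupt pin
 * (1..4 = INTA..INTD, 0 = no INTx).
 */
bool demo_has_intx(const uint8_t *config)
{
    return config[DEMO_PCI_INTERRUPT_PIN] != 0;
}
```

This is why the check passes for SR-IOV VFs, which report pin 0, and trips
for most PFs.
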

-- 
Regards.
Chuan



* Re: [PATCH V5 23/25] chardev: cpr for sockets
  2021-07-07 17:20 ` [PATCH V5 23/25] chardev: cpr for sockets Steve Sistare
@ 2021-07-29  4:04   ` Zheng Chuan
  0 siblings, 0 replies; 74+ messages in thread
From: Zheng Chuan @ 2021-07-29  4:04 UTC (permalink / raw)
  To: Steve Sistare, qemu-devel
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin,
	Dr. David Alan Gilbert, Markus Armbruster, Alex Williamson,
	Paolo Bonzini, Stefan Hajnoczi, Marc-André Lureau,
	Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée

Hi.

On 2021/7/8 1:20, Steve Sistare wrote:
> Save accepted socket fds in the environment before cprsave, and look for
> fds in the environment after cprload.  Reject cprexec if a socket enables
> the TLS or websocket option.  Allow a monitor socket by closing it on exec.
> 
> Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  chardev/char-socket.c | 31 +++++++++++++++++++++++++++++++
>  monitor/hmp.c         |  3 +++
>  monitor/qmp.c         |  3 +++
>  3 files changed, 37 insertions(+)
> 
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index d0fb545..dc9da8c 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -27,7 +27,9 @@
>  #include "io/channel-socket.h"
>  #include "io/channel-tls.h"
>  #include "io/channel-websock.h"
> +#include "qemu/env.h"
>  #include "io/net-listener.h"
> +#include "qemu/env.h"
Duplicate include: "qemu/env.h" is already added two lines above.

>  #include "qemu/error-report.h"
>  #include "qemu/module.h"
>  #include "qemu/option.h"
> @@ -414,6 +416,7 @@ static void tcp_chr_free_connection(Chardev *chr)
>      SocketChardev *s = SOCKET_CHARDEV(chr);
>      int i;
>  
> +    unsetenv_fd(chr->label);
>      if (s->read_msgfds_num) {
>          for (i = 0; i < s->read_msgfds_num; i++) {
>              close(s->read_msgfds[i]);
> @@ -976,6 +979,10 @@ static void tcp_chr_accept(QIONetListener *listener,
>                                 QIO_CHANNEL(cioc));
>      }
>      tcp_chr_new_client(chr, cioc);
> +
> +    if (s->sioc && !chr->close_on_cpr) {
> +        setenv_fd(chr->label, s->sioc->fd);
> +    }
>  }
>  
>  
> @@ -1231,6 +1238,24 @@ static gboolean socket_reconnect_timeout(gpointer opaque)
>      return false;
>  }
>  
> +static void load_char_socket_fd(Chardev *chr, Error **errp)
> +{
> +    SocketChardev *sockchar = SOCKET_CHARDEV(chr);
> +    QIOChannelSocket *sioc;
> +    int fd = getenv_fd(chr->label);
> +
> +    if (fd != -1) {
> +        sockchar = SOCKET_CHARDEV(chr);
> +        sioc = qio_channel_socket_new_fd(fd, errp);
> +        if (sioc) {
> +            tcp_chr_accept(sockchar->listener, sioc, chr);
> +            object_unref(OBJECT(sioc));
> +        } else {
> +            error_setg(errp, "error: could not restore socket for %s",
> +                       chr->label);
> +        }
> +    }
> +}
>  
>  static int qmp_chardev_open_socket_server(Chardev *chr,
>                                            bool is_telnet,
> @@ -1435,6 +1460,10 @@ static void qmp_chardev_open_socket(Chardev *chr,
>      }
>      s->registered_yank = true;
>  
> +    if (!s->tls_creds && !s->is_websock) {
> +        qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_CPR);
> +    }
> +
>      /* be isn't opened until we get a connection */
>      *be_opened = false;
>  
> @@ -1450,6 +1479,8 @@ static void qmp_chardev_open_socket(Chardev *chr,
>              return;
>          }
>      }
> +
> +    load_char_socket_fd(chr, errp);
>  }
>  
>  static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend,
> diff --git a/monitor/hmp.c b/monitor/hmp.c
> index 6c0b33a..63700b3 100644
> --- a/monitor/hmp.c
> +++ b/monitor/hmp.c
> @@ -1451,4 +1451,7 @@ void monitor_init_hmp(Chardev *chr, bool use_readline, Error **errp)
>      qemu_chr_fe_set_handlers(&mon->common.chr, monitor_can_read, monitor_read,
>                               monitor_event, NULL, &mon->common, NULL, true);
>      monitor_list_append(&mon->common);
> +
> +    /* monitor cannot yet be preserved across cpr */
> +    chr->close_on_cpr = true;
>  }
> diff --git a/monitor/qmp.c b/monitor/qmp.c
> index 092c527..21a90bf 100644
> --- a/monitor/qmp.c
> +++ b/monitor/qmp.c
> @@ -535,4 +535,7 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
>                                   NULL, &mon->common, NULL, true);
>          monitor_list_append(&mon->common);
>      }
> +
> +    /* Monitor cannot yet be preserved across cpr */
> +    chr->close_on_cpr = true;
>  }
> 
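
For readers following along, the idea in the commit message (record the
accepted socket fd in the environment, then look it up again after exec) can
be sketched roughly as below.  This is only an illustration and not the
helpers from patch 10: the sketch_* names and the "QEMU_FD_" prefix are
assumptions made for the example, since the real value of FD_PREFIX is not
shown in this thread.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical prefix; the series uses FD_PREFIX, whose value is not shown. */
#define SKETCH_FD_PREFIX "QEMU_FD_"

/* Build "<prefix><name>" into buf; return 0 on success, -1 if it does not fit. */
static int sketch_fd_var(char *buf, size_t len, const char *name)
{
    int n = snprintf(buf, len, "%s%s", SKETCH_FD_PREFIX, name);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Record fd under name so that a process started by exec can find it. */
int sketch_setenv_fd(const char *name, int fd)
{
    char var[128], val[16];

    if (sketch_fd_var(var, sizeof(var), name) < 0) {
        return -1;
    }
    snprintf(val, sizeof(val), "%d", fd);
    return setenv(var, val, 1);
}

/* Return the fd recorded under name, or -1 if none is recorded. */
int sketch_getenv_fd(const char *name)
{
    char var[128];
    const char *val;
    char *end;
    long fd;

    if (sketch_fd_var(var, sizeof(var), name) < 0) {
        return -1;
    }
    val = getenv(var);
    if (!val || !*val) {
        return -1;
    }
    fd = strtol(val, &end, 10);
    return (*end == '\0' && fd >= 0 && fd <= INT_MAX) ? (int)fd : -1;
}

/* Forget the fd recorded under name, e.g. when the connection is freed. */
void sketch_unsetenv_fd(const char *name)
{
    char var[128];

    if (sketch_fd_var(var, sizeof(var), name) == 0) {
        unsetenv(var);
    }
}
```

Keying the variable on chr->label, as the patch does, gives each chardev its
own slot.  For the fd itself to survive exec, its close-on-exec flag also has
to be cleared, which is presumably what preserve_fd does when cprexec walks
the environment.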

-- 
Regards.
Chuan



* Re: [PATCH V5 16/25] vfio-pci: cpr part 1
  2021-07-28  4:56   ` Zheng Chuan
@ 2021-07-30 12:50     ` Steven Sistare
  0 siblings, 0 replies; 74+ messages in thread
From: Steven Sistare @ 2021-07-30 12:50 UTC (permalink / raw)
  To: Zheng Chuan, qemu-devel
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin,
	Dr. David Alan Gilbert, Markus Armbruster, Alex Williamson,
	Paolo Bonzini, Stefan Hajnoczi, Marc-André Lureau,
	Daniel P. Berrange, Philippe Mathieu-Daudé,
	Alex Bennée

On 7/28/2021 12:56 AM, Zheng Chuan wrote:
> On 2021/7/8 1:20, Steve Sistare wrote:
>> Enable vfio-pci devices to be saved and restored across an exec restart
>> of qemu.
>>
>> [...]
>> --- a/migration/cpr.c
>> +++ b/migration/cpr.c
>> @@ -31,6 +31,13 @@
>>  #include "hw/virtio/vhost.h"
>>  #include "qemu/env.h"
>>  
>> +static CprMode cpr_active_mode = CPR_MODE_NONE;
>> +
>> +CprMode cpr_mode(void)
>> +{
>> +    return cpr_active_mode;
>> +}
>> +
>>  QEMUFile *qf_file_open(const char *path, int flags, int mode,
>>                                const char *name, Error **errp)
>>  {
>> @@ -92,6 +99,7 @@ void cprsave(const char *file, CprMode mode, Error **errp)
>>      }
>>      vm_stop(RUN_STATE_SAVE_VM);
>>  
>> +    cpr_active_mode = mode;
>>      ret = qemu_save_device_state(f);
>>      qemu_fclose(f);
>>      if (ret < 0) {
>> @@ -105,6 +113,7 @@ err:
>>      if (saved_vm_running) {
>>          vm_start();
>>      }
>> +    cpr_active_mode = CPR_MODE_NONE;
>>  done:
>>      return;
>>  }
>> @@ -125,6 +134,13 @@ void cprexec(strList *args, Error **errp)
>>          error_setg(errp, "runstate is not save-vm");
>>          return;
>>      }
>> +    if (cpr_active_mode != CPR_MODE_RESTART) {
>> +        error_setg(errp, "cprexec requires cprsave with restart mode");
>> +        return;
>> +    }
>> +    if (vfio_cprsave(errp)) {
>> +        return;
>> +    }
>>      walkenv(FD_PREFIX, preserve_fd, 0);
>>      qemu_system_exec_request(args);
>>  }
>> @@ -158,6 +174,10 @@ void cprload(const char *file, Error **errp)
>>          return;
>>      }
>>  
>> +    if (vfio_cprload(errp)) {
>> +        return;
>> +    }
>> +
> This will fail to compile on targets without vfio support, such as m68k.
> Maybe a CONFIG_VFIO guard should be added around vfio_cpr{save,load}.
> 
>>      state = global_state_get_runstate();
>>      if (state == RUN_STATE_RUNNING) {
>>          vm_start();

Thank you, Zheng.  I will fix this and the other mistakes you found.
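
One possible shape for that fix, as a minimal sketch only: declare the real
functions when vfio is built, and provide no-op stubs otherwise so that
cpr.c links on every target.  The Error typedef and the
sketch_cprload_vfio_step() caller are stand-ins invented for this example;
a stubs file would be another way to get the same effect.

```c
#include <stddef.h>

typedef struct Error Error;   /* opaque stand-in for QEMU's Error */

#ifdef CONFIG_VFIO
/* Real implementations come from the vfio code when it is built. */
int vfio_cprsave(Error **errp);
int vfio_cprload(Error **errp);
#else
/* Stubs: with no vfio devices there is nothing to save or restore. */
static inline int vfio_cprsave(Error **errp)
{
    (void)errp;
    return 0;
}

static inline int vfio_cprload(Error **errp)
{
    (void)errp;
    return 0;
}
#endif

/* Hypothetical caller with the same shape as the check added to cprload(). */
int sketch_cprload_vfio_step(Error **errp)
{
    if (vfio_cprload(errp)) {
        return -1;   /* nonzero from the vfio hook means failure */
    }
    return 0;
}
```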

- Steve



* Re: [PATCH V5 17/25] vfio-pci: cpr part 2
  2021-07-28  4:56           ` Zheng Chuan
@ 2021-07-30 12:52             ` Steven Sistare
  2021-07-31  6:07               ` Zheng Chuan
  0 siblings, 1 reply; 74+ messages in thread
From: Steven Sistare @ 2021-07-30 12:52 UTC (permalink / raw)
  To: Zheng Chuan, Alex Williamson
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin,
	qemu-devel, Dr. David Alan Gilbert, Paolo Bonzini,
	Stefan Hajnoczi, Marc-André Lureau, Daniel P. Berrange,
	Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster

On 7/28/2021 12:56 AM, Zheng Chuan wrote:
> On 2021/7/20 2:38, Steven Sistare wrote:
>> On 7/19/2021 2:10 PM, Alex Williamson wrote:
>>> On Mon, 19 Jul 2021 13:44:08 -0400
>>> Steven Sistare <steven.sistare@oracle.com> wrote:
>>>
>>>> On 7/16/2021 4:51 PM, Alex Williamson wrote:
>>>>> On Wed,  7 Jul 2021 10:20:26 -0700
>>>>> Steve Sistare <steven.sistare@oracle.com> wrote:
>>>>>   
>>>>>> Finish cpr for vfio-pci by preserving eventfd's and vector state.
>>>>>>
>>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>>>> ---
>>>>>>  hw/vfio/pci.c | 118 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>>>>  1 file changed, 116 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>>>> index 0f5c542..07bd360 100644
>>>>>> --- a/hw/vfio/pci.c
>>>>>> +++ b/hw/vfio/pci.c  
>>>>> ...  
>>>>>> @@ -3295,14 +3329,91 @@ static void vfio_merge_config(VFIOPCIDevice  
>>>>> *vdev)  
>>>>>>      g_free(phys_config);
>>>>>>  }
>>>>>>  
>>>>>> +static int vfio_pci_pre_save(void *opaque)
>>>>>> +{
>>>>>> +    VFIOPCIDevice *vdev = opaque;
>>>>>> +    PCIDevice *pdev = &vdev->pdev;
>>>>>> +    int i;
>>>>>> +
>>>>>> +    if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) {
>>>>>> +        error_report("%s: cpr does not support vfio-pci INTX",
>>>>>> +                     vdev->vbasedev.name);
>>>>>> +    }  
>>>>>
>>>>> You're not only not supporting INTx, but devices that support INTx, so
>>>>> this only works on VFs.  Why?  Is this just out of scope or is there
>>>>> something fundamentally difficult about it?
>>>>>
>>>>> This makes me suspect there's a gap in INTx routing setup if it's more
>>>>> than just another eventfd to store and setup.  If we hot-add a device
>>>>> using INTx after cpr restart, are we going to find problems?  Thanks,  
>>>>
>>>> It could be supported, but requires more code (several event fd's plus other state in VFIOINTx
>>>> to save and restore) for a case that does not seem very useful (a directly assigned device that
>>>> only supports INTx ?). 
>>>
>>> It's not testing that the device *only* supports INTx, it's testing
>>> that the device supports INTx _at_all_.  That effectively means this
>>> excludes anything other than an SR-IOV VF.  There are plenty of valid
>>> and useful cases of assigning PFs, most of which support INTx even if
>>> we don't expect that's their primary operational mode.  Thanks,
>>
>> OK, I'll look into it.  If this proves problematic, how do you feel about deferring
>> INTx support to a later patch?
>>
> I am curious: does cpr restart mode work for GPU passthrough?

It should work for any vfio device (after I fix the INTX limitation), but I have not tested
a GPU yet.
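
For reference, the distinction Alex draws above can be sketched as follows.
The check in vfio_pci_pre_save() reads the Interrupt Pin register, which is
nonzero for any function that advertises INTx, including PFs that are
actually running with MSI or MSI-X.  The SketchIntMode enum and both helper
names are hypothetical stand-ins for whatever mode the device model tracks;
only the register offset (0x3d) is standard PCI.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PCI_INTERRUPT_PIN 0x3d   /* standard config-space offset */

/* Hypothetical stand-in for the interrupt mode a device model tracks. */
typedef enum {
    SKETCH_INT_NONE,
    SKETCH_INT_INTX,
    SKETCH_INT_MSI,
    SKETCH_INT_MSIX,
} SketchIntMode;

/* True if the device advertises an INTx pin at all (INTA..INTD are 1..4). */
bool sketch_has_intx_pin(const uint8_t *config)
{
    return config[SKETCH_PCI_INTERRUPT_PIN] != 0;
}

/* True only if INTx is the mode actually in use right now. */
bool sketch_intx_active(SketchIntMode mode)
{
    return mode == SKETCH_INT_INTX;
}
```

A test of the first kind rejects nearly every PF; a test of the second kind
would reject only devices that are using INTx at save time.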

- Steve



* Re: [PATCH V5 17/25] vfio-pci: cpr part 2
  2021-07-30 12:52             ` Steven Sistare
@ 2021-07-31  6:07               ` Zheng Chuan
  0 siblings, 0 replies; 74+ messages in thread
From: Zheng Chuan @ 2021-07-31  6:07 UTC (permalink / raw)
  To: Steven Sistare, Alex Williamson
  Cc: Jason Zeng, Juan Quintela, Eric Blake, Michael S. Tsirkin,
	qemu-devel, Xiexiangyou, Dr. David Alan Gilbert, Paolo Bonzini,
	Stefan Hajnoczi, Marc-André Lureau, Daniel P. Berrange,
	Philippe Mathieu-Daudé,
	Alex Bennée, Markus Armbruster



On 2021/7/30 20:52, Steven Sistare wrote:
> On 7/28/2021 12:56 AM, Zheng Chuan wrote:
>> On 2021/7/20 2:38, Steven Sistare wrote:
>>> On 7/19/2021 2:10 PM, Alex Williamson wrote:
>>>> On Mon, 19 Jul 2021 13:44:08 -0400
>>>> Steven Sistare <steven.sistare@oracle.com> wrote:
>>>>
>>>>> On 7/16/2021 4:51 PM, Alex Williamson wrote:
>>>>>> On Wed,  7 Jul 2021 10:20:26 -0700
>>>>>> Steve Sistare <steven.sistare@oracle.com> wrote:
>>>>>>   
>>>>>>> Finish cpr for vfio-pci by preserving eventfd's and vector state.
>>>>>>>
>>>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>>>>> ---
>>>>>>>  hw/vfio/pci.c | 118 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>>>>>  1 file changed, 116 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>>>>> index 0f5c542..07bd360 100644
>>>>>>> --- a/hw/vfio/pci.c
>>>>>>> +++ b/hw/vfio/pci.c  
>>>>>> ...  
>>>>>>> @@ -3295,14 +3329,91 @@ static void vfio_merge_config(VFIOPCIDevice  
>>>>>> *vdev)  
>>>>>>>      g_free(phys_config);
>>>>>>>  }
>>>>>>>  
>>>>>>> +static int vfio_pci_pre_save(void *opaque)
>>>>>>> +{
>>>>>>> +    VFIOPCIDevice *vdev = opaque;
>>>>>>> +    PCIDevice *pdev = &vdev->pdev;
>>>>>>> +    int i;
>>>>>>> +
>>>>>>> +    if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) {
>>>>>>> +        error_report("%s: cpr does not support vfio-pci INTX",
>>>>>>> +                     vdev->vbasedev.name);
>>>>>>> +    }  
>>>>>>
>>>>>> You're not only not supporting INTx, but devices that support INTx, so
>>>>>> this only works on VFs.  Why?  Is this just out of scope or is there
>>>>>> something fundamentally difficult about it?
>>>>>>
>>>>>> This makes me suspect there's a gap in INTx routing setup if it's more
>>>>>> than just another eventfd to store and setup.  If we hot-add a device
>>>>>> using INTx after cpr restart, are we going to find problems?  Thanks,  
>>>>>
>>>>> It could be supported, but requires more code (several event fd's plus other state in VFIOINTx
>>>>> to save and restore) for a case that does not seem very useful (a directly assigned device that
>>>>> only supports INTx ?). 
>>>>
>>>> It's not testing that the device *only* supports INTx, it's testing
>>>> that the device supports INTx _at_all_.  That effectively means this
>>>> excludes anything other than an SR-IOV VF.  There are plenty of valid
>>>> and useful cases of assigning PFs, most of which support INTx even if
>>>> we don't expect that's their primary operational mode.  Thanks,
>>>
>>> OK, I'll look into it.  If this proves problematic, how do you feel about deferring
>>> INTx support to a later patch?
>>>
>> I am curious: does cpr restart mode work for GPU passthrough?
> 
> It should work for any vfio device (after I fix the INTX limitation), but I have not tested
> a GPU yet.
> 
> - Steve

Yes.  The GPU may switch frequently between INTx and MSI, and cpr should support both of them. :)
Thanks.

-- 
Regards.
Chuan



* Re: [PATCH V5 03/25] cpr: QMP interfaces for reboot
  2021-07-07 17:20 ` [PATCH V5 03/25] cpr: QMP interfaces for reboot Steve Sistare
  2021-07-08 13:27   ` Marc-André Lureau
@ 2021-08-04 15:48   ` Eric Blake
  2021-08-04 20:27     ` Steven Sistare
  1 sibling, 1 reply; 74+ messages in thread
From: Eric Blake @ 2021-08-04 15:48 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, qemu-devel,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

On Wed, Jul 07, 2021 at 10:20:12AM -0700, Steve Sistare wrote:
> cprsave calls cprsave().  Syntax:
>   { 'enum': 'CprMode', 'data': [ 'reboot' ] }
>   { 'command': 'cprsave', 'data': { 'file': 'str', 'mode': 'CprMode' } }
> 
> cprload calls cprload().  Syntax:
>   { 'command': 'cprload', 'data': { 'file': 'str' } }

Does this also allow the magic "/dev/fdset/NNN" syntax for opening an
fd already passed in previously?
/me goes back to patch 2 to check
Yes, it looks like it should.
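
As an illustration of what the "/dev/fdset/NNN" form encodes, here is a
standalone sketch that extracts the set id from such a path.  It is not
QEMU's parser; the function name is invented for the example, and in QEMU the
lookup of the fd set itself happens inside the existing open wrappers.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the set id encoded in "/dev/fdset/<id>", or -1 if path does not
 * have that form (ordinary file names fall into this case).
 */
long sketch_parse_fdset_id(const char *path)
{
    static const char prefix[] = "/dev/fdset/";
    const char *num;
    char *end;
    long id;

    if (strncmp(path, prefix, sizeof(prefix) - 1) != 0) {
        return -1;
    }
    num = path + sizeof(prefix) - 1;
    if (*num < '0' || *num > '9') {
        return -1;   /* require at least one digit and no sign */
    }
    errno = 0;
    id = strtol(num, &end, 10);
    if (errno != 0 || *end != '\0') {
        return -1;   /* overflow or trailing junk */
    }
    return id;
}
```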

> 
> cprinfo returns a list of supported modes.  Syntax:
>   { 'struct': 'CprInfo', 'data': { 'modes': [ 'CprMode' ] } }
>   { 'command': 'cprinfo', 'returns': 'CprInfo' }

As pointed out elsewhere, relying on introspection seems nicer than
adding this command.

> 
> Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---

> +++ b/qapi/cpr.json
> @@ -0,0 +1,74 @@
> +# -*- Mode: Python -*-
> +#
> +# Copyright (c) 2021 Oracle and/or its affiliates.
> +#
> +# This work is licensed under the terms of the GNU GPL, version 2.
> +# See the COPYING file in the top-level directory.
> +
> +##
> +# = CPR

Might be worth expanding what this acronym stands for here.

> +##
> +
> +{ 'include': 'common.json' }
> +
> +##
> +# @CprMode:
> +#
> +# @reboot: checkpoint can be cprload'ed after a host kexec reboot.
> +#
> +# Since: 6.1

As this missed 6.1, you'll need to (eventually) rebase the series to
mention 6.2 everywhere.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH V5 02/25] cpr: reboot mode
  2021-07-07 17:20 ` [PATCH V5 02/25] cpr: reboot mode Steve Sistare
  2021-07-08 12:25   ` Marc-André Lureau
@ 2021-08-04 15:48   ` Eric Blake
  1 sibling, 0 replies; 74+ messages in thread
From: Eric Blake @ 2021-08-04 15:48 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, qemu-devel,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

On Wed, Jul 07, 2021 at 10:20:11AM -0700, Steve Sistare wrote:
> Provide the cprsave and cprload functions for live update.  These save and
> restore VM state, with minimal guest pause time, so that qemu may be updated
> to a new version in between.
> 

> +++ b/migration/cpr.c
> @@ -0,0 +1,149 @@

> +
> +QEMUFile *qf_file_open(const char *path, int flags, int mode,
> +                              const char *name, Error **errp)

Indentation is off.

> +{
> +    QIOChannelFile *fioc;
> +    QIOChannel *ioc;
> +    QEMUFile *f;
> +
> +    if (flags & O_RDWR) {
> +        error_setg(errp, "qf_file_open %s: O_RDWR not supported", path);
> +        return 0;
> +    }
> +
> +    fioc = qio_channel_file_new_path(path, flags, mode, errp);

Good, you aren't using bare open(), but reusing existing wrappers,
which means you should be able to accept magic filenames like
"/dev/fdset/1" to open an fd passed in previously.  (I had to come
back to this patch to make sure after starting on patch 3)

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH V5 12/25] cpr: QMP interfaces for restart
  2021-07-07 17:20 ` [PATCH V5 12/25] cpr: QMP interfaces for restart Steve Sistare
  2021-07-08 15:49   ` Marc-André Lureau
@ 2021-08-04 16:00   ` Eric Blake
  2021-08-04 20:22     ` Steven Sistare
  1 sibling, 1 reply; 74+ messages in thread
From: Eric Blake @ 2021-08-04 16:00 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, qemu-devel,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

On Wed, Jul 07, 2021 at 10:20:21AM -0700, Steve Sistare wrote:
> cprexec calls cprexec().  Syntax:
>   { 'command': 'cprexec', 'data': { 'argv': [ 'str' ] } }
> 
> Add the restart mode:
>   { 'enum': 'CprMode', 'data': [ 'reboot', 'restart' ] }
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  monitor/qmp-cmds.c |  5 +++++
>  qapi/cpr.json      | 16 +++++++++++++++-
>  2 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
> index 1128604..7326f7d 100644
> --- a/monitor/qmp-cmds.c
> +++ b/monitor/qmp-cmds.c
> @@ -179,6 +179,11 @@ void qmp_cprsave(const char *file, CprMode mode, Error **errp)
>      cprsave(file, mode, errp);
>  }
>  
> +void qmp_cprexec(strList *args, Error **errp)
> +{
> +    cprexec(args, errp);
> +}

Why do you need both qmp_cprexec() and cprexec()?  Can you just name
it qmp_cprexec() in cpr.c from the get-go, rather than having to add a
one-line wrapper in qmp-cmds.c?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH V5 12/25] cpr: QMP interfaces for restart
  2021-08-04 16:00   ` Eric Blake
@ 2021-08-04 20:22     ` Steven Sistare
  0 siblings, 0 replies; 74+ messages in thread
From: Steven Sistare @ 2021-08-04 20:22 UTC (permalink / raw)
  To: Eric Blake
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, qemu-devel,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

On 8/4/2021 12:00 PM, Eric Blake wrote:
> On Wed, Jul 07, 2021 at 10:20:21AM -0700, Steve Sistare wrote:
>> cprexec calls cprexec().  Syntax:
>>   { 'command': 'cprexec', 'data': { 'argv': [ 'str' ] } }
>>
>> Add the restart mode:
>>   { 'enum': 'CprMode', 'data': [ 'reboot', 'restart' ] }
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>  monitor/qmp-cmds.c |  5 +++++
>>  qapi/cpr.json      | 16 +++++++++++++++-
>>  2 files changed, 20 insertions(+), 1 deletion(-)
>>
>> diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
>> index 1128604..7326f7d 100644
>> --- a/monitor/qmp-cmds.c
>> +++ b/monitor/qmp-cmds.c
>> @@ -179,6 +179,11 @@ void qmp_cprsave(const char *file, CprMode mode, Error **errp)
>>      cprsave(file, mode, errp);
>>  }
>>  
>> +void qmp_cprexec(strList *args, Error **errp)
>> +{
>> +    cprexec(args, errp);
>> +}
> 
> Why do you need both qmp_cprexec() and cprexec()?  Can you just name
> it qmp_cprexec() in cpr.c from the get-go, rather than having to add a
> one-line wrapper in qmp-cmds.c?

Will do.

While I'm at it, I will add an underscore to the function names and a dash to the command
names to be consistent with other compound-word commands:

qmp_cpr_save
qmp_cpr_exec
qmp_cpr_load
cpr-save
cpr-exec
cpr-load
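
Whatever the final names, the exec handler receives the new command line as a
list of strings, and an exec call ultimately needs a NULL-terminated argv.
A minimal sketch of that conversion follows.  SketchStrList is a
hypothetical stand-in for the QAPI-generated strList, and
sketch_strlist_to_argv() is not a function from this series.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the QAPI-generated strList. */
typedef struct SketchStrList {
    struct SketchStrList *next;
    char *value;
} SketchStrList;

/*
 * Build a NULL-terminated argv that borrows the strings in list.
 * Returns NULL on allocation failure.  The caller frees only the array.
 */
char **sketch_strlist_to_argv(const SketchStrList *list)
{
    const SketchStrList *e;
    size_t n = 0, i = 0;
    char **argv;

    for (e = list; e; e = e->next) {
        n++;
    }
    argv = malloc((n + 1) * sizeof(*argv));
    if (!argv) {
        return NULL;
    }
    for (e = list; e; e = e->next) {
        argv[i++] = e->value;
    }
    argv[i] = NULL;   /* terminator expected by the exec family */
    return argv;
}
```

The array borrows the list's strings instead of copying them, which is
sufficient when the list outlives the exec call.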

- Steve



* Re: [PATCH V5 03/25] cpr: QMP interfaces for reboot
  2021-08-04 15:48   ` Eric Blake
@ 2021-08-04 20:27     ` Steven Sistare
  0 siblings, 0 replies; 74+ messages in thread
From: Steven Sistare @ 2021-08-04 20:27 UTC (permalink / raw)
  To: Eric Blake
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, qemu-devel,
	Dr. David Alan Gilbert, Alex Williamson, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

On 8/4/2021 11:48 AM, Eric Blake wrote:
> On Wed, Jul 07, 2021 at 10:20:12AM -0700, Steve Sistare wrote:
>> cprsave calls cprsave().  Syntax:
>>   { 'enum': 'CprMode', 'data': [ 'reboot' ] }
>>   { 'command': 'cprsave', 'data': { 'file': 'str', 'mode': 'CprMode' } }
>>
>> cprload calls cprload().  Syntax:
>>   { 'command': 'cprload', 'data': { 'file': 'str' } }
> 
> Does this also allow the magic "/dev/fdset/NNN" syntax for opening an
> fd already passed in previously?
> /me goes back to patch 2 to check
> Yes, it looks like it should.
> 
>>
>> cprinfo returns a list of supported modes.  Syntax:
>>   { 'struct': 'CprInfo', 'data': { 'modes': [ 'CprMode' ] } }
>>   { 'command': 'cprinfo', 'returns': 'CprInfo' }
> 
> As pointed out elsewhere, relying on introspection seems nicer than
> adding this command.
> 
>>
>> Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
> 
>> +++ b/qapi/cpr.json
>> @@ -0,0 +1,74 @@
>> +# -*- Mode: Python -*-
>> +#
>> +# Copyright (c) 2021 Oracle and/or its affiliates.
>> +#
>> +# This work is licensed under the terms of the GNU GPL, version 2.
>> +# See the COPYING file in the top-level directory.
>> +
>> +##
>> +# = CPR
> 
> Might be worth expanding what this acronym stands for here.
> 
>> +##
>> +
>> +{ 'include': 'common.json' }
>> +
>> +##
>> +# @CprMode:
>> +#
>> +# @reboot: checkpoint can be cprload'ed after a host kexec reboot.
>> +#
>> +# Since: 6.1
> 
> As this missed 6.1, you'll need to (eventually) rebase the series to
> mention 6.2 everywhere.

Will do for both (Marc-André also asked).  I'll also fix the indentation problem in patch 2.

- Steve



end of thread, other threads:[~2021-08-04 20:28 UTC | newest]

Thread overview: 74+ messages
2021-07-07 17:20 [PATCH V5 00/25] Live Update Steve Sistare
2021-07-07 17:20 ` [PATCH V5 01/25] qemu_ram_volatile Steve Sistare
2021-07-08 12:01   ` Marc-André Lureau
2021-07-12 17:06     ` Steven Sistare
2021-07-07 17:20 ` [PATCH V5 02/25] cpr: reboot mode Steve Sistare
2021-07-08 12:25   ` Marc-André Lureau
2021-07-12 17:07     ` Steven Sistare
2021-08-04 15:48   ` Eric Blake
2021-07-07 17:20 ` [PATCH V5 03/25] cpr: QMP interfaces for reboot Steve Sistare
2021-07-08 13:27   ` Marc-André Lureau
2021-07-12 17:07     ` Steven Sistare
2021-08-04 15:48   ` Eric Blake
2021-08-04 20:27     ` Steven Sistare
2021-07-07 17:20 ` [PATCH V5 04/25] cpr: HMP " Steve Sistare
2021-07-28  4:55   ` Zheng Chuan
2021-07-07 17:20 ` [PATCH V5 05/25] as_flat_walk Steve Sistare
2021-07-08 13:49   ` Marc-André Lureau
2021-07-12 17:07     ` Steven Sistare
2021-07-07 17:20 ` [PATCH V5 06/25] oslib: qemu_clr_cloexec Steve Sistare
2021-07-08 13:58   ` Marc-André Lureau
2021-07-12 17:07     ` Steven Sistare
2021-07-07 17:20 ` [PATCH V5 07/25] machine: memfd-alloc option Steve Sistare
2021-07-08 14:20   ` Marc-André Lureau
2021-07-12 17:07     ` Steven Sistare
2021-07-12 17:45       ` Marc-André Lureau
2021-07-07 17:20 ` [PATCH V5 08/25] vl: add helper to request re-exec Steve Sistare
2021-07-08 14:31   ` Marc-André Lureau
2021-07-12 17:07     ` Steven Sistare
2021-07-07 17:20 ` [PATCH V5 09/25] string to strList Steve Sistare
2021-07-08 14:37   ` Marc-André Lureau
2021-07-07 17:20 ` [PATCH V5 10/25] util: env var helpers Steve Sistare
2021-07-08 15:10   ` Marc-André Lureau
2021-07-12 19:19     ` Steven Sistare
2021-07-12 19:36       ` Marc-André Lureau
2021-07-13 16:15         ` Steven Sistare
2021-07-07 17:20 ` [PATCH V5 11/25] cpr: restart mode Steve Sistare
2021-07-08 15:43   ` Marc-André Lureau
2021-07-08 15:54     ` Marc-André Lureau
2021-07-12 19:19       ` Steven Sistare
2021-07-07 17:20 ` [PATCH V5 12/25] cpr: QMP interfaces for restart Steve Sistare
2021-07-08 15:49   ` Marc-André Lureau
2021-07-12 19:19     ` Steven Sistare
2021-08-04 16:00   ` Eric Blake
2021-08-04 20:22     ` Steven Sistare
2021-07-07 17:20 ` [PATCH V5 13/25] cpr: HMP " Steve Sistare
2021-07-28  4:56   ` Zheng Chuan
2021-07-07 17:20 ` [PATCH V5 14/25] pci: export functions for cpr Steve Sistare
2021-07-07 17:20 ` [PATCH V5 15/25] vfio-pci: refactor " Steve Sistare
2021-07-07 17:20 ` [PATCH V5 16/25] vfio-pci: cpr part 1 Steve Sistare
2021-07-16 17:45   ` Alex Williamson
2021-07-19 17:43     ` Steven Sistare
2021-07-28  4:56   ` Zheng Chuan
2021-07-30 12:50     ` Steven Sistare
2021-07-07 17:20 ` [PATCH V5 17/25] vfio-pci: cpr part 2 Steve Sistare
2021-07-16 20:51   ` Alex Williamson
2021-07-19 17:44     ` Steven Sistare
2021-07-19 18:10       ` Alex Williamson
2021-07-19 18:38         ` Steven Sistare
2021-07-28  4:56           ` Zheng Chuan
2021-07-30 12:52             ` Steven Sistare
2021-07-31  6:07               ` Zheng Chuan
2021-07-07 17:20 ` [PATCH V5 18/25] vhost: reset vhost devices upon cprsave Steve Sistare
2021-07-07 17:20 ` [PATCH V5 19/25] hostmem-memfd: cpr support Steve Sistare
2021-07-07 17:20 ` [PATCH V5 20/25] chardev: cpr framework Steve Sistare
2021-07-08 16:03   ` Marc-André Lureau
2021-07-12 19:20     ` Steven Sistare
2021-07-12 19:49       ` Marc-André Lureau
2021-07-13 14:34         ` Steven Sistare
2021-07-07 17:20 ` [PATCH V5 21/25] chardev: cpr for simple devices Steve Sistare
2021-07-07 17:20 ` [PATCH V5 22/25] chardev: cpr for pty Steve Sistare
2021-07-07 17:20 ` [PATCH V5 23/25] chardev: cpr for sockets Steve Sistare
2021-07-29  4:04   ` Zheng Chuan
2021-07-07 17:20 ` [PATCH V5 24/25] cpr: only-cpr-capable option Steve Sistare
2021-07-07 17:20 ` [PATCH V5 25/25] simplify savevm Steve Sistare
