* [PATCH V2 00/22] Live Update
@ 2021-01-05 15:41 Steve Sistare
  2021-01-05 15:41 ` [PATCH V2 01/22] as_flat_walk Steve Sistare
                   ` (21 more replies)
  0 siblings, 22 replies; 29+ messages in thread
From: Steve Sistare @ 2021-01-05 15:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Provide the cprsave and cprload commands for live update.  These save and
restore VM state, with minimal guest pause time, so that qemu may be updated
to a new version in between.

cprsave stops the VM and saves vmstate to an ordinary file.  It supports two
modes: restart and reboot.  For restart, cprsave exec's the qemu binary (or
/usr/bin/qemu-exec if it exists) with the same argv.  qemu restarts in a
paused state and waits for the cprload command.

To use the restart mode, qemu must be started with the memfd-alloc option,
which allocates guest ram using memfd_create.  The memfd's are saved to
the environment and kept open across exec, after which they are found from
the environment and re-mmap'd.  Hence guest ram is preserved in place,
albeit with new virtual addresses in the qemu process.  The caller resumes
the guest by calling cprload, which loads state from the file.  If the VM
was running at cprsave time, then VM execution resumes.  cprsave supports
any type of guest image and block device, but the caller must not modify
guest block devices between cprsave and cprload.
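
The fd hand-off described above can be sketched in a few lines of C.  This
is only an illustration under stated assumptions, not the code in this
series: the demo_* names and the DEMO_FD_ram variable are hypothetical, and
the exec itself is left out, so "recover" simply stands for what the new
process would do after finding the fd number in its environment.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a memfd of len bytes, clear FD_CLOEXEC so the
 * descriptor stays open across exec, and record its number in the
 * environment under var.  Returns the fd, or -1 on failure.
 */
int demo_preserve_ram(const char *var, size_t len)
{
    char num[16];
    int flags;
    int fd;

    assert(var != NULL && len > 0);
    fd = memfd_create("demo-ram", MFD_CLOEXEC);
    if (fd < 0) {
        return -1;
    }
    flags = fcntl(fd, F_GETFD);
    if (flags < 0 ||
        ftruncate(fd, (off_t)len) < 0 ||
        fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0) {
        close(fd);
        return -1;
    }
    snprintf(num, sizeof(num), "%d", fd);
    if (setenv(var, num, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: look up the fd number recorded under var and map it
 * shared.  The pages are the same ones as before; only the virtual address
 * of the mapping is new.  Returns NULL if the variable is missing or the
 * mapping fails.
 */
void *demo_recover_ram(const char *var, size_t len)
{
    const char *val;
    void *p;

    assert(var != NULL && len > 0);
    val = getenv(var);
    if (!val) {
        return NULL;
    }
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, atoi(val), 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Mapping the same descriptor twice in one process shows the property that
matters here: a store through one mapping is visible through the other,
even though the two mappings have different addresses.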

The restart mode supports vfio devices by preserving the vfio container,
group, device, and event descriptors across the qemu re-exec, and by
updating DMA mapping virtual addresses using VFIO_DMA_UNMAP_FLAG_SUSPEND
and VFIO_DMA_MAP_FLAG_RESUME as proposed in 
https://lore.kernel.org/kvm/1609861013-129801-1-git-send-email-steven.sistare@oracle.com

For the reboot mode, cprsave saves state and exits qemu, and the caller is
allowed to update the host kernel and system software and reboot.  The
caller resumes the guest by running qemu with the same arguments as the
original process and calling cprload.  To use this mode, guest ram must be
mapped to a persistent shared memory file such as /dev/dax0.0, or /dev/shm
PKRAM as proposed in https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com/

The reboot mode supports vfio devices if the caller suspends the guest
instead of stopping the VM, such as by issuing guest-suspend-ram to the
qemu guest agent.  The guest drivers' suspend methods flush outstanding
requests and re-initialize the devices, and thus there is no device state
to save and restore.

The first patches add helper functions:

  - as_flat_walk
  - qemu_ram_volatile
  - oslib: qemu_clr_cloexec
  - util: env var helpers
  - vl: memfd-alloc option
  - vl: add helper to request re-exec

The next patches implement cprsave and cprload:

  - cpr
  - cpr: QMP interfaces
  - cpr: HMP interfaces

The next patches add vfio support for the restart mode:

  - pci: export functions for cpr
  - vfio-pci: refactor for cpr
  - vfio-pci: cpr

The next patches preserve various descriptor-based backend devices across
a cprsave restart, then add the only-cpr-capable option, a MAINTAINERS
entry, and a savevm cleanup:

  - vhost: reset vhost devices upon cprsave
  - chardev: cpr framework
  - chardev: cpr for simple devices
  - chardev: cpr for pty
  - chardev: socket accept subroutine
  - chardev: cpr for sockets
  - monitor: cpr support
  - cpr: only-cpr-capable option
  - cpr: maintainers
  - simplify savevm

Here is an example of updating qemu from v4.2.0 to v4.2.1 using
"cprsave restart".  The software update is performed while the guest is
running to minimize downtime.

window 1				| window 2
					|
# qemu-system-x86_64 ... 		|
QEMU 4.2.0 monitor - type 'help' ...	|
(qemu) info status			|
VM status: running			|
					| # yum update qemu
(qemu) cprsave /tmp/qemu.sav restart	|
QEMU 4.2.1 monitor - type 'help' ...	|
(qemu) info status			|
VM status: paused (prelaunch)		|
(qemu) cprload /tmp/qemu.sav		|
(qemu) info status			|
VM status: running			|


Here is an example of updating the host kernel using "cprsave reboot":

window 1					| window 2
						|
# qemu-system-x86_64 ...mem-path=/dev/dax0.0 ...|
QEMU 4.2.1 monitor - type 'help' ...		|
(qemu) info status				|
VM status: running				|
						| # yum update kernel-uek
(qemu) cprsave /tmp/qemu.sav reboot		|
						|
# systemctl kexec				|
kexec_core: Starting new kernel			|
...						|
						|
# qemu-system-x86_64 ...mem-path=/dev/dax0.0 ...|
QEMU 4.2.1 monitor - type 'help' ...		|
(qemu) info status				|
VM status: paused (prelaunch)			|
(qemu) cprload /tmp/qemu.sav			|
(qemu) info status				|
VM status: running				|

Changes from V1 to V2:
  - revert vmstate infrastructure changes
  - refactor cpr functions into new files
  - delete MADV_DOEXEC and use memfd + VFIO_DMA_UNMAP_FLAG_SUSPEND to 
    preserve memory.
  - add framework to filter chardev's that support cpr
  - save and restore vfio eventfd's
  - modify cprinfo QMP interface
  - incorporate misc review feedback
  - remove unrelated and unneeded patches
  - refactor all patches into a shorter and easier to review series

Steve Sistare (17):
  as_flat_walk
  qemu_ram_volatile
  oslib: qemu_clr_cloexec
  util: env var helpers
  vl: memfd-alloc option
  vl: add helper to request re-exec
  cpr
  pci: export functions for cpr
  vfio-pci: refactor for cpr
  vfio-pci: cpr
  chardev: cpr framework
  chardev: cpr for simple devices
  chardev: cpr for pty
  chardev: socket accept subroutine
  cpr: only-cpr-capable option
  cpr: maintainers
  simplify savevm

Mark Kanda (5):
  cpr: QMP interfaces
  cpr: HMP interfaces
  vhost: reset vhost devices upon cprsave
  chardev: cpr for sockets
  monitor: cpr support

 MAINTAINERS                   |  11 +++
 chardev/char-mux.c            |   1 +
 chardev/char-null.c           |   1 +
 chardev/char-pty.c            |  16 +++-
 chardev/char-serial.c         |   1 +
 chardev/char-socket.c         |  31 +++++++
 chardev/char-stdio.c          |   8 ++
 chardev/char.c                |  41 ++++++++-
 exec.c                        |  75 +++++++++++++--
 gdbstub.c                     |   1 +
 hmp-commands.hx               |  44 +++++++++
 hw/pci/msix.c                 |  20 ++--
 hw/pci/pci.c                  |   7 +-
 hw/vfio/Makefile.objs         |   2 +-
 hw/vfio/common.c              |  63 ++++++++++++-
 hw/vfio/cpr.c                 | 117 +++++++++++++++++++++++
 hw/vfio/pci.c                 | 209 ++++++++++++++++++++++++++++++++++++++----
 hw/vfio/trace-events          |   1 +
 hw/virtio/vhost.c             |  11 +++
 include/chardev/char.h        |   6 ++
 include/exec/memory.h         |  11 +++
 include/hw/pci/msix.h         |   5 +
 include/hw/pci/pci.h          |   2 +
 include/hw/vfio/vfio-common.h |   7 ++
 include/hw/virtio/vhost.h     |   1 +
 include/io/channel-socket.h   |  12 +++
 include/migration/cpr.h       |  17 ++++
 include/monitor/hmp.h         |   3 +
 include/monitor/monitor.h     |   2 +
 include/qemu/env.h            |  27 ++++++
 include/qemu/osdep.h          |   1 +
 include/sysemu/sysemu.h       |   4 +
 io/channel-socket.c           |  52 +++++++----
 linux-headers/linux/vfio.h    |   5 +
 migration/Makefile.objs       |   2 +-
 migration/cpr.c               | 198 +++++++++++++++++++++++++++++++++++++++
 migration/migration.c         |   6 ++
 migration/savevm.c            |  19 ++--
 migration/savevm.h            |   2 +
 monitor/hmp-cmds.c            |  48 ++++++++++
 monitor/monitor.c             |   5 +
 monitor/qmp-cmds.c            |  31 +++++++
 monitor/qmp.c                 |  43 +++++++++
 qapi/Makefile.objs            |   3 +-
 qapi/char.json                |   5 +-
 qapi/cpr.json                 |  68 ++++++++++++++
 qapi/qapi-schema.json         |   1 +
 qemu-options.hx               |  45 ++++++++-
 slirp                         |   2 +-
 softmmu/memory.c              |  17 ++++
 softmmu/vl.c                  |  68 +++++++++++++-
 stubs/Makefile.objs           |   1 +
 stubs/cpr.c                   |   3 +
 trace-events                  |   1 +
 util/Makefile.objs            |   2 +-
 util/env.c                    | 119 ++++++++++++++++++++++++
 util/oslib-posix.c            |   9 ++
 util/oslib-win32.c            |   4 +
 58 files changed, 1433 insertions(+), 84 deletions(-)
 create mode 100644 hw/vfio/cpr.c
 create mode 100644 include/migration/cpr.h
 create mode 100644 include/qemu/env.h
 create mode 100644 migration/cpr.c
 create mode 100644 qapi/cpr.json
 create mode 100644 stubs/cpr.c
 create mode 100644 util/env.c

-- 
1.8.3.1




* [PATCH V2 01/22] as_flat_walk
  2021-01-05 15:41 [PATCH V2 00/22] Live Update Steve Sistare
@ 2021-01-05 15:41 ` Steve Sistare
  2021-01-05 15:41 ` [PATCH V2 02/22] qemu_ram_volatile Steve Sistare
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-01-05 15:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add an iterator over the sections of a flattened address space.
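
The calling convention is: the callback is invoked once per section, and
the first nonzero return value stops the walk and is returned to the
caller.  A minimal standalone sketch of that contract follows.  DemoSection
and the demo_* functions are hypothetical stand-ins so the example compiles
without qemu headers; they are not the qemu types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a MemoryRegionSection: one flat range. */
typedef struct DemoSection {
    uint64_t start;
    uint64_t size;
} DemoSection;

typedef int (*demo_walk_cb)(DemoSection *s, void *handle);

/*
 * Visit each section in order.  Stop at the first nonzero callback result
 * and return it; return 0 if every callback returned 0.
 */
int demo_flat_walk(DemoSection *sections, size_t count,
                   demo_walk_cb func, void *handle)
{
    size_t i;

    assert(func != NULL);
    for (i = 0; i < count; i++) {
        int ret = func(&sections[i], handle);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Example callback: add each size to *handle, failing on an empty range. */
int demo_sum_sizes(DemoSection *s, void *handle)
{
    uint64_t *total = handle;

    if (s->size == 0) {
        return -1;
    }
    *total += s->size;
    return 0;
}
```

The handle argument lets the callback carry state (here a running total)
without globals, and the early return lets it report an error for one
section without visiting the rest.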

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 include/exec/memory.h |  3 +++
 softmmu/memory.c      | 17 +++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 307e527..8dba065 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1894,6 +1894,9 @@ bool memory_region_present(MemoryRegion *container, hwaddr addr);
  */
 bool memory_region_is_mapped(MemoryRegion *mr);
 
+typedef int (*qemu_flat_walk_cb)(MemoryRegionSection *s, void *handle);
+int as_flat_walk(AddressSpace *as, qemu_flat_walk_cb func, void *handle);
+
 /**
  * memory_region_find: translate an address/size relative to a
  * MemoryRegion into a #MemoryRegionSection.
diff --git a/softmmu/memory.c b/softmmu/memory.c
index af25987..8cac3bc 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2513,6 +2513,23 @@ bool memory_region_is_mapped(MemoryRegion *mr)
     return mr->container ? true : false;
 }
 
+int as_flat_walk(AddressSpace *as, qemu_flat_walk_cb func, void *handle)
+{
+    FlatView *view = address_space_get_flatview(as);
+    FlatRange *fr;
+    int ret;
+
+    FOR_EACH_FLAT_RANGE(fr, view) {
+        MemoryRegionSection section = section_from_flat_range(fr, view);
+        ret = func(&section, handle);
+        if (ret) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
 /* Same as memory_region_find, but it does not add a reference to the
  * returned region.  It must be called from an RCU critical section.
  */
-- 
1.8.3.1




* [PATCH V2 02/22] qemu_ram_volatile
  2021-01-05 15:41 [PATCH V2 00/22] Live Update Steve Sistare
  2021-01-05 15:41 ` [PATCH V2 01/22] as_flat_walk Steve Sistare
@ 2021-01-05 15:41 ` Steve Sistare
  2021-01-05 15:41 ` [PATCH V2 03/22] oslib: qemu_clr_cloexec Steve Sistare
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-01-05 15:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add a function that returns true if any ram_list block represents
volatile memory.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 exec.c                | 30 ++++++++++++++++++++++++++++++
 include/exec/memory.h |  8 ++++++++
 slirp                 |  2 +-
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/exec.c b/exec.c
index 6f381f9..d1f31b4 100644
--- a/exec.c
+++ b/exec.c
@@ -2726,6 +2726,36 @@ ram_addr_t qemu_ram_addr_from_host(void *ptr)
     return block->offset + offset;
 }
 
+/*
+ * Return true if any memory regions are writable and not backed by shared
+ * memory.
+ */
+bool qemu_ram_volatile(Error **errp)
+{
+    RAMBlock *block;
+    MemoryRegion *mr;
+    bool ret = false;
+
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        mr = block->mr;
+        if (mr &&
+            memory_region_is_ram(mr) &&
+            !memory_region_is_ram_device(mr) &&
+            !memory_region_is_rom(mr) &&
+            (block->fd == -1 || !qemu_ram_is_shared(block))) {
+
+            error_setg(errp, "Memory region %s is volatile",
+                       memory_region_name(mr));
+            ret = true;
+            break;
+        }
+    }
+
+    rcu_read_unlock();
+    return ret;
+}
+
 /* Generate a debug exception if a watchpoint has been hit.  */
 void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
                           MemTxAttrs attrs, int flags, uintptr_t ra)
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 8dba065..6115a01 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2522,6 +2522,14 @@ bool ram_block_discard_is_disabled(void);
  */
 bool ram_block_discard_is_required(void);
 
+/**
+ * qemu_ram_volatile: return true if any memory regions are writable and not
+ * backed by shared memory.
+ *
+ * @errp: returned error message identifying the bad region.
+ */
+bool qemu_ram_volatile(Error **errp);
+
 #endif
 
 #endif
diff --git a/slirp b/slirp
index ce94eba..a62d367 160000
--- a/slirp
+++ b/slirp
@@ -1 +1 @@
-Subproject commit ce94eba2042d52a0ba3d9e252ebce86715e94275
+Subproject commit a62d36734ffe9828d0f70df1b3898a3b4fbda755
-- 
1.8.3.1




* [PATCH V2 03/22] oslib: qemu_clr_cloexec
  2021-01-05 15:41 [PATCH V2 00/22] Live Update Steve Sistare
  2021-01-05 15:41 ` [PATCH V2 01/22] as_flat_walk Steve Sistare
  2021-01-05 15:41 ` [PATCH V2 02/22] qemu_ram_volatile Steve Sistare
@ 2021-01-05 15:41 ` Steve Sistare
  2021-01-05 15:41 ` [PATCH V2 04/22] util: env var helpers Steve Sistare
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-01-05 15:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Define qemu_clr_cloexec, analogous to qemu_set_cloexec.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 include/qemu/osdep.h | 1 +
 util/oslib-posix.c   | 9 +++++++++
 util/oslib-win32.c   | 4 ++++
 3 files changed, 14 insertions(+)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 20872e7..d7d67f2 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -554,6 +554,7 @@ static inline void qemu_timersub(const struct timeval *val1,
 #endif
 
 void qemu_set_cloexec(int fd);
+void qemu_clr_cloexec(int fd);
 
 /* Starting on QEMU 2.5, qemu_hw_version() returns "2.5+" by default
  * instead of QEMU_VERSION, so setting hw_version on MachineClass
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index ad8001a..9e3b477 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -314,6 +314,15 @@ void qemu_set_cloexec(int fd)
     assert(f != -1);
 }
 
+void qemu_clr_cloexec(int fd)
+{
+    int f;
+    f = fcntl(fd, F_GETFD);
+    assert(f != -1);
+    f = fcntl(fd, F_SETFD, f & ~FD_CLOEXEC);
+    assert(f != -1);
+}
+
 /*
  * Creates a pipe with FD_CLOEXEC set on both file descriptors
  */
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index c654daf..42eb2cc 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -254,6 +254,10 @@ void qemu_set_cloexec(int fd)
 {
 }
 
+void qemu_clr_cloexec(int fd)
+{
+}
+
 /* Offset between 1/1/1601 and 1/1/1970 in 100 nanosec units */
 #define _W32_FT_OFFSET (116444736000000000ULL)
 
-- 
1.8.3.1




* [PATCH V2 04/22] util: env var helpers
  2021-01-05 15:41 [PATCH V2 00/22] Live Update Steve Sistare
                   ` (2 preceding siblings ...)
  2021-01-05 15:41 ` [PATCH V2 03/22] oslib: qemu_clr_cloexec Steve Sistare
@ 2021-01-05 15:41 ` Steve Sistare
  2021-01-05 15:41 ` [PATCH V2 05/22] vl: memfd-alloc option Steve Sistare
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-01-05 15:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add functions for saving fd's and other values in the environment via
setenv, and for reading them back via getenv.
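
A minimal sketch of the set/get/unset round trip, assuming a name prefix
and an fd stored as a decimal string.  The demo_* names and the DEMO_FD_
prefix are hypothetical and chosen so the example does not collide with the
helpers in this patch.  The formatted-name variant builds the name from its
arguments and then removes that variable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#define DEMO_FD_PREFIX "DEMO_FD_"   /* hypothetical prefix for this sketch */

/* Record fd under DEMO_FD_<name>.  Returns 0 on success, -1 on failure. */
int demo_setenv_fd(const char *name, int fd)
{
    char var[80], val[16];
    int n;

    assert(name != NULL);
    n = snprintf(var, sizeof(var), DEMO_FD_PREFIX "%s", name);
    if (n < 0 || n >= (int)sizeof(var)) {
        return -1;                  /* name too long for the buffer */
    }
    snprintf(val, sizeof(val), "%d", fd);
    return setenv(var, val, 1);
}

/* Return the recorded fd, or -1 if it is absent or not a valid number. */
int demo_getenv_fd(const char *name)
{
    char var[80], *val, *end;
    long fd;
    int n;

    assert(name != NULL);
    n = snprintf(var, sizeof(var), DEMO_FD_PREFIX "%s", name);
    if (n < 0 || n >= (int)sizeof(var)) {
        return -1;
    }
    val = getenv(var);
    if (!val || !*val) {
        return -1;
    }
    fd = strtol(val, &end, 10);
    if (*end || fd < 0 || fd > INT_MAX) {
        return -1;
    }
    return (int)fd;
}

/* Build the name from a format string, then remove DEMO_FD_<name>. */
void demo_unsetenv_fdv(const char *fmt, ...)
{
    char name[64], var[80];
    va_list args;

    assert(fmt != NULL);
    va_start(args, fmt);
    vsnprintf(name, sizeof(name), fmt, args);
    va_end(args);
    snprintf(var, sizeof(var), DEMO_FD_PREFIX "%s", name);
    unsetenv(var);
}
```

Returning -1 for a missing or malformed value lets the caller tell "no fd
was preserved" apart from a valid descriptor, since 0 is itself a valid fd.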

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
---
 include/qemu/env.h |  27 ++++++++++++
 util/Makefile.objs |   2 +-
 util/env.c         | 119 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 147 insertions(+), 1 deletion(-)
 create mode 100644 include/qemu/env.h
 create mode 100644 util/env.c

diff --git a/include/qemu/env.h b/include/qemu/env.h
new file mode 100644
index 0000000..174f0c7
--- /dev/null
+++ b/include/qemu/env.h
@@ -0,0 +1,27 @@
+/*
+ * Copyright (c) 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_ENV_H
+#define QEMU_ENV_H
+
+#define FD_PREFIX "QEMU_FD_"
+#define BOOL_PREFIX "QEMU_BOOL_"
+
+typedef int (*walkenv_cb)(const char *name, const char *val, void *handle);
+
+int getenv_fd(const char *name);
+void setenv_fd(const char *name, int fd);
+void unsetenv_fd(const char *name);
+void unsetenv_fdv(const char *fmt, ...);
+bool getenv_bool(const char *name);
+void setenv_bool(const char *name, bool val);
+void unsetenv_bool(const char *name);
+int walkenv(const char *prefix, walkenv_cb cb, void *handle);
+void printenv(void);
+
+#endif
diff --git a/util/Makefile.objs b/util/Makefile.objs
index cc5e371..d357932 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -1,4 +1,4 @@
-util-obj-y = osdep.o cutils.o unicode.o qemu-timer-common.o
+util-obj-y = osdep.o cutils.o unicode.o qemu-timer-common.o env.o
 util-obj-$(call lnot,$(CONFIG_ATOMIC64)) += atomic64.o
 util-obj-$(CONFIG_POSIX) += aio-posix.o
 util-obj-$(CONFIG_POSIX) += fdmon-poll.o
diff --git a/util/env.c b/util/env.c
new file mode 100644
index 0000000..afaf77f
--- /dev/null
+++ b/util/env.c
@@ -0,0 +1,119 @@
+/*
+ * Copyright (c) 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/env.h"
+
+static uint64_t getenv_ulong(const char *prefix, const char *name, bool *found)
+{
+    char var[80], *val;
+    uint64_t res;
+
+    snprintf(var, sizeof(var), "%s%s", prefix, name);
+    val = getenv(var);
+    if (val) {
+        *found = true;
+        res = strtoull(val, NULL, 10);
+    } else {
+        *found = false;
+        res = 0;
+    }
+    return res;
+}
+
+static void setenv_ulong(const char *prefix, const char *name, uint64_t val)
+{
+    char var[80], val_str[80];
+    snprintf(var, sizeof(var), "%s%s", prefix, name);
+    snprintf(val_str, sizeof(val_str), "%"PRIu64, val);
+    setenv(var, val_str, 1);
+}
+
+static void unsetenv_ulong(const char *prefix, const char *name)
+{
+    char var[80];
+    snprintf(var, sizeof(var), "%s%s", prefix, name);
+    unsetenv(var);
+}
+
+int getenv_fd(const char *name)
+{
+    bool found;
+    int fd = getenv_ulong(FD_PREFIX, name, &found);
+    if (!found) {
+        fd = -1;
+    }
+    return fd;
+}
+
+void setenv_fd(const char *name, int fd)
+{
+    setenv_ulong(FD_PREFIX, name, fd);
+}
+
+void unsetenv_fd(const char *name)
+{
+    unsetenv_ulong(FD_PREFIX, name);
+}
+
+void unsetenv_fdv(const char *fmt, ...)
+{
+    va_list args;
+    char buf[80];
+    va_start(args, fmt);
+    vsnprintf(buf, sizeof(buf), fmt, args);
+    va_end(args);
+}
+
+bool getenv_bool(const char *name)
+{
+    bool found;
+    bool val = getenv_ulong(BOOL_PREFIX, name, &found);
+    if (!found) {
+        val = false;
+    }
+    return val;
+}
+
+void setenv_bool(const char *name, bool val)
+{
+    setenv_ulong(BOOL_PREFIX, name, val);
+}
+
+void unsetenv_bool(const char *name)
+{
+    unsetenv_ulong(BOOL_PREFIX, name);
+}
+
+int walkenv(const char *prefix, walkenv_cb cb, void *handle)
+{
+    char *str, name[128];
+    char **envp = environ;
+    size_t prefix_len = strlen(prefix);
+
+    while (*envp) {
+        str = *envp++;
+        if (!strncmp(str, prefix, prefix_len)) {
+            char *val = strchr(str, '=');
+            str += prefix_len;
+            strncpy(name, str, val - str);
+            name[val - str] = 0;
+            if (cb(name, val + 1, handle)) {
+                return 1;
+            }
+        }
+    }
+    return 0;
+}
+
+void printenv(void)
+{
+    char **ptr = environ;
+    while (*ptr) {
+        puts(*ptr++);
+    }
+}
-- 
1.8.3.1




* [PATCH V2 05/22] vl: memfd-alloc option
  2021-01-05 15:41 [PATCH V2 00/22] Live Update Steve Sistare
                   ` (3 preceding siblings ...)
  2021-01-05 15:41 ` [PATCH V2 04/22] util: env var helpers Steve Sistare
@ 2021-01-05 15:41 ` Steve Sistare
  2021-01-05 16:27   ` Daniel P. Berrangé
  2021-01-05 15:41 ` [PATCH V2 06/22] vl: add helper to request re-exec Steve Sistare
                   ` (16 subsequent siblings)
  21 siblings, 1 reply; 29+ messages in thread
From: Steve Sistare @ 2021-01-05 15:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Allocate anonymous memory using memfd_create if the memfd-alloc option is
set.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 exec.c                  | 38 ++++++++++++++++++++++++++++++--------
 include/sysemu/sysemu.h |  1 +
 qemu-options.hx         | 11 +++++++++++
 softmmu/vl.c            |  4 ++++
 trace-events            |  1 +
 5 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/exec.c b/exec.c
index d1f31b4..6da6590 100644
--- a/exec.c
+++ b/exec.c
@@ -67,6 +67,7 @@
 #include "exec/log.h"
 
 #include "qemu/pmem.h"
+#include "qemu/memfd.h"
 
 #include "migration/vmstate.h"
 
@@ -2231,34 +2232,55 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
 {
     RAMBlock *block;
     RAMBlock *last_block = NULL;
+    struct MemoryRegion *mr = new_block->mr;
     ram_addr_t old_ram_size, new_ram_size;
     Error *err = NULL;
+    const char *name;
+    void *addr = 0;
+    size_t maxlen;
 
     old_ram_size = last_ram_page();
 
     qemu_mutex_lock_ramlist();
-    new_block->offset = find_ram_offset(new_block->max_length);
+    maxlen = new_block->max_length;
+    new_block->offset = find_ram_offset(maxlen);
 
     if (!new_block->host) {
         if (xen_enabled()) {
-            xen_ram_alloc(new_block->offset, new_block->max_length,
-                          new_block->mr, &err);
+            xen_ram_alloc(new_block->offset, maxlen, new_block->mr, &err);
             if (err) {
                 error_propagate(errp, err);
                 qemu_mutex_unlock_ramlist();
                 return;
             }
         } else {
-            new_block->host = phys_mem_alloc(new_block->max_length,
-                                             &new_block->mr->align, shared);
-            if (!new_block->host) {
+            name = memory_region_name(new_block->mr);
+            if (memfd_alloc) {
+                int mfd = -1;          /* placeholder until next patch */
+                mr->align = QEMU_VMALLOC_ALIGN;
+                if (mfd < 0) {
+                    mfd = qemu_memfd_create(name, maxlen + mr->align,
+                                            0, 0, 0, &err);
+                    if (mfd < 0) {
+                        return;
+                    }
+                }
+                new_block->flags |= RAM_SHARED;
+                addr = file_ram_alloc(new_block, maxlen, mfd, false, errp);
+                trace_anon_memfd_alloc(name, maxlen, addr, mfd);
+            } else {
+                addr = phys_mem_alloc(maxlen, &mr->align, shared);
+            }
+
+            if (!addr) {
                 error_setg_errno(errp, errno,
                                  "cannot set up guest memory '%s'",
-                                 memory_region_name(new_block->mr));
+                                 name);
                 qemu_mutex_unlock_ramlist();
                 return;
             }
-            memory_try_enable_merging(new_block->host, new_block->max_length);
+            memory_try_enable_merging(addr, maxlen);
+            new_block->host = addr;
         }
     }
 
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 4b6a5c4..408eb56 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -13,6 +13,7 @@ extern int only_migratable;
 extern const char *qemu_name;
 extern QemuUUID qemu_uuid;
 extern bool qemu_uuid_set;
+extern bool memfd_alloc;
 
 void qemu_add_data_dir(const char *path);
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 708583b..455b43b7 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4094,6 +4094,17 @@ SRST
     an unmigratable state.
 ERST
 
+#ifdef __linux__
+DEF("memfd-alloc", 0,  QEMU_OPTION_memfd_alloc, \
+    "-memfd-alloc         allocate anonymous memory using memfd_create\n",
+    QEMU_ARCH_ALL)
+#endif
+
+SRST
+``-memfd-alloc``
+    Allocate anonymous memory using memfd_create (Linux only).
+ERST
+
 DEF("nodefaults", 0, QEMU_OPTION_nodefaults, \
     "-nodefaults     don't create default devices\n", QEMU_ARCH_ALL)
 SRST
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 4eb9d1f..5668e2b 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -164,6 +164,7 @@ bool boot_strict;
 uint8_t *boot_splash_filedata;
 int only_migratable; /* turn it off unless user states otherwise */
 bool wakeup_suspend_enabled;
+bool memfd_alloc;
 
 int icount_align_option;
 
@@ -3635,6 +3636,9 @@ void qemu_init(int argc, char **argv, char **envp)
             case QEMU_OPTION_only_migratable:
                 only_migratable = 1;
                 break;
+            case QEMU_OPTION_memfd_alloc:
+                memfd_alloc = true;
+                break;
             case QEMU_OPTION_nodefaults:
                 has_defaults = 0;
                 break;
diff --git a/trace-events b/trace-events
index 42107eb..ed46c4d 100644
--- a/trace-events
+++ b/trace-events
@@ -54,6 +54,7 @@ find_ram_offset_loop(uint64_t size, uint64_t candidate, uint64_t offset, uint64_
 ram_block_discard_range(const char *rbname, void *hva, size_t length, bool need_madvise, bool need_fallocate, int ret) "%s@%p + 0x%zx: madvise: %d fallocate: %d ret: %d"
 memory_notdirty_write_access(uint64_t vaddr, uint64_t ram_addr, unsigned size) "0x%" PRIx64 " ram_addr 0x%" PRIx64 " size %u"
 memory_notdirty_set_dirty(uint64_t vaddr) "0x%" PRIx64
+anon_memfd_alloc(const char *name, size_t size, void *ptr, int fd) "%s size %zu ptr %p fd %d"
 
 # memory.c
 memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
-- 
1.8.3.1




* [PATCH V2 06/22] vl: add helper to request re-exec
  2021-01-05 15:41 [PATCH V2 00/22] Live Update Steve Sistare
                   ` (4 preceding siblings ...)
  2021-01-05 15:41 ` [PATCH V2 05/22] vl: memfd-alloc option Steve Sistare
@ 2021-01-05 15:41 ` Steve Sistare
  2021-01-05 15:41 ` [PATCH V2 07/22] cpr Steve Sistare
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-01-05 15:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add a qemu_system_exec_request() hook that causes the main loop to exit and
re-exec qemu using the same initial arguments.  If /usr/bin/qemu-exec
exists, exec that instead.  This is an optional site-specific trampoline
that may alter the environment before exec'ing the qemu binary.
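
The target selection can be sketched as follows.  This is an illustration
only: the demo_* names are hypothetical, and the helper path is passed in
as a parameter rather than hard-coded.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* Return helper if it exists and is executable, otherwise fallback. */
const char *demo_exec_target(const char *helper, const char *fallback)
{
    assert(helper != NULL && fallback != NULL);
    return access(helper, X_OK) == 0 ? helper : fallback;
}

/*
 * Replace the current process image with the chosen target, passing the
 * original argument vector unchanged.  Returns -1 only if execvp fails, in
 * which case errno describes the failure.
 */
int demo_reexec(const char *helper, char **argv)
{
    assert(argv != NULL && argv[0] != NULL);
    execvp(demo_exec_target(helper, argv[0]), argv);
    return -1;
}
```

Because argv is passed through unchanged, a trampoline still sees the
original qemu path in argv[0] and can exec it after adjusting the
environment.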

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 include/sysemu/sysemu.h |  1 +
 softmmu/vl.c            | 30 ++++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 408eb56..2ab9f95 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -17,6 +17,7 @@ extern bool memfd_alloc;
 
 void qemu_add_data_dir(const char *path);
 
+void qemu_system_exec_request(void);
 void qemu_add_exit_notifier(Notifier *notify);
 void qemu_remove_exit_notifier(Notifier *notify);
 
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 5668e2b..d395e80 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -165,6 +165,7 @@ uint8_t *boot_splash_filedata;
 int only_migratable; /* turn it off unless user states otherwise */
 bool wakeup_suspend_enabled;
 bool memfd_alloc;
+static char **argv_main;
 
 int icount_align_option;
 
@@ -1294,6 +1295,7 @@ static ShutdownCause reset_requested;
 static ShutdownCause shutdown_requested;
 static int shutdown_signal;
 static pid_t shutdown_pid;
+static int exec_requested;
 static int powerdown_requested;
 static int debug_requested;
 static int suspend_requested;
@@ -1324,6 +1326,11 @@ static int qemu_shutdown_requested(void)
     return atomic_xchg(&shutdown_requested, SHUTDOWN_CAUSE_NONE);
 }
 
+static int qemu_exec_requested(void)
+{
+    return atomic_xchg(&exec_requested, 0);
+}
+
 static void qemu_kill_report(void)
 {
     if (!qtest_driver() && shutdown_signal) {
@@ -1570,6 +1577,13 @@ void qemu_system_shutdown_request(ShutdownCause reason)
     qemu_notify_event();
 }
 
+void qemu_system_exec_request(void)
+{
+    shutdown_requested = 1;
+    exec_requested = 1;
+    qemu_notify_event();
+}
+
 static void qemu_system_powerdown(void)
 {
     qapi_event_send_powerdown();
@@ -1605,6 +1619,16 @@ void qemu_system_debug_request(void)
     qemu_notify_event();
 }
 
+static void qemu_exec(void)
+{
+    const char *helper = "/usr/bin/qemu-exec";
+    const char *bin = !access(helper, X_OK) ? helper : argv_main[0];
+
+    execvp(bin, argv_main);
+    error_report("execvp failed, errno %d.", errno);
+    exit(1);
+}
+
 static bool main_loop_should_exit(void)
 {
     RunState r;
@@ -1625,6 +1649,11 @@ static bool main_loop_should_exit(void)
     }
     request = qemu_shutdown_requested();
     if (request) {
+
+        if (qemu_exec_requested()) {
+            qemu_exec();
+            /* not reached */
+        }
         qemu_kill_report();
         qemu_system_shutdown(request);
         if (no_shutdown) {
@@ -2872,6 +2901,7 @@ void qemu_init(int argc, char **argv, char **envp)
 
     os_set_line_buffering();
 
+    argv_main = argv;
     error_init(argv[0]);
     module_call_init(MODULE_INIT_TRACE);
 
-- 
1.8.3.1




* [PATCH V2 07/22] cpr
  2021-01-05 15:41 [PATCH V2 00/22] Live Update Steve Sistare
                   ` (5 preceding siblings ...)
  2021-01-05 15:41 ` [PATCH V2 06/22] vl: add helper to request re-exec Steve Sistare
@ 2021-01-05 15:41 ` Steve Sistare
  2021-01-05 15:41 ` [PATCH V2 08/22] cpr: QMP interfaces Steve Sistare
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-01-05 15:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Provide the cprsave and cprload functions for live update.  These save and
restore VM state, with minimal guest pause time, so that qemu may be updated
to a new version in between.

cprsave stops the VM and saves vmstate to an ordinary file.  It supports two
modes: restart and reboot.  For restart, cprsave exec's the qemu binary (or
/usr/bin/qemu-exec if it exists) with the same argv.  qemu restarts in a
paused state and waits for the cprload command.

To use the restart mode, qemu must be started with the memfd-alloc option.
The memfd's are saved to the environment and kept open across exec, after
which they are found from the environment and re-mmap'd.  Hence guest ram is
preserved in place, albeit with new virtual addresses in the qemu process.
The caller resumes the guest by calling cprload, which loads state from the
file.  If the VM was running at cprsave time, then VM execution resumes.
cprsave supports any type of guest image and block device, but the caller
must not modify guest block devices between cprsave and cprload.

For the reboot mode, cprsave saves state and exits qemu, and the caller is
allowed to update the host kernel and system software and reboot.  The
caller resumes the guest by running qemu with the same arguments as the
original process and calling cprload.  To use this mode, guest ram must be
mapped to a persistent shared memory file such as /dev/dax0.0 or /dev/shm
PKRAM.

The reboot mode supports vfio devices if the caller suspends the guest
instead of stopping the VM, such as by issuing guest-suspend-ram to the
qemu guest agent.  The guest drivers' suspend methods flush outstanding
requests and re-initialize the devices, and thus there is no device state
to save and restore.

Support for vfio devices in restart mode is added in a subsequent patch.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 exec.c                  |   6 +-
 include/migration/cpr.h |  17 +++++
 include/sysemu/sysemu.h |   1 +
 migration/Makefile.objs |   2 +-
 migration/cpr.c         | 187 ++++++++++++++++++++++++++++++++++++++++++++++++
 migration/savevm.h      |   2 +
 softmmu/vl.c            |  21 +++++-
 7 files changed, 233 insertions(+), 3 deletions(-)
 create mode 100644 include/migration/cpr.h
 create mode 100644 migration/cpr.c

diff --git a/exec.c b/exec.c
index 6da6590..6a6e43d 100644
--- a/exec.c
+++ b/exec.c
@@ -68,6 +68,7 @@
 
 #include "qemu/pmem.h"
 #include "qemu/memfd.h"
+#include "qemu/env.h"
 
 #include "migration/vmstate.h"
 
@@ -2256,7 +2257,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
         } else {
             name = memory_region_name(new_block->mr);
             if (memfd_alloc) {
-                int mfd = -1;          /* placeholder until next patch */
+                int mfd = getenv_fd(name);
                 mr->align = QEMU_VMALLOC_ALIGN;
                 if (mfd < 0) {
                     mfd = qemu_memfd_create(name, maxlen + mr->align,
@@ -2264,7 +2265,9 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
                     if (mfd < 0) {
                         return;
                     }
+                    setenv_fd(name, mfd);
                 }
+                qemu_clr_cloexec(mfd);
                 new_block->flags |= RAM_SHARED;
                 addr = file_ram_alloc(new_block, maxlen, mfd, false, errp);
                 trace_anon_memfd_alloc(name, maxlen, addr, mfd);
@@ -2521,6 +2524,7 @@ void qemu_ram_free(RAMBlock *block)
     }
 
     qemu_mutex_lock_ramlist();
+    unsetenv_fd(memory_region_name(block->mr));
     QLIST_REMOVE_RCU(block, next);
     ram_list.mru_block = NULL;
     /* Write list before version */
diff --git a/include/migration/cpr.h b/include/migration/cpr.h
new file mode 100644
index 0000000..42dec4e
--- /dev/null
+++ b/include/migration/cpr.h
@@ -0,0 +1,17 @@
+/*
+ * Copyright (c) 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef MIGRATION_CPR_H
+#define MIGRATION_CPR_H
+
+#include "qapi/qapi-types-cpr.h"
+
+bool cpr_active(void);
+void cprsave(const char *file, CprMode mode, Error **errp);
+void cprload(const char *file, Error **errp);
+
+#endif
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 2ab9f95..f0017d4 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -14,6 +14,7 @@ extern const char *qemu_name;
 extern QemuUUID qemu_uuid;
 extern bool qemu_uuid_set;
 extern bool memfd_alloc;
+extern int start_on_wake;
 
 void qemu_add_data_dir(const char *path);
 
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 0fc619e..106b5fb 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,5 +1,5 @@
 common-obj-y += migration.o socket.o fd.o exec.o
-common-obj-y += tls.o channel.o savevm.o
+common-obj-y += tls.o channel.o savevm.o cpr.o
 common-obj-y += colo.o colo-failover.o
 common-obj-y += vmstate.o vmstate-types.o page_cache.o
 common-obj-y += qemu-file.o global_state.o
diff --git a/migration/cpr.c b/migration/cpr.c
new file mode 100644
index 0000000..a8f3c10
--- /dev/null
+++ b/migration/cpr.c
@@ -0,0 +1,187 @@
+/*
+ * Copyright (c) 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/xen/xen.h"
+#include "monitor/monitor.h"
+#include "migration.h"
+#include "migration/snapshot.h"
+#include "chardev/char.h"
+#include "migration/misc.h"
+#include "migration/cpr.h"
+#include "migration/global_state.h"
+#include "qemu-file-channel.h"
+#include "qemu-file.h"
+#include "savevm.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
+#include "qemu/error-report.h"
+#include "io/channel-buffer.h"
+#include "io/channel-file.h"
+#include "sysemu/runstate.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/xen.h"
+#include "sysemu/replay.h"
+#include "hw/vfio/vfio-common.h"
+#include "hw/virtio/vhost.h"
+#include "qemu/env.h"
+
+static int cpr_is_active;
+
+bool cpr_active(void)
+{
+    return cpr_is_active;
+}
+
+QEMUFile *qf_file_open(const char *path, int flags, int mode,
+                              const char *name, Error **errp)
+{
+    QIOChannelFile *fioc;
+    QIOChannel *ioc;
+    QEMUFile *f;
+
+    if (flags & O_RDWR) {
+        error_setg(errp, "qf_file_open %s: O_RDWR not supported", path);
+        return 0;
+    }
+
+    fioc = qio_channel_file_new_path(path, flags, mode, errp);
+    if (!fioc) {
+        return 0;
+    }
+
+    ioc = QIO_CHANNEL(fioc);
+    qio_channel_set_name(ioc, name);
+    f = (flags & O_WRONLY) ? qemu_fopen_channel_output(ioc) :
+                             qemu_fopen_channel_input(ioc);
+    object_unref(OBJECT(fioc));
+    return f;
+}
+
+static int preserve_fd(const char *name, const char *val, void *handle)
+{
+    qemu_clr_cloexec(atoi(val));
+    return 0;
+}
+
+void cprsave(const char *file, CprMode mode, Error **errp)
+{
+    int ret = 0;
+    QEMUFile *f;
+    int saved_vm_running = runstate_is_running();
+    bool restart = (mode == CPR_MODE_RESTART);
+    bool reboot = (mode == CPR_MODE_REBOOT);
+
+    if (reboot && qemu_ram_volatile(errp)) {
+        return;
+    }
+
+    if (restart && xen_enabled()) {
+        error_setg(errp, "xen does not support cprsave restart");
+        return;
+    }
+
+    if (migrate_colo_enabled()) {
+        error_setg(errp, "error: cprsave does not support x-colo");
+        return;
+    }
+
+    if (replay_mode != REPLAY_MODE_NONE) {
+        error_setg(errp, "error: cprsave does not support replay");
+        return;
+    }
+
+    f = qf_file_open(file, O_CREAT | O_WRONLY | O_TRUNC, 0600, "cprsave", errp);
+    if (!f) {
+        return;
+    }
+
+    ret = global_state_store();
+    if (ret) {
+        error_setg(errp, "Error saving global state");
+        qemu_fclose(f);
+        return;
+    }
+    if (runstate_check(RUN_STATE_SUSPENDED)) {
+        /* Update timers_state before saving.  Suspend did not do so. */
+        cpu_disable_ticks();
+    }
+    vm_stop(RUN_STATE_SAVE_VM);
+
+    cpr_is_active = true;
+    ret = qemu_save_device_state(f);
+    qemu_fclose(f);
+    if (ret < 0) {
+        error_setg(errp, QERR_IO_ERROR);
+        goto err;
+    }
+
+    if (ret < 0) {
+        if (!*errp) {
+            error_setg(errp, "qemu_savevm_state failed");
+        }
+        goto err;
+    }
+
+    if (reboot) {
+        no_shutdown = 0;
+        qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+    } else if (restart) {
+        walkenv(FD_PREFIX, preserve_fd, 0);
+        setenv("QEMU_START_FREEZE", "", 1);
+        qemu_system_exec_request();
+    }
+    goto done;
+
+err:
+    if (saved_vm_running) {
+        vm_start();
+    }
+done:
+    cpr_is_active = false;
+    return;
+}
+
+void cprload(const char *file, Error **errp)
+{
+    QEMUFile *f;
+    int ret;
+    RunState state;
+
+    if (runstate_is_running()) {
+        error_setg(errp, "cprload called for a running VM");
+        return;
+    }
+
+    f = qf_file_open(file, O_RDONLY, 0, "cprload", errp);
+    if (!f) {
+        return;
+    }
+
+    if (qemu_get_be32(f) != QEMU_VM_FILE_MAGIC ||
+        qemu_get_be32(f) != QEMU_VM_FILE_VERSION) {
+        error_setg(errp, "error: %s is not a vmstate file", file);
+        return;
+    }
+
+    ret = qemu_load_device_state(f);
+    qemu_fclose(f);
+    if (ret < 0) {
+        error_setg(errp, "Error %d while loading VM state", ret);
+        return;
+    }
+
+    state = global_state_get_runstate();
+    if (state == RUN_STATE_RUNNING) {
+        vm_start();
+    } else {
+        runstate_set(state);
+        if (runstate_check(RUN_STATE_SUSPENDED)) {
+            start_on_wake = 1;
+        }
+    }
+}
diff --git a/migration/savevm.h b/migration/savevm.h
index ba64a7e..7413254 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -64,5 +64,7 @@ int qemu_loadvm_state(QEMUFile *f);
 void qemu_loadvm_state_cleanup(void);
 int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
 int qemu_load_device_state(QEMUFile *f);
+QEMUFile *qf_file_open(const char *path, int flags, int mode,
+                       const char *name, Error **errp);
 
 #endif
diff --git a/softmmu/vl.c b/softmmu/vl.c
index d395e80..9f2be5c 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -165,6 +165,7 @@ uint8_t *boot_splash_filedata;
 int only_migratable; /* turn it off unless user states otherwise */
 bool wakeup_suspend_enabled;
 bool memfd_alloc;
+int start_on_wake;
 static char **argv_main;
 
 int icount_align_option;
@@ -604,6 +605,8 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_PRELAUNCH, RUN_STATE_RUNNING },
     { RUN_STATE_PRELAUNCH, RUN_STATE_FINISH_MIGRATE },
     { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
+    { RUN_STATE_PRELAUNCH, RUN_STATE_SUSPENDED },
+    { RUN_STATE_PRELAUNCH, RUN_STATE_PAUSED },
 
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_PAUSED },
@@ -1527,7 +1530,17 @@ void qemu_system_wakeup_request(WakeupReason reason, Error **errp)
     if (!(wakeup_reason_mask & (1 << reason))) {
         return;
     }
-    runstate_set(RUN_STATE_RUNNING);
+
+    /*
+     * Must call vm_start if it has never been called, to invoke the state
+     * change callbacks for the first time.
+     */
+    if (start_on_wake) {
+        start_on_wake = 0;
+        vm_start();
+    } else {
+        runstate_set(RUN_STATE_RUNNING);
+    }
     wakeup_reason = reason;
     qemu_notify_event();
 }
@@ -4510,6 +4523,12 @@ void qemu_init(int argc, char **argv, char **envp)
         exit(0);
     }
 
+    /* Equivalent to -S, but no need for parent to modify argv. */
+    if (getenv("QEMU_START_FREEZE")) {
+        unsetenv("QEMU_START_FREEZE");
+        autostart = 0;
+    }
+
     if (incoming) {
         Error *local_err = NULL;
         qemu_start_incoming_migration(incoming, &local_err);
-- 
1.8.3.1




* [PATCH V2 08/22] cpr: QMP interfaces
  2021-01-05 15:41 [PATCH V2 00/22] Live Update Steve Sistare
                   ` (6 preceding siblings ...)
  2021-01-05 15:41 ` [PATCH V2 07/22] cpr Steve Sistare
@ 2021-01-05 15:41 ` Steve Sistare
  2021-01-05 15:41 ` [PATCH V2 09/22] cpr: HMP interfaces Steve Sistare
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-01-05 15:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

cprsave calls cprsave().  Syntax:
  { 'enum': 'CprMode', 'data': [ 'reboot', 'restart' ] }
  { 'command': 'cprsave', 'data': { 'file': 'str', 'mode': 'CprMode' } }

cprload calls cprload().  Syntax:
  { 'command': 'cprload', 'data': { 'file': 'str' } }

cprinfo returns a list of supported modes.  Syntax:
  { 'struct': 'CprInfo', 'data': { 'modes': [ 'CprMode' ] } }
  { 'command': 'cprinfo', 'returns': 'CprInfo' }
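
A note on ordering: qmp_cprinfo() in this patch builds its list by
pushing each mode at the head, so the reply lists the modes in reverse
declaration order.  The sketch below reproduces only that construction;
SketchMode and SketchModeList are hypothetical stand-ins for the
QAPI-generated CprMode and CprModeList types.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the QAPI-generated CprMode and CprModeList. */
typedef enum {
    SKETCH_MODE_REBOOT,
    SKETCH_MODE_RESTART,
    SKETCH_MODE__MAX
} SketchMode;

typedef struct SketchModeList {
    struct SketchModeList *next;
    SketchMode value;
} SketchModeList;

/* Build the list by pushing each mode at the head, as qmp_cprinfo() does. */
static SketchModeList *sketch_mode_list(void)
{
    SketchModeList *head = NULL;
    int i;

    for (i = 0; i < SKETCH_MODE__MAX; i++) {
        SketchModeList *mode = calloc(1, sizeof(*mode));

        if (!mode) {
            abort();
        }
        mode->value = (SketchMode)i;
        mode->next = head;
        head = mode;
    }
    return head;
}

/* Free every element of the list. */
static void sketch_mode_list_free(SketchModeList *head)
{
    while (head) {
        SketchModeList *next = head->next;

        free(head);
        head = next;
    }
}
```

With the two modes declared here, the head of the list is restart and
the second element is reboot.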

Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 monitor/qmp-cmds.c    | 31 +++++++++++++++++++++++
 qapi/Makefile.objs    |  3 ++-
 qapi/cpr.json         | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi/qapi-schema.json |  1 +
 4 files changed, 102 insertions(+), 1 deletion(-)
 create mode 100644 qapi/cpr.json

diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index 864cbfa..3778b04 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -35,9 +35,11 @@
 #include "qapi/qapi-commands-machine.h"
 #include "qapi/qapi-commands-misc.h"
 #include "qapi/qapi-commands-ui.h"
+#include "qapi/qapi-commands-cpr.h"
 #include "qapi/qmp/qerror.h"
 #include "hw/mem/memory-device.h"
 #include "hw/acpi/acpi_dev_interface.h"
+#include "migration/cpr.h"
 
 NameInfo *qmp_query_name(Error **errp)
 {
@@ -161,6 +163,35 @@ void qmp_cont(Error **errp)
     }
 }
 
+CprInfo *qmp_cprinfo(Error **errp)
+{
+    CprInfo *cprinfo;
+    CprModeList *mode, *mode_list = NULL;
+    CprMode i;
+
+    cprinfo = g_malloc0(sizeof(*cprinfo));
+
+    for (i = 0; i < CPR_MODE__MAX; i++) {
+        mode = g_malloc0(sizeof(*mode));
+        mode->value = i;
+        mode->next = mode_list;
+        mode_list = mode;
+    }
+
+    cprinfo->modes = mode_list;
+    return cprinfo;
+}
+
+void qmp_cprsave(const char *file, CprMode mode, Error **errp)
+{
+    cprsave(file, mode, errp);
+}
+
+void qmp_cprload(const char *file, Error **errp)
+{
+    cprload(file, errp);
+}
+
 void qmp_system_wakeup(Error **errp)
 {
     if (!qemu_wakeup_suspend_enabled()) {
diff --git a/qapi/Makefile.objs b/qapi/Makefile.objs
index 4673ab7..099b325 100644
--- a/qapi/Makefile.objs
+++ b/qapi/Makefile.objs
@@ -5,7 +5,8 @@ util-obj-y += opts-visitor.o qapi-clone-visitor.o
 util-obj-y += qmp-event.o
 util-obj-y += qapi-util.o
 
-QAPI_COMMON_MODULES = audio authz block-core block char common control crypto
+QAPI_COMMON_MODULES = audio authz block-core block char
+QAPI_COMMON_MODULES += common cpr control crypto
 QAPI_COMMON_MODULES += dump error introspect job machine migration misc
 QAPI_COMMON_MODULES += net pragma qdev qom rdma rocker run-state sockets tpm
 QAPI_COMMON_MODULES += trace transaction ui
diff --git a/qapi/cpr.json b/qapi/cpr.json
new file mode 100644
index 0000000..588749f
--- /dev/null
+++ b/qapi/cpr.json
@@ -0,0 +1,68 @@
+# -*- Mode: Python -*-
+#
+# Copyright (c) 2021 Oracle and/or its affiliates.
+#
+# This work is licensed under the terms of the GNU GPL, version 2.
+# See the COPYING file in the top-level directory.
+
+##
+# = CPR
+##
+
+{ 'include': 'common.json' }
+
+##
+# @CprMode:
+#
+# @reboot: checkpoint can be cprload'ed after a host kexec reboot.
+#
+# @restart: checkpoint can be cprload'ed after restarting qemu.
+#
+# Since 5.3
+##
+{ 'enum': 'CprMode', 'data': [ 'reboot', 'restart' ] }
+
+
+##
+# @CprInfo:
+#
+# @modes: @CprMode list
+#
+# Since 5.3
+##
+{ 'struct': 'CprInfo', 'data': { 'modes': [ 'CprMode' ] } }
+
+##
+# @cprinfo:
+#
+# Returns: @CprInfo
+#
+# Since 5.3
+##
+{ 'command': 'cprinfo', 'returns': 'CprInfo' }
+
+##
+# @cprsave:
+#
+# Create a checkpoint of the virtual machine device state in @file.
+# Guest RAM and guest block device blocks are not saved.
+#
+# @file: name of checkpoint file
+# @mode: @CprMode mode
+#
+# Since 5.3
+##
+{ 'command': 'cprsave', 'data': { 'file': 'str', 'mode': 'CprMode' } }
+
+##
+# @cprload:
+#
+# Start virtual machine from checkpoint file that was created earlier using
+# the cprsave command.
+#
+# @file: name of checkpoint file
+#
+# Since 5.3
+##
+{ 'command': 'cprload', 'data': { 'file': 'str' } }
+
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index f03ff91..0d47ab6 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -74,6 +74,7 @@
 { 'include': 'ui.json' }
 { 'include': 'authz.json' }
 { 'include': 'migration.json' }
+{ 'include': 'cpr.json' }
 { 'include': 'transaction.json' }
 { 'include': 'trace.json' }
 { 'include': 'control.json' }
-- 
1.8.3.1




* [PATCH V2 09/22] cpr: HMP interfaces
  2021-01-05 15:41 [PATCH V2 00/22] Live Update Steve Sistare
                   ` (7 preceding siblings ...)
  2021-01-05 15:41 ` [PATCH V2 08/22] cpr: QMP interfaces Steve Sistare
@ 2021-01-05 15:41 ` Steve Sistare
  2021-01-05 15:41 ` [PATCH V2 10/22] pci: export functions for cpr Steve Sistare
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-01-05 15:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

cprsave <file> <mode>
  Call cprsave().
  Arguments:
    file : save vmstate to this file name
    mode : "reboot" or "restart"

cprload <file>
  Call cprload().
  Arguments:
    file : load vmstate from this file name

cprinfo
  Print to the monitor a space-delimited list of modes supported by cprsave.
  Arguments: none
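
The mode argument reaches cprsave() as an enum value, so the HMP handler
first maps the string to an index.  The sketch below is a simplified
stand-in for that lookup and not the QAPI helper itself (which also
reports an Error); sketch_mode_names and sketch_parse_mode are
hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Mode names in enum order, matching the CprMode schema: reboot, restart. */
static const char *const sketch_mode_names[] = { "reboot", "restart" };

#define SKETCH_MODE_COUNT \
    (sizeof(sketch_mode_names) / sizeof(sketch_mode_names[0]))

/* Return the index of a mode name, or def if str is NULL or unknown. */
static int sketch_parse_mode(const char *str, int def)
{
    size_t i;

    if (!str) {
        return def;
    }
    for (i = 0; i < SKETCH_MODE_COUNT; i++) {
        if (!strcmp(str, sketch_mode_names[i])) {
            return (int)i;
        }
    }
    return def;
}
```

Passing -1 as the default, as hmp_cprsave() does, lets the caller tell
a missing or misspelled mode apart from any valid enum value.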

Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hmp-commands.hx       | 44 ++++++++++++++++++++++++++++++++++++++++++++
 include/monitor/hmp.h |  3 +++
 monitor/hmp-cmds.c    | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 95 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 60f395c..8577850 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -354,6 +354,50 @@ SRST
 ERST
 
     {
+        .name       = "cprinfo",
+        .args_type  = "",
+        .params     = "",
+        .help       = "return list of modes supported by cprsave",
+        .cmd        = hmp_cprinfo,
+    },
+
+SRST
+``cprinfo``
+    Return a space-delimited list of modes supported by cprsave.
+ERST
+
+    {
+        .name       = "cprsave",
+        .args_type  = "file:s,mode:s",
+        .params     = "file 'restart'|'reboot'",
+        .help       = "create a checkpoint of the VM in file",
+        .cmd        = hmp_cprsave,
+    },
+
+SRST
+``cprsave`` *file* *mode*
+    Create a checkpoint of the whole virtual machine and save it in *file*.
+    If *mode* is 'reboot', the checkpoint remains valid after a host kexec
+    reboot.  Guest ram must be backed by persistent shared memory.
+    If *mode* is 'restart', pause the VCPUs, exec /usr/bin/qemu-exec if it
+    exists, else exec argv[0], passing all the original command line arguments.
+    Guest ram must be allocated with the memfd-alloc option.
+ERST
+
+    {
+        .name       = "cprload",
+        .args_type  = "file:s",
+        .params     = "file",
+        .help       = "load VM checkpoint from file",
+        .cmd        = hmp_cprload,
+    },
+
+SRST
+``cprload`` *file*
+    Load a virtual machine from checkpoint file *file* and continue VCPUs.
+ERST
+
+    {
         .name       = "delvm",
         .args_type  = "name:s",
         .params     = "tag",
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index c986cfd..919b9a9 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -59,6 +59,9 @@ void hmp_balloon(Monitor *mon, const QDict *qdict);
 void hmp_loadvm(Monitor *mon, const QDict *qdict);
 void hmp_savevm(Monitor *mon, const QDict *qdict);
 void hmp_delvm(Monitor *mon, const QDict *qdict);
+void hmp_cprinfo(Monitor *mon, const QDict *qdict);
+void hmp_cprsave(Monitor *mon, const QDict *qdict);
+void hmp_cprload(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_continue(Monitor *mon, const QDict *qdict);
 void hmp_migrate_incoming(Monitor *mon, const QDict *qdict);
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index ae4b6a4..e64b754 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -32,6 +32,7 @@
 #include "qapi/qapi-commands-block.h"
 #include "qapi/qapi-commands-char.h"
 #include "qapi/qapi-commands-control.h"
+#include "qapi/qapi-commands-cpr.h"
 #include "qapi/qapi-commands-migration.h"
 #include "qapi/qapi-commands-misc.h"
 #include "qapi/qapi-commands-net.h"
@@ -1139,6 +1140,53 @@ void hmp_announce_self(Monitor *mon, const QDict *qdict)
     qapi_free_AnnounceParameters(params);
 }
 
+void hmp_cprinfo(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    CprInfo *cprinfo;
+    CprModeList *mode;
+
+    cprinfo = qmp_cprinfo(&err);
+    if (err) {
+        goto out;
+    }
+
+    for (mode = cprinfo->modes; mode; mode = mode->next) {
+        monitor_printf(mon, "%s ", CprMode_str(mode->value));
+    }
+
+out:
+    hmp_handle_error(mon, err);
+    qapi_free_CprInfo(cprinfo);
+}
+
+void hmp_cprsave(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    const char *mode;
+    int val;
+
+    mode = qdict_get_try_str(qdict, "mode");
+    val = qapi_enum_parse(&CprMode_lookup, mode, -1, &err);
+
+    if (val == -1) {
+        goto out;
+    }
+
+    qmp_cprsave(qdict_get_try_str(qdict, "file"), val, &err);
+
+out:
+    hmp_handle_error(mon, err);
+}
+
+void hmp_cprload(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+
+    qmp_cprload(qdict_get_try_str(qdict, "file"), &err);
+    hmp_handle_error(mon, err);
+}
+
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict)
 {
     qmp_migrate_cancel(NULL);
-- 
1.8.3.1




* [PATCH V2 10/22] pci: export functions for cpr
  2021-01-05 15:41 [PATCH V2 00/22] Live Update Steve Sistare
                   ` (8 preceding siblings ...)
  2021-01-05 15:41 ` [PATCH V2 09/22] cpr: HMP interfaces Steve Sistare
@ 2021-01-05 15:41 ` Steve Sistare
  2021-01-05 15:41 ` [PATCH V2 11/22] vfio-pci: refactor " Steve Sistare
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-01-05 15:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Export msix_is_pending and msix_init_vector_notifiers for use by cpr.
No functional change.
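
For readers unfamiliar with the check being exported: msix_is_pending()
is a bit test in the MSI-X pending bit array, using byte vector / 8.
The mask helper is not part of this diff, so the bit position used
below (vector % 8, least significant bit first) is an assumption, and
the sketch_* names are hypothetical.

```c
#include <stdint.h>

/* Byte of the pending bit array that holds the bit for this vector. */
static uint8_t *sketch_pending_byte(uint8_t *pba, unsigned int vector)
{
    return pba + vector / 8;
}

/* Assumed layout: one bit per vector, least significant bit first. */
static uint8_t sketch_pending_mask(unsigned int vector)
{
    return (uint8_t)(1u << (vector % 8));
}

/* Nonzero if the vector's pending bit is set. */
static int sketch_is_pending(uint8_t *pba, unsigned int vector)
{
    return *sketch_pending_byte(pba, vector) & sketch_pending_mask(vector);
}

/* Set the vector's pending bit. */
static void sketch_set_pending(uint8_t *pba, unsigned int vector)
{
    *sketch_pending_byte(pba, vector) |= sketch_pending_mask(vector);
}
```

Under that assumed layout, vector 10 lives in byte 1, bit 2.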

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hw/pci/msix.c         | 20 ++++++++++++++------
 hw/pci/pci.c          |  3 +--
 include/hw/pci/msix.h |  5 +++++
 include/hw/pci/pci.h  |  1 +
 4 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index 67e34f3..52c8949 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -64,7 +64,7 @@ static uint8_t *msix_pending_byte(PCIDevice *dev, int vector)
     return dev->msix_pba + vector / 8;
 }
 
-static int msix_is_pending(PCIDevice *dev, int vector)
+int msix_is_pending(PCIDevice *dev, unsigned int vector)
 {
     return *msix_pending_byte(dev, vector) & msix_pending_mask(vector);
 }
@@ -576,6 +576,17 @@ static void msix_unset_notifier_for_vector(PCIDevice *dev, unsigned int vector)
     dev->msix_vector_release_notifier(dev, vector);
 }
 
+void msix_init_vector_notifiers(PCIDevice *dev,
+                                MSIVectorUseNotifier use_notifier,
+                                MSIVectorReleaseNotifier release_notifier,
+                                MSIVectorPollNotifier poll_notifier)
+{
+    assert(use_notifier && release_notifier);
+    dev->msix_vector_use_notifier = use_notifier;
+    dev->msix_vector_release_notifier = release_notifier;
+    dev->msix_vector_poll_notifier = poll_notifier;
+}
+
 int msix_set_vector_notifiers(PCIDevice *dev,
                               MSIVectorUseNotifier use_notifier,
                               MSIVectorReleaseNotifier release_notifier,
@@ -583,11 +594,8 @@ int msix_set_vector_notifiers(PCIDevice *dev,
 {
     int vector, ret;
 
-    assert(use_notifier && release_notifier);
-
-    dev->msix_vector_use_notifier = use_notifier;
-    dev->msix_vector_release_notifier = release_notifier;
-    dev->msix_vector_poll_notifier = poll_notifier;
+    msix_init_vector_notifiers(dev, use_notifier, release_notifier,
+                               poll_notifier);
 
     if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
         (MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index de0fae1..7343e00 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -216,7 +216,6 @@ static const TypeInfo pcie_bus_info = {
 };
 
 static PCIBus *pci_find_bus_nr(PCIBus *bus, int bus_num);
-static void pci_update_mappings(PCIDevice *d);
 static void pci_irq_handler(void *opaque, int irq_num, int level);
 static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom, Error **);
 static void pci_del_option_rom(PCIDevice *pdev);
@@ -1316,7 +1315,7 @@ static pcibus_t pci_bar_address(PCIDevice *d,
     return new_addr;
 }
 
-static void pci_update_mappings(PCIDevice *d)
+void pci_update_mappings(PCIDevice *d)
 {
     PCIIORegion *r;
     int i;
diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
index 4c4a60c..46606cf 100644
--- a/include/hw/pci/msix.h
+++ b/include/hw/pci/msix.h
@@ -32,6 +32,7 @@ int msix_present(PCIDevice *dev);
 bool msix_is_masked(PCIDevice *dev, unsigned vector);
 void msix_set_pending(PCIDevice *dev, unsigned vector);
 void msix_clr_pending(PCIDevice *dev, int vector);
+int msix_is_pending(PCIDevice *dev, unsigned vector);
 
 int msix_vector_use(PCIDevice *dev, unsigned vector);
 void msix_vector_unuse(PCIDevice *dev, unsigned vector);
@@ -41,6 +42,10 @@ void msix_notify(PCIDevice *dev, unsigned vector);
 
 void msix_reset(PCIDevice *dev);
 
+void msix_init_vector_notifiers(PCIDevice *dev,
+                                MSIVectorUseNotifier use_notifier,
+                                MSIVectorReleaseNotifier release_notifier,
+                                MSIVectorPollNotifier poll_notifier);
 int msix_set_vector_notifiers(PCIDevice *dev,
                               MSIVectorUseNotifier use_notifier,
                               MSIVectorReleaseNotifier release_notifier,
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index c1bf7d5..bd07c86 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -865,5 +865,6 @@ extern const VMStateDescription vmstate_pci_device;
 }
 
 MSIMessage pci_get_msi_message(PCIDevice *dev, int vector);
+void pci_update_mappings(PCIDevice *d);
 
 #endif
-- 
1.8.3.1




* [PATCH V2 11/22] vfio-pci: refactor for cpr
  2021-01-05 15:41 [PATCH V2 00/22] Live Update Steve Sistare
                   ` (9 preceding siblings ...)
  2021-01-05 15:41 ` [PATCH V2 10/22] pci: export functions for cpr Steve Sistare
@ 2021-01-05 15:41 ` Steve Sistare
  2021-01-05 15:42 ` [PATCH V2 12/22] vfio-pci: cpr Steve Sistare
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-01-05 15:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Export vfio_address_spaces and vfio_listener_skipped_section.
Add optional eventfd arg to vfio_add_kvm_msi_virq.
Refactor vector use into a helper vfio_vector_init.
All for use by cpr in a subsequent patch.  No functional change.
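
The optional eventfd argument follows an adopt-or-create pattern: a
non-negative descriptor is wrapped as is, and -1 requests a new one.
A Linux-only sketch of that pattern, in which SketchNotifier and
sketch_notifier_init are hypothetical stand-ins for EventNotifier and
its init helpers:

```c
#include <sys/eventfd.h>

/* Hypothetical stand-in for an EventNotifier: just the descriptor. */
typedef struct {
    int fd;
} SketchNotifier;

/*
 * If the caller passes a descriptor (>= 0), adopt it; otherwise create a
 * fresh eventfd.  Returns 0 on success, -1 if eventfd creation fails.
 */
static int sketch_notifier_init(SketchNotifier *n, int fd)
{
    if (fd >= 0) {
        n->fd = fd;
        return 0;
    }
    n->fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    return n->fd < 0 ? -1 : 0;
}
```

All existing callers in this patch pass -1, which keeps the previous
create-a-new-notifier behavior.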

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hw/vfio/common.c              |  4 ++--
 hw/vfio/pci.c                 | 36 +++++++++++++++++++++++++-----------
 include/hw/vfio/vfio-common.h |  3 +++
 3 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 3335714..7f8768d 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -40,7 +40,7 @@
 
 VFIOGroupList vfio_group_list =
     QLIST_HEAD_INITIALIZER(vfio_group_list);
-static QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
+VFIOAddressSpaceList vfio_address_spaces =
     QLIST_HEAD_INITIALIZER(vfio_address_spaces);
 
 #ifdef CONFIG_KVM
@@ -393,7 +393,7 @@ static int vfio_host_win_del(VFIOContainer *container, hwaddr min_iova,
     return -1;
 }
 
-static bool vfio_listener_skipped_section(MemoryRegionSection *section)
+bool vfio_listener_skipped_section(MemoryRegionSection *section)
 {
     return (!memory_region_is_ram(section->mr) &&
             !memory_region_is_iommu(section->mr)) ||
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 2e561c0..9b57ffa 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -412,7 +412,7 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
 }
 
 static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
-                                  int vector_n, bool msix)
+                                  int vector_n, bool msix, int eventfd)
 {
     int virq;
 
@@ -420,7 +420,9 @@ static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
         return;
     }
 
-    if (event_notifier_init(&vector->kvm_interrupt, 0)) {
+    if (eventfd >= 0) {
+        event_notifier_init_fd(&vector->kvm_interrupt, eventfd);
+    } else if (event_notifier_init(&vector->kvm_interrupt, 0)) {
         return;
     }
 
@@ -456,6 +458,22 @@ static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg,
     kvm_irqchip_commit_routes(kvm_state);
 }
 
+static void vfio_vector_init(VFIOPCIDevice *vdev, int nr, int eventfd)
+{
+    VFIOMSIVector *vector = &vdev->msi_vectors[nr];
+    PCIDevice *pdev = &vdev->pdev;
+
+    vector->vdev = vdev;
+    vector->virq = -1;
+    if (eventfd >= 0) {
+        event_notifier_init_fd(&vector->interrupt, eventfd);
+    } else if (event_notifier_init(&vector->interrupt, 0)) {
+        error_report("vfio: Error: event_notifier_init failed");
+    }
+    vector->use = true;
+    msix_vector_use(pdev, nr);
+}
+
 static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
                                    MSIMessage *msg, IOHandler *handler)
 {
@@ -467,14 +485,10 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
 
     vector = &vdev->msi_vectors[nr];
 
+    vfio_vector_init(vdev, nr, -1);
+
     if (!vector->use) {
-        vector->vdev = vdev;
-        vector->virq = -1;
-        if (event_notifier_init(&vector->interrupt, 0)) {
-            error_report("vfio: Error: event_notifier_init failed");
-        }
-        vector->use = true;
-        msix_vector_use(pdev, nr);
+        vfio_vector_init(vdev, nr, -1);
     }
 
     qemu_set_fd_handler(event_notifier_get_fd(&vector->interrupt),
@@ -492,7 +506,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
         }
     } else {
         if (msg) {
-            vfio_add_kvm_msi_virq(vdev, vector, nr, true);
+            vfio_add_kvm_msi_virq(vdev, vector, nr, true, -1);
         }
     }
 
@@ -628,7 +642,7 @@ retry:
          * Attempt to enable route through KVM irqchip,
          * default to userspace handling if unavailable.
          */
-        vfio_add_kvm_msi_virq(vdev, vector, i, false);
+        vfio_add_kvm_msi_virq(vdev, vector, i, false, -1);
     }
 
     /* Set interrupt type prior to possible interrupts */
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index c78f3ff..ab87df4 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -182,6 +182,8 @@ int vfio_get_device(VFIOGroup *group, const char *name,
 extern const MemoryRegionOps vfio_region_ops;
 typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
 extern VFIOGroupList vfio_group_list;
+typedef QLIST_HEAD(, VFIOAddressSpace) VFIOAddressSpaceList;
+extern VFIOAddressSpaceList vfio_address_spaces;
 
 #ifdef CONFIG_LINUX
 int vfio_get_region_info(VFIODevice *vbasedev, int index,
@@ -193,6 +195,7 @@ struct vfio_info_cap_header *
 vfio_get_region_info_cap(struct vfio_region_info *info, uint16_t id);
 #endif
 extern const MemoryListener vfio_prereg_listener;
+bool vfio_listener_skipped_section(MemoryRegionSection *section);
 
 int vfio_spapr_create_window(VFIOContainer *container,
                              MemoryRegionSection *section,
-- 
1.8.3.1




* [PATCH V2 12/22] vfio-pci: cpr
  2021-01-05 15:41 [PATCH V2 00/22] Live Update Steve Sistare
                   ` (10 preceding siblings ...)
  2021-01-05 15:41 ` [PATCH V2 11/22] vfio-pci: refactor " Steve Sistare
@ 2021-01-05 15:42 ` Steve Sistare
  2021-01-05 15:42 ` [PATCH V2 13/22] vhost: reset vhost devices upon cprsave Steve Sistare
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-01-05 15:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Enable vfio-pci devices to be saved and restored across an exec restart
of qemu.

At vfio creation time, save the values of the vfio container, group, and
device descriptors in the environment.

In cprsave, suspend the use of virtual addresses in DMA mappings with
VFIO_DMA_UNMAP_FLAG_SUSPEND, because guest ram will be remapped at a
different VA after exec.  DMA to already-mapped pages continues.  Save
the msi message area as part of vfio-pci vmstate, save the interrupt and
notifier eventfd's in the environment, and clear the close-on-exec flag
for the vfio descriptors.  The flag is not cleared earlier because the
descriptors should not persist across miscellaneous fork and exec calls
that may be performed during normal operation.

On qemu restart, vfio_realize() finds the descriptor env vars, uses
the descriptors, and notes that the device is being reused.  Device and
iommu state is already configured, so operations in vfio_realize that
would modify the configuration are skipped for a reused device, including
vfio ioctl's and writes to PCI configuration space.  The result is that
vfio_realize constructs qemu data structures that reflect the current
state of the device.  However, the reconstruction is not complete until
cprload is called.  cprload loads the msi data and finds the eventfds in the
environment.  It rebuilds vector data structures and attaches the
interrupts to the new KVM instance.  cprload then walks the flattened
ranges of the vfio_address_spaces and issues VFIO_IOMMU_MAP_DMA with
VFIO_DMA_MAP_FLAG_RESUME to inform the kernel of the new VA's.  Lastly, it
starts the VM and suppresses vfio device reset.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hw/pci/pci.c                  |   4 +
 hw/vfio/Makefile.objs         |   2 +-
 hw/vfio/common.c              |  59 +++++++++++++-
 hw/vfio/cpr.c                 | 117 ++++++++++++++++++++++++++++
 hw/vfio/pci.c                 | 173 ++++++++++++++++++++++++++++++++++++++++--
 hw/vfio/trace-events          |   1 +
 include/hw/pci/pci.h          |   1 +
 include/hw/vfio/vfio-common.h |   4 +
 linux-headers/linux/vfio.h    |   5 ++
 migration/cpr.c               |   4 +
 10 files changed, 360 insertions(+), 10 deletions(-)
 create mode 100644 hw/vfio/cpr.c

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 7343e00..c2e1509 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -291,6 +291,10 @@ static void pci_do_device_reset(PCIDevice *dev)
 {
     int r;
 
+    if (dev->reused) {
+        return;
+    }
+
     pci_device_deassert_intx(dev);
     assert(dev->irq_state == 0);
 
diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index 9bb1c09..8a8e0da 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -1,4 +1,4 @@
-obj-y += common.o spapr.o
+obj-y += common.o spapr.o cpr.o
 obj-$(CONFIG_VFIO_PCI) += pci.o pci-quirks.o display.o
 obj-$(CONFIG_VFIO_CCW) += ccw.o
 obj-$(CONFIG_VFIO_PLATFORM) += platform.o
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7f8768d..986e111 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -37,6 +37,7 @@
 #include "sysemu/reset.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "qemu/env.h"
 
 VFIOGroupList vfio_group_list =
     QLIST_HEAD_INITIALIZER(vfio_group_list);
@@ -299,6 +300,10 @@ static int vfio_dma_unmap(VFIOContainer *container,
         .size = size,
     };
 
+    if (container->reused) {
+        return 0;
+    }
+
     while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
         /*
          * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c
@@ -322,6 +327,11 @@ static int vfio_dma_unmap(VFIOContainer *container,
         return -errno;
     }
 
+    if (unmap.size != size) {
+        warn_report("VFIO_UNMAP_DMA(0x%lx, 0x%lx) only unmaps 0x%llx",
+                     iova, size, unmap.size);
+    }
+
     return 0;
 }
 
@@ -336,6 +346,10 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
         .size = size,
     };
 
+    if (container->reused) {
+        return 0;
+    }
+
     if (!readonly) {
         map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
     }
@@ -1178,6 +1192,10 @@ static int vfio_init_container(VFIOContainer *container, int group_fd,
     if (iommu_type < 0) {
         return iommu_type;
     }
+    if (container->reused) {
+        container->iommu_type = iommu_type;
+        return 0;
+    }
 
     ret = ioctl(group_fd, VFIO_GROUP_SET_CONTAINER, &container->fd);
     if (ret) {
@@ -1209,6 +1227,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
 {
     VFIOContainer *container;
     int ret, fd;
+    bool reused;
+    char name[40];
     VFIOAddressSpace *space;
 
     space = vfio_get_address_space(as);
@@ -1245,16 +1265,29 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
         return ret;
     }
 
+    snprintf(name, sizeof(name), "vfio_container_%d", group->groupid);
+    fd = getenv_fd(name);
+    reused = (fd >= 0);
+
     QLIST_FOREACH(container, &space->containers, next) {
+        if (fd >= 0 && container->fd == fd) {
+            group->container = container;
+            QLIST_INSERT_HEAD(&container->group_list, group, container_next);
+            return 0;
+        }
         if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
             group->container = container;
             QLIST_INSERT_HEAD(&container->group_list, group, container_next);
             vfio_kvm_device_add_group(group);
+            setenv_fd(name, container->fd);
             return 0;
         }
     }
 
-    fd = qemu_open("/dev/vfio/vfio", O_RDWR);
+    if (fd < 0) {
+        fd = qemu_open("/dev/vfio/vfio", O_RDWR);
+    }
+
     if (fd < 0) {
         error_setg_errno(errp, errno, "failed to open /dev/vfio/vfio");
         ret = -errno;
@@ -1272,6 +1305,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     container = g_malloc0(sizeof(*container));
     container->space = space;
     container->fd = fd;
+    container->reused = reused;
     container->error = NULL;
     QLIST_INIT(&container->giommu_list);
     QLIST_INIT(&container->hostwin_list);
@@ -1394,6 +1428,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     }
 
     container->initialized = true;
+    setenv_fd(name, fd);
 
     return 0;
 listener_release_exit:
@@ -1421,6 +1456,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
 
     QLIST_REMOVE(group, container_next);
     group->container = NULL;
+    unsetenv_fdv("vfio_container_%d", group->groupid);
 
     /*
      * Explicitly release the listener first before unset container,
@@ -1479,7 +1515,12 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
     group = g_malloc0(sizeof(*group));
 
     snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
-    group->fd = qemu_open(path, O_RDWR);
+
+    group->fd = getenv_fd(path);
+    if (group->fd < 0) {
+        group->fd = qemu_open(path, O_RDWR);
+    }
+
     if (group->fd < 0) {
         error_setg_errno(errp, errno, "failed to open %s", path);
         goto free_group_exit;
@@ -1513,6 +1554,8 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
 
     QLIST_INSERT_HEAD(&vfio_group_list, group, next);
 
+    setenv_fd(path, group->fd);
+
     return group;
 
 close_fd_exit:
@@ -1537,6 +1580,7 @@ void vfio_put_group(VFIOGroup *group)
     vfio_disconnect_container(group);
     QLIST_REMOVE(group, next);
     trace_vfio_put_group(group->fd);
+    unsetenv_fdv("/dev/vfio/%d", group->groupid);
     close(group->fd);
     g_free(group);
 
@@ -1550,8 +1594,14 @@ int vfio_get_device(VFIOGroup *group, const char *name,
 {
     struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
     int ret, fd;
+    bool reused;
+
+    fd = getenv_fd(name);
+    reused = (fd >= 0);
+    if (fd < 0) {
+        fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
+    }
 
-    fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
     if (fd < 0) {
         error_setg_errno(errp, errno, "error getting device from group %d",
                          group->groupid);
@@ -1596,6 +1646,8 @@ int vfio_get_device(VFIOGroup *group, const char *name,
     vbasedev->num_irqs = dev_info.num_irqs;
     vbasedev->num_regions = dev_info.num_regions;
     vbasedev->flags = dev_info.flags;
+    vbasedev->reused = reused;
+    setenv_fd(name, fd);
 
     trace_vfio_get_device(name, dev_info.flags, dev_info.num_regions,
                           dev_info.num_irqs);
@@ -1612,6 +1664,7 @@ void vfio_put_base_device(VFIODevice *vbasedev)
     QLIST_REMOVE(vbasedev, next);
     vbasedev->group = NULL;
     trace_vfio_put_base_device(vbasedev->fd);
+    unsetenv_fd(vbasedev->name);
     close(vbasedev->fd);
 }
 
diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
new file mode 100644
index 0000000..565312d
--- /dev/null
+++ b/hw/vfio/cpr.c
@@ -0,0 +1,117 @@
+/*
+ * Copyright (c) 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include <sys/ioctl.h>
+#include <linux/vfio.h>
+#include "hw/vfio/vfio-common.h"
+#include "sysemu/kvm.h"
+#include "trace.h"
+
+static int
+vfio_dma_suspend(VFIOContainer *container, hwaddr iova, ram_addr_t size)
+{
+    int ret = 0;
+    struct vfio_iommu_type1_dma_unmap unmap = {
+        .argsz = sizeof(unmap),
+        .flags = VFIO_DMA_UNMAP_FLAG_SUSPEND,
+        .iova = iova,
+        .size = size,
+    };
+    if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
+        ret = -errno;
+        error_report("vfio_dma_suspend(iova %lu, size %ld) error %d",
+                      iova, size, -errno);
+    }
+    return ret;
+}
+
+static int
+vfio_dma_resume(VFIOContainer *container, hwaddr iova, ram_addr_t size,
+               void *vaddr)
+{
+    int ret = 0;
+    struct vfio_iommu_type1_dma_map map = {
+        .argsz = sizeof(map),
+        .flags = VFIO_DMA_MAP_FLAG_RESUME,
+        .vaddr = (__u64)(uintptr_t)vaddr,
+        .iova = iova,
+        .size = size,
+    };
+    if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map)) {
+        ret = -errno;
+        error_report("vfio_dma_resume(iova %lu, size %ld, va %p) error %d",
+                      iova, size, vaddr, -errno);
+    }
+    return ret;
+}
+
+static int vfio_region_resume(MemoryRegionSection *section, void *handle)
+{
+    MemoryRegion *mr = section->mr;
+    VFIOContainer *container = handle;
+    const char *name = memory_region_name(mr);
+    ram_addr_t size = int128_get64(section->size);
+    hwaddr offset, iova, roundup;
+    void *vaddr;
+
+    if (vfio_listener_skipped_section(section) || memory_region_is_iommu(mr)) {
+        return 0;
+    }
+
+    offset = section->offset_within_address_space;
+    iova = TARGET_PAGE_ALIGN(offset);
+    roundup = iova - offset;
+    size = (size - roundup) & TARGET_PAGE_MASK;
+    vaddr = memory_region_get_ram_ptr(mr) +
+            section->offset_within_region + roundup;
+
+    trace_vfio_region_resume(name, container->fd, iova, iova + size - 1, vaddr);
+    return vfio_dma_resume(container, iova, size, vaddr);
+}
+
+int vfio_cprsave(void)
+{
+    VFIOAddressSpace *space;
+    VFIOContainer *container;
+
+    QLIST_FOREACH(space, &vfio_address_spaces, list) {
+        QLIST_FOREACH(container, &space->containers, next) {
+            if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_SUSPEND)) {
+                error_report("error: IOMMU does not support VFIO_SUSPEND.");
+                return -1;
+            }
+            if (vfio_dma_suspend(container, 0, 0)) {
+                return 1;
+            }
+        }
+    }
+    return 0;
+}
+
+int vfio_cprload(void)
+{
+    VFIOAddressSpace *space;
+    VFIOContainer *container;
+    VFIOGroup *group;
+    VFIODevice *vbasedev;
+
+    QLIST_FOREACH(space, &vfio_address_spaces, list) {
+        QLIST_FOREACH(container, &space->containers, next) {
+            container->reused = false;
+            if (as_flat_walk(space->as, vfio_region_resume, container)) {
+                return 1;
+            }
+        }
+    }
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            vbasedev->reused = false;
+        }
+    }
+    return 0;
+}
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 9b57ffa..042c52e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -28,6 +28,8 @@
 #include "hw/pci/pci_bridge.h"
 #include "hw/qdev-properties.h"
 #include "migration/vmstate.h"
+#include "migration/cpr.h"
+#include "qemu/env.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
@@ -1599,6 +1601,14 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled)
     }
 }
 
+static void vfio_config_sync(VFIOPCIDevice *vdev, uint32_t offset, size_t len)
+{
+    if (pread(vdev->vbasedev.fd, vdev->pdev.config + offset, len,
+          vdev->config_offset + offset) != len) {
+        error_report("vfio_config_sync pread failed");
+    }
+}
+
 static void vfio_bar_prepare(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
@@ -1639,6 +1649,7 @@ static void vfio_bars_prepare(VFIOPCIDevice *vdev)
 static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
+    PCIDevice *pdev = &vdev->pdev;
     char *name;
 
     if (!bar->size) {
@@ -1659,7 +1670,10 @@ static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
         }
     }
 
-    pci_register_bar(&vdev->pdev, nr, bar->type, bar->mr);
+    pci_register_bar(pdev, nr, bar->type, bar->mr);
+    if (pdev->reused) {
+        vfio_config_sync(vdev, pci_bar(pdev, nr), 8);
+    }
 }
 
 static void vfio_bars_register(VFIOPCIDevice *vdev)
@@ -2576,6 +2590,27 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
     vfio_put_base_device(&vdev->vbasedev);
 }
 
+static void setenv_event_fd(VFIOPCIDevice *vdev, int nr, const char *name,
+                            EventNotifier *ev)
+{
+    char envname[256];
+    int fd = event_notifier_get_fd(ev);
+    const char *vfname = vdev->vbasedev.name;
+
+    if (fd >= 0) {
+        snprintf(envname, sizeof(envname), "%s_%s_%d", vfname, name, nr);
+        setenv_fd(envname, fd);
+    }
+}
+
+static int getenv_event_fd(VFIOPCIDevice *vdev, int nr, const char *name)
+{
+    char envname[256];
+    const char *vfname = vdev->vbasedev.name;
+    snprintf(envname, sizeof(envname), "%s_%s_%d", vfname, name, nr);
+    return getenv_fd(envname);
+}
+
 static void vfio_err_notifier_handler(void *opaque)
 {
     VFIOPCIDevice *vdev = opaque;
@@ -2607,7 +2642,13 @@ static void vfio_err_notifier_handler(void *opaque)
 static void vfio_register_err_notifier(VFIOPCIDevice *vdev)
 {
     Error *err = NULL;
-    int32_t fd;
+    int32_t fd = getenv_event_fd(vdev, 0, "err");
+
+    if (fd >= 0) {
+        event_notifier_init_fd(&vdev->err_notifier, fd);
+        qemu_set_fd_handler(fd, vfio_err_notifier_handler, NULL, vdev);
+        return;
+    }
 
     if (!vdev->pci_aer) {
         return;
@@ -2668,7 +2709,14 @@ static void vfio_register_req_notifier(VFIOPCIDevice *vdev)
     struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info),
                                       .index = VFIO_PCI_REQ_IRQ_INDEX };
     Error *err = NULL;
-    int32_t fd;
+    int32_t fd = getenv_event_fd(vdev, 0, "req");
+
+    if (fd >= 0) {
+        event_notifier_init_fd(&vdev->req_notifier, fd);
+        qemu_set_fd_handler(fd, vfio_req_notifier_handler, NULL, vdev);
+        vdev->req_enabled = true;
+        return;
+    }
 
     if (!(vdev->features & VFIO_FEATURE_ENABLE_REQ)) {
         return;
@@ -2824,6 +2872,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
         vfio_put_group(group);
         goto error;
     }
+    pdev->reused = vdev->vbasedev.reused;
 
     vfio_populate_device(vdev, &err);
     if (err) {
@@ -2986,9 +3035,11 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
                                              vfio_intx_routing_notifier);
         vdev->irqchip_change_notifier.notify = vfio_irqchip_change;
         kvm_irqchip_add_change_notifier(&vdev->irqchip_change_notifier);
-        ret = vfio_intx_enable(vdev, errp);
-        if (ret) {
-            goto out_deregister;
+        if (!pdev->reused) {
+            ret = vfio_intx_enable(vdev, errp);
+            if (ret) {
+                goto out_deregister;
+            }
         }
     }
 
@@ -3031,6 +3082,11 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     vfio_register_req_notifier(vdev);
     vfio_setup_resetfn_quirk(vdev);
 
+    vfio_config_sync(vdev, pdev->msix_cap + PCI_MSIX_FLAGS, 2);
+    if (pdev->reused) {
+        pci_update_mappings(pdev);
+    }
+
     return;
 
 out_deregister:
@@ -3094,6 +3150,10 @@ static void vfio_pci_reset(DeviceState *dev)
 {
     VFIOPCIDevice *vdev = PCI_VFIO(dev);
 
+    if (vdev->pdev.reused) {
+        return;
+    }
+
     trace_vfio_pci_reset(vdev->vbasedev.name);
 
     vfio_pci_pre_reset(vdev);
@@ -3196,6 +3256,106 @@ static Property vfio_pci_dev_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static int vfio_pci_pre_save(void *opaque)
+{
+    VFIOPCIDevice *vdev = opaque;
+    int i;
+
+    for (i = 0; i < vdev->nr_vectors; i++) {
+        VFIOMSIVector *vector = &vdev->msi_vectors[i];
+        if (vector->use) {
+            setenv_event_fd(vdev, i, "interrupt", &vector->interrupt);
+            if (vector->virq >= 0) {
+                setenv_event_fd(vdev, i, "kvm_interrupt",
+                                &vector->kvm_interrupt);
+            }
+        }
+    }
+    setenv_event_fd(vdev, 0, "err", &vdev->err_notifier);
+    setenv_event_fd(vdev, 0, "req", &vdev->req_notifier);
+    return 0;
+}
+
+static void vfio_claim_vectors(VFIOPCIDevice *vdev, int nr_vectors, bool msix)
+{
+    int i, fd;
+    bool pending = false;
+    PCIDevice *pdev = &vdev->pdev;
+
+    vdev->nr_vectors = nr_vectors;
+    vdev->msi_vectors = g_new0(VFIOMSIVector, nr_vectors);
+    vdev->interrupt = msix ? VFIO_INT_MSIX : VFIO_INT_MSI;
+
+    for (i = 0; i < nr_vectors; i++) {
+        VFIOMSIVector *vector = &vdev->msi_vectors[i];
+
+        fd = getenv_event_fd(vdev, i, "interrupt");
+        if (fd >= 0) {
+            vfio_vector_init(vdev, i, fd);
+            qemu_set_fd_handler(fd, vfio_msi_interrupt, NULL, vector);
+        }
+
+        fd = getenv_event_fd(vdev, i, "kvm_interrupt");
+        if (fd >= 0) {
+            vfio_add_kvm_msi_virq(vdev, vector, i, msix, fd);
+        }
+
+        if (msix_is_pending(pdev, i) && msix_is_masked(pdev, i)) {
+            set_bit(i, vdev->msix->pending);
+            pending = true;
+        }
+    }
+
+    memory_region_set_enabled(&pdev->msix_pba_mmio, pending);
+}
+
+static int vfio_pci_post_load(void *opaque, int version_id)
+{
+    VFIOPCIDevice *vdev = opaque;
+    PCIDevice *pdev = &vdev->pdev;
+    int nr_vectors;
+    bool enabled;
+
+    if (msix_enabled(pdev)) {
+        nr_vectors = vdev->msix->entries;
+        vfio_claim_vectors(vdev, nr_vectors, true);
+        msix_init_vector_notifiers(pdev, vfio_msix_vector_use,
+                                   vfio_msix_vector_release, NULL);
+
+    } else if (msi_enabled(pdev)) {
+        nr_vectors = msi_nr_vectors_allocated(pdev);
+        vfio_claim_vectors(vdev, nr_vectors, false);
+
+    } else if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) {
+        error_report("vfio_pci_post_load does not yet support INTX"); /* TBD */
+    }
+
+    pdev->reused = false;
+    enabled = pci_get_word(pdev->config + PCI_COMMAND) & PCI_COMMAND_MASTER;
+    memory_region_set_enabled(&pdev->bus_master_enable_region, enabled);
+
+    return 0;
+}
+
+static bool vfio_pci_needed(void *opaque)
+{
+    return cpr_active();
+}
+
+static const VMStateDescription vfio_pci_vmstate = {
+    .name = "vfio-pci",
+    .unmigratable = 1,
+    .version_id = 0,
+    .minimum_version_id = 0,
+    .post_load = vfio_pci_post_load,
+    .pre_save = vfio_pci_pre_save,
+    .needed = vfio_pci_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_MSIX(pdev, VFIOPCIDevice),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -3203,6 +3363,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 
     dc->reset = vfio_pci_reset;
     device_class_set_props(dc, vfio_pci_dev_properties);
+    dc->vmsd = &vfio_pci_vmstate;
     dc->desc = "VFIO-based PCI device assignment";
     set_bit(DEVICE_CATEGORY_MISC, dc->categories);
     pdc->realize = vfio_realize;
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index b1ef55a..1ac7f99 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -115,6 +115,7 @@ vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Devic
 vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
 vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
 vfio_dma_unmap_overflow_workaround(void) ""
+vfio_region_resume(const char *name, int fd, uint64_t iova_start, uint64_t iova_end, void *vaddr) "%s fd %d 0x%"PRIx64" - 0x%"PRIx64" [%p]"
 
 # platform.c
 vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index bd07c86..c926a24 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -358,6 +358,7 @@ struct PCIDevice {
 
     /* ID of standby device in net_failover pair */
     char *failover_pair_id;
+    bool reused;
 };
 
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index ab87df4..ac13fe0 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -73,6 +73,7 @@ typedef struct VFIOContainer {
     unsigned iommu_type;
     Error *error;
     bool initialized;
+    bool reused;
     unsigned long pgsizes;
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
@@ -109,6 +110,7 @@ typedef struct VFIODevice {
     bool needs_reset;
     bool no_mmap;
     bool ram_block_discard_allowed;
+    bool reused;
     VFIODeviceOps *ops;
     unsigned int num_irqs;
     unsigned int num_regions;
@@ -178,6 +180,8 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp);
 void vfio_put_group(VFIOGroup *group);
 int vfio_get_device(VFIOGroup *group, const char *name,
                     VFIODevice *vbasedev, Error **errp);
+int vfio_cprsave(void);
+int vfio_cprload(void);
 
 extern const MemoryRegionOps vfio_region_ops;
 typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index f09df26..9563672 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -46,6 +46,9 @@
  */
 #define VFIO_NOIOMMU_IOMMU		8
 
+/* Supports VFIO DMA suspend and resume */
+#define VFIO_SUSPEND                    9
+
 /*
  * The IOCTL interface is designed for extensibility by embedding the
  * structure length (argsz) and flags into structures passed between
@@ -1052,6 +1055,7 @@ struct vfio_iommu_type1_dma_map {
 	__u32	flags;
 #define VFIO_DMA_MAP_FLAG_READ (1 << 0)		/* readable from device */
 #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
+#define VFIO_DMA_MAP_FLAG_RESUME (1 << 2)
 	__u64	vaddr;				/* Process virtual address */
 	__u64	iova;				/* IO virtual address */
 	__u64	size;				/* Size of mapping (bytes) */
@@ -1088,6 +1092,7 @@ struct vfio_iommu_type1_dma_unmap {
 	__u32	argsz;
 	__u32	flags;
 #define VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP (1 << 0)
+#define VFIO_DMA_UNMAP_FLAG_SUSPEND (1 << 1)
 	__u64	iova;				/* IO virtual address */
 	__u64	size;				/* Size of mapping (bytes) */
 	__u8    data[];
diff --git a/migration/cpr.c b/migration/cpr.c
index a8f3c10..045ebc5 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -131,6 +131,9 @@ void cprsave(const char *file, CprMode mode, Error **errp)
         no_shutdown = 0;
         qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
     } else if (restart) {
+        if (vfio_cprsave()) {
+            goto err;
+        }
         walkenv(FD_PREFIX, preserve_fd, 0);
         setenv("QEMU_START_FREEZE", "", 1);
         qemu_system_exec_request();
@@ -174,6 +177,7 @@ void cprload(const char *file, Error **errp)
         error_setg(errp, "Error %d while loading VM state", ret);
         return;
     }
+    vfio_cprload();
 
     state = global_state_get_runstate();
     if (state == RUN_STATE_RUNNING) {
-- 
1.8.3.1




* [PATCH V2 13/22] vhost: reset vhost devices upon cprsave
  2021-01-05 15:41 [PATCH V2 00/22] Live Update Steve Sistare
                   ` (11 preceding siblings ...)
  2021-01-05 15:42 ` [PATCH V2 12/22] vfio-pci: cpr Steve Sistare
@ 2021-01-05 15:42 ` Steve Sistare
  2021-01-05 15:42 ` [PATCH V2 14/22] chardev: cpr framework Steve Sistare
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-01-05 15:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

A vhost device is implicitly preserved across re-exec because its fd is not
closed, and the value of the fd is specified on the command line for the
new qemu to find.  However, new qemu issues a VHOST_SET_OWNER ioctl,
which fails because the device already has an owner.  To fix, reset the
owner prior to exec.

Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hw/virtio/vhost.c         | 11 +++++++++++
 include/hw/virtio/vhost.h |  1 +
 migration/cpr.c           |  1 +
 3 files changed, 13 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 1a1384e..42aa44c 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1764,6 +1764,17 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
     hdev->vdev = NULL;
 }
 
+void vhost_dev_reset_all(void)
+{
+    struct vhost_dev *dev;
+
+    QLIST_FOREACH(dev, &vhost_devices, entry) {
+        if (dev->vhost_ops->vhost_reset_device(dev) < 0) {
+            VHOST_OPS_DEBUG("vhost_reset_device failed");
+        }
+    }
+}
+
 int vhost_net_set_backend(struct vhost_dev *hdev,
                           struct vhost_vring_file *file)
 {
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 767a95e..5fef8bd 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -105,6 +105,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
 void vhost_dev_cleanup(struct vhost_dev *hdev);
 int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
+void vhost_dev_reset_all(void);
 int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
 
diff --git a/migration/cpr.c b/migration/cpr.c
index 045ebc5..13c5d7c 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -135,6 +135,7 @@ void cprsave(const char *file, CprMode mode, Error **errp)
             goto err;
         }
         walkenv(FD_PREFIX, preserve_fd, 0);
+        vhost_dev_reset_all();
         setenv("QEMU_START_FREEZE", "", 1);
         qemu_system_exec_request();
     }
-- 
1.8.3.1




* [PATCH V2 14/22] chardev: cpr framework
  2021-01-05 15:41 [PATCH V2 00/22] Live Update Steve Sistare
                   ` (12 preceding siblings ...)
  2021-01-05 15:42 ` [PATCH V2 13/22] vhost: reset vhost devices upon cprsave Steve Sistare
@ 2021-01-05 15:42 ` Steve Sistare
  2021-01-05 15:42 ` [PATCH V2 15/22] chardev: cpr for simple devices Steve Sistare
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-01-05 15:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add QEMU_CHAR_FEATURE_CPR for devices that support cpr.
Add the chardev close_on_cpr option for devices that can be closed on cpr
and reopened after exec.
cpr is allowed only if either QEMU_CHAR_FEATURE_CPR or close_on_cpr is set
for all chardevs in the configuration.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
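The rule in the commit message amounts to a per-device predicate applied to
every chardev.  A minimal standalone sketch, for illustration only:
struct demo_chardev and demo_first_cpr_blocker are hypothetical names that
model just the two properties the check reads, not the real Chardev type or
the object_child_foreach traversal used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two Chardev properties the check reads. */
struct demo_chardev {
    const char *label;
    bool feature_cpr;   /* models QEMU_CHAR_FEATURE_CPR being set */
    bool close_on_cpr;  /* models the close-on-cpr option */
};

/*
 * Return NULL if every device may take part in cpr, otherwise a pointer
 * to the first device that blocks it.
 */
const struct demo_chardev *
demo_first_cpr_blocker(const struct demo_chardev *devs, size_t n)
{
    assert(devs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (!devs[i].feature_cpr && !devs[i].close_on_cpr) {
            return &devs[i];
        }
    }
    return NULL;
}
```

Returning the blocking device rather than a plain boolean lets the caller
name it in the error message, as chr_cpr_capable does with chr->label.
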
 chardev/char.c         | 41 ++++++++++++++++++++++++++++++++++++++---
 include/chardev/char.h |  5 +++++
 migration/cpr.c        |  3 +++
 qapi/char.json         |  5 ++++-
 qemu-options.hx        | 26 ++++++++++++++++++++++++--
 5 files changed, 74 insertions(+), 6 deletions(-)

diff --git a/chardev/char.c b/chardev/char.c
index 77e7ec8..0be551f 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -37,6 +37,7 @@
 #include "qemu/help_option.h"
 #include "qemu/module.h"
 #include "qemu/option.h"
+#include "qemu/env.h"
 #include "qemu/id.h"
 #include "qemu/coroutine.h"
 
@@ -226,6 +227,9 @@ static void qemu_char_open(Chardev *chr, ChardevBackend *backend,
     ChardevClass *cc = CHARDEV_GET_CLASS(chr);
     /* Any ChardevCommon member would work */
     ChardevCommon *common = backend ? backend->u.null.data : NULL;
+    char fdname[40];
+
+    chr->close_on_cpr = (common && common->close_on_cpr);
 
     if (common && common->has_logfile) {
         int flags = O_WRONLY | O_CREAT;
@@ -235,7 +239,14 @@ static void qemu_char_open(Chardev *chr, ChardevBackend *backend,
         } else {
             flags |= O_TRUNC;
         }
-        chr->logfd = qemu_open(common->logfile, flags, 0666);
+        snprintf(fdname, sizeof(fdname), "%s_log", chr->label);
+        chr->logfd = getenv_fd(fdname);
+        if (chr->logfd < 0) {
+            chr->logfd = qemu_open(common->logfile, flags, 0666);
+            if (!chr->close_on_cpr) {
+                setenv_fd(fdname, chr->logfd);
+            }
+        }
         if (chr->logfd < 0) {
             error_setg_errno(errp, errno,
                              "Unable to open logfile %s",
@@ -286,11 +297,12 @@ static void char_finalize(Object *obj)
     if (chr->be) {
         chr->be->chr = NULL;
     }
-    g_free(chr->filename);
-    g_free(chr->label);
     if (chr->logfd != -1) {
         close(chr->logfd);
+        unsetenv_fdv("%s_log", chr->label);
     }
+    g_free(chr->filename);
+    g_free(chr->label);
     qemu_mutex_destroy(&chr->chr_write_lock);
 }
 
@@ -490,6 +502,8 @@ void qemu_chr_parse_common(QemuOpts *opts, ChardevCommon *backend)
 
     backend->has_logappend = true;
     backend->logappend = qemu_opt_get_bool(opts, "logappend", false);
+
+    backend->close_on_cpr = qemu_opt_get_bool(opts, "close-on-cpr", false);
 }
 
 static const ChardevClass *char_get_class(const char *driver, Error **errp)
@@ -922,6 +936,9 @@ QemuOptsList qemu_chardev_opts = {
         },{
             .name = "abstract",
             .type = QEMU_OPT_BOOL,
+        },{
+            .name = "close-on-cpr",
+            .type = QEMU_OPT_BOOL,
         },
         { /* end of list */ }
     },
@@ -1163,6 +1180,24 @@ GSource *qemu_chr_timeout_add_ms(Chardev *chr, guint ms,
     return source;
 }
 
+static int chr_cpr_capable(Object *obj, void *opaque)
+{
+    Chardev *chr = (Chardev *)obj;
+    Error **errp = opaque;
+
+    if (qemu_chr_has_feature(chr, QEMU_CHAR_FEATURE_CPR) || chr->close_on_cpr) {
+        return 0;
+    }
+    error_setg(errp, "error: chardev %s -> %s is not capable of cpr",
+               chr->label, chr->filename);
+    return 1;
+}
+
+bool qemu_chr_cpr_capable(Error **errp)
+{
+    return !object_child_foreach(get_chardevs_root(), chr_cpr_capable, errp);
+}
+
 void qemu_chr_cleanup(void)
 {
     object_unparent(get_chardevs_root());
diff --git a/include/chardev/char.h b/include/chardev/char.h
index 00589a6..35d04e9 100644
--- a/include/chardev/char.h
+++ b/include/chardev/char.h
@@ -50,6 +50,8 @@ typedef enum {
     /* Whether the gcontext can be changed after calling
      * qemu_chr_be_update_read_handlers() */
     QEMU_CHAR_FEATURE_GCONTEXT,
+    /* Whether the device supports cpr */
+    QEMU_CHAR_FEATURE_CPR,
 
     QEMU_CHAR_FEATURE_LAST,
 } ChardevFeature;
@@ -65,6 +67,7 @@ struct Chardev {
     char *filename;
     int logfd;
     int be_open;
+    bool close_on_cpr;
     GSource *gsource;
     GMainContext *gcontext;
     DECLARE_BITMAP(features, QEMU_CHAR_FEATURE_LAST);
@@ -290,4 +293,6 @@ GSource *qemu_chr_timeout_add_ms(Chardev *chr, guint ms,
 /* console.c */
 void qemu_chr_parse_vc(QemuOpts *opts, ChardevBackend *backend, Error **errp);
 
+bool qemu_chr_cpr_capable(Error **errp);
+
 #endif
diff --git a/migration/cpr.c b/migration/cpr.c
index 13c5d7c..93f6800 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -131,6 +131,9 @@ void cprsave(const char *file, CprMode mode, Error **errp)
         no_shutdown = 0;
         qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
     } else if (restart) {
+        if (!qemu_chr_cpr_capable(errp)) {
+            goto err;
+        }
         if (vfio_cprsave()) {
             goto err;
         }
diff --git a/qapi/char.json b/qapi/char.json
index 8aeedf9..7e0186c 100644
--- a/qapi/char.json
+++ b/qapi/char.json
@@ -204,12 +204,15 @@
 # @logfile: The name of a logfile to save output
 # @logappend: true to append instead of truncate
 #             (default to false to truncate)
+# @close-on-cpr: if true, close device's fd on cprsave. defaults to false.
+#                since 5.3.
 #
 # Since: 2.6
 ##
 { 'struct': 'ChardevCommon',
   'data': { '*logfile': 'str',
-            '*logappend': 'bool' } }
+            '*logappend': 'bool',
+            '*close-on-cpr': 'bool' } }
 
 ##
 # @ChardevFile:
diff --git a/qemu-options.hx b/qemu-options.hx
index 455b43b7..1ab5af5 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2956,42 +2956,60 @@ DEFHEADING(Character device options:)
 DEF("chardev", HAS_ARG, QEMU_OPTION_chardev,
     "-chardev help\n"
     "-chardev null,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev socket,id=id[,host=host],port=port[,to=to][,ipv4][,ipv6][,nodelay][,reconnect=seconds]\n"
     "         [,server][,nowait][,telnet][,websocket][,reconnect=seconds][,mux=on|off]\n"
     "         [,logfile=PATH][,logappend=on|off][,tls-creds=ID][,tls-authz=ID] (tcp)\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev socket,id=id,path=path[,server][,nowait][,telnet][,websocket][,reconnect=seconds]\n"
-    "         [,mux=on|off][,logfile=PATH][,logappend=on|off][,abstract=on|off][,tight=on|off] (unix)\n"
+    "         [,mux=on|off][,logfile=PATH][,logappend=on|off][,abstract=on|off][,tight=on|off][,close-on-cpr=on|off] (unix)\n"
     "-chardev udp,id=id[,host=host],port=port[,localaddr=localaddr]\n"
     "         [,localport=localport][,ipv4][,ipv6][,mux=on|off]\n"
-    "         [,logfile=PATH][,logappend=on|off]\n"
+    "         [,logfile=PATH][,logappend=on|off][,close-on-cpr=on|off]\n"
     "-chardev msmouse,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev vc,id=id[[,width=width][,height=height]][[,cols=cols][,rows=rows]]\n"
     "         [,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev ringbuf,id=id[,size=size][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev file,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev pipe,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
 #ifdef _WIN32
     "-chardev console,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev serial,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
 #else
     "-chardev pty,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev stdio,id=id[,mux=on|off][,signal=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
 #endif
 #ifdef CONFIG_BRLAPI
     "-chardev braille,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
 #endif
 #if defined(__linux__) || defined(__sun__) || defined(__FreeBSD__) \
         || defined(__NetBSD__) || defined(__OpenBSD__) || defined(__DragonFly__)
     "-chardev serial,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev tty,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
 #endif
 #if defined(__linux__) || defined(__FreeBSD__) || defined(__DragonFly__)
     "-chardev parallel,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev parport,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
 #endif
 #if defined(CONFIG_SPICE)
     "-chardev spicevmc,id=id,name=name[,debug=debug][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
     "-chardev spiceport,id=id,name=name[,debug=debug][,logfile=PATH][,logappend=on|off]\n"
+    "         [,close-on-cpr=on|off]\n"
 #endif
     , QEMU_ARCH_ALL
 )
@@ -3064,6 +3082,10 @@ The general form of a character device option is:
     ``logappend`` option controls whether the log file will be truncated
     or appended to when opened.
 
+    Every backend supports the ``close-on-cpr`` option.  If on, the device's
+    descriptor is closed during cprsave, and reopened after exec.  This is
+    useful for devices that do not support cpr.
+
 The available backends are:
 
 ``-chardev null,id=id``
-- 
1.8.3.1




* [PATCH V2 15/22] chardev: cpr for simple devices
From: Steve Sistare @ 2021-01-05 15:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

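Set QEMU_CHAR_FEATURE_CPR for devices that trivially support cpr.
char-stdio is slightly less trivial: cprsave calls term_exit, via the new
qemu_term_exit wrapper, before exec.  Allow the gdb server by marking its
monitor chardev close_on_cpr, so that it is closed on exec.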

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 chardev/char-mux.c     | 1 +
 chardev/char-null.c    | 1 +
 chardev/char-serial.c  | 1 +
 chardev/char-stdio.c   | 8 ++++++++
 gdbstub.c              | 1 +
 include/chardev/char.h | 1 +
 migration/cpr.c        | 1 +
 7 files changed, 14 insertions(+)

diff --git a/chardev/char-mux.c b/chardev/char-mux.c
index 6f980bb..2a6989e 100644
--- a/chardev/char-mux.c
+++ b/chardev/char-mux.c
@@ -330,6 +330,7 @@ static void qemu_chr_open_mux(Chardev *chr,
      */
     *be_opened = machine_init_done;
     qemu_chr_fe_init(&d->chr, drv, errp);
+    qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_CPR);
 }
 
 static void qemu_chr_parse_mux(QemuOpts *opts, ChardevBackend *backend,
diff --git a/chardev/char-null.c b/chardev/char-null.c
index 1c6a290..02acaff 100644
--- a/chardev/char-null.c
+++ b/chardev/char-null.c
@@ -32,6 +32,7 @@ static void null_chr_open(Chardev *chr,
                           Error **errp)
 {
     *be_opened = false;
+    qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_CPR);
 }
 
 static void char_null_class_init(ObjectClass *oc, void *data)
diff --git a/chardev/char-serial.c b/chardev/char-serial.c
index 7c3d84a..b585085 100644
--- a/chardev/char-serial.c
+++ b/chardev/char-serial.c
@@ -274,6 +274,7 @@ static void qmp_chardev_open_serial(Chardev *chr,
     qemu_set_nonblock(fd);
     tty_serial_init(fd, 115200, 'N', 8, 1);
 
+    qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_CPR);
     qemu_chr_open_fd(chr, fd, fd);
 }
 #endif /* __linux__ || __sun__ */
diff --git a/chardev/char-stdio.c b/chardev/char-stdio.c
index 82eaebc..db6218c 100644
--- a/chardev/char-stdio.c
+++ b/chardev/char-stdio.c
@@ -116,9 +116,17 @@ static void qemu_chr_open_stdio(Chardev *chr,
         stdio_allow_signal = opts->signal;
     }
     qemu_chr_set_echo_stdio(chr, false);
+    qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_CPR);
 }
 #endif
 
+void qemu_term_exit(void)
+{
+#ifndef _WIN32
+    term_exit();
+#endif
+}
+
 static void qemu_chr_parse_stdio(QemuOpts *opts, ChardevBackend *backend,
                                  Error **errp)
 {
diff --git a/gdbstub.c b/gdbstub.c
index f3a318c..0e8ed91 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -3417,6 +3417,7 @@ int gdbserver_start(const char *device)
         mon_chr = gdbserver_state.mon_chr;
         reset_gdbserver_state();
     }
+    mon_chr->close_on_cpr = true;
 
     create_processes(&gdbserver_state);
 
diff --git a/include/chardev/char.h b/include/chardev/char.h
index 35d04e9..affcc12 100644
--- a/include/chardev/char.h
+++ b/include/chardev/char.h
@@ -294,5 +294,6 @@ GSource *qemu_chr_timeout_add_ms(Chardev *chr, guint ms,
 void qemu_chr_parse_vc(QemuOpts *opts, ChardevBackend *backend, Error **errp);
 
 bool qemu_chr_cpr_capable(Error **errp);
+void qemu_term_exit(void);
 
 #endif
diff --git a/migration/cpr.c b/migration/cpr.c
index 93f6800..de85d56 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -139,6 +139,7 @@ void cprsave(const char *file, CprMode mode, Error **errp)
         }
         walkenv(FD_PREFIX, preserve_fd, 0);
         vhost_dev_reset_all();
+        qemu_term_exit();
         setenv("QEMU_START_FREEZE", "", 1);
         qemu_system_exec_request();
     }
-- 
1.8.3.1




* [PATCH V2 16/22] chardev: cpr for pty
From: Steve Sistare @ 2021-01-05 15:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Save and restore pty descriptors across cprsave and cprload.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
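For readers unfamiliar with the pattern, here is a rough sketch of how a
descriptor can be handed to the process image that follows exec: record its
number in an environment variable and keep it open across exec.  The names
demo_setenv_fd, demo_getenv_fd and DEMO_FD_PREFIX are hypothetical; this is
not the code in qemu/env.h.  Clearing FD_CLOEXEC inside the setter is an
assumption made to keep the sketch short, whereas the series does that work
separately at cprsave time (walkenv with preserve_fd).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

#define DEMO_FD_PREFIX "DEMO_FD_"   /* hypothetical; not the series' prefix */

/* Record fd under name and make sure it stays open across exec. */
int demo_setenv_fd(const char *name, int fd)
{
    char key[64], val[16];
    int flags;

    assert(name != NULL);
    flags = fcntl(fd, F_GETFD);
    if (flags < 0 || fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0) {
        return -1;
    }
    snprintf(key, sizeof(key), DEMO_FD_PREFIX "%s", name);
    snprintf(val, sizeof(val), "%d", fd);
    return setenv(key, val, 1);
}

/* Return the recorded fd, or -1 if none was recorded under name. */
int demo_getenv_fd(const char *name)
{
    char key[64];
    const char *val;

    assert(name != NULL);
    snprintf(key, sizeof(key), DEMO_FD_PREFIX "%s", name);
    val = getenv(key);
    return val ? atoi(val) : -1;
}
```

After exec, the new image looks the name up first and opens a fresh device
only when the lookup returns -1, which is the shape of the change to
char_pty_open below.
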
 chardev/char-pty.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/chardev/char-pty.c b/chardev/char-pty.c
index 1cc501a..0916f9e 100644
--- a/chardev/char-pty.c
+++ b/chardev/char-pty.c
@@ -30,6 +30,7 @@
 #include "qemu/sockets.h"
 #include "qemu/error-report.h"
 #include "qemu/module.h"
+#include "qemu/env.h"
 #include "qemu/qemu-print.h"
 
 #include "chardev/char-io.h"
@@ -188,12 +189,14 @@ static void char_pty_finalize(Object *obj)
     Chardev *chr = CHARDEV(obj);
     PtyChardev *s = PTY_CHARDEV(obj);
 
+    unsetenv_fd(chr->label);
     pty_chr_state(chr, 0);
     object_unref(OBJECT(s->ioc));
     pty_chr_timer_cancel(s);
     qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
 }
 
+
 static void char_pty_open(Chardev *chr,
                           ChardevBackend *backend,
                           bool *be_opened,
@@ -204,19 +207,28 @@ static void char_pty_open(Chardev *chr,
     char pty_name[PATH_MAX];
     char *name;
 
+    master_fd = getenv_fd(chr->label);
+    if (master_fd >= 0) {
+        chr->filename = g_strdup_printf("pty:unknown");
+        goto have_fd;
+    }
+
     master_fd = qemu_openpty_raw(&slave_fd, pty_name);
     if (master_fd < 0) {
         error_setg_errno(errp, errno, "Failed to create PTY");
         return;
     }
-
+    if (!chr->close_on_cpr) {
+        setenv_fd(chr->label, master_fd);
+    }
     close(slave_fd);
     qemu_set_nonblock(master_fd);
-
     chr->filename = g_strdup_printf("pty:%s", pty_name);
     qemu_printf("char device redirected to %s (label %s)\n",
                 pty_name, chr->label);
 
+have_fd:
+    qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_CPR);
     s = PTY_CHARDEV(chr);
     s->ioc = QIO_CHANNEL(qio_channel_file_new_fd(master_fd));
     name = g_strdup_printf("chardev-pty-%s", chr->label);
-- 
1.8.3.1




* [PATCH V2 17/22] chardev: socket accept subroutine
From: Steve Sistare @ 2021-01-05 15:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Factor out the post-accept actions into a subroutine that can be used in a
subsequent patch.  No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 io/channel-socket.c | 43 ++++++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/io/channel-socket.c b/io/channel-socket.c
index e1b4667..de49880 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -349,16 +349,34 @@ void qio_channel_socket_dgram_async(QIOChannelSocket *ioc,
                            context);
 }
 
+static int qio_channel_socket_post_accept(QIOChannelSocket *cioc,
+                                           Error **errp)
+{
+    cioc->localAddrLen = sizeof(cioc->localAddr);
+    if (getsockname(cioc->fd, (struct sockaddr *)&cioc->localAddr,
+                    &cioc->localAddrLen) < 0) {
+        error_setg_errno(errp, errno,
+                         "Unable to query local socket address");
+        return 1;
+    }
+
+#ifndef WIN32
+    if (cioc->localAddr.ss_family == AF_UNIX) {
+        QIOChannel *ioc_local = QIO_CHANNEL(cioc);
+        qio_channel_set_feature(ioc_local, QIO_CHANNEL_FEATURE_FD_PASS);
+    }
+#endif /* WIN32 */
+
+    return 0;
+}
 
 QIOChannelSocket *
 qio_channel_socket_accept(QIOChannelSocket *ioc,
                           Error **errp)
 {
-    QIOChannelSocket *cioc;
+    QIOChannelSocket *cioc = qio_channel_socket_new();
 
-    cioc = qio_channel_socket_new();
     cioc->remoteAddrLen = sizeof(ioc->remoteAddr);
-    cioc->localAddrLen = sizeof(ioc->localAddr);
 
  retry:
     trace_qio_channel_socket_accept(ioc);
@@ -372,24 +390,11 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
         trace_qio_channel_socket_accept_fail(ioc);
         goto error;
     }
-
-    if (getsockname(cioc->fd, (struct sockaddr *)&cioc->localAddr,
-                    &cioc->localAddrLen) < 0) {
-        error_setg_errno(errp, errno,
-                         "Unable to query local socket address");
-        goto error;
+    if (!qio_channel_socket_post_accept(cioc, errp)) {
+        trace_qio_channel_socket_accept_complete(ioc, cioc, cioc->fd);
+        return cioc;
     }
 
-#ifndef WIN32
-    if (cioc->localAddr.ss_family == AF_UNIX) {
-        QIOChannel *ioc_local = QIO_CHANNEL(cioc);
-        qio_channel_set_feature(ioc_local, QIO_CHANNEL_FEATURE_FD_PASS);
-    }
-#endif /* WIN32 */
-
-    trace_qio_channel_socket_accept_complete(ioc, cioc, cioc->fd);
-    return cioc;
-
  error:
     object_unref(OBJECT(cioc));
     return NULL;
-- 
1.8.3.1




* [PATCH V2 18/22] chardev: cpr for sockets
From: Steve Sistare @ 2021-01-05 15:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Define qio_channel_socket_reuse to initialize a channel based on an existing
socket fd.  Save accepted socket fds in the environment before cprsave, and
look for fds in the environment after cprload.  Reject cprsave if a socket
enables the TLS or websocket option.

Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
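Reusing an inherited socket means the channel state normally filled in at
accept time has to be rediscovered from the descriptor alone.  The sketch
below shows the getsockname step in isolation; demo_fd_is_unix_socket is a
hypothetical helper written for this note, not part of the QIOChannel API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Return 1 if fd is an AF_UNIX socket, 0 for another family, -1 on error. */
int demo_fd_is_unix_socket(int fd)
{
    struct sockaddr_storage addr;
    socklen_t len = sizeof(addr);

    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        return -1;          /* not a socket, or not an open descriptor */
    }
    /* sockaddr_storage is large enough for any address family. */
    assert(len <= sizeof(addr));
    return addr.ss_family == AF_UNIX;
}
```

In the patch, the same query decides whether QIO_CHANNEL_FEATURE_FD_PASS is
set on the reconstructed channel.
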
 chardev/char-socket.c       | 30 ++++++++++++++++++++++++++++++
 include/io/channel-socket.h | 12 ++++++++++++
 io/channel-socket.c         |  9 +++++++++
 stubs/Makefile.objs         |  1 +
 stubs/cpr.c                 |  3 +++
 5 files changed, 55 insertions(+)
 create mode 100644 stubs/cpr.c

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index ef62dbf..0965305 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -36,6 +36,7 @@
 #include "qapi/qapi-visit-sockets.h"
 
 #include "chardev/char-io.h"
+#include "qemu/env.h"
 
 /***********************************************************/
 /* TCP Net console */
@@ -400,6 +401,7 @@ static void tcp_chr_free_connection(Chardev *chr)
     SocketChardev *s = SOCKET_CHARDEV(chr);
     int i;
 
+    unsetenv_fd(chr->label);
     if (s->read_msgfds_num) {
         for (i = 0; i < s->read_msgfds_num; i++) {
             close(s->read_msgfds[i]);
@@ -1157,6 +1159,25 @@ static gboolean socket_reconnect_timeout(gpointer opaque)
     return false;
 }
 
+static void load_char_socket_fd(Chardev *chr)
+{
+    SocketChardev *sockchar = SOCKET_CHARDEV(chr);
+    QIOChannelSocket *sioc;
+    int fd = getenv_fd(chr->label);
+
+    if (fd != -1) {
+        sockchar = SOCKET_CHARDEV(chr);
+        sioc = qio_channel_socket_reuse(fd, NULL);
+        if (sioc) {
+            tcp_chr_accept(sockchar->listener, sioc, chr);
+        } else {
+            error_printf("error: could not restore socket for %s\n",
+                         chr->label);
+        }
+    } else if (sockchar->sioc && !chr->close_on_cpr) {
+        setenv_fd(chr->label, sockchar->sioc->fd);
+    }
+}
 
 static int qmp_chardev_open_socket_server(Chardev *chr,
                                           bool is_telnet,
@@ -1360,6 +1381,13 @@ static void qmp_chardev_open_socket(Chardev *chr,
         qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_FD_PASS);
     }
 
+    if (!s->tls_creds && !s->is_websock) {
+        qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_CPR);
+    } else if (only_cpr_capable) {
+        error_setg(errp, "error: socket %s is not cpr capable due to %s option",
+                   chr->label, (s->tls_creds ? "TLS" : "websocket"));
+    }
+
     /* be isn't opened until we get a connection */
     *be_opened = false;
 
@@ -1375,6 +1403,8 @@ static void qmp_chardev_open_socket(Chardev *chr,
             return;
         }
     }
+
+    load_char_socket_fd(chr);
 }
 
 static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend,
diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
index 777ff59..e425a01 100644
--- a/include/io/channel-socket.h
+++ b/include/io/channel-socket.h
@@ -260,5 +260,17 @@ QIOChannelSocket *
 qio_channel_socket_accept(QIOChannelSocket *ioc,
                           Error **errp);
 
+/**
+ * qio_channel_socket_reuse:
+ * @fd: existing client socket descriptor
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Construct a client channel using @fd.
+ *
+ * Returns: the new client channel, or NULL on error
+ */
+QIOChannelSocket *
+qio_channel_socket_reuse(int fd,
+                         Error **errp);
 
 #endif /* QIO_CHANNEL_SOCKET_H */
diff --git a/io/channel-socket.c b/io/channel-socket.c
index de49880..07981be 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -400,6 +400,15 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
     return NULL;
 }
 
+QIOChannelSocket *
+qio_channel_socket_reuse(int fd,
+                         Error **errp)
+{
+    QIOChannelSocket *cioc = qio_channel_socket_new();
+    cioc->fd = fd;
+    return qio_channel_socket_post_accept(cioc, errp) ? NULL : cioc;
+}
+
 static void qio_channel_socket_init(Object *obj)
 {
     QIOChannelSocket *ioc = QIO_CHANNEL_SOCKET(obj);
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index d42046a..f6c335b 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -1,5 +1,6 @@
 stub-obj-y += blk-commit-all.o
 stub-obj-y += cmos.o
+stub-obj-y += cpr.o
 stub-obj-y += cpu-get-clock.o
 stub-obj-y += cpu-get-icount.o
 stub-obj-y += dump.o
diff --git a/stubs/cpr.c b/stubs/cpr.c
new file mode 100644
index 0000000..aaa189e
--- /dev/null
+++ b/stubs/cpr.c
@@ -0,0 +1,3 @@
+#include "qemu/osdep.h"
+
+bool only_cpr_capable;
-- 
1.8.3.1




* [PATCH V2 19/22] monitor: cpr support
From: Steve Sistare @ 2021-01-05 15:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

A monitor socket requires special treatment.  Save and restore the
qmp negotiation status.  Stop the monitor's iothread in cprsave.  Otherwise,
the thread will detect the close of the monitor socket and call unsetenv_fd,
which modifies environ and races with execv, which uses environ.

Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
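The negotiation status is read back once after exec and then dropped from
the environment.  A rough sketch of that read-once pattern, assuming a lookup
helper that returns -1 when the variable is absent (as the "ret != -1" test
in getenv_qmp suggests); demo_getenv_bool and demo_take_flag are hypothetical
names, and the "1" encoding is an assumption of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lookup: -1 if name is unset, otherwise 0 or 1. */
int demo_getenv_bool(const char *name)
{
    const char *val = getenv(name);

    if (!val) {
        return -1;
    }
    return strcmp(val, "1") == 0;
}

/* Consume a flag from the environment; an absent flag reads as false. */
bool demo_take_flag(const char *name)
{
    int ret;

    assert(name != NULL);
    /* Keep the result in an int so that -1 stays distinguishable. */
    ret = demo_getenv_bool(name);
    if (ret != -1) {
        unsetenv(name);
        return ret;
    }
    return false;
}
```

Holding the lookup result in an int matters here: a bool can only be 0 or 1,
so the "unset" case could not be told apart from "true".
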
 include/monitor/monitor.h |  2 ++
 migration/cpr.c           |  2 ++
 monitor/monitor.c         |  5 +++++
 monitor/qmp.c             | 43 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 52 insertions(+)

diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 1018d75..5456cff 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -22,6 +22,8 @@ void monitor_init_hmp(Chardev *chr, bool use_readline, Error **errp);
 int monitor_init(MonitorOptions *opts, bool allow_hmp, Error **errp);
 int monitor_init_opts(QemuOpts *opts, Error **errp);
 void monitor_cleanup(void);
+void monitor_iothread_stop(void);
+void monitor_cprsave(void);
 
 int monitor_suspend(Monitor *mon);
 void monitor_resume(Monitor *mon);
diff --git a/migration/cpr.c b/migration/cpr.c
index de85d56..0f49c7d 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -137,9 +137,11 @@ void cprsave(const char *file, CprMode mode, Error **errp)
         if (vfio_cprsave()) {
             goto err;
         }
+        monitor_iothread_stop();
         walkenv(FD_PREFIX, preserve_fd, 0);
         vhost_dev_reset_all();
         qemu_term_exit();
+        monitor_cprsave();
         setenv("QEMU_START_FREEZE", "", 1);
         qemu_system_exec_request();
     }
diff --git a/monitor/monitor.c b/monitor/monitor.c
index b385a3d..1bda67c 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -591,6 +591,11 @@ void monitor_cleanup(void)
     }
 }
 
+void monitor_iothread_stop(void)
+{
+    iothread_stop(mon_iothread);
+}
+
 static void monitor_qapi_event_init(void)
 {
     monitor_qapi_event_state = g_hash_table_new(qapi_event_throttle_hash,
diff --git a/monitor/qmp.c b/monitor/qmp.c
index d433cea..d7eeab1 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -33,6 +33,7 @@
 #include "qapi/qmp/qlist.h"
 #include "qapi/qmp/qstring.h"
 #include "trace.h"
+#include "qemu/env.h"
 
 struct QMPRequest {
     /* Owner of the request */
@@ -398,6 +399,21 @@ static void monitor_qmp_setup_handlers_bh(void *opaque)
     monitor_list_append(&mon->common);
 }
 
+static void setenv_qmp(const char *name, bool val)
+{
+    setenv_bool(name, val);
+}
+
+static bool getenv_qmp(const char *name)
+{
+    int ret = getenv_bool(name);
+    if (ret != -1) {
+        unsetenv_bool(name);
+        return ret;
+    }
+    return false;
+}
+
 void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
 {
     MonitorQMP *mon = g_new0(MonitorQMP, 1);
@@ -438,4 +454,31 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
                                  NULL, &mon->common, NULL, true);
         monitor_list_append(&mon->common);
     }
+
+    /*
+     * If a chr->label qmp env var is true, this is a restored qmp
+     * connection with capabilities negotiated.
+     */
+    if (getenv_qmp(chr->label) == true) {
+        mon->commands = &qmp_commands;
+    }
+}
+
+/* Save the result of capability negotiation in the environment */
+
+void monitor_cprsave(void)
+{
+    Monitor *mon;
+    MonitorQMP *qmp_mon;
+
+    QTAILQ_FOREACH(mon, &mon_list, entry) {
+        if (!monitor_is_qmp(mon)) {
+            continue;
+        }
+
+        qmp_mon = container_of(mon, MonitorQMP, common);
+        if (qmp_mon->commands == &qmp_commands) {
+            setenv_qmp(mon->chr.chr->label, true);
+        }
+    }
 }
-- 
1.8.3.1




* [PATCH V2 20/22] cpr: only-cpr-capable option
From: Steve Sistare @ 2021-01-05 15:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add the only-cpr-capable option, which causes qemu to exit with an error
if any devices that are not capable of cpr are added.  This guarantees that
a cprsave operation will not fail with an unsupported device error.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 chardev/char-socket.c   |  1 +
 exec.c                  |  3 +++
 include/sysemu/sysemu.h |  1 +
 migration/migration.c   |  6 ++++++
 qemu-options.hx         |  8 ++++++++
 softmmu/vl.c            | 13 +++++++++++++
 6 files changed, 32 insertions(+)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 0965305..580d731 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -36,6 +36,7 @@
 #include "qapi/qapi-visit-sockets.h"
 
 #include "chardev/char-io.h"
+#include "sysemu/sysemu.h"
 #include "qemu/env.h"
 
 /***********************************************************/
diff --git a/exec.c b/exec.c
index 6a6e43d..01732f5 100644
--- a/exec.c
+++ b/exec.c
@@ -2271,6 +2271,9 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
                 new_block->flags |= RAM_SHARED;
                 addr = file_ram_alloc(new_block, maxlen, mfd, false, errp);
                 trace_anon_memfd_alloc(name, maxlen, addr, mfd);
+            } else if (only_cpr_capable) {
+                error_report("only-cpr-capable requires memfd-alloc");
+                errno = 0;
             } else {
                 addr = phys_mem_alloc(maxlen, &mr->align, shared);
             }
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index f0017d4..a72e7da 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -14,6 +14,7 @@ extern const char *qemu_name;
 extern QemuUUID qemu_uuid;
 extern bool qemu_uuid_set;
 extern bool memfd_alloc;
+extern bool only_cpr_capable;
 extern int start_on_wake;
 
 void qemu_add_data_dir(const char *path);
diff --git a/migration/migration.c b/migration/migration.c
index 8fe3633..5459a2a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1099,6 +1099,12 @@ static bool migrate_caps_check(bool *cap_list,
         }
     }
 
+    if (cap_list[MIGRATION_CAPABILITY_X_COLO] && only_cpr_capable) {
+        error_setg(errp, "x-colo is not compatible with -only-cpr-capable");
+        return false;
+    }
+
+
     return true;
 }
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 1ab5af5..ff8464f 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4116,6 +4116,14 @@ SRST
     an unmigratable state.
 ERST
 
+DEF("only-cpr-capable", 0, QEMU_OPTION_only_cpr_capable, \
+    "-only-cpr-capable    allow only cpr capable devices\n", QEMU_ARCH_ALL)
+SRST
+``-only-cpr-capable``
+    Only allow cpr capable devices, which guarantees that cprsave will not fail
+    with an unsupported device error.
+ERST
+
 #ifdef __linux__
 DEF("memfd-alloc", 0,  QEMU_OPTION_memfd_alloc, \
     "-memfd-alloc         allocate anonymous memory using memfd_create\n",
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 9f2be5c..e101056 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -164,6 +164,7 @@ bool boot_strict;
 uint8_t *boot_splash_filedata;
 int only_migratable; /* turn it off unless user states otherwise */
 bool wakeup_suspend_enabled;
+bool only_cpr_capable;
 bool memfd_alloc;
 int start_on_wake;
 static char **argv_main;
@@ -3679,6 +3680,9 @@ void qemu_init(int argc, char **argv, char **envp)
             case QEMU_OPTION_only_migratable:
                 only_migratable = 1;
                 break;
+            case QEMU_OPTION_only_cpr_capable:
+                only_cpr_capable = true;
+                break;
             case QEMU_OPTION_memfd_alloc:
                 memfd_alloc = true;
                 break;
@@ -4433,6 +4437,15 @@ void qemu_init(int argc, char **argv, char **envp)
 
     cpu_synchronize_all_post_init();
 
+    if (only_cpr_capable && replay_mode != REPLAY_MODE_NONE) {
+        error_report("replay is not compatible with -only-cpr-capable");
+        exit(1);
+    }
+    if (only_cpr_capable && !qemu_chr_cpr_capable(&err)) {
+        error_report_err(err);
+        exit(1);
+    }
+
     rom_reset_order_override();
 
     /* Did we create any drives that we failed to create a device for? */
-- 
1.8.3.1




* [PATCH V2 21/22] cpr: maintainers
  2021-01-05 15:41 [PATCH V2 00/22] Live Update Steve Sistare
                   ` (19 preceding siblings ...)
  2021-01-05 15:42 ` [PATCH V2 20/22] cpr: only-cpr-capable option Steve Sistare
@ 2021-01-05 15:42 ` Steve Sistare
  2021-01-05 15:42 ` [PATCH V2 22/22] simplify savevm Steve Sistare
  21 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-01-05 15:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Add the maintainers for cpr related files.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 MAINTAINERS | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0886eb3..93044e7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2631,6 +2631,17 @@ F: net/colo*
 F: net/filter-rewriter.c
 F: net/filter-mirror.c
 
+CPR
+M: Steve Sistare <steven.sistare@oracle.com>
+M: Mark Kanda <mark.kanda@oracle.com>
+S: Maintained
+F: hw/vfio/cpr.c
+F: include/migration/cpr.h
+F: migration/cpr.c
+F: qapi/cpr.json
+F: include/qemu/env.h
+F: util/env.c
+
 Record/replay
 M: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
 R: Paolo Bonzini <pbonzini@redhat.com>
-- 
1.8.3.1




* [PATCH V2 22/22] simplify savevm
  2021-01-05 15:41 [PATCH V2 00/22] Live Update Steve Sistare
                   ` (20 preceding siblings ...)
  2021-01-05 15:42 ` [PATCH V2 21/22] cpr: maintainers Steve Sistare
@ 2021-01-05 15:42 ` Steve Sistare
  21 siblings, 0 replies; 29+ messages in thread
From: Steve Sistare @ 2021-01-05 15:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrange, Michael S. Tsirkin, Jason Zeng,
	Alex Bennée, Juan Quintela, Dr. David Alan Gilbert,
	Markus Armbruster, Alex Williamson, Steve Sistare,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé

Use qf_file_open to simplify a few functions in savevm.c.
No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 migration/savevm.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index a843d20..994ad1a 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2788,8 +2788,8 @@ int save_snapshot(const char *name, Error **errp)
 void qmp_xen_save_devices_state(const char *filename, bool has_live, bool live,
                                 Error **errp)
 {
+    const char *ioc_name = "migration-xen-save-state";
     QEMUFile *f;
-    QIOChannelFile *ioc;
     int saved_vm_running;
     int ret;
 
@@ -2803,13 +2803,10 @@ void qmp_xen_save_devices_state(const char *filename, bool has_live, bool live,
     vm_stop(RUN_STATE_SAVE_VM);
     global_state_store_running();
 
-    ioc = qio_channel_file_new_path(filename, O_WRONLY | O_CREAT, 0660, errp);
-    if (!ioc) {
+    f = qf_file_open(filename, O_WRONLY | O_CREAT, 0660, ioc_name, errp);
+    if (!f) {
         goto the_end;
     }
-    qio_channel_set_name(QIO_CHANNEL(ioc), "migration-xen-save-state");
-    f = qemu_fopen_channel_output(QIO_CHANNEL(ioc));
-    object_unref(OBJECT(ioc));
     ret = qemu_save_device_state(f);
     if (ret < 0 || qemu_fclose(f) < 0) {
         error_setg(errp, QERR_IO_ERROR);
@@ -2837,8 +2834,8 @@ void qmp_xen_save_devices_state(const char *filename, bool has_live, bool live,
 
 void qmp_xen_load_devices_state(const char *filename, Error **errp)
 {
+    const char *ioc_name = "migration-xen-load-state";
     QEMUFile *f;
-    QIOChannelFile *ioc;
     int ret;
 
     /* Guest must be paused before loading the device state; the RAM state
@@ -2850,14 +2847,10 @@ void qmp_xen_load_devices_state(const char *filename, Error **errp)
     }
     vm_stop(RUN_STATE_RESTORE_VM);
 
-    ioc = qio_channel_file_new_path(filename, O_RDONLY | O_BINARY, 0, errp);
-    if (!ioc) {
+    f = qf_file_open(filename, O_RDONLY | O_BINARY, 0, ioc_name, errp);
+    if (!f) {
         return;
     }
-    qio_channel_set_name(QIO_CHANNEL(ioc), "migration-xen-load-state");
-    f = qemu_fopen_channel_input(QIO_CHANNEL(ioc));
-    object_unref(OBJECT(ioc));
-
     ret = qemu_loadvm_state(f);
     qemu_fclose(f);
     if (ret < 0) {
-- 
1.8.3.1




* Re: [PATCH V2 18/22] chardev: cpr for sockets
  2021-01-05 15:42 ` [PATCH V2 18/22] chardev: cpr for sockets Steve Sistare
@ 2021-01-05 16:22   ` Daniel P. Berrangé
  2021-01-05 16:35     ` Steven Sistare
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel P. Berrangé @ 2021-01-05 16:22 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Jason Zeng, Michael S. Tsirkin, Alex Bennée, Juan Quintela,
	qemu-devel, Dr. David Alan Gilbert, Alex Williamson,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

On Tue, Jan 05, 2021 at 07:42:06AM -0800, Steve Sistare wrote:
> Define qio_channel_socket_reuse to initialize a channel based on an existing
> socket fd.  Save accepted socket fds in the environment before cprsave, and
> look for fds in the environment after cprload.  Reject cprsave if a socket
> enables the TLS or websocket option.
> 
> Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  chardev/char-socket.c       | 30 ++++++++++++++++++++++++++++++
>  include/io/channel-socket.h | 12 ++++++++++++
>  io/channel-socket.c         |  9 +++++++++
>  stubs/Makefile.objs         |  1 +
>  stubs/cpr.c                 |  3 +++
>  5 files changed, 55 insertions(+)
>  create mode 100644 stubs/cpr.c
> 

> diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
> index 777ff59..e425a01 100644
> --- a/include/io/channel-socket.h
> +++ b/include/io/channel-socket.h
> @@ -260,5 +260,17 @@ QIOChannelSocket *
>  qio_channel_socket_accept(QIOChannelSocket *ioc,
>                            Error **errp);
>  
> +/**
> + * qio_channel_socket_reuse:
> + * @fd: existing client socket descriptor
> + * @errp: pointer to a NULL-initialized error object
> + *
> + * Construct a client channel using @fd.
> + *
> + * Returns: the new client channel, or NULL on error
> + */
> +QIOChannelSocket *
> +qio_channel_socket_reuse(int fd,
> +                         Error **errp);
>  
>  #endif /* QIO_CHANNEL_SOCKET_H */
> diff --git a/io/channel-socket.c b/io/channel-socket.c
> index de49880..07981be 100644
> --- a/io/channel-socket.c
> +++ b/io/channel-socket.c
> @@ -400,6 +400,15 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
>      return NULL;
>  }
>  
> +QIOChannelSocket *
> +qio_channel_socket_reuse(int fd,
> +                         Error **errp)
> +{
> +    QIOChannelSocket *cioc = qio_channel_socket_new();
> +    cioc->fd = fd;
> +    return qio_channel_socket_post_accept(cioc, errp) ? 0 : cioc;
> +}

Why do we need to add this new API when we already have

 qio_channel_socket_new_fd(int fd, Error **errp)

which accepts a pre-opened socket FD ?


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH V2 05/22] vl: memfd-alloc option
  2021-01-05 15:41 ` [PATCH V2 05/22] vl: memfd-alloc option Steve Sistare
@ 2021-01-05 16:27   ` Daniel P. Berrangé
  2021-01-06 16:36     ` Steven Sistare
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel P. Berrangé @ 2021-01-05 16:27 UTC (permalink / raw)
  To: Steve Sistare
  Cc: Jason Zeng, Michael S. Tsirkin, Alex Bennée, Juan Quintela,
	qemu-devel, Dr. David Alan Gilbert, Alex Williamson,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

On Tue, Jan 05, 2021 at 07:41:53AM -0800, Steve Sistare wrote:
> Allocate anonymous memory using memfd_create if the memfd-alloc option is
> set.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  exec.c                  | 38 ++++++++++++++++++++++++++++++--------
>  include/sysemu/sysemu.h |  1 +
>  qemu-options.hx         | 11 +++++++++++
>  softmmu/vl.c            |  4 ++++
>  trace-events            |  1 +
>  5 files changed, 47 insertions(+), 8 deletions(-)

> diff --git a/qemu-options.hx b/qemu-options.hx
> index 708583b..455b43b7 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -4094,6 +4094,17 @@ SRST
>      an unmigratable state.
>  ERST
>  
> +#ifdef __linux__
> +DEF("memfd-alloc", 0,  QEMU_OPTION_memfd_alloc, \
> +    "-memfd-alloc         allocate anonymous memory using memfd_create\n",
> +    QEMU_ARCH_ALL)
> +#endif
> +
> +SRST
> +``-memfd-alloc``
> +    Allocate anonymous memory using memfd_create (Linux only).
> +ERST

Do we really need a new arg for this ? It is already possible to request
use of memfd for the guest RAM using

  -object memory-backend-memfd,id=ram-node0,size=NNNN

this memory backend object framework was intended to remove the need to
add new ad-hoc CLI args for controlling memory allocation.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH V2 18/22] chardev: cpr for sockets
  2021-01-05 16:22   ` Daniel P. Berrangé
@ 2021-01-05 16:35     ` Steven Sistare
  0 siblings, 0 replies; 29+ messages in thread
From: Steven Sistare @ 2021-01-05 16:35 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Jason Zeng, Michael S. Tsirkin, Alex Bennée, Juan Quintela,
	qemu-devel, Dr. David Alan Gilbert, Alex Williamson,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

On 1/5/2021 11:22 AM, Daniel P. Berrangé wrote:
> On Tue, Jan 05, 2021 at 07:42:06AM -0800, Steve Sistare wrote:
>> Define qio_channel_socket_reuse to initialize a channel based on an existing
>> socket fd.  Save accepted socket fds in the environment before cprsave, and
>> look for fds in the environment after cprload.  Reject cprsave if a socket
>> enables the TLS or websocket option.
>>
>> Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>  chardev/char-socket.c       | 30 ++++++++++++++++++++++++++++++
>>  include/io/channel-socket.h | 12 ++++++++++++
>>  io/channel-socket.c         |  9 +++++++++
>>  stubs/Makefile.objs         |  1 +
>>  stubs/cpr.c                 |  3 +++
>>  5 files changed, 55 insertions(+)
>>  create mode 100644 stubs/cpr.c
>>
> 
>> diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
>> index 777ff59..e425a01 100644
>> --- a/include/io/channel-socket.h
>> +++ b/include/io/channel-socket.h
>> @@ -260,5 +260,17 @@ QIOChannelSocket *
>>  qio_channel_socket_accept(QIOChannelSocket *ioc,
>>                            Error **errp);
>>  
>> +/**
>> + * qio_channel_socket_reuse:
>> + * @fd: existing client socket descriptor
>> + * @errp: pointer to a NULL-initialized error object
>> + *
>> + * Construct a client channel using @fd.
>> + *
>> + * Returns: the new client channel, or NULL on error
>> + */
>> +QIOChannelSocket *
>> +qio_channel_socket_reuse(int fd,
>> +                         Error **errp);
>>  
>>  #endif /* QIO_CHANNEL_SOCKET_H */
>> diff --git a/io/channel-socket.c b/io/channel-socket.c
>> index de49880..07981be 100644
>> --- a/io/channel-socket.c
>> +++ b/io/channel-socket.c
>> @@ -400,6 +400,15 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
>>      return NULL;
>>  }
>>  
>> +QIOChannelSocket *
>> +qio_channel_socket_reuse(int fd,
>> +                         Error **errp)
>> +{
>> +    QIOChannelSocket *cioc = qio_channel_socket_new();
>> +    cioc->fd = fd;
>> +    return qio_channel_socket_post_accept(cioc, errp) ? 0 : cioc;
>> +}
> 
> Why do we need to add this new API when we already have
> 
>  qio_channel_socket_new_fd(int fd, Error **errp)
> 
> which accepts a pre-opened socket FD ?

That was fast!
Good call, thanks.  I missed that qio_channel_socket_new_fd calls qio_channel_socket_set_fd and
the latter performs the necessary post-accept actions.  I will also delete patch 17.

- Steve
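
For readers following the thread, the fd hand-off described in the quoted commit message (save accepted socket fds in the environment before the re-exec, then look them up afterwards) can be sketched in plain POSIX C. This is a minimal illustration under stated assumptions, not the code from the patch: the helper names save_fd_in_env and find_fd_in_env and the idea of one variable per descriptor are hypothetical and are not QEMU APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: record fd under the environment variable `name`
 * and clear FD_CLOEXEC so the descriptor stays open across exec.
 * Returns 0 on success, -1 on failure.
 */
int save_fd_in_env(const char *name, int fd)
{
    char val[16];
    int flags;

    assert(name != NULL);
    flags = fcntl(fd, F_GETFD);
    if (flags < 0 || fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0) {
        return -1;
    }
    snprintf(val, sizeof(val), "%d", fd);
    return setenv(name, val, 1);
}

/*
 * Hypothetical helper: return the descriptor recorded under `name`,
 * or -1 if the variable is missing, malformed, or names a closed fd.
 */
int find_fd_in_env(const char *name)
{
    const char *val;
    char *end;
    long fd;

    assert(name != NULL);
    val = getenv(name);
    if (val == NULL || *val == '\0') {
        return -1;
    }
    fd = strtol(val, &end, 10);
    if (*end != '\0' || fd < 0 || fd > INT_MAX) {
        return -1;
    }
    /* F_GETFD fails with EBADF if the descriptor is not open. */
    if (fcntl((int)fd, F_GETFD) < 0) {
        return -1;
    }
    return (int)fd;
}
```

In this sketch the old process would call save_fd_in_env() before exec, and the new process would call find_fd_in_env() and pass the result to a constructor that takes a pre-opened descriptor, such as the qio_channel_socket_new_fd() mentioned above.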



* Re: [PATCH V2 05/22] vl: memfd-alloc option
  2021-01-05 16:27   ` Daniel P. Berrangé
@ 2021-01-06 16:36     ` Steven Sistare
  2021-01-06 20:10       ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Steven Sistare @ 2021-01-06 16:36 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Jason Zeng, Michael S. Tsirkin, Alex Bennée, Juan Quintela,
	qemu-devel, Dr. David Alan Gilbert, Alex Williamson,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

On 1/5/2021 11:27 AM, Daniel P. Berrangé wrote:
> On Tue, Jan 05, 2021 at 07:41:53AM -0800, Steve Sistare wrote:
>> Allocate anonymous memory using memfd_create if the memfd-alloc option is
>> set.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>  exec.c                  | 38 ++++++++++++++++++++++++++++++--------
>>  include/sysemu/sysemu.h |  1 +
>>  qemu-options.hx         | 11 +++++++++++
>>  softmmu/vl.c            |  4 ++++
>>  trace-events            |  1 +
>>  5 files changed, 47 insertions(+), 8 deletions(-)
> 
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 708583b..455b43b7 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -4094,6 +4094,17 @@ SRST
>>      an unmigratable state.
>>  ERST
>>  
>> +#ifdef __linux__
>> +DEF("memfd-alloc", 0,  QEMU_OPTION_memfd_alloc, \
>> +    "-memfd-alloc         allocate anonymous memory using memfd_create\n",
>> +    QEMU_ARCH_ALL)
>> +#endif
>> +
>> +SRST
>> +``-memfd-alloc``
>> +    Allocate anonymous memory using memfd_create (Linux only).
>> +ERST
> 
> Do we really need a new arg for this ? It is already possible to request
> use of memfd for the guest RAM using
> 
>   -object memory-backend-memfd,id=ram-node0,size=NNNN
> 
> this memory backend object framework was intended to remove the need to
> add new ad-hoc CLI args for controlling memory allocation.

Yes, I considered that, but there are other memory regions that cannot be controlled
by the command line but which must be preserved, such as vram, bios, and rom.  If vram
is not preserved, parts of the screen will be blank until the user performs some action
which refreshes the display.  bios and rom should be preserved rather than re-created
with potentially different contents from the firmware images in the updated qemu package.

However, your comment reminds me that I must add a few lines of code to preserve the 
memory-backend-memfd.

- Steve
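
For context, the property that memfd allocation provides (contents stay in place while the mapping address changes) can be sketched as below. This is a minimal Linux-only illustration, not the code from the series: ram_memfd_create and ram_memfd_map are hypothetical names, and a real implementation would also have to carry the descriptor across exec and handle alignment and huge pages.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an anonymous memfd of `size` bytes.
 * MFD_CLOEXEC is deliberately not set, so the descriptor would
 * remain open across exec. Returns the fd, or -1 on failure.
 */
int ram_memfd_create(const char *name, size_t size)
{
    int fd;

    assert(name != NULL && size > 0);
    fd = memfd_create(name, 0);
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: map the memfd shared and read-write.
 * Each call may return a different virtual address, but all
 * mappings see the same underlying pages. Returns NULL on failure.
 */
void *ram_memfd_map(int fd, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

Writing through one mapping, unmapping it, and mapping the same descriptor again yields the same bytes, which is the behavior a region such as vram would rely on if it were allocated this way.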



* Re: [PATCH V2 05/22] vl: memfd-alloc option
  2021-01-06 16:36     ` Steven Sistare
@ 2021-01-06 20:10       ` Paolo Bonzini
  2021-01-06 21:19         ` Steven Sistare
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2021-01-06 20:10 UTC (permalink / raw)
  To: Steven Sistare
  Cc: Daniel P. Berrangé,
	Michael S. Tsirkin, Jason Zeng, Philippe Mathieu-Daudé,
	Juan Quintela, qemu-devel, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Marc-André Lureau,
	Alex Bennée, Markus Armbruster


On Wed, Jan 6, 2021, 17:37 Steven Sistare <steven.sistare@oracle.com> wrote:

> Yes, I considered that, but there are other memory regions that cannot be
> controlled by the command line but which must be preserved, such as vram,
> bios, and rom.  If vram is not preserved, parts of the screen will be blank
> until the user performs some action which refreshes the display.  bios and
> rom should be preserved rather than re-created with potentially different
> contents from the firmware images in the updated qemu package.
>
> However, your comment reminds me that I must add a few lines of code to
> preserve the memory-backend-memfd.
>

A new option specific to memory is the wrong way to do this. If a special
mode must be specified when starting QEMU, you can make it a -machine
option and block the QMP commands unless it's specified. Otherwise you can
use "normal" migration to marshal and unmarshal across the update those
memory regions that aren't backed by shared memory or memfd.

Also, because of the mess that vl.c had grown into, adding new "simple"
options is going to be very very hard. In fact I am working on turning many
options like -smp or -m into syntactic sugar for -machine; at some point I
would like to (almost) forbid adding _any_ new option.

Paolo



> - Steve
>
>



* Re: [PATCH V2 05/22] vl: memfd-alloc option
  2021-01-06 20:10       ` Paolo Bonzini
@ 2021-01-06 21:19         ` Steven Sistare
  0 siblings, 0 replies; 29+ messages in thread
From: Steven Sistare @ 2021-01-06 21:19 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Daniel P. Berrangé,
	Michael S. Tsirkin, Jason Zeng, Philippe Mathieu-Daudé,
	Juan Quintela, qemu-devel, Dr. David Alan Gilbert,
	Alex Williamson, Stefan Hajnoczi, Marc-André Lureau,
	Alex Bennée, Markus Armbruster

On 1/6/2021 3:10 PM, Paolo Bonzini wrote:
> On Wed, Jan 6, 2021, 17:37 Steven Sistare <steven.sistare@oracle.com> wrote:
> 
>     Yes, I considered that, but there are other memory regions that cannot be controlled
>     by the command line but which must be preserved, such as vram, bios, and rom.  If vram
>     is not preserved, parts of the screen will be blank until the user performs some action
>     which refreshes the display.  bios and rom should be preserved rather than re-created
>     with potentially different contents from the firmware images in the updated qemu package.
> 
>     However, your comment reminds me that I must add a few lines of code to preserve the
>     memory-backend-memfd.
> 
> 
> A new option specific to memory is the wrong way to do this. If a special
> mode must be specified when starting QEMU, you can make it a -machine
> option and block the QMP commands unless it's specified. Otherwise you can
> use "normal" migration to marshal and unmarshal across the update those
> memory regions that aren't backed by shared memory or memfd.
>
> Also, because of the mess that vl.c had grown into, adding new "simple"
> options is going to be very very hard. In fact I am working on turning many
> options like -smp or -m into syntactic sugar for -machine; at some point I
> would like to (almost) forbid adding _any_ new option.

Will do.  Thanks for the heads up on the future of vl.c.

I defined the option independently of cpr for generality.  Do you think this could be useful?
If yes, I will name and define the -machine option to use memfd;
if no, I will name and define it to enable cpr, and implicitly enable memfd without saying so.

- Steve
 



end of thread, other threads:[~2021-01-06 21:22 UTC | newest]

Thread overview: 29+ messages
2021-01-05 15:41 [PATCH V2 00/22] Live Update Steve Sistare
2021-01-05 15:41 ` [PATCH V2 01/22] as_flat_walk Steve Sistare
2021-01-05 15:41 ` [PATCH V2 02/22] qemu_ram_volatile Steve Sistare
2021-01-05 15:41 ` [PATCH V2 03/22] oslib: qemu_clr_cloexec Steve Sistare
2021-01-05 15:41 ` [PATCH V2 04/22] util: env var helpers Steve Sistare
2021-01-05 15:41 ` [PATCH V2 05/22] vl: memfd-alloc option Steve Sistare
2021-01-05 16:27   ` Daniel P. Berrangé
2021-01-06 16:36     ` Steven Sistare
2021-01-06 20:10       ` Paolo Bonzini
2021-01-06 21:19         ` Steven Sistare
2021-01-05 15:41 ` [PATCH V2 06/22] vl: add helper to request re-exec Steve Sistare
2021-01-05 15:41 ` [PATCH V2 07/22] cpr Steve Sistare
2021-01-05 15:41 ` [PATCH V2 08/22] cpr: QMP interfaces Steve Sistare
2021-01-05 15:41 ` [PATCH V2 09/22] cpr: HMP interfaces Steve Sistare
2021-01-05 15:41 ` [PATCH V2 10/22] pci: export functions for cpr Steve Sistare
2021-01-05 15:41 ` [PATCH V2 11/22] vfio-pci: refactor " Steve Sistare
2021-01-05 15:42 ` [PATCH V2 12/22] vfio-pci: cpr Steve Sistare
2021-01-05 15:42 ` [PATCH V2 13/22] vhost: reset vhost devices upon cprsave Steve Sistare
2021-01-05 15:42 ` [PATCH V2 14/22] chardev: cpr framework Steve Sistare
2021-01-05 15:42 ` [PATCH V2 15/22] chardev: cpr for simple devices Steve Sistare
2021-01-05 15:42 ` [PATCH V2 16/22] chardev: cpr for pty Steve Sistare
2021-01-05 15:42 ` [PATCH V2 17/22] chardev: socket accept subroutine Steve Sistare
2021-01-05 15:42 ` [PATCH V2 18/22] chardev: cpr for sockets Steve Sistare
2021-01-05 16:22   ` Daniel P. Berrangé
2021-01-05 16:35     ` Steven Sistare
2021-01-05 15:42 ` [PATCH V2 19/22] monitor: cpr support Steve Sistare
2021-01-05 15:42 ` [PATCH V2 20/22] cpr: only-cpr-capable option Steve Sistare
2021-01-05 15:42 ` [PATCH V2 21/22] cpr: maintainers Steve Sistare
2021-01-05 15:42 ` [PATCH V2 22/22] simplify savevm Steve Sistare
