From: Steve Sistare <steven.sistare@oracle.com>
To: qemu-devel@nongnu.org
Cc: "Daniel P. Berrange" <berrange@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Jason Zeng" <jason.zeng@linux.intel.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Juan Quintela" <quintela@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Steve Sistare" <steven.sistare@oracle.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>
Subject: [PATCH V2 00/22] Live Update
Date: Tue,  5 Jan 2021 07:41:48 -0800
Message-ID: <1609861330-129855-1-git-send-email-steven.sistare@oracle.com>

Provide the cprsave and cprload commands for live update.  These save and
restore VM state, with minimal guest pause time, so that qemu may be updated
to a new version in between.

cprsave stops the VM and saves vmstate to an ordinary file.  It supports two
modes: restart and reboot.  For restart, cprsave exec's the qemu binary (or
/usr/bin/qemu-exec if it exists) with the same argv.  qemu restarts in a
paused state and waits for the cprload command.

To use the restart mode, qemu must be started with the memfd-alloc option,
which allocates guest ram using memfd_create.  The memfd's are saved to
the environment and kept open across exec, after which they are found from
the environment and re-mmap'd.  Hence guest ram is preserved in place,
albeit with new virtual addresses in the qemu process.  The caller resumes
the guest by calling cprload, which loads state from the file.  If the VM
was running at cprsave time, then VM execution resumes.  cprsave supports
any type of guest image and block device, but the caller must not modify
guest block devices between cprsave and cprload.

The restart mode supports vfio devices by preserving the vfio container,
group, device, and event descriptors across the qemu re-exec, and by
updating DMA mapping virtual addresses using VFIO_DMA_UNMAP_FLAG_SUSPEND
and VFIO_DMA_MAP_FLAG_RESUME as proposed in 
https://lore.kernel.org/kvm/1609861013-129801-1-git-send-email-steven.sistare@oracle.com
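
To make the effect of the address update concrete, here is a toy Python
model of a container's iova-to-vaddr table.  It is not the kernel interface
and issues no ioctls; the class and method names are hypothetical, and the
semantics shown (suspend invalidates the virtual address while keeping the
iova range, resume supplies a new virtual address) are an assumption based
on the description above and on the linked proposal.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DmaMapping:
    size: int
    vaddr: Optional[int]   # None while the mapping is suspended


class ToyContainer:
    """Toy stand-in for a vfio container's iova -> vaddr table."""

    def __init__(self) -> None:
        self.maps: Dict[int, DmaMapping] = {}

    def dma_map(self, iova: int, size: int, vaddr: int) -> None:
        self.maps[iova] = DmaMapping(size, vaddr)

    def suspend(self, iova: int) -> None:
        # Models the proposed VFIO_DMA_UNMAP_FLAG_SUSPEND: keep the iova
        # range, but treat the process virtual address as no longer valid.
        self.maps[iova].vaddr = None

    def resume(self, iova: int, new_vaddr: int) -> None:
        # Models the proposed VFIO_DMA_MAP_FLAG_RESUME: supply the new
        # virtual address for an existing iova range.
        if iova not in self.maps:
            raise KeyError(f"no mapping at iova {iova:#x}")
        self.maps[iova].vaddr = new_vaddr

    def translate(self, iova: int) -> int:
        # Only here to make the effect of suspend/resume observable.
        for base, m in self.maps.items():
            if base <= iova < base + m.size:
                if m.vaddr is None:
                    raise RuntimeError("mapping is suspended")
                return m.vaddr + (iova - base)
        raise KeyError(f"iova {iova:#x} is not mapped")


if __name__ == "__main__":
    c = ToyContainer()
    c.dma_map(0x1000, 0x2000, 0x7F0000000000)   # before exec
    c.suspend(0x1000)                           # old vaddr becomes invalid
    c.resume(0x1000, 0x7F1234560000)            # after exec and re-mmap
    print(hex(c.translate(0x1800)))
```

The iova seen by the device never changes in this model; only the process
virtual address behind it does.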

For the reboot mode, cprsave saves state and exits qemu, and the caller is
allowed to update the host kernel and system software and reboot.  The
caller resumes the guest by running qemu with the same arguments as the
original process and calling cprload.  To use this mode, guest ram must be
mapped to a persistent shared memory file such as /dev/dax0.0, or /dev/shm
PKRAM as proposed in
https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com/
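
The requirement for a shared file mapping can be illustrated with a short
Python sketch.  It is not qemu code: a temporary file stands in for
/dev/dax0.0, the function names are hypothetical, and two function calls
stand in for the old and new qemu processes.  A temporary file does not by
itself survive a kexec; that property comes from the persistent backing
named above.

```python
import mmap
import os
import tempfile


def write_guest_ram(path: str, size: int, data: bytes) -> None:
    """'Old qemu': map the file shared, write, then unmap and close."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)
        with mmap.mmap(fd, size, flags=mmap.MAP_SHARED) as ram:
            ram[: len(data)] = data
    finally:
        os.close(fd)


def read_guest_ram(path: str, size: int, length: int) -> bytes:
    """'New qemu': map the same file again and find the contents in place."""
    fd = os.open(path, os.O_RDWR)
    try:
        with mmap.mmap(fd, size, flags=mmap.MAP_SHARED) as ram:
            return bytes(ram[:length])
    finally:
        os.close(fd)


def demo_reboot() -> bytes:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "guest-ram")   # stand-in for /dev/dax0.0
        write_guest_ram(path, 4096, b"guest")
        return read_guest_ram(path, 4096, 5)


if __name__ == "__main__":
    print(demo_reboot())
```

Because the mapping is MAP_SHARED, stores go to the backing object rather
than to private copies, so nothing needs to be written to the vmstate file
for guest ram.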

The reboot mode supports vfio devices if the caller suspends the guest
instead of stopping the VM, such as by issuing guest-suspend-ram to the
qemu guest agent.  The guest drivers' suspend methods flush outstanding
requests and re-initialize the devices, and thus there is no device state
to save and restore.
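
For reference, a caller might issue the suspend request to the guest agent
roughly as follows.  This is a hedged sketch rather than part of the series:
the socket path passed to suspend_guest is hypothetical and depends on how
the agent channel was configured, and the helper names are made up for the
example.  Only the guest-suspend-ram command name comes from the text above.

```python
import json
import socket


def qga_request(command: str) -> bytes:
    """Encode a guest agent request as one JSON line."""
    return (json.dumps({"execute": command}) + "\n").encode()


def suspend_guest(qga_socket_path: str) -> None:
    """Send guest-suspend-ram over the agent's Unix socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(qga_socket_path)   # path is an assumption of this sketch
        s.sendall(qga_request("guest-suspend-ram"))
        # The agent documentation says this command sends no response on
        # success, so no reply is awaited here.


if __name__ == "__main__":
    print(qga_request("guest-suspend-ram").decode(), end="")
```

The caller would then wait for the guest to reach the suspended state
before issuing cprsave.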

The first patches add helper functions:

  - as_flat_walk
  - qemu_ram_volatile
  - oslib: qemu_clr_cloexec
  - util: env var helpers
  - vl: memfd-alloc option
  - vl: add helper to request re-exec
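
As a rough idea of what helpers of this kind do, the Python sketch below
clears the close-on-exec flag on a descriptor and records a descriptor
number in an environment variable.  The function names, the variable naming,
and the -1 return convention are assumptions for illustration and are not
taken from the patches.

```python
import fcntl
import os


def clr_cloexec(fd: int) -> None:
    """Clear FD_CLOEXEC so the descriptor stays open across exec."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags & ~fcntl.FD_CLOEXEC)


def setenv_fd(name: str, fd: int) -> None:
    """Publish a descriptor number under a (hypothetical) variable name."""
    os.environ[name] = str(fd)


def getenv_fd(name: str) -> int:
    """Return the descriptor recorded under name, or -1 if absent."""
    value = os.environ.get(name)
    return int(value) if value is not None else -1


if __name__ == "__main__":
    r, w = os.pipe()                 # Python creates these close-on-exec
    clr_cloexec(r)
    setenv_fd("DEMO_FD", r)
    print(getenv_fd("DEMO_FD") == r)
```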

The next patches implement cprsave and cprload:

  - cpr
  - cpr: QMP interfaces
  - cpr: HMP interfaces

The next patches add vfio support for the restart mode:

  - pci: export functions for cpr
  - vfio-pci: refactor for cpr
  - vfio-pci: cpr

The remaining patches preserve various descriptor-based backend devices
across a cprsave restart, then add the only-cpr-capable option, a
MAINTAINERS entry, and a savevm cleanup:

  - vhost: reset vhost devices upon cprsave
  - chardev: cpr framework
  - chardev: cpr for simple devices
  - chardev: cpr for pty
  - chardev: socket accept subroutine
  - chardev: cpr for sockets
  - monitor: cpr support
  - cpr: only-cpr-capable option
  - cpr: maintainers
  - simplify savevm

Here is an example of updating qemu from v4.2.0 to v4.2.1 using
"cprsave restart".  The software update is performed while the guest is
running to minimize downtime.

window 1				| window 2
					|
# qemu-system-x86_64 ... 		|
QEMU 4.2.0 monitor - type 'help' ...	|
(qemu) info status			|
VM status: running			|
					| # yum update qemu
(qemu) cprsave /tmp/qemu.sav restart	|
QEMU 4.2.1 monitor - type 'help' ...	|
(qemu) info status			|
VM status: paused (prelaunch)		|
(qemu) cprload /tmp/qemu.sav		|
(qemu) info status			|
VM status: running			|


Here is an example of updating the host kernel using "cprsave reboot":

window 1					| window 2
						|
# qemu-system-x86_64 ...mem-path=/dev/dax0.0 ...|
QEMU 4.2.1 monitor - type 'help' ...		|
(qemu) info status				|
VM status: running				|
						| # yum update kernel-uek
(qemu) cprsave /tmp/qemu.sav reboot		|
						|
# systemctl kexec				|
kexec_core: Starting new kernel			|
...						|
						|
# qemu-system-x86_64 ...mem-path=/dev/dax0.0 ...|
QEMU 4.2.1 monitor - type 'help' ...		|
(qemu) info status				|
VM status: paused (prelaunch)			|
(qemu) cprload /tmp/qemu.sav			|
(qemu) info status				|
VM status: running				|

Changes from V1 to V2:
  - revert vmstate infrastructure changes
  - refactor cpr functions into new files
  - delete MADV_DOEXEC and use memfd + VFIO_DMA_UNMAP_FLAG_SUSPEND to
    preserve memory
  - add framework to filter chardev's that support cpr
  - save and restore vfio eventfd's
  - modify cprinfo QMP interface
  - incorporate misc review feedback
  - remove unrelated and unneeded patches
  - refactor all patches into a shorter and easier to review series

Steve Sistare (17):
  as_flat_walk
  qemu_ram_volatile
  oslib: qemu_clr_cloexec
  util: env var helpers
  vl: memfd-alloc option
  vl: add helper to request re-exec
  cpr
  pci: export functions for cpr
  vfio-pci: refactor for cpr
  vfio-pci: cpr
  chardev: cpr framework
  chardev: cpr for simple devices
  chardev: cpr for pty
  chardev: socket accept subroutine
  cpr: only-cpr-capable option
  cpr: maintainers
  simplify savevm

Mark Kanda (5):
  cpr: QMP interfaces
  cpr: HMP interfaces
  vhost: reset vhost devices upon cprsave
  chardev: cpr for sockets
  monitor: cpr support

 MAINTAINERS                   |  11 +++
 chardev/char-mux.c            |   1 +
 chardev/char-null.c           |   1 +
 chardev/char-pty.c            |  16 +++-
 chardev/char-serial.c         |   1 +
 chardev/char-socket.c         |  31 +++++++
 chardev/char-stdio.c          |   8 ++
 chardev/char.c                |  41 ++++++++-
 exec.c                        |  75 +++++++++++++--
 gdbstub.c                     |   1 +
 hmp-commands.hx               |  44 +++++++++
 hw/pci/msix.c                 |  20 ++--
 hw/pci/pci.c                  |   7 +-
 hw/vfio/Makefile.objs         |   2 +-
 hw/vfio/common.c              |  63 ++++++++++++-
 hw/vfio/cpr.c                 | 117 +++++++++++++++++++++++
 hw/vfio/pci.c                 | 209 ++++++++++++++++++++++++++++++++++++++----
 hw/vfio/trace-events          |   1 +
 hw/virtio/vhost.c             |  11 +++
 include/chardev/char.h        |   6 ++
 include/exec/memory.h         |  11 +++
 include/hw/pci/msix.h         |   5 +
 include/hw/pci/pci.h          |   2 +
 include/hw/vfio/vfio-common.h |   7 ++
 include/hw/virtio/vhost.h     |   1 +
 include/io/channel-socket.h   |  12 +++
 include/migration/cpr.h       |  17 ++++
 include/monitor/hmp.h         |   3 +
 include/monitor/monitor.h     |   2 +
 include/qemu/env.h            |  27 ++++++
 include/qemu/osdep.h          |   1 +
 include/sysemu/sysemu.h       |   4 +
 io/channel-socket.c           |  52 +++++++----
 linux-headers/linux/vfio.h    |   5 +
 migration/Makefile.objs       |   2 +-
 migration/cpr.c               | 198 +++++++++++++++++++++++++++++++++++++++
 migration/migration.c         |   6 ++
 migration/savevm.c            |  19 ++--
 migration/savevm.h            |   2 +
 monitor/hmp-cmds.c            |  48 ++++++++++
 monitor/monitor.c             |   5 +
 monitor/qmp-cmds.c            |  31 +++++++
 monitor/qmp.c                 |  43 +++++++++
 qapi/Makefile.objs            |   3 +-
 qapi/char.json                |   5 +-
 qapi/cpr.json                 |  68 ++++++++++++++
 qapi/qapi-schema.json         |   1 +
 qemu-options.hx               |  45 ++++++++-
 slirp                         |   2 +-
 softmmu/memory.c              |  17 ++++
 softmmu/vl.c                  |  68 +++++++++++++-
 stubs/Makefile.objs           |   1 +
 stubs/cpr.c                   |   3 +
 trace-events                  |   1 +
 util/Makefile.objs            |   2 +-
 util/env.c                    | 119 ++++++++++++++++++++++++
 util/oslib-posix.c            |   9 ++
 util/oslib-win32.c            |   4 +
 58 files changed, 1433 insertions(+), 84 deletions(-)
 create mode 100644 hw/vfio/cpr.c
 create mode 100644 include/migration/cpr.h
 create mode 100644 include/qemu/env.h
 create mode 100644 migration/cpr.c
 create mode 100644 qapi/cpr.json
 create mode 100644 stubs/cpr.c
 create mode 100644 util/env.c

-- 
1.8.3.1



