On 7/30/2020 1:49 PM, Dr. David Alan Gilbert wrote:
> * Steve Sistare (steven.sistare@oracle.com) wrote:
>> Improve and extend the qemu functions that save and restore VM state so a
>> guest may be suspended and resumed with minimal pause time. qemu may be
>> updated to a new version in between.
>
> Nice.
>
>> The first set of patches adds the cprsave and cprload commands to save and
>> restore VM state, and allow the host kernel to be updated and rebooted in
>> between. The VM must create guest RAM in a persistent shared memory file,
>> such as /dev/dax0.0 or persistent /dev/shm PKRAM as proposed in
>> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com/
>>
>> cprsave stops the VCPUs and saves VM device state in a simple file, and
>> thus supports any type of guest image and block device. The caller must
>> not modify the VM's block devices between cprsave and cprload.
>
> can I ask why you don't just add a migration flag to skip the devices
> you don't want, and then do a migrate to a file?
> (i.e. migrate "exec:cat > afile")
> We already have the 'x-ignore-shared' capability that's used for doing
> RAM snapshots of VMs; primarily I think for being able to start a VM
> from a RAM snapshot as a fast VM start trick.
> (There's also a xen_save_devices that does something similar).
> If you backed the RAM as you say, enabled x-ignore-shared and then did:
>
>    migrate "exec:cat > afile"
>
> and restarted the destination with:
>
>    migrate_incoming "exec:cat afile"
>
> what is different (except the later stuff about the vfio magic and
> chardevs)?
>
> Dave

Yes, I did consider whether to extend the migration syntax and implementation
in save_vmstate and load_vmstate, versus creating something new. Those
functions handle stuff like bdrv snapshot, aio, and migration which are n/a
for the cpr use case, and the cpr functions handle state that is n/a for the
migration case.
I judged that a single function handling both would be less readable and
maintainable. At their core all these routines call qemu_loadvm_state() and
qemu_savevm_state(). The surrounding code is mostly different.

Take a look at
  savevm.c:save_vmstate() vs save_cpr_snapshot() attached
and
  savevm.c:load_vmstate() vs load_cpr_snapshot() attached

I attached the complete versions of the cpr functions because they are built
up over multiple patches in this series, and are thus hard to visualize in
patch form.

- Steve

>
>> cprsave and cprload support guests with vfio devices if the caller first
>> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
>> The guest drivers' suspend methods flush outstanding requests and
>> re-initialize the devices, and thus there is no device state to save and
>> restore.
>>
>>    1 savevm: add vmstate handler iterators
>>    2 savevm: VM handlers mode mask
>>    3 savevm: QMP command for cprsave
>>    4 savevm: HMP Command for cprsave
>>    5 savevm: QMP command for cprload
>>    6 savevm: HMP Command for cprload
>>    7 savevm: QMP command for cprinfo
>>    8 savevm: HMP command for cprinfo
>>    9 savevm: prevent cprsave if memory is volatile
>>   10 kvmclock: restore paused KVM clock
>>   11 cpu: disable ticks when suspended
>>   12 vl: pause option
>>   13 gdbstub: gdb support for suspended state
>>
>> The next patches add a restart method that eliminates the persistent memory
>> constraint, and allows qemu to be updated across the restart, but does not
>> allow host reboot. Anonymous memory segments used by the guest are
>> preserved across a re-exec of qemu, mapped at the same VA, via a proposed
>> madvise(MADV_DOEXEC) option in the Linux kernel.
>> See https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yznaga@oracle.com/
>>
>>   14 savevm: VMS_RESTART and cprsave restart
>>   15 vl: QEMU_START_FREEZE env var
>>   16 oslib: add qemu_clr_cloexec
>>   17 util: env var helpers
>>   18 osdep: import MADV_DOEXEC
>>   19 memory: ram_block_add cosmetic changes
>>   20 vl: add helper to request re-exec
>>   21 exec, memory: exec(3) to restart
>>   22 char: qio_channel_socket_accept reuse fd
>>   23 char: save/restore chardev socket fds
>>   24 ui: save/restore vnc socket fds
>>   25 char: save/restore chardev pty fds
>>   26 monitor: save/restore QMP negotiation status
>>   27 vhost: reset vhost devices upon cprsave
>>   28 char: restore terminal on restart
>>
>> The next patches extend the restart method to save and restore vfio-pci
>> state, eliminating the requirement for a guest agent. The vfio container,
>> group, and device descriptors are preserved across the qemu re-exec.
>>
>>   29 pci: export pci_update_mappings
>>   30 vfio-pci: save and restore
>>   31 vfio-pci: trace pci config
>>   32 vfio-pci: improved tracing
>>
>> Here is an example of updating qemu from v4.2.0 to v4.2.1 using
>> "cprload restart". The software update is performed while the guest is
>> running to minimize downtime.
>>
>> window 1                                        | window 2
>>                                                 |
>> # qemu-system-x86_64 ...                        |
>> QEMU 4.2.0 monitor - type 'help' ...            |
>> (qemu) info status                              |
>> VM status: running                              |
>>                                                 | # yum update qemu
>> (qemu) cprsave /tmp/qemu.sav restart            |
>> QEMU 4.2.1 monitor - type 'help' ...            |
>> (qemu) info status                              |
>> VM status: paused (prelaunch)                   |
>> (qemu) cprload /tmp/qemu.sav                    |
>> (qemu) info status                              |
>> VM status: running                              |
>>
>>
>> Here is an example of updating the host kernel using "cprload reboot"
>>
>> window 1                                        | window 2
>>                                                 |
>> # qemu-system-x86_64 ...mem-path=/dev/dax0.0 ...|
>> QEMU 4.2.1 monitor - type 'help' ...            |
>> (qemu) info status                              |
>> VM status: running                              |
>>                                                 | # yum update kernel-uek
>> (qemu) cprsave /tmp/qemu.sav restart            |
>>                                                 |
>> # systemctl kexec                               |
>> kexec_core: Starting new kernel                 |
>> ...                                             |
>>                                                 |
>> # qemu-system-x86_64 ...mem-path=/dev/dax0.0 ...|
>> QEMU 4.2.1 monitor - type 'help' ...            |
>> (qemu) info status                              |
>> VM status: paused (prelaunch)                   |
>> (qemu) cprload /tmp/qemu.sav                    |
>> (qemu) info status                              |
>> VM status: running                              |
>>
>>
>> Mark Kanda (5):
>>   char: qio_channel_socket_accept reuse fd
>>   char: save/restore chardev socket fds
>>   ui: save/restore vnc socket fds
>>   monitor: save/restore QMP negotiation status
>>   vhost: reset vhost devices upon cprsave
>>
>> Steve Sistare (27):
>>   savevm: add vmstate handler iterators
>>   savevm: VM handlers mode mask
>>   savevm: QMP command for cprsave
>>   savevm: HMP Command for cprsave
>>   savevm: QMP command for cprload
>>   savevm: HMP Command for cprload
>>   savevm: QMP command for cprinfo
>>   savevm: HMP command for cprinfo
>>   savevm: prevent cprsave if memory is volatile
>>   kvmclock: restore paused KVM clock
>>   cpu: disable ticks when suspended
>>   vl: pause option
>>   gdbstub: gdb support for suspended state
>>   savevm: VMS_RESTART and cprsave restart
>>   vl: QEMU_START_FREEZE env var
>>   oslib: add qemu_clr_cloexec
>>   util: env var helpers
>>   osdep: import MADV_DOEXEC
>>   memory: ram_block_add cosmetic changes
>>   vl: add helper to request re-exec
>>   exec, memory: exec(3) to restart
>>   char: save/restore chardev pty fds
>>   char: restore terminal on restart
>>   pci: export pci_update_mappings
>>   vfio-pci: save and restore
>>   vfio-pci: trace pci config
>>   vfio-pci: improved tracing
>>
>>  MAINTAINERS                    |   7 ++
>>  accel/kvm/kvm-all.c            |   8 +-
>>  accel/kvm/trace-events         |   3 +-
>>  chardev/char-pty.c             |  38 +++++--
>>  chardev/char-socket.c          |  35 ++++++
>>  chardev/char-stdio.c           |   7 ++
>>  chardev/char.c                 |  16 +++
>>  exec.c                         |  88 +++++++++++++--
>>  gdbstub.c                      |  11 +-
>>  hmp-commands.hx                |  46 ++++++++
>>  hw/i386/kvm/clock.c            |   6 +-
>>  hw/pci/msix.c                  |   1 +
>>  hw/pci/pci.c                   |  17 +--
>>  hw/pci/trace-events            |   5 +-
>>  hw/vfio/common.c               | 115 ++++++++++++++++----
>>  hw/vfio/pci.c                  | 179 ++++++++++++++++++++++++++++++-
>>  hw/vfio/platform.c             |   2 +-
>>  hw/vfio/trace-events           |  11 +-
>>  hw/virtio/vhost.c              |  12 +++
>>  include/chardev/char.h         |   8 ++
>>  include/exec/memory.h          |   4 +
>>  include/hw/pci/pci.h           |   2 +
>>  include/hw/vfio/vfio-common.h  |   4 +-
>>  include/io/channel-socket.h    |   3 +-
>>  include/migration/register.h   |   3 +
>>  include/migration/vmstate.h    |  11 ++
>>  include/monitor/hmp.h          |   3 +
>>  include/qemu/cutils.h          |   1 +
>>  include/qemu/env.h             |  31 ++++++
>>  include/qemu/osdep.h           |   8 ++
>>  include/sysemu/sysemu.h        |  10 ++
>>  io/channel-socket.c            |  12 ++-
>>  io/net-listener.c              |   4 +-
>>  migration/block.c              |   1 +
>>  migration/migration.c          |   4 +-
>>  migration/ram.c                |   1 +
>>  migration/savevm.c             | 237 ++++++++++++++++++++++++++++++++++++-----
>>  migration/savevm.h             |   4 +-
>>  monitor/hmp-cmds.c             |  28 +++++
>>  monitor/qmp-cmds.c             |  16 +++
>>  monitor/qmp.c                  |  42 ++++++++
>>  qapi/migration.json            |  35 ++++++
>>  qapi/pragma.json               |   1 +
>>  qemu-options.hx                |   9 ++
>>  scsi/qemu-pr-helper.c          |   2 +-
>>  softmmu/vl.c                   |  65 ++++++++++-
>>  tests/qtest/tpm-emu.c          |   2 +-
>>  tests/test-char.c              |   2 +-
>>  tests/test-io-channel-socket.c |   4 +-
>>  trace-events                   |   2 +
>>  ui/vnc.c                       | 153 +++++++++++++++++++++-----
>>  util/Makefile.objs             |   2 +-
>>  util/env.c                     | 132 +++++++++++++++++++++++
>>  util/oslib-posix.c             |   9 ++
>>  util/oslib-win32.c             |   4 +
>>  55 files changed, 1331 insertions(+), 135 deletions(-)
>>  create mode 100644 include/qemu/env.h
>>  create mode 100644 util/env.c
>>
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
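
For readers following the restart method quoted above (patches 16-17 and 22-25 keep descriptors open across the qemu re-exec), here is a minimal sketch of the general POSIX technique, not code from the series. It assumes a descriptor is preserved by clearing FD_CLOEXEC and that its number is handed to the new process through an environment variable. All function names and the EXAMPLE_SAVED_FD variable are hypothetical and are not the helpers the patches add.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Clear FD_CLOEXEC so that fd stays open across exec(3).
 * Returns 0 on success, -1 on error (errno set by fcntl). */
int clear_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}

/* Record fd under a (hypothetical) environment variable name so the
 * re-exec'd process can find it.  Returns 0 on success, -1 on error. */
int save_fd_in_env(const char *name, int fd)
{
    char buf[16];

    assert(name != NULL);
    snprintf(buf, sizeof(buf), "%d", fd);
    return setenv(name, buf, 1);
}

/* Look up a descriptor saved by save_fd_in_env().
 * Returns the fd, or -1 if the variable is absent or malformed. */
int load_fd_from_env(const char *name)
{
    const char *val;
    char *end;
    long fd;

    assert(name != NULL);
    val = getenv(name);
    if (val == NULL || *val == '\0') {
        return -1;
    }
    errno = 0;
    fd = strtol(val, &end, 10);
    if (errno != 0 || *end != '\0' || fd < 0 || fd > INT_MAX) {
        return -1;
    }
    return (int)fd;
}

/* Replace the current process image with argv[0], keeping the
 * environment and any descriptors without FD_CLOEXEC.
 * Only returns (with -1) if the exec fails. */
int reexec_self(char **argv)
{
    return execvp(argv[0], argv);
}
```

A caller would run clear_cloexec(fd) and save_fd_in_env("EXAMPLE_SAVED_FD", fd) before reexec_self(argv); the new process would call load_fd_from_env("EXAMPLE_SAVED_FD") early in startup and reuse the descriptor instead of opening a new one.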