[PATCH v2 00/41] postcopy live migration

From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

After a long time, here is v2. This is the QEMU part.
The Linux kernel part is sent separately.

Changes v1 -> v2:
- split up patches for review
- buffered file refactored
- many bug fixes
  In particular, PV drivers now work with postcopy
- optimization/heuristic

Patches
1 - 30: refactoring existing code and preparation
31 - 37: implement postcopy itself (essential part)
38 - 41: some optimization/heuristic for postcopy

Intro
=====
This patch series implements postcopy live migration.[1]
As discussed at KVM Forum 2011, a dedicated character device is used for
distributed shared memory between the migration source and destination.
Now we can discuss/benchmark/compare with precopy. I believe there is
much room for improvement.

[1] http://wiki.qemu.org/Features/PostCopyLiveMigration

Usage
=====
You need to load the umem character device driver on the host before starting
migration. Postcopy can be used with both the tcg and kvm accelerators. The
implementation depends only on the Linux umem character device, and the
driver-dependent code is split out into a separate file.
I tested only the host page size == guest page size case, but the
implementation allows the host page size != guest page size case.
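
To illustrate the host page size != guest page size case, here is a minimal
sketch in Python. It is not code from this series. It assumes both page sizes
are powers of two and that a fault is resolved per host page; the helper name
and the sizes used are made up for illustration.

```python
def guest_pages_in_host_page(fault_offset, host_page_size, guest_page_size):
    """Return the guest-page offsets needed to resolve one fault.

    Hypothetical helper: assumes a fault is resolved per host page, so
    every guest page overlapping the faulting host page must be present.
    """
    if guest_page_size >= host_page_size:
        # One guest page covers the whole faulting host page.
        return [fault_offset & ~(guest_page_size - 1)]
    # Round down to the start of the faulting host page, then list the
    # smaller guest pages that make it up.
    host_start = fault_offset & ~(host_page_size - 1)
    return list(range(host_start, host_start + host_page_size, guest_page_size))
```

With 64K host pages and 4K guest pages, a single fault would need 16 guest
pages under these assumptions; with equal sizes it needs exactly one.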

The following options are added with this patch series.
- incoming part
  command line options
  -postcopy [-postcopy-flags <flags>]
  where <flags> changes behavior for benchmarking/debugging.
  Currently the following flags are available:
  0: default
  1: enable touching page request

  example:
  qemu -postcopy -incoming tcp:0:4444 -monitor stdio -machine accel=kvm

- outgoing part
  options for migrate command
  migrate [-p [-n] [-m]] URI [<prefault forward> [<prefault backward>]]
  -p: indicate postcopy migration
  -n: disable background transfer of pages. This is for benchmark/debugging
  -m: move the background transfer in postcopy mode (movebg mode)
  <prefault forward>: The number of forward pages sent along with an
                      on-demand page
  <prefault backward>: The number of backward pages sent along with an
                       on-demand page

  example:
  migrate -p -n tcp:<dest ip address>:4444 
  migrate -p -n -m tcp:<dest ip address>:4444 32 0
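
As a rough illustration of <prefault forward>/<prefault backward>, the sketch
below (Python, not code from this series) computes which page indexes might be
sent for one on-demand request. It assumes the extra pages are the immediate
neighbours of the faulted page, clipped to the RAM block, and that the faulted
page goes first. Both the ordering and the function name are assumptions.

```python
def pages_to_send(faulted_page, forward, backward, nr_pages):
    """Page indexes sent for one on-demand request (illustrative only)."""
    # Clip the prefault window to the boundaries of the RAM block.
    first = max(faulted_page - backward, 0)
    last = min(faulted_page + forward, nr_pages - 1)
    # Assumed ordering: the faulted page first, so the blocked access can
    # resume early, then forward neighbours, then backward neighbours.
    order = [faulted_page]
    order += range(faulted_page + 1, last + 1)
    order += range(faulted_page - 1, first - 1, -1)
    return order
```

With "32 0" as in the second example above, a fault on page 10 of a 20-page
block would yield pages 10 through 19 under these assumptions.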

TODO
====
- benchmark/evaluation. Especially how async page fault affects the result.
- improvement/optimization
  At the moment, the items I'm aware of are
  - making the incoming socket non-blocking with a thread
    As page compression is coming, it is impractical to do non-blocking reads
    and check whether the necessary data has been read.
  - touching pages in the incoming qemu process by fd handler seems suboptimal.
    creating a dedicated thread?
  - the outgoing handler seems suboptimal, causing latency.
- consider the possibility of using FUSE/CUSE
- don't fork umemd, but create a thread?
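
One possible shape for the "incoming socket with a thread" item is sketched
below in Python. It is only an illustration of the idea, not a proposal for
the actual code: a dedicated thread does blocking reads of whole records and
hands them to the main loop through a queue, so the main loop never has to
check whether enough data has arrived. The record layout (8-byte offset
followed by one 4096-byte page) is an assumption made up for the example.

```python
import queue
import socket
import threading

PAGE_SIZE = 4096    # assumed page size for this example
HEADER_SIZE = 8     # assumed record header: 64-bit page offset


def recv_exact(sock, n):
    """Blocking read of exactly n bytes; returns None on EOF."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return bytes(buf)


def reader_thread(sock, out):
    """Read complete (offset, page) records and queue them."""
    while True:
        header = recv_exact(sock, HEADER_SIZE)
        if header is None:
            break
        page = recv_exact(sock, PAGE_SIZE)
        if page is None:
            break
        out.put((int.from_bytes(header, "big"), page))
    out.put(None)  # end-of-stream marker


def demo():
    """Send one record in small fragments and collect what the reader queues."""
    src, dst = socket.socketpair()
    records = queue.Queue()
    t = threading.Thread(target=reader_thread, args=(dst, records))
    t.start()
    payload = (0x3000).to_bytes(HEADER_SIZE, "big") + b"\xaa" * PAGE_SIZE
    # Deliver the record in fragments, as a network might.
    for i in range(0, len(payload), 1000):
        src.sendall(payload[i:i + 1000])
    src.close()
    received = list(iter(records.get, None))
    t.join()
    dst.close()
    return received
```

Calling demo() returns the list of complete records seen by the main loop,
regardless of how the bytes were fragmented on the wire.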

basic postcopy work flow
========================
        qemu on the destination
              |
              V
        open(/dev/umem)
              |
              V
        UMEM_INIT
              |
              V
        Here we have two file descriptors to
        umem device and shmem file
              |
              |                                  umemd
              |                                  daemon on the destination
              |
              V    create pipe to communicate
        fork()---------------------------------------,
              |                                      |
              V                                      |
        close(socket)                                V
        close(shmem)                              mmap(shmem file)
              |                                      |
              V                                      V
        mmap(umem device) for guest RAM           close(shmem file)
              |                                      |
        close(umem device)                           |
              |                                      |
              V                                      |
        wait for ready from daemon <----pipe-----send ready message
              |                                      |
              |                                 Here the daemon takes over
        send ok------------pipe---------------> ownership of the socket
              |                                 to the source
              V                                      |
        entering post copy stage                     |
        start guest execution                        |
              |                                      |
              V                                      V
        access guest RAM                          read() to get faulted pages
              |                                      |
              V                                      V
        page fault ------------------------------>page offset is returned
        block                                        |
                                                     V
                                                  pull page from the source
                                                  write the page contents
                                                  to the shmem.
                                                     |
                                                     V
        unblock     <-----------------------------write() to tell served pages
        the fault handler returns the page
        page fault is resolved
              |
              |                                   pages can also be sent
              |                                   in the background
              |                                      |
              |                                      V
              |                                   write()
              |                                      |
              V                                      V
        The specified pages<-----pipe------------request to touch pages
        are made present by                          |
        touching guest RAM.                          |
              |                                      |
              V                                      V
             reply-------------pipe-------------> release the cached page
              |                                   madvise(MADV_REMOVE)
              |                                      |
              V                                      V

                 all the pages are pulled from the source

              |                                      |
              V                                      V
        the vma becomes anonymous<----------------UMEM_MAKE_VMA_ANONYMOUS
       (note: I'm not sure if this can be implemented or not)
              |                                      |
              V                                      V
        migration completes                        exit()
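
The on-demand and background paths in the diagram above can be mimicked by a
small in-process model. This is a sketch in Python under heavy
simplification, not the implementation: there is no umem device, fork, pipe or
socket, the page-touching and madvise(MADV_REMOVE) steps are left out, and all
class and method names are invented for illustration.

```python
class Source:
    """Stands in for the migration source, which still holds every page."""

    def __init__(self, pages):
        self.pages = pages  # offset -> page contents

    def pull(self, offset):
        return self.pages[offset]


class Umemd:
    """Stands in for the umemd daemon on the destination."""

    def __init__(self, source):
        self.source = source
        self.shmem = {}  # models the shmem file: offset -> page contents

    def serve_fault(self, offset):
        # Models: read() returns a faulted offset, the page is pulled from
        # the source and written to shmem, then write() marks it served.
        if offset not in self.shmem:
            self.shmem[offset] = self.source.pull(offset)
        return offset

    def background(self):
        # Models the background transfer of pages nobody faulted on yet.
        for offset in sorted(self.source.pages):
            if offset not in self.shmem:
                self.shmem[offset] = self.source.pull(offset)
                yield offset


class Guest:
    """Stands in for guest execution in qemu on the destination."""

    def __init__(self, daemon):
        self.daemon = daemon
        self.faults = 0

    def read(self, offset):
        if offset not in self.daemon.shmem:
            # Page fault: the access blocks until the daemon serves it.
            self.faults += 1
            self.daemon.serve_fault(offset)
        return self.daemon.shmem[offset]
```

In this model, the first access to a page costs one fault, while pages that
the background transfer has already delivered are read without faulting.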

Isaku Yamahata (41):
  arch_init: export sort_ram_list() and ram_save_block()
  arch_init: export RAM_SAVE_xxx flags for postcopy
  arch_init/ram_save: introduce constant for ram save version = 4
  arch_init: refactor host_from_stream_offset()
  arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case
  arch_init: refactor ram_save_block()
  arch_init/ram_save_live: factor out ram_save_limit
  arch_init/ram_load: refactor ram_load
  arch_init: introduce helper function to find ram block with id string
  arch_init: simplify a bit by ram_find_block()
  arch_init: factor out counting transferred bytes
  arch_init: factor out setting last_block, last_offset
  exec.c: factor out qemu_get_ram_ptr()
  exec.c: export last_ram_offset()
  savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip
  savevm: qemu_pending_size() to return pending buffered size
  savevm, buffered_file: introduce method to drain buffer of buffered
    file
  QEMUFile: add qemu_file_fd() for later use
  savevm/QEMUFile: drop qemu_stdio_fd
  savevm/QEMUFileSocket: drop duplicated member fd
  savevm: rename QEMUFileSocket to QEMUFileFD, socket_close to fd_close
  savevm/QEMUFile: introduce qemu_fopen_fd
  migration.c: remove redundant line in migrate_init()
  migration: export migrate_fd_completed() and migrate_fd_cleanup()
  migration: factor out parameters into MigrationParams
  buffered_file: factor out buffer management logic
  buffered_file: Introduce QEMUFileNonblock for nonblock write
  buffered_file: add qemu_file to read/write to buffer in memory
  umem.h: import Linux umem.h
  update-linux-headers.sh: teach umem.h to update-linux-headers.sh
  configure: add CONFIG_POSTCOPY option
  savevm: add new section that is used by postcopy
  postcopy: introduce -postcopy and -postcopy-flags option
  postcopy outgoing: add -p and -n option to migrate command
  postcopy: introduce helper functions for postcopy
  postcopy: implement incoming part of postcopy live migration
  postcopy: implement outgoing part of postcopy live migration
  postcopy/outgoing: add forward, backward option to specify the size
    of prefault
  postcopy/outgoing: implement prefault
  migrate: add -m (movebg) option to migrate command
  migration/postcopy: add movebg mode

 Makefile.target                 |    5 +
 arch_init.c                     |  298 ++++---
 arch_init.h                     |   20 +
 block-migration.c               |    8 +-
 buffered_file.c                 |  322 ++++++--
 buffered_file.h                 |   32 +
 configure                       |   12 +
 cpu-all.h                       |    9 +
 exec-obsolete.h                 |    1 +
 exec.c                          |   87 ++-
 hmp-commands.hx                 |   18 +-
 hmp.c                           |   10 +-
 linux-headers/linux/umem.h      |   42 +
 migration-exec.c                |   12 +-
 migration-fd.c                  |   25 +-
 migration-postcopy-stub.c       |   77 ++
 migration-postcopy.c            | 1771 +++++++++++++++++++++++++++++++++++++++
 migration-tcp.c                 |   25 +-
 migration-unix.c                |   26 +-
 migration.c                     |   97 ++-
 migration.h                     |   47 +-
 qapi-schema.json                |    4 +-
 qemu-common.h                   |    2 +
 qemu-file.h                     |    8 +-
 qemu-options.hx                 |   25 +
 qmp-commands.hx                 |    4 +-
 savevm.c                        |  177 ++++-
 scripts/update-linux-headers.sh |    2 +-
 sysemu.h                        |    4 +-
 umem.c                          |  364 ++++++++
 umem.h                          |  101 +++
 vl.c                            |   16 +-
 vmstate.h                       |    2 +-
 33 files changed, 3373 insertions(+), 280 deletions(-)
 create mode 100644 linux-headers/linux/umem.h
 create mode 100644 migration-postcopy-stub.c
 create mode 100644 migration-postcopy.c
 create mode 100644 umem.c
 create mode 100644 umem.h
