* [PATCH RFC 00/20] Add postcopy live migration support
From: Joshua Otto @ 2017-03-27  9:06 UTC
  To: xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, Joshua Otto,
	imhy.yang, hjarmstr

Hi,

We're a team of three fourth-year undergraduate software engineering students at
the University of Waterloo in Canada.  In late 2015 we posted on the list [1] to
ask for a project to undertake for our program's capstone design project, and
Andrew Cooper pointed us in the direction of the live migration implementation
as an area that could use some attention.  We were particularly interested in
post-copy live migration (as evaluated by [2] and discussed on the list at [3]),
and have been working on an implementation of this on and off since then.

We now have a working implementation of this scheme, and are submitting it for
comment.  The changes are also available as the 'postcopy' branch of the GitHub
repository at [4].

As a brief overview of our approach:
- We introduce a mechanism by which libxl can indicate to the libxc stream
  helper process that the iterative migration precopy loop should be terminated
  and postcopy should begin.
- At this point, we suspend the domain, collect the final set of dirty pfns and
  write these pfns (and _not_ their contents) into the stream.
- At the destination, the xc restore logic registers itself as a pager for the
  migrating domain, 'evicts' all of the pfns indicated by the sender as
  outstanding, and then resumes the domain at the destination.
- As the domain executes, the migration sender continues to push the remaining
  outstanding pages to the receiver in the background.  The receiver
  monitors both the stream for incoming page data and the paging ring event
  channel for page faults triggered by the guest.  Page faults are forwarded on
  the back-channel migration stream to the migration sender, which prioritizes
  these pages for transmission.
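
As a rough illustration of the receiver's event loop described above (all of
the function and field names in this sketch are hypothetical, not the actual
patch code):

    #include <poll.h>

    /* Sketch: multiplex the incoming migration stream and the paging
     * ring's event channel until no pages remain outstanding. */
    static int postcopy_receive_loop(struct xc_sr_context *ctx)
    {
        struct pollfd fds[2] = {
            { .fd = ctx->fd,                .events = POLLIN }, /* page data   */
            { .fd = ctx->restore.evtchn_fd, .events = POLLIN }, /* page faults */
        };

        while ( pages_outstanding(ctx) )
        {
            if ( poll(fds, 2, -1) < 0 )
                return -1;

            if ( fds[0].revents & POLLIN )
                /* Page data arrived from the sender: load it and unpause
                 * any vcpus paused waiting for one of these pages. */
                handle_incoming_page_data(ctx);

            if ( fds[1].revents & POLLIN )
                /* The guest faulted on an unmigrated page: forward its pfn
                 * over the back-channel so the sender transmits it first. */
                forward_fault_to_sender(ctx);
        }

        return 0;
    }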

By leveraging the existing paging API, we are able to implement the postcopy
scheme without any hypervisor modifications - all of our changes are confined to
the userspace toolstack.  However, we inherit from the paging API the
requirement that the domains be HVM and that the host have HAP/EPT support.

We haven't yet had the opportunity to perform a quantitative evaluation of the
performance trade-offs between the traditional pre-copy and our post-copy
strategies, but intend to.  Informally, we've been testing our implementation by
migrating a domain running the x86 memtest program (which is obviously a
tremendously write-heavy workload), and have observed a substantial reduction in
total time required for migration completion (at the expense of a visually
obvious 'slowdown' in the execution of the program).  We've also noticed that,
when performing a postcopy without any leading precopy iterations, the time
required at the destination to 'evict' all of the outstanding pages is
substantial - possibly because there is no batching mechanism by which pages can
be evicted - so this area in particular might require further attention.
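
For concreteness, the eviction step we have in mind is, in spirit, a per-page
loop over the paging interface along the following lines (a sketch only, using
the public xc_mem_paging_*() calls; the wrapper function itself is
hypothetical).  Each page costs a nominate/evict pair, so the total eviction
time grows linearly with the number of outstanding pfns:

    #include <xenctrl.h>

    /* Sketch: evict every pfn the sender reported as unsent, one
     * nominate/evict pair per page. */
    static int evict_outstanding_pfns(xc_interface *xch, uint32_t domid,
                                      const xen_pfn_t *pfns, size_t count)
    {
        size_t i;

        for ( i = 0; i < count; ++i )
        {
            /* Mark the frame as a paging candidate... */
            if ( xc_mem_paging_nominate(xch, domid, pfns[i]) )
                return -1;

            /* ...then drop it; subsequent guest accesses to this pfn
             * raise requests on the paging ring. */
            if ( xc_mem_paging_evict(xch, domid, pfns[i]) )
                return -1;
        }

        return 0;
    }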

We're really interested in any feedback you might have!

Thanks!

Harley Armstrong, Chester Lin, Joshua Otto

[1] https://lists.gt.net/xen/devel/410255
[2] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.184.2368
[3] https://lists.gt.net/xen/devel/261568
[4] https://github.com/jtotto/xen

Joshua Otto (20):
  tools: rename COLO 'postcopy' to 'aftercopy'
  libxc/xc_sr: parameterise write_record() on fd
  libxc/xc_sr_restore.c: use write_record() in
    send_checkpoint_dirty_pfn_list()
  libxc/xc_sr_save.c: add WRITE_TRIVIAL_RECORD_FN()
  libxc/xc_sr: factor out filter_pages()
  libxc/xc_sr: factor helpers out of handle_page_data()
  migration: defer precopy policy to libxl
  libxl/migration: add precopy tuning parameters
  libxc/xc_sr_save: introduce save batch types
  libxc/xc_sr_save.c: initialise rec.data before free()
  libxc/migration: correct hvm record ordering specification
  libxc/migration: specify postcopy live migration
  libxc/migration: add try_read_record()
  libxc/migration: implement the sender side of postcopy live migration
  libxc/migration: implement the receiver side of postcopy live
    migration
  libxl/libxl_stream_write.c: track callback chains with an explicit
    phase
  libxl/libxl_stream_read.c: track callback chains with an explicit
    phase
  libxl/migration: implement the sender side of postcopy live migration
  libxl/migration: implement the receiver side of postcopy live
    migration
  tools: expose postcopy live migration support in libxl and xl

 docs/specs/libxc-migration-stream.pandoc |  184 ++++-
 docs/specs/libxl-migration-stream.pandoc |   19 +-
 tools/libxc/include/xenguest.h           |  170 ++--
 tools/libxc/xc_nomigrate.c               |    3 +-
 tools/libxc/xc_private.c                 |   21 +-
 tools/libxc/xc_private.h                 |    2 +
 tools/libxc/xc_sr_common.c               |  118 ++-
 tools/libxc/xc_sr_common.h               |  152 +++-
 tools/libxc/xc_sr_common_x86.c           |    2 +-
 tools/libxc/xc_sr_restore.c              | 1297 +++++++++++++++++++++++++-----
 tools/libxc/xc_sr_restore_x86_hvm.c      |   38 +-
 tools/libxc/xc_sr_save.c                 |  828 +++++++++++++++----
 tools/libxc/xc_sr_save_x86_hvm.c         |   18 +-
 tools/libxc/xc_sr_save_x86_pv.c          |   17 +-
 tools/libxc/xc_sr_stream_format.h        |   15 +-
 tools/libxc/xg_save_restore.h            |   16 +-
 tools/libxl/libxl.h                      |   44 +-
 tools/libxl/libxl_colo_restore.c         |    2 +-
 tools/libxl/libxl_colo_save.c            |    2 +-
 tools/libxl/libxl_create.c               |  167 +++-
 tools/libxl/libxl_dom_save.c             |   55 +-
 tools/libxl/libxl_domain.c               |   41 +-
 tools/libxl/libxl_internal.h             |   79 +-
 tools/libxl/libxl_remus.c                |    2 +-
 tools/libxl/libxl_save_callout.c         |    3 +-
 tools/libxl/libxl_save_helper.c          |    7 +-
 tools/libxl/libxl_save_msgs_gen.pl       |   10 +-
 tools/libxl/libxl_sr_stream_format.h     |   13 +-
 tools/libxl/libxl_stream_read.c          |  136 +++-
 tools/libxl/libxl_stream_write.c         |  161 ++--
 tools/ocaml/libs/xl/xenlight_stubs.c     |    2 +-
 tools/xl/xl.h                            |    7 +-
 tools/xl/xl_cmdtable.c                   |   25 +-
 tools/xl/xl_migrate.c                    |   85 +-
 tools/xl/xl_vmcontrol.c                  |    8 +-
 35 files changed, 3144 insertions(+), 605 deletions(-)

-- 
2.7.4


* [PATCH RFC 01/20] tools: rename COLO 'postcopy' to 'aftercopy'
From: Joshua Otto @ 2017-03-27  9:06 UTC
  To: xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, Joshua Otto,
	imhy.yang, hjarmstr

The COLO xc domain save and restore procedures both make use of a 'postcopy'
callback to defer part of each checkpoint operation to xl.  In this context, the
name 'postcopy' is meant as "the callback invoked immediately after this
checkpoint's memory copy."  This is an unfortunate name collision with the
other common use of 'postcopy' in the context of live migration, where it is
used to mean "a memory migration that permits the guest to execute at the
destination before all of its memory is migrated by servicing accesses to
unmigrated memory via a network page-fault."

Mechanically rename 'postcopy' -> 'aftercopy' to free up the postcopy namespace
while preserving the original intent of the name in the COLO context.

No functional change.

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
---
 tools/libxc/include/xenguest.h     | 4 ++--
 tools/libxc/xc_sr_restore.c        | 4 ++--
 tools/libxc/xc_sr_save.c           | 4 ++--
 tools/libxl/libxl_colo_restore.c   | 2 +-
 tools/libxl/libxl_colo_save.c      | 2 +-
 tools/libxl/libxl_remus.c          | 2 +-
 tools/libxl/libxl_save_msgs_gen.pl | 2 +-
 7 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 40902ee..aa8cc8b 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -53,7 +53,7 @@ struct save_callbacks {
      * xc_domain_save then flushes the output buffer, while the
      *  guest continues to run.
      */
-    int (*postcopy)(void* data);
+    int (*aftercopy)(void* data);
 
     /* Called after the memory checkpoint has been flushed
      * out into the network. Typical actions performed in this
@@ -115,7 +115,7 @@ struct restore_callbacks {
      * Callback function resumes the guest & the device model,
      * returns to xc_domain_restore.
      */
-    int (*postcopy)(void* data);
+    int (*aftercopy)(void* data);
 
     /* A checkpoint record has been found in the stream.
      * returns: */
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 3549f0a..ee06b3d 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -576,7 +576,7 @@ static int handle_checkpoint(struct xc_sr_context *ctx)
                                                 ctx->restore.callbacks->data);
 
         /* Resume secondary vm */
-        ret = ctx->restore.callbacks->postcopy(ctx->restore.callbacks->data);
+        ret = ctx->restore.callbacks->aftercopy(ctx->restore.callbacks->data);
         HANDLE_CALLBACK_RETURN_VALUE(ret);
 
         /* Wait for a new checkpoint */
@@ -855,7 +855,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
     {
         /* this is COLO restore */
         assert(callbacks->suspend &&
-               callbacks->postcopy &&
+               callbacks->aftercopy &&
                callbacks->wait_checkpoint &&
                callbacks->restore_results);
     }
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index f98c827..fc63a55 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -863,7 +863,7 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
                 }
             }
 
-            rc = ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
+            rc = ctx->save.callbacks->aftercopy(ctx->save.callbacks->data);
             if ( rc <= 0 )
                 goto err;
 
@@ -951,7 +951,7 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
     if ( hvm )
         assert(callbacks->switch_qemu_logdirty);
     if ( ctx.save.checkpointed )
-        assert(callbacks->checkpoint && callbacks->postcopy);
+        assert(callbacks->checkpoint && callbacks->aftercopy);
     if ( ctx.save.checkpointed == XC_MIG_STREAM_COLO )
         assert(callbacks->wait_checkpoint);
 
diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
index 0c535bd..7d8f9ff 100644
--- a/tools/libxl/libxl_colo_restore.c
+++ b/tools/libxl/libxl_colo_restore.c
@@ -246,7 +246,7 @@ void libxl__colo_restore_setup(libxl__egc *egc,
     if (init_dsps(&crcs->dsps))
         goto out;
 
-    callbacks->postcopy = libxl__colo_restore_domain_resume_callback;
+    callbacks->aftercopy = libxl__colo_restore_domain_resume_callback;
     callbacks->wait_checkpoint = libxl__colo_restore_domain_wait_checkpoint_callback;
     callbacks->suspend = libxl__colo_restore_domain_suspend_callback;
     callbacks->checkpoint = libxl__colo_restore_domain_checkpoint_callback;
diff --git a/tools/libxl/libxl_colo_save.c b/tools/libxl/libxl_colo_save.c
index f687d5a..5921196 100644
--- a/tools/libxl/libxl_colo_save.c
+++ b/tools/libxl/libxl_colo_save.c
@@ -145,7 +145,7 @@ void libxl__colo_save_setup(libxl__egc *egc, libxl__colo_save_state *css)
 
     callbacks->suspend = libxl__colo_save_domain_suspend_callback;
     callbacks->checkpoint = libxl__colo_save_domain_checkpoint_callback;
-    callbacks->postcopy = libxl__colo_save_domain_resume_callback;
+    callbacks->aftercopy = libxl__colo_save_domain_resume_callback;
     callbacks->wait_checkpoint = libxl__colo_save_domain_wait_checkpoint_callback;
 
     libxl__checkpoint_devices_setup(egc, &dss->cds);
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index 29a4783..1453365 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -110,7 +110,7 @@ void libxl__remus_setup(libxl__egc *egc, libxl__remus_state *rs)
     dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
 
     callbacks->suspend = libxl__remus_domain_suspend_callback;
-    callbacks->postcopy = libxl__remus_domain_resume_callback;
+    callbacks->aftercopy = libxl__remus_domain_resume_callback;
     callbacks->checkpoint = libxl__remus_domain_save_checkpoint_callback;
 
     libxl__checkpoint_devices_setup(egc, cds);
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index 3ae7373..27845bb 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -24,7 +24,7 @@ our @msgs = (
                                                 'unsigned long', 'done',
                                                 'unsigned long', 'total'] ],
     [  3, 'srcxA',  "suspend", [] ],
-    [  4, 'srcxA',  "postcopy", [] ],
+    [  4, 'srcxA',  "aftercopy", [] ],
     [  5, 'srcxA',  "checkpoint", [] ],
     [  6, 'srcxA',  "wait_checkpoint", [] ],
     [  7, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
-- 
2.7.4


* [PATCH RFC 02/20] libxc/xc_sr: parameterise write_record() on fd
From: Joshua Otto @ 2017-03-27  9:06 UTC
  To: xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, Joshua Otto,
	imhy.yang, hjarmstr

Right now, write_split_record() - which is delegated to by
write_record() - implicitly writes to ctx->fd.  This means it can't be
used with the restore context's send_back_fd, which is unhandy.

Add an 'fd' parameter to both write_record() and write_split_record(),
and mechanically update all existing callsites to pass ctx->fd for it.

No functional change.

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
---
 tools/libxc/xc_sr_common.c       |  6 +++---
 tools/libxc/xc_sr_common.h       |  8 ++++----
 tools/libxc/xc_sr_common_x86.c   |  2 +-
 tools/libxc/xc_sr_save.c         |  6 +++---
 tools/libxc/xc_sr_save_x86_hvm.c |  5 +++--
 tools/libxc/xc_sr_save_x86_pv.c  | 17 +++++++++--------
 6 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 48fa676..c1babf6 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -52,8 +52,8 @@ const char *rec_type_to_str(uint32_t type)
     return "Reserved";
 }
 
-int write_split_record(struct xc_sr_context *ctx, struct xc_sr_record *rec,
-                       void *buf, size_t sz)
+int write_split_record(struct xc_sr_context *ctx, int fd,
+                       struct xc_sr_record *rec, void *buf, size_t sz)
 {
     static const char zeroes[(1u << REC_ALIGN_ORDER) - 1] = { 0 };
 
@@ -81,7 +81,7 @@ int write_split_record(struct xc_sr_context *ctx, struct xc_sr_record *rec,
     if ( sz )
         assert(buf);
 
-    if ( writev_exact(ctx->fd, parts, ARRAY_SIZE(parts)) )
+    if ( writev_exact(fd, parts, ARRAY_SIZE(parts)) )
         goto err;
 
     return 0;
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22a..2f33ccc 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -361,8 +361,8 @@ struct xc_sr_record
  *
  * Returns 0 on success and non0 on failure.
  */
-int write_split_record(struct xc_sr_context *ctx, struct xc_sr_record *rec,
-                       void *buf, size_t sz);
+int write_split_record(struct xc_sr_context *ctx, int fd,
+                       struct xc_sr_record *rec, void *buf, size_t sz);
 
 /*
  * Writes a record to the stream, applying correct padding where appropriate.
@@ -371,10 +371,10 @@ int write_split_record(struct xc_sr_context *ctx, struct xc_sr_record *rec,
  *
  * Returns 0 on success and non0 on failure.
  */
-static inline int write_record(struct xc_sr_context *ctx,
+static inline int write_record(struct xc_sr_context *ctx, int fd,
                                struct xc_sr_record *rec)
 {
-    return write_split_record(ctx, rec, NULL, 0);
+    return write_split_record(ctx, fd, rec, NULL, 0);
 }
 
 /*
diff --git a/tools/libxc/xc_sr_common_x86.c b/tools/libxc/xc_sr_common_x86.c
index 98f1cef..7b3dd50 100644
--- a/tools/libxc/xc_sr_common_x86.c
+++ b/tools/libxc/xc_sr_common_x86.c
@@ -18,7 +18,7 @@ int write_tsc_info(struct xc_sr_context *ctx)
         return -1;
     }
 
-    return write_record(ctx, &rec);
+    return write_record(ctx, ctx->fd, &rec);
 }
 
 int handle_tsc_info(struct xc_sr_context *ctx, struct xc_sr_record *rec)
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index fc63a55..61fc4a4 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -53,7 +53,7 @@ static int write_end_record(struct xc_sr_context *ctx)
 {
     struct xc_sr_record end = { REC_TYPE_END, 0, NULL };
 
-    return write_record(ctx, &end);
+    return write_record(ctx, ctx->fd, &end);
 }
 
 /*
@@ -63,7 +63,7 @@ static int write_checkpoint_record(struct xc_sr_context *ctx)
 {
     struct xc_sr_record checkpoint = { REC_TYPE_CHECKPOINT, 0, NULL };
 
-    return write_record(ctx, &checkpoint);
+    return write_record(ctx, ctx->fd, &checkpoint);
 }
 
 /*
@@ -646,7 +646,7 @@ static int verify_frames(struct xc_sr_context *ctx)
 
     DPRINTF("Enabling verify mode");
 
-    rc = write_record(ctx, &rec);
+    rc = write_record(ctx, ctx->fd, &rec);
     if ( rc )
         goto out;
 
diff --git a/tools/libxc/xc_sr_save_x86_hvm.c b/tools/libxc/xc_sr_save_x86_hvm.c
index e485928..ea4b780 100644
--- a/tools/libxc/xc_sr_save_x86_hvm.c
+++ b/tools/libxc/xc_sr_save_x86_hvm.c
@@ -42,7 +42,7 @@ static int write_hvm_context(struct xc_sr_context *ctx)
     }
 
     hvm_rec.length = hvm_buf_size;
-    rc = write_record(ctx, &hvm_rec);
+    rc = write_record(ctx, ctx->fd, &hvm_rec);
     if ( rc < 0 )
     {
         PERROR("error write HVM_CONTEXT record");
@@ -112,7 +112,8 @@ static int write_hvm_params(struct xc_sr_context *ctx)
         }
     }
 
-    rc = write_split_record(ctx, &rec, entries, hdr.count * sizeof(*entries));
+    rc = write_split_record(ctx, ctx->fd, &rec, entries,
+                            hdr.count * sizeof(*entries));
     if ( rc )
         PERROR("Failed to write HVM_PARAMS record");
 
diff --git a/tools/libxc/xc_sr_save_x86_pv.c b/tools/libxc/xc_sr_save_x86_pv.c
index f218d17..2b2c050 100644
--- a/tools/libxc/xc_sr_save_x86_pv.c
+++ b/tools/libxc/xc_sr_save_x86_pv.c
@@ -571,9 +571,9 @@ static int write_one_vcpu_basic(struct xc_sr_context *ctx, uint32_t id)
     }
 
     if ( ctx->x86_pv.width == 8 )
-        rc = write_split_record(ctx, &rec, &vcpu, sizeof(vcpu.x64));
+        rc = write_split_record(ctx, ctx->fd, &rec, &vcpu, sizeof(vcpu.x64));
     else
-        rc = write_split_record(ctx, &rec, &vcpu, sizeof(vcpu.x32));
+        rc = write_split_record(ctx, ctx->fd, &rec, &vcpu, sizeof(vcpu.x32));
 
  err:
     return rc;
@@ -609,7 +609,7 @@ static int write_one_vcpu_extended(struct xc_sr_context *ctx, uint32_t id)
         return -1;
     }
 
-    return write_split_record(ctx, &rec, &domctl.u.ext_vcpucontext,
+    return write_split_record(ctx, ctx->fd, &rec, &domctl.u.ext_vcpucontext,
                               domctl.u.ext_vcpucontext.size);
 }
 
@@ -664,7 +664,8 @@ static int write_one_vcpu_xsave(struct xc_sr_context *ctx, uint32_t id)
         goto err;
     }
 
-    rc = write_split_record(ctx, &rec, buffer, domctl.u.vcpuextstate.size);
+    rc = write_split_record(ctx, ctx->fd, &rec, buffer,
+                            domctl.u.vcpuextstate.size);
     if ( rc )
         goto err;
 
@@ -730,7 +731,7 @@ static int write_one_vcpu_msrs(struct xc_sr_context *ctx, uint32_t id)
         goto err;
     }
 
-    rc = write_split_record(ctx, &rec, buffer,
+    rc = write_split_record(ctx, ctx->fd, &rec, buffer,
                             domctl.u.vcpu_msrs.msr_count *
                             sizeof(xen_domctl_vcpu_msr_t));
     if ( rc )
@@ -805,7 +806,7 @@ static int write_x86_pv_info(struct xc_sr_context *ctx)
             .data = &info
         };
 
-    return write_record(ctx, &rec);
+    return write_record(ctx, ctx->fd, &rec);
 }
 
 /*
@@ -846,7 +847,7 @@ static int write_x86_pv_p2m_frames(struct xc_sr_context *ctx)
     else
         data = (uint64_t *)ctx->x86_pv.p2m_pfns;
 
-    rc = write_split_record(ctx, &rec, data, datasz);
+    rc = write_split_record(ctx, ctx->fd, &rec, data, datasz);
 
     if ( data != (uint64_t *)ctx->x86_pv.p2m_pfns )
         free(data);
@@ -866,7 +867,7 @@ static int write_shared_info(struct xc_sr_context *ctx)
         .data = ctx->x86_pv.shinfo,
     };
 
-    return write_record(ctx, &rec);
+    return write_record(ctx, ctx->fd, &rec);
 }
 
 /*
-- 
2.7.4


* [PATCH RFC 03/20] libxc/xc_sr_restore.c: use write_record() in send_checkpoint_dirty_pfn_list()
From: Joshua Otto @ 2017-03-27  9:06 UTC
  To: xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, Joshua Otto,
	imhy.yang, hjarmstr

Teach send_checkpoint_dirty_pfn_list() to use write_record()'s new fd
parameter, avoiding the need for a manual writev().

No functional change.

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
---
 tools/libxc/xc_sr_restore.c | 27 ++++-----------------------
 1 file changed, 4 insertions(+), 23 deletions(-)

diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index ee06b3d..481a904 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -420,7 +420,6 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx)
     int rc = -1;
     unsigned count, written;
     uint64_t i, *pfns = NULL;
-    struct iovec *iov = NULL;
     xc_shadow_op_stats_t stats = { 0, ctx->restore.p2m_size };
     struct xc_sr_record rec =
     {
@@ -467,35 +466,17 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx)
         pfns[written++] = i;
     }
 
-    /* iovec[] for writev(). */
-    iov = malloc(3 * sizeof(*iov));
-    if ( !iov )
-    {
-        ERROR("Unable to allocate memory for sending dirty bitmap");
-        goto err;
-    }
-
+    rec.data = pfns;
     rec.length = count * sizeof(*pfns);
 
-    iov[0].iov_base = &rec.type;
-    iov[0].iov_len = sizeof(rec.type);
-
-    iov[1].iov_base = &rec.length;
-    iov[1].iov_len = sizeof(rec.length);
-
-    iov[2].iov_base = pfns;
-    iov[2].iov_len = count * sizeof(*pfns);
-
-    if ( writev_exact(ctx->restore.send_back_fd, iov, 3) )
-    {
-        PERROR("Failed to write dirty bitmap to stream");
+    rc = write_record(ctx, ctx->restore.send_back_fd, &rec);
+    if ( rc )
         goto err;
-    }
 
     rc = 0;
+
  err:
     free(pfns);
-    free(iov);
     return rc;
 }
 
-- 
2.7.4


* [PATCH RFC 04/20] libxc/xc_sr_save.c: add WRITE_TRIVIAL_RECORD_FN()
From: Joshua Otto @ 2017-03-27  9:06 UTC
  To: xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, Joshua Otto,
	imhy.yang, hjarmstr

Writing the libxc save stream requires writing a few 'trivial' records,
consisting only of a header with a particular type.  As a readability
aid, it's nice to have obviously-named functions that write these sorts
of records into the stream - for example, the first such function was
write_end_record(), which reads much more pleasantly at its call-site
than write_generic_record(REC_TYPE_END) would.  However, it's tedious
and error-prone to copy-paste the generic body of such a function for
each new trivial record type.

Add a helper macro that takes a name base and a record type and defines
the corresponding trivial record write function.  Use this to re-define
the two existing trivial record functions, write_end_record() and
write_checkpoint_record().
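
(For example, WRITE_TRIVIAL_RECORD_FN(end, REC_TYPE_END) expands to a
write_end_record() whose body is identical to the function it replaces: it
builds a record { REC_TYPE_END, 0, NULL } and hands it to write_record().)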

No functional change.

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
---
 tools/libxc/xc_sr_save.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 61fc4a4..86f6903 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -47,24 +47,18 @@ static int write_headers(struct xc_sr_context *ctx, uint16_t guest_type)
 }
 
 /*
- * Writes an END record into the stream.
+ * Declares a helper function to write an empty record of a particular type.
  */
-static int write_end_record(struct xc_sr_context *ctx)
-{
-    struct xc_sr_record end = { REC_TYPE_END, 0, NULL };
-
-    return write_record(ctx, ctx->fd, &end);
-}
-
-/*
- * Writes a CHECKPOINT record into the stream.
- */
-static int write_checkpoint_record(struct xc_sr_context *ctx)
-{
-    struct xc_sr_record checkpoint = { REC_TYPE_CHECKPOINT, 0, NULL };
+#define WRITE_TRIVIAL_RECORD_FN(name, type)                         \
+    static int write_ ## name ## _record(struct xc_sr_context *ctx) \
+    {                                                               \
+        struct xc_sr_record name = { (type), 0, NULL };             \
+                                                                    \
+        return write_record(ctx, ctx->fd, &name);                   \
+    }
 
-    return write_record(ctx, ctx->fd, &checkpoint);
-}
+WRITE_TRIVIAL_RECORD_FN(end,                 REC_TYPE_END);
+WRITE_TRIVIAL_RECORD_FN(checkpoint,          REC_TYPE_CHECKPOINT);
 
 /*
  * Writes a batch of memory as a PAGE_DATA record into the stream.  The batch
-- 
2.7.4


* [PATCH RFC 05/20] libxc/xc_sr: factor out filter_pages()
From: Joshua Otto @ 2017-03-27  9:06 UTC
  To: xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, Joshua Otto,
	imhy.yang, hjarmstr

When processing a PAGE_DATA record, the restore side needs to set the
types of incoming pages using the appropriate restore op and filter the
list of pfns in the record to the subset that are 'backed' - i.e.
accompanied by real backing data in the stream that needs to be filled
in.

Both of these steps are also required when processing postcopy records,
so factor them out into common helpers.

No functional change.

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
---
 tools/libxc/xc_sr_restore.c | 100 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 75 insertions(+), 25 deletions(-)

diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 481a904..8574ee8 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -194,6 +194,68 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned count,
     return rc;
 }
 
+static void set_page_types(struct xc_sr_context *ctx, unsigned count,
+                           xen_pfn_t *pfns, uint32_t *types)
+{
+    unsigned i;
+
+    for ( i = 0; i < count; ++i )
+        ctx->restore.ops.set_page_type(ctx, pfns[i], types[i]);
+}
+
+/*
+ * Given count pfns and their types, allocate and fill in buffer bpfns with only
+ * those pfns that are 'backed' by real page data that needs to be migrated.
+ * The caller must later free() *bpfns.
+ *
+ * Returns 0 on success and non-0 on failure.  *bpfns can be free()ed even after
+ * failure.
+ */
+static int filter_pages(struct xc_sr_context *ctx,
+                        unsigned count,
+                        xen_pfn_t *pfns,
+                        uint32_t *types,
+                        /* OUT */ unsigned *nr_pages,
+                        /* OUT */ xen_pfn_t **bpfns)
+{
+    xc_interface *xch = ctx->xch;
+    unsigned i;
+
+    *nr_pages = 0;
+    *bpfns = malloc(count * sizeof(**bpfns));
+    if ( !(*bpfns) )
+    {
+        ERROR("Failed to allocate %zu bytes to process page data",
+              count * sizeof(**bpfns));
+        return -1;
+    }
+
+    for ( i = 0; i < count; ++i )
+    {
+        switch ( types[i] )
+        {
+        case XEN_DOMCTL_PFINFO_NOTAB:
+
+        case XEN_DOMCTL_PFINFO_L1TAB:
+        case XEN_DOMCTL_PFINFO_L1TAB | XEN_DOMCTL_PFINFO_LPINTAB:
+
+        case XEN_DOMCTL_PFINFO_L2TAB:
+        case XEN_DOMCTL_PFINFO_L2TAB | XEN_DOMCTL_PFINFO_LPINTAB:
+
+        case XEN_DOMCTL_PFINFO_L3TAB:
+        case XEN_DOMCTL_PFINFO_L3TAB | XEN_DOMCTL_PFINFO_LPINTAB:
+
+        case XEN_DOMCTL_PFINFO_L4TAB:
+        case XEN_DOMCTL_PFINFO_L4TAB | XEN_DOMCTL_PFINFO_LPINTAB:
+
+            (*bpfns)[(*nr_pages)++] = pfns[i];
+            break;
+        }
+    }
+
+    return 0;
+}
+
 /*
  * Given a list of pfns, their types, and a block of page data from the
  * stream, populate and record their types, map the relevant subset and copy
@@ -203,7 +265,7 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned count,
                              xen_pfn_t *pfns, uint32_t *types, void *page_data)
 {
     xc_interface *xch = ctx->xch;
-    xen_pfn_t *mfns = malloc(count * sizeof(*mfns));
+    xen_pfn_t *mfns = NULL;
     int *map_errs = malloc(count * sizeof(*map_errs));
     int rc;
     void *mapping = NULL, *guest_page = NULL;
@@ -211,11 +273,11 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned count,
         j,         /* j indexes the subset of pfns we decide to map. */
         nr_pages = 0;
 
-    if ( !mfns || !map_errs )
+    if ( !map_errs )
     {
         rc = -1;
         ERROR("Failed to allocate %zu bytes to process page data",
-              count * (sizeof(*mfns) + sizeof(*map_errs)));
+              count * sizeof(*map_errs));
         goto err;
     }
 
@@ -226,31 +288,19 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned count,
         goto err;
     }
 
-    for ( i = 0; i < count; ++i )
-    {
-        ctx->restore.ops.set_page_type(ctx, pfns[i], types[i]);
-
-        switch ( types[i] )
-        {
-        case XEN_DOMCTL_PFINFO_NOTAB:
-
-        case XEN_DOMCTL_PFINFO_L1TAB:
-        case XEN_DOMCTL_PFINFO_L1TAB | XEN_DOMCTL_PFINFO_LPINTAB:
-
-        case XEN_DOMCTL_PFINFO_L2TAB:
-        case XEN_DOMCTL_PFINFO_L2TAB | XEN_DOMCTL_PFINFO_LPINTAB:
+    set_page_types(ctx, count, pfns, types);
 
-        case XEN_DOMCTL_PFINFO_L3TAB:
-        case XEN_DOMCTL_PFINFO_L3TAB | XEN_DOMCTL_PFINFO_LPINTAB:
-
-        case XEN_DOMCTL_PFINFO_L4TAB:
-        case XEN_DOMCTL_PFINFO_L4TAB | XEN_DOMCTL_PFINFO_LPINTAB:
-
-            mfns[nr_pages++] = ctx->restore.ops.pfn_to_gfn(ctx, pfns[i]);
-            break;
-        }
+    rc = filter_pages(ctx, count, pfns, types, &nr_pages, &mfns);
+    if ( rc )
+    {
+        ERROR("Failed to filter mfns for batch of %u pages", count);
+        goto err;
     }
 
+    /* Map physically-backed pfns ('bpfns') to their gmfns. */
+    for ( i = 0; i < nr_pages; ++i )
+        mfns[i] = ctx->restore.ops.pfn_to_gfn(ctx, mfns[i]);
+
     /* Nothing to do? */
     if ( nr_pages == 0 )
         goto done;
-- 
2.7.4


* [PATCH RFC 06/20] libxc/xc_sr: factor helpers out of handle_page_data()
From: Joshua Otto @ 2017-03-27  9:06 UTC
  To: xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, Joshua Otto,
	imhy.yang, hjarmstr

When processing a PAGE_DATA record, the restore code:
1) applies a number of sanity checks on the record's headers and size
2) decodes the list of packed page info into pfns and their types
3) using the pfn and type info, populates and fills the pages into the
   guest using process_page_data()

Steps 1) and 2) are also useful at various other stages of postcopy live
migrations, so factor them into reusable helper routines.
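
(Hypothetically, a later postcopy record handler could then be structured as:

    rc = validate_pages_record(ctx, rec, REC_TYPE_POSTCOPY_PAGE_DATA);
    if ( rc )
        goto err;

    rc = decode_pages_record(ctx, rec->data, &pfns, &types, &pages_of_data);
    if ( rc )
        goto err;

where REC_TYPE_POSTCOPY_PAGE_DATA stands in for whichever record type the
later postcopy patches define.)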

No functional change.

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
---
 tools/libxc/xc_sr_common.c        | 38 +++++++++++++++-
 tools/libxc/xc_sr_common.h        | 10 +++++
 tools/libxc/xc_sr_restore.c       | 94 ++++++++++++++++++++++++---------------
 tools/libxc/xc_sr_save.c          |  2 +-
 tools/libxc/xc_sr_stream_format.h |  6 +--
 5 files changed, 109 insertions(+), 41 deletions(-)

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index c1babf6..f443974 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -140,13 +140,49 @@ int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec)
     return 0;
 };
 
+int validate_pages_record(struct xc_sr_context *ctx, struct xc_sr_record *rec,
+                          uint32_t expected_type)
+{
+    xc_interface *xch = ctx->xch;
+    struct xc_sr_rec_pages_header *pages = rec->data;
+
+    if ( rec->type != expected_type )
+    {
+        ERROR("%s record type expected, instead received record of type "
+              "%08x (%s)", rec_type_to_str(expected_type), rec->type,
+              rec_type_to_str(rec->type));
+        return -1;
+    }
+    else if ( rec->length < sizeof(*pages) )
+    {
+        ERROR("%s record truncated: length %u, min %zu",
+              rec_type_to_str(rec->type), rec->length, sizeof(*pages));
+        return -1;
+    }
+    else if ( pages->count < 1 )
+    {
+        ERROR("Expected at least 1 pfn in %s record",
+              rec_type_to_str(rec->type));
+        return -1;
+    }
+    else if ( rec->length < sizeof(*pages) + (pages->count * sizeof(uint64_t)) )
+    {
+        ERROR("%s record (length %u) too short to contain %u"
+              " pfns worth of information", rec_type_to_str(rec->type),
+              rec->length, pages->count);
+        return -1;
+    }
+
+    return 0;
+}
+
 static void __attribute__((unused)) build_assertions(void)
 {
     BUILD_BUG_ON(sizeof(struct xc_sr_ihdr) != 24);
     BUILD_BUG_ON(sizeof(struct xc_sr_dhdr) != 16);
     BUILD_BUG_ON(sizeof(struct xc_sr_rhdr) != 8);
 
-    BUILD_BUG_ON(sizeof(struct xc_sr_rec_page_data_header)  != 8);
+    BUILD_BUG_ON(sizeof(struct xc_sr_rec_pages_header)      != 8);
     BUILD_BUG_ON(sizeof(struct xc_sr_rec_x86_pv_info)       != 8);
     BUILD_BUG_ON(sizeof(struct xc_sr_rec_x86_pv_p2m_frames) != 8);
     BUILD_BUG_ON(sizeof(struct xc_sr_rec_x86_pv_vcpu_hdr)   != 8);
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 2f33ccc..b1aa88e 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -392,6 +392,16 @@ static inline int write_record(struct xc_sr_context *ctx, int fd,
 int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);
 
 /*
+ * Given a record of one of the page data types, validate it by:
+ * - checking its actual type against its specific expected type
+ * - sanity checking its actual length against its claimed length
+ *
+ * Returns 0 on success and non-0 on failure.
+ */
+int validate_pages_record(struct xc_sr_context *ctx, struct xc_sr_record *rec,
+                          uint32_t expected_type);
+
+/*
  * This would ideally be private in restore.c, but is needed by
  * x86_pv_localise_page() if we receive pagetables frames ahead of the
  * contents of the frames they point at.
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 8574ee8..4e3c472 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -376,39 +376,25 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned count,
 }
 
 /*
- * Validate a PAGE_DATA record from the stream, and pass the results to
- * process_page_data() to actually perform the legwork.
+ * Given a PAGE_DATA record, decode each packed entry into its encoded pfn and
+ * type, storing the results in newly-allocated pfns and types buffers that the
+ * caller must later free().  *pfns and *types may safely be free()ed even after
+ * failure.
  */
-static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
+static int decode_pages_record(struct xc_sr_context *ctx,
+                               struct xc_sr_rec_pages_header *pages,
+                               /* OUT */ xen_pfn_t **pfns,
+                               /* OUT */ uint32_t **types,
+                               /* OUT */ unsigned *pages_of_data)
 {
     xc_interface *xch = ctx->xch;
-    struct xc_sr_rec_page_data_header *pages = rec->data;
-    unsigned i, pages_of_data = 0;
-    int rc = -1;
-
-    xen_pfn_t *pfns = NULL, pfn;
-    uint32_t *types = NULL, type;
-
-    if ( rec->length < sizeof(*pages) )
-    {
-        ERROR("PAGE_DATA record truncated: length %u, min %zu",
-              rec->length, sizeof(*pages));
-        goto err;
-    }
-    else if ( pages->count < 1 )
-    {
-        ERROR("Expected at least 1 pfn in PAGE_DATA record");
-        goto err;
-    }
-    else if ( rec->length < sizeof(*pages) + (pages->count * sizeof(uint64_t)) )
-    {
-        ERROR("PAGE_DATA record (length %u) too short to contain %u"
-              " pfns worth of information", rec->length, pages->count);
-        goto err;
-    }
+    unsigned i;
+    xen_pfn_t pfn;
+    uint32_t type;
 
-    pfns = malloc(pages->count * sizeof(*pfns));
-    types = malloc(pages->count * sizeof(*types));
+    *pfns = malloc(pages->count * sizeof(**pfns));
+    *types = malloc(pages->count * sizeof(**types));
+    *pages_of_data = 0;
-    if ( !pfns || !types )
+    if ( !*pfns || !*types )
     {
         ERROR("Unable to allocate enough memory for %u pfns",
@@ -418,14 +404,14 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
 
     for ( i = 0; i < pages->count; ++i )
     {
-        pfn = pages->pfn[i] & PAGE_DATA_PFN_MASK;
+        pfn = pages->pfn[i] & REC_PFINFO_PFN_MASK;
         if ( !ctx->restore.ops.pfn_is_valid(ctx, pfn) )
         {
             ERROR("pfn %#"PRIpfn" (index %u) outside domain maximum", pfn, i);
             goto err;
         }
 
-        type = (pages->pfn[i] & PAGE_DATA_TYPE_MASK) >> 32;
+        type = (pages->pfn[i] & REC_PFINFO_TYPE_MASK) >> 32;
         if ( ((type >> XEN_DOMCTL_PFINFO_LTAB_SHIFT) >= 5) &&
              ((type >> XEN_DOMCTL_PFINFO_LTAB_SHIFT) <= 8) )
         {
@@ -434,14 +420,50 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
             goto err;
         }
         else if ( type < XEN_DOMCTL_PFINFO_BROKEN )
-            /* NOTAB and all L1 through L4 tables (including pinned) should
-             * have a page worth of data in the record. */
-            pages_of_data++;
+            /* NOTAB and all L1 through L4 tables (including pinned) require the
+             * migration of a page of real data. */
+            (*pages_of_data)++;
 
-        pfns[i] = pfn;
-        types[i] = type;
+        (*pfns)[i] = pfn;
+        (*types)[i] = type;
     }
 
+    return 0;
+
+ err:
+    free(*pfns);
+    *pfns = NULL;
+
+    free(*types);
+    *types = NULL;
+
+    *pages_of_data = 0;
+
+    return -1;
+}
+
+/*
+ * Validate a PAGE_DATA record from the stream, and pass the results to
+ * process_page_data() to actually perform the legwork.
+ */
+static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
+{
+    xc_interface *xch = ctx->xch;
+    struct xc_sr_rec_pages_header *pages = rec->data;
+    unsigned pages_of_data;
+    int rc = -1;
+
+    xen_pfn_t *pfns = NULL;
+    uint32_t *types = NULL;
+
+    rc = validate_pages_record(ctx, rec, REC_TYPE_PAGE_DATA);
+    if ( rc )
+        goto err;
+
+    rc = decode_pages_record(ctx, pages, &pfns, &types, &pages_of_data);
+    if ( rc )
+        goto err;
+
     if ( rec->length != (sizeof(*pages) +
                          (sizeof(uint64_t) * pages->count) +
                          (PAGE_SIZE * pages_of_data)) )
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 86f6903..797aec5 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -83,7 +83,7 @@ static int write_batch(struct xc_sr_context *ctx)
     void *page, *orig_page;
     uint64_t *rec_pfns = NULL;
     struct iovec *iov = NULL; int iovcnt = 0;
-    struct xc_sr_rec_page_data_header hdr = { 0 };
+    struct xc_sr_rec_pages_header hdr = { 0 };
     struct xc_sr_record rec =
     {
         .type = REC_TYPE_PAGE_DATA,
diff --git a/tools/libxc/xc_sr_stream_format.h b/tools/libxc/xc_sr_stream_format.h
index 3291b25..32400b2 100644
--- a/tools/libxc/xc_sr_stream_format.h
+++ b/tools/libxc/xc_sr_stream_format.h
@@ -80,15 +80,15 @@ struct xc_sr_rhdr
 #define REC_TYPE_OPTIONAL             0x80000000U
 
 /* PAGE_DATA */
-struct xc_sr_rec_page_data_header
+struct xc_sr_rec_pages_header
 {
     uint32_t count;
     uint32_t _res1;
     uint64_t pfn[0];
 };
 
-#define PAGE_DATA_PFN_MASK  0x000fffffffffffffULL
-#define PAGE_DATA_TYPE_MASK 0xf000000000000000ULL
+#define REC_PFINFO_PFN_MASK  0x000fffffffffffffULL
+#define REC_PFINFO_TYPE_MASK 0xf000000000000000ULL
 
 /* X86_PV_INFO */
 struct xc_sr_rec_x86_pv_info
-- 
2.7.4


* [PATCH RFC 07/20] migration: defer precopy policy to libxl
From: Joshua Otto @ 2017-03-27  9:06 UTC
  To: xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, Joshua Otto,
	imhy.yang, hjarmstr

The precopy phase of the xc_domain_save() live migration algorithm has
historically been implemented to run until either a) (almost) no pages
are dirty or b) some fixed, hard-coded maximum number of precopy
iterations has been exceeded.  This policy and its implementation are
less than ideal for a few reasons:
- the logic of the policy is intertwined with the control flow of the
  mechanism of the precopy stage
- it can't take into account facts external to the immediate
  migration context, such as interactive user input or the passage of
  wall-clock time
- it does not permit the user to change their mind, over time, about
  what to do at the end of the precopy (they get an unconditional
  transition into the stop-and-copy phase of the migration)

To permit users to implement arbitrary higher-level policies governing
when the live migration precopy phase should end, and what should be
done next:
- add a precopy_policy() callback to the xc_domain_save() user-supplied
  callbacks
- during the precopy phase of live migrations, consult this policy after
  each batch of pages transmitted and take the dictated action, which
  may be to a) abort the migration entirely, b) continue with the
  precopy, or c) proceed to the stop-and-copy phase.
- provide an implementation of the old policy as such a callback in
  libxl and plumb it through the IPC machinery to libxc, effectively
  maintaining the old policy for now (a sketch of such a callback follows
  below)
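
To illustrate the shape of such a callback - as a hypothetical sketch only,
using the constants (5 iterations, 50 dirty pages) that the old libxc code
hard-coded, rather than the exact libxl implementation:

    static int old_style_precopy_policy(struct precopy_stats stats, void *user)
    {
        /* Proceed to stop-and-copy once the guest is nearly clean... */
        if ( stats.dirty_count >= 0 && stats.dirty_count <= 50 )
            return XGS_POLICY_STOP_AND_COPY;

        /* ...or unconditionally once the iteration cap is reached. */
        if ( stats.iteration >= 5 )
            return XGS_POLICY_STOP_AND_COPY;

        return XGS_POLICY_CONTINUE_PRECOPY;
    }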

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
---
 tools/libxc/include/xenguest.h     |  23 ++++-
 tools/libxc/xc_nomigrate.c         |   3 +-
 tools/libxc/xc_sr_common.h         |   7 +-
 tools/libxc/xc_sr_save.c           | 194 ++++++++++++++++++++++++++-----------
 tools/libxl/libxl_dom_save.c       |  20 ++++
 tools/libxl/libxl_save_callout.c   |   3 +-
 tools/libxl/libxl_save_helper.c    |   7 +-
 tools/libxl/libxl_save_msgs_gen.pl |   4 +-
 8 files changed, 189 insertions(+), 72 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index aa8cc8b..30ffb6f 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -39,6 +39,14 @@
  */
 struct xenevtchn_handle;
 
+/* For save's precopy_policy(). */
+struct precopy_stats
+{
+    unsigned iteration;
+    unsigned total_written;
+    long dirty_count; /* -1 if unknown */
+};
+
 /* callbacks provided by xc_domain_save */
 struct save_callbacks {
     /* Called after expiration of checkpoint interval,
@@ -46,6 +54,17 @@ struct save_callbacks {
      */
     int (*suspend)(void* data);
 
+    /* Called after every batch of page data sent during the precopy phase of a
+     * live migration to ask the caller what to do next based on the current
+     * state of the precopy migration.
+     */
+#define XGS_POLICY_ABORT          (-1) /* Abandon the migration entirely and
+                                        * tidy up. */
+#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
+#define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and transmit the
+                                        * remaining dirty pages. */
+    int (*precopy_policy)(struct precopy_stats stats, void *data);
+
     /* Called after the guest's dirty pages have been
      *  copied into an output buffer.
      * Callback function resumes the guest & the device model,
@@ -100,8 +119,8 @@ typedef enum {
  *        doesn't use checkpointing
  * @return 0 on success, -1 on failure
  */
-int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
-                   uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
+                   uint32_t flags /* XCFLAGS_xxx */,
                    struct save_callbacks* callbacks, int hvm,
                    xc_migration_stream_t stream_type, int recv_fd);
 
diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c
index 15c838f..2af64e4 100644
--- a/tools/libxc/xc_nomigrate.c
+++ b/tools/libxc/xc_nomigrate.c
@@ -20,8 +20,7 @@
 #include <xenctrl.h>
 #include <xenguest.h>
 
-int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
-                   uint32_t max_factor, uint32_t flags,
+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t flags,
                    struct save_callbacks* callbacks, int hvm,
                    xc_migration_stream_t stream_type, int recv_fd)
 {
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index b1aa88e..a9160bd 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -198,12 +198,11 @@ struct xc_sr_context
             /* Further debugging information in the stream. */
             bool debug;
 
-            /* Parameters for tweaking live migration. */
-            unsigned max_iterations;
-            unsigned dirty_threshold;
-
             unsigned long p2m_size;
 
+            struct precopy_stats stats;
+            int policy_decision;
+
             xen_pfn_t *batch_pfns;
             unsigned nr_batch_pfns;
             unsigned long *deferred_pages;
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 797aec5..eb95334 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -271,13 +271,29 @@ static int write_batch(struct xc_sr_context *ctx)
 }
 
 /*
+ * Test if the batch is full.
+ */
+static bool batch_full(struct xc_sr_context *ctx)
+{
+    return ctx->save.nr_batch_pfns == MAX_BATCH_SIZE;
+}
+
+/*
+ * Test if the batch is empty.
+ */
+static bool batch_empty(struct xc_sr_context *ctx)
+{
+    return ctx->save.nr_batch_pfns == 0;
+}
+
+/*
  * Flush a batch of pfns into the stream.
  */
 static int flush_batch(struct xc_sr_context *ctx)
 {
     int rc = 0;
 
-    if ( ctx->save.nr_batch_pfns == 0 )
+    if ( batch_empty(ctx) )
         return rc;
 
     rc = write_batch(ctx);
@@ -293,19 +309,12 @@ static int flush_batch(struct xc_sr_context *ctx)
 }
 
 /*
- * Add a single pfn to the batch, flushing the batch if full.
+ * Add a single pfn to the batch.
  */
-static int add_to_batch(struct xc_sr_context *ctx, xen_pfn_t pfn)
+static void add_to_batch(struct xc_sr_context *ctx, xen_pfn_t pfn)
 {
-    int rc = 0;
-
-    if ( ctx->save.nr_batch_pfns == MAX_BATCH_SIZE )
-        rc = flush_batch(ctx);
-
-    if ( rc == 0 )
-        ctx->save.batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
-
-    return rc;
+    assert(ctx->save.nr_batch_pfns < MAX_BATCH_SIZE);
+    ctx->save.batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
 }
 
 /*
@@ -352,10 +361,15 @@ static int suspend_domain(struct xc_sr_context *ctx)
  * Send a subset of pages in the guests p2m, according to the dirty bitmap.
  * Used for each subsequent iteration of the live migration loop.
  *
+ * During the precopy stage of a live migration, test the user-supplied
+ * policy function after each batch of pages and cut off the operation
+ * early if indicated.  Unless aborting, the dirty pages remaining in this round
+ * are transferred into the deferred_pages bitmap.
+ *
  * Bitmap is bounded by p2m_size.
  */
 static int send_dirty_pages(struct xc_sr_context *ctx,
-                            unsigned long entries)
+                            unsigned long entries, bool precopy)
 {
     xc_interface *xch = ctx->xch;
     xen_pfn_t p;
@@ -364,31 +378,57 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
     DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
                                     &ctx->save.dirty_bitmap_hbuf);
 
-    for ( p = 0, written = 0; p < ctx->save.p2m_size; ++p )
+    int (*precopy_policy)(struct precopy_stats, void *) =
+        ctx->save.callbacks->precopy_policy;
+    void *data = ctx->save.callbacks->data;
+
+    assert(batch_empty(ctx));
+    for ( p = 0, written = 0; p < ctx->save.p2m_size; )
     {
-        if ( !test_bit(p, dirty_bitmap) )
-            continue;
+        if ( ctx->save.live && precopy )
+        {
+            ctx->save.policy_decision = precopy_policy(ctx->save.stats, data);
+            if ( ctx->save.policy_decision == XGS_POLICY_ABORT )
+            {
+                return -1;
+            }
+            else if ( ctx->save.policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
+            {
+                /* Any outstanding dirty pages are now deferred until the next
+                 * phase of the migration. */
+                bitmap_or(ctx->save.deferred_pages, dirty_bitmap,
+                          ctx->save.p2m_size);
+                if ( entries > written )
+                    ctx->save.nr_deferred_pages += entries - written;
+
+                goto done;
+            }
+        }
 
-        rc = add_to_batch(ctx, p);
+        for ( ; p < ctx->save.p2m_size && !batch_full(ctx); ++p )
+        {
+            if ( test_and_clear_bit(p, dirty_bitmap) )
+            {
+                add_to_batch(ctx, p);
+                ++written;
+                ++ctx->save.stats.total_written;
+            }
+        }
+
+        rc = flush_batch(ctx);
         if ( rc )
             return rc;
 
-        /* Update progress every 4MB worth of memory sent. */
-        if ( (written & ((1U << (22 - 12)) - 1)) == 0 )
-            xc_report_progress_step(xch, written, entries);
-
-        ++written;
+        /* Update progress after every batch (4MB) worth of memory sent. */
+        xc_report_progress_step(xch, written, entries);
     }
 
-    rc = flush_batch(ctx);
-    if ( rc )
-        return rc;
-
     if ( written > entries )
         DPRINTF("Bitmap contained more entries than expected...");
 
     xc_report_progress_step(xch, entries, entries);
 
+ done:
     return ctx->save.ops.check_vm_state(ctx);
 }
 
@@ -396,14 +436,14 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
  * Send all pages in the guests p2m.  Used as the first iteration of the live
  * migration loop, and for a non-live save.
  */
-static int send_all_pages(struct xc_sr_context *ctx)
+static int send_all_pages(struct xc_sr_context *ctx, bool precopy)
 {
     DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
                                     &ctx->save.dirty_bitmap_hbuf);
 
     bitmap_set(dirty_bitmap, ctx->save.p2m_size);
 
-    return send_dirty_pages(ctx, ctx->save.p2m_size);
+    return send_dirty_pages(ctx, ctx->save.p2m_size, precopy);
 }
 
 static int enable_logdirty(struct xc_sr_context *ctx)
@@ -446,8 +486,7 @@ static int update_progress_string(struct xc_sr_context *ctx,
     xc_interface *xch = ctx->xch;
     char *new_str = NULL;
 
-    if ( asprintf(&new_str, "Frames iteration %u of %u",
-                  iter, ctx->save.max_iterations) == -1 )
+    if ( asprintf(&new_str, "Frames iteration %u", iter) == -1 )
     {
         PERROR("Unable to allocate new progress string");
         return -1;
@@ -468,20 +507,47 @@ static int send_memory_live(struct xc_sr_context *ctx)
     xc_interface *xch = ctx->xch;
     xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
     char *progress_str = NULL;
-    unsigned x;
     int rc;
 
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->save.dirty_bitmap_hbuf);
+
+    int (*precopy_policy)(struct precopy_stats, void *) =
+        ctx->save.callbacks->precopy_policy;
+    void *data = ctx->save.callbacks->data;
+
     rc = update_progress_string(ctx, &progress_str, 0);
     if ( rc )
         goto out;
 
-    rc = send_all_pages(ctx);
+#define CONSULT_POLICY                                                        \
+    do {                                                                      \
+        if ( ctx->save.policy_decision == XGS_POLICY_ABORT )                  \
+        {                                                                     \
+            rc = -1;                                                          \
+            goto out;                                                         \
+        }                                                                     \
+        else if ( ctx->save.policy_decision != XGS_POLICY_CONTINUE_PRECOPY )  \
+        {                                                                     \
+            rc = 0;                                                           \
+            goto out;                                                         \
+        }                                                                     \
+    } while (0)
+
+    ctx->save.stats = (struct precopy_stats)
+        {
+            .iteration     = 0,
+            .total_written = 0,
+            .dirty_count   = -1
+        };
+    rc = send_all_pages(ctx, /* precopy */ true);
     if ( rc )
         goto out;
 
-    for ( x = 1;
-          ((x < ctx->save.max_iterations) &&
-           (stats.dirty_count > ctx->save.dirty_threshold)); ++x )
+    /* send_all_pages() has updated the stats */
+    CONSULT_POLICY;
+
+    for ( ctx->save.stats.iteration = 1; ; ++ctx->save.stats.iteration )
     {
         if ( xc_shadow_control(
                  xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
@@ -493,18 +559,42 @@ static int send_memory_live(struct xc_sr_context *ctx)
             goto out;
         }
 
-        if ( stats.dirty_count == 0 )
-            break;
+        /* Check the new dirty_count against the policy. */
+        ctx->save.stats.dirty_count = stats.dirty_count;
+        ctx->save.policy_decision = precopy_policy(ctx->save.stats, data);
+        if ( ctx->save.policy_decision == XGS_POLICY_ABORT )
+        {
+            rc = -1;
+            goto out;
+        }
+        else if ( ctx->save.policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
+        {
+            bitmap_or(ctx->save.deferred_pages, dirty_bitmap,
+                      ctx->save.p2m_size);
+            ctx->save.nr_deferred_pages += stats.dirty_count;
+            rc = 0;
+            goto out;
+        }
+
+        /* After this point we won't know how many pages are really dirty until
+         * the next iteration. */
+        ctx->save.stats.dirty_count = -1;
 
-        rc = update_progress_string(ctx, &progress_str, x);
+        rc = update_progress_string(ctx, &progress_str,
+                                    ctx->save.stats.iteration);
         if ( rc )
             goto out;
 
-        rc = send_dirty_pages(ctx, stats.dirty_count);
+        rc = send_dirty_pages(ctx, stats.dirty_count, /* precopy */ true);
         if ( rc )
             goto out;
+
+        /* send_dirty_pages() has updated the stats */
+        CONSULT_POLICY;
     }
 
+#undef CONSULT_POLICY
+
  out:
     xc_set_progress_prefix(xch, NULL);
     free(progress_str);
@@ -595,7 +685,7 @@ static int suspend_and_send_dirty(struct xc_sr_context *ctx)
     if ( ctx->save.live )
     {
         rc = update_progress_string(ctx, &progress_str,
-                                    ctx->save.max_iterations);
+                                    ctx->save.stats.iteration);
         if ( rc )
             goto out;
     }
@@ -614,7 +704,8 @@ static int suspend_and_send_dirty(struct xc_sr_context *ctx)
         }
     }
 
-    rc = send_dirty_pages(ctx, stats.dirty_count + ctx->save.nr_deferred_pages);
+    rc = send_dirty_pages(ctx, stats.dirty_count + ctx->save.nr_deferred_pages,
+                          /* precopy */ false);
     if ( rc )
         goto out;
 
@@ -645,7 +736,7 @@ static int verify_frames(struct xc_sr_context *ctx)
         goto out;
 
     xc_set_progress_prefix(xch, "Frames verify");
-    rc = send_all_pages(ctx);
+    rc = send_all_pages(ctx, /* precopy */ false);
     if ( rc )
         goto out;
 
@@ -719,7 +810,7 @@ static int send_domain_memory_nonlive(struct xc_sr_context *ctx)
 
     xc_set_progress_prefix(xch, "Frames");
 
-    rc = send_all_pages(ctx);
+    rc = send_all_pages(ctx, /* precopy */ false);
     if ( rc )
         goto err;
 
@@ -910,8 +1001,7 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
 };
 
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
-                   uint32_t max_iters, uint32_t max_factor, uint32_t flags,
-                   struct save_callbacks* callbacks, int hvm,
+                   uint32_t flags, struct save_callbacks* callbacks, int hvm,
                    xc_migration_stream_t stream_type, int recv_fd)
 {
     struct xc_sr_context ctx =
@@ -932,25 +1022,17 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
            stream_type == XC_MIG_STREAM_REMUS ||
            stream_type == XC_MIG_STREAM_COLO);
 
-    /*
-     * TODO: Find some time to better tweak the live migration algorithm.
-     *
-     * These parameters are better than the legacy algorithm especially for
-     * busy guests.
-     */
-    ctx.save.max_iterations = 5;
-    ctx.save.dirty_threshold = 50;
-
     /* Sanity checks for callbacks. */
     if ( hvm )
         assert(callbacks->switch_qemu_logdirty);
+    if ( ctx.save.live )
+        assert(callbacks->precopy_policy);
     if ( ctx.save.checkpointed )
         assert(callbacks->checkpoint && callbacks->aftercopy);
     if ( ctx.save.checkpointed == XC_MIG_STREAM_COLO )
         assert(callbacks->wait_checkpoint);
 
-    DPRINTF("fd %d, dom %u, max_iters %u, max_factor %u, flags %u, hvm %d",
-            io_fd, dom, max_iters, max_factor, flags, hvm);
+    DPRINTF("fd %d, dom %u, flags %u, hvm %d", io_fd, dom, flags, hvm);
 
     if ( xc_domain_getinfo(xch, dom, 1, &ctx.dominfo) != 1 )
     {
diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 77fe30e..6d28cce 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -328,6 +328,25 @@ int libxl__save_emulator_xenstore_data(libxl__domain_save_state *dss,
     return rc;
 }
 
+/*
+ * This is the live migration precopy policy - it's called periodically during
+ * the precopy phase of live migrations, and is responsible for deciding when
+ * the precopy phase should terminate and what should be done next.
+ *
+ * The policy implemented here behaves identically to the policy previously
+ * hard-coded into xc_domain_save() - it proceeds to the stop-and-copy phase of
+ * the live migration when either fewer than 50 pages are dirty or at least 5
+ * precopy rounds have completed.
+ */
+static int libxl__save_live_migration_simple_precopy_policy(
+    struct precopy_stats stats, void *user)
+{
+    return ((stats.dirty_count >= 0 && stats.dirty_count < 50) ||
+            stats.iteration >= 5)
+        ? XGS_POLICY_STOP_AND_COPY
+        : XGS_POLICY_CONTINUE_PRECOPY;
+}
+
 /*----- main code for saving, in order of execution -----*/
 
 void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
@@ -401,6 +420,7 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
     if (dss->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_NONE)
         callbacks->suspend = libxl__domain_suspend_callback;
 
+    callbacks->precopy_policy = libxl__save_live_migration_simple_precopy_policy;
     callbacks->switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty;
 
     dss->sws.ao  = dss->ao;
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 46b892c..026b572 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -89,8 +89,7 @@ void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_save_state *dss,
         libxl__srm_callout_enumcallbacks_save(&shs->callbacks.save.a);
 
     const unsigned long argnums[] = {
-        dss->domid, 0, 0, dss->xcflags, dss->hvm,
-        cbflags, dss->checkpointed_stream,
+        dss->domid, dss->xcflags, dss->hvm, cbflags, dss->checkpointed_stream,
     };
 
     shs->ao = ao;
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index d3def6b..0241a6b 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -251,8 +251,6 @@ int main(int argc, char **argv)
         io_fd =                             atoi(NEXTARG);
         recv_fd =                           atoi(NEXTARG);
         uint32_t dom =                      strtoul(NEXTARG,0,10);
-        uint32_t max_iters =                strtoul(NEXTARG,0,10);
-        uint32_t max_factor =               strtoul(NEXTARG,0,10);
         uint32_t flags =                    strtoul(NEXTARG,0,10);
         int hvm =                           atoi(NEXTARG);
         unsigned cbflags =                  strtoul(NEXTARG,0,10);
@@ -264,9 +262,8 @@ int main(int argc, char **argv)
         startup("save");
         setup_signals(save_signal_handler);
 
-        r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags,
-                           &helper_save_callbacks, hvm, stream_type,
-                           recv_fd);
+        r = xc_domain_save(xch, io_fd, dom, flags, &helper_save_callbacks, hvm,
+                           stream_type, recv_fd);
         complete(r);
 
     } else if (!strcmp(mode,"--restore-domain")) {
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index 27845bb..50c97b4 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -33,6 +33,7 @@ our @msgs = (
                                               'xen_pfn_t', 'console_gfn'] ],
     [  9, 'srW',    "complete",              [qw(int retval
                                                  int errnoval)] ],
+    [ 10, 'scxW',   "precopy_policy", ['struct precopy_stats', 'stats'] ]
 );
 
 #----------------------------------------
@@ -141,7 +142,8 @@ static void bytes_put(unsigned char *const buf, int *len,
 
 END
 
-foreach my $simpletype (qw(int uint16_t uint32_t unsigned), 'unsigned long', 'xen_pfn_t') {
+foreach my $simpletype (qw(int uint16_t uint32_t unsigned),
+                        'unsigned long', 'xen_pfn_t', 'struct precopy_stats') {
     my $typeid = typeid($simpletype);
     $out_body{'callout'} .= <<END;
 static int ${typeid}_get(const unsigned char **msg,
-- 
2.7.4



* [PATCH RFC 08/20] libxl/migration: add precopy tuning parameters
  2017-03-27  9:06 [PATCH RFC 00/20] Add postcopy live migration support Joshua Otto
                   ` (6 preceding siblings ...)
  2017-03-27  9:06 ` [PATCH RFC 07/20] migration: defer precopy policy to libxl Joshua Otto
@ 2017-03-27  9:06 ` Joshua Otto
  2017-03-29 21:08   ` Andrew Cooper
  2017-03-27  9:06 ` [PATCH RFC 09/20] libxc/xc_sr_save: introduce save batch types Joshua Otto
                   ` (13 subsequent siblings)
  21 siblings, 1 reply; 53+ messages in thread
From: Joshua Otto @ 2017-03-27  9:06 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, Joshua Otto,
	imhy.yang, hjarmstr

In the context of the live migration algorithm, the precopy iteration
count refers to the number of page-copying iterations performed prior to
the suspension of the guest and transmission of the final set of dirty
pages.  Similarly, the precopy dirty threshold refers to the dirty page
count below which we judge it more profitable to proceed to
stop-and-copy rather than continue with the precopy.  These would be
helpful tuning parameters to work with when migrating particularly busy
guests, as they enable an administrator to reap the available benefits
of the precopy algorithm (the transmission of guest pages _not_ in the
writable working set can be completed without guest downtime) while
reducing the total amount of time required for the migration (as
iterations of the precopy loop that will certainly be redundant can be
skipped in favour of an earlier suspension).

To expose these tuning parameters to users:
- introduce a new libxl API function, libxl_domain_live_migrate(),
  taking the same parameters as libxl_domain_suspend() _and_
  precopy_iterations and precopy_dirty_threshold parameters, and
  consider these parameters in the precopy policy

  (though a pair of new parameters on their own might not warrant an
  entirely new API function, it is added in anticipation of a number of
  additional migration-only parameters that would be cumbersome on the
  whole to tack on to the existing suspend API)

- switch xl migrate to the new libxl_domain_live_migrate() and add new
  --precopy-iterations and --precopy-threshold parameters to pass
  through
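
As a usage sketch, a busy guest could then be migrated with (the domain
and host names here are hypothetical):

    # At most 3 precopy rounds, suspending early once the dirty page
    # count falls to 200 or fewer at the end of a round:
    xl migrate --precopy-iterations 3 --precopy-threshold 200 busy-guest dsthost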

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
---
 tools/libxl/libxl.h          | 10 ++++++++++
 tools/libxl/libxl_dom_save.c | 20 +++++++++++---------
 tools/libxl/libxl_domain.c   | 27 +++++++++++++++++++++++++--
 tools/libxl/libxl_internal.h |  2 ++
 tools/xl/xl_cmdtable.c       | 22 +++++++++++++---------
 tools/xl/xl_migrate.c        | 31 +++++++++++++++++++++++++++----
 6 files changed, 88 insertions(+), 24 deletions(-)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 833f866..84ac96a 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1375,6 +1375,16 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
 #define LIBXL_SUSPEND_DEBUG 1
 #define LIBXL_SUSPEND_LIVE 2
 
+int libxl_domain_live_migrate(libxl_ctx *ctx, uint32_t domid, int fd,
+                              int flags, /* LIBXL_SUSPEND_* */
+                              unsigned int precopy_iterations,
+                              unsigned int precopy_dirty_threshold,
+                              const libxl_asyncop_how *ao_how)
+                              LIBXL_EXTERNAL_CALLERS_ONLY;
+
+#define LIBXL_LM_PRECOPY_ITERATIONS_DEFAULT 5
+#define LIBXL_LM_DIRTY_THRESHOLD_DEFAULT 50
+
 /* @param suspend_cancel [from xenctrl.h:xc_domain_resume( @param fast )]
  *   If this parameter is true, use co-operative resume. The guest
  *   must support this.
diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 6d28cce..10d5012 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -332,19 +332,21 @@ int libxl__save_emulator_xenstore_data(libxl__domain_save_state *dss,
  * This is the live migration precopy policy - it's called periodically during
  * the precopy phase of live migrations, and is responsible for deciding when
  * the precopy phase should terminate and what should be done next.
- *
- * The policy implemented here behaves identically to the policy previously
- * hard-coded into xc_domain_save() - it proceeds to the stop-and-copy phase of
- * the live migration when either fewer than 50 pages are dirty or at least 5
- * precopy rounds have completed.
  */
 static int libxl__save_live_migration_simple_precopy_policy(
     struct precopy_stats stats, void *user)
 {
-    return ((stats.dirty_count >= 0 && stats.dirty_count < 50) ||
-            stats.iteration >= 5)
-        ? XGS_POLICY_STOP_AND_COPY
-        : XGS_POLICY_CONTINUE_PRECOPY;
+    libxl__save_helper_state *shs = user;
+    libxl__domain_save_state *dss = shs->caller_state;
+
+    if (stats.dirty_count >= 0 &&
+        stats.dirty_count <= dss->precopy_dirty_threshold)
+        return XGS_POLICY_STOP_AND_COPY;
+
+    if (stats.iteration >= dss->precopy_iterations)
+        return XGS_POLICY_STOP_AND_COPY;
+
+    return XGS_POLICY_CONTINUE_PRECOPY;
 }
 
 /*----- main code for saving, in order of execution -----*/
diff --git a/tools/libxl/libxl_domain.c b/tools/libxl/libxl_domain.c
index 08eccd0..b1cf643 100644
--- a/tools/libxl/libxl_domain.c
+++ b/tools/libxl/libxl_domain.c
@@ -486,8 +486,10 @@ static void domain_suspend_cb(libxl__egc *egc,
 
 }
 
-int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd, int flags,
-                         const libxl_asyncop_how *ao_how)
+static int do_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd, int flags,
+                             unsigned int precopy_iterations,
+                             unsigned int precopy_dirty_threshold,
+                             const libxl_asyncop_how *ao_how)
 {
     AO_CREATE(ctx, domid, ao_how);
     int rc;
@@ -510,6 +512,8 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd, int flags,
     dss->live = flags & LIBXL_SUSPEND_LIVE;
     dss->debug = flags & LIBXL_SUSPEND_DEBUG;
     dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_NONE;
+    dss->precopy_iterations = precopy_iterations;
+    dss->precopy_dirty_threshold = precopy_dirty_threshold;
 
     rc = libxl__fd_flags_modify_save(gc, dss->fd,
                                      ~(O_NONBLOCK|O_NDELAY), 0,
@@ -523,6 +527,25 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd, int flags,
     return AO_CREATE_FAIL(rc);
 }
 
+int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd, int flags,
+                         const libxl_asyncop_how *ao_how)
+{
+    return do_domain_suspend(ctx, domid, fd, flags,
+                             LIBXL_LM_PRECOPY_ITERATIONS_DEFAULT,
+                             LIBXL_LM_DIRTY_THRESHOLD_DEFAULT, ao_how);
+}
+
+int libxl_domain_live_migrate(libxl_ctx *ctx, uint32_t domid, int fd, int flags,
+                              unsigned int precopy_iterations,
+                              unsigned int precopy_dirty_threshold,
+                              const libxl_asyncop_how *ao_how)
+{
+    flags |= LIBXL_SUSPEND_LIVE;
+
+    return do_domain_suspend(ctx, domid, fd, flags, precopy_iterations,
+                             precopy_dirty_threshold, ao_how);
+}
+
 int libxl_domain_pause(libxl_ctx *ctx, uint32_t domid)
 {
     int ret;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index f1d8f9a..45d607a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3292,6 +3292,8 @@ struct libxl__domain_save_state {
     int live;
     int debug;
     int checkpointed_stream;
+    unsigned int precopy_iterations;
+    unsigned int precopy_dirty_threshold;
     const libxl_domain_remus_info *remus;
     /* private */
     int rc;
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 7d97811..6df66fb 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -157,15 +157,19 @@ struct cmd_spec cmd_table[] = {
       &main_migrate, 0, 1,
       "Migrate a domain to another host",
       "[options] <Domain> <host>",
-      "-h              Print this help.\n"
-      "-C <config>     Send <config> instead of config file from creation.\n"
-      "-s <sshcommand> Use <sshcommand> instead of ssh.  String will be passed\n"
-      "                to sh. If empty, run <host> instead of ssh <host> xl\n"
-      "                migrate-receive [-d -e]\n"
-      "-e              Do not wait in the background (on <host>) for the death\n"
-      "                of the domain.\n"
-      "--debug         Print huge (!) amount of debug during the migration process.\n"
-      "-p              Do not unpause domain after migrating it."
+      "-h                   Print this help.\n"
+      "-C <config>          Send <config> instead of config file from creation.\n"
+      "-s <sshcommand>      Use <sshcommand> instead of ssh.  String will be passed\n"
+      "                     to sh. If empty, run <host> instead of ssh <host> xl\n"
+      "                     migrate-receive [-d -e]\n"
+      "-e                   Do not wait in the background (on <host>) for the death\n"
+      "                     of the domain.\n"
+      "--debug              Print huge (!) amount of debug during the migration process.\n"
+      "-p                   Do not unpause domain after migrating it.\n"
+      "--precopy-iterations Perform at most this many iterations of the precopy\n"
+      "                     memory migration loop before suspending the domain.\n"
+      "--precopy-threshold  If fewer than this many pages are dirty at the end of a\n"
+      "                     copy round, exit the precopy loop and suspend the domain."
     },
     { "restore",
       &main_restore, 0, 1,
diff --git a/tools/xl/xl_migrate.c b/tools/xl/xl_migrate.c
index 1f0e87d..1bb3fb4 100644
--- a/tools/xl/xl_migrate.c
+++ b/tools/xl/xl_migrate.c
@@ -177,7 +177,9 @@ static void migrate_do_preamble(int send_fd, int recv_fd, pid_t child,
 }
 
 static void migrate_domain(uint32_t domid, const char *rune, int debug,
-                           const char *override_config_file)
+                           const char *override_config_file,
+                           unsigned int precopy_iterations,
+                           unsigned int precopy_dirty_threshold)
 {
     pid_t child = -1;
     int rc;
@@ -205,7 +207,9 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
 
     if (debug)
         flags |= LIBXL_SUSPEND_DEBUG;
-    rc = libxl_domain_suspend(ctx, domid, send_fd, flags, NULL);
+    rc = libxl_domain_live_migrate(ctx, domid, send_fd, flags,
+                                   precopy_iterations, precopy_dirty_threshold,
+                                   NULL);
     if (rc) {
         fprintf(stderr, "migration sender: libxl_domain_suspend failed"
                 " (rc=%d)\n", rc);
@@ -537,13 +541,17 @@ int main_migrate(int argc, char **argv)
     char *rune = NULL;
     char *host;
     int opt, daemonize = 1, monitor = 1, debug = 0, pause_after_migration = 0;
+    int precopy_iterations = LIBXL_LM_PRECOPY_ITERATIONS_DEFAULT,
+        precopy_dirty_threshold = LIBXL_LM_DIRTY_THRESHOLD_DEFAULT;
     static struct option opts[] = {
         {"debug", 0, 0, 0x100},
         {"live", 0, 0, 0x200},
+        {"precopy-iterations", 1, 0, 'i'},
+        {"precopy-threshold", 1, 0, 'd'},
         COMMON_LONG_OPTS
     };
 
-    SWITCH_FOREACH_OPT(opt, "FC:s:ep", opts, "migrate", 2) {
+    SWITCH_FOREACH_OPT(opt, "FC:s:epi:d:", opts, "migrate", 2) {
     case 'C':
         config_filename = optarg;
         break;
@@ -560,6 +568,20 @@ int main_migrate(int argc, char **argv)
     case 'p':
         pause_after_migration = 1;
         break;
+    case 'i':
+        precopy_iterations = atoi(optarg);
+        if (precopy_iterations < 0) {
+            fprintf(stderr, "negative precopy iterations not supported\n");
+            return EXIT_FAILURE;
+        }
+        break;
+    case 'd':
+        precopy_dirty_threshold = atoi(optarg);
+        if (precopy_dirty_threshold < 0) {
+            fprintf(stderr, "negative dirty threshold not supported\n");
+            return EXIT_FAILURE;
+        }
+        break;
     case 0x100: /* --debug */
         debug = 1;
         break;
@@ -596,7 +618,8 @@ int main_migrate(int argc, char **argv)
                   pause_after_migration ? " -p" : "");
     }
 
-    migrate_domain(domid, rune, debug, config_filename);
+    migrate_domain(domid, rune, debug, config_filename, precopy_iterations,
+                   precopy_dirty_threshold);
     return EXIT_SUCCESS;
 }
 
-- 
2.7.4



* [PATCH RFC 09/20] libxc/xc_sr_save: introduce save batch types
  2017-03-27  9:06 [PATCH RFC 00/20] Add postcopy live migration support Joshua Otto
                   ` (7 preceding siblings ...)
  2017-03-27  9:06 ` [PATCH RFC 08/20] libxl/migration: add precopy tuning parameters Joshua Otto
@ 2017-03-27  9:06 ` Joshua Otto
  2017-03-27  9:06 ` [PATCH RFC 10/20] libxc/xc_sr_save.c: initialise rec.data before free() Joshua Otto
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 53+ messages in thread
From: Joshua Otto @ 2017-03-27  9:06 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, Joshua Otto,
	imhy.yang, hjarmstr

To write guest pages into the stream, the save logic builds up batches
of pfns to be written and performs all of the work necessary to write
them whenever a full batch has been accumulated.  Writing a PAGE_DATA
batch entails determining the types of all pfns in the batch, mapping
the subset of pfns that are backed by real memory, constructing a
PAGE_DATA record describing the batch, and writing everything into the
stream.

Postcopy live migration introduces several new types of batches.  To
enable the postcopy logic to re-use the bulk of the code used to manage
and write PAGE_DATA records, introduce a batch_type member to the save
context (which for now can take on only a single value), and refactor
write_batch() to take the batch_type into account when preparing and
writing each record.

While refactoring write_batch(), factor the operation of querying the
page types of a batch into a subroutine that is usable independently of
write_batch().

No functional change.
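
As a sketch of how a later patch might build on this (the
XC_SR_SAVE_BATCH_POSTCOPY_PFN name below is illustrative, not something
introduced here), a new batch type would be wired up by extending the
enum and the parallel lookup tables together:

    enum {
        XC_SR_SAVE_BATCH_PRECOPY_PAGE,
        XC_SR_SAVE_BATCH_POSTCOPY_PFN,      /* illustrative only */
    } batch_type;

    static const unsigned batch_sizes[] =
    {
        [XC_SR_SAVE_BATCH_PRECOPY_PAGE]  = MAX_PRECOPY_BATCH_SIZE,
        [XC_SR_SAVE_BATCH_POSTCOPY_PFN]  = 1024,    /* illustrative only */
    };

    /* ... with matching entries in batch_includes_contents[] (false here,
     * as only the pfns themselves would be sent) and batch_rec_types[]. */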

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
---
 tools/libxc/xc_sr_common.h    |   3 +
 tools/libxc/xc_sr_save.c      | 217 ++++++++++++++++++++++++++++--------------
 tools/libxc/xg_save_restore.h |   2 +-
 3 files changed, 151 insertions(+), 71 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a9160bd..ee463d9 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -203,6 +203,9 @@ struct xc_sr_context
             struct precopy_stats stats;
             int policy_decision;
 
+            enum {
+                XC_SR_SAVE_BATCH_PRECOPY_PAGE
+            } batch_type;
             xen_pfn_t *batch_pfns;
             unsigned nr_batch_pfns;
             unsigned long *deferred_pages;
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index eb95334..ac97d93 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -3,6 +3,23 @@
 
 #include "xc_sr_common.h"
 
+#define MAX_BATCH_SIZE MAX_PRECOPY_BATCH_SIZE
+
+static const unsigned batch_sizes[] =
+{
+    [XC_SR_SAVE_BATCH_PRECOPY_PAGE]  = MAX_PRECOPY_BATCH_SIZE
+};
+
+static const bool batch_includes_contents[] =
+{
+    [XC_SR_SAVE_BATCH_PRECOPY_PAGE] = true
+};
+
+static const uint32_t batch_rec_types[] =
+{
+    [XC_SR_SAVE_BATCH_PRECOPY_PAGE]  = REC_TYPE_PAGE_DATA
+};
+
 /*
  * Writes an Image header and Domain header into the stream.
  */
@@ -61,19 +78,80 @@ WRITE_TRIVIAL_RECORD_FN(end,                 REC_TYPE_END);
 WRITE_TRIVIAL_RECORD_FN(checkpoint,          REC_TYPE_CHECKPOINT);
 
 /*
+ * This function:
+ * - maps each pfn in the current batch to its gfn
+ * - gets the type of each pfn in the batch.
+ *
+ * The caller must free() both of the returned buffers.  Both pointers are safe
+ * to free() after failure.
+ */
+static int get_batch_info(struct xc_sr_context *ctx,
+                          /* OUT */ xen_pfn_t **p_mfns,
+                          /* OUT */ xen_pfn_t **p_types)
+{
+    int rc = -1;
+    unsigned nr_pfns = ctx->save.nr_batch_pfns;
+    xc_interface *xch = ctx->xch;
+    xen_pfn_t *mfns, *types;
+    unsigned i;
+
+    assert(p_mfns);
+    assert(p_types);
+
+    *p_mfns = mfns = malloc(nr_pfns * sizeof(*mfns));
+    *p_types = types = malloc(nr_pfns * sizeof(*types));
+
+    if ( !mfns || !types )
+    {
+        ERROR("Unable to allocate arrays for a batch of %u pages",
+              nr_pfns);
+        goto err;
+    }
+
+    for ( i = 0; i < nr_pfns; ++i )
+        types[i] = mfns[i] = ctx->save.ops.pfn_to_gfn(ctx,
+                                                      ctx->save.batch_pfns[i]);
+
+    /* The type query domctl accepts batches of at most 1024 pfns, so we need to
+     * break our batch here into appropriately-sized sub-batches. */
+    for ( i = 0; i < nr_pfns; i += 1024 )
+    {
+        rc = xc_get_pfn_type_batch(xch, ctx->domid, min(1024U, nr_pfns - i), &types[i]);
+        if ( rc )
+        {
+            PERROR("Failed to get types for pfn batch");
+            goto err;
+        }
+    }
+
+    rc = 0;
+    goto done;
+
+ err:
+    free(mfns);
+    *p_mfns = NULL;
+
+    free(types);
+    *p_types = NULL;
+
+ done:
+    return rc;
+}
+
+/*
  * Writes a batch of memory as a PAGE_DATA record into the stream.  The batch
  * is constructed in ctx->save.batch_pfns.
  *
  * This function:
- * - gets the types for each pfn in the batch.
  * - for each pfn with real data:
  *   - maps and attempts to localise the pages.
  * - construct and writes a PAGE_DATA record into the stream.
  */
-static int write_batch(struct xc_sr_context *ctx)
+static int write_batch(struct xc_sr_context *ctx, xen_pfn_t *mfns,
+                       xen_pfn_t *types)
 {
     xc_interface *xch = ctx->xch;
-    xen_pfn_t *mfns = NULL, *types = NULL;
+    xen_pfn_t *bmfns = NULL;
     void *guest_mapping = NULL;
     void **guest_data = NULL;
     void **local_pages = NULL;
@@ -84,17 +162,16 @@ static int write_batch(struct xc_sr_context *ctx)
     uint64_t *rec_pfns = NULL;
     struct iovec *iov = NULL; int iovcnt = 0;
     struct xc_sr_rec_pages_header hdr = { 0 };
+    bool send_page_contents = batch_includes_contents[ctx->save.batch_type];
     struct xc_sr_record rec =
     {
-        .type = REC_TYPE_PAGE_DATA,
+        .type = batch_rec_types[ctx->save.batch_type],
     };
 
     assert(nr_pfns != 0);
 
-    /* Mfns of the batch pfns. */
-    mfns = malloc(nr_pfns * sizeof(*mfns));
-    /* Types of the batch pfns. */
-    types = malloc(nr_pfns * sizeof(*types));
+    /* The subset of mfns that are physically-backed. */
+    bmfns = malloc(nr_pfns * sizeof(*bmfns));
     /* Errors from attempting to map the gfns. */
     errors = malloc(nr_pfns * sizeof(*errors));
     /* Pointers to page data to send.  Mapped gfns or local allocations. */
@@ -104,19 +181,16 @@ static int write_batch(struct xc_sr_context *ctx)
     /* iovec[] for writev(). */
     iov = malloc((nr_pfns + 4) * sizeof(*iov));
 
-    if ( !mfns || !types || !errors || !guest_data || !local_pages || !iov )
+    if ( !bmfns || !errors || !guest_data || !local_pages || !iov )
     {
         ERROR("Unable to allocate arrays for a batch of %u pages",
               nr_pfns);
         goto err;
     }
 
+    /* Mark likely-ballooned pages as deferred. */
     for ( i = 0; i < nr_pfns; ++i )
     {
-        types[i] = mfns[i] = ctx->save.ops.pfn_to_gfn(ctx,
-                                                      ctx->save.batch_pfns[i]);
-
-        /* Likely a ballooned page. */
         if ( mfns[i] == INVALID_MFN )
         {
             set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
@@ -124,39 +198,9 @@ static int write_batch(struct xc_sr_context *ctx)
         }
     }
 
-    rc = xc_get_pfn_type_batch(xch, ctx->domid, nr_pfns, types);
-    if ( rc )
-    {
-        PERROR("Failed to get types for pfn batch");
-        goto err;
-    }
-    rc = -1;
-
-    for ( i = 0; i < nr_pfns; ++i )
-    {
-        switch ( types[i] )
-        {
-        case XEN_DOMCTL_PFINFO_BROKEN:
-        case XEN_DOMCTL_PFINFO_XALLOC:
-        case XEN_DOMCTL_PFINFO_XTAB:
-            continue;
-        }
-
-        mfns[nr_pages++] = mfns[i];
-    }
-
-    if ( nr_pages > 0 )
+    if ( send_page_contents )
     {
-        guest_mapping = xenforeignmemory_map(xch->fmem,
-            ctx->domid, PROT_READ, nr_pages, mfns, errors);
-        if ( !guest_mapping )
-        {
-            PERROR("Failed to map guest pages");
-            goto err;
-        }
-        nr_pages_mapped = nr_pages;
-
-        for ( i = 0, p = 0; i < nr_pfns; ++i )
+        for ( i = 0; i < nr_pfns; ++i )
         {
             switch ( types[i] )
             {
@@ -166,36 +210,62 @@ static int write_batch(struct xc_sr_context *ctx)
                 continue;
             }
 
-            if ( errors[p] )
+            bmfns[nr_pages++] = mfns[i];
+        }
+
+        if ( nr_pages > 0 )
+        {
+            guest_mapping = xenforeignmemory_map(xch->fmem,
+                ctx->domid, PROT_READ, nr_pages, bmfns, errors);
+            if ( !guest_mapping )
             {
-                ERROR("Mapping of pfn %#"PRIpfn" (mfn %#"PRIpfn") failed %d",
-                      ctx->save.batch_pfns[i], mfns[p], errors[p]);
+                PERROR("Failed to map guest pages");
                 goto err;
             }
+            nr_pages_mapped = nr_pages;
 
-            orig_page = page = guest_mapping + (p * PAGE_SIZE);
-            rc = ctx->save.ops.normalise_page(ctx, types[i], &page);
+            for ( i = 0, p = 0; i < nr_pfns; ++i )
+            {
+                switch ( types[i] )
+                {
+                case XEN_DOMCTL_PFINFO_BROKEN:
+                case XEN_DOMCTL_PFINFO_XALLOC:
+                case XEN_DOMCTL_PFINFO_XTAB:
+                    continue;
+                }
+
+                if ( errors[p] )
+                {
+                    ERROR("Mapping of pfn %#"PRIpfn" (mfn %#"PRIpfn") failed %d",
+                          ctx->save.batch_pfns[i], bmfns[p], errors[p]);
+                    goto err;
+                }
 
-            if ( orig_page != page )
-                local_pages[i] = page;
+                orig_page = page = guest_mapping + (p * PAGE_SIZE);
+                rc = ctx->save.ops.normalise_page(ctx, types[i], &page);
 
-            if ( rc )
-            {
-                if ( rc == -1 && errno == EAGAIN )
+                if ( orig_page != page )
+                    local_pages[i] = page;
+
+                if ( rc )
                 {
-                    set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
-                    ++ctx->save.nr_deferred_pages;
-                    types[i] = XEN_DOMCTL_PFINFO_XTAB;
-                    --nr_pages;
+                    if ( rc == -1 && errno == EAGAIN )
+                    {
+                        set_bit(ctx->save.batch_pfns[i],
+                                ctx->save.deferred_pages);
+                        ++ctx->save.nr_deferred_pages;
+                        types[i] = XEN_DOMCTL_PFINFO_XTAB;
+                        --nr_pages;
+                    }
+                    else
+                        goto err;
                 }
                 else
-                    goto err;
-            }
-            else
-                guest_data[i] = page;
+                    guest_data[i] = page;
 
-            rc = -1;
-            ++p;
+                rc = -1;
+                ++p;
+            }
         }
     }
 
@@ -264,8 +334,7 @@ static int write_batch(struct xc_sr_context *ctx)
     free(local_pages);
     free(guest_data);
     free(errors);
-    free(types);
-    free(mfns);
+    free(bmfns);
 
     return rc;
 }
@@ -275,7 +344,7 @@ static int write_batch(struct xc_sr_context *ctx)
  */
 static bool batch_full(struct xc_sr_context *ctx)
 {
-    return ctx->save.nr_batch_pfns == MAX_BATCH_SIZE;
+    return ctx->save.nr_batch_pfns == batch_sizes[ctx->save.batch_type];
 }
 
 /*
@@ -292,11 +361,18 @@ static bool batch_empty(struct xc_sr_context *ctx)
 static int flush_batch(struct xc_sr_context *ctx)
 {
     int rc = 0;
+    xen_pfn_t *mfns = NULL, *types = NULL;
 
     if ( batch_empty(ctx) )
         return rc;
 
-    rc = write_batch(ctx);
+    rc = get_batch_info(ctx, &mfns, &types);
+    if ( rc )
+        return rc;
+
+    rc = write_batch(ctx, mfns, types);
+    free(mfns);
+    free(types);
 
     if ( !rc )
     {
@@ -313,7 +389,7 @@ static int flush_batch(struct xc_sr_context *ctx)
  */
 static void add_to_batch(struct xc_sr_context *ctx, xen_pfn_t pfn)
 {
-    assert(ctx->save.nr_batch_pfns < MAX_BATCH_SIZE);
+    assert(ctx->save.nr_batch_pfns < batch_sizes[ctx->save.batch_type]);
     ctx->save.batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
 }
 
@@ -383,6 +459,7 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
     void *data = ctx->save.callbacks->data;
 
     assert(batch_empty(ctx));
+    ctx->save.batch_type = XC_SR_SAVE_BATCH_PRECOPY_PAGE;
     for ( p = 0, written = 0; p < ctx->save.p2m_size; )
     {
         if ( ctx->save.live && precopy )
diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
index 303081d..40debf6 100644
--- a/tools/libxc/xg_save_restore.h
+++ b/tools/libxc/xg_save_restore.h
@@ -24,7 +24,7 @@
 ** We process save/restore/migrate in batches of pages; the below
 ** determines how many pages we (at maximum) deal with in each batch.
 */
-#define MAX_BATCH_SIZE 1024   /* up to 1024 pages (4MB) at a time */
+#define MAX_PRECOPY_BATCH_SIZE 1024   /* up to 1024 pages (4MB) at a time */
 
 /* When pinning page tables at the end of restore, we also use batching. */
 #define MAX_PIN_BATCH  1024
-- 
2.7.4



* [PATCH RFC 10/20] libxc/xc_sr_save.c: initialise rec.data before free()
  2017-03-27  9:06 [PATCH RFC 00/20] Add postcopy live migration support Joshua Otto
                   ` (8 preceding siblings ...)
  2017-03-27  9:06 ` [PATCH RFC 09/20] libxc/xc_sr_save: introduce save batch types Joshua Otto
@ 2017-03-27  9:06 ` Joshua Otto
  2017-03-28 19:59   ` Andrew Cooper
  2017-03-27  9:06 ` [PATCH RFC 11/20] libxc/migration: correct hvm record ordering specification Joshua Otto
                   ` (11 subsequent siblings)
  21 siblings, 1 reply; 53+ messages in thread
From: Joshua Otto @ 2017-03-27  9:06 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, Joshua Otto,
	imhy.yang, hjarmstr

colo_merge_secondary_dirty_bitmap() unconditionally free()s the .data
member of its local xc_sr_record structure rec on its exit path.
However, if the initial call to read_record() fails then this member is
uninitialised.  Initialise it.
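
The hazard has this shape (a minimal sketch, not the verbatim code):

    struct xc_sr_record rec;          /* rec.data holds stack garbage here */
    int rc;

    rc = read_record(ctx, fd, &rec);  /* on failure, rec.data is never written */
    /* ... */
    free(rec.data);                   /* unconditional: undefined behaviour
                                       * if read_record() failed */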

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
---
 tools/libxc/xc_sr_save.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index ac97d93..6acc8d3 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -681,7 +681,7 @@ static int send_memory_live(struct xc_sr_context *ctx)
 static int colo_merge_secondary_dirty_bitmap(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
-    struct xc_sr_record rec;
+    struct xc_sr_record rec = { 0, 0, NULL };
     uint64_t *pfns = NULL;
     uint64_t pfn;
     unsigned count, i;
-- 
2.7.4



* [PATCH RFC 11/20] libxc/migration: correct hvm record ordering specification
  2017-03-27  9:06 [PATCH RFC 00/20] Add postcopy live migration support Joshua Otto
                   ` (9 preceding siblings ...)
  2017-03-27  9:06 ` [PATCH RFC 10/20] libxc/xc_sr_save.c: initialise rec.data before free() Joshua Otto
@ 2017-03-27  9:06 ` Joshua Otto
  2017-03-27  9:06 ` [PATCH RFC 12/20] libxc/migration: specify postcopy live migration Joshua Otto
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 53+ messages in thread
From: Joshua Otto @ 2017-03-27  9:06 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, Joshua Otto,
	imhy.yang, hjarmstr

The libxc migration stream specification document asserts that, within
an hvm migration stream, "HVM_PARAMS must precede HVM_CONTEXT, as
certain parameters can affect the validity of architectural state in the
context."  This sounds reasonable, but the in-tree implementation of hvm
domain save actually writes these records in the _reverse_ order, with
HVM_CONTEXT first and HVM_PARAMS next.  This has been the case for the
entire history of that implementation, seemingly to no ill effect, so
update the spec to reflect this.

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
---
 docs/specs/libxc-migration-stream.pandoc | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/docs/specs/libxc-migration-stream.pandoc b/docs/specs/libxc-migration-stream.pandoc
index 31eba10..96a6cb0 100644
--- a/docs/specs/libxc-migration-stream.pandoc
+++ b/docs/specs/libxc-migration-stream.pandoc
@@ -668,11 +668,8 @@ A typical save record for an x86 HVM guest image would look like:
 2. Domain header
 3. Many PAGE\_DATA records
 4. TSC\_INFO
-5. HVM\_PARAMS
-6. HVM\_CONTEXT
-
-HVM\_PARAMS must precede HVM\_CONTEXT, as certain parameters can affect
-the validity of architectural state in the context.
+5. HVM\_CONTEXT
+6. HVM\_PARAMS
 
 
 Legacy Images (x86 only)
-- 
2.7.4



* [PATCH RFC 12/20] libxc/migration: specify postcopy live migration
  2017-03-27  9:06 [PATCH RFC 00/20] Add postcopy live migration support Joshua Otto
                   ` (10 preceding siblings ...)
  2017-03-27  9:06 ` [PATCH RFC 11/20] libxc/migration: correct hvm record ordering specification Joshua Otto
@ 2017-03-27  9:06 ` Joshua Otto
  2017-03-27  9:06 ` [PATCH RFC 13/20] libxc/migration: add try_read_record() Joshua Otto
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 53+ messages in thread
From: Joshua Otto @ 2017-03-27  9:06 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, Joshua Otto,
	imhy.yang, hjarmstr

- allocate the new postcopy record type numbers
- augment the stream format specification to include these new types and
  their role in the protocol
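
As a minimal receiver-side sketch (handle_pfn() is a hypothetical
callback), the body of one of the pfn-list records specified below
(POSTCOPY_PFNS or POSTCOPY_FAULT) could be walked like this:

    /* Assumes the 8-octet header (4-byte count, 4 reserved bytes) shown
     * in the record diagrams, followed by count 64-bit pfn entries. */
    static int walk_pfn_list_record(struct xc_sr_record *rec)
    {
        struct xc_sr_rec_pages_header *hdr = rec->data;
        uint64_t *pfns = (uint64_t *)((char *)rec->data + sizeof(*hdr));
        uint32_t i;

        if ( rec->length < sizeof(*hdr) + (uint64_t)hdr->count * sizeof(*pfns) )
            return -1;                  /* malformed record */

        for ( i = 0; i < hdr->count; ++i )
            handle_pfn(pfns[i]);        /* hypothetical */

        return 0;
    }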

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
---
 docs/specs/libxc-migration-stream.pandoc | 177 ++++++++++++++++++++++++++++++-
 tools/libxc/xc_sr_common.c               |   7 ++
 tools/libxc/xc_sr_stream_format.h        |   9 +-
 3 files changed, 191 insertions(+), 2 deletions(-)

diff --git a/docs/specs/libxc-migration-stream.pandoc b/docs/specs/libxc-migration-stream.pandoc
index 96a6cb0..8ff8da5 100644
--- a/docs/specs/libxc-migration-stream.pandoc
+++ b/docs/specs/libxc-migration-stream.pandoc
@@ -3,7 +3,8 @@
   Andrew Cooper <<andrew.cooper3@citrix.com>>
   Wen Congyang <<wency@cn.fujitsu.com>>
   Yang Hongyang <<hongyang.yang@easystack.cn>>
-% Revision 1
+  Joshua Otto <<jtotto@uwaterloo.ca>>
+% Revision 2
 
 Introduction
 ============
@@ -231,6 +232,20 @@ type         0x00000000: END
 
              0x0000000F: CHECKPOINT_DIRTY_PFN_LIST (Secondary -> Primary)
 
+             0x00000010: POSTCOPY_BEGIN
+
+             0x00000011: POSTCOPY_PFNS_BEGIN
+
+             0x00000012: POSTCOPY_PFNS
+
+             0x00000013: POSTCOPY_TRANSITION
+
+             0x00000014: POSTCOPY_PAGE_DATA
+
+             0x00000015: POSTCOPY_FAULT
+
+             0x00000016: POSTCOPY_COMPLETE
+
              0x00000010 - 0x7FFFFFFF: Reserved for future _mandatory_
              records.
 
@@ -624,6 +639,142 @@ The count of pfns is: record->length/sizeof(uint64_t).
 
 \clearpage
 
+POSTCOPY_BEGIN
+--------------
+
+This record must only appear in a truly _live_ migration stream, and is
+transmitted by the migration sender to signal to the destination that
+the migration will (as soon as possible) transition from the memory
+pre-copy phase to the post-copy phase, during which remaining unmigrated
+domain memory is paged over the network on-demand _after_ the guest has
+resumed.
+
+This record _must_ be followed immediately by the domain CPU context
+records (e.g. TSC_INFO, HVM_CONTEXT and HVM_PARAMS for HVM domains).
+This is for practical reasons: in the HVM case, the PAGING_RING_PFN
+parameter must be known at the destination before preparation for paging
+can begin.
+
+This record contains no fields; its body_length is 0.
+
+\clearpage
+
+POSTCOPY_PFNS_BEGIN
+-------------------
+
+During the initiation sequence of a postcopy live migration, this record
+immediately follows the final domain CPU context record and indicates
+the beginning of a sequence of 0 or more POSTCOPY_PFNS records.  The
+destination uses this record as a cue to prepare for postcopy paging.
+
+This record contains no fields; its body_length is 0.
+
+\clearpage
+
+POSTCOPY_PFNS
+-------------
+
+Each POSTCOPY_PFNS record contains an unordered list of 'postcopy PFNS'
+- i.e. pfns that are dirty at the sender and require migration during
+the postcopy phase.  The structure of the record is identical to that of
+the PAGE_DATA record type, but omitting any actual trailing page
+contents.
+
+     0     1     2     3     4     5     6     7 octet
+    +-----------------------+-------------------------+
+    | count (C)             | (reserved)              |
+    +-----------------------+-------------------------+
+    | pfn[0]                                          |
+    +-------------------------------------------------+
+    ...
+    +-------------------------------------------------+
+    | pfn[C-1]                                        |
+    +-------------------------------------------------+
+
+\clearpage
+
+POSTCOPY_TRANSITION
+-------------------
+
+This record is transmitted by a postcopy live migration sender after the
+final POSTCOPY_PFNS record, and indicates that the embedded libxc stream
+will be interrupted by content in the higher-layer stream necessary to
+permit resumption of the domain at the destination, and further that,
+when the higher-layer content is complete, the domain should be resumed
+in postcopy mode at the destination.
+
+This record contains no fields; its body_length is 0.
+
+\clearpage
+
+POSTCOPY_PAGE_DATA
+------------------
+
+This record is identical in meaning and format to the PAGE_DATA record
+type, and is transmitted during live migration by the sender during the
+postcopy phase to transfer batches of outstanding domain memory.
+
+     0     1     2     3     4     5     6     7 octet
+    +-----------------------+-------------------------+
+    | count (C)             | (reserved)              |
+    +-----------------------+-------------------------+
+    | pfn[0]                                          |
+    +-------------------------------------------------+
+    ...
+    +-------------------------------------------------+
+    | pfn[C-1]                                        |
+    +-------------------------------------------------+
+    | page_data[0]...                                 |
+    ...
+    +-------------------------------------------------+
+    | page_data[C-1]...                               |
+    ...
+    +-------------------------------------------------+
+
+It is an error for an XTAB, BROKEN or XALLOC pfn to be transmitted in a
+record of this type, so all pfns must be accompanied by backing data.
+It is an error for a pfn not previously included in a POSTCOPY_PFNS
+record to be included in a record of this type.
+
+\clearpage
+
+POSTCOPY_FAULT
+--------------
+
+A POSTCOPY_FAULT record is transmitted by a postcopy live migration
+_destination_ to communicate an urgent need for a batch of pfns.  It is
+identical in format to the POSTCOPY_PFNS record type, _except_ that the
+type of each page is not encoded in the transmitted pfns.
+
+     0     1     2     3     4     5     6     7 octet
+    +-----------------------+-------------------------+
+    | count (C)             | (reserved)              |
+    +-----------------------+-------------------------+
+    | pfn[0]                                          |
+    +-------------------------------------------------+
+    ...
+    +-------------------------------------------------+
+    | pfn[C-1]                                        |
+    +-------------------------------------------------+
+
+\clearpage
+
+POSTCOPY_COMPLETE
+-----------------
+
+A postcopy live migration _destination_ transmits a POSTCOPY_COMPLETE
+record when the postcopy phase of a migration is complete, if one was
+entered.
+
+This record contains no fields; its body_length is 0.
+
+In addition to reporting the phase completion to the sender, this record
+also enables the migration sender to flush its receive stream of
+in-flight POSTCOPY_FAULT records before handing control of the stream
+back to a higher layer.
+
+\clearpage
+
 Layout
 ======
 
@@ -671,6 +822,30 @@ A typical save record for an x86 HVM guest image would look like:
 5. HVM\_CONTEXT
 6. HVM\_PARAMS
 
+x86 HVM Postcopy Live Migration
+-------------------------------
+
+The bi-directional migration stream for postcopy live migration of an
+x86 HVM guest image would look like:
+
+ 1. Image header
+ 2. Domain header
+ 3. Many (or few!) PAGE\_DATA records
+ 4. POSTCOPY\_BEGIN
+ 5. TSC\_INFO
+ 6. HVM\_CONTEXT
+ 7. HVM\_PARAMS
+ 8. POSTCOPY\_PFNS\_BEGIN
+ 9. Many POSTCOPY\_PFNS records
+10. POSTCOPY\_TRANSITION
+... higher layer stream content ...
+11. Many POSTCOPY\_PAGE\_DATA records
+
+During 11, the destination would reply with (hopefully not too) many
+POSTCOPY\_FAULT records.
+
+After 11, the destination would transmit a final POSTCOPY\_COMPLETE.
+
 
 Legacy Images (x86 only)
 ========================
diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index f443974..090b5fd 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -38,6 +38,13 @@ static const char *mandatory_rec_types[] =
     [REC_TYPE_VERIFY]                       = "Verify",
     [REC_TYPE_CHECKPOINT]                   = "Checkpoint",
     [REC_TYPE_CHECKPOINT_DIRTY_PFN_LIST]    = "Checkpoint dirty pfn list",
+    [REC_TYPE_POSTCOPY_BEGIN]               = "Postcopy begin",
+    [REC_TYPE_POSTCOPY_PFNS_BEGIN]          = "Postcopy pfns begin",
+    [REC_TYPE_POSTCOPY_PFNS]                = "Postcopy pfns",
+    [REC_TYPE_POSTCOPY_TRANSITION]          = "Postcopy transition",
+    [REC_TYPE_POSTCOPY_PAGE_DATA]           = "Postcopy page data",
+    [REC_TYPE_POSTCOPY_FAULT]               = "Postcopy fault",
+    [REC_TYPE_POSTCOPY_COMPLETE]            = "Postcopy complete",
 };
 
 const char *rec_type_to_str(uint32_t type)
diff --git a/tools/libxc/xc_sr_stream_format.h b/tools/libxc/xc_sr_stream_format.h
index 32400b2..d16d0c7 100644
--- a/tools/libxc/xc_sr_stream_format.h
+++ b/tools/libxc/xc_sr_stream_format.h
@@ -76,10 +76,17 @@ struct xc_sr_rhdr
 #define REC_TYPE_VERIFY                     0x0000000dU
 #define REC_TYPE_CHECKPOINT                 0x0000000eU
 #define REC_TYPE_CHECKPOINT_DIRTY_PFN_LIST  0x0000000fU
+#define REC_TYPE_POSTCOPY_BEGIN             0x00000010U
+#define REC_TYPE_POSTCOPY_PFNS_BEGIN        0x00000011U
+#define REC_TYPE_POSTCOPY_PFNS              0x00000012U
+#define REC_TYPE_POSTCOPY_TRANSITION        0x00000013U
+#define REC_TYPE_POSTCOPY_PAGE_DATA         0x00000014U
+#define REC_TYPE_POSTCOPY_FAULT             0x00000015U
+#define REC_TYPE_POSTCOPY_COMPLETE          0x00000016U
 
 #define REC_TYPE_OPTIONAL             0x80000000U
 
-/* PAGE_DATA */
+/* PAGE_DATA/POSTCOPY_PFNS/POSTCOPY_PAGE_DATA/POSTCOPY_FAULT */
 struct xc_sr_rec_pages_header
 {
     uint32_t count;
-- 
2.7.4



* [PATCH RFC 13/20] libxc/migration: add try_read_record()
  2017-03-27  9:06 [PATCH RFC 00/20] Add postcopy live migration support Joshua Otto
                   ` (11 preceding siblings ...)
  2017-03-27  9:06 ` [PATCH RFC 12/20] libxc/migration: specify postcopy live migration Joshua Otto
@ 2017-03-27  9:06 ` Joshua Otto
  2017-04-12 15:16   ` Wei Liu
  2017-03-27  9:06 ` [PATCH RFC 14/20] libxc/migration: implement the sender side of postcopy live migration Joshua Otto
                   ` (8 subsequent siblings)
  21 siblings, 1 reply; 53+ messages in thread
From: Joshua Otto @ 2017-03-27  9:06 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, Joshua Otto,
	imhy.yang, hjarmstr

Enable non-blocking migration record reads by adding a helper routine that
manages the context of a record read across multiple invocations as the record's
data becomes available over time.
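
A minimal sketch of the intended calling pattern (wait_until_readable()
stands in for whatever poll()/select() machinery the caller already
has):

    struct xc_sr_read_record_context rrctx;
    struct xc_sr_record rec;
    int rc;

    read_record_init(&rrctx, ctx);

    while ( (rc = try_read_record(&rrctx, fd, &rec)) != 0 )
    {
        if ( errno != EAGAIN && errno != EWOULDBLOCK )
            break;                      /* fatal error */

        wait_until_readable(fd);        /* hypothetical poll() wrapper */
    }

    read_record_destroy(&rrctx);        /* safe after success or failure */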

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
---
 tools/libxc/xc_private.c   | 21 +++++++++++----
 tools/libxc/xc_private.h   |  2 ++
 tools/libxc/xc_sr_common.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++
 tools/libxc/xc_sr_common.h | 39 +++++++++++++++++++++++++++
 4 files changed, 124 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_private.c b/tools/libxc/xc_private.c
index 72e6242..2c53b22 100644
--- a/tools/libxc/xc_private.c
+++ b/tools/libxc/xc_private.c
@@ -633,26 +633,37 @@ void bitmap_byte_to_64(uint64_t *lp, const uint8_t *bp, int nbits)
     }
 }
 
-int read_exact(int fd, void *data, size_t size)
+int try_read_exact(int fd, void *data, size_t size, size_t *offset)
 {
-    size_t offset = 0;
     ssize_t len;
 
-    while ( offset < size )
+    assert(offset);
+    *offset = 0;
+    while ( *offset < size )
     {
-        len = read(fd, (char *)data + offset, size - offset);
+        len = read(fd, (char *)data + *offset, size - *offset);
         if ( (len == -1) && (errno == EINTR) )
             continue;
         if ( len == 0 )
             errno = 0;
         if ( len <= 0 )
             return -1;
-        offset += len;
+        *offset += len;
     }
 
     return 0;
 }
 
+int read_exact(int fd, void *data, size_t size)
+{
+    size_t offset;
+    int rc;
+
+    rc = try_read_exact(fd, data, size, &offset);
+    assert(rc == -1 || offset == size);
+    return rc;
+}
+
 int write_exact(int fd, const void *data, size_t size)
 {
     size_t offset = 0;
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 1c27b0f..aaae344 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -384,6 +384,8 @@ int xc_flush_mmu_updates(xc_interface *xch, struct xc_mmu *mmu);
 
 /* Return 0 on success; -1 on error setting errno. */
 int read_exact(int fd, void *data, size_t size); /* EOF => -1, errno=0 */
+/* Like read_exact(), but stores the length read before any error in *offset. */
+int try_read_exact(int fd, void *data, size_t size, size_t *offset);
 int write_exact(int fd, const void *data, size_t size);
 int writev_exact(int fd, const struct iovec *iov, int iovcnt);
 
diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 090b5fd..b762775 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -147,6 +147,73 @@ int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec)
     return 0;
 };
 
+int try_read_record(struct xc_sr_read_record_context *rrctx, int fd,
+                    struct xc_sr_record *rec)
+{
+    int rc;
+    xc_interface *xch = rrctx->ctx->xch;
+    size_t offset_out, dataoff, datasz;
+
+    /* If the header isn't yet complete, attempt to finish it first. */
+    if ( rrctx->offset < sizeof(rrctx->rhdr) )
+    {
+        rc = try_read_exact(fd, (char *)&rrctx->rhdr + rrctx->offset,
+                            sizeof(rrctx->rhdr) - rrctx->offset, &offset_out);
+        rrctx->offset += offset_out;
+
+        if ( rc )
+            return rc;
+        else
+            assert(rrctx->offset == sizeof(rrctx->rhdr));
+    }
+
+    datasz = ROUNDUP(rrctx->rhdr.length, REC_ALIGN_ORDER);
+
+    if ( datasz )
+    {
+        if ( !rrctx->data )
+        {
+            rrctx->data = malloc(datasz);
+
+            if ( !rrctx->data )
+            {
+                ERROR("Unable to allocate %zu bytes for record (0x%08x, %s)",
+                      datasz, rrctx->rhdr.type,
+                      rec_type_to_str(rrctx->rhdr.type));
+                return -1;
+            }
+        }
+
+        dataoff = rrctx->offset - sizeof(rrctx->rhdr);
+        rc = try_read_exact(fd, (char *)rrctx->data + dataoff, datasz - dataoff,
+                            &offset_out);
+        rrctx->offset += offset_out;
+
+        if ( rc == -1 )
+        {
+            /* Differentiate between expected and fatal errors. */
+            if ( (errno != EAGAIN) && (errno != EWOULDBLOCK) )
+            {
+                free(rrctx->data);
+                rrctx->data = NULL;
+                PERROR("Failed to read %zu bytes for record (0x%08x, %s)",
+                       datasz, rrctx->rhdr.type,
+                       rec_type_to_str(rrctx->rhdr.type));
+            }
+
+            return rc;
+        }
+    }
+
+    /* Success!  Fill in the output record structure. */
+    rec->type   = rrctx->rhdr.type;
+    rec->length = rrctx->rhdr.length;
+    rec->data   = rrctx->data;
+    rrctx->data = NULL;
+
+    return 0;
+}
+
 int validate_pages_record(struct xc_sr_context *ctx, struct xc_sr_record *rec,
                           uint32_t expected_type)
 {
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index ee463d9..b52355d 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -394,6 +394,45 @@ static inline int write_record(struct xc_sr_context *ctx, int fd,
 int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);
 
 /*
+ * try_read_record() (prototype below) reads a record from a _non-blocking_
+ * stream over the course of one or more invocations.  Context for the record
+ * read is maintained in an xc_sr_read_record_context.
+ *
+ * The protocol is:
+ * - call read_record_init() on an uninitialized or previously-destroyed
+ *   read-record context prior to using it to read a record
+ * - call try_read_record() with this initialized context one or more times
+ *   - rc < 0 and errno == EAGAIN/EWOULDBLOCK => try again
+ *   - rc < 0 otherwise => failure
+ *   - rc == 0 => a complete record has been read, and is filled into
+ *     try_read_record()'s rec argument
+ * - after either failure or completion of a record, destroy the context with
+ *   read_record_destroy()
+ */
+struct xc_sr_read_record_context
+{
+    struct xc_sr_context *ctx;
+    size_t offset;
+    struct xc_sr_rhdr rhdr;
+    void *data;
+};
+
+static inline void read_record_init(struct xc_sr_read_record_context *rrctx,
+                                    struct xc_sr_context *ctx)
+{
+    *rrctx = (struct xc_sr_read_record_context) { .ctx = ctx };
+}
+
+int try_read_record(struct xc_sr_read_record_context *rrctx, int fd,
+                    struct xc_sr_record *rec);
+
+static inline void read_record_destroy(struct xc_sr_read_record_context *rrctx)
+{
+    free(rrctx->data);
+    rrctx->data = NULL;
+}
+
+/*
  * Given a record of one of the page data types, validate it by:
  * - checking its actual type against its specific expected type
  * - sanity checking its actual length against its claimed length
-- 
2.7.4
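
An illustrative sketch of the calling convention for try_read_record(), as
documented in the xc_sr_common.h comment above (not part of the patch - a
stream context ctx and an already non-blocking fd are assumed, and a real
caller such as the postcopy sender would poll() between attempts rather
than spin):

    struct xc_sr_read_record_context rrctx;
    struct xc_sr_record rec;
    int rc;

    read_record_init(&rrctx, ctx);

    do {
        /* Each call makes as much progress as the fd currently allows. */
        rc = try_read_record(&rrctx, fd, &rec);
    } while ( rc && (errno == EAGAIN || errno == EWOULDBLOCK) );

    read_record_destroy(&rrctx);

    if ( rc == 0 )
    {
        /* rec now describes a complete record; ownership of rec.data has
         * passed to the caller, who must eventually free() it. */
    }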



* [PATCH RFC 14/20] libxc/migration: implement the sender side of postcopy live migration
  2017-03-27  9:06 [PATCH RFC 00/20] Add postcopy live migration support Joshua Otto
                   ` (12 preceding siblings ...)
  2017-03-27  9:06 ` [PATCH RFC 13/20] libxc/migration: add try_read_record() Joshua Otto
@ 2017-03-27  9:06 ` Joshua Otto
  2017-03-27  9:06 ` [PATCH RFC 15/20] libxc/migration: implement the receiver " Joshua Otto
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 53+ messages in thread
From: Joshua Otto @ 2017-03-27  9:06 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, Joshua Otto,
	imhy.yang, hjarmstr

Add a new 'postcopy' phase to the live migration algorithm, during which
unmigrated domain memory is paged over the network on-demand _after_ the
guest has been resumed at the destination.

To do so:
- Add a new precopy policy option, XGS_POLICY_POSTCOPY, that policies
  can use to request a transition to the postcopy live migration phase
  rather than a stop-and-copy of the remaining dirty pages.
- Add support to xc_domain_save() for this policy option by breaking out
  of the precopy loop early, transmitting the final set of dirty pfns
  and all remaining domain state (including higher-layer state) except
  memory, and entering a postcopy loop during which the remaining page
  data is pushed in the background.  Remote requests for specific pages
  in response to faults in the domain are serviced with priority in this
  loop.
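
As a concrete (hypothetical) example, a precopy policy requesting the
postcopy transition after a bounded number of iterations might look like
the sketch below - the iteration limit is arbitrary, and only the
'iteration' field of struct precopy_stats is assumed here:

    static int example_precopy_policy(struct precopy_stats stats, void *data)
    {
        /* Sketch only: give precopy 5 iterations to converge, then ask
         * xc_domain_save() to suspend the guest and transition to the
         * postcopy phase rather than performing a stop-and-copy. */
        if ( stats.iteration >= 5 )
            return XGS_POLICY_POSTCOPY;

        return XGS_POLICY_CONTINUE_PRECOPY;
    }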

The new save callbacks required for this migration phase are stubbed in
libxl for now, to be replaced in a subsequent patch that adds libxl
support for this migration phase.  Support for this phase on the
migration receiver side follows immediately in the next patch.

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
---
 tools/libxc/include/xenguest.h     |  82 +++++---
 tools/libxc/xc_sr_common.h         |   5 +-
 tools/libxc/xc_sr_save.c           | 421 ++++++++++++++++++++++++++++++++++---
 tools/libxc/xc_sr_save_x86_hvm.c   |  13 ++
 tools/libxc/xg_save_restore.h      |  16 +-
 tools/libxl/libxl_dom_save.c       |  11 +-
 tools/libxl/libxl_save_msgs_gen.pl |   6 +-
 7 files changed, 487 insertions(+), 67 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 30ffb6f..16441c9 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -63,41 +63,57 @@ struct save_callbacks {
 #define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
 #define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and transmit the
                                         * remaining dirty pages. */
+#define XGS_POLICY_POSTCOPY         2  /* Suspend the guest and transition into
+                                        * the postcopy phase of the migration. */
     int (*precopy_policy)(struct precopy_stats stats, void *data);
 
-    /* Called after the guest's dirty pages have been
-     *  copied into an output buffer.
-     * Callback function resumes the guest & the device model,
-     *  returns to xc_domain_save.
-     * xc_domain_save then flushes the output buffer, while the
-     *  guest continues to run.
-     */
-    int (*aftercopy)(void* data);
-
-    /* Called after the memory checkpoint has been flushed
-     * out into the network. Typical actions performed in this
-     * callback include:
-     *   (a) send the saved device model state (for HVM guests),
-     *   (b) wait for checkpoint ack
-     *   (c) release the network output buffer pertaining to the acked checkpoint.
-     *   (c) sleep for the checkpoint interval.
-     *
-     * returns:
-     * 0: terminate checkpointing gracefully
-     * 1: take another checkpoint */
-    int (*checkpoint)(void* data);
-
-    /*
-     * Called after the checkpoint callback.
-     *
-     * returns:
-     * 0: terminate checkpointing gracefully
-     * 1: take another checkpoint
-     */
-    int (*wait_checkpoint)(void* data);
-
-    /* Enable qemu-dm logging dirty pages to xen */
-    int (*switch_qemu_logdirty)(int domid, unsigned enable, void *data); /* HVM only */
+    /* Checkpointing and postcopy live migration are mutually exclusive. */
+    union {
+        struct {
+            /* Called during a live migration's transition to the postcopy phase
+             * to yield control of the stream back to a higher layer so it can
+             * transmit records needed for resumption of the guest at the
+             * destination (e.g. device model state, xenstore context) */
+            int (*postcopy_transition)(void *data);
+        };
+
+        struct {
+            /* Called after the guest's dirty pages have been
+             *  copied into an output buffer.
+             * Callback function resumes the guest & the device model,
+             *  returns to xc_domain_save.
+             * xc_domain_save then flushes the output buffer, while the
+             *  guest continues to run.
+             */
+            int (*aftercopy)(void* data);
+
+            /* Called after the memory checkpoint has been flushed
+             * out into the network. Typical actions performed in this
+             * callback include:
+             *   (a) send the saved device model state (for HVM guests),
+             *   (b) wait for checkpoint ack
+             *   (c) release the network output buffer pertaining to the acked
+             *       checkpoint.
+             *   (d) sleep for the checkpoint interval.
+             *
+             * returns:
+             * 0: terminate checkpointing gracefully
+             * 1: take another checkpoint */
+            int (*checkpoint)(void* data);
+
+            /*
+             * Called after the checkpoint callback.
+             *
+             * returns:
+             * 0: terminate checkpointing gracefully
+             * 1: take another checkpoint
+             */
+            int (*wait_checkpoint)(void* data);
+
+            /* Enable qemu-dm logging dirty pages to xen */
+            int (*switch_qemu_logdirty)(int domid, unsigned enable, void *data); /* HVM only */
+        };
+    };
 
     /* to be provided as the last argument to each callback function */
     void* data;
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index b52355d..0043791 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -204,13 +204,16 @@ struct xc_sr_context
             int policy_decision;
 
             enum {
-                XC_SR_SAVE_BATCH_PRECOPY_PAGE
+                XC_SR_SAVE_BATCH_PRECOPY_PAGE,
+                XC_SR_SAVE_BATCH_POSTCOPY_PFN,
+                XC_SR_SAVE_BATCH_POSTCOPY_PAGE
             } batch_type;
             xen_pfn_t *batch_pfns;
             unsigned nr_batch_pfns;
             unsigned long *deferred_pages;
             unsigned long nr_deferred_pages;
             xc_hypercall_buffer_t dirty_bitmap_hbuf;
+            unsigned long nr_final_dirty_pages;
         } save;
 
         struct /* Restore data. */
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 6acc8d3..51d7016 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -3,21 +3,28 @@
 
 #include "xc_sr_common.h"
 
-#define MAX_BATCH_SIZE MAX_PRECOPY_BATCH_SIZE
+#define MAX_BATCH_SIZE \
+    max(max(MAX_PRECOPY_BATCH_SIZE, MAX_PFN_BATCH_SIZE), MAX_POSTCOPY_BATCH_SIZE)
 
 static const unsigned batch_sizes[] =
 {
-    [XC_SR_SAVE_BATCH_PRECOPY_PAGE]  = MAX_PRECOPY_BATCH_SIZE
+    [XC_SR_SAVE_BATCH_PRECOPY_PAGE]  = MAX_PRECOPY_BATCH_SIZE,
+    [XC_SR_SAVE_BATCH_POSTCOPY_PFN]  = MAX_PFN_BATCH_SIZE,
+    [XC_SR_SAVE_BATCH_POSTCOPY_PAGE] = MAX_POSTCOPY_BATCH_SIZE
 };
 
 static const bool batch_includes_contents[] =
 {
-    [XC_SR_SAVE_BATCH_PRECOPY_PAGE] = true
+    [XC_SR_SAVE_BATCH_PRECOPY_PAGE]  = true,
+    [XC_SR_SAVE_BATCH_POSTCOPY_PFN]  = false,
+    [XC_SR_SAVE_BATCH_POSTCOPY_PAGE] = true
 };
 
 static const uint32_t batch_rec_types[] =
 {
-    [XC_SR_SAVE_BATCH_PRECOPY_PAGE]  = REC_TYPE_PAGE_DATA
+    [XC_SR_SAVE_BATCH_PRECOPY_PAGE]  = REC_TYPE_PAGE_DATA,
+    [XC_SR_SAVE_BATCH_POSTCOPY_PFN]  = REC_TYPE_POSTCOPY_PFNS,
+    [XC_SR_SAVE_BATCH_POSTCOPY_PAGE] = REC_TYPE_POSTCOPY_PAGE_DATA
 };
 
 /*
@@ -76,6 +83,9 @@ static int write_headers(struct xc_sr_context *ctx, uint16_t guest_type)
 
 WRITE_TRIVIAL_RECORD_FN(end,                 REC_TYPE_END);
 WRITE_TRIVIAL_RECORD_FN(checkpoint,          REC_TYPE_CHECKPOINT);
+WRITE_TRIVIAL_RECORD_FN(postcopy_begin,      REC_TYPE_POSTCOPY_BEGIN);
+WRITE_TRIVIAL_RECORD_FN(postcopy_pfns_begin, REC_TYPE_POSTCOPY_PFNS_BEGIN);
+WRITE_TRIVIAL_RECORD_FN(postcopy_transition, REC_TYPE_POSTCOPY_TRANSITION);
 
 /*
  * This function:
@@ -394,6 +404,108 @@ static void add_to_batch(struct xc_sr_context *ctx, xen_pfn_t pfn)
 }
 
 /*
+ * This function:
+ * - flushes the current batch of postcopy pfns into the migration stream
+ * - clears the dirty bits of all pfns with no migratable backing data
+ * - counts the number of pfns that _do_ have migratable backing data, adding
+ *   it to nr_final_dirty_pages
+ */
+static int flush_postcopy_pfns_batch(struct xc_sr_context *ctx)
+{
+    int rc = 0;
+    xen_pfn_t *pfns = ctx->save.batch_pfns, *mfns = NULL, *types = NULL;
+    unsigned i, nr_pfns = ctx->save.nr_batch_pfns;
+
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->save.dirty_bitmap_hbuf);
+
+    assert(ctx->save.batch_type == XC_SR_SAVE_BATCH_POSTCOPY_PFN);
+
+    if ( batch_empty(ctx) )
+        return rc;
+
+    rc = get_batch_info(ctx, &mfns, &types);
+    if ( rc )
+        return rc;
+
+    /* Consider any pages not backed by a physical page of data to have been
+     * 'cleaned' at this point - there's no sense wasting room in a subsequent
+     * postcopy batch to duplicate the type information. */
+    for ( i = 0; i < nr_pfns; ++i )
+    {
+        switch ( types[i] )
+        {
+        case XEN_DOMCTL_PFINFO_BROKEN:
+        case XEN_DOMCTL_PFINFO_XALLOC:
+        case XEN_DOMCTL_PFINFO_XTAB:
+            clear_bit(pfns[i], dirty_bitmap);
+            continue;
+        }
+
+        ++ctx->save.nr_final_dirty_pages;
+    }
+
+    rc = write_batch(ctx, mfns, types);
+    free(mfns);
+    free(types);
+
+    if ( !rc )
+    {
+        VALGRIND_MAKE_MEM_UNDEFINED(ctx->save.batch_pfns,
+                                    MAX_BATCH_SIZE *
+                                    sizeof(*ctx->save.batch_pfns));
+    }
+
+    return rc;
+}
+
+/*
+ * This function:
+ * - writes a POSTCOPY_PFNS_BEGIN record into the stream
+ * - writes 0 or more POSTCOPY_PFNS records specifying the subset of domain
+ *   memory that must be migrated during the upcoming postcopy phase of the
+ *   migration
+ * - counts the number of pfns in this subset, storing it in
+ *   nr_final_dirty_pages
+ */
+static int send_postcopy_pfns(struct xc_sr_context *ctx)
+{
+    xen_pfn_t p;
+    int rc;
+
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->save.dirty_bitmap_hbuf);
+
+    /* The true nr_final_dirty_pages is iteratively computed by
+     * flush_postcopy_pfns_batch(), which counts only pages actually backed by
+     * data we need to migrate. */
+    ctx->save.nr_final_dirty_pages = 0;
+
+    rc = write_postcopy_pfns_begin_record(ctx);
+    if ( rc )
+        return rc;
+
+    assert(batch_empty(ctx));
+    ctx->save.batch_type = XC_SR_SAVE_BATCH_POSTCOPY_PFN;
+    for ( p = 0; p < ctx->save.p2m_size; ++p )
+    {
+        if ( !test_bit(p, dirty_bitmap) )
+            continue;
+
+        if ( batch_full(ctx) )
+        {
+            rc = flush_postcopy_pfns_batch(ctx);
+            if ( rc )
+                return rc;
+        }
+
+        add_to_batch(ctx, p);
+    }
+
+    return flush_postcopy_pfns_batch(ctx);
+}
+
+/*
  * Pause/suspend the domain, and refresh ctx->dominfo if required.
  */
 static int suspend_domain(struct xc_sr_context *ctx)
@@ -731,15 +843,12 @@ static int colo_merge_secondary_dirty_bitmap(struct xc_sr_context *ctx)
 }
 
 /*
- * Suspend the domain and send dirty memory.
- * This is the last iteration of the live migration and the
- * heart of the checkpointed stream.
+ * Suspend the domain and determine the final set of dirty pages.
  */
-static int suspend_and_send_dirty(struct xc_sr_context *ctx)
+static int suspend_and_check_dirty(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
     xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
-    char *progress_str = NULL;
     int rc;
     DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
                                     &ctx->save.dirty_bitmap_hbuf);
@@ -759,16 +868,6 @@ static int suspend_and_send_dirty(struct xc_sr_context *ctx)
         goto out;
     }
 
-    if ( ctx->save.live )
-    {
-        rc = update_progress_string(ctx, &progress_str,
-                                    ctx->save.stats.iteration);
-        if ( rc )
-            goto out;
-    }
-    else
-        xc_set_progress_prefix(xch, "Checkpointed save");
-
     bitmap_or(dirty_bitmap, ctx->save.deferred_pages, ctx->save.p2m_size);
 
     if ( !ctx->save.live && ctx->save.checkpointed == XC_MIG_STREAM_COLO )
@@ -781,20 +880,36 @@ static int suspend_and_send_dirty(struct xc_sr_context *ctx)
         }
     }
 
-    rc = send_dirty_pages(ctx, stats.dirty_count + ctx->save.nr_deferred_pages,
-                          /* precopy */ false);
-    if ( rc )
-        goto out;
+    if ( !ctx->save.live || ctx->save.policy_decision != XGS_POLICY_POSTCOPY )
+    {
+        /* If we aren't transitioning to a postcopy live migration, then rather
+         * than explicitly counting the number of final dirty pages, simply
+         * (somewhat crudely) estimate it as this sum to save time.  If we _are_
+         * about to begin postcopy then we don't bother, since our count must in
+         * that case be exact and we'll work it out later on. */
+        ctx->save.nr_final_dirty_pages =
+            stats.dirty_count + ctx->save.nr_deferred_pages;
+    }
 
     bitmap_clear(ctx->save.deferred_pages, ctx->save.p2m_size);
     ctx->save.nr_deferred_pages = 0;
 
  out:
-    xc_set_progress_prefix(xch, NULL);
-    free(progress_str);
     return rc;
 }
 
+static int suspend_and_send_dirty(struct xc_sr_context *ctx)
+{
+    int rc;
+
+    rc = suspend_and_check_dirty(ctx);
+    if ( rc )
+        return rc;
+
+    return send_dirty_pages(ctx, ctx->save.nr_final_dirty_pages,
+                            /* precopy */ false);
+}
+
 static int verify_frames(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
@@ -835,11 +950,13 @@ static int verify_frames(struct xc_sr_context *ctx)
 }
 
 /*
- * Send all domain memory.  This is the heart of the live migration loop.
+ * Send all domain memory, modulo postcopy pages.  This is the heart of the live
+ * migration loop.
  */
 static int send_domain_memory_live(struct xc_sr_context *ctx)
 {
     int rc;
+    xc_interface *xch = ctx->xch;
 
     rc = enable_logdirty(ctx);
     if ( rc )
@@ -849,10 +966,20 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
     if ( rc )
         goto out;
 
-    rc = suspend_and_send_dirty(ctx);
+    rc = suspend_and_check_dirty(ctx);
     if ( rc )
         goto out;
 
+    if ( ctx->save.policy_decision == XGS_POLICY_STOP_AND_COPY )
+    {
+        xc_set_progress_prefix(xch, "Final precopy iteration");
+        rc = send_dirty_pages(ctx, ctx->save.nr_final_dirty_pages,
+                              /* precopy */ false);
+        xc_set_progress_prefix(xch, NULL);
+        if ( rc )
+            goto out;
+    }
+
     if ( ctx->save.debug && ctx->save.checkpointed != XC_MIG_STREAM_NONE )
     {
         rc = verify_frames(ctx);
@@ -864,12 +991,209 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
     return rc;
 }
 
+static int handle_postcopy_faults(struct xc_sr_context *ctx,
+                                  struct xc_sr_record *rec,
+                                  /* OUT */ unsigned long *nr_new_fault_pfns,
+                                  /* OUT */ xen_pfn_t *last_fault_pfn)
+{
+    int rc;
+    unsigned i;
+    xc_interface *xch = ctx->xch;
+    struct xc_sr_rec_pages_header *fault_pages = rec->data;
+
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->save.dirty_bitmap_hbuf);
+
+    assert(nr_new_fault_pfns);
+    *nr_new_fault_pfns = 0;
+
+    rc = validate_pages_record(ctx, rec, REC_TYPE_POSTCOPY_FAULT);
+    if ( rc )
+        return rc;
+
+    DBGPRINTF("Handling a batch of %"PRIu32" faults!", fault_pages->count);
+
+    assert(ctx->save.batch_type == XC_SR_SAVE_BATCH_POSTCOPY_PAGE);
+    for ( i = 0; i < fault_pages->count; ++i )
+    {
+        if ( test_and_clear_bit(fault_pages->pfn[i], dirty_bitmap) )
+        {
+            if ( batch_full(ctx) )
+            {
+                rc = flush_batch(ctx);
+                if ( rc )
+                    return rc;
+            }
+
+            add_to_batch(ctx, fault_pages->pfn[i]);
+            ++(*nr_new_fault_pfns);
+        }
+    }
+
+    /* _Don't_ flush yet - fill out the rest of the batch. */
+
+    assert(fault_pages->count);
+    *last_fault_pfn = fault_pages->pfn[fault_pages->count - 1];
+    return 0;
+}
+
+/*
+ * Now that the guest has resumed at the destination, send all of the remaining
+ * dirty pages.  Periodically check for pages needed by the destination to make
+ * progress.
+ */
+static int postcopy_domain_memory(struct xc_sr_context *ctx)
+{
+    int rc;
+    xc_interface *xch = ctx->xch;
+    int recv_fd = ctx->save.recv_fd;
+    int old_flags;
+    struct xc_sr_read_record_context rrctx;
+    struct xc_sr_record rec = { 0, 0, NULL };
+    unsigned long nr_new_fault_pfns;
+    unsigned long pages_remaining = ctx->save.nr_final_dirty_pages;
+    xen_pfn_t last_fault_pfn, p;
+    bool received_postcopy_complete = false;
+
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->save.dirty_bitmap_hbuf);
+
+    read_record_init(&rrctx, ctx);
+
+    /* First, configure the receive stream as non-blocking so we can
+     * periodically poll it for fault requests. */
+    old_flags = fcntl(recv_fd, F_GETFL);
+    if ( old_flags == -1 )
+    {
+        rc = old_flags;
+        goto err;
+    }
+
+    assert(!(old_flags & O_NONBLOCK));
+
+    rc = fcntl(recv_fd, F_SETFL, old_flags | O_NONBLOCK);
+    if ( rc == -1 )
+    {
+        goto err;
+    }
+
+    xc_set_progress_prefix(xch, "Postcopy phase");
+
+    assert(batch_empty(ctx));
+    ctx->save.batch_type = XC_SR_SAVE_BATCH_POSTCOPY_PAGE;
+
+    p = 0;
+    while ( pages_remaining )
+    {
+        /* Between (small) batches, poll the receive stream for new
+         * POSTCOPY_FAULT messages. */
+        for ( ; ; )
+        {
+            rc = try_read_record(&rrctx, recv_fd, &rec);
+            if ( rc )
+            {
+                if ( (errno == EAGAIN) || (errno == EWOULDBLOCK) )
+                {
+                    break;
+                }
+
+                goto err;
+            }
+            else
+            {
+                /* Tear down and re-initialize the read record context for the
+                 * next request record. */
+                read_record_destroy(&rrctx);
+                read_record_init(&rrctx, ctx);
+
+                if ( rec.type == REC_TYPE_POSTCOPY_COMPLETE )
+                {
+                    /* The restore side may ultimately not need all of the pages
+                     * we think it does - for example, the guest may release
+                     * some outstanding pages.  If this occurs, we'll receive
+                     * this record before we'd otherwise expect to. */
+                    received_postcopy_complete = true;
+                    goto done;
+                }
+
+                rc = handle_postcopy_faults(ctx, &rec, &nr_new_fault_pfns,
+                                            &last_fault_pfn);
+                if ( rc )
+                    goto err;
+
+                free(rec.data);
+                rec.data = NULL;
+
+                assert(pages_remaining >= nr_new_fault_pfns);
+                pages_remaining -= nr_new_fault_pfns;
+
+                /* To take advantage of any locality present in the postcopy
+                 * faults, continue the background copy process from the newest
+                 * page in the fault batch. */
+                p = (last_fault_pfn + 1) % ctx->save.p2m_size;
+            }
+        }
+
+        /* Now that we've serviced all of the POSTCOPY_FAULT requests we know
+         * about for now, fill out the current batch with background pages. */
+        for ( ;
+              pages_remaining && !batch_full(ctx);
+              p = (p + 1) % ctx->save.p2m_size )
+        {
+            if ( test_and_clear_bit(p, dirty_bitmap) )
+            {
+                add_to_batch(ctx, p);
+                --pages_remaining;
+            }
+        }
+
+        rc = flush_batch(ctx);
+        if ( rc )
+            goto err;
+
+        xc_report_progress_step(
+            xch, ctx->save.nr_final_dirty_pages - pages_remaining,
+            ctx->save.nr_final_dirty_pages);
+    }
+
+ done:
+    /* Revert the receive stream to the (blocking) state we found it in. */
+    rc = fcntl(recv_fd, F_SETFL, old_flags);
+    if ( rc == -1 )
+        goto err;
+
+    if ( !received_postcopy_complete )
+    {
+        /* Flush any outstanding POSTCOPY_FAULT requests from the migration
+         * stream by reading until a POSTCOPY_COMPLETE is received. */
+        do
+        {
+            rc = read_record(ctx, recv_fd, &rec);
+            if ( rc )
+                goto err;
+        } while ( rec.type != REC_TYPE_POSTCOPY_COMPLETE );
+    }
+
+ err:
+    xc_set_progress_prefix(xch, NULL);
+    free(rec.data);
+    read_record_destroy(&rrctx);
+    return rc;
+}
+
 /*
  * Checkpointed save.
  */
 static int send_domain_memory_checkpointed(struct xc_sr_context *ctx)
 {
-    return suspend_and_send_dirty(ctx);
+    int rc;
+    xc_interface *xch = ctx->xch;
+
+    xc_set_progress_prefix(xch, "Checkpointed save");
+    rc = suspend_and_send_dirty(ctx);
+    xc_set_progress_prefix(xch, NULL);
+
+    return rc;
 }
 
 /*
@@ -998,11 +1322,50 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
             goto err;
         }
 
+        /* End-of-checkpoint records are handled differently in the case of
+         * postcopy migration, so we need to alert the destination before
+         * sending them. */
+        if ( ctx->save.live &&
+             ctx->save.policy_decision == XGS_POLICY_POSTCOPY )
+        {
+            rc = write_postcopy_begin_record(ctx);
+            if ( rc )
+                goto err;
+        }
+
         rc = ctx->save.ops.end_of_checkpoint(ctx);
         if ( rc )
             goto err;
 
-        if ( ctx->save.checkpointed != XC_MIG_STREAM_NONE )
+        if ( ctx->save.live &&
+             ctx->save.policy_decision == XGS_POLICY_POSTCOPY )
+        {
+            xc_report_progress_single(xch, "Beginning postcopy transition");
+
+            rc = send_postcopy_pfns(ctx);
+            if ( rc )
+                goto err;
+
+            rc = write_postcopy_transition_record(ctx);
+            if ( rc )
+                goto err;
+
+            /* Yield control to libxl to finish the transition.  Note that this
+             * callback returns _non-zero_ upon success. */
+            rc = ctx->save.callbacks->postcopy_transition(
+                ctx->save.callbacks->data);
+            if ( !rc )
+            {
+                rc = -1;
+                goto err;
+            }
+
+            /* When libxl is done, we can begin the postcopy loop. */
+            rc = postcopy_domain_memory(ctx);
+            if ( rc )
+                goto err;
+        }
+        else if ( ctx->save.checkpointed != XC_MIG_STREAM_NONE )
         {
             /*
              * We have now completed the initial live portion of the checkpoint
diff --git a/tools/libxc/xc_sr_save_x86_hvm.c b/tools/libxc/xc_sr_save_x86_hvm.c
index ea4b780..13df25b 100644
--- a/tools/libxc/xc_sr_save_x86_hvm.c
+++ b/tools/libxc/xc_sr_save_x86_hvm.c
@@ -92,6 +92,9 @@ static int write_hvm_params(struct xc_sr_context *ctx)
     unsigned int i;
     int rc;
 
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->save.dirty_bitmap_hbuf);
+
     for ( i = 0; i < ARRAY_SIZE(params); i++ )
     {
         uint32_t index = params[i];
@@ -106,6 +109,16 @@ static int write_hvm_params(struct xc_sr_context *ctx)
 
         if ( value != 0 )
         {
+            if ( ctx->save.live &&
+                 ctx->save.policy_decision == XGS_POLICY_POSTCOPY &&
+                 ( index == HVM_PARAM_CONSOLE_PFN ||
+                   index == HVM_PARAM_STORE_PFN ||
+                   index == HVM_PARAM_IOREQ_PFN ||
+                   index == HVM_PARAM_BUFIOREQ_PFN ||
+                   index == HVM_PARAM_PAGING_RING_PFN ) &&
+                 test_and_clear_bit(value, dirty_bitmap) )
+                --ctx->save.nr_final_dirty_pages;
+
             entries[hdr.count].index = index;
             entries[hdr.count].value = value;
             hdr.count++;
diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
index 40debf6..9f5b223 100644
--- a/tools/libxc/xg_save_restore.h
+++ b/tools/libxc/xg_save_restore.h
@@ -24,7 +24,21 @@
 ** We process save/restore/migrate in batches of pages; the below
 ** determines how many pages we (at maximum) deal with in each batch.
 */
-#define MAX_PRECOPY_BATCH_SIZE 1024   /* up to 1024 pages (4MB) at a time */
+#define MAX_PRECOPY_BATCH_SIZE ((size_t)1024U)   /* up to 1024 pages (4MB) */
+
+/*
+** We process the migration postcopy transition in batches of pfns to ensure
+** that we stay within the record size bound.  Because these records contain
+** only pfns (and _not_ their contents), we can accommodate many more of them
+** in a batch.
+*/
+#define MAX_PFN_BATCH_SIZE ((4U << 20) / sizeof(uint64_t)) /* up to 512k pfns */
+
+/*
+** The postcopy background copy uses a smaller batch size to ensure it can
+** quickly respond to remote faults.
+*/
+#define MAX_POSTCOPY_BATCH_SIZE ((size_t)64U)
 
 /* When pinning page tables at the end of restore, we also use batching. */
 #define MAX_PIN_BATCH  1024
diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 10d5012..4ef9ca5 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -349,6 +349,12 @@ static int libxl__save_live_migration_simple_precopy_policy(
     return XGS_POLICY_CONTINUE_PRECOPY;
 }
 
+static void libxl__save_live_migration_postcopy_transition_callback(void *user)
+{
+    /* XXX we're not yet ready to deal with this */
+    assert(0);
+}
+
 /*----- main code for saving, in order of execution -----*/
 
 void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
@@ -419,8 +425,11 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
             dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
     }
 
-    if (dss->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_NONE)
+    if (dss->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_NONE) {
         callbacks->suspend = libxl__domain_suspend_callback;
+        callbacks->postcopy_transition =
+            libxl__save_live_migration_postcopy_transition_callback;
+    }
 
     callbacks->precopy_policy = libxl__save_live_migration_simple_precopy_policy;
     callbacks->switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty;
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index 50c97b4..5647b97 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -33,7 +33,8 @@ our @msgs = (
                                               'xen_pfn_t', 'console_gfn'] ],
     [  9, 'srW',    "complete",              [qw(int retval
                                                  int errnoval)] ],
-    [ 10, 'scxW',   "precopy_policy", ['struct precopy_stats', 'stats'] ]
+    [ 10, 'scxW',   "precopy_policy", ['struct precopy_stats', 'stats'] ],
+    [ 11, 'scxA',   "postcopy_transition", [] ]
 );
 
 #----------------------------------------
@@ -225,6 +226,7 @@ foreach my $sr (qw(save restore)) {
 
     f_decl("${setcallbacks}_${sr}", 'helper', 'void',
            "(struct ${sr}_callbacks *cbs, unsigned cbflags)");
+    f_more("${setcallbacks}_${sr}", "    memset(cbs, 0, sizeof(*cbs));\n");
 
     f_more("${receiveds}_${sr}",
            <<END_ALWAYS.($debug ? <<END_DEBUG : '').<<END_ALWAYS);
@@ -335,7 +337,7 @@ END_ALWAYS
         my $c_v = "(1u<<$msgnum)";
         my $c_cb = "cbs->$name";
         $f_more_sr->("    if ($c_cb) cbflags |= $c_v;\n", $enumcallbacks);
-        $f_more_sr->("    $c_cb = (cbflags & $c_v) ? ${encode}_${name} : 0;\n",
+        $f_more_sr->("    if (cbflags & $c_v) $c_cb = ${encode}_${name};\n",
                      $setcallbacks);
     }
     $f_more_sr->("        return 1;\n    }\n\n");
-- 
2.7.4



* [PATCH RFC 15/20] libxc/migration: implement the receiver side of postcopy live migration
  2017-03-27  9:06 [PATCH RFC 00/20] Add postcopy live migration support Joshua Otto
                   ` (13 preceding siblings ...)
  2017-03-27  9:06 ` [PATCH RFC 14/20] libxc/migration: implement the sender side of postcopy live migration Joshua Otto
@ 2017-03-27  9:06 ` Joshua Otto
  2017-03-27  9:06 ` [PATCH RFC 16/20] libxl/libxl_stream_write.c: track callback chains with an explicit phase Joshua Otto
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 53+ messages in thread
From: Joshua Otto @ 2017-03-27  9:06 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, Joshua Otto,
	imhy.yang, hjarmstr

Add the receive-side logic for a new 'postcopy' phase in the live
migration algorithm.

To support this migration phase:
- Augment the main restore record-processing logic to recognize and
  handle the postcopy-initiation records.
- Add the core logic for the phase, postcopy_restore(), which marks as
  paged-out all pfns reported by the sender as outstanding at the
  beginning of the phase, and subsequently serves as a pager for this
  subset of memory by forwarding paging requests to the migration sender
  and filling the outstanding domain memory as it is received.
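
The eviction step at the beginning of the phase reduces, in essence, to the
following per-pfn sketch (error reporting elided; xch, domid and pfn are as
used by the batch-processing logic in this patch):

    /* Mark a populated pfn as paged-out before the guest resumes.
     * Neither call may fail at this point: the guest isn't executing
     * yet, so no conflicting foreign or hypervisor mappings of the
     * page can exist. */
    if ( xc_mem_paging_nominate(xch, domid, pfn) < 0 ||
         xc_mem_paging_evict(xch, domid, pfn) < 0 )
        return -1;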

The new restore callbacks required for this migration phase are stubbed
in libxl for now, to be replaced in a subsequent patch that adds libxl
support for this migration phase.

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
---
 tools/libxc/include/xenguest.h      |  63 ++-
 tools/libxc/xc_sr_common.h          |  82 +++-
 tools/libxc/xc_sr_restore.c         | 890 +++++++++++++++++++++++++++++++++++-
 tools/libxc/xc_sr_restore_x86_hvm.c |  38 +-
 tools/libxl/libxl_create.c          |  15 +
 tools/libxl/libxl_save_msgs_gen.pl  |   2 +-
 6 files changed, 1049 insertions(+), 41 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 16441c9..684afc8 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -146,35 +146,50 @@ struct restore_callbacks {
      */
     int (*suspend)(void* data);
 
-    /* Called after the secondary vm is ready to resume.
-     * Callback function resumes the guest & the device model,
-     * returns to xc_domain_restore.
-     */
-    int (*aftercopy)(void* data);
+    union {
+        struct {
+            /* Called upon receipt of the POSTCOPY_TRANSITION record in the
+             * stream to yield control of the stream to the higher layer so that
+             * the remaining data needed to resume the domain in the postcopy
+             * phase can be obtained.  Returns as soon as the higher layer is
+             * finished with the stream.
+             *
+             * Returns 1 on success, 0 on failure. */
+            int (*postcopy_transition)(void *data);
+        };
+
+        struct {
+            /* Called after the secondary vm is ready to resume.
+             * Callback function resumes the guest & the device model,
+             * returns to xc_domain_restore.
+             */
+            int (*aftercopy)(void* data);
 
-    /* A checkpoint record has been found in the stream.
-     * returns: */
+            /* A checkpoint record has been found in the stream.
+             * returns: */
 #define XGR_CHECKPOINT_ERROR    0 /* Terminate processing */
 #define XGR_CHECKPOINT_SUCCESS  1 /* Continue reading more data from the stream */
 #define XGR_CHECKPOINT_FAILOVER 2 /* Failover and resume VM */
-    int (*checkpoint)(void* data);
-
-    /*
-     * Called after the checkpoint callback.
-     *
-     * returns:
-     * 0: terminate checkpointing gracefully
-     * 1: take another checkpoint
-     */
-    int (*wait_checkpoint)(void* data);
+            int (*checkpoint)(void* data);
 
-    /*
-     * callback to send store gfn and console gfn to xl
-     * if we want to resume vm before xc_domain_save()
-     * exits.
-     */
-    void (*restore_results)(xen_pfn_t store_gfn, xen_pfn_t console_gfn,
-                            void *data);
+            /*
+             * Called after the checkpoint callback.
+             *
+             * returns:
+             * 0: terminate checkpointing gracefully
+             * 1: take another checkpoint
+             */
+            int (*wait_checkpoint)(void* data);
+
+            /*
+             * callback to send store gfn and console gfn to xl
+             * if we want to resume vm before xc_domain_save()
+             * exits.
+             */
+            void (*restore_results)(xen_pfn_t store_gfn, xen_pfn_t console_gfn,
+                                    void *data);
+        };
+    };
 
     /* to be provided as the last argument to each callback function */
     void* data;
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 0043791..cdb933c 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -3,6 +3,10 @@
 
 #include <stdbool.h>
 
+#include <xenevtchn.h>
+
+#include <xen/vm_event.h>
+
 #include "xg_private.h"
 #include "xg_save_restore.h"
 #include "xc_dom.h"
@@ -232,6 +236,82 @@ struct xc_sr_context
             uint32_t guest_type;
             uint32_t guest_page_size;
 
+            /* Is this a postcopy live migration? */
+            bool postcopy;
+
+            struct xc_sr_restore_paging
+            {
+                xenevtchn_handle *xce_handle;
+                int port;
+                vm_event_back_ring_t back_ring;
+                uint32_t evtchn_port;
+                void *ring_page;
+                void *buffer;
+
+                struct xc_sr_pending_postcopy_request
+                {
+                    xen_pfn_t pfn; /* == INVALID_PFN when not in use */
+
+                    /* As from vm_event_request_t */
+                    uint32_t flags;
+                    uint32_t vcpu_id;
+                } *pending_requests;
+
+                /* The total count of outstanding and requested pfns.  The
+                 * postcopy phase is complete when this reaches 0. */
+                unsigned nr_pending_pfns;
+
+                /* Prior to the receipt of the first POSTCOPY_PFNS record, all
+                 * pfns are 'invalid', meaning that we don't (yet) believe that
+                 * they need to be migrated as part of the postcopy phase.
+                 *
+                 * Pfns received in POSTCOPY_PFNS records become 'outstanding',
+                 * meaning that they must be migrated but haven't yet been
+                 * requested, received or dropped.
+                 *
+                 * A pfn transitions from outstanding to requested when we
+                 * receive a request for it on the paging ring and request it
+                 * from the sender, before having received it.  There is at
+                 * least one valid entry in pending_requests for each requested
+                 * pfn.
+                 *
+                 * A pfn transitions from either outstanding or requested to
+                 * ready when its contents are received.  Responses to all
+                 * previous pager requests for this pfn are pushed at this time,
+                 * and subsequent pager requests for this pfn can be responded
+                 * to immediately.
+                 *
+                 * A pfn transitions from outstanding to dropped if we're
+                 * notified on the ring of the drop.  We track this explicitly
+                 * so that we don't panic upon subsequently receiving the
+                 * contents of this page from the sender.
+                 *
+                 * In summary, the per-pfn postcopy state machine is:
+                 *
+                 * invalid -> outstanding -> requested -> ready
+                 *                |                        ^
+                 *                +------------------------+
+                 *                |
+                 *                +-----------> dropped
+                 *
+                 * The state of each pfn is tracked using these four bitmaps. */
+                unsigned long *outstanding_pfns;
+                unsigned long *requested_pfns;
+                unsigned long *ready_pfns;
+                unsigned long *dropped_pfns;
+
+                /* Used to accumulate batches of pfns for which we must forward
+                 * paging requests to the sender. */
+                uint64_t *request_batch;
+
+                /* For teardown. */
+                bool evtchn_bound, evtchn_opened, paging_enabled, buffer_locked;
+
+                /* So we can sanity-check the sequence of postcopy records in
+                 * the stream. */
+                bool ready;
+            } paging;
+
             /* Plain VM, or checkpoints over time. */
             int checkpointed;
 
@@ -255,7 +335,7 @@ struct xc_sr_context
              * INPUT:  evtchn & domid
              * OUTPUT: gfn
              */
-            xen_pfn_t    xenstore_gfn,    console_gfn;
+            xen_pfn_t    xenstore_gfn,    console_gfn,    paging_ring_gfn;
             unsigned int xenstore_evtchn, console_evtchn;
             domid_t      xenstore_domid,  console_domid;
 
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 4e3c472..38c218f 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -1,6 +1,7 @@
 #include <arpa/inet.h>
 
 #include <assert.h>
+#include <poll.h>
 
 #include "xc_sr_common.h"
 
@@ -78,6 +79,30 @@ static bool pfn_is_populated(const struct xc_sr_context *ctx, xen_pfn_t pfn)
     return test_bit(pfn, ctx->restore.populated_pfns);
 }
 
+static int pfn_bitmap_realloc(struct xc_sr_context *ctx, unsigned long **bitmap,
+                              size_t old_sz, size_t new_sz)
+{
+    xc_interface *xch = ctx->xch;
+    unsigned long *p;
+
+    assert(bitmap);
+    if ( *bitmap )
+    {
+        p = realloc(*bitmap, new_sz);
+        if ( !p )
+        {
+            ERROR("Failed to realloc restore bitmap");
+            errno = ENOMEM;
+            return -1;
+        }
+
+        memset((uint8_t *)p + old_sz, 0x00, new_sz - old_sz);
+        *bitmap = p;
+    }
+
+    return 0;
+}
+
 /*
  * Set a pfn as populated, expanding the tracking structures if needed. To
  * avoid realloc()ing too excessively, the size increased to the nearest power
@@ -85,13 +110,21 @@ static bool pfn_is_populated(const struct xc_sr_context *ctx, xen_pfn_t pfn)
  */
 static int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
 {
-    xc_interface *xch = ctx->xch;
+    int rc = 0;
 
     if ( pfn > ctx->restore.max_populated_pfn )
     {
         xen_pfn_t new_max;
         size_t old_sz, new_sz;
-        unsigned long *p;
+        unsigned i;
+        unsigned long **bitmaps[] =
+        {
+            &ctx->restore.populated_pfns,
+            &ctx->restore.paging.outstanding_pfns,
+            &ctx->restore.paging.requested_pfns,
+            &ctx->restore.paging.ready_pfns,
+            &ctx->restore.paging.dropped_pfns
+        };
 
         /* Round up to the nearest power of two larger than pfn, less 1. */
         new_max = pfn;
@@ -106,17 +139,13 @@ static int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
 
         old_sz = bitmap_size(ctx->restore.max_populated_pfn + 1);
         new_sz = bitmap_size(new_max + 1);
-        p = realloc(ctx->restore.populated_pfns, new_sz);
-        if ( !p )
-        {
-            ERROR("Failed to realloc populated bitmap");
-            errno = ENOMEM;
-            return -1;
-        }
 
-        memset((uint8_t *)p + old_sz, 0x00, new_sz - old_sz);
+        for ( i = 0; i < ARRAY_SIZE(bitmaps) && !rc; ++i )
+            rc = pfn_bitmap_realloc(ctx, bitmaps[i], old_sz, new_sz);
+
+        if ( rc )
+            return rc;
 
-        ctx->restore.populated_pfns    = p;
         ctx->restore.max_populated_pfn = new_max;
     }
 
@@ -484,6 +513,811 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
 }
 
 /*
+ * To prepare for entry to the postcopy phase of live migration:
+ * - enable paging on the domain, and set up the paging ring and event channel
+ * - allocate a locked and aligned paging buffer
+ * - allocate the postcopy page bookkeeping structures
+ */
+static int postcopy_paging_setup(struct xc_sr_context *ctx)
+{
+    int rc;
+    unsigned i;
+    struct xc_sr_restore_paging *paging = &ctx->restore.paging;
+    xc_interface *xch = ctx->xch;
+
+    /* Sanity-check the migration stream. */
+    if ( !ctx->restore.postcopy )
+    {
+        ERROR("Received POSTCOPY_PFNS_BEGIN before POSTCOPY_BEGIN");
+        return -1;
+    }
+
+    paging->ring_page = xc_vm_event_enable(xch, ctx->domid,
+                                           HVM_PARAM_PAGING_RING_PFN,
+                                           &paging->evtchn_port);
+    if ( !paging->ring_page )
+    {
+        PERROR("Failed to enable paging");
+        return -1;
+    }
+    paging->paging_enabled = true;
+
+    paging->xce_handle = xenevtchn_open(NULL, 0);
+    if ( !paging->xce_handle )
+    {
+        ERROR("Failed to open paging evtchn");
+        return -1;
+    }
+    paging->evtchn_opened = true;
+
+    rc = xenevtchn_bind_interdomain(paging->xce_handle, ctx->domid,
+                                    paging->evtchn_port);
+    if ( rc < 0 )
+    {
+        ERROR("Failed to bind paging evtchn");
+        return rc;
+    }
+    paging->evtchn_bound = true;
+    paging->port = rc;
+
+    SHARED_RING_INIT((vm_event_sring_t *)paging->ring_page);
+    BACK_RING_INIT(&paging->back_ring, (vm_event_sring_t *)paging->ring_page,
+                   PAGE_SIZE);
+
+    errno = posix_memalign(&paging->buffer, PAGE_SIZE, PAGE_SIZE);
+    if ( errno != 0 )
+    {
+        PERROR("Failed to allocate paging buffer");
+        return -1;
+    }
+
+    rc = mlock(paging->buffer, PAGE_SIZE);
+    if ( rc < 0 )
+    {
+        PERROR("Failed to lock paging buffer");
+        return rc;
+    }
+    paging->buffer_locked = true;
+
+    paging->outstanding_pfns = bitmap_alloc(ctx->restore.max_populated_pfn + 1);
+    paging->requested_pfns = bitmap_alloc(ctx->restore.max_populated_pfn + 1);
+    paging->ready_pfns = bitmap_alloc(ctx->restore.max_populated_pfn + 1);
+    paging->dropped_pfns = bitmap_alloc(ctx->restore.max_populated_pfn + 1);
+
+    paging->pending_requests = malloc(RING_SIZE(&paging->back_ring) *
+                                      sizeof(*paging->pending_requests));
+    paging->request_batch = malloc(RING_SIZE(&paging->back_ring) *
+                                   sizeof(*paging->request_batch));
+    if ( !paging->outstanding_pfns ||
+         !paging->requested_pfns ||
+         !paging->ready_pfns ||
+         !paging->dropped_pfns ||
+         !paging->pending_requests ||
+         !paging->request_batch )
+    {
+        PERROR("Failed to allocate pfn state tracking buffers");
+        return -1;
+    }
+
+    /* All slots are initially empty. */
+    for ( i = 0; i < RING_SIZE(&paging->back_ring); ++i )
+        paging->pending_requests[i].pfn = INVALID_PFN;
+
+    paging->ready = true;
+
+    return 0;
+}
+
+static void postcopy_paging_cleanup(struct xc_sr_context *ctx)
+{
+    int rc;
+    struct xc_sr_restore_paging *paging = &ctx->restore.paging;
+    xc_interface *xch = ctx->xch;
+
+    if ( paging->ring_page )
+        munmap(paging->ring_page, PAGE_SIZE);
+
+    if ( paging->paging_enabled )
+    {
+        rc = xc_vm_event_control(xch, ctx->domid, XEN_VM_EVENT_DISABLE,
+                                 XEN_DOMCTL_VM_EVENT_OP_PAGING, NULL);
+        if ( rc != 0 )
+            ERROR("Failed to disable paging");
+    }
+
+    if ( paging->evtchn_bound )
+    {
+        rc = xenevtchn_unbind(paging->xce_handle, paging->port);
+        if ( rc != 0 )
+            ERROR("Failed to unbind event port");
+    }
+
+    if ( paging->evtchn_opened )
+    {
+        rc = xenevtchn_close(paging->xce_handle);
+        if ( rc != 0 )
+            ERROR("Failed to close event channel");
+    }
+
+    if ( paging->buffer )
+    {
+        if ( paging->buffer_locked )
+            munlock(paging->buffer, PAGE_SIZE);
+
+        free(paging->buffer);
+    }
+
+    free(paging->outstanding_pfns);
+    free(paging->requested_pfns);
+    free(paging->ready_pfns);
+    free(paging->dropped_pfns);
+    free(paging->pending_requests);
+    free(paging->request_batch);
+}
+
+/* Helpers to query and transition the state of postcopy pfns. */
+#define CHECK_STATE_BITMAP_FN(state)                                      \
+    static inline bool postcopy_pfn_ ## state (struct xc_sr_context *ctx, \
+                                               xen_pfn_t pfn)             \
+    {                                                                     \
+        assert(pfn <= ctx->restore.max_populated_pfn);                    \
+        return test_bit(pfn, ctx->restore.paging. state ## _pfns);        \
+    }
+
+CHECK_STATE_BITMAP_FN(outstanding);
+CHECK_STATE_BITMAP_FN(requested);
+CHECK_STATE_BITMAP_FN(ready);
+CHECK_STATE_BITMAP_FN(dropped);
+
+static inline bool postcopy_pfn_invalid(struct xc_sr_context *ctx,
+                                        xen_pfn_t pfn)
+{
+    return !postcopy_pfn_outstanding(ctx, pfn) &&
+           !postcopy_pfn_requested(ctx, pfn) &&
+           !postcopy_pfn_ready(ctx, pfn) &&
+           !postcopy_pfn_dropped(ctx, pfn);
+}
+
+static inline void mark_postcopy_pfn_outstanding(struct xc_sr_context *ctx,
+                                                 xen_pfn_t pfn)
+{
+    assert(pfn <= ctx->restore.max_populated_pfn);
+    assert(postcopy_pfn_invalid(ctx, pfn));
+
+    set_bit(pfn, ctx->restore.paging.outstanding_pfns);
+}
+
+static inline void mark_postcopy_pfn_requested(struct xc_sr_context *ctx,
+                                               xen_pfn_t pfn)
+{
+    assert(pfn <= ctx->restore.max_populated_pfn);
+    assert(postcopy_pfn_outstanding(ctx, pfn));
+
+    clear_bit(pfn, ctx->restore.paging.outstanding_pfns);
+    set_bit(pfn, ctx->restore.paging.requested_pfns);
+}
+
+static inline void mark_postcopy_pfn_ready(struct xc_sr_context *ctx,
+                                           xen_pfn_t pfn)
+{
+    assert(pfn <= ctx->restore.max_populated_pfn);
+    assert(postcopy_pfn_outstanding(ctx, pfn) ||
+           postcopy_pfn_requested(ctx, pfn));
+
+    clear_bit(pfn, ctx->restore.paging.outstanding_pfns);
+    clear_bit(pfn, ctx->restore.paging.requested_pfns);
+    set_bit(pfn, ctx->restore.paging.ready_pfns);
+}
+
+static inline void mark_postcopy_pfn_dropped(struct xc_sr_context *ctx,
+                                             xen_pfn_t pfn)
+{
+    assert(pfn <= ctx->restore.max_populated_pfn);
+    assert(postcopy_pfn_outstanding(ctx, pfn));
+
+    clear_bit(pfn, ctx->restore.paging.outstanding_pfns);
+    set_bit(pfn, ctx->restore.paging.dropped_pfns);
+}
+
+static int process_postcopy_pfns(struct xc_sr_context *ctx, unsigned count,
+                                 xen_pfn_t *pfns, uint32_t *types)
+{
+    xc_interface *xch = ctx->xch;
+    struct xc_sr_restore_paging *paging = &ctx->restore.paging;
+    xen_pfn_t *bpfns = NULL, bpfn;
+    int rc;
+    unsigned i, nr_pages;
+
+    rc = populate_pfns(ctx, count, pfns, types);
+    if ( rc )
+    {
+        ERROR("Failed to populate pfns for batch of %u pages", count);
+        goto err;
+    }
+
+    set_page_types(ctx, count, pfns, types);
+
+    rc = filter_pages(ctx, count, pfns, types, &nr_pages, &bpfns);
+    if ( rc )
+    {
+        ERROR("Failed to filter mfns for batch of %u pages", count);
+        goto err;
+    }
+
+    /* Nothing to do? */
+    if ( nr_pages == 0 )
+        goto done;
+
+    /* Fully evict all backed pages in the batch. */
+    for ( i = 0; i < nr_pages; ++i )
+    {
+        bpfn = bpfns[i];
+        rc = -1;
+
+        /* We should never see the same pfn twice at this stage.  */
+        if ( !postcopy_pfn_invalid(ctx, bpfn) )
+        {
+            ERROR("Duplicate postcopy pfn %"PRI_xen_pfn, bpfn);
+            goto err;
+        }
+
+        /* We now consider this pfn 'outstanding' - pending, and not yet
+         * requested. */
+        mark_postcopy_pfn_outstanding(ctx, bpfn);
+        ++paging->nr_pending_pfns;
+
+        /* Neither nomination nor eviction can be permitted to fail - the guest
+         * isn't yet running, so a failure would imply a foreign or hypervisor
+         * mapping on the page, and that would be bogus because the migration
+         * isn't yet complete. */
+        rc = xc_mem_paging_nominate(xch, ctx->domid, bpfn);
+        if ( rc < 0 )
+        {
+            PERROR("Error nominating postcopy pfn %"PRI_xen_pfn, bpfn);
+            goto err;
+        }
+
+        rc = xc_mem_paging_evict(xch, ctx->domid, bpfn);
+        if ( rc < 0 )
+        {
+            PERROR("Error evicting postcopy pfn %"PRI_xen_pfn, bpfn);
+            goto err;
+        }
+    }
+
+ done:
+    rc = 0;
+
+ err:
+    free(bpfns);
+
+    return rc;
+}
+
+static int handle_postcopy_pfns(struct xc_sr_context *ctx,
+                                struct xc_sr_record *rec)
+{
+    xc_interface *xch = ctx->xch;
+    struct xc_sr_rec_pages_header *pages = rec->data;
+    unsigned pages_of_data;
+    int rc;
+    xen_pfn_t *pfns = NULL;
+    uint32_t *types = NULL;
+
+    /* Sanity-check the migration stream. */
+    if ( !ctx->restore.paging.ready )
+    {
+        ERROR("Received POSTCOPY_PFNS record before POSTCOPY_PFNS_BEGIN");
+        rc = -1;
+        goto err;
+    }
+
+    rc = validate_pages_record(ctx, rec, REC_TYPE_POSTCOPY_PFNS);
+    if ( rc )
+        goto err;
+
+    rc = decode_pages_record(ctx, pages, &pfns, &types, &pages_of_data);
+    if ( rc )
+        goto err;
+
+    if ( rec->length != (sizeof(*pages) + (sizeof(uint64_t) * pages->count)) )
+    {
+        ERROR("POSTCOPY_PFNS record wrong size: length %u, expected "
+              "%zu + %zu", rec->length, sizeof(*pages),
+              (sizeof(uint64_t) * pages->count));
+        rc = -1;
+        goto err;
+    }
+
+    rc = process_postcopy_pfns(ctx, pages->count, pfns, types);
+
+ err:
+    free(types);
+    free(pfns);
+
+    return rc;
+}
+
+static int handle_postcopy_transition(struct xc_sr_context *ctx)
+{
+    int rc;
+    xc_interface *xch = ctx->xch;
+    void *data = ctx->restore.callbacks->data;
+
+    /* Sanity-check the migration stream. */
+    if ( !ctx->restore.paging.ready )
+    {
+        ERROR("Received POSTCOPY_TRANSITION record before POSTCOPY_PFNS_BEGIN");
+        return -1;
+    }
+
+    rc = ctx->restore.ops.stream_complete(ctx);
+    if ( rc )
+        return rc;
+
+    ctx->restore.callbacks->restore_results(ctx->restore.xenstore_gfn,
+                                            ctx->restore.console_gfn,
+                                            data);
+
+    /* Asynchronously resume the guest.  We'll return when we've been handed
+     * back control of the stream, so that we can begin filling in the
+     * outstanding postcopy page data and forwarding guest requests for specific
+     * pages. */
+    IPRINTF("Postcopy transition: resuming guest");
+    return ctx->restore.callbacks->postcopy_transition(data) ? 0 : -1;
+}
+
+static int postcopy_load_page(struct xc_sr_context *ctx, xen_pfn_t pfn,
+                              void *page_data)
+{
+    int rc;
+    unsigned i;
+    xc_interface *xch = ctx->xch;
+    struct xc_sr_restore_paging *paging = &ctx->restore.paging;
+    struct xc_sr_pending_postcopy_request *preq;
+    vm_event_response_t rsp;
+    vm_event_back_ring_t *back_ring = &paging->back_ring;
+
+    assert(postcopy_pfn_outstanding(ctx, pfn) ||
+           postcopy_pfn_requested(ctx, pfn));
+
+    memcpy(paging->buffer, page_data, PAGE_SIZE);
+    rc = xc_mem_paging_load(ctx->xch, ctx->domid, pfn, paging->buffer);
+    if ( rc < 0 )
+    {
+        PERROR("Failed to paging load pfn %"PRI_xen_pfn, pfn);
+        return rc;
+    }
+
+    if ( postcopy_pfn_requested(ctx, pfn) )
+    {
+        for ( i = 0; i < RING_SIZE(back_ring); ++i )
+        {
+            preq = &paging->pending_requests[i];
+            if ( preq->pfn != pfn )
+                continue;
+
+            /* Put the response on the ring. */
+            rsp = (vm_event_response_t)
+            {
+                .version = VM_EVENT_INTERFACE_VERSION,
+                .vcpu_id = preq->vcpu_id,
+                .flags   = (preq->flags & VM_EVENT_FLAG_VCPU_PAUSED),
+                .reason  = VM_EVENT_REASON_MEM_PAGING,
+                .u       = { .mem_paging = { .gfn = pfn } }
+            };
+
+            memcpy(RING_GET_RESPONSE(back_ring, back_ring->rsp_prod_pvt),
+                   &rsp, sizeof(rsp));
+            ++back_ring->rsp_prod_pvt;
+
+            /* And free the pending request slot. */
+            preq->pfn = INVALID_PFN;
+        }
+    }
+
+    --paging->nr_pending_pfns;
+    mark_postcopy_pfn_ready(ctx, pfn);
+    return 0;
+}
+
+static int process_postcopy_page_data(struct xc_sr_context *ctx, unsigned count,
+                                      xen_pfn_t *pfns, uint32_t *types,
+                                      void *page_data)
+{
+    int rc;
+    unsigned i;
+    xc_interface *xch = ctx->xch;
+    struct xc_sr_restore_paging *paging = &ctx->restore.paging;
+    bool push_responses = false;
+
+    for ( i = 0; i < count; ++i )
+    {
+        switch ( types[i] )
+        {
+        case XEN_DOMCTL_PFINFO_XTAB:
+        case XEN_DOMCTL_PFINFO_BROKEN:
+        case XEN_DOMCTL_PFINFO_XALLOC:
+            ERROR("Received postcopy pfn %"PRI_xen_pfn
+                  " with invalid type %"PRIu32, pfns[i], types[i]);
+            return -1;
+        default:
+            if ( postcopy_pfn_invalid(ctx, pfns[i]) )
+            {
+                ERROR("Expected pfn %"PRI_xen_pfn" to be invalid", pfns[i]);
+                return -1;
+            }
+            else if ( postcopy_pfn_ready(ctx, pfns[i]) )
+            {
+                ERROR("pfn %"PRI_xen_pfn" already received", pfns[i]);
+                return -1;
+            }
+            else if ( postcopy_pfn_dropped(ctx, pfns[i]) )
+            {
+                /* Nothing to do - move on to the next page. */
+                page_data += PAGE_SIZE;
+            }
+            else
+            {
+                if ( postcopy_pfn_requested(ctx, pfns[i]) )
+                {
+                    DBGPRINTF("Received requested pfn %"PRI_xen_pfn, pfns[i]);
+                    push_responses = true;
+                }
+
+                rc = postcopy_load_page(ctx, pfns[i], page_data);
+                if ( rc )
+                    return rc;
+
+                page_data += PAGE_SIZE;
+            }
+            break;
+        }
+    }
+
+    if ( push_responses )
+    {
+        /* We put at least one response on the ring as a result of processing
+         * this batch of pages, so we need to push them and kick the ring event
+         * channel. */
+        RING_PUSH_RESPONSES(&paging->back_ring);
+
+        rc = xenevtchn_notify(paging->xce_handle, paging->port);
+        if ( rc )
+        {
+            ERROR("Failed to notify paging event channel");
+            return rc;
+        }
+    }
+
+    return 0;
+}
+
+static int handle_postcopy_page_data(struct xc_sr_context *ctx,
+                                     struct xc_sr_record *rec)
+{
+    xc_interface *xch = ctx->xch;
+    struct xc_sr_rec_pages_header *pages = rec->data;
+    unsigned pages_of_data;
+    int rc = -1;
+
+    xen_pfn_t *pfns = NULL;
+    uint32_t *types = NULL;
+
+    rc = validate_pages_record(ctx, rec, REC_TYPE_POSTCOPY_PAGE_DATA);
+    if ( rc )
+        goto err;
+
+    rc = decode_pages_record(ctx, pages, &pfns, &types, &pages_of_data);
+    if ( rc )
+        goto err;
+
+    if ( rec->length != (sizeof(*pages) +
+                         (sizeof(uint64_t) * pages->count) +
+                         (PAGE_SIZE * pages_of_data)) )
+    {
+        ERROR("POSTCOPY_PAGE_DATA record wrong size: length %u, expected "
+              "%zu + %zu + %lu", rec->length, sizeof(*pages),
+              (sizeof(uint64_t) * pages->count), (PAGE_SIZE * pages_of_data));
+        rc = -1;
+        goto err;
+    }
+
+    rc = process_postcopy_page_data(ctx, pages->count, pfns, types,
+                                    &pages->pfn[pages->count]);
+ err:
+    free(types);
+    free(pfns);
+
+    return rc;
+}
+
+static int forward_postcopy_paging_requests(struct xc_sr_context *ctx,
+                                            unsigned nr_batch_requests)
+{
+    struct xc_sr_restore_paging *paging = &ctx->restore.paging;
+    size_t batchsz = nr_batch_requests * sizeof(*paging->request_batch);
+    struct xc_sr_rec_pages_header phdr =
+    {
+        .count = nr_batch_requests
+    };
+    struct xc_sr_record rec =
+    {
+        .type   = REC_TYPE_POSTCOPY_FAULT,
+        .length = sizeof(phdr),
+        .data   = &phdr
+    };
+
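+    /* The pfn batch itself is carried as the record body, written out
+     * immediately after the record header. */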
+    return write_split_record(ctx, ctx->restore.send_back_fd, &rec,
+                              paging->request_batch, batchsz);
+}
+
+static int handle_postcopy_paging_requests(struct xc_sr_context *ctx)
+{
+    int rc;
+    xc_interface *xch = ctx->xch;
+    struct xc_sr_restore_paging *paging = &ctx->restore.paging;
+    struct xc_sr_pending_postcopy_request *preq;
+    vm_event_back_ring_t *back_ring = &paging->back_ring;
+    vm_event_request_t req;
+    vm_event_response_t rsp;
+    xen_pfn_t pfn;
+    bool put_responses = false, drop_requested;
+    unsigned i, nr_batch_requests = 0;
+
+    while ( RING_HAS_UNCONSUMED_REQUESTS(back_ring) )
+    {
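+        /* Take a private copy of the request before acting on it, rather
+         * than reading it in place on the shared ring. */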
+        RING_COPY_REQUEST(back_ring, back_ring->req_cons, &req);
+        ++back_ring->req_cons;
+
+        drop_requested = !!(req.u.mem_paging.flags & MEM_PAGING_DROP_PAGE);
+        pfn = req.u.mem_paging.gfn;
+
+        DBGPRINTF("Postcopy page fault! %"PRI_xen_pfn, pfn);
+
+        if ( postcopy_pfn_invalid(ctx, pfn) )
+        {
+            ERROR("pfn %"PRI_xen_pfn" does not need to be migrated", pfn);
+            rc = -1;
+            goto err;
+        }
+        else if ( postcopy_pfn_ready(ctx, pfn) || drop_requested )
+        {
+            if ( drop_requested )
+            {
+                if ( postcopy_pfn_outstanding(ctx, pfn) )
+                {
+                    mark_postcopy_pfn_dropped(ctx, pfn);
+                    --paging->nr_pending_pfns;
+                }
+                else
+                {
+                    ERROR("Pager requesting we drop non-paged "
+                          "(or previously-requested) pfn %"PRI_xen_pfn, pfn);
+                    rc = -1;
+                    goto err;
+                }
+            }
+
+            /* This page has already been loaded (or has been dropped), so we can
+             * respond immediately. */
+            rsp = (vm_event_response_t)
+            {
+                .version = VM_EVENT_INTERFACE_VERSION,
+                .vcpu_id = req.vcpu_id,
+                .flags   = (req.flags & VM_EVENT_FLAG_VCPU_PAUSED),
+                .reason  = VM_EVENT_REASON_MEM_PAGING,
+                .u       = { .mem_paging = { .gfn = pfn } }
+            };
+
+            memcpy(RING_GET_RESPONSE(back_ring, back_ring->rsp_prod_pvt),
+                   &rsp, sizeof(rsp));
+            ++back_ring->rsp_prod_pvt;
+
+            put_responses = true;
+        }
+        else /* implies not dropped AND either outstanding or requested */
+        {
+            if ( postcopy_pfn_outstanding(ctx, pfn) )
+            {
+                /* This is the first time this pfn has been requested. */
+                mark_postcopy_pfn_requested(ctx, pfn);
+
+                paging->request_batch[nr_batch_requests] = pfn;
+                ++nr_batch_requests;
+            }
+
+            /* Find a free pending_requests slot. */
+            for ( i = 0; i < RING_SIZE(back_ring); ++i )
+            {
+                preq = &paging->pending_requests[i];
+                if ( preq->pfn == INVALID_PFN )
+                {
+                    /* Claim this slot. */
+                    preq->pfn = pfn;
+
+                    preq->flags = req.flags;
+                    preq->vcpu_id = req.vcpu_id;
+                    break;
+                }
+            }
+
+            /* We _must_ find a free slot - there cannot be more outstanding
+             * requests than there are slots in the ring. */
+            assert(i < RING_SIZE(back_ring));
+        }
+    }
+
+    if ( put_responses )
+    {
+        RING_PUSH_RESPONSES(back_ring);
+
+        rc = xenevtchn_notify(paging->xce_handle, paging->port);
+        if ( rc )
+        {
+            ERROR("Failed to notify paging event channel");
+            goto err;
+        }
+    }
+
+    if ( nr_batch_requests )
+    {
+        rc = forward_postcopy_paging_requests(ctx, nr_batch_requests);
+        if ( rc )
+        {
+            ERROR("Failed to forward postcopy paging requests");
+            goto err;
+        }
+    }
+
+    rc = 0;
+
+ err:
+    return rc;
+}
+
+static int write_postcopy_complete_record(struct xc_sr_context *ctx)
+{
+    struct xc_sr_record rec = { REC_TYPE_POSTCOPY_COMPLETE };
+
+    return write_record(ctx, ctx->restore.send_back_fd, &rec);
+}
+
+static int postcopy_restore(struct xc_sr_context *ctx)
+{
+    int rc;
+    int recv_fd = ctx->fd;
+    int old_flags;
+    int port;
+    xc_interface *xch = ctx->xch;
+    struct xc_sr_restore_paging *paging = &ctx->restore.paging;
+    struct xc_sr_read_record_context rrctx;
+    struct xc_sr_record rec = { 0, 0, NULL };
+    struct pollfd pfds[] =
+    {
+        { .fd = xenevtchn_fd(paging->xce_handle), .events = POLLIN },
+        { .fd = recv_fd,                          .events = POLLIN }
+    };
+
+    assert(ctx->restore.postcopy);
+    assert(paging->xce_handle);
+
+    read_record_init(&rrctx, ctx);
+
+    /* For the duration of the postcopy loop, configure the receive stream as
+     * non-blocking. */
+    old_flags = fcntl(recv_fd, F_GETFL);
+    if ( old_flags == -1 )
+    {
+        rc = old_flags;
+        goto err;
+    }
+
+    assert(!(old_flags & O_NONBLOCK));
+
+    rc = fcntl(recv_fd, F_SETFL, old_flags | O_NONBLOCK);
+    if ( rc == -1 )
+    {
+        goto err;
+    }
+
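+    /* Main postcopy event loop: alternate between draining newly-arrived page
+     * data from the migration stream and servicing fault notifications from
+     * the paging ring, until no postcopy pfns remain pending. */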
+    while ( paging->nr_pending_pfns )
+    {
+        rc = poll(pfds, ARRAY_SIZE(pfds), -1);
+        if ( rc < 0 )
+        {
+            if ( errno == EINTR )
+                continue;
+
+            PERROR("Failed to poll the pager event channel/restore stream");
+            goto err;
+        }
+
+        /* Fill in any newly received page data first, on the off chance that
+         * new pager requests are for that data. */
+        if ( rc && pfds[1].revents & POLLIN )
+        {
+            rc = try_read_record(&rrctx, recv_fd, &rec);
+            if ( rc && (errno != EAGAIN) && (errno != EWOULDBLOCK) )
+            {
+                goto err;
+            }
+            else if ( !rc )
+            {
+                read_record_destroy(&rrctx);
+                read_record_init(&rrctx, ctx);
+
+                rc = handle_postcopy_page_data(ctx, &rec);
+                if ( rc )
+                    goto err;
+
+                free(rec.data);
+                rec.data = NULL;
+            }
+        }
+
+        if ( rc && pfds[0].revents & POLLIN )
+        {
+            port = xenevtchn_pending(paging->xce_handle);
+            if ( port == -1 )
+            {
+                ERROR("Failed to read port from pager event channel");
+                rc = -1;
+                goto err;
+            }
+
+            rc = xenevtchn_unmask(paging->xce_handle, port);
+            if ( rc != 0 )
+            {
+                ERROR("Failed to unmask pager event channel port");
+                goto err;
+            }
+
+            rc = handle_postcopy_paging_requests(ctx);
+            if ( rc )
+                goto err;
+        }
+    }
+
+    /* At this point, all outstanding postcopy pages have been loaded.  We now
+     * need only flush any outstanding requests that may have accumulated in the
+     * ring while we were processing the final POSTCOPY_PAGE_DATA records. */
+    rc = handle_postcopy_paging_requests(ctx);
+    if ( rc )
+        goto err;
+
+    rc = write_postcopy_complete_record(ctx);
+    if ( rc )
+        goto err;
+
+    /* End-of-stream synchronization: make the receive stream blocking again,
+     * and wait to receive what must be the END record. */
+    rc = fcntl(recv_fd, F_SETFL, old_flags);
+    if ( rc == -1 )
+        goto err;
+
+    rc = read_record(ctx, recv_fd, &rec);
+    if ( rc )
+    {
+        goto err;
+    }
+    else if ( rec.type != REC_TYPE_END )
+    {
+        ERROR("Expected end of stream, received %s", rec_type_to_str(rec.type));
+        rc = -1;
+        goto err;
+    }
+
+ err:
+    /* If _we_ fail here, we can't safely synchronize with the completion of
+     * domain resumption because it might be waiting for us (to fulfill a pager
+     * request).  Since we therefore can't know whether or not the domain was
+     * unpaused, just abruptly bail and let the sender assume the worst. */
+    free(rec.data);
+    read_record_destroy(&rrctx);
+
+    return rc;
+}
+
+/*
  * Send checkpoint dirty pfn list to primary.
  */
 static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx)
@@ -702,6 +1536,25 @@ static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
         rc = handle_checkpoint(ctx);
         break;
 
+    case REC_TYPE_POSTCOPY_BEGIN:
+        if ( ctx->restore.postcopy )
+            rc = -1;
+        else
+            ctx->restore.postcopy = true;
+        break;
+
+    case REC_TYPE_POSTCOPY_PFNS_BEGIN:
+        rc = postcopy_paging_setup(ctx);
+        break;
+
+    case REC_TYPE_POSTCOPY_PFNS:
+        rc = handle_postcopy_pfns(ctx, rec);
+        break;
+
+    case REC_TYPE_POSTCOPY_TRANSITION:
+        rc = handle_postcopy_transition(ctx);
+        break;
+
     default:
         rc = ctx->restore.ops.process_record(ctx, rec);
         break;
@@ -774,6 +1627,10 @@ static void cleanup(struct xc_sr_context *ctx)
     if ( ctx->restore.checkpointed == XC_MIG_STREAM_COLO )
         xc_hypercall_buffer_free_pages(xch, dirty_bitmap,
                                    NRPAGES(bitmap_size(ctx->restore.p2m_size)));
+
+    if ( ctx->restore.postcopy )
+        postcopy_paging_cleanup(ctx);
+
     free(ctx->restore.buffered_records);
     free(ctx->restore.populated_pfns);
     if ( ctx->restore.ops.cleanup(ctx) )
@@ -836,7 +1693,8 @@ static int restore(struct xc_sr_context *ctx)
                 goto err;
         }
 
-    } while ( rec.type != REC_TYPE_END );
+    } while ( rec.type != REC_TYPE_END &&
+              rec.type != REC_TYPE_POSTCOPY_TRANSITION );
 
  remus_failover:
 
@@ -847,6 +1705,14 @@ static int restore(struct xc_sr_context *ctx)
         IPRINTF("COLO Failover");
         goto done;
     }
+    else if ( ctx->restore.postcopy )
+    {
+        rc = postcopy_restore(ctx);
+        if ( rc )
+            goto err;
+
+        goto done;
+    }
 
     /*
      * With Remus, if we reach here, there must be some error on primary,
diff --git a/tools/libxc/xc_sr_restore_x86_hvm.c b/tools/libxc/xc_sr_restore_x86_hvm.c
index 49d22c7..7be3218 100644
--- a/tools/libxc/xc_sr_restore_x86_hvm.c
+++ b/tools/libxc/xc_sr_restore_x86_hvm.c
@@ -27,6 +27,27 @@ static int handle_hvm_context(struct xc_sr_context *ctx,
     return 0;
 }
 
+static int handle_hvm_magic_page(struct xc_sr_context *ctx,
+                                 struct xc_sr_rec_hvm_params_entry *entry)
+{
+    int rc;
+    xen_pfn_t pfn = entry->value;
+
+    if ( ctx->restore.postcopy )
+    {
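+        /* During a postcopy restore the magic pages may not have been
+         * populated yet (their contents may still be in transit), so they
+         * must be explicitly populated before being cleared or used. */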
+        rc = populate_pfns(ctx, 1, &pfn, NULL);
+        if ( rc )
+            return rc;
+    }
+
+    if ( entry->index != HVM_PARAM_PAGING_RING_PFN )
+    {
+        xc_clear_domain_page(ctx->xch, ctx->domid, pfn);
+    }
+
+    return 0;
+}
+
 /*
  * Process an HVM_PARAMS record from the stream.
  */
@@ -52,18 +73,29 @@ static int handle_hvm_params(struct xc_sr_context *ctx,
         {
         case HVM_PARAM_CONSOLE_PFN:
             ctx->restore.console_gfn = entry->value;
-            xc_clear_domain_page(xch, ctx->domid, entry->value);
+            rc = handle_hvm_magic_page(ctx, entry);
             break;
         case HVM_PARAM_STORE_PFN:
             ctx->restore.xenstore_gfn = entry->value;
-            xc_clear_domain_page(xch, ctx->domid, entry->value);
+            rc = handle_hvm_magic_page(ctx, entry);
+            break;
+        case HVM_PARAM_PAGING_RING_PFN:
+            ctx->restore.paging_ring_gfn = entry->value;
+            rc = handle_hvm_magic_page(ctx, entry);
             break;
         case HVM_PARAM_IOREQ_PFN:
         case HVM_PARAM_BUFIOREQ_PFN:
-            xc_clear_domain_page(xch, ctx->domid, entry->value);
+            rc = handle_hvm_magic_page(ctx, entry);
             break;
         }
 
+        if ( rc )
+        {
+            PERROR("populate/clear magic HVM page %"PRId64" = 0x%016"PRIx64,
+                   entry->index, entry->value);
+            return rc;
+        }
+
         rc = xc_hvm_param_set(xch, ctx->domid, entry->index, entry->value);
         if ( rc < 0 )
         {
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index b65c971..8f4af0a 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -745,6 +745,8 @@ static void domcreate_bootloader_done(libxl__egc *egc,
                                       libxl__bootloader_state *bl,
                                       int rc);
 
+static void domcreate_postcopy_transition_callback(void *user);
+
 static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *aodevs,
                                 int ret);
 
@@ -1097,6 +1099,11 @@ static void domcreate_bootloader_done(libxl__egc *egc,
             libxl__remus_restore_setup(egc, dcs);
             /* fall through */
         case LIBXL_CHECKPOINTED_STREAM_NONE:
+            /* When the restore helper initiates the postcopy transition, pick
+             * up in domcreate_postcopy_transition_callback() */
+            callbacks->postcopy_transition =
+                domcreate_postcopy_transition_callback;
+
             libxl__stream_read_start(egc, &dcs->srs);
         }
         return;
@@ -1106,6 +1113,14 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     domcreate_stream_done(egc, &dcs->srs, rc);
 }
 
+/* ----- postcopy live migration ----- */
+
+static void domcreate_postcopy_transition_callback(void *user)
+{
+    /* XXX we're not ready to deal with this yet */
+    assert(0);
+}
+
 void libxl__srm_callout_callback_restore_results(xen_pfn_t store_mfn,
           xen_pfn_t console_mfn, void *user)
 {
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index 5647b97..7f59e03 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -34,7 +34,7 @@ our @msgs = (
     [  9, 'srW',    "complete",              [qw(int retval
                                                  int errnoval)] ],
     [ 10, 'scxW',   "precopy_policy", ['struct precopy_stats', 'stats'] ],
-    [ 11, 'scxA',   "postcopy_transition", [] ]
+    [ 11, 'srcxA',  "postcopy_transition", [] ]
 );
 
 #----------------------------------------
-- 
2.7.4



* [PATCH RFC 16/20] libxl/libxl_stream_write.c: track callback chains with an explicit phase
  2017-03-27  9:06 [PATCH RFC 00/20] Add postcopy live migration support Joshua Otto
                   ` (14 preceding siblings ...)
  2017-03-27  9:06 ` [PATCH RFC 15/20] libxc/migration: implement the receiver " Joshua Otto
@ 2017-03-27  9:06 ` Joshua Otto
  2017-03-27  9:06 ` [PATCH RFC 17/20] libxl/libxl_stream_read.c: " Joshua Otto
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 53+ messages in thread
From: Joshua Otto @ 2017-03-27  9:06 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, Joshua Otto,
	imhy.yang, hjarmstr

There are three callback chains through libxl_stream_write: the 'normal'
straight-through save path initiated by libxl__stream_write_start(), the
iterated checkpoint path initiated each time by
libxl__stream_write_start_checkpoint(), and the (short) back-channel
checkpoint path initiated by libxl__stream_write_checkpoint_state().
These paths share significant common code but handle failure and
completion slightly differently, so it is necessary to keep track of
the callback chain currently in progress and act accordingly at various
points.

Until now, a collection of booleans in the stream write state has been
used to indicate the current callback chain.  However, the set of
callback chains is really better described by an enum, since only one
callback chain can actually be active at one time.  In anticipation of
the addition of a new chain for postcopy live migration, refactor the
existing logic to use an enum rather than booleans for callback chain
tracking.

No functional change.

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
---
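Not part of the patch - a reviewer's sketch of the pattern being adopted.
The enum values are borrowed from the real stream state, but the helper is
hypothetical: a single phase enum cannot represent impossible states such
as in_checkpoint and in_checkpoint_state both being true, and a switch
over it makes the per-chain behaviour explicit.

    #include <stddef.h>

    enum sws_phase {
        SWS_PHASE_NORMAL,
        SWS_PHASE_CHECKPOINT,
        SWS_PHASE_CHECKPOINT_STATE
    };

    /* Hypothetical helper: which end record (if any) closes each phase? */
    static const char *phase_end_record_name(enum sws_phase phase)
    {
        switch (phase) {
        case SWS_PHASE_NORMAL:           return "end record";
        case SWS_PHASE_CHECKPOINT:       return "checkpoint end record";
        case SWS_PHASE_CHECKPOINT_STATE: return NULL; /* no end record */
        }
        return NULL; /* unreachable */
    }
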
 tools/libxl/libxl_internal.h     |  7 ++-
 tools/libxl/libxl_stream_write.c | 96 ++++++++++++++++++----------------------
 2 files changed, 48 insertions(+), 55 deletions(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 45d607a..e99d2ef 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3201,9 +3201,12 @@ struct libxl__stream_write_state {
     /* Private */
     int rc;
     bool running;
-    bool in_checkpoint;
+    enum {
+        SWS_PHASE_NORMAL,
+        SWS_PHASE_CHECKPOINT,
+        SWS_PHASE_CHECKPOINT_STATE
+    } phase;
     bool sync_teardown;  /* Only used to coordinate shutdown on error path. */
-    bool in_checkpoint_state;
     libxl__save_helper_state shs;
 
     /* Main stream-writing data. */
diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
index c96a6a2..8f2a1c9 100644
--- a/tools/libxl/libxl_stream_write.c
+++ b/tools/libxl/libxl_stream_write.c
@@ -89,12 +89,9 @@ static void emulator_context_read_done(libxl__egc *egc,
                                        int rc, int onwrite, int errnoval);
 static void emulator_context_record_done(libxl__egc *egc,
                                          libxl__stream_write_state *stream);
-static void write_end_record(libxl__egc *egc,
-                             libxl__stream_write_state *stream);
+static void write_phase_end_record(libxl__egc *egc,
+                                   libxl__stream_write_state *stream);
 
-/* Event chain unique to checkpointed streams. */
-static void write_checkpoint_end_record(libxl__egc *egc,
-                                        libxl__stream_write_state *stream);
 static void checkpoint_end_record_done(libxl__egc *egc,
                                        libxl__stream_write_state *stream);
 
@@ -213,7 +210,7 @@ void libxl__stream_write_init(libxl__stream_write_state *stream)
 
     stream->rc = 0;
     stream->running = false;
-    stream->in_checkpoint = false;
+    stream->phase = SWS_PHASE_NORMAL;
     stream->sync_teardown = false;
     FILLZERO(stream->dc);
     stream->record_done_callback = NULL;
@@ -294,9 +291,9 @@ void libxl__stream_write_start_checkpoint(libxl__egc *egc,
                                           libxl__stream_write_state *stream)
 {
     assert(stream->running);
-    assert(!stream->in_checkpoint);
+    assert(stream->phase == SWS_PHASE_NORMAL);
     assert(!stream->back_channel);
-    stream->in_checkpoint = true;
+    stream->phase = SWS_PHASE_CHECKPOINT;
 
     write_emulator_xenstore_record(egc, stream);
 }
@@ -431,12 +428,8 @@ static void emulator_xenstore_record_done(libxl__egc *egc,
 
     if (dss->type == LIBXL_DOMAIN_TYPE_HVM)
         write_emulator_context_record(egc, stream);
-    else {
-        if (stream->in_checkpoint)
-            write_checkpoint_end_record(egc, stream);
-        else
-            write_end_record(egc, stream);
-    }
+    else
+        write_phase_end_record(egc, stream);
 }
 
 static void write_emulator_context_record(libxl__egc *egc,
@@ -534,34 +527,35 @@ static void emulator_context_record_done(libxl__egc *egc,
     free(stream->emu_body);
     stream->emu_body = NULL;
 
-    if (stream->in_checkpoint)
-        write_checkpoint_end_record(egc, stream);
-    else
-        write_end_record(egc, stream);
+    write_phase_end_record(egc, stream);
 }
 
-static void write_end_record(libxl__egc *egc,
-                             libxl__stream_write_state *stream)
+static void write_phase_end_record(libxl__egc *egc,
+                                   libxl__stream_write_state *stream)
 {
     struct libxl__sr_rec_hdr rec;
+    sws_record_done_cb cb;
+    const char *what;
 
     FILLZERO(rec);
-    rec.type = REC_TYPE_END;
-
-    setup_write(egc, stream, "end record",
-                &rec, NULL, stream_success);
-}
-
-static void write_checkpoint_end_record(libxl__egc *egc,
-                                        libxl__stream_write_state *stream)
-{
-    struct libxl__sr_rec_hdr rec;
 
-    FILLZERO(rec);
-    rec.type = REC_TYPE_CHECKPOINT_END;
+    switch (stream->phase) {
+    case SWS_PHASE_NORMAL:
+        rec.type = REC_TYPE_END;
+        what     = "end record";
+        cb       = stream_success;
+        break;
+    case SWS_PHASE_CHECKPOINT:
+        rec.type = REC_TYPE_CHECKPOINT_END;
+        what     = "checkpoint end record";
+        cb       = checkpoint_end_record_done;
+        break;
+    default:
+        /* SWS_PHASE_CHECKPOINT_STATE has no end record */
+        assert(false);
+    }
 
-    setup_write(egc, stream, "checkpoint end record",
-                &rec, NULL, checkpoint_end_record_done);
+    setup_write(egc, stream, what, &rec, NULL, cb);
 }
 
 static void checkpoint_end_record_done(libxl__egc *egc,
@@ -582,21 +576,20 @@ static void stream_complete(libxl__egc *egc,
 {
     assert(stream->running);
 
-    if (stream->in_checkpoint) {
+    switch (stream->phase) {
+    case SWS_PHASE_NORMAL:
+        stream_done(egc, stream, rc);
+        break;
+    case SWS_PHASE_CHECKPOINT:
         assert(rc);
-
         /*
          * If an error is encountered while in a checkpoint, pass it
          * back to libxc.  The failure will come back around to us via
          * libxl__xc_domain_save_done()
          */
         checkpoint_done(egc, stream, rc);
-        return;
-    }
-
-    if (stream->in_checkpoint_state) {
-        assert(rc);
-
+        break;
+    case SWS_PHASE_CHECKPOINT_STATE:
         /*
          * If an error is encountered while in a checkpoint, pass it
          * back to libxc.  The failure will come back around to us via
@@ -606,17 +599,15 @@ static void stream_complete(libxl__egc *egc,
          *    libxl__stream_write_abort()
          */
         checkpoint_state_done(egc, stream, rc);
-        return;
+        break;
     }
-
-    stream_done(egc, stream, rc);
 }
 
 static void stream_done(libxl__egc *egc,
                         libxl__stream_write_state *stream, int rc)
 {
     assert(stream->running);
-    assert(!stream->in_checkpoint_state);
+    assert(stream->phase != SWS_PHASE_CHECKPOINT_STATE);
     stream->running = false;
 
     if (stream->emu_carefd)
@@ -640,9 +631,9 @@ static void checkpoint_done(libxl__egc *egc,
                             libxl__stream_write_state *stream,
                             int rc)
 {
-    assert(stream->in_checkpoint);
+    assert(stream->phase == SWS_PHASE_CHECKPOINT);
 
-    stream->in_checkpoint = false;
+    stream->phase = SWS_PHASE_NORMAL;
     stream->checkpoint_callback(egc, stream, rc);
 }
 
@@ -699,9 +690,8 @@ void libxl__stream_write_checkpoint_state(libxl__egc *egc,
     struct libxl__sr_rec_hdr rec;
 
     assert(stream->running);
-    assert(!stream->in_checkpoint);
-    assert(!stream->in_checkpoint_state);
-    stream->in_checkpoint_state = true;
+    assert(stream->phase == SWS_PHASE_NORMAL);
+    stream->phase = SWS_PHASE_CHECKPOINT_STATE;
 
     FILLZERO(rec);
     rec.type = REC_TYPE_CHECKPOINT_STATE;
@@ -720,8 +710,8 @@ static void write_checkpoint_state_done(libxl__egc *egc,
 static void checkpoint_state_done(libxl__egc *egc,
                                   libxl__stream_write_state *stream, int rc)
 {
-    assert(stream->in_checkpoint_state);
-    stream->in_checkpoint_state = false;
+    assert(stream->phase == SWS_PHASE_CHECKPOINT_STATE);
+    stream->phase = SWS_PHASE_NORMAL;
     stream->checkpoint_callback(egc, stream, rc);
 }
 
-- 
2.7.4



* [PATCH RFC 17/20] libxl/libxl_stream_read.c: track callback chains with an explicit phase
  2017-03-27  9:06 [PATCH RFC 00/20] Add postcopy live migration support Joshua Otto
                   ` (15 preceding siblings ...)
  2017-03-27  9:06 ` [PATCH RFC 16/20] libxl/libxl_stream_write.c: track callback chains with an explicit phase Joshua Otto
@ 2017-03-27  9:06 ` Joshua Otto
  2017-03-27  9:06 ` [PATCH RFC 18/20] libxl/migration: implement the sender side of postcopy live migration Joshua Otto
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 53+ messages in thread
From: Joshua Otto @ 2017-03-27  9:06 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, Joshua Otto,
	imhy.yang, hjarmstr

As the previous patch did for libxl_stream_write, do the same for
libxl_stream_read.  libxl_stream_read already has a notion of phase for
its record-buffering behaviour - this is combined with the callback
chain phase.  Again, this is done to support the addition of a new
callback chain for postcopy live migration.

No functional change.

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
---
 tools/libxl/libxl_internal.h    |  7 ++--
 tools/libxl/libxl_stream_read.c | 83 +++++++++++++++++++++--------------------
 2 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index e99d2ef..c754706 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3123,9 +3123,7 @@ struct libxl__stream_read_state {
     /* Private */
     int rc;
     bool running;
-    bool in_checkpoint;
     bool sync_teardown; /* Only used to coordinate shutdown on error path. */
-    bool in_checkpoint_state;
     libxl__save_helper_state shs;
     libxl__conversion_helper_state chs;
 
@@ -3135,8 +3133,9 @@ struct libxl__stream_read_state {
     LIBXL_STAILQ_HEAD(, libxl__sr_record_buf) record_queue; /* NOGC */
     enum {
         SRS_PHASE_NORMAL,
-        SRS_PHASE_BUFFERING,
-        SRS_PHASE_UNBUFFERING,
+        SRS_PHASE_CHECKPOINT_BUFFERING,
+        SRS_PHASE_CHECKPOINT_UNBUFFERING,
+        SRS_PHASE_CHECKPOINT_STATE
     } phase;
     bool recursion_guard;
 
diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 89c2f21..4cb553e 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -29,14 +29,15 @@
  * processed, and all records will be processed in queue order.
  *
  * Internal states:
- *           running  phase       in_         record   incoming
- *                                checkpoint  _queue   _record
+ *           running  phase                   record   incoming
+ *                                            _queue   _record
  *
- * Undefined    undef  undef        undef       undef    undef
- * Idle         false  undef        false       0        0
- * Active       true   NORMAL       false       0/1      0/partial
- * Active       true   BUFFERING    true        any      0/partial
- * Active       true   UNBUFFERING  true        any      0
+ * Undefined    undef  undef                    undef    undef
+ * Idle         false  undef                    0        0
+ * Active       true   NORMAL                   0/1      0/partial
+ * Active       true   CHECKPOINT_BUFFERING     any      0/partial
+ * Active       true   CHECKPOINT_UNBUFFERING   any      0
+ * Active       true   CHECKPOINT_STATE         0/1      0/partial
  *
  * While reading data from the stream, 'dc' is active and a callback
  * is expected.  Most actions in process_record() start a callback of
@@ -48,12 +49,12 @@
  *   Records are read one at time and immediately processed.  (The
  *   record queue will not contain more than a single record.)
  *
- * PHASE_BUFFERING:
+ * PHASE_CHECKPOINT_BUFFERING:
  *   This phase is used in checkpointed streams, when libxc signals
  *   the presence of a checkpoint in the stream.  Records are read and
  *   buffered until a CHECKPOINT_END record has been read.
  *
- * PHASE_UNBUFFERING:
+ * PHASE_CHECKPOINT_UNBUFFERING:
  *   Once a CHECKPOINT_END record has been read, all buffered records
  *   are processed.
  *
@@ -172,6 +173,12 @@ static void checkpoint_state_done(libxl__egc *egc,
 
 /*----- Helpers -----*/
 
+static inline bool stream_in_checkpoint(libxl__stream_read_state *stream)
+{
+    return stream->phase == SRS_PHASE_CHECKPOINT_BUFFERING ||
+           stream->phase == SRS_PHASE_CHECKPOINT_UNBUFFERING;
+}
+
 /* Helper to set up reading some data from the stream. */
 static int setup_read(libxl__stream_read_state *stream,
                       const char *what, void *ptr, size_t nr_bytes,
@@ -210,7 +217,6 @@ void libxl__stream_read_init(libxl__stream_read_state *stream)
 
     stream->rc = 0;
     stream->running = false;
-    stream->in_checkpoint = false;
     stream->sync_teardown = false;
     FILLZERO(stream->dc);
     FILLZERO(stream->hdr);
@@ -297,10 +303,9 @@ void libxl__stream_read_start_checkpoint(libxl__egc *egc,
                                          libxl__stream_read_state *stream)
 {
     assert(stream->running);
-    assert(!stream->in_checkpoint);
+    assert(stream->phase == SRS_PHASE_NORMAL);
 
-    stream->in_checkpoint = true;
-    stream->phase = SRS_PHASE_BUFFERING;
+    stream->phase = SRS_PHASE_CHECKPOINT_BUFFERING;
 
     /*
      * Libxc has handed control of the fd to us.  Start reading some
@@ -392,6 +397,7 @@ static void stream_continue(libxl__egc *egc,
 
     switch (stream->phase) {
     case SRS_PHASE_NORMAL:
+    case SRS_PHASE_CHECKPOINT_STATE:
         /*
          * Normal phase (regular migration or restore from file):
          *
@@ -416,9 +422,9 @@ static void stream_continue(libxl__egc *egc,
         }
         break;
 
-    case SRS_PHASE_BUFFERING: {
+    case SRS_PHASE_CHECKPOINT_BUFFERING: {
         /*
-         * Buffering phase (checkpointed streams only):
+         * Buffering phase:
          *
          * logically:
          *   do { read_record(); } while ( not CHECKPOINT_END );
@@ -431,8 +437,6 @@ static void stream_continue(libxl__egc *egc,
         libxl__sr_record_buf *rec = LIBXL_STAILQ_LAST(
             &stream->record_queue, libxl__sr_record_buf, entry);
 
-        assert(stream->in_checkpoint);
-
         if (!rec || (rec->hdr.type != REC_TYPE_CHECKPOINT_END)) {
             setup_read_record(egc, stream);
             break;
@@ -442,19 +446,18 @@ static void stream_continue(libxl__egc *egc,
          * There are now some number of buffered records, with a
          * CHECKPOINT_END at the end. Start processing them all.
          */
-        stream->phase = SRS_PHASE_UNBUFFERING;
+        stream->phase = SRS_PHASE_CHECKPOINT_UNBUFFERING;
     }
         /* FALLTHROUGH */
-    case SRS_PHASE_UNBUFFERING:
+    case SRS_PHASE_CHECKPOINT_UNBUFFERING:
         /*
-         * Unbuffering phase (checkpointed streams only):
+         * Unbuffering phase:
          *
          * logically:
          *   do { process_record(); } while ( not CHECKPOINT_END );
          *
          * Process all records collected during the buffering phase.
          */
-        assert(stream->in_checkpoint);
 
         while (process_record(egc, stream))
             ; /*
@@ -625,7 +628,7 @@ static bool process_record(libxl__egc *egc,
         break;
 
     case REC_TYPE_CHECKPOINT_END:
-        if (!stream->in_checkpoint) {
+        if (!stream_in_checkpoint(stream)) {
             LOG(ERROR, "Unexpected CHECKPOINT_END record in stream");
             rc = ERROR_FAIL;
             goto err;
@@ -634,7 +637,7 @@ static bool process_record(libxl__egc *egc,
         break;
 
     case REC_TYPE_CHECKPOINT_STATE:
-        if (!stream->in_checkpoint_state) {
+        if (stream->phase != SRS_PHASE_CHECKPOINT_STATE) {
             LOG(ERROR, "Unexpected CHECKPOINT_STATE record in stream");
             rc = ERROR_FAIL;
             goto err;
@@ -743,7 +746,12 @@ static void stream_complete(libxl__egc *egc,
 {
     assert(stream->running);
 
-    if (stream->in_checkpoint) {
+    switch (stream->phase) {
+    case SRS_PHASE_NORMAL:
+        stream_done(egc, stream, rc);
+        break;
+    case SRS_PHASE_CHECKPOINT_BUFFERING:
+    case SRS_PHASE_CHECKPOINT_UNBUFFERING:
         assert(rc);
 
         /*
@@ -752,10 +760,8 @@ static void stream_complete(libxl__egc *egc,
          * libxl__xc_domain_restore_done()
          */
         checkpoint_done(egc, stream, rc);
-        return;
-    }
-
-    if (stream->in_checkpoint_state) {
+        break;
+    case SRS_PHASE_CHECKPOINT_STATE:
         assert(rc);
 
         /*
@@ -767,10 +773,8 @@ static void stream_complete(libxl__egc *egc,
          *    libxl__stream_read_abort()
          */
         checkpoint_state_done(egc, stream, rc);
-        return;
+        break;
     }
-
-    stream_done(egc, stream, rc);
 }
 
 static void checkpoint_done(libxl__egc *egc,
@@ -778,18 +782,17 @@ static void checkpoint_done(libxl__egc *egc,
 {
     int ret;
 
-    assert(stream->in_checkpoint);
+    assert(stream_in_checkpoint(stream));
 
     if (rc == 0)
         ret = XGR_CHECKPOINT_SUCCESS;
-    else if (stream->phase == SRS_PHASE_BUFFERING)
+    else if (stream->phase == SRS_PHASE_CHECKPOINT_BUFFERING)
         ret = XGR_CHECKPOINT_FAILOVER;
     else
         ret = XGR_CHECKPOINT_ERROR;
 
     stream->checkpoint_callback(egc, stream, ret);
 
-    stream->in_checkpoint = false;
     stream->phase = SRS_PHASE_NORMAL;
 }
 
@@ -799,8 +802,7 @@ static void stream_done(libxl__egc *egc,
     libxl__sr_record_buf *rec, *trec;
 
     assert(stream->running);
-    assert(!stream->in_checkpoint);
-    assert(!stream->in_checkpoint_state);
+    assert(stream->phase == SRS_PHASE_NORMAL);
     stream->running = false;
 
     if (stream->incoming_record)
@@ -955,9 +957,8 @@ void libxl__stream_read_checkpoint_state(libxl__egc *egc,
                                          libxl__stream_read_state *stream)
 {
     assert(stream->running);
-    assert(!stream->in_checkpoint);
-    assert(!stream->in_checkpoint_state);
-    stream->in_checkpoint_state = true;
+    assert(stream->phase == SRS_PHASE_NORMAL);
+    stream->phase = SRS_PHASE_CHECKPOINT_STATE;
 
     setup_read_record(egc, stream);
 }
@@ -965,8 +966,8 @@ void libxl__stream_read_checkpoint_state(libxl__egc *egc,
 static void checkpoint_state_done(libxl__egc *egc,
                                   libxl__stream_read_state *stream, int rc)
 {
-    assert(stream->in_checkpoint_state);
-    stream->in_checkpoint_state = false;
+    assert(stream->phase == SRS_PHASE_CHECKPOINT_STATE);
+    stream->phase = SRS_PHASE_NORMAL;
     stream->checkpoint_callback(egc, stream, rc);
 }
 
-- 
2.7.4



* [PATCH RFC 18/20] libxl/migration: implement the sender side of postcopy live migration
  2017-03-27  9:06 [PATCH RFC 00/20] Add postcopy live migration support Joshua Otto
                   ` (16 preceding siblings ...)
  2017-03-27  9:06 ` [PATCH RFC 17/20] libxl/libxl_stream_read.c: " Joshua Otto
@ 2017-03-27  9:06 ` Joshua Otto
  2017-03-27  9:06 ` [PATCH RFC 19/20] libxl/migration: implement the receiver " Joshua Otto
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 53+ messages in thread
From: Joshua Otto @ 2017-03-27  9:06 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, Joshua Otto,
	imhy.yang, hjarmstr

To make the libxl sender capable of supporting postcopy live migration:
- Add a postcopy transition callback chain through the stream writer (this
  callback chain is nearly identical to the checkpoint callback chain, and
  differs meaningfully only in its failure/completion behaviour)
- Wire this callback chain up to the xc postcopy callback entries in the domain
  save logic.
- Add parameters to libxl_domain_live_migrate() to permit bidirectional
  communication between the sender and receiver and enable the caller to reason
  about the safety of recovery from a postcopy failure.

No mechanism is introduced yet to enable library clients to induce a postcopy
live migration - this will follow after the libxl postcopy receiver logic.

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
---
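Not part of the patch - a sketch of how a caller might use the revised
prototype (mirroring the xl_migrate.c change below); ctx, domid, send_fd,
recv_fd, precopy_iterations and precopy_dirty_threshold are assumed to be
set up already, and error handling is elided.

    bool postcopy_transitioned = false;
    int rc = libxl_domain_live_migrate(ctx, domid, send_fd,
                                       LIBXL_SUSPEND_LIVE,
                                       precopy_iterations,
                                       precopy_dirty_threshold,
                                       recv_fd,
                                       &postcopy_transitioned,
                                       NULL /* ao_how: synchronous */);
    if (rc && postcopy_transitioned) {
        /* The domain may already have executed at the destination, so
         * resuming it locally is no longer a safe way to recover. */
    }
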
 docs/specs/libxl-migration-stream.pandoc | 19 ++++++++-
 tools/libxl/libxl.h                      |  4 +-
 tools/libxl/libxl_dom_save.c             | 25 +++++++++++-
 tools/libxl/libxl_domain.c               | 25 ++++++++----
 tools/libxl/libxl_internal.h             | 21 ++++++++--
 tools/libxl/libxl_sr_stream_format.h     | 13 +++---
 tools/libxl/libxl_stream_write.c         | 69 ++++++++++++++++++++++++++++++--
 tools/xl/xl_migrate.c                    |  5 ++-
 8 files changed, 155 insertions(+), 26 deletions(-)

diff --git a/docs/specs/libxl-migration-stream.pandoc b/docs/specs/libxl-migration-stream.pandoc
index a1ba1ac..8d00cd7 100644
--- a/docs/specs/libxl-migration-stream.pandoc
+++ b/docs/specs/libxl-migration-stream.pandoc
@@ -2,7 +2,8 @@
 % Andrew Cooper <<andrew.cooper3@citrix.com>>
   Wen Congyang <<wency@cn.fujitsu.com>>
   Yang Hongyang <<hongyang.yang@easystack.cn>>
-% Revision 2
+  Joshua Otto <<jtotto@uwaterloo.ca>>
+% Revision 3
 
 Introduction
 ============
@@ -123,7 +124,9 @@ type         0x00000000: END
 
              0x00000005: CHECKPOINT_STATE
 
-             0x00000006 - 0x7FFFFFFF: Reserved for future _mandatory_
+             0x00000006: POSTCOPY_TRANSITION_END
+
+             0x00000007 - 0x7FFFFFFF: Reserved for future _mandatory_
              records.
 
              0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
@@ -304,6 +307,18 @@ While Secondary is running in below loop:
     b. Send _CHECKPOINT\_SVM\_SUSPENDED_ to primary
 4. Checkpoint
 
+POSTCOPY\_TRANSITION\_END
+-------------------------
+
+A postcopy transition end record marks the end of a postcopy transition in a
+libxl live migration stream.  It indicates that control of the stream should be
+returned to libxc for the postcopy memory migration phase.
+
+     0     1     2     3     4     5     6     7 octet
+    +-------------------------------------------------+
+
+The postcopy transition end record contains no fields; its body_length is 0.
+
 Future Extensions
 =================
 
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 84ac96a..99d187b 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1375,10 +1375,12 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
 #define LIBXL_SUSPEND_DEBUG 1
 #define LIBXL_SUSPEND_LIVE 2
 
-int libxl_domain_live_migrate(libxl_ctx *ctx, uint32_t domid, int fd,
+int libxl_domain_live_migrate(libxl_ctx *ctx, uint32_t domid, int send_fd,
                               int flags, /* LIBXL_SUSPEND_* */
                               unsigned int precopy_iterations,
                               unsigned int precopy_dirty_threshold,
+                              int recv_fd,
+                              bool *postcopy_transitioned, /* OUT */
                               const libxl_asyncop_how *ao_how)
                               LIBXL_EXTERNAL_CALLERS_ONLY;
 
diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 4ef9ca5..9e565ae 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -349,10 +349,31 @@ static int libxl__save_live_migration_simple_precopy_policy(
     return XGS_POLICY_CONTINUE_PRECOPY;
 }
 
+static void postcopy_transition_done(libxl__egc *egc,
+                                     libxl__stream_write_state *sws, int rc);
+
 static void libxl__save_live_migration_postcopy_transition_callback(void *user)
 {
-    /* XXX we're not yet ready to deal with this */
-    assert(0);
+    libxl__save_helper_state *shs = user;
+    libxl__stream_write_state *sws = CONTAINER_OF(shs, *sws, shs);
+    sws->postcopy_transition_callback = postcopy_transition_done;
+    libxl__stream_write_start_postcopy_transition(shs->egc, sws);
+}
+
+static void postcopy_transition_done(libxl__egc *egc,
+                                     libxl__stream_write_state *sws,
+                                     int rc)
+{
+    libxl__domain_save_state *dss = sws->dss;
+
+    /* Past here, it's _possible_ that the domain may execute at the
+     * destination, so - unless we're given positive confirmation by the
+     * destination that it failed to resume there - we must assume it has. */
+    assert(dss->postcopy_transitioned);
+    *dss->postcopy_transitioned = !rc;
+
+    /* Return control to libxc. */
+    libxl__xc_domain_saverestore_async_callback_done(egc, &sws->shs, !rc);
 }
 
 /*----- main code for saving, in order of execution -----*/
diff --git a/tools/libxl/libxl_domain.c b/tools/libxl/libxl_domain.c
index b1cf643..ea778a6 100644
--- a/tools/libxl/libxl_domain.c
+++ b/tools/libxl/libxl_domain.c
@@ -488,7 +488,8 @@ static void domain_suspend_cb(libxl__egc *egc,
 
 static int do_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd, int flags,
                              unsigned int precopy_iterations,
-                             unsigned int precopy_dirty_threshold,
+                             unsigned int precopy_dirty_threshold, int recv_fd,
+                             bool *postcopy_transitioned,
                              const libxl_asyncop_how *ao_how)
 {
     AO_CREATE(ctx, domid, ao_how);
@@ -508,6 +509,8 @@ static int do_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd, int flags,
 
     dss->domid = domid;
     dss->fd = fd;
+    dss->recv_fd = recv_fd;
+    dss->postcopy_transitioned = postcopy_transitioned;
     dss->type = type;
     dss->live = flags & LIBXL_SUSPEND_LIVE;
     dss->debug = flags & LIBXL_SUSPEND_DEBUG;
@@ -532,18 +535,26 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd, int flags,
 {
     return do_domain_suspend(ctx, domid, fd, flags,
                              LIBXL_LM_PRECOPY_ITERATIONS_DEFAULT,
-                             LIBXL_LM_DIRTY_THRESHOLD_DEFAULT, ao_how);
+                             LIBXL_LM_DIRTY_THRESHOLD_DEFAULT, -1,
+                             NULL, ao_how);
 }
 
-int libxl_domain_live_migrate(libxl_ctx *ctx, uint32_t domid, int fd, int flags,
-                              unsigned int precopy_iterations,
-                              unsigned int precopy_dirty_threshold,
+int libxl_domain_live_migrate(libxl_ctx *ctx, uint32_t domid, int send_fd,
+                              int flags, unsigned int precopy_iterations,
+                              unsigned int precopy_dirty_threshold, int recv_fd,
+                              bool *postcopy_transitioned,
                               const libxl_asyncop_how *ao_how)
 {
+    if (!postcopy_transitioned) {
+        errno = EINVAL;
+        return -1;
+    }
+
     flags |= LIBXL_SUSPEND_LIVE;
 
-    return do_domain_suspend(ctx, domid, fd, flags, precopy_iterations,
-                             precopy_dirty_threshold, ao_how);
+    return do_domain_suspend(ctx, domid, send_fd, flags, precopy_iterations,
+                             precopy_dirty_threshold, recv_fd,
+                             postcopy_transitioned, ao_how);
 }
 
 int libxl_domain_pause(libxl_ctx *ctx, uint32_t domid)
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index c754706..ae272d7 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3194,17 +3194,25 @@ struct libxl__stream_write_state {
     void (*completion_callback)(libxl__egc *egc,
                                 libxl__stream_write_state *sws,
                                 int rc);
-    void (*checkpoint_callback)(libxl__egc *egc,
-                                libxl__stream_write_state *sws,
-                                int rc);
+    /* Checkpointing and postcopy live migration are mutually exclusive. */
+    union {
+        void (*checkpoint_callback)(libxl__egc *egc,
+                                    libxl__stream_write_state *sws,
+                                    int rc);
+        void (*postcopy_transition_callback)(libxl__egc *egc,
+                                             libxl__stream_write_state *sws,
+                                             int rc);
+    };
     /* Private */
     int rc;
     bool running;
     enum {
         SWS_PHASE_NORMAL,
         SWS_PHASE_CHECKPOINT,
-        SWS_PHASE_CHECKPOINT_STATE
+        SWS_PHASE_CHECKPOINT_STATE,
+        SWS_PHASE_POSTCOPY_TRANSITION
     } phase;
+    bool postcopy_transitioned;
     bool sync_teardown;  /* Only used to coordinate shutdown on error path. */
     libxl__save_helper_state shs;
 
@@ -3227,6 +3235,10 @@ _hidden void libxl__stream_write_init(libxl__stream_write_state *stream);
 _hidden void libxl__stream_write_start(libxl__egc *egc,
                                        libxl__stream_write_state *stream);
 _hidden void
+libxl__stream_write_start_postcopy_transition(
+    libxl__egc *egc,
+    libxl__stream_write_state *stream);
+_hidden void
 libxl__stream_write_start_checkpoint(libxl__egc *egc,
                                      libxl__stream_write_state *stream);
 _hidden void
@@ -3290,6 +3302,7 @@ struct libxl__domain_save_state {
     int fd;
     int fdfl; /* original flags on fd */
     int recv_fd;
+    bool *postcopy_transitioned;
     libxl_domain_type type;
     int live;
     int debug;
diff --git a/tools/libxl/libxl_sr_stream_format.h b/tools/libxl/libxl_sr_stream_format.h
index 75f5190..a789126 100644
--- a/tools/libxl/libxl_sr_stream_format.h
+++ b/tools/libxl/libxl_sr_stream_format.h
@@ -31,12 +31,13 @@ typedef struct libxl__sr_rec_hdr
 /* All records must be aligned up to an 8 octet boundary */
 #define REC_ALIGN_ORDER              3U
 
-#define REC_TYPE_END                    0x00000000U
-#define REC_TYPE_LIBXC_CONTEXT          0x00000001U
-#define REC_TYPE_EMULATOR_XENSTORE_DATA 0x00000002U
-#define REC_TYPE_EMULATOR_CONTEXT       0x00000003U
-#define REC_TYPE_CHECKPOINT_END         0x00000004U
-#define REC_TYPE_CHECKPOINT_STATE       0x00000005U
+#define REC_TYPE_END                     0x00000000U
+#define REC_TYPE_LIBXC_CONTEXT           0x00000001U
+#define REC_TYPE_EMULATOR_XENSTORE_DATA  0x00000002U
+#define REC_TYPE_EMULATOR_CONTEXT        0x00000003U
+#define REC_TYPE_CHECKPOINT_END          0x00000004U
+#define REC_TYPE_CHECKPOINT_STATE        0x00000005U
+#define REC_TYPE_POSTCOPY_TRANSITION_END 0x00000006U
 
 typedef struct libxl__sr_emulator_hdr
 {
diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
index 8f2a1c9..1c4b1f1 100644
--- a/tools/libxl/libxl_stream_write.c
+++ b/tools/libxl/libxl_stream_write.c
@@ -22,6 +22,9 @@
  * Entry points from outside:
  *  - libxl__stream_write_start()
  *     - Start writing a stream from the start.
+ *  - libxl__stream_write_postcopy_transition()
+ *     - Write the records required to permit postcopy resumption at the
+ *       migration target.
  *  - libxl__stream_write_start_checkpoint()
  *     - Write the records which form a checkpoint into a stream.
  *
@@ -65,6 +68,9 @@ static void stream_complete(libxl__egc *egc,
                             libxl__stream_write_state *stream, int rc);
 static void stream_done(libxl__egc *egc,
                         libxl__stream_write_state *stream, int rc);
+static void postcopy_transition_done(libxl__egc *egc,
+                                     libxl__stream_write_state *stream,
+                                     int rc);
 static void checkpoint_done(libxl__egc *egc,
                             libxl__stream_write_state *stream,
                             int rc);
@@ -91,7 +97,9 @@ static void emulator_context_record_done(libxl__egc *egc,
                                          libxl__stream_write_state *stream);
 static void write_phase_end_record(libxl__egc *egc,
                                    libxl__stream_write_state *stream);
-
+static void postcopy_transition_end_record_done(
+    libxl__egc *egc,
+    libxl__stream_write_state *stream);
 static void checkpoint_end_record_done(libxl__egc *egc,
                                        libxl__stream_write_state *stream);
 
@@ -211,6 +219,7 @@ void libxl__stream_write_init(libxl__stream_write_state *stream)
     stream->rc = 0;
     stream->running = false;
     stream->phase = SWS_PHASE_NORMAL;
+    stream->postcopy_transitioned = false;
     stream->sync_teardown = false;
     FILLZERO(stream->dc);
     stream->record_done_callback = NULL;
@@ -287,6 +296,22 @@ void libxl__stream_write_start(libxl__egc *egc,
     stream_complete(egc, stream, rc);
 }
 
+void libxl__stream_write_start_postcopy_transition(
+    libxl__egc *egc,
+    libxl__stream_write_state *stream)
+{
+    libxl__domain_save_state *dss = stream->dss;
+
+    assert(stream->running);
+    assert(dss->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_NONE);
+    assert(stream->phase == SWS_PHASE_NORMAL);
+    assert(!stream->postcopy_transitioned);
+
+    stream->phase = SWS_PHASE_POSTCOPY_TRANSITION;
+
+    write_emulator_xenstore_record(egc, stream);
+}
+
 void libxl__stream_write_start_checkpoint(libxl__egc *egc,
                                           libxl__stream_write_state *stream)
 {
@@ -369,7 +394,7 @@ void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
      * If the stream is not still alive, we must not continue any work.
      */
     if (libxl__stream_write_inuse(stream)) {
-        if (dss->checkpointed_stream != LIBXL_CHECKPOINTED_STREAM_NONE)
+        if (dss->checkpointed_stream != LIBXL_CHECKPOINTED_STREAM_NONE) {
             /*
              * For remus, if libxl__xc_domain_save_done() completes,
              * there was an error sending data to the secondary.
@@ -377,8 +402,17 @@ void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
              * return value (Please refer to libxl__remus_teardown())
              */
             stream_complete(egc, stream, 0);
-        else
+        } else if (stream->postcopy_transitioned) {
+            /*
+             * If, on the other hand, this is a normal migration that had a
+             * postcopy migration stage, we're completely done at this point and
+             * want to report any error received here to our caller.
+             */
+            assert(stream->phase == SWS_PHASE_NORMAL);
+            write_phase_end_record(egc, stream);
+        } else {
             write_emulator_xenstore_record(egc, stream);
+        }
     }
 }
 
@@ -550,6 +584,11 @@ static void write_phase_end_record(libxl__egc *egc,
         what     = "checkpoint end record";
         cb       = checkpoint_end_record_done;
         break;
+    case SWS_PHASE_POSTCOPY_TRANSITION:
+        rec.type = REC_TYPE_POSTCOPY_TRANSITION_END;
+        what     = "postcopy transition end record";
+        cb       = postcopy_transition_end_record_done;
+        break;
     default:
         /* SWS_PHASE_CHECKPOINT_STATE has no end record */
         assert(false);
@@ -558,6 +597,13 @@ static void write_phase_end_record(libxl__egc *egc,
     setup_write(egc, stream, what, &rec, NULL, cb);
 }
 
+static void postcopy_transition_end_record_done(
+    libxl__egc *egc,
+    libxl__stream_write_state *stream)
+{
+    postcopy_transition_done(egc, stream, 0);
+}
+
 static void checkpoint_end_record_done(libxl__egc *egc,
                                        libxl__stream_write_state *stream)
 {
@@ -600,6 +646,13 @@ static void stream_complete(libxl__egc *egc,
          */
         checkpoint_state_done(egc, stream, rc);
         break;
+    case SWS_PHASE_POSTCOPY_TRANSITION:
+        /*
+         * To deal with errors during the postcopy transition, we use the same
+         * strategy as during checkpoints.
+         */
+        postcopy_transition_done(egc, stream, rc);
+        break;
     }
 }
 
@@ -627,6 +680,16 @@ static void stream_done(libxl__egc *egc,
     }
 }
 
+static void postcopy_transition_done(libxl__egc *egc,
+                                     libxl__stream_write_state *stream,
+                                     int rc)
+{
+    assert(stream->phase == SWS_PHASE_POSTCOPY_TRANSITION);
+    stream->postcopy_transitioned = true;
+    stream->phase = SWS_PHASE_NORMAL;
+    stream->postcopy_transition_callback(egc, stream, rc);
+}
+
 static void checkpoint_done(libxl__egc *egc,
                             libxl__stream_write_state *stream,
                             int rc)
diff --git a/tools/xl/xl_migrate.c b/tools/xl/xl_migrate.c
index 1bb3fb4..1ffc32b 100644
--- a/tools/xl/xl_migrate.c
+++ b/tools/xl/xl_migrate.c
@@ -188,6 +188,7 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
     char rc_buf;
     uint8_t *config_data;
     int config_len, flags = LIBXL_SUSPEND_LIVE;
+    bool postcopy_transitioned;
 
     save_domain_core_begin(domid, override_config_file,
                            &config_data, &config_len);
@@ -209,7 +210,9 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
         flags |= LIBXL_SUSPEND_DEBUG;
     rc = libxl_domain_live_migrate(ctx, domid, send_fd, flags,
                                    precopy_iterations, precopy_dirty_threshold,
-                                   NULL);
+                                   recv_fd, &postcopy_transitioned, NULL);
+    assert(!postcopy_transitioned);
+
     if (rc) {
         fprintf(stderr, "migration sender: libxl_domain_suspend failed"
                 " (rc=%d)\n", rc);
-- 
2.7.4



* [PATCH RFC 19/20] libxl/migration: implement the receiver side of postcopy live migration
  2017-03-27  9:06 [PATCH RFC 00/20] Add postcopy live migration support Joshua Otto
                   ` (17 preceding siblings ...)
  2017-03-27  9:06 ` [PATCH RFC 18/20] libxl/migration: implement the sender side of postcopy live migration Joshua Otto
@ 2017-03-27  9:06 ` Joshua Otto
  2017-03-27  9:06 ` [PATCH RFC 20/20] tools: expose postcopy live migration support in libxl and xl Joshua Otto
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 53+ messages in thread
From: Joshua Otto @ 2017-03-27  9:06 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, Joshua Otto,
	imhy.yang, hjarmstr

To make the libxl receiver capable of supporting postcopy live
migration:
- As was done for the libxl stream writer, add a symmetric callback
  chain through the stream reader that reads the sequence of libxl
  records necessary to resume the guest and enter the postcopy phase.
  This chain is very similar to the checkpoint chain.
- Add a new postcopy path through the domain creation sequence that
  permits the xc memory postcopy phase to proceed in parallel to the
  libxl domain creation and resumption sequence.
- Add an out-parameter to libxl_domain_create_restore(),
  'postcopy_resumed', that callers can test to determine whether or not
  further action is required on their part post-migration to get the
  guest running (see the sketch below).

A subsequent patch will introduce a mechanism by which library clients
can _induce_ a postcopy live migration.
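
For illustration, a hypothetical library client prepared for both
outcomes might look something like this (a minimal sketch only - error
handling and the surrounding setup are elided, and all variable names
are illustrative):

    uint32_t domid;
    bool postcopy_resumed;
    int rc;

    rc = libxl_domain_create_restore(ctx, &d_config, &domid, restore_fd,
                                     send_back_fd, &postcopy_resumed,
                                     &params, NULL, NULL);
    if (!rc && postcopy_resumed) {
        /* The guest was already unpaused as part of the postcopy
         * resumption - no separate unpause step is required. */
    } else if (!rc) {
        /* Normal restore - unpause the guest as before. */
    }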

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
---
 tools/libxl/libxl.h                  |  28 ++++++-
 tools/libxl/libxl_create.c           | 156 +++++++++++++++++++++++++++++++++--
 tools/libxl/libxl_internal.h         |  43 +++++++++-
 tools/libxl/libxl_stream_read.c      |  57 +++++++++++++
 tools/ocaml/libs/xl/xenlight_stubs.c |   2 +-
 tools/xl/xl_vmcontrol.c              |   2 +-
 6 files changed, 273 insertions(+), 15 deletions(-)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 99d187b..51e8760 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1296,6 +1296,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 uint32_t *domid, int restore_fd,
                                 int send_back_fd,
+                                bool *postcopy_resumed, /* OUT */
                                 const libxl_domain_restore_params *params,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
@@ -1315,8 +1316,9 @@ static inline int libxl_domain_create_restore_0x040200(
 
     libxl_domain_restore_params_init(&params);
 
-    ret = libxl_domain_create_restore(
-        ctx, d_config, domid, restore_fd, -1, &params, ao_how, aop_console_how);
+    ret = libxl_domain_create_restore(ctx, d_config, domid, restore_fd,
+                                      -1, NULL, &params, ao_how,
+                                      aop_console_how);
 
     libxl_domain_restore_params_dispose(&params);
     return ret;
@@ -1336,11 +1338,31 @@ static inline int libxl_domain_create_restore_0x040400(
     LIBXL_EXTERNAL_CALLERS_ONLY
 {
     return libxl_domain_create_restore(ctx, d_config, domid, restore_fd,
-                                       -1, params, ao_how, aop_console_how);
+                                       -1, NULL, params, ao_how,
+                                       aop_console_how);
 }
 
 #define libxl_domain_create_restore libxl_domain_create_restore_0x040400
 
+#elif defined(LIBXL_API_VERSION) && LIBXL_API_VERSION >= 0x040700 \
+                                 && LIBXL_API_VERSION < 0x040900
+
+static inline int libxl_domain_create_restore_0x040700(
+    libxl_ctx *ctx, libxl_domain_config *d_config,
+    uint32_t *domid, int restore_fd,
+    int send_back_fd,
+    const libxl_domain_restore_params *params,
+    const libxl_asyncop_how *ao_how,
+    const libxl_asyncprogress_how *aop_console_how)
+    LIBXL_EXTERNAL_CALLERS_ONLY
+{
+    return libxl_domain_create_restore(ctx, d_config, domid, restore_fd,
+                                       -1, NULL, params, ao_how,
+                                       aop_console_how);
+}
+
+#define libxl_domain_create_restore libxl_domain_create_restore_0x040700
+
 #endif
 
 int libxl_domain_soft_reset(libxl_ctx *ctx,
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 8f4af0a..184b278 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -745,8 +745,20 @@ static void domcreate_bootloader_done(libxl__egc *egc,
                                       libxl__bootloader_state *bl,
                                       int rc);
 
+/* If a postcopy migration is initiated by the sending side during a live
+ * migration, this function returns control of the stream to the stream reader
+ * so it can finish the libxl stream. */
 static void domcreate_postcopy_transition_callback(void *user);
 
+/* When the stream reader postcopy transition completes, this callback is
+ * invoked.  It transfers control of the restore stream back to the helper. */
+void domcreate_postcopy_transition_complete_callback(
+    libxl__egc *egc, libxl__stream_read_state *srs, int rc);
+
+static void domcreate_postcopy_stream_done(libxl__egc *egc,
+                                           libxl__stream_read_state *srs,
+                                           int ret);
+
 static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *aodevs,
                                 int ret);
 
@@ -773,6 +785,10 @@ static void domcreate_destruction_cb(libxl__egc *egc,
                                      libxl__domain_destroy_state *dds,
                                      int rc);
 
+static void domcreate_report_result(libxl__egc *egc,
+                                    libxl__domain_create_state *dcs,
+                                    int rc);
+
 static void initiate_domain_create(libxl__egc *egc,
                                    libxl__domain_create_state *dcs)
 {
@@ -1104,6 +1120,13 @@ static void domcreate_bootloader_done(libxl__egc *egc,
             callbacks->postcopy_transition =
                 domcreate_postcopy_transition_callback;
 
+            /* When the stream reader is finished reading the postcopy
+             * transition, we'll find out in the
+             * domcreate_postcopy_transition_complete_callback(), where we'll
+             * hand control of the stream back to the libxc helper. */
+            dcs->srs.postcopy_transition_callback =
+                domcreate_postcopy_transition_complete_callback;
+
             libxl__stream_read_start(egc, &dcs->srs);
         }
         return;
@@ -1117,8 +1140,73 @@ static void domcreate_bootloader_done(libxl__egc *egc,
 
 static void domcreate_postcopy_transition_callback(void *user)
 {
-    /* XXX we're not ready to deal with this yet */
-    assert(0);
+    libxl__save_helper_state *shs = user;
+    libxl__domain_create_state *dcs = shs->caller_state;
+    libxl__stream_read_state *srs = &dcs->srs;
+
+    libxl__stream_read_start_postcopy_transition(shs->egc, srs);
+}
+
+void domcreate_postcopy_transition_complete_callback(
+    libxl__egc *egc, libxl__stream_read_state *srs, int rc)
+{
+    libxl__domain_create_state *dcs = srs->dcs;
+
+    if (!rc)
+        srs->completion_callback = domcreate_postcopy_stream_done;
+
+    /* If all is well (for now) we'll find out about the eventual termination
+     * of the restore helper/stream through domcreate_postcopy_stream_done().
+     * Otherwise, we'll find out sooner through domcreate_stream_done(). */
+    libxl__xc_domain_saverestore_async_callback_done(egc, &srs->shs, !rc);
+
+    if (!rc) {
+        /* In parallel, resume the guest. */
+        dcs->postcopy.active = true;
+        dcs->postcopy.resume.state = DCS_POSTCOPY_RESUME_INPROGRESS;
+        dcs->postcopy.stream.state = DCS_POSTCOPY_STREAM_INPROGRESS;
+        domcreate_stream_done(egc, srs, 0);
+    }
+}
+
+static void domcreate_postcopy_stream_done(libxl__egc *egc,
+                                           libxl__stream_read_state *srs,
+                                           int ret)
+{
+    libxl__domain_create_state *dcs = srs->dcs;
+
+    EGC_GC;
+
+    assert(dcs->postcopy.stream.state == DCS_POSTCOPY_STREAM_INPROGRESS);
+
+    switch (dcs->postcopy.resume.state) {
+    case DCS_POSTCOPY_RESUME_INPROGRESS:
+        if (ret) {
+            /* The stream failed, and the resumption is still in progress.
+             * Stash our return code for resumption to find later. */
+            dcs->postcopy.stream.state = DCS_POSTCOPY_STREAM_FAILED;
+            dcs->postcopy.stream.rc = ret;
+        } else {
+            /* We've successfully completed, but the resumption is still humming
+             * away. */
+            dcs->postcopy.stream.state = DCS_POSTCOPY_STREAM_SUCCESS;
+
+            /* Just let it finish.  Nothing to do for now. */
+            LOG(INFO, "Postcopy stream completed _before_ domain unpaused");
+        }
+        break;
+    case DCS_POSTCOPY_RESUME_FAILED:
+        /* The resumption failed first, so report its result. */
+        dcs->callback(egc, dcs, dcs->postcopy.resume.rc, dcs->guest_domid);
+        break;
+    case DCS_POSTCOPY_RESUME_SUCCESS:
+        /* This is the expected case - resumption completed, and some time later
+         * the final postcopy pages were migrated and the stream wrapped up.
+         * We're now totally done! */
+        LOG(INFO, "Postcopy stream completed after domain unpaused");
+        dcs->callback(egc, dcs, ret, dcs->guest_domid);
+        break;
+    }
 }
 
 void libxl__srm_callout_callback_restore_results(xen_pfn_t store_mfn,
@@ -1572,7 +1660,8 @@ static void domcreate_complete(libxl__egc *egc,
         }
         dcs->guest_domid = -1;
     }
-    dcs->callback(egc, dcs, rc, dcs->guest_domid);
+
+    domcreate_report_result(egc, dcs, rc);
 }
 
 static void domcreate_destruction_cb(libxl__egc *egc,
@@ -1585,7 +1674,55 @@ static void domcreate_destruction_cb(libxl__egc *egc,
     if (rc)
         LOGD(ERROR, dds->domid, "unable to destroy domain following failed creation");
 
-    dcs->callback(egc, dcs, ERROR_FAIL, dcs->guest_domid);
+    domcreate_report_result(egc, dcs, ERROR_FAIL);
+}
+
+static void domcreate_report_result(libxl__egc *egc,
+                                    libxl__domain_create_state *dcs,
+                                    int rc)
+{
+    EGC_GC;
+
+    if (!dcs->postcopy.active) {
+        /* If we aren't presently in the process of completing a postcopy
+         * resumption (the norm), everything is all cleaned up and we can report
+         * our result directly. */
+        LOG(INFO, "No postcopy at all");
+        dcs->callback(egc, dcs, rc, dcs->guest_domid);
+    } else {
+        switch (dcs->postcopy.stream.state) {
+        case DCS_POSTCOPY_STREAM_INPROGRESS:
+        case DCS_POSTCOPY_STREAM_SUCCESS:
+            /* If we haven't yet failed, try to unpause the guest. */
+            rc = rc ?: libxl_domain_unpause(CTX, dcs->guest_domid);
+            if (dcs->postcopy_resumed)
+                *dcs->postcopy_resumed = !rc;
+
+            if (dcs->postcopy.stream.state == DCS_POSTCOPY_STREAM_SUCCESS) {
+                /* The stream finished successfully, so we can report our local
+                 * result as the overall result. */
+                dcs->callback(egc, dcs, rc, dcs->guest_domid);
+                LOG(INFO, "Postcopy domain unpaused after stream completed");
+            } else if (rc) {
+                /* The stream isn't done yet, but we failed.  Tell it to bail,
+                 * and stash our return code for the postcopy stream completion
+                 * callback to find. */
+                dcs->postcopy.resume.state = DCS_POSTCOPY_RESUME_FAILED;
+                dcs->postcopy.resume.rc = rc;
+
+                libxl__stream_read_abort(egc, &dcs->srs, -1);
+            } else {
+                dcs->postcopy.resume.state = DCS_POSTCOPY_RESUME_SUCCESS;
+                LOG(INFO, "Postcopy domain unpaused before stream completed");
+            }
+            break;
+        case DCS_POSTCOPY_STREAM_FAILED:
+            /* The stream failed.  Now that we're done, tie things up by
+             * reporting the stream's result. */
+            dcs->callback(egc, dcs, dcs->postcopy.stream.rc, dcs->guest_domid);
+            break;
+        }
+    }
 }
 
 /*----- application-facing domain creation interface -----*/
@@ -1609,6 +1746,7 @@ static void domain_create_cb(libxl__egc *egc,
 
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
                             uint32_t *domid, int restore_fd, int send_back_fd,
+                            bool *postcopy_resumed, /* OUT */
                             const libxl_domain_restore_params *params,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
@@ -1617,6 +1755,9 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     libxl__app_domain_create_state *cdcs;
     int rc;
 
+    if (postcopy_resumed)
+        *postcopy_resumed = false;
+
     GCNEW(cdcs);
     cdcs->dcs.ao = ao;
     cdcs->dcs.guest_config = d_config;
@@ -1631,6 +1772,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
                                          &cdcs->dcs.restore_fdfl);
         if (rc < 0) goto out_err;
     }
+    cdcs->dcs.postcopy_resumed = postcopy_resumed;
     cdcs->dcs.callback = domain_create_cb;
     cdcs->dcs.domid_soft_reset = INVALID_DOMID;
 
@@ -1852,13 +1994,13 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncprogress_how *aop_console_how)
 {
     unset_disk_colo_restore(d_config);
-    return do_domain_create(ctx, d_config, domid, -1, -1, NULL,
+    return do_domain_create(ctx, d_config, domid, -1, -1, NULL, NULL,
                             ao_how, aop_console_how);
 }
 
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 uint32_t *domid, int restore_fd,
-                                int send_back_fd,
+                                int send_back_fd, bool *postcopy_resumed,
                                 const libxl_domain_restore_params *params,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
@@ -1870,7 +2012,7 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
     }
 
     return do_domain_create(ctx, d_config, domid, restore_fd, send_back_fd,
-                            params, ao_how, aop_console_how);
+                            postcopy_resumed, params, ao_how, aop_console_how);
 }
 
 int libxl_domain_soft_reset(libxl_ctx *ctx,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index ae272d7..0a7c0d1 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3117,9 +3117,15 @@ struct libxl__stream_read_state {
     void (*completion_callback)(libxl__egc *egc,
                                 libxl__stream_read_state *srs,
                                 int rc);
-    void (*checkpoint_callback)(libxl__egc *egc,
-                                libxl__stream_read_state *srs,
-                                int rc);
+    /* Checkpointing and postcopy live migration are mutually exclusive. */
+    union {
+        void (*checkpoint_callback)(libxl__egc *egc,
+                                    libxl__stream_read_state *srs,
+                                    int rc);
+        void (*postcopy_transition_callback)(libxl__egc *egc,
+                                             libxl__stream_read_state *srs,
+                                             int rc);
+    };
     /* Private */
     int rc;
     bool running;
@@ -3133,10 +3139,12 @@ struct libxl__stream_read_state {
     LIBXL_STAILQ_HEAD(, libxl__sr_record_buf) record_queue; /* NOGC */
     enum {
         SRS_PHASE_NORMAL,
+        SRS_PHASE_POSTCOPY_TRANSITION,
         SRS_PHASE_CHECKPOINT_BUFFERING,
         SRS_PHASE_CHECKPOINT_UNBUFFERING,
         SRS_PHASE_CHECKPOINT_STATE
     } phase;
+    bool postcopy_transitioned;
     bool recursion_guard;
 
     /* Only used while actively reading a record from the stream. */
@@ -3150,6 +3158,9 @@ struct libxl__stream_read_state {
 _hidden void libxl__stream_read_init(libxl__stream_read_state *stream);
 _hidden void libxl__stream_read_start(libxl__egc *egc,
                                       libxl__stream_read_state *stream);
+_hidden void libxl__stream_read_start_postcopy_transition(
+    libxl__egc *egc,
+    libxl__stream_read_state *stream);
 _hidden void libxl__stream_read_start_checkpoint(libxl__egc *egc,
                                                  libxl__stream_read_state *stream);
 _hidden void libxl__stream_read_checkpoint_state(libxl__egc *egc,
@@ -3702,8 +3713,34 @@ struct libxl__domain_create_state {
     int restore_fd, libxc_fd;
     int restore_fdfl; /* original flags of restore_fd */
     int send_back_fd;
+    bool *postcopy_resumed;
     libxl_domain_restore_params restore_params;
     uint32_t domid_soft_reset;
+    struct {
+        /* Is a postcopy resumption in progress? (i.e. does the rest of this
+         * state have any meaning?) */
+        bool active;
+
+        struct {
+            enum {
+                DCS_POSTCOPY_RESUME_INPROGRESS,
+                DCS_POSTCOPY_RESUME_FAILED,
+                DCS_POSTCOPY_RESUME_SUCCESS
+            } state;
+
+            int rc;
+        } resume;
+
+        struct {
+            enum {
+                DCS_POSTCOPY_STREAM_INPROGRESS,
+                DCS_POSTCOPY_STREAM_FAILED,
+                DCS_POSTCOPY_STREAM_SUCCESS
+            } state;
+
+            int rc;
+        } stream;
+    } postcopy;
     libxl__domain_create_cb *callback;
     libxl_asyncprogress_how aop_console_how;
     /* private to domain_create */
diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 4cb553e..8e9b720 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -35,6 +35,7 @@
  * Undefined    undef  undef                    undef    undef
  * Idle         false  undef                    0        0
  * Active       true   NORMAL                   0/1      0/partial
+ * Active       true   POSTCOPY_TRANSITION      0/1      0/partial
  * Active       true   CHECKPOINT_BUFFERING     any      0/partial
  * Active       true   CHECKPOINT_UNBUFFERING   any      0
  * Active       true   CHECKPOINT_STATE         0/1      0/partial
@@ -133,6 +134,8 @@
 /* Success/error/cleanup handling. */
 static void stream_complete(libxl__egc *egc,
                             libxl__stream_read_state *stream, int rc);
+static void postcopy_transition_done(libxl__egc *egc,
+                                     libxl__stream_read_state *stream, int rc);
 static void checkpoint_done(libxl__egc *egc,
                             libxl__stream_read_state *stream, int rc);
 static void stream_done(libxl__egc *egc,
@@ -222,6 +225,7 @@ void libxl__stream_read_init(libxl__stream_read_state *stream)
     FILLZERO(stream->hdr);
     LIBXL_STAILQ_INIT(&stream->record_queue);
     stream->phase = SRS_PHASE_NORMAL;
+    stream->postcopy_transitioned = false;
     stream->recursion_guard = false;
     stream->incoming_record = NULL;
     FILLZERO(stream->emu_dc);
@@ -299,6 +303,26 @@ void libxl__stream_read_start(libxl__egc *egc,
     stream_complete(egc, stream, rc);
 }
 
+void libxl__stream_read_start_postcopy_transition(
+    libxl__egc *egc,
+    libxl__stream_read_state *stream)
+{
+    int checkpointed_stream = stream->dcs->restore_params.checkpointed_stream;
+
+    assert(stream->running);
+    assert(checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_NONE);
+    assert(stream->phase == SRS_PHASE_NORMAL);
+    assert(!stream->postcopy_transitioned);
+
+    stream->phase = SRS_PHASE_POSTCOPY_TRANSITION;
+
+    /*
+     * Libxc has handed control of the fd to us.  Start reading some
+     * libxl records out of it.
+     */
+    stream_continue(egc, stream);
+}
+
 void libxl__stream_read_start_checkpoint(libxl__egc *egc,
                                          libxl__stream_read_state *stream)
 {
@@ -397,6 +421,7 @@ static void stream_continue(libxl__egc *egc,
 
     switch (stream->phase) {
     case SRS_PHASE_NORMAL:
+    case SRS_PHASE_POSTCOPY_TRANSITION:
     case SRS_PHASE_CHECKPOINT_STATE:
         /*
          * Normal phase (regular migration or restore from file):
@@ -576,6 +601,13 @@ static bool process_record(libxl__egc *egc,
 
     LOG(DEBUG, "Record: %u, length %u", rec->hdr.type, rec->hdr.length);
 
+    if (stream->postcopy_transitioned &&
+        rec->hdr.type != REC_TYPE_END) {
+        rc = ERROR_FAIL;
+        LOG(ERROR, "Received non-end record after postcopy transition");
+        goto err;
+    }
+
     switch (rec->hdr.type) {
 
     case REC_TYPE_END:
@@ -627,6 +659,15 @@ static bool process_record(libxl__egc *egc,
         write_emulator_blob(egc, stream, rec);
         break;
 
+    case REC_TYPE_POSTCOPY_TRANSITION_END:
+        if (stream->phase != SRS_PHASE_POSTCOPY_TRANSITION) {
+            LOG(ERROR, "Unexpected POSTCOPY_TRANSITION_END record in stream");
+            rc = ERROR_FAIL;
+            goto err;
+        }
+        postcopy_transition_done(egc, stream, 0);
+        break;
+
     case REC_TYPE_CHECKPOINT_END:
         if (!stream_in_checkpoint(stream)) {
             LOG(ERROR, "Unexpected CHECKPOINT_END record in stream");
@@ -761,6 +802,13 @@ static void stream_complete(libxl__egc *egc,
          */
         checkpoint_done(egc, stream, rc);
         break;
+    case SRS_PHASE_POSTCOPY_TRANSITION:
+        assert(rc);
+
+        /*
+         * To deal with errors during the postcopy transition, we use the same
+         * strategy as during checkpoints.
+         */
     case SRS_PHASE_CHECKPOINT_STATE:
         assert(rc);
 
@@ -777,6 +825,15 @@ static void stream_complete(libxl__egc *egc,
     }
 }
 
+static void postcopy_transition_done(libxl__egc *egc,
+                                     libxl__stream_read_state *stream, int rc)
+{
+    assert(stream->phase == SRS_PHASE_POSTCOPY_TRANSITION);
+    stream->postcopy_transitioned = true;
+    stream->phase = SRS_PHASE_NORMAL;
+    stream->postcopy_transition_callback(egc, stream, rc);
+}
+
 static void checkpoint_done(libxl__egc *egc,
                             libxl__stream_read_state *stream, int rc)
 {
diff --git a/tools/ocaml/libs/xl/xenlight_stubs.c b/tools/ocaml/libs/xl/xenlight_stubs.c
index 98b52b9..3ef5a1e 100644
--- a/tools/ocaml/libs/xl/xenlight_stubs.c
+++ b/tools/ocaml/libs/xl/xenlight_stubs.c
@@ -538,7 +538,7 @@ value stub_libxl_domain_create_restore(value ctx, value domain_config, value par
 
 	caml_enter_blocking_section();
 	ret = libxl_domain_create_restore(CTX, &c_dconfig, &c_domid, restore_fd,
-		-1, &c_params, ao_how, NULL);
+		-1, NULL, &c_params, ao_how, NULL);
 	caml_leave_blocking_section();
 
 	free(ao_how);
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 89c2b25..47ba9f3 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -882,7 +882,7 @@ start:
 
         ret = libxl_domain_create_restore(ctx, &d_config,
                                           &domid, restore_fd,
-                                          send_back_fd, &params,
+                                          send_back_fd, NULL, &params,
                                           0, autoconnect_console_how);
 
         libxl_domain_restore_params_dispose(&params);
-- 
2.7.4



* [PATCH RFC 20/20] tools: expose postcopy live migration support in libxl and xl
  2017-03-27  9:06 [PATCH RFC 00/20] Add postcopy live migration support Joshua Otto
                   ` (18 preceding siblings ...)
  2017-03-27  9:06 ` [PATCH RFC 19/20] libxl/migration: implement the receiver " Joshua Otto
@ 2017-03-27  9:06 ` Joshua Otto
  2017-03-28 14:41 ` [PATCH RFC 00/20] Add postcopy live migration support Wei Liu
  2017-03-29 22:50 ` Andrew Cooper
  21 siblings, 0 replies; 53+ messages in thread
From: Joshua Otto @ 2017-03-27  9:06 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, Joshua Otto,
	imhy.yang, hjarmstr

- Add a 'memory_strategy' parameter to libxl_domain_live_migrate(),
  which specifies how the remainder of the memory migration should be
  approached after the iterative precopy phase completes (see the
  sketch below).
- Plug this parameter into the libxl migration precopy policy
  implementation.
- Add --postcopy to xl migrate, and skip the xl-level handshaking on
  both sides when postcopy migration occurs.
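
For example, a sender requesting the postcopy behaviour might call the
new API along these lines (a minimal sketch - descriptor setup and
error handling are elided, and the variable names are illustrative):

    bool postcopy_transitioned;
    int rc;

    rc = libxl_domain_live_migrate(ctx, domid, send_fd, LIBXL_SUSPEND_LIVE,
                                   LIBXL_LM_PRECOPY_ITERATIONS_DEFAULT,
                                   LIBXL_LM_DIRTY_THRESHOLD_DEFAULT,
                                   recv_fd, LIBXL_LM_MEMORY_POSTCOPY,
                                   &postcopy_transitioned, NULL);
    if (rc && postcopy_transitioned) {
        /* The guest may already have executed at the destination, so
         * resuming it here is no longer safe. */
    }

From xl, the same behaviour is requested with the new flag:

    xl migrate --postcopy <domain> <host>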

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
---
 tools/libxl/libxl.h          |  6 ++++-
 tools/libxl/libxl_dom_save.c | 19 ++++++-------
 tools/libxl/libxl_domain.c   |  9 ++++---
 tools/libxl/libxl_internal.h |  1 +
 tools/xl/xl.h                |  7 ++++-
 tools/xl/xl_cmdtable.c       |  5 +++-
 tools/xl/xl_migrate.c        | 63 +++++++++++++++++++++++++++++++++++++++-----
 tools/xl/xl_vmcontrol.c      |  8 ++++--
 8 files changed, 94 insertions(+), 24 deletions(-)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 51e8760..3a2f7ea 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1401,7 +1401,7 @@ int libxl_domain_live_migrate(libxl_ctx *ctx, uint32_t domid, int send_fd,
                               int flags, /* LIBXL_SUSPEND_* */
                               unsigned int precopy_iterations,
                               unsigned int precopy_dirty_threshold,
-                              int recv_fd,
+                              int recv_fd, int memory_strategy,
                               bool *postcopy_transitioned, /* OUT */
                               const libxl_asyncop_how *ao_how)
                               LIBXL_EXTERNAL_CALLERS_ONLY;
@@ -1409,6 +1409,10 @@ int libxl_domain_live_migrate(libxl_ctx *ctx, uint32_t domid, int send_fd,
 #define LIBXL_LM_PRECOPY_ITERATIONS_DEFAULT 5
 #define LIBXL_LM_DIRTY_THRESHOLD_DEFAULT 50
 
+#define LIBXL_LM_MEMORY_STOP_AND_COPY 0
+#define LIBXL_LM_MEMORY_POSTCOPY 1
+#define LIBXL_LM_MEMORY_DEFAULT LIBXL_LM_MEMORY_STOP_AND_COPY
+
 /* @param suspend_cancel [from xenctrl.h:xc_domain_resume( @param fast )]
  *   If this parameter is true, use co-operative resume. The guest
  *   must support this.
diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 9e565ae..9d5d435 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -333,18 +333,19 @@ int libxl__save_emulator_xenstore_data(libxl__domain_save_state *dss,
  * the precopy phase of live migrations, and is responsible for deciding when
  * the precopy phase should terminate and what should be done next.
  */
-static int libxl__save_live_migration_simple_precopy_policy(
-    struct precopy_stats stats, void *user)
+static int libxl__save_live_migration_precopy_policy(struct precopy_stats stats,
+                                                     void *user)
 {
     libxl__save_helper_state *shs = user;
     libxl__domain_save_state *dss = shs->caller_state;
 
-    if (stats.dirty_count >= 0 &&
-        stats.dirty_count <= dss->precopy_dirty_threshold)
-        return XGS_POLICY_STOP_AND_COPY;
-
-    if (stats.iteration >= dss->precopy_iterations)
-        return XGS_POLICY_STOP_AND_COPY;
+    if ((stats.dirty_count >= 0 &&
+         stats.dirty_count <= dss->precopy_dirty_threshold) ||
+        (stats.iteration >= dss->precopy_iterations)) {
+        return (dss->memory_strategy == LIBXL_LM_MEMORY_POSTCOPY)
+            ? XGS_POLICY_POSTCOPY
+            : XGS_POLICY_STOP_AND_COPY;
+    }
 
     return XGS_POLICY_CONTINUE_PRECOPY;
 }
@@ -452,7 +453,7 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
             libxl__save_live_migration_postcopy_transition_callback;
     }
 
-    callbacks->precopy_policy = libxl__save_live_migration_simple_precopy_policy;
+    callbacks->precopy_policy = libxl__save_live_migration_precopy_policy;
     callbacks->switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty;
 
     dss->sws.ao  = dss->ao;
diff --git a/tools/libxl/libxl_domain.c b/tools/libxl/libxl_domain.c
index ea778a6..feec293 100644
--- a/tools/libxl/libxl_domain.c
+++ b/tools/libxl/libxl_domain.c
@@ -489,6 +489,7 @@ static void domain_suspend_cb(libxl__egc *egc,
 static int do_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd, int flags,
                              unsigned int precopy_iterations,
                              unsigned int precopy_dirty_threshold, int recv_fd,
+                             int memory_strategy,
                              bool *postcopy_transitioned,
                              const libxl_asyncop_how *ao_how)
 {
@@ -510,7 +511,8 @@ static int do_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd, int flags,
     dss->domid = domid;
     dss->fd = fd;
     dss->recv_fd = recv_fd;
-    dss->postcopy_transitioned = postcopy_resumed_remotely;
+    dss->memory_strategy = memory_strategy;
+    dss->postcopy_transitioned = postcopy_transitioned;
     dss->type = type;
     dss->live = flags & LIBXL_SUSPEND_LIVE;
     dss->debug = flags & LIBXL_SUSPEND_DEBUG;
@@ -536,12 +538,13 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd, int flags,
     return do_domain_suspend(ctx, domid, fd, flags,
                              LIBXL_LM_PRECOPY_ITERATIONS_DEFAULT,
                              LIBXL_LM_DIRTY_THRESHOLD_DEFAULT, -1,
-                             NULL, ao_how);
+                             LIBXL_LM_MEMORY_DEFAULT, NULL, ao_how);
 }
 
 int libxl_domain_live_migrate(libxl_ctx *ctx, uint32_t domid, int send_fd,
                               int flags, unsigned int precopy_iterations,
                               unsigned int precopy_dirty_threshold, int recv_fd,
+                              int memory_strategy,
                               bool *postcopy_transitioned,
                               const libxl_asyncop_how *ao_how)
 {
@@ -553,7 +556,7 @@ int libxl_domain_live_migrate(libxl_ctx *ctx, uint32_t domid, int send_fd,
     flags |= LIBXL_SUSPEND_LIVE;
 
     return do_domain_suspend(ctx, domid, send_fd, flags, precopy_iterations,
-                             precopy_dirty_threshold, recv_fd,
+                             precopy_dirty_threshold, recv_fd, memory_strategy,
                              postcopy_transitioned, ao_how);
 }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 0a7c0d1..209cee5 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3313,6 +3313,7 @@ struct libxl__domain_save_state {
     int fd;
     int fdfl; /* original flags on fd */
     int recv_fd;
+    int memory_strategy;
     bool *postcopy_transitioned;
     libxl_domain_type type;
     int live;
diff --git a/tools/xl/xl.h b/tools/xl/xl.h
index aa95b77..279c716 100644
--- a/tools/xl/xl.h
+++ b/tools/xl/xl.h
@@ -48,6 +48,7 @@ struct domain_create {
     bool userspace_colo_proxy;
     int migrate_fd; /* -1 means none */
     int send_back_fd; /* -1 means none */
+    bool *postcopy_resumed;
     char **migration_domname_r; /* from malloc */
 };
 
@@ -66,7 +67,6 @@ static const char migrate_permission_to_go[]=
     "domain is yours, you are cleared to unpause";
 static const char migrate_report[]=
     "my copy unpause results are as follows";
-#endif
 
   /* followed by one byte:
    *     0: everything went well, domain is running
@@ -76,6 +76,11 @@ static const char migrate_report[]=
    *            from target to source
    */
 
+static const char migrate_postcopy_sync[]=
+    "postcopy migration completed successfully";
+
+#endif
+
 #define XL_MANDATORY_FLAG_JSON (1U << 0) /* config data is in JSON format */
 #define XL_MANDATORY_FLAG_STREAMv2 (1U << 1) /* stream is v2 */
 #define XL_MANDATORY_FLAG_ALL  (XL_MANDATORY_FLAG_JSON |        \
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 6df66fb..7bd2d1b 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -169,7 +169,10 @@ struct cmd_spec cmd_table[] = {
       "--precopy-iterations Perform at most this many iterations of the precopy\n"
       "                     memory migration loop before suspending the domain.\n"
       "--precopy-threshold  If fewer than this many pages are dirty at the end of a\n"
-      "                     copy round, exit the precopy loop and suspend the domain."
+      "                     copy round, exit the precopy loop and suspend the domain.\n"
+      "--postcopy           At the end of the iterative precopy phase, transition to a\n"
+      "                     postcopy memory migration rather than performing a stop-and-copy\n"
+      "                     migration of the outstanding dirty pages.\n"
     },
     { "restore",
       &main_restore, 0, 1,
diff --git a/tools/xl/xl_migrate.c b/tools/xl/xl_migrate.c
index 1ffc32b..43c7d8e 100644
--- a/tools/xl/xl_migrate.c
+++ b/tools/xl/xl_migrate.c
@@ -179,7 +179,8 @@ static void migrate_do_preamble(int send_fd, int recv_fd, pid_t child,
 static void migrate_domain(uint32_t domid, const char *rune, int debug,
                            const char *override_config_file,
                            unsigned int precopy_iterations,
-                           unsigned int precopy_dirty_threshold)
+                           unsigned int precopy_dirty_threshold,
+                           int memory_strategy)
 {
     pid_t child = -1;
     int rc;
@@ -210,18 +211,32 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
         flags |= LIBXL_SUSPEND_DEBUG;
     rc = libxl_domain_live_migrate(ctx, domid, send_fd, flags,
                                    precopy_iterations, precopy_dirty_threshold,
-                                   recv_fd, &postcopy_transitioned, NULL);
-    assert(!postcopy_transitioned);
-
+                                   recv_fd, memory_strategy,
+                                   &postcopy_transitioned, NULL);
     if (rc) {
         fprintf(stderr, "migration sender: libxl_domain_suspend failed"
                 " (rc=%d)\n", rc);
-        if (rc == ERROR_GUEST_TIMEDOUT)
+        if (postcopy_transitioned)
+            goto failed_postcopy;
+        else if (rc == ERROR_GUEST_TIMEDOUT)
             goto failed_suspend;
         else
             goto failed_resume;
     }
 
+    /* No need for additional ceremony if we already resumed the guest as part
+     * of a postcopy live migration. */
+    if (postcopy_transitioned) {
+        /* It doesn't matter if something happens to the pipe after we get to
+         * this point - we only bother to synchronize here for tidiness. */
+        migrate_read_fixedmessage(recv_fd, migrate_postcopy_sync,
+                                  sizeof(migrate_postcopy_sync),
+                                  "postcopy sync", rune);
+        libxl_domain_destroy(ctx, domid, 0);
+        fprintf(stderr, "Migration successful.\n");
+        exit(EXIT_SUCCESS);
+    }
+
     //fprintf(stderr, "migration sender: Transfer complete.\n");
     // Should only be printed when debugging as it's a bit messy with
     // progress indication.
@@ -320,6 +335,21 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
     close(send_fd);
     migration_child_report(recv_fd);
     exit(EXIT_FAILURE);
+
+ failed_postcopy:
+    if (common_domname) {
+        xasprintf(&away_domname, "%s--postcopy-inconsistent", common_domname);
+        libxl_domain_rename(ctx, domid, common_domname, away_domname);
+    }
+
+    fprintf(stderr,
+ "** Migration failed during memory postcopy **\n"
+ "It's possible that the guest has executed/is executing at the destination,\n"
+ " so resuming it here now may be unsafe.\n");
+
+    close(send_fd);
+    migration_child_report(recv_fd);
+    exit(EXIT_FAILURE);
 }
 
 static void migrate_receive(int debug, int daemonize, int monitor,
@@ -333,6 +363,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     int rc, rc2;
     char rc_buf;
     char *migration_domname;
+    bool postcopy_resumed;
     struct domain_create dom_info;
 
     signal(SIGPIPE, SIG_IGN);
@@ -352,6 +383,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     dom_info.paused = 1;
     dom_info.migrate_fd = recv_fd;
     dom_info.send_back_fd = send_fd;
+    dom_info.postcopy_resumed = &postcopy_resumed;
     dom_info.migration_domname_r = &migration_domname;
     dom_info.checkpointed_stream = checkpointed;
     dom_info.colo_proxy_script = colo_proxy_script;
@@ -414,6 +446,18 @@ static void migrate_receive(int debug, int daemonize, int monitor,
         break;
     }
 
+    /* No need for additional ceremony if we already resumed the guest as part
+     * of a postcopy live migration. */
+    if (postcopy_resumed) {
+        libxl_write_exactly(ctx, send_fd, migrate_postcopy_sync,
+                            sizeof(migrate_postcopy_sync),
+                            "migration ack stream", "postcopy sync");
+        fprintf(stderr, "migration target: Domain started successfully.\n");
+        libxl_domain_rename(ctx, domid, migration_domname, common_domname);
+        exit(EXIT_SUCCESS);
+    }
+
+
     fprintf(stderr, "migration target: Transfer complete,"
             " requesting permission to start domain.\n");
 
@@ -545,12 +589,14 @@ int main_migrate(int argc, char **argv)
     char *host;
     int opt, daemonize = 1, monitor = 1, debug = 0, pause_after_migration = 0;
     int precopy_iterations = LIBXL_LM_PRECOPY_ITERATIONS_DEFAULT,
-        precopy_dirty_threshold = LIBXL_LM_DIRTY_THRESHOLD_DEFAULT;
+        precopy_dirty_threshold = LIBXL_LM_DIRTY_THRESHOLD_DEFAULT,
+        memory_strategy = LIBXL_LM_MEMORY_DEFAULT;
     static struct option opts[] = {
         {"debug", 0, 0, 0x100},
         {"live", 0, 0, 0x200},
         {"precopy-iterations", 1, 0, 'i'},
         {"precopy-threshold", 1, 0, 'd'},
+        {"postcopy", 0, 0, 0x400},
         COMMON_LONG_OPTS
     };
 
@@ -591,6 +637,9 @@ int main_migrate(int argc, char **argv)
     case 0x200: /* --live */
         /* ignored for compatibility with xm */
         break;
+    case 0x400: /* --postcopy */
+        memory_strategy = LIBXL_LM_MEMORY_POSTCOPY;
+        break;
     }
 
     domid = find_domain(argv[optind]);
@@ -622,7 +671,7 @@ int main_migrate(int argc, char **argv)
     }
 
     migrate_domain(domid, rune, debug, config_filename, precopy_iterations,
-                   precopy_dirty_threshold);
+                   precopy_dirty_threshold, memory_strategy);
     return EXIT_SUCCESS;
 }
 
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 47ba9f3..62e09c1 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -655,6 +655,7 @@ int create_domain(struct domain_create *dom_info)
     const char *config_source = NULL;
     const char *restore_source = NULL;
     int migrate_fd = dom_info->migrate_fd;
+    bool *postcopy_resumed = dom_info->postcopy_resumed;
     bool config_in_json;
 
     int i;
@@ -675,6 +676,9 @@ int create_domain(struct domain_create *dom_info)
 
     int restoring = (restore_file || (migrate_fd >= 0));
 
+    if (postcopy_resumed)
+        *postcopy_resumed = false;
+
     libxl_domain_config_init(&d_config);
 
     if (restoring) {
@@ -882,8 +886,8 @@ start:
 
         ret = libxl_domain_create_restore(ctx, &d_config,
                                           &domid, restore_fd,
-                                          send_back_fd, NULL, &params,
-                                          0, autoconnect_console_how);
+                                          send_back_fd, postcopy_resumed,
+                                          &params, 0, autoconnect_console_how);
 
         libxl_domain_restore_params_dispose(&params);
 
-- 
2.7.4



* Re: [PATCH RFC 00/20] Add postcopy live migration support
  2017-03-27  9:06 [PATCH RFC 00/20] Add postcopy live migration support Joshua Otto
                   ` (19 preceding siblings ...)
  2017-03-27  9:06 ` [PATCH RFC 20/20] tools: expose postcopy live migration support in libxl and xl Joshua Otto
@ 2017-03-28 14:41 ` Wei Liu
  2017-03-30  4:13   ` Joshua Otto
  2017-03-29 22:50 ` Andrew Cooper
  21 siblings, 1 reply; 53+ messages in thread
From: Wei Liu @ 2017-03-28 14:41 UTC (permalink / raw)
  To: Joshua Otto
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, imhy.yang,
	xen-devel, hjarmstr

Hi Harley, Chester and Joshua

This is really nice work. I took a brief look at all the patches, and
they look really high quality.

We're currently approaching freeze for a Xen release. We've got a lot on
our plate. I think maintainers will get to this series at some point.

From the look of things, some patches can go in because they're generally
useful.

On Mon, Mar 27, 2017 at 05:06:12AM -0400, Joshua Otto wrote:
> Hi,
> 
> We're a team of three fourth-year undergraduate software engineering students at
> the University of Waterloo in Canada.  In late 2015 we posted on the list [1] to
> ask for a project to undertake for our program's capstone design project, and
> Andrew Cooper pointed us in the direction of the live migration implementation
> as an area that could use some attention.  We were particularly interested in
> post-copy live migration (as evaluated by [2] and discussed on the list at [3]),
> and have been working on an implementation of this on-and-off since then.
> 
> We now have a working implementation of this scheme, and are submitting it for
> comment.  The changes are also available as the 'postcopy' branch of the GitHub
> repository at [4]
> 
> As a brief overview of our approach:
> - We introduce a mechanism by which libxl can indicate to the libxc stream
>   helper process that the iterative migration precopy loop should be terminated
>   and postcopy should begin.
> - At this point, we suspend the domain, collect the final set of dirty pfns and
>   write these pfns (and _not_ their contents) into the stream.
> - At the destination, the xc restore logic registers itself as a pager for the
>   migrating domain, 'evicts' all of the pfns indicated by the sender as
>   outstanding, and then resumes the domain at the destination.
> - As the domain executes, the migration sender continues to push the remaining
>   oustanding pages to the receiver in the background.  The receiver
>   monitors both the stream for incoming page data and the paging ring event
>   channel for page faults triggered by the guest.  Page faults are forwarded on
>   the back-channel migration stream to the migration sender, which prioritizes
>   these pages for transmission.
> 
> By leveraging the existing paging API, we are able to implement the postcopy
> scheme without any hypervisor modifications - all of our changes are confined to
> the userspace toolstack.  However, we inherit from the paging API the
> requirement that the domains be HVM and that the host have HAP/EPT support.
> 

Please consider writing a design document for this feature and stick it
at the beginning of your series in the future. You can find examples
under docs/designs.

The restriction is a bit unfortunate, but we shouldn't block useful work
because it's incomplete. We just need to make sure that, should someone
decide to implement similar functionality for PV guests, they are able
to do so.

You might want to check if shadow paging can be used with the paging API,
such that you can widen support to all HVM guests.

> We haven't yet had the opportunity to perform a quantitative evaluation of the
> performance trade-offs between the traditional pre-copy and our post-copy
> strategies, but intend to.  Informally, we've been testing our implementation by
> migrating a domain running the x86 memtest program (which is obviously a
> tremendously write-heavy workload), and have observed a substantial reduction in
> total time required for migration completion (at the expense of a visually
> obvious 'slowdown' in the execution of the program).  We've also noticed that,
> when performing a postcopy without any leading precopy iterations, the time
> required at the destination to 'evict' all of the outstanding pages is
> substantial - possibly because there is no batching mechanism by which pages can
> be evicted - so this area in particular might require further attention.
> 

Please do post numbers when you have them. For now, please be patient
and wait for people to comment.

Wei.


* Re: [PATCH RFC 01/20] tools: rename COLO 'postcopy' to 'aftercopy'
  2017-03-27  9:06 ` [PATCH RFC 01/20] tools: rename COLO 'postcopy' to 'aftercopy' Joshua Otto
@ 2017-03-28 16:34   ` Wei Liu
  2017-04-11  6:19     ` Zhang Chen
  0 siblings, 1 reply; 53+ messages in thread
From: Wei Liu @ 2017-03-28 16:34 UTC (permalink / raw)
  To: Joshua Otto
  Cc: wei.liu2, zhangchen.fnst, andrew.cooper3, ian.jackson, czylin,
	imhy.yang, xen-devel, hjarmstr

Cc Chen

On Mon, Mar 27, 2017 at 05:06:13AM -0400, Joshua Otto wrote:
> The COLO xc domain save and restore procedures both make use of a 'postcopy'
> callback to defer part of each checkpoint operation to xl.  In this context, the
> name 'postcopy' is meant as "the callback invoked immediately after this
> checkpoint's memory callback."  This is an unfortunate name collision with the
> other common use of 'postcopy' in the context of live migration, where it is
> used to mean "a memory migration that permits the guest to execute at the
> destination before all of its memory is migrated by servicing accesses to
> unmigrated memory via a network page-fault."
> 
> Mechanically rename 'postcopy' -> 'aftercopy' to free up the postcopy namespace
> while preserving the original intent of the name in the COLO context.
> 
> No functional change.
> 
> Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
> ---
>  tools/libxc/include/xenguest.h     | 4 ++--
>  tools/libxc/xc_sr_restore.c        | 4 ++--
>  tools/libxc/xc_sr_save.c           | 4 ++--
>  tools/libxl/libxl_colo_restore.c   | 2 +-
>  tools/libxl/libxl_colo_save.c      | 2 +-
>  tools/libxl/libxl_remus.c          | 2 +-
>  tools/libxl/libxl_save_msgs_gen.pl | 2 +-
>  7 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
> index 40902ee..aa8cc8b 100644
> --- a/tools/libxc/include/xenguest.h
> +++ b/tools/libxc/include/xenguest.h
> @@ -53,7 +53,7 @@ struct save_callbacks {
>       * xc_domain_save then flushes the output buffer, while the
>       *  guest continues to run.
>       */
> -    int (*postcopy)(void* data);
> +    int (*aftercopy)(void* data);
>  
>      /* Called after the memory checkpoint has been flushed
>       * out into the network. Typical actions performed in this
> @@ -115,7 +115,7 @@ struct restore_callbacks {
>       * Callback function resumes the guest & the device model,
>       * returns to xc_domain_restore.
>       */
> -    int (*postcopy)(void* data);
> +    int (*aftercopy)(void* data);
>  
>      /* A checkpoint record has been found in the stream.
>       * returns: */
> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
> index 3549f0a..ee06b3d 100644
> --- a/tools/libxc/xc_sr_restore.c
> +++ b/tools/libxc/xc_sr_restore.c
> @@ -576,7 +576,7 @@ static int handle_checkpoint(struct xc_sr_context *ctx)
>                                                  ctx->restore.callbacks->data);
>  
>          /* Resume secondary vm */
> -        ret = ctx->restore.callbacks->postcopy(ctx->restore.callbacks->data);
> +        ret = ctx->restore.callbacks->aftercopy(ctx->restore.callbacks->data);
>          HANDLE_CALLBACK_RETURN_VALUE(ret);
>  
>          /* Wait for a new checkpoint */
> @@ -855,7 +855,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>      {
>          /* this is COLO restore */
>          assert(callbacks->suspend &&
> -               callbacks->postcopy &&
> +               callbacks->aftercopy &&
>                 callbacks->wait_checkpoint &&
>                 callbacks->restore_results);
>      }
> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> index f98c827..fc63a55 100644
> --- a/tools/libxc/xc_sr_save.c
> +++ b/tools/libxc/xc_sr_save.c
> @@ -863,7 +863,7 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
>                  }
>              }
>  
> -            rc = ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
> +            rc = ctx->save.callbacks->aftercopy(ctx->save.callbacks->data);
>              if ( rc <= 0 )
>                  goto err;
>  
> @@ -951,7 +951,7 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
>      if ( hvm )
>          assert(callbacks->switch_qemu_logdirty);
>      if ( ctx.save.checkpointed )
> -        assert(callbacks->checkpoint && callbacks->postcopy);
> +        assert(callbacks->checkpoint && callbacks->aftercopy);
>      if ( ctx.save.checkpointed == XC_MIG_STREAM_COLO )
>          assert(callbacks->wait_checkpoint);
>  
> diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
> index 0c535bd..7d8f9ff 100644
> --- a/tools/libxl/libxl_colo_restore.c
> +++ b/tools/libxl/libxl_colo_restore.c
> @@ -246,7 +246,7 @@ void libxl__colo_restore_setup(libxl__egc *egc,
>      if (init_dsps(&crcs->dsps))
>          goto out;
>  
> -    callbacks->postcopy = libxl__colo_restore_domain_resume_callback;
> +    callbacks->aftercopy = libxl__colo_restore_domain_resume_callback;
>      callbacks->wait_checkpoint = libxl__colo_restore_domain_wait_checkpoint_callback;
>      callbacks->suspend = libxl__colo_restore_domain_suspend_callback;
>      callbacks->checkpoint = libxl__colo_restore_domain_checkpoint_callback;
> diff --git a/tools/libxl/libxl_colo_save.c b/tools/libxl/libxl_colo_save.c
> index f687d5a..5921196 100644
> --- a/tools/libxl/libxl_colo_save.c
> +++ b/tools/libxl/libxl_colo_save.c
> @@ -145,7 +145,7 @@ void libxl__colo_save_setup(libxl__egc *egc, libxl__colo_save_state *css)
>  
>      callbacks->suspend = libxl__colo_save_domain_suspend_callback;
>      callbacks->checkpoint = libxl__colo_save_domain_checkpoint_callback;
> -    callbacks->postcopy = libxl__colo_save_domain_resume_callback;
> +    callbacks->aftercopy = libxl__colo_save_domain_resume_callback;
>      callbacks->wait_checkpoint = libxl__colo_save_domain_wait_checkpoint_callback;
>  
>      libxl__checkpoint_devices_setup(egc, &dss->cds);
> diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
> index 29a4783..1453365 100644
> --- a/tools/libxl/libxl_remus.c
> +++ b/tools/libxl/libxl_remus.c
> @@ -110,7 +110,7 @@ void libxl__remus_setup(libxl__egc *egc, libxl__remus_state *rs)
>      dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
>  
>      callbacks->suspend = libxl__remus_domain_suspend_callback;
> -    callbacks->postcopy = libxl__remus_domain_resume_callback;
> +    callbacks->aftercopy = libxl__remus_domain_resume_callback;
>      callbacks->checkpoint = libxl__remus_domain_save_checkpoint_callback;
>  
>      libxl__checkpoint_devices_setup(egc, cds);
> diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
> index 3ae7373..27845bb 100755
> --- a/tools/libxl/libxl_save_msgs_gen.pl
> +++ b/tools/libxl/libxl_save_msgs_gen.pl
> @@ -24,7 +24,7 @@ our @msgs = (
>                                                  'unsigned long', 'done',
>                                                  'unsigned long', 'total'] ],
>      [  3, 'srcxA',  "suspend", [] ],
> -    [  4, 'srcxA',  "postcopy", [] ],
> +    [  4, 'srcxA',  "aftercopy", [] ],
>      [  5, 'srcxA',  "checkpoint", [] ],
>      [  6, 'srcxA',  "wait_checkpoint", [] ],
>      [  7, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
> -- 
> 2.7.4
> 


* Re: [PATCH RFC 02/20] libxc/xc_sr: parameterise write_record() on fd
  2017-03-27  9:06 ` [PATCH RFC 02/20] libxc/xc_sr: parameterise write_record() on fd Joshua Otto
@ 2017-03-28 18:53   ` Andrew Cooper
  2017-03-31 14:19   ` Wei Liu
  1 sibling, 0 replies; 53+ messages in thread
From: Andrew Cooper @ 2017-03-28 18:53 UTC (permalink / raw)
  To: Joshua Otto, xen-devel; +Cc: ian.jackson, hjarmstr, wei.liu2, czylin, imhy.yang

On 27/03/17 10:06, Joshua Otto wrote:
> Right now, write_split_record() - which is delegated to by
> write_record() - implicitly writes to ctx->fd.  This means it can't be
> used with the restore context's send_back_fd, which is unhandy.

Unhelpful?

>
> Add an 'fd' parameter to both write_record() and write_split_record(),
> and mechanically update all existing callsites to pass ctx->fd for it.
>
> No functional change.
>
> Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH RFC 03/20] libxc/xc_sr_restore.c: use write_record() in send_checkpoint_dirty_pfn_list()
  2017-03-27  9:06 ` [PATCH RFC 03/20] libxc/xc_sr_restore.c: use write_record() in send_checkpoint_dirty_pfn_list() Joshua Otto
@ 2017-03-28 18:56   ` Andrew Cooper
  2017-03-31 14:19   ` Wei Liu
  1 sibling, 0 replies; 53+ messages in thread
From: Andrew Cooper @ 2017-03-28 18:56 UTC (permalink / raw)
  To: Joshua Otto, xen-devel; +Cc: ian.jackson, hjarmstr, wei.liu2, czylin, imhy.yang

On 27/03/17 10:06, Joshua Otto wrote:
> Teach send_checkpoint_dirty_pfn_list() to use write_record()'s new fd
> parameter, avoiding the need for a manual writev().
>
> No functional change.
>
> Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>

Hmm - I could have sworn I objected to the patch which added this code
in the first place, for its opencoded use of writev().

Oh well, thanks for fixing it.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH RFC 04/20] libxc/xc_sr_save.c: add WRITE_TRIVIAL_RECORD_FN()
  2017-03-27  9:06 ` [PATCH RFC 04/20] libxc/xc_sr_save.c: add WRITE_TRIVIAL_RECORD_FN() Joshua Otto
@ 2017-03-28 19:03   ` Andrew Cooper
  2017-03-30  4:28     ` Joshua Otto
  0 siblings, 1 reply; 53+ messages in thread
From: Andrew Cooper @ 2017-03-28 19:03 UTC (permalink / raw)
  To: Joshua Otto, xen-devel; +Cc: ian.jackson, hjarmstr, wei.liu2, czylin, imhy.yang

On 27/03/17 10:06, Joshua Otto wrote:
> Writing the libxc save stream requires writing a few 'trivial' records,
> consisting only of a header with a particular type.  As a readability
> aid, it's nice to have obviously-named functions that write these sorts
> of records into the stream - for example, the first such function was
> write_end_record(), which reads much more pleasantly at its call-site
> than write_generic_record(REC_TYPE_END) would.  However, it's tedious
> and error-prone to copy-paste the generic body of such a function for
> each new trivial record type.
>
> Add a helper macro that takes a name base and a record type and declares
> the corresponding trivial record write function.  Use this to re-define
> the two existing trivial record functions, write_end_record() and
> write_checkpoint_record().
>
> No functional change.
>
> Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>

-1.

This hides the functions from tools like cscope, and makes the code
harder to read.  I also don't really buy the error prone argument.

If you do want to avoid opencoding different functions, how about

static int write_zerolength_record(uint32_t record_type)

and updating the existing callsites to be

write_zerolength_record(REC_TYPE_END); etc.
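A minimal sketch of that helper (assuming it also takes the usual ctx
parameter, and the existing write_record(ctx, &rec) signature):

static int write_zerolength_record(struct xc_sr_context *ctx,
                                   uint32_t record_type)
{
    struct xc_sr_record rec = { .type = record_type };

    return write_record(ctx, &rec);
}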

~Andrew


* Re: [PATCH RFC 05/20] libxc/xc_sr: factor out filter_pages()
  2017-03-27  9:06 ` [PATCH RFC 05/20] libxc/xc_sr: factor out filter_pages() Joshua Otto
@ 2017-03-28 19:27   ` Andrew Cooper
  2017-03-30  4:42     ` Joshua Otto
  0 siblings, 1 reply; 53+ messages in thread
From: Andrew Cooper @ 2017-03-28 19:27 UTC (permalink / raw)
  To: Joshua Otto, xen-devel; +Cc: ian.jackson, hjarmstr, wei.liu2, czylin, imhy.yang

On 27/03/17 10:06, Joshua Otto wrote:
> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
> index 481a904..8574ee8 100644
> --- a/tools/libxc/xc_sr_restore.c
> +++ b/tools/libxc/xc_sr_restore.c
> @@ -194,6 +194,68 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned count,
>      return rc;
>  }
>  
> +static void set_page_types(struct xc_sr_context *ctx, unsigned count,
> +                           xen_pfn_t *pfns, uint32_t *types)
> +{
> +    unsigned i;

Please use unsigned int rather than just "unsigned" throughout.

> +
> +    for ( i = 0; i < count; ++i )
> +        ctx->restore.ops.set_page_type(ctx, pfns[i], types[i]);
> +}
> +
> +/*
> + * Given count pfns and their types, allocate and fill in buffer bpfns with only
> + * those pfns that are 'backed' by real page data that needs to be migrated.
> + * The caller must later free() *bpfns.
> + *
> + * Returns 0 on success and non-0 on failure.  *bpfns can be free()ed even after
> + * failure.
> + */
> +static int filter_pages(struct xc_sr_context *ctx,
> +                        unsigned count,
> +                        xen_pfn_t *pfns,
> +                        uint32_t *types,
> +                        /* OUT */ unsigned *nr_pages,
> +                        /* OUT */ xen_pfn_t **bpfns)
> +{
> +    xc_interface *xch = ctx->xch;

Pointers to arrays are very easy to get wrong in C.  This code will be
less error-prone if you use

xen_pfn_t *_pfns;  (variable name subject to improvement)

> +    unsigned i;
> +
> +    *nr_pages = 0;
> +    *bpfns = malloc(count * sizeof(*bpfns));

_pfns = *bpfns = malloc(...).

Then use _pfns in place of (*bpfns) everywhere else.

However, your sizeof has the wrong indirection.  It works on x86
because xen_pfn_t is the same size as a pointer, but it will blow up on
32bit ARM, where a pointer is 4 bytes but xen_pfn_t is 8 bytes.
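
Concretely, a sketch of the combined fix (keeping your bpfns out-parameter):

    xen_pfn_t *_pfns;

    _pfns = *bpfns = malloc(count * sizeof(*_pfns));

so the element size is taken from xen_pfn_t itself rather than from the
pointer, which stays correct on 32bit ARM as well.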

> +    if ( !(*bpfns) )
> +    {
> +        ERROR("Failed to allocate %zu bytes to process page data",
> +              count * (sizeof(*bpfns)));
> +        return -1;
> +    }
> +
> +    for ( i = 0; i < count; ++i )
> +    {
> +        switch ( types[i] )
> +        {
> +        case XEN_DOMCTL_PFINFO_NOTAB:
> +
> +        case XEN_DOMCTL_PFINFO_L1TAB:
> +        case XEN_DOMCTL_PFINFO_L1TAB | XEN_DOMCTL_PFINFO_LPINTAB:
> +
> +        case XEN_DOMCTL_PFINFO_L2TAB:
> +        case XEN_DOMCTL_PFINFO_L2TAB | XEN_DOMCTL_PFINFO_LPINTAB:
> +
> +        case XEN_DOMCTL_PFINFO_L3TAB:
> +        case XEN_DOMCTL_PFINFO_L3TAB | XEN_DOMCTL_PFINFO_LPINTAB:
> +
> +        case XEN_DOMCTL_PFINFO_L4TAB:
> +        case XEN_DOMCTL_PFINFO_L4TAB | XEN_DOMCTL_PFINFO_LPINTAB:
> +
> +            (*bpfns)[(*nr_pages)++] = pfns[i];
> +            break;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
>  /*
>   * Given a list of pfns, their types, and a block of page data from the
>   * stream, populate and record their types, map the relevant subset and copy
> @@ -203,7 +265,7 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned count,
>                               xen_pfn_t *pfns, uint32_t *types, void *page_data)
>  {
>      xc_interface *xch = ctx->xch;
> -    xen_pfn_t *mfns = malloc(count * sizeof(*mfns));
> +    xen_pfn_t *mfns = NULL;

This shows a naming bug, which is my fault.  This should be named gfns,
not mfns.  (It inherits its name from the legacy migration code, but
that was also wrong.)

Please correct it, either in this patch or another; the memory
management terms are hard enough, even when all the code is correct.

~Andrew

>      int *map_errs = malloc(count * sizeof(*map_errs));
>      int rc;
>      void *mapping = NULL, *guest_page = NULL;
> @@ -211,11 +273,11 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned count,
>          j,         /* j indexes the subset of pfns we decide to map. */
>          nr_pages = 0;
>  
> -    if ( !mfns || !map_errs )
> +    if ( !map_errs )
>      {
>          rc = -1;
>          ERROR("Failed to allocate %zu bytes to process page data",
> -              count * (sizeof(*mfns) + sizeof(*map_errs)));
> +              count * sizeof(*map_errs));
>          goto err;
>      }
>  
>



* Re: [PATCH RFC 06/20] libxc/xc_sr: factor helpers out of handle_page_data()
  2017-03-27  9:06 ` [PATCH RFC 06/20] libxc/xc_sr: factor helpers out of handle_page_data() Joshua Otto
@ 2017-03-28 19:52   ` Andrew Cooper
  2017-03-30  4:49     ` Joshua Otto
  0 siblings, 1 reply; 53+ messages in thread
From: Andrew Cooper @ 2017-03-28 19:52 UTC (permalink / raw)
  To: Joshua Otto, xen-devel; +Cc: ian.jackson, hjarmstr, wei.liu2, czylin, imhy.yang

On 27/03/17 10:06, Joshua Otto wrote:
> diff --git a/tools/libxc/xc_sr_stream_format.h b/tools/libxc/xc_sr_stream_format.h
> index 3291b25..32400b2 100644
> --- a/tools/libxc/xc_sr_stream_format.h
> +++ b/tools/libxc/xc_sr_stream_format.h
> @@ -80,15 +80,15 @@ struct xc_sr_rhdr
>  #define REC_TYPE_OPTIONAL             0x80000000U
>  
>  /* PAGE_DATA */
> -struct xc_sr_rec_page_data_header
> +struct xc_sr_rec_pages_header
>  {
>      uint32_t count;
>      uint32_t _res1;
>      uint64_t pfn[0];
>  };
>  
> -#define PAGE_DATA_PFN_MASK  0x000fffffffffffffULL
> -#define PAGE_DATA_TYPE_MASK 0xf000000000000000ULL
> +#define REC_PFINFO_PFN_MASK  0x000fffffffffffffULL
> +#define REC_PFINFO_TYPE_MASK 0xf000000000000000ULL
>  
>  /* X86_PV_INFO */
>  struct xc_sr_rec_x86_pv_info

What are the purposes of these name changes?

~Andrew



* Re: [PATCH RFC 10/20] libxc/xc_sr_save.c: initialise rec.data before free()
  2017-03-27  9:06 ` [PATCH RFC 10/20] libxc/xc_sr_save.c: initialise rec.data before free() Joshua Otto
@ 2017-03-28 19:59   ` Andrew Cooper
  2017-03-29 17:47     ` Wei Liu
  0 siblings, 1 reply; 53+ messages in thread
From: Andrew Cooper @ 2017-03-28 19:59 UTC (permalink / raw)
  To: Joshua Otto, xen-devel; +Cc: ian.jackson, hjarmstr, wei.liu2, czylin, imhy.yang

On 27/03/17 10:06, Joshua Otto wrote:
> colo_merge_secondary_dirty_bitmap() unconditionally free()s the .data
> member of its local xc_sr_record structure rec on its exit path.
> However, if the initial call to read_record() fails then this member is
> uninitialised.  Initialise it.
>
> Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

This bugfix should be taken ASAP, and needs backporting to Xen 4.7 and 4.8

> ---
>  tools/libxc/xc_sr_save.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> index ac97d93..6acc8d3 100644
> --- a/tools/libxc/xc_sr_save.c
> +++ b/tools/libxc/xc_sr_save.c
> @@ -681,7 +681,7 @@ static int send_memory_live(struct xc_sr_context *ctx)
>  static int colo_merge_secondary_dirty_bitmap(struct xc_sr_context *ctx)
>  {
>      xc_interface *xch = ctx->xch;
> -    struct xc_sr_record rec;
> +    struct xc_sr_record rec = { 0, 0, NULL };
>      uint64_t *pfns = NULL;
>      uint64_t pfn;
>      unsigned count, i;



* Re: [PATCH RFC 10/20] libxc/xc_sr_save.c: initialise rec.data before free()
  2017-03-28 19:59   ` Andrew Cooper
@ 2017-03-29 17:47     ` Wei Liu
  0 siblings, 0 replies; 53+ messages in thread
From: Wei Liu @ 2017-03-29 17:47 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: wei.liu2, ian.jackson, czylin, Joshua Otto, imhy.yang, xen-devel,
	hjarmstr

On Tue, Mar 28, 2017 at 08:59:09PM +0100, Andrew Cooper wrote:
> On 27/03/17 10:06, Joshua Otto wrote:
> > colo_merge_secondary_dirty_bitmap() unconditionally free()s the .data
> > member of its local xc_sr_record structure rec on its exit path.
> > However, if the initial call to read_record() fails then this member is
> > uninitialised.  Initialise it.
> >
> > Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
> 
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> This bugfix should be taken ASAP, and needs backporting to Xen 4.7 and 4.8

Acked + applied.

> 
> > ---
> >  tools/libxc/xc_sr_save.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> > index ac97d93..6acc8d3 100644
> > --- a/tools/libxc/xc_sr_save.c
> > +++ b/tools/libxc/xc_sr_save.c
> > @@ -681,7 +681,7 @@ static int send_memory_live(struct xc_sr_context *ctx)
> >  static int colo_merge_secondary_dirty_bitmap(struct xc_sr_context *ctx)
> >  {
> >      xc_interface *xch = ctx->xch;
> > -    struct xc_sr_record rec;
> > +    struct xc_sr_record rec = { 0, 0, NULL };
> >      uint64_t *pfns = NULL;
> >      uint64_t pfn;
> >      unsigned count, i;
> 


* Re: [PATCH RFC 07/20] migration: defer precopy policy to libxl
  2017-03-27  9:06 ` [PATCH RFC 07/20] migration: defer precopy policy to libxl Joshua Otto
@ 2017-03-29 18:54   ` Jennifer Herbert
  2017-03-30  5:28     ` Joshua Otto
  2017-03-29 20:18   ` Andrew Cooper
  1 sibling, 1 reply; 53+ messages in thread
From: Jennifer Herbert @ 2017-03-29 18:54 UTC (permalink / raw)
  To: Joshua Otto, xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, imhy.yang, hjarmstr

Hi,

I would like to encourage this patch - as I have use for it outside
of your postcopy work.

Some things people will comment on:
You've used 'unsigned' without the int keyword, which people don't like.
Also on line 324, you're missing a space between 'if (' and
'ctx->save.policy_decision'.

Also, I'm not a fan of your CONSULT_POLICY macro, which you've defined at
an odd point in your function, and I think it could be done more elegantly.
Worst of all ... it's a macro - which I think should generally be avoided
unless there is little choice.  I'm sure you could write a helper function
to replace this.
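
As a sketch of what I mean (the name is purely illustrative):

static bool policy_wants_precopy(struct xc_sr_context *ctx, int *rc)
{
    if ( ctx->save.policy_decision == XGS_POLICY_ABORT )
        *rc = -1;
    else if ( ctx->save.policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
        *rc = 0;
    else
        return true;

    return false;
}

and then each CONSULT_POLICY site becomes

    if ( !policy_wants_precopy(ctx, &rc) )
        goto out;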

Cheers,

-jenny

On 27/03/17 10:06, Joshua Otto wrote:
> The precopy phase of the xc_domain_save() live migration algorithm has
> historically been implemented to run until either a) (almost) no pages
> are dirty or b) some fixed, hard-coded maximum number of precopy
> iterations has been exceeded.  This policy and its implementation are
> less than ideal for a few reasons:
> - the logic of the policy is intertwined with the control flow of the
>    mechanism of the precopy stage
> - it can't take into account facts external to the immediate
>    migration context, such as interactive user input or the passage of
>    wall-clock time
> - it does not permit the user to change their mind, over time, about
>    what to do at the end of the precopy (they get an unconditional
>    transition into the stop-and-copy phase of the migration)
>
> To permit users to implement arbitrary higher-level policies governing
> when the live migration precopy phase should end, and what should be
> done next:
> - add a precopy_policy() callback to the xc_domain_save() user-supplied
>    callbacks
> - during the precopy phase of live migrations, consult this policy after
>    each batch of pages transmitted and take the dictated action, which
>    may be to a) abort the migration entirely, b) continue with the
>    precopy, or c) proceed to the stop-and-copy phase.
> - provide an implementation of the old policy as such a callback in
>    libxl and plumb it through the IPC machinery to libxc, effectively
>    maintaining the old policy for now
>
> Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
> ---
>   tools/libxc/include/xenguest.h     |  23 ++++-
>   tools/libxc/xc_nomigrate.c         |   3 +-
>   tools/libxc/xc_sr_common.h         |   7 +-
>   tools/libxc/xc_sr_save.c           | 194 ++++++++++++++++++++++++++-----------
>   tools/libxl/libxl_dom_save.c       |  20 ++++
>   tools/libxl/libxl_save_callout.c   |   3 +-
>   tools/libxl/libxl_save_helper.c    |   7 +-
>   tools/libxl/libxl_save_msgs_gen.pl |   4 +-
>   8 files changed, 189 insertions(+), 72 deletions(-)
>
> diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
> index aa8cc8b..30ffb6f 100644
> --- a/tools/libxc/include/xenguest.h
> +++ b/tools/libxc/include/xenguest.h
> @@ -39,6 +39,14 @@
>    */
>   struct xenevtchn_handle;
>   
> +/* For save's precopy_policy(). */
> +struct precopy_stats
> +{
> +    unsigned iteration;
> +    unsigned total_written;
> +    long dirty_count; /* -1 if unknown */
> +};
> +
>   /* callbacks provided by xc_domain_save */
>   struct save_callbacks {
>       /* Called after expiration of checkpoint interval,
> @@ -46,6 +54,17 @@ struct save_callbacks {
>        */
>       int (*suspend)(void* data);
>   
> +    /* Called after every batch of page data sent during the precopy phase of a
> +     * live migration to ask the caller what to do next based on the current
> +     * state of the precopy migration.
> +     */
> +#define XGS_POLICY_ABORT          (-1) /* Abandon the migration entirely and
> +                                        * tidy up. */
> +#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
> +#define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and transmit the
> +                                        * remaining dirty pages. */
> +    int (*precopy_policy)(struct precopy_stats stats, void *data);
> +
>       /* Called after the guest's dirty pages have been
>        *  copied into an output buffer.
>        * Callback function resumes the guest & the device model,
> @@ -100,8 +119,8 @@ typedef enum {
>    *        doesn't use checkpointing
>    * @return 0 on success, -1 on failure
>    */
> -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
> -                   uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
> +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
> +                   uint32_t flags /* XCFLAGS_xxx */,
>                      struct save_callbacks* callbacks, int hvm,
>                      xc_migration_stream_t stream_type, int recv_fd);
>   
> diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c
> index 15c838f..2af64e4 100644
> --- a/tools/libxc/xc_nomigrate.c
> +++ b/tools/libxc/xc_nomigrate.c
> @@ -20,8 +20,7 @@
>   #include <xenctrl.h>
>   #include <xenguest.h>
>   
> -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
> -                   uint32_t max_factor, uint32_t flags,
> +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t flags,
>                      struct save_callbacks* callbacks, int hvm,
>                      xc_migration_stream_t stream_type, int recv_fd)
>   {
> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
> index b1aa88e..a9160bd 100644
> --- a/tools/libxc/xc_sr_common.h
> +++ b/tools/libxc/xc_sr_common.h
> @@ -198,12 +198,11 @@ struct xc_sr_context
>               /* Further debugging information in the stream. */
>               bool debug;
>   
> -            /* Parameters for tweaking live migration. */
> -            unsigned max_iterations;
> -            unsigned dirty_threshold;
> -
>               unsigned long p2m_size;
>   
> +            struct precopy_stats stats;
> +            int policy_decision;
> +
>               xen_pfn_t *batch_pfns;
>               unsigned nr_batch_pfns;
>               unsigned long *deferred_pages;
> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> index 797aec5..eb95334 100644
> --- a/tools/libxc/xc_sr_save.c
> +++ b/tools/libxc/xc_sr_save.c
> @@ -271,13 +271,29 @@ static int write_batch(struct xc_sr_context *ctx)
>   }
>   
>   /*
> + * Test if the batch is full.
> + */
> +static bool batch_full(struct xc_sr_context *ctx)
> +{
> +    return ctx->save.nr_batch_pfns == MAX_BATCH_SIZE;
> +}
> +
> +/*
> + * Test if the batch is empty.
> + */
> +static bool batch_empty(struct xc_sr_context *ctx)
> +{
> +    return ctx->save.nr_batch_pfns == 0;
> +}
> +
> +/*
>    * Flush a batch of pfns into the stream.
>    */
>   static int flush_batch(struct xc_sr_context *ctx)
>   {
>       int rc = 0;
>   
> -    if ( ctx->save.nr_batch_pfns == 0 )
> +    if ( batch_empty(ctx) )
>           return rc;
>   
>       rc = write_batch(ctx);
> @@ -293,19 +309,12 @@ static int flush_batch(struct xc_sr_context *ctx)
>   }
>   
>   /*
> - * Add a single pfn to the batch, flushing the batch if full.
> + * Add a single pfn to the batch.
>    */
> -static int add_to_batch(struct xc_sr_context *ctx, xen_pfn_t pfn)
> +static void add_to_batch(struct xc_sr_context *ctx, xen_pfn_t pfn)
>   {
> -    int rc = 0;
> -
> -    if ( ctx->save.nr_batch_pfns == MAX_BATCH_SIZE )
> -        rc = flush_batch(ctx);
> -
> -    if ( rc == 0 )
> -        ctx->save.batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
> -
> -    return rc;
> +    assert(ctx->save.nr_batch_pfns < MAX_BATCH_SIZE);
> +    ctx->save.batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
>   }
>   
>   /*
> @@ -352,10 +361,15 @@ static int suspend_domain(struct xc_sr_context *ctx)
>    * Send a subset of pages in the guests p2m, according to the dirty bitmap.
>    * Used for each subsequent iteration of the live migration loop.
>    *
> + * During the precopy stage of a live migration, test the user-supplied
> + * policy function after each batch of pages and cut off the operation
> + * early if indicated.  Unless aborting, the dirty pages remaining in this round
> + * are transferred into the deferred_pages bitmap.
> + *
>    * Bitmap is bounded by p2m_size.
>    */
>   static int send_dirty_pages(struct xc_sr_context *ctx,
> -                            unsigned long entries)
> +                            unsigned long entries, bool precopy)
>   {
>       xc_interface *xch = ctx->xch;
>       xen_pfn_t p;
> @@ -364,31 +378,57 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
>       DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>                                       &ctx->save.dirty_bitmap_hbuf);
>   
> -    for ( p = 0, written = 0; p < ctx->save.p2m_size; ++p )
> +    int (*precopy_policy)(struct precopy_stats, void *) =
> +        ctx->save.callbacks->precopy_policy;
> +    void *data = ctx->save.callbacks->data;
> +
> +    assert(batch_empty(ctx));
> +    for ( p = 0, written = 0; p < ctx->save.p2m_size; )
>       {
> -        if ( !test_bit(p, dirty_bitmap) )
> -            continue;
> +        if ( ctx->save.live && precopy )
> +        {
> +            ctx->save.policy_decision = precopy_policy(ctx->save.stats, data);
> +            if ( ctx->save.policy_decision == XGS_POLICY_ABORT )
> +            {
> +                return -1;
> +            }
> +            else if ( ctx->save.policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
> +            {
> +                /* Any outstanding dirty pages are now deferred until the next
> +                 * phase of the migration. */
> +                bitmap_or(ctx->save.deferred_pages, dirty_bitmap,
> +                          ctx->save.p2m_size);
> +                if ( entries > written )
> +                    ctx->save.nr_deferred_pages += entries - written;
> +
> +                goto done;
> +            }
> +        }
>   
> -        rc = add_to_batch(ctx, p);
> +        for ( ; p < ctx->save.p2m_size && !batch_full(ctx); ++p )
> +        {
> +            if ( test_and_clear_bit(p, dirty_bitmap) )
> +            {
> +                add_to_batch(ctx, p);
> +                ++written;
> +                ++ctx->save.stats.total_written;
> +            }
> +        }
> +
> +        rc = flush_batch(ctx);
>           if ( rc )
>               return rc;
>   
> -        /* Update progress every 4MB worth of memory sent. */
> -        if ( (written & ((1U << (22 - 12)) - 1)) == 0 )
> -            xc_report_progress_step(xch, written, entries);
> -
> -        ++written;
> +        /* Update progress after every batch (4MB) worth of memory sent. */
> +        xc_report_progress_step(xch, written, entries);
>       }
>   
> -    rc = flush_batch(ctx);
> -    if ( rc )
> -        return rc;
> -
>       if ( written > entries )
>           DPRINTF("Bitmap contained more entries than expected...");
>   
>       xc_report_progress_step(xch, entries, entries);
>   
> + done:
>       return ctx->save.ops.check_vm_state(ctx);
>   }
>   
> @@ -396,14 +436,14 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
>    * Send all pages in the guests p2m.  Used as the first iteration of the live
>    * migration loop, and for a non-live save.
>    */
> -static int send_all_pages(struct xc_sr_context *ctx)
> +static int send_all_pages(struct xc_sr_context *ctx, bool precopy)
>   {
>       DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>                                       &ctx->save.dirty_bitmap_hbuf);
>   
>       bitmap_set(dirty_bitmap, ctx->save.p2m_size);
>   
> -    return send_dirty_pages(ctx, ctx->save.p2m_size);
> +    return send_dirty_pages(ctx, ctx->save.p2m_size, precopy);
>   }
>   
>   static int enable_logdirty(struct xc_sr_context *ctx)
> @@ -446,8 +486,7 @@ static int update_progress_string(struct xc_sr_context *ctx,
>       xc_interface *xch = ctx->xch;
>       char *new_str = NULL;
>   
> -    if ( asprintf(&new_str, "Frames iteration %u of %u",
> -                  iter, ctx->save.max_iterations) == -1 )
> +    if ( asprintf(&new_str, "Frames iteration %u", iter) == -1 )
>       {
>           PERROR("Unable to allocate new progress string");
>           return -1;
> @@ -468,20 +507,47 @@ static int send_memory_live(struct xc_sr_context *ctx)
>       xc_interface *xch = ctx->xch;
>       xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
>       char *progress_str = NULL;
> -    unsigned x;
>       int rc;
>   
> +    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
> +                                    &ctx->save.dirty_bitmap_hbuf);
> +
> +    int (*precopy_policy)(struct precopy_stats, void *) =
> +        ctx->save.callbacks->precopy_policy;
> +    void *data = ctx->save.callbacks->data;
> +
>       rc = update_progress_string(ctx, &progress_str, 0);
>       if ( rc )
>           goto out;
>   
> -    rc = send_all_pages(ctx);
> +#define CONSULT_POLICY                                                        \
> +    do {                                                                      \
> +        if ( ctx->save.policy_decision == XGS_POLICY_ABORT )                  \
> +        {                                                                     \
> +            rc = -1;                                                          \
> +            goto out;                                                         \
> +        }                                                                     \
> +        else if ( ctx->save.policy_decision != XGS_POLICY_CONTINUE_PRECOPY )  \
> +        {                                                                     \
> +            rc = 0;                                                           \
> +            goto out;                                                         \
> +        }                                                                     \
> +    } while (0)
> +
> +    ctx->save.stats = (struct precopy_stats)
> +        {
> +            .iteration     = 0,
> +            .total_written = 0,
> +            .dirty_count   = -1
> +        };
> +    rc = send_all_pages(ctx, /* precopy */ true);
>       if ( rc )
>           goto out;
>   
> -    for ( x = 1;
> -          ((x < ctx->save.max_iterations) &&
> -           (stats.dirty_count > ctx->save.dirty_threshold)); ++x )
> +    /* send_all_pages() has updated the stats */
> +    CONSULT_POLICY;
> +
> +    for ( ctx->save.stats.iteration = 1; ; ++ctx->save.stats.iteration )
>       {
>           if ( xc_shadow_control(
>                    xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
> @@ -493,18 +559,42 @@ static int send_memory_live(struct xc_sr_context *ctx)
>               goto out;
>           }
>   
> -        if ( stats.dirty_count == 0 )
> -            break;
> +        /* Check the new dirty_count against the policy. */
> +        ctx->save.stats.dirty_count = stats.dirty_count;
> +        ctx->save.policy_decision = precopy_policy(ctx->save.stats, data);
> +        if ( ctx->save.policy_decision == XGS_POLICY_ABORT )
> +        {
> +            rc = -1;
> +            goto out;
> +        }
> +        else if (ctx->save.policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
> +        {
> +            bitmap_or(ctx->save.deferred_pages, dirty_bitmap,
> +                      ctx->save.p2m_size);
> +            ctx->save.nr_deferred_pages += stats.dirty_count;
> +            rc = 0;
> +            goto out;
> +        }
> +
> +        /* After this point we won't know how many pages are really dirty until
> +         * the next iteration. */
> +        ctx->save.stats.dirty_count = -1;
>   
> -        rc = update_progress_string(ctx, &progress_str, x);
> +        rc = update_progress_string(ctx, &progress_str,
> +                                    ctx->save.stats.iteration);
>           if ( rc )
>               goto out;
>   
> -        rc = send_dirty_pages(ctx, stats.dirty_count);
> +        rc = send_dirty_pages(ctx, stats.dirty_count, /* precopy */ true);
>           if ( rc )
>               goto out;
> +
> +        /* send_dirty_pages() has updated the stats */
> +        CONSULT_POLICY;
>       }
>   
> +#undef CONSULT_POLICY
> +
>    out:
>       xc_set_progress_prefix(xch, NULL);
>       free(progress_str);
> @@ -595,7 +685,7 @@ static int suspend_and_send_dirty(struct xc_sr_context *ctx)
>       if ( ctx->save.live )
>       {
>           rc = update_progress_string(ctx, &progress_str,
> -                                    ctx->save.max_iterations);
> +                                    ctx->save.stats.iteration);
>           if ( rc )
>               goto out;
>       }
> @@ -614,7 +704,8 @@ static int suspend_and_send_dirty(struct xc_sr_context *ctx)
>           }
>       }
>   
> -    rc = send_dirty_pages(ctx, stats.dirty_count + ctx->save.nr_deferred_pages);
> +    rc = send_dirty_pages(ctx, stats.dirty_count + ctx->save.nr_deferred_pages,
> +                          /* precopy */ false);
>       if ( rc )
>           goto out;
>   
> @@ -645,7 +736,7 @@ static int verify_frames(struct xc_sr_context *ctx)
>           goto out;
>   
>       xc_set_progress_prefix(xch, "Frames verify");
> -    rc = send_all_pages(ctx);
> +    rc = send_all_pages(ctx, /* precopy */ false);
>       if ( rc )
>           goto out;
>   
> @@ -719,7 +810,7 @@ static int send_domain_memory_nonlive(struct xc_sr_context *ctx)
>   
>       xc_set_progress_prefix(xch, "Frames");
>   
> -    rc = send_all_pages(ctx);
> +    rc = send_all_pages(ctx, /* precopy */ false);
>       if ( rc )
>           goto err;
>   
> @@ -910,8 +1001,7 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
>   };
>   
>   int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
> -                   uint32_t max_iters, uint32_t max_factor, uint32_t flags,
> -                   struct save_callbacks* callbacks, int hvm,
> +                   uint32_t flags, struct save_callbacks* callbacks, int hvm,
>                      xc_migration_stream_t stream_type, int recv_fd)
>   {
>       struct xc_sr_context ctx =
> @@ -932,25 +1022,17 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
>              stream_type == XC_MIG_STREAM_REMUS ||
>              stream_type == XC_MIG_STREAM_COLO);
>   
> -    /*
> -     * TODO: Find some time to better tweak the live migration algorithm.
> -     *
> -     * These parameters are better than the legacy algorithm especially for
> -     * busy guests.
> -     */
> -    ctx.save.max_iterations = 5;
> -    ctx.save.dirty_threshold = 50;
> -
>       /* Sanity checks for callbacks. */
>       if ( hvm )
>           assert(callbacks->switch_qemu_logdirty);
> +    if ( ctx.save.live )
> +        assert(callbacks->precopy_policy);
>       if ( ctx.save.checkpointed )
>           assert(callbacks->checkpoint && callbacks->aftercopy);
>       if ( ctx.save.checkpointed == XC_MIG_STREAM_COLO )
>           assert(callbacks->wait_checkpoint);
>   
> -    DPRINTF("fd %d, dom %u, max_iters %u, max_factor %u, flags %u, hvm %d",
> -            io_fd, dom, max_iters, max_factor, flags, hvm);
> +    DPRINTF("fd %d, dom %u, flags %u, hvm %d", io_fd, dom, flags, hvm);
>   
>       if ( xc_domain_getinfo(xch, dom, 1, &ctx.dominfo) != 1 )
>       {
> diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
> index 77fe30e..6d28cce 100644
> --- a/tools/libxl/libxl_dom_save.c
> +++ b/tools/libxl/libxl_dom_save.c
> @@ -328,6 +328,25 @@ int libxl__save_emulator_xenstore_data(libxl__domain_save_state *dss,
>       return rc;
>   }
>   
> +/*
> + * This is the live migration precopy policy - it's called periodically during
> + * the precopy phase of live migrations, and is responsible for deciding when
> + * the precopy phase should terminate and what should be done next.
> + *
> + * The policy implemented here behaves identically to the policy previously
> + * hard-coded into xc_domain_save() - it proceeds to the stop-and-copy phase of
> + * the live migration when there are either fewer than 50 dirty pages, or more
> + * than 5 precopy rounds have completed.
> + */
> +static int libxl__save_live_migration_simple_precopy_policy(
> +    struct precopy_stats stats, void *user)
> +{
> +    return ((stats.dirty_count >= 0 && stats.dirty_count < 50) ||
> +            stats.iteration >= 5)
> +        ? XGS_POLICY_STOP_AND_COPY
> +        : XGS_POLICY_CONTINUE_PRECOPY;
> +}
> +
>   /*----- main code for saving, in order of execution -----*/
>   
>   void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
> @@ -401,6 +420,7 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
>       if (dss->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_NONE)
>           callbacks->suspend = libxl__domain_suspend_callback;
>   
> +    callbacks->precopy_policy = libxl__save_live_migration_simple_precopy_policy;
>       callbacks->switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty;
>   
>       dss->sws.ao  = dss->ao;
> diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
> index 46b892c..026b572 100644
> --- a/tools/libxl/libxl_save_callout.c
> +++ b/tools/libxl/libxl_save_callout.c
> @@ -89,8 +89,7 @@ void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_save_state *dss,
>           libxl__srm_callout_enumcallbacks_save(&shs->callbacks.save.a);
>   
>       const unsigned long argnums[] = {
> -        dss->domid, 0, 0, dss->xcflags, dss->hvm,
> -        cbflags, dss->checkpointed_stream,
> +        dss->domid, dss->xcflags, dss->hvm, cbflags, dss->checkpointed_stream,
>       };
>   
>       shs->ao = ao;
> diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
> index d3def6b..0241a6b 100644
> --- a/tools/libxl/libxl_save_helper.c
> +++ b/tools/libxl/libxl_save_helper.c
> @@ -251,8 +251,6 @@ int main(int argc, char **argv)
>           io_fd =                             atoi(NEXTARG);
>           recv_fd =                           atoi(NEXTARG);
>           uint32_t dom =                      strtoul(NEXTARG,0,10);
> -        uint32_t max_iters =                strtoul(NEXTARG,0,10);
> -        uint32_t max_factor =               strtoul(NEXTARG,0,10);
>           uint32_t flags =                    strtoul(NEXTARG,0,10);
>           int hvm =                           atoi(NEXTARG);
>           unsigned cbflags =                  strtoul(NEXTARG,0,10);
> @@ -264,9 +262,8 @@ int main(int argc, char **argv)
>           startup("save");
>           setup_signals(save_signal_handler);
>   
> -        r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags,
> -                           &helper_save_callbacks, hvm, stream_type,
> -                           recv_fd);
> +        r = xc_domain_save(xch, io_fd, dom, flags, &helper_save_callbacks, hvm,
> +                           stream_type, recv_fd);
>           complete(r);
>   
>       } else if (!strcmp(mode,"--restore-domain")) {
> diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
> index 27845bb..50c97b4 100755
> --- a/tools/libxl/libxl_save_msgs_gen.pl
> +++ b/tools/libxl/libxl_save_msgs_gen.pl
> @@ -33,6 +33,7 @@ our @msgs = (
>                                                 'xen_pfn_t', 'console_gfn'] ],
>       [  9, 'srW',    "complete",              [qw(int retval
>                                                    int errnoval)] ],
> +    [ 10, 'scxW',   "precopy_policy", ['struct precopy_stats', 'stats'] ]
>   );
>   
>   #----------------------------------------
> @@ -141,7 +142,8 @@ static void bytes_put(unsigned char *const buf, int *len,
>   
>   END
>   
> -foreach my $simpletype (qw(int uint16_t uint32_t unsigned), 'unsigned long', 'xen_pfn_t') {
> +foreach my $simpletype (qw(int uint16_t uint32_t unsigned),
> +                        'unsigned long', 'xen_pfn_t', 'struct precopy_stats') {
>       my $typeid = typeid($simpletype);
>       $out_body{'callout'} .= <<END;
>   static int ${typeid}_get(const unsigned char **msg,



* Re: [PATCH RFC 07/20] migration: defer precopy policy to libxl
  2017-03-27  9:06 ` [PATCH RFC 07/20] migration: defer precopy policy to libxl Joshua Otto
  2017-03-29 18:54   ` Jennifer Herbert
@ 2017-03-29 20:18   ` Andrew Cooper
  2017-03-30  5:19     ` Joshua Otto
  1 sibling, 1 reply; 53+ messages in thread
From: Andrew Cooper @ 2017-03-29 20:18 UTC (permalink / raw)
  To: Joshua Otto, xen-devel; +Cc: ian.jackson, hjarmstr, wei.liu2, czylin, imhy.yang

On 27/03/17 10:06, Joshua Otto wrote:
> The precopy phase of the xc_domain_save() live migration algorithm has
> historically been implemented to run until either a) (almost) no pages
> are dirty or b) some fixed, hard-coded maximum number of precopy
> iterations has been exceeded.  This policy and its implementation are
> less than ideal for a few reasons:
> - the logic of the policy is intertwined with the control flow of the
>   mechanism of the precopy stage
> - it can't take into account facts external to the immediate
>   migration context, such as interactive user input or the passage of
>   wall-clock time
> - it does not permit the user to change their mind, over time, about
>   what to do at the end of the precopy (they get an unconditional
>   transition into the stop-and-copy phase of the migration)
>
> To permit users to implement arbitrary higher-level policies governing
> when the live migration precopy phase should end, and what should be
> done next:
> - add a precopy_policy() callback to the xc_domain_save() user-supplied
>   callbacks
> - during the precopy phase of live migrations, consult this policy after
>   each batch of pages transmitted and take the dictated action, which
>   may be to a) abort the migration entirely, b) continue with the
>   precopy, or c) proceed to the stop-and-copy phase.
> - provide an implementation of the old policy as such a callback in
>   libxl and plumb it through the IPC machinery to libxc, effectively
>   maintaining the old policy for now
>
> Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>

This patch should be split into two: one modifying libxc to use struct
precopy_stats, and a second to wire up the RPC call.

> ---
>  tools/libxc/include/xenguest.h     |  23 ++++-
>  tools/libxc/xc_nomigrate.c         |   3 +-
>  tools/libxc/xc_sr_common.h         |   7 +-
>  tools/libxc/xc_sr_save.c           | 194 ++++++++++++++++++++++++++-----------
>  tools/libxl/libxl_dom_save.c       |  20 ++++
>  tools/libxl/libxl_save_callout.c   |   3 +-
>  tools/libxl/libxl_save_helper.c    |   7 +-
>  tools/libxl/libxl_save_msgs_gen.pl |   4 +-
>  8 files changed, 189 insertions(+), 72 deletions(-)
>
> diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
> index aa8cc8b..30ffb6f 100644
> --- a/tools/libxc/include/xenguest.h
> +++ b/tools/libxc/include/xenguest.h
> @@ -39,6 +39,14 @@
>   */
>  struct xenevtchn_handle;
>  
> +/* For save's precopy_policy(). */
> +struct precopy_stats
> +{
> +    unsigned iteration;
> +    unsigned total_written;
> +    long dirty_count; /* -1 if unknown */

total_written and dirty_count are liable to be equal, so having them as
different widths of integer clearly can't be correct.
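
For instance (a sketch only):

struct precopy_stats
{
    unsigned int iteration;
    long total_written;
    long dirty_count; /* -1 if unknown */
};

which gives both page counts the same width while keeping -1 available as
the "unknown" sentinel for dirty_count.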

> +};
> +
>  /* callbacks provided by xc_domain_save */
>  struct save_callbacks {
>      /* Called after expiration of checkpoint interval,
> @@ -46,6 +54,17 @@ struct save_callbacks {
>       */
>      int (*suspend)(void* data);
>  
> +    /* Called after every batch of page data sent during the precopy phase of a
> +     * live migration to ask the caller what to do next based on the current
> +     * state of the precopy migration.
> +     */
> +#define XGS_POLICY_ABORT          (-1) /* Abandon the migration entirely and
> +                                        * tidy up. */
> +#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
> +#define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and transmit the
> +                                        * remaining dirty pages. */
> +    int (*precopy_policy)(struct precopy_stats stats, void *data);

Structures shouldn't be passed by value like this, as the compiler has
to do a lot of memcpy() work to make it happen.  You should pass by
const pointer, as (as far as I can tell), they are strictly read-only to
the implementation of this hook?
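
i.e. sketching just the signature change:

    int (*precopy_policy)(const struct precopy_stats *stats, void *data);

with the function-pointer locals in xc_sr_save.c updated to match.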

> +
>      /* Called after the guest's dirty pages have been
>       *  copied into an output buffer.
>       * Callback function resumes the guest & the device model,
> @@ -100,8 +119,8 @@ typedef enum {
>   *        doesn't use checkpointing
>   * @return 0 on success, -1 on failure
>   */
> -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
> -                   uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
> +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
> +                   uint32_t flags /* XCFLAGS_xxx */,
>                     struct save_callbacks* callbacks, int hvm,
>                     xc_migration_stream_t stream_type, int recv_fd);

It would be cleaner for existing callers, and easier to extend in the
future, to encapsulate all of these parameters in a struct
domain_save_params and pass it here by pointer.

That way, we'd avoid the situation we currently have where some
information is passed in bitfields in a single parameter, whereas other
booleans are passed as integers.
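
Something along these lines, perhaps (the exact field set is illustrative,
not exhaustive):

struct domain_save_params
{
    uint32_t dom;                      /* domid to save */
    int save_fd;                       /* fd to write the stream to */
    int recv_fd;                       /* back-channel, checkpointed streams */
    uint32_t flags;                    /* XCFLAGS_xxx */
    xc_migration_stream_t stream_type;
};

int xc_domain_save(xc_interface *xch,
                   const struct domain_save_params *params,
                   struct save_callbacks *callbacks);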

The hvm parameter specifically is useless, and can be removed by
rearranging the sanity checks until after the xc_domain_getinfo() call.

>  
> diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c
> index 15c838f..2af64e4 100644
> --- a/tools/libxc/xc_nomigrate.c
> +++ b/tools/libxc/xc_nomigrate.c
> @@ -20,8 +20,7 @@
>  #include <xenctrl.h>
>  #include <xenguest.h>
>  
> -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
> -                   uint32_t max_factor, uint32_t flags,
> +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t flags,
>                     struct save_callbacks* callbacks, int hvm,
>                     xc_migration_stream_t stream_type, int recv_fd)
>  {
> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
> index b1aa88e..a9160bd 100644
> --- a/tools/libxc/xc_sr_common.h
> +++ b/tools/libxc/xc_sr_common.h
> @@ -198,12 +198,11 @@ struct xc_sr_context
>              /* Further debugging information in the stream. */
>              bool debug;
>  
> -            /* Parameters for tweaking live migration. */
> -            unsigned max_iterations;
> -            unsigned dirty_threshold;
> -
>              unsigned long p2m_size;
>  
> +            struct precopy_stats stats;
> +            int policy_decision;
> +
>              xen_pfn_t *batch_pfns;
>              unsigned nr_batch_pfns;
>              unsigned long *deferred_pages;
> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> index 797aec5..eb95334 100644
> --- a/tools/libxc/xc_sr_save.c
> +++ b/tools/libxc/xc_sr_save.c
> @@ -271,13 +271,29 @@ static int write_batch(struct xc_sr_context *ctx)
>  }
>  
>  /*
> + * Test if the batch is full.
> + */
> +static bool batch_full(struct xc_sr_context *ctx)

const struct xc_sr_context *ctx

This is a predicate, after all.

> +{
> +    return ctx->save.nr_batch_pfns == MAX_BATCH_SIZE;
> +}
> +
> +/*
> + * Test if the batch is empty.
> + */
> +static bool batch_empty(struct xc_sr_context *ctx)
> +{
> +    return ctx->save.nr_batch_pfns == 0;
> +}
> +
> +/*
>   * Flush a batch of pfns into the stream.
>   */
>  static int flush_batch(struct xc_sr_context *ctx)
>  {
>      int rc = 0;
>  
> -    if ( ctx->save.nr_batch_pfns == 0 )
> +    if ( batch_empty(ctx) )
>          return rc;
>  
>      rc = write_batch(ctx);
> @@ -293,19 +309,12 @@ static int flush_batch(struct xc_sr_context *ctx)
>  }
>  
>  /*
> - * Add a single pfn to the batch, flushing the batch if full.
> + * Add a single pfn to the batch.
>   */
> -static int add_to_batch(struct xc_sr_context *ctx, xen_pfn_t pfn)
> +static void add_to_batch(struct xc_sr_context *ctx, xen_pfn_t pfn)
>  {
> -    int rc = 0;
> -
> -    if ( ctx->save.nr_batch_pfns == MAX_BATCH_SIZE )
> -        rc = flush_batch(ctx);
> -
> -    if ( rc == 0 )
> -        ctx->save.batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
> -
> -    return rc;
> +    assert(ctx->save.nr_batch_pfns < MAX_BATCH_SIZE);
> +    ctx->save.batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
>  }
>  
>  /*
> @@ -352,10 +361,15 @@ static int suspend_domain(struct xc_sr_context *ctx)
>   * Send a subset of pages in the guests p2m, according to the dirty bitmap.
>   * Used for each subsequent iteration of the live migration loop.
>   *
> + * During the precopy stage of a live migration, test the user-supplied
> + * policy function after each batch of pages and cut off the operation
> + * early if indicated.  Unless aborting, the dirty pages remaining in this round
> + * are transferred into the deferred_pages bitmap.

Is this actually a sensible thing to do?  On iteration 0, this is going
to be a phenomenal number of RPC calls, which are all going to make the
same decision.

> + *
>   * Bitmap is bounded by p2m_size.
>   */
>  static int send_dirty_pages(struct xc_sr_context *ctx,
> -                            unsigned long entries)
> +                            unsigned long entries, bool precopy)

Shouldn't this precopy boolean be some kind of state variable in ctx?

>  {
>      xc_interface *xch = ctx->xch;
>      xen_pfn_t p;
> @@ -364,31 +378,57 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
>      DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>                                      &ctx->save.dirty_bitmap_hbuf);
>  
> -    for ( p = 0, written = 0; p < ctx->save.p2m_size; ++p )
> +    int (*precopy_policy)(struct precopy_stats, void *) =
> +        ctx->save.callbacks->precopy_policy;
> +    void *data = ctx->save.callbacks->data;
> +
> +    assert(batch_empty(ctx));
> +    for ( p = 0, written = 0; p < ctx->save.p2m_size; )

This looks suspicious without an increment.  Conceptually, it might be
better as a do {} while ( decision == XGS_POLICY_CONTINUE_PRECOPY ); loop?
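
i.e. something of this shape (a sketch only, eliding the batch-filling
body):

    p = written = 0;

    do
    {
        /* consult the policy, then fill and flush one batch, advancing p */
    } while ( p < ctx->save.p2m_size &&
              ctx->save.policy_decision == XGS_POLICY_CONTINUE_PRECOPY );

so that the termination conditions are stated explicitly in one place.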

>      {
> -        if ( !test_bit(p, dirty_bitmap) )
> -            continue;
> +        if ( ctx->save.live && precopy )
> +        {
> +            ctx->save.policy_decision = precopy_policy(ctx->save.stats, data);

Newline here please.

> +            if ( ctx->save.policy_decision == XGS_POLICY_ABORT )
> +            {

Please put a log message here indicating that abort has been requested.
Otherwise, the migration will give up with a failure and no obvious
indication why.
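
e.g. reusing the ERROR() helper already used elsewhere in this file:

            if ( ctx->save.policy_decision == XGS_POLICY_ABORT )
            {
                ERROR("Precopy policy requested abort of the migration");
                return -1;
            }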

> +                return -1;
> +            }
> +            else if ( ctx->save.policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
> +            {
> +                /* Any outstanding dirty pages are now deferred until the next
> +                 * phase of the migration. */

/*
 * The comment style for multiline comments
 * is like this.
 */

> +                bitmap_or(ctx->save.deferred_pages, dirty_bitmap,
> +                          ctx->save.p2m_size);
> +                if ( entries > written )
> +                    ctx->save.nr_deferred_pages += entries - written;
> +
> +                goto done;
> +            }
> +        }
>  
> -        rc = add_to_batch(ctx, p);
> +        for ( ; p < ctx->save.p2m_size && !batch_full(ctx); ++p )
> +        {
> +            if ( test_and_clear_bit(p, dirty_bitmap) )
> +            {
> +                add_to_batch(ctx, p);
> +                ++written;
> +                ++ctx->save.stats.total_written;
> +            }
> +        }
> +
> +        rc = flush_batch(ctx);
>          if ( rc )
>              return rc;
>  
> -        /* Update progress every 4MB worth of memory sent. */
> -        if ( (written & ((1U << (22 - 12)) - 1)) == 0 )
> -            xc_report_progress_step(xch, written, entries);
> -
> -        ++written;
> +        /* Update progress after every batch (4MB) worth of memory sent. */
> +        xc_report_progress_step(xch, written, entries);
>      }
>  
> -    rc = flush_batch(ctx);
> -    if ( rc )
> -        return rc;
> -
>      if ( written > entries )
>          DPRINTF("Bitmap contained more entries than expected...");
>  
>      xc_report_progress_step(xch, entries, entries);
>  
> + done:
>      return ctx->save.ops.check_vm_state(ctx);
>  }
>  
> @@ -396,14 +436,14 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
>   * Send all pages in the guests p2m.  Used as the first iteration of the live
>   * migration loop, and for a non-live save.
>   */
> -static int send_all_pages(struct xc_sr_context *ctx)
> +static int send_all_pages(struct xc_sr_context *ctx, bool precopy)
>  {
>      DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>                                      &ctx->save.dirty_bitmap_hbuf);
>  
>      bitmap_set(dirty_bitmap, ctx->save.p2m_size);
>  
> -    return send_dirty_pages(ctx, ctx->save.p2m_size);
> +    return send_dirty_pages(ctx, ctx->save.p2m_size, precopy);
>  }
>  
>  static int enable_logdirty(struct xc_sr_context *ctx)
> @@ -446,8 +486,7 @@ static int update_progress_string(struct xc_sr_context *ctx,
>      xc_interface *xch = ctx->xch;
>      char *new_str = NULL;
>  
> -    if ( asprintf(&new_str, "Frames iteration %u of %u",
> -                  iter, ctx->save.max_iterations) == -1 )
> +    if ( asprintf(&new_str, "Frames iteration %u", iter) == -1 )
>      {
>          PERROR("Unable to allocate new progress string");
>          return -1;
> @@ -468,20 +507,47 @@ static int send_memory_live(struct xc_sr_context *ctx)
>      xc_interface *xch = ctx->xch;
>      xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
>      char *progress_str = NULL;
> -    unsigned x;
>      int rc;
>  
> +    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
> +                                    &ctx->save.dirty_bitmap_hbuf);
> +
> +    int (*precopy_policy)(struct precopy_stats, void *) =
> +        ctx->save.callbacks->precopy_policy;
> +    void *data = ctx->save.callbacks->data;
> +
>      rc = update_progress_string(ctx, &progress_str, 0);
>      if ( rc )
>          goto out;
>  
> -    rc = send_all_pages(ctx);
> +#define CONSULT_POLICY                                                        \

:(

The reason this code is readable and (hopefully) easy to follow is due
in large part to a lack of macros like this trying to hide what is
actually going on.

> +    do {                                                                      \
> +        if ( ctx->save.policy_decision == XGS_POLICY_ABORT )                  \
> +        {                                                                     \
> +            rc = -1;                                                          \
> +            goto out;                                                         \
> +        }                                                                     \
> +        else if ( ctx->save.policy_decision != XGS_POLICY_CONTINUE_PRECOPY )  \
> +        {                                                                     \
> +            rc = 0;                                                           \
> +            goto out;                                                         \
> +        }                                                                     \
> +    } while (0)
> +
> +    ctx->save.stats = (struct precopy_stats)
> +        {
> +            .iteration     = 0,
> +            .total_written = 0,
> +            .dirty_count   = -1
> +        };
> +    rc = send_all_pages(ctx, /* precopy */ true);
>      if ( rc )
>          goto out;
>  
> -    for ( x = 1;
> -          ((x < ctx->save.max_iterations) &&
> -           (stats.dirty_count > ctx->save.dirty_threshold)); ++x )
> +    /* send_all_pages() has updated the stats */
> +    CONSULT_POLICY;
> +
> +    for ( ctx->save.stats.iteration = 1; ; ++ctx->save.stats.iteration )

Again, without an exit condition, this looks suspicious.

~Andrew


* Re: [PATCH RFC 08/20] libxl/migration: add precopy tuning parameters
  2017-03-27  9:06 ` [PATCH RFC 08/20] libxl/migration: add precopy tuning parameters Joshua Otto
@ 2017-03-29 21:08   ` Andrew Cooper
  2017-03-30  6:03     ` Joshua Otto
  0 siblings, 1 reply; 53+ messages in thread
From: Andrew Cooper @ 2017-03-29 21:08 UTC (permalink / raw)
  To: Joshua Otto, xen-devel; +Cc: ian.jackson, hjarmstr, wei.liu2, czylin, imhy.yang

On 27/03/17 10:06, Joshua Otto wrote:
> In the context of the live migration algorithm, the precopy iteration
> count refers to the number of page-copying iterations performed prior to
> the suspension of the guest and transmission of the final set of dirty
> pages.  Similarly, the precopy dirty threshold refers to the dirty page
> count below which we judge it more profitable to proceed to
> stop-and-copy rather than continue with the precopy.  These would be
> helpful tuning parameters to work with when migrating particularly busy
> guests, as they enable an administrator to reap the available benefits
> of the precopy algorithm (the transmission of guest pages _not_ in the
> writable working set can be completed without guest downtime) while
> reducing the total amount of time required for the migration (as
> iterations of the precopy loop that will certainly be redundant can be
> skipped in favour of an earlier suspension).
>
> To expose these tuning parameters to users:
> - introduce a new libxl API function, libxl_domain_live_migrate(),
>   taking the same parameters as libxl_domain_suspend() _and_
>   precopy_iterations and precopy_dirty_threshold parameters, and
>   consider these parameters in the precopy policy
>
>   (though a pair of new parameters on their own might not warrant an
>   entirely new API function, it is added in anticipation of a number of
>   additional migration-only parameters that would be cumbersome on the
>   whole to tack on to the existing suspend API)
>
> - switch xl migrate to the new libxl_domain_live_migrate() and add new
>   --postcopy-iterations and --postcopy-threshold parameters to pass
>   through
>
> Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>

This will have to be deferred to the tools maintainers, but I purposefully
didn't expose these knobs to users when rewriting live migration,
because they cannot be meaningfully chosen by anyone outside of a
testing scenario.  (That is not to say they aren't useful for testing
purposes, but I didn't upstream my version of this patch.)

I spent quite a while wondering how best to expose these tunables in a
way that end users could sensibly use them, and the best I came up with
was this:

First, run the guest under logdirty for a period of time to establish
the working set, and how steady it is.  From this, you have a baseline
for the target threshold, and a plausible way of estimating the
downtime.  (Better yet, as XenCenter, XenServer's Windows GUI, has proved
time and time again, users love graphs!  Even if they don't necessarily
understand them.)

From this baseline, the main condition you need to care about is the rate
of convergence.
the measured threshold, although on 5 or fewer iterations, the
asymptotic properties don't appear cleanly.  (Of course, the larger the
VM, the more iterations, and the more likely to spot this.)

Users will either care about the migration completing successfully, or
avoiding interrupting the workload.  The majority case would be both,
but every user will find one of these two options more important than the
other.  As a result, there need to be some options to
cover "if $X happens, do I continue or abort".

The case where the VM becomes more busy is harder however.  For the
users which care about not interrupting the workload, there will be a
point above which they'd prefer to abort the migration rather than
continue it.  For the users which want the migration to complete, they'd
prefer to pause the VM and take a downtime hit, rather than aborting.

Therefore, you really need two thresholds: the one above which you always
abort, and the one at which you would normally choose to pause.  The
decision as to what to do depends on where you are between these
thresholds when the dirty state converges.  (Of course, if the VM
suddenly becomes more idle, it is sensible to continue beyond the lower
threshold, as it will reduce the downtime.)  The absolute number of
iterations on the other hand doesn't actually matter from a users point
of view, so isn't a useful control to have.
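
As a rough sketch (the precopy_stats/XGS_POLICY_* interface is the one
proposed in this series; the tunable names are invented, and the "where
you are between the thresholds when the dirty state converges" part
would need convergence tracking layered on top), such a policy might
look like:

struct two_threshold_tunables {
    long pause_threshold;   /* dirty count below which to pause (suspend) */
    long abort_threshold;   /* dirty count above which to always abort */
};

static int two_threshold_precopy_policy(struct precopy_stats stats,
                                        void *user)
{
    struct two_threshold_tunables *t = user;

    if ( stats.dirty_count < 0 )          /* no estimate this round */
        return XGS_POLICY_CONTINUE_PRECOPY;

    if ( stats.dirty_count > t->abort_threshold )
        return XGS_POLICY_ABORT;          /* too busy - give up */

    if ( stats.dirty_count <= t->pause_threshold )
        return XGS_POLICY_STOP_AND_COPY;  /* good enough - pause now */

    return XGS_POLICY_CONTINUE_PRECOPY;   /* in between - keep trying */
}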

Another thing to be careful with is the measure of convergence with
respect to guest busyness, and other factors influencing the absolute
iteration time, such as congestion of the network between the two
hosts.  I haven't yet come up with a sensible way of reconciling this
with the above, in a way which can be expressed as a useful set of controls.


The plan, following migration v2, was always to come back to this and
see about doing something better than the current hard-coded parameters,
but I am still working on fixing migration in other areas (preventing
VMs from crashing when moved because they observe important differences
in the hardware).

How does your postcopy proposal influence/change the above logic?

~Andrew


* Re: [PATCH RFC 00/20] Add postcopy live migration support
  2017-03-27  9:06 [PATCH RFC 00/20] Add postcopy live migration support Joshua Otto
                   ` (20 preceding siblings ...)
  2017-03-28 14:41 ` [PATCH RFC 00/20] Add postcopy live migration support Wei Liu
@ 2017-03-29 22:50 ` Andrew Cooper
  2017-03-31  4:51   ` Joshua Otto
  21 siblings, 1 reply; 53+ messages in thread
From: Andrew Cooper @ 2017-03-29 22:50 UTC (permalink / raw)
  To: Joshua Otto, xen-devel; +Cc: ian.jackson, hjarmstr, wei.liu2, czylin, imhy.yang

On 27/03/2017 10:06, Joshua Otto wrote:
> Hi,
>
> We're a team of three fourth-year undergraduate software engineering students at
> the University of Waterloo in Canada.  In late 2015 we posted on the list [1] to
> ask for a project to undertake for our program's capstone design project, and
> Andrew Cooper pointed us in the direction of the live migration implementation
> as an area that could use some attention.  We were particularly interested in
> post-copy live migration (as evaluated by [2] and discussed on the list at [3]),
> and have been working on an implementation of this on-and-off since then.
>
> We now have a working implementation of this scheme, and are submitting it for
> comment.  The changes are also available as the 'postcopy' branch of the GitHub
> repository at [4]
>
> As a brief overview of our approach:
> - We introduce a mechanism by which libxl can indicate to the libxc stream
>   helper process that the iterative migration precopy loop should be terminated
>   and postcopy should begin.
> - At this point, we suspend the domain, collect the final set of dirty pfns and
>   write these pfns (and _not_ their contents) into the stream.
> - At the destination, the xc restore logic registers itself as a pager for the
>   migrating domain, 'evicts' all of the pfns indicated by the sender as
>   outstanding, and then resumes the domain at the destination.
> - As the domain executes, the migration sender continues to push the remaining
>   oustanding pages to the receiver in the background.  The receiver
>   monitors both the stream for incoming page data and the paging ring event
>   channel for page faults triggered by the guest.  Page faults are forwarded on
>   the back-channel migration stream to the migration sender, which prioritizes
>   these pages for transmission.
>
> By leveraging the existing paging API, we are able to implement the postcopy
> scheme without any hypervisor modifications - all of our changes are confined to
> the userspace toolstack.  However, we inherit from the paging API the
> requirement that the domains be HVM and that the host have HAP/EPT support.

Wow.  Considering that the paging API has had no in-tree consumers (and
its out-of-tree consumer folded), I am astounded that it hasn't bitrotten.

>
> We haven't yet had the opportunity to perform a quantitative evaluation of the
> performance trade-offs between the traditional pre-copy and our post-copy
> strategies, but intend to.  Informally, we've been testing our implementation by
> migrating a domain running the x86 memtest program (which is obviously a
> tremendously write-heavy workload), and have observed a substantial reduction in
> total time required for migration completion (at the expense of a visually
> obvious 'slowdown' in the execution of the program).

Do you have any numbers, even for this informal testing?

>   We've also noticed that,
> when performing a postcopy without any leading precopy iterations, the time
> required at the destination to 'evict' all of the outstanding pages is
> substantial - possibly because there is no batching mechanism by which pages can
> be evicted - so this area in particular might require further attention.
>
> We're really interested in any feedback you might have!

Do you have a design document for this?  The spec modifications and code
comments are great, but there is no substitute (as far as understanding
goes) for a description in terms of the algorithm and design choices.

~Andrew


* Re: [PATCH RFC 00/20] Add postcopy live migration support
  2017-03-28 14:41 ` [PATCH RFC 00/20] Add postcopy live migration support Wei Liu
@ 2017-03-30  4:13   ` Joshua Otto
  2017-03-31 14:19     ` Wei Liu
  0 siblings, 1 reply; 53+ messages in thread
From: Joshua Otto @ 2017-03-30  4:13 UTC (permalink / raw)
  To: Wei Liu, xen-devel
  Cc: andrew.cooper3, hjarmstr, ian.jackson, czylin, imhy.yang

On Tue, Mar 28, 2017 at 03:41:02PM +0100, Wei Liu wrote:
> Hi Harley, Chester and Joshua
> 
> This is really nice work. I took a brief look at all the patches; they
> look really high quality.

Thank you!

> 
> We're currently approaching freeze for a Xen release. We've got a lot on
> our plate. I think maintainers will get to this series at some point.

Understood.  We're currently approaching our final exams so that's probably for
the best :)

> 
> From the look of things, some patches can go in because they're
> generally useful.
> 
> On Mon, Mar 27, 2017 at 05:06:12AM -0400, Joshua Otto wrote:
> > Hi,
> > 
> > We're a team of three fourth-year undergraduate software engineering students at
> > the University of Waterloo in Canada.  In late 2015 we posted on the list [1] to
> > ask for a project to undertake for our program's capstone design project, and
> > Andrew Cooper pointed us in the direction of the live migration implementation
> > as an area that could use some attention.  We were particularly interested in
> > post-copy live migration (as evaluated by [2] and discussed on the list at [3]),
> > and have been working on an implementation of this on-and-off since then.
> > 
> > We now have a working implementation of this scheme, and are submitting it for
> > comment.  The changes are also available as the 'postcopy' branch of the GitHub
> > repository at [4]
> > 
> > As a brief overview of our approach:
> > - We introduce a mechanism by which libxl can indicate to the libxc stream
> >   helper process that the iterative migration precopy loop should be terminated
> >   and postcopy should begin.
> > - At this point, we suspend the domain, collect the final set of dirty pfns and
> >   write these pfns (and _not_ their contents) into the stream.
> > - At the destination, the xc restore logic registers itself as a pager for the
> >   migrating domain, 'evicts' all of the pfns indicated by the sender as
> >   outstanding, and then resumes the domain at the destination.
> > - As the domain executes, the migration sender continues to push the remaining
> >   oustanding pages to the receiver in the background.  The receiver
> >   monitors both the stream for incoming page data and the paging ring event
> >   channel for page faults triggered by the guest.  Page faults are forwarded on
> >   the back-channel migration stream to the migration sender, which prioritizes
> >   these pages for transmission.
> > 
> > By leveraging the existing paging API, we are able to implement the postcopy
> > scheme without any hypervisor modifications - all of our changes are confined to
> > the userspace toolstack.  However, we inherit from the paging API the
> > requirement that the domains be HVM and that the host have HAP/EPT support.
> > 
> 
> Please consider writing a design document for this feature and stick it
> at the beginning of your series in the future. You can find examples
> under docs/designs.

Absolutely, I'll submit one with v2.

> 
> The restriction is a bit unfortunate, but we shouldn't block useful work
> because it's incomplete. We just need to make sure that, should someone
> decide to implement similar functionality for PV guests, they are able
> to do so.
> 
> You might want to check if shadow paging can be used with the paging
> API, such that support can be widened to all HVM guests.
> 
> > We haven't yet had the opportunity to perform a quantitative evaluation of the
> > performance trade-offs between the traditional pre-copy and our post-copy
> > strategies, but intend to.  Informally, we've been testing our implementation by
> > migrating a domain running the x86 memtest program (which is obviously a
> > tremendously write-heavy workload), and have observed a substantial reduction in
> > total time required for migration completion (at the expense of a visually
> > obvious 'slowdown' in the execution of the program).  We've also noticed that,
> > when performing a postcopy without any leading precopy iterations, the time
> > required at the destination to 'evict' all of the outstanding pages is
> > substantial - possibly because there is no batching mechanism by which pages can
> > be evicted - so this area in particular might require further attention.
> > 
> 
> Please do post numbers when you have them. For now, please be patient
> and wait for people to comment.

Will do.  As a general question for those following the thread, are there any
application workloads/benchmarks that people would find particularly
interesting?

The experiment that we've planned but haven't had the time to follow through
fully is to mount a ramdisk inside the guest and use Axboe's fio to test all of
the entries in the (read/write mix) x (working set size) x (access pattern)
matrix.

Thank you again for your feedback!

Josh


* Re: [PATCH RFC 04/20] libxc/xc_sr_save.c: add WRITE_TRIVIAL_RECORD_FN()
  2017-03-28 19:03   ` Andrew Cooper
@ 2017-03-30  4:28     ` Joshua Otto
  0 siblings, 0 replies; 53+ messages in thread
From: Joshua Otto @ 2017-03-30  4:28 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: ian.jackson, hjarmstr, wei.liu2, czylin, imhy.yang

On Tue, Mar 28, 2017 at 08:03:26PM +0100, Andrew Cooper wrote:
> On 27/03/17 10:06, Joshua Otto wrote:
> > Writing the libxc save stream requires writing a few 'trivial' records,
> > consisting only of a header with a particular type.  As a readability
> > aid, it's nice to have obviously-named functions that write these sorts
> > of records into the stream - for example, the first such function was
> > write_end_record(), which reads much more pleasantly at its call-site
> > than write_generic_record(REC_TYPE_END) would.  However, it's tedious
> > and error-prone to copy-paste the generic body of such a function for
> > each new trivial record type.
> >
> > Add a helper macro that takes a name base and a record type and declares
> > the corresponding trivial record write function.  Use this to re-define
> > the two existing trivial record functions, write_end_record() and
> > write_checkpoint_record().
> >
> > No functional change.
> >
> > Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
> 
> -1.
> 
> This hides the functions from tools like cscope, and makes the code
> harder to read.  I also don't really buy the error-prone argument.

Okay, fair enough.

> 
> If you do want to avoid opencoding different functions, how about
> 
> static int write_zerolength_record(uint32_t record_type)
> 
> and updating the existing callsites to be
> 
> write_zerolength_record(REC_TYPE_END); etc.

I really do prefer write_end_record() to write_some_record(REC_TYPE_END),
visually.  I'll fix up the later patches to add the corresponding functions
without the macro.
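
Something like this, perhaps (a sketch, assuming the series' existing
write_record() helper and struct xc_sr_record), which keeps the nice
call sites without hiding anything from cscope:

static int write_zerolength_record(struct xc_sr_context *ctx,
                                   uint32_t record_type)
{
    struct xc_sr_record rec = { .type = record_type };

    return write_record(ctx, &rec);
}

static int write_end_record(struct xc_sr_context *ctx)
{
    return write_zerolength_record(ctx, REC_TYPE_END);
}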

Josh


* Re: [PATCH RFC 05/20] libxc/xc_sr: factor out filter_pages()
  2017-03-28 19:27   ` Andrew Cooper
@ 2017-03-30  4:42     ` Joshua Otto
  0 siblings, 0 replies; 53+ messages in thread
From: Joshua Otto @ 2017-03-30  4:42 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: ian.jackson, hjarmstr, wei.liu2, czylin, imhy.yang

On Tue, Mar 28, 2017 at 08:27:48PM +0100, Andrew Cooper wrote:
> On 27/03/17 10:06, Joshua Otto wrote:
> > diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
> > index 481a904..8574ee8 100644
> > --- a/tools/libxc/xc_sr_restore.c
> > +++ b/tools/libxc/xc_sr_restore.c
> > @@ -194,6 +194,68 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned count,
> >      return rc;
> >  }
> >  
> > +static void set_page_types(struct xc_sr_context *ctx, unsigned count,
> > +                           xen_pfn_t *pfns, uint32_t *types)
> > +{
> > +    unsigned i;
> 
> Please use unsigned int rather than just "unsigned" throughout.

Okay.  (For what it's worth, I chose plain "unsigned" here for consistency with
the rest of xc_sr_save/xc_sr_restore)

> > +
> > +    for ( i = 0; i < count; ++i )
> > +        ctx->restore.ops.set_page_type(ctx, pfns[i], types[i]);
> > +}
> > +
> > +/*
> > + * Given count pfns and their types, allocate and fill in buffer bpfns with only
> > + * those pfns that are 'backed' by real page data that needs to be migrated.
> > + * The caller must later free() *bpfns.
> > + *
> > + * Returns 0 on success and non-0 on failure.  *bpfns can be free()ed even after
> > + * failure.
> > + */
> > +static int filter_pages(struct xc_sr_context *ctx,
> > +                        unsigned count,
> > +                        xen_pfn_t *pfns,
> > +                        uint32_t *types,
> > +                        /* OUT */ unsigned *nr_pages,
> > +                        /* OUT */ xen_pfn_t **bpfns)
> > +{
> > +    xc_interface *xch = ctx->xch;
> 
> Pointers to arrays are very easy to get wrong in C.  This code will be
> less error-prone if you use
> 
> xen_pfn_t *_pfns;  (variable name subject to improvement)
> 
> > +    unsigned i;
> > +
> > +    *nr_pages = 0;
> > +    *bpfns = malloc(count * sizeof(*bpfns));
> 
> _pfns = *bfns = malloc(...).
> 
> Then use _pfns in place of (*bpfns) everywhere else.
> 
> However,  your sizeof has the wrong indirection.  It works on x86
> because xen_pfn_t is the same size as a pointer, but it will blow up on
> 32bit ARM, where a pointer is 4 bytes but xen_pfn_t is 8 bytes.

Agh!  Oh dear.
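
For the record, the corrected shape (a sketch along the lines you
suggest, sizing the allocation by the element type rather than the
pointer type):

    xen_pfn_t *_pfns;

    _pfns = *bpfns = malloc(count * sizeof(*_pfns));
    if ( !_pfns )
        return -1;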

> > +    if ( !(*bpfns) )
> > +    {
> > +        ERROR("Failed to allocate %zu bytes to process page data",
> > +              count * (sizeof(*bpfns)));
> > +        return -1;
> > +    }
> > +
> > +    for ( i = 0; i < count; ++i )
> > +    {
> > +        switch ( types[i] )
> > +        {
> > +        case XEN_DOMCTL_PFINFO_NOTAB:
> > +
> > +        case XEN_DOMCTL_PFINFO_L1TAB:
> > +        case XEN_DOMCTL_PFINFO_L1TAB | XEN_DOMCTL_PFINFO_LPINTAB:
> > +
> > +        case XEN_DOMCTL_PFINFO_L2TAB:
> > +        case XEN_DOMCTL_PFINFO_L2TAB | XEN_DOMCTL_PFINFO_LPINTAB:
> > +
> > +        case XEN_DOMCTL_PFINFO_L3TAB:
> > +        case XEN_DOMCTL_PFINFO_L3TAB | XEN_DOMCTL_PFINFO_LPINTAB:
> > +
> > +        case XEN_DOMCTL_PFINFO_L4TAB:
> > +        case XEN_DOMCTL_PFINFO_L4TAB | XEN_DOMCTL_PFINFO_LPINTAB:
> > +
> > +            (*bpfns)[(*nr_pages)++] = pfns[i];
> > +            break;
> > +        }
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >  /*
> >   * Given a list of pfns, their types, and a block of page data from the
> >   * stream, populate and record their types, map the relevant subset and copy
> > @@ -203,7 +265,7 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned count,
> >                               xen_pfn_t *pfns, uint32_t *types, void *page_data)
> >  {
> >      xc_interface *xch = ctx->xch;
> > -    xen_pfn_t *mfns = malloc(count * sizeof(*mfns));
> > +    xen_pfn_t *mfns = NULL;
> 
> This shows a naming bug, which is my fault.  This should be named gfns,
> not mfns.  (It inherits its name from the legacy migration code, but
> that was also wrong.)
> 
> Please correct it, either in this patch or another; the memory
> management terms are hard enough, even when all the code is correct.

Ahhhhhhh - I actually found this desperately confusing when trying to grok the
code originally.  Thanks for clearing that up!

Josh


* Re: [PATCH RFC 06/20] libxc/xc_sr: factor helpers out of handle_page_data()
  2017-03-28 19:52   ` Andrew Cooper
@ 2017-03-30  4:49     ` Joshua Otto
  2017-04-12 15:16       ` Wei Liu
  0 siblings, 1 reply; 53+ messages in thread
From: Joshua Otto @ 2017-03-30  4:49 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: ian.jackson, hjarmstr, wei.liu2, czylin, imhy.yang

On Tue, Mar 28, 2017 at 08:52:26PM +0100, Andrew Cooper wrote:
> On 27/03/17 10:06, Joshua Otto wrote:
> > diff --git a/tools/libxc/xc_sr_stream_format.h b/tools/libxc/xc_sr_stream_format.h
> > index 3291b25..32400b2 100644
> > --- a/tools/libxc/xc_sr_stream_format.h
> > +++ b/tools/libxc/xc_sr_stream_format.h
> > @@ -80,15 +80,15 @@ struct xc_sr_rhdr
> >  #define REC_TYPE_OPTIONAL             0x80000000U
> >  
> >  /* PAGE_DATA */
> > -struct xc_sr_rec_page_data_header
> > +struct xc_sr_rec_pages_header
> >  {
> >      uint32_t count;
> >      uint32_t _res1;
> >      uint64_t pfn[0];
> >  };
> >  
> > -#define PAGE_DATA_PFN_MASK  0x000fffffffffffffULL
> > -#define PAGE_DATA_TYPE_MASK 0xf000000000000000ULL
> > +#define REC_PFINFO_PFN_MASK  0x000fffffffffffffULL
> > +#define REC_PFINFO_TYPE_MASK 0xf000000000000000ULL
> >  
> >  /* X86_PV_INFO */
> >  struct xc_sr_rec_x86_pv_info
> 
> What are the purposes of these name changes?

I should definitely have explained this more explicitly, sorry about that.  I
use the same exact structure (a count followed by a list of encoded pfns+types)
for three additional record types (POSTCOPY_PFNS, POSTCOPY_PAGE_DATA, and
POSTCOPY_FAULT) later in the series when postcopy is introduced.  To enable the
generation and validation logic to be shared between all of the code that
processes this sort of record, I renamed the structure and its associated masks
to be more generic.
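
For instance, the shared validation might end up looking something like
this (a sketch only - the helper name is invented, but the header layout
is the one above):

static int validate_pages_record(struct xc_sr_record *rec)
{
    struct xc_sr_rec_pages_header *pages = rec->data;

    if ( rec->length < sizeof(*pages) )
        return -1;    /* too short to contain the header */

    if ( rec->length < sizeof(*pages) + pages->count * sizeof(uint64_t) )
        return -1;    /* count disagrees with the record length */

    return 0;
}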

Josh


* Re: [PATCH RFC 07/20] migration: defer precopy policy to libxl
  2017-03-29 20:18   ` Andrew Cooper
@ 2017-03-30  5:19     ` Joshua Otto
  2017-04-12 15:16       ` Wei Liu
  0 siblings, 1 reply; 53+ messages in thread
From: Joshua Otto @ 2017-03-30  5:19 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: ian.jackson, hjarmstr, wei.liu2, czylin, imhy.yang

On Wed, Mar 29, 2017 at 09:18:10PM +0100, Andrew Cooper wrote:
> On 27/03/17 10:06, Joshua Otto wrote:
> > The precopy phase of the xc_domain_save() live migration algorithm has
> > historically been implemented to run until either a) (almost) no pages
> > are dirty or b) some fixed, hard-coded maximum number of precopy
> > iterations has been exceeded.  This policy and its implementation are
> > less than ideal for a few reasons:
> > - the logic of the policy is intertwined with the control flow of the
> >   mechanism of the precopy stage
> > - it can't take into account facts external to the immediate
> >   migration context, such as interactive user input or the passage of
> >   wall-clock time
> > - it does not permit the user to change their mind, over time, about
> >   what to do at the end of the precopy (they get an unconditional
> >   transition into the stop-and-copy phase of the migration)
> >
> > To permit users to implement arbitrary higher-level policies governing
> > when the live migration precopy phase should end, and what should be
> > done next:
> > - add a precopy_policy() callback to the xc_domain_save() user-supplied
> >   callbacks
> > - during the precopy phase of live migrations, consult this policy after
> >   each batch of pages transmitted and take the dictated action, which
> >   may be to a) abort the migration entirely, b) continue with the
> >   precopy, or c) proceed to the stop-and-copy phase.
> > - provide an implementation of the old policy as such a callback in
> >   libxl and plumb it through the IPC machinery to libxc, effectively
> >   maintaining the old policy for now
> >
> > Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
> 
> This patch should be split into two.  One modifying libxc to use struct
> precopy_stats, and a second to wire up the RPC call.

Will do.

> > ---
> >  tools/libxc/include/xenguest.h     |  23 ++++-
> >  tools/libxc/xc_nomigrate.c         |   3 +-
> >  tools/libxc/xc_sr_common.h         |   7 +-
> >  tools/libxc/xc_sr_save.c           | 194 ++++++++++++++++++++++++++-----------
> >  tools/libxl/libxl_dom_save.c       |  20 ++++
> >  tools/libxl/libxl_save_callout.c   |   3 +-
> >  tools/libxl/libxl_save_helper.c    |   7 +-
> >  tools/libxl/libxl_save_msgs_gen.pl |   4 +-
> >  8 files changed, 189 insertions(+), 72 deletions(-)
> >
> > diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
> > index aa8cc8b..30ffb6f 100644
> > --- a/tools/libxc/include/xenguest.h
> > +++ b/tools/libxc/include/xenguest.h
> > @@ -39,6 +39,14 @@
> >   */
> >  struct xenevtchn_handle;
> >  
> > +/* For save's precopy_policy(). */
> > +struct precopy_stats
> > +{
> > +    unsigned iteration;
> > +    unsigned total_written;
> > +    long dirty_count; /* -1 if unknown */
> 
> total_written and dirty_count are liable to be equal, so having them as
> different widths of integer clearly can't be correct.

Hmmm, I could have sworn that I chose the width to match the type of dirty_count
in the shadow op stats, but I've checked again and it's uint32_t there so I'm
not sure what I was thinking.

> 
> > +};
> > +
> >  /* callbacks provided by xc_domain_save */
> >  struct save_callbacks {
> >      /* Called after expiration of checkpoint interval,
> > @@ -46,6 +54,17 @@ struct save_callbacks {
> >       */
> >      int (*suspend)(void* data);
> >  
> > +    /* Called after every batch of page data sent during the precopy phase of a
> > +     * live migration to ask the caller what to do next based on the current
> > +     * state of the precopy migration.
> > +     */
> > +#define XGS_POLICY_ABORT          (-1) /* Abandon the migration entirely and
> > +                                        * tidy up. */
> > +#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
> > +#define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and transmit the
> > +                                        * remaining dirty pages. */
> > +    int (*precopy_policy)(struct precopy_stats stats, void *data);
> 
> Structures shouldn't be passed by value like this, as the compiler has
> to do a lot of memcpy() work to make it happen.  You should pass by
> const pointer, as (as far as I can tell), they are strictly read-only to
> the implementation of this hook?

I chose to pass by value to make the IPC plumbing easier -
libxl_save_msgs_gen.pl doesn't know what to do about pointers, and (not being
the strongest Perl programmer...) I didn't want to volunteer to be the one to
teach it.

Is the memcpy() really significant here?  If this were a tight loop, sure, but
every invocation of the policy callback implies both a 4MB network transfer
_and_ a synchronous RPC.

> > +
> >      /* Called after the guest's dirty pages have been
> >       *  copied into an output buffer.
> >       * Callback function resumes the guest & the device model,
> > @@ -100,8 +119,8 @@ typedef enum {
> >   *        doesn't use checkpointing
> >   * @return 0 on success, -1 on failure
> >   */
> > -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
> > -                   uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
> > +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
> > +                   uint32_t flags /* XCFLAGS_xxx */,
> >                     struct save_callbacks* callbacks, int hvm,
> >                     xc_migration_stream_t stream_type, int recv_fd);
> 
> It would be cleaner for existing callers, and to extend in the future,
> to encapsulate all of these parameters in a struct domain_save_params
> and pass it by pointer to here.
> 
> That way, we'd avoid the situation we currently have where some
> information is passed in bitfields in a single parameter, whereas other
> booleans are passed as integers.
> 
> The hvm parameter specifically is useless, and can be removed by
> rearranging the sanity checks until after the xc_domain_getinfo() call.
> 
> >  
> > diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c
> > index 15c838f..2af64e4 100644
> > --- a/tools/libxc/xc_nomigrate.c
> > +++ b/tools/libxc/xc_nomigrate.c
> > @@ -20,8 +20,7 @@
> >  #include <xenctrl.h>
> >  #include <xenguest.h>
> >  
> > -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
> > -                   uint32_t max_factor, uint32_t flags,
> > +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t flags,
> >                     struct save_callbacks* callbacks, int hvm,
> >                     xc_migration_stream_t stream_type, int recv_fd)
> >  {
> > diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
> > index b1aa88e..a9160bd 100644
> > --- a/tools/libxc/xc_sr_common.h
> > +++ b/tools/libxc/xc_sr_common.h
> > @@ -198,12 +198,11 @@ struct xc_sr_context
> >              /* Further debugging information in the stream. */
> >              bool debug;
> >  
> > -            /* Parameters for tweaking live migration. */
> > -            unsigned max_iterations;
> > -            unsigned dirty_threshold;
> > -
> >              unsigned long p2m_size;
> >  
> > +            struct precopy_stats stats;
> > +            int policy_decision;
> > +
> >              xen_pfn_t *batch_pfns;
> >              unsigned nr_batch_pfns;
> >              unsigned long *deferred_pages;
> > diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> > index 797aec5..eb95334 100644
> > --- a/tools/libxc/xc_sr_save.c
> > +++ b/tools/libxc/xc_sr_save.c
> > @@ -271,13 +271,29 @@ static int write_batch(struct xc_sr_context *ctx)
> >  }
> >  
> >  /*
> > + * Test if the batch is full.
> > + */
> > +static bool batch_full(struct xc_sr_context *ctx)
> 
> const struct xc_sr_context *ctx
> 
> This is a predicate, after all.
> 
> > +{
> > +    return ctx->save.nr_batch_pfns == MAX_BATCH_SIZE;
> > +}
> > +
> > +/*
> > + * Test if the batch is empty.
> > + */
> > +static bool batch_empty(struct xc_sr_context *ctx)
> > +{
> > +    return ctx->save.nr_batch_pfns == 0;
> > +}
> > +
> > +/*
> >   * Flush a batch of pfns into the stream.
> >   */
> >  static int flush_batch(struct xc_sr_context *ctx)
> >  {
> >      int rc = 0;
> >  
> > -    if ( ctx->save.nr_batch_pfns == 0 )
> > +    if ( batch_empty(ctx) )
> >          return rc;
> >  
> >      rc = write_batch(ctx);
> > @@ -293,19 +309,12 @@ static int flush_batch(struct xc_sr_context *ctx)
> >  }
> >  
> >  /*
> > - * Add a single pfn to the batch, flushing the batch if full.
> > + * Add a single pfn to the batch.
> >   */
> > -static int add_to_batch(struct xc_sr_context *ctx, xen_pfn_t pfn)
> > +static void add_to_batch(struct xc_sr_context *ctx, xen_pfn_t pfn)
> >  {
> > -    int rc = 0;
> > -
> > -    if ( ctx->save.nr_batch_pfns == MAX_BATCH_SIZE )
> > -        rc = flush_batch(ctx);
> > -
> > -    if ( rc == 0 )
> > -        ctx->save.batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
> > -
> > -    return rc;
> > +    assert(ctx->save.nr_batch_pfns < MAX_BATCH_SIZE);
> > +    ctx->save.batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
> >  }
> >  
> >  /*
> > @@ -352,10 +361,15 @@ static int suspend_domain(struct xc_sr_context *ctx)
> >   * Send a subset of pages in the guests p2m, according to the dirty bitmap.
> >   * Used for each subsequent iteration of the live migration loop.
> >   *
> > + * During the precopy stage of a live migration, test the user-supplied
> > + * policy function after each batch of pages and cut off the operation
> > + * early if indicated.  Unless aborting, the dirty pages remaining in this round
> > + * are transferred into the deferred_pages bitmap.
> 
> Is this actually a sensible thing to do?  On iteration 0, this is going
> to be a phenomenal number of RPC calls, which are all going to make the
> same decision.

With the existing policy?  No.  However, the grand idea is to permit other
policies where this does make sense.  As an example, I think it would be really
useful for users to be able to specify a timeout, in seconds, for the precopy
phase, after which the migration advances to its next phase (I'll elaborate more
on this in the other discussion thread).
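
As a concrete sketch of the timeout idea (entirely hypothetical - none
of these names are in the series, and time(NULL) stands in for a proper
clock), layered on top of the existing 5-iteration/50-page rule:

struct timeout_policy_data {
    time_t deadline;    /* time(NULL) + user-specified seconds */
};

static int timeout_precopy_policy(struct precopy_stats stats, void *user)
{
    struct timeout_policy_data *tp = user;

    /* Out of time: move on to the next phase regardless of progress. */
    if ( time(NULL) >= tp->deadline )
        return XGS_POLICY_STOP_AND_COPY;

    /* Otherwise fall back to the old hard-coded policy. */
    if ( (stats.dirty_count >= 0 && stats.dirty_count < 50) ||
         stats.iteration >= 5 )
        return XGS_POLICY_STOP_AND_COPY;

    return XGS_POLICY_CONTINUE_PRECOPY;
}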

It's true that this means a lot of RPC.  The hope is that the cost of each RPC
should be negligible in comparison to a 4MB synchronous network copy.

> 
> > + *
> >   * Bitmap is bounded by p2m_size.
> >   */
> >  static int send_dirty_pages(struct xc_sr_context *ctx,
> > -                            unsigned long entries)
> > +                            unsigned long entries, bool precopy)
> 
> Shouldn't this precopy boolean be some kind of state variable in ctx ?

I suppose it could be.  I was a bit worried that there would be objections to
piling too many additional variables into the context, because each is
essentially an implicit extra parameter to every function here.

> 
> >  {
> >      xc_interface *xch = ctx->xch;
> >      xen_pfn_t p;
> > @@ -364,31 +378,57 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
> >      DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
> >                                      &ctx->save.dirty_bitmap_hbuf);
> >  
> > -    for ( p = 0, written = 0; p < ctx->save.p2m_size; ++p )
> > +    int (*precopy_policy)(struct precopy_stats, void *) =
> > +        ctx->save.callbacks->precopy_policy;
> > +    void *data = ctx->save.callbacks->data;
> > +
> > +    assert(batch_empty(ctx));
> > +    for ( p = 0, written = 0; p < ctx->save.p2m_size; )
> 
> This looks suspicious without an increment.  Conceptually, it might be
> better as a do {} while ( decision == XGS_POLICY_CONTINUE_PRECOPY ); loop?

Sure, I think that would read just fine too.
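
i.e. something shaped like (sketch):

    do
    {
        /* ... batch up and send dirty pages, consulting the policy ... */
    } while ( ctx->save.policy_decision == XGS_POLICY_CONTINUE_PRECOPY );
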
> 
> >      {
> > -        if ( !test_bit(p, dirty_bitmap) )
> > -            continue;
> > +        if ( ctx->save.live && precopy )
> > +        {
> > +            ctx->save.policy_decision = precopy_policy(ctx->save.stats, data);
> 
> Newline here please.
> 
> > +            if ( ctx->save.policy_decision == XGS_POLICY_ABORT )
> > +            {
> 
> Please put a log message here indicating that abort has been requested. 
> Otherwise, the migration will give up with a failure and no obvious
> indication why.
> 
> > +                return -1;
> > +            }
> > +            else if ( ctx->save.policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
> > +            {
> > +                /* Any outstanding dirty pages are now deferred until the next
> > +                 * phase of the migration. */
> 
> /*
>  * The comment style for multiline comments
>  * is like this.
>  */
> 
> > +                bitmap_or(ctx->save.deferred_pages, dirty_bitmap,
> > +                          ctx->save.p2m_size);
> > +                if ( entries > written )
> > +                    ctx->save.nr_deferred_pages += entries - written;
> > +
> > +                goto done;
> > +            }
> > +        }
> >  
> > -        rc = add_to_batch(ctx, p);
> > +        for ( ; p < ctx->save.p2m_size && !batch_full(ctx); ++p )
> > +        {
> > +            if ( test_and_clear_bit(p, dirty_bitmap) )
> > +            {
> > +                add_to_batch(ctx, p);
> > +                ++written;
> > +                ++ctx->save.stats.total_written;
> > +            }
> > +        }
> > +
> > +        rc = flush_batch(ctx);
> >          if ( rc )
> >              return rc;
> >  
> > -        /* Update progress every 4MB worth of memory sent. */
> > -        if ( (written & ((1U << (22 - 12)) - 1)) == 0 )
> > -            xc_report_progress_step(xch, written, entries);
> > -
> > -        ++written;
> > +        /* Update progress after every batch (4MB) worth of memory sent. */
> > +        xc_report_progress_step(xch, written, entries);
> >      }
> >  
> > -    rc = flush_batch(ctx);
> > -    if ( rc )
> > -        return rc;
> > -
> >      if ( written > entries )
> >          DPRINTF("Bitmap contained more entries than expected...");
> >  
> >      xc_report_progress_step(xch, entries, entries);
> >  
> > + done:
> >      return ctx->save.ops.check_vm_state(ctx);
> >  }
> >  
> > @@ -396,14 +436,14 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
> >   * Send all pages in the guests p2m.  Used as the first iteration of the live
> >   * migration loop, and for a non-live save.
> >   */
> > -static int send_all_pages(struct xc_sr_context *ctx)
> > +static int send_all_pages(struct xc_sr_context *ctx, bool precopy)
> >  {
> >      DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
> >                                      &ctx->save.dirty_bitmap_hbuf);
> >  
> >      bitmap_set(dirty_bitmap, ctx->save.p2m_size);
> >  
> > -    return send_dirty_pages(ctx, ctx->save.p2m_size);
> > +    return send_dirty_pages(ctx, ctx->save.p2m_size, precopy);
> >  }
> >  
> >  static int enable_logdirty(struct xc_sr_context *ctx)
> > @@ -446,8 +486,7 @@ static int update_progress_string(struct xc_sr_context *ctx,
> >      xc_interface *xch = ctx->xch;
> >      char *new_str = NULL;
> >  
> > -    if ( asprintf(&new_str, "Frames iteration %u of %u",
> > -                  iter, ctx->save.max_iterations) == -1 )
> > +    if ( asprintf(&new_str, "Frames iteration %u", iter) == -1 )
> >      {
> >          PERROR("Unable to allocate new progress string");
> >          return -1;
> > @@ -468,20 +507,47 @@ static int send_memory_live(struct xc_sr_context *ctx)
> >      xc_interface *xch = ctx->xch;
> >      xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
> >      char *progress_str = NULL;
> > -    unsigned x;
> >      int rc;
> >  
> > +    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
> > +                                    &ctx->save.dirty_bitmap_hbuf);
> > +
> > +    int (*precopy_policy)(struct precopy_stats, void *) =
> > +        ctx->save.callbacks->precopy_policy;
> > +    void *data = ctx->save.callbacks->data;
> > +
> >      rc = update_progress_string(ctx, &progress_str, 0);
> >      if ( rc )
> >          goto out;
> >  
> > -    rc = send_all_pages(ctx);
> > +#define CONSULT_POLICY                                                        \
> 
> :(
> 
> The reason this code is readable and (hopefully) easy to follow, is due
> in large part to a lack of macros like this trying to hide what is
> actually going on.

Okay, I'll inline it.  I guess I thought I might get away with it because it's
never more than a screen buffer away from its callsites.

> 
> > +    do {                                                                      \
> > +        if ( ctx->save.policy_decision == XGS_POLICY_ABORT )                  \
> > +        {                                                                     \
> > +            rc = -1;                                                          \
> > +            goto out;                                                         \
> > +        }                                                                     \
> > +        else if ( ctx->save.policy_decision != XGS_POLICY_CONTINUE_PRECOPY )  \
> > +        {                                                                     \
> > +            rc = 0;                                                           \
> > +            goto out;                                                         \
> > +        }                                                                     \
> > +    } while (0)
> > +
> > +    ctx->save.stats = (struct precopy_stats)
> > +        {
> > +            .iteration     = 0,
> > +            .total_written = 0,
> > +            .dirty_count   = -1
> > +        };
> > +    rc = send_all_pages(ctx, /* precopy */ true);
> >      if ( rc )
> >          goto out;
> >  
> > -    for ( x = 1;
> > -          ((x < ctx->save.max_iterations) &&
> > -           (stats.dirty_count > ctx->save.dirty_threshold)); ++x )
> > +    /* send_all_pages() has updated the stats */
> > +    CONSULT_POLICY;
> > +
> > +    for ( ctx->save.stats.iteration = 1; ; ++ctx->save.stats.iteration )
> 
> Again, without an exit condition, this looks suspicious.

Sure, I'll turn this one into a do {} while() too.

Thank you for the review!

Josh


* Re: [PATCH RFC 07/20] migration: defer precopy policy to libxl
  2017-03-29 18:54   ` Jennifer Herbert
@ 2017-03-30  5:28     ` Joshua Otto
  0 siblings, 0 replies; 53+ messages in thread
From: Joshua Otto @ 2017-03-30  5:28 UTC (permalink / raw)
  To: Jennifer Herbert, xen-devel
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, imhy.yang, hjarmstr

On Wed, Mar 29, 2017 at 07:54:15PM +0100, Jennifer Herbert wrote:
> I would like to encourage this patch - as I have use for it outside
> of your postcopy work.

Glad to hear that!

> Some things people will comment on:
> You've used 'unsigned' without the int keyword, which people don't like.
> Also on line 324, you're missing a space between 'if (' and
> 'ctx->save.policy_decision'.

Ack.  All of the existing code in xc_sr_save/xc_sr_restore uses plain "unsigned"
so I tried to be consistent.

> 
> Also, I'm not a fan of your CONSULT_POLICY macro, which you've defined
> at an odd point in your function, and which I think could be done more
> elegantly.  Worst of all ... it's a macro - which I think should
> generally be avoided unless there is little choice.  I'm sure you could
> write a helper function to replace this.

Yes, you're right, will fix.
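
Roughly like this, I imagine (a sketch - since a helper can't 'goto out'
on the caller's behalf, it maps the stashed decision to an rc and tells
the caller whether to bail):

static bool policy_says_stop(struct xc_sr_context *ctx, int *rc)
{
    if ( ctx->save.policy_decision == XGS_POLICY_ABORT )
    {
        *rc = -1;
        return true;
    }

    if ( ctx->save.policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
    {
        *rc = 0;
        return true;
    }

    return false;
}

with call sites of the form:

    if ( policy_says_stop(ctx, &rc) )
        goto out;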

Thank you for the review!

Josh

> 
> Cheers,
> 
> -jenny
> 
> On 27/03/17 10:06, Joshua Otto wrote:
> >The precopy phase of the xc_domain_save() live migration algorithm has
> >historically been implemented to run until either a) (almost) no pages
> >are dirty or b) some fixed, hard-coded maximum number of precopy
> >iterations has been exceeded.  This policy and its implementation are
> >less than ideal for a few reasons:
> >- the logic of the policy is intertwined with the control flow of the
> >   mechanism of the precopy stage
> >- it can't take into account facts external to the immediate
> >   migration context, such as interactive user input or the passage of
> >   wall-clock time
> >- it does not permit the user to change their mind, over time, about
> >   what to do at the end of the precopy (they get an unconditional
> >   transition into the stop-and-copy phase of the migration)
> >
> >To permit users to implement arbitrary higher-level policies governing
> >when the live migration precopy phase should end, and what should be
> >done next:
> >- add a precopy_policy() callback to the xc_domain_save() user-supplied
> >   callbacks
> >- during the precopy phase of live migrations, consult this policy after
> >   each batch of pages transmitted and take the dictated action, which
> >   may be to a) abort the migration entirely, b) continue with the
> >   precopy, or c) proceed to the stop-and-copy phase.
> >- provide an implementation of the old policy as such a callback in
> >   libxl and plumb it through the IPC machinery to libxc, effectively
> >   maintaining the old policy for now
> >
> >Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
> >---
> >  tools/libxc/include/xenguest.h     |  23 ++++-
> >  tools/libxc/xc_nomigrate.c         |   3 +-
> >  tools/libxc/xc_sr_common.h         |   7 +-
> >  tools/libxc/xc_sr_save.c           | 194 ++++++++++++++++++++++++++-----------
> >  tools/libxl/libxl_dom_save.c       |  20 ++++
> >  tools/libxl/libxl_save_callout.c   |   3 +-
> >  tools/libxl/libxl_save_helper.c    |   7 +-
> >  tools/libxl/libxl_save_msgs_gen.pl |   4 +-
> >  8 files changed, 189 insertions(+), 72 deletions(-)
> >
> >diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
> >index aa8cc8b..30ffb6f 100644
> >--- a/tools/libxc/include/xenguest.h
> >+++ b/tools/libxc/include/xenguest.h
> >@@ -39,6 +39,14 @@
> >   */
> >  struct xenevtchn_handle;
> >+/* For save's precopy_policy(). */
> >+struct precopy_stats
> >+{
> >+    unsigned iteration;
> >+    unsigned total_written;
> >+    long dirty_count; /* -1 if unknown */
> >+};
> >+
> >  /* callbacks provided by xc_domain_save */
> >  struct save_callbacks {
> >      /* Called after expiration of checkpoint interval,
> >@@ -46,6 +54,17 @@ struct save_callbacks {
> >       */
> >      int (*suspend)(void* data);
> >+    /* Called after every batch of page data sent during the precopy phase of a
> >+     * live migration to ask the caller what to do next based on the current
> >+     * state of the precopy migration.
> >+     */
> >+#define XGS_POLICY_ABORT          (-1) /* Abandon the migration entirely and
> >+                                        * tidy up. */
> >+#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
> >+#define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and transmit the
> >+                                        * remaining dirty pages. */
> >+    int (*precopy_policy)(struct precopy_stats stats, void *data);
> >+
> >      /* Called after the guest's dirty pages have been
> >       *  copied into an output buffer.
> >       * Callback function resumes the guest & the device model,
> >@@ -100,8 +119,8 @@ typedef enum {
> >   *        doesn't use checkpointing
> >   * @return 0 on success, -1 on failure
> >   */
> >-int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
> >-                   uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
> >+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
> >+                   uint32_t flags /* XCFLAGS_xxx */,
> >                     struct save_callbacks* callbacks, int hvm,
> >                     xc_migration_stream_t stream_type, int recv_fd);
> >diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c
> >index 15c838f..2af64e4 100644
> >--- a/tools/libxc/xc_nomigrate.c
> >+++ b/tools/libxc/xc_nomigrate.c
> >@@ -20,8 +20,7 @@
> >  #include <xenctrl.h>
> >  #include <xenguest.h>
> >-int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
> >-                   uint32_t max_factor, uint32_t flags,
> >+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t flags,
> >                     struct save_callbacks* callbacks, int hvm,
> >                     xc_migration_stream_t stream_type, int recv_fd)
> >  {
> >diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
> >index b1aa88e..a9160bd 100644
> >--- a/tools/libxc/xc_sr_common.h
> >+++ b/tools/libxc/xc_sr_common.h
> >@@ -198,12 +198,11 @@ struct xc_sr_context
> >              /* Further debugging information in the stream. */
> >              bool debug;
> >-            /* Parameters for tweaking live migration. */
> >-            unsigned max_iterations;
> >-            unsigned dirty_threshold;
> >-
> >              unsigned long p2m_size;
> >+            struct precopy_stats stats;
> >+            int policy_decision;
> >+
> >              xen_pfn_t *batch_pfns;
> >              unsigned nr_batch_pfns;
> >              unsigned long *deferred_pages;
> >diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> >index 797aec5..eb95334 100644
> >--- a/tools/libxc/xc_sr_save.c
> >+++ b/tools/libxc/xc_sr_save.c
> >@@ -271,13 +271,29 @@ static int write_batch(struct xc_sr_context *ctx)
> >  }
> >  /*
> >+ * Test if the batch is full.
> >+ */
> >+static bool batch_full(struct xc_sr_context *ctx)
> >+{
> >+    return ctx->save.nr_batch_pfns == MAX_BATCH_SIZE;
> >+}
> >+
> >+/*
> >+ * Test if the batch is empty.
> >+ */
> >+static bool batch_empty(struct xc_sr_context *ctx)
> >+{
> >+    return ctx->save.nr_batch_pfns == 0;
> >+}
> >+
> >+/*
> >   * Flush a batch of pfns into the stream.
> >   */
> >  static int flush_batch(struct xc_sr_context *ctx)
> >  {
> >      int rc = 0;
> >-    if ( ctx->save.nr_batch_pfns == 0 )
> >+    if ( batch_empty(ctx) )
> >          return rc;
> >      rc = write_batch(ctx);
> >@@ -293,19 +309,12 @@ static int flush_batch(struct xc_sr_context *ctx)
> >  }
> >  /*
> >- * Add a single pfn to the batch, flushing the batch if full.
> >+ * Add a single pfn to the batch.
> >   */
> >-static int add_to_batch(struct xc_sr_context *ctx, xen_pfn_t pfn)
> >+static void add_to_batch(struct xc_sr_context *ctx, xen_pfn_t pfn)
> >  {
> >-    int rc = 0;
> >-
> >-    if ( ctx->save.nr_batch_pfns == MAX_BATCH_SIZE )
> >-        rc = flush_batch(ctx);
> >-
> >-    if ( rc == 0 )
> >-        ctx->save.batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
> >-
> >-    return rc;
> >+    assert(ctx->save.nr_batch_pfns < MAX_BATCH_SIZE);
> >+    ctx->save.batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
> >  }
> >  /*
> >@@ -352,10 +361,15 @@ static int suspend_domain(struct xc_sr_context *ctx)
> >   * Send a subset of pages in the guests p2m, according to the dirty bitmap.
> >   * Used for each subsequent iteration of the live migration loop.
> >   *
> >+ * During the precopy stage of a live migration, test the user-supplied
> >+ * policy function after each batch of pages and cut off the operation
> >+ * early if indicated.  Unless aborting, the dirty pages remaining in this round
> >+ * are transferred into the deferred_pages bitmap.
> >+ *
> >   * Bitmap is bounded by p2m_size.
> >   */
> >  static int send_dirty_pages(struct xc_sr_context *ctx,
> >-                            unsigned long entries)
> >+                            unsigned long entries, bool precopy)
> >  {
> >      xc_interface *xch = ctx->xch;
> >      xen_pfn_t p;
> >@@ -364,31 +378,57 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
> >      DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
> >                                      &ctx->save.dirty_bitmap_hbuf);
> >-    for ( p = 0, written = 0; p < ctx->save.p2m_size; ++p )
> >+    int (*precopy_policy)(struct precopy_stats, void *) =
> >+        ctx->save.callbacks->precopy_policy;
> >+    void *data = ctx->save.callbacks->data;
> >+
> >+    assert(batch_empty(ctx));
> >+    for ( p = 0, written = 0; p < ctx->save.p2m_size; )
> >      {
> >-        if ( !test_bit(p, dirty_bitmap) )
> >-            continue;
> >+        if ( ctx->save.live && precopy )
> >+        {
> >+            ctx->save.policy_decision = precopy_policy(ctx->save.stats, data);
> >+            if ( ctx->save.policy_decision == XGS_POLICY_ABORT )
> >+            {
> >+                return -1;
> >+            }
> >+            else if ( ctx->save.policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
> >+            {
> >+                /* Any outstanding dirty pages are now deferred until the next
> >+                 * phase of the migration. */
> >+                bitmap_or(ctx->save.deferred_pages, dirty_bitmap,
> >+                          ctx->save.p2m_size);
> >+                if ( entries > written )
> >+                    ctx->save.nr_deferred_pages += entries - written;
> >+
> >+                goto done;
> >+            }
> >+        }
> >-        rc = add_to_batch(ctx, p);
> >+        for ( ; p < ctx->save.p2m_size && !batch_full(ctx); ++p )
> >+        {
> >+            if ( test_and_clear_bit(p, dirty_bitmap) )
> >+            {
> >+                add_to_batch(ctx, p);
> >+                ++written;
> >+                ++ctx->save.stats.total_written;
> >+            }
> >+        }
> >+
> >+        rc = flush_batch(ctx);
> >          if ( rc )
> >              return rc;
> >-        /* Update progress every 4MB worth of memory sent. */
> >-        if ( (written & ((1U << (22 - 12)) - 1)) == 0 )
> >-            xc_report_progress_step(xch, written, entries);
> >-
> >-        ++written;
> >+        /* Update progress after every batch (4MB) worth of memory sent. */
> >+        xc_report_progress_step(xch, written, entries);
> >      }
> >-    rc = flush_batch(ctx);
> >-    if ( rc )
> >-        return rc;
> >-
> >      if ( written > entries )
> >          DPRINTF("Bitmap contained more entries than expected...");
> >      xc_report_progress_step(xch, entries, entries);
> >+ done:
> >      return ctx->save.ops.check_vm_state(ctx);
> >  }
> >@@ -396,14 +436,14 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
> >   * Send all pages in the guests p2m.  Used as the first iteration of the live
> >   * migration loop, and for a non-live save.
> >   */
> >-static int send_all_pages(struct xc_sr_context *ctx)
> >+static int send_all_pages(struct xc_sr_context *ctx, bool precopy)
> >  {
> >      DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
> >                                      &ctx->save.dirty_bitmap_hbuf);
> >      bitmap_set(dirty_bitmap, ctx->save.p2m_size);
> >-    return send_dirty_pages(ctx, ctx->save.p2m_size);
> >+    return send_dirty_pages(ctx, ctx->save.p2m_size, precopy);
> >  }
> >  static int enable_logdirty(struct xc_sr_context *ctx)
> >@@ -446,8 +486,7 @@ static int update_progress_string(struct xc_sr_context *ctx,
> >      xc_interface *xch = ctx->xch;
> >      char *new_str = NULL;
> >-    if ( asprintf(&new_str, "Frames iteration %u of %u",
> >-                  iter, ctx->save.max_iterations) == -1 )
> >+    if ( asprintf(&new_str, "Frames iteration %u", iter) == -1 )
> >      {
> >          PERROR("Unable to allocate new progress string");
> >          return -1;
> >@@ -468,20 +507,47 @@ static int send_memory_live(struct xc_sr_context *ctx)
> >      xc_interface *xch = ctx->xch;
> >      xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
> >      char *progress_str = NULL;
> >-    unsigned x;
> >      int rc;
> >+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
> >+                                    &ctx->save.dirty_bitmap_hbuf);
> >+
> >+    int (*precopy_policy)(struct precopy_stats, void *) =
> >+        ctx->save.callbacks->precopy_policy;
> >+    void *data = ctx->save.callbacks->data;
> >+
> >      rc = update_progress_string(ctx, &progress_str, 0);
> >      if ( rc )
> >          goto out;
> >-    rc = send_all_pages(ctx);
> >+#define CONSULT_POLICY                                                        \
> >+    do {                                                                      \
> >+        if ( ctx->save.policy_decision == XGS_POLICY_ABORT )                  \
> >+        {                                                                     \
> >+            rc = -1;                                                          \
> >+            goto out;                                                         \
> >+        }                                                                     \
> >+        else if ( ctx->save.policy_decision != XGS_POLICY_CONTINUE_PRECOPY )  \
> >+        {                                                                     \
> >+            rc = 0;                                                           \
> >+            goto out;                                                         \
> >+        }                                                                     \
> >+    } while (0)
> >+
> >+    ctx->save.stats = (struct precopy_stats)
> >+        {
> >+            .iteration     = 0,
> >+            .total_written = 0,
> >+            .dirty_count   = -1
> >+        };
> >+    rc = send_all_pages(ctx, /* precopy */ true);
> >      if ( rc )
> >          goto out;
> >-    for ( x = 1;
> >-          ((x < ctx->save.max_iterations) &&
> >-           (stats.dirty_count > ctx->save.dirty_threshold)); ++x )
> >+    /* send_all_pages() has updated the stats */
> >+    CONSULT_POLICY;
> >+
> >+    for ( ctx->save.stats.iteration = 1; ; ++ctx->save.stats.iteration )
> >      {
> >          if ( xc_shadow_control(
> >                   xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
> >@@ -493,18 +559,42 @@ static int send_memory_live(struct xc_sr_context *ctx)
> >              goto out;
> >          }
> >-        if ( stats.dirty_count == 0 )
> >-            break;
> >+        /* Check the new dirty_count against the policy. */
> >+        ctx->save.stats.dirty_count = stats.dirty_count;
> >+        ctx->save.policy_decision = precopy_policy(ctx->save.stats, data);
> >+        if ( ctx->save.policy_decision == XGS_POLICY_ABORT )
> >+        {
> >+            rc = -1;
> >+            goto out;
> >+        }
> >+        else if (ctx->save.policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
> >+        {
> >+            bitmap_or(ctx->save.deferred_pages, dirty_bitmap,
> >+                      ctx->save.p2m_size);
> >+            ctx->save.nr_deferred_pages += stats.dirty_count;
> >+            rc = 0;
> >+            goto out;
> >+        }
> >+
> >+        /* After this point we won't know how many pages are really dirty until
> >+         * the next iteration. */
> >+        ctx->save.stats.dirty_count = -1;
> >-        rc = update_progress_string(ctx, &progress_str, x);
> >+        rc = update_progress_string(ctx, &progress_str,
> >+                                    ctx->save.stats.iteration);
> >          if ( rc )
> >              goto out;
> >-        rc = send_dirty_pages(ctx, stats.dirty_count);
> >+        rc = send_dirty_pages(ctx, stats.dirty_count, /* precopy */ true);
> >          if ( rc )
> >              goto out;
> >+
> >+        /* send_dirty_pages() has updated the stats */
> >+        CONSULT_POLICY;
> >      }
> >+#undef CONSULT_POLICY
> >+
> >   out:
> >      xc_set_progress_prefix(xch, NULL);
> >      free(progress_str);
> >@@ -595,7 +685,7 @@ static int suspend_and_send_dirty(struct xc_sr_context *ctx)
> >      if ( ctx->save.live )
> >      {
> >          rc = update_progress_string(ctx, &progress_str,
> >-                                    ctx->save.max_iterations);
> >+                                    ctx->save.stats.iteration);
> >          if ( rc )
> >              goto out;
> >      }
> >@@ -614,7 +704,8 @@ static int suspend_and_send_dirty(struct xc_sr_context *ctx)
> >          }
> >      }
> >-    rc = send_dirty_pages(ctx, stats.dirty_count + ctx->save.nr_deferred_pages);
> >+    rc = send_dirty_pages(ctx, stats.dirty_count + ctx->save.nr_deferred_pages,
> >+                          /* precopy */ false);
> >      if ( rc )
> >          goto out;
> >@@ -645,7 +736,7 @@ static int verify_frames(struct xc_sr_context *ctx)
> >          goto out;
> >      xc_set_progress_prefix(xch, "Frames verify");
> >-    rc = send_all_pages(ctx);
> >+    rc = send_all_pages(ctx, /* precopy */ false);
> >      if ( rc )
> >          goto out;
> >@@ -719,7 +810,7 @@ static int send_domain_memory_nonlive(struct xc_sr_context *ctx)
> >      xc_set_progress_prefix(xch, "Frames");
> >-    rc = send_all_pages(ctx);
> >+    rc = send_all_pages(ctx, /* precopy */ false);
> >      if ( rc )
> >          goto err;
> >@@ -910,8 +1001,7 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
> >  };
> >  int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
> >-                   uint32_t max_iters, uint32_t max_factor, uint32_t flags,
> >-                   struct save_callbacks* callbacks, int hvm,
> >+                   uint32_t flags, struct save_callbacks* callbacks, int hvm,
> >                     xc_migration_stream_t stream_type, int recv_fd)
> >  {
> >      struct xc_sr_context ctx =
> >@@ -932,25 +1022,17 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
> >             stream_type == XC_MIG_STREAM_REMUS ||
> >             stream_type == XC_MIG_STREAM_COLO);
> >-    /*
> >-     * TODO: Find some time to better tweak the live migration algorithm.
> >-     *
> >-     * These parameters are better than the legacy algorithm especially for
> >-     * busy guests.
> >-     */
> >-    ctx.save.max_iterations = 5;
> >-    ctx.save.dirty_threshold = 50;
> >-
> >      /* Sanity checks for callbacks. */
> >      if ( hvm )
> >          assert(callbacks->switch_qemu_logdirty);
> >+    if ( ctx.save.live )
> >+        assert(callbacks->precopy_policy);
> >      if ( ctx.save.checkpointed )
> >          assert(callbacks->checkpoint && callbacks->aftercopy);
> >      if ( ctx.save.checkpointed == XC_MIG_STREAM_COLO )
> >          assert(callbacks->wait_checkpoint);
> >-    DPRINTF("fd %d, dom %u, max_iters %u, max_factor %u, flags %u, hvm %d",
> >-            io_fd, dom, max_iters, max_factor, flags, hvm);
> >+    DPRINTF("fd %d, dom %u, flags %u, hvm %d", io_fd, dom, flags, hvm);
> >      if ( xc_domain_getinfo(xch, dom, 1, &ctx.dominfo) != 1 )
> >      {
> >diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
> >index 77fe30e..6d28cce 100644
> >--- a/tools/libxl/libxl_dom_save.c
> >+++ b/tools/libxl/libxl_dom_save.c
> >@@ -328,6 +328,25 @@ int libxl__save_emulator_xenstore_data(libxl__domain_save_state *dss,
> >      return rc;
> >  }
> >+/*
> >+ * This is the live migration precopy policy - it's called periodically during
> >+ * the precopy phase of live migrations, and is responsible for deciding when
> >+ * the precopy phase should terminate and what should be done next.
> >+ *
> >+ * The policy implemented here behaves identically to the policy previously
> >+ * hard-coded into xc_domain_save() - it proceeds to the stop-and-copy phase of
> >+ * the live migration when there are either fewer than 50 dirty pages, or more
> >+ * than 5 precopy rounds have completed.
> >+ */
> >+static int libxl__save_live_migration_simple_precopy_policy(
> >+    struct precopy_stats stats, void *user)
> >+{
> >+    return ((stats.dirty_count >= 0 && stats.dirty_count < 50) ||
> >+            stats.iteration >= 5)
> >+        ? XGS_POLICY_STOP_AND_COPY
> >+        : XGS_POLICY_CONTINUE_PRECOPY;
> >+}
> >+
> >  /*----- main code for saving, in order of execution -----*/
> >  void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
> >@@ -401,6 +420,7 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
> >      if (dss->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_NONE)
> >          callbacks->suspend = libxl__domain_suspend_callback;
> >+    callbacks->precopy_policy = libxl__save_live_migration_simple_precopy_policy;
> >      callbacks->switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty;
> >      dss->sws.ao  = dss->ao;
> >diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
> >index 46b892c..026b572 100644
> >--- a/tools/libxl/libxl_save_callout.c
> >+++ b/tools/libxl/libxl_save_callout.c
> >@@ -89,8 +89,7 @@ void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_save_state *dss,
> >          libxl__srm_callout_enumcallbacks_save(&shs->callbacks.save.a);
> >      const unsigned long argnums[] = {
> >-        dss->domid, 0, 0, dss->xcflags, dss->hvm,
> >-        cbflags, dss->checkpointed_stream,
> >+        dss->domid, dss->xcflags, dss->hvm, cbflags, dss->checkpointed_stream,
> >      };
> >      shs->ao = ao;
> >diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
> >index d3def6b..0241a6b 100644
> >--- a/tools/libxl/libxl_save_helper.c
> >+++ b/tools/libxl/libxl_save_helper.c
> >@@ -251,8 +251,6 @@ int main(int argc, char **argv)
> >          io_fd =                             atoi(NEXTARG);
> >          recv_fd =                           atoi(NEXTARG);
> >          uint32_t dom =                      strtoul(NEXTARG,0,10);
> >-        uint32_t max_iters =                strtoul(NEXTARG,0,10);
> >-        uint32_t max_factor =               strtoul(NEXTARG,0,10);
> >          uint32_t flags =                    strtoul(NEXTARG,0,10);
> >          int hvm =                           atoi(NEXTARG);
> >          unsigned cbflags =                  strtoul(NEXTARG,0,10);
> >@@ -264,9 +262,8 @@ int main(int argc, char **argv)
> >          startup("save");
> >          setup_signals(save_signal_handler);
> >-        r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags,
> >-                           &helper_save_callbacks, hvm, stream_type,
> >-                           recv_fd);
> >+        r = xc_domain_save(xch, io_fd, dom, flags, &helper_save_callbacks, hvm,
> >+                           stream_type, recv_fd);
> >          complete(r);
> >      } else if (!strcmp(mode,"--restore-domain")) {
> >diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
> >index 27845bb..50c97b4 100755
> >--- a/tools/libxl/libxl_save_msgs_gen.pl
> >+++ b/tools/libxl/libxl_save_msgs_gen.pl
> >@@ -33,6 +33,7 @@ our @msgs = (
> >                                                'xen_pfn_t', 'console_gfn'] ],
> >      [  9, 'srW',    "complete",              [qw(int retval
> >                                                   int errnoval)] ],
> >+    [ 10, 'scxW',   "precopy_policy", ['struct precopy_stats', 'stats'] ]
> >  );
> >  #----------------------------------------
> >@@ -141,7 +142,8 @@ static void bytes_put(unsigned char *const buf, int *len,
> >  END
> >-foreach my $simpletype (qw(int uint16_t uint32_t unsigned), 'unsigned long', 'xen_pfn_t') {
> >+foreach my $simpletype (qw(int uint16_t uint32_t unsigned),
> >+                        'unsigned long', 'xen_pfn_t', 'struct precopy_stats') {
> >      my $typeid = typeid($simpletype);
> >      $out_body{'callout'} .= <<END;
> >  static int ${typeid}_get(const unsigned char **msg,
> 


* Re: [PATCH RFC 08/20] libxl/migration: add precopy tuning parameters
  2017-03-29 21:08   ` Andrew Cooper
@ 2017-03-30  6:03     ` Joshua Otto
  2017-04-12 15:37       ` Wei Liu
  0 siblings, 1 reply; 53+ messages in thread
From: Joshua Otto @ 2017-03-30  6:03 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: ian.jackson, hjarmstr, wei.liu2, czylin, imhy.yang

On Wed, Mar 29, 2017 at 10:08:02PM +0100, Andrew Cooper wrote:
> On 27/03/17 10:06, Joshua Otto wrote:
> > In the context of the live migration algorithm, the precopy iteration
> > count refers to the number of page-copying iterations performed prior to
> > the suspension of the guest and transmission of the final set of dirty
> > pages.  Similarly, the precopy dirty threshold refers to the dirty page
> > count below which we judge it more profitable to proceed to
> > stop-and-copy rather than continue with the precopy.  These would be
> > helpful tuning parameters to work with when migrating particularly busy
> > guests, as they enable an administrator to reap the available benefits
> > of the precopy algorithm (the transmission of guest pages _not_ in the
> > writable working set can be completed without guest downtime) while
> > reducing the total amount of time required for the migration (as
> > iterations of the precopy loop that will certainly be redundant can be
> > skipped in favour of an earlier suspension).
> >
> > To expose these tuning parameters to users:
> > - introduce a new libxl API function, libxl_domain_live_migrate(),
> >   taking the same parameters as libxl_domain_suspend() _and_
> >   precopy_iterations and precopy_dirty_threshold parameters, and
> >   consider these parameters in the precopy policy
> >
> >   (though a pair of new parameters on their own might not warrant an
> >   entirely new API function, it is added in anticipation of a number of
> >   additional migration-only parameters that would be cumbersome on the
> >   whole to tack on to the existing suspend API)
> >
> > - switch xl migrate to the new libxl_domain_live_migrate() and add new
> >   --precopy-iterations and --precopy-threshold parameters to pass
> >   through
> >
> > Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
> 
> This will have to defer to the tools maintainers, but I purposefully
> didn't expose these knobs to users when rewriting live migration,
> because they cannot be meaningfully chosen by anyone outside of a
> testing scenario.  (That is not to say they aren't useful for testing
> purposes, but I didn't upstream my version of this patch.)

Ahhh, I wondered why those parameters to xc_domain_save() were present
but ignored.  That's reasonable.

I guess the way I had imagined an administrator using them would be in a
non-production/test environment - if they could run workloads
representative of their production application in this environment, they
could experiment with different --precopy-iterations and
--precopy-threshold values (having just a high-level understanding of
what they control) and choose the ones that result in the best outcome
for later use in production.

> I spent quite a while wondering how best to expose these tunables in a
> way that end users could sensibly use them, and the best I came up with
> was this:
> 
> First, run the guest under logdirty for a period of time to establish
> the working set, and how steady it is.  From this, you have a baseline
> for the target threshold, and a plausible way of estimating the
> downtime.  (Better yet, as XenCenter, XenServer's Windows GUI, has proved
> time and time again, users love graphs!  Even if they don't necessarily
> understand them.)
> 
> From this baseline, the condition you need to care about is the rate
> of convergence.  On a steady VM, you should converge asymptotically to
> the measured threshold, although on 5 or fewer iterations, the
> asymptotic properties don't appear cleanly.  (Of course, the larger the
> VM, the more iterations, and the more likely to spot this.)
> 
> Users will either care about the migration completing successfully, or
> avoiding interrupting the workload.  The majority case would be both,
> but every user will have one of these two options which is more
> important than the other.  As a result, there need to be some options to
> cover "if $X happens, do I continue or abort".
> 
> The case where the VM becomes more busy is harder however.  For the
> users which care about not interrupting the workload, there will be a
> point above which they'd prefer to abort the migration rather than
> continue it.  For the users which want the migration to complete, they'd
> prefer to pause the VM and take a downtime hit, rather than aborting.
> 
> Therefore, you really need two thresholds; the one above which you
> always abort, the one where you would normally choose to pause.  The
> decision as to what to do depends on where you are between these
> thresholds when the dirty state converges.  (Of course, if the VM
> suddenly becomes more idle, it is sensible to continue beyond the lower
> threshold, as it will reduce the downtime.)  The absolute number of
> iterations on the other hand doesn't actually matter from a users point
> of view, so isn't a useful control to have.
> 
> Another thing to be careful with is the measure of convergence with
> respect to guest busyness, and other factors influencing the absolute
> iteration time, such as congestion of the network between the two
> hosts.  I haven't yet come up with a sensible way of reconciling this
> with the above, in a way which can be expressed as a useful set of controls.
> 
> 
> The plan, following migration v2, was always to come back to this and
> see about doing something better than the current hard coded parameters,
> but I am still working on fixing migration in other areas (not having
> VMs crash when moving, because they observe important differences in the
> hardware).

I think a good strategy would be to solicit three parameters from the
user:
- the precopy duration they're willing to tolerate
- the downtime duration they're willing to tolerate
- the bandwidth of the link between the hosts (we could try and estimate
  it for them but I'd rather just make them run iperf)

Then, after applying this patch, alter the policy so that precopy simply
runs for the duration that the user is willing to wait.  After that,
using the bandwidth estimate, compute the approximate downtime required
to transfer the final set of dirty-pages.  If this is less than what the
user indicated is acceptable, proceed with the stop-and-copy - otherwise
abort.

This still requires the user to figure out for themselves how long their
workload can really wait, but hopefully they already had some idea
before deciding to attempt live migration in the first place.
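
To make that concrete, here is a rough sketch of such a policy against
the precopy_policy callback from patch 07 (the params struct, its
fields, and the start-time bookkeeping are all hypothetical - only
precopy_stats and the XGS_POLICY_* constants come from the series):

#include <stdint.h>
#include <time.h>

/* Hypothetical knobs gathered from the user up front. */
struct migration_policy_params
{
    unsigned max_precopy_duration_s;  /* precopy time the user tolerates */
    unsigned max_downtime_s;          /* downtime the user tolerates */
    uint64_t link_bandwidth_Bps;      /* link bandwidth, e.g. from iperf */
    time_t migration_start;           /* recorded when precopy began */
};

static int duration_bounded_precopy_policy(struct precopy_stats stats,
                                           void *user)
{
    struct migration_policy_params *p = user;
    time_t elapsed_s = time(NULL) - p->migration_start;

    /* Run precopy for as long as the user is willing to wait, and keep
     * going while this round's dirty count is still unknown (-1). */
    if ( elapsed_s < p->max_precopy_duration_s || stats.dirty_count < 0 )
        return XGS_POLICY_CONTINUE_PRECOPY;

    /* Estimate the stop-and-copy downtime from the remaining dirty
     * pages (4096-byte pages assumed) and the measured bandwidth. */
    uint64_t downtime_s = ((uint64_t)stats.dirty_count * 4096) /
                          p->link_bandwidth_Bps;

    return downtime_s <= p->max_downtime_s ? XGS_POLICY_STOP_AND_COPY
                                           : XGS_POLICY_ABORT;
}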

> How does your postcopy proposal influence/change the above logic?

Well, the 'downtime' phase of the migration becomes a very short, fixed
interval, regardless of guest busyness, so you can't ask the user 'how
much downtime can you tolerate?'  Instead, the question becomes the
murkier 'how much memory performance degradation can your guest
tolerate?'  I.e. is the postcopy migration going to essentially be
downtime, or can useful work get done between faults? (for example,
guests that are I/O bound would do much better with postcopy than they
would with a long stop-and-copy)

To answer that question, they're back to the approach I outlined at the
beginning - they'd have to experiment in a test environment and observe
their workload's response to the alternatives to make an informed
choice.

Cheers,

Josh


* Re: [PATCH RFC 00/20] Add postcopy live migration support
  2017-03-29 22:50 ` Andrew Cooper
@ 2017-03-31  4:51   ` Joshua Otto
  2017-04-12 15:38     ` Wei Liu
  0 siblings, 1 reply; 53+ messages in thread
From: Joshua Otto @ 2017-03-31  4:51 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: ian.jackson, hjarmstr, wei.liu2, czylin, imhy.yang

On Wed, Mar 29, 2017 at 11:50:52PM +0100, Andrew Cooper wrote:
> On 27/03/2017 10:06, Joshua Otto wrote:
> > Hi,
> >
> > We're a team of three fourth-year undergraduate software engineering students at
> > the University of Waterloo in Canada.  In late 2015 we posted on the list [1] to
> > ask for a project to undertake for our program's capstone design project, and
> > Andrew Cooper pointed us in the direction of the live migration implementation
> > as an area that could use some attention.  We were particularly interested in
> > post-copy live migration (as evaluated by [2] and discussed on the list at [3]),
> > and have been working on an implementation of this on-and-off since then.
> >
> > We now have a working implementation of this scheme, and are submitting it for
> > comment.  The changes are also available as the 'postcopy' branch of the GitHub
> > repository at [4]
> >
> > As a brief overview of our approach:
> > - We introduce a mechanism by which libxl can indicate to the libxc stream
> >   helper process that the iterative migration precopy loop should be terminated
> >   and postcopy should begin.
> > - At this point, we suspend the domain, collect the final set of dirty pfns and
> >   write these pfns (and _not_ their contents) into the stream.
> > - At the destination, the xc restore logic registers itself as a pager for the
> >   migrating domain, 'evicts' all of the pfns indicated by the sender as
> >   outstanding, and then resumes the domain at the destination.
> > - As the domain executes, the migration sender continues to push the remaining
> >   oustanding pages to the receiver in the background.  The receiver
> >   monitors both the stream for incoming page data and the paging ring event
> >   channel for page faults triggered by the guest.  Page faults are forwarded on
> >   the back-channel migration stream to the migration sender, which prioritizes
> >   these pages for transmission.
> >
> > By leveraging the existing paging API, we are able to implement the postcopy
> > scheme without any hypervisor modifications - all of our changes are confined to
> > the userspace toolstack.  However, we inherit from the paging API the
> > requirement that the domains be HVM and that the host have HAP/EPT support.
> 
> Wow.  Considering that the paging API has had no in-tree consumers (and
> its out-of-tree consumer folded), I am astounded that it hasn't bitrotten.

Well, there's tools/xenpaging, which was a helpful reference when
putting this together.  The user-space pager actually has rotted a bit
(I'm fairly certain the VM event ring protocol has changed subtly under
its feet), so I also needed to consult tools/xen-access to get things
right.

> 
> >
> > We haven't yet had the opportunity to perform a quantitative evaluation of the
> > performance trade-offs between the traditional pre-copy and our post-copy
> > strategies, but intend to.  Informally, we've been testing our implementation by
> > migrating a domain running the x86 memtest program (which is obviously a
> > tremendously write-heavy workload), and have observed a substantial reduction in
> > total time required for migration completion (at the expense of a visually
> > obvious 'slowdown' in the execution of the program).
> 
> Do you have any numbers, even for this informal testing?

We have a much more ambitious test matrix planned, but sure, here's an
early encouraging set of measurements - for a domain with 2GB of memory
and a 256MB writable working set (the application driving the writes
being fio submitting writes against a ramdisk), we measured these times:

                    Pre-copy + Stop-and-copy |  1 precopy iteration +
                             (s)             |       postcopy (s)
                   --------------------------+-------------------------
 Precopy Duration:           66.97           |         44.44
 Suspend Duration:            6.807          |          3.23
Postcopy Duration:            N/A            |          4.83

However...

That 3.23s suspend for the hybrid migration seems too high, doesn't it?

There's currently a serious performance bug that we're still trying to
work out in the case of pure-postcopy migrations, with no leading
precopy.  Attempting a pure postcopy migration when running the
experiment above yields:

                     Pure postcopy (s)
                   ----------------------
 Precopy Duration:           0
 Suspend Duration:          21.93
Postcopy Duration:          44.22

Although the postcopy scheme clearly works, it takes 21.93s (!) to
unpause the guest at the destination.  The eviction of the unmigrated
pages completes in a second or two because of the lack of batching
support (still bad, but not this bad) - the holdup is somewhere on the
domain creation sequence between domcreate_stream_done() and
domcreate_complete().

I suspect that this is the result of a bad interaction between QEMU's
startup sequence (its foreign memory mapping behaviour in particular)
and the postcopy paging.  Specifically: the paging ring has room only
for 8 requests at a time.  When QEMU attempts to map a large range, the
range gets postcopy-faulted over synchronously in batches of 8 pages at
a time, and each such batch implies a synchronous copy of its pages
over the network (and the 100us xenforeignmemory_map() retry timer)
before the next batch can begin.

If I am able to confirm that this is the case, a sensible solution would
seem to be supporting paging range-population requests (i.e. a new
paging ring request type for a _range_ of gfns).  In the mean time, you
should expect to observe this effect as well in experiments.  It appears
to be largely (but not completely) mitigated by performing a single
pre-copy iteration first.
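
(Back-of-envelope: with room for only 8 requests in the ring, populating
N pages this way costs on the order of N/8 serialised round trips, each
also paying the 100us retry timer, so large mappings rack up seconds
very quickly.)

The range-request idea would amount to a new ring message along these
lines - purely illustrative, since no such request type exists in the
vm_event ABI or in this series:

/* Hypothetical 'populate range' paging request, letting one ring slot
 * ask the pager for a whole run of contiguous gfns. */
struct vm_event_paging_range
{
    uint64_t gfn_start;   /* first gfn of the faulting mapping */
    uint64_t nr_frames;   /* number of contiguous frames to populate */
    uint32_t flags;
    uint32_t _pad;
};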

> 
> >   We've also noticed that,
> > when performing a postcopy without any leading precopy iterations, the time
> > required at the destination to 'evict' all of the outstanding pages is
> > substantial - possibly because there is no batching mechanism by which pages can
> > be evicted - so this area in particular might require further attention.
> >
> > We're really interested in any feedback you might have!
> 
> Do you have a design document for this?  The spec modifications and code
> comments are great, but there is no substitute (as far as understanding
> goes) for a description in terms of the algorithm and design choices.

As I replied to Wei, not yet, but we'd happily prepare one for v2.

Thanks!

Josh


* Re: [PATCH RFC 00/20] Add postcopy live migration support
  2017-03-30  4:13   ` Joshua Otto
@ 2017-03-31 14:19     ` Wei Liu
  0 siblings, 0 replies; 53+ messages in thread
From: Wei Liu @ 2017-03-31 14:19 UTC (permalink / raw)
  To: Joshua Otto
  Cc: Wei Liu, andrew.cooper3, ian.jackson, czylin, imhy.yang,
	xen-devel, hjarmstr

On Thu, Mar 30, 2017 at 12:13:51AM -0400, Joshua Otto wrote:
[...]
> Will do.  As a general question for those following the thread, are there any
> application workloads/benchmarks that people would find particularly
> interesting?
> 

I think any memory-intensive workload will do. Others might have their
preferences.

> The experiment that we've planned but haven't had the time to follow through
> fully is to mount a ramdisk inside the guest and use Axboe's fio to test all of
> the entries in the (read/write mix) x (working set size) x (access pattern)
> matrix.

This sounds reasonable.

> 
> Thank you again for your feedback!
> 
> Josh


* Re: [PATCH RFC 02/20] libxc/xc_sr: parameterise write_record() on fd
  2017-03-27  9:06 ` [PATCH RFC 02/20] libxc/xc_sr: parameterise write_record() on fd Joshua Otto
  2017-03-28 18:53   ` Andrew Cooper
@ 2017-03-31 14:19   ` Wei Liu
  1 sibling, 0 replies; 53+ messages in thread
From: Wei Liu @ 2017-03-31 14:19 UTC (permalink / raw)
  To: Joshua Otto
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, imhy.yang,
	xen-devel, hjarmstr

On Mon, Mar 27, 2017 at 05:06:14AM -0400, Joshua Otto wrote:
> Right now, write_split_record() - which is delegated to by
> write_record() - implicitly writes to ctx->fd.  This means it can't be
> used with the restore context's send_back_fd, which is unhandy.
> 
> Add an 'fd' parameter to both write_record() and write_split_record(),
> and mechanically update all existing callsites to pass ctx->fd for it.
> 
> No functional change.
> 
> Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>

Acked-by: Wei Liu <wei.liu2@citrix.com>


* Re: [PATCH RFC 03/20] libxc/xc_sr_restore.c: use write_record() in send_checkpoint_dirty_pfn_list()
  2017-03-27  9:06 ` [PATCH RFC 03/20] libxc/xc_sr_restore.c: use write_record() in send_checkpoint_dirty_pfn_list() Joshua Otto
  2017-03-28 18:56   ` Andrew Cooper
@ 2017-03-31 14:19   ` Wei Liu
  1 sibling, 0 replies; 53+ messages in thread
From: Wei Liu @ 2017-03-31 14:19 UTC (permalink / raw)
  To: Joshua Otto
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, imhy.yang,
	xen-devel, hjarmstr

On Mon, Mar 27, 2017 at 05:06:15AM -0400, Joshua Otto wrote:
> Teach send_checkpoint_dirty_pfn_list() to use write_record()'s new fd
> parameter, avoiding the need for a manual writev().
> 
> No functional change.
> 
> Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>

Acked-by: Wei Liu <wei.liu2@citrix.com>


* Re: [PATCH RFC 01/20] tools: rename COLO 'postcopy' to 'aftercopy'
  2017-03-28 16:34   ` Wei Liu
@ 2017-04-11  6:19     ` Zhang Chen
  0 siblings, 0 replies; 53+ messages in thread
From: Zhang Chen @ 2017-04-11  6:19 UTC (permalink / raw)
  To: Wei Liu, Joshua Otto
  Cc: zhangchen.fnst, andrew.cooper3, ian.jackson, czylin, imhy.yang,
	xen-devel, hjarmstr



On 03/29/2017 12:34 AM, Wei Liu wrote:
> Cc Chen
>
> On Mon, Mar 27, 2017 at 05:06:13AM -0400, Joshua Otto wrote:
>> The COLO xc domain save and restore procedures both make use of a 'postcopy'
>> callback to defer part of each checkpoint operation to xl.  In this context, the
>> name 'postcopy' is meant as "the callback invoked immediately after this
>> checkpoint's memory callback."  This is an unfortunate name collision with the
>> other common use of 'postcopy' in the context of live migration, where it is
>> used to mean "a memory migration that permits the guest to execute at the
>> destination before all of its memory is migrated by servicing accesses to
>> unmigrated memory via a network page-fault."
>>
>> Mechanically rename 'postcopy' -> 'aftercopy' to free up the postcopy namespace
>> while preserving the original intent of the name in the COLO context.
>>
>> No functional change.
>>
>> Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>

Acked-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>

>> ---
>>   tools/libxc/include/xenguest.h     | 4 ++--
>>   tools/libxc/xc_sr_restore.c        | 4 ++--
>>   tools/libxc/xc_sr_save.c           | 4 ++--
>>   tools/libxl/libxl_colo_restore.c   | 2 +-
>>   tools/libxl/libxl_colo_save.c      | 2 +-
>>   tools/libxl/libxl_remus.c          | 2 +-
>>   tools/libxl/libxl_save_msgs_gen.pl | 2 +-
>>   7 files changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
>> index 40902ee..aa8cc8b 100644
>> --- a/tools/libxc/include/xenguest.h
>> +++ b/tools/libxc/include/xenguest.h
>> @@ -53,7 +53,7 @@ struct save_callbacks {
>>        * xc_domain_save then flushes the output buffer, while the
>>        *  guest continues to run.
>>        */
>> -    int (*postcopy)(void* data);
>> +    int (*aftercopy)(void* data);
>>   
>>       /* Called after the memory checkpoint has been flushed
>>        * out into the network. Typical actions performed in this
>> @@ -115,7 +115,7 @@ struct restore_callbacks {
>>        * Callback function resumes the guest & the device model,
>>        * returns to xc_domain_restore.
>>        */
>> -    int (*postcopy)(void* data);
>> +    int (*aftercopy)(void* data);
>>   
>>       /* A checkpoint record has been found in the stream.
>>        * returns: */
>> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
>> index 3549f0a..ee06b3d 100644
>> --- a/tools/libxc/xc_sr_restore.c
>> +++ b/tools/libxc/xc_sr_restore.c
>> @@ -576,7 +576,7 @@ static int handle_checkpoint(struct xc_sr_context *ctx)
>>                                                   ctx->restore.callbacks->data);
>>   
>>           /* Resume secondary vm */
>> -        ret = ctx->restore.callbacks->postcopy(ctx->restore.callbacks->data);
>> +        ret = ctx->restore.callbacks->aftercopy(ctx->restore.callbacks->data);
>>           HANDLE_CALLBACK_RETURN_VALUE(ret);
>>   
>>           /* Wait for a new checkpoint */
>> @@ -855,7 +855,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>>       {
>>           /* this is COLO restore */
>>           assert(callbacks->suspend &&
>> -               callbacks->postcopy &&
>> +               callbacks->aftercopy &&
>>                  callbacks->wait_checkpoint &&
>>                  callbacks->restore_results);
>>       }
>> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
>> index f98c827..fc63a55 100644
>> --- a/tools/libxc/xc_sr_save.c
>> +++ b/tools/libxc/xc_sr_save.c
>> @@ -863,7 +863,7 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
>>                   }
>>               }
>>   
>> -            rc = ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
>> +            rc = ctx->save.callbacks->aftercopy(ctx->save.callbacks->data);
>>               if ( rc <= 0 )
>>                   goto err;
>>   
>> @@ -951,7 +951,7 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
>>       if ( hvm )
>>           assert(callbacks->switch_qemu_logdirty);
>>       if ( ctx.save.checkpointed )
>> -        assert(callbacks->checkpoint && callbacks->postcopy);
>> +        assert(callbacks->checkpoint && callbacks->aftercopy);
>>       if ( ctx.save.checkpointed == XC_MIG_STREAM_COLO )
>>           assert(callbacks->wait_checkpoint);
>>   
>> diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
>> index 0c535bd..7d8f9ff 100644
>> --- a/tools/libxl/libxl_colo_restore.c
>> +++ b/tools/libxl/libxl_colo_restore.c
>> @@ -246,7 +246,7 @@ void libxl__colo_restore_setup(libxl__egc *egc,
>>       if (init_dsps(&crcs->dsps))
>>           goto out;
>>   
>> -    callbacks->postcopy = libxl__colo_restore_domain_resume_callback;
>> +    callbacks->aftercopy = libxl__colo_restore_domain_resume_callback;
>>       callbacks->wait_checkpoint = libxl__colo_restore_domain_wait_checkpoint_callback;
>>       callbacks->suspend = libxl__colo_restore_domain_suspend_callback;
>>       callbacks->checkpoint = libxl__colo_restore_domain_checkpoint_callback;
>> diff --git a/tools/libxl/libxl_colo_save.c b/tools/libxl/libxl_colo_save.c
>> index f687d5a..5921196 100644
>> --- a/tools/libxl/libxl_colo_save.c
>> +++ b/tools/libxl/libxl_colo_save.c
>> @@ -145,7 +145,7 @@ void libxl__colo_save_setup(libxl__egc *egc, libxl__colo_save_state *css)
>>   
>>       callbacks->suspend = libxl__colo_save_domain_suspend_callback;
>>       callbacks->checkpoint = libxl__colo_save_domain_checkpoint_callback;
>> -    callbacks->postcopy = libxl__colo_save_domain_resume_callback;
>> +    callbacks->aftercopy = libxl__colo_save_domain_resume_callback;
>>       callbacks->wait_checkpoint = libxl__colo_save_domain_wait_checkpoint_callback;
>>   
>>       libxl__checkpoint_devices_setup(egc, &dss->cds);
>> diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
>> index 29a4783..1453365 100644
>> --- a/tools/libxl/libxl_remus.c
>> +++ b/tools/libxl/libxl_remus.c
>> @@ -110,7 +110,7 @@ void libxl__remus_setup(libxl__egc *egc, libxl__remus_state *rs)
>>       dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
>>   
>>       callbacks->suspend = libxl__remus_domain_suspend_callback;
>> -    callbacks->postcopy = libxl__remus_domain_resume_callback;
>> +    callbacks->aftercopy = libxl__remus_domain_resume_callback;
>>       callbacks->checkpoint = libxl__remus_domain_save_checkpoint_callback;
>>   
>>       libxl__checkpoint_devices_setup(egc, cds);
>> diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
>> index 3ae7373..27845bb 100755
>> --- a/tools/libxl/libxl_save_msgs_gen.pl
>> +++ b/tools/libxl/libxl_save_msgs_gen.pl
>> @@ -24,7 +24,7 @@ our @msgs = (
>>                                                   'unsigned long', 'done',
>>                                                   'unsigned long', 'total'] ],
>>       [  3, 'srcxA',  "suspend", [] ],
>> -    [  4, 'srcxA',  "postcopy", [] ],
>> +    [  4, 'srcxA',  "aftercopy", [] ],
>>       [  5, 'srcxA',  "checkpoint", [] ],
>>       [  6, 'srcxA',  "wait_checkpoint", [] ],
>>       [  7, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
>> -- 
>> 2.7.4
>>
>
> .
>

-- 
Thanks
Zhang Chen





* Re: [PATCH RFC 06/20] libxc/xc_sr: factor helpers out of handle_page_data()
  2017-03-30  4:49     ` Joshua Otto
@ 2017-04-12 15:16       ` Wei Liu
  0 siblings, 0 replies; 53+ messages in thread
From: Wei Liu @ 2017-04-12 15:16 UTC (permalink / raw)
  To: Joshua Otto
  Cc: wei.liu2, Andrew Cooper, ian.jackson, czylin, imhy.yang,
	xen-devel, hjarmstr

On Thu, Mar 30, 2017 at 12:49:07AM -0400, Joshua Otto wrote:
> On Tue, Mar 28, 2017 at 08:52:26PM +0100, Andrew Cooper wrote:
> > On 27/03/17 10:06, Joshua Otto wrote:
> > > diff --git a/tools/libxc/xc_sr_stream_format.h b/tools/libxc/xc_sr_stream_format.h
> > > index 3291b25..32400b2 100644
> > > --- a/tools/libxc/xc_sr_stream_format.h
> > > +++ b/tools/libxc/xc_sr_stream_format.h
> > > @@ -80,15 +80,15 @@ struct xc_sr_rhdr
> > >  #define REC_TYPE_OPTIONAL             0x80000000U
> > >  
> > >  /* PAGE_DATA */
> > > -struct xc_sr_rec_page_data_header
> > > +struct xc_sr_rec_pages_header
> > >  {
> > >      uint32_t count;
> > >      uint32_t _res1;
> > >      uint64_t pfn[0];
> > >  };
> > >  
> > > -#define PAGE_DATA_PFN_MASK  0x000fffffffffffffULL
> > > -#define PAGE_DATA_TYPE_MASK 0xf000000000000000ULL
> > > +#define REC_PFINFO_PFN_MASK  0x000fffffffffffffULL
> > > +#define REC_PFINFO_TYPE_MASK 0xf000000000000000ULL
> > >  
> > >  /* X86_PV_INFO */
> > >  struct xc_sr_rec_x86_pv_info
> > 
> > What are the purposes of these name changes?
> 
> I should definitely have explained this more explicitly, sorry about that.  I
> use the same exact structure (a count followed by a list of encoded pfns+types)
> for three additional record types (POSTCOPY_PFNS, POSTCOPY_PAGE_DATA, and
> POSTCOPY_FAULT) later in the series when postcopy is introduced.  To enable the
> generation and validation logic to be shared between all of the code that
> processes this sort of record, I renamed the structure and its associated masks
> to be more generic.
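
For example, every record of this shape can then share a single
validation helper along these lines (helper name illustrative;
stdint.h/stdbool.h assumed):

/* An entry is well-formed iff no bits outside the pfn and type fields
 * are set. */
static inline bool rec_pfinfo_entry_valid(uint64_t entry)
{
    return !(entry & ~(REC_PFINFO_PFN_MASK | REC_PFINFO_TYPE_MASK));
}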

This should be part of the commit message.

Wei.

> 
> Josh


* Re: [PATCH RFC 07/20] migration: defer precopy policy to libxl
  2017-03-30  5:19     ` Joshua Otto
@ 2017-04-12 15:16       ` Wei Liu
  2017-04-18 17:56         ` Ian Jackson
  0 siblings, 1 reply; 53+ messages in thread
From: Wei Liu @ 2017-04-12 15:16 UTC (permalink / raw)
  To: Joshua Otto
  Cc: wei.liu2, Andrew Cooper, ian.jackson, czylin, imhy.yang,
	xen-devel, hjarmstr

On Thu, Mar 30, 2017 at 01:19:41AM -0400, Joshua Otto wrote:
> > 
> > > +};
> > > +
> > >  /* callbacks provided by xc_domain_save */
> > >  struct save_callbacks {
> > >      /* Called after expiration of checkpoint interval,
> > > @@ -46,6 +54,17 @@ struct save_callbacks {
> > >       */
> > >      int (*suspend)(void* data);
> > >  
> > > +    /* Called after every batch of page data sent during the precopy phase of a
> > > +     * live migration to ask the caller what to do next based on the current
> > > +     * state of the precopy migration.
> > > +     */
> > > +#define XGS_POLICY_ABORT          (-1) /* Abandon the migration entirely and
> > > +                                        * tidy up. */
> > > +#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
> > > +#define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and transmit the
> > > +                                        * remaining dirty pages. */
> > > +    int (*precopy_policy)(struct precopy_stats stats, void *data);
> > 
> > Structures shouldn't be passed by value like this, as the compiler has
> > to do a lot of memcpy() work to make it happen.  You should pass by
> > const pointer, as (as far as I can tell), they are strictly read-only to
> > the implementation of this hook?
> 
> I chose to pass by value to make the IPC plumbing easier -
> libxl_save_msgs_gen.pl doesn't know what to do about pointers, and (not being
> the strongest Perl programmer...) I didn't want to volunteer to be the one to
> teach it.
> 
> Is the memcpy() really significant here?  If this were a tight loop, sure, but
> every invocation of the policy callback implies both a 4MB network transfer
> _and_ a synchronous RPC.

Ian, how can Joshua pass a pointer across the RPC boundary to avoid excessive
copying?

Wei.


* Re: [PATCH RFC 13/20] libxc/migration: add try_read_record()
  2017-03-27  9:06 ` [PATCH RFC 13/20] libxc/migration: add try_read_record() Joshua Otto
@ 2017-04-12 15:16   ` Wei Liu
  0 siblings, 0 replies; 53+ messages in thread
From: Wei Liu @ 2017-04-12 15:16 UTC (permalink / raw)
  To: Joshua Otto
  Cc: wei.liu2, andrew.cooper3, ian.jackson, czylin, imhy.yang,
	xen-devel, hjarmstr

On Mon, Mar 27, 2017 at 05:06:25AM -0400, Joshua Otto wrote:
[...]
>  
> +int try_read_record(struct xc_sr_read_record_context *rrctx, int fd,
> +                    struct xc_sr_record *rec)
> +{
> +    int rc;
> +    xc_interface *xch = rrctx->ctx->xch;
> +    size_t offset_out, dataoff, datasz;
> +
> +    /* If the header isn't yet complete, attempt to finish it first. */
> +    if ( rrctx->offset < sizeof(rrctx->rhdr) )
> +    {
> +        rc = try_read_exact(fd, (char *)&rrctx->rhdr + rrctx->offset,
> +                            sizeof(rrctx->rhdr) - rrctx->offset, &offset_out);
> +        rrctx->offset += offset_out;
> +
> +        if ( rc )
> +            return rc;
> +        else
> +            assert(rrctx->offset == sizeof(rrctx->rhdr));

No need to have the "else" branch.
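
That is, letting the success path fall through to the assertion:

        if ( rc )
            return rc;

        assert(rrctx->offset == sizeof(rrctx->rhdr));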


* Re: [PATCH RFC 08/20] libxl/migration: add precopy tuning parameters
  2017-03-30  6:03     ` Joshua Otto
@ 2017-04-12 15:37       ` Wei Liu
  2017-04-27 22:51         ` Joshua Otto
  0 siblings, 1 reply; 53+ messages in thread
From: Wei Liu @ 2017-04-12 15:37 UTC (permalink / raw)
  To: Joshua Otto
  Cc: wei.liu2, Andrew Cooper, ian.jackson, czylin, imhy.yang,
	xen-devel, hjarmstr

On Thu, Mar 30, 2017 at 02:03:29AM -0400, Joshua Otto wrote:
> On Wed, Mar 29, 2017 at 10:08:02PM +0100, Andrew Cooper wrote:
> > On 27/03/17 10:06, Joshua Otto wrote:
> > > In the context of the live migration algorithm, the precopy iteration
> > > count refers to the number of page-copying iterations performed prior to
> > > the suspension of the guest and transmission of the final set of dirty
> > > pages.  Similarly, the precopy dirty threshold refers to the dirty page
> > > count below which we judge it more profitable to proceed to
> > > stop-and-copy rather than continue with the precopy.  These would be
> > > helpful tuning parameters to work with when migrating particularly busy
> > > guests, as they enable an administrator to reap the available benefits
> > > of the precopy algorithm (the transmission of guest pages _not_ in the
> > > writable working set can be completed without guest downtime) while
> > > reducing the total amount of time required for the migration (as
> > > iterations of the precopy loop that will certainly be redundant can be
> > > skipped in favour of an earlier suspension).
> > >
> > > To expose these tuning parameters to users:
> > > - introduce a new libxl API function, libxl_domain_live_migrate(),
> > >   taking the same parameters as libxl_domain_suspend() _and_
> > >   precopy_iterations and precopy_dirty_threshold parameters, and
> > >   consider these parameters in the precopy policy
> > >
> > >   (though a pair of new parameters on their own might not warrant an
> > >   entirely new API function, it is added in anticipation of a number of
> > >   additional migration-only parameters that would be cumbersome on the
> > >   whole to tack on to the existing suspend API)
> > >
> > > - switch xl migrate to the new libxl_domain_live_migrate() and add new
> > >   --precopy-iterations and --precopy-threshold parameters to pass
> > >   through
> > >
> > > Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
> > 
> > This will have to defer to the tools maintainers, but I purposefully
> > didn't expose these knobs to users when rewriting live migration,
> > because they cannot be meaningfully chosen by anyone outside of a
> > testing scenario.  (That is not to say they aren't useful for testing
> > purposes, but I didn't upstream my version of this patch.)
> 
> Ahhh, I wondered why those parameters to xc_domain_save() were present
> but ignored.  That's reasonable.
> 
> I guess the way I had imagined an administrator using them would be in a
> non-production/test environment - if they could run workloads
> representative of their production application in this environment, they
> could experiment with different --precopy-iterations and
> --precopy-threshold values (having just a high-level understanding of
> what they control) and choose the ones that result in the best outcome
> for later use in production.
> 

Running in a test environment isn't always an option -- think about
public cloud providers who don't have control over the VMs or the
workload.

> > I spent quite a while wondering how best to expose these tunables in a
> > way that end users could sensibly use them, and the best I came up with
> > was this:
> > 
> > First, run the guest under logdirty for a period of time to establish
> > the working set, and how steady it is.  From this, you have a baseline
> > for the target threshold, and a plausible way of estimating the
> > downtime.  (Better yet, as XenCenter, XenServer's Windows GUI, has proved
> > time and time again, users love graphs!  Even if they don't necessarily
> > understand them.)
> > 
> > From this baseline, the condition you need to care about is the rate
> > of convergence.  On a steady VM, you should converge asymptotically to
> > the measured threshold, although on 5 or fewer iterations, the
> > asymptotic properties don't appear cleanly.  (Of course, the larger the
> > VM, the more iterations, and the more likely to spot this.)
> > 
> > Users will either care about the migration completing successfully, or
> > avoiding interrupting the workload.  The majority case would be both,
> > but every user will have one of these two options which is more
> > important than the other.  As a result, there need to be some options to
> > cover "if $X happens, do I continue or abort".
> > 
> > The case where the VM becomes more busy is harder however.  For the
> > users which care about not interrupting the workload, there will be a
> > point above which they'd prefer to abort the migration rather than
> > continue it.  For the users which want the migration to complete, they'd
> > prefer to pause the VM and take a downtime hit, rather than aborting.
> > 
> > Therefore, you really need two thresholds; the one above which you
> > always abort, the one where you would normally choose to pause.  The
> > decision as to what to do depends on where you are between these
> > thresholds when the dirty state converges.  (Of course, if the VM
> > suddenly becomes more idle, it is sensible to continue beyond the lower
> > threshold, as it will reduce the downtime.)  The absolute number of
> > iterations on the other hand doesn't actually matter from a users point
> > of view, so isn't a useful control to have.
> > 
> > Another thing to be careful with is the measure of convergence with
> > respect to guest busyness, and other factors influencing the absolute
> > iteration time, such as congestion of the network between the two
> > hosts.  I haven't yet come up with a sensible way of reconciling this
> > with the above, in a way which can be expressed as a useful set of controls.
> > 

My thought as well.

> > 
> > The plan, following migration v2, was always to come back to this and
> > see about doing something better than the current hard coded parameters,
> > but I am still working on fixing migration in other areas (not having
> > VMs crash when moving, because they observe important differences in the
> > hardware).
> 
> I think a good strategy would be to solicit three parameters from the
> user:
> - the precopy duration they're willing to tolerate
> - the downtime duration they're willing to tolerate
> - the bandwidth of the link between the hosts (we could try and estimate
>   it for them but I'd rather just make them run iperf)
> 
> Then, after applying this patch, alter the policy so that precopy simply
> runs for the duration that the user is willing to wait.  After that,
> using the bandwidth estimate, compute the approximate downtime required
> to transfer the final set of dirty-pages.  If this is less than what the
> user indicated is acceptable, proceed with the stop-and-copy - otherwise
> abort.
> 
> This still requires the user to figure out for themselves how long their
> workload can really wait, but hopefully they already had some idea
> before deciding to attempt live migration in the first place.
> 

I am not entirely sure what to make of this. I'm not convinced using
durations would cover all cases, but I can't come up with a
counterexample that doesn't sound contrived.

Given this series is already complex enough, I think we should set this
aside for another day.

How hard would it be to _not_ include all the knobs in this series?

Wei.


* Re: [PATCH RFC 00/20] Add postcopy live migration support
  2017-03-31  4:51   ` Joshua Otto
@ 2017-04-12 15:38     ` Wei Liu
  0 siblings, 0 replies; 53+ messages in thread
From: Wei Liu @ 2017-04-12 15:38 UTC (permalink / raw)
  To: Joshua Otto
  Cc: wei.liu2, Andrew Cooper, ian.jackson, czylin, imhy.yang,
	xen-devel, hjarmstr

On Fri, Mar 31, 2017 at 12:51:46AM -0400, Joshua Otto wrote:
> > 
> > >   We've also noticed that,
> > > when performing a postcopy without any leading precopy iterations, the time
> > > required at the destination to 'evict' all of the outstanding pages is
> > > substantial - possibly because there is no batching mechanism by which pages can
> > > be evicted - so this area in particular might require further attention.
> > >
> > > We're really interested in any feedback you might have!
> > 
> > Do you have a design document for this?  The spec modifications and code
> > comments are great, but there is no substitute (as far as understanding
> > goes) for a description in terms of the algorithm and design choices.
> 
> As I replied to Wei, not yet, but we'd happily prepare one for v2.
> 

I've gone through most of the refactoring patches. They look fine to me. I
stopped before the actual implementations because I would like to read
your design doc first.

Wei.

> Thanks!
> 
> Josh


* Re: [PATCH RFC 07/20] migration: defer precopy policy to libxl
  2017-04-12 15:16       ` Wei Liu
@ 2017-04-18 17:56         ` Ian Jackson
  0 siblings, 0 replies; 53+ messages in thread
From: Ian Jackson @ 2017-04-18 17:56 UTC (permalink / raw)
  To: Wei Liu
  Cc: Andrew Cooper, czylin, Joshua Otto, imhy.yang, xen-devel, hjarmstr

Wei Liu writes ("Re: [PATCH RFC 07/20] migration: defer precopy policy to libxl"):
> On Thu, Mar 30, 2017 at 01:19:41AM -0400, Joshua Otto wrote:
> > Is the memcpy() really significant here?  If this were a tight
> > loop, sure, but every invocation of the policy callback implies
> > both a 4MB network transfer _and_ a synchronous RPC.
> 
> Ian, How can Joshua pass a pointer across RPC boundary to avoid excessive
> copying?

You can't pass a pointer across the IPC boundary.  The two bits of
code run in different processes, with different address spaces.

Also, this precopy stats struct is tiny: two unsigned and a long.
Joshua is entirely right to ask whether the overhead is significant.
I think it isn't.
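
For reference, that struct ('iteration' and 'dirty_count' are the names
visible elsewhere in this thread; the middle field's name is assumed, so
treat the exact layout as illustrative):

struct precopy_stats
{
    unsigned int iteration;
    unsigned int total_written;
    long dirty_count;   /* -1 while the current round's count is unknown */
};

On LP64 that is 16 bytes per call - noise next to the 4MB batch transfer
that each invocation implies.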

If the performance of the proposed arrangements is inadequate, the
whole design needs reconsideration - the synchronous callback is more
of a concern, as Joshua suggests.  But I assume it's not, or Joshua
would have noticed!

Thanks,
Ian.


* Re: [PATCH RFC 08/20] libxl/migration: add precopy tuning parameters
  2017-04-12 15:37       ` Wei Liu
@ 2017-04-27 22:51         ` Joshua Otto
  0 siblings, 0 replies; 53+ messages in thread
From: Joshua Otto @ 2017-04-27 22:51 UTC (permalink / raw)
  To: Wei Liu
  Cc: Andrew Cooper, ian.jackson, czylin, imhy.yang, xen-devel, hjarmstr

On Wed, Apr 12, 2017 at 04:37:16PM +0100, Wei Liu wrote:
> On Thu, Mar 30, 2017 at 02:03:29AM -0400, Joshua Otto wrote:
> > I guess the way I had imagined an administrator using them would be in a
> > non-production/test environment - if they could run workloads
> > representative of their production application in this environment, they
> > could experiment with different --precopy-iterations and
> > --precopy-threshold values (having just a high-level understanding of
> > what they control) and choose the ones that result in the best outcome
> > for later use in production.
> > 
> 
> Running in a test environment isn't always an option -- think about
> public cloud providers who don't have control over the VMs or the
> workload.

Sure, it definitely won't always be an option, but sometimes it might.
The question is whether or not the benefit in the cases where it can be
used justifies the added complexity to the interface.  I think so, but
that's just my intuition.

> > > 
> > > The plan, following migration v2, was always to come back to this and
> > > see about doing something better than the current hard coded parameters,
> > > but I am still working on fixing migration in other areas (not having
> > > VMs crash when moving, because they observe important differences in the
> > > hardware).
> > 
> > I think a good strategy would be to solicit three parameters from the
> > user:
> > - the precopy duration they're willing to tolerate
> > - the downtime duration they're willing to tolerate
> > - the bandwidth of the link between the hosts (we could try and estimate
> >   it for them but I'd rather just make them run iperf)
> > 
> > Then, after applying this patch, alter the policy so that precopy simply
> > runs for the duration that the user is willing to wait.  After that,
> > using the bandwidth estimate, compute the approximate downtime required
> > to transfer the final set of dirty-pages.  If this is less than what the
> > user indicated is acceptable, proceed with the stop-and-copy - otherwise
> > abort.
> > 
> > This still requires the user to figure out for themselves how long their
> > workload can really wait, but hopefully they already had some idea
> > before deciding to attempt live migration in the first place.
> > 
> 
> I am not entirely sure what to make of this. I'm not convinced using
> durations would cover all cases, but I can't come up with a
> counterexample that doesn't sound contrived.
> 
> Given this series is already complex enough, I think we should set this
> aside for another day.
> 
> How hard would it be to _not_ include all the knobs in this series?

Fair enough.  It wouldn't be much trouble, so I'll drop it for now.

As a general comment on the patch series for anyone following: I've just
finished with the last of my academic commitments and now have time to
pick this back up.  I'll follow up in the next few weeks with the
suggested revisions, the design document and the quantitative
performance evaluation.

Thanks!

Josh


Thread overview: 53+ messages
2017-03-27  9:06 [PATCH RFC 00/20] Add postcopy live migration support Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 01/20] tools: rename COLO 'postcopy' to 'aftercopy' Joshua Otto
2017-03-28 16:34   ` Wei Liu
2017-04-11  6:19     ` Zhang Chen
2017-03-27  9:06 ` [PATCH RFC 02/20] libxc/xc_sr: parameterise write_record() on fd Joshua Otto
2017-03-28 18:53   ` Andrew Cooper
2017-03-31 14:19   ` Wei Liu
2017-03-27  9:06 ` [PATCH RFC 03/20] libxc/xc_sr_restore.c: use write_record() in send_checkpoint_dirty_pfn_list() Joshua Otto
2017-03-28 18:56   ` Andrew Cooper
2017-03-31 14:19   ` Wei Liu
2017-03-27  9:06 ` [PATCH RFC 04/20] libxc/xc_sr_save.c: add WRITE_TRIVIAL_RECORD_FN() Joshua Otto
2017-03-28 19:03   ` Andrew Cooper
2017-03-30  4:28     ` Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 05/20] libxc/xc_sr: factor out filter_pages() Joshua Otto
2017-03-28 19:27   ` Andrew Cooper
2017-03-30  4:42     ` Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 06/20] libxc/xc_sr: factor helpers out of handle_page_data() Joshua Otto
2017-03-28 19:52   ` Andrew Cooper
2017-03-30  4:49     ` Joshua Otto
2017-04-12 15:16       ` Wei Liu
2017-03-27  9:06 ` [PATCH RFC 07/20] migration: defer precopy policy to libxl Joshua Otto
2017-03-29 18:54   ` Jennifer Herbert
2017-03-30  5:28     ` Joshua Otto
2017-03-29 20:18   ` Andrew Cooper
2017-03-30  5:19     ` Joshua Otto
2017-04-12 15:16       ` Wei Liu
2017-04-18 17:56         ` Ian Jackson
2017-03-27  9:06 ` [PATCH RFC 08/20] libxl/migration: add precopy tuning parameters Joshua Otto
2017-03-29 21:08   ` Andrew Cooper
2017-03-30  6:03     ` Joshua Otto
2017-04-12 15:37       ` Wei Liu
2017-04-27 22:51         ` Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 09/20] libxc/xc_sr_save: introduce save batch types Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 10/20] libxc/xc_sr_save.c: initialise rec.data before free() Joshua Otto
2017-03-28 19:59   ` Andrew Cooper
2017-03-29 17:47     ` Wei Liu
2017-03-27  9:06 ` [PATCH RFC 11/20] libxc/migration: correct hvm record ordering specification Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 12/20] libxc/migration: specify postcopy live migration Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 13/20] libxc/migration: add try_read_record() Joshua Otto
2017-04-12 15:16   ` Wei Liu
2017-03-27  9:06 ` [PATCH RFC 14/20] libxc/migration: implement the sender side of postcopy live migration Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 15/20] libxc/migration: implement the receiver " Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 16/20] libxl/libxl_stream_write.c: track callback chains with an explicit phase Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 17/20] libxl/libxl_stream_read.c: " Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 18/20] libxl/migration: implement the sender side of postcopy live migration Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 19/20] libxl/migration: implement the receiver " Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 20/20] tools: expose postcopy live migration support in libxl and xl Joshua Otto
2017-03-28 14:41 ` [PATCH RFC 00/20] Add postcopy live migration support Wei Liu
2017-03-30  4:13   ` Joshua Otto
2017-03-31 14:19     ` Wei Liu
2017-03-29 22:50 ` Andrew Cooper
2017-03-31  4:51   ` Joshua Otto
2017-04-12 15:38     ` Wei Liu
