From: Joshua Otto <jtotto@uwaterloo.ca>
To: Andrew Cooper <andrew.cooper3@citrix.com>,
	xen-devel@lists.xenproject.org
Cc: ian.jackson@eu.citrix.com, hjarmstr@uwaterloo.ca,
	wei.liu2@citrix.com, czylin@uwaterloo.ca, imhy.yang@gmail.com
Subject: Re: [PATCH RFC 00/20] Add postcopy live migration support
Date: Fri, 31 Mar 2017 00:51:46 -0400
Message-ID: <20170331045136.GA2415@eagle>
In-Reply-To: <e5258f60-12bc-eb11-4203-25bcc02513f9@citrix.com>

On Wed, Mar 29, 2017 at 11:50:52PM +0100, Andrew Cooper wrote:
> On 27/03/2017 10:06, Joshua Otto wrote:
> > Hi,
> >
> > We're a team of three fourth-year undergraduate software engineering students at
> > the University of Waterloo in Canada.  In late 2015 we posted on the list [1] to
> > ask for a project to undertake for our program's capstone design project, and
> > Andrew Cooper pointed us in the direction of the live migration implementation
> > as an area that could use some attention.  We were particularly interested in
> > post-copy live migration (as evaluated by [2] and discussed on the list at [3]),
> > and have been working on an implementation of this on-and-off since then.
> >
> > We now have a working implementation of this scheme, and are submitting it for
> > comment.  The changes are also available as the 'postcopy' branch of the GitHub
> > repository at [4].
> >
> > As a brief overview of our approach:
> > - We introduce a mechanism by which libxl can indicate to the libxc stream
> >   helper process that the iterative migration precopy loop should be terminated
> >   and postcopy should begin.
> > - At this point, we suspend the domain, collect the final set of dirty pfns and
> >   write these pfns (and _not_ their contents) into the stream.
> > - At the destination, the xc restore logic registers itself as a pager for the
> >   migrating domain, 'evicts' all of the pfns indicated by the sender as
> >   outstanding, and then resumes the domain at the destination.
> > - As the domain executes, the migration sender continues to push the remaining
> >   outstanding pages to the receiver in the background.  The receiver
> >   monitors both the stream for incoming page data and the paging ring event
> >   channel for page faults triggered by the guest.  Page faults are forwarded on
> >   the back-channel migration stream to the migration sender, which prioritizes
> >   these pages for transmission.
> >
> > By leveraging the existing paging API, we are able to implement the postcopy
> > scheme without any hypervisor modifications - all of our changes are confined to
> > the userspace toolstack.  However, we inherit from the paging API the
> > requirement that the domains be HVM and that the host have HAP/EPT support.
> 
> Wow.  Considering that the paging API has had no in-tree consumers (and
> its out-of-tree consumer folded), I am astounded that it hasn't bitrotten.

Well, there's tools/xenpaging, which was a helpful reference when
putting this together.  The user-space pager actually has rotted a bit
(I'm fairly certain the VM event ring protocol has changed subtly under
its feet), so I also needed to consult tools/xen-access to get things
right.
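
For context, the shape of the receiver's fault-servicing loop is
roughly the following.  This is an illustrative sketch only -
get_request(), put_response(), forward_fault(), receive_page() and the
surrounding state are hypothetical stand-ins for the stream and ring
plumbing in the actual patches, though the vm_event structures and the
libxc/libxenevtchn calls are the real interfaces:

    while ( pages_outstanding )
    {
        vm_event_request_t req;
        vm_event_response_t rsp;

        get_request(&ring, &req);     /* pop a paging-fault request */

        /* Ask the sender to prioritise this gfn on the back-channel,
         * then block until its contents arrive on the stream. */
        forward_fault(req.u.mem_paging.gfn);
        receive_page(req.u.mem_paging.gfn, page_buf);

        /* Install the page contents and wake the faulting vcpu. */
        xc_mem_paging_load(xch, domid, req.u.mem_paging.gfn, page_buf);

        rsp.version = VM_EVENT_INTERFACE_VERSION;
        rsp.vcpu_id = req.vcpu_id;
        rsp.flags   = req.flags;
        rsp.reason  = req.reason;
        rsp.u.mem_paging.gfn = req.u.mem_paging.gfn;
        put_response(&ring, &rsp);

        xenevtchn_notify(xce, port);  /* kick the paging event channel */
    }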

> 
> >
> > We haven't yet had the opportunity to perform a quantitative evaluation of the
> > performance trade-offs between the traditional pre-copy and our post-copy
> > strategies, but intend to.  Informally, we've been testing our implementation by
> > migrating a domain running the x86 memtest program (which is obviously a
> > tremendously write-heavy workload), and have observed a substantial reduction in
> > total time required for migration completion (at the expense of a visually
> > obvious 'slowdown' in the execution of the program).
> 
> Do you have any numbers, even for this informal testing?

We have a much more ambitious test matrix planned, but sure, here's an
early encouraging set of measurements - for a domain with 2GB of memory
and a 256MB writable working set (the application driving the writes
being fio submitting writes against a ramdisk), we measured these times:

                    Pre-copy + Stop-and-copy |  1 precopy iteration +
                             (s)             |       postcopy (s)
                   --------------------------+-------------------------
 Precopy Duration:           66.97           |         44.44
 Suspend Duration:            6.807          |          3.23
Postcopy Duration:            N/A            |          4.83
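
In case it's useful for reproduction, the fio job was roughly along
these lines - illustrative only, and in particular the /dev/ram0
target and 4k block size here are assumptions; the 256MB size is the
part that matters:

    [writable-working-set]
    ioengine=libaio
    filename=/dev/ram0
    rw=randwrite
    bs=4k
    size=256m
    time_based
    runtime=120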

However...

That 3.23s suspend for the hybrid migration seems too high, doesn't it?

There's currently a serious performance bug, which we're still trying
to work out, in pure-postcopy migrations - those with no leading
precopy iterations at all.  Repeating the experiment above as a pure
postcopy migration yields:

                     Pure postcopy (s)
                   ----------------------
 Precopy Duration:           0
 Suspend Duration:          21.93
Postcopy Duration:          44.22

Although the postcopy scheme clearly works, it takes 21.93s (!) to
unpause the guest at the destination.  Evicting the unmigrated pages
accounts for only a second or two of that (slow because there's no
batching support for eviction - still bad, but not 21.93s bad); the
real holdup is somewhere in the domain creation sequence between
domcreate_stream_done() and domcreate_complete().

I suspect that this is the result of a bad interaction between QEMU's
startup sequence (its foreign memory mapping behaviour in particular)
and the postcopy paging.  Specifically: the paging ring has room only
for 8 requests at a time.  When QEMU attempts to map a large range, the
range gets postcopy-faulted over synchronously in batches of 8 pages at
a time, and each such batch implies a synchronous copy of its pages
over the network (and the 100us xenforeignmemory_map() retry timer)
before the next batch can begin.
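
Some back-of-envelope arithmetic is consistent with this theory.
Assuming 4KiB pages and guessing at ~0.3ms per batch (one synchronous
8-page copy over the network plus the retry timer - the exact figure
is an assumption):

    1GiB mapped = 262,144 pages = 32,768 batches of 8
    32,768 batches * ~0.3ms each ~= 10s

so QEMU touching a sizeable fraction of the 2GiB domain during its
startup mappings would comfortably account for the ~22s we observe.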

If I am able to confirm that this is the case, a sensible solution
would seem to be supporting paging range-population requests (i.e. a
new paging ring request type for a _range_ of gfns - see the sketch
below).  In the meantime, you should expect to observe this effect in
your own experiments as well; it appears to be largely (but not
completely) mitigated by performing a single pre-copy iteration first.
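
Entirely hypothetically, I imagine the payload of such a request would
look something like this (today's vm_event_paging carries a single
gfn; a range variant would let one ring slot cover a whole foreign-map
range):

    /* Hypothetical sketch - no such request type exists today. */
    struct vm_event_paging_range {
        uint64_t gfn;      /* first gfn of the range                 */
        uint32_t nr_gfns;  /* number of consecutive gfns to populate */
        uint32_t _pad;
    };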

> 
> >   We've also noticed that,
> > when performing a postcopy without any leading precopy iterations, the time
> > required at the destination to 'evict' all of the outstanding pages is
> > substantial - possibly because there is no batching mechanism by which pages can
> > be evicted - so this area in particular might require further attention.
> >
> > We're really interested in any feedback you might have!
> 
> Do you have a design document for this?  The spec modifications and code
> comments are great, but there is no substitute (as far as understanding
> goes) for a description in terms of the algorithm and design choices.

As I replied to Wei, not yet, but we'd happily prepare one for v2.

Thanks!

Josh

