From: Joshua Otto <jtotto@uwaterloo.ca>
To: Andrew Cooper <andrew.cooper3@citrix.com>,
	xen-devel@lists.xenproject.org
Cc: ian.jackson@eu.citrix.com, hjarmstr@uwaterloo.ca,
	wei.liu2@citrix.com, czylin@uwaterloo.ca, imhy.yang@gmail.com
Subject: Re: [PATCH RFC 08/20] libxl/migration: add precopy tuning parameters
Date: Thu, 30 Mar 2017 02:03:29 -0400	[thread overview]
Message-ID: <20170330060329.GE5346@eagle> (raw)
In-Reply-To: <481b024c-83d2-38c2-98a9-59a2bffb8776@citrix.com>

On Wed, Mar 29, 2017 at 10:08:02PM +0100, Andrew Cooper wrote:
> On 27/03/17 10:06, Joshua Otto wrote:
> > In the context of the live migration algorithm, the precopy iteration
> > count refers to the number of page-copying iterations performed prior to
> > the suspension of the guest and transmission of the final set of dirty
> > pages.  Similarly, the precopy dirty threshold refers to the dirty page
> > count below which we judge it more profitable to proceed to
> > stop-and-copy rather than continue with the precopy.  These would be
> > helpful tuning parameters to work with when migrating particularly busy
> > guests, as they enable an administrator to reap the available benefits
> > of the precopy algorithm (the transmission of guest pages _not_ in the
> > writable working set can be completed without guest downtime) while
> > reducing the total amount of time required for the migration (as
> > iterations of the precopy loop that will certainly be redundant can be
> > skipped in favour of an earlier suspension).
> >
> > To expose these tuning parameters to users:
> > - introduce a new libxl API function, libxl_domain_live_migrate(),
> >   taking the same parameters as libxl_domain_suspend() _and_
> >   precopy_iterations and precopy_dirty_threshold parameters, and
> >   consider these parameters in the precopy policy
> >
> >   (though a pair of new parameters on their own might not warrant an
> >   entirely new API function, it is added in anticipation of a number of
> >   additional migration-only parameters that would be cumbersome on the
> >   whole to tack on to the existing suspend API)
> >
> > - switch xl migrate to the new libxl_domain_live_migrate() and add new
> >   --precopy-iterations and --precopy-threshold parameters to pass
> >   through
> >
> > Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
> 
> This will have to defer to the tools maintainers, but I purposefully
> didn't expose these knobs to users when rewriting live migration,
> because they cannot be meaningfully chosen by anyone outside of a
> testing scenario.  (That is not to say they aren't useful for testing
> purposes, but I didn't upstream my version of this patch.)

Ahhh, I wondered why those parameters to xc_domain_save() were present
but ignored.  That's reasonable.

I guess the way I had imagined an administrator using them would be in a
non-production/test environment - if they could run workloads
representative of their production application in this environment, they
could experiment with different --precopy-iterations and
--precopy-threshold values (having just a high-level understanding of
what they control) and choose the ones that result in the best outcome
for later use in production.

> I spent quite a while wondering how best to expose these tunables in a
> way that end users could sensibly use them, and the best I came up with
> was this:
> 
> First, run the guest under logdirty for a period of time to establish
> the working set, and how steady it is.  From this, you have a baseline
> for the target threshold, and a plausible way of estimating the
> downtime.  (Better yet, as XenCenter, XenServer's Windows GUI, has proved
> time and time again, users love graphs!  Even if they don't necessarily
> understand them.)
> 
> From this baseline, the conditions you need to care about are the rate
> of convergence.  On a steady VM, you should converge asymptotically to
> the measured threshold, although with 5 or fewer iterations, the
> asymptotic properties don't appear cleanly.  (Of course, the larger the
> VM, the more iterations, and the more likely to spot this.)
> 
> Users will either care about the migration completing successfully, or
> avoiding interrupting the workload.  The majority case would be both,
> but every user will have one of these two options which is more
> important than the other.  As a result, there need to be some options to
> cover "if $X happens, do I continue or abort".
> 
> The case where the VM becomes busier is harder, however.  For the
> users which care about not interrupting the workload, there will be a
> point above which they'd prefer to abort the migration rather than
> continue it.  For the users which want the migration to complete, they'd
> prefer to pause the VM and take a downtime hit, rather than aborting.
> 
> Therefore, you really need two thresholds; the one above which you
> always abort, the one where you would normally choose to pause.  The
> decision as to what to do depends on where you are between these
> thresholds when the dirty state converges.  (Of course, if the VM
> suddenly becomes more idle, it is sensible to continue beyond the lower
> threshold, as it will reduce the downtime.)  The absolute number of
> iterations, on the other hand, doesn't actually matter from a user's
> point of view, so isn't a useful control to have.
> 
> Another thing to be careful with is the measure of convergence with
> respect to guest busyness, and other factors influencing the absolute
> iteration time, such as congestion of the network between the two
> hosts.  I haven't yet come up with a sensible way of reconciling this
> with the above, in a way which can be expressed as a useful set of controls.
> 
> 
> The plan, following migration v2, was always to come back to this and
> see about doing something better than the current hard coded parameters,
> but I am still working on fixing migration in other areas (not having
> VMs crash when moving, because they observe important differences in the
> hardware).
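
Your two-threshold scheme could be sketched roughly like this (purely
illustrative; the parameter names, units, and the convergence flag are my
invention, not anything in the series):

```python
def precopy_decision(dirty_pages, pause_threshold, abort_threshold,
                     converging, prefer_completion):
    """Decide what to do with a precopy migration given the current
    per-iteration dirty-page count and the user's two thresholds."""
    # Above the upper threshold: the guest is too busy, always abort.
    if dirty_pages > abort_threshold:
        return "abort"
    # At or below the lower threshold: normally pause for stop-and-copy,
    # but if the dirty count is still falling (the guest became more
    # idle), keep copying to shrink the eventual downtime further.
    if dirty_pages <= pause_threshold:
        return "continue" if converging else "pause"
    # Between the thresholds at convergence: the outcome depends on
    # whether the user prioritises completing the migration (pause and
    # take the downtime hit) or not interrupting the workload (abort).
    return "pause" if prefer_completion else "abort"
```
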

I think a good strategy would be to solicit three parameters from the
user:
- the precopy duration they're willing to tolerate
- the downtime duration they're willing to tolerate
- the bandwidth of the link between the hosts (we could try and estimate
  it for them but I'd rather just make them run iperf)

Then, after applying this patch, alter the policy so that precopy simply
runs for the duration that the user is willing to wait.  After that,
using the bandwidth estimate, compute the approximate downtime required
to transfer the final set of dirty pages.  If this is less than what the
user indicated is acceptable, proceed with the stop-and-copy - otherwise
abort.

This still requires the user to figure out for themselves how long their
workload can really wait, but hopefully they already had some idea
before deciding to attempt live migration in the first place.
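
To make that concrete, here's a rough sketch of the policy I have in mind
(the names, units, and the 4 KiB page-size constant are illustrative, not a
real libxl interface):

```python
PAGE_SIZE = 4096  # bytes; x86 guest page granularity

def estimated_downtime(dirty_pages, bandwidth_bytes_per_sec):
    """Approximate stop-and-copy downtime: time to push the remaining
    dirty pages over the link at the user-supplied bandwidth."""
    return (dirty_pages * PAGE_SIZE) / bandwidth_bytes_per_sec

def migrate_policy(elapsed_secs, max_precopy_secs, dirty_pages,
                   bandwidth_bytes_per_sec, max_downtime_secs):
    # Keep precopying for as long as the user said they're willing to wait.
    if elapsed_secs < max_precopy_secs:
        return "continue"
    # Time is up: suspend only if the estimated final-copy downtime is
    # within the user's stated tolerance; otherwise give up rather than
    # inflict an unbounded pause.
    if estimated_downtime(dirty_pages, bandwidth_bytes_per_sec) \
            <= max_downtime_secs:
        return "stop-and-copy"
    return "abort"
```

For example, 50,000 dirty pages over a 10 GbE link (~1.25e9 B/s) works out
to roughly 0.16s of downtime, so a 0.5s tolerance would let the
stop-and-copy proceed.
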

> How does your postcopy proposal influence/change the above logic?

Well, the 'downtime' phase of the migration becomes a very short, fixed
interval, regardless of guest busyness, so you can't ask the user 'how
much downtime can you tolerate?'  Instead, the question becomes the
murkier 'how much memory performance degradation can your guest
tolerate?'  I.e. is the postcopy migration going to essentially be
downtime, or can useful work get done between faults? (for example,
guests that are I/O bound would do much better with postcopy than they
would with a long stop-and-copy)

To answer that question, they're back to the approach I outlined at the
beginning - they'd have to experiment in a test environment and observe
their workload's response to the alternatives to make an informed
choice.

Cheers,

Josh

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
