All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Joshua Otto <jtotto@uwaterloo.ca>, xen-devel@lists.xenproject.org
Cc: ian.jackson@eu.citrix.com, hjarmstr@uwaterloo.ca,
	wei.liu2@citrix.com, czylin@uwaterloo.ca, imhy.yang@gmail.com
Subject: Re: [PATCH RFC 07/20] migration: defer precopy policy to libxl
Date: Wed, 29 Mar 2017 21:18:10 +0100	[thread overview]
Message-ID: <00c2630a-b919-09b5-1a91-116e4a3f9e19@citrix.com> (raw)
In-Reply-To: <1490605592-12189-8-git-send-email-jtotto@uwaterloo.ca>

On 27/03/17 10:06, Joshua Otto wrote:
> The precopy phase of the xc_domain_save() live migration algorithm has
> historically been implemented to run until either a) (almost) no pages
> are dirty or b) some fixed, hard-coded maximum number of precopy
> iterations has been exceeded.  This policy and its implementation are
> less than ideal for a few reasons:
> - the logic of the policy is intertwined with the control flow of the
>   mechanism of the precopy stage
> - it can't take into account facts external to the immediate
>   migration context, such as interactive user input or the passage of
>   wall-clock time
> - it does not permit the user to change their mind, over time, about
>   what to do at the end of the precopy (they get an unconditional
>   transition into the stop-and-copy phase of the migration)
>
> To permit users to implement arbitrary higher-level policies governing
> when the live migration precopy phase should end, and what should be
> done next:
> - add a precopy_policy() callback to the xc_domain_save() user-supplied
>   callbacks
> - during the precopy phase of live migrations, consult this policy after
>   each batch of pages transmitted and take the dictated action, which
>   may be to a) abort the migration entirely, b) continue with the
>   precopy, or c) proceed to the stop-and-copy phase.
> - provide an implementation of the old policy as such a callback in
>   libxl and plumb it through the IPC machinery to libxc, effectively
>   maintaing the old policy for now
>
> Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>

This patch should be split into two.  One modifying libxc to use struct
precopy_stats, and a second to wire up the RPC call.

> ---
>  tools/libxc/include/xenguest.h     |  23 ++++-
>  tools/libxc/xc_nomigrate.c         |   3 +-
>  tools/libxc/xc_sr_common.h         |   7 +-
>  tools/libxc/xc_sr_save.c           | 194 ++++++++++++++++++++++++++-----------
>  tools/libxl/libxl_dom_save.c       |  20 ++++
>  tools/libxl/libxl_save_callout.c   |   3 +-
>  tools/libxl/libxl_save_helper.c    |   7 +-
>  tools/libxl/libxl_save_msgs_gen.pl |   4 +-
>  8 files changed, 189 insertions(+), 72 deletions(-)
>
> diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
> index aa8cc8b..30ffb6f 100644
> --- a/tools/libxc/include/xenguest.h
> +++ b/tools/libxc/include/xenguest.h
> @@ -39,6 +39,14 @@
>   */
>  struct xenevtchn_handle;
>  
> +/* For save's precopy_policy(). */
> +struct precopy_stats
> +{
> +    unsigned iteration;
> +    unsigned total_written;
> +    long dirty_count; /* -1 if unknown */

total_written and dirty_count are liable to be equal, so having them as
different widths of integer clearly can't be correct.

> +};
> +
>  /* callbacks provided by xc_domain_save */
>  struct save_callbacks {
>      /* Called after expiration of checkpoint interval,
> @@ -46,6 +54,17 @@ struct save_callbacks {
>       */
>      int (*suspend)(void* data);
>  
> +    /* Called after every batch of page data sent during the precopy phase of a
> +     * live migration to ask the caller what to do next based on the current
> +     * state of the precopy migration.
> +     */
> +#define XGS_POLICY_ABORT          (-1) /* Abandon the migration entirely and
> +                                        * tidy up. */
> +#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
> +#define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and transmit the
> +                                        * remaining dirty pages. */
> +    int (*precopy_policy)(struct precopy_stats stats, void *data);

Structures shouldn't be passed by value like this, as the compiler has
to do a lot of memcpy() work to make it happen.  You should pass by
const pointer, as (as far as I can tell), they are strictly read-only to
the implementation of this hook?

> +
>      /* Called after the guest's dirty pages have been
>       *  copied into an output buffer.
>       * Callback function resumes the guest & the device model,
> @@ -100,8 +119,8 @@ typedef enum {
>   *        doesn't use checkpointing
>   * @return 0 on success, -1 on failure
>   */
> -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
> -                   uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
> +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
> +                   uint32_t flags /* XCFLAGS_xxx */,
>                     struct save_callbacks* callbacks, int hvm,
>                     xc_migration_stream_t stream_type, int recv_fd);

It would be cleaner for existing callers, and to extend in the future,
to encapsulate all of these parameters in a struct domain_save_params
and pass it by pointer to here.

That way, we'd avoid the situation we currently have where some
information is passed in bitfields in a single parameter, whereas other
booleans are passed as integers.

The hvm parameter specifically is useless, and can be removed by
rearranging the sanity checks until after the xc_domain_getinfo() call.

>  
> diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c
> index 15c838f..2af64e4 100644
> --- a/tools/libxc/xc_nomigrate.c
> +++ b/tools/libxc/xc_nomigrate.c
> @@ -20,8 +20,7 @@
>  #include <xenctrl.h>
>  #include <xenguest.h>
>  
> -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
> -                   uint32_t max_factor, uint32_t flags,
> +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t flags,
>                     struct save_callbacks* callbacks, int hvm,
>                     xc_migration_stream_t stream_type, int recv_fd)
>  {
> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
> index b1aa88e..a9160bd 100644
> --- a/tools/libxc/xc_sr_common.h
> +++ b/tools/libxc/xc_sr_common.h
> @@ -198,12 +198,11 @@ struct xc_sr_context
>              /* Further debugging information in the stream. */
>              bool debug;
>  
> -            /* Parameters for tweaking live migration. */
> -            unsigned max_iterations;
> -            unsigned dirty_threshold;
> -
>              unsigned long p2m_size;
>  
> +            struct precopy_stats stats;
> +            int policy_decision;
> +
>              xen_pfn_t *batch_pfns;
>              unsigned nr_batch_pfns;
>              unsigned long *deferred_pages;
> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> index 797aec5..eb95334 100644
> --- a/tools/libxc/xc_sr_save.c
> +++ b/tools/libxc/xc_sr_save.c
> @@ -271,13 +271,29 @@ static int write_batch(struct xc_sr_context *ctx)
>  }
>  
>  /*
> + * Test if the batch is full.
> + */
> +static bool batch_full(struct xc_sr_context *ctx)

const struct xc_sr_context *ctx

This is a predicate, after all.

> +{
> +    return ctx->save.nr_batch_pfns == MAX_BATCH_SIZE;
> +}
> +
> +/*
> + * Test if the batch is empty.
> + */
> +static bool batch_empty(struct xc_sr_context *ctx)
> +{
> +    return ctx->save.nr_batch_pfns == 0;
> +}
> +
> +/*
>   * Flush a batch of pfns into the stream.
>   */
>  static int flush_batch(struct xc_sr_context *ctx)
>  {
>      int rc = 0;
>  
> -    if ( ctx->save.nr_batch_pfns == 0 )
> +    if ( batch_empty(ctx) )
>          return rc;
>  
>      rc = write_batch(ctx);
> @@ -293,19 +309,12 @@ static int flush_batch(struct xc_sr_context *ctx)
>  }
>  
>  /*
> - * Add a single pfn to the batch, flushing the batch if full.
> + * Add a single pfn to the batch.
>   */
> -static int add_to_batch(struct xc_sr_context *ctx, xen_pfn_t pfn)
> +static void add_to_batch(struct xc_sr_context *ctx, xen_pfn_t pfn)
>  {
> -    int rc = 0;
> -
> -    if ( ctx->save.nr_batch_pfns == MAX_BATCH_SIZE )
> -        rc = flush_batch(ctx);
> -
> -    if ( rc == 0 )
> -        ctx->save.batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
> -
> -    return rc;
> +    assert(ctx->save.nr_batch_pfns < MAX_BATCH_SIZE);
> +    ctx->save.batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
>  }
>  
>  /*
> @@ -352,10 +361,15 @@ static int suspend_domain(struct xc_sr_context *ctx)
>   * Send a subset of pages in the guests p2m, according to the dirty bitmap.
>   * Used for each subsequent iteration of the live migration loop.
>   *
> + * During the precopy stage of a live migration, test the user-supplied
> + * policy function after each batch of pages and cut off the operation
> + * early if indicated.  Unless aborting, the dirty pages remaining in this round
> + * are transferred into the deferred_pages bitmap.

Is this actually a sensible thing to do?  On iteration 0, this is going
to be a phenomenal number of RPC calls, which are all going to make the
same decision.

> + *
>   * Bitmap is bounded by p2m_size.
>   */
>  static int send_dirty_pages(struct xc_sr_context *ctx,
> -                            unsigned long entries)
> +                            unsigned long entries, bool precopy)

Shouldn't this precopy boolean be some kind of state variable in ctx ?

>  {
>      xc_interface *xch = ctx->xch;
>      xen_pfn_t p;
> @@ -364,31 +378,57 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
>      DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>                                      &ctx->save.dirty_bitmap_hbuf);
>  
> -    for ( p = 0, written = 0; p < ctx->save.p2m_size; ++p )
> +    int (*precopy_policy)(struct precopy_stats, void *) =
> +        ctx->save.callbacks->precopy_policy;
> +    void *data = ctx->save.callbacks->data;
> +
> +    assert(batch_empty(ctx));
> +    for ( p = 0, written = 0; p < ctx->save.p2m_size; )

This looks suspicious without an increment.  Conceptually, it might be
better as a do {} while ( decision == XGS_POLICY_CONTINUE_PRECOPY ); loop?

>      {
> -        if ( !test_bit(p, dirty_bitmap) )
> -            continue;
> +        if ( ctx->save.live && precopy )
> +        {
> +            ctx->save.policy_decision = precopy_policy(ctx->save.stats, data);

Newline here please.

> +            if ( ctx->save.policy_decision == XGS_POLICY_ABORT )
> +            {

Please but a log message here indicating that abort has been requested. 
Otherwise, the migration will give up with a failure and no obvious
indication why.

> +                return -1;
> +            }
> +            else if ( ctx->save.policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
> +            {
> +                /* Any outstanding dirty pages are now deferred until the next
> +                 * phase of the migration. */

/*
 * The comment style for multiline comments
 * is like this.
 */

> +                bitmap_or(ctx->save.deferred_pages, dirty_bitmap,
> +                          ctx->save.p2m_size);
> +                if ( entries > written )
> +                    ctx->save.nr_deferred_pages += entries - written;
> +
> +                goto done;
> +            }
> +        }
>  
> -        rc = add_to_batch(ctx, p);
> +        for ( ; p < ctx->save.p2m_size && !batch_full(ctx); ++p )
> +        {
> +            if ( test_and_clear_bit(p, dirty_bitmap) )
> +            {
> +                add_to_batch(ctx, p);
> +                ++written;
> +                ++ctx->save.stats.total_written;
> +            }
> +        }
> +
> +        rc = flush_batch(ctx);
>          if ( rc )
>              return rc;
>  
> -        /* Update progress every 4MB worth of memory sent. */
> -        if ( (written & ((1U << (22 - 12)) - 1)) == 0 )
> -            xc_report_progress_step(xch, written, entries);
> -
> -        ++written;
> +        /* Update progress after every batch (4MB) worth of memory sent. */
> +        xc_report_progress_step(xch, written, entries);
>      }
>  
> -    rc = flush_batch(ctx);
> -    if ( rc )
> -        return rc;
> -
>      if ( written > entries )
>          DPRINTF("Bitmap contained more entries than expected...");
>  
>      xc_report_progress_step(xch, entries, entries);
>  
> + done:
>      return ctx->save.ops.check_vm_state(ctx);
>  }
>  
> @@ -396,14 +436,14 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
>   * Send all pages in the guests p2m.  Used as the first iteration of the live
>   * migration loop, and for a non-live save.
>   */
> -static int send_all_pages(struct xc_sr_context *ctx)
> +static int send_all_pages(struct xc_sr_context *ctx, bool precopy)
>  {
>      DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>                                      &ctx->save.dirty_bitmap_hbuf);
>  
>      bitmap_set(dirty_bitmap, ctx->save.p2m_size);
>  
> -    return send_dirty_pages(ctx, ctx->save.p2m_size);
> +    return send_dirty_pages(ctx, ctx->save.p2m_size, precopy);
>  }
>  
>  static int enable_logdirty(struct xc_sr_context *ctx)
> @@ -446,8 +486,7 @@ static int update_progress_string(struct xc_sr_context *ctx,
>      xc_interface *xch = ctx->xch;
>      char *new_str = NULL;
>  
> -    if ( asprintf(&new_str, "Frames iteration %u of %u",
> -                  iter, ctx->save.max_iterations) == -1 )
> +    if ( asprintf(&new_str, "Frames iteration %u", iter) == -1 )
>      {
>          PERROR("Unable to allocate new progress string");
>          return -1;
> @@ -468,20 +507,47 @@ static int send_memory_live(struct xc_sr_context *ctx)
>      xc_interface *xch = ctx->xch;
>      xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
>      char *progress_str = NULL;
> -    unsigned x;
>      int rc;
>  
> +    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
> +                                    &ctx->save.dirty_bitmap_hbuf);
> +
> +    int (*precopy_policy)(struct precopy_stats, void *) =
> +        ctx->save.callbacks->precopy_policy;
> +    void *data = ctx->save.callbacks->data;
> +
>      rc = update_progress_string(ctx, &progress_str, 0);
>      if ( rc )
>          goto out;
>  
> -    rc = send_all_pages(ctx);
> +#define CONSULT_POLICY                                                        \

:(

The reason this code is readable and (hopefully) easy to follow, is due
in large part to a lack of macros like this trying to hide what is
actually going on.

> +    do {                                                                      \
> +        if ( ctx->save.policy_decision == XGS_POLICY_ABORT )                  \
> +        {                                                                     \
> +            rc = -1;                                                          \
> +            goto out;                                                         \
> +        }                                                                     \
> +        else if ( ctx->save.policy_decision != XGS_POLICY_CONTINUE_PRECOPY )  \
> +        {                                                                     \
> +            rc = 0;                                                           \
> +            goto out;                                                         \
> +        }                                                                     \
> +    } while (0)
> +
> +    ctx->save.stats = (struct precopy_stats)
> +        {
> +            .iteration     = 0,
> +            .total_written = 0,
> +            .dirty_count   = -1
> +        };
> +    rc = send_all_pages(ctx, /* precopy */ true);
>      if ( rc )
>          goto out;
>  
> -    for ( x = 1;
> -          ((x < ctx->save.max_iterations) &&
> -           (stats.dirty_count > ctx->save.dirty_threshold)); ++x )
> +    /* send_all_pages() has updated the stats */
> +    CONSULT_POLICY;
> +
> +    for ( ctx->save.stats.iteration = 1; ; ++ctx->save.stats.iteration )

Again, without an exit condition, this looks suspicious.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

  parent reply	other threads:[~2017-03-29 20:18 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-27  9:06 [PATCH RFC 00/20] Add postcopy live migration support Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 01/20] tools: rename COLO 'postcopy' to 'aftercopy' Joshua Otto
2017-03-28 16:34   ` Wei Liu
2017-04-11  6:19     ` Zhang Chen
2017-03-27  9:06 ` [PATCH RFC 02/20] libxc/xc_sr: parameterise write_record() on fd Joshua Otto
2017-03-28 18:53   ` Andrew Cooper
2017-03-31 14:19   ` Wei Liu
2017-03-27  9:06 ` [PATCH RFC 03/20] libxc/xc_sr_restore.c: use write_record() in send_checkpoint_dirty_pfn_list() Joshua Otto
2017-03-28 18:56   ` Andrew Cooper
2017-03-31 14:19   ` Wei Liu
2017-03-27  9:06 ` [PATCH RFC 04/20] libxc/xc_sr_save.c: add WRITE_TRIVIAL_RECORD_FN() Joshua Otto
2017-03-28 19:03   ` Andrew Cooper
2017-03-30  4:28     ` Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 05/20] libxc/xc_sr: factor out filter_pages() Joshua Otto
2017-03-28 19:27   ` Andrew Cooper
2017-03-30  4:42     ` Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 06/20] libxc/xc_sr: factor helpers out of handle_page_data() Joshua Otto
2017-03-28 19:52   ` Andrew Cooper
2017-03-30  4:49     ` Joshua Otto
2017-04-12 15:16       ` Wei Liu
2017-03-27  9:06 ` [PATCH RFC 07/20] migration: defer precopy policy to libxl Joshua Otto
2017-03-29 18:54   ` Jennifer Herbert
2017-03-30  5:28     ` Joshua Otto
2017-03-29 20:18   ` Andrew Cooper [this message]
2017-03-30  5:19     ` Joshua Otto
2017-04-12 15:16       ` Wei Liu
2017-04-18 17:56         ` Ian Jackson
2017-03-27  9:06 ` [PATCH RFC 08/20] libxl/migration: add precopy tuning parameters Joshua Otto
2017-03-29 21:08   ` Andrew Cooper
2017-03-30  6:03     ` Joshua Otto
2017-04-12 15:37       ` Wei Liu
2017-04-27 22:51         ` Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 09/20] libxc/xc_sr_save: introduce save batch types Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 10/20] libxc/xc_sr_save.c: initialise rec.data before free() Joshua Otto
2017-03-28 19:59   ` Andrew Cooper
2017-03-29 17:47     ` Wei Liu
2017-03-27  9:06 ` [PATCH RFC 11/20] libxc/migration: correct hvm record ordering specification Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 12/20] libxc/migration: specify postcopy live migration Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 13/20] libxc/migration: add try_read_record() Joshua Otto
2017-04-12 15:16   ` Wei Liu
2017-03-27  9:06 ` [PATCH RFC 14/20] libxc/migration: implement the sender side of postcopy live migration Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 15/20] libxc/migration: implement the receiver " Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 16/20] libxl/libxl_stream_write.c: track callback chains with an explicit phase Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 17/20] libxl/libxl_stream_read.c: " Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 18/20] libxl/migration: implement the sender side of postcopy live migration Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 19/20] libxl/migration: implement the receiver " Joshua Otto
2017-03-27  9:06 ` [PATCH RFC 20/20] tools: expose postcopy live migration support in libxl and xl Joshua Otto
2017-03-28 14:41 ` [PATCH RFC 00/20] Add postcopy live migration support Wei Liu
2017-03-30  4:13   ` Joshua Otto
2017-03-31 14:19     ` Wei Liu
2017-03-29 22:50 ` Andrew Cooper
2017-03-31  4:51   ` Joshua Otto
2017-04-12 15:38     ` Wei Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=00c2630a-b919-09b5-1a91-116e4a3f9e19@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=czylin@uwaterloo.ca \
    --cc=hjarmstr@uwaterloo.ca \
    --cc=ian.jackson@eu.citrix.com \
    --cc=imhy.yang@gmail.com \
    --cc=jtotto@uwaterloo.ca \
    --cc=wei.liu2@citrix.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.