All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] Introduce migration precopy policy
@ 2017-09-19 18:06 Jennifer Herbert
  2017-09-19 18:06 ` [PATCH v2 1/3] Tidy libxc xc_domain_save Jennifer Herbert
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Jennifer Herbert @ 2017-09-19 18:06 UTC (permalink / raw)
  To: Ian Jackson, Wei Liu, xen-devel, jtotto; +Cc: Jennifer Herbert

Hi all,

v2:

Have tidied some formatting, s.o.b Joshua, and added a RFC patch
showing how to use this in  libxl.

v1:

Here I present a updated/modified subset of patch 7/20, part of
Joshua Otto's "Add postcopy live migration support." patch,
dated 27th March 2017.  As indicated on the original thread,
I wish to make use of this this within the XenServer product,
and hence am trying to get this subset  pulled in now. I also
hope this will aid Josh in pushing the remainder of his series.

Here I present two patches, the first which does some tidy up, removing
unused and unhelpful paramaters to xc_domain_save(), and the second
which allows a precopy callback to be specified, providing the
test for when to end the live phase of migration should end.
If none is provided, a default policy of the current behaviour
is used.

Jennifer Herbert (3):
  Tidy libxc xc_domain_save
  Introduce migration precopy policy
  RFC: migration: defer precopy policy to libxl

 tools/libxc/include/xenguest.h     |  35 +++++++++++--
 tools/libxc/xc_nomigrate.c         |   3 +-
 tools/libxc/xc_sr_common.h         |   6 +--
 tools/libxc/xc_sr_save.c           | 105 ++++++++++++++++++++++++-------------
 tools/libxl/libxl_dom_save.c       |  20 +++++++
 tools/libxl/libxl_save_callout.c   |   4 +-
 tools/libxl/libxl_save_helper.c    |   7 +--
 tools/libxl/libxl_save_msgs_gen.pl |   4 +-
 8 files changed, 130 insertions(+), 54 deletions(-)

-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/3] Tidy libxc xc_domain_save
  2017-09-19 18:06 [PATCH v2 0/3] Introduce migration precopy policy Jennifer Herbert
@ 2017-09-19 18:06 ` Jennifer Herbert
  2017-09-19 18:06 ` [PATCH v2 2/3] Introduce migration precopy policy Jennifer Herbert
  2017-09-19 18:06 ` [PATCH v2 3/3] RFC: migration: defer precopy policy to libxl Jennifer Herbert
  2 siblings, 0 replies; 11+ messages in thread
From: Jennifer Herbert @ 2017-09-19 18:06 UTC (permalink / raw)
  To: Ian Jackson, Wei Liu, xen-devel, jtotto; +Cc: Jennifer Herbert

Tidy up libxc's xc_domain_save, removing unused paramaters
max_iters and max_factor, making matching changes to libxl.

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
Signed-off-by: Jennifer Herbert <Jennifer.Herbert@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxc/include/xenguest.h   | 4 ++--
 tools/libxc/xc_nomigrate.c       | 3 +--
 tools/libxc/xc_sr_save.c         | 8 +++-----
 tools/libxl/libxl_save_callout.c | 4 ++--
 tools/libxl/libxl_save_helper.c  | 7 ++-----
 5 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 5cd8111..6626f0c 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -100,8 +100,8 @@ typedef enum {
  *        doesn't use checkpointing
  * @return 0 on success, -1 on failure
  */
-int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
-                   uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
+                   uint32_t flags /* XCFLAGS_xxx */,
                    struct save_callbacks* callbacks, int hvm,
                    xc_migration_stream_t stream_type, int recv_fd);
 
diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c
index 317c8ce..fe8f68c 100644
--- a/tools/libxc/xc_nomigrate.c
+++ b/tools/libxc/xc_nomigrate.c
@@ -20,8 +20,7 @@
 #include <xenctrl.h>
 #include <xenguest.h>
 
-int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
-                   uint32_t max_factor, uint32_t flags,
+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t flags,
                    struct save_callbacks* callbacks, int hvm,
                    xc_migration_stream_t stream_type, int recv_fd)
 {
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index ca6913b..1e7502d 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -916,9 +916,8 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
 };
 
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
-                   uint32_t max_iters, uint32_t max_factor, uint32_t flags,
-                   struct save_callbacks* callbacks, int hvm,
-                   xc_migration_stream_t stream_type, int recv_fd)
+                   uint32_t flags, struct save_callbacks* callbacks,
+                   int hvm, xc_migration_stream_t stream_type, int recv_fd)
 {
     struct xc_sr_context ctx =
         {
@@ -955,8 +954,7 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
     if ( ctx.save.checkpointed == XC_MIG_STREAM_COLO )
         assert(callbacks->wait_checkpoint);
 
-    DPRINTF("fd %d, dom %u, max_iters %u, max_factor %u, flags %u, hvm %d",
-            io_fd, dom, max_iters, max_factor, flags, hvm);
+    DPRINTF("fd %d, dom %u, flags %u, hvm %d", io_fd, dom, flags, hvm);
 
     if ( xc_domain_getinfo(xch, dom, 1, &ctx.dominfo) != 1 )
     {
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 891c669..6452d70 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -89,8 +89,8 @@ void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_save_state *dss,
         libxl__srm_callout_enumcallbacks_save(&shs->callbacks.save.a);
 
     const unsigned long argnums[] = {
-        dss->domid, 0, 0, dss->xcflags, dss->hvm,
-        cbflags, dss->checkpointed_stream,
+        dss->domid, dss->xcflags, dss->hvm, cbflags,
+        dss->checkpointed_stream,
     };
 
     shs->ao = ao;
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index 1dece23..38089a0 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -251,8 +251,6 @@ int main(int argc, char **argv)
         io_fd =                             atoi(NEXTARG);
         recv_fd =                           atoi(NEXTARG);
         uint32_t dom =                      strtoul(NEXTARG,0,10);
-        uint32_t max_iters =                strtoul(NEXTARG,0,10);
-        uint32_t max_factor =               strtoul(NEXTARG,0,10);
         uint32_t flags =                    strtoul(NEXTARG,0,10);
         int hvm =                           atoi(NEXTARG);
         unsigned cbflags =                  strtoul(NEXTARG,0,10);
@@ -264,9 +262,8 @@ int main(int argc, char **argv)
         startup("save");
         setup_signals(save_signal_handler);
 
-        r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags,
-                           &helper_save_callbacks, hvm, stream_type,
-                           recv_fd);
+        r = xc_domain_save(xch, io_fd, dom, flags, &helper_save_callbacks,
+                           hvm, stream_type, recv_fd);
         complete(r);
 
     } else if (!strcmp(mode,"--restore-domain")) {
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/3] Introduce migration precopy policy
  2017-09-19 18:06 [PATCH v2 0/3] Introduce migration precopy policy Jennifer Herbert
  2017-09-19 18:06 ` [PATCH v2 1/3] Tidy libxc xc_domain_save Jennifer Herbert
@ 2017-09-19 18:06 ` Jennifer Herbert
  2017-09-20  8:35   ` Paul Durrant
  2017-09-20 10:20   ` Roger Pau Monné
  2017-09-19 18:06 ` [PATCH v2 3/3] RFC: migration: defer precopy policy to libxl Jennifer Herbert
  2 siblings, 2 replies; 11+ messages in thread
From: Jennifer Herbert @ 2017-09-19 18:06 UTC (permalink / raw)
  To: Ian Jackson, Wei Liu, xen-devel, jtotto; +Cc: Jennifer Herbert

This Patch allows a migration precopy policy to be specified.

The precopy phase of the xc_domain_save() live migration algorithm has
historically been implemented to run until either a) (almost) no pages
are dirty or b) some fixed, hard-coded maximum number of precopy
iterations has been exceeded.  This policy and its implementation are
less than ideal for a few reasons:
- the logic of the policy is intertwined with the control flow of the
  mechanism of the precopy stage
- it can't take into account facts external to the immediate
  migration context, such external state transfer state, interactive
  user input, or the passage of wall-clock time.
- it does not permit the user to change their mind, over time, about
  what to do at the end of the precopy (they get an unconditional
  transition into the stop-and-copy phase of the migration)

To permit callers to implement arbitrary higher-level policies governing
when the live migration precopy phase should end, and what should be
done next:
- add a precopy_policy() callback to the xc_domain_save() user-supplied
  callbacks
- during the precopy phase of live migrations, consult this policy after
  each batch of pages transmitted and take the dictated action, which
  may be to a) abort the migration entirely, b) continue with the
  precopy, or c) proceed to the stop-and-copy phase.
- provide an implementation of the old policy, used when
  precopy_policy callback  is not provided.

Signed-off-by: Jennifer Herbert <Jennifer.Herbert@citrix.com>
Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>

---

v2:

Have made a few formatting corrections, added typedef as suggested.

v1:

This is updated/modified subset of patch 7/20, part of
Joshua Otto's "Add postcopy live migration support." patch,
dated 27th March 2017.  As indicated on the original thread,
I wish to make use of this this within the XenServer product.
I hope this will aid Josh in pushing the remainder of his series.

---
 tools/libxc/include/xenguest.h | 31 ++++++++++++--
 tools/libxc/xc_sr_common.h     |  6 +--
 tools/libxc/xc_sr_save.c       | 97 +++++++++++++++++++++++++++++-------------
 3 files changed, 97 insertions(+), 37 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 6626f0c..a2a654c 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -39,6 +39,16 @@
  */
 struct xenevtchn_handle;
 
+/* For save's precopy_policy(). */
+struct precopy_stats
+{
+    unsigned int iteration;
+    unsigned int total_written;
+    long dirty_count; /* -1 if unknown */
+};
+
+typedef int (*precopy_policy_t)(struct precopy_stats, void *);
+
 /* callbacks provided by xc_domain_save */
 struct save_callbacks {
     /* Called after expiration of checkpoint interval,
@@ -46,7 +56,20 @@ struct save_callbacks {
      */
     int (*suspend)(void* data);
 
-    /* Called after the guest's dirty pages have been
+    /*
+     * Called after every batch of page data sent during the precopy
+     * phase of a live migration to ask the caller what to do next
+     * based on the current state of the precopy migration.
+     */
+#define XGS_POLICY_ABORT          (-1) /* Abandon the migration entirely
+                                        * and tidy up. */
+#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
+#define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and transmit the
+                                        * remaining dirty pages. */
+    precopy_policy_t precopy_policy;
+
+    /*
+     * Called after the guest's dirty pages have been
      *  copied into an output buffer.
      * Callback function resumes the guest & the device model,
      *  returns to xc_domain_save.
@@ -55,7 +78,8 @@ struct save_callbacks {
      */
     int (*postcopy)(void* data);
 
-    /* Called after the memory checkpoint has been flushed
+    /*
+     * Called after the memory checkpoint has been flushed
      * out into the network. Typical actions performed in this
      * callback include:
      *   (a) send the saved device model state (for HVM guests),
@@ -65,7 +89,8 @@ struct save_callbacks {
      *
      * returns:
      * 0: terminate checkpointing gracefully
-     * 1: take another checkpoint */
+     * 1: take another checkpoint 
+     */
     int (*checkpoint)(void* data);
 
     /*
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22a..3635704 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -198,12 +198,10 @@ struct xc_sr_context
             /* Further debugging information in the stream. */
             bool debug;
 
-            /* Parameters for tweaking live migration. */
-            unsigned max_iterations;
-            unsigned dirty_threshold;
-
             unsigned long p2m_size;
 
+            struct precopy_stats stats;
+
             xen_pfn_t *batch_pfns;
             unsigned nr_batch_pfns;
             unsigned long *deferred_pages;
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 1e7502d..f58c008 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -452,8 +452,7 @@ static int update_progress_string(struct xc_sr_context *ctx,
     xc_interface *xch = ctx->xch;
     char *new_str = NULL;
 
-    if ( asprintf(&new_str, "Frames iteration %u of %u",
-                  iter, ctx->save.max_iterations) == -1 )
+    if ( asprintf(&new_str, "Frames iteration %u", iter) == -1 )
     {
         PERROR("Unable to allocate new progress string");
         return -1;
@@ -467,6 +466,24 @@ static int update_progress_string(struct xc_sr_context *ctx,
 }
 
 /*
+ * This is the live migration precopy policy - it's called periodically during
+ * the precopy phase of live migrations, and is responsible for deciding when
+ * the precopy phase should terminate and what should be done next.
+ *
+ * The policy implemented here behaves identically to the policy previously
+ * hard-coded into xc_domain_save() - it proceeds to the stop-and-copy phase of
+ * the live migration when there are either fewer than 50 dirty pages, or more
+ * than 5 precopy rounds have completed.
+ */
+static int simple_precopy_policy(struct precopy_stats stats, void *user)
+{
+    return ((stats.dirty_count >= 0 && stats.dirty_count < 50) ||
+            stats.iteration >= 5)
+        ? XGS_POLICY_STOP_AND_COPY
+        : XGS_POLICY_CONTINUE_PRECOPY;
+}
+
+/*
  * Send memory while guest is running.
  */
 static int send_memory_live(struct xc_sr_context *ctx)
@@ -474,21 +491,58 @@ static int send_memory_live(struct xc_sr_context *ctx)
     xc_interface *xch = ctx->xch;
     xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
     char *progress_str = NULL;
-    unsigned x;
+    unsigned int x = 0;
     int rc;
+    int policy_decision;
+
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->save.dirty_bitmap_hbuf);
+
+    precopy_policy_t precopy_policy = ctx->save.callbacks->precopy_policy;
+    void *data = ctx->save.callbacks->data;
+
+    struct precopy_stats *policy_stats;
 
     rc = update_progress_string(ctx, &progress_str, 0);
     if ( rc )
         goto out;
 
-    rc = send_all_pages(ctx);
-    if ( rc )
-        goto out;
+    ctx->save.stats = (struct precopy_stats)
+        { .dirty_count   = ctx->save.p2m_size };
+    policy_stats = &ctx->save.stats;
+
+    if ( precopy_policy == NULL )
+         precopy_policy = simple_precopy_policy;
+
+    bitmap_set(dirty_bitmap, ctx->save.p2m_size);
+
+    do {
+        policy_decision = precopy_policy(*policy_stats, data);
+        x++;
+
+        if ( stats.dirty_count > 0 && policy_decision != XGS_POLICY_ABORT ) 
+        {
+            rc = update_progress_string(ctx, &progress_str, x);
+            if ( rc )
+                goto out;
+
+            rc = send_dirty_pages(ctx, stats.dirty_count);
+            if ( rc )
+                goto out;
+        }
+
+        if ( policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
+            break;
+
+        policy_stats->iteration     = x;
+        policy_stats->total_written += policy_stats->dirty_count;
+        policy_stats->dirty_count   = -1;
+
+        policy_decision = precopy_policy(*policy_stats, data);
+
+        if ( policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
+           break;
 
-    for ( x = 1;
-          ((x < ctx->save.max_iterations) &&
-           (stats.dirty_count > ctx->save.dirty_threshold)); ++x )
-    {
         if ( xc_shadow_control(
                  xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
                  &ctx->save.dirty_bitmap_hbuf, ctx->save.p2m_size,
@@ -499,17 +553,9 @@ static int send_memory_live(struct xc_sr_context *ctx)
             goto out;
         }
 
-        if ( stats.dirty_count == 0 )
-            break;
+        policy_stats->dirty_count = stats.dirty_count;
 
-        rc = update_progress_string(ctx, &progress_str, x);
-        if ( rc )
-            goto out;
-
-        rc = send_dirty_pages(ctx, stats.dirty_count);
-        if ( rc )
-            goto out;
-    }
+    } while ( true );
 
  out:
     xc_set_progress_prefix(xch, NULL);
@@ -601,7 +647,7 @@ static int suspend_and_send_dirty(struct xc_sr_context *ctx)
     if ( ctx->save.live )
     {
         rc = update_progress_string(ctx, &progress_str,
-                                    ctx->save.max_iterations);
+                                    ctx->save.stats.iteration);
         if ( rc )
             goto out;
     }
@@ -937,15 +983,6 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
            stream_type == XC_MIG_STREAM_REMUS ||
            stream_type == XC_MIG_STREAM_COLO);
 
-    /*
-     * TODO: Find some time to better tweak the live migration algorithm.
-     *
-     * These parameters are better than the legacy algorithm especially for
-     * busy guests.
-     */
-    ctx.save.max_iterations = 5;
-    ctx.save.dirty_threshold = 50;
-
     /* Sanity checks for callbacks. */
     if ( hvm )
         assert(callbacks->switch_qemu_logdirty);
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 3/3] RFC: migration: defer precopy policy to libxl
  2017-09-19 18:06 [PATCH v2 0/3] Introduce migration precopy policy Jennifer Herbert
  2017-09-19 18:06 ` [PATCH v2 1/3] Tidy libxc xc_domain_save Jennifer Herbert
  2017-09-19 18:06 ` [PATCH v2 2/3] Introduce migration precopy policy Jennifer Herbert
@ 2017-09-19 18:06 ` Jennifer Herbert
  2017-09-20 10:37   ` Roger Pau Monné
  2 siblings, 1 reply; 11+ messages in thread
From: Jennifer Herbert @ 2017-09-19 18:06 UTC (permalink / raw)
  To: Ian Jackson, Wei Liu, xen-devel, jtotto; +Cc: Jennifer Herbert

Provide an implementation of the old policy as a callback in
libxl and plumb it through the IPC machinery to libxc.

This serves as an example for defining a libxl policy,
and provides no advantage over the default policy in libxc.

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
---

I have included this patch, as rfc, as requested by Ian, to show how
libxl can provide a migration precopy policy. This was part of the
same larger patch from Joshua - I have not changed or tested it.

 tools/libxl/libxl_dom_save.c       | 20 ++++++++++++++++++++
 tools/libxl/libxl_save_msgs_gen.pl |  4 +++-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 77fe30e..6d28cce 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -328,6 +328,25 @@ int libxl__save_emulator_xenstore_data(libxl__domain_save_state *dss,
     return rc;
 }
 
+/*
+ * This is the live migration precopy policy - it's called periodically during
+ * the precopy phase of live migrations, and is responsible for deciding when
+ * the precopy phase should terminate and what should be done next.
+ *
+ * The policy implemented here behaves identically to the policy previously
+ * hard-coded into xc_domain_save() - it proceeds to the stop-and-copy phase of
+ * the live migration when there are either fewer than 50 dirty pages, or more
+ * than 5 precopy rounds have completed.
+ */
+static int libxl__save_live_migration_simple_precopy_policy(
+    struct precopy_stats stats, void *user)
+{
+    return ((stats.dirty_count >= 0 && stats.dirty_count < 50) ||
+            stats.iteration >= 5)
+        ? XGS_POLICY_STOP_AND_COPY
+        : XGS_POLICY_CONTINUE_PRECOPY;
+}
+
 /*----- main code for saving, in order of execution -----*/
 
 void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
@@ -401,6 +420,7 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
     if (dss->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_NONE)
         callbacks->suspend = libxl__domain_suspend_callback;
 
+    callbacks->precopy_policy = libxl__save_live_migration_simple_precopy_policy;
     callbacks->switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty;
 
     dss->sws.ao  = dss->ao;
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index 3ae7373..bb1d4e9 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -33,6 +33,7 @@ our @msgs = (
                                               'xen_pfn_t', 'console_gfn'] ],
     [  9, 'srW',    "complete",              [qw(int retval
                                                  int errnoval)] ],
+    [ 10, 'scxW',   "precopy_policy", ['struct precopy_stats', 'stats'] ]
 );
 
 #----------------------------------------
@@ -141,7 +142,8 @@ static void bytes_put(unsigned char *const buf, int *len,
 
 END
 
-foreach my $simpletype (qw(int uint16_t uint32_t unsigned), 'unsigned long', 'xen_pfn_t') {
+foreach my $simpletype (qw(int uint16_t uint32_t unsigned),
+                        'unsigned long', 'xen_pfn_t', 'struct precopy_stats') {
     my $typeid = typeid($simpletype);
     $out_body{'callout'} .= <<END;
 static int ${typeid}_get(const unsigned char **msg,
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] Introduce migration precopy policy
  2017-09-19 18:06 ` [PATCH v2 2/3] Introduce migration precopy policy Jennifer Herbert
@ 2017-09-20  8:35   ` Paul Durrant
  2017-09-20 10:20   ` Roger Pau Monné
  1 sibling, 0 replies; 11+ messages in thread
From: Paul Durrant @ 2017-09-20  8:35 UTC (permalink / raw)
  To: Ian Jackson, Wei Liu, xen-devel, jtotto; +Cc: Jennifer Herbert

> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of
> Jennifer Herbert
> Sent: 19 September 2017 19:06
> To: Ian Jackson <Ian.Jackson@citrix.com>; Wei Liu <wei.liu2@citrix.com>;
> xen-devel@lists.xenproject.org; jtotto@uwaterloo.ca
> Cc: Jennifer Herbert <jennifer.herbert@citrix.com>
> Subject: [Xen-devel] [PATCH v2 2/3] Introduce migration precopy policy
> 
> This Patch allows a migration precopy policy to be specified.
> 
> The precopy phase of the xc_domain_save() live migration algorithm has
> historically been implemented to run until either a) (almost) no pages
> are dirty or b) some fixed, hard-coded maximum number of precopy
> iterations has been exceeded.  This policy and its implementation are
> less than ideal for a few reasons:
> - the logic of the policy is intertwined with the control flow of the
>   mechanism of the precopy stage
> - it can't take into account facts external to the immediate
>   migration context, such external state transfer state, interactive
>   user input, or the passage of wall-clock time.
> - it does not permit the user to change their mind, over time, about
>   what to do at the end of the precopy (they get an unconditional
>   transition into the stop-and-copy phase of the migration)
> 
> To permit callers to implement arbitrary higher-level policies governing
> when the live migration precopy phase should end, and what should be
> done next:
> - add a precopy_policy() callback to the xc_domain_save() user-supplied
>   callbacks
> - during the precopy phase of live migrations, consult this policy after
>   each batch of pages transmitted and take the dictated action, which
>   may be to a) abort the migration entirely, b) continue with the
>   precopy, or c) proceed to the stop-and-copy phase.
> - provide an implementation of the old policy, used when
>   precopy_policy callback  is not provided.
> 
> Signed-off-by: Jennifer Herbert <Jennifer.Herbert@citrix.com>
> Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
> 
> ---
> 
> v2:
> 
> Have made a few formatting corrections, added typedef as suggested.
> 
> v1:
> 
> This is updated/modified subset of patch 7/20, part of
> Joshua Otto's "Add postcopy live migration support." patch,
> dated 27th March 2017.  As indicated on the original thread,
> I wish to make use of this this within the XenServer product.
> I hope this will aid Josh in pushing the remainder of his series.
> 
> ---
>  tools/libxc/include/xenguest.h | 31 ++++++++++++--
>  tools/libxc/xc_sr_common.h     |  6 +--
>  tools/libxc/xc_sr_save.c       | 97 +++++++++++++++++++++++++++++--------
> -----
>  3 files changed, 97 insertions(+), 37 deletions(-)
> 
> diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
> index 6626f0c..a2a654c 100644
> --- a/tools/libxc/include/xenguest.h
> +++ b/tools/libxc/include/xenguest.h
> @@ -39,6 +39,16 @@
>   */
>  struct xenevtchn_handle;
> 
> +/* For save's precopy_policy(). */
> +struct precopy_stats
> +{
> +    unsigned int iteration;
> +    unsigned int total_written;
> +    long dirty_count; /* -1 if unknown */
> +};
> +
> +typedef int (*precopy_policy_t)(struct precopy_stats, void *);
> +
>  /* callbacks provided by xc_domain_save */
>  struct save_callbacks {
>      /* Called after expiration of checkpoint interval,
> @@ -46,7 +56,20 @@ struct save_callbacks {
>       */
>      int (*suspend)(void* data);
> 
> -    /* Called after the guest's dirty pages have been
> +    /*
> +     * Called after every batch of page data sent during the precopy
> +     * phase of a live migration to ask the caller what to do next
> +     * based on the current state of the precopy migration.
> +     */
> +#define XGS_POLICY_ABORT          (-1) /* Abandon the migration entirely
> +                                        * and tidy up. */
> +#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy
> phase. */
> +#define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and
> transmit the
> +                                        * remaining dirty pages. */
> +    precopy_policy_t precopy_policy;
> +
> +    /*
> +     * Called after the guest's dirty pages have been
>       *  copied into an output buffer.
>       * Callback function resumes the guest & the device model,
>       *  returns to xc_domain_save.
> @@ -55,7 +78,8 @@ struct save_callbacks {
>       */
>      int (*postcopy)(void* data);
> 
> -    /* Called after the memory checkpoint has been flushed
> +    /*
> +     * Called after the memory checkpoint has been flushed
>       * out into the network. Typical actions performed in this
>       * callback include:
>       *   (a) send the saved device model state (for HVM guests),
> @@ -65,7 +89,8 @@ struct save_callbacks {
>       *
>       * returns:
>       * 0: terminate checkpointing gracefully
> -     * 1: take another checkpoint */
> +     * 1: take another checkpoint
> +     */
>      int (*checkpoint)(void* data);
> 
>      /*
> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
> index a83f22a..3635704 100644
> --- a/tools/libxc/xc_sr_common.h
> +++ b/tools/libxc/xc_sr_common.h
> @@ -198,12 +198,10 @@ struct xc_sr_context
>              /* Further debugging information in the stream. */
>              bool debug;
> 
> -            /* Parameters for tweaking live migration. */
> -            unsigned max_iterations;
> -            unsigned dirty_threshold;
> -
>              unsigned long p2m_size;
> 
> +            struct precopy_stats stats;
> +
>              xen_pfn_t *batch_pfns;
>              unsigned nr_batch_pfns;
>              unsigned long *deferred_pages;
> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> index 1e7502d..f58c008 100644
> --- a/tools/libxc/xc_sr_save.c
> +++ b/tools/libxc/xc_sr_save.c
> @@ -452,8 +452,7 @@ static int update_progress_string(struct
> xc_sr_context *ctx,
>      xc_interface *xch = ctx->xch;
>      char *new_str = NULL;
> 
> -    if ( asprintf(&new_str, "Frames iteration %u of %u",
> -                  iter, ctx->save.max_iterations) == -1 )
> +    if ( asprintf(&new_str, "Frames iteration %u", iter) == -1 )
>      {
>          PERROR("Unable to allocate new progress string");
>          return -1;
> @@ -467,6 +466,24 @@ static int update_progress_string(struct
> xc_sr_context *ctx,
>  }
> 
>  /*
> + * This is the live migration precopy policy - it's called periodically during
> + * the precopy phase of live migrations, and is responsible for deciding
> when
> + * the precopy phase should terminate and what should be done next.
> + *
> + * The policy implemented here behaves identically to the policy previously
> + * hard-coded into xc_domain_save() - it proceeds to the stop-and-copy
> phase of
> + * the live migration when there are either fewer than 50 dirty pages, or
> more
> + * than 5 precopy rounds have completed.
> + */
> +static int simple_precopy_policy(struct precopy_stats stats, void *user)
> +{
> +    return ((stats.dirty_count >= 0 && stats.dirty_count < 50) ||
> +            stats.iteration >= 5)
> +        ? XGS_POLICY_STOP_AND_COPY
> +        : XGS_POLICY_CONTINUE_PRECOPY;
> +}
> +
> +/*
>   * Send memory while guest is running.
>   */
>  static int send_memory_live(struct xc_sr_context *ctx)
> @@ -474,21 +491,58 @@ static int send_memory_live(struct xc_sr_context
> *ctx)
>      xc_interface *xch = ctx->xch;
>      xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
>      char *progress_str = NULL;
> -    unsigned x;
> +    unsigned int x = 0;
>      int rc;
> +    int policy_decision;
> +
> +    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
> +                                    &ctx->save.dirty_bitmap_hbuf);
> +
> +    precopy_policy_t precopy_policy = ctx->save.callbacks->precopy_policy;
> +    void *data = ctx->save.callbacks->data;
> +
> +    struct precopy_stats *policy_stats;
> 
>      rc = update_progress_string(ctx, &progress_str, 0);
>      if ( rc )
>          goto out;
> 
> -    rc = send_all_pages(ctx);
> -    if ( rc )
> -        goto out;
> +    ctx->save.stats = (struct precopy_stats)
> +        { .dirty_count   = ctx->save.p2m_size };
> +    policy_stats = &ctx->save.stats;
> +
> +    if ( precopy_policy == NULL )
> +         precopy_policy = simple_precopy_policy;
> +
> +    bitmap_set(dirty_bitmap, ctx->save.p2m_size);
> +
> +    do {
> +        policy_decision = precopy_policy(*policy_stats, data);
> +        x++;
> +
> +        if ( stats.dirty_count > 0 && policy_decision != XGS_POLICY_ABORT )
> +        {
> +            rc = update_progress_string(ctx, &progress_str, x);
> +            if ( rc )
> +                goto out;
> +
> +            rc = send_dirty_pages(ctx, stats.dirty_count);
> +            if ( rc )
> +                goto out;
> +        }
> +
> +        if ( policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
> +            break;
> +
> +        policy_stats->iteration     = x;
> +        policy_stats->total_written += policy_stats->dirty_count;
> +        policy_stats->dirty_count   = -1;
> +
> +        policy_decision = precopy_policy(*policy_stats, data);
> +
> +        if ( policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
> +           break;
> 
> -    for ( x = 1;
> -          ((x < ctx->save.max_iterations) &&
> -           (stats.dirty_count > ctx->save.dirty_threshold)); ++x )
> -    {
>          if ( xc_shadow_control(
>                   xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
>                   &ctx->save.dirty_bitmap_hbuf, ctx->save.p2m_size,
> @@ -499,17 +553,9 @@ static int send_memory_live(struct xc_sr_context
> *ctx)
>              goto out;
>          }
> 
> -        if ( stats.dirty_count == 0 )
> -            break;
> +        policy_stats->dirty_count = stats.dirty_count;
> 
> -        rc = update_progress_string(ctx, &progress_str, x);
> -        if ( rc )
> -            goto out;
> -
> -        rc = send_dirty_pages(ctx, stats.dirty_count);
> -        if ( rc )
> -            goto out;
> -    }
> +    } while ( true );

I'm sure any compiler worth its salt will optimise to an unconditional jump, but I tend to prefer using for (;;) for infinite loops. Doesn't really matter so...

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> 
>   out:
>      xc_set_progress_prefix(xch, NULL);
> @@ -601,7 +647,7 @@ static int suspend_and_send_dirty(struct
> xc_sr_context *ctx)
>      if ( ctx->save.live )
>      {
>          rc = update_progress_string(ctx, &progress_str,
> -                                    ctx->save.max_iterations);
> +                                    ctx->save.stats.iteration);
>          if ( rc )
>              goto out;
>      }
> @@ -937,15 +983,6 @@ int xc_domain_save(xc_interface *xch, int io_fd,
> uint32_t dom,
>             stream_type == XC_MIG_STREAM_REMUS ||
>             stream_type == XC_MIG_STREAM_COLO);
> 
> -    /*
> -     * TODO: Find some time to better tweak the live migration algorithm.
> -     *
> -     * These parameters are better than the legacy algorithm especially for
> -     * busy guests.
> -     */
> -    ctx.save.max_iterations = 5;
> -    ctx.save.dirty_threshold = 50;
> -
>      /* Sanity checks for callbacks. */
>      if ( hvm )
>          assert(callbacks->switch_qemu_logdirty);
> --
> 1.8.3.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] Introduce migration precopy policy
  2017-09-19 18:06 ` [PATCH v2 2/3] Introduce migration precopy policy Jennifer Herbert
  2017-09-20  8:35   ` Paul Durrant
@ 2017-09-20 10:20   ` Roger Pau Monné
  2017-09-20 16:18     ` Jennifer Herbert
  1 sibling, 1 reply; 11+ messages in thread
From: Roger Pau Monné @ 2017-09-20 10:20 UTC (permalink / raw)
  To: Jennifer Herbert; +Cc: Wei Liu, Ian Jackson, jtotto, xen-devel

On Tue, Sep 19, 2017 at 07:06:26PM +0100, Jennifer Herbert wrote:
> This Patch allows a migration precopy policy to be specified.
> 
> The precopy phase of the xc_domain_save() live migration algorithm has
> historically been implemented to run until either a) (almost) no pages
> are dirty or b) some fixed, hard-coded maximum number of precopy
> iterations has been exceeded.  This policy and its implementation are
> less than ideal for a few reasons:
> - the logic of the policy is intertwined with the control flow of the
>   mechanism of the precopy stage
> - it can't take into account facts external to the immediate
>   migration context, such external state transfer state, interactive
>   user input, or the passage of wall-clock time.
> - it does not permit the user to change their mind, over time, about
>   what to do at the end of the precopy (they get an unconditional
>   transition into the stop-and-copy phase of the migration)
> 
> To permit callers to implement arbitrary higher-level policies governing
> when the live migration precopy phase should end, and what should be
> done next:
> - add a precopy_policy() callback to the xc_domain_save() user-supplied
>   callbacks
> - during the precopy phase of live migrations, consult this policy after
>   each batch of pages transmitted and take the dictated action, which
>   may be to a) abort the migration entirely, b) continue with the
>   precopy, or c) proceed to the stop-and-copy phase.
> - provide an implementation of the old policy, used when
>   precopy_policy callback  is not provided.
> 
> Signed-off-by: Jennifer Herbert <Jennifer.Herbert@citrix.com>
> Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
> 
> ---
> 
> v2:
> 
> Have made a few formatting corrections, added typedef as suggested.
> 
> v1:
> 
> This is updated/modified subset of patch 7/20, part of
> Joshua Otto's "Add postcopy live migration support." patch,
> dated 27th March 2017.  As indicated on the original thread,
> I wish to make use of this this within the XenServer product.
> I hope this will aid Josh in pushing the remainder of his series.
> 
> ---
>  tools/libxc/include/xenguest.h | 31 ++++++++++++--
>  tools/libxc/xc_sr_common.h     |  6 +--
>  tools/libxc/xc_sr_save.c       | 97 +++++++++++++++++++++++++++++-------------
>  3 files changed, 97 insertions(+), 37 deletions(-)
> 
> diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
> index 6626f0c..a2a654c 100644
> --- a/tools/libxc/include/xenguest.h
> +++ b/tools/libxc/include/xenguest.h
> @@ -39,6 +39,16 @@
>   */
>  struct xenevtchn_handle;
>  
> +/* For save's precopy_policy(). */
> +struct precopy_stats
> +{
> +    unsigned int iteration;
> +    unsigned int total_written;
> +    long dirty_count; /* -1 if unknown */
> +};
> +
> +typedef int (*precopy_policy_t)(struct precopy_stats, void *);

Shouldn't precopy_stats be a pointer (const pointer probably seeing
it's usage)?

> +
>  /* callbacks provided by xc_domain_save */
>  struct save_callbacks {
>      /* Called after expiration of checkpoint interval,
> @@ -46,7 +56,20 @@ struct save_callbacks {
>       */
>      int (*suspend)(void* data);
>  
> -    /* Called after the guest's dirty pages have been
> +    /*
> +     * Called after every batch of page data sent during the precopy
> +     * phase of a live migration to ask the caller what to do next
> +     * based on the current state of the precopy migration.

I would add:

"Should return one of the values listed below:"

> +     */
> +#define XGS_POLICY_ABORT          (-1) /* Abandon the migration entirely
> +                                        * and tidy up. */
> +#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
> +#define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and transmit the
> +                                        * remaining dirty pages. */
> +    precopy_policy_t precopy_policy;
> +
> +    /*
> +     * Called after the guest's dirty pages have been
>       *  copied into an output buffer.
>       * Callback function resumes the guest & the device model,
>       *  returns to xc_domain_save.
> @@ -55,7 +78,8 @@ struct save_callbacks {
>       */
>      int (*postcopy)(void* data);
>  
> -    /* Called after the memory checkpoint has been flushed
> +    /*
> +     * Called after the memory checkpoint has been flushed
>       * out into the network. Typical actions performed in this
>       * callback include:
>       *   (a) send the saved device model state (for HVM guests),
> @@ -65,7 +89,8 @@ struct save_callbacks {
>       *
>       * returns:
>       * 0: terminate checkpointing gracefully
> -     * 1: take another checkpoint */
> +     * 1: take another checkpoint 
                                    ^ trailing space
> +     */
>      int (*checkpoint)(void* data);
>  
>      /*
> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
> index a83f22a..3635704 100644
> --- a/tools/libxc/xc_sr_common.h
> +++ b/tools/libxc/xc_sr_common.h
> @@ -198,12 +198,10 @@ struct xc_sr_context
>              /* Further debugging information in the stream. */
>              bool debug;
>  
> -            /* Parameters for tweaking live migration. */
> -            unsigned max_iterations;
> -            unsigned dirty_threshold;
> -
>              unsigned long p2m_size;
>  
> +            struct precopy_stats stats;
> +
>              xen_pfn_t *batch_pfns;
>              unsigned nr_batch_pfns;
>              unsigned long *deferred_pages;
> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> index 1e7502d..f58c008 100644
> --- a/tools/libxc/xc_sr_save.c
> +++ b/tools/libxc/xc_sr_save.c
> @@ -452,8 +452,7 @@ static int update_progress_string(struct xc_sr_context *ctx,
>      xc_interface *xch = ctx->xch;
>      char *new_str = NULL;
>  
> -    if ( asprintf(&new_str, "Frames iteration %u of %u",
> -                  iter, ctx->save.max_iterations) == -1 )
> +    if ( asprintf(&new_str, "Frames iteration %u", iter) == -1 )

Why you still need iter here? You could easily fetch it from
ctx->save.stats.iteration

>      {
>          PERROR("Unable to allocate new progress string");
>          return -1;
> @@ -467,6 +466,24 @@ static int update_progress_string(struct xc_sr_context *ctx,
>  }
>  
>  /*
> + * This is the live migration precopy policy - it's called periodically during
> + * the precopy phase of live migrations, and is responsible for deciding when
> + * the precopy phase should terminate and what should be done next.
> + *
> + * The policy implemented here behaves identically to the policy previously
> + * hard-coded into xc_domain_save() - it proceeds to the stop-and-copy phase of
> + * the live migration when there are either fewer than 50 dirty pages, or more
> + * than 5 precopy rounds have completed.
> + */
> +static int simple_precopy_policy(struct precopy_stats stats, void *user)
> +{
> +    return ((stats.dirty_count >= 0 && stats.dirty_count < 50) ||
> +            stats.iteration >= 5)

While here, could you make those values defines?

> +        ? XGS_POLICY_STOP_AND_COPY
> +        : XGS_POLICY_CONTINUE_PRECOPY;
> +}
> +
> +/*
>   * Send memory while guest is running.
>   */
>  static int send_memory_live(struct xc_sr_context *ctx)
> @@ -474,21 +491,58 @@ static int send_memory_live(struct xc_sr_context *ctx)
>      xc_interface *xch = ctx->xch;
>      xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
>      char *progress_str = NULL;
> -    unsigned x;
> +    unsigned int x = 0;
>      int rc;
> +    int policy_decision;
> +
> +    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
> +                                    &ctx->save.dirty_bitmap_hbuf);
> +
> +    precopy_policy_t precopy_policy = ctx->save.callbacks->precopy_policy;
> +    void *data = ctx->save.callbacks->data;
> +
> +    struct precopy_stats *policy_stats;
>  
>      rc = update_progress_string(ctx, &progress_str, 0);
>      if ( rc )
>          goto out;
>  
> -    rc = send_all_pages(ctx);
> -    if ( rc )
> -        goto out;
> +    ctx->save.stats = (struct precopy_stats)
> +        { .dirty_count   = ctx->save.p2m_size };

This is exactly the same as 'stats' at this point. I'm slightly
confused about why you need 2 different stats variable, plus a pointer
to a stats variable (stats, ctx->save.stats and *policy_stats).

> +    policy_stats = &ctx->save.stats;
> +
> +    if ( precopy_policy == NULL )
> +         precopy_policy = simple_precopy_policy;
> +
> +    bitmap_set(dirty_bitmap, ctx->save.p2m_size);
> +
> +    do {
> +        policy_decision = precopy_policy(*policy_stats, data);

The comment at the top says:

"Called after every batch of page data sent during the precopy phase"

Yet here the hook seems to be called before any processing has been
done for the first iteration of the loop.

> +        x++;

Also updating x here seems weird, we completely ignore iteration 0.

> +
> +        if ( stats.dirty_count > 0 && policy_decision != XGS_POLICY_ABORT ) 

Trailing space at the end of the line.

> +        {
> +            rc = update_progress_string(ctx, &progress_str, x);
> +            if ( rc )
> +                goto out;
> +
> +            rc = send_dirty_pages(ctx, stats.dirty_count);
> +            if ( rc )
> +                goto out;
> +        }
> +
> +        if ( policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
> +            break;
> +
> +        policy_stats->iteration     = x;
> +        policy_stats->total_written += policy_stats->dirty_count;
> +        policy_stats->dirty_count   = -1;
> +
> +        policy_decision = precopy_policy(*policy_stats, data);
> +
> +        if ( policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
> +           break;
>  
> -    for ( x = 1;
> -          ((x < ctx->save.max_iterations) &&
> -           (stats.dirty_count > ctx->save.dirty_threshold)); ++x )
> -    {
>          if ( xc_shadow_control(
>                   xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
>                   &ctx->save.dirty_bitmap_hbuf, ctx->save.p2m_size,
> @@ -499,17 +553,9 @@ static int send_memory_live(struct xc_sr_context *ctx)
>              goto out;
>          }
>  
> -        if ( stats.dirty_count == 0 )
> -            break;
> +        policy_stats->dirty_count = stats.dirty_count;
>  
> -        rc = update_progress_string(ctx, &progress_str, x);
> -        if ( rc )
> -            goto out;
> -
> -        rc = send_dirty_pages(ctx, stats.dirty_count);
> -        if ( rc )
> -            goto out;
> -    }
> +    } while ( true );
>  
>   out:
>      xc_set_progress_prefix(xch, NULL);
> @@ -601,7 +647,7 @@ static int suspend_and_send_dirty(struct xc_sr_context *ctx)
>      if ( ctx->save.live )
>      {
>          rc = update_progress_string(ctx, &progress_str,
> -                                    ctx->save.max_iterations);
> +                                    ctx->save.stats.iteration);

Hm, this as mentioned above seems redundant (ctx is already a
parameter of update_progress_string.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/3] RFC: migration: defer precopy policy to libxl
  2017-09-19 18:06 ` [PATCH v2 3/3] RFC: migration: defer precopy policy to libxl Jennifer Herbert
@ 2017-09-20 10:37   ` Roger Pau Monné
  0 siblings, 0 replies; 11+ messages in thread
From: Roger Pau Monné @ 2017-09-20 10:37 UTC (permalink / raw)
  To: Jennifer Herbert; +Cc: Wei Liu, Ian Jackson, jtotto, xen-devel

On Tue, Sep 19, 2017 at 07:06:27PM +0100, Jennifer Herbert wrote:
> Provide an implementation of the old policy as a callback in
> libxl and plumb it through the IPC machinery to libxc.
> 
> This serves as an example for defining a libxl policy,
> and provides no advantage over the default policy in libxc.
> 
> Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>

LGTM:

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Have we ever thought of providing libxl consumers the ability to set
these hooks? Would that be interesting?

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] Introduce migration precopy policy
  2017-09-20 10:20   ` Roger Pau Monné
@ 2017-09-20 16:18     ` Jennifer Herbert
  2017-09-21 11:08       ` Roger Pau Monné
  0 siblings, 1 reply; 11+ messages in thread
From: Jennifer Herbert @ 2017-09-20 16:18 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Wei Liu, Ian Jackson, jtotto, xen-devel

On 20/09/17 11:20, Roger Pau Monné wrote:
> On Tue, Sep 19, 2017 at 07:06:26PM +0100, Jennifer Herbert wrote:
>> This Patch allows a migration precopy policy to be specified.
>>
>> The precopy phase of the xc_domain_save() live migration algorithm has
>> historically been implemented to run until either a) (almost) no pages
>> are dirty or b) some fixed, hard-coded maximum number of precopy
>> iterations has been exceeded.  This policy and its implementation are
>> less than ideal for a few reasons:
>> - the logic of the policy is intertwined with the control flow of the
>>    mechanism of the precopy stage
>> - it can't take into account facts external to the immediate
>>    migration context, such external state transfer state, interactive
>>    user input, or the passage of wall-clock time.
>> - it does not permit the user to change their mind, over time, about
>>    what to do at the end of the precopy (they get an unconditional
>>    transition into the stop-and-copy phase of the migration)
>>
>> To permit callers to implement arbitrary higher-level policies governing
>> when the live migration precopy phase should end, and what should be
>> done next:
>> - add a precopy_policy() callback to the xc_domain_save() user-supplied
>>    callbacks
>> - during the precopy phase of live migrations, consult this policy after
>>    each batch of pages transmitted and take the dictated action, which
>>    may be to a) abort the migration entirely, b) continue with the
>>    precopy, or c) proceed to the stop-and-copy phase.
>> - provide an implementation of the old policy, used when
>>    precopy_policy callback  is not provided.
>>
>> Signed-off-by: Jennifer Herbert <Jennifer.Herbert@citrix.com>
>> Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
>>
>> ---
>>
>> v2:
>>
>> Have made a few formatting corrections, added typedef as suggested.
>>
>> v1:
>>
>> This is updated/modified subset of patch 7/20, part of
>> Joshua Otto's "Add postcopy live migration support." patch,
>> dated 27th March 2017.  As indicated on the original thread,
>> I wish to make use of this this within the XenServer product.
>> I hope this will aid Josh in pushing the remainder of his series.
>>
>> ---
>>   tools/libxc/include/xenguest.h | 31 ++++++++++++--
>>   tools/libxc/xc_sr_common.h     |  6 +--
>>   tools/libxc/xc_sr_save.c       | 97 +++++++++++++++++++++++++++++-------------
>>   3 files changed, 97 insertions(+), 37 deletions(-)
>>
>> diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
>> index 6626f0c..a2a654c 100644
>> --- a/tools/libxc/include/xenguest.h
>> +++ b/tools/libxc/include/xenguest.h
>> @@ -39,6 +39,16 @@
>>    */
>>   struct xenevtchn_handle;
>>   
>> +/* For save's precopy_policy(). */
>> +struct precopy_stats
>> +{
>> +    unsigned int iteration;
>> +    unsigned int total_written;
>> +    long dirty_count; /* -1 if unknown */
>> +};
>> +
>> +typedef int (*precopy_policy_t)(struct precopy_stats, void *);
> Shouldn't precopy_stats be a pointer (const pointer probably seeing
> it's usage)?

In April Joshua described how he did it like this to help with IPC 
plumbing into libxl.
Ian Jackson explained that you can't pass a pointer across the IPC 
boundary. The two
bits of code run in different processes, with different address spaces.

Since the precopy_stats structure is tiny, it was concluded it would 
have very small impact on performance.
I'd also agree since a pointer to this structure would almost half as 
big at the structure itself.


I think I'll add a comment above the line to explain the decision.

>> +
>>   /* callbacks provided by xc_domain_save */
>>   struct save_callbacks {
>>       /* Called after expiration of checkpoint interval,
>> @@ -46,7 +56,20 @@ struct save_callbacks {
>>        */
>>       int (*suspend)(void* data);
>>   
>> -    /* Called after the guest's dirty pages have been
>> +    /*
>> +     * Called after every batch of page data sent during the precopy
>> +     * phase of a live migration to ask the caller what to do next
>> +     * based on the current state of the precopy migration.
> I would add:
>
> "Should return one of the values listed below:"
>
>> +     */
>> +#define XGS_POLICY_ABORT          (-1) /* Abandon the migration entirely
>> +                                        * and tidy up. */
>> +#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
>> +#define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and transmit the
>> +                                        * remaining dirty pages. */
>> +    precopy_policy_t precopy_policy;
>> +
>> +    /*
>> +     * Called after the guest's dirty pages have been
>>        *  copied into an output buffer.
>>        * Callback function resumes the guest & the device model,
>>        *  returns to xc_domain_save.
>> @@ -55,7 +78,8 @@ struct save_callbacks {
>>        */
>>       int (*postcopy)(void* data);
>>   
>> -    /* Called after the memory checkpoint has been flushed
>> +    /*
>> +     * Called after the memory checkpoint has been flushed
>>        * out into the network. Typical actions performed in this
>>        * callback include:
>>        *   (a) send the saved device model state (for HVM guests),
>> @@ -65,7 +89,8 @@ struct save_callbacks {
>>        *
>>        * returns:
>>        * 0: terminate checkpointing gracefully
>> -     * 1: take another checkpoint */
>> +     * 1: take another checkpoint
>                                      ^ trailing space
>> +     */
>>       int (*checkpoint)(void* data);
>>   
>>       /*
>> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
>> index a83f22a..3635704 100644
>> --- a/tools/libxc/xc_sr_common.h
>> +++ b/tools/libxc/xc_sr_common.h
>> @@ -198,12 +198,10 @@ struct xc_sr_context
>>               /* Further debugging information in the stream. */
>>               bool debug;
>>   
>> -            /* Parameters for tweaking live migration. */
>> -            unsigned max_iterations;
>> -            unsigned dirty_threshold;
>> -
>>               unsigned long p2m_size;
>>   
>> +            struct precopy_stats stats;
>> +
>>               xen_pfn_t *batch_pfns;
>>               unsigned nr_batch_pfns;
>>               unsigned long *deferred_pages;
>> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
>> index 1e7502d..f58c008 100644
>> --- a/tools/libxc/xc_sr_save.c
>> +++ b/tools/libxc/xc_sr_save.c
>> @@ -452,8 +452,7 @@ static int update_progress_string(struct xc_sr_context *ctx,
>>       xc_interface *xch = ctx->xch;
>>       char *new_str = NULL;
>>   
>> -    if ( asprintf(&new_str, "Frames iteration %u of %u",
>> -                  iter, ctx->save.max_iterations) == -1 )
>> +    if ( asprintf(&new_str, "Frames iteration %u", iter) == -1 )
> Why you still need iter here? You could easily fetch it from
> ctx->save.stats.iteration

Good spot - will remove.

>>       {
>>           PERROR("Unable to allocate new progress string");
>>           return -1;
>> @@ -467,6 +466,24 @@ static int update_progress_string(struct xc_sr_context *ctx,
>>   }
>>   
>>   /*
>> + * This is the live migration precopy policy - it's called periodically during
>> + * the precopy phase of live migrations, and is responsible for deciding when
>> + * the precopy phase should terminate and what should be done next.
>> + *
>> + * The policy implemented here behaves identically to the policy previously
>> + * hard-coded into xc_domain_save() - it proceeds to the stop-and-copy phase of
>> + * the live migration when there are either fewer than 50 dirty pages, or more
>> + * than 5 precopy rounds have completed.
>> + */
>> +static int simple_precopy_policy(struct precopy_stats stats, void *user)
>> +{
>> +    return ((stats.dirty_count >= 0 && stats.dirty_count < 50) ||
>> +            stats.iteration >= 5)
> While here, could you make those values defines?

sure

>> +        ? XGS_POLICY_STOP_AND_COPY
>> +        : XGS_POLICY_CONTINUE_PRECOPY;
>> +}
>> +
>> +/*
>>    * Send memory while guest is running.
>>    */
>>   static int send_memory_live(struct xc_sr_context *ctx)
>> @@ -474,21 +491,58 @@ static int send_memory_live(struct xc_sr_context *ctx)
>>       xc_interface *xch = ctx->xch;
>>       xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
>>       char *progress_str = NULL;
>> -    unsigned x;
>> +    unsigned int x = 0;
>>       int rc;
>> +    int policy_decision;
>> +
>> +    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>> +                                    &ctx->save.dirty_bitmap_hbuf);
>> +
>> +    precopy_policy_t precopy_policy = ctx->save.callbacks->precopy_policy;
>> +    void *data = ctx->save.callbacks->data;
>> +
>> +    struct precopy_stats *policy_stats;
>>   
>>       rc = update_progress_string(ctx, &progress_str, 0);
>>       if ( rc )
>>           goto out;
>>   
>> -    rc = send_all_pages(ctx);
>> -    if ( rc )
>> -        goto out;
>> +    ctx->save.stats = (struct precopy_stats)
>> +        { .dirty_count   = ctx->save.p2m_size };
> This is exactly the same as 'stats' at this point. I'm slightly
> confused about why you need 2 different stats variable, plus a pointer
> to a stats variable (stats, ctx->save.stats and *policy_stats).

They do start off similar, and are certainly closely related.
xc_shadow_op_stats_t stats has different fields in it then precopy_stats 
policy_stats.
The former has a fault and dirty count, per iteration, while the latter has
iteration number, total_written (over all iterations) and dirty count.

*policy_stats  is just a convenience pointer, reducing the amount of 
indirection on
every access.  I though this made it easier to read.

>> +    policy_stats = &ctx->save.stats;
>> +
>> +    if ( precopy_policy == NULL )
>> +         precopy_policy = simple_precopy_policy;
>> +
>> +    bitmap_set(dirty_bitmap, ctx->save.p2m_size);
>> +
>> +    do {
>> +        policy_decision = precopy_policy(*policy_stats, data);
> The comment at the top says:
>
> "Called after every batch of page data sent during the precopy phase"
>
> Yet here the hook seems to be called before any processing has been
> done for the first iteration of the loop.

I'll change to "Called before and after every batch ...."

>> +        x++;
> Also updating x here seems weird, we completely ignore iteration 0.

The line above the 'x++' checks the policy using 'iteration 0'.  In
patch v1 I used the x variable in initialising the stats, to try and
suggest this, but as its zero, and the default value for a struct is
zero, it was concluded that was unnecessary.  In any case,
logically, this is where it moves from one 'iteration' to another.
Previously there was no iteration zero, as it started on zero.
Now iteration zero is to indicate the starting state.

Combining this comment with Paul's, it could use:
     for (x = 1; ; ++x)
If this is thought to be more readable - although Andrew cooper
described a loop looking like this as "suspicious" on Joshua's version
of this patch.

I have no strong feelings on the matter.... let me know.

>> +
>> +        if ( stats.dirty_count > 0 && policy_decision != XGS_POLICY_ABORT )
> Trailing space at the end of the line.
>
>> +        {
>> +            rc = update_progress_string(ctx, &progress_str, x);
>> +            if ( rc )
>> +                goto out;
>> +
>> +            rc = send_dirty_pages(ctx, stats.dirty_count);
>> +            if ( rc )
>> +                goto out;
>> +        }
>> +
>> +        if ( policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
>> +            break;
>> +
>> +        policy_stats->iteration     = x;
>> +        policy_stats->total_written += policy_stats->dirty_count;
>> +        policy_stats->dirty_count   = -1;
>> +
>> +        policy_decision = precopy_policy(*policy_stats, data);
>> +
>> +        if ( policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
>> +           break;
>>   
>> -    for ( x = 1;
>> -          ((x < ctx->save.max_iterations) &&
>> -           (stats.dirty_count > ctx->save.dirty_threshold)); ++x )
>> -    {
>>           if ( xc_shadow_control(
>>                    xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
>>                    &ctx->save.dirty_bitmap_hbuf, ctx->save.p2m_size,
>> @@ -499,17 +553,9 @@ static int send_memory_live(struct xc_sr_context *ctx)
>>               goto out;
>>           }
>>   
>> -        if ( stats.dirty_count == 0 )
>> -            break;
>> +        policy_stats->dirty_count = stats.dirty_count;
>>   
>> -        rc = update_progress_string(ctx, &progress_str, x);
>> -        if ( rc )
>> -            goto out;
>> -
>> -        rc = send_dirty_pages(ctx, stats.dirty_count);
>> -        if ( rc )
>> -            goto out;
>> -    }
>> +    } while ( true );
>>   
>>    out:
>>       xc_set_progress_prefix(xch, NULL);
>> @@ -601,7 +647,7 @@ static int suspend_and_send_dirty(struct xc_sr_context *ctx)
>>       if ( ctx->save.live )
>>       {
>>           rc = update_progress_string(ctx, &progress_str,
>> -                                    ctx->save.max_iterations);
>> +                                    ctx->save.stats.iteration);
> Hm, this as mentioned above seems redundant (ctx is already a
> parameter of update_progress_string.
>
> Thanks, Roger.

Thanks for the feedback,

-jenny


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] Introduce migration precopy policy
  2017-09-20 16:18     ` Jennifer Herbert
@ 2017-09-21 11:08       ` Roger Pau Monné
  2017-09-21 11:13         ` Wei Liu
  0 siblings, 1 reply; 11+ messages in thread
From: Roger Pau Monné @ 2017-09-21 11:08 UTC (permalink / raw)
  To: Jennifer Herbert; +Cc: Wei Liu, Ian Jackson, jtotto, xen-devel

On Wed, Sep 20, 2017 at 05:18:16PM +0100, Jennifer Herbert wrote:
> On 20/09/17 11:20, Roger Pau Monné wrote:
> > On Tue, Sep 19, 2017 at 07:06:26PM +0100, Jennifer Herbert wrote:
> > > +        ? XGS_POLICY_STOP_AND_COPY
> > > +        : XGS_POLICY_CONTINUE_PRECOPY;
> > > +}
> > > +
> > > +/*
> > >    * Send memory while guest is running.
> > >    */
> > >   static int send_memory_live(struct xc_sr_context *ctx)
> > > @@ -474,21 +491,58 @@ static int send_memory_live(struct xc_sr_context *ctx)
> > >       xc_interface *xch = ctx->xch;
> > >       xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
> > >       char *progress_str = NULL;
> > > -    unsigned x;
> > > +    unsigned int x = 0;
> > >       int rc;
> > > +    int policy_decision;
> > > +
> > > +    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
> > > +                                    &ctx->save.dirty_bitmap_hbuf);
> > > +
> > > +    precopy_policy_t precopy_policy = ctx->save.callbacks->precopy_policy;
> > > +    void *data = ctx->save.callbacks->data;
> > > +
> > > +    struct precopy_stats *policy_stats;
> > >       rc = update_progress_string(ctx, &progress_str, 0);
> > >       if ( rc )
> > >           goto out;
> > > -    rc = send_all_pages(ctx);
> > > -    if ( rc )
> > > -        goto out;
> > > +    ctx->save.stats = (struct precopy_stats)
> > > +        { .dirty_count   = ctx->save.p2m_size };
> > This is exactly the same as 'stats' at this point. I'm slightly
> > confused about why you need 2 different stats variable, plus a pointer
> > to a stats variable (stats, ctx->save.stats and *policy_stats).
> 
> They do start off similar, and are certainly closely related.
> xc_shadow_op_stats_t stats has different fields in it then precopy_stats
> policy_stats.
> The former has a fault and dirty count, per iteration, while the latter has
> iteration number, total_written (over all iterations) and dirty count.

OK. I'm not that familiar with this code, so maybe this doesn't make
sense, but wouldn't it be clearer to expand the xc_shadow_op_stats_t
type so that a single variable can contain all this information?

I find it slightly confusing to use two variables of the same type
that track different things.

> *policy_stats  is just a convenience pointer, reducing the amount of
> indirection on
> every access.  I though this made it easier to read.
> 
> > > +    policy_stats = &ctx->save.stats;
> > > +
> > > +    if ( precopy_policy == NULL )
> > > +         precopy_policy = simple_precopy_policy;
> > > +
> > > +    bitmap_set(dirty_bitmap, ctx->save.p2m_size);
> > > +
> > > +    do {
> > > +        policy_decision = precopy_policy(*policy_stats, data);
> > The comment at the top says:
> > 
> > "Called after every batch of page data sent during the precopy phase"
> > 
> > Yet here the hook seems to be called before any processing has been
> > done for the first iteration of the loop.
> 
> I'll change to "Called before and after every batch ...."
> 
> > > +        x++;
> > Also updating x here seems weird, we completely ignore iteration 0.
> 
> The line above the 'x++' checks the policy using 'iteration 0'.  In
> patch v1 I used the x variable in initialising the stats, to try and
> suggest this, but as its zero, and the default value for a struct is
> zero, it was concluded that was unnecessary.  In any case,
> logically, this is where it moves from one 'iteration' to another.
> Previously there was no iteration zero, as it started on zero.
> Now iteration zero is to indicate the starting state.
> 
> Combining this comment with Paul's, it could use:
>     for (x = 1; ; ++x)
> If this is thought to be more readable - although Andrew cooper
> described a loop looking like this as "suspicious" on Joshua's version
> of this patch.
> 
> I have no strong feelings on the matter.... let me know.

I don't really have a strong opinion, I tend to use 'for ( ; ; )' for
unbounded loops, but it's mostly a question of taste.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] Introduce migration precopy policy
  2017-09-21 11:08       ` Roger Pau Monné
@ 2017-09-21 11:13         ` Wei Liu
  2017-09-21 14:44           ` Roger Pau Monné
  0 siblings, 1 reply; 11+ messages in thread
From: Wei Liu @ 2017-09-21 11:13 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Ian Jackson, jtotto, Jennifer Herbert, Wei Liu, xen-devel

On Thu, Sep 21, 2017 at 12:08:04PM +0100, Roger Pau Monné wrote:
> On Wed, Sep 20, 2017 at 05:18:16PM +0100, Jennifer Herbert wrote:
> > On 20/09/17 11:20, Roger Pau Monné wrote:
> > > On Tue, Sep 19, 2017 at 07:06:26PM +0100, Jennifer Herbert wrote:
> > > > +        ? XGS_POLICY_STOP_AND_COPY
> > > > +        : XGS_POLICY_CONTINUE_PRECOPY;
> > > > +}
> > > > +
> > > > +/*
> > > >    * Send memory while guest is running.
> > > >    */
> > > >   static int send_memory_live(struct xc_sr_context *ctx)
> > > > @@ -474,21 +491,58 @@ static int send_memory_live(struct xc_sr_context *ctx)
> > > >       xc_interface *xch = ctx->xch;
> > > >       xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
> > > >       char *progress_str = NULL;
> > > > -    unsigned x;
> > > > +    unsigned int x = 0;
> > > >       int rc;
> > > > +    int policy_decision;
> > > > +
> > > > +    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
> > > > +                                    &ctx->save.dirty_bitmap_hbuf);
> > > > +
> > > > +    precopy_policy_t precopy_policy = ctx->save.callbacks->precopy_policy;
> > > > +    void *data = ctx->save.callbacks->data;
> > > > +
> > > > +    struct precopy_stats *policy_stats;
> > > >       rc = update_progress_string(ctx, &progress_str, 0);
> > > >       if ( rc )
> > > >           goto out;
> > > > -    rc = send_all_pages(ctx);
> > > > -    if ( rc )
> > > > -        goto out;
> > > > +    ctx->save.stats = (struct precopy_stats)
> > > > +        { .dirty_count   = ctx->save.p2m_size };
> > > This is exactly the same as 'stats' at this point. I'm slightly
> > > confused about why you need 2 different stats variable, plus a pointer
> > > to a stats variable (stats, ctx->save.stats and *policy_stats).
> > 
> > They do start off similar, and are certainly closely related.
> > xc_shadow_op_stats_t stats has different fields in it then precopy_stats
> > policy_stats.
> > The former has a fault and dirty count, per iteration, while the latter has
> > iteration number, total_written (over all iterations) and dirty count.
> 
> OK. I'm not that familiar with this code, so maybe this doesn't make
> sense, but wouldn't it be clearer to expand the xc_shadow_op_stats_t
> type so that a single variable can contain all this information?
> 
> I find it slightly confusing to use two variables of the same type
> that track different things.
> 

The xc_shadow_op_stats_t is in fact xen_domctl_shadow_op_stats, which
gets passed directly to the hypervisor. So I think having two separate
structs here is okay. They are describing different things after all.

> > *policy_stats  is just a convenience pointer, reducing the amount of
> > indirection on
> > every access.  I though this made it easier to read.
> > 
> > > > +    policy_stats = &ctx->save.stats;
> > > > +
> > > > +    if ( precopy_policy == NULL )
> > > > +         precopy_policy = simple_precopy_policy;
> > > > +
> > > > +    bitmap_set(dirty_bitmap, ctx->save.p2m_size);
> > > > +
> > > > +    do {
> > > > +        policy_decision = precopy_policy(*policy_stats, data);
> > > The comment at the top says:
> > > 
> > > "Called after every batch of page data sent during the precopy phase"
> > > 
> > > Yet here the hook seems to be called before any processing has been
> > > done for the first iteration of the loop.
> > 
> > I'll change to "Called before and after every batch ...."
> > 
> > > > +        x++;
> > > Also updating x here seems weird, we completely ignore iteration 0.
> > 
> > The line above the 'x++' checks the policy using 'iteration 0'.  In
> > patch v1 I used the x variable in initialising the stats, to try and
> > suggest this, but as its zero, and the default value for a struct is
> > zero, it was concluded that was unnecessary.  In any case,
> > logically, this is where it moves from one 'iteration' to another.
> > Previously there was no iteration zero, as it started on zero.
> > Now iteration zero is to indicate the starting state.
> > 
> > Combining this comment with Paul's, it could use:
> >     for (x = 1; ; ++x)
> > If this is thought to be more readable - although Andrew cooper
> > described a loop looking like this as "suspicious" on Joshua's version
> > of this patch.
> > 
> > I have no strong feelings on the matter.... let me know.
> 
> I don't really have a strong opinion, I tend to use 'for ( ; ; )' for
> unbounded loops, but it's mostly a question of taste.
> 

I don't care either. Please pick the style you like. ;-)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] Introduce migration precopy policy
  2017-09-21 11:13         ` Wei Liu
@ 2017-09-21 14:44           ` Roger Pau Monné
  0 siblings, 0 replies; 11+ messages in thread
From: Roger Pau Monné @ 2017-09-21 14:44 UTC (permalink / raw)
  To: Wei Liu; +Cc: Ian Jackson, Jennifer Herbert, jtotto, xen-devel

On Thu, Sep 21, 2017 at 12:13:55PM +0100, Wei Liu wrote:
> On Thu, Sep 21, 2017 at 12:08:04PM +0100, Roger Pau Monné wrote:
> > On Wed, Sep 20, 2017 at 05:18:16PM +0100, Jennifer Herbert wrote:
> > > On 20/09/17 11:20, Roger Pau Monné wrote:
> > > > On Tue, Sep 19, 2017 at 07:06:26PM +0100, Jennifer Herbert wrote:
> > > > > +        ? XGS_POLICY_STOP_AND_COPY
> > > > > +        : XGS_POLICY_CONTINUE_PRECOPY;
> > > > > +}
> > > > > +
> > > > > +/*
> > > > >    * Send memory while guest is running.
> > > > >    */
> > > > >   static int send_memory_live(struct xc_sr_context *ctx)
> > > > > @@ -474,21 +491,58 @@ static int send_memory_live(struct xc_sr_context *ctx)
> > > > >       xc_interface *xch = ctx->xch;
> > > > >       xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
> > > > >       char *progress_str = NULL;
> > > > > -    unsigned x;
> > > > > +    unsigned int x = 0;
> > > > >       int rc;
> > > > > +    int policy_decision;
> > > > > +
> > > > > +    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
> > > > > +                                    &ctx->save.dirty_bitmap_hbuf);
> > > > > +
> > > > > +    precopy_policy_t precopy_policy = ctx->save.callbacks->precopy_policy;
> > > > > +    void *data = ctx->save.callbacks->data;
> > > > > +
> > > > > +    struct precopy_stats *policy_stats;
> > > > >       rc = update_progress_string(ctx, &progress_str, 0);
> > > > >       if ( rc )
> > > > >           goto out;
> > > > > -    rc = send_all_pages(ctx);
> > > > > -    if ( rc )
> > > > > -        goto out;
> > > > > +    ctx->save.stats = (struct precopy_stats)
> > > > > +        { .dirty_count   = ctx->save.p2m_size };
> > > > This is exactly the same as 'stats' at this point. I'm slightly
> > > > confused about why you need 2 different stats variable, plus a pointer
> > > > to a stats variable (stats, ctx->save.stats and *policy_stats).
> > > 
> > > They do start off similar, and are certainly closely related.
> > > xc_shadow_op_stats_t stats has different fields in it then precopy_stats
> > > policy_stats.
> > > The former has a fault and dirty count, per iteration, while the latter has
> > > iteration number, total_written (over all iterations) and dirty count.
> > 
> > OK. I'm not that familiar with this code, so maybe this doesn't make
> > sense, but wouldn't it be clearer to expand the xc_shadow_op_stats_t
> > type so that a single variable can contain all this information?
> > 
> > I find it slightly confusing to use two variables of the same type
> > that track different things.
> > 
> 
> The xc_shadow_op_stats_t is in fact xen_domctl_shadow_op_stats, which
> gets passed directly to the hypervisor. So I think having two separate
> structs here is okay. They are describing different things after all.

You could have one structure nested inside of the other, but I don't
have such a strong opinion, ie: this is fine.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-09-21 14:44 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-19 18:06 [PATCH v2 0/3] Introduce migration precopy policy Jennifer Herbert
2017-09-19 18:06 ` [PATCH v2 1/3] Tidy libxc xc_domain_save Jennifer Herbert
2017-09-19 18:06 ` [PATCH v2 2/3] Introduce migration precopy policy Jennifer Herbert
2017-09-20  8:35   ` Paul Durrant
2017-09-20 10:20   ` Roger Pau Monné
2017-09-20 16:18     ` Jennifer Herbert
2017-09-21 11:08       ` Roger Pau Monné
2017-09-21 11:13         ` Wei Liu
2017-09-21 14:44           ` Roger Pau Monné
2017-09-19 18:06 ` [PATCH v2 3/3] RFC: migration: defer precopy policy to libxl Jennifer Herbert
2017-09-20 10:37   ` Roger Pau Monné

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.