* [PATCH Remus v5 0/2] Remus support for Migration-v2
@ 2015-05-14 10:06 Yang Hongyang
  2015-05-14 10:06 ` [PATCH Remus v5 1/2] libxc/save: implement Remus checkpointed save Yang Hongyang
  2015-05-14 10:06 ` [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore Yang Hongyang
  0 siblings, 2 replies; 21+ messages in thread
From: Yang Hongyang @ 2015-05-14 10:06 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

This patchset implements Remus support for Migration v2, but without
memory compression.

Git tree available at:
    https://github.com/macrosheep/xen/tree/Remus-newmig-v5

This patchset is based on
    [PATCH v6 00/16] Misc patches to aid migration v2 Remus support
    https://github.com/macrosheep/xen/tree/misc-remus-v6

Yang Hongyang (2):
  libxc/save: implement Remus checkpointed save
  libxc/restore: implement Remus checkpointed restore

 tools/libxc/xc_sr_common.h  |  14 ++++++
 tools/libxc/xc_sr_restore.c | 113 ++++++++++++++++++++++++++++++++++++++++----
 tools/libxc/xc_sr_save.c    |  80 +++++++++++++++++++++++--------
 3 files changed, 178 insertions(+), 29 deletions(-)

-- 
1.9.1


* [PATCH Remus v5 1/2] libxc/save: implement Remus checkpointed save
  2015-05-14 10:06 [PATCH Remus v5 0/2] Remus support for Migration-v2 Yang Hongyang
@ 2015-05-14 10:06 ` Yang Hongyang
  2015-05-14 12:47   ` Ian Campbell
  2015-05-14 10:06 ` [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore Yang Hongyang
  1 sibling, 1 reply; 21+ messages in thread
From: Yang Hongyang @ 2015-05-14 10:06 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

With Remus, the save flow should be:
live migration->{ periodically save(checkpointed save) }

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxc/xc_sr_save.c | 80 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 61 insertions(+), 19 deletions(-)

diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 1d0a46d..1c5d199 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -57,6 +57,16 @@ static int write_end_record(struct xc_sr_context *ctx)
 }
 
 /*
+ * Writes an CHECKPOINT record into the stream.
+ */
+static int write_checkpoint_record(struct xc_sr_context *ctx)
+{
+    struct xc_sr_record checkpoint = { REC_TYPE_CHECKPOINT, 0, NULL };
+
+    return write_record(ctx, &checkpoint);
+}
+
+/*
  * Writes a batch of memory as a PAGE_DATA record into the stream.  The batch
  * is constructed in ctx->save.batch_pfns.
  *
@@ -467,6 +477,14 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
     DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
                                     &ctx->save.dirty_bitmap_hbuf);
 
+    /*
+     * With Remus, we will enter checkpointed save after live migration.
+     * In checkpointed save loop, we skip the live part and pause straight
+     * away to send dirty pages between checkpoints.
+     */
+    if ( !ctx->save.live )
+        goto last_iter;
+
     rc = enable_logdirty(ctx);
     if ( rc )
         goto out;
@@ -505,6 +523,7 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
             goto out;
     }
 
+ last_iter:
     rc = suspend_domain(ctx);
     if ( rc )
         goto out;
@@ -667,29 +686,52 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
     if ( rc )
         goto err;
 
-    rc = ctx->save.ops.start_of_checkpoint(ctx);
-    if ( rc )
-        goto err;
+    do {
+        rc = ctx->save.ops.start_of_checkpoint(ctx);
+        if ( rc )
+            goto err;
 
-    if ( ctx->save.live )
-        rc = send_domain_memory_live(ctx);
-    else
-        rc = send_domain_memory_nonlive(ctx);
+        if ( ctx->save.live || ctx->save.checkpointed )
+            rc = send_domain_memory_live(ctx);
+        else
+            rc = send_domain_memory_nonlive(ctx);
 
-    if ( rc )
-        goto err;
+        if ( rc )
+            goto err;
 
-    if ( !ctx->dominfo.shutdown ||
-         (ctx->dominfo.shutdown_reason != SHUTDOWN_suspend) )
-    {
-        ERROR("Domain has not been suspended");
-        rc = -1;
-        goto err;
-    }
+        if ( !ctx->dominfo.shutdown ||
+            (ctx->dominfo.shutdown_reason != SHUTDOWN_suspend) )
+        {
+            ERROR("Domain has not been suspended");
+            rc = -1;
+            goto err;
+        }
 
-    rc = ctx->save.ops.end_of_checkpoint(ctx);
-    if ( rc )
-        goto err;
+        rc = ctx->save.ops.end_of_checkpoint(ctx);
+        if ( rc )
+            goto err;
+
+        if ( ctx->save.checkpointed )
+        {
+            if ( ctx->save.live )
+            {
+                /* End of live migration, we are sending checkpointed stream */
+                ctx->save.live = false;
+            }
+
+            rc = write_checkpoint_record(ctx);
+            if ( rc )
+                goto err;
+
+            ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
+
+            rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
+            if ( rc > 0 )
+                xc_report_progress_single(xch, "Checkpointed save");
+            else
+                ctx->save.checkpointed = false;
+        }
+    } while ( ctx->save.checkpointed );
 
     xc_report_progress_single(xch, "End of stream");
 
-- 
1.9.1
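
For context, a sketch of how a caller might implement the two save
callbacks used in the loop above. The helper names here are
hypothetical; only the calling order (postcopy after the CHECKPOINT
record, then checkpoint) and the return convention (checkpoint()
returning > 0 requests another checkpoint) come from the patch:

static int example_postcopy(void *data)
{
    /* The CHECKPOINT record has been written; let the primary guest
     * run again until the next checkpoint. */
    return resume_primary_guest(data);        /* hypothetical helper */
}

static int example_checkpoint(void *data)
{
    /* Wait until the next checkpoint is due.  Return > 0 to take
     * another checkpoint, <= 0 to end the checkpointed stream. */
    wait_checkpoint_interval(data);           /* hypothetical helper */
    return stop_requested(data) ? 0 : 1;      /* hypothetical helper */
}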


* [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore
  2015-05-14 10:06 [PATCH Remus v5 0/2] Remus support for Migration-v2 Yang Hongyang
  2015-05-14 10:06 ` [PATCH Remus v5 1/2] libxc/save: implement Remus checkpointed save Yang Hongyang
@ 2015-05-14 10:06 ` Yang Hongyang
  2015-05-14 10:11   ` Andrew Cooper
  2015-05-14 13:05   ` Ian Campbell
  1 sibling, 2 replies; 21+ messages in thread
From: Yang Hongyang @ 2015-05-14 10:06 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

With Remus, the restore flow should be:
the first full migration stream -> { periodically restore stream }

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxc/xc_sr_common.h  |  14 ++++++
 tools/libxc/xc_sr_restore.c | 113 ++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 117 insertions(+), 10 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index f8121e7..3bf27f1 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -208,6 +208,20 @@ struct xc_sr_context
             /* Plain VM, or checkpoints over time. */
             bool checkpointed;
 
+            /* Currently buffering records between a checkpoint */
+            bool buffer_all_records;
+
+/*
+ * With Remus, we buffer the records sent by the primary at checkpoint,
+ * in case the primary will fail, we can recover from the last
+ * checkpoint state.
+ * This should be enough because primary only send dirty pages at
+ * checkpoint.
+ */
+#define MAX_BUF_RECORDS 1024
+            struct xc_sr_record *buffered_records;
+            unsigned buffered_rec_num;
+
             /*
              * Xenstore and Console parameters.
              * INPUT:  evtchn & domid
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 9ab5760..8468ffc 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -468,11 +468,69 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
     return rc;
 }
 
+static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec);
+static int handle_checkpoint(struct xc_sr_context *ctx)
+{
+    xc_interface *xch = ctx->xch;
+    int rc = 0;
+    unsigned i;
+
+    if ( !ctx->restore.checkpointed )
+    {
+        ERROR("Found checkpoint in non-checkpointed stream");
+        rc = -1;
+        goto err;
+    }
+
+    if ( ctx->restore.buffer_all_records )
+    {
+        IPRINTF("All records buffered");
+
+        /*
+         * We need to set buffer_all_records to false in
+         * order to process records instead of buffer records.
+         * buffer_all_records should be set back to true after
+         * we successfully processed all records.
+         */
+        ctx->restore.buffer_all_records = false;
+        for ( i = 0; i < ctx->restore.buffered_rec_num; i++ )
+        {
+            rc = process_record(ctx, &ctx->restore.buffered_records[i]);
+            if ( rc )
+                goto err;
+        }
+        ctx->restore.buffered_rec_num = 0;
+        ctx->restore.buffer_all_records = true;
+        IPRINTF("All records processed");
+    }
+    else
+        ctx->restore.buffer_all_records = true;
+
+ err:
+    return rc;
+}
+
 static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
 {
     xc_interface *xch = ctx->xch;
     int rc = 0;
 
+    if ( ctx->restore.buffer_all_records &&
+         rec->type != REC_TYPE_END &&
+         rec->type != REC_TYPE_CHECKPOINT )
+    {
+        if ( ctx->restore.buffered_rec_num >= MAX_BUF_RECORDS )
+        {
+            ERROR("There are too many records within a checkpoint");
+            return -1;
+        }
+
+        memcpy(&ctx->restore.buffered_records[ctx->restore.buffered_rec_num++],
+               rec, sizeof(*rec));
+
+        return 0;
+    }
+
     switch ( rec->type )
     {
     case REC_TYPE_END:
@@ -487,12 +545,17 @@ static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
         ctx->restore.verify = true;
         break;
 
+    case REC_TYPE_CHECKPOINT:
+        rc = handle_checkpoint(ctx);
+        break;
+
     default:
         rc = ctx->restore.ops.process_record(ctx, rec);
         break;
     }
 
     free(rec->data);
+    rec->data = NULL;
 
     if ( rc == RECORD_NOT_PROCESSED )
     {
@@ -529,6 +592,15 @@ static int setup(struct xc_sr_context *ctx)
         goto err;
     }
 
+    ctx->restore.buffered_records = malloc(
+        MAX_BUF_RECORDS * sizeof(struct xc_sr_record));
+    if ( !ctx->restore.buffered_records )
+    {
+        ERROR("Unable to allocate memory for buffered records");
+        rc = -1;
+        goto err;
+    }
+
  err:
     return rc;
 }
@@ -536,7 +608,12 @@ static int setup(struct xc_sr_context *ctx)
 static void cleanup(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
+    unsigned i;
+
+    for ( i = 0; i < ctx->restore.buffered_rec_num; i++ )
+        free(ctx->restore.buffered_records[i].data);
 
+    free(ctx->restore.buffered_records);
     free(ctx->restore.populated_pfns);
     if ( ctx->restore.ops.cleanup(ctx) )
         PERROR("Failed to clean up");
@@ -564,7 +641,27 @@ static int restore(struct xc_sr_context *ctx)
     {
         rc = read_record(ctx, &rec);
         if ( rc )
-            goto err;
+        {
+            if ( ctx->restore.buffer_all_records )
+                goto remus_failover;
+            else
+                goto err;
+        }
+
+#ifdef XG_LIBXL_HVM_COMPAT
+        if ( ctx->dominfo.hvm &&
+             (rec.type == REC_TYPE_END || rec.type == REC_TYPE_CHECKPOINT) )
+        {
+            rc = read_qemu(ctx);
+            if ( rc )
+            {
+                if ( ctx->restore.buffer_all_records )
+                    goto remus_failover;
+                else
+                    goto err;
+            }
+        }
+#endif
 
         rc = process_record(ctx, &rec);
         if ( rc )
@@ -572,15 +669,11 @@ static int restore(struct xc_sr_context *ctx)
 
     } while ( rec.type != REC_TYPE_END );
 
-#ifdef XG_LIBXL_HVM_COMPAT
-    if ( ctx->dominfo.hvm )
-    {
-        rc = read_qemu(ctx);
-        if ( rc )
-            goto err;
-    }
-#endif
-
+ remus_failover:
+    /*
+     * With Remus, if we reach here, there must be some error on primary,
+     * failover from the last checkpoint state.
+     */
     rc = ctx->restore.ops.stream_complete(ctx);
     if ( rc )
         goto err;
-- 
1.9.1
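
A subtlety in the buffering above that is easy to miss: records are
buffered with a shallow memcpy(), so the buffer entry takes over
ownership of rec->data, and process_record() clears the pointer after
freeing it. A condensed view, using the lines from the patch:

/* Buffering: the struct copy moves the rec->data pointer too. */
memcpy(&ctx->restore.buffered_records[ctx->restore.buffered_rec_num++],
       rec, sizeof(*rec));

/*
 * Processing: free the payload and clear the pointer.  During replay,
 * rec points into buffered_records[], so if an error aborts the replay
 * midway, cleanup()'s pass over the still-buffered records sees NULL
 * for entries already processed and cannot double-free them.
 */
free(rec->data);
rec->data = NULL;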


* Re: [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore
  2015-05-14 10:06 ` [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore Yang Hongyang
@ 2015-05-14 10:11   ` Andrew Cooper
  2015-05-14 13:05   ` Ian Campbell
  1 sibling, 0 replies; 21+ messages in thread
From: Andrew Cooper @ 2015-05-14 10:11 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram

On 14/05/15 11:06, Yang Hongyang wrote:
> With Remus, the restore flow should be:
> the first full migration stream -> { periodically restore stream }
>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

> ---
>  tools/libxc/xc_sr_common.h  |  14 ++++++
>  tools/libxc/xc_sr_restore.c | 113 ++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 117 insertions(+), 10 deletions(-)
>
> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
> index f8121e7..3bf27f1 100644
> --- a/tools/libxc/xc_sr_common.h
> +++ b/tools/libxc/xc_sr_common.h
> @@ -208,6 +208,20 @@ struct xc_sr_context
>              /* Plain VM, or checkpoints over time. */
>              bool checkpointed;
>  
> +            /* Currently buffering records between a checkpoint */
> +            bool buffer_all_records;
> +
> +/*
> + * With Remus, we buffer the records sent by the primary at checkpoint,
> + * in case the primary will fail, we can recover from the last
> + * checkpoint state.
> + * This should be enough because primary only send dirty pages at
> + * checkpoint.
> + */
> +#define MAX_BUF_RECORDS 1024
> +            struct xc_sr_record *buffered_records;
> +            unsigned buffered_rec_num;
> +
>              /*
>               * Xenstore and Console parameters.
>               * INPUT:  evtchn & domid
> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
> index 9ab5760..8468ffc 100644
> --- a/tools/libxc/xc_sr_restore.c
> +++ b/tools/libxc/xc_sr_restore.c
> @@ -468,11 +468,69 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
>      return rc;
>  }
>  
> +static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec);
> +static int handle_checkpoint(struct xc_sr_context *ctx)
> +{
> +    xc_interface *xch = ctx->xch;
> +    int rc = 0;
> +    unsigned i;
> +
> +    if ( !ctx->restore.checkpointed )
> +    {
> +        ERROR("Found checkpoint in non-checkpointed stream");
> +        rc = -1;
> +        goto err;
> +    }
> +
> +    if ( ctx->restore.buffer_all_records )
> +    {
> +        IPRINTF("All records buffered");
> +
> +        /*
> +         * We need to set buffer_all_records to false in
> +         * order to process records instead of buffer records.
> +         * buffer_all_records should be set back to true after
> +         * we successfully processed all records.
> +         */
> +        ctx->restore.buffer_all_records = false;
> +        for ( i = 0; i < ctx->restore.buffered_rec_num; i++ )
> +        {
> +            rc = process_record(ctx, &ctx->restore.buffered_records[i]);
> +            if ( rc )
> +                goto err;
> +        }
> +        ctx->restore.buffered_rec_num = 0;
> +        ctx->restore.buffer_all_records = true;
> +        IPRINTF("All records processed");
> +    }
> +    else
> +        ctx->restore.buffer_all_records = true;
> +
> + err:
> +    return rc;
> +}
> +
>  static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
>  {
>      xc_interface *xch = ctx->xch;
>      int rc = 0;
>  
> +    if ( ctx->restore.buffer_all_records &&
> +         rec->type != REC_TYPE_END &&
> +         rec->type != REC_TYPE_CHECKPOINT )
> +    {
> +        if ( ctx->restore.buffered_rec_num >= MAX_BUF_RECORDS )
> +        {
> +            ERROR("There are too many records within a checkpoint");
> +            return -1;
> +        }
> +
> +        memcpy(&ctx->restore.buffered_records[ctx->restore.buffered_rec_num++],
> +               rec, sizeof(*rec));
> +
> +        return 0;
> +    }
> +
>      switch ( rec->type )
>      {
>      case REC_TYPE_END:
> @@ -487,12 +545,17 @@ static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
>          ctx->restore.verify = true;
>          break;
>  
> +    case REC_TYPE_CHECKPOINT:
> +        rc = handle_checkpoint(ctx);
> +        break;
> +
>      default:
>          rc = ctx->restore.ops.process_record(ctx, rec);
>          break;
>      }
>  
>      free(rec->data);
> +    rec->data = NULL;
>  
>      if ( rc == RECORD_NOT_PROCESSED )
>      {
> @@ -529,6 +592,15 @@ static int setup(struct xc_sr_context *ctx)
>          goto err;
>      }
>  
> +    ctx->restore.buffered_records = malloc(
> +        MAX_BUF_RECORDS * sizeof(struct xc_sr_record));
> +    if ( !ctx->restore.buffered_records )
> +    {
> +        ERROR("Unable to allocate memory for buffered records");
> +        rc = -1;
> +        goto err;
> +    }
> +
>   err:
>      return rc;
>  }
> @@ -536,7 +608,12 @@ static int setup(struct xc_sr_context *ctx)
>  static void cleanup(struct xc_sr_context *ctx)
>  {
>      xc_interface *xch = ctx->xch;
> +    unsigned i;
> +
> +    for ( i = 0; i < ctx->restore.buffered_rec_num; i++ )
> +        free(ctx->restore.buffered_records[i].data);
>  
> +    free(ctx->restore.buffered_records);
>      free(ctx->restore.populated_pfns);
>      if ( ctx->restore.ops.cleanup(ctx) )
>          PERROR("Failed to clean up");
> @@ -564,7 +641,27 @@ static int restore(struct xc_sr_context *ctx)
>      {
>          rc = read_record(ctx, &rec);
>          if ( rc )
> -            goto err;
> +        {
> +            if ( ctx->restore.buffer_all_records )
> +                goto remus_failover;
> +            else
> +                goto err;
> +        }
> +
> +#ifdef XG_LIBXL_HVM_COMPAT
> +        if ( ctx->dominfo.hvm &&
> +             (rec.type == REC_TYPE_END || rec.type == REC_TYPE_CHECKPOINT) )
> +        {
> +            rc = read_qemu(ctx);
> +            if ( rc )
> +            {
> +                if ( ctx->restore.buffer_all_records )
> +                    goto remus_failover;
> +                else
> +                    goto err;
> +            }
> +        }
> +#endif
>  
>          rc = process_record(ctx, &rec);
>          if ( rc )
> @@ -572,15 +669,11 @@ static int restore(struct xc_sr_context *ctx)
>  
>      } while ( rec.type != REC_TYPE_END );
>  
> -#ifdef XG_LIBXL_HVM_COMPAT
> -    if ( ctx->dominfo.hvm )
> -    {
> -        rc = read_qemu(ctx);
> -        if ( rc )
> -            goto err;
> -    }
> -#endif
> -
> + remus_failover:
> +    /*
> +     * With Remus, if we reach here, there must be some error on primary,
> +     * failover from the last checkpoint state.
> +     */
>      rc = ctx->restore.ops.stream_complete(ctx);
>      if ( rc )
>          goto err;


* Re: [PATCH Remus v5 1/2] libxc/save: implement Remus checkpointed save
  2015-05-14 10:06 ` [PATCH Remus v5 1/2] libxc/save: implement Remus checkpointed save Yang Hongyang
@ 2015-05-14 12:47   ` Ian Campbell
  2015-05-15  2:12     ` Yang Hongyang
  0 siblings, 1 reply; 21+ messages in thread
From: Ian Campbell @ 2015-05-14 12:47 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, eddie.dong, wency, andrew.cooper3, yunhong.jiang,
	ian.jackson, xen-devel, guijianfeng, rshriram

On Thu, 2015-05-14 at 18:06 +0800, Yang Hongyang wrote:
> With Remus, the save flow should be:
> live migration->{ periodically save(checkpointed save) }
> 
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> ---
>  tools/libxc/xc_sr_save.c | 80 ++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 61 insertions(+), 19 deletions(-)
> 
> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> index 1d0a46d..1c5d199 100644
> --- a/tools/libxc/xc_sr_save.c
> +++ b/tools/libxc/xc_sr_save.c
> @@ -57,6 +57,16 @@ static int write_end_record(struct xc_sr_context *ctx)
>  }
>  
>  /*
> + * Writes an CHECKPOINT record into the stream.

"a CHECKPOINT"

> + */
> +static int write_checkpoint_record(struct xc_sr_context *ctx)
> +{
> +    struct xc_sr_record checkpoint = { REC_TYPE_CHECKPOINT, 0, NULL };
> +
> +    return write_record(ctx, &checkpoint);
> +}
> +
> +/*
>   * Writes a batch of memory as a PAGE_DATA record into the stream.  The batch
>   * is constructed in ctx->save.batch_pfns.
>   *
> @@ -467,6 +477,14 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
>      DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>                                      &ctx->save.dirty_bitmap_hbuf);
>  
> +    /*
> +     * With Remus, we will enter checkpointed save after live migration.
> +     * In checkpointed save loop, we skip the live part and pause straight
> +     * away to send dirty pages between checkpoints.
> +     */
> +    if ( !ctx->save.live )
> +        goto last_iter;

Rather than use goto would it work to refactor everything from here to
the label into some sort of helper and just call that in the "actually
live" case?

Or perhaps everything from the label to the end should be a helper
function which the caller can also use in the checkpoint case instead of
calling send_domain_memory_live (and which s_d_m_l also calls of
course).

> +        if ( ctx->save.checkpointed )
> +        {
> +            if ( ctx->save.live )
> +            {
> +                /* End of live migration, we are sending checkpointed stream */
> +                ctx->save.live = false;

I think I'm misunderstanding either the purpose of this code or the
comment (or both).

Is it the case that a checkpoint starts with an iteration of live (to
transfer everything over) and then drops into sending periodical
non-live updates at each checkpoint?

If so then I think a more useful comment would be:

        /*
         * We have now completed the initial live portion of the checkpoint
         * process. Therefore switch into periodically sending synchronous
         * batches of pages.
         */

Personally I don't have a problem with just a direct assignment without
the if, since assigning false to an already false value is a nop.

> +            }
> +
> +            rc = write_checkpoint_record(ctx);
> +            if ( rc )
> +                goto err;
> +
> +            ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
> +
> +            rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
> +            if ( rc > 0 )
> +                xc_report_progress_single(xch, "Checkpointed save");
> +            else
> +                ctx->save.checkpointed = false;
> +        }
> +    } while ( ctx->save.checkpointed );
>  
>      xc_report_progress_single(xch, "End of stream");
>  


* Re: [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore
  2015-05-14 10:06 ` [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore Yang Hongyang
  2015-05-14 10:11   ` Andrew Cooper
@ 2015-05-14 13:05   ` Ian Campbell
  2015-05-14 13:17     ` Andrew Cooper
  2015-05-15  1:32     ` Yang Hongyang
  1 sibling, 2 replies; 21+ messages in thread
From: Ian Campbell @ 2015-05-14 13:05 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, eddie.dong, wency, andrew.cooper3, yunhong.jiang,
	ian.jackson, xen-devel, guijianfeng, rshriram

On Thu, 2015-05-14 at 18:06 +0800, Yang Hongyang wrote:
> With Remus, the restore flow should be:
> the first full migration stream -> { periodically restore stream }
> 
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> ---
>  tools/libxc/xc_sr_common.h  |  14 ++++++
>  tools/libxc/xc_sr_restore.c | 113 ++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 117 insertions(+), 10 deletions(-)
> 
> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
> index f8121e7..3bf27f1 100644
> --- a/tools/libxc/xc_sr_common.h
> +++ b/tools/libxc/xc_sr_common.h
> @@ -208,6 +208,20 @@ struct xc_sr_context
>              /* Plain VM, or checkpoints over time. */
>              bool checkpointed;
>  
> +            /* Currently buffering records between a checkpoint */
> +            bool buffer_all_records;
> +
> +/*
> + * With Remus, we buffer the records sent by the primary at checkpoint,
> + * in case the primary will fail, we can recover from the last
> + * checkpoint state.
> + * This should be enough because primary only send dirty pages at
> + * checkpoint.

I'm not sure how it then follows that 1024 buffers is guaranteed to be
enough, unless there is something on the sending side arranging it to be
so?

> + */
> +#define MAX_BUF_RECORDS 1024
> +            struct xc_sr_record *buffered_records;
> +            unsigned buffered_rec_num;
> +
>              /*
>               * Xenstore and Console parameters.
>               * INPUT:  evtchn & domid
> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
> index 9ab5760..8468ffc 100644
> --- a/tools/libxc/xc_sr_restore.c
> +++ b/tools/libxc/xc_sr_restore.c
> @@ -468,11 +468,69 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
>      return rc;
>  }
>  
> +static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec);
> +static int handle_checkpoint(struct xc_sr_context *ctx)
> +{
> +    xc_interface *xch = ctx->xch;
> +    int rc = 0;
> +    unsigned i;
> +
> +    if ( !ctx->restore.checkpointed )
> +    {
> +        ERROR("Found checkpoint in non-checkpointed stream");
> +        rc = -1;

Is it usual in migrv2 to set errno as well?

> +        goto err;
> +    }
> +
> +    if ( ctx->restore.buffer_all_records )
> +    {
> +        IPRINTF("All records buffered");
> +
> +        /*
> +         * We need to set buffer_all_records to false in
> +         * order to process records instead of buffer records.
> +         * buffer_all_records should be set back to true after
> +         * we successfully processed all records.
> +         */
> +        ctx->restore.buffer_all_records = false;

I'm not personally a fan of changing global state in order to simulate
the action of what should be a parameter to a function.

Preferable IMHO would be to have process_record gain a parameter to
override the ctx state but become an internal helper (perhaps with a
name change) and then have API function process_record and
process_buffered_records or some such which call it in the right way.

Andy may have a differing opinion though.


* Re: [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore
  2015-05-14 13:05   ` Ian Campbell
@ 2015-05-14 13:17     ` Andrew Cooper
  2015-05-14 14:04       ` Ian Campbell
  2015-05-15  1:32     ` Yang Hongyang
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2015-05-14 13:17 UTC (permalink / raw)
  To: Ian Campbell, Yang Hongyang
  Cc: wei.liu2, wency, ian.jackson, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram

On 14/05/15 14:05, Ian Campbell wrote:
> On Thu, 2015-05-14 at 18:06 +0800, Yang Hongyang wrote:
>> With Remus, the restore flow should be:
>> the first full migration stream -> { periodically restore stream }
>>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> ---
>>  tools/libxc/xc_sr_common.h  |  14 ++++++
>>  tools/libxc/xc_sr_restore.c | 113 ++++++++++++++++++++++++++++++++++++++++----
>>  2 files changed, 117 insertions(+), 10 deletions(-)
>>
>> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
>> index f8121e7..3bf27f1 100644
>> --- a/tools/libxc/xc_sr_common.h
>> +++ b/tools/libxc/xc_sr_common.h
>> @@ -208,6 +208,20 @@ struct xc_sr_context
>>              /* Plain VM, or checkpoints over time. */
>>              bool checkpointed;
>>  
>> +            /* Currently buffering records between a checkpoint */
>> +            bool buffer_all_records;
>> +
>> +/*
>> + * With Remus, we buffer the records sent by the primary at checkpoint,
>> + * in case the primary will fail, we can recover from the last
>> + * checkpoint state.
>> + * This should be enough because primary only send dirty pages at
>> + * checkpoint.
> I'm not sure how it then follows that 1024 buffers is guaranteed to be
> enough, unless there is something on the sending side arranging it to be
> so?
>
>> + */
>> +#define MAX_BUF_RECORDS 1024
>> +            struct xc_sr_record *buffered_records;
>> +            unsigned buffered_rec_num;
>> +
>>              /*
>>               * Xenstore and Console parameters.
>>               * INPUT:  evtchn & domid
>> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
>> index 9ab5760..8468ffc 100644
>> --- a/tools/libxc/xc_sr_restore.c
>> +++ b/tools/libxc/xc_sr_restore.c
>> @@ -468,11 +468,69 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
>>      return rc;
>>  }
>>  
>> +static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec);
>> +static int handle_checkpoint(struct xc_sr_context *ctx)
>> +{
>> +    xc_interface *xch = ctx->xch;
>> +    int rc = 0;
>> +    unsigned i;
>> +
>> +    if ( !ctx->restore.checkpointed )
>> +    {
>> +        ERROR("Found checkpoint in non-checkpointed stream");
>> +        rc = -1;
> Is it usual in migrv2 to set errno as well?

If a relevant errno is to be had.

There are a lot of cases which are waiting for some real libxc error
codes before they can propagate numeric error information, although in
all cases the log messages will be accurate (and hopefully helpful).

~Andrew

>
>> +        goto err;
>> +    }
>> +
>> +    if ( ctx->restore.buffer_all_records )
>> +    {
>> +        IPRINTF("All records buffered");
>> +
>> +        /*
>> +         * We need to set buffer_all_records to false in
>> +         * order to process records instead of buffer records.
>> +         * buffer_all_records should be set back to true after
>> +         * we successfully processed all records.
>> +         */
>> +        ctx->restore.buffer_all_records = false;
> I'm not personally a fan of changing global state in order to simulate
> the action of what should be a parameter to a function.
>
> Preferable IMHO would be to have process_record gain a parameter to
> override the ctx state but become an internal helper (perhaps with a
> name change) and then have API function process_record and
> process_buffered_records or some such which call it in the right way.
>
> Andy may have a differing opinion though.

Hmm yes - it would be nice to split the buffering logic away from the
processing logic.

However, the two are slightly related.

Perhaps a process_or_buffer_record() helper, and removing all buffering
logic from process_record().
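
Something like this, perhaps (a sketch only, reusing the names and
fields from the patch; buffer_record() is a made-up name):

static int buffer_record(struct xc_sr_context *ctx,
                         struct xc_sr_record *rec)
{
    xc_interface *xch = ctx->xch;

    if ( ctx->restore.buffered_rec_num >= MAX_BUF_RECORDS )
    {
        ERROR("There are too many records within a checkpoint");
        return -1;
    }

    memcpy(&ctx->restore.buffered_records[ctx->restore.buffered_rec_num++],
           rec, sizeof(*rec));

    return 0;
}

static int process_or_buffer_record(struct xc_sr_context *ctx,
                                    struct xc_sr_record *rec)
{
    /* Between checkpoints, everything except END and CHECKPOINT is
     * deferred into the buffer. */
    if ( ctx->restore.buffer_all_records &&
         rec->type != REC_TYPE_END &&
         rec->type != REC_TYPE_CHECKPOINT )
        return buffer_record(ctx, rec);

    return process_record(ctx, rec);
}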

~Andrew


* Re: [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore
  2015-05-14 13:17     ` Andrew Cooper
@ 2015-05-14 14:04       ` Ian Campbell
  2015-05-15  1:08         ` Yang Hongyang
  0 siblings, 1 reply; 21+ messages in thread
From: Ian Campbell @ 2015-05-14 14:04 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: wei.liu2, wency, ian.jackson, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram, Yang Hongyang

On Thu, 2015-05-14 at 14:17 +0100, Andrew Cooper wrote:
> On 14/05/15 14:05, Ian Campbell wrote:
> > On Thu, 2015-05-14 at 18:06 +0800, Yang Hongyang wrote:
> >> With Remus, the restore flow should be:
> >> the first full migration stream -> { periodically restore stream }
> >>
> >> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >> CC: Ian Campbell <Ian.Campbell@citrix.com>
> >> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> >> CC: Wei Liu <wei.liu2@citrix.com>
> >> ---
> >>  tools/libxc/xc_sr_common.h  |  14 ++++++
> >>  tools/libxc/xc_sr_restore.c | 113 ++++++++++++++++++++++++++++++++++++++++----
> >>  2 files changed, 117 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
> >> index f8121e7..3bf27f1 100644
> >> --- a/tools/libxc/xc_sr_common.h
> >> +++ b/tools/libxc/xc_sr_common.h
> >> @@ -208,6 +208,20 @@ struct xc_sr_context
> >>              /* Plain VM, or checkpoints over time. */
> >>              bool checkpointed;
> >>  
> >> +            /* Currently buffering records between a checkpoint */
> >> +            bool buffer_all_records;
> >> +
> >> +/*
> >> + * With Remus, we buffer the records sent by the primary at checkpoint,
> >> + * in case the primary will fail, we can recover from the last
> >> + * checkpoint state.
> >> + * This should be enough because primary only send dirty pages at
> >> + * checkpoint.
> > I'm not sure how it then follows that 1024 buffers is guaranteed to be
> > enough, unless there is something on the sending side arranging it to be
> > so?
> >
> >> + */
> >> +#define MAX_BUF_RECORDS 1024
> >> +            struct xc_sr_record *buffered_records;
> >> +            unsigned buffered_rec_num;
> >> +
> >>              /*
> >>               * Xenstore and Console parameters.
> >>               * INPUT:  evtchn & domid
> >> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
> >> index 9ab5760..8468ffc 100644
> >> --- a/tools/libxc/xc_sr_restore.c
> >> +++ b/tools/libxc/xc_sr_restore.c
> >> @@ -468,11 +468,69 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
> >>      return rc;
> >>  }
> >>  
> >> +static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec);
> >> +static int handle_checkpoint(struct xc_sr_context *ctx)
> >> +{
> >> +    xc_interface *xch = ctx->xch;
> >> +    int rc = 0;
> >> +    unsigned i;
> >> +
> >> +    if ( !ctx->restore.checkpointed )
> >> +    {
> >> +        ERROR("Found checkpoint in non-checkpointed stream");
> >> +        rc = -1;
> > Is it usual in migrv2 to set errno as well?
> 
> If a relevant errno is to be had.

EINVAL or ENOSYS perhaps?
> 
> There are a lot of cases which are waiting for some real libxc error
> codes before they can propagate numeric error information, although in
> all cases the log messages will be accurate (and hopefully helpful).
> 
> ~Andrew
> 
> >
> >> +        goto err;
> >> +    }
> >> +
> >> +    if ( ctx->restore.buffer_all_records )
> >> +    {
> >> +        IPRINTF("All records buffered");
> >> +
> >> +        /*
> >> +         * We need to set buffer_all_records to false in
> >> +         * order to process records instead of buffer records.
> >> +         * buffer_all_records should be set back to true after
> >> +         * we successfully processed all records.
> >> +         */
> >> +        ctx->restore.buffer_all_records = false;
> > I'm not personally a fan of changing global state in order to simulate
> > the action of what should be a parameter to a function.
> >
> > Preferable IMHO would be to have process_record gain a parameter to
> > override the ctx state but become an internal helper (perhaps with a
> > name change) and then have API function process_record and
> > process_buffered_records or some such which call it in the right way.
> >
> > Andy may have a differing opinion though.
> 
> Hmm yes - it would be nice to split the buffering logic away from the
> processing logic.
> 
> However, the two are slightly related.
> 
> Perhaps a process_or_buffer_record() helper, and removing all buffering
> logic from process_record().

That seems like it would work.


* Re: [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore
  2015-05-14 14:04       ` Ian Campbell
@ 2015-05-15  1:08         ` Yang Hongyang
  0 siblings, 0 replies; 21+ messages in thread
From: Yang Hongyang @ 2015-05-15  1:08 UTC (permalink / raw)
  To: Ian Campbell, Andrew Cooper
  Cc: wei.liu2, wency, ian.jackson, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram



On 05/14/2015 10:04 PM, Ian Campbell wrote:
> On Thu, 2015-05-14 at 14:17 +0100, Andrew Cooper wrote:
>> On 14/05/15 14:05, Ian Campbell wrote:
>>> On Thu, 2015-05-14 at 18:06 +0800, Yang Hongyang wrote:
>>>> With Remus, the restore flow should be:
>>>> the first full migration stream -> { periodically restore stream }
>>>>
>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>>>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
>>>> CC: Wei Liu <wei.liu2@citrix.com>
>>>> ---
>>>>   tools/libxc/xc_sr_common.h  |  14 ++++++
>>>>   tools/libxc/xc_sr_restore.c | 113 ++++++++++++++++++++++++++++++++++++++++----
>>>>   2 files changed, 117 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
>>>> index f8121e7..3bf27f1 100644
>>>> --- a/tools/libxc/xc_sr_common.h
>>>> +++ b/tools/libxc/xc_sr_common.h
>>>> @@ -208,6 +208,20 @@ struct xc_sr_context
>>>>               /* Plain VM, or checkpoints over time. */
>>>>               bool checkpointed;
>>>>
>>>> +            /* Currently buffering records between a checkpoint */
>>>> +            bool buffer_all_records;
>>>> +
>>>> +/*
>>>> + * With Remus, we buffer the records sent by the primary at checkpoint,
>>>> + * in case the primary will fail, we can recover from the last
>>>> + * checkpoint state.
>>>> + * This should be enough because primary only send dirty pages at
>>>> + * checkpoint.
>>> I'm not sure how it then follows that 1024 buffers is guaranteed to be
>>> enough, unless there is something on the sending side arranging it to be
>>> so?
>>>
>>>> + */
>>>> +#define MAX_BUF_RECORDS 1024
>>>> +            struct xc_sr_record *buffered_records;
>>>> +            unsigned buffered_rec_num;
>>>> +
>>>>               /*
>>>>                * Xenstore and Console parameters.
>>>>                * INPUT:  evtchn & domid
>>>> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
>>>> index 9ab5760..8468ffc 100644
>>>> --- a/tools/libxc/xc_sr_restore.c
>>>> +++ b/tools/libxc/xc_sr_restore.c
>>>> @@ -468,11 +468,69 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
>>>>       return rc;
>>>>   }
>>>>
>>>> +static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec);
>>>> +static int handle_checkpoint(struct xc_sr_context *ctx)
>>>> +{
>>>> +    xc_interface *xch = ctx->xch;
>>>> +    int rc = 0;
>>>> +    unsigned i;
>>>> +
>>>> +    if ( !ctx->restore.checkpointed )
>>>> +    {
>>>> +        ERROR("Found checkpoint in non-checkpointed stream");
>>>> +        rc = -1;
>>> Is it usual in migrv2 to set errno as well?
>>
>> If a relevant errno is to be had.
>
> EINVAL or ENOSYS perhaps?
>>
>> There are a lot of cases which are waiting for some real libxc error
>> codes before they can propagate numeric error information, although in
>> all cases the log messages will be accurate (and hopefully helpful).
>>
>> ~Andrew
>>
>>>
>>>> +        goto err;
>>>> +    }
>>>> +
>>>> +    if ( ctx->restore.buffer_all_records )
>>>> +    {
>>>> +        IPRINTF("All records buffered");
>>>> +
>>>> +        /*
>>>> +         * We need to set buffer_all_records to false in
>>>> +         * order to process records instead of buffer records.
>>>> +         * buffer_all_records should be set back to true after
>>>> +         * we successfully processed all records.
>>>> +         */
>>>> +        ctx->restore.buffer_all_records = false;
>>> I'm not personally a fan of changing global state in order to simulate
>>> the action of what should be a parameter to a function.
>>>
>>> Preferable IMHO would be to have process_record gain a parameter to
>>> override the ctx state but become an internal helper (perhaps with a
>>> name change) and then have API function process_record and
>>> process_buffered_records or some such which call it in the right way.
>>>
>>> Andy may have a differing opinion though.
>>
>> Hmm yes - it would be nice to split the buffering logic away from the
>> processing logic.
>>
>> However, the two are slightly related.
>>
>> Perhaps a process_or_buffer_record() helper, and removing all buffering
>> logic from process_record().
>
> That seems like it would work.

Good idea, adding a buffer_record() helper should be an improvement to the
code and make it clearer, thank you!

>
>
>
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore
  2015-05-14 13:05   ` Ian Campbell
  2015-05-14 13:17     ` Andrew Cooper
@ 2015-05-15  1:32     ` Yang Hongyang
  2015-05-15  9:09       ` Ian Campbell
  1 sibling, 1 reply; 21+ messages in thread
From: Yang Hongyang @ 2015-05-15  1:32 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, eddie.dong, wency, andrew.cooper3, yunhong.jiang,
	ian.jackson, xen-devel, guijianfeng, rshriram



On 05/14/2015 09:05 PM, Ian Campbell wrote:
> On Thu, 2015-05-14 at 18:06 +0800, Yang Hongyang wrote:
>> With Remus, the restore flow should be:
>> the first full migration stream -> { periodically restore stream }
>>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> ---
>>   tools/libxc/xc_sr_common.h  |  14 ++++++
>>   tools/libxc/xc_sr_restore.c | 113 ++++++++++++++++++++++++++++++++++++++++----
>>   2 files changed, 117 insertions(+), 10 deletions(-)
>>
>> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
>> index f8121e7..3bf27f1 100644
>> --- a/tools/libxc/xc_sr_common.h
>> +++ b/tools/libxc/xc_sr_common.h
>> @@ -208,6 +208,20 @@ struct xc_sr_context
>>               /* Plain VM, or checkpoints over time. */
>>               bool checkpointed;
>>
>> +            /* Currently buffering records between a checkpoint */
>> +            bool buffer_all_records;
>> +
>> +/*
>> + * With Remus, we buffer the records sent by the primary at checkpoint,
>> + * in case the primary will fail, we can recover from the last
>> + * checkpoint state.
>> + * This should be enough because primary only send dirty pages at
>> + * checkpoint.
>
> I'm not sure how it then follows that 1024 buffers is guaranteed to be
> enough, unless there is something on the sending side arranging it to be
> so?

There are only a few records at each checkpoint in my tests, mostly under 10,
probably because I don't do many operations in the guest. I thought this limit
could be adjusted later after further testing.
Since you and Andy both have doubts about this, I have reconsidered;
perhaps there should be no limit. Even if the 1024 limit works for
most cases, there might be cases that exceed it. So I will
add another member, 'allocated_rec_num', to the context; when
'buffered_rec_num' exceeds 'allocated_rec_num', I will reallocate the buffer.
The initial buffer size will be 1024 records, which should work for most cases.

/*
 * With Remus, we buffer the records sent by the primary at a checkpoint,
 * so that if the primary fails, we can recover from the last
 * checkpoint state.
 * This should be enough for most cases because the primary only sends
 * dirty pages at a checkpoint.
 */
#define DEFAULT_BUF_RECORDS 1024
            struct xc_sr_record *buffered_records;
            unsigned allocated_rec_num;
            unsigned buffered_rec_num;
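
Roughly, the reallocation would look like this (a sketch only,
untested; buffer_record() is the helper name suggested in the other
subthread):

static int buffer_record(struct xc_sr_context *ctx,
                         struct xc_sr_record *rec)
{
    xc_interface *xch = ctx->xch;
    unsigned new_alloc_num;
    struct xc_sr_record *p;

    if ( ctx->restore.buffered_rec_num >= ctx->restore.allocated_rec_num )
    {
        /* Grow the buffer by another DEFAULT_BUF_RECORDS entries. */
        new_alloc_num = ctx->restore.allocated_rec_num + DEFAULT_BUF_RECORDS;
        p = realloc(ctx->restore.buffered_records,
                    new_alloc_num * sizeof(struct xc_sr_record));
        if ( !p )
        {
            ERROR("Failed to realloc memory for buffered records");
            return -1;
        }

        ctx->restore.buffered_records = p;
        ctx->restore.allocated_rec_num = new_alloc_num;
    }

    memcpy(&ctx->restore.buffered_records[ctx->restore.buffered_rec_num++],
           rec, sizeof(*rec));

    return 0;
}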

>
>> + */
>> +#define MAX_BUF_RECORDS 1024
>> +            struct xc_sr_record *buffered_records;
>> +            unsigned buffered_rec_num;
>> +
>>               /*
>>                * Xenstore and Console parameters.
>>                * INPUT:  evtchn & domid
>> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
>> index 9ab5760..8468ffc 100644
>> --- a/tools/libxc/xc_sr_restore.c
>> +++ b/tools/libxc/xc_sr_restore.c
>> @@ -468,11 +468,69 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
>>       return rc;
>>   }
>>
>> +static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec);
>> +static int handle_checkpoint(struct xc_sr_context *ctx)
>> +{
>> +    xc_interface *xch = ctx->xch;
>> +    int rc = 0;
>> +    unsigned i;
>> +
>> +    if ( !ctx->restore.checkpointed )
>> +    {
>> +        ERROR("Found checkpoint in non-checkpointed stream");
>> +        rc = -1;
>
> Is it usual in migrv2 to set errno as well?
>
>> +        goto err;
>> +    }
>> +
>> +    if ( ctx->restore.buffer_all_records )
>> +    {
>> +        IPRINTF("All records buffered");
>> +
>> +        /*
>> +         * We need to set buffer_all_records to false in
>> +         * order to process records instead of buffer records.
>> +         * buffer_all_records should be set back to true after
>> +         * we successfully processed all records.
>> +         */
>> +        ctx->restore.buffer_all_records = false;
>
> I'm not personally a fan of changing global state in order to simulate
> the action of what should be a parameter to a function.
>
> Preferable IMHO would be to have process_record gain a parameter to
> override the ctx state but become an internal helper (perhaps with a
> name change) and then have API function process_record and
> process_buffered_records or some such which call it in the right way.
>
> Andy may have a differing opinion though.
>
>
>
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH Remus v5 1/2] libxc/save: implement Remus checkpointed save
  2015-05-14 12:47   ` Ian Campbell
@ 2015-05-15  2:12     ` Yang Hongyang
  2015-05-15  8:29       ` Andrew Cooper
  0 siblings, 1 reply; 21+ messages in thread
From: Yang Hongyang @ 2015-05-15  2:12 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, eddie.dong, wency, andrew.cooper3, yunhong.jiang,
	ian.jackson, xen-devel, guijianfeng, rshriram



On 05/14/2015 08:47 PM, Ian Campbell wrote:
> On Thu, 2015-05-14 at 18:06 +0800, Yang Hongyang wrote:
>> With Remus, the save flow should be:
>> live migration->{ periodically save(checkpointed save) }
>>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> ---
>>   tools/libxc/xc_sr_save.c | 80 ++++++++++++++++++++++++++++++++++++------------
>>   1 file changed, 61 insertions(+), 19 deletions(-)
>>
>> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
>> index 1d0a46d..1c5d199 100644
>> --- a/tools/libxc/xc_sr_save.c
>> +++ b/tools/libxc/xc_sr_save.c
>> @@ -57,6 +57,16 @@ static int write_end_record(struct xc_sr_context *ctx)
>>   }
>>
>>   /*
>> + * Writes an CHECKPOINT record into the stream.
>
> "a CHECKPOINT"
>
>> + */
>> +static int write_checkpoint_record(struct xc_sr_context *ctx)
>> +{
>> +    struct xc_sr_record checkpoint = { REC_TYPE_CHECKPOINT, 0, NULL };
>> +
>> +    return write_record(ctx, &checkpoint);
>> +}
>> +
>> +/*
>>    * Writes a batch of memory as a PAGE_DATA record into the stream.  The batch
>>    * is constructed in ctx->save.batch_pfns.
>>    *
>> @@ -467,6 +477,14 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
>>       DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>>                                       &ctx->save.dirty_bitmap_hbuf);
>>
>> +    /*
>> +     * With Remus, we will enter checkpointed save after live migration.
>> +     * In checkpointed save loop, we skip the live part and pause straight
>> +     * away to send dirty pages between checkpoints.
>> +     */
>> +    if ( !ctx->save.live )
>> +        goto last_iter;
>
> Rather than use goto would it work to refactor everything from here to
> the label into some sort of helper and just call that in the "actually
> live" case?
>
> Or perhaps everything from the label to the end should be a helper
> function which the caller can also use in the checkpoint case instead of
> calling send_domain_memory_live (and which s_d_m_l also calls of
> course).

I'm going to refactor send_domain_memory_live() as follows:

split send_domain_memory_live() into three helper functions:
   - send_memory_live()  does the actual live case
   - suspend_and_send_dirty()  suspends the guest and sends dirty pages
   - send_memory_verify()

then:
   - send_domain_memory_live()  combines those three helper functions
   - send_domain_memory_checkpointed()  calls suspend_and_send_dirty() and
                                        send_memory_verify()
   - send_domain_memory_nonlive()  stays as it is

Does that make sense?
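
Roughly, the call structure I have in mind (a sketch only, untested;
helper bodies elided, and send_memory_verify() stays under
ctx->save.debug as it does today):

static int send_domain_memory_live(struct xc_sr_context *ctx)
{
    int rc;

    /* The live iterations of sending dirty memory. */
    rc = send_memory_live(ctx);
    if ( rc )
        return rc;

    /* Final pause, then send the remaining dirty pages. */
    rc = suspend_and_send_dirty(ctx);
    if ( rc )
        return rc;

    if ( ctx->save.debug )
        rc = send_memory_verify(ctx);

    return rc;
}

static int send_domain_memory_checkpointed(struct xc_sr_context *ctx)
{
    int rc;

    /* No live phase: pause straight away and send the dirty pages. */
    rc = suspend_and_send_dirty(ctx);
    if ( rc )
        return rc;

    if ( ctx->save.debug )
        rc = send_memory_verify(ctx);

    return rc;
}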

>
>> +        if ( ctx->save.checkpointed )
>> +        {
>> +            if ( ctx->save.live )
>> +            {
>> +                /* End of live migration, we are sending checkpointed stream */
>> +                ctx->save.live = false;
>
> I think I'm misunderstanding either the purpose of this code or the
> comment (or both).
>
> Is it the case that a checkpoint starts with an iteration of live (to
> transfer everything over) and then drops into sending periodical
> non-live updates at each checkpoint?
>
> If so then I think a more useful comment would be:
>
>          /*
>           * We have now completed the initial live portion of the checkpoint
>           * process. Therefore switch into periodically sending synchronous
>           * batches of pages.
>           */
>

This is much better, Thank you!

> Personally I don't have a problem with just a direct assignment without
> the if, since assigning false to an already false value is a nop.

Will drop the if.

>
>> +            }
>> +
>> +            rc = write_checkpoint_record(ctx);
>> +            if ( rc )
>> +                goto err;
>> +
>> +            ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
>> +
>> +            rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
>> +            if ( rc > 0 )
>> +                xc_report_progress_single(xch, "Checkpointed save");
>> +            else
>> +                ctx->save.checkpointed = false;
>> +        }
>> +    } while ( ctx->save.checkpointed );
>>
>>       xc_report_progress_single(xch, "End of stream");
>>
>
>
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH Remus v5 1/2] libxc/save: implement Remus checkpointed save
  2015-05-15  2:12     ` Yang Hongyang
@ 2015-05-15  8:29       ` Andrew Cooper
  2015-05-15  8:37         ` Yang Hongyang
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2015-05-15  8:29 UTC (permalink / raw)
  To: Yang Hongyang, Ian Campbell
  Cc: wei.liu2, wency, ian.jackson, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram

On 15/05/2015 03:12, Yang Hongyang wrote:
>
>>> @@ -467,6 +477,14 @@ static int send_domain_memory_live(struct
>>> xc_sr_context *ctx)
>>>       DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>>>                                       &ctx->save.dirty_bitmap_hbuf);
>>>
>>> +    /*
>>> +     * With Remus, we will enter checkpointed save after live
>>> migration.
>>> +     * In checkpointed save loop, we skip the live part and pause
>>> straight
>>> +     * away to send dirty pages between checkpoints.
>>> +     */
>>> +    if ( !ctx->save.live )
>>> +        goto last_iter;
>>
>> Rather than use goto would it work to refactor everything from here to
>> the label into some sort of helper and just call that in the "actually
>> live" case?
>>
>> Or perhaps everything from the label to the end should be a helper
>> function which the caller can also use in the checkpoint case instead of
>> calling send_domain_memory_live (and which s_d_m_l also calls of
>> course).
>
> I'm going to refactor send_domain_memory_live() as follows:
>
> split send_domain_memory_live() into three helper functions:
>   - send_memory_live()  does the actual live case
>   - suspend_and_send_dirty()  suspends the guest and sends dirty pages
>   - send_memory_verify()
>
> then:
>   - send_domain_memory_live()  combines those three helper functions
>   - send_domain_memory_checkpointed()  calls suspend_and_send_dirty() and
>                                        send_memory_verify()
>   - send_domain_memory_nonlive()  stays as it is
> Does that make sense?

verify mode is only a debugging tool, and shouldn't be used in general. 
(It is actually a lot less useful in retrospect than one would imagine.)

However, the rest of the split looks plausible.

~Andrew


* Re: [PATCH Remus v5 1/2] libxc/save: implement Remus checkpointed save
  2015-05-15  8:29       ` Andrew Cooper
@ 2015-05-15  8:37         ` Yang Hongyang
  2015-05-15  9:31           ` Andrew Cooper
  0 siblings, 1 reply; 21+ messages in thread
From: Yang Hongyang @ 2015-05-15  8:37 UTC (permalink / raw)
  To: Andrew Cooper, Ian Campbell
  Cc: wei.liu2, wency, ian.jackson, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram



On 05/15/2015 04:29 PM, Andrew Cooper wrote:
> On 15/05/2015 03:12, Yang Hongyang wrote:
>>
>>>> @@ -467,6 +477,14 @@ static int send_domain_memory_live(struct
>>>> xc_sr_context *ctx)
>>>>        DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>>>>                                        &ctx->save.dirty_bitmap_hbuf);
>>>>
>>>> +    /*
>>>> +     * With Remus, we will enter checkpointed save after live
>>>> migration.
>>>> +     * In checkpointed save loop, we skip the live part and pause
>>>> straight
>>>> +     * away to send dirty pages between checkpoints.
>>>> +     */
>>>> +    if ( !ctx->save.live )
>>>> +        goto last_iter;
>>>
>>> Rather than use goto would it work to refactor everything from here to
>>> the label into some sort of helper and just call that in the "actually
>>> live" case?
>>>
>>> Or perhaps everything from the label to the end should be a helper
>>> function which the caller can also use in the checkpoint case instead of
>>> calling send_domain_memory_live (and which s_d_m_l also calls of
>>> course).
>>
>> I'm going to refactor send_domain_memory_live() as follows:
>>
>> split send_domain_memory_live() into three helper functions:
>>    - send_memory_live()  does the actual live case
>>    - suspend_and_send_dirty()  suspends the guest and sends dirty pages
>>    - send_memory_verify()
>>
>> then:
>>    - send_domain_memory_live()  combines those three helper functions
>>    - send_domain_memory_checkpointed()  calls suspend_and_send_dirty() and
>>                                         send_memory_verify()
>>    - send_domain_memory_nonlive()  stays as it is
>> Does that make sense?
>
> verify mode is only a debugging tool, and shouldn't be used in general.
> (It is actually a lot less useful in retrospect than one would imagine.)

Do you mean we do not need this in the checkpointed stream?
Currently, it is conditioned under 'if ( ctx->save.debug )', both in
live migration and in the checkpointed stream.

>
> However, the rest of the split looks plausible.
>
> ~Andrew
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore
  2015-05-15  1:32     ` Yang Hongyang
@ 2015-05-15  9:09       ` Ian Campbell
  2015-05-15  9:19         ` Yang Hongyang
  0 siblings, 1 reply; 21+ messages in thread
From: Ian Campbell @ 2015-05-15  9:09 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, eddie.dong, wency, andrew.cooper3, yunhong.jiang,
	ian.jackson, xen-devel, guijianfeng, rshriram

On Fri, 2015-05-15 at 09:32 +0800, Yang Hongyang wrote:
> 
> On 05/14/2015 09:05 PM, Ian Campbell wrote:
> > On Thu, 2015-05-14 at 18:06 +0800, Yang Hongyang wrote:
> >> With Remus, the restore flow should be:
> >> the first full migration stream -> { periodically restore stream }
> >>
> >> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >> CC: Ian Campbell <Ian.Campbell@citrix.com>
> >> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> >> CC: Wei Liu <wei.liu2@citrix.com>
> >> ---
> >>   tools/libxc/xc_sr_common.h  |  14 ++++++
> >>   tools/libxc/xc_sr_restore.c | 113 ++++++++++++++++++++++++++++++++++++++++----
> >>   2 files changed, 117 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
> >> index f8121e7..3bf27f1 100644
> >> --- a/tools/libxc/xc_sr_common.h
> >> +++ b/tools/libxc/xc_sr_common.h
> >> @@ -208,6 +208,20 @@ struct xc_sr_context
> >>               /* Plain VM, or checkpoints over time. */
> >>               bool checkpointed;
> >>
> >> +            /* Currently buffering records between a checkpoint */
> >> +            bool buffer_all_records;
> >> +
> >> +/*
> >> + * With Remus, we buffer the records sent by the primary at checkpoint,
> >> + * in case the primary will fail, we can recover from the last
> >> + * checkpoint state.
> >> + * This should be enough because primary only send dirty pages at
> >> + * checkpoint.
> >
> > I'm not sure how it then follows that 1024 buffers is guaranteed to be
> > enough, unless there is something on the sending side arranging it to be
> > so?
> 
> There are only a few records at every checkpoint in my tests, mostly under 10,
> probably because I don't do many operations in the guest. I thought this limit
> could be adjusted later after further testing.

For some reason I thought these buffers included the page data, is that
not true? I was expecting the bulk of the records to be dirty page data.

> Since you and Andy both have doubts about this, I have to reconsider;
> perhaps there should be no limit. Even if the 1024 limit works for
> most cases, there might be cases that exceed it. So I will
> add another member, 'allocated_rec_num', to the context; when
> 'buffered_rec_num' exceeds 'allocated_rec_num', I will reallocate the buffer.
> The initial buffer size will be 1024 records, which will work for most cases.

That seems easy enough to be worth doing even if I was wrong about the
page data.
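
A minimal sketch of that grow-on-demand buffering (the counters
'buffered_rec_num' and 'allocated_rec_num' are named in the mail above;
the array name and the doubling policy are assumptions for
illustration):

    /* Append one record to the checkpoint buffer, growing it on demand.
     * Per the plan above, allocated_rec_num starts at 1024. */
    static int buffer_record(struct xc_sr_context *ctx,
                             const struct xc_sr_record *rec)
    {
        if ( ctx->restore.buffered_rec_num >= ctx->restore.allocated_rec_num )
        {
            unsigned int new_alloc_num = ctx->restore.allocated_rec_num * 2;
            struct xc_sr_record *p = realloc(ctx->restore.buffered_records,
                                             new_alloc_num * sizeof(*p));

            if ( !p )
                return -1;

            ctx->restore.buffered_records = p;
            ctx->restore.allocated_rec_num = new_alloc_num;
        }

        ctx->restore.buffered_records[ctx->restore.buffered_rec_num++] = *rec;
        return 0;
    }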

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore
  2015-05-15  9:09       ` Ian Campbell
@ 2015-05-15  9:19         ` Yang Hongyang
  2015-05-15  9:27           ` Ian Campbell
  0 siblings, 1 reply; 21+ messages in thread
From: Yang Hongyang @ 2015-05-15  9:19 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, eddie.dong, wency, andrew.cooper3, yunhong.jiang,
	ian.jackson, xen-devel, guijianfeng, rshriram



On 05/15/2015 05:09 PM, Ian Campbell wrote:
> On Fri, 2015-05-15 at 09:32 +0800, Yang Hongyang wrote:
>>
>> On 05/14/2015 09:05 PM, Ian Campbell wrote:
>>> On Thu, 2015-05-14 at 18:06 +0800, Yang Hongyang wrote:
>>>> With Remus, the restore flow should be:
>>>> the first full migration stream -> { periodically restore stream }
>>>>
>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>>>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
>>>> CC: Wei Liu <wei.liu2@citrix.com>
>>>> ---
>>>>    tools/libxc/xc_sr_common.h  |  14 ++++++
>>>>    tools/libxc/xc_sr_restore.c | 113 ++++++++++++++++++++++++++++++++++++++++----
>>>>    2 files changed, 117 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
>>>> index f8121e7..3bf27f1 100644
>>>> --- a/tools/libxc/xc_sr_common.h
>>>> +++ b/tools/libxc/xc_sr_common.h
>>>> @@ -208,6 +208,20 @@ struct xc_sr_context
>>>>                /* Plain VM, or checkpoints over time. */
>>>>                bool checkpointed;
>>>>
>>>> +            /* Currently buffering records between a checkpoint */
>>>> +            bool buffer_all_records;
>>>> +
>>>> +/*
>>>> + * With Remus, we buffer the records sent by the primary at checkpoint,
>>>> + * in case the primary will fail, we can recover from the last
>>>> + * checkpoint state.
>>>> + * This should be enough because primary only send dirty pages at
>>>> + * checkpoint.
>>>
>>> I'm not sure how it then follows that 1024 buffers is guaranteed to be
>>> enough, unless there is something on the sending side arranging it to be
>>> so?
>>
>> There are only a few records at every checkpoint in my tests, mostly under 10,
>> probably because I don't do many operations in the guest. I thought this limit
>> could be adjusted later after further testing.
>
> For some reason I thought these buffers included the page data, is that
> not true? I was expecting the bulk of the records to be dirty page data.

The page data is not stored in this buffer; only its pointer is
(rec->data). What the buffer holds is the struct xc_sr_record entries.

>
>> Since you and Andy both have doubts about this, I have to reconsider;
>> perhaps there should be no limit. Even if the 1024 limit works for
>> most cases, there might be cases that exceed it. So I will
>> add another member, 'allocated_rec_num', to the context; when
>> 'buffered_rec_num' exceeds 'allocated_rec_num', I will reallocate the buffer.
>> The initial buffer size will be 1024 records, which will work for most cases.
>
> That seems easy enough to be worth doing even if I was wrong about the
> page data.

done.

>
>
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore
  2015-05-15  9:19         ` Yang Hongyang
@ 2015-05-15  9:27           ` Ian Campbell
  2015-05-15  9:34             ` Andrew Cooper
  2015-05-15  9:34             ` Yang Hongyang
  0 siblings, 2 replies; 21+ messages in thread
From: Ian Campbell @ 2015-05-15  9:27 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, eddie.dong, wency, andrew.cooper3, yunhong.jiang,
	ian.jackson, xen-devel, guijianfeng, rshriram

On Fri, 2015-05-15 at 17:19 +0800, Yang Hongyang wrote:
> 
> On 05/15/2015 05:09 PM, Ian Campbell wrote:
> > On Fri, 2015-05-15 at 09:32 +0800, Yang Hongyang wrote:
> >>
> >> On 05/14/2015 09:05 PM, Ian Campbell wrote:
> >>> On Thu, 2015-05-14 at 18:06 +0800, Yang Hongyang wrote:
> >>>> With Remus, the restore flow should be:
> >>>> the first full migration stream -> { periodically restore stream }
> >>>>
> >>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> >>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >>>> CC: Ian Campbell <Ian.Campbell@citrix.com>
> >>>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> >>>> CC: Wei Liu <wei.liu2@citrix.com>
> >>>> ---
> >>>>    tools/libxc/xc_sr_common.h  |  14 ++++++
> >>>>    tools/libxc/xc_sr_restore.c | 113 ++++++++++++++++++++++++++++++++++++++++----
> >>>>    2 files changed, 117 insertions(+), 10 deletions(-)
> >>>>
> >>>> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
> >>>> index f8121e7..3bf27f1 100644
> >>>> --- a/tools/libxc/xc_sr_common.h
> >>>> +++ b/tools/libxc/xc_sr_common.h
> >>>> @@ -208,6 +208,20 @@ struct xc_sr_context
> >>>>                /* Plain VM, or checkpoints over time. */
> >>>>                bool checkpointed;
> >>>>
> >>>> +            /* Currently buffering records between a checkpoint */
> >>>> +            bool buffer_all_records;
> >>>> +
> >>>> +/*
> >>>> + * With Remus, we buffer the records sent by the primary at checkpoint,
> >>>> + * in case the primary will fail, we can recover from the last
> >>>> + * checkpoint state.
> >>>> + * This should be enough because primary only send dirty pages at
> >>>> + * checkpoint.
> >>>
> >>> I'm not sure how it then follows that 1024 buffers is guaranteed to be
> >>> enough, unless there is something on the sending side arranging it to be
> >>> so?
> >>
> >> There are only a few records at every checkpoint in my tests, mostly under 10,
> >> probably because I don't do many operations in the guest. I thought this limit
> >> could be adjusted later after further testing.
> >
> > For some reason I thought these buffers included the page data, is that
> > not true? I was expecting the bulk of the records to be dirty page data.
> 
> The page data is not stored in this buffer; only its pointer is
> (rec->data). What the buffer holds is the struct xc_sr_record entries.

OK, so there are (approximately) as many xc_sr_records as there are
buffered dirty pages? I'd expect this would easily reach 1024 in some
circumstances (e.g. run a fork bomb in the domain or something).

> >> Since you and Andy both have doubts about this, I have to reconsider;
> >> perhaps there should be no limit. Even if the 1024 limit works for
> >> most cases, there might be cases that exceed it. So I will
> >> add another member, 'allocated_rec_num', to the context; when
> >> 'buffered_rec_num' exceeds 'allocated_rec_num', I will reallocate the buffer.
> >> The initial buffer size will be 1024 records, which will work for most cases.
> >
> > That seems easy enough to be worth doing even if I was wrong about the
> > page data.
> 
> done.
> 
> >
> >
> > .
> >
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH Remus v5 1/2] libxc/save: implement Remus checkpointed save
  2015-05-15  8:37         ` Yang Hongyang
@ 2015-05-15  9:31           ` Andrew Cooper
  2015-05-15  9:40             ` Yang Hongyang
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2015-05-15  9:31 UTC (permalink / raw)
  To: Yang Hongyang, Ian Campbell
  Cc: wei.liu2, wency, ian.jackson, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram

On 15/05/15 09:37, Yang Hongyang wrote:
>
>
> On 05/15/2015 04:29 PM, Andrew Cooper wrote:
>> On 15/05/2015 03:12, Yang Hongyang wrote:
>>>
>>>>> @@ -467,6 +477,14 @@ static int send_domain_memory_live(struct
>>>>> xc_sr_context *ctx)
>>>>>        DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>>>>>                                        &ctx->save.dirty_bitmap_hbuf);
>>>>>
>>>>> +    /*
>>>>> +     * With Remus, we will enter checkpointed save after live
>>>>> migration.
>>>>> +     * In checkpointed save loop, we skip the live part and pause
>>>>> straight
>>>>> +     * away to send dirty pages between checkpoints.
>>>>> +     */
>>>>> +    if ( !ctx->save.live )
>>>>> +        goto last_iter;
>>>>
>>>> Rather than use goto would it work to refactor everything from here to
>>>> the label into some sort of helper and just call that in the "actually
>>>> live" case?
>>>>
>>>> Or perhaps everything from the label to the end should be a helper
>>>> function which the caller can also use in the checkpoint case
>>>> instead of
>>>> calling send_domain_memory_live (and which s_d_m_l also calls of
>>>> course).
>>>
>>> I'm going to refactor the send_domain_memory_live() as follows:
>>>
>>> split send_domain_memory_live() into three helper functions:
>>>    - send_memory_live()  handles the actually-live case
>>>    - suspend_and_send_dirty() suspends the guest and sends dirty pages
>>>    - send_memory_verify()
>>>
>>> then:
>>>    - send_domain_memory_live() is a combination of those three
>>> helpers
>>>    - send_domain_memory_checkpointed() calls
>>> suspend_and_send_dirty() and
>>>                                        send_memory_verify()
>>>    - send_domain_memory_nonlive() stays as it is
>>> Does that make sense?
>>
>> verify mode is only a debugging tool, and shouldn't be used in general.
>> (It is actually a lot less useful in retrospect than one would imagine.)
>
> Do you mean we do not need this in a checkpointed stream?
> Currently, it is conditional on 'if ( ctx->save.debug )' in both
> live migration and the checkpointed stream.

Ah - you absolutely don't want it in a checkpointed stream.  As soon as
that option has been set, the restoring side can no longer resume the
domain.

Basically, it tells the restore side to start doing memcmp() instead of
memcpy() on the incoming page data.  The initial hope was to detect
memory corruption, but it turns out that there is natural memory
corruption to be had anyway when PV drivers are in use.

(This is another area which is kept around for compatibility with the
legacy stream, rather than being of much use in practice.)

~Andrew
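
To make the distinction concrete, the restore-side behaviour being
described is roughly the following (a sketch only; the variable names
are illustrative assumptions, not the actual code):

    /* In verify mode, compare the incoming page against guest memory
     * instead of copying it in, so the guest's memory is never updated
     * and the domain cannot be resumed from such a stream. */
    if ( ctx->restore.verify )
    {
        if ( memcmp(guest_page, stream_page, PAGE_SIZE) )
            ERROR("Verify failed for pfn %#lx", pfn);
    }
    else
        memcpy(guest_page, stream_page, PAGE_SIZE);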

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore
  2015-05-15  9:27           ` Ian Campbell
@ 2015-05-15  9:34             ` Andrew Cooper
  2015-05-15  9:34             ` Yang Hongyang
  1 sibling, 0 replies; 21+ messages in thread
From: Andrew Cooper @ 2015-05-15  9:34 UTC (permalink / raw)
  To: Ian Campbell, Yang Hongyang
  Cc: wei.liu2, wency, ian.jackson, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram

On 15/05/15 10:27, Ian Campbell wrote:
> On Fri, 2015-05-15 at 17:19 +0800, Yang Hongyang wrote:
>> On 05/15/2015 05:09 PM, Ian Campbell wrote:
>>> On Fri, 2015-05-15 at 09:32 +0800, Yang Hongyang wrote:
>>>> On 05/14/2015 09:05 PM, Ian Campbell wrote:
>>>>> On Thu, 2015-05-14 at 18:06 +0800, Yang Hongyang wrote:
>>>>>> With Remus, the restore flow should be:
>>>>>> the first full migration stream -> { periodically restore stream }
>>>>>>
>>>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>>>>>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
>>>>>> CC: Wei Liu <wei.liu2@citrix.com>
>>>>>> ---
>>>>>>    tools/libxc/xc_sr_common.h  |  14 ++++++
>>>>>>    tools/libxc/xc_sr_restore.c | 113 ++++++++++++++++++++++++++++++++++++++++----
>>>>>>    2 files changed, 117 insertions(+), 10 deletions(-)
>>>>>>
>>>>>> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
>>>>>> index f8121e7..3bf27f1 100644
>>>>>> --- a/tools/libxc/xc_sr_common.h
>>>>>> +++ b/tools/libxc/xc_sr_common.h
>>>>>> @@ -208,6 +208,20 @@ struct xc_sr_context
>>>>>>                /* Plain VM, or checkpoints over time. */
>>>>>>                bool checkpointed;
>>>>>>
>>>>>> +            /* Currently buffering records between a checkpoint */
>>>>>> +            bool buffer_all_records;
>>>>>> +
>>>>>> +/*
>>>>>> + * With Remus, we buffer the records sent by the primary at checkpoint,
>>>>>> + * in case the primary will fail, we can recover from the last
>>>>>> + * checkpoint state.
>>>>>> + * This should be enough because primary only send dirty pages at
>>>>>> + * checkpoint.
>>>>> I'm not sure how it then follows that 1024 buffers is guaranteed to be
>>>>> enough, unless there is something on the sending side arranging it to be
>>>>> so?
>>>> There are only a few records at every checkpoint in my tests, mostly under 10,
>>>> probably because I don't do many operations in the guest. I thought this limit
>>>> could be adjusted later after further testing.
>>> For some reason I thought these buffers included the page data, is that
>>> not true? I was expecting the bulk of the records to be dirty page data.
>> The page data is not stored in this buffer; only its pointer is
>> (rec->data). What the buffer holds is the struct xc_sr_record entries.
> OK, so there are (approximately) as many xc_sr_records as there are
> buffered dirty pages?

Every PAGE_DATA record can have as many pages in it as it wishes,
subject to the (arbitrary) 8MB max record length.

The sending side will typically put 1024 pages in an individual record.

~Andrew
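
(For scale: assuming 4KiB pages, a full record of 1024 pages carries
1024 x 4096 bytes = 4MB of page data, comfortably within the 8MB record
length limit.)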

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore
  2015-05-15  9:27           ` Ian Campbell
  2015-05-15  9:34             ` Andrew Cooper
@ 2015-05-15  9:34             ` Yang Hongyang
  2015-05-15  9:43               ` Ian Campbell
  1 sibling, 1 reply; 21+ messages in thread
From: Yang Hongyang @ 2015-05-15  9:34 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, eddie.dong, wency, andrew.cooper3, yunhong.jiang,
	ian.jackson, xen-devel, guijianfeng, rshriram



On 05/15/2015 05:27 PM, Ian Campbell wrote:
> On Fri, 2015-05-15 at 17:19 +0800, Yang Hongyang wrote:
>>
>> On 05/15/2015 05:09 PM, Ian Campbell wrote:
>>> On Fri, 2015-05-15 at 09:32 +0800, Yang Hongyang wrote:
>>>>
>>>> On 05/14/2015 09:05 PM, Ian Campbell wrote:
>>>>> On Thu, 2015-05-14 at 18:06 +0800, Yang Hongyang wrote:
>>>>>> With Remus, the restore flow should be:
>>>>>> the first full migration stream -> { periodically restore stream }
>>>>>>
>>>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>>>>>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
>>>>>> CC: Wei Liu <wei.liu2@citrix.com>
>>>>>> ---
>>>>>>     tools/libxc/xc_sr_common.h  |  14 ++++++
>>>>>>     tools/libxc/xc_sr_restore.c | 113 ++++++++++++++++++++++++++++++++++++++++----
>>>>>>     2 files changed, 117 insertions(+), 10 deletions(-)
>>>>>>
>>>>>> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
>>>>>> index f8121e7..3bf27f1 100644
>>>>>> --- a/tools/libxc/xc_sr_common.h
>>>>>> +++ b/tools/libxc/xc_sr_common.h
>>>>>> @@ -208,6 +208,20 @@ struct xc_sr_context
>>>>>>                 /* Plain VM, or checkpoints over time. */
>>>>>>                 bool checkpointed;
>>>>>>
>>>>>> +            /* Currently buffering records between a checkpoint */
>>>>>> +            bool buffer_all_records;
>>>>>> +
>>>>>> +/*
>>>>>> + * With Remus, we buffer the records sent by the primary at checkpoint,
>>>>>> + * in case the primary will fail, we can recover from the last
>>>>>> + * checkpoint state.
>>>>>> + * This should be enough because primary only send dirty pages at
>>>>>> + * checkpoint.
>>>>>
>>>>> I'm not sure how it then follows that 1024 buffers is guaranteed to be
>>>>> enough, unless there is something on the sending side arranging it to be
>>>>> so?
>>>>
>>>> There are only a few records at every checkpoint in my tests, mostly under 10,
>>>> probably because I don't do many operations in the guest. I thought this limit
>>>> could be adjusted later after further testing.
>>>
>>> For some reason I thought these buffers included the page data, is that
>>> not true? I was expecting the bulk of the records to be dirty page data.
>>
>> The page data is not stored in this buffer; only its pointer is
>> (rec->data). What the buffer holds is the struct xc_sr_record entries.
>
> OK, so there are (approximately) as many xc_sr_records as there are
> buffered dirty pages? I'd expect this would easily reach 1024 in some
> circumstances (e.g. run a fork bomb in the domain or something).

No, a record may contain up to 1024 pages, so the number of records is
smaller than the number of dirty pages.

>
>>>> Since you and Andy both have doubts about this, I have to reconsider;
>>>> perhaps there should be no limit. Even if the 1024 limit works for
>>>> most cases, there might be cases that exceed it. So I will
>>>> add another member, 'allocated_rec_num', to the context; when
>>>> 'buffered_rec_num' exceeds 'allocated_rec_num', I will reallocate the buffer.
>>>> The initial buffer size will be 1024 records, which will work for most cases.
>>>
>>> That seems easy enough to be worth doing even if I was wrong about the
>>> page data.
>>
>> done.
>>
>>>
>>>
>>> .
>>>
>>
>
>
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH Remus v5 1/2] libxc/save: implement Remus checkpointed save
  2015-05-15  9:31           ` Andrew Cooper
@ 2015-05-15  9:40             ` Yang Hongyang
  0 siblings, 0 replies; 21+ messages in thread
From: Yang Hongyang @ 2015-05-15  9:40 UTC (permalink / raw)
  To: Andrew Cooper, Ian Campbell
  Cc: wei.liu2, wency, ian.jackson, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram



On 05/15/2015 05:31 PM, Andrew Cooper wrote:
> On 15/05/15 09:37, Yang Hongyang wrote:
>>
>>
>> On 05/15/2015 04:29 PM, Andrew Cooper wrote:
>>> On 15/05/2015 03:12, Yang Hongyang wrote:
>>>>
>>>>>> @@ -467,6 +477,14 @@ static int send_domain_memory_live(struct
>>>>>> xc_sr_context *ctx)
>>>>>>         DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>>>>>>                                         &ctx->save.dirty_bitmap_hbuf);
>>>>>>
>>>>>> +    /*
>>>>>> +     * With Remus, we will enter checkpointed save after live
>>>>>> migration.
>>>>>> +     * In checkpointed save loop, we skip the live part and pause
>>>>>> straight
>>>>>> +     * away to send dirty pages between checkpoints.
>>>>>> +     */
>>>>>> +    if ( !ctx->save.live )
>>>>>> +        goto last_iter;
>>>>>
>>>>> Rather than use goto would it work to refactor everything from here to
>>>>> the label into some sort of helper and just call that in the "actually
>>>>> live" case?
>>>>>
>>>>> Or perhaps everything from the label to the end should be a helper
>>>>> function which the caller can also use in the checkpoint case
>>>>> instead of
>>>>> calling send_domain_memory_live (and which s_d_m_l also calls of
>>>>> course).
>>>>
>>>> I'm going to refactor the send_domain_memory_live() as follows:
>>>>
>>>> split send_domain_memory_live() into three helper functions:
>>>>     - send_memory_live()  handles the actually-live case
>>>>     - suspend_and_send_dirty() suspends the guest and sends dirty pages
>>>>     - send_memory_verify()
>>>>
>>>> then:
>>>>     - send_domain_memory_live() is a combination of those three
>>>> helpers
>>>>     - send_domain_memory_checkpointed() calls
>>>> suspend_and_send_dirty() and
>>>>                                         send_memory_verify()
>>>>     - send_domain_memory_nonlive() stays as it is
>>>> Does that make sense?
>>>
>>> verify mode is only a debugging tool, and shouldn't be used in general.
>>> (It is actually a lot less useful in retrospect than one would imagine.)
>>
>> Do you mean we do not need this in a checkpointed stream?
>> Currently, it is conditional on 'if ( ctx->save.debug )' in both
>> live migration and the checkpointed stream.
>
> Ah - you absolutely don't want it in a checkpointed stream.  As soon as
> that option has been set, the restoring side can no longer resume the
> domain.

So it needs to be conditional on

    if ( ctx->save.debug && !ctx->save.checkpointed )

If it is a checkpointed stream, we just ignore this flag. Is that OK?

>
> Basically, it tells the restore side to start doing memcmp() instead of
> memcpy() on the incoming page data.  The initial hope was to detect
> memory corruption, but it turns out that there is natural memory
> corruption to be had anyway when PV drivers are in use.
>
> (This is another area which is around for compatibility with legacy,
> rather than actually of being much use in practice)
>
> ~Andrew
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore
  2015-05-15  9:34             ` Yang Hongyang
@ 2015-05-15  9:43               ` Ian Campbell
  0 siblings, 0 replies; 21+ messages in thread
From: Ian Campbell @ 2015-05-15  9:43 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, eddie.dong, wency, andrew.cooper3, yunhong.jiang,
	ian.jackson, xen-devel, guijianfeng, rshriram

On Fri, 2015-05-15 at 17:34 +0800, Yang Hongyang wrote:
> 
> On 05/15/2015 05:27 PM, Ian Campbell wrote:
> > On Fri, 2015-05-15 at 17:19 +0800, Yang Hongyang wrote:
> >>
> >> On 05/15/2015 05:09 PM, Ian Campbell wrote:
> >>> On Fri, 2015-05-15 at 09:32 +0800, Yang Hongyang wrote:
> >>>>
> >>>> On 05/14/2015 09:05 PM, Ian Campbell wrote:
> >>>>> On Thu, 2015-05-14 at 18:06 +0800, Yang Hongyang wrote:
> >>>>>> With Remus, the restore flow should be:
> >>>>>> the first full migration stream -> { periodically restore stream }
> >>>>>>
> >>>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> >>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >>>>>> CC: Ian Campbell <Ian.Campbell@citrix.com>
> >>>>>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> >>>>>> CC: Wei Liu <wei.liu2@citrix.com>
> >>>>>> ---
> >>>>>>     tools/libxc/xc_sr_common.h  |  14 ++++++
> >>>>>>     tools/libxc/xc_sr_restore.c | 113 ++++++++++++++++++++++++++++++++++++++++----
> >>>>>>     2 files changed, 117 insertions(+), 10 deletions(-)
> >>>>>>
> >>>>>> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
> >>>>>> index f8121e7..3bf27f1 100644
> >>>>>> --- a/tools/libxc/xc_sr_common.h
> >>>>>> +++ b/tools/libxc/xc_sr_common.h
> >>>>>> @@ -208,6 +208,20 @@ struct xc_sr_context
> >>>>>>                 /* Plain VM, or checkpoints over time. */
> >>>>>>                 bool checkpointed;
> >>>>>>
> >>>>>> +            /* Currently buffering records between a checkpoint */
> >>>>>> +            bool buffer_all_records;
> >>>>>> +
> >>>>>> +/*
> >>>>>> + * With Remus, we buffer the records sent by the primary at checkpoint,
> >>>>>> + * in case the primary will fail, we can recover from the last
> >>>>>> + * checkpoint state.
> >>>>>> + * This should be enough because primary only send dirty pages at
> >>>>>> + * checkpoint.
> >>>>>
> >>>>> I'm not sure how it then follows that 1024 buffers is guaranteed to be
> >>>>> enough, unless there is something on the sending side arranging it to be
> >>>>> so?
> >>>>
> >>>> There are only a few records at every checkpoint in my tests, mostly under 10,
> >>>> probably because I don't do many operations in the guest. I thought this limit
> >>>> could be adjusted later after further testing.
> >>>
> >>> For some reason I thought these buffers included the page data, is that
> >>> not true? I was expecting the bulk of the records to be dirty page data.
> >>
> >> The page data is not stored in this buffer; only its pointer is
> >> (rec->data). What the buffer holds is the struct xc_sr_record entries.
> >
> > OK, so there are (approximately) as many xc_sr_records as there are
> > buffered dirty pages? I'd expect this would easily reach 1024 in some
> > circumstances (e.g. run a fork bomb in the domain or something).
> 
> No, a record may contain up to 1024 pages, so the number of records is
> smaller than the number of dirty pages.

OK, so 1024 records equates to ... <sounds of maths> ... around 4GB of
actual data at most (though I suppose not all records will use the full
1024 pages).

I suppose a guest would be working quite hard to dirty that much without
retriggering a checkpoint (even with COLO's more relaxed approach to
resync).

In any case, making the array grow is clearly a good thing to do and
you've already done it, so no need to keep thinking about this case ;-)
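
(Spelling out the maths behind that estimate, assuming 4KiB pages:
1024 records x 1024 pages per record x 4096 bytes per page = 4GB of
page data at most.)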

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2015-05-15  9:43 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-14 10:06 [PATCH Remus v5 0/2] Remus support for Migration-v2 Yang Hongyang
2015-05-14 10:06 ` [PATCH Remus v5 1/2] libxc/save: implement Remus checkpointed save Yang Hongyang
2015-05-14 12:47   ` Ian Campbell
2015-05-15  2:12     ` Yang Hongyang
2015-05-15  8:29       ` Andrew Cooper
2015-05-15  8:37         ` Yang Hongyang
2015-05-15  9:31           ` Andrew Cooper
2015-05-15  9:40             ` Yang Hongyang
2015-05-14 10:06 ` [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore Yang Hongyang
2015-05-14 10:11   ` Andrew Cooper
2015-05-14 13:05   ` Ian Campbell
2015-05-14 13:17     ` Andrew Cooper
2015-05-14 14:04       ` Ian Campbell
2015-05-15  1:08         ` Yang Hongyang
2015-05-15  1:32     ` Yang Hongyang
2015-05-15  9:09       ` Ian Campbell
2015-05-15  9:19         ` Yang Hongyang
2015-05-15  9:27           ` Ian Campbell
2015-05-15  9:34             ` Andrew Cooper
2015-05-15  9:34             ` Yang Hongyang
2015-05-15  9:43               ` Ian Campbell
