All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] Remus: add remus support for migration v2
@ 2014-07-09  7:47 Yang Hongyang
  2014-07-09  7:47 ` [RFC PATCH 1/3] remus: add a bool var to indicate checkpointed stream Yang Hongyang
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: Yang Hongyang @ 2014-07-09  7:47 UTC (permalink / raw)
  To: xen-devel
  Cc: rshriram, andrew.cooper3, Yang Hongyang, Ian.Jackson, Ian.Campbell

This patchset adds remus support for migration v2, it is based on
Andrew's "[PATCH v6 0/13] Migration Stream v2".

Only tested on x86 pv guests. It is a PoC, without performence tuning
and mem compress support.

The code is presented here for comment/query/critism and wider the
discussion of remus support on migration v2.

Yang Hongyang (3):
  remus: add a bool var to indicate checkpointed stream
  remus: implement remus checkpoint in v2 save
  remus: adjust x86 pv restore to support remus

 tools/libxc/saverestore/common.h         |  8 +++
 tools/libxc/saverestore/restore_x86_pv.c | 67 +++++++++++++++-------
 tools/libxc/saverestore/save.c           | 95 ++++++++++++++++++++------------
 tools/libxc/xenguest.h                   |  1 +
 tools/libxl/libxl_dom.c                  |  1 +
 5 files changed, 118 insertions(+), 54 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 1/3] remus: add a bool var to indicate checkpointed stream
  2014-07-09  7:47 [RFC PATCH 0/3] Remus: add remus support for migration v2 Yang Hongyang
@ 2014-07-09  7:47 ` Yang Hongyang
  2014-07-09  9:45   ` Andrew Cooper
  2014-07-09  7:47 ` [RFC PATCH 2/3] remus: implement remus checkpoint in v2 save Yang Hongyang
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Yang Hongyang @ 2014-07-09  7:47 UTC (permalink / raw)
  To: xen-devel
  Cc: rshriram, andrew.cooper3, Yang Hongyang, Ian.Jackson, Ian.Campbell

add a bool variable to migration context to indicate checkpointed stream

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxc/saverestore/common.h | 1 +
 tools/libxc/saverestore/save.c   | 7 +++++++
 tools/libxc/xenguest.h           | 1 +
 tools/libxl/libxl_dom.c          | 1 +
 4 files changed, 10 insertions(+)

diff --git a/tools/libxc/saverestore/common.h b/tools/libxc/saverestore/common.h
index 4840d3f..24ba95b 100644
--- a/tools/libxc/saverestore/common.h
+++ b/tools/libxc/saverestore/common.h
@@ -152,6 +152,7 @@ struct xc_sr_context
     int fd;
 
     xc_dominfo_t dominfo;
+    bool checkpointed;
 
     union
     {
diff --git a/tools/libxc/saverestore/save.c b/tools/libxc/saverestore/save.c
index 653dc8e..d2fa8a6 100644
--- a/tools/libxc/saverestore/save.c
+++ b/tools/libxc/saverestore/save.c
@@ -652,6 +652,13 @@ int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_ite
     ctx.save.callbacks = callbacks;
     ctx.save.live  = !!(flags & XCFLAGS_LIVE);
     ctx.save.debug = !!(flags & XCFLAGS_DEBUG);
+    ctx.checkpointed = !!(flags & XCFLAGS_CHECKPOINTED);
+
+    if ( ctx.checkpointed ) {
+        /* This is a checkpointed save, we need these callbacks */
+        assert(ctx.save.callbacks->checkpoint);
+        assert(ctx.save.callbacks->postcopy);
+    }
 
     IPRINTF("In experimental %s", __func__);
     DPRINTF("fd %d, dom %"PRIu32", max_iters %"PRIu32", max_factor %"PRIu32
diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
index 55755cf..72be3bf 100644
--- a/tools/libxc/xenguest.h
+++ b/tools/libxc/xenguest.h
@@ -28,6 +28,7 @@
 #define XCFLAGS_HVM       (1 << 2)
 #define XCFLAGS_STDVGA    (1 << 3)
 #define XCFLAGS_CHECKPOINT_COMPRESS    (1 << 4)
+#define XCFLAGS_CHECKPOINTED    (1 << 5)
 
 #define X86_64_B_SIZE   64 
 #define X86_32_B_SIZE   32
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 83eb29a..4a3fd73 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1573,6 +1573,7 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
 
     if (r_info != NULL) {
         dss->interval = r_info->interval;
+        dss->xcflags |= XCFLAGS_CHECKPOINTED;
         if (r_info->compression)
             dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
     }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC PATCH 2/3] remus: implement remus checkpoint in v2 save
  2014-07-09  7:47 [RFC PATCH 0/3] Remus: add remus support for migration v2 Yang Hongyang
  2014-07-09  7:47 ` [RFC PATCH 1/3] remus: add a bool var to indicate checkpointed stream Yang Hongyang
@ 2014-07-09  7:47 ` Yang Hongyang
  2014-07-09 10:53   ` Andrew Cooper
  2014-07-16 15:22   ` Shriram Rajagopalan
  2014-07-09  7:47 ` [RFC PATCH 3/3] remus: adjust x86 pv restore to support remus Yang Hongyang
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 25+ messages in thread
From: Yang Hongyang @ 2014-07-09  7:47 UTC (permalink / raw)
  To: xen-devel
  Cc: rshriram, andrew.cooper3, Yang Hongyang, Ian.Jackson, Ian.Campbell

implement remus checkpoint in v2 save

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxc/saverestore/common.h |  1 +
 tools/libxc/saverestore/save.c   | 88 ++++++++++++++++++++++++----------------
 2 files changed, 55 insertions(+), 34 deletions(-)

diff --git a/tools/libxc/saverestore/common.h b/tools/libxc/saverestore/common.h
index 24ba95b..1dd9f51 100644
--- a/tools/libxc/saverestore/common.h
+++ b/tools/libxc/saverestore/common.h
@@ -153,6 +153,7 @@ struct xc_sr_context
 
     xc_dominfo_t dominfo;
     bool checkpointed;
+    bool firsttime;
 
     union
     {
diff --git a/tools/libxc/saverestore/save.c b/tools/libxc/saverestore/save.c
index d2fa8a6..98a5c2f 100644
--- a/tools/libxc/saverestore/save.c
+++ b/tools/libxc/saverestore/save.c
@@ -375,6 +375,8 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
         goto out;
     }
 
+    if ( ctx->checkpointed && !ctx->firsttime )
+        goto lastiter;
     /* This juggling is required if logdirty is already on, e.g. VRAM tracking */
     if ( xc_shadow_control(xch, ctx->domid,
                            XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
@@ -436,6 +438,7 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
             break;
     }
 
+lastiter:
     rc = suspend_domain(ctx);
     if ( rc )
         goto out;
@@ -570,44 +573,60 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
     if ( rc )
         goto err;
 
-    rc = ctx->save.ops.start_of_stream(ctx);
-    if ( rc )
-        goto err;
+    do {
+        rc = ctx->save.ops.start_of_stream(ctx);
+        if ( rc )
+            goto err;
 
-    if ( ctx->save.live )
-    {
-        DPRINTF("Starting live migrate");
-        rc = send_domain_memory_live(ctx);
-    }
-    else
-    {
-        DPRINTF("Starting nonlive save");
-        rc = send_domain_memory_nonlive(ctx);
-    }
+        if ( ctx->save.live )
+        {
+            DPRINTF("Starting live migrate");
+            rc = send_domain_memory_live(ctx);
+        }
+        else
+        {
+            DPRINTF("Starting nonlive save");
+            rc = send_domain_memory_nonlive(ctx);
+        }
 
-    if ( rc )
-        goto err;
+        if ( rc )
+            goto err;
 
-    /* Refresh domain information now it has paused. */
-    if ( (xc_domain_getinfo(xch, ctx->domid, 1, &ctx->dominfo) != 1) ||
-         (ctx->dominfo.domid != ctx->domid) )
-    {
-        PERROR("Unable to refresh domain information");
-        rc = -1;
-        goto err;
-    }
-    else if ( (!ctx->dominfo.shutdown ||
-               ctx->dominfo.shutdown_reason != SHUTDOWN_suspend ) &&
-              !ctx->dominfo.paused )
-    {
-        ERROR("Domain has not been suspended");
-        rc = -1;
-        goto err;
-    }
+        /* Refresh domain information now it has paused. */
+        if ( (xc_domain_getinfo(xch, ctx->domid, 1, &ctx->dominfo) != 1) ||
+             (ctx->dominfo.domid != ctx->domid) )
+        {
+            PERROR("Unable to refresh domain information");
+            rc = -1;
+            goto err;
+        }
+        else if ( (!ctx->dominfo.shutdown ||
+                  ctx->dominfo.shutdown_reason != SHUTDOWN_suspend ) &&
+                  !ctx->dominfo.paused )
+        {
+            ERROR("Domain has not been suspended");
+            rc = -1;
+            goto err;
+        }
 
-    rc = ctx->save.ops.end_of_stream(ctx);
-    if ( rc )
-        goto err;
+        rc = ctx->save.ops.end_of_stream(ctx);
+        if ( rc )
+            goto err;
+
+        if ( ctx->checkpointed ) {
+            if ( ctx->firsttime )
+                ctx->firsttime = false;
+
+            ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
+
+            rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
+            if ( rc > 0 ) {
+                IPRINTF("Next checkpoint\n");
+            } else {
+                ctx->checkpointed = false;
+            }
+        }
+    } while ( ctx->checkpointed );
 
     rc = write_end_record(ctx);
     if ( rc )
@@ -653,6 +672,7 @@ int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_ite
     ctx.save.live  = !!(flags & XCFLAGS_LIVE);
     ctx.save.debug = !!(flags & XCFLAGS_DEBUG);
     ctx.checkpointed = !!(flags & XCFLAGS_CHECKPOINTED);
+    ctx.firsttime = true;
 
     if ( ctx.checkpointed ) {
         /* This is a checkpointed save, we need these callbacks */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC PATCH 3/3] remus: adjust x86 pv restore to support remus
  2014-07-09  7:47 [RFC PATCH 0/3] Remus: add remus support for migration v2 Yang Hongyang
  2014-07-09  7:47 ` [RFC PATCH 1/3] remus: add a bool var to indicate checkpointed stream Yang Hongyang
  2014-07-09  7:47 ` [RFC PATCH 2/3] remus: implement remus checkpoint in v2 save Yang Hongyang
@ 2014-07-09  7:47 ` Yang Hongyang
  2014-07-09 11:16   ` Andrew Cooper
  2014-07-09  8:53 ` [RFC PATCH 0/3] Remus: add remus support for migration v2 Ian Campbell
  2014-07-09  9:42 ` Andrew Cooper
  4 siblings, 1 reply; 25+ messages in thread
From: Yang Hongyang @ 2014-07-09  7:47 UTC (permalink / raw)
  To: xen-devel
  Cc: rshriram, andrew.cooper3, Yang Hongyang, Ian.Jackson, Ian.Campbell

cache vcpu context when restore, and set context when stream
complete.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxc/saverestore/common.h         |  6 +++
 tools/libxc/saverestore/restore_x86_pv.c | 67 ++++++++++++++++++++++----------
 2 files changed, 53 insertions(+), 20 deletions(-)

diff --git a/tools/libxc/saverestore/common.h b/tools/libxc/saverestore/common.h
index 1dd9f51..ba54f83 100644
--- a/tools/libxc/saverestore/common.h
+++ b/tools/libxc/saverestore/common.h
@@ -235,6 +235,12 @@ struct xc_sr_context
 
             /* Read-only mapping of guests shared info page */
             shared_info_any_t *shinfo;
+
+            /*
+             * We need to cache vcpu context when doing checkpointed
+             * restore, otherwise restore will fail
+             */
+            vcpu_guest_context_any_t *vcpu;
         } x86_pv;
     };
 };
diff --git a/tools/libxc/saverestore/restore_x86_pv.c b/tools/libxc/saverestore/restore_x86_pv.c
index 14fc020..7a54047 100644
--- a/tools/libxc/saverestore/restore_x86_pv.c
+++ b/tools/libxc/saverestore/restore_x86_pv.c
@@ -428,8 +428,8 @@ static int handle_x86_pv_vcpu_basic(struct xc_sr_context *ctx, struct xc_sr_reco
 {
     xc_interface *xch = ctx->xch;
     struct xc_sr_rec_x86_pv_vcpu_hdr *vhdr = rec->data;
-    vcpu_guest_context_any_t vcpu;
-    size_t vcpusz = ctx->x86_pv.width == 8 ? sizeof(vcpu.x64) : sizeof(vcpu.x32);
+    vcpu_guest_context_any_t *vcpu;
+    size_t vcpusz = ctx->x86_pv.width == 8 ? sizeof(vcpu->x64) : sizeof(vcpu->x32);
     xen_pfn_t pfn, mfn;
     unsigned long tmp;
     unsigned i;
@@ -455,22 +455,23 @@ static int handle_x86_pv_vcpu_basic(struct xc_sr_context *ctx, struct xc_sr_reco
         goto err;
     }
 
-    memcpy(&vcpu, &vhdr->context, vcpusz);
+    vcpu = (void *)ctx->x86_pv.vcpu + vcpusz * vhdr->vcpu_id;
+    memcpy(vcpu, &vhdr->context, vcpusz);
 
     /* Vcpu 0 is special: Convert the suspend record to an mfn. */
     if ( vhdr->vcpu_id == 0 )
     {
-        rc = process_start_info(ctx, &vcpu);
+        rc = process_start_info(ctx, vcpu);
         if ( rc )
             return rc;
         rc = -1;
     }
 
-    SET_FIELD(&vcpu, flags,
-              GET_FIELD(&vcpu, flags, ctx->x86_pv.width) | VGCF_online,
+    SET_FIELD(vcpu, flags,
+              GET_FIELD(vcpu, flags, ctx->x86_pv.width) | VGCF_online,
               ctx->x86_pv.width);
 
-    tmp = GET_FIELD(&vcpu, gdt_ents, ctx->x86_pv.width);
+    tmp = GET_FIELD(vcpu, gdt_ents, ctx->x86_pv.width);
     if ( tmp > 8192 )
     {
         ERROR("GDT entry count (%lu) out of range", tmp);
@@ -481,7 +482,7 @@ static int handle_x86_pv_vcpu_basic(struct xc_sr_context *ctx, struct xc_sr_reco
     /* Convert GDT frames to mfns. */
     for ( i = 0; (i * 512) < tmp; ++i )
     {
-        pfn = GET_FIELD(&vcpu, gdt_frames[i], ctx->x86_pv.width);
+        pfn = GET_FIELD(vcpu, gdt_frames[i], ctx->x86_pv.width);
         if ( pfn > ctx->x86_pv.max_pfn )
         {
             ERROR("GDT frame %u (pfn %#lx) out of range", i, pfn);
@@ -502,11 +503,11 @@ static int handle_x86_pv_vcpu_basic(struct xc_sr_context *ctx, struct xc_sr_reco
             goto err;
         }
 
-        SET_FIELD(&vcpu, gdt_frames[i], mfn, ctx->x86_pv.width);
+        SET_FIELD(vcpu, gdt_frames[i], mfn, ctx->x86_pv.width);
     }
 
     /* Convert CR3 to an mfn. */
-    pfn = cr3_to_mfn(ctx, GET_FIELD(&vcpu, ctrlreg[3], ctx->x86_pv.width));
+    pfn = cr3_to_mfn(ctx, GET_FIELD(vcpu, ctrlreg[3], ctx->x86_pv.width));
     if ( pfn > ctx->x86_pv.max_pfn )
     {
         ERROR("cr3 (pfn %#lx) out of range", pfn);
@@ -529,12 +530,12 @@ static int handle_x86_pv_vcpu_basic(struct xc_sr_context *ctx, struct xc_sr_reco
         goto err;
     }
 
-    SET_FIELD(&vcpu, ctrlreg[3], mfn_to_cr3(ctx, mfn), ctx->x86_pv.width);
+    SET_FIELD(vcpu, ctrlreg[3], mfn_to_cr3(ctx, mfn), ctx->x86_pv.width);
 
     /* 64bit guests: Convert CR1 (guest pagetables) to mfn. */
-    if ( ctx->x86_pv.levels == 4 && (vcpu.x64.ctrlreg[1] & 1) )
+    if ( ctx->x86_pv.levels == 4 && (vcpu->x64.ctrlreg[1] & 1) )
     {
-        pfn = vcpu.x64.ctrlreg[1] >> PAGE_SHIFT;
+        pfn = vcpu->x64.ctrlreg[1] >> PAGE_SHIFT;
 
         if ( pfn > ctx->x86_pv.max_pfn )
         {
@@ -558,13 +559,7 @@ static int handle_x86_pv_vcpu_basic(struct xc_sr_context *ctx, struct xc_sr_reco
             goto err;
         }
 
-        vcpu.x64.ctrlreg[1] = (uint64_t)mfn << PAGE_SHIFT;
-    }
-
-    if ( xc_vcpu_setcontext(xch, ctx->domid, vhdr->vcpu_id, &vcpu) )
-    {
-        PERROR("Failed to set vcpu%u's basic info", vhdr->vcpu_id);
-        goto err;
+        vcpu->x64.ctrlreg[1] = (uint64_t)mfn << PAGE_SHIFT;
     }
 
     rc = 0;
@@ -881,6 +876,7 @@ static int x86_pv_localise_page(struct xc_sr_context *ctx, uint32_t type, void *
 static int x86_pv_setup(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
+    size_t vcpusz;
     int rc;
 
     if ( ctx->restore.guest_type != DHDR_TYPE_X86_PV )
@@ -904,6 +900,9 @@ static int x86_pv_setup(struct xc_sr_context *ctx)
     if ( rc )
         return rc;
 
+    vcpusz = ctx->x86_pv.width == 8 ?
+             sizeof(ctx->x86_pv.vcpu->x64) : sizeof(ctx->x86_pv.vcpu->x32);
+    ctx->x86_pv.vcpu = calloc((ctx->dominfo.max_vcpu_id + 1) , vcpusz);
     return rc;
 }
 
@@ -946,6 +945,29 @@ static int x86_pv_process_record(struct xc_sr_context *ctx, struct xc_sr_record
     }
 }
 
+static int update_vcpu_context(struct xc_sr_context *ctx)
+{
+    xc_interface *xch = ctx->xch;
+    vcpu_guest_context_any_t *vcpu;
+    uint32_t vcpu_id;
+    size_t vcpusz = ctx->x86_pv.width == 8 ? sizeof(vcpu->x64) : sizeof(vcpu->x32);
+
+    for ( vcpu_id = 0; vcpu_id <= ctx->dominfo.max_vcpu_id; vcpu_id++ )
+    {
+        vcpu = (void *)ctx->x86_pv.vcpu + vcpu_id * vcpusz;
+        if ( !vcpu )
+            continue;
+
+        if ( xc_vcpu_setcontext(xch, ctx->domid, vcpu_id, vcpu) )
+        {
+            PERROR("Failed to set vcpu%u's basic info", vcpu_id);
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
 /*
  * restore_ops function.  Pin the pagetables, rewrite the p2m and seed the
  * grant table.
@@ -963,6 +985,10 @@ static int x86_pv_stream_complete(struct xc_sr_context *ctx)
     if ( rc )
         return rc;
 
+    rc = update_vcpu_context(ctx);
+    if ( rc )
+        return rc;
+
     rc = xc_dom_gnttab_seed(xch, ctx->domid,
                             ctx->restore.console_mfn,
                             ctx->restore.xenstore_mfn,
@@ -985,6 +1011,7 @@ static int x86_pv_cleanup(struct xc_sr_context *ctx)
     free(ctx->x86_pv.p2m);
     free(ctx->x86_pv.p2m_pfns);
     free(ctx->x86_pv.pfn_types);
+    free(ctx->x86_pv.vcpu);
 
     if ( ctx->x86_pv.m2p )
         munmap(ctx->x86_pv.m2p, ctx->x86_pv.nr_m2p_frames * PAGE_SIZE);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 0/3] Remus: add remus support for migration v2
  2014-07-09  7:47 [RFC PATCH 0/3] Remus: add remus support for migration v2 Yang Hongyang
                   ` (2 preceding siblings ...)
  2014-07-09  7:47 ` [RFC PATCH 3/3] remus: adjust x86 pv restore to support remus Yang Hongyang
@ 2014-07-09  8:53 ` Ian Campbell
  2014-07-09  9:56   ` Hongyang Yang
  2014-07-09  9:42 ` Andrew Cooper
  4 siblings, 1 reply; 25+ messages in thread
From: Ian Campbell @ 2014-07-09  8:53 UTC (permalink / raw)
  To: Yang Hongyang; +Cc: rshriram, andrew.cooper3, Ian.Jackson, xen-devel

On Wed, 2014-07-09 at 15:47 +0800, Yang Hongyang wrote:
> This patchset adds remus support for migration v2, it is based on
> Andrew's "[PATCH v6 0/13] Migration Stream v2".
> 
> Only tested on x86 pv guests. It is a PoC, without performence tuning
> and mem compress support.
> 
> The code is presented here for comment/query/critism and wider the
> discussion of remus support on migration v2.

Thanks.

The main thing which I would have expected to see as the first thing
here is an update to docs/specs/libxc-migration-stream.pandoc which
describes the protocol enhancement which this surely involves.

Ian.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 0/3] Remus: add remus support for migration v2
  2014-07-09  7:47 [RFC PATCH 0/3] Remus: add remus support for migration v2 Yang Hongyang
                   ` (3 preceding siblings ...)
  2014-07-09  8:53 ` [RFC PATCH 0/3] Remus: add remus support for migration v2 Ian Campbell
@ 2014-07-09  9:42 ` Andrew Cooper
  2014-07-09 10:06   ` Hongyang Yang
  4 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-07-09  9:42 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel; +Cc: rshriram, Ian.Jackson, Ian.Campbell

On 09/07/14 08:47, Yang Hongyang wrote:
> This patchset adds remus support for migration v2, it is based on
> Andrew's "[PATCH v6 0/13] Migration Stream v2".
>
> Only tested on x86 pv guests. It is a PoC, without performence tuning
> and mem compress support.
>
> The code is presented here for comment/query/critism and wider the
> discussion of remus support on migration v2.
>
> Yang Hongyang (3):
>   remus: add a bool var to indicate checkpointed stream
>   remus: implement remus checkpoint in v2 save
>   remus: adjust x86 pv restore to support remus
>
>  tools/libxc/saverestore/common.h         |  8 +++
>  tools/libxc/saverestore/restore_x86_pv.c | 67 +++++++++++++++-------
>  tools/libxc/saverestore/save.c           | 95 ++++++++++++++++++++------------
>  tools/libxc/xenguest.h                   |  1 +
>  tools/libxl/libxl_dom.c                  |  1 +
>  5 files changed, 118 insertions(+), 54 deletions(-)
>

Thankyou very much for doing this!

I am surprised quite how little code it it, but that can only be a good
thing.

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 1/3] remus: add a bool var to indicate checkpointed stream
  2014-07-09  7:47 ` [RFC PATCH 1/3] remus: add a bool var to indicate checkpointed stream Yang Hongyang
@ 2014-07-09  9:45   ` Andrew Cooper
  2014-07-09  9:53     ` Hongyang Yang
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-07-09  9:45 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel; +Cc: rshriram, Ian.Jackson, Ian.Campbell

On 09/07/14 08:47, Yang Hongyang wrote:
> add a bool variable to migration context to indicate checkpointed stream
>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  tools/libxc/saverestore/common.h | 1 +
>  tools/libxc/saverestore/save.c   | 7 +++++++
>  tools/libxc/xenguest.h           | 1 +
>  tools/libxl/libxl_dom.c          | 1 +
>  4 files changed, 10 insertions(+)
>
> diff --git a/tools/libxc/saverestore/common.h b/tools/libxc/saverestore/common.h
> index 4840d3f..24ba95b 100644
> --- a/tools/libxc/saverestore/common.h
> +++ b/tools/libxc/saverestore/common.h
> @@ -152,6 +152,7 @@ struct xc_sr_context
>      int fd;
>  
>      xc_dominfo_t dominfo;
> +    bool checkpointed;

This appears to only be used on the save side, so should live in the
.save union.  Also, a comment indicating that this is a remus option.

>  
>      union
>      {
> diff --git a/tools/libxc/saverestore/save.c b/tools/libxc/saverestore/save.c
> index 653dc8e..d2fa8a6 100644
> --- a/tools/libxc/saverestore/save.c
> +++ b/tools/libxc/saverestore/save.c
> @@ -652,6 +652,13 @@ int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_ite
>      ctx.save.callbacks = callbacks;
>      ctx.save.live  = !!(flags & XCFLAGS_LIVE);
>      ctx.save.debug = !!(flags & XCFLAGS_DEBUG);
> +    ctx.checkpointed = !!(flags & XCFLAGS_CHECKPOINTED);
> +
> +    if ( ctx.checkpointed ) {

For better or for worse, I have been using the Xen style, as libxc is a
mix of the two.  For consistency with the rest of the migration v2 code,
this brace should be on the next line.

~Andrew

> +        /* This is a checkpointed save, we need these callbacks */
> +        assert(ctx.save.callbacks->checkpoint);
> +        assert(ctx.save.callbacks->postcopy);
> +    }
>  
>      IPRINTF("In experimental %s", __func__);
>      DPRINTF("fd %d, dom %"PRIu32", max_iters %"PRIu32", max_factor %"PRIu32
> diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
> index 55755cf..72be3bf 100644
> --- a/tools/libxc/xenguest.h
> +++ b/tools/libxc/xenguest.h
> @@ -28,6 +28,7 @@
>  #define XCFLAGS_HVM       (1 << 2)
>  #define XCFLAGS_STDVGA    (1 << 3)
>  #define XCFLAGS_CHECKPOINT_COMPRESS    (1 << 4)
> +#define XCFLAGS_CHECKPOINTED    (1 << 5)
>  
>  #define X86_64_B_SIZE   64 
>  #define X86_32_B_SIZE   32
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 83eb29a..4a3fd73 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -1573,6 +1573,7 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
>  
>      if (r_info != NULL) {
>          dss->interval = r_info->interval;
> +        dss->xcflags |= XCFLAGS_CHECKPOINTED;
>          if (r_info->compression)
>              dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
>      }

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 1/3] remus: add a bool var to indicate checkpointed stream
  2014-07-09  9:45   ` Andrew Cooper
@ 2014-07-09  9:53     ` Hongyang Yang
  0 siblings, 0 replies; 25+ messages in thread
From: Hongyang Yang @ 2014-07-09  9:53 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: rshriram, Ian.Jackson, Ian.Campbell

Hi Andrew,

   Thanks for the comments!

On 07/09/2014 05:45 PM, Andrew Cooper wrote:
> On 09/07/14 08:47, Yang Hongyang wrote:
>> add a bool variable to migration context to indicate checkpointed stream
>>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> ---
>>   tools/libxc/saverestore/common.h | 1 +
>>   tools/libxc/saverestore/save.c   | 7 +++++++
>>   tools/libxc/xenguest.h           | 1 +
>>   tools/libxl/libxl_dom.c          | 1 +
>>   4 files changed, 10 insertions(+)
>>
>> diff --git a/tools/libxc/saverestore/common.h b/tools/libxc/saverestore/common.h
>> index 4840d3f..24ba95b 100644
>> --- a/tools/libxc/saverestore/common.h
>> +++ b/tools/libxc/saverestore/common.h
>> @@ -152,6 +152,7 @@ struct xc_sr_context
>>       int fd;
>>
>>       xc_dominfo_t dominfo;
>> +    bool checkpointed;
>
> This appears to only be used on the save side, so should live in the
> .save union.  Also, a comment indicating that this is a remus option.

I also consider to put this in the .save union, but I assume it will
also be used in restore code... So does ctx.firsttime
There's a checkpointed_stream parameter in xc_domain_restore()...

>
>>
>>       union
>>       {
>> diff --git a/tools/libxc/saverestore/save.c b/tools/libxc/saverestore/save.c
>> index 653dc8e..d2fa8a6 100644
>> --- a/tools/libxc/saverestore/save.c
>> +++ b/tools/libxc/saverestore/save.c
>> @@ -652,6 +652,13 @@ int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_ite
>>       ctx.save.callbacks = callbacks;
>>       ctx.save.live  = !!(flags & XCFLAGS_LIVE);
>>       ctx.save.debug = !!(flags & XCFLAGS_DEBUG);
>> +    ctx.checkpointed = !!(flags & XCFLAGS_CHECKPOINTED);
>> +
>> +    if ( ctx.checkpointed ) {
>
> For better or for worse, I have been using the Xen style, as libxc is a
> mix of the two.  For consistency with the rest of the migration v2 code,
> this brace should be on the next line.

Get it!

>
> ~Andrew
>
>> +        /* This is a checkpointed save, we need these callbacks */
>> +        assert(ctx.save.callbacks->checkpoint);
>> +        assert(ctx.save.callbacks->postcopy);
>> +    }
>>
>>       IPRINTF("In experimental %s", __func__);
>>       DPRINTF("fd %d, dom %"PRIu32", max_iters %"PRIu32", max_factor %"PRIu32
>> diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
>> index 55755cf..72be3bf 100644
>> --- a/tools/libxc/xenguest.h
>> +++ b/tools/libxc/xenguest.h
>> @@ -28,6 +28,7 @@
>>   #define XCFLAGS_HVM       (1 << 2)
>>   #define XCFLAGS_STDVGA    (1 << 3)
>>   #define XCFLAGS_CHECKPOINT_COMPRESS    (1 << 4)
>> +#define XCFLAGS_CHECKPOINTED    (1 << 5)
>>
>>   #define X86_64_B_SIZE   64
>>   #define X86_32_B_SIZE   32
>> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
>> index 83eb29a..4a3fd73 100644
>> --- a/tools/libxl/libxl_dom.c
>> +++ b/tools/libxl/libxl_dom.c
>> @@ -1573,6 +1573,7 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
>>
>>       if (r_info != NULL) {
>>           dss->interval = r_info->interval;
>> +        dss->xcflags |= XCFLAGS_CHECKPOINTED;
>>           if (r_info->compression)
>>               dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
>>       }
>
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 0/3] Remus: add remus support for migration v2
  2014-07-09  8:53 ` [RFC PATCH 0/3] Remus: add remus support for migration v2 Ian Campbell
@ 2014-07-09  9:56   ` Hongyang Yang
  0 siblings, 0 replies; 25+ messages in thread
From: Hongyang Yang @ 2014-07-09  9:56 UTC (permalink / raw)
  To: Ian Campbell; +Cc: rshriram, andrew.cooper3, Ian.Jackson, xen-devel



On 07/09/2014 04:53 PM, Ian Campbell wrote:
> On Wed, 2014-07-09 at 15:47 +0800, Yang Hongyang wrote:
>> This patchset adds remus support for migration v2, it is based on
>> Andrew's "[PATCH v6 0/13] Migration Stream v2".
>>
>> Only tested on x86 pv guests. It is a PoC, without performence tuning
>> and mem compress support.
>>
>> The code is presented here for comment/query/critism and wider the
>> discussion of remus support on migration v2.
>
> Thanks.
>
> The main thing which I would have expected to see as the first thing
> here is an update to docs/specs/libxc-migration-stream.pandoc which
> describes the protocol enhancement which this surely involves.

Sure, the docs should be updated first. Will do in next version.

>
> Ian.
>
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 0/3] Remus: add remus support for migration v2
  2014-07-09  9:42 ` Andrew Cooper
@ 2014-07-09 10:06   ` Hongyang Yang
  0 siblings, 0 replies; 25+ messages in thread
From: Hongyang Yang @ 2014-07-09 10:06 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: rshriram, Ian.Jackson, Ian.Campbell



On 07/09/2014 05:42 PM, Andrew Cooper wrote:
> On 09/07/14 08:47, Yang Hongyang wrote:
>> This patchset adds remus support for migration v2, it is based on
>> Andrew's "[PATCH v6 0/13] Migration Stream v2".
>>
>> Only tested on x86 pv guests. It is a PoC, without performence tuning
>> and mem compress support.
>>
>> The code is presented here for comment/query/critism and wider the
>> discussion of remus support on migration v2.
>>
>> Yang Hongyang (3):
>>    remus: add a bool var to indicate checkpointed stream
>>    remus: implement remus checkpoint in v2 save
>>    remus: adjust x86 pv restore to support remus
>>
>>   tools/libxc/saverestore/common.h         |  8 +++
>>   tools/libxc/saverestore/restore_x86_pv.c | 67 +++++++++++++++-------
>>   tools/libxc/saverestore/save.c           | 95 ++++++++++++++++++++------------
>>   tools/libxc/xenguest.h                   |  1 +
>>   tools/libxl/libxl_dom.c                  |  1 +
>>   5 files changed, 118 insertions(+), 54 deletions(-)
>>
>
> Thankyou very much for doing this!
>
> I am surprised quite how little code it it, but that can only be a good
> thing.

Yes, the whole checkpoint process is quite simple(only the process itself)...
v2 makes it more clear to hack the code.

>
> ~Andrew
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 2/3] remus: implement remus checkpoint in v2 save
  2014-07-09  7:47 ` [RFC PATCH 2/3] remus: implement remus checkpoint in v2 save Yang Hongyang
@ 2014-07-09 10:53   ` Andrew Cooper
  2014-07-10  3:25     ` Hongyang Yang
  2014-07-16 15:22   ` Shriram Rajagopalan
  1 sibling, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-07-09 10:53 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel; +Cc: rshriram, Ian.Jackson, Ian.Campbell

On 09/07/14 08:47, Yang Hongyang wrote:
> implement remus checkpoint in v2 save
>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  tools/libxc/saverestore/common.h |  1 +
>  tools/libxc/saverestore/save.c   | 88 ++++++++++++++++++++++++----------------
>  2 files changed, 55 insertions(+), 34 deletions(-)
>
> diff --git a/tools/libxc/saverestore/common.h b/tools/libxc/saverestore/common.h
> index 24ba95b..1dd9f51 100644
> --- a/tools/libxc/saverestore/common.h
> +++ b/tools/libxc/saverestore/common.h
> @@ -153,6 +153,7 @@ struct xc_sr_context
>  
>      xc_dominfo_t dominfo;
>      bool checkpointed;
> +    bool firsttime;

This is also only used on the save side.

>  
>      union
>      {
> diff --git a/tools/libxc/saverestore/save.c b/tools/libxc/saverestore/save.c
> index d2fa8a6..98a5c2f 100644
> --- a/tools/libxc/saverestore/save.c
> +++ b/tools/libxc/saverestore/save.c
> @@ -375,6 +375,8 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
>          goto out;
>      }
>  
> +    if ( ctx->checkpointed && !ctx->firsttime )
> +        goto lastiter;
>      /* This juggling is required if logdirty is already on, e.g. VRAM tracking */
>      if ( xc_shadow_control(xch, ctx->domid,
>                             XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
> @@ -436,6 +438,7 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
>              break;
>      }
>  
> +lastiter:
>      rc = suspend_domain(ctx);
>      if ( rc )
>          goto out;
> @@ -570,44 +573,60 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
>      if ( rc )
>          goto err;
>  
> -    rc = ctx->save.ops.start_of_stream(ctx);
> -    if ( rc )
> -        goto err;
> +    do {
> +        rc = ctx->save.ops.start_of_stream(ctx);
> +        if ( rc )
> +            goto err;

I am not sure start_of_stream() wants to be inside the loop.  For PV
guests, it sends the X86_PV_INFO which is only expected to be sent
once.  The X86_PV_P2M_FRAMES record is deliberately safe to send
multiple times (in the hope that someone might evenutally fix the
ballooning issues), but is a waste of time to send like this, as its
content wont be changing.

>  
> -    if ( ctx->save.live )
> -    {
> -        DPRINTF("Starting live migrate");
> -        rc = send_domain_memory_live(ctx);
> -    }
> -    else
> -    {
> -        DPRINTF("Starting nonlive save");
> -        rc = send_domain_memory_nonlive(ctx);
> -    }
> +        if ( ctx->save.live )
> +        {
> +            DPRINTF("Starting live migrate");
> +            rc = send_domain_memory_live(ctx);
> +        }
> +        else
> +        {
> +            DPRINTF("Starting nonlive save");
> +            rc = send_domain_memory_nonlive(ctx);
> +        }
>  
> -    if ( rc )
> -        goto err;
> +        if ( rc )
> +            goto err;
>  
> -    /* Refresh domain information now it has paused. */
> -    if ( (xc_domain_getinfo(xch, ctx->domid, 1, &ctx->dominfo) != 1) ||
> -         (ctx->dominfo.domid != ctx->domid) )
> -    {
> -        PERROR("Unable to refresh domain information");
> -        rc = -1;
> -        goto err;
> -    }
> -    else if ( (!ctx->dominfo.shutdown ||
> -               ctx->dominfo.shutdown_reason != SHUTDOWN_suspend ) &&
> -              !ctx->dominfo.paused )
> -    {
> -        ERROR("Domain has not been suspended");
> -        rc = -1;
> -        goto err;
> -    }
> +        /* Refresh domain information now it has paused. */
> +        if ( (xc_domain_getinfo(xch, ctx->domid, 1, &ctx->dominfo) != 1) ||
> +             (ctx->dominfo.domid != ctx->domid) )
> +        {
> +            PERROR("Unable to refresh domain information");
> +            rc = -1;
> +            goto err;
> +        }
> +        else if ( (!ctx->dominfo.shutdown ||
> +                  ctx->dominfo.shutdown_reason != SHUTDOWN_suspend ) &&
> +                  !ctx->dominfo.paused )
> +        {
> +            ERROR("Domain has not been suspended");
> +            rc = -1;
> +            goto err;
> +        }
>  
> -    rc = ctx->save.ops.end_of_stream(ctx);
> -    if ( rc )
> -        goto err;
> +        rc = ctx->save.ops.end_of_stream(ctx);
> +        if ( rc )
> +            goto err;
> +
> +        if ( ctx->checkpointed ) {
> +            if ( ctx->firsttime )
> +                ctx->firsttime = false;
> +
> +            ctx->save.callbacks->postcopy(ctx->save.callbacks->data);

Can postcopy() fail?

~Andrew

> +
> +            rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
> +            if ( rc > 0 ) {
> +                IPRINTF("Next checkpoint\n");
> +            } else {
> +                ctx->checkpointed = false;
> +            }
> +        }
> +    } while ( ctx->checkpointed );
>  
>      rc = write_end_record(ctx);
>      if ( rc )
> @@ -653,6 +672,7 @@ int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_ite
>      ctx.save.live  = !!(flags & XCFLAGS_LIVE);
>      ctx.save.debug = !!(flags & XCFLAGS_DEBUG);
>      ctx.checkpointed = !!(flags & XCFLAGS_CHECKPOINTED);
> +    ctx.firsttime = true;
>  
>      if ( ctx.checkpointed ) {
>          /* This is a checkpointed save, we need these callbacks */

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 3/3] remus: adjust x86 pv restore to support remus
  2014-07-09  7:47 ` [RFC PATCH 3/3] remus: adjust x86 pv restore to support remus Yang Hongyang
@ 2014-07-09 11:16   ` Andrew Cooper
  2014-07-09 11:26     ` Andrew Cooper
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-07-09 11:16 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel; +Cc: rshriram, Ian.Jackson, Ian.Campbell

On 09/07/14 08:47, Yang Hongyang wrote:
> cache vcpu context when restore, and set context when stream
> complete.

Can you explain why this is needed?  I can't see why it should be required.

>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  tools/libxc/saverestore/common.h         |  6 +++
>  tools/libxc/saverestore/restore_x86_pv.c | 67 ++++++++++++++++++++++----------
>  2 files changed, 53 insertions(+), 20 deletions(-)
>
> diff --git a/tools/libxc/saverestore/common.h b/tools/libxc/saverestore/common.h
> index 1dd9f51..ba54f83 100644
> --- a/tools/libxc/saverestore/common.h
> +++ b/tools/libxc/saverestore/common.h
> @@ -235,6 +235,12 @@ struct xc_sr_context
>  
>              /* Read-only mapping of guests shared info page */
>              shared_info_any_t *shinfo;
> +
> +            /*
> +             * We need to cache vcpu context when doing checkpointed
> +             * restore, otherwise restore will fail
> +             */
> +            vcpu_guest_context_any_t *vcpu;

"vcpus" seems more appropriate.

>          } x86_pv;
>      };
>  };
> diff --git a/tools/libxc/saverestore/restore_x86_pv.c b/tools/libxc/saverestore/restore_x86_pv.c
> index 14fc020..7a54047 100644
> --- a/tools/libxc/saverestore/restore_x86_pv.c
> +++ b/tools/libxc/saverestore/restore_x86_pv.c
> @@ -428,8 +428,8 @@ static int handle_x86_pv_vcpu_basic(struct xc_sr_context *ctx, struct xc_sr_reco
>  {
>      xc_interface *xch = ctx->xch;
>      struct xc_sr_rec_x86_pv_vcpu_hdr *vhdr = rec->data;
> -    vcpu_guest_context_any_t vcpu;
> -    size_t vcpusz = ctx->x86_pv.width == 8 ? sizeof(vcpu.x64) : sizeof(vcpu.x32);
> +    vcpu_guest_context_any_t *vcpu;
> +    size_t vcpusz = ctx->x86_pv.width == 8 ? sizeof(vcpu->x64) : sizeof(vcpu->x32);

Constructs like this are used in several places.  A

static inline size_t pv_vcpu_size(const struct xc_sr_context *ctx)

helper in common_x86.h would be a neat improvement.

>      xen_pfn_t pfn, mfn;
>      unsigned long tmp;
>      unsigned i;
> @@ -455,22 +455,23 @@ static int handle_x86_pv_vcpu_basic(struct xc_sr_context *ctx, struct xc_sr_reco
>          goto err;
>      }
>  
> -    memcpy(&vcpu, &vhdr->context, vcpusz);
> +    vcpu = (void *)ctx->x86_pv.vcpu + vcpusz * vhdr->vcpu_id;

vhdr->vcpu_id is essentially untrusted input at this point.  It needs to
be validated against dominfo.max_vcpu_id if it is to be used like this.

It was safe before as the first place it was used was with the set
hypercall which would validate in Xen.

> +    memcpy(vcpu, &vhdr->context, vcpusz);

If you are caching the context, the rest of this function can also be
deferred until the end, so it is run once rather than N times.

>  
>      /* Vcpu 0 is special: Convert the suspend record to an mfn. */
>      if ( vhdr->vcpu_id == 0 )
>      {
> -        rc = process_start_info(ctx, &vcpu);
> +        rc = process_start_info(ctx, vcpu);
>          if ( rc )
>              return rc;
>          rc = -1;
>      }
>  
> -    SET_FIELD(&vcpu, flags,
> -              GET_FIELD(&vcpu, flags, ctx->x86_pv.width) | VGCF_online,
> +    SET_FIELD(vcpu, flags,
> +              GET_FIELD(vcpu, flags, ctx->x86_pv.width) | VGCF_online,
>                ctx->x86_pv.width);
>  
> -    tmp = GET_FIELD(&vcpu, gdt_ents, ctx->x86_pv.width);
> +    tmp = GET_FIELD(vcpu, gdt_ents, ctx->x86_pv.width);
>      if ( tmp > 8192 )
>      {
>          ERROR("GDT entry count (%lu) out of range", tmp);
> @@ -481,7 +482,7 @@ static int handle_x86_pv_vcpu_basic(struct xc_sr_context *ctx, struct xc_sr_reco
>      /* Convert GDT frames to mfns. */
>      for ( i = 0; (i * 512) < tmp; ++i )
>      {
> -        pfn = GET_FIELD(&vcpu, gdt_frames[i], ctx->x86_pv.width);
> +        pfn = GET_FIELD(vcpu, gdt_frames[i], ctx->x86_pv.width);
>          if ( pfn > ctx->x86_pv.max_pfn )
>          {
>              ERROR("GDT frame %u (pfn %#lx) out of range", i, pfn);
> @@ -502,11 +503,11 @@ static int handle_x86_pv_vcpu_basic(struct xc_sr_context *ctx, struct xc_sr_reco
>              goto err;
>          }
>  
> -        SET_FIELD(&vcpu, gdt_frames[i], mfn, ctx->x86_pv.width);
> +        SET_FIELD(vcpu, gdt_frames[i], mfn, ctx->x86_pv.width);
>      }
>  
>      /* Convert CR3 to an mfn. */
> -    pfn = cr3_to_mfn(ctx, GET_FIELD(&vcpu, ctrlreg[3], ctx->x86_pv.width));
> +    pfn = cr3_to_mfn(ctx, GET_FIELD(vcpu, ctrlreg[3], ctx->x86_pv.width));
>      if ( pfn > ctx->x86_pv.max_pfn )
>      {
>          ERROR("cr3 (pfn %#lx) out of range", pfn);
> @@ -529,12 +530,12 @@ static int handle_x86_pv_vcpu_basic(struct xc_sr_context *ctx, struct xc_sr_reco
>          goto err;
>      }
>  
> -    SET_FIELD(&vcpu, ctrlreg[3], mfn_to_cr3(ctx, mfn), ctx->x86_pv.width);
> +    SET_FIELD(vcpu, ctrlreg[3], mfn_to_cr3(ctx, mfn), ctx->x86_pv.width);
>  
>      /* 64bit guests: Convert CR1 (guest pagetables) to mfn. */
> -    if ( ctx->x86_pv.levels == 4 && (vcpu.x64.ctrlreg[1] & 1) )
> +    if ( ctx->x86_pv.levels == 4 && (vcpu->x64.ctrlreg[1] & 1) )
>      {
> -        pfn = vcpu.x64.ctrlreg[1] >> PAGE_SHIFT;
> +        pfn = vcpu->x64.ctrlreg[1] >> PAGE_SHIFT;
>  
>          if ( pfn > ctx->x86_pv.max_pfn )
>          {
> @@ -558,13 +559,7 @@ static int handle_x86_pv_vcpu_basic(struct xc_sr_context *ctx, struct xc_sr_reco
>              goto err;
>          }
>  
> -        vcpu.x64.ctrlreg[1] = (uint64_t)mfn << PAGE_SHIFT;
> -    }
> -
> -    if ( xc_vcpu_setcontext(xch, ctx->domid, vhdr->vcpu_id, &vcpu) )
> -    {
> -        PERROR("Failed to set vcpu%u's basic info", vhdr->vcpu_id);
> -        goto err;
> +        vcpu->x64.ctrlreg[1] = (uint64_t)mfn << PAGE_SHIFT;
>      }
>  
>      rc = 0;
> @@ -881,6 +876,7 @@ static int x86_pv_localise_page(struct xc_sr_context *ctx, uint32_t type, void *
>  static int x86_pv_setup(struct xc_sr_context *ctx)
>  {
>      xc_interface *xch = ctx->xch;
> +    size_t vcpusz;
>      int rc;
>  
>      if ( ctx->restore.guest_type != DHDR_TYPE_X86_PV )
> @@ -904,6 +900,9 @@ static int x86_pv_setup(struct xc_sr_context *ctx)
>      if ( rc )
>          return rc;
>  
> +    vcpusz = ctx->x86_pv.width == 8 ?
> +             sizeof(ctx->x86_pv.vcpu->x64) : sizeof(ctx->x86_pv.vcpu->x32);
> +    ctx->x86_pv.vcpu = calloc((ctx->dominfo.max_vcpu_id + 1) , vcpusz);

Extraneous brackets and space.

>      return rc;
>  }
>  
> @@ -946,6 +945,29 @@ static int x86_pv_process_record(struct xc_sr_context *ctx, struct xc_sr_record
>      }
>  }
>  
> +static int update_vcpu_context(struct xc_sr_context *ctx)
> +{
> +    xc_interface *xch = ctx->xch;
> +    vcpu_guest_context_any_t *vcpu;
> +    uint32_t vcpu_id;
> +    size_t vcpusz = ctx->x86_pv.width == 8 ? sizeof(vcpu->x64) : sizeof(vcpu->x32);
> +
> +    for ( vcpu_id = 0; vcpu_id <= ctx->dominfo.max_vcpu_id; vcpu_id++ )
> +    {
> +        vcpu = (void *)ctx->x86_pv.vcpu + vcpu_id * vcpusz;
> +        if ( !vcpu )
> +            continue;

This check can never fail.

> +
> +        if ( xc_vcpu_setcontext(xch, ctx->domid, vcpu_id, vcpu) )

There is a latent bug introduced here.  If X86_PV_VCPU_BASIC records
have not been seen for all vcpus, this code will attempt to set a
context of a block of zeroes, and fail because of a bad cr3.

~Andrew

> +        {
> +            PERROR("Failed to set vcpu%u's basic info", vcpu_id);
> +            return -1;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
>  /*
>   * restore_ops function.  Pin the pagetables, rewrite the p2m and seed the
>   * grant table.
> @@ -963,6 +985,10 @@ static int x86_pv_stream_complete(struct xc_sr_context *ctx)
>      if ( rc )
>          return rc;
>  
> +    rc = update_vcpu_context(ctx);
> +    if ( rc )
> +        return rc;
> +
>      rc = xc_dom_gnttab_seed(xch, ctx->domid,
>                              ctx->restore.console_mfn,
>                              ctx->restore.xenstore_mfn,
> @@ -985,6 +1011,7 @@ static int x86_pv_cleanup(struct xc_sr_context *ctx)
>      free(ctx->x86_pv.p2m);
>      free(ctx->x86_pv.p2m_pfns);
>      free(ctx->x86_pv.pfn_types);
> +    free(ctx->x86_pv.vcpu);
>  
>      if ( ctx->x86_pv.m2p )
>          munmap(ctx->x86_pv.m2p, ctx->x86_pv.nr_m2p_frames * PAGE_SIZE);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 3/3] remus: adjust x86 pv restore to support remus
  2014-07-09 11:16   ` Andrew Cooper
@ 2014-07-09 11:26     ` Andrew Cooper
  2014-07-10  3:30       ` Hongyang Yang
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-07-09 11:26 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel; +Cc: rshriram, Ian.Jackson, Ian.Campbell

On 09/07/14 12:16, Andrew Cooper wrote:
> On 09/07/14 08:47, Yang Hongyang wrote:
>> cache vcpu context when restore, and set context when stream
>> complete.
> Can you explain why this is needed?  I can't see why it should be required.

Actually, as part of reviewing this I have worked out why this is needed.

It is a latent bug in the migration v2 series with all the x86 pv vcpu
state (not just the basic state), which is not triggered by a well
behaved sender.

I shall fix it up in the base series.

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 2/3] remus: implement remus checkpoint in v2 save
  2014-07-09 10:53   ` Andrew Cooper
@ 2014-07-10  3:25     ` Hongyang Yang
  2014-07-10  8:49       ` Ian Campbell
  2014-07-10  9:24       ` Andrew Cooper
  0 siblings, 2 replies; 25+ messages in thread
From: Hongyang Yang @ 2014-07-10  3:25 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: rshriram, Ian.Jackson, Ian.Campbell



On 07/09/2014 06:53 PM, Andrew Cooper wrote:
> On 09/07/14 08:47, Yang Hongyang wrote:
>> implement remus checkpoint in v2 save
>>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> ---
>>   tools/libxc/saverestore/common.h |  1 +
>>   tools/libxc/saverestore/save.c   | 88 ++++++++++++++++++++++++----------------
>>   2 files changed, 55 insertions(+), 34 deletions(-)
>>
>> diff --git a/tools/libxc/saverestore/common.h b/tools/libxc/saverestore/common.h
>> index 24ba95b..1dd9f51 100644
>> --- a/tools/libxc/saverestore/common.h
>> +++ b/tools/libxc/saverestore/common.h
>> @@ -153,6 +153,7 @@ struct xc_sr_context
>>
>>       xc_dominfo_t dominfo;
>>       bool checkpointed;
>> +    bool firsttime;
>
> This is also only used on the save side.

Yes, the restore side won't use this by now, but I'm not sure it will
be used later, maybe it can be moved to .save union now, and when we need
to use it in restore side, we can then move it out.

In v2, the checkpointed_stream parameter in xc_domain_restore() seems
not necessary, can we remove it? cause remove it will breaks the API...

>
>>
>>       union
>>       {
>> diff --git a/tools/libxc/saverestore/save.c b/tools/libxc/saverestore/save.c
>> index d2fa8a6..98a5c2f 100644
>> --- a/tools/libxc/saverestore/save.c
>> +++ b/tools/libxc/saverestore/save.c
>> @@ -375,6 +375,8 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
>>           goto out;
>>       }
>>
>> +    if ( ctx->checkpointed && !ctx->firsttime )
>> +        goto lastiter;
>>       /* This juggling is required if logdirty is already on, e.g. VRAM tracking */
>>       if ( xc_shadow_control(xch, ctx->domid,
>>                              XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
>> @@ -436,6 +438,7 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
>>               break;
>>       }
>>
>> +lastiter:
>>       rc = suspend_domain(ctx);
>>       if ( rc )
>>           goto out;
>> @@ -570,44 +573,60 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
>>       if ( rc )
>>           goto err;
>>
>> -    rc = ctx->save.ops.start_of_stream(ctx);
>> -    if ( rc )
>> -        goto err;
>> +    do {
>> +        rc = ctx->save.ops.start_of_stream(ctx);
>> +        if ( rc )
>> +            goto err;
>
> I am not sure start_of_stream() wants to be inside the loop.  For PV
> guests, it sends the X86_PV_INFO which is only expected to be sent
> once.  The X86_PV_P2M_FRAMES record is deliberately safe to send
> multiple times (in the hope that someone might evenutally fix the
> ballooning issues), but is a waste of time to send like this, as its
> content wont be changing.

It you make sure all records that has been sent in start_of_stream()
wont be changing, then we can surely move this out of the loop.

>
>>
>> -    if ( ctx->save.live )
>> -    {
>> -        DPRINTF("Starting live migrate");
>> -        rc = send_domain_memory_live(ctx);
>> -    }
>> -    else
>> -    {
>> -        DPRINTF("Starting nonlive save");
>> -        rc = send_domain_memory_nonlive(ctx);
>> -    }
>> +        if ( ctx->save.live )
>> +        {
>> +            DPRINTF("Starting live migrate");
>> +            rc = send_domain_memory_live(ctx);
>> +        }
>> +        else
>> +        {
>> +            DPRINTF("Starting nonlive save");
>> +            rc = send_domain_memory_nonlive(ctx);
>> +        }
>>
>> -    if ( rc )
>> -        goto err;
>> +        if ( rc )
>> +            goto err;
>>
>> -    /* Refresh domain information now it has paused. */
>> -    if ( (xc_domain_getinfo(xch, ctx->domid, 1, &ctx->dominfo) != 1) ||
>> -         (ctx->dominfo.domid != ctx->domid) )
>> -    {
>> -        PERROR("Unable to refresh domain information");
>> -        rc = -1;
>> -        goto err;
>> -    }
>> -    else if ( (!ctx->dominfo.shutdown ||
>> -               ctx->dominfo.shutdown_reason != SHUTDOWN_suspend ) &&
>> -              !ctx->dominfo.paused )
>> -    {
>> -        ERROR("Domain has not been suspended");
>> -        rc = -1;
>> -        goto err;
>> -    }
>> +        /* Refresh domain information now it has paused. */
>> +        if ( (xc_domain_getinfo(xch, ctx->domid, 1, &ctx->dominfo) != 1) ||
>> +             (ctx->dominfo.domid != ctx->domid) )
>> +        {
>> +            PERROR("Unable to refresh domain information");
>> +            rc = -1;
>> +            goto err;
>> +        }
>> +        else if ( (!ctx->dominfo.shutdown ||
>> +                  ctx->dominfo.shutdown_reason != SHUTDOWN_suspend ) &&
>> +                  !ctx->dominfo.paused )
>> +        {
>> +            ERROR("Domain has not been suspended");
>> +            rc = -1;
>> +            goto err;
>> +        }
>>
>> -    rc = ctx->save.ops.end_of_stream(ctx);
>> -    if ( rc )
>> -        goto err;
>> +        rc = ctx->save.ops.end_of_stream(ctx);
>> +        if ( rc )
>> +            goto err;
>> +
>> +        if ( ctx->checkpointed ) {
>> +            if ( ctx->firsttime )
>> +                ctx->firsttime = false;
>> +
>> +            ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
>
> Can postcopy() fail?

It can, this was inherited from the legacy code, but should be improved
here by adding error check.

>
> ~Andrew
>
>> +
>> +            rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
>> +            if ( rc > 0 ) {
>> +                IPRINTF("Next checkpoint\n");
>> +            } else {
>> +                ctx->checkpointed = false;
>> +            }
>> +        }
>> +    } while ( ctx->checkpointed );
>>
>>       rc = write_end_record(ctx);
>>       if ( rc )
>> @@ -653,6 +672,7 @@ int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_ite
>>       ctx.save.live  = !!(flags & XCFLAGS_LIVE);
>>       ctx.save.debug = !!(flags & XCFLAGS_DEBUG);
>>       ctx.checkpointed = !!(flags & XCFLAGS_CHECKPOINTED);
>> +    ctx.firsttime = true;
>>
>>       if ( ctx.checkpointed ) {
>>           /* This is a checkpointed save, we need these callbacks */
>
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 3/3] remus: adjust x86 pv restore to support remus
  2014-07-09 11:26     ` Andrew Cooper
@ 2014-07-10  3:30       ` Hongyang Yang
  2014-07-10  9:25         ` Andrew Cooper
  0 siblings, 1 reply; 25+ messages in thread
From: Hongyang Yang @ 2014-07-10  3:30 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: rshriram, Ian.Jackson, Ian.Campbell



On 07/09/2014 07:26 PM, Andrew Cooper wrote:
> On 09/07/14 12:16, Andrew Cooper wrote:
>> On 09/07/14 08:47, Yang Hongyang wrote:
>>> cache vcpu context when restore, and set context when stream
>>> complete.
>> Can you explain why this is needed?  I can't see why it should be required.
>
> Actually, as part of reviewing this I have worked out why this is needed.
>
> It is a latent bug in the migration v2 series with all the x86 pv vcpu
> state (not just the basic state), which is not triggered by a well
> behaved sender.
>
> I shall fix it up in the base series.

That's great, remember the bug I talked to you on IRC last time? This
patch was targeted to avoid the bug, but I don't know why this bug
happened, which I can tell is that if we don't cache the state,
we will get mapping error when restore the CPU state next time,
can you explain it in detail? Thanks in advance.

>
> ~Andrew
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 2/3] remus: implement remus checkpoint in v2 save
  2014-07-10  3:25     ` Hongyang Yang
@ 2014-07-10  8:49       ` Ian Campbell
  2014-07-10  9:24       ` Andrew Cooper
  1 sibling, 0 replies; 25+ messages in thread
From: Ian Campbell @ 2014-07-10  8:49 UTC (permalink / raw)
  To: Hongyang Yang; +Cc: rshriram, Andrew Cooper, Ian.Jackson, xen-devel

On Thu, 2014-07-10 at 11:25 +0800, Hongyang Yang wrote:

> In v2, the checkpointed_stream parameter in xc_domain_restore() seems
> not necessary, can we remove it? cause remove it will breaks the API...

The libxc APIs can be changed if necessary, yes.

Ian.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 2/3] remus: implement remus checkpoint in v2 save
  2014-07-10  3:25     ` Hongyang Yang
  2014-07-10  8:49       ` Ian Campbell
@ 2014-07-10  9:24       ` Andrew Cooper
  1 sibling, 0 replies; 25+ messages in thread
From: Andrew Cooper @ 2014-07-10  9:24 UTC (permalink / raw)
  To: Hongyang Yang, xen-devel; +Cc: rshriram, Ian.Jackson, Ian.Campbell

On 10/07/14 04:25, Hongyang Yang wrote:
>
>
> On 07/09/2014 06:53 PM, Andrew Cooper wrote:
>> On 09/07/14 08:47, Yang Hongyang wrote:
>>> implement remus checkpoint in v2 save
>>>
>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>> ---
>>>   tools/libxc/saverestore/common.h |  1 +
>>>   tools/libxc/saverestore/save.c   | 88
>>> ++++++++++++++++++++++++----------------
>>>   2 files changed, 55 insertions(+), 34 deletions(-)
>>>
>>> diff --git a/tools/libxc/saverestore/common.h
>>> b/tools/libxc/saverestore/common.h
>>> index 24ba95b..1dd9f51 100644
>>> --- a/tools/libxc/saverestore/common.h
>>> +++ b/tools/libxc/saverestore/common.h
>>> @@ -153,6 +153,7 @@ struct xc_sr_context
>>>
>>>       xc_dominfo_t dominfo;
>>>       bool checkpointed;
>>> +    bool firsttime;
>>
>> This is also only used on the save side.
>
> Yes, the restore side won't use this by now, but I'm not sure it will
> be used later, maybe it can be moved to .save union now, and when we need
> to use it in restore side, we can then move it out.

I would prefer things like this to move into the most specific place
they can live.  It helps spot issues.  e.g. it is obvious that anything
using ctx->save.$FOO on the restore path is wrong.

>
> In v2, the checkpointed_stream parameter in xc_domain_restore() seems
> not necessary, can we remove it? cause remove it will breaks the API...

libxc is free to change.  I plan to drop as many arguments as possible
when the legacy migration code is removed.

>
>>
>>>
>>>       union
>>>       {
>>> diff --git a/tools/libxc/saverestore/save.c
>>> b/tools/libxc/saverestore/save.c
>>> index d2fa8a6..98a5c2f 100644
>>> --- a/tools/libxc/saverestore/save.c
>>> +++ b/tools/libxc/saverestore/save.c
>>> @@ -375,6 +375,8 @@ static int send_domain_memory_live(struct
>>> xc_sr_context *ctx)
>>>           goto out;
>>>       }
>>>
>>> +    if ( ctx->checkpointed && !ctx->firsttime )
>>> +        goto lastiter;
>>>       /* This juggling is required if logdirty is already on, e.g.
>>> VRAM tracking */
>>>       if ( xc_shadow_control(xch, ctx->domid,
>>>                              XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
>>> @@ -436,6 +438,7 @@ static int send_domain_memory_live(struct
>>> xc_sr_context *ctx)
>>>               break;
>>>       }
>>>
>>> +lastiter:
>>>       rc = suspend_domain(ctx);
>>>       if ( rc )
>>>           goto out;
>>> @@ -570,44 +573,60 @@ static int save(struct xc_sr_context *ctx,
>>> uint16_t guest_type)
>>>       if ( rc )
>>>           goto err;
>>>
>>> -    rc = ctx->save.ops.start_of_stream(ctx);
>>> -    if ( rc )
>>> -        goto err;
>>> +    do {
>>> +        rc = ctx->save.ops.start_of_stream(ctx);
>>> +        if ( rc )
>>> +            goto err;
>>
>> I am not sure start_of_stream() wants to be inside the loop.  For PV
>> guests, it sends the X86_PV_INFO which is only expected to be sent
>> once.  The X86_PV_P2M_FRAMES record is deliberately safe to send
>> multiple times (in the hope that someone might evenutally fix the
>> ballooning issues), but is a waste of time to send like this, as its
>> content wont be changing.
>
> It you make sure all records that has been sent in start_of_stream()
> wont be changing, then we can surely move this out of the loop.

The X86_PV_INFO record must only be sent once.  Sending it repeatedly
with the same contents is not disasterous, but sending it with different
contents certainly is.

There is nothing the restorer can do other than bail if it sees that the
sending has switches between being a 32bit or a 64bit VM.


If you need some non-memory records resending at the start of each
checkpoint iteration then feel free to add in a new function to
save_ops, but currently I don't believe it is needed.

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 3/3] remus: adjust x86 pv restore to support remus
  2014-07-10  3:30       ` Hongyang Yang
@ 2014-07-10  9:25         ` Andrew Cooper
  2014-07-10  9:32           ` Hongyang Yang
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-07-10  9:25 UTC (permalink / raw)
  To: Hongyang Yang, xen-devel; +Cc: rshriram, Ian.Jackson, Ian.Campbell

On 10/07/14 04:30, Hongyang Yang wrote:
>
>
> On 07/09/2014 07:26 PM, Andrew Cooper wrote:
>> On 09/07/14 12:16, Andrew Cooper wrote:
>>> On 09/07/14 08:47, Yang Hongyang wrote:
>>>> cache vcpu context when restore, and set context when stream
>>>> complete.
>>> Can you explain why this is needed?  I can't see why it should be
>>> required.
>>
>> Actually, as part of reviewing this I have worked out why this is
>> needed.
>>
>> It is a latent bug in the migration v2 series with all the x86 pv vcpu
>> state (not just the basic state), which is not triggered by a well
>> behaved sender.
>>
>> I shall fix it up in the base series.
>
> That's great, remember the bug I talked to you on IRC last time? This
> patch was targeted to avoid the bug, but I don't know why this bug
> happened, which I can tell is that if we don't cache the state,
> we will get mapping error when restore the CPU state next time,
> can you explain it in detail? Thanks in advance.

Once you have loaded cr3 (and cr1 for 64bit guests) once, the pages
containing pagetable data turn into real pagetables, after which the
restorer can no longer map them RW and update their contents.

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 3/3] remus: adjust x86 pv restore to support remus
  2014-07-10  9:25         ` Andrew Cooper
@ 2014-07-10  9:32           ` Hongyang Yang
  2014-07-10  9:42             ` Andrew Cooper
  0 siblings, 1 reply; 25+ messages in thread
From: Hongyang Yang @ 2014-07-10  9:32 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: rshriram, Ian.Jackson, Ian.Campbell



On 07/10/2014 05:25 PM, Andrew Cooper wrote:
> On 10/07/14 04:30, Hongyang Yang wrote:
>>
>>
>> On 07/09/2014 07:26 PM, Andrew Cooper wrote:
>>> On 09/07/14 12:16, Andrew Cooper wrote:
>>>> On 09/07/14 08:47, Yang Hongyang wrote:
>>>>> cache vcpu context when restore, and set context when stream
>>>>> complete.
>>>> Can you explain why this is needed?  I can't see why it should be
>>>> required.
>>>
>>> Actually, as part of reviewing this I have worked out why this is
>>> needed.
>>>
>>> It is a latent bug in the migration v2 series with all the x86 pv vcpu
>>> state (not just the basic state), which is not triggered by a well
>>> behaved sender.
>>>
>>> I shall fix it up in the base series.
>>
>> That's great, remember the bug I talked to you on IRC last time? This
>> patch was targeted to avoid the bug, but I don't know why this bug
>> happened, which I can tell is that if we don't cache the state,
>> we will get mapping error when restore the CPU state next time,
>> can you explain it in detail? Thanks in advance.
>
> Once you have loaded cr3 (and cr1 for 64bit guests) once, the pages
> containing pagetable data turn into real pagetables, after which the
> restorer can no longer map them RW and update their contents.

That's the point, thank you for the explanation! I was wondering how
you will fix it up? defer the load of the cr3 by cacheing the cpu state
or something else? maybe pin/unpin pagetables will also help?

>
> ~Andrew
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 3/3] remus: adjust x86 pv restore to support remus
  2014-07-10  9:32           ` Hongyang Yang
@ 2014-07-10  9:42             ` Andrew Cooper
  2014-07-10  9:47               ` Hongyang Yang
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-07-10  9:42 UTC (permalink / raw)
  To: Hongyang Yang, xen-devel; +Cc: rshriram, Ian.Jackson, Ian.Campbell

On 10/07/14 10:32, Hongyang Yang wrote:
>
>
> On 07/10/2014 05:25 PM, Andrew Cooper wrote:
>> On 10/07/14 04:30, Hongyang Yang wrote:
>>>
>>>
>>> On 07/09/2014 07:26 PM, Andrew Cooper wrote:
>>>> On 09/07/14 12:16, Andrew Cooper wrote:
>>>>> On 09/07/14 08:47, Yang Hongyang wrote:
>>>>>> cache vcpu context when restore, and set context when stream
>>>>>> complete.
>>>>> Can you explain why this is needed?  I can't see why it should be
>>>>> required.
>>>>
>>>> Actually, as part of reviewing this I have worked out why this is
>>>> needed.
>>>>
>>>> It is a latent bug in the migration v2 series with all the x86 pv vcpu
>>>> state (not just the basic state), which is not triggered by a well
>>>> behaved sender.
>>>>
>>>> I shall fix it up in the base series.
>>>
>>> That's great, remember the bug I talked to you on IRC last time? This
>>> patch was targeted to avoid the bug, but I don't know why this bug
>>> happened, which I can tell is that if we don't cache the state,
>>> we will get mapping error when restore the CPU state next time,
>>> can you explain it in detail? Thanks in advance.
>>
>> Once you have loaded cr3 (and cr1 for 64bit guests) once, the pages
>> containing pagetable data turn into real pagetables, after which the
>> restorer can no longer map them RW and update their contents.
>
> That's the point, thank you for the explanation! I was wondering how
> you will fix it up? defer the load of the cr3 by cacheing the cpu state
> or something else? maybe pin/unpin pagetables will also help?

The pagetable pinning is already deferred until after the end record. 
The vcpu basic records need deferring until the after the end record,
but the rest of the vcpu state should also be deferred (even if only to
avoid performing the hypercalls repeatedly).

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 3/3] remus: adjust x86 pv restore to support remus
  2014-07-10  9:42             ` Andrew Cooper
@ 2014-07-10  9:47               ` Hongyang Yang
  0 siblings, 0 replies; 25+ messages in thread
From: Hongyang Yang @ 2014-07-10  9:47 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: rshriram, Ian.Jackson, Ian.Campbell



On 07/10/2014 05:42 PM, Andrew Cooper wrote:
> On 10/07/14 10:32, Hongyang Yang wrote:
>>
>>
>> On 07/10/2014 05:25 PM, Andrew Cooper wrote:
>>> On 10/07/14 04:30, Hongyang Yang wrote:
>>>>
>>>>
>>>> On 07/09/2014 07:26 PM, Andrew Cooper wrote:
>>>>> On 09/07/14 12:16, Andrew Cooper wrote:
>>>>>> On 09/07/14 08:47, Yang Hongyang wrote:
>>>>>>> cache vcpu context when restore, and set context when stream
>>>>>>> complete.
>>>>>> Can you explain why this is needed?  I can't see why it should be
>>>>>> required.
>>>>>
>>>>> Actually, as part of reviewing this I have worked out why this is
>>>>> needed.
>>>>>
>>>>> It is a latent bug in the migration v2 series with all the x86 pv vcpu
>>>>> state (not just the basic state), which is not triggered by a well
>>>>> behaved sender.
>>>>>
>>>>> I shall fix it up in the base series.
>>>>
>>>> That's great, remember the bug I talked to you on IRC last time? This
>>>> patch was targeted to avoid the bug, but I don't know why this bug
>>>> happened, which I can tell is that if we don't cache the state,
>>>> we will get mapping error when restore the CPU state next time,
>>>> can you explain it in detail? Thanks in advance.
>>>
>>> Once you have loaded cr3 (and cr1 for 64bit guests) once, the pages
>>> containing pagetable data turn into real pagetables, after which the
>>> restorer can no longer map them RW and update their contents.
>>
>> That's the point, thank you for the explanation! I was wondering how
>> you will fix it up? defer the load of the cr3 by cacheing the cpu state
>> or something else? maybe pin/unpin pagetables will also help?
>
> The pagetable pinning is already deferred until after the end record.
> The vcpu basic records need deferring until the after the end record,
> but the rest of the vcpu state should also be deferred (even if only to
> avoid performing the hypercalls repeatedly).

IC, thank you! Will wait for your next version.

>
> ~Andrew
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 2/3] remus: implement remus checkpoint in v2 save
  2014-07-09  7:47 ` [RFC PATCH 2/3] remus: implement remus checkpoint in v2 save Yang Hongyang
  2014-07-09 10:53   ` Andrew Cooper
@ 2014-07-16 15:22   ` Shriram Rajagopalan
  2014-07-16 15:38     ` Andrew Cooper
  1 sibling, 1 reply; 25+ messages in thread
From: Shriram Rajagopalan @ 2014-07-16 15:22 UTC (permalink / raw)
  To: Yang Hongyang; +Cc: Andrew Cooper, Ian Jackson, Ian Campbell, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 6150 bytes --]

On Wed, Jul 9, 2014 at 3:47 AM, Yang Hongyang <yanghy@cn.fujitsu.com> wrote:

> implement remus checkpoint in v2 save
>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  tools/libxc/saverestore/common.h |  1 +
>  tools/libxc/saverestore/save.c   | 88
> ++++++++++++++++++++++++----------------
>  2 files changed, 55 insertions(+), 34 deletions(-)
>
> diff --git a/tools/libxc/saverestore/common.h
> b/tools/libxc/saverestore/common.h
> index 24ba95b..1dd9f51 100644
> --- a/tools/libxc/saverestore/common.h
> +++ b/tools/libxc/saverestore/common.h
> @@ -153,6 +153,7 @@ struct xc_sr_context
>
>      xc_dominfo_t dominfo;
>      bool checkpointed;
> +    bool firsttime;
>
>      union
>      {
> diff --git a/tools/libxc/saverestore/save.c
> b/tools/libxc/saverestore/save.c
> index d2fa8a6..98a5c2f 100644
> --- a/tools/libxc/saverestore/save.c
> +++ b/tools/libxc/saverestore/save.c
> @@ -375,6 +375,8 @@ static int send_domain_memory_live(struct
> xc_sr_context *ctx)
>          goto out;
>      }
>
> +    if ( ctx->checkpointed && !ctx->firsttime )
> +        goto lastiter;
>

nit: last_iter

I suggest adding a comment before this code block to explain what has
happened so
far above why we are jumping to the last_iter block skipping the rest of
the stuff below.

I also suggest maintaining some stats structure somewhere (number of dirty
pages,
time when checkpoint was initiated, etc.). They could be simply debug
prints that
can be enabled on demand.

A better alternative would be to somehow pass this stats structure back to
libxl,
such that the user can enable/disable stats printing using the xl remus
command
for example.


>      /* This juggling is required if logdirty is already on, e.g. VRAM
> tracking */
>      if ( xc_shadow_control(xch, ctx->domid,
>                             XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
> @@ -436,6 +438,7 @@ static int send_domain_memory_live(struct
> xc_sr_context *ctx)
>              break;
>      }
>
> +lastiter:
>      rc = suspend_domain(ctx);
>      if ( rc )
>          goto out;
>

I hope the postsuspend callbacks are called somewhere?


> @@ -570,44 +573,60 @@ static int save(struct xc_sr_context *ctx, uint16_t
> guest_type)
>      if ( rc )
>          goto err;
>
> -    rc = ctx->save.ops.start_of_stream(ctx);
> -    if ( rc )
> -        goto err;
> +    do {
> +        rc = ctx->save.ops.start_of_stream(ctx);
> +        if ( rc )
> +            goto err;
>
> -    if ( ctx->save.live )
> -    {
> -        DPRINTF("Starting live migrate");
> -        rc = send_domain_memory_live(ctx);
> -    }
> -    else
> -    {
> -        DPRINTF("Starting nonlive save");
> -        rc = send_domain_memory_nonlive(ctx);
> -    }
> +        if ( ctx->save.live )
> +        {
> +            DPRINTF("Starting live migrate");
>

I suggest using the ctx->firsttime bool var before this printf, so that
when debug print is enabled under remus, the console is not
flooded with the same statement that makes no sense past the
first iteration.


> +            rc = send_domain_memory_live(ctx);
> +        }
> +        else
> +        {
> +            DPRINTF("Starting nonlive save");
> +            rc = send_domain_memory_nonlive(ctx);
> +        }
>
> -    if ( rc )
> -        goto err;
> +        if ( rc )
> +            goto err;
>
> -    /* Refresh domain information now it has paused. */
> -    if ( (xc_domain_getinfo(xch, ctx->domid, 1, &ctx->dominfo) != 1) ||
> -         (ctx->dominfo.domid != ctx->domid) )
> -    {
> -        PERROR("Unable to refresh domain information");
> -        rc = -1;
> -        goto err;
> -    }
> -    else if ( (!ctx->dominfo.shutdown ||
> -               ctx->dominfo.shutdown_reason != SHUTDOWN_suspend ) &&
> -              !ctx->dominfo.paused )
> -    {
> -        ERROR("Domain has not been suspended");
> -        rc = -1;
> -        goto err;
> -    }
> +        /* Refresh domain information now it has paused. */
> +        if ( (xc_domain_getinfo(xch, ctx->domid, 1, &ctx->dominfo) != 1)
> ||
> +             (ctx->dominfo.domid != ctx->domid) )
> +        {
> +            PERROR("Unable to refresh domain information");
> +            rc = -1;
> +            goto err;
> +        }
> +        else if ( (!ctx->dominfo.shutdown ||
> +                  ctx->dominfo.shutdown_reason != SHUTDOWN_suspend ) &&
> +                  !ctx->dominfo.paused )
> +        {
> +            ERROR("Domain has not been suspended");
> +            rc = -1;
> +            goto err;
> +        }
>
>
A silly question: Shouldn't this check for 'suspended status' be before the
call to
send_domain_memory_live() [under remus]. I am assuming that the
send_domain_memory_live() function is the one that sends the dirty page data
out on the wire.



> -    rc = ctx->save.ops.end_of_stream(ctx);
> -    if ( rc )
> -        goto err;
> +        rc = ctx->save.ops.end_of_stream(ctx);
> +        if ( rc )
> +            goto err;
> +
> +        if ( ctx->checkpointed ) {
> +            if ( ctx->firsttime )
> +                ctx->firsttime = false;
> +
> +            ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
> +
> +            rc =
> ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
> +            if ( rc > 0 ) {
> +                IPRINTF("Next checkpoint\n");
>

I suggest maintaining checkpoint numbers instead. Much more helpful.
Probably even add
a gettimeofday call to print out the time the new checkpoint is started.
Once again, its useful
to be able to control this printout from libxl

+            } else {
> +                ctx->checkpointed = false;
> +            }
> +        }
> +    } while ( ctx->checkpointed );
>
>      rc = write_end_record(ctx);
>      if ( rc )
> @@ -653,6 +672,7 @@ int xc_domain_save2(xc_interface *xch, int io_fd,
> uint32_t dom, uint32_t max_ite
>      ctx.save.live  = !!(flags & XCFLAGS_LIVE);
>      ctx.save.debug = !!(flags & XCFLAGS_DEBUG);
>      ctx.checkpointed = !!(flags & XCFLAGS_CHECKPOINTED);
> +    ctx.firsttime = true;
>
>      if ( ctx.checkpointed ) {
>          /* This is a checkpointed save, we need these callbacks */
> --
> 1.9.1
>
>

[-- Attachment #1.2: Type: text/html, Size: 8715 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 2/3] remus: implement remus checkpoint in v2 save
  2014-07-16 15:22   ` Shriram Rajagopalan
@ 2014-07-16 15:38     ` Andrew Cooper
  2014-07-16 16:02       ` Shriram Rajagopalan
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-07-16 15:38 UTC (permalink / raw)
  To: rshriram, Yang Hongyang; +Cc: Ian Jackson, Ian Campbell, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 8507 bytes --]

On 16/07/14 16:22, Shriram Rajagopalan wrote:
>
>
>
> On Wed, Jul 9, 2014 at 3:47 AM, Yang Hongyang <yanghy@cn.fujitsu.com
> <mailto:yanghy@cn.fujitsu.com>> wrote:
>
>     implement remus checkpoint in v2 save
>
>     Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com
>     <mailto:yanghy@cn.fujitsu.com>>
>     ---
>      tools/libxc/saverestore/common.h |  1 +
>      tools/libxc/saverestore/save.c   | 88
>     ++++++++++++++++++++++++----------------
>      2 files changed, 55 insertions(+), 34 deletions(-)
>
>     diff --git a/tools/libxc/saverestore/common.h
>     b/tools/libxc/saverestore/common.h
>     index 24ba95b..1dd9f51 100644
>     --- a/tools/libxc/saverestore/common.h
>     +++ b/tools/libxc/saverestore/common.h
>     @@ -153,6 +153,7 @@ struct xc_sr_context
>
>          xc_dominfo_t dominfo;
>          bool checkpointed;
>     +    bool firsttime;
>
>          union
>          {
>     diff --git a/tools/libxc/saverestore/save.c
>     b/tools/libxc/saverestore/save.c
>     index d2fa8a6..98a5c2f 100644
>     --- a/tools/libxc/saverestore/save.c
>     +++ b/tools/libxc/saverestore/save.c
>     @@ -375,6 +375,8 @@ static int send_domain_memory_live(struct
>     xc_sr_context *ctx)
>              goto out;
>          }
>
>     +    if ( ctx->checkpointed && !ctx->firsttime )
>     +        goto lastiter;
>
>
> nit: last_iter
>
> I suggest adding a comment before this code block to explain what has
> happened so
> far above why we are jumping to the last_iter block skipping the rest
> of the stuff below.
>
> I also suggest maintaining some stats structure somewhere (number of
> dirty pages,
> time when checkpoint was initiated, etc.). They could be simply debug
> prints that
> can be enabled on demand.
>
> A better alternative would be to somehow pass this stats structure
> back to libxl,
> such that the user can enable/disable stats printing using the xl
> remus command
> for example.

This is partly my fault.  I so far haven’t got any of the "progress
report" set of logging functionality wired up.  Once that is done, libxl
should be fine, as it can already filter the progress logging from the
regular logging.

>
>
>          /* This juggling is required if logdirty is already on, e.g.
>     VRAM tracking */
>          if ( xc_shadow_control(xch, ctx->domid,
>                                 XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
>     @@ -436,6 +438,7 @@ static int send_domain_memory_live(struct
>     xc_sr_context *ctx)
>                  break;
>          }
>
>     +lastiter:
>          rc = suspend_domain(ctx);
>          if ( rc )
>              goto out;
>
>
> I hope the postsuspend callbacks are called somewhere?
>
>
>     @@ -570,44 +573,60 @@ static int save(struct xc_sr_context *ctx,
>     uint16_t guest_type)
>          if ( rc )
>              goto err;
>
>     -    rc = ctx->save.ops.start_of_stream(ctx);
>     -    if ( rc )
>     -        goto err;
>     +    do {
>     +        rc = ctx->save.ops.start_of_stream(ctx);
>     +        if ( rc )
>     +            goto err;
>
>     -    if ( ctx->save.live )
>     -    {
>     -        DPRINTF("Starting live migrate");
>     -        rc = send_domain_memory_live(ctx);
>     -    }
>     -    else
>     -    {
>     -        DPRINTF("Starting nonlive save");
>     -        rc = send_domain_memory_nonlive(ctx);
>     -    }
>     +        if ( ctx->save.live )
>     +        {
>     +            DPRINTF("Starting live migrate");
>
>
> I suggest using the ctx->firsttime bool var before this printf, so that
> when debug print is enabled under remus, the console is not
> flooded with the same statement that makes no sense past the
> first iteration.

A thought has just crossed my mind.  It might be better to have a
"send_domain_remus()" function which internally deals with the
iterations and looping etc.

e.g.

if ( remus )
    rc = send_domain_remus(ctx);
else if ( ctx->save.live )
    rc = send_domain_live(ctx);
else
    rc = send_domain_memory_nonlive(ctx);

In my upcoming series which fixes the deferred vcpu state problem I have
split some of the common bits out of send_domain_memory_{live,nonline}()
which might be useful.

>
>
>     +            rc = send_domain_memory_live(ctx);
>     +        }
>     +        else
>     +        {
>     +            DPRINTF("Starting nonlive save");
>     +            rc = send_domain_memory_nonlive(ctx);
>     +        }
>
>     -    if ( rc )
>     -        goto err;
>     +        if ( rc )
>     +            goto err;
>
>     -    /* Refresh domain information now it has paused. */
>     -    if ( (xc_domain_getinfo(xch, ctx->domid, 1, &ctx->dominfo) !=
>     1) ||
>     -         (ctx->dominfo.domid != ctx->domid) )
>     -    {
>     -        PERROR("Unable to refresh domain information");
>     -        rc = -1;
>     -        goto err;
>     -    }
>     -    else if ( (!ctx->dominfo.shutdown ||
>     -               ctx->dominfo.shutdown_reason != SHUTDOWN_suspend ) &&
>     -              !ctx->dominfo.paused )
>     -    {
>     -        ERROR("Domain has not been suspended");
>     -        rc = -1;
>     -        goto err;
>     -    }
>     +        /* Refresh domain information now it has paused. */
>     +        if ( (xc_domain_getinfo(xch, ctx->domid, 1,
>     &ctx->dominfo) != 1) ||
>     +             (ctx->dominfo.domid != ctx->domid) )
>     +        {
>     +            PERROR("Unable to refresh domain information");
>     +            rc = -1;
>     +            goto err;
>     +        }
>     +        else if ( (!ctx->dominfo.shutdown ||
>     +                  ctx->dominfo.shutdown_reason !=
>     SHUTDOWN_suspend ) &&
>     +                  !ctx->dominfo.paused )
>     +        {
>     +            ERROR("Domain has not been suspended");
>     +            rc = -1;
>     +            goto err;
>     +        }
>
>
> A silly question: Shouldn't this check for 'suspended status' be
> before the call to
> send_domain_memory_live() [under remus]. I am assuming that the
> send_domain_memory_live() function is the one that sends the dirty
> page data
> out on the wire.

Even for non-live migrates, we have to ensure that the VM has entered
the suspend state.  A PV guest cannot possibly recover from resume if it
didn't voluntarily suspend as it will loose its reference to the start
info page (whos mfn lives in vcpu0.edx for the duration of the migrate
and must be updated on the receiving side).  Any VM which doesn't
sufficiently quiesce its fronends risks ring and memory corruption on
resume.

In this case, the send_domain_memory_XXX functions do take care of
pausing the domain at the appropriate time, but this here is a sanity check.

~Andrew

>
>
>
>     -    rc = ctx->save.ops.end_of_stream(ctx);
>     -    if ( rc )
>     -        goto err;
>     +        rc = ctx->save.ops.end_of_stream(ctx);
>     +        if ( rc )
>     +            goto err;
>     +
>     +        if ( ctx->checkpointed ) {
>     +            if ( ctx->firsttime )
>     +                ctx->firsttime = false;
>     +
>     +            ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
>     +
>     +            rc =
>     ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
>     +            if ( rc > 0 ) {
>     +                IPRINTF("Next checkpoint\n");
>
>
> I suggest maintaining checkpoint numbers instead. Much more helpful.
> Probably even add
> a gettimeofday call to print out the time the new checkpoint is
> started. Once again, its useful
> to be able to control this printout from libxl
>
>     +            } else {
>     +                ctx->checkpointed = false;
>     +            }
>     +        }
>     +    } while ( ctx->checkpointed );
>
>          rc = write_end_record(ctx);
>          if ( rc )
>     @@ -653,6 +672,7 @@ int xc_domain_save2(xc_interface *xch, int
>     io_fd, uint32_t dom, uint32_t max_ite
>          ctx.save.live  = !!(flags & XCFLAGS_LIVE);
>          ctx.save.debug = !!(flags & XCFLAGS_DEBUG);
>          ctx.checkpointed = !!(flags & XCFLAGS_CHECKPOINTED);
>     +    ctx.firsttime = true;
>
>          if ( ctx.checkpointed ) {
>              /* This is a checkpointed save, we need these callbacks */
>     --
>     1.9.1
>
>


[-- Attachment #1.2: Type: text/html, Size: 15189 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 2/3] remus: implement remus checkpoint in v2 save
  2014-07-16 15:38     ` Andrew Cooper
@ 2014-07-16 16:02       ` Shriram Rajagopalan
  2014-07-16 16:33         ` Andrew Cooper
  0 siblings, 1 reply; 25+ messages in thread
From: Shriram Rajagopalan @ 2014-07-16 16:02 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Yang Hongyang, Ian Jackson, Ian Campbell, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 8741 bytes --]

On Wed, Jul 16, 2014 at 11:38 AM, Andrew Cooper <andrew.cooper3@citrix.com>
wrote:

>  On 16/07/14 16:22, Shriram Rajagopalan wrote:
>
>
>
>
> On Wed, Jul 9, 2014 at 3:47 AM, Yang Hongyang <yanghy@cn.fujitsu.com>
> wrote:
>
>> implement remus checkpoint in v2 save
>>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> ---
>>  tools/libxc/saverestore/common.h |  1 +
>>  tools/libxc/saverestore/save.c   | 88
>> ++++++++++++++++++++++++----------------
>>  2 files changed, 55 insertions(+), 34 deletions(-)
>>
>> diff --git a/tools/libxc/saverestore/common.h
>> b/tools/libxc/saverestore/common.h
>> index 24ba95b..1dd9f51 100644
>> --- a/tools/libxc/saverestore/common.h
>> +++ b/tools/libxc/saverestore/common.h
>> @@ -153,6 +153,7 @@ struct xc_sr_context
>>
>>      xc_dominfo_t dominfo;
>>      bool checkpointed;
>> +    bool firsttime;
>>
>>      union
>>      {
>> diff --git a/tools/libxc/saverestore/save.c
>> b/tools/libxc/saverestore/save.c
>> index d2fa8a6..98a5c2f 100644
>> --- a/tools/libxc/saverestore/save.c
>> +++ b/tools/libxc/saverestore/save.c
>> @@ -375,6 +375,8 @@ static int send_domain_memory_live(struct
>> xc_sr_context *ctx)
>>          goto out;
>>      }
>>
>> +    if ( ctx->checkpointed && !ctx->firsttime )
>> +        goto lastiter;
>>
>
>  nit: last_iter
>
>  I suggest adding a comment before this code block to explain what has
> happened so
> far above why we are jumping to the last_iter block skipping the rest of
> the stuff below.
>
>  I also suggest maintaining some stats structure somewhere (number of
> dirty pages,
> time when checkpoint was initiated, etc.). They could be simply debug
> prints that
> can be enabled on demand.
>
>  A better alternative would be to somehow pass this stats structure back
> to libxl,
> such that the user can enable/disable stats printing using the xl remus
> command
> for example.
>
>
> This is partly my fault.  I so far haven’t got any of the "progress
> report" set of logging functionality wired up.  Once that is done, libxl
> should be fine, as it can already filter the progress logging from the
> regular logging.
>
>
>
>
>>      /* This juggling is required if logdirty is already on, e.g. VRAM
>> tracking */
>>      if ( xc_shadow_control(xch, ctx->domid,
>>                             XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
>> @@ -436,6 +438,7 @@ static int send_domain_memory_live(struct
>> xc_sr_context *ctx)
>>              break;
>>      }
>>
>> +lastiter:
>>      rc = suspend_domain(ctx);
>>      if ( rc )
>>          goto out;
>>
>
>  I hope the postsuspend callbacks are called somewhere?
>
>
>> @@ -570,44 +573,60 @@ static int save(struct xc_sr_context *ctx, uint16_t
>> guest_type)
>>      if ( rc )
>>          goto err;
>>
>> -    rc = ctx->save.ops.start_of_stream(ctx);
>> -    if ( rc )
>> -        goto err;
>> +    do {
>> +        rc = ctx->save.ops.start_of_stream(ctx);
>> +        if ( rc )
>> +            goto err;
>>
>> -    if ( ctx->save.live )
>> -    {
>> -        DPRINTF("Starting live migrate");
>> -        rc = send_domain_memory_live(ctx);
>> -    }
>> -    else
>> -    {
>> -        DPRINTF("Starting nonlive save");
>> -        rc = send_domain_memory_nonlive(ctx);
>> -    }
>> +        if ( ctx->save.live )
>> +        {
>> +            DPRINTF("Starting live migrate");
>>
>
>  I suggest using the ctx->firsttime bool var before this printf, so that
> when debug print is enabled under remus, the console is not
> flooded with the same statement that makes no sense past the
> first iteration.
>
>
> A thought has just crossed my mind.  It might be better to have a
> "send_domain_remus()" function which internally deals with the iterations
> and looping etc.
>
> e.g.
>
> if ( remus )
>     rc = send_domain_remus(ctx);
> else if ( ctx->save.live )
>     rc = send_domain_live(ctx);
> else
>     rc = send_domain_memory_nonlive(ctx);
>
> In my upcoming series which fixes the deferred vcpu state problem I have
> split some of the common bits out of send_domain_memory_{live,nonline}()
> which might be useful.
>
>
>
Yes, that seems useful. Although, we should be prepared to deal with some
minimal code duplication.


>
>
>> +            rc = send_domain_memory_live(ctx);
>> +        }
>> +        else
>> +        {
>> +            DPRINTF("Starting nonlive save");
>> +            rc = send_domain_memory_nonlive(ctx);
>> +        }
>>
>> -    if ( rc )
>> -        goto err;
>> +        if ( rc )
>> +            goto err;
>>
>> -    /* Refresh domain information now it has paused. */
>> -    if ( (xc_domain_getinfo(xch, ctx->domid, 1, &ctx->dominfo) != 1) ||
>> -         (ctx->dominfo.domid != ctx->domid) )
>> -    {
>> -        PERROR("Unable to refresh domain information");
>> -        rc = -1;
>> -        goto err;
>> -    }
>> -    else if ( (!ctx->dominfo.shutdown ||
>> -               ctx->dominfo.shutdown_reason != SHUTDOWN_suspend ) &&
>> -              !ctx->dominfo.paused )
>> -    {
>> -        ERROR("Domain has not been suspended");
>> -        rc = -1;
>> -        goto err;
>> -    }
>> +        /* Refresh domain information now it has paused. */
>> +        if ( (xc_domain_getinfo(xch, ctx->domid, 1, &ctx->dominfo) != 1)
>> ||
>> +             (ctx->dominfo.domid != ctx->domid) )
>> +        {
>> +            PERROR("Unable to refresh domain information");
>> +            rc = -1;
>> +            goto err;
>> +        }
>> +        else if ( (!ctx->dominfo.shutdown ||
>> +                  ctx->dominfo.shutdown_reason != SHUTDOWN_suspend ) &&
>> +                  !ctx->dominfo.paused )
>> +        {
>> +            ERROR("Domain has not been suspended");
>> +            rc = -1;
>> +            goto err;
>> +        }
>>
>>
>  A silly question: Shouldn't this check for 'suspended status' be before
> the call to
> send_domain_memory_live() [under remus]. I am assuming that the
> send_domain_memory_live() function is the one that sends the dirty page
> data
> out on the wire.
>
>
> Even for non-live migrates, we have to ensure that the VM has entered the
> suspend state.  A PV guest cannot possibly recover from resume if it didn't
> voluntarily suspend as it will loose its reference to the start info page
> (whos mfn lives in vcpu0.edx for the duration of the migrate and must be
> updated on the receiving side).  Any VM which doesn't sufficiently quiesce
> its fronends risks ring and memory corruption on resume.
>
In this case, the send_domain_memory_XXX functions do take care of pausing
> the domain at the appropriate time, but this here is a sanity check.
>
>
True. I am assuming that the send_domain_memory_XXX functions do some
heavy lifting (copying out dirty pages onto the fd, etc).  So, shouldn't
the sanity
check occur right after issuing a suspend op, i.e., in the
send_domain_memory_XXX functions,
before attempting to send the domain memory on the wire?

Assuming that the check is already there in those functions, then this
block
above would be redundant.






>  ~Andrew
>
>
>
>
>
>> -    rc = ctx->save.ops.end_of_stream(ctx);
>> -    if ( rc )
>> -        goto err;
>> +        rc = ctx->save.ops.end_of_stream(ctx);
>> +        if ( rc )
>> +            goto err;
>> +
>> +        if ( ctx->checkpointed ) {
>> +            if ( ctx->firsttime )
>> +                ctx->firsttime = false;
>> +
>> +            ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
>> +
>> +            rc =
>> ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
>> +            if ( rc > 0 ) {
>> +                IPRINTF("Next checkpoint\n");
>>
>
>  I suggest maintaining checkpoint numbers instead. Much more helpful.
> Probably even add
> a gettimeofday call to print out the time the new checkpoint is started.
> Once again, its useful
> to be able to control this printout from libxl
>
>  +            } else {
>> +                ctx->checkpointed = false;
>> +            }
>> +        }
>> +    } while ( ctx->checkpointed );
>>
>>      rc = write_end_record(ctx);
>>      if ( rc )
>> @@ -653,6 +672,7 @@ int xc_domain_save2(xc_interface *xch, int io_fd,
>> uint32_t dom, uint32_t max_ite
>>      ctx.save.live  = !!(flags & XCFLAGS_LIVE);
>>      ctx.save.debug = !!(flags & XCFLAGS_DEBUG);
>>      ctx.checkpointed = !!(flags & XCFLAGS_CHECKPOINTED);
>> +    ctx.firsttime = true;
>>
>>      if ( ctx.checkpointed ) {
>>          /* This is a checkpointed save, we need these callbacks */
>> --
>> 1.9.1
>>
>>
>
>

[-- Attachment #1.2: Type: text/html, Size: 16485 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 2/3] remus: implement remus checkpoint in v2 save
  2014-07-16 16:02       ` Shriram Rajagopalan
@ 2014-07-16 16:33         ` Andrew Cooper
  0 siblings, 0 replies; 25+ messages in thread
From: Andrew Cooper @ 2014-07-16 16:33 UTC (permalink / raw)
  To: rshriram; +Cc: Yang Hongyang, Ian Jackson, Ian Campbell, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3999 bytes --]

On 16/07/14 17:02, Shriram Rajagopalan wrote:


>  
>
>>      
>>
>>         +            rc = send_domain_memory_live(ctx);
>>         +        }
>>         +        else
>>         +        {
>>         +            DPRINTF("Starting nonlive save");
>>         +            rc = send_domain_memory_nonlive(ctx);
>>         +        }
>>
>>         -    if ( rc )
>>         -        goto err;
>>         +        if ( rc )
>>         +            goto err;
>>
>>         -    /* Refresh domain information now it has paused. */
>>         -    if ( (xc_domain_getinfo(xch, ctx->domid, 1,
>>         &ctx->dominfo) != 1) ||
>>         -         (ctx->dominfo.domid != ctx->domid) )
>>         -    {
>>         -        PERROR("Unable to refresh domain information");
>>         -        rc = -1;
>>         -        goto err;
>>         -    }
>>         -    else if ( (!ctx->dominfo.shutdown ||
>>         -               ctx->dominfo.shutdown_reason !=
>>         SHUTDOWN_suspend ) &&
>>         -              !ctx->dominfo.paused )
>>         -    {
>>         -        ERROR("Domain has not been suspended");
>>         -        rc = -1;
>>         -        goto err;
>>         -    }
>>         +        /* Refresh domain information now it has paused. */
>>         +        if ( (xc_domain_getinfo(xch, ctx->domid, 1,
>>         &ctx->dominfo) != 1) ||
>>         +             (ctx->dominfo.domid != ctx->domid) )
>>         +        {
>>         +            PERROR("Unable to refresh domain information");
>>         +            rc = -1;
>>         +            goto err;
>>         +        }
>>         +        else if ( (!ctx->dominfo.shutdown ||
>>         +                  ctx->dominfo.shutdown_reason !=
>>         SHUTDOWN_suspend ) &&
>>         +                  !ctx->dominfo.paused )
>>         +        {
>>         +            ERROR("Domain has not been suspended");
>>         +            rc = -1;
>>         +            goto err;
>>         +        }
>>
>>
>>     A silly question: Shouldn't this check for 'suspended status' be
>>     before the call to 
>>     send_domain_memory_live() [under remus]. I am assuming that the
>>     send_domain_memory_live() function is the one that sends the
>>     dirty page data
>>     out on the wire.
>
>     Even for non-live migrates, we have to ensure that the VM has
>     entered the suspend state.  A PV guest cannot possibly recover
>     from resume if it didn't voluntarily suspend as it will loose its
>     reference to the start info page (whos mfn lives in vcpu0.edx for
>     the duration of the migrate and must be updated on the receiving
>     side).  Any VM which doesn't sufficiently quiesce its fronends
>     risks ring and memory corruption on resume.
>
>     In this case, the send_domain_memory_XXX functions do take care of
>     pausing the domain at the appropriate time, but this here is a
>     sanity check.
>
>
> True. I am assuming that the send_domain_memory_XXX functions do some
> heavy lifting (copying out dirty pages onto the fd, etc).  So,
> shouldn't the sanity 
> check occur right after issuing a suspend op, i.e., in the
> send_domain_memory_XXX functions,
> before attempting to send the domain memory on the wire?
>
> Assuming that the check is already there in those functions, then this
> block 
> above would be redundant.
>
>

You are correct.  The code is in the way it is because of the order in
which I implemented things when building live migration from first
principles.

suspend_domain() confirming that the domain has indeed suspended is a
recent addition, and does indeed negate the refresh of the domain
information in save().  I will fix that in the upcoming series.

However, I think keeping the sanity check that the domain has actually
been paused is still a good idea.  It is a stated precondition of the
following functions and I know for certain that debugging the tail of
migration with an unpaused VM causes some very subtle bugs on resume.

~Andrew

[-- Attachment #1.2: Type: text/html, Size: 9380 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2014-07-16 16:33 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-09  7:47 [RFC PATCH 0/3] Remus: add remus support for migration v2 Yang Hongyang
2014-07-09  7:47 ` [RFC PATCH 1/3] remus: add a bool var to indicate checkpointed stream Yang Hongyang
2014-07-09  9:45   ` Andrew Cooper
2014-07-09  9:53     ` Hongyang Yang
2014-07-09  7:47 ` [RFC PATCH 2/3] remus: implement remus checkpoint in v2 save Yang Hongyang
2014-07-09 10:53   ` Andrew Cooper
2014-07-10  3:25     ` Hongyang Yang
2014-07-10  8:49       ` Ian Campbell
2014-07-10  9:24       ` Andrew Cooper
2014-07-16 15:22   ` Shriram Rajagopalan
2014-07-16 15:38     ` Andrew Cooper
2014-07-16 16:02       ` Shriram Rajagopalan
2014-07-16 16:33         ` Andrew Cooper
2014-07-09  7:47 ` [RFC PATCH 3/3] remus: adjust x86 pv restore to support remus Yang Hongyang
2014-07-09 11:16   ` Andrew Cooper
2014-07-09 11:26     ` Andrew Cooper
2014-07-10  3:30       ` Hongyang Yang
2014-07-10  9:25         ` Andrew Cooper
2014-07-10  9:32           ` Hongyang Yang
2014-07-10  9:42             ` Andrew Cooper
2014-07-10  9:47               ` Hongyang Yang
2014-07-09  8:53 ` [RFC PATCH 0/3] Remus: add remus support for migration v2 Ian Campbell
2014-07-09  9:56   ` Hongyang Yang
2014-07-09  9:42 ` Andrew Cooper
2014-07-09 10:06   ` Hongyang Yang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.