* [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
@ 2014-07-18 11:38 Wen Congyang
  2014-07-18 11:38 ` [RFC Patch 01/25] copy the correct page to memory Wen Congyang
                   ` (27 more replies)
  0 siblings, 28 replies; 36+ messages in thread
From: Wen Congyang @ 2014-07-18 11:38 UTC (permalink / raw)
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

Virtual machine (VM) replication is a well-known technique for providing
application-agnostic, software-implemented hardware fault tolerance, also
known as "non-stop service". Remus currently provides this function, but it
buffers all output packets, and the resulting latency is unacceptable.

At Xen Summit 2012, we introduced a new VM replication solution: COLO
(COarse-grain LOck-stepping virtual machine). The presentation is available
at the following URL:
http://www.slideshare.net/xen_com_mgr/colo-coarsegrain-lockstepping-virtual-machines-for-nonstop-service

Here is a summary of the solution:
From the client's point of view, as long as the client observes identical
responses from the primary and secondary VMs, according to the service
semantics, the secondary VM is a valid replica of the primary VM and can
successfully take over when a hardware failure of the primary VM is
detected.
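The equivalence criterion above can be illustrated with a small C sketch. It is hypothetical: none of these names appear in this patchset, and it compares raw bytes, whereas a real comparison would follow the service semantics. It also assumes that a mismatch means the secondary has diverged and must be resynchronized by a checkpoint before the primary's response is released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one response captured from a VM (e.g. an outgoing
 * packet payload). Not a libxl or libxc type. */
struct vm_output {
    const unsigned char *data;
    size_t len;
};

/* Hypothetical actions: forward the primary's response to the client, or
 * take a checkpoint first because the secondary has diverged. */
enum colo_action { COLO_RELEASE_OUTPUT, COLO_DO_CHECKPOINT };

/* Byte-for-byte comparison; a real implementation would compare according
 * to the service semantics (for example, ignoring fields that may
 * legitimately differ between the two VMs). */
static bool outputs_match(const struct vm_output *primary,
                          const struct vm_output *secondary)
{
    assert(primary != NULL && secondary != NULL);
    if (primary->len != secondary->len)
        return false;
    return memcmp(primary->data, secondary->data, primary->len) == 0;
}

/* Identical responses keep the secondary a valid replica; otherwise the
 * secondary must be resynchronized. */
static enum colo_action colo_compare(const struct vm_output *primary,
                                     const struct vm_output *secondary)
{
    return outputs_match(primary, secondary) ? COLO_RELEASE_OUTPUT
                                             : COLO_DO_CHECKPOINT;
}
```

The point of the design is that identical outputs need no checkpoint at all, so checkpoints are driven by divergence rather than by every output packet.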

This patchset is an RFC and implements the COLO framework:
1. both the primary VM and the secondary VM are running
2. checkpoints are taken
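A minimal sketch of that cycle, assuming a simple four-phase model; the phase names and the transition function are illustrative and do not correspond to any function in this series:

```c
#include <assert.h>

/* Hypothetical phases of one COLO cycle; illustrative names only. */
enum colo_phase {
    COLO_RUNNING,      /* primary and secondary VMs both running */
    COLO_SUSPENDED,    /* both VMs paused for a checkpoint */
    COLO_TRANSFERRING, /* primary's dirty pages and device state sent */
    COLO_RESUMING,     /* secondary has loaded the state; resume both VMs */
};

/* Returns the phase that follows p; after resuming, both VMs run again
 * until the next checkpoint. */
static enum colo_phase colo_next_phase(enum colo_phase p)
{
    switch (p) {
    case COLO_RUNNING:      return COLO_SUSPENDED;
    case COLO_SUSPENDED:    return COLO_TRANSFERRING;
    case COLO_TRANSFERRING: return COLO_RESUMING;
    case COLO_RESUMING:     return COLO_RUNNING;
    }
    assert(0 && "unknown COLO phase");
    return COLO_RUNNING;
}
```

In this RFC the cycle is driven by a fixed timer for testing (patch 23 checkpoints every 20ms); the sketch only makes the ordering of the phases explicit.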

This patchset is based on remus-v15 and uses migration v1. Only HVM guests
are supported for now.

TODO list:
1. rebase onto remus-v17 or newer
2. support migration v2
3. NIC/disk replication
4. support PV guests

Patches 1-3: bugfixes
Patches 4-6: temporarily update Remus so that COLO can reuse the Remus device code
Patches 7-14: update some APIs that will be used by COLO
Patches 15-22: COLO-related code
Patch 23: hack patch, only for testing
Patches 24-25: bugfixes. We found these bugs before rebasing COLO onto the
          newest Xen, but we can no longer trigger them.
Patch 26: a patch for qemu-xen

Hong Tao (1):
  copy the correct page to memory

Wen Congyang (24):
  csum the correct page
  don't zero out ioreq page
  don't touch remus in remus_device
  rename remus device to checkpoint device
  adjust the indentation
  Refactor domain_suspend_callback_common()
  Update libxl__domain_resume() for colo
  Update libxl__domain_suspend_common_switch_qemu_logdirty() for colo
  Introduce a new internal API libxl__domain_unpause()
  Update libxl__domain_unpause() to support qemu-xen
  support to resume uncooperative HVM guests
  update datecopier to support sending data only
  introduce a new API to aync read data from fd
  Update libxl_save_msgs_gen.pl to support return data from xl to xc
  Allow slave sends data to master
  secondary vm suspend/resume/checkpoint code
  primary vm suspend/get_dirty_pfn/resume/checkpoint code
  xc_domain_save: flush cache before calling callbacks->postcopy() in
    colo mode
  COLO: xc related codes
  send store mfn and console mfn to xl before resuming secondary vm
  implement the cmdline for COLO
  HACK: do checkpoint per 20ms
  fix vm entry fail
  sync mmu before resuming secondary vm

 docs/man/xl.pod.1                                  |   9 +-
 tools/libxc/xc_domain.c                            |   9 +
 tools/libxc/xc_domain_restore.c                    |  74 +-
 tools/libxc/xc_domain_save.c                       |  66 +-
 tools/libxc/xc_resume.c                            |  20 +-
 tools/libxc/xenctrl.h                              |   2 +
 tools/libxc/xenguest.h                             |  40 +
 tools/libxl/Makefile                               |   3 +-
 tools/libxl/libxl.c                                | 102 ++-
 tools/libxl/libxl.h                                |   3 +-
 tools/libxl/libxl_aoutils.c                        |  81 +-
 ...xl_remus_device.c => libxl_checkpoint_device.c} | 266 ++++---
 tools/libxl/libxl_colo.h                           |  48 ++
 tools/libxl/libxl_colo_restore.c                   | 882 +++++++++++++++++++++
 tools/libxl/libxl_colo_save.c                      | 602 ++++++++++++++
 tools/libxl/libxl_create.c                         | 131 ++-
 tools/libxl/libxl_dom.c                            | 424 ++++++----
 tools/libxl/libxl_internal.h                       | 262 ++++--
 tools/libxl/libxl_netbuffer.c                      |  85 +-
 tools/libxl/libxl_nonetbuffer.c                    |  14 +-
 tools/libxl/libxl_qmp.c                            |  10 +
 tools/libxl/libxl_remus_disk_drbd.c                |  54 +-
 tools/libxl/libxl_save_callout.c                   |  37 +-
 tools/libxl/libxl_save_helper.c                    |  17 +
 tools/libxl/libxl_save_msgs_gen.pl                 |  74 +-
 tools/libxl/libxl_types.idl                        |  12 +-
 tools/libxl/xl_cmdimpl.c                           |  54 +-
 tools/libxl/xl_cmdtable.c                          |   3 +-
 xen/arch/x86/domctl.c                              |  15 +
 xen/arch/x86/hvm/save.c                            |   6 +
 xen/arch/x86/hvm/vmx/vmcs.c                        |   8 +
 xen/arch/x86/hvm/vmx/vmx.c                         |   8 +
 xen/include/asm-x86/hvm/hvm.h                      |   1 +
 xen/include/asm-x86/hvm/vmx/vmcs.h                 |   1 +
 xen/include/public/domctl.h                        |   1 +
 xen/include/xen/hvm/save.h                         |   2 +
 36 files changed, 2895 insertions(+), 531 deletions(-)
 rename tools/libxl/{libxl_remus_device.c => libxl_checkpoint_device.c} (47%)
 create mode 100644 tools/libxl/libxl_colo.h
 create mode 100644 tools/libxl/libxl_colo_restore.c
 create mode 100644 tools/libxl/libxl_colo_save.c

-- 
1.9.3

* [RFC Patch 01/25] copy the correct page to memory
  2014-07-18 11:38 [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
@ 2014-07-18 11:38 ` Wen Congyang
  2014-07-18 11:38 ` [RFC Patch 02/25] csum the correct page Wen Congyang
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Wen Congyang @ 2014-07-18 11:38 UTC (permalink / raw)
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Hong Tao, Yang Hongyang, Lai Jiangshan

From: Hong Tao <bobby.hong@huawei.com>

apply_batch() handles at most MAX_BATCH_SIZE pages at a time, and it
skips any bogus/unmapped/allocate-only/broken page, whose contents are
not present in pagebuf->pages. So when apply_batch() is called again,
the index of the first page of the new batch in pagebuf->pages is
curbatch - invalid_pages, where invalid_pages is the number of
bogus/unmapped/allocate-only/broken pages found so far.

In most cases invalid_pages is 0, which is why this bug was not caught.
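The indexing fix can be checked against a simplified model of the restore loop. This is a hypothetical sketch, not libxc code: MODEL_BATCH stands in for MAX_BATCH_SIZE, and a page type of 0 stands in for any skipped page type. It shows that the first payload slot of each batch must be curbatch minus the number of pages skipped in earlier batches.

```c
#include <assert.h>

/* Simplified model (not libxc code): types[i] is nonzero for a page whose
 * contents were sent and zero for a skipped (bogus/unmapped/allocate-only/
 * broken) page. The payload buffer holds only the sent pages, packed back
 * to back, so skipped entries have no slot. */
#define MODEL_BATCH 4   /* stands in for MAX_BATCH_SIZE */

/* Fills out[i] with the payload slot for entry i, or -1 if it is skipped. */
static void map_entries(const int *types, int n, int *out)
{
    int invalid = 0;                 /* skipped entries in earlier batches */

    assert(types != 0 && out != 0);
    for (int curbatch = 0; curbatch < n; curbatch += MODEL_BATCH) {
        int first_page = curbatch - invalid;
        int curpage = 0;             /* sent pages consumed in this batch */
        int local_invalid = 0;
        int end = curbatch + MODEL_BATCH < n ? curbatch + MODEL_BATCH : n;

        for (int i = curbatch; i < end; i++) {
            if (!types[i]) {
                out[i] = -1;
                local_invalid++;
                continue;
            }
            /* The old code effectively used curbatch + curpage here. */
            out[i] = first_page + curpage++;
        }
        invalid += local_invalid;
    }
}
```

With types {1, 0, 1, 1, 1, 1} and a batch size of 4, the second batch starts at payload slot 3. The old curbatch + curpage formula would have started at slot 4, copying the wrong page and then reading past the five payloads that were actually sent.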

Signed-off-by: Hong Tao <bobby.hong@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxc/xc_domain_restore.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index 071ab6a..f7ca4ad 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -1106,7 +1106,7 @@ static int pagebuf_get(xc_interface *xch, struct restore_ctx *ctx,
 static int apply_batch(xc_interface *xch, uint32_t dom, struct restore_ctx *ctx,
                        xen_pfn_t* region_mfn, unsigned long* pfn_type, int pae_extended_cr3,
                        struct xc_mmu* mmu,
-                       pagebuf_t* pagebuf, int curbatch)
+                       pagebuf_t* pagebuf, int curbatch, int *invalid_pages)
 {
     int i, j, curpage, nr_mfns;
     int k, scount;
@@ -1121,6 +1121,12 @@ static int apply_batch(xc_interface *xch, uint32_t dom, struct restore_ctx *ctx,
     struct domain_info_context *dinfo = &ctx->dinfo;
     int* pfn_err = NULL;
     int rc = -1;
+    int local_invalid_pages = 0;
+    /* We have handled curbatch pages before this batch, and there are
+     * *invalid_pages pages that are not in pagebuf->pages. So the first
+     * page for this batch is page (curbatch - *invalid_pages).
+     */
+    int first_page = curbatch - *invalid_pages;
 
     unsigned long mfn, pfn, pagetype;
 
@@ -1293,10 +1299,13 @@ static int apply_batch(xc_interface *xch, uint32_t dom, struct restore_ctx *ctx,
         pfn      = pagebuf->pfn_types[i + curbatch] & ~XEN_DOMCTL_PFINFO_LTAB_MASK;
         pagetype = pagebuf->pfn_types[i + curbatch] &  XEN_DOMCTL_PFINFO_LTAB_MASK;
 
-        if ( pagetype == XEN_DOMCTL_PFINFO_XTAB 
+        if ( pagetype == XEN_DOMCTL_PFINFO_XTAB
              || pagetype == XEN_DOMCTL_PFINFO_XALLOC)
+        {
+            local_invalid_pages++;
             /* a bogus/unmapped/allocate-only page: skip it */
             continue;
+        }
 
         if ( pagetype == XEN_DOMCTL_PFINFO_BROKEN )
         {
@@ -1306,6 +1315,8 @@ static int apply_batch(xc_interface *xch, uint32_t dom, struct restore_ctx *ctx,
                       "dom=%d, pfn=%lx\n", dom, pfn);
                 goto err_mapped;
             }
+
+            local_invalid_pages++;
             continue;
         }
 
@@ -1344,7 +1355,7 @@ static int apply_batch(xc_interface *xch, uint32_t dom, struct restore_ctx *ctx,
             }
         }
         else
-            memcpy(page, pagebuf->pages + (curpage + curbatch) * PAGE_SIZE,
+            memcpy(page, pagebuf->pages + (first_page + curpage) * PAGE_SIZE,
                    PAGE_SIZE);
 
         pagetype &= XEN_DOMCTL_PFINFO_LTABTYPE_MASK;
@@ -1418,6 +1429,7 @@ static int apply_batch(xc_interface *xch, uint32_t dom, struct restore_ctx *ctx,
     } /* end of 'batch' for loop */
 
     rc = nraces;
+    *invalid_pages += local_invalid_pages;
 
   err_mapped:
     munmap(region_base, j*PAGE_SIZE);
@@ -1621,7 +1633,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
  loadpages:
     for ( ; ; )
     {
-        int j, curbatch;
+        int j, curbatch, invalid_pages;
 
         xc_report_progress_step(xch, n, dinfo->p2m_size);
 
@@ -1665,11 +1677,13 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
 
         /* break pagebuf into batches */
         curbatch = 0;
+        invalid_pages = 0;
         while ( curbatch < j ) {
             int brc;
 
             brc = apply_batch(xch, dom, ctx, region_mfn, pfn_type,
-                              pae_extended_cr3, mmu, &pagebuf, curbatch);
+                              pae_extended_cr3, mmu, &pagebuf, curbatch,
+                              &invalid_pages);
             if ( brc < 0 )
                 goto out;
 

* [RFC Patch 02/25] csum the correct page
  2014-07-18 11:38 [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
  2014-07-18 11:38 ` [RFC Patch 01/25] copy the correct page to memory Wen Congyang
@ 2014-07-18 11:38 ` Wen Congyang
  2014-07-18 11:38 ` [RFC Patch 03/25] don't zero out ioreq page Wen Congyang
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Wen Congyang @ 2014-07-18 11:38 UTC (permalink / raw)
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

In verify mode, the guest memory for the current batch is mapped at
region_base, so the i-th page of the batch is at region_base + i * PAGE_SIZE.
We should therefore checksum the page at (region_base + i * PAGE_SIZE), not
at (region_base + (i + curbatch) * PAGE_SIZE).
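A small model of why the offset must be relative to the batch: region_base maps only the pages of the current batch. The helper names and the checksum below are hypothetical (this is not libxc's csum_page()), and MODEL_PAGE_SIZE assumes 4 KiB pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096   /* assumed page size for the model */

/* Hypothetical checksum for illustration; not libxc's csum_page(). */
static uint32_t model_csum(const unsigned char *page)
{
    uint32_t sum = 0;

    for (size_t k = 0; k < MODEL_PAGE_SIZE; k++)
        sum = sum * 31 + page[k];
    return sum;
}

/* region_base maps only the nr_mapped pages of the current batch, so entry
 * i of the batch lives at region_base + i * PAGE_SIZE whatever curbatch
 * is. Adding curbatch would point past the mapping for every batch after
 * the first. */
static const unsigned char *batch_page(const unsigned char *region_base,
                                       size_t nr_mapped, size_t i)
{
    assert(i < nr_mapped);
    return region_base + i * MODEL_PAGE_SIZE;
}
```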

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxc/xc_domain_restore.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index f7ca4ad..89b84fc 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -1405,7 +1405,7 @@ static int apply_batch(xc_interface *xch, uint32_t dom, struct restore_ctx *ctx,
 
                 DPRINTF("************** pfn=%lx type=%lx gotcs=%08lx "
                         "actualcs=%08lx\n", pfn, pagebuf->pfn_types[pfn],
-                        csum_page(region_base + (i + curbatch)*PAGE_SIZE),
+                        csum_page(region_base + i * PAGE_SIZE),
                         csum_page(buf));
 
                 for ( v = 0; v < 4; v++ )

* [RFC Patch 03/25] don't zero out ioreq page
  2014-07-18 11:38 [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
  2014-07-18 11:38 ` [RFC Patch 01/25] copy the correct page to memory Wen Congyang
  2014-07-18 11:38 ` [RFC Patch 02/25] csum the correct page Wen Congyang
@ 2014-07-18 11:38 ` Wen Congyang
  2014-07-18 11:38 ` [RFC Patch 04/25] don't touch remus in remus_device Wen Congyang
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Wen Congyang @ 2014-07-18 11:38 UTC (permalink / raw)
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Paul Durrant, Yang Hongyang, Lai Jiangshan

The ioreq page may contain pending I/O requests, which must be handled
after migration, so it must not be zeroed on restore.

TODO:
1. update QEMU to handle the pending I/O requests

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
---
 tools/libxc/xc_domain_restore.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index 89b84fc..32a3e72 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -2301,9 +2301,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
     }
 
     /* These comms pages need to be zeroed at the start of day */
-    if ( xc_clear_domain_page(xch, dom, tailbuf.u.hvm.magicpfns[0]) ||
-         xc_clear_domain_page(xch, dom, tailbuf.u.hvm.magicpfns[1]) ||
-         xc_clear_domain_page(xch, dom, tailbuf.u.hvm.magicpfns[2]) )
+    if ( xc_clear_domain_page(xch, dom, tailbuf.u.hvm.magicpfns[2]) )
     {
         PERROR("error zeroing magic pages");
         goto out;

* [RFC Patch 04/25] don't touch remus in remus_device
  2014-07-18 11:38 [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
                   ` (2 preceding siblings ...)
  2014-07-18 11:38 ` [RFC Patch 03/25] don't zero out ioreq page Wen Congyang
@ 2014-07-18 11:38 ` Wen Congyang
  2014-07-18 11:38 ` [RFC Patch 05/25] rename remus device to checkpoint device Wen Congyang
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Wen Congyang @ 2014-07-18 11:38 UTC (permalink / raw)
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

The remus device layer is an abstraction for taking checkpoints, and
COLO can also use it to take checkpoints. However, some code in the
remus device layer is still Remus-specific:
1. netbufscript is only used by Remus; move it to domain_suspend_state.
2. diskbuf is not a suitable name for generic checkpointing; use
   enabled_device_kinds instead.
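Turning the device kinds into bit flags is what allows several kinds to be enabled at once. A minimal sketch of the scheme, using hypothetical names that mirror LIBXL__REMUS_DEVICE_NIC/DISK rather than the real libxl internals:

```c
#include <assert.h>

/* Each kind is a distinct bit so several kinds can be enabled together.
 * Hypothetical names mirroring the libxl enum, not the libxl internals. */
enum device_kind {
    DEVICE_NIC  = 1 << 0,
    DEVICE_DISK = 1 << 1,
};

static_assert((DEVICE_NIC & DEVICE_DISK) == 0,
              "device kinds must be distinct bits");

/* Builds an enabled_device_kinds-style mask from the two Remus options. */
static int enabled_kinds(int netbuf, int diskbuf)
{
    int kinds = 0;

    if (diskbuf)
        kinds |= DEVICE_DISK;
    if (netbuf)
        kinds |= DEVICE_NIC;
    return kinds;
}

/* Each device list must be gated by its own bit: testing DEVICE_NIC
 * before listing disks would skip disks when only diskbuf is set. */
static int want_nics(int kinds)  { return (kinds & DEVICE_NIC) != 0; }
static int want_disks(int kinds) { return (kinds & DEVICE_DISK) != 0; }
```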

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl.c              |  9 ++++++---
 tools/libxl/libxl_internal.h     | 19 ++++++++++++-------
 tools/libxl/libxl_netbuffer.c    |  7 +++++--
 tools/libxl/libxl_remus_device.c |  4 ++--
 4 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index cd038d1..94087ff 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -832,10 +832,10 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
         }
 
         if (info->netbufscript) {
-            rds->netbufscript =
+            dss->netbufscript =
                 libxl__strdup(gc, info->netbufscript);
         } else {
-            rds->netbufscript =
+            dss->netbufscript =
                 GCSPRINTF("%s/remus-netbuf-setup",
                 libxl__xen_script_dir_path());
         }
@@ -844,9 +844,12 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     rds->ao = ao;
     rds->egc = egc;
     rds->domid = domid;
-    rds->diskbuf = info->diskbuf;
     rds->callback = libxl__remus_setup_done;
     rds->ops = remus_ops;
+    if (info->diskbuf)
+        rds->enabled_device_kinds |= LIBXL__REMUS_DEVICE_DISK;
+    if (info->netbuf)
+        rds->enabled_device_kinds |= LIBXL__REMUS_DEVICE_NIC;
 
     /* Point of no return */
     libxl__remus_devices_setup(egc, rds);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 441a51f..e327604 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2505,8 +2505,8 @@ typedef struct libxl__save_helper_state {
  */
 
 typedef enum libxl__remus_device_kind {
-    LIBXL__REMUS_DEVICE_NIC,
-    LIBXL__REMUS_DEVICE_DISK,
+    LIBXL__REMUS_DEVICE_NIC = (1 << 0),
+    LIBXL__REMUS_DEVICE_DISK= (1 << 1),
 } libxl__remus_device_kind;
 
 typedef struct libxl__remus_device libxl__remus_device;
@@ -2581,8 +2581,7 @@ struct libxl__remus_device_state {
     libxl__remus_callback *callback;
     /* the last ops must be NULL */
     const libxl__remus_device_subkind_ops **ops;
-    const char *netbufscript;
-    bool diskbuf;
+    int enabled_device_kinds;
 
     /* private */
     /* devices that have been set up */
@@ -2692,9 +2691,15 @@ struct libxl__domain_suspend_state {
     libxl__ev_xswatch guest_watch;
     libxl__ev_time guest_timeout;
     const char *dm_savefile;
-    libxl__remus_device_state rds;
-    libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
-    int interval; /* checkpoint interval (for Remus) */
+    /* for Remus */
+    struct {
+        libxl__remus_device_state rds;
+        const char *netbufscript;
+        /* used for Remus checkpoint */
+        libxl__ev_time checkpoint_timeout;
+        /* checkpoint interval */
+        int interval;
+    };
     libxl__save_helper_state shs;
     libxl__logdirty_switch logdirty;
     void (*callback_common_done)(libxl__egc*,
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 39025f4..2387563 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -28,6 +28,7 @@
 struct libxl__remus_netbuf_state {
     libxl__ao *ao;
     uint32_t domid;
+    const char *netbufscript;
 
     struct nl_sock *nlsock;
     struct nl_cache *qdisc_cache;
@@ -51,6 +52,7 @@ static int nic_init(libxl__remus_device_state *rds)
 {
     int rc, ret;
     libxl__remus_netbuf_state *ns;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
 
     STATE_AO_GC(rds->ao);
 
@@ -83,6 +85,7 @@ static int nic_init(libxl__remus_device_state *rds)
 
     ns->ao = rds->ao;
     ns->domid = rds->domid;
+    ns->netbufscript = dss->netbufscript;
 
     rc = 0;
 
@@ -258,7 +261,7 @@ static void setup_async_exec(libxl__remus_device *dev, char *op)
 
     /* Convenience aliases */
     libxl__remus_netbuf_state *ns = CTX->rns;
-    char *const script = libxl__strdup(gc, dev->rds->netbufscript);
+    char *const script = libxl__strdup(gc, ns->netbufscript);
     const uint32_t domid = ns->domid;
     const int dev_id = remus_nic->devid;
     const char *const vif = remus_nic->vif;
@@ -381,7 +384,7 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
 
     if (hotplug_error) {
         LOG(ERROR, "netbuf script %s setup failed for vif %s: %s",
-            dev->rds->netbufscript, vif, hotplug_error);
+            CTX->rns->netbufscript, vif, hotplug_error);
         rc = ERROR_FAIL;
         goto out;
     }
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
index c9df7f0..2acf3bb 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_remus_device.c
@@ -86,10 +86,10 @@ void libxl__remus_devices_setup(libxl__egc *egc, libxl__remus_device_state *rds)
     rds->num_nics = 0;
     rds->num_disks = 0;
 
-    if (rds->netbufscript)
+    if (rds->enabled_device_kinds & LIBXL__REMUS_DEVICE_NIC)
         rds->nics = libxl_device_nic_list(CTX, rds->domid, &rds->num_nics);
 
-    if (rds->diskbuf)
+    if (rds->enabled_device_kinds & LIBXL__REMUS_DEVICE_NIC)
         rds->disks = libxl_device_disk_list(CTX, rds->domid, &rds->num_disks);
 
     if (rds->num_nics == 0 && rds->num_disks == 0)

* [RFC Patch 05/25] rename remus device to checkpoint device
  2014-07-18 11:38 [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
                   ` (3 preceding siblings ...)
  2014-07-18 11:38 ` [RFC Patch 04/25] don't touch remus in remus_device Wen Congyang
@ 2014-07-18 11:38 ` Wen Congyang
  2014-07-18 11:38 ` [RFC Patch 06/25] adjust the indentation Wen Congyang
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Wen Congyang @ 2014-07-18 11:38 UTC (permalink / raw)
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

This patch was generated automatically with the following commands:
1. git mv tools/libxl/libxl_remus_device.c tools/libxl/libxl_checkpoint_device.c
2. perl -pi -e 's/libxl_remus_device/libxl_checkpoint_device/g' tools/libxl/Makefile
3. perl -pi -e 's/libxl__remus_device/libxl__checkpoint_device/g' tools/libxl/*.[ch]
4. perl -pi -e 's/remus_device_checkpoint_api/checkpoint_device_api/g' tools/libxl/*.[ch]
5. perl -pi -e 's/\brds\b/cds/g' tools/libxl/*.[ch]
6. perl -pi -e 's/REMUS_DEVICE/CHECKPOINT_DEVICE/g' tools/libxl/*.[ch] tools/libxl/*.idl
7. perl -pi -e 's/REMUS_DEVOPS/CHECKPOINT_DEVOPS/g' tools/libxl/*.[ch] tools/libxl/*.idl
8. perl -pi -e 's/\bremus\b/checkpoint/g' tools/libxl/libxl_checkpoint_device.[ch]

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/Makefile                               |   2 +-
 tools/libxl/libxl.c                                |  36 +--
 ...xl_remus_device.c => libxl_checkpoint_device.c} | 258 ++++++++++-----------
 tools/libxl/libxl_dom.c                            |  46 ++--
 tools/libxl/libxl_internal.h                       | 102 ++++----
 tools/libxl/libxl_netbuffer.c                      |  80 +++----
 tools/libxl/libxl_nonetbuffer.c                    |  14 +-
 tools/libxl/libxl_remus_disk_drbd.c                |  54 ++---
 tools/libxl/libxl_types.idl                        |   4 +-
 9 files changed, 298 insertions(+), 298 deletions(-)
 rename tools/libxl/{libxl_remus_device.c => libxl_checkpoint_device.c} (49%)

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index ba10ab7..a33497d 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -56,7 +56,7 @@ else
 LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
-LIBXL_OBJS-y += libxl_remus_device.o libxl_remus_disk_drbd.o
+LIBXL_OBJS-y += libxl_checkpoint_device.o libxl_remus_disk_drbd.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 94087ff..fc60bb1 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -782,13 +782,13 @@ out:
 }
 
 static void libxl__remus_setup_done(libxl__egc *egc,
-                                    libxl__remus_device_state *rds, int rc);
+                                    libxl__checkpoint_device_state *cds, int rc);
 static void libxl__remus_setup_failed(libxl__egc *egc,
-                                      libxl__remus_device_state *rds, int rc);
+                                      libxl__checkpoint_device_state *cds, int rc);
 static void remus_failover_cb(libxl__egc *egc,
                               libxl__domain_suspend_state *dss, int rc);
 
-static const libxl__remus_device_subkind_ops *remus_ops[] = {
+static const libxl__checkpoint_device_subkind_ops *remus_ops[] = {
     &remus_device_nic,
     &remus_device_drbd_disk,
     NULL,
@@ -823,7 +823,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     assert(info);
 
     /* Convenience aliases */
-    libxl__remus_device_state *const rds = &dss->rds;
+    libxl__checkpoint_device_state *const cds = &dss->cds;
 
     if (info->netbuf) {
         if (!libxl__netbuffer_enabled(gc)) {
@@ -841,18 +841,18 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
         }
     }
 
-    rds->ao = ao;
-    rds->egc = egc;
-    rds->domid = domid;
-    rds->callback = libxl__remus_setup_done;
-    rds->ops = remus_ops;
+    cds->ao = ao;
+    cds->egc = egc;
+    cds->domid = domid;
+    cds->callback = libxl__remus_setup_done;
+    cds->ops = remus_ops;
     if (info->diskbuf)
-        rds->enabled_device_kinds |= LIBXL__REMUS_DEVICE_DISK;
+        cds->enabled_device_kinds |= LIBXL__CHECKPOINT_DEVICE_DISK;
     if (info->netbuf)
-        rds->enabled_device_kinds |= LIBXL__REMUS_DEVICE_NIC;
+        cds->enabled_device_kinds |= LIBXL__CHECKPOINT_DEVICE_NIC;
 
     /* Point of no return */
-    libxl__remus_devices_setup(egc, rds);
+    libxl__checkpoint_devices_setup(egc, cds);
     return AO_INPROGRESS;
 
  out:
@@ -860,9 +860,9 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 }
 
 static void libxl__remus_setup_done(libxl__egc *egc,
-                                    libxl__remus_device_state *rds, int rc)
+                                    libxl__checkpoint_device_state *cds, int rc)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
     STATE_AO_GC(dss->ao);
 
     if (!rc) {
@@ -872,14 +872,14 @@ static void libxl__remus_setup_done(libxl__egc *egc,
 
     LOG(ERROR, "Remus: failed to setup device for guest with domid %u, rc %d",
         dss->domid, rc);
-    rds->callback = libxl__remus_setup_failed;
-    libxl__remus_devices_teardown(egc, rds);
+    cds->callback = libxl__remus_setup_failed;
+    libxl__checkpoint_devices_teardown(egc, cds);
 }
 
 static void libxl__remus_setup_failed(libxl__egc *egc,
-                                      libxl__remus_device_state *rds, int rc)
+                                      libxl__checkpoint_device_state *cds, int rc)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
     STATE_AO_GC(dss->ao);
 
     if (rc)
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_checkpoint_device.c
similarity index 49%
rename from tools/libxl/libxl_remus_device.c
rename to tools/libxl/libxl_checkpoint_device.c
index 2acf3bb..87ee412 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_checkpoint_device.c
@@ -19,13 +19,13 @@
 
 /*----- helper functions -----*/
 
-static int init_device_subkind(libxl__remus_device_state *rds)
+static int init_device_subkind(libxl__checkpoint_device_state *cds)
 {
     int rc;
-    const libxl__remus_device_subkind_ops **ops;
+    const libxl__checkpoint_device_subkind_ops **ops;
 
-    for (ops = rds->ops; *ops; ops++) {
-        rc = (*ops)->init(rds);
+    for (ops = cds->ops; *ops; ops++) {
+        rc = (*ops)->init(cds);
         if (rc) {
             goto out;
         }
@@ -37,17 +37,17 @@ out:
 
 }
 
-static void destroy_device_subkind(libxl__remus_device_state *rds)
+static void destroy_device_subkind(libxl__checkpoint_device_state *cds)
 {
-    const libxl__remus_device_subkind_ops **ops;
+    const libxl__checkpoint_device_subkind_ops **ops;
 
-    for (ops = rds->ops; *ops; ops++)
-        (*ops)->destroy(rds);
+    for (ops = cds->ops; *ops; ops++)
+        (*ops)->destroy(cds);
 }
 
-static bool all_devices_handled(libxl__remus_device_state *rds)
+static bool all_devices_handled(libxl__checkpoint_device_state *cds)
 {
-    return rds->num_devices == (rds->num_nics + rds->num_disks);
+    return cds->num_devices == (cds->num_nics + cds->num_disks);
 }
 
 /*----- setup() and teardown() -----*/
@@ -55,85 +55,85 @@ static bool all_devices_handled(libxl__remus_device_state *rds)
 /* callbacks */
 
 static void device_match_cb(libxl__egc *egc,
-                            libxl__remus_device *dev,
+                            libxl__checkpoint_device *dev,
                             int rc);
 static void device_setup_cb(libxl__egc *egc,
-                            libxl__remus_device *dev,
+                            libxl__checkpoint_device *dev,
                             int rc);
 static void device_teardown_cb(libxl__egc *egc,
-                               libxl__remus_device *dev,
+                               libxl__checkpoint_device *dev,
                                int rc);
 
-/* remus device setup and teardown */
+/* checkpoint device setup and teardown */
 
-static void libxl__remus_device_init(libxl__egc *egc,
-                                     libxl__remus_device_state *rds,
-                                     libxl__remus_device_kind kind,
+static void libxl__checkpoint_device_init(libxl__egc *egc,
+                                     libxl__checkpoint_device_state *cds,
+                                     libxl__checkpoint_device_kind kind,
                                      void *libxl_dev);
-void libxl__remus_devices_setup(libxl__egc *egc, libxl__remus_device_state *rds)
+void libxl__checkpoint_devices_setup(libxl__egc *egc, libxl__checkpoint_device_state *cds)
 {
     int i;
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
-    if (!rds->ops[0])
+    if (!cds->ops[0])
         goto out;
 
-    rds->saved_rc = init_device_subkind(rds);
-    if (rds->saved_rc)
+    cds->saved_rc = init_device_subkind(cds);
+    if (cds->saved_rc)
         goto out;
 
-    rds->num_devices = 0;
-    rds->num_nics = 0;
-    rds->num_disks = 0;
+    cds->num_devices = 0;
+    cds->num_nics = 0;
+    cds->num_disks = 0;
 
-    if (rds->enabled_device_kinds & LIBXL__REMUS_DEVICE_NIC)
-        rds->nics = libxl_device_nic_list(CTX, rds->domid, &rds->num_nics);
+    if (cds->enabled_device_kinds & LIBXL__CHECKPOINT_DEVICE_NIC)
+        cds->nics = libxl_device_nic_list(CTX, cds->domid, &cds->num_nics);
 
-    if (rds->enabled_device_kinds & LIBXL__REMUS_DEVICE_NIC)
-        rds->disks = libxl_device_disk_list(CTX, rds->domid, &rds->num_disks);
+    if (cds->enabled_device_kinds & LIBXL__CHECKPOINT_DEVICE_DISK)
+        cds->disks = libxl_device_disk_list(CTX, cds->domid, &cds->num_disks);
 
-    if (rds->num_nics == 0 && rds->num_disks == 0)
+    if (cds->num_nics == 0 && cds->num_disks == 0)
         goto out;
 
-    GCNEW_ARRAY(rds->dev, rds->num_nics + rds->num_disks);
+    GCNEW_ARRAY(cds->dev, cds->num_nics + cds->num_disks);
 
-    for (i = 0; i < rds->num_nics; i++) {
-        libxl__remus_device_init(egc, rds,
-                                 LIBXL__REMUS_DEVICE_NIC, &rds->nics[i]);
+    for (i = 0; i < cds->num_nics; i++) {
+        libxl__checkpoint_device_init(egc, cds,
+                                 LIBXL__CHECKPOINT_DEVICE_NIC, &cds->nics[i]);
     }
 
-    for (i = 0; i < rds->num_disks; i++) {
-        libxl__remus_device_init(egc, rds,
-                                 LIBXL__REMUS_DEVICE_DISK, &rds->disks[i]);
+    for (i = 0; i < cds->num_disks; i++) {
+        libxl__checkpoint_device_init(egc, cds,
+                                 LIBXL__CHECKPOINT_DEVICE_DISK, &cds->disks[i]);
     }
 
     return;
 
 out:
-    rds->callback(egc, rds, rds->saved_rc);
+    cds->callback(egc, cds, cds->saved_rc);
     return;
 }
 
-static void libxl__remus_device_init(libxl__egc *egc,
-                                     libxl__remus_device_state *rds,
-                                     libxl__remus_device_kind kind,
+static void libxl__checkpoint_device_init(libxl__egc *egc,
+                                     libxl__checkpoint_device_state *cds,
+                                     libxl__checkpoint_device_kind kind,
                                      void *libxl_dev)
 {
-    libxl__remus_device *dev = NULL;
+    libxl__checkpoint_device *dev = NULL;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
     GCNEW(dev);
     dev->backend_dev = libxl_dev;
     dev->kind = kind;
-    dev->rds = rds;
+    dev->cds = cds;
 
     libxl__async_exec_init(&dev->aes);
     libxl__ev_child_init(&dev->child);
 
     /* match the ops begin */
     dev->ops_index = 0;
-    dev->ops = rds->ops[dev->ops_index];
-    for (; dev->ops; dev->ops = rds->ops[++dev->ops_index]) {
+    dev->ops = cds->ops[dev->ops_index];
+    for (; dev->ops; dev->ops = cds->ops[++dev->ops_index]) {
         if (dev->ops->kind == dev->kind) {
             if (dev->ops->match) {
                 dev->callback = device_match_cb;
@@ -152,41 +152,41 @@ static void libxl__remus_device_init(libxl__egc *egc,
     }
 
     if (!dev->ops) {
-        rds->num_devices++;
-        rds->saved_rc = ERROR_REMUS_DEVICE_NOT_SUPPORTED;
-        if (all_devices_handled(rds))
-            rds->callback(egc, rds, rds->saved_rc);
+        cds->num_devices++;
+        cds->saved_rc = ERROR_CHECKPOINT_DEVICE_NOT_SUPPORTED;
+        if (all_devices_handled(cds))
+            cds->callback(egc, cds, cds->saved_rc);
     }
 }
 
 static void device_match_cb(libxl__egc *egc,
-                            libxl__remus_device *dev,
+                            libxl__checkpoint_device *dev,
                             int rc)
 {
-    libxl__remus_device_state *const rds = dev->rds;
+    libxl__checkpoint_device_state *const cds = dev->cds;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
-    if (rds->saved_rc) {
+    if (cds->saved_rc) {
         /* there's already an error happened, we do not need to continue */
-        rds->num_devices++;
-        if (all_devices_handled(rds))
-            rds->callback(egc, rds, rds->saved_rc);
+        cds->num_devices++;
+        if (all_devices_handled(cds))
+            cds->callback(egc, cds, cds->saved_rc);
         return;
     }
 
     if (rc) {
         /* the ops does not match, try next ops */
-        dev->ops = rds->ops[++dev->ops_index];
-        if (!dev->ops || rc != ERROR_REMUS_DEVOPS_NOT_MATCH) {
+        dev->ops = cds->ops[++dev->ops_index];
+        if (!dev->ops || rc != ERROR_CHECKPOINT_DEVOPS_NOT_MATCH) {
             /* the device can not be matched */
-            rds->num_devices++;
-            rds->saved_rc = ERROR_REMUS_DEVICE_NOT_SUPPORTED;
-            if (all_devices_handled(rds))
-                rds->callback(egc, rds, rds->saved_rc);
+            cds->num_devices++;
+            cds->saved_rc = ERROR_CHECKPOINT_DEVICE_NOT_SUPPORTED;
+            if (all_devices_handled(cds))
+                cds->callback(egc, cds, cds->saved_rc);
             return;
         }
-        for ( ; dev->ops; dev->ops = rds->ops[++dev->ops_index]) {
+        for ( ; dev->ops; dev->ops = cds->ops[++dev->ops_index]) {
             if (dev->ops->kind == dev->kind) {
                 /*
                  * we have entered match process, that means this *kind* of
@@ -205,15 +205,15 @@ static void device_match_cb(libxl__egc *egc,
 }
 
 static void device_setup_cb(libxl__egc *egc,
-                            libxl__remus_device *dev,
+                            libxl__checkpoint_device *dev,
                             int rc)
 {
     /* Convenience aliases */
-    libxl__remus_device_state *const rds = dev->rds;
+    libxl__checkpoint_device_state *const cds = dev->cds;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
-    rds->num_devices++;
+    cds->num_devices++;
     /*
      * the netbuf script was designed as below:
      * 1. when setup failed, the script won't teardown the device itself.
@@ -222,36 +222,36 @@ static void device_setup_cb(libxl__egc *egc,
      * we add devices that have been set up to the array no matter
      * the setup process succeed or failed because we need to ensure
      * the device been teardown while setup failed. If any of the
-     * device setup failed, we will quit remus, but before we exit,
+     * device setup failed, we will quit checkpoint, but before we exit,
      * we will teardown the devices that have been added to **dev
      */
-    rds->dev[rds->num_set_up++] = dev;
+    cds->dev[cds->num_set_up++] = dev;
     /* we preserve the first error that happened */
-    if (rc && !rds->saved_rc)
-        rds->saved_rc = rc;
+    if (rc && !cds->saved_rc)
+        cds->saved_rc = rc;
 
-    if (all_devices_handled(rds))
-        rds->callback(egc, rds, rds->saved_rc);
+    if (all_devices_handled(cds))
+        cds->callback(egc, cds, cds->saved_rc);
 }
 
-void libxl__remus_devices_teardown(libxl__egc *egc, libxl__remus_device_state *rds)
+void libxl__checkpoint_devices_teardown(libxl__egc *egc, libxl__checkpoint_device_state *cds)
 {
     int i, num_set_up;
-    libxl__remus_device *dev;
+    libxl__checkpoint_device *dev;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
-    rds->saved_rc = 0;
+    cds->saved_rc = 0;
 
-    if (rds->num_set_up == 0) {
-        destroy_device_subkind(rds);
+    if (cds->num_set_up == 0) {
+        destroy_device_subkind(cds);
         goto out;
     }
 
-    /* we will decrease rds->num_set_up in the teardown callback */
-    num_set_up = rds->num_set_up;
+    /* we will decrease cds->num_set_up in the teardown callback */
+    num_set_up = cds->num_set_up;
     for (i = 0; i < num_set_up; i++) {
-        dev = rds->dev[i];
+        dev = cds->dev[i];
         dev->callback = device_teardown_cb;
         dev->ops->teardown(dev);
     }
@@ -259,43 +259,43 @@ void libxl__remus_devices_teardown(libxl__egc *egc, libxl__remus_device_state *r
     return;
 
 out:
-    rds->callback(egc, rds, rds->saved_rc);
+    cds->callback(egc, cds, cds->saved_rc);
     return;
 }
 
 static void device_teardown_cb(libxl__egc *egc,
-                               libxl__remus_device *dev,
+                               libxl__checkpoint_device *dev,
                                int rc)
 {
     int i;
-    libxl__remus_device_state *const rds = dev->rds;
+    libxl__checkpoint_device_state *const cds = dev->cds;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
     /* we preserve the first error that happened */
-    if (rc && !rds->saved_rc)
-        rds->saved_rc = rc;
+    if (rc && !cds->saved_rc)
+        cds->saved_rc = rc;
 
     /* ignore teardown errors to teardown as many devs as possible*/
-    rds->num_set_up--;
+    cds->num_set_up--;
 
-    if (rds->num_set_up == 0) {
+    if (cds->num_set_up == 0) {
         /* clean nic */
-        for (i = 0; i < rds->num_nics; i++)
-            libxl_device_nic_dispose(&rds->nics[i]);
-        free(rds->nics);
-        rds->nics = NULL;
-        rds->num_nics = 0;
+        for (i = 0; i < cds->num_nics; i++)
+            libxl_device_nic_dispose(&cds->nics[i]);
+        free(cds->nics);
+        cds->nics = NULL;
+        cds->num_nics = 0;
 
         /* clean disk */
-        for (i = 0; i < rds->num_disks; i++)
-            libxl_device_disk_dispose(&rds->disks[i]);
-        free(rds->disks);
-        rds->disks = NULL;
-        rds->num_disks = 0;
-
-        destroy_device_subkind(rds);
-        rds->callback(egc, rds, rds->saved_rc);
+        for (i = 0; i < cds->num_disks; i++)
+            libxl_device_disk_dispose(&cds->disks[i]);
+        free(cds->disks);
+        cds->disks = NULL;
+        cds->num_disks = 0;
+
+        destroy_device_subkind(cds);
+        cds->callback(egc, cds, cds->saved_rc);
     }
 }
 
@@ -304,64 +304,64 @@ static void device_teardown_cb(libxl__egc *egc,
 /* callbacks */
 
 static void device_checkpoint_cb(libxl__egc *egc,
-                                 libxl__remus_device *dev,
+                                 libxl__checkpoint_device *dev,
                                  int rc);
 
 /* API implementations */
 
-#define define_remus_device_checkpoint_api(api)                             \
-void libxl__remus_devices_##api(libxl__egc *egc,                            \
-                                libxl__remus_device_state *rds)             \
+#define define_checkpoint_device_api(api)                                   \
+void libxl__checkpoint_devices_##api(libxl__egc *egc,                       \
+                                     libxl__checkpoint_device_state *cds)   \
 {                                                                           \
     int i;                                                                  \
-    libxl__remus_device *dev;                                               \
+    libxl__checkpoint_device *dev;                                          \
                                                                             \
-    STATE_AO_GC(rds->ao);                                                   \
+    STATE_AO_GC(cds->ao);                                                   \
                                                                             \
-    rds->num_devices = 0;                                                   \
-    rds->saved_rc = 0;                                                      \
+    cds->num_devices = 0;                                                   \
+    cds->saved_rc = 0;                                                      \
                                                                             \
-    if (rds->num_set_up == 0)                                               \
+    if (cds->num_set_up == 0)                                               \
         goto out;                                                           \
                                                                             \
-    for (i = 0; i < rds->num_set_up; i++) {                                 \
-        dev = rds->dev[i];                                                  \
+    for (i = 0; i < cds->num_set_up; i++) {                                 \
+        dev = cds->dev[i];                                                  \
         dev->callback = device_checkpoint_cb;                               \
         if (dev->ops->api) {                                                \
             dev->ops->api(dev);                                             \
         } else {                                                            \
-            rds->num_devices++;                                             \
-            if (rds->num_devices == rds->num_set_up)                        \
-                rds->callback(egc, rds, rds->saved_rc);                     \
+            cds->num_devices++;                                             \
+            if (cds->num_devices == cds->num_set_up)                        \
+                cds->callback(egc, cds, cds->saved_rc);                     \
         }                                                                   \
     }                                                                       \
                                                                             \
     return;                                                                 \
                                                                             \
 out:                                                                        \
-    rds->callback(egc, rds, rds->saved_rc);                                 \
+    cds->callback(egc, cds, cds->saved_rc);                                 \
 }
 
-define_remus_device_checkpoint_api(postsuspend);
+define_checkpoint_device_api(postsuspend);
 
-define_remus_device_checkpoint_api(preresume);
+define_checkpoint_device_api(preresume);
 
-define_remus_device_checkpoint_api(commit);
+define_checkpoint_device_api(commit);
 
 static void device_checkpoint_cb(libxl__egc *egc,
-                                 libxl__remus_device *dev,
+                                 libxl__checkpoint_device *dev,
                                  int rc)
 {
     /* Convenience aliases */
-    libxl__remus_device_state *const rds = dev->rds;
+    libxl__checkpoint_device_state *const cds = dev->cds;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
-    rds->num_devices++;
+    cds->num_devices++;
 
     if (rc)
-        rds->saved_rc = ERROR_FAIL;
+        cds->saved_rc = ERROR_FAIL;
 
-    if (rds->num_devices == rds->num_set_up)
-        rds->callback(egc, rds, rds->saved_rc);
+    if (cds->num_devices == cds->num_set_up)
+        cds->callback(egc, cds, cds->saved_rc);
 }
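The postsuspend/preresume/commit wrappers generated by define_checkpoint_device_api() follow a fan-out/fan-in pattern: every device that was set up is asked to run the step, each completion increments num_devices and records any error in saved_rc, and the state-level callback fires exactly once, when num_devices reaches num_set_up. Devices whose ops lack the step are counted as done immediately. Below is a minimal synchronous sketch of that counting logic; the names (cp_device, cp_state, devices_run, run_round) are hypothetical stand-ins, not libxl API, and the ops here return their result directly instead of calling dev->callback later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for libxl__checkpoint_device{,_state}. */
struct cp_state;

struct cp_device {
    struct cp_state *cds;
    /* Per-device step; NULL means the device does not implement it. */
    int (*op)(struct cp_device *dev);
};

struct cp_state {
    struct cp_device **dev;
    int num_set_up;      /* devices that were set up */
    int num_devices;     /* devices that have reported back this round */
    int saved_rc;        /* aggregate result of this round */
    int callbacks_fired; /* how many times the final callback ran */
    int final_rc;        /* rc passed to the final callback */
};

/* State-level callback: must run exactly once per round. */
static void state_done(struct cp_state *cds, int rc)
{
    cds->callbacks_fired++;
    cds->final_rc = rc;
}

/* Per-device completion, modelled on device_checkpoint_cb(). */
static void device_done(struct cp_device *dev, int rc)
{
    struct cp_state *cds = dev->cds;

    cds->num_devices++;
    if (rc)
        cds->saved_rc = -1; /* stands in for ERROR_FAIL */
    if (cds->num_devices == cds->num_set_up)
        state_done(cds, cds->saved_rc);
}

/* Fan-out, modelled on the body of define_checkpoint_device_api(). */
static void devices_run(struct cp_state *cds)
{
    cds->num_devices = 0;
    cds->saved_rc = 0;

    if (cds->num_set_up == 0) {
        state_done(cds, cds->saved_rc);
        return;
    }

    for (int i = 0; i < cds->num_set_up; i++) {
        struct cp_device *dev = cds->dev[i];
        if (dev->op)
            device_done(dev, dev->op(dev)); /* synchronous in this sketch */
        else
            device_done(dev, 0);            /* no op: count it as done */
    }
}

static int op_ok(struct cp_device *dev) { (void)dev; return 0; }
static int op_fail(struct cp_device *dev) { (void)dev; return 1; }

/* Runs one round over n ops (n <= 8); stores the number of final
 * callbacks in *fired and returns the aggregate rc. */
static int run_round(int (*ops[])(struct cp_device *), int n, int *fired)
{
    struct cp_device devs[8];
    struct cp_device *ptrs[8];
    struct cp_state cds = { .dev = ptrs, .num_set_up = n };

    assert(n >= 0 && n <= 8);
    for (int i = 0; i < n; i++) {
        devs[i].cds = &cds;
        devs[i].op = ops[i];
        ptrs[i] = &devs[i];
    }
    devices_run(&cds);
    *fired = cds.callbacks_fired;
    return cds.final_rc;
}
```

Counting completions rather than loop iterations is what lets the same code work whether a device's op finishes immediately or from a later event callback.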
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 6eabd4c..cf79b74 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1462,10 +1462,10 @@ static void domain_suspend_callback_common_done(libxl__egc *egc,
 static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
                                 libxl__domain_suspend_state *dss, int ok);
 static void remus_device_postsuspend_cb(libxl__egc *egc,
-                                        libxl__remus_device_state *rds,
+                                        libxl__checkpoint_device_state *cds,
                                         int rc);
 static void remus_device_preresume_cb(libxl__egc *egc,
-                                      libxl__remus_device_state *rds,
+                                      libxl__checkpoint_device_state *cds,
                                       int rc);
 
 static void libxl__remus_domain_suspend_callback(void *data)
@@ -1484,9 +1484,9 @@ static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
     if (!ok)
         goto out;
 
-    libxl__remus_device_state *const rds = &dss->rds;
-    rds->callback = remus_device_postsuspend_cb;
-    libxl__remus_devices_postsuspend(egc, rds);
+    libxl__checkpoint_device_state *const cds = &dss->cds;
+    cds->callback = remus_device_postsuspend_cb;
+    libxl__checkpoint_devices_postsuspend(egc, cds);
     return;
 
 out:
@@ -1494,11 +1494,11 @@ out:
 }
 
 static void remus_device_postsuspend_cb(libxl__egc *egc,
-                                        libxl__remus_device_state *rds,
+                                        libxl__checkpoint_device_state *cds,
                                         int rc)
 {
     int ok = 0;
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
 
     if (!rc)
         ok = 1;
@@ -1512,17 +1512,17 @@ static void libxl__remus_domain_resume_callback(void *data)
     libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
     STATE_AO_GC(dss->ao);
 
-    libxl__remus_device_state *const rds = &dss->rds;
-    rds->callback = remus_device_preresume_cb;
-    libxl__remus_devices_preresume(egc, rds);
+    libxl__checkpoint_device_state *const cds = &dss->cds;
+    cds->callback = remus_device_preresume_cb;
+    libxl__checkpoint_devices_preresume(egc, cds);
 }
 
 static void remus_device_preresume_cb(libxl__egc *egc,
-                                      libxl__remus_device_state *rds,
+                                      libxl__checkpoint_device_state *cds,
                                       int rc)
 {
     int ok = 0;
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
     STATE_AO_GC(dss->ao);
 
     if (!rc) {
@@ -1538,7 +1538,7 @@ static void remus_device_preresume_cb(libxl__egc *egc,
 static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc);
 static void remus_device_commit_cb(libxl__egc *egc,
-                                   libxl__remus_device_state *rds,
+                                   libxl__checkpoint_device_state *cds,
                                    int rc);
 static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
                                   const struct timeval *requested_abs);
@@ -1562,7 +1562,7 @@ static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc)
 {
     /* Convenience aliases */
-    libxl__remus_device_state *const rds = &dss->rds;
+    libxl__checkpoint_device_state *const cds = &dss->cds;
 
     STATE_AO_GC(dss->ao);
 
@@ -1571,8 +1571,8 @@ static void remus_checkpoint_dm_saved(libxl__egc *egc,
         goto out;
     }
 
-    rds->callback = remus_device_commit_cb;
-    libxl__remus_devices_commit(egc, rds);
+    cds->callback = remus_device_commit_cb;
+    libxl__checkpoint_devices_commit(egc, cds);
 
     return;
 
@@ -1581,10 +1581,10 @@ out:
 }
 
 static void remus_device_commit_cb(libxl__egc *egc,
-                                   libxl__remus_device_state *rds,
+                                   libxl__checkpoint_device_state *cds,
                                    int rc)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
 
     STATE_AO_GC(dss->ao);
 
@@ -1833,7 +1833,7 @@ static void save_device_model_datacopier_done(libxl__egc *egc,
 }
 
 static void libxl__remus_teardown_done(libxl__egc *egc,
-                                       libxl__remus_device_state *rds,
+                                       libxl__checkpoint_device_state *cds,
                                        int rc);
 static void domain_suspend_done(libxl__egc *egc,
                         libxl__domain_suspend_state *dss, int rc)
@@ -1858,8 +1858,8 @@ static void domain_suspend_done(libxl__egc *egc,
          */
         LOGE(WARN, "Domain suspend terminated with rc %d, \
              teardown Remus devices...", rc);
-        dss->rds.callback = libxl__remus_teardown_done;
-        libxl__remus_devices_teardown(egc, &dss->rds);
+        dss->cds.callback = libxl__remus_teardown_done;
+        libxl__checkpoint_devices_teardown(egc, &dss->cds);
         return;
     }
 
@@ -1867,10 +1867,10 @@ static void domain_suspend_done(libxl__egc *egc,
 }
 
 static void libxl__remus_teardown_done(libxl__egc *egc,
-                                       libxl__remus_device_state *rds,
+                                       libxl__checkpoint_device_state *cds,
                                        int rc)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
     STATE_AO_GC(dss->ao);
 
     if (rc)
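The libxl_dom.c callbacks above receive only a libxl__checkpoint_device_state pointer and recover the enclosing libxl__domain_suspend_state with CONTAINER_OF(cds, *dss, cds). A portable sketch of the same pointer arithmetic follows, assuming a hypothetical MY_CONTAINER_OF macro and simplified stand-in structs; libxl's own CONTAINER_OF takes an expression of the outer type (such as *dss) rather than a type name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical portable analogue of container_of: step back from a
 * pointer to a member to the start of the enclosing struct. */
#define MY_CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for the libxl structures. */
struct dev_state {
    int saved_rc;
};

struct suspend_state {
    int domid;
    struct dev_state cds;   /* embedded, like dss->cds; not at offset 0 */
};

/* A callback that only receives the embedded member, the way
 * remus_device_commit_cb() only receives cds. */
static int domid_from_cb(struct dev_state *cds)
{
    assert(cds != NULL);
    struct suspend_state *dss = MY_CONTAINER_OF(cds, struct suspend_state, cds);
    return dss->domid;
}
```

Because the state is embedded by value rather than pointed to, no back-pointer field is needed; the offset of the member is enough to find the owner.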
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index e327604..f18503e 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2478,13 +2478,13 @@ typedef struct libxl__save_helper_state {
 /*----- remus device related state structure -----*/
 /* remus device is an abstract layer of remus devices(nic, disk,
  * etc).It provides the following APIs for libxl:
- *   >libxl__remus_devices_setup
+ *   >libxl__checkpoint_devices_setup
  *     setup remus devices, like attach qdisc, enable disk buffering, etc
- *   >libxl__remus_devices_teardown
+ *   >libxl__checkpoint_devices_teardown
  *     teardown devices
- *   >libxl__remus_devices_postsuspend
- *   >libxl__remus_devices_preresume
- *   >libxl__remus_devices_commit
+ *   >libxl__checkpoint_devices_postsuspend
+ *   >libxl__checkpoint_devices_preresume
+ *   >libxl__checkpoint_devices_commit
  *     above three are for checkpoint.
  * through remus device layer, the remus execution flow will be like
  * this:
@@ -2493,7 +2493,7 @@ typedef struct libxl__save_helper_state {
  *                     ...
  *                      |-> remus device teardown, failover or abort
  * the remus device layer provides an interface
- *   libxl__remus_device_subkind_ops
+ *   libxl__checkpoint_device_subkind_ops
  * which a remus device must implement. the whole remus structure:
  *                           |remus|
  *                              |
@@ -2501,21 +2501,21 @@ typedef struct libxl__save_helper_state {
  *                              |
  *               |nic| |drbd disks| |qemu disks| ...
  * a device(nic, drbd disks, qemu disks, etc) must implement
- * libxl__remus_device_subkind_ops to support remus.
+ * libxl__checkpoint_device_subkind_ops to support remus.
  */
 
-typedef enum libxl__remus_device_kind {
-    LIBXL__REMUS_DEVICE_NIC = (1 << 0),
-    LIBXL__REMUS_DEVICE_DISK= (1 << 1),
-} libxl__remus_device_kind;
+typedef enum libxl__checkpoint_device_kind {
+    LIBXL__CHECKPOINT_DEVICE_NIC = (1 << 0),
+    LIBXL__CHECKPOINT_DEVICE_DISK = (1 << 1),
+} libxl__checkpoint_device_kind;
 
-typedef struct libxl__remus_device libxl__remus_device;
-typedef struct libxl__remus_device_state libxl__remus_device_state;
-typedef struct libxl__remus_device_subkind_ops libxl__remus_device_subkind_ops;
+typedef struct libxl__checkpoint_device libxl__checkpoint_device;
+typedef struct libxl__checkpoint_device_state libxl__checkpoint_device_state;
+typedef struct libxl__checkpoint_device_subkind_ops libxl__checkpoint_device_subkind_ops;
 
-struct libxl__remus_device_subkind_ops {
+struct libxl__checkpoint_device_subkind_ops {
     /* the device kind this ops belongs to... */
-    libxl__remus_device_kind kind;
+    libxl__checkpoint_device_kind kind;
 
     /*
      * init() and destroy() APIs are produced by a device subkind and
@@ -2524,8 +2524,8 @@ struct libxl__remus_device_subkind_ops {
      * the APIs init/destroy device subkind's private data which stored
      * in CTX. must implement.
      */
-    int (*init)(libxl__remus_device_state *rds);
-    void (*destroy)(libxl__remus_device_state *rds);
+    int (*init)(libxl__checkpoint_device_state *cds);
+    void (*destroy)(libxl__checkpoint_device_state *cds);
 
     /*
      * checkpoint callbacks, these are async ops, call dev->callback
@@ -2535,9 +2535,9 @@ struct libxl__remus_device_subkind_ops {
      * These callbacks can be implemented synchronously, call
      * dev->callback at last directly.
      */
-    void (*postsuspend)(libxl__remus_device *dev);
-    void (*preresume)(libxl__remus_device *dev);
-    void (*commit)(libxl__remus_device *dev);
+    void (*postsuspend)(libxl__checkpoint_device *dev);
+    void (*preresume)(libxl__checkpoint_device *dev);
+    void (*commit)(libxl__checkpoint_device *dev);
 
     /*
      * This API determines whether the subkind matchs the specific device. In
@@ -2553,7 +2553,7 @@ struct libxl__remus_device_subkind_ops {
      * It's an async op and must be implemented asynchronously,
      * call dev->callback when done.
      */
-    void (*match)(libxl__remus_device *dev);
+    void (*match)(libxl__checkpoint_device *dev);
 
     /*
      * setup() and teardown() are refer to the actual remus device,
@@ -2562,31 +2562,31 @@ struct libxl__remus_device_subkind_ops {
      * These callbacks can be implemented synchronously, call
      * dev->callback at last directly.
      */
-    void (*setup)(libxl__remus_device *dev);
-    void (*teardown)(libxl__remus_device *dev);
+    void (*setup)(libxl__checkpoint_device *dev);
+    void (*teardown)(libxl__checkpoint_device *dev);
 };
 
 typedef void libxl__remus_callback(libxl__egc *,
-                                   libxl__remus_device_state *, int rc);
+                                   libxl__checkpoint_device_state *, int rc);
 
 /*
  * This structure is for remus device layer, it records remus devices
  * that have been set up.
  */
-struct libxl__remus_device_state {
-    /* must set by caller of libxl__remus_device_(setup|teardown) */
+struct libxl__checkpoint_device_state {
+    /* must be set by caller of libxl__checkpoint_devices_(setup|teardown) */
     libxl__ao *ao;
     libxl__egc *egc;
     uint32_t domid;
     libxl__remus_callback *callback;
     /* the last ops must be NULL */
-    const libxl__remus_device_subkind_ops **ops;
+    const libxl__checkpoint_device_subkind_ops **ops;
     int enabled_device_kinds;
 
     /* private */
     /* devices that have been set up */
     int saved_rc;
-    libxl__remus_device **dev;
+    libxl__checkpoint_device **dev;
 
     libxl_device_nic *nics;
     int num_nics;
@@ -2599,19 +2599,19 @@ struct libxl__remus_device_state {
     int num_set_up;
 };
 
-typedef void libxl__remus_device_callback(libxl__egc *,
-                                          libxl__remus_device *,
+typedef void libxl__checkpoint_device_callback(libxl__egc *,
+                                          libxl__checkpoint_device *,
                                           int rc);
 /*
  * This structure is init and setup by remus device abstruct layer,
  * and pass to remus device ops
  */
-struct libxl__remus_device {
+struct libxl__checkpoint_device {
     /*----- shared between abstract and concrete layers -----*/
     /* set by remus device abstruct layer */
     /* libxl__device_* which this remus device related to */
     const void *backend_dev;
-    libxl__remus_device_kind kind;
+    libxl__checkpoint_device_kind kind;
 
     /*----- private for abstract layer only -----*/
     /*
@@ -2619,9 +2619,9 @@ struct libxl__remus_device {
      * for the device.
      */
     int ops_index;
-    const libxl__remus_device_subkind_ops *ops;
-    libxl__remus_device_callback *callback;
-    libxl__remus_device_state *rds;
+    const libxl__checkpoint_device_subkind_ops *ops;
+    libxl__checkpoint_device_callback *callback;
+    libxl__checkpoint_device_state *cds;
 
     /*----- private for concrete (device-specific) layer -----*/
     /* *kind* of device's private data */
@@ -2636,20 +2636,20 @@ struct libxl__remus_device {
     libxl__ev_child child;
 };
 
-/* the following 5 APIs are async ops, call rds->callback when done */
-_hidden void libxl__remus_devices_setup(libxl__egc *egc,
-                                        libxl__remus_device_state *rds);
-_hidden void libxl__remus_devices_teardown(libxl__egc *egc,
-                                           libxl__remus_device_state *rds);
-_hidden void libxl__remus_devices_postsuspend(libxl__egc *egc,
-                                              libxl__remus_device_state *rds);
-_hidden void libxl__remus_devices_preresume(libxl__egc *egc,
-                                            libxl__remus_device_state *rds);
-_hidden void libxl__remus_devices_commit(libxl__egc *egc,
-                                         libxl__remus_device_state *rds);
-
-extern const libxl__remus_device_subkind_ops remus_device_nic;
-extern const libxl__remus_device_subkind_ops remus_device_drbd_disk;
+/* the following 5 APIs are async ops, call cds->callback when done */
+_hidden void libxl__checkpoint_devices_setup(libxl__egc *egc,
+                                        libxl__checkpoint_device_state *cds);
+_hidden void libxl__checkpoint_devices_teardown(libxl__egc *egc,
+                                           libxl__checkpoint_device_state *cds);
+_hidden void libxl__checkpoint_devices_postsuspend(libxl__egc *egc,
+                                              libxl__checkpoint_device_state *cds);
+_hidden void libxl__checkpoint_devices_preresume(libxl__egc *egc,
+                                            libxl__checkpoint_device_state *cds);
+_hidden void libxl__checkpoint_devices_commit(libxl__egc *egc,
+                                         libxl__checkpoint_device_state *cds);
+
+extern const libxl__checkpoint_device_subkind_ops remus_device_nic;
+extern const libxl__checkpoint_device_subkind_ops remus_device_drbd_disk;
 
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
@@ -2693,7 +2693,7 @@ struct libxl__domain_suspend_state {
     const char *dm_savefile;
     /* for Remus */
     struct {
-        libxl__remus_device_state rds;
+        libxl__checkpoint_device_state cds;
         const char *netbufscript;
         /* used for Remus checkpoint */
         libxl__ev_time checkpoint_timeout;
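Each libxl__checkpoint_device is bound to a subkind by walking the NULL-terminated ops array: entries of a different kind are skipped, an entry with a match() hook may accept or reject the device, and a rejection of ERROR_CHECKPOINT_DEVOPS_NOT_MATCH lets the search continue while any other error marks the device as not supported. The sketch below is a synchronous simplification (the real match() is asynchronous and reports through dev->callback). The names (subkind_ops, pick_ops, RC_*) are hypothetical, and treating an entry without match() as an immediate accept is an assumption, since that branch lies outside the hunks shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum dev_kind { KIND_NIC = 1 << 0, KIND_DISK = 1 << 1 };

/* Hypothetical result codes standing in for the libxl error values. */
enum { RC_OK = 0, RC_NOT_MATCH = -1, RC_NOT_SUPPORTED = -2 };

struct dev {
    enum dev_kind kind;
    const char *name;   /* stands in for backend_dev */
};

struct subkind_ops {
    enum dev_kind kind;
    /* Optional: RC_OK if this subkind handles dev, RC_NOT_MATCH if not. */
    int (*match)(const struct dev *dev);
};

/* Walk a NULL-terminated ops table; return the index of the first
 * subkind that accepts dev, or RC_NOT_SUPPORTED. */
static int pick_ops(const struct subkind_ops *const *ops, const struct dev *dev)
{
    assert(ops != NULL && dev != NULL);
    for (int i = 0; ops[i]; i++) {
        if (ops[i]->kind != dev->kind)
            continue;                   /* wrong kind: keep looking */
        if (!ops[i]->match)
            return i;                   /* assumed: no match() means accept */
        int rc = ops[i]->match(dev);
        if (rc == RC_OK)
            return i;
        if (rc != RC_NOT_MATCH)
            return RC_NOT_SUPPORTED;    /* real error: stop searching */
    }
    return RC_NOT_SUPPORTED;
}

/* Hypothetical matcher: accept disks whose name starts with "drbd". */
static int match_drbd(const struct dev *dev)
{
    return strncmp(dev->name, "drbd", 4) == 0 ? RC_OK : RC_NOT_MATCH;
}

static const struct subkind_ops nic_ops  = { KIND_NIC,  NULL };
static const struct subkind_ops drbd_ops = { KIND_DISK, match_drbd };

/* As the header comment requires, the last entry is NULL. */
static const struct subkind_ops *const ops_table[] = { &nic_ops, &drbd_ops, NULL };
```

A disk that no subkind claims falls off the end of the table, which corresponds to the ERROR_CHECKPOINT_DEVICE_NOT_SUPPORTED path in libxl__checkpoint_device_init().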
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 2387563..c9a6b6f 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -34,12 +34,12 @@ struct libxl__remus_netbuf_state {
     struct nl_cache *qdisc_cache;
 };
 
-typedef struct libxl__remus_device_nic {
+typedef struct libxl__checkpoint_device_nic {
     int devid;
     const char *vif;
     const char *ifb;
     struct rtnl_qdisc *qdisc;
-} libxl__remus_device_nic;
+} libxl__checkpoint_device_nic;
 
 int libxl__netbuffer_enabled(libxl__gc *gc)
 {
@@ -48,13 +48,13 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
 
 /*----- init() and destroy() -----*/
 
-static int nic_init(libxl__remus_device_state *rds)
+static int nic_init(libxl__checkpoint_device_state *cds)
 {
     int rc, ret;
     libxl__remus_netbuf_state *ns;
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
     GCNEW(ns);
     CTX->rns = ns;
@@ -83,8 +83,8 @@ static int nic_init(libxl__remus_device_state *rds)
         goto out;
     }
 
-    ns->ao = rds->ao;
-    ns->domid = rds->domid;
+    ns->ao = cds->ao;
+    ns->domid = cds->domid;
     ns->netbufscript = dss->netbufscript;
 
     rc = 0;
@@ -93,9 +93,9 @@ out:
     return rc;
 }
 
-static void nic_destroy(libxl__remus_device_state *rds)
+static void nic_destroy(libxl__checkpoint_device_state *cds)
 {
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
     libxl__remus_netbuf_state *ns = CTX->rns;
 
     if (!ns)
@@ -126,14 +126,14 @@ static void nic_destroy(libxl__remus_device_state *rds)
  * it must ONLY be used for remus because if driver domains
  * were in use it would constitute a security vulnerability.
  */
-static const char *get_vifname(libxl__remus_device *dev,
+static const char *get_vifname(libxl__checkpoint_device *dev,
                                const libxl_device_nic *nic)
 {
     const char *vifname = NULL;
     const char *path;
     int rc;
 
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     /* Convenience aliases */
     libxl__remus_netbuf_state *netbuf_state = CTX->rns;
@@ -151,7 +151,7 @@ static const char *get_vifname(libxl__remus_device *dev,
     return vifname;
 }
 
-static void free_qdisc(libxl__remus_device_nic *remus_nic)
+static void free_qdisc(libxl__checkpoint_device_nic *remus_nic)
 {
     if (remus_nic->qdisc == NULL)
         return;
@@ -161,7 +161,7 @@ static void free_qdisc(libxl__remus_device_nic *remus_nic)
 }
 
 static int init_qdisc(libxl__remus_netbuf_state *netbuf_state,
-                      libxl__remus_device_nic *remus_nic)
+                      libxl__checkpoint_device_nic *remus_nic)
 {
     int rc, ret, ifindex;
     struct rtnl_link *ifb = NULL;
@@ -251,13 +251,13 @@ static void netbuf_teardown_script_cb(libxl__egc *egc,
  * $REMUS_IFB (for teardown)
  * setup/teardown as command line arg.
  */
-static void setup_async_exec(libxl__remus_device *dev, char *op)
+static void setup_async_exec(libxl__checkpoint_device *dev, char *op)
 {
     int arraysize, nr = 0;
     char **env = NULL, **args = NULL;
     libxl__async_exec_state *aes = &dev->aes;
-    libxl__remus_device_nic *remus_nic = dev->data;
-    STATE_AO_GC(dev->rds->ao);
+    libxl__checkpoint_device_nic *remus_nic = dev->data;
+    STATE_AO_GC(dev->cds->ao);
 
     /* Convenience aliases */
     libxl__remus_netbuf_state *ns = CTX->rns;
@@ -288,7 +288,7 @@ static void setup_async_exec(libxl__remus_device *dev, char *op)
     args[nr++] = NULL;
     assert(nr == arraysize);
 
-    aes->ao = dev->rds->ao;
+    aes->ao = dev->cds->ao;
     aes->what = GCSPRINTF("%s %s", args[0], args[1]);
     aes->env = env;
     aes->args = args;
@@ -305,13 +305,13 @@ static void setup_async_exec(libxl__remus_device *dev, char *op)
 
 /* setup() and teardown() */
 
-static void nic_setup(libxl__remus_device *dev)
+static void nic_setup(libxl__checkpoint_device *dev)
 {
     int rc;
-    libxl__remus_device_nic *remus_nic;
+    libxl__checkpoint_device_nic *remus_nic;
     const libxl_device_nic *nic = dev->backend_dev;
 
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     GCNEW(remus_nic);
     dev->data = remus_nic;
@@ -330,7 +330,7 @@ static void nic_setup(libxl__remus_device *dev)
     return;
 
 out:
-    dev->callback(dev->rds->egc, dev, rc);
+    dev->callback(dev->cds->egc, dev, rc);
 }
 
 /*
@@ -341,12 +341,12 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
                                    libxl__async_exec_state *aes,
                                    int status)
 {
-    libxl__remus_device *dev = CONTAINER_OF(aes, *dev, aes);
-    libxl__remus_device_nic *remus_nic = dev->data;
+    libxl__checkpoint_device *dev = CONTAINER_OF(aes, *dev, aes);
+    libxl__checkpoint_device_nic *remus_nic = dev->data;
     const char *out_path_base, *hotplug_error = NULL;
     int rc;
 
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     /* Convenience aliases */
     libxl__remus_netbuf_state *netbuf_state = CTX->rns;
@@ -401,10 +401,10 @@ out:
     dev->callback(egc, dev, rc);
 }
 
-static void nic_teardown(libxl__remus_device *dev)
+static void nic_teardown(libxl__checkpoint_device *dev)
 {
     int rc;
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     setup_async_exec(dev, "teardown");
 
@@ -415,7 +415,7 @@ static void nic_teardown(libxl__remus_device *dev)
     return;
 
 out:
-    dev->callback(dev->rds->egc, dev, rc);
+    dev->callback(dev->cds->egc, dev, rc);
 }
 
 static void netbuf_teardown_script_cb(libxl__egc *egc,
@@ -423,8 +423,8 @@ static void netbuf_teardown_script_cb(libxl__egc *egc,
                                       int status)
 {
     int rc;
-    libxl__remus_device *dev = CONTAINER_OF(aes, *dev, aes);
-    libxl__remus_device_nic *remus_nic = dev->data;
+    libxl__checkpoint_device *dev = CONTAINER_OF(aes, *dev, aes);
+    libxl__checkpoint_device_nic *remus_nic = dev->data;
 
     if (status)
         rc = ERROR_FAIL;
@@ -446,7 +446,7 @@ enum {
 
 /* API implementations */
 
-static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
+static int remus_netbuf_op(libxl__checkpoint_device_nic *remus_nic,
                            libxl__remus_netbuf_state *netbuf_state,
                            int buffer_op)
 {
@@ -483,30 +483,30 @@ out:
     return rc;
 }
 
-static void nic_postsuspend(libxl__remus_device *dev)
+static void nic_postsuspend(libxl__checkpoint_device *dev)
 {
     int rc;
-    libxl__remus_device_nic *remus_nic = dev->data;
-    STATE_AO_GC(dev->rds->ao);
+    libxl__checkpoint_device_nic *remus_nic = dev->data;
+    STATE_AO_GC(dev->cds->ao);
     libxl__remus_netbuf_state *ns = CTX->rns;
 
     rc = remus_netbuf_op(remus_nic, ns, tc_buffer_start);
-    dev->callback(dev->rds->egc, dev, rc);
+    dev->callback(dev->cds->egc, dev, rc);
 }
 
-static void nic_commit(libxl__remus_device *dev)
+static void nic_commit(libxl__checkpoint_device *dev)
 {
     int rc;
-    libxl__remus_device_nic *remus_nic = dev->data;
-    STATE_AO_GC(dev->rds->ao);
+    libxl__checkpoint_device_nic *remus_nic = dev->data;
+    STATE_AO_GC(dev->cds->ao);
     libxl__remus_netbuf_state *ns = CTX->rns;
 
     rc = remus_netbuf_op(remus_nic, ns, tc_buffer_release);
-    dev->callback(dev->rds->egc, dev, rc);
+    dev->callback(dev->cds->egc, dev, rc);
 }
 
-const libxl__remus_device_subkind_ops remus_device_nic = {
-    .kind = LIBXL__REMUS_DEVICE_NIC,
+const libxl__checkpoint_device_subkind_ops remus_device_nic = {
+    .kind = LIBXL__CHECKPOINT_DEVICE_NIC,
     .init = nic_init,
     .destroy = nic_destroy,
     .setup = nic_setup,
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
index fc29c36..b4a067e 100644
--- a/tools/libxl/libxl_nonetbuffer.c
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -22,25 +22,25 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
     return 0;
 }
 
-static void nic_match(libxl__remus_device *dev)
+static void nic_match(libxl__checkpoint_device *dev)
 {
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
-    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+    dev->callback(dev->cds->egc, dev, ERROR_FAIL);
 }
 
-static int nic_init(libxl__remus_device_state *rds)
+static int nic_init(libxl__checkpoint_device_state *cds)
 {
     return 0;
 }
 
-static void nic_destroy(libxl__remus_device_state *rds)
+static void nic_destroy(libxl__checkpoint_device_state *cds)
 {
     return;
 }
 
-const libxl__remus_device_subkind_ops remus_device_nic = {
-    .kind = LIBXL__REMUS_DEVICE_NIC,
+const libxl__checkpoint_device_subkind_ops remus_device_nic = {
+    .kind = LIBXL__CHECKPOINT_DEVICE_NIC,
     .init = nic_init,
     .destroy = nic_destroy,
     .match = nic_match,
diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
index ae5a9d6..dd47286 100644
--- a/tools/libxl/libxl_remus_disk_drbd.c
+++ b/tools/libxl/libxl_remus_disk_drbd.c
@@ -33,12 +33,12 @@ typedef struct libxl__remus_drbd_disk {
 } libxl__remus_drbd_disk;
 
 /*----- helper functions, for async calls -----*/
-static void drbd_async_call(libxl__remus_device *dev,
-                            void func(libxl__remus_device *),
+static void drbd_async_call(libxl__checkpoint_device *dev,
+                            void func(libxl__checkpoint_device *),
                             libxl__ev_child_callback callback)
 {
     int pid = -1;
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     /* Fork and call */
     pid = libxl__ev_child_fork(gc, &dev->child, callback);
@@ -57,15 +57,15 @@ static void drbd_async_call(libxl__remus_device *dev,
     return;
 
 out:
-    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+    dev->callback(dev->cds->egc, dev, ERROR_FAIL);
 }
 
 /*----- init() and destroy() -----*/
-static int drbd_init(libxl__remus_device_state *rds)
+static int drbd_init(libxl__checkpoint_device_state *cds)
 {
     libxl__remus_drbd_state *drbd_state;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
     GCNEW(drbd_state);
     CTX->drbd_state = drbd_state;
@@ -76,7 +76,7 @@ static int drbd_init(libxl__remus_device_state *rds)
     return 0;
 }
 
-static void drbd_destroy(libxl__remus_device_state *rds)
+static void drbd_destroy(libxl__checkpoint_device_state *cds)
 {
     return;
 }
@@ -90,12 +90,12 @@ static void match_async_exec_cb(libxl__egc *egc,
 
 /* implementations */
 
-static void match_async_exec(libxl__egc *egc, libxl__remus_device *dev)
+static void match_async_exec(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
     int arraysize, nr = 0;
     const libxl_device_disk *disk = dev->backend_dev;
     libxl__async_exec_state *aes = &dev->aes;
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     libxl__remus_drbd_state *drbd_state = CTX->drbd_state;
     /* setup env & args */
@@ -129,29 +129,29 @@ out:
     dev->callback(egc, dev, ERROR_FAIL);
 }
 
-static void drbd_match(libxl__remus_device *dev)
+static void drbd_match(libxl__checkpoint_device *dev)
 {
-    match_async_exec(dev->rds->egc, dev);
+    match_async_exec(dev->cds->egc, dev);
 }
 
 static void match_async_exec_cb(libxl__egc *egc,
                                 libxl__async_exec_state *aes,
                                 int status)
 {
-    libxl__remus_device *dev = CONTAINER_OF(aes, *dev, aes);
+    libxl__checkpoint_device *dev = CONTAINER_OF(aes, *dev, aes);
 
     if (status) {
-        dev->callback(egc, dev, ERROR_REMUS_DEVOPS_NOT_MATCH);
+        dev->callback(egc, dev, ERROR_CHECKPOINT_DEVOPS_NOT_MATCH);
     } else {
         dev->callback(egc, dev, 0);
     }
 }
 
-static void drbd_setup(libxl__remus_device *dev)
+static void drbd_setup(libxl__checkpoint_device *dev)
 {
     libxl__remus_drbd_disk *drbd_disk;
     const libxl_device_disk *disk = dev->backend_dev;
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     GCNEW(drbd_disk);
     dev->data = drbd_disk;
@@ -159,17 +159,17 @@ static void drbd_setup(libxl__remus_device *dev)
     drbd_disk->ackwait = 0;
     drbd_disk->ctl_fd = open(drbd_disk->path, O_RDONLY);
     if (drbd_disk->ctl_fd < 0)
-        dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+        dev->callback(dev->cds->egc, dev, ERROR_FAIL);
     else
-        dev->callback(dev->rds->egc, dev, 0);
+        dev->callback(dev->cds->egc, dev, 0);
 }
 
-static void drbd_teardown(libxl__remus_device *dev)
+static void drbd_teardown(libxl__checkpoint_device *dev)
 {
     libxl__remus_drbd_disk *drbd_disk = dev->data;
 
     close(drbd_disk->ctl_fd);
-    dev->callback(dev->rds->egc, dev, 0);
+    dev->callback(dev->cds->egc, dev, 0);
 }
 
 /*----- checkpointing APIs -----*/
@@ -182,7 +182,7 @@ static void chekpoint_async_call_done(libxl__egc *egc,
 /* API implementations */
 
 /* this op will not wait and block, so implement as sync op */
-static void drbd_postsuspend(libxl__remus_device *dev)
+static void drbd_postsuspend(libxl__checkpoint_device *dev)
 {
     libxl__remus_drbd_disk *rdd = dev->data;
 
@@ -191,10 +191,10 @@ static void drbd_postsuspend(libxl__remus_device *dev)
             rdd->ackwait = 1;
     }
 
-    dev->callback(dev->rds->egc, dev, 0);
+    dev->callback(dev->cds->egc, dev, 0);
 }
 
-static void drbd_preresume_async(libxl__remus_device *dev)
+static void drbd_preresume_async(libxl__checkpoint_device *dev)
 {
     libxl__remus_drbd_disk *rdd = dev->data;
     int ackwait = rdd->ackwait;
@@ -207,7 +207,7 @@ static void drbd_preresume_async(libxl__remus_device *dev)
     _exit(ackwait);
 }
 
-static void drbd_preresume(libxl__remus_device *dev)
+static void drbd_preresume(libxl__checkpoint_device *dev)
 {
     drbd_async_call(dev, drbd_preresume_async, chekpoint_async_call_done);
 }
@@ -216,9 +216,9 @@ static void chekpoint_async_call_done(libxl__egc *egc,
                                       libxl__ev_child *child,
                                       pid_t pid, int status)
 {
-    libxl__remus_device *dev = CONTAINER_OF(child, *dev, child);
+    libxl__checkpoint_device *dev = CONTAINER_OF(child, *dev, child);
     libxl__remus_drbd_disk *rdd = dev->data;
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     if (WIFEXITED(status)) {
         rdd->ackwait = WEXITSTATUS(status);
@@ -228,8 +228,8 @@ static void chekpoint_async_call_done(libxl__egc *egc,
     }
 }
 
-const libxl__remus_device_subkind_ops remus_device_drbd_disk = {
-    .kind = LIBXL__REMUS_DEVICE_DISK,
+const libxl__checkpoint_device_subkind_ops remus_device_drbd_disk = {
+    .kind = LIBXL__CHECKPOINT_DEVICE_DISK,
     .init = drbd_init,
     .destroy = drbd_destroy,
     .match = drbd_match,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 6551109..dc9f78e 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -58,8 +58,8 @@ libxl_error = Enumeration("error", [
     (-12, "OSEVENT_REG_FAIL"),
     (-13, "BUFFERFULL"),
     (-14, "UNKNOWN_CHILD"),
-    (-15, "REMUS_DEVOPS_NOT_MATCH"),
-    (-16, "REMUS_DEVICE_NOT_SUPPORTED"),
+    (-15, "CHECKPOINT_DEVOPS_NOT_MATCH"),
+    (-16, "CHECKPOINT_DEVICE_NOT_SUPPORTED"),
     ], value_namespace = "")
 
 libxl_domain_type = Enumeration("domain_type", [
-- 
1.9.3


* [RFC Patch 06/25] adjust the indentation
From: Wen Congyang @ 2014-07-18 11:38 UTC
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_checkpoint_device.c | 32 ++++++++++++++++++--------------
 tools/libxl/libxl_internal.h          | 14 +++++++-------
 2 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/tools/libxl/libxl_checkpoint_device.c b/tools/libxl/libxl_checkpoint_device.c
index 87ee412..b1575f7 100644
--- a/tools/libxl/libxl_checkpoint_device.c
+++ b/tools/libxl/libxl_checkpoint_device.c
@@ -67,10 +67,11 @@ static void device_teardown_cb(libxl__egc *egc,
 /* checkpoint device setup and teardown */
 
 static void libxl__checkpoint_device_init(libxl__egc *egc,
-                                     libxl__checkpoint_device_state *cds,
-                                     libxl__checkpoint_device_kind kind,
-                                     void *libxl_dev);
-void libxl__checkpoint_devices_setup(libxl__egc *egc, libxl__checkpoint_device_state *cds)
+                                          libxl__checkpoint_device_state *cds,
+                                          libxl__checkpoint_device_kind kind,
+                                          void *libxl_dev);
+void libxl__checkpoint_devices_setup(libxl__egc *egc,
+                                     libxl__checkpoint_device_state *cds)
 {
     int i;
     STATE_AO_GC(cds->ao);
@@ -99,12 +100,14 @@ void libxl__checkpoint_devices_setup(libxl__egc *egc, libxl__checkpoint_device_s
 
     for (i = 0; i < cds->num_nics; i++) {
         libxl__checkpoint_device_init(egc, cds,
-                                 LIBXL__CHECKPOINT_DEVICE_NIC, &cds->nics[i]);
+                                      LIBXL__CHECKPOINT_DEVICE_NIC,
+                                      &cds->nics[i]);
     }
 
     for (i = 0; i < cds->num_disks; i++) {
         libxl__checkpoint_device_init(egc, cds,
-                                 LIBXL__CHECKPOINT_DEVICE_DISK, &cds->disks[i]);
+                                      LIBXL__CHECKPOINT_DEVICE_DISK,
+                                      &cds->disks[i]);
     }
 
     return;
@@ -115,9 +118,9 @@ out:
 }
 
 static void libxl__checkpoint_device_init(libxl__egc *egc,
-                                     libxl__checkpoint_device_state *cds,
-                                     libxl__checkpoint_device_kind kind,
-                                     void *libxl_dev)
+                                          libxl__checkpoint_device_state *cds,
+                                          libxl__checkpoint_device_kind kind,
+                                          void *libxl_dev)
 {
     libxl__checkpoint_device *dev = NULL;
 
@@ -234,7 +237,8 @@ static void device_setup_cb(libxl__egc *egc,
         cds->callback(egc, cds, cds->saved_rc);
 }
 
-void libxl__checkpoint_devices_teardown(libxl__egc *egc, libxl__checkpoint_device_state *cds)
+void libxl__checkpoint_devices_teardown(libxl__egc *egc,
+                                        libxl__checkpoint_device_state *cds)
 {
     int i, num_set_up;
     libxl__checkpoint_device *dev;
@@ -309,12 +313,12 @@ static void device_checkpoint_cb(libxl__egc *egc,
 
 /* API implementations */
 
-#define define_checkpoint_device_api(api)                             \
-void libxl__checkpoint_devices_##api(libxl__egc *egc,                            \
-                                libxl__checkpoint_device_state *cds)             \
+#define define_checkpoint_device_api(api)                                   \
+void libxl__checkpoint_devices_##api(libxl__egc *egc,                       \
+                                libxl__checkpoint_device_state *cds)        \
 {                                                                           \
     int i;                                                                  \
-    libxl__checkpoint_device *dev;                                               \
+    libxl__checkpoint_device *dev;                                          \
                                                                             \
     STATE_AO_GC(cds->ao);                                                   \
                                                                             \
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index f18503e..bb2aaed 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2600,8 +2600,8 @@ struct libxl__checkpoint_device_state {
 };
 
 typedef void libxl__checkpoint_device_callback(libxl__egc *,
-                                          libxl__checkpoint_device *,
-                                          int rc);
+                                               libxl__checkpoint_device *,
+                                               int rc);
 /*
  * This structure is init and setup by remus device abstruct layer,
  * and pass to remus device ops
@@ -2638,15 +2638,15 @@ struct libxl__checkpoint_device {
 
 /* the following 5 APIs are async ops, call cds->callback when done */
 _hidden void libxl__checkpoint_devices_setup(libxl__egc *egc,
-                                        libxl__checkpoint_device_state *cds);
+                                             libxl__checkpoint_device_state *cds);
 _hidden void libxl__checkpoint_devices_teardown(libxl__egc *egc,
-                                           libxl__checkpoint_device_state *cds);
+                                                libxl__checkpoint_device_state *cds);
 _hidden void libxl__checkpoint_devices_postsuspend(libxl__egc *egc,
-                                              libxl__checkpoint_device_state *cds);
+                                                   libxl__checkpoint_device_state *cds);
 _hidden void libxl__checkpoint_devices_preresume(libxl__egc *egc,
-                                            libxl__checkpoint_device_state *cds);
+                                                 libxl__checkpoint_device_state *cds);
 _hidden void libxl__checkpoint_devices_commit(libxl__egc *egc,
-                                         libxl__checkpoint_device_state *cds);
+                                              libxl__checkpoint_device_state *cds);
 
 extern const libxl__checkpoint_device_subkind_ops remus_device_nic;
 extern const libxl__checkpoint_device_subkind_ops remus_device_drbd_disk;
-- 
1.9.3


* [RFC Patch 07/25] Refactor domain_suspend_callback_common()
From: Wen Congyang @ 2014-07-18 11:38 UTC
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

libxl__domain_suspend() actually saves the guest, so it would
be better named libxl__domain_save(). This patch does not
rename it, though.

In COLO mode the secondary VM is also running, so we
repeatedly do the following:
1. suspend both the primary VM and the secondary VM
2. sync the state
3. resume both the primary VM and the secondary VM
To suspend the secondary VM, we need an independent API
for suspending a VM.
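The suspend/sync/resume cycle above can be sketched as a plain loop. This is a minimal, synchronous illustration under stated assumptions: `struct vm`, `vm_suspend()`, `vm_resume()`, `sync_state()` and `colo_run()` are hypothetical names, not libxl APIs, and the real libxl code is asynchronous and callback-driven. The "state" being synced is reduced to a checkpoint counter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one guest; not a libxl type. */
struct vm {
    const char *name;
    bool running;
    unsigned long epoch;   /* number of the last checkpoint applied */
};

/* Hypothetical hooks; in libxl these steps complete via callbacks. */
static int vm_suspend(struct vm *vm) { vm->running = false; return 0; }
static int vm_resume(struct vm *vm)  { vm->running = true;  return 0; }

/* Stand-in for transferring memory/device state to the secondary. */
static int sync_state(const struct vm *primary, struct vm *secondary)
{
    secondary->epoch = primary->epoch;
    return 0;
}

/* One checkpoint: suspend both, sync, resume both.
 * Returns 0 on success, -1 if any step fails (no rollback in this sketch). */
static int colo_checkpoint(struct vm *primary, struct vm *secondary)
{
    if (vm_suspend(primary) || vm_suspend(secondary))
        return -1;
    primary->epoch++;                  /* a new checkpoint is taken */
    if (sync_state(primary, secondary))
        return -1;
    if (vm_resume(primary) || vm_resume(secondary))
        return -1;
    return 0;
}

/* Repeat checkpoints; a real implementation would loop until failover. */
static int colo_run(struct vm *primary, struct vm *secondary, int rounds)
{
    for (int i = 0; i < rounds; i++)
        if (colo_checkpoint(primary, secondary))
            return -1;
    return 0;
}
```

The point of the sketch is that step 1 needs a "suspend this VM" primitive that works for the secondary as well, independent of the save path.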

The core function for suspending a VM is
domain_suspend_callback_common(). So introduce a new structure,
libxl__domain_suspend_state2, and use it there instead of
libxl__domain_suspend_state: the members of dss that
domain_suspend_callback_common() uses are moved to dss2.
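One way to picture the dss/dss2 split is an inner state struct embedded in the outer one, with the outer struct recovered from the inner pointer in callbacks. This is a hedged sketch: the struct layouts are hypothetical and heavily trimmed, and it assumes dss2 is embedded by value in dss (this message does not show the libxl_internal.h hunk). The generic `container_of` below takes a type name; libxl's own CONTAINER_OF, as used in the hunks of this series, takes an outer-variable expression such as `*dss2` instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic container_of: recover the enclosing struct from a member pointer. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down layouts; not the real libxl definitions. */
struct suspend_state2 {          /* only what the common suspend code needs */
    uint32_t domid;
    int hvm;
    int save_dm;                 /* whether to save device-model state */
    void (*callback_common_done)(struct suspend_state2 *dss2, int ok);
};

struct suspend_state {           /* full state for saving a domain */
    int fd;
    struct suspend_state2 dss2;  /* embedded by value, not a pointer */
    int result;
};

/* The common suspend code sees only dss2 ... */
static void suspend_common(struct suspend_state2 *dss2)
{
    /* ... suspend the guest identified by dss2->domid here ... */
    dss2->callback_common_done(dss2, 1);
}

/* ... while the save path's callback recovers its outer state from it. */
static void save_common_done(struct suspend_state2 *dss2, int ok)
{
    struct suspend_state *dss =
        container_of(dss2, struct suspend_state, dss2);
    dss->result = ok;
}
```

Because the common code depends only on the inner struct, a COLO caller could embed the same struct in a different outer state to suspend the secondary VM without going through the save path.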

We also introduce a new API, libxl__domain_suspend2().

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_dom.c      | 235 ++++++++++++++++++++++++-------------------
 tools/libxl/libxl_internal.h |  39 +++++--
 2 files changed, 159 insertions(+), 115 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index cf79b74..035d25a 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -797,7 +797,7 @@ int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
 static void domain_suspend_done(libxl__egc *egc,
                         libxl__domain_suspend_state *dss, int rc);
 static void domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss, int ok);
+                                libxl__domain_suspend_state2 *dss2, int ok);
 
 /*----- complicated callback, called by xc_domain_save -----*/
 
@@ -1015,16 +1015,17 @@ static void switch_logdirty_done(libxl__egc *egc,
 /*----- callbacks, called by xc_domain_save -----*/
 
 int libxl__domain_suspend_device_model(libxl__gc *gc,
-                                       libxl__domain_suspend_state *dss)
+                                       libxl__domain_suspend_state2 *dss2)
 {
     int ret = 0;
-    uint32_t const domid = dss->domid;
-    const char *const filename = dss->dm_savefile;
+    uint32_t const domid = dss2->domid;
+    const char *const filename = dss2->dm_savefile;
 
     switch (libxl__device_model_version_running(gc, domid)) {
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: {
         LOG(DEBUG, "Saving device model state to %s", filename);
-        libxl__qemu_traditional_cmd(gc, domid, "save");
+        if (dss2->save_dm)
+            libxl__qemu_traditional_cmd(gc, domid, "save");
         libxl__wait_for_device_model_deprecated(gc, domid, "paused", NULL, NULL, NULL);
         break;
     }
@@ -1032,9 +1033,11 @@ int libxl__domain_suspend_device_model(libxl__gc *gc,
         if (libxl__qmp_stop(gc, domid))
             return ERROR_FAIL;
         /* Save DM state into filename */
-        ret = libxl__qmp_save(gc, domid, filename);
-        if (ret)
-            unlink(filename);
+        if (dss2->save_dm) {
+            ret = libxl__qmp_save(gc, domid, filename);
+            if (ret)
+                unlink(filename);
+        }
         break;
     default:
         return ERROR_INVAL;
@@ -1064,9 +1067,9 @@ int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid)
 }
 
 static void domain_suspend_common_wait_guest(libxl__egc *egc,
-                                             libxl__domain_suspend_state *dss);
+                                             libxl__domain_suspend_state2 *dss2);
 static void domain_suspend_common_guest_suspended(libxl__egc *egc,
-                                         libxl__domain_suspend_state *dss);
+                                         libxl__domain_suspend_state2 *dss2);
 
 static void domain_suspend_common_pvcontrol_suspending(libxl__egc *egc,
       libxl__xswait_state *xswa, int rc, const char *state);
@@ -1075,14 +1078,14 @@ static void domain_suspend_common_wait_guest_evtchn(libxl__egc *egc,
 static void suspend_common_wait_guest_watch(libxl__egc *egc,
       libxl__ev_xswatch *xsw, const char *watch_path, const char *event_path);
 static void suspend_common_wait_guest_check(libxl__egc *egc,
-        libxl__domain_suspend_state *dss);
+                                            libxl__domain_suspend_state2 *dss2);
 static void suspend_common_wait_guest_timeout(libxl__egc *egc,
       libxl__ev_time *ev, const struct timeval *requested_abs);
 
 static void domain_suspend_common_failed(libxl__egc *egc,
-                                         libxl__domain_suspend_state *dss);
+                                         libxl__domain_suspend_state2 *dss2);
 static void domain_suspend_common_done(libxl__egc *egc,
-                                       libxl__domain_suspend_state *dss,
+                                       libxl__domain_suspend_state2 *dss2,
                                        bool ok);
 
 static bool domain_suspend_pvcontrol_acked(const char *state) {
@@ -1091,36 +1094,36 @@ static bool domain_suspend_pvcontrol_acked(const char *state) {
     return strcmp(state,"suspend");
 }
 
-/* calls dss->callback_common_done when done */
+/* calls dss2->callback_common_done when done */
 static void domain_suspend_callback_common(libxl__egc *egc,
-                                           libxl__domain_suspend_state *dss)
+                                           libxl__domain_suspend_state2 *dss2)
 {
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(dss2->ao);
     uint64_t hvm_s_state = 0, hvm_pvdrv = 0;
     int ret, rc;
 
     /* Convenience aliases */
-    const uint32_t domid = dss->domid;
+    const uint32_t domid = dss2->domid;
 
-    if (dss->hvm) {
+    if (dss2->hvm) {
         xc_hvm_param_get(CTX->xch, domid, HVM_PARAM_CALLBACK_IRQ, &hvm_pvdrv);
         xc_hvm_param_get(CTX->xch, domid, HVM_PARAM_ACPI_S_STATE, &hvm_s_state);
     }
 
-    if ((hvm_s_state == 0) && (dss->guest_evtchn.port >= 0)) {
+    if ((hvm_s_state == 0) && (dss2->guest_evtchn.port >= 0)) {
         LOG(DEBUG, "issuing %s suspend request via event channel",
-            dss->hvm ? "PVHVM" : "PV");
-        ret = xc_evtchn_notify(CTX->xce, dss->guest_evtchn.port);
+            dss2->hvm ? "PVHVM" : "PV");
+        ret = xc_evtchn_notify(CTX->xce, dss2->guest_evtchn.port);
         if (ret < 0) {
             LOG(ERROR, "xc_evtchn_notify failed ret=%d", ret);
             goto err;
         }
 
-        dss->guest_evtchn.callback = domain_suspend_common_wait_guest_evtchn;
-        rc = libxl__ev_evtchn_wait(gc, &dss->guest_evtchn);
+        dss2->guest_evtchn.callback = domain_suspend_common_wait_guest_evtchn;
+        rc = libxl__ev_evtchn_wait(gc, &dss2->guest_evtchn);
         if (rc) goto err;
 
-        rc = libxl__ev_time_register_rel(gc, &dss->guest_timeout,
+        rc = libxl__ev_time_register_rel(gc, &dss2->guest_timeout,
                                          suspend_common_wait_guest_timeout,
                                          60*1000);
         if (rc) goto err;
@@ -1128,7 +1131,7 @@ static void domain_suspend_callback_common(libxl__egc *egc,
         return;
     }
 
-    if (dss->hvm && (!hvm_pvdrv || hvm_s_state)) {
+    if (dss2->hvm && (!hvm_pvdrv || hvm_s_state)) {
         LOG(DEBUG, "Calling xc_domain_shutdown on HVM domain");
         ret = xc_domain_shutdown(CTX->xch, domid, SHUTDOWN_suspend);
         if (ret < 0) {
@@ -1136,55 +1139,55 @@ static void domain_suspend_callback_common(libxl__egc *egc,
             goto err;
         }
         /* The guest does not (need to) respond to this sort of request. */
-        dss->guest_responded = 1;
-        domain_suspend_common_wait_guest(egc, dss);
+        dss2->guest_responded = 1;
+        domain_suspend_common_wait_guest(egc, dss2);
         return;
     }
 
     LOG(DEBUG, "issuing %s suspend request via XenBus control node",
-        dss->hvm ? "PVHVM" : "PV");
+        dss2->hvm ? "PVHVM" : "PV");
 
     libxl__domain_pvcontrol_write(gc, XBT_NULL, domid, "suspend");
 
-    dss->pvcontrol.path = libxl__domain_pvcontrol_xspath(gc, domid);
-    if (!dss->pvcontrol.path) goto err;
+    dss2->pvcontrol.path = libxl__domain_pvcontrol_xspath(gc, domid);
+    if (!dss2->pvcontrol.path) goto err;
 
-    dss->pvcontrol.ao = ao;
-    dss->pvcontrol.what = "guest acknowledgement of suspend request";
-    dss->pvcontrol.timeout_ms = 60 * 1000;
-    dss->pvcontrol.callback = domain_suspend_common_pvcontrol_suspending;
-    libxl__xswait_start(gc, &dss->pvcontrol);
+    dss2->pvcontrol.ao = ao;
+    dss2->pvcontrol.what = "guest acknowledgement of suspend request";
+    dss2->pvcontrol.timeout_ms = 60 * 1000;
+    dss2->pvcontrol.callback = domain_suspend_common_pvcontrol_suspending;
+    libxl__xswait_start(gc, &dss2->pvcontrol);
     return;
 
  err:
-    domain_suspend_common_failed(egc, dss);
+    domain_suspend_common_failed(egc, dss2);
 }
 
 static void domain_suspend_common_wait_guest_evtchn(libxl__egc *egc,
-        libxl__ev_evtchn *evev)
+                                                    libxl__ev_evtchn *evev)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(evev, *dss, guest_evtchn);
-    STATE_AO_GC(dss->ao);
+    libxl__domain_suspend_state2 *dss2 = CONTAINER_OF(evev, *dss2, guest_evtchn);
+    STATE_AO_GC(dss2->ao);
     /* If we should be done waiting, suspend_common_wait_guest_check
      * will end up calling domain_suspend_common_guest_suspended or
      * domain_suspend_common_failed, both of which cancel the evtchn
      * wait.  So re-enable it now. */
-    libxl__ev_evtchn_wait(gc, &dss->guest_evtchn);
-    suspend_common_wait_guest_check(egc, dss);
+    libxl__ev_evtchn_wait(gc, &dss2->guest_evtchn);
+    suspend_common_wait_guest_check(egc, dss2);
 }
 
 static void domain_suspend_common_pvcontrol_suspending(libxl__egc *egc,
       libxl__xswait_state *xswa, int rc, const char *state)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(xswa, *dss, pvcontrol);
-    STATE_AO_GC(dss->ao);
+    libxl__domain_suspend_state2 *dss2 = CONTAINER_OF(xswa, *dss2, pvcontrol);
+    STATE_AO_GC(dss2->ao);
     xs_transaction_t t = 0;
 
     if (!rc && !domain_suspend_pvcontrol_acked(state))
         /* keep waiting */
         return;
 
-    libxl__xswait_stop(gc, &dss->pvcontrol);
+    libxl__xswait_stop(gc, &dss2->pvcontrol);
 
     if (rc == ERROR_TIMEDOUT) {
         /*
@@ -1227,56 +1230,56 @@ static void domain_suspend_common_pvcontrol_suspending(libxl__egc *egc,
     LOG(DEBUG, "guest acknowledged suspend request");
 
     libxl__xs_transaction_abort(gc, &t);
-    dss->guest_responded = 1;
-    domain_suspend_common_wait_guest(egc,dss);
+    dss2->guest_responded = 1;
+    domain_suspend_common_wait_guest(egc,dss2);
     return;
 
  err:
     libxl__xs_transaction_abort(gc, &t);
-    domain_suspend_common_failed(egc, dss);
+    domain_suspend_common_failed(egc, dss2);
     return;
 }
 
 static void domain_suspend_common_wait_guest(libxl__egc *egc,
-                                             libxl__domain_suspend_state *dss)
+                                             libxl__domain_suspend_state2 *dss2)
 {
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(dss2->ao);
     int rc;
 
     LOG(DEBUG, "wait for the guest to suspend");
 
-    rc = libxl__ev_xswatch_register(gc, &dss->guest_watch,
+    rc = libxl__ev_xswatch_register(gc, &dss2->guest_watch,
                                     suspend_common_wait_guest_watch,
                                     "@releaseDomain");
     if (rc) goto err;
 
-    rc = libxl__ev_time_register_rel(gc, &dss->guest_timeout,
+    rc = libxl__ev_time_register_rel(gc, &dss2->guest_timeout,
                                      suspend_common_wait_guest_timeout,
                                      60*1000);
     if (rc) goto err;
     return;
 
  err:
-    domain_suspend_common_failed(egc, dss);
+    domain_suspend_common_failed(egc, dss2);
 }
 
 static void suspend_common_wait_guest_watch(libxl__egc *egc,
       libxl__ev_xswatch *xsw, const char *watch_path, const char *event_path)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(xsw, *dss, guest_watch);
-    suspend_common_wait_guest_check(egc, dss);
+    libxl__domain_suspend_state2 *dss2 = CONTAINER_OF(xsw, *dss2, guest_watch);
+    suspend_common_wait_guest_check(egc, dss2);
 }
 
 static void suspend_common_wait_guest_check(libxl__egc *egc,
-        libxl__domain_suspend_state *dss)
+                                            libxl__domain_suspend_state2 *dss2)
 {
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(dss2->ao);
     xc_domaininfo_t info;
     int ret;
     int shutdown_reason;
 
     /* Convenience aliases */
-    const uint32_t domid = dss->domid;
+    const uint32_t domid = dss2->domid;
 
     ret = xc_domain_getinfolist(CTX->xch, domid, 1, &info);
     if (ret < 0) {
@@ -1303,59 +1306,59 @@ static void suspend_common_wait_guest_check(libxl__egc *egc,
     }
 
     LOG(DEBUG, "guest has suspended");
-    domain_suspend_common_guest_suspended(egc, dss);
+    domain_suspend_common_guest_suspended(egc, dss2);
     return;
 
  err:
-    domain_suspend_common_failed(egc, dss);
+    domain_suspend_common_failed(egc, dss2);
 }
 
 static void suspend_common_wait_guest_timeout(libxl__egc *egc,
       libxl__ev_time *ev, const struct timeval *requested_abs)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(ev, *dss, guest_timeout);
-    STATE_AO_GC(dss->ao);
+    libxl__domain_suspend_state2 *dss2 = CONTAINER_OF(ev, *dss2, guest_timeout);
+    STATE_AO_GC(dss2->ao);
     LOG(ERROR, "guest did not suspend, timed out");
-    domain_suspend_common_failed(egc, dss);
+    domain_suspend_common_failed(egc, dss2);
 }
 
 static void domain_suspend_common_guest_suspended(libxl__egc *egc,
-                                         libxl__domain_suspend_state *dss)
+                                            libxl__domain_suspend_state2 *dss2)
 {
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(dss2->ao);
     int ret;
 
-    libxl__ev_evtchn_cancel(gc, &dss->guest_evtchn);
-    libxl__ev_xswatch_deregister(gc, &dss->guest_watch);
-    libxl__ev_time_deregister(gc, &dss->guest_timeout);
+    libxl__ev_evtchn_cancel(gc, &dss2->guest_evtchn);
+    libxl__ev_xswatch_deregister(gc, &dss2->guest_watch);
+    libxl__ev_time_deregister(gc, &dss2->guest_timeout);
 
-    if (dss->hvm) {
-        ret = libxl__domain_suspend_device_model(gc, dss);
+    if (dss2->hvm) {
+        ret = libxl__domain_suspend_device_model(gc, dss2);
         if (ret) {
             LOG(ERROR, "libxl__domain_suspend_device_model failed ret=%d", ret);
-            domain_suspend_common_failed(egc, dss);
+            domain_suspend_common_failed(egc, dss2);
             return;
         }
     }
-    domain_suspend_common_done(egc, dss, 1);
+    domain_suspend_common_done(egc, dss2, 1);
 }
 
 static void domain_suspend_common_failed(libxl__egc *egc,
-                                         libxl__domain_suspend_state *dss)
+                                         libxl__domain_suspend_state2 *dss2)
 {
-    domain_suspend_common_done(egc, dss, 0);
+    domain_suspend_common_done(egc, dss2, 0);
 }
 
 static void domain_suspend_common_done(libxl__egc *egc,
-                                       libxl__domain_suspend_state *dss,
+                                       libxl__domain_suspend_state2 *dss2,
                                        bool ok)
 {
     EGC_GC;
-    assert(!libxl__xswait_inuse(&dss->pvcontrol));
-    libxl__ev_evtchn_cancel(gc, &dss->guest_evtchn);
-    libxl__ev_xswatch_deregister(gc, &dss->guest_watch);
-    libxl__ev_time_deregister(gc, &dss->guest_timeout);
-    dss->callback_common_done(egc, dss, ok);
+    assert(!libxl__xswait_inuse(&dss2->pvcontrol));
+    libxl__ev_evtchn_cancel(gc, &dss2->guest_evtchn);
+    libxl__ev_xswatch_deregister(gc, &dss2->guest_watch);
+    libxl__ev_time_deregister(gc, &dss2->guest_timeout);
+    dss2->callback_common_done(egc, dss2, ok);
 }
 
 static inline char *physmap_path(libxl__gc *gc, uint32_t domid,
@@ -1448,19 +1451,24 @@ static void libxl__domain_suspend_callback(void *data)
     libxl__egc *egc = shs->egc;
     libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
 
-    dss->callback_common_done = domain_suspend_callback_common_done;
-    domain_suspend_callback_common(egc, dss);
+    /* Convenience aliases */
+    libxl__domain_suspend_state2 *dss2 = &dss->dss2;
+
+    dss2->callback_common_done = domain_suspend_callback_common_done;
+    domain_suspend_callback_common(egc, dss2);
 }
 
 static void domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss, int ok)
+                                libxl__domain_suspend_state2 *dss2, int ok)
 {
+    libxl__domain_suspend_state *dss = CONTAINER_OF(dss2, *dss, dss2);
+
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
 }
 
 /*----- remus callbacks -----*/
 static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss, int ok);
+                                libxl__domain_suspend_state2 *dss2, int ok);
 static void remus_device_postsuspend_cb(libxl__egc *egc,
                                         libxl__checkpoint_device_state *cds,
                                         int rc);
@@ -1474,13 +1482,18 @@ static void libxl__remus_domain_suspend_callback(void *data)
     libxl__egc *egc = shs->egc;
     libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
 
-    dss->callback_common_done = remus_domain_suspend_callback_common_done;
-    domain_suspend_callback_common(egc, dss);
+    /* Convenience aliases */
+    libxl__domain_suspend_state2 *const dss2 = &dss->dss2;
+
+    dss2->callback_common_done = remus_domain_suspend_callback_common_done;
+    domain_suspend_callback_common(egc, dss2);
 }
 
 static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss, int ok)
+                                libxl__domain_suspend_state2 *dss2, int ok)
 {
+    libxl__domain_suspend_state *dss = CONTAINER_OF(dss2, *dss, dss2);
+
     if (!ok)
         goto out;
 
@@ -1622,6 +1635,11 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
 }
 
 /*----- main code for suspending, in order of execution -----*/
+void libxl__domain_suspend2(libxl__egc *egc,
+                            libxl__domain_suspend_state2 *dss2)
+{
+    domain_suspend_callback_common(egc, dss2);
+}
 
 void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
 {
@@ -1637,20 +1655,23 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
     const libxl_domain_remus_info *const r_info = dss->remus;
     libxl__srm_save_autogen_callbacks *const callbacks =
         &dss->shs.callbacks.save.a;
+    libxl__domain_suspend_state2 *dss2 = &dss->dss2;
 
     logdirty_init(&dss->logdirty);
-    libxl__xswait_init(&dss->pvcontrol);
-    libxl__ev_evtchn_init(&dss->guest_evtchn);
-    libxl__ev_xswatch_init(&dss->guest_watch);
-    libxl__ev_time_init(&dss->guest_timeout);
+    libxl__xswait_init(&dss2->pvcontrol);
+    libxl__ev_evtchn_init(&dss2->guest_evtchn);
+    libxl__ev_xswatch_init(&dss2->guest_watch);
+    libxl__ev_time_init(&dss2->guest_timeout);
 
     switch (type) {
     case LIBXL_DOMAIN_TYPE_HVM: {
         dss->hvm = 1;
+        dss2->hvm = 1;
         break;
     }
     case LIBXL_DOMAIN_TYPE_PV:
         dss->hvm = 0;
+        dss2->hvm = 0;
         break;
     default:
         abort();
@@ -1660,10 +1681,13 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
           | (debug ? XCFLAGS_DEBUG : 0)
           | (dss->hvm ? XCFLAGS_HVM : 0);
 
-    dss->guest_evtchn.port = -1;
-    dss->guest_evtchn_lockfd = -1;
-    dss->guest_responded = 0;
-    dss->dm_savefile = libxl__device_model_savefile(gc, domid);
+    dss2->guest_evtchn.port = -1;
+    dss2->guest_evtchn_lockfd = -1;
+    dss2->guest_responded = 0;
+    dss2->dm_savefile = libxl__device_model_savefile(gc, domid);
+    dss2->domid = domid;
+    dss2->ao = ao;
+    dss2->save_dm = 1;
 
     if (r_info != NULL) {
         dss->interval = r_info->interval;
@@ -1674,11 +1698,11 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
     port = xs_suspend_evtchn_port(dss->domid);
 
     if (port >= 0) {
-        dss->guest_evtchn.port =
+        dss2->guest_evtchn.port =
             xc_suspend_evtchn_init_exclusive(CTX->xch, CTX->xce,
-                                  dss->domid, port, &dss->guest_evtchn_lockfd);
+                                  dss->domid, port, &dss2->guest_evtchn_lockfd);
 
-        if (dss->guest_evtchn.port < 0) {
+        if (dss2->guest_evtchn.port < 0) {
             LOG(WARN, "Suspend event channel initialization failed");
             rc = ERROR_FAIL;
             goto out;
@@ -1717,10 +1741,10 @@ void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
 
     if (retval) {
         LOGEV(ERROR, errnoval, "saving domain: %s",
-                         dss->guest_responded ?
+                         dss->dss2.guest_responded ?
                          "domain responded to suspend request" :
                          "domain did not respond to suspend request");
-        if ( !dss->guest_responded )
+        if ( !dss->dss2.guest_responded )
             rc = ERROR_GUEST_TIMEDOUT;
         else
             rc = ERROR_FAIL;
@@ -1728,7 +1752,7 @@ void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
     }
 
     if (type == LIBXL_DOMAIN_TYPE_HVM) {
-        rc = libxl__domain_suspend_device_model(gc, dss);
+        rc = libxl__domain_suspend_device_model(gc, &dss->dss2);
         if (rc) goto out;
 
         libxl__domain_save_device_model(egc, dss, domain_suspend_done);
@@ -1756,7 +1780,7 @@ void libxl__domain_save_device_model(libxl__egc *egc,
     dss->save_dm_callback = callback;
 
     /* Convenience aliases */
-    const char *const filename = dss->dm_savefile;
+    const char *const filename = dss->dss2.dm_savefile;
     const int fd = dss->fd;
 
     libxl__datacopier_state *dc = &dss->save_dm_datacopier;
@@ -1812,7 +1836,7 @@ static void save_device_model_datacopier_done(libxl__egc *egc,
     STATE_AO_GC(dss->ao);
 
     /* Convenience aliases */
-    const char *const filename = dss->dm_savefile;
+    const char *const filename = dss->dss2.dm_savefile;
     int our_rc = 0;
     int rc;
 
@@ -1842,12 +1866,13 @@ static void domain_suspend_done(libxl__egc *egc,
 
     /* Convenience aliases */
     const uint32_t domid = dss->domid;
+    libxl__domain_suspend_state2 *const dss2 = &dss->dss2;
 
-    libxl__ev_evtchn_cancel(gc, &dss->guest_evtchn);
+    libxl__ev_evtchn_cancel(gc, &dss2->guest_evtchn);
 
-    if (dss->guest_evtchn.port > 0)
+    if (dss2->guest_evtchn.port > 0)
         xc_suspend_evtchn_release(CTX->xch, CTX->xce, domid,
-                           dss->guest_evtchn.port, &dss->guest_evtchn_lockfd);
+                           dss2->guest_evtchn.port, &dss2->guest_evtchn_lockfd);
 
     if (dss->remus) {
         /*
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index bb2aaed..881f3b9 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2656,6 +2656,7 @@ _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 /*----- Domain suspend (save) state structure -----*/
 
 typedef struct libxl__domain_suspend_state libxl__domain_suspend_state;
+typedef struct libxl__domain_suspend_state2 libxl__domain_suspend_state2;
 
 typedef void libxl__domain_suspend_cb(libxl__egc*,
                                       libxl__domain_suspend_state*, int rc);
@@ -2670,6 +2671,29 @@ typedef struct libxl__logdirty_switch {
     libxl__ev_time timeout;
 } libxl__logdirty_switch;
 
+/*
+ * libxl__domain_suspend_state is for saving the guest, not
+ * just for suspending it. We need an independent API
+ * that only suspends the guest.
+ */
+struct libxl__domain_suspend_state2 {
+    /* set by caller of libxl__domain_suspend2 */
+    libxl__ao *ao;
+
+    uint32_t domid;
+    libxl__ev_evtchn guest_evtchn;
+    int guest_evtchn_lockfd;
+    int hvm;
+    const char *dm_savefile;
+    void (*callback_common_done)(libxl__egc*,
+                                 libxl__domain_suspend_state2*, int ok);
+    int save_dm;
+    int guest_responded;
+    libxl__xswait_state pvcontrol;
+    libxl__ev_xswatch guest_watch;
+    libxl__ev_time guest_timeout;
+};
+
 struct libxl__domain_suspend_state {
     /* set by caller of libxl__domain_suspend */
     libxl__ao *ao;
@@ -2682,15 +2706,9 @@ struct libxl__domain_suspend_state {
     int debug;
     const libxl_domain_remus_info *remus;
     /* private */
-    libxl__ev_evtchn guest_evtchn;
-    int guest_evtchn_lockfd;
+    libxl__domain_suspend_state2 dss2;
     int hvm;
     int xcflags;
-    int guest_responded;
-    libxl__xswait_state pvcontrol;
-    libxl__ev_xswatch guest_watch;
-    libxl__ev_time guest_timeout;
-    const char *dm_savefile;
     /* for Remus */
     struct {
         libxl__checkpoint_device_state cds;
@@ -2702,8 +2720,6 @@ struct libxl__domain_suspend_state {
     };
     libxl__save_helper_state shs;
     libxl__logdirty_switch logdirty;
-    void (*callback_common_done)(libxl__egc*,
-                                 struct libxl__domain_suspend_state*, int ok);
     /* private for libxl__domain_save_device_model */
     libxl__save_device_model_cb *save_dm_callback;
     libxl__datacopier_state save_dm_datacopier;
@@ -2975,6 +2991,9 @@ struct libxl__domain_create_state {
 
 /*----- Domain suspend (save) functions -----*/
 
+/* calls dss2->callback_common_done when done */
+_hidden void libxl__domain_suspend2(libxl__egc *egc,
+                                    libxl__domain_suspend_state2 *dss2);
 /* calls dss->callback when done */
 _hidden void libxl__domain_suspend(libxl__egc *egc,
                                    libxl__domain_suspend_state *dss);
@@ -3014,7 +3033,7 @@ _hidden void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
 
 /* Each time the dm needs to be saved, we must call suspend and then save */
 _hidden int libxl__domain_suspend_device_model(libxl__gc *gc,
-                                           libxl__domain_suspend_state *dss);
+                                           libxl__domain_suspend_state2 *dss2);
 _hidden void libxl__domain_save_device_model(libxl__egc *egc,
                                      libxl__domain_suspend_state *dss,
                                      libxl__save_device_model_cb *callback);
-- 
1.9.3

* [RFC Patch 08/25] Update libxl__domain_resume() for colo
  2014-07-18 11:38 [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
                   ` (6 preceding siblings ...)
  2014-07-18 11:38 ` [RFC Patch 07/25] Refactor domain_suspend_callback_common() Wen Congyang
@ 2014-07-18 11:38 ` Wen Congyang
  2014-07-18 11:38 ` [RFC Patch 09/25] Update libxl__domain_suspend_common_switch_qemu_logdirty() " Wen Congyang
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Wen Congyang @ 2014-07-18 11:38 UTC (permalink / raw)
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

The secondary VM runs in COLO mode, so the following steps are repeated
for every checkpoint:
1. suspend both the primary VM and the secondary VM
2. sync the state
3. resume both the primary VM and the secondary VM
In step 2 we send qemu's state each time, and the secondary's qemu must
load it before the secondary VM is resumed. libxl__domain_resume() does
not read qemu's state, so add a new parameter, read_savefile, that
controls whether qemu's state is read before resuming.

Note: qemu must also be updated to support this.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl.c          |  7 ++++---
 tools/libxl/libxl_dom.c      | 24 +++++++++++++++++++++---
 tools/libxl/libxl_internal.h |  8 ++++++--
 tools/libxl/libxl_qmp.c      | 10 ++++++++++
 4 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index fc60bb1..7ff1cb6 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -466,7 +466,8 @@ int libxl_domain_rename(libxl_ctx *ctx, uint32_t domid,
     return rc;
 }
 
-int libxl__domain_resume(libxl__gc *gc, uint32_t domid, int suspend_cancel)
+int libxl__domain_resume(libxl__gc *gc, uint32_t domid,
+                         int suspend_cancel, int read_savefile)
 {
     int rc = 0;
 
@@ -483,7 +484,7 @@ int libxl__domain_resume(libxl__gc *gc, uint32_t domid, int suspend_cancel)
     }
 
     if (type == LIBXL_DOMAIN_TYPE_HVM) {
-        rc = libxl__domain_resume_device_model(gc, domid);
+        rc = libxl__domain_resume_device_model(gc, domid, read_savefile);
         if (rc) {
             LOG(ERROR, "failed to resume device model for domain %u:%d",
                 domid, rc);
@@ -503,7 +504,7 @@ int libxl_domain_resume(libxl_ctx *ctx, uint32_t domid, int suspend_cancel,
                         const libxl_asyncop_how *ao_how)
 {
     AO_CREATE(ctx, domid, ao_how);
-    int rc = libxl__domain_resume(gc, domid, suspend_cancel);
+    int rc = libxl__domain_resume(gc, domid, suspend_cancel, 0);
     libxl__ao_complete(egc, ao, rc);
     return AO_INPROGRESS;
 }
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 035d25a..206c1dd 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1046,16 +1046,34 @@ int libxl__domain_suspend_device_model(libxl__gc *gc,
     return ret;
 }
 
-int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid)
+int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid,
+                                      int read_savefile)
 {
 
     switch (libxl__device_model_version_running(gc, domid)) {
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: {
-        libxl__qemu_traditional_cmd(gc, domid, "continue");
+        if (read_savefile)
+            libxl__qemu_traditional_cmd(gc, domid, "resume");
+        else
+            libxl__qemu_traditional_cmd(gc, domid, "continue");
         libxl__wait_for_device_model_deprecated(gc, domid, "running", NULL, NULL, NULL);
         break;
     }
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
+        if (read_savefile) {
+            char *state_file;
+            int rc;
+
+            state_file = libxl__sprintf(NOGC,
+                                        XC_DEVICE_MODEL_RESTORE_FILE".%d",
+                                        domid);
+            /* This command only restores the device state */
+            rc = libxl__qmp_restore(gc, domid, state_file);
+            free(state_file);
+            if (rc)
+                return ERROR_FAIL;
+        }
+
         if (libxl__qmp_resume(gc, domid))
             return ERROR_FAIL;
         break;
@@ -1540,7 +1558,7 @@ static void remus_device_preresume_cb(libxl__egc *egc,
 
     if (!rc) {
         /* Resumes the domain and the device model */
-        if (!libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
+        if (!libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1, 0))
             ok = 1;
     }
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 881f3b9..35227d3 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -994,12 +994,14 @@ _hidden int libxl__domain_rename(libxl__gc *gc, uint32_t domid,
 
 _hidden int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
                                      uint32_t size, void *data);
-_hidden int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid);
+_hidden int libxl__domain_resume_device_model(libxl__gc *gc,
+                                              uint32_t domid,
+                                              int read_savefile);
 
 _hidden void libxl__userdata_destroyall(libxl__gc *gc, uint32_t domid);
 
 _hidden int libxl__domain_resume(libxl__gc *gc, uint32_t domid,
-                                 int suspend_cancel);
+                                 int suspend_cancel, int read_savefile);
 
 /* returns 0 or 1, or a libxl error code */
 _hidden int libxl__domain_pvcontrol_available(libxl__gc *gc, uint32_t domid);
@@ -1585,6 +1587,8 @@ _hidden int libxl__qmp_stop(libxl__gc *gc, int domid);
 _hidden int libxl__qmp_resume(libxl__gc *gc, int domid);
 /* Save current QEMU state into fd. */
 _hidden int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename);
+/* Load QEMU device state from filename. */
+_hidden int libxl__qmp_restore(libxl__gc *gc, int domid, const char *filename);
 /* Set dirty bitmap logging status */
 _hidden int libxl__qmp_set_global_dirty_log(libxl__gc *gc, int domid, bool enable);
 _hidden int libxl__qmp_insert_cdrom(libxl__gc *gc, int domid, const libxl_device_disk *disk);
diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index 5cc56b1..60a119d 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -876,6 +876,16 @@ int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename)
                            NULL, NULL);
 }
 
+int libxl__qmp_restore(libxl__gc *gc, int domid, const char *state_file)
+{
+    libxl__json_object *args = NULL;
+
+    qmp_parameters_add_string(gc, &args, "filename", (char *)state_file);
+
+    return qmp_run_command(gc, domid, "xen-load-devices-state", args,
+                           NULL, NULL);
+}
+
 static int qmp_change(libxl__gc *gc, libxl__qmp_handler *qmp,
                       char *device, char *target, char *arg)
 {
-- 
1.9.3

* [RFC Patch 09/25] Update libxl__domain_suspend_common_switch_qemu_logdirty() for colo
  2014-07-18 11:38 [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
                   ` (7 preceding siblings ...)
  2014-07-18 11:38 ` [RFC Patch 08/25] Update libxl__domain_resume() for colo Wen Congyang
@ 2014-07-18 11:38 ` Wen Congyang
  2014-07-18 11:38 ` [RFC Patch 10/25] Introduce a new internal API libxl__domain_unpause() Wen Congyang
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Wen Congyang @ 2014-07-18 11:38 UTC (permalink / raw)
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

The secondary VM runs in COLO mode, so we need to send the secondary
VM's dirty page information to the master.
libxl__domain_suspend_common_switch_qemu_logdirty() switches qemu's
logdirty mode on or off, but it depends on libxl__domain_suspend_state
and calls libxl__xc_domain_saverestore_async_callback_done() before
it exits.

Introduce a new API, libxl__domain_common_switch_qemu_logdirty(),
which uses only libxl__logdirty_switch and calls lds->callback before
it exits.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_dom.c      | 79 +++++++++++++++++++++++++++-----------------
 tools/libxl/libxl_internal.h | 12 +++++--
 2 files changed, 59 insertions(+), 32 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 206c1dd..32db79f 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -814,7 +814,7 @@ static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
 static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*,
                             const char *watch_path, const char *event_path);
 static void switch_logdirty_done(libxl__egc *egc,
-                                 libxl__domain_suspend_state *dss, int ok);
+                                 libxl__logdirty_switch *lds, int ok);
 
 static void logdirty_init(libxl__logdirty_switch *lds)
 {
@@ -825,12 +825,10 @@ static void logdirty_init(libxl__logdirty_switch *lds)
 
 static void domain_suspend_switch_qemu_xen_traditional_logdirty
                                (int domid, unsigned enable,
-                                libxl__save_helper_state *shs)
+                                libxl__logdirty_switch *lds,
+                                libxl__egc *egc)
 {
-    libxl__egc *egc = shs->egc;
-    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
-    libxl__logdirty_switch *lds = &dss->logdirty;
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(lds->ao);
     int rc;
     xs_transaction_t t = 0;
     const char *got;
@@ -891,64 +889,85 @@ static void domain_suspend_switch_qemu_xen_traditional_logdirty
  out:
     LOG(ERROR,"logdirty switch failed (rc=%d), aborting suspend",rc);
     libxl__xs_transaction_abort(gc, &t);
-    switch_logdirty_done(egc,dss,-1);
+    switch_logdirty_done(egc,lds,-1);
 }
 
 static void domain_suspend_switch_qemu_xen_logdirty
                                (int domid, unsigned enable,
-                                libxl__save_helper_state *shs)
+                                libxl__logdirty_switch *lds,
+                                libxl__egc *egc)
 {
-    libxl__egc *egc = shs->egc;
-    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(lds->ao);
     int rc;
 
     rc = libxl__qmp_set_global_dirty_log(gc, domid, enable);
     if (!rc) {
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+        lds->callback(egc, lds, 0);
     } else {
         LOG(ERROR,"logdirty switch failed (rc=%d), aborting suspend",rc);
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, -1);
+        lds->callback(egc, lds, -1);
     }
 }
 
+static void libxl__domain_suspend_switch_qemu_logdirty_done
+                                (libxl__egc *egc,
+                                 libxl__logdirty_switch *lds,
+                                 int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(lds, *dss, logdirty);
+
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, rc);
+}
+
 void libxl__domain_suspend_common_switch_qemu_logdirty
                                (int domid, unsigned enable, void *user)
 {
     libxl__save_helper_state *shs = user;
     libxl__egc *egc = shs->egc;
     libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
-    STATE_AO_GC(dss->ao);
+
+    /* convenience aliases */
+    libxl__logdirty_switch *const lds = &dss->logdirty;
+
+    lds->callback = libxl__domain_suspend_switch_qemu_logdirty_done;
+
+    libxl__domain_common_switch_qemu_logdirty(domid, enable, lds, egc);
+}
+
+void libxl__domain_common_switch_qemu_logdirty(int domid, unsigned enable,
+                                               libxl__logdirty_switch *lds,
+                                               libxl__egc *egc)
+{
+    STATE_AO_GC(lds->ao);
 
     switch (libxl__device_model_version_running(gc, domid)) {
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
-        domain_suspend_switch_qemu_xen_traditional_logdirty(domid, enable, shs);
+        domain_suspend_switch_qemu_xen_traditional_logdirty(domid, enable,
+                                                            lds, egc);
         break;
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
-        domain_suspend_switch_qemu_xen_logdirty(domid, enable, shs);
+        domain_suspend_switch_qemu_xen_logdirty(domid, enable, lds, egc);
         break;
     default:
         LOG(ERROR,"logdirty switch failed"
             ", no valid device model version found, aborting suspend");
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, -1);
+        lds->callback(egc, lds, -1);
     }
 }
 static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
                                     const struct timeval *requested_abs)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(ev, *dss, logdirty.timeout);
-    STATE_AO_GC(dss->ao);
+    libxl__logdirty_switch *lds = CONTAINER_OF(ev, *lds, timeout);
+    STATE_AO_GC(lds->ao);
     LOG(ERROR,"logdirty switch: wait for device model timed out");
-    switch_logdirty_done(egc,dss,-1);
+    switch_logdirty_done(egc,lds,-1);
 }
 
 static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch *watch,
                             const char *watch_path, const char *event_path)
 {
-    libxl__domain_suspend_state *dss =
-        CONTAINER_OF(watch, *dss, logdirty.watch);
-    libxl__logdirty_switch *lds = &dss->logdirty;
-    STATE_AO_GC(dss->ao);
+    libxl__logdirty_switch *lds = CONTAINER_OF(watch, *lds, watch);
+    STATE_AO_GC(lds->ao);
     const char *got;
     xs_transaction_t t = 0;
     int rc;
@@ -992,24 +1011,23 @@ static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch *watch,
     libxl__xs_transaction_abort(gc, &t);
 
     if (!rc) {
-        switch_logdirty_done(egc,dss,0);
+        switch_logdirty_done(egc,lds,0);
     } else if (rc < 0) {
         LOG(ERROR,"logdirty switch: failed (rc=%d)",rc);
-        switch_logdirty_done(egc,dss,-1);
+        switch_logdirty_done(egc,lds,-1);
     }
 }
 
 static void switch_logdirty_done(libxl__egc *egc,
-                                 libxl__domain_suspend_state *dss,
+                                 libxl__logdirty_switch *lds,
                                  int broke)
 {
-    STATE_AO_GC(dss->ao);
-    libxl__logdirty_switch *lds = &dss->logdirty;
+    STATE_AO_GC(lds->ao);
 
     libxl__ev_xswatch_deregister(gc, &lds->watch);
     libxl__ev_time_deregister(gc, &lds->timeout);
 
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, broke);
+    lds->callback(egc, lds, broke);
 }
 
 /*----- callbacks, called by xc_domain_save -----*/
@@ -1676,6 +1694,7 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
     libxl__domain_suspend_state2 *dss2 = &dss->dss2;
 
     logdirty_init(&dss->logdirty);
+    dss->logdirty.ao = ao;
     libxl__xswait_init(&dss2->pvcontrol);
     libxl__ev_evtchn_init(&dss2->guest_evtchn);
     libxl__ev_xswatch_init(&dss2->guest_watch);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 35227d3..2dd157c 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2667,13 +2667,18 @@ typedef void libxl__domain_suspend_cb(libxl__egc*,
 typedef void libxl__save_device_model_cb(libxl__egc*,
                                          libxl__domain_suspend_state*, int rc);
 
-typedef struct libxl__logdirty_switch {
+typedef struct libxl__logdirty_switch libxl__logdirty_switch;
+struct libxl__logdirty_switch {
+    /* set by caller of libxl__domain_common_switch_qemu_logdirty */
+    libxl__ao *ao;
+    void (*callback)(libxl__egc *egc, libxl__logdirty_switch *lds, int rc);
+
     const char *cmd;
     const char *cmd_path;
     const char *ret_path;
     libxl__ev_xswatch watch;
     libxl__ev_time timeout;
-} libxl__logdirty_switch;
+};
 
 /*
  * libxl__domain_suspend_state is for saving guest, not
@@ -3021,6 +3026,9 @@ void libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc,
 
 _hidden void libxl__domain_suspend_common_switch_qemu_logdirty
                                (int domid, unsigned int enable, void *data);
+_hidden void libxl__domain_common_switch_qemu_logdirty
+                                (int domid, unsigned int enable,
+                                 libxl__logdirty_switch *lds, libxl__egc *egc);
 _hidden int libxl__toolstack_save(uint32_t domid, uint8_t **buf,
         uint32_t *len, void *data);
 
-- 
1.9.3

* [RFC Patch 10/25] Introduce a new internal API libxl__domain_unpause()
  2014-07-18 11:38 [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
                   ` (8 preceding siblings ...)
  2014-07-18 11:38 ` [RFC Patch 09/25] Update libxl__domain_suspend_common_switch_qemu_logdirty() " Wen Congyang
@ 2014-07-18 11:38 ` Wen Congyang
  2014-07-18 11:38 ` [RFC Patch 11/25] Update libxl__domain_unpause() to support qemu-xen Wen Congyang
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Wen Congyang @ 2014-07-18 11:38 UTC (permalink / raw)
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

The guest is left paused after libxl_domain_create_restore(), but in
COLO mode the secondary VM must be running, so we need to unpause it.
The existing libxl_domain_unpause() is a public API rather than an
internal one, so introduce a new internal API, libxl__domain_unpause(),
for this purpose.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl.c          | 21 +++++++++++++++------
 tools/libxl/libxl_internal.h |  1 +
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 7ff1cb6..86958fe 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -975,9 +975,8 @@ out:
     return AO_INPROGRESS;
 }
 
-int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
+int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
 {
-    GC_INIT(ctx);
     char *path;
     char *state;
     int ret, rc = 0;
@@ -997,12 +996,22 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
                                          NULL, NULL, NULL);
         }
     }
-    ret = xc_domain_unpause(ctx->xch, domid);
-    if (ret<0) {
-        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unpausing domain %d", domid);
+
+    ret = xc_domain_unpause(CTX->xch, domid);
+    if (ret < 0) {
+        LOGE(ERROR, "unpausing domain %d", domid);
         rc = ERROR_FAIL;
     }
- out:
+
+out:
+    return rc;
+}
+
+int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
+{
+    GC_INIT(ctx);
+    int rc = libxl__domain_unpause(gc, domid);
+
     GC_FREE;
     return rc;
 }
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 2dd157c..bfc9513 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1002,6 +1002,7 @@ _hidden void libxl__userdata_destroyall(libxl__gc *gc, uint32_t domid);
 
 _hidden int libxl__domain_resume(libxl__gc *gc, uint32_t domid,
                                  int suspend_cancel, int read_savefile);
+_hidden int libxl__domain_unpause(libxl__gc *gc, uint32_t domid);
 
 /* returns 0 or 1, or a libxl error code */
 _hidden int libxl__domain_pvcontrol_available(libxl__gc *gc, uint32_t domid);
-- 
1.9.3


* [RFC Patch 11/25] Update libxl__domain_unpause() to support qemu-xen
From: Wen Congyang @ 2014-07-18 11:38 UTC (permalink / raw)
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

Currently, libxl__domain_unpause() only supports
qemu-xen-traditional. Update it to support qemu-xen.
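
A standalone sketch of the per-device-model dispatch, assuming
hypothetical names (enum dm_version, unpause_device_model() and the two
counters are illustrative only, not libxl code). Each case ends in its
own break or return, so the qemu-xen-traditional path does not fall
through into the QMP resume path.

```c
#include <assert.h>

/* Hypothetical enum mirroring the two device-model flavours. */
enum dm_version { DM_QEMU_TRADITIONAL, DM_QEMU_XEN, DM_UNKNOWN };

/* Counters standing in for the side effects of the real helpers. */
static int traditional_continues, qmp_resumes;

static int unpause_device_model(enum dm_version v, int dm_paused, int qmp_ok)
{
    switch (v) {
    case DM_QEMU_TRADITIONAL:
        /* xenstore state is "paused": send "continue" and wait for "running" */
        if (dm_paused)
            traditional_continues++;
        break;                  /* do not fall through into the QMP case */
    case DM_QEMU_XEN:
        qmp_resumes++;          /* stands in for libxl__qmp_resume() */
        if (!qmp_ok)
            return -1;
        break;
    default:
        return -1;              /* unknown device model version */
    }
    return 0;
}
```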

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl.c          | 13 +++++--------
 tools/libxl/libxl_dom.c      | 25 +++++++++++++++++++++++++
 tools/libxl/libxl_internal.h |  2 ++
 3 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 86958fe..3fd31ed 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -977,8 +977,6 @@ out:
 
 int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
 {
-    char *path;
-    char *state;
     int ret, rc = 0;
 
     libxl_domain_type type = libxl__domain_type(gc, domid);
@@ -988,12 +986,11 @@ int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
     }
 
     if (type == LIBXL_DOMAIN_TYPE_HVM) {
-        path = libxl__sprintf(gc, "/local/domain/0/device-model/%d/state", domid);
-        state = libxl__xs_read(gc, XBT_NULL, path);
-        if (state != NULL && !strcmp(state, "paused")) {
-            libxl__qemu_traditional_cmd(gc, domid, "continue");
-            libxl__wait_for_device_model_deprecated(gc, domid, "running",
-                                         NULL, NULL, NULL);
+        rc = libxl__domain_unpause_device_model(gc, domid);
+        if (rc < 0) {
+            LOG(ERROR, "failed to unpause device model for domain %u:%d",
+                domid, rc);
+            goto out;
         }
     }
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 32db79f..d4bf8b3 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1942,6 +1942,31 @@ static void libxl__remus_teardown_done(libxl__egc *egc,
     dss->callback(egc, dss, rc);
 }
 
+int libxl__domain_unpause_device_model(libxl__gc *gc, uint32_t domid)
+{
+    char *path;
+    char *state;
+
+    switch (libxl__device_model_version_running(gc, domid)) {
+    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
+        path = libxl__sprintf(gc, "/local/domain/0/device-model/%d/state", domid);
+        state = libxl__xs_read(gc, XBT_NULL, path);
+        if (state != NULL && !strcmp(state, "paused")) {
+            libxl__qemu_traditional_cmd(gc, domid, "continue");
+            libxl__wait_for_device_model_deprecated(gc, domid, "running",
+                                         NULL, NULL, NULL);
+        }
+    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
+        if (libxl__qmp_resume(gc, domid))
+            return ERROR_FAIL;
+        break;
+    default:
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
 /*==================== Miscellaneous ====================*/
 
 char *libxl__uuid2string(libxl__gc *gc, const libxl_uuid uuid)
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index bfc9513..67c73a2 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -997,6 +997,8 @@ _hidden int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
 _hidden int libxl__domain_resume_device_model(libxl__gc *gc,
                                               uint32_t domid,
                                               int read_savefile);
+_hidden int libxl__domain_unpause_device_model(libxl__gc *gc,
+                                               uint32_t domid);
 
 _hidden void libxl__userdata_destroyall(libxl__gc *gc, uint32_t domid);
 
-- 
1.9.3


* [RFC Patch 12/25] support to resume uncooperative HVM guests
From: Wen Congyang @ 2014-07-18 11:38 UTC (permalink / raw)
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

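xc_domain_resume_any() currently refuses to resume HVM guests. Issue
XEN_DOMCTL_resumedomain for them instead.

For PVHVM, the suspend hypercall's return code stays 0, so the guest
resumes in a new domain context.

For plain HVM, nothing else needs to be done.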

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxc/xc_resume.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
index e67bebd..b862ce3 100644
--- a/tools/libxc/xc_resume.c
+++ b/tools/libxc/xc_resume.c
@@ -109,6 +109,21 @@ static int xc_domain_resume_cooperative(xc_interface *xch, uint32_t domid)
     return do_domctl(xch, &domctl);
 }
 
+static int xc_domain_resume_hvm(xc_interface *xch, uint32_t domid)
+{
+    DECLARE_DOMCTL;
+
+    /*
+     * If it is PVHVM, the hypercall return code is 0, and resume
+     * it in a new domain context.
+     *
+     * If it is a HVM, do nothing.
+     */
+    domctl.cmd = XEN_DOMCTL_resumedomain;
+    domctl.domain = domid;
+    return do_domctl(xch, &domctl);
+}
+
 static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
 {
     DECLARE_DOMCTL;
@@ -138,10 +153,7 @@ static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
      */
 #if defined(__i386__) || defined(__x86_64__)
     if ( info.hvm )
-    {
-        ERROR("Cannot resume uncooperative HVM guests");
-        return rc;
-    }
+        return xc_domain_resume_hvm(xch, domid);
 
     if ( xc_domain_get_guest_width(xch, domid, &dinfo->guest_width) != 0 )
     {
-- 
1.9.3


* [RFC Patch 13/25] update datecopier to support sending data only
From: Wen Congyang @ 2014-07-18 11:38 UTC (permalink / raw)
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

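The datacopier reads data from one fd and writes it to another. If we
already have data that only needs to be sent over the network, the
datacopier cannot be used. Update it to support this case: when readfd
is negative, only the write side is registered.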
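
A minimal sketch of the send-only rule using plain poll(2) descriptors.
struct demo_copier and demo_copier_pollfds() are hypothetical; the sketch
assumes the data to send is queued by other means and only models which
fds end up being watched.

```c
#include <assert.h>
#include <poll.h>

/* Hypothetical, simplified copier: readfd < 0 means "send-only". */
struct demo_copier {
    int readfd;   /* -1: nothing to read, only flush queued data */
    int writefd;
};

/* Fill pfds (capacity >= 2) with the descriptors to watch; returns the count. */
static int demo_copier_pollfds(const struct demo_copier *dc, struct pollfd *pfds)
{
    int n = 0;

    if (dc->readfd >= 0) {      /* mirrors the new `if (dc->readfd >= 0)` guard */
        pfds[n].fd = dc->readfd;
        pfds[n].events = POLLIN;
        pfds[n].revents = 0;
        n++;
    }
    pfds[n].fd = dc->writefd;   /* the write side is always registered */
    pfds[n].events = POLLOUT;
    pfds[n].revents = 0;
    n++;
    return n;
}
```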

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_aoutils.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index b10d2e1..3e0c0ae 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -309,9 +309,11 @@ int libxl__datacopier_start(libxl__datacopier_state *dc)
 
     libxl__datacopier_init(dc);
 
-    rc = libxl__ev_fd_register(gc, &dc->toread, datacopier_readable,
-                               dc->readfd, POLLIN);
-    if (rc) goto out;
+    if (dc->readfd >= 0) {
+        rc = libxl__ev_fd_register(gc, &dc->toread, datacopier_readable,
+                                   dc->readfd, POLLIN);
+        if (rc) goto out;
+    }
 
     rc = libxl__ev_fd_register(gc, &dc->towrite, datacopier_writable,
                                dc->writefd, POLLOUT);
-- 
1.9.3


* [RFC Patch 14/25] introduce a new API to aync read data from fd
From: Wen Congyang @ 2014-07-18 11:38 UTC (permalink / raw)
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

In COLO mode, we need to read data from an fd asynchronously.
Introduce a new internal API, libxl__datareader_start(), so that this
code is not duplicated.
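
A minimal sketch of the read loop that runs on each POLLIN event,
assuming a non-blocking fd. The demo_datareader_readable() name and the
DR_* result codes are hypothetical. It retries on EINTR, stops on
EAGAIN/EWOULDBLOCK to wait for the next event, and finishes on EOF or
once readsize bytes have been stored.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch only. */
enum { DR_AGAIN = 1, DR_DONE = 0, DR_ERROR = -1 };

/* Drain the non-blocking fd into buf until it would block, EOF is
 * seen, or readsize bytes are stored; *used tracks progress. */
static int demo_datareader_readable(int fd, char *buf, size_t readsize,
                                    size_t *used)
{
    for (;;) {
        if (*used == readsize)
            return DR_DONE;         /* buffer full: all requested data read */
        ssize_t r = read(fd, buf + *used, readsize - *used);
        if (r < 0) {
            if (errno == EINTR)
                continue;           /* interrupted: retry */
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return DR_AGAIN;    /* wait for the next POLLIN */
            return DR_ERROR;        /* caller reports errno */
        }
        if (r == 0)
            return DR_DONE;         /* EOF: *used bytes are valid */
        *used += (size_t)r;
    }
}
```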

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_aoutils.c  | 73 ++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h | 30 ++++++++++++++++++
 2 files changed, 103 insertions(+)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index 3e0c0ae..2d36403 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -542,3 +542,76 @@ bool libxl__async_exec_inuse(const libxl__async_exec_state *aes)
     assert(time_inuse == child_inuse);
     return child_inuse;
 }
+
+
+/*----- data reader -----*/
+
+static void libxl__datareader_init(libxl__datareader_state *drs)
+{
+    assert(drs->ao);
+    libxl__ev_fd_init(&drs->toread);
+    drs->used = 0;
+}
+
+static void libxl__datareader_kill(libxl__datareader_state *drs)
+{
+    STATE_AO_GC(drs->ao);
+
+    libxl__ev_fd_deregister(gc, &drs->toread);
+}
+
+static void datareader_callback(libxl__egc *egc, libxl__datareader_state *drs,
+                                ssize_t size, int errnoval)
+{
+    libxl__datareader_kill(drs);
+    drs->callback(egc, drs, size, errnoval);
+}
+
+static void datareader_readable(libxl__egc *egc, libxl__ev_fd *ev,
+                                int fd, short events, short revents)
+{
+    libxl__datareader_state *drs = CONTAINER_OF(ev, *drs, toread);
+    STATE_AO_GC(drs->ao);
+    int r;
+
+    if (revents & ~POLLIN) {
+        LOG(ERROR, "unexpected poll event 0x%x (should be POLLIN) on %s",
+            revents, drs->readwhat);
+        datareader_callback(egc, drs, -1, 0);
+        return;
+    }
+
+    assert(revents & POLLIN);
+    while (1) {
+        r = read(ev->fd, drs->buf + drs->used, drs->readsize - drs->used);
+        if (r < 0) {
+            if (errno == EINTR)
+                continue;
+            if (errno == EWOULDBLOCK)
+                break;
+            LOGE(ERROR, "error reading %s",
+                 drs->readwhat);
+            datareader_callback(egc, drs, 0, errno);
+            return;
+        }
+        if (r == 0) {
+            datareader_callback(egc, drs, drs->used, 0);
+            break;
+        }
+
+        drs->used += r;
+    }
+}
+
+int libxl__datareader_start(libxl__datareader_state *drs)
+{
+    int rc;
+    STATE_AO_GC(drs->ao);
+
+    libxl__datareader_init(drs);
+
+    rc = libxl__ev_fd_register(gc, &drs->toread, datareader_readable,
+                               drs->readfd, POLLIN);
+
+    return rc;
+}
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 67c73a2..5914953 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2097,6 +2097,36 @@ void libxl__async_exec_init(libxl__async_exec_state *aes);
 int libxl__async_exec_start(libxl__gc *gc, libxl__async_exec_state *aes);
 bool libxl__async_exec_inuse(const libxl__async_exec_state *aes);
 
+/*----- datareader: read data from one fd to buffer -----*/
+
+typedef struct libxl__datareader_state libxl__datareader_state;
+
+/*
+ * real_size>=1 means all data was read
+ * real_size==0 means failure happened when reading, errnoval is valid, logged
+ * real_size==-1 means some other internal failure, errnoval not valid, logged
+ * In all cases reader is killed before calling this callback
+ */
+typedef void libxl__datareader_callback(libxl__egc *egc,
+     libxl__datareader_state *drs, ssize_t real_size, int errnoval);
+
+struct libxl__datareader_state {
+    /* caller must fill these in, and they must all remain valid */
+    libxl__ao *ao;
+    int readfd;
+    ssize_t readsize;
+    /* for error msgs */
+    const char *readwhat;
+    libxl__datareader_callback *callback;
+    /* It must contain enough space to store readsize bytes */
+    void *buf;
+    /* remaining fields are private to datareader */
+    libxl__ev_fd toread;
+    ssize_t used;
+};
+
+_hidden int libxl__datareader_start(libxl__datareader_state *drs);
+
 /*----- device addition/removal -----*/
 
 typedef struct libxl__ao_device libxl__ao_device;
-- 
1.9.3


* [RFC Patch 15/25] Update libxl_save_msgs_gen.pl to support return data from xl to xc
From: Wen Congyang @ 2014-07-18 11:39 UTC (permalink / raw)
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

Currently, all callbacks return either an integer value or void, so a
callback cannot return arbitrary data to xc. Update
libxl_save_msgs_gen.pl to support this case.
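
The reply format added by this patch is a 64-bit length followed by the
payload. A minimal sketch of both ends over a pipe, assuming
hypothetical demo_* helpers in place of libxl_write_exactly() and
read_exactly(). The length is sent in host byte order, which is enough
because libxl and the save helper run on the same host.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write exactly len bytes, retrying on EINTR and short writes. */
static int demo_write_exactly(int fd, const void *data, size_t len)
{
    const char *p = data;
    while (len > 0) {
        ssize_t r = write(fd, p, len);
        if (r < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += r;
        len -= (size_t)r;
    }
    return 0;
}

/* Read exactly len bytes; EOF before that is an error. */
static int demo_read_exactly(int fd, void *data, size_t len)
{
    char *p = data;
    while (len > 0) {
        ssize_t r = read(fd, p, len);
        if (r < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (r == 0) return -1;          /* unexpected EOF */
        p += r;
        len -= (size_t)r;
    }
    return 0;
}

/* Sender (cf. libxl__srm_callout_sendreply_data): length, then payload. */
static int demo_send_reply_data(int fd, const void *data, uint64_t size)
{
    if (demo_write_exactly(fd, &size, sizeof size)) return -1;
    return demo_write_exactly(fd, data, (size_t)size);
}

/* Receiver (cf. helper_getreply_data): returns a malloc'd buffer or NULL. */
static uint8_t *demo_get_reply_data(int fd, uint64_t *size_out)
{
    uint64_t size;
    if (demo_read_exactly(fd, &size, sizeof size)) return NULL;
    uint8_t *data = malloc(size ? (size_t)size : 1);
    if (!data) return NULL;
    if (demo_read_exactly(fd, data, (size_t)size)) { free(data); return NULL; }
    *size_out = size;
    return data;
}
```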

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_internal.h       |  3 ++
 tools/libxl/libxl_save_callout.c   | 31 ++++++++++++++++++
 tools/libxl/libxl_save_helper.c    | 17 ++++++++++
 tools/libxl/libxl_save_msgs_gen.pl | 65 ++++++++++++++++++++++++++++++++++----
 4 files changed, 109 insertions(+), 7 deletions(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 5914953..9ff93e6 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3055,6 +3055,9 @@ _hidden void libxl__xc_domain_save_done(libxl__egc*, void *dss_void,
  * When they are ready to indicate completion, they call this. */
 void libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc,
                            libxl__save_helper_state *shs, int return_value);
+void libxl__xc_domain_saverestore_async_callback_done_with_data(libxl__egc *egc,
+                           libxl__save_helper_state *shs,
+                           const void *data, uint64_t size);
 
 
 _hidden void libxl__domain_suspend_common_switch_qemu_logdirty
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 1c9f806..0c09d94 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -145,6 +145,15 @@ void libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc,
     shs->egc = 0;
 }
 
+void libxl__xc_domain_saverestore_async_callback_done_with_data(libxl__egc *egc,
+                           libxl__save_helper_state *shs,
+                           const void *data, uint64_t size)
+{
+    shs->egc = egc;
+    libxl__srm_callout_sendreply_data(data, size, shs);
+    shs->egc = 0;
+}
+
 /*----- helper execution -----*/
 
 static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
@@ -370,6 +379,28 @@ void libxl__srm_callout_sendreply(int r, void *user)
         helper_failed(egc, shs, ERROR_FAIL);
 }
 
+void libxl__srm_callout_sendreply_data(const void *data, uint64_t size, void *user)
+{
+    libxl__save_helper_state *shs = user;
+    libxl__egc *egc = shs->egc;
+    STATE_AO_GC(shs->ao);
+    int errnoval;
+
+    errnoval = libxl_write_exactly(CTX, libxl__carefd_fd(shs->pipes[0]),
+                                   &size, sizeof(size), shs->stdin_what,
+                                   "callback return data length");
+    if (errnoval)
+        goto out;
+
+    errnoval = libxl_write_exactly(CTX, libxl__carefd_fd(shs->pipes[0]),
+                                   data, size, shs->stdin_what,
+                                   "callback return data");
+
+out:
+    if (errnoval)
+        helper_failed(egc, shs, ERROR_FAIL);
+}
+
 void libxl__srm_callout_callback_log(uint32_t level, uint32_t errnoval,
                   const char *context, const char *formatted, void *user)
 {
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index 74826a1..44c5807 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -155,6 +155,23 @@ int helper_getreply(void *user)
     return v;
 }
 
+uint8_t *helper_getreply_data(void *user)
+{
+    uint64_t size;
+    int r = read_exactly(0, &size, sizeof(size));
+    uint8_t *data;
+
+    if (r <= 0)
+        exit(-2);
+
+    data = helper_allocbuf(size, user);
+    r = read_exactly(0, data, size);
+    if (r <= 0)
+        exit(-2);
+
+    return data;
+}
+
 /*----- other callbacks -----*/
 
 static int toolstack_save_fd;
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index 6b4b65e..41ee000 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -15,6 +15,7 @@ our @msgs = (
     #         and its null-ness needs to be passed through to the helper's xc
     #   W  - needs a return value; callback is synchronous
     #   A  - needs a return value; callback is asynchronous
+    #   B  - return value is a pointer
     [  1, 'sr',     "log",                   [qw(uint32_t level
                                                  uint32_t errnoval
                                                  STRING context
@@ -99,23 +100,28 @@ our $libxl = "libxl__srm";
 our $callback = "${libxl}_callout_callback";
 our $receiveds = "${libxl}_callout_received";
 our $sendreply = "${libxl}_callout_sendreply";
+our $sendreply_data = "${libxl}_callout_sendreply_data";
 our $getcallbacks = "${libxl}_callout_get_callbacks";
 our $enumcallbacks = "${libxl}_callout_enumcallbacks";
 sub cbtype ($) { "${libxl}_".$_[0]."_autogen_callbacks"; };
 
 f_decl($sendreply, 'callout', 'void', "(int r, void *user)");
+f_decl($sendreply_data, 'callout', 'void',
+       "(const void *data, uint64_t size, void *user)");
 
 our $helper = "helper";
 our $encode = "${helper}_stub";
 our $allocbuf = "${helper}_allocbuf";
 our $transmit = "${helper}_transmitmsg";
 our $getreply = "${helper}_getreply";
+our $getreply_data = "${helper}_getreply_data";
 our $setcallbacks = "${helper}_setcallbacks";
 
 f_decl($allocbuf, 'helper', 'unsigned char *', '(int len, void *user)');
 f_decl($transmit, 'helper', 'void',
        '(unsigned char *msg_freed, int len, void *user)');
 f_decl($getreply, 'helper', 'int', '(void *user)');
+f_decl($getreply_data, 'helper', 'uint8_t *', '(void *user)');
 
 sub typeid ($) { my ($t) = @_; $t =~ s/\W/_/; return $t; };
 
@@ -259,12 +265,36 @@ foreach my $msginfo (@msgs) {
 
     $f_more_sr->("    case $msgnum: { /* $name */\n");
     if ($flags =~ m/W/) {
-        $f_more_sr->("        int r;\n");
+        if ($flags =~ m/B/) {
+            $f_more_sr->("        uint8_t *data;\n".
+                         "        uint64_t size;\n");
+        } else {
+            $f_more_sr->("        int r;\n");
+        }
     }
 
-    my $c_rtype_helper = $flags =~ m/[WA]/ ? 'int' : 'void';
-    my $c_rtype_callout = $flags =~ m/W/ ? 'int' : 'void';
+    my $c_rtype_helper;
+    if ($flags =~ m/[WA]/) {
+        if ($flags =~ m/B/) {
+            $c_rtype_helper = 'uint8_t *'
+        } else {
+            $c_rtype_helper = 'int'
+        }
+    } else {
+        $c_rtype_helper = 'void';
+    }
+    my $c_rtype_callout;
+    if ($flags =~ m/W/) {
+        if ($flags =~ m/B/) {
+            $c_rtype_callout = 'uint8_t *';
+        } else {
+            $c_rtype_callout = 'int';
+        }
+    } else {
+        $c_rtype_callout = 'void';
+    }
     my $c_decl = '(';
+    my $c_helper_decl = '';
     my $c_callback_args = '';
 
     f_more("${encode}_$name",
@@ -305,7 +335,15 @@ END_ALWAYS
         f_more("${encode}_$name", "	${typeid}_put(buf, &len, $c_args);\n");
     }
     $f_more_sr->($c_recv);
+    $c_helper_decl = $c_decl;
+    if ($flags =~ m/W/ and $flags =~ m/B/) {
+        $c_decl .= "uint64_t *size, "
+    }
     $c_decl .= "void *user)";
+    $c_helper_decl .= "void *user)";
+    if ($flags =~ m/W/ and $flags =~ m/B/) {
+        $c_callback_args .= "&size, "
+    }
     $c_callback_args .= "user";
 
     $f_more_sr->("        if (msg != endmsg) return 0;\n");
@@ -326,10 +364,12 @@ END_ALWAYS
     my $c_make_callback = "$c_callback($c_callback_args)";
     if ($flags !~ m/W/) {
 	$f_more_sr->("        $c_make_callback;\n");
+    } elsif ($flags =~ m/B/) {
+        $f_more_sr->("        data = $c_make_callback;\n".
+                     "        $sendreply_data(data, size, user);\n");
     } else {
         $f_more_sr->("        r = $c_make_callback;\n".
                      "        $sendreply(r, user);\n");
-	f_decl($sendreply, 'callout', 'void', '(int r, void *user)');
     }
     if ($flags =~ m/x/) {
         my $c_v = "(1u<<$msgnum)";
@@ -340,7 +380,7 @@ END_ALWAYS
     }
     $f_more_sr->("        return 1;\n    }\n\n");
     f_decl("${callback}_$name", 'callout', $c_rtype_callout, $c_decl);
-    f_decl("${encode}_$name", 'helper', $c_rtype_helper, $c_decl);
+    f_decl("${encode}_$name", 'helper', $c_rtype_helper, $c_helper_decl);
     f_more("${encode}_$name",
 "        if (buf) break;
         buf = ${helper}_allocbuf(len, user);
@@ -352,12 +392,23 @@ END_ALWAYS
     ${transmit}(buf, len, user);
 ");
     if ($flags =~ m/[WA]/) {
-	f_more("${encode}_$name",
-               (<<END_ALWAYS.($debug ? <<END_DEBUG : '').<<END_ALWAYS));
+        if ($flags =~ m/B/) {
+            f_more("${encode}_$name",
+                   (<<END_ALWAYS.($debug ? <<END_DEBUG : '')));
+    uint8_t *r = ${helper}_getreply_data(user);
+END_ALWAYS
+    fprintf(stderr,"libxl-save-helper: $name got reply data\\n");
+END_DEBUG
+        } else {
+            f_more("${encode}_$name",
+                   (<<END_ALWAYS.($debug ? <<END_DEBUG : '')));
     int r = ${helper}_getreply(user);
 END_ALWAYS
     fprintf(stderr,"libxl-save-helper: $name got reply %d\\n",r);
 END_DEBUG
+    }
+
+    f_more("${encode}_$name", (<<END_ALWAYS));
     return r;
 END_ALWAYS
     }
-- 
1.9.3


* [RFC Patch 16/25] Allow slave sends data to master
From: Wen Congyang @ 2014-07-18 11:39 UTC (permalink / raw)
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

In COLO mode, the slave needs to send data to the master, but io_fd can
only be written by the master and read by the slave. Save recv_fd in
domain_suspend_state, and send_fd in domain_create_state.
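
A minimal sketch of how the restore side chooses its back-channel,
mirroring the check added to libxl_domain_create_restore(): send_fd is
honoured only for COLO streams, and every other stream type gets -1.
The enum values follow the new libxl_checkpointed_stream IDL entry; the
demo_* struct and function names are hypothetical.

```c
#include <assert.h>

/* Mirrors the libxl_checkpointed_stream enumeration values from the IDL. */
enum checkpointed_stream { STREAM_NONE = 0, STREAM_REMUS = 1, STREAM_COLO = 2 };

/* Hypothetical subset of libxl_domain_restore_params. */
struct demo_restore_params {
    enum checkpointed_stream checkpointed_stream;
    int send_fd;                /* -1 means none */
};

/* Only a COLO slave writes back to the master; others get no back-channel. */
static int demo_effective_send_fd(const struct demo_restore_params *p)
{
    return p->checkpointed_stream == STREAM_COLO ? p->send_fd : -1;
}
```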

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl.c          |  2 +-
 tools/libxl/libxl.h          |  3 ++-
 tools/libxl/libxl_create.c   | 14 ++++++++++----
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_types.idl  |  7 +++++++
 tools/libxl/xl_cmdimpl.c     |  7 +++++++
 6 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 3fd31ed..6e6781e 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -815,7 +815,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     dss->callback = remus_failover_cb;
     dss->domid = domid;
     dss->fd = send_fd;
-    /* TODO do something with recv_fd */
+    dss->recv_fd = recv_fd;
     dss->type = type;
     dss->live = 1;
     dss->debug = 0;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 81905b3..8274b9d 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -813,7 +813,8 @@ int static inline libxl_domain_create_restore_0x040200(
     LIBXL_EXTERNAL_CALLERS_ONLY
 {
     libxl_domain_restore_params params;
-    params.checkpointed_stream = 0;
+    params.checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_NONE;
+    params.send_fd = -1;
 
     return libxl_domain_create_restore(
         ctx, d_config, domid, restore_fd, &params, ao_how, aop_console_how);
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 0686f96..c277fd4 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1393,8 +1393,8 @@ static void domain_create_cb(libxl__egc *egc,
                              int rc, uint32_t domid);
 
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
-                            uint32_t *domid,
-                            int restore_fd, int checkpointed_stream,
+                            uint32_t *domid, int restore_fd,
+                            int send_fd, int checkpointed_stream,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
@@ -1405,6 +1405,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     cdcs->dcs.ao = ao;
     cdcs->dcs.guest_config = d_config;
     cdcs->dcs.restore_fd = restore_fd;
+    cdcs->dcs.send_fd = send_fd;
     cdcs->dcs.callback = domain_create_cb;
     cdcs->dcs.checkpointed_stream = checkpointed_stream;
     libxl__ao_progress_gethow(&cdcs->dcs.aop_console_how, aop_console_how);
@@ -1433,7 +1434,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
-    return do_domain_create(ctx, d_config, domid, -1, 0,
+    return do_domain_create(ctx, d_config, domid, -1, -1, 0,
                             ao_how, aop_console_how);
 }
 
@@ -1443,7 +1444,12 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
 {
-    return do_domain_create(ctx, d_config, domid, restore_fd,
+    int send_fd = -1;
+
+    if (params->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO)
+        send_fd = params->send_fd;
+
+    return do_domain_create(ctx, d_config, domid, restore_fd, send_fd,
                             params->checkpointed_stream, ao_how, aop_console_how);
 }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 9ff93e6..92e0801 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2743,6 +2743,7 @@ struct libxl__domain_suspend_state {
 
     uint32_t domid;
     int fd;
+    int recv_fd;
     libxl_domain_type type;
     int live;
     int debug;
@@ -3015,6 +3016,7 @@ struct libxl__domain_create_state {
     libxl__ao *ao;
     libxl_domain_config *guest_config;
     int restore_fd;
+    int send_fd;
     libxl__domain_create_cb *callback;
     libxl_asyncprogress_how aop_console_how;
     /* private to domain_create */
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index dc9f78e..1e1a62e 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -177,6 +177,12 @@ libxl_vendor_device = Enumeration("vendor_device", [
     (0, "NONE"),
     (1, "XENSERVER"),
     ])
+
+libxl_checkpointed_stream = Enumeration("checkpointed_stream", [
+    (0, "NONE"),
+    (1, "REMUS"),
+    (2, "COLO"),
+    ], init_val = 0)
 #
 # Complex libxl types
 #
@@ -303,6 +309,7 @@ libxl_domain_create_info = Struct("domain_create_info",[
 
 libxl_domain_restore_params = Struct("domain_restore_params", [
     ("checkpointed_stream", integer),
+    ("send_fd", integer),
     ])
 
 libxl_domain_sched_params = Struct("domain_sched_params",[
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 167b65b..22b7964 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -151,6 +151,7 @@ struct domain_create {
     const char *extra_config; /* extra config string */
     const char *restore_file;
     int migrate_fd; /* -1 means none */
+    int send_fd; /* -1 means none */
     char **migration_domname_r; /* from malloc */
 };
 
@@ -1991,6 +1992,7 @@ static uint32_t create_domain(struct domain_create *dom_info)
     void *config_data = 0;
     int config_len = 0;
     int restore_fd = -1;
+    int send_fd = -1;
     const libxl_asyncprogress_how *autoconnect_console_how;
     struct save_file_header hdr;
 
@@ -2007,6 +2009,7 @@ static uint32_t create_domain(struct domain_create *dom_info)
         if (migrate_fd >= 0) {
             restore_source = "<incoming migration stream>";
             restore_fd = migrate_fd;
+            send_fd = dom_info->send_fd;
         } else {
             restore_source = restore_file;
             restore_fd = open(restore_file, O_RDONLY);
@@ -2168,6 +2171,7 @@ start:
     if ( restoring ) {
         libxl_domain_restore_params params;
         params.checkpointed_stream = dom_info->checkpointed_stream;
+        params.send_fd = send_fd;
         ret = libxl_domain_create_restore(ctx, &d_config,
                                           &domid, restore_fd,
                                           &params,
@@ -3703,6 +3707,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     dom_info.monitor = monitor;
     dom_info.paused = 1;
     dom_info.migrate_fd = recv_fd;
+    dom_info.send_fd = send_fd;
     dom_info.migration_domname_r = &migration_domname;
     dom_info.checkpointed_stream = remus;
 
@@ -3873,6 +3878,7 @@ int main_restore(int argc, char **argv)
     dom_info.config_file = config_file;
     dom_info.restore_file = checkpoint_file;
     dom_info.migrate_fd = -1;
+    dom_info.send_fd = -1;
     dom_info.vnc = vnc;
     dom_info.vncautopass = vncautopass;
     dom_info.console_autoconnect = console_autoconnect;
@@ -4312,6 +4318,7 @@ int main_create(int argc, char **argv)
     dom_info.config_file = filename;
     dom_info.extra_config = extra_config;
     dom_info.migrate_fd = -1;
+    dom_info.send_fd = -1;
     dom_info.vnc = vnc;
     dom_info.vncautopass = vncautopass;
     dom_info.console_autoconnect = console_autoconnect;
-- 
1.9.3


* [RFC Patch 17/25] secondary vm suspend/resume/checkpoint code
  2014-07-18 11:38 [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
                   ` (15 preceding siblings ...)
  2014-07-18 11:39 ` [RFC Patch 16/25] Allow slave sends data to master Wen Congyang
@ 2014-07-18 11:39 ` Wen Congyang
  2014-07-18 11:39 ` [RFC Patch 18/25] primary vm suspend/get_dirty_pfn/resume/checkpoint code Wen Congyang
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Wen Congyang @ 2014-07-18 11:39 UTC (permalink / raw)
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

The secondary vm runs in colo mode, so it repeats the following steps
in a loop:
1. Resume the secondary vm
   a. Send LIBXL_COLO_SVM_READY to the master
   b. On the first resume, call libxl__xc_domain_restore_done() to build
      the secondary vm, and enable the secondary vm's logdirty.
      Otherwise, call libxl__domain_resume() to resume the secondary vm.
   c. Send LIBXL_COLO_SVM_RESUMED to the master
2. Wait for a new checkpoint
   a. Read LIBXL_COLO_NEW_CHECKPOINT from the master
3. Suspend the secondary vm
   a. Suspend the secondary vm
   b. Get the secondary vm's dirty page information
   c. Send LIBXL_COLO_SVM_SUSPENDED to the master
   d. Send the secondary vm's dirty page information to the master
      (count + pfn list)
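
The loop above can be modelled as a small, self-contained C sketch. It only records the control bytes the secondary would exchange over send_fd and recv_fd; struct colo_trace, put() and colo_secondary_model() are hypothetical names used for illustration, not libxl code, and no real I/O or domain operation takes place. The enum values are the ones this patch defines in libxl_colo.h.

```c
#include <assert.h>
#include <stdint.h>

/* Section values as defined in libxl_colo.h by this patch. */
enum {
    LIBXL_COLO_NEW_CHECKPOINT = 1,
    LIBXL_COLO_SVM_SUSPENDED,
    LIBXL_COLO_SVM_READY,
    LIBXL_COLO_SVM_RESUMED,
};

/* Hypothetical record of the control bytes seen by the secondary. */
struct colo_trace {
    uint8_t to_master[32];   /* bytes the secondary writes to send_fd */
    int n_out;
    uint8_t from_master[32]; /* bytes the secondary reads from recv_fd */
    int n_in;
    int builds;              /* first resume: build the vm, enable logdirty */
    int resumes;             /* later resumes: plain domain resume */
};

static void put(uint8_t *buf, int *n, int cap, uint8_t section)
{
    assert(*n < cap);
    buf[(*n)++] = section;
}

/* Model 'iterations' rounds of resume -> wait -> suspend on the secondary. */
static void colo_secondary_model(struct colo_trace *t, int iterations)
{
    const int cap = (int)sizeof(t->to_master);

    for (int i = 0; i < iterations; i++) {
        /* 1. Resume: READY, build or resume the vm, then RESUMED */
        put(t->to_master, &t->n_out, cap, LIBXL_COLO_SVM_READY);
        if (i == 0)
            t->builds++;
        else
            t->resumes++;
        put(t->to_master, &t->n_out, cap, LIBXL_COLO_SVM_RESUMED);

        /* 2. Wait for the master to announce a new checkpoint */
        put(t->from_master, &t->n_in, cap, LIBXL_COLO_NEW_CHECKPOINT);

        /* 3. Suspend; the dirty-page count and pfn list follow this byte */
        put(t->to_master, &t->n_out, cap, LIBXL_COLO_SVM_SUSPENDED);
    }
}
```

For two iterations the model writes READY, RESUMED, SUSPENDED twice (bytes 3, 4, 2, 3, 4, 2) and builds the vm only once; every later round is a plain resume.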

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxc/xenguest.h             |  20 +
 tools/libxl/Makefile               |   1 +
 tools/libxl/libxl_colo.h           |  38 ++
 tools/libxl/libxl_colo_restore.c   | 883 +++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_create.c         | 116 ++++-
 tools/libxl/libxl_dom.c            |   2 +-
 tools/libxl/libxl_internal.h       |  22 +
 tools/libxl/libxl_save_callout.c   |   6 +-
 tools/libxl/libxl_save_msgs_gen.pl |   6 +-
 9 files changed, 1087 insertions(+), 7 deletions(-)
 create mode 100644 tools/libxl/libxl_colo.h
 create mode 100644 tools/libxl/libxl_colo_restore.c

diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
index 40bbac8..d3061c7 100644
--- a/tools/libxc/xenguest.h
+++ b/tools/libxc/xenguest.h
@@ -91,6 +91,26 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
 
 /* callbacks provided by xc_domain_restore */
 struct restore_callbacks {
+    /* Called after a new checkpoint to suspend the guest.
+     */
+    int (*suspend)(void* data);
+
+    /* Called after the secondary vm is ready to resume.
+     * Callback function resumes the guest & the device model,
+     *  returns to xc_domain_restore.
+     */
+    int (*postcopy)(void* data);
+
+    /* Callback to wait for a new checkpoint
+     *
+     * returns:
+     * 0: terminate checkpointing gracefully
+     * 1: take another checkpoint */
+    int (*checkpoint)(void* data);
+
+    /* Enable qemu-dm logging dirty pages to xen */
+    int (*switch_qemu_logdirty)(int domid, unsigned enable, void *data); /* HVM only */
+
     /* callback to restore toolstack specific data */
     int (*toolstack_restore)(uint32_t domid, const uint8_t *buf,
             uint32_t size, void* data);
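
The contract of the three new callbacks can be illustrated with a hypothetical driver loop. This is a sketch under stated assumptions, not the xc_domain_restore() implementation: it assumes the resume (postcopy), checkpoint, suspend order that libxl_colo_restore.c documents, treats a zero return from suspend or postcopy as failure, and uses made-up fake_* callbacks purely for demonstration.

```c
#include <assert.h>

/* Mirror of the three new members of struct restore_callbacks. */
struct colo_restore_callbacks {
    int (*suspend)(void *data);
    int (*postcopy)(void *data);
    int (*checkpoint)(void *data);
};

/*
 * Hypothetical driver loop: after each checkpoint has been loaded, resume
 * the guest, wait for the next checkpoint, then suspend the guest again.
 * Returns the number of completed checkpoints, or -1 on failure.
 */
static int colo_restore_loop(const struct colo_restore_callbacks *cb, void *data)
{
    int done = 0;

    assert(cb && cb->suspend && cb->postcopy && cb->checkpoint);
    for (;;) {
        /* ... checkpoint data would be read into guest memory here ... */
        if (!cb->postcopy(data))
            return -1;
        if (cb->checkpoint(data) != 1)
            return done;        /* 0: terminate checkpointing gracefully */
        if (!cb->suspend(data))
            return -1;
        done++;
    }
}

/* Example callbacks for illustration: stop after 'limit' checkpoints. */
struct fake_state { int resumes, suspends, waits, limit; };

static int fake_postcopy(void *d) { ((struct fake_state *)d)->resumes++; return 1; }
static int fake_suspend(void *d) { ((struct fake_state *)d)->suspends++; return 1; }
static int fake_checkpoint(void *d)
{
    struct fake_state *s = d;
    return ++s->waits <= s->limit;   /* 1: another checkpoint, 0: stop */
}
```

With a limit of two, the guest is resumed three times but suspended only twice: the final checkpoint() returning 0 ends the loop while the guest is still running, which is the state a failover wants.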
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index a33497d..9642500 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -57,6 +57,7 @@ LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
 LIBXL_OBJS-y += libxl_checkpoint_device.o libxl_remus_disk_drbd.o
+LIBXL_OBJS-y += libxl_colo_restore.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
new file mode 100644
index 0000000..91df275
--- /dev/null
+++ b/tools/libxl/libxl_colo.h
@@ -0,0 +1,38 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#ifndef LIBXL_COLO_H
+#define LIBXL_COLO_H
+
+/*
+ * values to control suspend/resume primary vm and secondary vm
+ * at the same time
+ */
+enum {
+    LIBXL_COLO_NEW_CHECKPOINT = 1,
+    LIBXL_COLO_SVM_SUSPENDED,
+    LIBXL_COLO_SVM_READY,
+    LIBXL_COLO_SVM_RESUMED,
+};
+
+extern void libxl__colo_restore_done(libxl__egc *egc, void *dcs_void,
+                                     int ret, int retval, int errnoval);
+extern void libxl__colo_restore_setup(libxl__egc *egc,
+                                      libxl__colo_restore_state *crs);
+extern void libxl__colo_restore_teardown(libxl__egc *egc,
+                                         libxl__colo_restore_state *crs,
+                                         int rc);
+
+#endif
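
Each control message in this protocol is a single uint8_t carrying one of the values above. A minimal POSIX sketch of sending and validating such a byte, similar in spirit to colo_stream_read_done(), might look like the following. colo_send_section() and colo_expect_section() are hypothetical helpers built on plain blocking read()/write(); libxl itself uses its asynchronous datacopier/datareader machinery instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

enum {
    LIBXL_COLO_NEW_CHECKPOINT = 1,
    LIBXL_COLO_SVM_SUSPENDED,
    LIBXL_COLO_SVM_READY,
    LIBXL_COLO_SVM_RESUMED,
};

/* Write one section byte; returns 0 on success, -1 on failure. */
static int colo_send_section(int fd, uint8_t section)
{
    ssize_t n;

    assert(fd >= 0);            /* xl uses -1 to mean "no fd" */
    do {
        n = write(fd, &section, sizeof(section));
    } while (n < 0 && errno == EINTR);
    return n == (ssize_t)sizeof(section) ? 0 : -1;
}

/*
 * Read one section byte and check that it is the expected one.
 * Returns 0 on success, -1 on EOF, read error or an unexpected value.
 */
static int colo_expect_section(int fd, uint8_t expected)
{
    uint8_t section;
    ssize_t n;

    assert(fd >= 0);
    do {
        n = read(fd, &section, sizeof(section));
    } while (n < 0 && errno == EINTR);
    if (n < (ssize_t)sizeof(section))
        return -1;              /* short read: peer closed or error */
    return section == expected ? 0 : -1;
}
```

Treating a short read and an unexpected value the same way mirrors the patch, where both cases make the checkpoint callback report failure.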
diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
new file mode 100644
index 0000000..ebbd6b9
--- /dev/null
+++ b/tools/libxl/libxl_colo_restore.c
@@ -0,0 +1,883 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+#include "libxl_colo.h"
+#include "xg_private.h"
+#include "xc_bitops.h"
+
+enum {
+    LIBXL_COLO_SETUPED,
+    LIBXL_COLO_SUSPENDED,
+    LIBXL_COLO_RESUMED,
+};
+
+typedef struct libxl__colo_restore_checkpoint_state libxl__colo_restore_checkpoint_state;
+struct libxl__colo_restore_checkpoint_state {
+    xc_hypercall_buffer_t _dirty_bitmap;
+    xc_hypercall_buffer_t *dirty_bitmap;
+    unsigned long p2m_size;
+    libxl__domain_suspend_state2 dss2;
+    /* for sending data to master */
+    libxl__datacopier_state dc;
+    /* for reading data from master */
+    libxl__datareader_state drs;
+    uint8_t section;
+    libxl__logdirty_switch lds;
+    libxl__colo_restore_state *crs;
+    int status;
+
+    void (*callback)(libxl__egc *,
+                     libxl__colo_restore_checkpoint_state *,
+                     int);
+
+    /*
+     * 0: secondary vm's dirty bitmap for domain @domid
+     * 1: secondary vm is ready(domain @domid)
+     * 2: secondary vm is resumed(domain @domid)
+     */
+    const char *copywhat[3];
+};
+
+
+static void libxl__colo_restore_domain_resume_callback(void *data);
+static void libxl__colo_restore_domain_checkpoint_callback(void *data);
+static void libxl__colo_restore_domain_suspend_callback(void *data);
+
+/* ===================== colo: common functions ===================== */
+static void colo_enable_logdirty(libxl__colo_restore_state *crs, libxl__egc *egc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    const uint32_t domid = crs->domid;
+    libxl__logdirty_switch *const lds = &crcs->lds;
+
+    STATE_AO_GC(crs->ao);
+
+    /* we need to know which pages are dirty to restore the guest */
+    if (xc_shadow_control(CTX->xch, domid,
+                          XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
+                          NULL, 0, NULL, 0, NULL) < 0) {
+        LOG(ERROR, "cannot enable secondary vm's logdirty");
+        lds->callback(egc, lds, ERROR_FAIL);
+        return;
+    }
+
+    if (crs->hvm) {
+        libxl__domain_common_switch_qemu_logdirty(domid, 1, lds, egc);
+        return;
+    }
+
+    lds->callback(egc, lds, 0);
+}
+
+static void colo_disable_logdirty(libxl__colo_restore_state *crs,
+                                  libxl__egc *egc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    const uint32_t domid = crs->domid;
+    libxl__logdirty_switch *const lds = &crcs->lds;
+
+    STATE_AO_GC(crs->ao);
+
+    /* failover: we no longer need to track secondary vm's dirty pages */
+    if (xc_shadow_control(CTX->xch, domid, XEN_DOMCTL_SHADOW_OP_OFF,
+                          NULL, 0, NULL, 0, NULL) < 0)
+        LOG(WARN, "cannot disable secondary vm's logdirty");
+
+    if (crs->hvm) {
+        libxl__domain_common_switch_qemu_logdirty(domid, 0, lds, egc);
+        return;
+    }
+
+    lds->callback(egc, lds, 0);
+}
+
+static void colo_resume_vm(libxl__egc *egc,
+                          libxl__colo_restore_checkpoint_state *crcs)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+    int rc;
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+
+    STATE_AO_GC(crs->ao);
+
+    if (!crs->saved_cb) {
+        /* TODO: sync mmu for hvm? */
+        rc = libxl__domain_resume(gc, crs->domid, 0, 1);
+        if (rc)
+            LOG(ERROR, "cannot resume secondary vm");
+
+        crcs->callback(egc, crcs, rc);
+        return;
+    }
+
+    /*
+     * TODO: get store mfn and console mfn
+     *  We should call the callback restore_results in
+     *  xc_domain_restore() before resuming the guest.
+     */
+    libxl__xc_domain_restore_done(egc, dcs, 0, 0, 0);
+
+    return;
+}
+
+
+/* ================ colo: setup restore environment ================ */
+static void libxl__colo_domain_create_cb(libxl__egc *egc,
+                                         libxl__domain_create_state *dcs,
+                                         int rc, uint32_t domid);
+
+static int init_dss2(libxl__domain_suspend_state2 *dss2)
+{
+    int rc = ERROR_FAIL;
+    libxl_domain_type type;
+
+    STATE_AO_GC(dss2->ao);
+
+    type = libxl__domain_type(gc, dss2->domid);
+    if (type == LIBXL_DOMAIN_TYPE_INVALID)
+        goto out;
+
+    libxl__xswait_init(&dss2->pvcontrol);
+    libxl__ev_evtchn_init(&dss2->guest_evtchn);
+    libxl__ev_xswatch_init(&dss2->guest_watch);
+    libxl__ev_time_init(&dss2->guest_timeout);
+
+    if (type == LIBXL_DOMAIN_TYPE_HVM)
+        dss2->hvm = 1;
+    else
+        dss2->hvm = 0;
+
+    dss2->guest_evtchn.port = -1;
+    dss2->guest_evtchn_lockfd = -1;
+    dss2->guest_responded = 0;
+    dss2->dm_savefile = libxl__device_model_savefile(gc, dss2->domid);
+    dss2->save_dm = 0;
+
+    /* Secondary vm has not been created yet, so we cannot get evtchn port */
+
+    rc = 0;
+
+out:
+    return rc;
+}
+
+void libxl__colo_restore_setup(libxl__egc *egc,
+                               libxl__colo_restore_state *crs)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs;
+    DECLARE_HYPERCALL_BUFFER(unsigned long, dirty_bitmap);
+    int rc = ERROR_FAIL;
+    int bsize;
+
+    /* Convenience aliases */
+    libxl__srm_restore_autogen_callbacks *const callbacks =
+        &dcs->shs.callbacks.restore.a;
+    const int domid = crs->domid;
+
+    STATE_AO_GC(crs->ao);
+
+    GCNEW(crcs);
+    crs->crcs = crcs;
+    crcs->crs = crs;
+
+    crcs->p2m_size = xc_domain_maximum_gpfn(CTX->xch, domid) + 1;
+
+    crcs->copywhat[0] = GCSPRINTF("secondary vm's dirty bitmap for domain %"PRIu32,
+                                  domid);
+    crcs->copywhat[1] = GCSPRINTF("secondary vm is ready(domain %"PRIu32")",
+                                  domid);
+    crcs->copywhat[2] = GCSPRINTF("secondary vm is resumed(domain %"PRIu32")",
+                                  domid);
+
+    bsize = bitmap_size(crcs->p2m_size);
+    dirty_bitmap = xc_hypercall_buffer_alloc_pages(CTX->xch, dirty_bitmap,
+                                                   NRPAGES(bsize));
+    if (!dirty_bitmap) {
+        rc = ERROR_NOMEM;
+        goto err;
+    }
+    memset(dirty_bitmap, 0, bsize);
+    crcs->_dirty_bitmap = *HYPERCALL_BUFFER(dirty_bitmap);
+    crcs->dirty_bitmap = &crcs->_dirty_bitmap;
+
+    /* setup dss2 */
+    crcs->dss2.ao = ao;
+    crcs->dss2.domid = domid;
+    if (init_dss2(&crcs->dss2))
+        goto err_init_dss2;
+
+    callbacks->suspend = libxl__colo_restore_domain_suspend_callback;
+    callbacks->postcopy = libxl__colo_restore_domain_resume_callback;
+    callbacks->checkpoint = libxl__colo_restore_domain_checkpoint_callback;
+
+    /*
+     * Secondary vm runs in colo mode, so we call
+     * libxl__xc_domain_restore_done() to create it, but that path
+     * would exit via domain_create_cb(). Replace the callback here
+     * and restore the saved one later.
+     */
+    crs->saved_cb = dcs->callback;
+    dcs->callback = libxl__colo_domain_create_cb;
+    crcs->status = LIBXL_COLO_SETUPED;
+
+    logdirty_init(&crcs->lds);
+    crcs->lds.ao = ao;
+
+    rc = 0;
+
+out:
+    crs->callback(egc, crs, rc);
+    return;
+
+err_init_dss2:
+    xc_hypercall_buffer_free_pages(CTX->xch, dirty_bitmap, NRPAGES(bsize));
+    crcs->dirty_bitmap = NULL;
+err:
+    goto out;
+}
+
+static void libxl__colo_domain_create_cb(libxl__egc *egc,
+                                         libxl__domain_create_state *dcs,
+                                         int rc, uint32_t domid)
+{
+    libxl__colo_restore_checkpoint_state *crcs = dcs->crs.crcs;
+
+    crcs->callback(egc, crcs, rc);
+}
+
+
+/* ================ colo: teardown restore environment ================ */
+static void do_failover_done(libxl__egc *egc,
+                             libxl__colo_restore_checkpoint_state* crcs,
+                             int rc);
+static void colo_disable_logdirty_done(libxl__egc *egc,
+                                       libxl__logdirty_switch *lds,
+                                       int rc);
+
+static void do_failover(libxl__egc *egc, libxl__colo_restore_state *crs)
+{
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    const int status = crcs->status;
+    libxl__logdirty_switch *const lds = &crcs->lds;
+
+    STATE_AO_GC(crs->ao);
+
+    switch(status) {
+    case LIBXL_COLO_SETUPED:
+        /* logdirty has not been enabled yet */
+        colo_resume_vm(egc, crcs);
+        return;
+    case LIBXL_COLO_SUSPENDED:
+    case LIBXL_COLO_RESUMED:
+        /* disable logdirty first */
+        lds->callback = colo_disable_logdirty_done;
+        colo_disable_logdirty(crs, egc);
+        return;
+    default:
+        LOG(ERROR, "invalid status: %d", status);
+        crcs->callback(egc, crcs, ERROR_FAIL);
+    }
+}
+
+void libxl__colo_restore_teardown(libxl__egc *egc,
+                                  libxl__colo_restore_state *crs,
+                                  int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap, crcs->dirty_bitmap);
+    int bsize = bitmap_size(crcs->p2m_size);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+
+    EGC_GC;
+
+    if (!dirty_bitmap)
+        goto do_failover;
+
+    xc_hypercall_buffer_free_pages(CTX->xch, dirty_bitmap, NRPAGES(bsize));
+
+do_failover:
+    if (!rc) {
+        crcs->callback = do_failover_done;
+        do_failover(egc, crs);
+        return;
+    }
+
+    if (crs->saved_cb) {
+        dcs->callback = crs->saved_cb;
+        crs->saved_cb = NULL;
+    }
+    crs->callback(egc, crs, rc);
+}
+
+static void do_failover_done(libxl__egc *egc,
+                             libxl__colo_restore_checkpoint_state* crcs,
+                             int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+
+    STATE_AO_GC(crs->ao);
+
+    if (rc)
+        LOG(ERROR, "cannot do failover");
+
+    if (crs->saved_cb) {
+        dcs->callback = crs->saved_cb;
+        crs->saved_cb = NULL;
+    }
+
+    crs->callback(egc, crs, rc);
+}
+
+static void colo_disable_logdirty_done(libxl__egc *egc,
+                                       libxl__logdirty_switch *lds,
+                                       int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(lds, *crcs, lds);
+
+    STATE_AO_GC(lds->ao);
+
+    if (rc)
+        LOG(WARN, "cannot disable logdirty");
+
+    if (crcs->status == LIBXL_COLO_SUSPENDED) {
+        colo_resume_vm(egc, crcs);
+        return;
+    }
+
+    /* If we cannot disable logdirty, we still can do failover */
+    crcs->callback(egc, crcs, 0);
+}
+
+/*
+ * checkpoint callbacks are called in the following order:
+ * 1. resume
+ * 2. checkpoint
+ * 3. suspend
+ */
+static void colo_common_send_data_done(libxl__egc *egc,
+                                       libxl__datacopier_state *dc,
+                                       int onwrite, int errnoval);
+/* ===================== colo: resume secondary vm ===================== */
+/*
+ * Do the following things when resuming secondary vm:
+ *  1. write LIBXL_COLO_SVM_READY
+ *  2. resume secondary vm
+ *  3. write LIBXL_COLO_SVM_RESUMED
+ */
+static void colo_send_svm_ready_done(libxl__egc *egc,
+                                     libxl__colo_restore_checkpoint_state *crcs,
+                                     int rc);
+static void colo_resume_vm_done(libxl__egc *egc,
+                                libxl__colo_restore_checkpoint_state *crcs,
+                                int rc);
+static void colo_write_svm_resumed(libxl__egc *egc,
+                                   libxl__colo_restore_checkpoint_state *crcs);
+static void colo_enable_logdirty_done(libxl__egc *egc,
+                                      libxl__logdirty_switch *lds,
+                                      int retval);
+static void colo_reenable_logdirty(libxl__egc *egc,
+                                   libxl__logdirty_switch *lds,
+                                   int rc);
+static void colo_reenable_logdirty_done(libxl__egc *egc,
+                                        libxl__logdirty_switch *lds,
+                                        int rc);
+
+static void libxl__colo_restore_domain_resume_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__domain_create_state *dcs = CONTAINER_OF(shs, *dcs, shs);
+    libxl__colo_restore_checkpoint_state *crcs = dcs->crs.crcs;
+    uint8_t section = LIBXL_COLO_SVM_READY;
+    int rc;
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = &dcs->crs;
+    const int send_fd = crs->send_fd;
+    libxl__datacopier_state *const dc = &crcs->dc;
+
+    STATE_AO_GC(crs->ao);
+
+    memset(dc, 0, sizeof(*dc));
+    dc->ao = ao;
+    dc->readfd = -1;
+    dc->writefd = send_fd;
+    dc->maxsz = INT_MAX;
+    dc->copywhat = crcs->copywhat[1];
+    dc->writewhat = "colo stream";
+    dc->callback = colo_common_send_data_done;
+    crcs->callback = colo_send_svm_ready_done;
+
+    rc = libxl__datacopier_start(dc);
+    if (rc) {
+        LOG(ERROR, "libxl__datacopier_start() fails");
+        goto out;
+    }
+
+    /* tell master that secondary vm is ready */
+    libxl__datacopier_prefixdata(shs->egc, dc, &section, sizeof(section));
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(shs->egc, shs, 0);
+}
+
+static void colo_send_svm_ready_done(libxl__egc *egc,
+                                     libxl__colo_restore_checkpoint_state *crcs,
+                                     int rc)
+{
+    crcs->callback = colo_resume_vm_done;
+    colo_resume_vm(egc, crcs);
+
+    return;
+}
+
+static void colo_resume_vm_done(libxl__egc *egc,
+                                libxl__colo_restore_checkpoint_state *crcs,
+                                int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+    libxl__logdirty_switch *const lds = &crcs->lds;
+    libxl__save_helper_state *const shs = &dcs->shs;
+
+    STATE_AO_GC(crs->ao);
+
+    if (rc) {
+        LOG(ERROR, "cannot resume secondary vm");
+        goto out;
+    }
+
+    crcs->status = LIBXL_COLO_RESUMED;
+
+    /* avoid calling libxl__xc_domain_restore_done() more than once */
+    if (crs->saved_cb) {
+        dcs->callback = crs->saved_cb;
+        crs->saved_cb = NULL;
+
+        lds->callback = colo_enable_logdirty_done;
+        colo_enable_logdirty(crs, egc);
+        return;
+    }
+
+    colo_write_svm_resumed(egc, crcs);
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(shs->egc, shs, 0);
+}
+
+static void colo_write_svm_resumed(libxl__egc *egc,
+                                   libxl__colo_restore_checkpoint_state *crcs)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+    uint8_t section = LIBXL_COLO_SVM_RESUMED;
+    int rc;
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+    const int send_fd = crs->send_fd;
+    libxl__datacopier_state *const dc = &crcs->dc;
+    libxl__save_helper_state *const shs = &dcs->shs;
+
+    STATE_AO_GC(crs->ao);
+
+    memset(dc, 0, sizeof(*dc));
+    dc->ao = ao;
+    dc->readfd = -1;
+    dc->writefd = send_fd;
+    dc->maxsz = INT_MAX;
+    dc->copywhat = crcs->copywhat[2];
+    dc->writewhat = "colo stream";
+    dc->callback = colo_common_send_data_done;
+    /* TODO: configure network */
+    crcs->callback = NULL;
+
+    rc = libxl__datacopier_start(dc);
+    if (rc) {
+        LOG(ERROR, "libxl__datacopier_start() fails");
+        goto out;
+    }
+
+    /* tell master that secondary vm is resumed */
+    libxl__datacopier_prefixdata(egc, dc, &section, sizeof(section));
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(shs->egc, shs, 0);
+}
+
+static void colo_enable_logdirty_done(libxl__egc *egc,
+                                      libxl__logdirty_switch *lds,
+                                      int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(lds, *crcs, lds);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+    libxl__save_helper_state *const shs = &dcs->shs;
+    const uint32_t domid = crs->domid;
+
+    STATE_AO_GC(crs->ao);
+
+    if (rc) {
+        /*
+         * log-dirty already enabled? There's no test op,
+         * so attempt to disable then reenable it
+         */
+        lds->callback = colo_reenable_logdirty;
+        colo_disable_logdirty(crs, egc);
+        return;
+    }
+
+    /* We have enabled secondary vm's logdirty, so we can unpause it now */
+    rc = libxl__domain_unpause(gc, domid);
+    if (rc) {
+        LOG(ERROR, "cannot unpause secondary vm");
+        goto out;
+    }
+
+    colo_write_svm_resumed(egc, crcs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(shs->egc, shs, 0);
+}
+
+static void colo_reenable_logdirty(libxl__egc *egc,
+                                   libxl__logdirty_switch *lds,
+                                   int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(lds, *crcs, lds);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+    libxl__save_helper_state *const shs = &dcs->shs;
+
+    STATE_AO_GC(crs->ao);
+
+    if (rc) {
+        LOG(ERROR, "cannot enable logdirty");
+        goto out;
+    }
+
+    lds->callback = colo_reenable_logdirty_done;
+    colo_enable_logdirty(crs, egc);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(shs->egc, shs, 0);
+}
+
+static void colo_reenable_logdirty_done(libxl__egc *egc,
+                                        libxl__logdirty_switch *lds,
+                                        int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(lds, *crcs, lds);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+
+    /* Convenience aliases */
+    libxl__save_helper_state *const shs = &dcs->shs;
+    const uint32_t domid = crcs->crs->domid;
+
+    STATE_AO_GC(crcs->crs->ao);
+
+    if (rc) {
+        LOG(ERROR, "cannot enable logdirty");
+        goto out;
+    }
+
+    /* We have enabled secondary vm's logdirty, so we can unpause it now */
+    rc = libxl__domain_unpause(gc, domid);
+    if (rc) {
+        LOG(ERROR, "cannot unpause secondary vm");
+        goto out;
+    }
+
+    colo_write_svm_resumed(egc, crcs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(shs->egc, shs, 0);
+}
+
+
+/* ===================== colo: wait new checkpoint ===================== */
+static void colo_stream_read_done(libxl__egc *egc,
+                                  libxl__datareader_state *drs,
+                                  ssize_t real_size, int errnoval);
+
+static void libxl__colo_restore_domain_checkpoint_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__domain_create_state *dcs = CONTAINER_OF(shs, *dcs, shs);
+    libxl__colo_restore_checkpoint_state *crcs = dcs->crs.crcs;
+
+    /* Convenience aliases */
+    const int recv_fd = dcs->crs.recv_fd;
+    libxl__datareader_state *const drs = &crcs->drs;
+
+    STATE_AO_GC(dcs->crs.ao);
+
+    memset(drs, 0, sizeof(*drs));
+    drs->ao = ao;
+    drs->readfd = recv_fd;
+    drs->readsize = sizeof(crcs->section);
+    drs->readwhat = "colo stream";
+    drs->callback = colo_stream_read_done;
+    drs->buf = &crcs->section;
+
+    if (libxl__datareader_start(drs)) {
+        LOG(ERROR, "libxl__datareader_start() fails");
+        goto out;
+    }
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(shs->egc, shs, 0);
+}
+
+static void colo_stream_read_done(libxl__egc *egc,
+                                  libxl__datareader_state *drs,
+                                  ssize_t real_size, int errnoval)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(drs, *crcs, drs);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+    int ok = 0;
+
+    /* Convenience aliases */
+    libxl__save_helper_state *const shs = &dcs->shs;
+
+    STATE_AO_GC(drs->ao);
+
+    if (real_size < drs->readsize) {
+        LOG(ERROR, "reading data fails: %lld", (long long)real_size);
+        goto out;
+    }
+
+    if (crcs->section != LIBXL_COLO_NEW_CHECKPOINT) {
+        LOG(ERROR, "invalid section: %d", crcs->section);
+        goto out;
+    }
+
+    ok = 1;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(shs->egc, shs, ok);
+}
+
+
+/* ===================== colo: suspend secondary vm ===================== */
+/*
+ * Do the following things when suspending secondary vm:
+ *  1. suspend secondary vm
+ *  2. get secondary vm's dirty page information
+ *  3. send LIBXL_COLO_SVM_SUSPENDED
+ *  4. send secondary vm's dirty page information(count + pfn list)
+ */
+static void colo_suspend_vm_done(libxl__egc *egc,
+                                 libxl__domain_suspend_state2 *dss2,
+                                 int ok);
+static void colo_append_pfn_type(libxl__egc *egc,
+                                 libxl__datacopier_state *dc,
+                                 unsigned long *dirty_bitmap,
+                                 unsigned long p2m_size);
+
+static void libxl__colo_restore_domain_suspend_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__domain_create_state *dcs = CONTAINER_OF(shs, *dcs, shs);
+    libxl__colo_restore_checkpoint_state *crcs = dcs->crs.crcs;
+
+    STATE_AO_GC(dcs->ao);
+
+    /* Convenience aliases */
+    libxl__domain_suspend_state2 *const dss2 = &crcs->dss2;
+
+    /* suspend secondary vm */
+    dss2->callback_common_done = colo_suspend_vm_done;
+
+    libxl__domain_suspend2(shs->egc, dss2);
+}
+
+static void colo_suspend_vm_done(libxl__egc *egc,
+                                 libxl__domain_suspend_state2 *dss2,
+                                 int ok)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(dss2, *crcs, dss2);
+    libxl__colo_restore_state *crs = crcs->crs;
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap, crcs->dirty_bitmap);
+    uint8_t section = LIBXL_COLO_SVM_SUSPENDED;
+    int i, rc;
+    uint64_t count;
+
+    /* Convenience aliases */
+    const int send_fd = crs->send_fd;
+    const unsigned long p2m_size = crcs->p2m_size;
+    const uint32_t domid = crs->domid;
+    libxl__datacopier_state *const dc = &crcs->dc;
+
+    STATE_AO_GC(crs->ao);
+
+    if (!ok) {
+        LOG(ERROR, "cannot suspend secondary vm");
+        goto out;
+    }
+
+    crcs->status = LIBXL_COLO_SUSPENDED;
+
+    /*
+     * Secondary vm has been running, so some of its pages may be
+     * dirty even though they are clean on the master. Get the dirty
+     * bitmap and send it to the master.
+     */
+    if (xc_shadow_control(CTX->xch, domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
+                          HYPERCALL_BUFFER(dirty_bitmap), p2m_size,
+                          NULL, 0, NULL) != p2m_size) {
+        LOG(ERROR, "getting secondary vm's dirty bitmap fails");
+        goto out;
+    }
+
+    count = 0;
+    for (i = 0; i < p2m_size; i++) {
+        if (test_bit(i, dirty_bitmap))
+            count++;
+    }
+
+    memset(dc, 0, sizeof(*dc));
+    dc->ao = ao;
+    dc->readfd = -1;
+    dc->writefd = send_fd;
+    dc->maxsz = INT_MAX;
+    dc->copywhat = crcs->copywhat[0];
+    dc->writewhat = "colo stream";
+    dc->callback = colo_common_send_data_done;
+    crcs->callback = NULL;
+
+    rc = libxl__datacopier_start(dc);
+    if (rc) {
+        LOG(ERROR, "libxl__datacopier_start() fails");
+        goto out;
+    }
+
+    /* tell master that secondary vm is suspended */
+    libxl__datacopier_prefixdata(egc, dc, &section, sizeof(section));
+
+    /* send dirty pages to master */
+    libxl__datacopier_prefixdata(egc, dc, &count, sizeof(count));
+    colo_append_pfn_type(egc, dc, dirty_bitmap, p2m_size);
+    return;
+
+out:
+    ok = 0;
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dcs->shs, ok);
+}
+
+static void colo_append_pfn_type(libxl__egc *egc,
+                                 libxl__datacopier_state *dc,
+                                 unsigned long *dirty_bitmap,
+                                 unsigned long p2m_size)
+{
+    int i, count;
+    /* Hack: buf->buf is a private member of libxl__datacopier_buf */
+    libxl__datacopier_buf *buf = NULL;
+    int max_batch = sizeof(buf->buf) / sizeof(uint64_t);
+    int buf_size = max_batch * sizeof(uint64_t);
+    uint64_t *pfn;
+
+    STATE_AO_GC(dc->ao);
+
+    pfn = libxl__zalloc(NOGC, buf_size);
+
+    count = 0;
+    for (i = 0; i < p2m_size; i++) {
+        if (!test_bit(i, dirty_bitmap))
+            continue;
+
+        pfn[count++] = i;
+        if (count == max_batch) {
+            libxl__datacopier_prefixdata(egc, dc, pfn, buf_size);
+            count = 0;
+        }
+    }
+
+    if (count)
+        libxl__datacopier_prefixdata(egc, dc, pfn, count * sizeof(uint64_t));
+
+    free(pfn);
+}
+
+
+/* ===================== colo: common callback ===================== */
+static void colo_common_send_data_done(libxl__egc *egc,
+                                       libxl__datacopier_state *dc,
+                                       int onwrite, int errnoval)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(dc, *crcs, dc);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+    int ok;
+    STATE_AO_GC(dc->ao);
+
+    if (onwrite == -1) {
+        LOG(ERROR, "sending data fails");
+        ok = 0;
+        goto out;
+    }
+
+    if (errnoval) {
+        /* I/O error while reading/writing: should we fail over? */
+        ok = 2;
+        goto out;
+    }
+
+    if (!crcs->callback) {
+        /* Everything is OK */
+        ok = 1;
+        goto out;
+    }
+
+    crcs->callback(egc, crcs, 0);
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dcs->shs, ok);
+}
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index c277fd4..73708c5 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -19,6 +19,7 @@
 
 #include "libxl_internal.h"
 #include "libxl_arch.h"
+#include "libxl_colo.h"
 
 #include <xc_dom.h>
 #include <xenguest.h>
@@ -883,6 +884,96 @@ static void domcreate_console_available(libxl__egc *egc,
                                         dcs->aop_console_how.for_event));
 }
 
+static void libxl__colo_restore_teardown_done(libxl__egc *egc,
+                                              libxl__colo_restore_state *crs,
+                                              int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    STATE_AO_GC(crs->ao);
+
+    /* convenience aliases */
+    libxl__save_helper_state *const shs = &dcs->shs;
+    const int domid = crs->domid;
+    const libxl_ctx *const ctx = libxl__gc_owner(gc);
+    xc_interface *const xch = ctx->xch;
+
+    if (!rc)
+        /* failover, no need to destroy the secondary vm */
+        goto out;
+
+    if (shs->retval)
+        /*
+         * shs->retval stores the return value of xc_domain_restore().
+         * If it is nonzero, xc_domain_restore() has already destroyed
+         * the secondary vm.
+         */
+        goto out;
+
+    xc_domain_destroy(xch, domid);
+
+out:
+    dcs->callback(egc, dcs, rc, crs->domid);
+}
+
+void libxl__colo_restore_done(libxl__egc *egc, void *dcs_void,
+                              int ret, int retval, int errnoval)
+{
+    libxl__domain_create_state *dcs = dcs_void;
+    int rc = 1;
+
+    /* convenience aliases */
+    libxl__colo_restore_state *const crs = &dcs->crs;
+    STATE_AO_GC(crs->ao);
+
+    /* teardown and failover */
+    crs->callback = libxl__colo_restore_teardown_done;
+
+    if (ret == 0 && retval == 0)
+        rc = 0;
+
+    LOG(INFO, "%s", rc ? "colo fails" : "failover");
+    libxl__colo_restore_teardown(egc, crs, rc);
+}
+
+static void libxl__colo_restore_cp_done(libxl__egc *egc,
+                                        libxl__colo_restore_state *crs,
+                                        int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    int ok = 0;
+
+    /* convenience aliases */
+    libxl__save_helper_state *const shs = &dcs->shs;
+
+    if (!rc)
+        ok = 1;
+
+    libxl__xc_domain_saverestore_async_callback_done(shs->egc, shs, ok);
+}
+
+static void libxl__colo_restore_setup_done(libxl__egc *egc,
+                                           libxl__colo_restore_state *crs,
+                                           int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+
+    /* convenience aliases */
+    const int hvm = crs->hvm;
+    const int superpages = crs->superpages;
+    const int pae = crs->pae;
+    STATE_AO_GC(crs->ao);
+
+    if (rc) {
+        LOG(ERROR, "colo restore setup fails: %d", rc);
+        libxl__xc_domain_restore_done(egc, dcs, rc, 0, 0);
+        return;
+    }
+
+    crs->callback = libxl__colo_restore_cp_done;
+    libxl__xc_domain_restore(egc, dcs,
+                             hvm, pae, superpages);
+}
+
 static void domcreate_bootloader_done(libxl__egc *egc,
                                       libxl__bootloader_state *bl,
                                       int rc)
@@ -898,6 +989,8 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     libxl__domain_build_state *const state = &dcs->build_state;
     libxl__srm_restore_autogen_callbacks *const callbacks =
         &dcs->shs.callbacks.restore.a;
+    const int checkpointed_stream = dcs->checkpointed_stream;
+    libxl__colo_restore_state *const crs = &dcs->crs;
 
     if (rc) {
         domcreate_rebuild_done(egc, dcs, rc);
@@ -926,6 +1019,13 @@ static void domcreate_bootloader_done(libxl__egc *egc,
 
     /* Restore */
 
+    /* COLO only supports HVM now */
+    if (info->type != LIBXL_DOMAIN_TYPE_HVM &&
+        checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
     rc = libxl__build_pre(gc, domid, d_config, state);
     if (rc)
         goto out;
@@ -948,8 +1048,20 @@ static void domcreate_bootloader_done(libxl__egc *egc,
         rc = ERROR_INVAL;
         goto out;
     }
-    libxl__xc_domain_restore(egc, dcs,
-                             hvm, pae, superpages);
+
+    if (checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
+        crs->ao = ao;
+        crs->domid = domid;
+        crs->send_fd = dcs->send_fd;
+        crs->recv_fd = restore_fd;
+        crs->hvm = hvm;
+        crs->superpages = superpages;
+        crs->pae = pae;
+        crs->callback = libxl__colo_restore_setup_done;
+        libxl__colo_restore_setup(egc, crs);
+    } else
+        libxl__xc_domain_restore(egc, dcs,
+                                 hvm, pae, superpages);
     return;
 
  out:
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index d4bf8b3..4ea2607 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -816,7 +816,7 @@ static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*,
 static void switch_logdirty_done(libxl__egc *egc,
                                  libxl__logdirty_switch *lds, int ok);
 
-static void logdirty_init(libxl__logdirty_switch *lds)
+void logdirty_init(libxl__logdirty_switch *lds)
 {
     lds->cmd_path = 0;
     libxl__ev_xswatch_init(&lds->watch);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 92e0801..a1f3ec8 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2712,6 +2712,7 @@ struct libxl__logdirty_switch {
     libxl__ev_xswatch watch;
     libxl__ev_time timeout;
 };
+_hidden void logdirty_init(libxl__logdirty_switch *lds);
 
 /*
  * libxl__domain_suspend_state is for saving guest, not
@@ -3011,6 +3012,26 @@ typedef void libxl__domain_create_cb(libxl__egc *egc,
                                      libxl__domain_create_state*,
                                      int rc, uint32_t domid);
 
+/* colo related structure */
+typedef struct libxl__colo_restore_state libxl__colo_restore_state;
+typedef void libxl__colo_callback(libxl__egc *,
+                                  libxl__colo_restore_state *, int rc);
+struct libxl__colo_restore_state {
+    /* must be set by the caller of libxl__colo_restore_(setup|teardown) */
+    libxl__ao *ao;
+    uint32_t domid;
+    int send_fd;
+    int recv_fd;
+    int hvm;
+    int pae;
+    int superpages;
+    libxl__colo_callback *callback;
+
+    /* private, colo restore checkpoint state */
+    libxl__domain_create_cb *saved_cb;
+    void *crcs;
+};
+
 struct libxl__domain_create_state {
     /* filled in by user */
     libxl__ao *ao;
@@ -3023,6 +3044,7 @@ struct libxl__domain_create_state {
     int guest_domid;
     int checkpointed_stream;
     libxl__domain_build_state build_state;
+    libxl__colo_restore_state crs;
     libxl__bootloader_state bl;
     libxl__stub_dm_spawn_state dmss;
         /* If we're not doing stubdom, we use only dmss.dm,
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 0c09d94..e251181 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -15,6 +15,7 @@
 #include "libxl_osdeps.h"
 
 #include "libxl_internal.h"
+#include "libxl_colo.h"
 
 /* stream_fd is as from the caller (eventually, the application).
  * It may be 0, 1 or 2, in which case we need to dup it elsewhere.
@@ -65,7 +66,10 @@ void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
     dcs->shs.ao = ao;
     dcs->shs.domid = domid;
     dcs->shs.recv_callback = libxl__srm_callout_received_restore;
-    dcs->shs.completion_callback = libxl__xc_domain_restore_done;
+    if (dcs->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO)
+        dcs->shs.completion_callback = libxl__colo_restore_done;
+    else
+        dcs->shs.completion_callback = libxl__xc_domain_restore_done;
     dcs->shs.caller_state = dcs;
     dcs->shs.need_results = 1;
     dcs->shs.toolstack_data_file = 0;
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index 41ee000..0239cac 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -24,9 +24,9 @@ our @msgs = (
                                                  STRING doing_what),
                                                 'unsigned long', 'done',
                                                 'unsigned long', 'total'] ],
-    [  3, 'scxA',   "suspend", [] ],
-    [  4, 'scxA',   "postcopy", [] ],
-    [  5, 'scxA',   "checkpoint", [] ],
+    [  3, 'srcxA',   "suspend", [] ],
+    [  4, 'srcxA',   "postcopy", [] ],
+    [  5, 'srcxA',   "checkpoint", [] ],
     [  6, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
                                               unsigned enable)] ],
     #                toolstack_save          done entirely `by hand'
-- 
1.9.3
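colo_append_pfn_type() above converts the secondary's dirty bitmap into a list of uint64_t pfns and pushes them to the data copier one full batch at a time, followed by a final partial batch; the count of set bits is sent ahead of the list. A self-contained sketch of the same batching, assuming a plain unsigned long bitmap and a caller-supplied sink in place of libxl__datacopier_prefixdata() (`append_dirty_pfns`, `emit_fn`, `test_bit_ul` and `pfn_sink` are hypothetical names, not libxl API):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_UL (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: test bit nr in an array of unsigned longs. */
static int test_bit_ul(unsigned long nr, const unsigned long *bitmap)
{
    return (bitmap[nr / BITS_PER_UL] >> (nr % BITS_PER_UL)) & 1UL;
}

/* Hypothetical sink for one batch; pfns is reused, so copy it out. */
typedef void emit_fn(void *opaque, const uint64_t *pfns, size_t n);

/*
 * Emit every set bit of bitmap[0..p2m_size) as a uint64_t pfn, in
 * batches of at most max_batch entries. Returns the number of pfns
 * emitted, or -1 on a zero batch size or allocation failure.
 */
static long append_dirty_pfns(const unsigned long *bitmap,
                              unsigned long p2m_size, size_t max_batch,
                              emit_fn *emit, void *opaque)
{
    uint64_t *batch;
    size_t count = 0;
    long total = 0;
    unsigned long i;

    if (max_batch == 0)
        return -1;
    batch = calloc(max_batch, sizeof(*batch));
    if (!batch)
        return -1;

    for (i = 0; i < p2m_size; i++) {
        if (!test_bit_ul(i, bitmap))
            continue;
        batch[count++] = i;
        total++;
        if (count == max_batch) {     /* full batch: flush it */
            emit(opaque, batch, count);
            count = 0;
        }
    }
    if (count)                        /* final partial batch */
        emit(opaque, batch, count);

    free(batch);
    return total;
}

/* Example sink that collects up to 64 pfns and counts batches. */
struct pfn_sink {
    uint64_t pfns[64];
    size_t n;
    size_t batches;
};

static void sink_emit(void *opaque, const uint64_t *pfns, size_t n)
{
    struct pfn_sink *s = opaque;
    size_t i;

    s->batches++;
    for (i = 0; i < n && s->n < 64; i++)
        s->pfns[s->n++] = pfns[i];
}
```

Flushing whole batches keeps each write bounded by the copier's buffer size, which is why the patch derives max_batch from sizeof(buf->buf).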


* [RFC Patch 18/25] primary vm suspend/get_dirty_pfn/resume/checkpoint code
From: Wen Congyang @ 2014-07-18 11:39 UTC
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

The primary side repeats the following steps for every checkpoint:
1. Suspend primary vm
   a. Suspend primary vm
   b. Do postsuspend
   c. Read LIBXL_COLO_SVM_SUSPENDED from slave
   d. Read secondary vm's dirty page information from slave (count + pfn list)
2. Get dirty pfn list
   a. Return secondary vm's dirty pfn list
3. Resume primary vm
   a. Read LIBXL_COLO_SVM_READY from slave
   b. Do preresume
   c. Resume primary vm
   d. Read LIBXL_COLO_SVM_RESUMED from slave
4. Wait for a new checkpoint
   a. Do commit
   b. Wait for a new checkpoint (not implemented yet)
   c. Send LIBXL_COLO_NEW_CHECKPOINT to slave
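Steps 1c and 1d read a single reply from the slave: colo_read_pfn() takes the section byte from temp_buff[0] and a native-endian uint64_t count from &temp_buff[1], then reads count uint64_t pfns and packs everything as { count, pfn[] } for get_dirty_pfn(). A standalone sketch of that parsing over an in-memory buffer, assuming that byte layout. The real code reads it in two asynchronous libxl__datareader steps; `parse_svm_suspended` is a hypothetical name, and the expected section is passed in because the value of LIBXL_COLO_SVM_SUSPENDED is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse one "secondary suspended" reply (assumed layout):
 *   uint8_t  section;      must equal expected_section
 *   uint64_t count;        native byte order, unaligned in the stream
 *   uint64_t pfn[count];
 * On success returns a malloc'd array laid out like the get_dirty_pfn()
 * result, { uint64_t count; uint64_t pfn[]; }; the caller frees it.
 * Returns NULL on a short buffer, wrong section or allocation failure.
 */
static uint64_t *parse_svm_suspended(const uint8_t *msg, size_t len,
                                     uint8_t expected_section)
{
    const size_t hdr = 1 + sizeof(uint64_t);
    uint64_t count;
    uint64_t *out;

    if (len < hdr || msg[0] != expected_section)
        return NULL;

    /* memcpy avoids an unaligned, type-punned load of the count */
    memcpy(&count, msg + 1, sizeof(count));
    if (count > (len - hdr) / sizeof(uint64_t))
        return NULL;                  /* truncated pfn list */

    out = malloc(sizeof(uint64_t) * (count + 1));
    if (!out)
        return NULL;
    out[0] = count;
    memcpy(out + 1, msg + hdr, count * sizeof(uint64_t));
    return out;
}
```

Bounding count by the bytes actually available also keeps the (count + 1) allocation from overflowing on a corrupt stream.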

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxc/xenguest.h             |  12 +
 tools/libxl/Makefile               |   2 +-
 tools/libxl/libxl.c                |  18 ++
 tools/libxl/libxl_colo.h           |  10 +
 tools/libxl/libxl_colo_save.c      | 585 +++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_dom.c            |  13 +-
 tools/libxl/libxl_internal.h       |  39 ++-
 tools/libxl/libxl_save_msgs_gen.pl |   1 +
 tools/libxl/libxl_types.idl        |   1 +
 9 files changed, 670 insertions(+), 11 deletions(-)
 create mode 100644 tools/libxl/libxl_colo_save.c

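The get_dirty_pfn() hook added to xenguest.h returns a buffer of the form { uint64_t count; uint64_t pfn[]; } that the caller must free. Its consumer is not part of this patch; presumably the save side marks those pfns so that pages dirtied by the secondary are sent again from the primary. A minimal sketch of such a merge, assuming an unsigned long bitmap and a buffer allocated with malloc() (`merge_dirty_pfns` and `BITS_PER_UL` are hypothetical names, not libxc API):

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_UL (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical consumer of a get_dirty_pfn() result laid out as
 * { uint64_t count; uint64_t pfn[count]; }. Every pfn below p2m_size
 * is set in the to_send bitmap so that page is sent again; larger pfns
 * are ignored. buf is freed here, since the caller owns it.
 * Returns the number of bits that were not already set.
 */
static unsigned long merge_dirty_pfns(uint8_t *buf, unsigned long *to_send,
                                      unsigned long p2m_size)
{
    uint64_t count, pfn, i;
    unsigned long added = 0;

    if (!buf)
        return 0;

    /* memcpy: no assumption about the alignment of buf */
    memcpy(&count, buf, sizeof(count));
    for (i = 0; i < count; i++) {
        unsigned long mask, *word;

        memcpy(&pfn, buf + sizeof(uint64_t) * (i + 1), sizeof(pfn));
        if (pfn >= p2m_size)
            continue;
        word = &to_send[pfn / BITS_PER_UL];
        mask = 1UL << (pfn % BITS_PER_UL);
        if (!(*word & mask)) {
            *word |= mask;
            added++;
        }
    }

    free(buf);
    return added;
}
```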
diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
index d3061c7..1aeaad2 100644
--- a/tools/libxc/xenguest.h
+++ b/tools/libxc/xenguest.h
@@ -72,6 +72,18 @@ struct save_callbacks {
      */
     int (*toolstack_save)(uint32_t domid, uint8_t **buf, uint32_t *len, void *data);
 
+    /* Called after the guest is suspended.
+     *
+     * Returns the list of dirty pfns, laid out as:
+     *  struct {
+     *      uint64_t count;
+     *      uint64_t pfn[];
+     *  };
+     *
+     *  Note: the caller must free the return value.
+     */
+    uint8_t *(*get_dirty_pfn)(void *data);
+
     /* to be provided as the last argument to each callback function */
     void* data;
 };
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 9642500..6b01a94 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -57,7 +57,7 @@ LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
 LIBXL_OBJS-y += libxl_checkpoint_device.o libxl_remus_disk_drbd.o
-LIBXL_OBJS-y += libxl_colo_restore.o
+LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 6e6781e..2ba6798 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -17,6 +17,7 @@
 #include "libxl_osdeps.h"
 
 #include "libxl_internal.h"
+#include "libxl_colo.h"
 
 #define PAGE_TO_MEMKB(pages) ((pages) * 4)
 #define BACKEND_STRING_SIZE 5
@@ -823,8 +824,25 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 
     assert(info);
 
+    if (type != LIBXL_DOMAIN_TYPE_HVM && info->colo) {
+        /* colo only supports hvm now */
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
     /* Convenience aliases */
     libxl__checkpoint_device_state *const cds = &dss->cds;
+    libxl__colo_save_state *const css = &dss->css;
+
+    if (info->colo) {
+        css->cds.ao = ao;
+        css->cds.domid = domid;
+        css->cds.saved_rc = 0;
+
+        /* Point of no return */
+        libxl__colo_save_setup(egc, css);
+        return AO_INPROGRESS;
+    }
 
     if (info->netbuf) {
         if (!libxl__netbuffer_enabled(gc)) {
diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
index 91df275..26a2563 100644
--- a/tools/libxl/libxl_colo.h
+++ b/tools/libxl/libxl_colo.h
@@ -35,4 +35,14 @@ extern void libxl__colo_restore_teardown(libxl__egc *egc,
                                          libxl__colo_restore_state *crs,
                                          int rc);
 
+extern void libxl__colo_save_domain_suspend_callback(void *data);
+extern void libxl__colo_save_domain_resume_callback(void *data);
+extern void libxl__colo_save_domain_checkpoint_callback(void *data);
+extern void libxl__colo_save_get_dirty_pfn_callback(void *data);
+extern void libxl__colo_save_setup(libxl__egc *egc,
+                                   libxl__colo_save_state *css);
+extern void libxl__colo_save_teardown(libxl__egc *egc,
+                                      libxl__colo_save_state *css,
+                                      int rc);
+
 #endif
diff --git a/tools/libxl/libxl_colo_save.c b/tools/libxl/libxl_colo_save.c
new file mode 100644
index 0000000..aef6f97
--- /dev/null
+++ b/tools/libxl/libxl_colo_save.c
@@ -0,0 +1,585 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+#include "libxl_colo.h"
+
+static const libxl__checkpoint_device_subkind_ops *colo_ops[] = {
+    NULL,
+};
+
+/* ================= colo: setup save environment ================= */
+static void colo_save_setup_done(libxl__egc *egc,
+                                 libxl__checkpoint_device_state *cds,
+                                 int rc);
+static void colo_save_setup_failed(libxl__egc *egc,
+                                   libxl__checkpoint_device_state *cds,
+                                   int rc);
+
+void libxl__colo_save_setup(libxl__egc *egc, libxl__colo_save_state *css)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+
+    css->send_fd = dss->fd;
+    css->recv_fd = dss->recv_fd;
+
+    /* TODO: disk/nic support */
+    css->cds.enabled_device_kinds = 0;
+    css->cds.ops = colo_ops;
+    css->cds.callback = colo_save_setup_done;
+    css->svm_running = false;
+
+    libxl__checkpoint_devices_setup(egc, &css->cds);
+}
+
+static void colo_save_setup_done(libxl__egc *egc,
+                                 libxl__checkpoint_device_state *cds,
+                                 int rc)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+    STATE_AO_GC(cds->ao);
+
+    if (!rc) {
+        libxl__domain_suspend(egc, dss);
+        return;
+    }
+
+    LOG(ERROR, "COLO: failed to setup device for guest with domid %u",
+        dss->domid);
+    css->cds.saved_rc = rc;
+    css->cds.callback = colo_save_setup_failed;
+    libxl__checkpoint_devices_teardown(egc, &css->cds);
+}
+
+static void colo_save_setup_failed(libxl__egc *egc,
+                                   libxl__checkpoint_device_state *cds,
+                                   int rc)
+{
+    STATE_AO_GC(cds->ao);
+
+    libxl__ao_complete(egc, ao, rc);
+}
+
+
+/* ================= colo: teardown save environment ================= */
+static void colo_teardown_done(libxl__egc *egc,
+                               libxl__checkpoint_device_state *cds,
+                               int rc);
+
+void libxl__colo_save_teardown(libxl__egc *egc,
+                               libxl__colo_save_state *css,
+                               int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+
+    dss->css.cds.saved_rc = rc;
+    dss->css.cds.callback = colo_teardown_done;
+    libxl__checkpoint_devices_teardown(egc, &dss->css.cds);
+    return;
+}
+
+static void colo_teardown_done(libxl__egc *egc,
+                               libxl__checkpoint_device_state *cds,
+                               int rc)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+    dss->callback(egc, dss, rc);
+}
+
+/*
+ * checkpoint callbacks are called in the following order:
+ * 1. suspend
+ * 2. resume
+ * 3. checkpoint
+ */
+static void colo_common_read_done(libxl__egc *egc,
+                                  libxl__datareader_state *drs,
+                                  ssize_t real_size, int errnoval);
+/* ===================== colo: suspend primary vm ===================== */
+/*
+ * Do the following things when suspending primary vm:
+ * 1. suspend primary vm
+ * 2. do postsuspend
+ * 3. read LIBXL_COLO_SVM_SUSPENDED
+ * 4. read secondary vm's dirty pages
+ */
+static void colo_suspend_primary_vm_done(libxl__egc *egc,
+                                         libxl__domain_suspend_state2 *dss2,
+                                         int ok);
+static void colo_postsuspend_cb(libxl__egc *egc,
+                                libxl__checkpoint_device_state *cds,
+                                int rc);
+static void colo_read_pfn(libxl__egc *egc, libxl__colo_save_state *css);
+
+void libxl__colo_save_domain_suspend_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__egc *egc = shs->egc;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
+
+    /* Convenience aliases */
+    libxl__domain_suspend_state2 *dss2 = &dss->dss2;
+
+    dss2->callback_common_done = colo_suspend_primary_vm_done;
+    libxl__domain_suspend2(egc, dss2);
+}
+
+static void colo_suspend_primary_vm_done(libxl__egc *egc,
+                                         libxl__domain_suspend_state2 *dss2,
+                                         int ok)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(dss2, *dss, dss2);
+
+    STATE_AO_GC(dss2->ao);
+
+    if (!ok) {
+        LOG(ERROR, "cannot suspend primary vm");
+        goto out;
+    }
+
+    /* Convenience aliases */
+    libxl__checkpoint_device_state *const cds = &dss->css.cds;
+
+    cds->callback = colo_postsuspend_cb;
+    libxl__checkpoint_devices_postsuspend(egc, cds);
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
+
+static void colo_postsuspend_cb(libxl__egc *egc,
+                                libxl__checkpoint_device_state *cds,
+                                int rc)
+{
+    int ok = 0;
+    libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+
+    /* Convenience aliases */
+    libxl__datareader_state *const drs = &css->drs;
+
+    STATE_AO_GC(cds->ao);
+
+    if (rc) {
+        LOG(ERROR, "postsuspend fails");
+        goto out;
+    }
+
+    if (!css->svm_running) {
+        ok = 1;
+        goto out;
+    }
+
+    /*
+     * read LIBXL_COLO_SVM_SUSPENDED and the count of
+     * secondary vm's dirty pages.
+     */
+    memset(drs, 0, sizeof(*drs));
+    drs->ao = ao;
+    drs->readfd = css->recv_fd;
+    drs->readsize = sizeof(css->temp_buff);
+    drs->readwhat = "colo stream";
+    drs->callback = colo_common_read_done;
+    drs->buf = css->temp_buff;
+    css->callback = colo_read_pfn;
+
+    if (libxl__datareader_start(drs)) {
+        LOG(ERROR, "libxl__datareader_start() fails");
+        goto out;
+    }
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
+
+static void colo_read_pfn(libxl__egc *egc, libxl__colo_save_state *css)
+{
+    int ok = 0;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+    STATE_AO_GC(css->cds.ao);
+
+    /* Convenience aliases */
+    libxl__datareader_state *const drs = &css->drs;
+
+    assert(!css->buff);
+    css->section = css->temp_buff[0];
+    memcpy(&css->count, &css->temp_buff[1], sizeof(css->count));
+
+    if (css->section != LIBXL_COLO_SVM_SUSPENDED) {
+        LOG(ERROR, "invalid section: %d, expected: %d",
+            css->section, LIBXL_COLO_SVM_SUSPENDED);
+        goto out;
+    }
+
+    css->buff = libxl__zalloc(NOGC, sizeof(uint64_t) * (css->count + 1));
+    css->buff[0] = css->count;
+
+    if (css->count == 0) {
+        /* no dirty pages */
+        ok = 1;
+        goto out;
+    }
+
+    /* read the pfn of secondary vm's dirty pages */
+    memset(drs, 0, sizeof(*drs));
+    drs->ao = ao;
+    drs->readfd = css->recv_fd;
+    drs->readsize = css->count * sizeof(uint64_t);
+    drs->readwhat = "colo stream";
+    drs->callback = colo_common_read_done;
+    drs->buf = css->buff + 1;
+    css->callback = NULL;
+
+    if (libxl__datareader_start(drs)) {
+        LOG(ERROR, "libxl__datareader_start() fails");
+        goto out;
+    }
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
+
+
+/* ===================== colo: get dirty pfn ===================== */
+void libxl__colo_save_get_dirty_pfn_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__egc *egc = shs->egc;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
+    uint64_t size;
+
+    /* Convenience aliases */
+    libxl__colo_save_state *const css = &dss->css;
+
+    assert(css->buff);
+    size = sizeof(uint64_t) * (css->count + 1);
+
+    libxl__xc_domain_saverestore_async_callback_done_with_data(egc, shs,
+                                                               (uint8_t *)css->buff,
+                                                               size);
+    free(css->buff);
+    css->buff = NULL;
+}
+
+
+/* ===================== colo: resume primary vm ===================== */
+/*
+ * Do the following things when resuming primary vm:
+ *  1. read LIBXL_COLO_SVM_READY
+ *  2. do preresume
+ *  3. resume primary vm
+ *  4. read LIBXL_COLO_SVM_RESUMED
+ */
+static void colo_preresume_dm_saved(libxl__egc *egc,
+                                    libxl__domain_suspend_state *dss, int rc);
+static void colo_read_svm_ready_done(libxl__egc *egc,
+                                     libxl__colo_save_state *css);
+static void colo_preresume_cb(libxl__egc *egc,
+                              libxl__checkpoint_device_state *cds,
+                              int rc);
+static void colo_read_svm_resumed_done(libxl__egc *egc,
+                                       libxl__colo_save_state *css);
+
+void libxl__colo_save_domain_resume_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__egc *egc = shs->egc;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
+
+    /* This would go into tailbuf. */
+    if (dss->hvm) {
+        libxl__domain_save_device_model(egc, dss, colo_preresume_dm_saved);
+    } else {
+        colo_preresume_dm_saved(egc, dss, 0);
+    }
+
+    return;
+}
+
+static void colo_preresume_dm_saved(libxl__egc *egc,
+                                    libxl__domain_suspend_state *dss, int rc)
+{
+    /* Convenience aliases */
+    libxl__colo_save_state *const css = &dss->css;
+    libxl__datareader_state *const drs = &css->drs;
+
+    STATE_AO_GC(css->cds.ao);
+
+    if (rc) {
+        LOG(ERROR, "Failed to save device model. Terminating COLO.");
+        goto out;
+    }
+
+    /* read LIBXL_COLO_SVM_READY */
+    memset(drs, 0, sizeof(*drs));
+    drs->ao = ao;
+    drs->readfd = css->recv_fd;
+    drs->readsize = sizeof(css->section);
+    drs->readwhat = "colo stream";
+    drs->callback = colo_common_read_done;
+    drs->buf = &css->section;
+    css->callback = colo_read_svm_ready_done;
+
+    if (libxl__datareader_start(drs)) {
+        LOG(ERROR, "libxl__datareader_start() fails");
+        goto out;
+    }
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void colo_read_svm_ready_done(libxl__egc *egc,
+                                     libxl__colo_save_state *css)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+
+    STATE_AO_GC(css->cds.ao);
+
+    if (css->section != LIBXL_COLO_SVM_READY) {
+        LOG(ERROR, "invalid section: %d, expected: %d",
+            css->section, LIBXL_COLO_SVM_READY);
+        goto out;
+    }
+
+    css->svm_running = true;
+    css->cds.callback = colo_preresume_cb;
+    libxl__checkpoint_devices_preresume(egc, &css->cds);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void colo_preresume_cb(libxl__egc *egc,
+                              libxl__checkpoint_device_state *cds,
+                              int rc)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+
+    /* Convenience aliases */
+    libxl__datareader_state *const drs = &css->drs;
+
+    STATE_AO_GC(cds->ao);
+
+    if (rc) {
+        LOG(ERROR, "preresume fails");
+        goto out;
+    }
+
+    /* Resumes the domain and the device model */
+    if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1, 0)) {
+        LOG(ERROR, "cannot resume primary vm");
+        goto out;
+    }
+
+    /* read LIBXL_COLO_SVM_RESUMED */
+    memset(drs, 0, sizeof(*drs));
+    drs->ao = ao;
+    drs->readfd = css->recv_fd;
+    drs->readsize = sizeof(css->section);
+    drs->readwhat = "colo stream";
+    drs->callback = colo_common_read_done;
+    drs->buf = &css->section;
+    css->callback = colo_read_svm_resumed_done;
+
+    if (libxl__datareader_start(drs)) {
+        LOG(ERROR, "libxl__datareader_start() fails");
+        goto out;
+    }
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void colo_read_svm_resumed_done(libxl__egc *egc,
+                                       libxl__colo_save_state *css)
+{
+    int ok = 0;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+
+    STATE_AO_GC(css->cds.ao);
+
+    if (css->section != LIBXL_COLO_SVM_RESUMED) {
+        LOG(ERROR, "invalid section: %d, expected: %d",
+            css->section, LIBXL_COLO_SVM_RESUMED);
+        goto out;
+    }
+
+    ok = 1;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
+
+
+/* ===================== colo: wait new checkpoint ===================== */
+/*
+ * Do the following things:
+ * 1. do commit
+ * 2. wait for a new checkpoint
+ * 3. write LIBXL_COLO_NEW_CHECKPOINT
+ */
+static void colo_device_commit_cb(libxl__egc *egc,
+                                  libxl__checkpoint_device_state *cds,
+                                  int rc);
+static void colo_start_new_checkpoint(libxl__egc *egc,
+                                      libxl__checkpoint_device_state *cds,
+                                      int rc);
+static void colo_send_data_done(libxl__egc *egc,
+                                libxl__datacopier_state *dc,
+                                int onwrite, int errnoval);
+
+void libxl__colo_save_domain_checkpoint_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
+    libxl__egc *egc = dss->shs.egc;
+
+    /* Convenience aliases */
+    libxl__checkpoint_device_state *const cds = &dss->css.cds;
+
+    cds->callback = colo_device_commit_cb;
+    libxl__checkpoint_devices_commit(egc, cds);
+}
+
+static void colo_device_commit_cb(libxl__egc *egc,
+                                  libxl__checkpoint_device_state *cds,
+                                  int rc)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+
+    STATE_AO_GC(cds->ao);
+
+    if (rc) {
+        LOG(ERROR, "commit fails");
+        goto out;
+    }
+
+    /* TODO: wait for a new checkpoint */
+    colo_start_new_checkpoint(egc, cds, 0);
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void colo_start_new_checkpoint(libxl__egc *egc,
+                                      libxl__checkpoint_device_state *cds,
+                                      int rc)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+    uint8_t section = LIBXL_COLO_NEW_CHECKPOINT;
+
+    /* Convenience aliases */
+    libxl__datacopier_state *const dc = &css->dc;
+
+    STATE_AO_GC(cds->ao);
+
+    if (rc)
+        goto out;
+
+    /* write LIBXL_COLO_NEW_CHECKPOINT */
+    memset(dc, 0, sizeof(*dc));
+    dc->ao = ao;
+    dc->readfd = -1;
+    dc->writefd = css->send_fd;
+    dc->maxsz = INT_MAX;
+    dc->copywhat = "new checkpoint is triggered";
+    dc->writewhat = "colo stream";
+    dc->callback = colo_send_data_done;
+
+    rc = libxl__datacopier_start(dc);
+    if (rc) {
+        LOG(ERROR, "libxl__datacopier_start() fails");
+        goto out;
+    }
+
+    /* tell slave that a new checkpoint is triggered */
+    libxl__datacopier_prefixdata(egc, dc, &section, sizeof(section));
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void colo_send_data_done(libxl__egc *egc,
+                                libxl__datacopier_state *dc,
+                                int onwrite, int errnoval)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(dc, *css, dc);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+    int ok;
+
+    STATE_AO_GC(dc->ao);
+
+    if (onwrite == -1 || errnoval) {
+        LOG(ERROR, "cannot start a new checkpoint");
+        ok = 0;
+        goto out;
+    }
+
+    /* Everything is OK */
+    ok = 1;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
+
+
+/* ===================== colo: common callback ===================== */
+static void colo_common_read_done(libxl__egc *egc,
+                                  libxl__datareader_state *drs,
+                                  ssize_t real_size, int errnoval)
+{
+    int ok = 0;
+    libxl__colo_save_state *css = CONTAINER_OF(drs, *css, drs);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+    STATE_AO_GC(drs->ao);
+
+    if (real_size < drs->readsize) {
+        LOG(ERROR, "reading data fails: %lld", (long long)real_size);
+        goto out;
+    }
+
+    if (!css->callback) {
+        /* Everything is OK */
+        ok = 1;
+        goto out;
+    }
+
+    css->callback(egc, css);
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 4ea2607..f51e701 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -19,6 +19,7 @@
 
 #include "libxl_internal.h"
 #include "libxl_arch.h"
+#include "libxl_colo.h"
 
 #include <xc_dom.h>
 #include <xen/hvm/hvm_info_table.h>
@@ -1747,7 +1748,12 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
     }
 
     memset(callbacks, 0, sizeof(*callbacks));
-    if (r_info != NULL) {
+    if (r_info != NULL && r_info->colo) {
+        callbacks->suspend = libxl__colo_save_domain_suspend_callback;
+        callbacks->postcopy = libxl__colo_save_domain_resume_callback;
+        callbacks->checkpoint = libxl__colo_save_domain_checkpoint_callback;
+        callbacks->get_dirty_pfn = libxl__colo_save_get_dirty_pfn_callback;
+    } else if (r_info != NULL) {
         callbacks->suspend = libxl__remus_domain_suspend_callback;
         callbacks->postcopy = libxl__remus_domain_resume_callback;
         callbacks->checkpoint = libxl__remus_domain_checkpoint_callback;
@@ -1911,7 +1917,10 @@ static void domain_suspend_done(libxl__egc *egc,
         xc_suspend_evtchn_release(CTX->xch, CTX->xce, domid,
                            dss2->guest_evtchn.port, &dss2->guest_evtchn_lockfd);
 
-    if (dss->remus) {
+    if (dss->remus && dss->remus->colo) {
+        libxl__colo_save_teardown(egc, &dss->css, rc);
+        return;
+    } else if (dss->remus) {
         /*
          * With Remus, if we reach this point, it means either
          * backup died or some network error occurred preventing us
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index a1f3ec8..20f7da8 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2690,6 +2690,25 @@ extern const libxl__checkpoint_device_subkind_ops remus_device_drbd_disk;
 
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
+/*----- colo related state structure -----*/
+typedef struct libxl__colo_save_state libxl__colo_save_state;
+struct libxl__colo_save_state {
+    libxl__checkpoint_device_state cds;
+    int send_fd;
+    int recv_fd;
+
+    /* private */
+    libxl__datacopier_state dc;
+    libxl__datareader_state drs;
+    uint8_t section;
+    uint64_t count;
+    uint64_t *buff;
+    /* read section and count, and then store it in temp_buff */
+    uint8_t temp_buff[9];
+    void (*callback)(libxl__egc *, libxl__colo_save_state *);
+    bool svm_running;
+};
+
 /*----- Domain suspend (save) state structure -----*/
 
 typedef struct libxl__domain_suspend_state libxl__domain_suspend_state;
@@ -2753,14 +2772,18 @@ struct libxl__domain_suspend_state {
     libxl__domain_suspend_state2 dss2;
     int hvm;
     int xcflags;
-    /* for Remus */
-    struct {
-        libxl__checkpoint_device_state cds;
-        const char *netbufscript;
-        /* used for Remus checkpoint */
-        libxl__ev_time checkpoint_timeout;
-        /* checkpoint interval */
-        int interval;
+    union {
+        /* for Remus */
+        struct {
+            libxl__checkpoint_device_state cds;
+            const char *netbufscript;
+            /* used for Remus checkpoint */
+            libxl__ev_time checkpoint_timeout;
+            /* checkpoint interval */
+            int interval;
+        };
+        /* for COLO */
+        libxl__colo_save_state css;
     };
     libxl__save_helper_state shs;
     libxl__logdirty_switch logdirty;
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index 0239cac..fbb2d67 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -36,6 +36,7 @@ our @msgs = (
                                               'unsigned long', 'console_mfn'] ],
     [  9, 'srW',    "complete",              [qw(int retval
                                                  int errnoval)] ],
+    [ 10, 'scxAB',  "get_dirty_pfn", [] ],
 );
 
 #----------------------------------------
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 1e1a62e..6b21dcb 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -600,6 +600,7 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
     ("netbuf",       bool),
     ("netbufscript", string),
     ("diskbuf",      bool),
+    ("colo",         bool)
     ])
 
 libxl_event_type = Enumeration("event_type", [
-- 
1.9.3

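In libxl__colo_save_state above, temp_buff[9] is documented as holding the section and the count read from the stream. That size matches one uint8_t followed by one uint64_t. The sketch below packs and unpacks such a 9-byte header. It assumes the section byte comes first and the count is copied in host byte order. colo_hdr_pack() and colo_hdr_unpack() are hypothetical helpers, not libxl functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: byte 0 = section, bytes 1..8 = count (host order). */
#define COLO_HDR_SIZE (sizeof(uint8_t) + sizeof(uint64_t))  /* 9 bytes */

static void colo_hdr_pack(uint8_t buf[COLO_HDR_SIZE],
                          uint8_t section, uint64_t count)
{
    buf[0] = section;
    /* count sits at offset 1, so copy it instead of storing through a cast */
    memcpy(buf + 1, &count, sizeof(count));
}

static void colo_hdr_unpack(const uint8_t buf[COLO_HDR_SIZE],
                            uint8_t *section, uint64_t *count)
{
    *section = buf[0];
    memcpy(count, buf + 1, sizeof(*count));
}
```

memcpy() is used because the count starts at offset 1, so a direct uint64_t access would be unaligned. If the two hosts could differ in endianness, the protocol would also need a fixed byte order.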

* [RFC Patch 19/25] xc_domain_save: flush cache before calling callbacks->postcopy() in colo mode
  2014-07-18 11:38 [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
                   ` (17 preceding siblings ...)
  2014-07-18 11:39 ` [RFC Patch 18/25] primary vm suspend/get_dirty_pfn/resume/checkpoint code Wen Congyang
@ 2014-07-18 11:39 ` Wen Congyang
  2014-07-18 11:39 ` [RFC Patch 20/25] COLO: xc related codes Wen Congyang
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Wen Congyang @ 2014-07-18 11:39 UTC
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

In COLO mode, the secondary VM is running. We use io_fd to ensure
that the primary VM and the secondary VM are resumed at the same
time, so callbacks->postcopy() must be called later, after the file
cache has been flushed.
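The resulting order can be modelled in a few lines. This is a simplified sketch, not xc_domain_save() itself. struct save_cbs_model and the demo helpers are hypothetical. As in the patch, a non-NULL get_dirty_pfn callback is what marks COLO mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct save_callbacks. */
struct save_cbs_model {
    int (*postcopy)(void *data);
    uint8_t *(*get_dirty_pfn)(void *data);  /* non-NULL only in COLO mode */
    void *data;
};

/* Records the order of the steps: 'P' = postcopy, 'F' = flush. */
static char trace[8];
static size_t ntrace;

static void record(char c)
{
    if (ntrace < sizeof(trace) - 1)
        trace[ntrace++] = c;
}

static int demo_postcopy(void *data) { (void)data; record('P'); return 1; }
static uint8_t *demo_get_dirty_pfn(void *data) { (void)data; return NULL; }

/* The tail of a save iteration, reduced to the two postcopy call sites. */
static void save_tail(const struct save_cbs_model *cb, int rc)
{
    /* Remus: resume the primary VM right away. */
    if (!rc && cb->postcopy && !cb->get_dirty_pfn)
        cb->postcopy(cb->data);

    record('F');  /* stands in for discard_file_cache(xch, io_fd, 1) */

    /* COLO: resume only after the stream has been flushed. */
    if (!rc && cb->postcopy && cb->get_dirty_pfn)
        cb->postcopy(cb->data);
}

/* Runs save_tail() once and returns the recorded order. */
static const char *run_tail(int colo)
{
    struct save_cbs_model cb = {
        demo_postcopy, colo ? demo_get_dirty_pfn : NULL, NULL
    };

    memset(trace, 0, sizeof(trace));
    ntrace = 0;
    save_tail(&cb, 0);
    return trace;
}
```

run_tail(0) records "PF" (Remus: postcopy, then flush). run_tail(1) records "FP" (COLO: flush, then postcopy).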

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxc/xc_domain_save.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index 254fdb3..61caa47 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -2078,10 +2078,15 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
  out_rc:
     completed = 1;
 
-    if ( !rc && callbacks->postcopy )
+    /*
+     * COLO: secondary vm is running. We will use the io_fd to
+     * ensure that both primary vm and secondary vm are resumed
+     * at the same time. So we should call postcopy later.
+     */
+    if ( !rc && callbacks->postcopy && !callbacks->get_dirty_pfn )
         callbacks->postcopy(callbacks->data);
 
-    /* guest has been resumed. Now we can compress data
+    /* Remus: guest has been resumed. Now we can compress data
      * at our own pace.
      */
     if (!rc && compressing)
@@ -2109,6 +2114,13 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
 
     discard_file_cache(xch, io_fd, 1 /* flush */);
 
+    /*
+     * COLO: send qemu device state and resume both
+     * primary vm and secondary vm now.
+     */
+    if ( !rc && callbacks->postcopy && callbacks->get_dirty_pfn )
+        callbacks->postcopy(callbacks->data);
+
     /* Enable compression now, finally */
     compressing = (flags & XCFLAGS_CHECKPOINT_COMPRESS);
 
-- 
1.9.3


* [RFC Patch 20/25] COLO: xc related codes
  2014-07-18 11:38 [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
                   ` (18 preceding siblings ...)
  2014-07-18 11:39 ` [RFC Patch 19/25] xc_domain_save: flush cache before calling callbacks->postcopy() in colo mode Wen Congyang
@ 2014-07-18 11:39 ` Wen Congyang
  2014-07-18 11:39 ` [RFC Patch 21/25] send store mfn and console mfn to xl before resuming secondary vm Wen Congyang
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Wen Congyang @ 2014-07-18 11:39 UTC
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

Save:
1. Send XC_SAVE_ID_LAST_CHECKPOINT so that the secondary VM can be resumed.
2. Call callbacks->get_dirty_pfn() after suspending the primary VM when
   doing a checkpoint.
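In update_dirty_bitmap(), the buffer returned by get_dirty_pfn() is treated as a uint64_t array. Its first element is the number of pfns that follow. The sketch below merges such a list into a send bitmap. It uses a local bit helper instead of libxc's set_bit(). Its bounds check rejects pfn >= p2m_size, because valid pfns run from 0 to p2m_size - 1. merge_dirty_pfns() and the bitmap helpers are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>

#define BITS_PER_UL (sizeof(unsigned long) * CHAR_BIT)

/* Local stand-ins for set_bit()/test_bit() on an unsigned long bitmap. */
static void bitmap_set(unsigned long pfn, unsigned long *map)
{
    map[pfn / BITS_PER_UL] |= 1UL << (pfn % BITS_PER_UL);
}

static int bitmap_test(unsigned long pfn, const unsigned long *map)
{
    return (int)((map[pfn / BITS_PER_UL] >> (pfn % BITS_PER_UL)) & 1UL);
}

/*
 * pfn_list[0] = number of entries, pfn_list[1..count] = dirty pfns.
 * Returns 0 on success, or -1 with errno = EINVAL on an out-of-range pfn.
 */
static int merge_dirty_pfns(const uint64_t *pfn_list,
                            unsigned long p2m_size, unsigned long *to_send)
{
    uint64_t count = pfn_list[0];

    for (uint64_t i = 0; i < count; i++) {
        uint64_t pfn = pfn_list[i + 1];

        if (pfn >= p2m_size) {
            errno = EINVAL;
            return -1;
        }
        bitmap_set((unsigned long)pfn, to_send);
    }
    return 0;
}
```

This sketch leaves freeing the list to the caller. The patch frees it inside update_dirty_bitmap() on the success path.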

Restore:
1. Once the secondary VM's state matches the primary VM's state (a
   checkpoint has been restored), call the resume/checkpoint/suspend
   callbacks.
2. Zero out tdata, because we use it to zero out pagebuf.tdata.
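The restore-side loop in this patch (the block at the `out:` label in xc_domain_restore()) calls postcopy, checkpoint and suspend in turn. It handles each return value through HANDLE_CALLBACK_RETURN_VALUE: 0 means an internal error, 2 means a read/write error that triggers failover, and any other value continues. The following is a minimal sketch of that control flow. The struct and function names are hypothetical, and loading the next checkpoint is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the two exits of HANDLE_CALLBACK_RETURN_VALUE in the patch. */
enum colo_outcome { COLO_ERROR, COLO_FAILOVER };

/* Hypothetical callback table; each callback returns 1 on success,
 * 0 on an internal error, or 2 on a read/write error (fail over). */
struct colo_restore_cbs {
    int (*postcopy)(void *data);    /* resume the secondary VM */
    int (*checkpoint)(void *data);  /* wait for a new checkpoint */
    int (*suspend)(void *data);     /* suspend the secondary VM */
    void *data;
};

static enum colo_outcome colo_restore_loop(const struct colo_restore_cbs *cb)
{
    int (*const steps[])(void *) = { cb->postcopy, cb->checkpoint, cb->suspend };

    for (;;) {
        for (size_t i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
            int frc = steps[i](cb->data);

            if (frc == 0)
                return COLO_ERROR;     /* rc = 1: the domain is destroyed */
            if (frc == 2)
                return COLO_FAILOVER;  /* rc = 0: keep the secondary VM */
        }
        /* The real code jumps to new_checkpoint here to load the next
         * checkpoint's pages and device state; omitted in this sketch. */
    }
}

/* Demo callback: succeeds while *budget lasts, then reports a stream error. */
static int demo_cb(void *data)
{
    int *budget = data;
    return (*budget)-- > 0 ? 1 : 2;
}

/* Demo callback that always reports an internal error. */
static int demo_fail(void *data) { (void)data; return 0; }
```

With a budget of 5, the loop runs five successful callbacks and returns COLO_FAILOVER on the sixth.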

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxc/xc_domain_restore.c | 44 ++++++++++++++++++++++++++++++++--
 tools/libxc/xc_domain_save.c    | 52 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 92 insertions(+), 4 deletions(-)

diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index 32a3e72..6e025f0 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -1454,7 +1454,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
     int nraces = 0;
 
     /* The new domain's shared-info frame number. */
-    unsigned long shared_info_frame;
+    unsigned long shared_info_frame = 0;
     unsigned char shared_info_page[PAGE_SIZE]; /* saved contents from file */
     shared_info_any_t *old_shared_info = 
         (shared_info_any_t *)shared_info_page;
@@ -1504,6 +1504,8 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
 
     DPRINTF("%s: starting restore of new domid %u", __func__, dom);
 
+    n = m = 0;
+
     pagebuf_init(&pagebuf);
     memset(&tailbuf, 0, sizeof(tailbuf));
     tailbuf.ishvm = hvm;
@@ -1629,7 +1631,6 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
      * We uncanonicalise page tables as we go.
      */
 
-    n = m = 0;
  loadpages:
     for ( ; ; )
     {
@@ -1793,6 +1794,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
         goto finish;
     }
 
+new_checkpoint:
     // DPRINTF("Buffered checkpoint\n");
 
     if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) {
@@ -2292,6 +2294,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
             free(tdata.data);
             goto out;
         }
+        memset(&tdata, 0, sizeof(tdata));
     }
 
     /* Dump the QEMU state to a state file for QEMU to load */
@@ -2357,6 +2360,43 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
     rc = 0;
 
  out:
+    if ( !rc && callbacks->checkpoint )
+    {
+#define HANDLE_CALLBACK_RETURN_VALUE(frc)                   \
+    do {                                                    \
+        if ( frc == 0 )                                     \
+        {                                                   \
+            /* Some internal error happens */               \
+            rc = 1;                                         \
+            goto out;                                       \
+        }                                                   \
+        else if ( frc == 2 )                                \
+        {                                                   \
+            /* Reading/writing error, do failover */        \
+            rc = 0;                                         \
+            goto failover;                                  \
+        }                                                   \
+    } while (0)
+        /* COLO */
+
+        /* TODO: call restore_results */
+
+        /* Resume secondary vm */
+        frc = callbacks->postcopy(callbacks->data);
+        HANDLE_CALLBACK_RETURN_VALUE(frc);
+
+        /* wait for new checkpoint */
+        frc = callbacks->checkpoint(callbacks->data);
+        HANDLE_CALLBACK_RETURN_VALUE(frc);
+
+        /* suspend secondary vm */
+        frc = callbacks->suspend(callbacks->data);
+        HANDLE_CALLBACK_RETURN_VALUE(frc);
+
+        goto new_checkpoint;
+    }
+
+failover:
     if ( (rc != 0) && (dom != 0) )
         xc_domain_destroy(xch, dom);
     xc_hypercall_buffer_free(xch, ctxt);
diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index 61caa47..79cc2c8 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -377,6 +377,31 @@ static int suspend_and_state(int (*suspend)(void*), void* data,
     return 0;
 }
 
+static int update_dirty_bitmap(uint8_t *(*get_dirty_pfn)(void *), void *data,
+                               unsigned long p2m_size, unsigned long *to_send)
+{
+    uint64_t *pfn_list;
+    uint64_t count, i;
+    uint64_t pfn;
+
+    pfn_list = (uint64_t *)get_dirty_pfn(data);
+    assert(pfn_list);
+
+    count = pfn_list[0];
+    for (i = 0; i < count; i++) {
+        pfn = pfn_list[i + 1];
+        if (pfn >= p2m_size) {
+            errno = EINVAL;
+            return -1;
+        }
+
+        set_bit(pfn, to_send);
+    }
+
+    free(pfn_list);
+    return 0;
+}
+
 /*
 ** Map the top-level page of MFNs from the guest. The guest might not have
 ** finished resuming from a previous restore operation, so we wait a while for
@@ -1769,11 +1794,14 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
         free(buf);
     }
 
-    if ( !callbacks->checkpoint )
+    if ( !callbacks->checkpoint || callbacks->get_dirty_pfn )
     {
         /*
          * If this is not a checkpointed save then this must be the first and
          * last checkpoint.
+         *
+         * If we are in colo mode, send last checkpoint to resume secondary
+         * vm.
          */
         i = XC_SAVE_ID_LAST_CHECKPOINT;
         if ( wrexact(io_fd, &i, sizeof(int)) )
@@ -2119,7 +2147,14 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
      * primary vm and secondary vm now.
      */
     if ( !rc && callbacks->postcopy && callbacks->get_dirty_pfn )
-        callbacks->postcopy(callbacks->data);
+    {
+        if ( !callbacks->postcopy(callbacks->data) )
+        {
+            ERROR("postcopy fails");
+            /* postcopy may be implemented in libxl, no way to get errno */
+            rc = -1;
+        }
+    }
 
     /* Enable compression now, finally */
     compressing = (flags & XCFLAGS_CHECKPOINT_COMPRESS);
@@ -2136,8 +2171,11 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
                                io_fd, dom, &info) )
         {
             ERROR("Domain appears not to have suspended");
+            /* suspend may be implemented in libxl, no way to get errno */
+            errno = -1;
             goto out;
         }
+
         DPRINTF("SUSPEND shinfo %08lx\n", info.shared_info_frame);
         print_stats(xch, dom, 0, &time_stats, &shadow_stats, 1);
 
@@ -2148,6 +2186,16 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
             PERROR("Error flushing shadow PT");
         }
 
+        if ( callbacks->get_dirty_pfn )
+        {
+            if ( update_dirty_bitmap(callbacks->get_dirty_pfn, callbacks->data,
+                                     dinfo->p2m_size, to_send) )
+            {
+                ERROR("getting secondary vm's dirty pages failed");
+                goto out;
+            }
+        }
+
         goto copypages;
     }
 
-- 
1.9.3


* [RFC Patch 21/25] send store mfn and console mfn to xl before resuming secondary vm
  2014-07-18 11:38 [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
                   ` (19 preceding siblings ...)
  2014-07-18 11:39 ` [RFC Patch 20/25] COLO: xc related codes Wen Congyang
@ 2014-07-18 11:39 ` Wen Congyang
  2014-07-18 11:39 ` [RFC Patch 22/25] implement the cmdline for COLO Wen Congyang
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Wen Congyang @ 2014-07-18 11:39 UTC
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

We call libxl__xc_domain_restore_done() to rebuild the secondary VM,
and rebuilding it needs the store mfn and the console mfn. So make
restore_results a function pointer in the callbacks struct and in
struct {save,restore}_callbacks, and use this callback to send the
store mfn and the console mfn to xl.
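The change is about ordering: the mfns must reach xl before the secondary VM is resumed. Below is a minimal sketch of a callback table with such a hook. struct restore_cbs_model, struct toolstack_view and the helpers are hypothetical. Only the restore_results signature comes from the xenguest.h hunk in this patch. The patch calls restore_results unconditionally, whereas this sketch also tolerates a NULL hook.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of struct restore_callbacks. */
struct restore_cbs_model {
    void (*restore_results)(unsigned long store_mfn,
                            unsigned long console_mfn, void *data);
    int (*postcopy)(void *data);   /* resume the secondary VM */
    void *data;
};

/* What the toolstack side might record. */
struct toolstack_view {
    unsigned long store_mfn, console_mfn;
    int have_results;           /* set by restore_results */
    int resumed_with_results;   /* postcopy already had the mfns */
};

static void record_results(unsigned long store_mfn,
                           unsigned long console_mfn, void *data)
{
    struct toolstack_view *v = data;

    v->store_mfn = store_mfn;
    v->console_mfn = console_mfn;
    v->have_results = 1;
}

static int resume_guest(void *data)
{
    struct toolstack_view *v = data;

    /* Rebuilding the secondary VM needs both mfns at this point. */
    v->resumed_with_results = v->have_results;
    return 1;
}

/* Per checkpoint: hand the mfns over first, then resume. */
static int colo_resume_step(const struct restore_cbs_model *cb,
                            unsigned long store_mfn, unsigned long console_mfn)
{
    if (cb->restore_results)
        cb->restore_results(store_mfn, console_mfn, cb->data);
    return cb->postcopy(cb->data);
}
```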

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxc/xc_domain_restore.c    | 2 +-
 tools/libxc/xenguest.h             | 8 ++++++++
 tools/libxl/libxl_colo_restore.c   | 5 -----
 tools/libxl/libxl_create.c         | 1 +
 tools/libxl/libxl_save_msgs_gen.pl | 2 +-
 5 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index 6e025f0..dc66d89 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -2379,7 +2379,7 @@ new_checkpoint:
     } while (0)
         /* COLO */
 
-        /* TODO: call restore_results */
+        callbacks->restore_results(*store_mfn, *console_mfn, callbacks->data);
 
         /* Resume secondary vm */
         frc = callbacks->postcopy(callbacks->data);
diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
index 1aeaad2..be8afd4 100644
--- a/tools/libxc/xenguest.h
+++ b/tools/libxc/xenguest.h
@@ -123,6 +123,14 @@ struct restore_callbacks {
     /* Enable qemu-dm logging dirty pages to xen */
     int (*switch_qemu_logdirty)(int domid, unsigned enable, void *data); /* HVM only */
 
+    /*
+     * callback to send store mfn and console mfn to xl
+     * if we want to resume vm before xc_domain_restore()
+     * exits.
+     */
+    void (*restore_results)(unsigned long store_mfn, unsigned long console_mfn,
+                            void *data);
+
     /* callback to restore toolstack specific data */
     int (*toolstack_restore)(uint32_t domid, const uint8_t *buf,
             uint32_t size, void* data);
diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
index ebbd6b9..aea3feb 100644
--- a/tools/libxl/libxl_colo_restore.c
+++ b/tools/libxl/libxl_colo_restore.c
@@ -133,11 +133,6 @@ static void colo_resume_vm(libxl__egc *egc,
         return;
     }
 
-    /*
-     * TODO: get store mfn and console mfn
-     *  We should call the callback restore_results in
-     *  xc_domain_restore() before resuming the guest.
-     */
     libxl__xc_domain_restore_done(egc, dcs, 0, 0, 0);
 
     return;
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 73708c5..677a98a 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1048,6 +1048,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
         rc = ERROR_INVAL;
         goto out;
     }
+    callbacks->restore_results = libxl__srm_callout_callback_restore_results;
 
     if (checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
         crs->ao = ao;
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index fbb2d67..2ecd25d 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -32,7 +32,7 @@ our @msgs = (
     #                toolstack_save          done entirely `by hand'
     [  7, 'rcxW',   "toolstack_restore",     [qw(uint32_t domid
                                                 BLOCK tsdata)] ],
-    [  8, 'r',      "restore_results",       ['unsigned long', 'store_mfn',
+    [  8, 'rcx',    "restore_results",       ['unsigned long', 'store_mfn',
                                               'unsigned long', 'console_mfn'] ],
     [  9, 'srW',    "complete",              [qw(int retval
                                                  int errnoval)] ],
-- 
1.9.3


* [RFC Patch 22/25] implement the cmdline for COLO
  2014-07-18 11:38 [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
                   ` (20 preceding siblings ...)
  2014-07-18 11:39 ` [RFC Patch 21/25] send store mfn and console mfn to xl before resuming secondary vm Wen Congyang
@ 2014-07-18 11:39 ` Wen Congyang
  2014-07-18 11:39 ` [RFC Patch 23/25] HACK: do checkpoint per 20ms Wen Congyang
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Wen Congyang @ 2014-07-18 11:39 UTC
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

Add a new option -c to the 'xl remus' command. To use COLO HA instead
of Remus HA, pass the -c option.

Update the man page to document the new option of the 'xl remus'
command.

Also add a new option -c to the internal command 'xl migrate-receive'.
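The two checks this adds to main_remus() can be sketched as standalone helpers. First, -c is rejected together with a non-zero -i or with -b. Second, the remote command selects 'xl migrate-receive -c' instead of '-r'. struct remus_opts, check_colo_conflicts() and build_rune() are hypothetical. snprintf() into a caller buffer replaces the patch's asprintf(). The format string here also drops the stray space that the patch leaves when -e is not given.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of the parsed 'xl remus' options. */
struct remus_opts {
    int colo;       /* -c */
    int interval;   /* -i value; cleared under -c when -i was not given */
    int blackhole;  /* -b */
    int daemonize;  /* cleared by -e */
};

/* Returns 0 if the combination is allowed; -c excludes -i and -b. */
static int check_colo_conflicts(const struct remus_opts *o)
{
    if (o->colo && (o->interval > 0 || o->blackhole))
        return -1;
    return 0;
}

/* Builds the remote command line; returns -1 if it does not fit. */
static int build_rune(char *buf, size_t len, const char *ssh_command,
                      const char *host, const struct remus_opts *o)
{
    int n = snprintf(buf, len, "exec %s %s xl migrate-receive %s%s",
                     ssh_command, host, o->colo ? "-c" : "-r",
                     o->daemonize ? "" : " -e");

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The patch clears the interval when -i was not given under -c. As a result, only an explicit -i triggers the conflict.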

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 docs/man/xl.pod.1         |  9 ++++++++-
 tools/libxl/xl_cmdimpl.c  | 47 ++++++++++++++++++++++++++++++++++++++---------
 tools/libxl/xl_cmdtable.c |  3 ++-
 3 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 547e8a2..accdbe5 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -427,12 +427,15 @@ Print huge (!) amount of debug during the migration process.
 
 =item B<remus> [I<OPTIONS>] I<domain-id> I<host>
 
-Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
+Enable Remus HA or COLO HA for domain. By default B<xl> relies on ssh as a transport
 mechanism between the two hosts.
 
 N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
      Disk buffering support is limited to drbd disks.
 
+     COLO support in xl is still in experimental (proof-of-concept) phase.
+     There is no support for network or disk at the moment.
+
 B<OPTIONS>
 
 =over 4
@@ -473,6 +476,10 @@ If empty, run <host> instead of ssh <host> xl migrate-receive -r [-e].
 On the new host, do not wait in the background (on <host>) for the death
 of the domain. See the corresponding option of the I<create> subcommand.
 
+=item B<-c>
+
+Enable COLO HA. It conflicts with B<-i> and B<-b>.
+
 =back
 
 =item B<pause> I<domain-id>
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 22b7964..1839555 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -3710,6 +3710,9 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     dom_info.send_fd = send_fd;
     dom_info.migration_domname_r = &migration_domname;
     dom_info.checkpointed_stream = remus;
+    if (remus == LIBXL_CHECKPOINTED_STREAM_COLO)
+        /* COLO uses stdout to send control messages to the master */
+        dom_info.quiet = 1;
 
     rc = create_domain(&dom_info);
     if (rc < 0) {
@@ -3724,7 +3727,8 @@ static void migrate_receive(int debug, int daemonize, int monitor,
         /* If we are here, it means that the sender (primary) has crashed.
          * TODO: Split-Brain Check.
          */
-        fprintf(stderr, "migration target: Remus Failover for domain %u\n",
+        fprintf(stderr, "migration target: %s Failover for domain %u\n",
+                remus == LIBXL_CHECKPOINTED_STREAM_COLO ? "COLO" : "Remus",
                 domid);
 
         /*
@@ -3741,15 +3745,21 @@ static void migrate_receive(int debug, int daemonize, int monitor,
             rc = libxl_domain_rename(ctx, domid, migration_domname,
                                      common_domname);
             if (rc)
-                fprintf(stderr, "migration target (Remus): "
+                fprintf(stderr, "migration target (%s): "
                         "Failed to rename domain from %s to %s:%d\n",
+                        remus == LIBXL_CHECKPOINTED_STREAM_COLO ? "COLO" : "Remus",
                         migration_domname, common_domname, rc);
         }
 
+        if (remus == LIBXL_CHECKPOINTED_STREAM_COLO)
+            /* The guest is running after failover in COLO mode */
+            exit(rc ? -ERROR_FAIL: 0);
+
         rc = libxl_domain_unpause(ctx, domid);
         if (rc)
-            fprintf(stderr, "migration target (Remus): "
+            fprintf(stderr, "migration target (%s): "
                     "Failed to unpause domain %s (id: %u):%d\n",
+                    remus == LIBXL_CHECKPOINTED_STREAM_COLO ? "COLO" : "Remus",
                     common_domname, domid, rc);
 
         exit(rc ? -ERROR_FAIL: 0);
@@ -3895,7 +3905,7 @@ int main_migrate_receive(int argc, char **argv)
     int debug = 0, daemonize = 1, monitor = 1, remus = 0;
     int opt;
 
-    SWITCH_FOREACH_OPT(opt, "Fedr", NULL, "migrate-receive", 0) {
+    SWITCH_FOREACH_OPT(opt, "Fedrc", NULL, "migrate-receive", 0) {
     case 'F':
         daemonize = 0;
         break;
@@ -3907,8 +3917,10 @@ int main_migrate_receive(int argc, char **argv)
         debug = 1;
         break;
     case 'r':
-        remus = 1;
+        remus = LIBXL_CHECKPOINTED_STREAM_REMUS;
         break;
+    case 'c':
+        remus = LIBXL_CHECKPOINTED_STREAM_COLO;
     }
 
     if (argc-optind != 0) {
@@ -7176,6 +7188,7 @@ int main_remus(int argc, char **argv)
     pid_t child = -1;
     uint8_t *config_data;
     int config_len;
+    int interval = 0;
 
     memset(&r_info, 0, sizeof(libxl_domain_remus_info));
     /* Defaults */
@@ -7185,9 +7198,10 @@ int main_remus(int argc, char **argv)
     r_info.netbuf = 1;
     r_info.diskbuf = 1;
 
-    SWITCH_FOREACH_OPT(opt, "bundi:s:N:e", NULL, "remus", 2) {
+    SWITCH_FOREACH_OPT(opt, "bundi:s:N:ec", NULL, "remus", 2) {
     case 'i':
         r_info.interval = atoi(optarg);
+        interval = 1;
         break;
     case 'b':
         r_info.blackhole = 1;
@@ -7210,11 +7224,23 @@ int main_remus(int argc, char **argv)
     case 'e':
         daemonize = 0;
         break;
+    case 'c':
+        r_info.colo = 1;
     }
 
     domid = find_domain(argv[optind]);
     host = argv[optind + 1];
 
+    if (r_info.colo) {
+        if (!interval)
+            r_info.interval = 0;
+
+        if (r_info.interval + r_info.blackhole > 0) {
+            fprintf(stderr, "option -c conflicts with -i and -b\n");
+            exit(-1);
+        }
+    }
+
     if (!r_info.netbufscript)
         r_info.netbufscript = default_remus_netbufscript;
 
@@ -7229,8 +7255,9 @@ int main_remus(int argc, char **argv)
         if (!ssh_command[0]) {
             rune = host;
         } else {
-            if (asprintf(&rune, "exec %s %s xl migrate-receive -r %s",
+            if (asprintf(&rune, "exec %s %s xl migrate-receive %s %s",
                          ssh_command, host,
+                         r_info.colo ? "-c" : "-r",
                          daemonize ? "" : " -e") < 0)
                 return 1;
         }
@@ -7259,7 +7286,8 @@ int main_remus(int argc, char **argv)
      * domain to force failover
      */
     if (libxl_domain_info(ctx, 0, domid)) {
-        fprintf(stderr, "Remus: Primary domain has been destroyed.\n");
+        fprintf(stderr, "%s: Primary domain has been destroyed.\n",
+                r_info.colo ? "COLO" : "Remus");
         close(send_fd);
         return 0;
     }
@@ -7271,7 +7299,8 @@ int main_remus(int argc, char **argv)
     if (rc == ERROR_GUEST_TIMEDOUT)
         fprintf(stderr, "Failed to suspend domain at primary.\n");
     else {
-        fprintf(stderr, "Remus: Backup failed? resuming domain at primary.\n");
+        fprintf(stderr, "%s: Backup failed? resuming domain at primary.\n",
+                r_info.colo ? "COLO" : "Remus");
         libxl_domain_resume(ctx, domid, 1, 0);
     }
 
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 246aa11..b91b638 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -495,7 +495,8 @@ struct cmd_spec cmd_table[] = {
       "                        to sh. If empty, run <host> instead of \n"
       "                        ssh <host> xl migrate-receive -r [-e]\n"
       "-e                      Do not wait in the background (on <host>) for the death\n"
-      "                        of the domain."
+      "                        of the domain.\n"
+      "-c                      Enable COLO HA. It conflicts with -i and -b"
     },
 #endif
     { "devd",
-- 
1.9.3


* [RFC Patch 23/25] HACK: do checkpoint per 20ms
  2014-07-18 11:38 [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
                   ` (21 preceding siblings ...)
  2014-07-18 11:39 ` [RFC Patch 22/25] implement the cmdline for COLO Wen Congyang
@ 2014-07-18 11:39 ` Wen Congyang
  2014-07-18 11:39 ` [RFC Patch 24/25] fix vm entry fail Wen Congyang
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Wen Congyang @ 2014-07-18 11:39 UTC
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_colo_save.c | 19 ++++++++++++++++++-
 tools/libxl/libxl_internal.h  |  3 +++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_colo_save.c b/tools/libxl/libxl_colo_save.c
index aef6f97..3118f5d 100644
--- a/tools/libxl/libxl_colo_save.c
+++ b/tools/libxl/libxl_colo_save.c
@@ -43,6 +43,8 @@ void libxl__colo_save_setup(libxl__egc *egc, libxl__colo_save_state *css)
     css->cds.callback = colo_save_setup_done;
     css->svm_running = false;
 
+    libxl__ev_time_init(&css->timeout);
+
     libxl__checkpoint_devices_setup(egc, &css->cds);
 }
 
@@ -450,6 +452,8 @@ out:
 static void colo_device_commit_cb(libxl__egc *egc,
                                   libxl__checkpoint_device_state *cds,
                                   int rc);
+static void colo_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs);
 static void colo_start_new_checkpoint(libxl__egc *egc,
                                       libxl__checkpoint_device_state *cds,
                                       int rc);
@@ -485,13 +489,26 @@ static void colo_device_commit_cb(libxl__egc *egc,
     }
 
     /* TODO: wait a new checkpoint */
-    colo_start_new_checkpoint(egc, cds, 0);
+    rc = libxl__ev_time_register_rel(gc, &css->timeout,
+                                     colo_next_checkpoint,
+                                     20);
+    if (rc)
+        goto out;
+
     return;
 
 out:
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
 }
 
+static void colo_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(ev, *css, timeout);
+
+    colo_start_new_checkpoint(egc, &css->cds, 0);
+}
+
 static void colo_start_new_checkpoint(libxl__egc *egc,
                                       libxl__checkpoint_device_state *cds,
                                       int rc)
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 20f7da8..2fafe1c 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2707,6 +2707,9 @@ struct libxl__colo_save_state {
     uint8_t temp_buff[9];
     void (*callback)(libxl__egc *, libxl__colo_save_state *);
     bool svm_running;
+
+    /* hack */
+    libxl__ev_time timeout;
 };
 
 /*----- Domain suspend (save) state structure -----*/
-- 
1.9.3

* [RFC Patch 24/25] fix vm entry fail
  2014-07-18 11:38 [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
                   ` (22 preceding siblings ...)
  2014-07-18 11:39 ` [RFC Patch 23/25] HACK: do checkpoint per 20ms Wen Congyang
@ 2014-07-18 11:39 ` Wen Congyang
  2014-07-24 10:40   ` Tim Deegan
  2014-07-18 11:39 ` [RFC Patch 25/25] sync mmu before resuming secondary vm Wen Congyang
                   ` (3 subsequent siblings)
  27 siblings, 1 reply; 36+ messages in thread
From: Wen Congyang @ 2014-07-18 11:39 UTC (permalink / raw)
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

In COLO mode, the secondary VM is running, so VM_ENTRY_INTR_INFO may
be valid before the VMCS is restored. If there is no pending event
after restoring the VM, we should clear it.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 2caa04a..eb73412 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -526,6 +526,13 @@ static int vmx_vmcs_restore(struct vcpu *v, struct hvm_hw_cpu *c)
             vmx_vmcs_exit(v);
         }
     }
+    else
+    {
+        vmx_vmcs_enter(v);
+        __vmwrite(VM_ENTRY_INTR_INFO, 0);
+        __vmwrite(VM_ENTRY_EXCEPTION_ERROR_CODE, 0);
+        vmx_vmcs_exit(v);
+    }
 
     return 0;
 }
-- 
1.9.3

* [RFC Patch 25/25] sync mmu before resuming secondary vm
  2014-07-18 11:38 [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
                   ` (23 preceding siblings ...)
  2014-07-18 11:39 ` [RFC Patch 24/25] fix vm entry fail Wen Congyang
@ 2014-07-18 11:39 ` Wen Congyang
  2014-07-24 10:59   ` Tim Deegan
  2014-07-18 11:39 ` [RFC Patch 26/25] Introduce "xen-load-devices-state" Wen Congyang
                   ` (2 subsequent siblings)
  27 siblings, 1 reply; 36+ messages in thread
From: Wen Congyang @ 2014-07-18 11:39 UTC (permalink / raw)
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

In our tests, we found that the secondary VM would bluescreen due to a
memory-related problem. If we sync the MMU, the problem disappears.

TODO: only VMX+EPT is implemented so far.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxc/xc_domain.c            |  9 +++++++++
 tools/libxc/xenctrl.h              |  2 ++
 tools/libxl/libxl_colo_restore.c   |  6 +++++-
 xen/arch/x86/domctl.c              | 15 +++++++++++++++
 xen/arch/x86/hvm/save.c            |  6 ++++++
 xen/arch/x86/hvm/vmx/vmcs.c        |  8 ++++++++
 xen/arch/x86/hvm/vmx/vmx.c         |  1 +
 xen/include/asm-x86/hvm/hvm.h      |  1 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |  1 +
 xen/include/public/domctl.h        |  1 +
 xen/include/xen/hvm/save.h         |  2 ++
 11 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 0230c6c..0b47bdd 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -2123,6 +2123,15 @@ int xc_domain_set_max_evtchn(xc_interface *xch, uint32_t domid,
     return do_domctl(xch, &domctl);
 }
 
+int xc_domain_hvm_sync_mmu(xc_interface *xch, uint32_t domid)
+{
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_hvm_sync_mmu;
+    domctl.domain = domid;
+    return do_domctl(xch, &domctl);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 3578b09..a83364a 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -961,6 +961,8 @@ int xc_domain_set_virq_handler(xc_interface *xch, uint32_t domid, int virq);
 int xc_domain_set_max_evtchn(xc_interface *xch, uint32_t domid,
                              uint32_t max_port);
 
+int xc_domain_hvm_sync_mmu(xc_interface *xch, uint32_t domid);
+
 /*
  * CPUPOOL MANAGEMENT FUNCTIONS
  */
diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
index aea3feb..730b492 100644
--- a/tools/libxl/libxl_colo_restore.c
+++ b/tools/libxl/libxl_colo_restore.c
@@ -124,11 +124,15 @@ static void colo_resume_vm(libxl__egc *egc,
     STATE_AO_GC(crs->ao);
 
     if (!crs->saved_cb) {
-        /* TODO: sync mmu for hvm? */
+        rc = xc_domain_hvm_sync_mmu(CTX->xch, crs->domid);
+        if (rc)
+            goto fail;
+
         rc = libxl__domain_resume(gc, crs->domid, 0, 1);
         if (rc)
             LOG(ERROR, "cannot resume secondary vm");
 
+fail:
         crcs->callback(egc, crcs, rc);
         return;
     }
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index d62c715..d0dfad7 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1395,6 +1395,21 @@ long arch_do_domctl(
     }
     break;
 
+    case XEN_DOMCTL_hvm_sync_mmu:
+    {
+        struct domain *d;
+
+        ret = -ESRCH;
+        d = rcu_lock_domain_by_id(domctl->domain);
+        if ( d != NULL )
+        {
+            arch_hvm_sync_mmu(d);
+            rcu_unlock_domain(d);
+            ret = 0;
+        }
+    }
+    break;
+
     default:
         ret = iommu_do_domctl(domctl, d, u_domctl);
         break;
diff --git a/xen/arch/x86/hvm/save.c b/xen/arch/x86/hvm/save.c
index 6af19be..7a07ebf 100644
--- a/xen/arch/x86/hvm/save.c
+++ b/xen/arch/x86/hvm/save.c
@@ -79,6 +79,12 @@ int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr)
     return 0;
 }
 
+void arch_hvm_sync_mmu(struct domain *d)
+{
+    if (hvm_funcs.sync_mmu)
+        hvm_funcs.sync_mmu(d);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 8ffc562..4be9b4d 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -596,6 +596,14 @@ void vmx_cpu_down(void)
     local_irq_restore(flags);
 }
 
+void vmx_sync_mmu(struct domain *d)
+{
+    ept_sync_domain(p2m_get_hostp2m(d));
+
+    /* flush tlb */
+    flush_all(FLUSH_TLB_GLOBAL);
+}
+
 struct foreign_vmcs {
     struct vcpu *v;
     unsigned int count;
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index eb73412..b46b4dd 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1719,6 +1719,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .event_pending        = vmx_event_pending,
     .cpu_up               = vmx_cpu_up,
     .cpu_down             = vmx_cpu_down,
+    .sync_mmu             = vmx_sync_mmu,
     .cpuid_intercept      = vmx_cpuid_intercept,
     .wbinvd_intercept     = vmx_wbinvd_intercept,
     .fpu_dirty_intercept  = vmx_fpu_dirty_intercept,
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 0ebd478..b4f89a7 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -151,6 +151,7 @@ struct hvm_function_table {
 
     int  (*cpu_up)(void);
     void (*cpu_down)(void);
+    void (*sync_mmu)(struct domain *d);
 
     /* Copy up to 15 bytes from cached instruction bytes at current rIP. */
     unsigned int (*get_insn_bytes)(struct vcpu *v, uint8_t *buf);
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 215d93c..664741a 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -29,6 +29,7 @@ extern int  vmx_cpu_up_prepare(unsigned int cpu);
 extern void vmx_cpu_dead(unsigned int cpu);
 extern int  vmx_cpu_up(void);
 extern void vmx_cpu_down(void);
+extern void vmx_sync_mmu(struct domain *d);
 extern void vmx_save_host_msrs(void);
 
 struct vmcs_struct {
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 5b11bbf..11e5a26 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1008,6 +1008,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_cacheflush                    71
 #define XEN_DOMCTL_get_vcpu_msrs                 72
 #define XEN_DOMCTL_set_vcpu_msrs                 73
+#define XEN_DOMCTL_hvm_sync_mmu                  74
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
diff --git a/xen/include/xen/hvm/save.h b/xen/include/xen/hvm/save.h
index ae6f0bb..049fdb8 100644
--- a/xen/include/xen/hvm/save.h
+++ b/xen/include/xen/hvm/save.h
@@ -135,4 +135,6 @@ struct hvm_save_header;
 void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr);
 int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr);
 
+void arch_hvm_sync_mmu(struct domain *d);
+
 #endif /* __XEN_HVM_SAVE_H__ */
-- 
1.9.3

* [RFC Patch 26/25] Introduce "xen-load-devices-state"
  2014-07-18 11:38 [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
                   ` (24 preceding siblings ...)
  2014-07-18 11:39 ` [RFC Patch 25/25] sync mmu before resuming secondary vm Wen Congyang
@ 2014-07-18 11:39 ` Wen Congyang
  2014-07-18 11:43 ` [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
  2014-07-18 14:18 ` Andrew Cooper
  27 siblings, 0 replies; 36+ messages in thread
From: Wen Congyang @ 2014-07-18 11:39 UTC (permalink / raw)
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

Introduce a "xen-load-devices-state" QAPI command that can be used to load
the state of all devices, but not the RAM or the block devices, of the
VM.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 qapi-schema.json |  18 ++++++++
 qmp-commands.hx  |  27 ++++++++++++
 savevm.c         | 126 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 171 insertions(+)

diff --git a/qapi-schema.json b/qapi-schema.json
index 391356f..c569856 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -4689,3 +4689,21 @@
               'btn'     : 'InputBtnEvent',
               'rel'     : 'InputMoveEvent',
               'abs'     : 'InputMoveEvent' } }
+
+##
+# @xen-load-devices-state:
+#
+# Load the state of all devices from file. The RAM and the block devices
+# of the VM are not loaded by this command.
+#
+# @filename: the file to load the state of the devices from as binary
+# data. See xen-save-devices-state.txt for a description of the binary
+# format.
+#
+# Returns: Nothing on success
+#          If @filename cannot be opened, OpenFileFailed
+#          If an I/O error occurs while reading the file, IOError
+#
+# Since: 2.0
+##
+{ 'command': 'xen-load-devices-state', 'data': {'filename': 'str'} }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index ed3ab92..b796be5 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -586,6 +586,33 @@ Example:
 EQMP
 
     {
+        .name       = "xen-load-devices-state",
+        .args_type  = "filename:F",
+        .mhandler.cmd_new = qmp_marshal_input_xen_load_devices_state,
+    },
+
+SQMP
+xen-load-devices-state
+-------
+
+Load the state of all devices from file. The RAM and the block devices
+of the VM are not loaded by this command.
+
+Arguments:
+
+- "filename": the file to load the state of the devices from as binary
+data. See xen-save-devices-state.txt for a description of the binary
+format.
+
+Example:
+
+-> { "execute": "xen-load-devices-state",
+     "arguments": { "filename": "/tmp/resume" } }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "xen-set-global-dirty-log",
         .args_type  = "enable:b",
         .mhandler.cmd_new = qmp_marshal_input_xen_set_global_dirty_log,
diff --git a/savevm.c b/savevm.c
index 22123be..c6aa502 100644
--- a/savevm.c
+++ b/savevm.c
@@ -863,6 +863,105 @@ out:
     return ret;
 }
 
+static int qemu_load_devices_state(QEMUFile *f)
+{
+    uint8_t section_type;
+    unsigned int v;
+    int ret;
+
+    if (qemu_savevm_state_blocked(NULL)) {
+        return -EINVAL;
+    }
+
+    v = qemu_get_be32(f);
+    if (v != QEMU_VM_FILE_MAGIC) {
+        return -EINVAL;
+    }
+
+    v = qemu_get_be32(f);
+    if (v == QEMU_VM_FILE_VERSION_COMPAT) {
+        fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
+        return -ENOTSUP;
+    }
+    if (v != QEMU_VM_FILE_VERSION) {
+        return -ENOTSUP;
+    }
+
+    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
+        uint32_t instance_id, version_id, section_id;
+        SaveStateEntry *se;
+        char idstr[257];
+        int len;
+
+        switch (section_type) {
+        case QEMU_VM_SECTION_FULL:
+            /* Read section start */
+            section_id = qemu_get_be32(f);
+            len = qemu_get_byte(f);
+            qemu_get_buffer(f, (uint8_t *)idstr, len);
+            idstr[len] = 0;
+            instance_id = qemu_get_be32(f);
+            version_id = qemu_get_be32(f);
+
+            /* Find savevm section */
+            se = find_se(idstr, instance_id);
+            if (se == NULL) {
+                fprintf(stderr, "Unknown savevm section or instance '%s' %d\n",
+                        idstr, instance_id);
+                ret = -EINVAL;
+                goto out;
+            }
+
+            /* Validate version */
+            if (version_id > se->version_id) {
+                fprintf(stderr, "loadvm: unsupported version %d for '%s' v%d\n",
+                        version_id, idstr, se->version_id);
+                ret = -EINVAL;
+                goto out;
+            }
+
+            /* Validate if it is a device's state */
+            if (se->is_ram) {
+                fprintf(stderr, "loadvm: %s is not devices state\n", idstr);
+                ret = -EINVAL;
+                goto out;
+            }
+
+            ret = vmstate_load(f, se, version_id);
+            if (ret < 0) {
+                fprintf(stderr, "qemu: warning: error while loading state for instance 0x%x of device '%s'\n",
+                        instance_id, idstr);
+                goto out;
+            }
+            break;
+        case QEMU_VM_SECTION_START:
+        case QEMU_VM_SECTION_PART:
+        case QEMU_VM_SECTION_END:
+            /*
+             * The file is saved by the xen-save-devices-state command,
+             * so it should not contain section start/part/end.
+             */
+        default:
+            fprintf(stderr, "Unknown savevm section type %d\n", section_type);
+            ret = -EINVAL;
+            goto out;
+        }
+    }
+
+    cpu_synchronize_all_post_init();
+
+    ret = 0;
+
+out:
+    if (ret == 0) {
+        if (qemu_file_get_error(f)) {
+            ret = -EIO;
+        }
+    }
+
+    return ret;
+}
+
 static BlockDriverState *find_vmstate_bs(void)
 {
     BlockDriverState *bs = NULL;
@@ -1027,6 +1126,33 @@ void qmp_xen_save_devices_state(const char *filename, Error **errp)
     }
 }
 
+void qmp_xen_load_devices_state(const char *filename, Error **errp)
+{
+    QEMUFile *f;
+    int saved_vm_running;
+    int ret;
+
+    saved_vm_running = runstate_is_running();
+    vm_stop(RUN_STATE_RESTORE_VM);
+
+    f = qemu_fopen(filename, "rb");
+    if (!f) {
+        error_setg_file_open(errp, errno, filename);
+        goto out;
+    }
+
+    ret = qemu_load_devices_state(f);
+    qemu_fclose(f);
+    if (ret < 0) {
+        error_set(errp, QERR_IO_ERROR);
+    }
+
+out:
+    if (saved_vm_running) {
+        vm_start();
+    }
+}
+
 int load_vmstate(const char *name)
 {
     BlockDriverState *bs, *bs_vm_state;
-- 
1.9.0

* Re: [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
  2014-07-18 11:38 [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
                   ` (25 preceding siblings ...)
  2014-07-18 11:39 ` [RFC Patch 26/25] Introduce "xen-load-devices-state" Wen Congyang
@ 2014-07-18 11:43 ` Wen Congyang
  2014-07-18 14:18 ` Andrew Cooper
  27 siblings, 0 replies; 36+ messages in thread
From: Wen Congyang @ 2014-07-18 11:43 UTC (permalink / raw)
  To: xen devel
  Cc: Ian Campbell, Wen congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

At 07/18/2014 07:38 PM, Wen Congyang Wrote:
> Virtual machine (VM) replication is a well known technique for providing
> application-agnostic software-implemented hardware fault tolerance -
> "non-stop service". Currently, remus provides this function, but it buffers
> all output packets, and the latency is unacceptable.
> 
> In xen summit 2012, We introduce a new VM replication solution: colo
> (COarse-grain LOck-stepping virtual machine). The presentation is in
> the following URL:
> http://www.slideshare.net/xen_com_mgr/colo-coarsegrain-lockstepping-virtual-machines-for-nonstop-service
> 
> Here is the summary of the solution:
>>From the client's point of view, as long as the client observes identical
> responses from the primary and secondary VMs, according to the service
> semantics, then the secondary vm is a valid replica of the primary
> vm, and can successfully take over when a hardware failure of the
> primary vm is detected.
> 
> This patchset is RFC, and implements the frame of colo:
> 1. Both primary vm and secondary vm are running
> 2. do checkoint
> 
> This patchset is based on remus-v15, and use migration v1. Only supports hvm
> guest now.
> 
> TODO list:
> 1. rebase to remus-v17 or newer
> 2. support migration v2
> 3. nic/disk replication
> 4. support pvm
> 
> Patch 1-3: bugfix
> Patch 4-6: temporarily update remus to reuse remus device codes
> Patch 7-14: update some APIs which will be used by colo
> Patch 15-22: colo related codes
> Patch 23: Hack patch, just for test
> Patch 24-25: bugfix. We find this bug before rebasing colo to newest xen.
>           But we don't trigger this bug now.
> Patch 26: A patch for qemu-xen

I have also put the code on GitHub:
https://github.com/wencongyang/xen/tree/colo

> 
> Hong Tao (1):
>   copy the correct page to memory
> 
> Wen Congyang (24):
>   csum the correct page
>   don't zero out ioreq page
>   don't touch remus in remus_device
>   rename remus device to checkpoint device
>   adjust the indentation
>   Refactor domain_suspend_callback_common()
>   Update libxl__domain_resume() for colo
>   Update libxl__domain_suspend_common_switch_qemu_logdirty() for colo
>   Introduce a new internal API libxl__domain_unpause()
>   Update libxl__domain_unpause() to support qemu-xen
>   support to resume uncooperative HVM guests
>   update datecopier to support sending data only
>   introduce a new API to aync read data from fd
>   Update libxl_save_msgs_gen.pl to support return data from xl to xc
>   Allow slave sends data to master
>   secondary vm suspend/resume/checkpoint code
>   primary vm suspend/get_dirty_pfn/resume/checkpoint code
>   xc_domain_save: flush cache before calling callbacks->postcopy() in
>     colo mode
>   COLO: xc related codes
>   send store mfn and console mfn to xl before resuming secondary vm
>   implement the cmdline for COLO
>   HACK: do checkpoint per 20ms
>   fix vm entry fail
>   sync mmu before resuming secondary vm
> 
>  docs/man/xl.pod.1                                  |   9 +-
>  tools/libxc/xc_domain.c                            |   9 +
>  tools/libxc/xc_domain_restore.c                    |  74 +-
>  tools/libxc/xc_domain_save.c                       |  66 +-
>  tools/libxc/xc_resume.c                            |  20 +-
>  tools/libxc/xenctrl.h                              |   2 +
>  tools/libxc/xenguest.h                             |  40 +
>  tools/libxl/Makefile                               |   3 +-
>  tools/libxl/libxl.c                                | 102 ++-
>  tools/libxl/libxl.h                                |   3 +-
>  tools/libxl/libxl_aoutils.c                        |  81 +-
>  ...xl_remus_device.c => libxl_checkpoint_device.c} | 266 ++++---
>  tools/libxl/libxl_colo.h                           |  48 ++
>  tools/libxl/libxl_colo_restore.c                   | 882 +++++++++++++++++++++
>  tools/libxl/libxl_colo_save.c                      | 602 ++++++++++++++
>  tools/libxl/libxl_create.c                         | 131 ++-
>  tools/libxl/libxl_dom.c                            | 424 ++++++----
>  tools/libxl/libxl_internal.h                       | 262 ++++--
>  tools/libxl/libxl_netbuffer.c                      |  85 +-
>  tools/libxl/libxl_nonetbuffer.c                    |  14 +-
>  tools/libxl/libxl_qmp.c                            |  10 +
>  tools/libxl/libxl_remus_disk_drbd.c                |  54 +-
>  tools/libxl/libxl_save_callout.c                   |  37 +-
>  tools/libxl/libxl_save_helper.c                    |  17 +
>  tools/libxl/libxl_save_msgs_gen.pl                 |  74 +-
>  tools/libxl/libxl_types.idl                        |  12 +-
>  tools/libxl/xl_cmdimpl.c                           |  54 +-
>  tools/libxl/xl_cmdtable.c                          |   3 +-
>  xen/arch/x86/domctl.c                              |  15 +
>  xen/arch/x86/hvm/save.c                            |   6 +
>  xen/arch/x86/hvm/vmx/vmcs.c                        |   8 +
>  xen/arch/x86/hvm/vmx/vmx.c                         |   8 +
>  xen/include/asm-x86/hvm/hvm.h                      |   1 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h                 |   1 +
>  xen/include/public/domctl.h                        |   1 +
>  xen/include/xen/hvm/save.h                         |   2 +
>  36 files changed, 2895 insertions(+), 531 deletions(-)
>  rename tools/libxl/{libxl_remus_device.c => libxl_checkpoint_device.c} (47%)
>  create mode 100644 tools/libxl/libxl_colo.h
>  create mode 100644 tools/libxl/libxl_colo_restore.c
>  create mode 100644 tools/libxl/libxl_colo_save.c
> 

* Re: [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
  2014-07-18 11:38 [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
                   ` (26 preceding siblings ...)
  2014-07-18 11:43 ` [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
@ 2014-07-18 14:18 ` Andrew Cooper
  2014-07-18 14:30   ` Wen Congyang
  27 siblings, 1 reply; 36+ messages in thread
From: Andrew Cooper @ 2014-07-18 14:18 UTC (permalink / raw)
  To: Wen Congyang, xen devel
  Cc: Lai Jiangshan, Ian Jackson, Jiang Yunhong, Dong Eddie,
	Yang Hongyang, Ian Campbell

On 18/07/14 12:38, Wen Congyang wrote:
> Virtual machine (VM) replication is a well known technique for providing
> application-agnostic software-implemented hardware fault tolerance -
> "non-stop service". Currently, remus provides this function, but it buffers
> all output packets, and the latency is unacceptable.
>
> In xen summit 2012, We introduce a new VM replication solution: colo
> (COarse-grain LOck-stepping virtual machine). The presentation is in
> the following URL:
> http://www.slideshare.net/xen_com_mgr/colo-coarsegrain-lockstepping-virtual-machines-for-nonstop-service
>
> Here is the summary of the solution:
> >From the client's point of view, as long as the client observes identical
> responses from the primary and secondary VMs, according to the service
> semantics, then the secondary vm is a valid replica of the primary
> vm, and can successfully take over when a hardware failure of the
> primary vm is detected.
>
> This patchset is RFC, and implements the frame of colo:
> 1. Both primary vm and secondary vm are running
> 2. do checkoint
>
> This patchset is based on remus-v15, and use migration v1. Only supports hvm
> guest now.
>
> TODO list:
> 1. rebase to remus-v17 or newer
> 2. support migration v2
> 3. nic/disk replication
> 4. support pvm

This is an interesting set of patches.  One query I have is what you
mean by "support migration v2".

While I am developing migration v2, the old legacy code is left in place
for easier testing and development purposes, but when the migration v2
code does finally get committed, the legacy migration code will be
deleted.

The legacy code is unfit for purpose and is entirely replaced by v2, in a
backwards-compatible manner given the included conversion script (not
that that matters for Remus/colo).

~Andrew

* Re: [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
  2014-07-18 14:18 ` Andrew Cooper
@ 2014-07-18 14:30   ` Wen Congyang
  0 siblings, 0 replies; 36+ messages in thread
From: Wen Congyang @ 2014-07-18 14:30 UTC (permalink / raw)
  To: Andrew Cooper, Wen Congyang, xen devel
  Cc: Ian Campbell, Ian Jackson, Jiang Yunhong, Dong Eddie,
	Yang Hongyang, Lai Jiangshan

At 2014/7/18 22:18, Andrew Cooper Wrote:
> On 18/07/14 12:38, Wen Congyang wrote:
>> Virtual machine (VM) replication is a well known technique for providing
>> application-agnostic software-implemented hardware fault tolerance -
>> "non-stop service". Currently, remus provides this function, but it buffers
>> all output packets, and the latency is unacceptable.
>>
>> In xen summit 2012, We introduce a new VM replication solution: colo
>> (COarse-grain LOck-stepping virtual machine). The presentation is in
>> the following URL:
>> http://www.slideshare.net/xen_com_mgr/colo-coarsegrain-lockstepping-virtual-machines-for-nonstop-service
>>
>> Here is the summary of the solution:
>> >From the client's point of view, as long as the client observes identical
>> responses from the primary and secondary VMs, according to the service
>> semantics, then the secondary vm is a valid replica of the primary
>> vm, and can successfully take over when a hardware failure of the
>> primary vm is detected.
>>
>> This patchset is RFC, and implements the frame of colo:
>> 1. Both primary vm and secondary vm are running
>> 2. do checkoint
>>
>> This patchset is based on remus-v15, and use migration v1. Only supports hvm
>> guest now.
>>
>> TODO list:
>> 1. rebase to remus-v17 or newer
>> 2. support migration v2
>> 3. nic/disk replication
>> 4. support pvm
>
> This is an interesting set of patches.  One query I have is what you
> mean by "support migration v2".
>
> While I am developing migration v2, the old legacy code is left in place
> for easier testing and development purposes, but when the migration v2
> code does finally get committed, the legacy migration code will be
> deleted.
>
> The legacy code is unfit for purpose and entirely replaced by v2, in a
> backwards-compatible manor given the included conversion script (not
> that that matters for Remus/colo).


We use migration to implement Remus/COLO. If migration v2 is merged into
upstream Xen, I think I will need to add code to migration v2 so that COLO
can work with migration v2.

Thanks
Wen Congyang

>
> ~Andrew
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>

* Re: [RFC Patch 24/25] fix vm entry fail
  2014-07-18 11:39 ` [RFC Patch 24/25] fix vm entry fail Wen Congyang
@ 2014-07-24 10:40   ` Tim Deegan
  2014-07-25  5:39     ` Wen Congyang
  2014-08-07  6:52     ` Wen Congyang
  0 siblings, 2 replies; 36+ messages in thread
From: Tim Deegan @ 2014-07-24 10:40 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Ian Campbell, Ian Jackson, Jiang Yunhong, Dong Eddie, xen devel,
	Yang Hongyang, Lai Jiangshan

Hi,

At 19:39 +0800 on 18 Jul (1405708749), Wen Congyang wrote:
> In colo mode, secondary vm is running, so VM_ENTRY_INTR_INFO may
> valid before restoring vmcs. If there is no pending event after
> restoring vm, we should clear it.

> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>

This looks correct but incomplete -- we should also clear this state for
the other cases where we don't explicitly set it.  I've done this
below, and also copied the fix to the equivalent SVM code.  Can you
test that it works for you?

Cheers,

Tim.

commit c9e81a06c02ffc45594798616409335fc09cd32f
Author: Wen Congyang <wency@cn.fujitsu.com>
Date:   Fri Jul 18 19:39:09 2014 +0800

    x86/hvm: Always set pending event injection when loading VMC[BS] state.
    
    In COLO mode, the secondary VM is running, so VM_ENTRY_INTR_INFO may
    be valid before the VMCS is restored. If there is no pending event
    after restoring the VM, we should clear it.
    
    Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
    
    Also clear pending software exceptions.
    Copy the fix to SVM as well.
    
    Signed-off-by: Tim Deegan <tim@xen.org>

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 76616ac..6551b38 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -321,16 +321,18 @@ static int svm_vmcb_restore(struct vcpu *v, struct hvm_hw_cpu *c)
         vmcb_set_h_cr3(vmcb, pagetable_get_paddr(p2m_get_pagetable(p2m)));
     }
 
-    if ( c->pending_valid ) 
+    if ( c->pending_valid
+         && hvm_event_needs_reinjection(c->pending_type, c->pending_vector) )
     {
         gdprintk(XENLOG_INFO, "Re-injecting %#"PRIx32", %#"PRIx32"\n",
                  c->pending_event, c->error_code);
-
-        if ( hvm_event_needs_reinjection(c->pending_type, c->pending_vector) )
-        {
-            vmcb->eventinj.bytes = c->pending_event;
-            vmcb->eventinj.fields.errorcode = c->error_code;
-        }
+        vmcb->eventinj.bytes = c->pending_event;
+        vmcb->eventinj.fields.errorcode = c->error_code;
+    }
+    else
+    {
+        vmcb->eventinj.bytes = 0;
+        vmcb->eventinj.fields.errorcode = 0;
     }
 
     vmcb->cleanbits.bytes = 0;
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 2caa04a..cfc4801 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -509,23 +509,22 @@ static int vmx_vmcs_restore(struct vcpu *v, struct hvm_hw_cpu *c)
 
     __vmwrite(GUEST_DR7, c->dr7);
 
-    vmx_vmcs_exit(v);
-
-    paging_update_paging_modes(v);
-
-    if ( c->pending_valid )
+    if ( c->pending_valid
+         && hvm_event_needs_reinjection(c->pending_type, c->pending_vector) )
     {
         gdprintk(XENLOG_INFO, "Re-injecting %#"PRIx32", %#"PRIx32"\n",
                  c->pending_event, c->error_code);
-
-        if ( hvm_event_needs_reinjection(c->pending_type, c->pending_vector) )
-        {
-            vmx_vmcs_enter(v);
-            __vmwrite(VM_ENTRY_INTR_INFO, c->pending_event);
-            __vmwrite(VM_ENTRY_EXCEPTION_ERROR_CODE, c->error_code);
-            vmx_vmcs_exit(v);
-        }
+        __vmwrite(VM_ENTRY_INTR_INFO, c->pending_event);
+        __vmwrite(VM_ENTRY_EXCEPTION_ERROR_CODE, c->error_code);
     }
+    else
+    {
+        __vmwrite(VM_ENTRY_INTR_INFO, 0);
+        __vmwrite(VM_ENTRY_EXCEPTION_ERROR_CODE, 0);
+    }
+    vmx_vmcs_exit(v);
+
+    paging_update_paging_modes(v);
 
     return 0;
 }
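
The following is a minimal model of the restore logic in the diff above, for illustration only. It is not Xen code. `struct saved_cpu`, `struct fake_vmcs`, `needs_reinjection()`, `restore_old()` and `restore_new()` are hypothetical stand-ins for `struct hvm_hw_cpu`, the VM_ENTRY_INTR_INFO / VM_ENTRY_EXCEPTION_ERROR_CODE fields and `hvm_event_needs_reinjection()`. The re-injection rule is simplified to "software events are not re-injected". The model assumes the fields still hold an event left over from the secondary VM's previous run, which is the COLO situation the patch describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified event classes (not Xen's X86_EVENTTYPE_* values). */
enum event_type { EV_EXT_INTR, EV_NMI, EV_HW_EXCEPTION, EV_SW_EXCEPTION };

/* Stand-in for the pending-event part of struct hvm_hw_cpu. */
struct saved_cpu {
    bool pending_valid;
    enum event_type pending_type;
    uint32_t pending_event;
    uint32_t error_code;
};

/* Stand-in for VM_ENTRY_INTR_INFO and VM_ENTRY_EXCEPTION_ERROR_CODE. */
struct fake_vmcs {
    uint32_t entry_intr_info;
    uint32_t entry_error_code;
};

/* Simplified rule: software events are re-executed by the guest rather
 * than re-injected; everything else must be re-injected. */
static bool needs_reinjection(enum event_type type)
{
    return type != EV_SW_EXCEPTION;
}

/* Pre-patch behaviour: the fields are only written when an event has to
 * be re-injected, so a value left by the running secondary VM survives. */
static void restore_old(struct fake_vmcs *vmcs, const struct saved_cpu *c)
{
    assert(vmcs != NULL && c != NULL);
    if (c->pending_valid && needs_reinjection(c->pending_type)) {
        vmcs->entry_intr_info = c->pending_event;
        vmcs->entry_error_code = c->error_code;
    }
}

/* Post-patch behaviour: the fields are always written, cleared otherwise. */
static void restore_new(struct fake_vmcs *vmcs, const struct saved_cpu *c)
{
    assert(vmcs != NULL && c != NULL);
    if (c->pending_valid && needs_reinjection(c->pending_type)) {
        vmcs->entry_intr_info = c->pending_event;
        vmcs->entry_error_code = c->error_code;
    } else {
        vmcs->entry_intr_info = 0;
        vmcs->entry_error_code = 0;
    }
}
```

Suppose `fake_vmcs` holds a stale event and the saved state has no valid pending event, or only a software exception. In that case `restore_old()` leaves the stale event in place, while `restore_new()` zeroes both fields. This mirrors what the patch does in `vmx_vmcs_restore()` and `svm_vmcb_restore()`. The patch also moves the VMX writes before `vmx_vmcs_exit(v)`, so they happen while the VMCS is still loaded instead of needing a second `vmx_vmcs_enter()`.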


* Re: [RFC Patch 25/25] sync mmu before resuming secondary vm
  2014-07-18 11:39 ` [RFC Patch 25/25] sync mmu before resuming secondary vm Wen Congyang
@ 2014-07-24 10:59   ` Tim Deegan
  2014-07-25  5:46     ` Wen Congyang
  2014-08-07  7:46     ` Wen Congyang
  0 siblings, 2 replies; 36+ messages in thread
From: Tim Deegan @ 2014-07-24 10:59 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Ian Campbell, Ian Jackson, Jiang Yunhong, Dong Eddie, xen devel,
	Yang Hongyang, Lai Jiangshan

At 19:39 +0800 on 18 Jul (1405708750), Wen Congyang wrote:
> In our test, we found that the secondary VM blue-screens due to a
> memory-related problem. If we sync the MMU, the problem disappears.

Erk.  Do you understand _why_ this happens?  Do you need both the TLB
flush and the EPT flush to fix it?  

The TLB flush sounds plausible because the migration may have changed
some pagetables and you've elided any TLB flushes that happened on the
source.  For that case I think I'd prefer to make the TLB flush
implicit in the HVM load operation, e.g., by putting a call to
hvm_asid_flush_vcpu(v) into hvm_load_cpu_ctxt() (on the grounds that the
TLB is part of the vcpu state).
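
Here is a toy model of that suggestion. It assumes a tagged-TLB scheme in which a per-vCPU flush just invalidates the vCPU's current ASID. A fresh ASID is then allocated on the next VM entry, so entries cached under the old tag can no longer be hit. Xen's `hvm_asid_flush_vcpu()` is understood to work along these lines. However, every name below (`struct toy_core`, `struct toy_vcpu`, `toy_load_cpu_ctxt()`, `toy_vmenter()`) is hypothetical, and the allocator is simplified.

```c
#include <assert.h>
#include <stdint.h>

/* Per-physical-CPU ASID allocator (hypothetical, simplified). */
struct toy_core {
    uint64_t generation;   /* must be non-zero; bumped when ASIDs run out */
    uint32_t next_asid;    /* next ASID to hand out */
    uint32_t max_asid;     /* largest usable ASID */
};

/* Per-vCPU tag: usable only while its generation matches the core's. */
struct toy_vcpu {
    uint64_t generation;   /* 0 means "no valid ASID" */
    uint32_t asid;
    uint64_t rip;          /* stands in for the rest of the saved context */
};

/* Per-vCPU "TLB flush": forget the tag; no hardware is touched here. */
static void toy_asid_flush_vcpu(struct toy_vcpu *v)
{
    v->generation = 0;
}

/* Loading the CPU context also flushes the vCPU's TLB, treating the
 * TLB as part of the vCPU state. */
static void toy_load_cpu_ctxt(struct toy_vcpu *v, uint64_t saved_rip)
{
    v->rip = saved_rip;
    toy_asid_flush_vcpu(v);
}

/* On VM entry a vCPU with a stale tag gets a fresh ASID, so translations
 * cached under its old ASID can no longer match. Returns the ASID used. */
static uint32_t toy_vmenter(struct toy_core *core, struct toy_vcpu *v)
{
    assert(core->generation != 0);
    if (v->generation != core->generation) {
        if (core->next_asid > core->max_asid) {
            /* ASIDs exhausted: start a new generation. A real implementation
             * would also flush the whole hardware TLB at this point. */
            core->generation++;
            core->next_asid = 1;
        }
        v->asid = core->next_asid++;
        v->generation = core->generation;
    }
    return v->asid;
}
```

Because the flush is folded into the load routine, every restore would start with no usable cached translations. That includes each COLO checkpoint on the secondary, and it needs no separate domctl.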

Cheers,

Tim.

> TODO: only vmx+ept is done.
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
>  tools/libxc/xc_domain.c            |  9 +++++++++
>  tools/libxc/xenctrl.h              |  2 ++
>  tools/libxl/libxl_colo_restore.c   |  6 +++++-
>  xen/arch/x86/domctl.c              | 15 +++++++++++++++
>  xen/arch/x86/hvm/save.c            |  6 ++++++
>  xen/arch/x86/hvm/vmx/vmcs.c        |  8 ++++++++
>  xen/arch/x86/hvm/vmx/vmx.c         |  1 +
>  xen/include/asm-x86/hvm/hvm.h      |  1 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h |  1 +
>  xen/include/public/domctl.h        |  1 +
>  xen/include/xen/hvm/save.h         |  2 ++
>  11 files changed, 51 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 0230c6c..0b47bdd 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -2123,6 +2123,15 @@ int xc_domain_set_max_evtchn(xc_interface *xch, uint32_t domid,
>      return do_domctl(xch, &domctl);
>  }
>  
> +int xc_domain_hvm_sync_mmu(xc_interface *xch, uint32_t domid)
> +{
> +    DECLARE_DOMCTL;
> +
> +    domctl.cmd = XEN_DOMCTL_hvm_sync_mmu;
> +    domctl.domain = domid;
> +    return do_domctl(xch, &domctl);
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> index 3578b09..a83364a 100644
> --- a/tools/libxc/xenctrl.h
> +++ b/tools/libxc/xenctrl.h
> @@ -961,6 +961,8 @@ int xc_domain_set_virq_handler(xc_interface *xch, uint32_t domid, int virq);
>  int xc_domain_set_max_evtchn(xc_interface *xch, uint32_t domid,
>                               uint32_t max_port);
>  
> +int xc_domain_hvm_sync_mmu(xc_interface *xch, uint32_t domid);
> +
>  /*
>   * CPUPOOL MANAGEMENT FUNCTIONS
>   */
> diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
> index aea3feb..730b492 100644
> --- a/tools/libxl/libxl_colo_restore.c
> +++ b/tools/libxl/libxl_colo_restore.c
> @@ -124,11 +124,15 @@ static void colo_resume_vm(libxl__egc *egc,
>      STATE_AO_GC(crs->ao);
>  
>      if (!crs->saved_cb) {
> -        /* TODO: sync mmu for hvm? */
> +        rc = xc_domain_hvm_sync_mmu(CTX->xch, crs->domid);
> +        if (rc)
> +            goto fail;
> +
>          rc = libxl__domain_resume(gc, crs->domid, 0, 1);
>          if (rc)
>              LOG(ERROR, "cannot resume secondary vm");
>  
> +fail:
>          crcs->callback(egc, crcs, rc);
>          return;
>      }
> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
> index d62c715..d0dfad7 100644
> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -1395,6 +1395,21 @@ long arch_do_domctl(
>      }
>      break;
>  
> +    case XEN_DOMCTL_hvm_sync_mmu:
> +    {
> +        struct domain *d;
> +
> +        ret = -ESRCH;
> +        d = rcu_lock_domain_by_id(domctl->domain);
> +        if ( d != NULL )
> +        {
> +            arch_hvm_sync_mmu(d);
> +            rcu_unlock_domain(d);
> +            ret = 0;
> +        }
> +    }
> +    break;
> +
>      default:
>          ret = iommu_do_domctl(domctl, d, u_domctl);
>          break;
> diff --git a/xen/arch/x86/hvm/save.c b/xen/arch/x86/hvm/save.c
> index 6af19be..7a07ebf 100644
> --- a/xen/arch/x86/hvm/save.c
> +++ b/xen/arch/x86/hvm/save.c
> @@ -79,6 +79,12 @@ int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr)
>      return 0;
>  }
>  
> +void arch_hvm_sync_mmu(struct domain *d)
> +{
> +    if (hvm_funcs.sync_mmu)
> +        hvm_funcs.sync_mmu(d);
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 8ffc562..4be9b4d 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -596,6 +596,14 @@ void vmx_cpu_down(void)
>      local_irq_restore(flags);
>  }
>  
> +void vmx_sync_mmu(struct domain *d)
> +{
> +    ept_sync_domain(p2m_get_hostp2m(d));
> +
> +    /* flush tlb */
> +    flush_all(FLUSH_TLB_GLOBAL);
> +}
> +
>  struct foreign_vmcs {
>      struct vcpu *v;
>      unsigned int count;
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index eb73412..b46b4dd 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1719,6 +1719,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
>      .event_pending        = vmx_event_pending,
>      .cpu_up               = vmx_cpu_up,
>      .cpu_down             = vmx_cpu_down,
> +    .sync_mmu             = vmx_sync_mmu,
>      .cpuid_intercept      = vmx_cpuid_intercept,
>      .wbinvd_intercept     = vmx_wbinvd_intercept,
>      .fpu_dirty_intercept  = vmx_fpu_dirty_intercept,
> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
> index 0ebd478..b4f89a7 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -151,6 +151,7 @@ struct hvm_function_table {
>  
>      int  (*cpu_up)(void);
>      void (*cpu_down)(void);
> +    void (*sync_mmu)(struct domain *d);
>  
>      /* Copy up to 15 bytes from cached instruction bytes at current rIP. */
>      unsigned int (*get_insn_bytes)(struct vcpu *v, uint8_t *buf);
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index 215d93c..664741a 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -29,6 +29,7 @@ extern int  vmx_cpu_up_prepare(unsigned int cpu);
>  extern void vmx_cpu_dead(unsigned int cpu);
>  extern int  vmx_cpu_up(void);
>  extern void vmx_cpu_down(void);
> +extern void vmx_sync_mmu(struct domain *d);
>  extern void vmx_save_host_msrs(void);
>  
>  struct vmcs_struct {
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 5b11bbf..11e5a26 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -1008,6 +1008,7 @@ struct xen_domctl {
>  #define XEN_DOMCTL_cacheflush                    71
>  #define XEN_DOMCTL_get_vcpu_msrs                 72
>  #define XEN_DOMCTL_set_vcpu_msrs                 73
> +#define XEN_DOMCTL_hvm_sync_mmu                  74
>  #define XEN_DOMCTL_gdbsx_guestmemio            1000
>  #define XEN_DOMCTL_gdbsx_pausevcpu             1001
>  #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
> diff --git a/xen/include/xen/hvm/save.h b/xen/include/xen/hvm/save.h
> index ae6f0bb..049fdb8 100644
> --- a/xen/include/xen/hvm/save.h
> +++ b/xen/include/xen/hvm/save.h
> @@ -135,4 +135,6 @@ struct hvm_save_header;
>  void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr);
>  int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr);
>  
> +void arch_hvm_sync_mmu(struct domain *d);
> +
>  #endif /* __XEN_HVM_SAVE_H__ */
> -- 
> 1.9.3
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
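
The patch above threads the request through several layers:

1. `xc_domain_hvm_sync_mmu()` issues `XEN_DOMCTL_hvm_sync_mmu`.
2. `arch_do_domctl()` looks up the domain and calls `arch_hvm_sync_mmu()`.
3. `arch_hvm_sync_mmu()` calls `hvm_funcs.sync_mmu`, but only if the vendor backend provides it.

The sketch below models just this dispatch. All names are hypothetical (`struct toy_domain`, `struct toy_hvm_funcs`, `toy_domctl_sync_mmu()`), and a linear search stands in for `rcu_lock_domain_by_id()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_domain {
    int id;
    int syncs;   /* how many times the vendor hook has run */
};

/* Vendor hook table; a backend may leave sync_mmu as NULL. */
struct toy_hvm_funcs {
    void (*sync_mmu)(struct toy_domain *d);
};

/* Stand-in for vmx_sync_mmu(); the real one syncs EPT and flushes TLBs. */
static void toy_vmx_sync_mmu(struct toy_domain *d)
{
    d->syncs++;
}

/* Like arch_hvm_sync_mmu(): call the hook only if the backend set it. */
static void toy_arch_sync_mmu(const struct toy_hvm_funcs *f,
                              struct toy_domain *d)
{
    assert(f != NULL && d != NULL);
    if (f->sync_mmu != NULL)
        f->sync_mmu(d);
}

/* Like the XEN_DOMCTL_hvm_sync_mmu case: find the domain or fail with
 * -ESRCH. */
static int toy_domctl_sync_mmu(const struct toy_hvm_funcs *f,
                               struct toy_domain *doms, size_t n, int domid)
{
    for (size_t i = 0; i < n; i++) {
        if (doms[i].id == domid) {
            toy_arch_sync_mmu(f, &doms[i]);
            return 0;
        }
    }
    return -ESRCH;
}
```

The hook is optional. On an SVM host, where per the TODO nothing is implemented yet, the call would therefore return success without any flush having happened.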


* Re: [RFC Patch 24/25] fix vm entry fail
  2014-07-24 10:40   ` Tim Deegan
@ 2014-07-25  5:39     ` Wen Congyang
  2014-08-07  6:52     ` Wen Congyang
  1 sibling, 0 replies; 36+ messages in thread
From: Wen Congyang @ 2014-07-25  5:39 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Ian Campbell, Ian Jackson, Jiang Yunhong, Dong Eddie, xen devel,
	Yang Hongyang, Lai Jiangshan

At 07/24/2014 06:40 PM, Tim Deegan wrote:
> Hi,
> 
> At 19:39 +0800 on 18 Jul (1405708749), Wen Congyang wrote:
>> In COLO mode, the secondary VM is running, so VM_ENTRY_INTR_INFO may
>> be valid before the VMCS is restored. If there is no pending event after
>> restoring the VM, we should clear it.
> 
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> 
> This looks correct but incomplete -- we should also clear this state for
> the other cases where we don't explicitly set it.  I've done this
> below, and also copied the fix to the equivalent SVM code.  Can you
> test that it works for you?

Thanks, I will test the VMX-related code later (I don't have an AMD CPU at the moment).

Thanks
Wen Congyang

> 
> Cheers,
> 
> Tim.
> 
> commit c9e81a06c02ffc45594798616409335fc09cd32f
> Author: Wen Congyang <wency@cn.fujitsu.com>
> Date:   Fri Jul 18 19:39:09 2014 +0800
> 
>     x86/hvm: Always set pending event injection when loading VMC[BS] state.
>     
>     In COLO mode, the secondary VM is running, so VM_ENTRY_INTR_INFO may
>     be valid before the VMCS is restored. If there is no pending event after
>     restoring the VM, we should clear it.
>     
>     Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>     
>     Also clear pending software exceptions.
>     Copy the fix to SVM as well.
>     
>     Signed-off-by: Tim Deegan <tim@xen.org>
> .
> 


-- 
That is all.
Best regards,
--------------------------------------------------
Wen Congyang
Development Dept.I
Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST)
No. 6 Wenzhu Road, Nanjing, 210012, China
Postcode: 210012
TEL: +86+25-86630566-8503
FUJITSU INTERNAL: 79955-8503
FAX: +86+25-83317685
Mail: wency@cn.fujitsu.com
--------------------------------------------------
This communication is for use by the intended recipient(s) only and
may contain information that is privileged, confidential and
exempt from disclosure under applicable law.
If you are not an intended recipient of this communication,
you are hereby notified that any dissemination,
distribution or copying hereof is strictly prohibited.
If you have received this communication in error,
please notify me by reply e-mail,
permanently delete this communication from your system,
and destroy any hard copies you may have printed.


* Re: [RFC Patch 25/25] sync mmu before resuming secondary vm
  2014-07-24 10:59   ` Tim Deegan
@ 2014-07-25  5:46     ` Wen Congyang
  2014-08-07  7:46     ` Wen Congyang
  1 sibling, 0 replies; 36+ messages in thread
From: Wen Congyang @ 2014-07-25  5:46 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Ian Campbell, Ian Jackson, Jiang Yunhong, Dong Eddie, xen devel,
	Yang Hongyang, Lai Jiangshan

At 07/24/2014 06:59 PM, Tim Deegan wrote:
> At 19:39 +0800 on 18 Jul (1405708750), Wen Congyang wrote:
>> In our test, we found that the secondary VM blue-screens due to a
>> memory-related problem. If we sync the MMU, the problem disappears.
> 
> Erk.  Do you understand _why_ this happens?  Do you need both the TLB
> flush and the EPT flush to fix it?  
> 
> The TLB flush sounds plausible because the migration may have changed
> some pagetables and you've elided any TLB flushes that happened on the
> source.  For that case I think I'd prefer to make the TLB flush
> implicit in the HVM load operation, e.g., by putting a call to
> hvm_asid_flush_vcpu(v) into hvm_load_cpu_ctxt() (on the grounds that the
> TLB is part of the vcpu state).

Hmm, I haven't investigated this problem deeply. I will try your suggestion.

Thanks
Wen Congyang

> 
> Cheers,
> 
> Tim.


* Re: [RFC Patch 24/25] fix vm entry fail
  2014-07-24 10:40   ` Tim Deegan
  2014-07-25  5:39     ` Wen Congyang
@ 2014-08-07  6:52     ` Wen Congyang
  1 sibling, 0 replies; 36+ messages in thread
From: Wen Congyang @ 2014-08-07  6:52 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Ian Campbell, Ian Jackson, Jiang Yunhong, Dong Eddie, xen devel,
	Yang Hongyang, Lai Jiangshan

At 07/24/2014 06:40 PM, Tim Deegan wrote:
> Hi,
> 
> At 19:39 +0800 on 18 Jul (1405708749), Wen Congyang wrote:
>> In COLO mode, the secondary VM is running, so VM_ENTRY_INTR_INFO may
>> be valid before the VMCS is restored. If there is no pending event after
>> restoring the VM, we should clear it.
> 
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> 
> This looks correct but incomplete -- we should also clear this state for
> the other cases where we don't explicitly set it.  I've done this
> below, and also copied the fix to the equivalent SVM code.  Can you
> test that it works for you?

Sorry for the late reply.

COLO for upstream is not finished yet, and I cannot reproduce this bug.
So I tested this patch with an old version of COLO (based on Xen 4.1).
It works for me.

Thanks for your help
Wen Congyang


* Re: [RFC Patch 25/25] sync mmu before resuming secondary vm
  2014-07-24 10:59   ` Tim Deegan
  2014-07-25  5:46     ` Wen Congyang
@ 2014-08-07  7:46     ` Wen Congyang
  1 sibling, 0 replies; 36+ messages in thread
From: Wen Congyang @ 2014-08-07  7:46 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Ian Campbell, Ian Jackson, Jiang Yunhong, Dong Eddie, xen devel,
	Yang Hongyang, Lai Jiangshan

At 07/24/2014 06:59 PM, Tim Deegan wrote:
> At 19:39 +0800 on 18 Jul (1405708750), Wen Congyang wrote:
>> In our test, we found that the secondary VM blue-screens due to a
>> memory-related problem. If we sync the MMU, the problem disappears.
> 
> Erk.  Do you understand _why_ this happens?  Do you need both the TLB
> flush and the EPT flush to fix it?  

I cannot reproduce this problem now. But some user processes would exit
due to memory problems.

The secondary VM runs like this:
1. running
2. stop
3. update the state (both memory and devices)
4. continue running

Before step 4, I think the guest's TLB is out of date.
I don't have any knowledge of EPT, so I don't know whether the
EPT is out of date too.
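
Steps 1 to 4 can be illustrated with a tiny software TLB in front of a one-level guest page table. A translation cached while the VM runs (step 1) survives the state update (step 3) unless something flushes it before step 4. This is a simplified sketch with hypothetical names (`toy_translate()`, `toy_tlb_flush()`, `colo_checkpoint()`). It models only the guest TLB and says nothing about whether EPT entries also go stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NPAGES 8

/* One-level guest page table: virtual page number -> guest frame. */
struct toy_guest {
    uint32_t pt[NPAGES];
};

/* Cached translations for this guest, playing the role of its TLB. */
struct toy_tlb {
    bool valid[NPAGES];
    uint32_t frame[NPAGES];
};

/* Translate a page, filling the cache on a miss. A hit returns whatever
 * was cached, even if the page table has changed since. */
static uint32_t toy_translate(struct toy_tlb *tlb, const struct toy_guest *g,
                              uint32_t vpn)
{
    assert(vpn < NPAGES);
    if (!tlb->valid[vpn]) {
        tlb->frame[vpn] = g->pt[vpn];
        tlb->valid[vpn] = true;
    }
    return tlb->frame[vpn];
}

/* Drop every cached translation. */
static void toy_tlb_flush(struct toy_tlb *tlb)
{
    memset(tlb->valid, 0, sizeof(tlb->valid));
}

/* Steps 2-4: with the VM stopped, copy the primary's memory state into the
 * secondary, optionally flush, then let the caller resume translating. */
static void colo_checkpoint(struct toy_guest *secondary,
                            const struct toy_guest *primary,
                            struct toy_tlb *tlb, bool flush)
{
    memcpy(secondary->pt, primary->pt, sizeof(secondary->pt));
    if (flush)
        toy_tlb_flush(tlb);
}
```

Called with `flush == false`, a lookup after the checkpoint still returns the pre-checkpoint frame. With `flush == true`, the next lookup walks the updated page table.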

Anyway, I will remove this patch in the next version.

Thanks
Wen Congyang

> 
> The TLB flush sounds plausible because the migration may have changed
> some pagetables and you've elided any TLB flushes that happened on the
> source.  For that case I think I'd prefer to make the TLB flush
> implicit in the HVM load operation, e.g., by putting a call to
> hvm_asid_flush_vcpu(v) into hvm_load_cpu_ctxt() (on the grounds that the
> TLB is part of the vcpu state).
> 
> Cheers,
> 
> Tim.
> 


End of thread (newest message: 2014-08-07  7:46 UTC)

Thread overview: 36+ messages
2014-07-18 11:38 [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
2014-07-18 11:38 ` [RFC Patch 01/25] copy the correct page to memory Wen Congyang
2014-07-18 11:38 ` [RFC Patch 02/25] csum the correct page Wen Congyang
2014-07-18 11:38 ` [RFC Patch 03/25] don't zero out ioreq page Wen Congyang
2014-07-18 11:38 ` [RFC Patch 04/25] don't touch remus in remus_device Wen Congyang
2014-07-18 11:38 ` [RFC Patch 05/25] rename remus device to checkpoint device Wen Congyang
2014-07-18 11:38 ` [RFC Patch 06/25] adjust the indentation Wen Congyang
2014-07-18 11:38 ` [RFC Patch 07/25] Refactor domain_suspend_callback_common() Wen Congyang
2014-07-18 11:38 ` [RFC Patch 08/25] Update libxl__domain_resume() for colo Wen Congyang
2014-07-18 11:38 ` [RFC Patch 09/25] Update libxl__domain_suspend_common_switch_qemu_logdirty() " Wen Congyang
2014-07-18 11:38 ` [RFC Patch 10/25] Introduce a new internal API libxl__domain_unpause() Wen Congyang
2014-07-18 11:38 ` [RFC Patch 11/25] Update libxl__domain_unpause() to support qemu-xen Wen Congyang
2014-07-18 11:38 ` [RFC Patch 12/25] support to resume uncooperative HVM guests Wen Congyang
2014-07-18 11:38 ` [RFC Patch 13/25] update datecopier to support sending data only Wen Congyang
2014-07-18 11:38 ` [RFC Patch 14/25] introduce a new API to aync read data from fd Wen Congyang
2014-07-18 11:39 ` [RFC Patch 15/25] Update libxl_save_msgs_gen.pl to support return data from xl to xc Wen Congyang
2014-07-18 11:39 ` [RFC Patch 16/25] Allow slave sends data to master Wen Congyang
2014-07-18 11:39 ` [RFC Patch 17/25] secondary vm suspend/resume/checkpoint code Wen Congyang
2014-07-18 11:39 ` [RFC Patch 18/25] primary vm suspend/get_dirty_pfn/resume/checkpoint code Wen Congyang
2014-07-18 11:39 ` [RFC Patch 19/25] xc_domain_save: flush cache before calling callbacks->postcopy() in colo mode Wen Congyang
2014-07-18 11:39 ` [RFC Patch 20/25] COLO: xc related codes Wen Congyang
2014-07-18 11:39 ` [RFC Patch 21/25] send store mfn and console mfn to xl before resuming secondary vm Wen Congyang
2014-07-18 11:39 ` [RFC Patch 22/25] implement the cmdline for COLO Wen Congyang
2014-07-18 11:39 ` [RFC Patch 23/25] HACK: do checkpoint per 20ms Wen Congyang
2014-07-18 11:39 ` [RFC Patch 24/25] fix vm entry fail Wen Congyang
2014-07-24 10:40   ` Tim Deegan
2014-07-25  5:39     ` Wen Congyang
2014-08-07  6:52     ` Wen Congyang
2014-07-18 11:39 ` [RFC Patch 25/25] sync mmu before resuming secondary vm Wen Congyang
2014-07-24 10:59   ` Tim Deegan
2014-07-25  5:46     ` Wen Congyang
2014-08-07  7:46     ` Wen Congyang
2014-07-18 11:39 ` [RFC Patch 26/25] Introduce "xen-load-devices-state" Wen Congyang
2014-07-18 11:43 ` [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wen Congyang
2014-07-18 14:18 ` Andrew Cooper
2014-07-18 14:30   ` Wen Congyang
