All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
@ 2015-04-01  6:41 Yang Hongyang
  2015-04-01  6:41 ` [RFC PATCH COLO v5 01/29] Add readme Yang Hongyang
                   ` (28 more replies)
  0 siblings, 29 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

This patchset is for xen-4.6. The main diffrence from previous versions are:
1. Use qdisk block replication
   http://wiki.qemu.org/Features/BlockReplication
2. Nic replication based on colo-proxy
   http://wiki.qemu.org/Features/COLO#Components
Note that COLO feature is under active development, this version is not well
tested and has some known problems.
We post this early in order to give you a brief impression about how COLO
will be implemented and we request for your comments about the general idea
of COLO and of course the implementation, if you have any idea/suggestion
on COLO, please do not hesitate to give your comments, thanks in advance.

Virtual machine (VM) replication is a well known technique for providing
application-agnostic software-implemented hardware fault tolerance -
"non-stop service". Currently, remus provides this function, but it buffers
all output packets, and the latency is unacceptable.
In xen summit 2012, We introduce a new VM replication solution: colo
(COarse-grain LOck-stepping virtual machine). The presentation is in
the following URL:
http://www.slideshare.net/xen_com_mgr/colo-coarsegrain-lockstepping-virtual-machines-for-nonstop-service

Here is the summary of the solution:
>From the client's point of view, as long as the client observes identical
responses from the primary and secondary VMs, according to the service
semantics, then the secondary vm is a valid replica of the primary
vm, and can successfully take over when a hardware failure of the
primary vm is detected.

This patchset is based on migration v1.
Only supports hvm guest now. The codes are also hosted on github:
https://github.com/macrosheep/xen/tree/COLO_RFC_v5

TODO list:
1. Code reviews and Bug fixes
2. Switch to migration v2
3. Support pvm

Known bugs:
1. Secondary vm may crash due to triple fault.

Wiki pages:
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
http://wiki.qemu.org/Features/COLO

Patch 1    : Add readme
Patch 2-8  : Some refactor and prepare work
Patch 9-12 : Update remus to reuse remus device codes
Patch 13-21: COLO framework related codes
Patch 22-23: implement disk replication
Patch 24-29: implement nic replication

Changelog from v4 to v5:
1. rebase to the latest xen upstream
2. disk replication: blktap2->qdisk
3. nic replication: colo-agent->colo-proxy

Changelog from v3 to v4:
1. rebase to newest xen
2. bug fix

Changlog from v2 to v3:
1. rebase to newest remus
2. add nic replication support

Changlog from v1 to v2:
1. rebase to newest remus
2. add disk replication support

Wen Congyang (23):
  Add readme
  Refactor domain_suspend_callback_common()
  tools: libxl: introduce a new API libxl__domain_restore() to read qemu
    state
  Update libxl__domain_suspend_common_switch_qemu_logdirty() for colo
  Introduce a new internal API libxl__domain_unpause()
  Update libxl__domain_unpause() to support qemu-xen
  support to resume uncooperative HVM guests
  tools/libxl: Introduce bitops macros
  move remus related codes to libxl_remus.c
  rename remus device to checkpoint device
  adjust the indentation
  don't touch remus in checkpoint_device
  Update libxl_save_msgs_gen.pl to support return data from xl to xc
  Allow slave sends data to master
  secondary vm suspend/resume/checkpoint code
  primary vm suspend/get_dirty_pfn/resume/checkpoint code
  xc_domain_save: flush cache before calling callbacks->postcopy() in
    colo mode
  COLO: xc related codes
  send store mfn and console mfn to xl before resuming secondary vm
  implement the cmdline for COLO
  tools: xc_doamin_restore: zero ioreq page only one time
  Support colo mode for qemu disk
  COLO: use qemu block replication

Yang Hongyang (6):
  COLO proxy: implement setup/teardown of COLO proxy module
  COLO proxy: preresume, postresume and checkpoint
  COLO nic: implement COLO nic subkind
  setup and control colo proxy on primary side
  setup and control colo proxy on secondary side
  cmdline switches and config vars to control colo-proxy

 docs/README.colo                      |   92 +++
 docs/man/xl.conf.pod.5                |    6 +
 docs/man/xl.pod.1                     |   11 +-
 tools/hotplug/Linux/Makefile          |    1 +
 tools/hotplug/Linux/colo-proxy-setup  |  128 ++++
 tools/libxc/include/xenguest.h        |   40 ++
 tools/libxc/xc_domain_restore.c       |  106 ++-
 tools/libxc/xc_domain_save.c          |   71 +-
 tools/libxc/xc_resume.c               |   20 +-
 tools/libxl/Makefile                  |    6 +-
 tools/libxl/libxl.c                   |  185 +++--
 tools/libxl/libxl_bitops.h            |   79 +++
 tools/libxl/libxl_checkpoint_device.c |  282 ++++++++
 tools/libxl/libxl_colo.h              |   53 ++
 tools/libxl/libxl_colo_nic.c          |  313 +++++++++
 tools/libxl/libxl_colo_proxy.c        |  267 ++++++++
 tools/libxl/libxl_colo_qdisk.c        |  209 ++++++
 tools/libxl/libxl_colo_restore.c      | 1190 +++++++++++++++++++++++++++++++++
 tools/libxl/libxl_colo_save.c         |  782 ++++++++++++++++++++++
 tools/libxl/libxl_create.c            |  166 ++++-
 tools/libxl/libxl_device.c            |   38 ++
 tools/libxl/libxl_dm.c                |  262 +++++++-
 tools/libxl/libxl_dom.c               |  569 +++++++---------
 tools/libxl/libxl_internal.h          |  302 ++++++---
 tools/libxl/libxl_netbuffer.c         |  117 ++--
 tools/libxl/libxl_nonetbuffer.c       |   10 +-
 tools/libxl/libxl_qmp.c               |   41 ++
 tools/libxl/libxl_remus.c             |  373 +++++++++++
 tools/libxl/libxl_remus.h             |   27 +
 tools/libxl/libxl_remus_device.c      |  327 ---------
 tools/libxl/libxl_remus_disk_drbd.c   |   57 +-
 tools/libxl/libxl_save_callout.c      |   37 +-
 tools/libxl/libxl_save_helper.c       |   17 +
 tools/libxl/libxl_save_msgs_gen.pl    |   74 +-
 tools/libxl/libxl_types.idl           |   20 +-
 tools/libxl/libxlu_disk_l.l           |    5 +
 tools/libxl/xl.c                      |    3 +
 tools/libxl/xl.h                      |    1 +
 tools/libxl/xl_cmdimpl.c              |  101 ++-
 tools/libxl/xl_cmdtable.c             |    4 +-
 40 files changed, 5413 insertions(+), 979 deletions(-)
 create mode 100644 docs/README.colo
 create mode 100755 tools/hotplug/Linux/colo-proxy-setup
 create mode 100644 tools/libxl/libxl_bitops.h
 create mode 100644 tools/libxl/libxl_checkpoint_device.c
 create mode 100644 tools/libxl/libxl_colo.h
 create mode 100644 tools/libxl/libxl_colo_nic.c
 create mode 100644 tools/libxl/libxl_colo_proxy.c
 create mode 100644 tools/libxl/libxl_colo_qdisk.c
 create mode 100644 tools/libxl/libxl_colo_restore.c
 create mode 100644 tools/libxl/libxl_colo_save.c
 create mode 100644 tools/libxl/libxl_remus.c
 create mode 100644 tools/libxl/libxl_remus.h
 delete mode 100644 tools/libxl/libxl_remus_device.c

-- 
1.9.1

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 01/29] Add readme
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
@ 2015-04-01  6:41 ` Yang Hongyang
  2015-04-08 18:11   ` Wei Liu
  2015-04-01  6:41 ` [RFC PATCH COLO v5 02/29] Refactor domain_suspend_callback_common() Yang Hongyang
                   ` (27 subsequent siblings)
  28 siblings, 1 reply; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 docs/README.colo | 92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)
 create mode 100644 docs/README.colo

diff --git a/docs/README.colo b/docs/README.colo
new file mode 100644
index 0000000..60f487d
--- /dev/null
+++ b/docs/README.colo
@@ -0,0 +1,92 @@
+COLO provides fault tolerance for virtual machines by sending continuous
+checkpoints to a backup, which will activate if the target VM fails. It
+only supports HVM guest(without pv extensions).
+
+Requriements:
+1. Hardware requriements
+   There is at least one directly connected nic to forward the nic from client
+   to secondary vm. The directly connected nic must not be used by any other
+   purpose. If your guest has more than one nic, you should have directly
+   connected nic for each guest nic. If you don't have enouth directly connected
+   nic, you can use vlan.
+2. Dom0 requirements
+   - Support dom0
+   - kernel module:
+        sch_ingress
+        cls_basic
+        cls_tcindex
+        cls_u32
+        act_mirred
+   - libnl-tools >= 3.0. This package provides the command nl-qdisc-list, and
+     colo need this command.
+   - If your host os has OEM-released xen tools, please uninstall it first.
+   - You can load the module which is not provided by OEM.
+3. Guest requirements
+   Only HVM guest(without pv extensions) is supported now. If you want to
+   use OEM released guest os, please use SUSE. REDHAT and Ubuntu is not
+   supported now because I don't find any way to disable pv extensions.
+   If you want to use REDHAT or Ubuntu, you need to build the newest
+   kernel which has the parameter xen_nopv.
+
+Network link topology
+   Please refer to: http://wiki.qemu.org/Features/COLO#Network_link_topology
+
+The steps to setup COLO environment:
+You need to recompile your host kernel because colo-proxy module need cooperate
+with linux kernel.
+Please refer to: http://wiki.qemu.org/Features/COLO#Test_environment_prepare
+1. Build and install xen
+2. Apply the patch for qemu xen, and rebuild xen tools:
+    - cd tools/qemu-xen-dir
+    - use git am to apply the patch:
+      https://raw.githubusercontent.com/wencongyang/colo-files/master/patch_for_qemu/*.patch
+    - make tools && make install-tools
+    Note: You must use qemu-xen. qemu-xen-traditional is not supported.
+3. Install COLO proxy module:
+    3.1 Download COLO proxy, compile and install it:
+        https://github.com/gao-feng/colo-proxy.git
+    3.2 Download iptables patch, it is based on v1.4.21 compile and install it:
+        https://github.com/gao-feng/colo-proxy/blob/master/colo-patch-for-kernel.patch
+4. Install the guest
+    4.1 Add "xen_platform_pci=0" into the guest configfile
+    4.2 If you use suse, please select physical machine
+    4.3 copy the disk image to the secondary host
+5. Update your guest config file for COLO:
+    5.1 disk
+        disk = [
+        'format=raw,devtype=disk,access=w,vdev=hda,backendtype=qdisk,colo,colo-params=192.168.3.1:9000:exportname=qdisk1,active-disk=/mnt/ramfs/active_disk.img,hidden-disk=/mnt/ramfs/hidden_disk.img,target=/root/images/colo-hvm.img' ]
+    5.2 nic
+        vif = [ 'mac=00:16:4f:00:00:11, bridge=br0, model=e1000, forwarddev=eth0, forwardbr=br1' ]
+    Note:
+    a. The ip/port in colo-params is the secondary host's IP. Don't use the
+       directly connected nic's IP.
+    b. forwarddev is the directly connected nic.
+    c. If you have more than one disk, colo-params's host/port must be the same
+       and colo-param's exportname must be different.
+6. Run COLO:
+    xl remus -c -u <domname> <secondary host IP>
+    Note: The ip must not be the directly connected nic's IP.
+Note:
+Secondary host only need to do step 1-3.
+
+The known problem:
+1. Secondary vm may crash due to triple fault.
+2. The heartbeat is not reliable. If you want to test the performance,
+   please disable the heartbeat(modify the xen codes). You can use the
+   branch colo-v4-noheartbeat.
+3. Suspending the vm fails, and the error message is:
+    libxl: error: libxl_qmp.c:429:qmp_next: timeout
+
+Problem 1 and 3 don't happen every time. So you can run colo again to
+avoid this problem.
+
+Virtio-Net:
+1. If you want to get better performance, you can use virtio-net.
+
+Trouble shooting:
+If there's some error happend when staritng COLO, you can do:
+1. Make sure you have all necessary modules that DOM0 needed on both side.
+2. Make sure you have followed all the instructions in this README.
+3. Try to reboot both primary and secondary host.
+4. If you still have problems, collect the error logs and contact
+   Wen Congyang(wency@cn.fujitsu.com)/Yang Hongyang(yanghy@cn.fujitsu.com).
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 02/29] Refactor domain_suspend_callback_common()
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
  2015-04-01  6:41 ` [RFC PATCH COLO v5 01/29] Add readme Yang Hongyang
@ 2015-04-01  6:41 ` Yang Hongyang
  2015-04-08 18:11   ` Wei Liu
  2015-04-22 14:45   ` Ian Campbell
  2015-04-01  6:41 ` [RFC PATCH COLO v5 03/29] tools: libxl: introduce a new API libxl__domain_restore() to read qemu state Yang Hongyang
                   ` (26 subsequent siblings)
  28 siblings, 2 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

libxl__domain_suspend() is to save the guest. I think
we should call it libxl__domain_save(), but I don't
rename it.

Secondary vm is running in colo mode. So we will do
the following things again and again:
1. suspend both primay vm and secondary vm
2. sync the state
3. resume both primary vm and secondary vm
To suspend secondary vm, we need an independent API to
suspend vm.

The core function to suspend vm is domain_suspend_callback_common().
So use a new structure libxl__domain_suspend_state2 to
instead of libxl__domain_suspend_state. The dss's members that
will be used in domain_suspend_callback_common() are
moved to dss2.

We introduce a new API libxl__domain_suspend2() too.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_dom.c      | 235 ++++++++++++++++++++++++-------------------
 tools/libxl/libxl_internal.h |  39 +++++--
 2 files changed, 159 insertions(+), 115 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 26a0382..d286851 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1094,7 +1094,7 @@ int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
 static void domain_suspend_done(libxl__egc *egc,
                         libxl__domain_suspend_state *dss, int rc);
 static void domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss, int ok);
+                                libxl__domain_suspend_state2 *dss2, int ok);
 
 /*----- complicated callback, called by xc_domain_save -----*/
 
@@ -1312,16 +1312,17 @@ static void switch_logdirty_done(libxl__egc *egc,
 /*----- callbacks, called by xc_domain_save -----*/
 
 int libxl__domain_suspend_device_model(libxl__gc *gc,
-                                       libxl__domain_suspend_state *dss)
+                                       libxl__domain_suspend_state2 *dss2)
 {
     int ret = 0;
-    uint32_t const domid = dss->domid;
-    const char *const filename = dss->dm_savefile;
+    uint32_t const domid = dss2->domid;
+    const char *const filename = dss2->dm_savefile;
 
     switch (libxl__device_model_version_running(gc, domid)) {
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: {
         LOG(DEBUG, "Saving device model state to %s", filename);
-        libxl__qemu_traditional_cmd(gc, domid, "save");
+        if (dss2->save_dm)
+            libxl__qemu_traditional_cmd(gc, domid, "save");
         libxl__wait_for_device_model_deprecated(gc, domid, "paused", NULL, NULL, NULL);
         break;
     }
@@ -1329,9 +1330,11 @@ int libxl__domain_suspend_device_model(libxl__gc *gc,
         if (libxl__qmp_stop(gc, domid))
             return ERROR_FAIL;
         /* Save DM state into filename */
-        ret = libxl__qmp_save(gc, domid, filename);
-        if (ret)
-            unlink(filename);
+        if (dss2->save_dm) {
+            ret = libxl__qmp_save(gc, domid, filename);
+            if (ret)
+                unlink(filename);
+        }
         break;
     default:
         return ERROR_INVAL;
@@ -1361,9 +1364,9 @@ int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid)
 }
 
 static void domain_suspend_common_wait_guest(libxl__egc *egc,
-                                             libxl__domain_suspend_state *dss);
+                                             libxl__domain_suspend_state2 *dss2);
 static void domain_suspend_common_guest_suspended(libxl__egc *egc,
-                                         libxl__domain_suspend_state *dss);
+                                         libxl__domain_suspend_state2 *dss2);
 
 static void domain_suspend_common_pvcontrol_suspending(libxl__egc *egc,
       libxl__xswait_state *xswa, int rc, const char *state);
@@ -1372,14 +1375,14 @@ static void domain_suspend_common_wait_guest_evtchn(libxl__egc *egc,
 static void suspend_common_wait_guest_watch(libxl__egc *egc,
       libxl__ev_xswatch *xsw, const char *watch_path, const char *event_path);
 static void suspend_common_wait_guest_check(libxl__egc *egc,
-        libxl__domain_suspend_state *dss);
+                                            libxl__domain_suspend_state2 *dss2);
 static void suspend_common_wait_guest_timeout(libxl__egc *egc,
       libxl__ev_time *ev, const struct timeval *requested_abs);
 
 static void domain_suspend_common_failed(libxl__egc *egc,
-                                         libxl__domain_suspend_state *dss);
+                                         libxl__domain_suspend_state2 *dss2);
 static void domain_suspend_common_done(libxl__egc *egc,
-                                       libxl__domain_suspend_state *dss,
+                                       libxl__domain_suspend_state2 *dss2,
                                        bool ok);
 
 static bool domain_suspend_pvcontrol_acked(const char *state) {
@@ -1388,36 +1391,36 @@ static bool domain_suspend_pvcontrol_acked(const char *state) {
     return strcmp(state,"suspend");
 }
 
-/* calls dss->callback_common_done when done */
+/* calls dss2->callback_common_done when done */
 static void domain_suspend_callback_common(libxl__egc *egc,
-                                           libxl__domain_suspend_state *dss)
+                                           libxl__domain_suspend_state2 *dss2)
 {
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(dss2->ao);
     uint64_t hvm_s_state = 0, hvm_pvdrv = 0;
     int ret, rc;
 
     /* Convenience aliases */
-    const uint32_t domid = dss->domid;
+    const uint32_t domid = dss2->domid;
 
-    if (dss->hvm) {
+    if (dss2->hvm) {
         xc_hvm_param_get(CTX->xch, domid, HVM_PARAM_CALLBACK_IRQ, &hvm_pvdrv);
         xc_hvm_param_get(CTX->xch, domid, HVM_PARAM_ACPI_S_STATE, &hvm_s_state);
     }
 
-    if ((hvm_s_state == 0) && (dss->guest_evtchn.port >= 0)) {
+    if ((hvm_s_state == 0) && (dss2->guest_evtchn.port >= 0)) {
         LOG(DEBUG, "issuing %s suspend request via event channel",
-            dss->hvm ? "PVHVM" : "PV");
-        ret = xc_evtchn_notify(CTX->xce, dss->guest_evtchn.port);
+            dss2->hvm ? "PVHVM" : "PV");
+        ret = xc_evtchn_notify(CTX->xce, dss2->guest_evtchn.port);
         if (ret < 0) {
             LOG(ERROR, "xc_evtchn_notify failed ret=%d", ret);
             goto err;
         }
 
-        dss->guest_evtchn.callback = domain_suspend_common_wait_guest_evtchn;
-        rc = libxl__ev_evtchn_wait(gc, &dss->guest_evtchn);
+        dss2->guest_evtchn.callback = domain_suspend_common_wait_guest_evtchn;
+        rc = libxl__ev_evtchn_wait(gc, &dss2->guest_evtchn);
         if (rc) goto err;
 
-        rc = libxl__ev_time_register_rel(gc, &dss->guest_timeout,
+        rc = libxl__ev_time_register_rel(gc, &dss2->guest_timeout,
                                          suspend_common_wait_guest_timeout,
                                          60*1000);
         if (rc) goto err;
@@ -1425,7 +1428,7 @@ static void domain_suspend_callback_common(libxl__egc *egc,
         return;
     }
 
-    if (dss->hvm && (!hvm_pvdrv || hvm_s_state)) {
+    if (dss2->hvm && (!hvm_pvdrv || hvm_s_state)) {
         LOG(DEBUG, "Calling xc_domain_shutdown on HVM domain");
         ret = xc_domain_shutdown(CTX->xch, domid, SHUTDOWN_suspend);
         if (ret < 0) {
@@ -1433,55 +1436,55 @@ static void domain_suspend_callback_common(libxl__egc *egc,
             goto err;
         }
         /* The guest does not (need to) respond to this sort of request. */
-        dss->guest_responded = 1;
-        domain_suspend_common_wait_guest(egc, dss);
+        dss2->guest_responded = 1;
+        domain_suspend_common_wait_guest(egc, dss2);
         return;
     }
 
     LOG(DEBUG, "issuing %s suspend request via XenBus control node",
-        dss->hvm ? "PVHVM" : "PV");
+        dss2->hvm ? "PVHVM" : "PV");
 
     libxl__domain_pvcontrol_write(gc, XBT_NULL, domid, "suspend");
 
-    dss->pvcontrol.path = libxl__domain_pvcontrol_xspath(gc, domid);
-    if (!dss->pvcontrol.path) goto err;
+    dss2->pvcontrol.path = libxl__domain_pvcontrol_xspath(gc, domid);
+    if (!dss2->pvcontrol.path) goto err;
 
-    dss->pvcontrol.ao = ao;
-    dss->pvcontrol.what = "guest acknowledgement of suspend request";
-    dss->pvcontrol.timeout_ms = 60 * 1000;
-    dss->pvcontrol.callback = domain_suspend_common_pvcontrol_suspending;
-    libxl__xswait_start(gc, &dss->pvcontrol);
+    dss2->pvcontrol.ao = ao;
+    dss2->pvcontrol.what = "guest acknowledgement of suspend request";
+    dss2->pvcontrol.timeout_ms = 60 * 1000;
+    dss2->pvcontrol.callback = domain_suspend_common_pvcontrol_suspending;
+    libxl__xswait_start(gc, &dss2->pvcontrol);
     return;
 
  err:
-    domain_suspend_common_failed(egc, dss);
+    domain_suspend_common_failed(egc, dss2);
 }
 
 static void domain_suspend_common_wait_guest_evtchn(libxl__egc *egc,
-        libxl__ev_evtchn *evev)
+                                                    libxl__ev_evtchn *evev)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(evev, *dss, guest_evtchn);
-    STATE_AO_GC(dss->ao);
+    libxl__domain_suspend_state2 *dss2 = CONTAINER_OF(evev, *dss2, guest_evtchn);
+    STATE_AO_GC(dss2->ao);
     /* If we should be done waiting, suspend_common_wait_guest_check
      * will end up calling domain_suspend_common_guest_suspended or
      * domain_suspend_common_failed, both of which cancel the evtchn
      * wait.  So re-enable it now. */
-    libxl__ev_evtchn_wait(gc, &dss->guest_evtchn);
-    suspend_common_wait_guest_check(egc, dss);
+    libxl__ev_evtchn_wait(gc, &dss2->guest_evtchn);
+    suspend_common_wait_guest_check(egc, dss2);
 }
 
 static void domain_suspend_common_pvcontrol_suspending(libxl__egc *egc,
       libxl__xswait_state *xswa, int rc, const char *state)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(xswa, *dss, pvcontrol);
-    STATE_AO_GC(dss->ao);
+    libxl__domain_suspend_state2 *dss2 = CONTAINER_OF(xswa, *dss2, pvcontrol);
+    STATE_AO_GC(dss2->ao);
     xs_transaction_t t = 0;
 
     if (!rc && !domain_suspend_pvcontrol_acked(state))
         /* keep waiting */
         return;
 
-    libxl__xswait_stop(gc, &dss->pvcontrol);
+    libxl__xswait_stop(gc, &dss2->pvcontrol);
 
     if (rc == ERROR_TIMEDOUT) {
         /*
@@ -1524,56 +1527,56 @@ static void domain_suspend_common_pvcontrol_suspending(libxl__egc *egc,
     LOG(DEBUG, "guest acknowledged suspend request");
 
     libxl__xs_transaction_abort(gc, &t);
-    dss->guest_responded = 1;
-    domain_suspend_common_wait_guest(egc,dss);
+    dss2->guest_responded = 1;
+    domain_suspend_common_wait_guest(egc,dss2);
     return;
 
  err:
     libxl__xs_transaction_abort(gc, &t);
-    domain_suspend_common_failed(egc, dss);
+    domain_suspend_common_failed(egc, dss2);
     return;
 }
 
 static void domain_suspend_common_wait_guest(libxl__egc *egc,
-                                             libxl__domain_suspend_state *dss)
+                                             libxl__domain_suspend_state2 *dss2)
 {
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(dss2->ao);
     int rc;
 
     LOG(DEBUG, "wait for the guest to suspend");
 
-    rc = libxl__ev_xswatch_register(gc, &dss->guest_watch,
+    rc = libxl__ev_xswatch_register(gc, &dss2->guest_watch,
                                     suspend_common_wait_guest_watch,
                                     "@releaseDomain");
     if (rc) goto err;
 
-    rc = libxl__ev_time_register_rel(gc, &dss->guest_timeout,
+    rc = libxl__ev_time_register_rel(gc, &dss2->guest_timeout,
                                      suspend_common_wait_guest_timeout,
                                      60*1000);
     if (rc) goto err;
     return;
 
  err:
-    domain_suspend_common_failed(egc, dss);
+    domain_suspend_common_failed(egc, dss2);
 }
 
 static void suspend_common_wait_guest_watch(libxl__egc *egc,
       libxl__ev_xswatch *xsw, const char *watch_path, const char *event_path)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(xsw, *dss, guest_watch);
-    suspend_common_wait_guest_check(egc, dss);
+    libxl__domain_suspend_state2 *dss2 = CONTAINER_OF(xsw, *dss2, guest_watch);
+    suspend_common_wait_guest_check(egc, dss2);
 }
 
 static void suspend_common_wait_guest_check(libxl__egc *egc,
-        libxl__domain_suspend_state *dss)
+                                            libxl__domain_suspend_state2 *dss2)
 {
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(dss2->ao);
     xc_domaininfo_t info;
     int ret;
     int shutdown_reason;
 
     /* Convenience aliases */
-    const uint32_t domid = dss->domid;
+    const uint32_t domid = dss2->domid;
 
     ret = xc_domain_getinfolist(CTX->xch, domid, 1, &info);
     if (ret < 0) {
@@ -1600,59 +1603,59 @@ static void suspend_common_wait_guest_check(libxl__egc *egc,
     }
 
     LOG(DEBUG, "guest has suspended");
-    domain_suspend_common_guest_suspended(egc, dss);
+    domain_suspend_common_guest_suspended(egc, dss2);
     return;
 
  err:
-    domain_suspend_common_failed(egc, dss);
+    domain_suspend_common_failed(egc, dss2);
 }
 
 static void suspend_common_wait_guest_timeout(libxl__egc *egc,
       libxl__ev_time *ev, const struct timeval *requested_abs)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(ev, *dss, guest_timeout);
-    STATE_AO_GC(dss->ao);
+    libxl__domain_suspend_state2 *dss2 = CONTAINER_OF(ev, *dss2, guest_timeout);
+    STATE_AO_GC(dss2->ao);
     LOG(ERROR, "guest did not suspend, timed out");
-    domain_suspend_common_failed(egc, dss);
+    domain_suspend_common_failed(egc, dss2);
 }
 
 static void domain_suspend_common_guest_suspended(libxl__egc *egc,
-                                         libxl__domain_suspend_state *dss)
+                                            libxl__domain_suspend_state2 *dss2)
 {
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(dss2->ao);
     int ret;
 
-    libxl__ev_evtchn_cancel(gc, &dss->guest_evtchn);
-    libxl__ev_xswatch_deregister(gc, &dss->guest_watch);
-    libxl__ev_time_deregister(gc, &dss->guest_timeout);
+    libxl__ev_evtchn_cancel(gc, &dss2->guest_evtchn);
+    libxl__ev_xswatch_deregister(gc, &dss2->guest_watch);
+    libxl__ev_time_deregister(gc, &dss2->guest_timeout);
 
-    if (dss->hvm) {
-        ret = libxl__domain_suspend_device_model(gc, dss);
+    if (dss2->hvm) {
+        ret = libxl__domain_suspend_device_model(gc, dss2);
         if (ret) {
             LOG(ERROR, "libxl__domain_suspend_device_model failed ret=%d", ret);
-            domain_suspend_common_failed(egc, dss);
+            domain_suspend_common_failed(egc, dss2);
             return;
         }
     }
-    domain_suspend_common_done(egc, dss, 1);
+    domain_suspend_common_done(egc, dss2, 1);
 }
 
 static void domain_suspend_common_failed(libxl__egc *egc,
-                                         libxl__domain_suspend_state *dss)
+                                         libxl__domain_suspend_state2 *dss2)
 {
-    domain_suspend_common_done(egc, dss, 0);
+    domain_suspend_common_done(egc, dss2, 0);
 }
 
 static void domain_suspend_common_done(libxl__egc *egc,
-                                       libxl__domain_suspend_state *dss,
+                                       libxl__domain_suspend_state2 *dss2,
                                        bool ok)
 {
     EGC_GC;
-    assert(!libxl__xswait_inuse(&dss->pvcontrol));
-    libxl__ev_evtchn_cancel(gc, &dss->guest_evtchn);
-    libxl__ev_xswatch_deregister(gc, &dss->guest_watch);
-    libxl__ev_time_deregister(gc, &dss->guest_timeout);
-    dss->callback_common_done(egc, dss, ok);
+    assert(!libxl__xswait_inuse(&dss2->pvcontrol));
+    libxl__ev_evtchn_cancel(gc, &dss2->guest_evtchn);
+    libxl__ev_xswatch_deregister(gc, &dss2->guest_watch);
+    libxl__ev_time_deregister(gc, &dss2->guest_timeout);
+    dss2->callback_common_done(egc, dss2, ok);
 }
 
 static inline char *physmap_path(libxl__gc *gc, uint32_t domid,
@@ -1745,19 +1748,24 @@ static void libxl__domain_suspend_callback(void *data)
     libxl__egc *egc = shs->egc;
     libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
 
-    dss->callback_common_done = domain_suspend_callback_common_done;
-    domain_suspend_callback_common(egc, dss);
+    /* Convenience aliases */
+    libxl__domain_suspend_state2 *dss2 = &dss->dss2;
+
+    dss2->callback_common_done = domain_suspend_callback_common_done;
+    domain_suspend_callback_common(egc, dss2);
 }
 
 static void domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss, int ok)
+                                libxl__domain_suspend_state2 *dss2, int ok)
 {
+    libxl__domain_suspend_state *dss = CONTAINER_OF(dss2, *dss, dss2);
+
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
 }
 
 /*----- remus callbacks -----*/
 static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss, int ok);
+                                libxl__domain_suspend_state2 *dss2, int ok);
 static void remus_devices_postsuspend_cb(libxl__egc *egc,
                                          libxl__remus_devices_state *rds,
                                          int rc);
@@ -1771,13 +1779,18 @@ static void libxl__remus_domain_suspend_callback(void *data)
     libxl__egc *egc = shs->egc;
     libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
 
-    dss->callback_common_done = remus_domain_suspend_callback_common_done;
-    domain_suspend_callback_common(egc, dss);
+    /* Convenience aliases */
+    libxl__domain_suspend_state2 *const dss2 = &dss->dss2;
+
+    dss2->callback_common_done = remus_domain_suspend_callback_common_done;
+    domain_suspend_callback_common(egc, dss2);
 }
 
 static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss, int ok)
+                                libxl__domain_suspend_state2 *dss2, int ok)
 {
+    libxl__domain_suspend_state *dss = CONTAINER_OF(dss2, *dss, dss2);
+
     if (!ok)
         goto out;
 
@@ -1939,6 +1952,11 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
 }
 
 /*----- main code for suspending, in order of execution -----*/
+void libxl__domain_suspend2(libxl__egc *egc,
+                            libxl__domain_suspend_state2 *dss2)
+{
+    domain_suspend_callback_common(egc, dss2);
+}
 
 void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
 {
@@ -1954,20 +1972,23 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
     const libxl_domain_remus_info *const r_info = dss->remus;
     libxl__srm_save_autogen_callbacks *const callbacks =
         &dss->shs.callbacks.save.a;
+    libxl__domain_suspend_state2 *dss2 = &dss->dss2;
 
     logdirty_init(&dss->logdirty);
-    libxl__xswait_init(&dss->pvcontrol);
-    libxl__ev_evtchn_init(&dss->guest_evtchn);
-    libxl__ev_xswatch_init(&dss->guest_watch);
-    libxl__ev_time_init(&dss->guest_timeout);
+    libxl__xswait_init(&dss2->pvcontrol);
+    libxl__ev_evtchn_init(&dss2->guest_evtchn);
+    libxl__ev_xswatch_init(&dss2->guest_watch);
+    libxl__ev_time_init(&dss2->guest_timeout);
 
     switch (type) {
     case LIBXL_DOMAIN_TYPE_HVM: {
         dss->hvm = 1;
+        dss2->hvm = 1;
         break;
     }
     case LIBXL_DOMAIN_TYPE_PV:
         dss->hvm = 0;
+        dss2->hvm = 0;
         break;
     default:
         abort();
@@ -1977,10 +1998,13 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
           | (debug ? XCFLAGS_DEBUG : 0)
           | (dss->hvm ? XCFLAGS_HVM : 0);
 
-    dss->guest_evtchn.port = -1;
-    dss->guest_evtchn_lockfd = -1;
-    dss->guest_responded = 0;
-    dss->dm_savefile = libxl__device_model_savefile(gc, domid);
+    dss2->guest_evtchn.port = -1;
+    dss2->guest_evtchn_lockfd = -1;
+    dss2->guest_responded = 0;
+    dss2->dm_savefile = libxl__device_model_savefile(gc, domid);
+    dss2->domid = domid;
+    dss2->ao = ao;
+    dss2->save_dm = 1;
 
     if (r_info != NULL) {
         dss->interval = r_info->interval;
@@ -1994,11 +2018,11 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
         rc = libxl__ctx_evtchn_init(gc);
         if (rc) goto out;
 
-        dss->guest_evtchn.port =
+        dss2->guest_evtchn.port =
             xc_suspend_evtchn_init_exclusive(CTX->xch, CTX->xce,
-                                  dss->domid, port, &dss->guest_evtchn_lockfd);
+                                  dss->domid, port, &dss2->guest_evtchn_lockfd);
 
-        if (dss->guest_evtchn.port < 0) {
+        if (dss2->guest_evtchn.port < 0) {
             LOG(WARN, "Suspend event channel initialization failed");
             rc = ERROR_FAIL;
             goto out;
@@ -2037,10 +2061,10 @@ void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
 
     if (retval) {
         LOGEV(ERROR, errnoval, "saving domain: %s",
-                         dss->guest_responded ?
+                         dss->dss2.guest_responded ?
                          "domain responded to suspend request" :
                          "domain did not respond to suspend request");
-        if ( !dss->guest_responded )
+        if ( !dss->dss2.guest_responded )
             rc = ERROR_GUEST_TIMEDOUT;
         else
             rc = ERROR_FAIL;
@@ -2048,7 +2072,7 @@ void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
     }
 
     if (type == LIBXL_DOMAIN_TYPE_HVM) {
-        rc = libxl__domain_suspend_device_model(gc, dss);
+        rc = libxl__domain_suspend_device_model(gc, &dss->dss2);
         if (rc) goto out;
 
         libxl__domain_save_device_model(egc, dss, domain_suspend_done);
@@ -2076,7 +2100,7 @@ void libxl__domain_save_device_model(libxl__egc *egc,
     dss->save_dm_callback = callback;
 
     /* Convenience aliases */
-    const char *const filename = dss->dm_savefile;
+    const char *const filename = dss->dss2.dm_savefile;
     const int fd = dss->fd;
 
     libxl__datacopier_state *dc = &dss->save_dm_datacopier;
@@ -2133,7 +2157,7 @@ static void save_device_model_datacopier_done(libxl__egc *egc,
     STATE_AO_GC(dss->ao);
 
     /* Convenience aliases */
-    const char *const filename = dss->dm_savefile;
+    const char *const filename = dss->dss2.dm_savefile;
     int our_rc = 0;
     int rc;
 
@@ -2164,12 +2188,13 @@ static void domain_suspend_done(libxl__egc *egc,
 
     /* Convenience aliases */
     const uint32_t domid = dss->domid;
+    libxl__domain_suspend_state2 *const dss2 = &dss->dss2;
 
-    libxl__ev_evtchn_cancel(gc, &dss->guest_evtchn);
+    libxl__ev_evtchn_cancel(gc, &dss2->guest_evtchn);
 
-    if (dss->guest_evtchn.port > 0)
+    if (dss2->guest_evtchn.port > 0)
         xc_suspend_evtchn_release(CTX->xch, CTX->xce, domid,
-                           dss->guest_evtchn.port, &dss->guest_evtchn_lockfd);
+                           dss2->guest_evtchn.port, &dss2->guest_evtchn_lockfd);
 
     if (!dss->remus) {
         dss->callback(egc, dss, rc);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 7f81dd3..c1ea498 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2801,6 +2801,7 @@ _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 /*----- Domain suspend (save) state structure -----*/
 
 typedef struct libxl__domain_suspend_state libxl__domain_suspend_state;
+typedef struct libxl__domain_suspend_state2 libxl__domain_suspend_state2;
 
 typedef void libxl__domain_suspend_cb(libxl__egc*,
                                       libxl__domain_suspend_state*, int rc);
@@ -2815,6 +2816,29 @@ typedef struct libxl__logdirty_switch {
     libxl__ev_time timeout;
 } libxl__logdirty_switch;
 
+/*
+ * libxl__domain_suspend_state is for saving guest, not
+ * for suspending guest. We need to an independent API
+ * to suspend guest only.
+ */
+struct libxl__domain_suspend_state2 {
+    /* set by caller of libxl__domain_suspend2 */
+    libxl__ao *ao;
+
+    uint32_t domid;
+    libxl__ev_evtchn guest_evtchn;;
+    int guest_evtchn_lockfd;
+    int hvm;
+    const char *dm_savefile;
+    void (*callback_common_done)(libxl__egc*,
+                                 libxl__domain_suspend_state2*, int ok);
+    int save_dm;
+    int guest_responded;
+    libxl__xswait_state pvcontrol;
+    libxl__ev_xswatch guest_watch;
+    libxl__ev_time guest_timeout;
+};
+
 struct libxl__domain_suspend_state {
     /* set by caller of libxl__domain_suspend */
     libxl__ao *ao;
@@ -2827,22 +2851,14 @@ struct libxl__domain_suspend_state {
     int debug;
     const libxl_domain_remus_info *remus;
     /* private */
-    libxl__ev_evtchn guest_evtchn;
-    int guest_evtchn_lockfd;
+    libxl__domain_suspend_state2 dss2;
     int hvm;
     int xcflags;
-    int guest_responded;
-    libxl__xswait_state pvcontrol;
-    libxl__ev_xswatch guest_watch;
-    libxl__ev_time guest_timeout;
-    const char *dm_savefile;
     libxl__remus_devices_state rds;
     libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
     int interval; /* checkpoint interval (for Remus) */
     libxl__save_helper_state shs;
     libxl__logdirty_switch logdirty;
-    void (*callback_common_done)(libxl__egc*,
-                                 struct libxl__domain_suspend_state*, int ok);
     /* private for libxl__domain_save_device_model */
     libxl__save_device_model_cb *save_dm_callback;
     libxl__datacopier_state save_dm_datacopier;
@@ -3116,6 +3132,9 @@ struct libxl__domain_create_state {
 
 /*----- Domain suspend (save) functions -----*/
 
+/* calls dss2->callback_common_done when done */
+_hidden void libxl__domain_suspend2(libxl__egc *egc,
+                                    libxl__domain_suspend_state2 *dss2);
 /* calls dss->callback when done */
 _hidden void libxl__domain_suspend(libxl__egc *egc,
                                    libxl__domain_suspend_state *dss);
@@ -3155,7 +3174,7 @@ _hidden void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
 
 /* Each time the dm needs to be saved, we must call suspend and then save */
 _hidden int libxl__domain_suspend_device_model(libxl__gc *gc,
-                                           libxl__domain_suspend_state *dss);
+                                           libxl__domain_suspend_state2 *dss2);
 _hidden void libxl__domain_save_device_model(libxl__egc *egc,
                                      libxl__domain_suspend_state *dss,
                                      libxl__save_device_model_cb *callback);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 03/29] tools: libxl: introduce a new API libxl__domain_restore() to read qemu state
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
  2015-04-01  6:41 ` [RFC PATCH COLO v5 01/29] Add readme Yang Hongyang
  2015-04-01  6:41 ` [RFC PATCH COLO v5 02/29] Refactor domain_suspend_callback_common() Yang Hongyang
@ 2015-04-01  6:41 ` Yang Hongyang
  2015-04-08 18:11   ` Wei Liu
  2015-04-01  6:41 ` [RFC PATCH COLO v5 04/29] Update libxl__domain_suspend_common_switch_qemu_logdirty() for colo Yang Hongyang
                   ` (25 subsequent siblings)
  28 siblings, 1 reply; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

Secondary vm is running in colo mode. So we will do
the following things again and again:
1. suspend both primay vm and secondary vm
2. sync the state
3. resume both primary vm and secondary vm
We will send qemu's state each time in step2, and
slave's qemu should read it each time before resuming
secondary vm. Introduce a new API libxl__domain_restore()
to do it. This API should be called before resuming
secondary vm.

Note: we should update qemu to support it.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl.c          | 18 ++++++++++++++++++
 tools/libxl/libxl_dom.c      | 26 ++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h |  4 ++++
 tools/libxl/libxl_qmp.c      | 10 ++++++++++
 4 files changed, 58 insertions(+)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 2a735b3..6e55afc 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -510,6 +510,24 @@ int libxl_domain_rename(libxl_ctx *ctx, uint32_t domid,
     return rc;
 }
 
+int libxl__domain_restore(libxl__gc *gc, uint32_t domid)
+{
+    int rc = 0;
+
+    libxl_domain_type type = libxl__domain_type(gc, domid);
+    if (type != LIBXL_DOMAIN_TYPE_HVM) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = libxl__domain_restore_device_model(gc, domid);
+    if (rc)
+        LOG(ERROR, "failed to restore device mode for domain %u:%d",
+            domid, rc);
+out:
+    return rc;
+}
+
 int libxl__domain_resume(libxl__gc *gc, uint32_t domid, int suspend_cancel)
 {
     int rc = 0;
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index d286851..fd0c5c2 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1343,6 +1343,32 @@ int libxl__domain_suspend_device_model(libxl__gc *gc,
     return ret;
 }
 
+int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid)
+{
+    char *state_file;
+    int rc;
+
+    switch (libxl__device_model_version_running(gc, domid)) {
+    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
+        /* not supported now */
+        return ERROR_FAIL;
+    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
+        /*
+         * This function may be called too many times for the same gc,
+         * so we use NOGC, and free the memory before return to avoid
+         * OOM.
+         */
+        state_file = libxl__sprintf(NOGC,
+                                    XC_DEVICE_MODEL_RESTORE_FILE".%d",
+                                    domid);
+        rc = libxl__qmp_restore(gc, domid, state_file);
+        free(state_file);
+        return rc;
+    default:
+        return ERROR_INVAL;
+    }
+}
+
 int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid)
 {
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index c1ea498..3b4e6c4 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1016,6 +1016,7 @@ _hidden int libxl__domain_rename(libxl__gc *gc, uint32_t domid,
 
 _hidden int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
                                      uint32_t size, void *data);
+_hidden int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid);
 _hidden int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid);
 
 _hidden const char *libxl__userdata_path(libxl__gc *gc, uint32_t domid,
@@ -1033,6 +1034,7 @@ _hidden int libxl__userdata_store(libxl__gc *gc, uint32_t domid,
                                   const char *userdata_userid,
                                   const uint8_t *data, int datalen);
 
+_hidden int libxl__domain_restore(libxl__gc *gc, uint32_t domid);
 _hidden int libxl__domain_resume(libxl__gc *gc, uint32_t domid,
                                  int suspend_cancel);
 
@@ -1629,6 +1631,8 @@ _hidden int libxl__qmp_stop(libxl__gc *gc, int domid);
 _hidden int libxl__qmp_resume(libxl__gc *gc, int domid);
 /* Save current QEMU state into fd. */
 _hidden int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename);
+/* Load current QEMU state from fd. */
+_hidden int libxl__qmp_restore(libxl__gc *gc, int domid, const char *filename);
 /* Set dirty bitmap logging status */
 _hidden int libxl__qmp_set_global_dirty_log(libxl__gc *gc, int domid, bool enable);
 _hidden int libxl__qmp_insert_cdrom(libxl__gc *gc, int domid, const libxl_device_disk *disk);
diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index 9aa7e2e..1b66d55 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -892,6 +892,16 @@ int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename)
                            NULL, NULL);
 }
 
+int libxl__qmp_restore(libxl__gc *gc, int domid, const char *state_file)
+{
+    libxl__json_object *args = NULL;
+
+    qmp_parameters_add_string(gc, &args, "filename", (char *)state_file);
+
+    return qmp_run_command(gc, domid, "xen-load-devices-state", args,
+                           NULL, NULL);
+}
+
 static int qmp_change(libxl__gc *gc, libxl__qmp_handler *qmp,
                       char *device, char *target, char *arg)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 04/29] Update libxl__domain_suspend_common_switch_qemu_logdirty() for colo
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (2 preceding siblings ...)
  2015-04-01  6:41 ` [RFC PATCH COLO v5 03/29] tools: libxl: introduce a new API libxl__domain_restore() to read qemu state Yang Hongyang
@ 2015-04-01  6:41 ` Yang Hongyang
  2015-04-08 18:12   ` Wei Liu
  2015-04-01  6:41 ` [RFC PATCH COLO v5 05/29] Introduce a new internal API libxl__domain_unpause() Yang Hongyang
                   ` (24 subsequent siblings)
  28 siblings, 1 reply; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

Secondary vm is running in colo mode. So we need to
send secondary vm's dirty page information to master.
libxl__domain_suspend_common_switch_qemu_logdirty() is to enable
qemu logdirty. But it uses domain_suspend_state, and calls
libxl__xc_domain_saverestore_async_callback_done()
before exits.

Introduce a new API libxl__domain_common_switch_qemu_logdirty().
This API only uses libxl__logdirty_switch, and calls
lds->callback before exits.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_dom.c      | 79 +++++++++++++++++++++++++++-----------------
 tools/libxl/libxl_internal.h | 12 +++++--
 2 files changed, 59 insertions(+), 32 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index fd0c5c2..eb4ed94 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1111,7 +1111,7 @@ static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
 static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*,
                             const char *watch_path, const char *event_path);
 static void switch_logdirty_done(libxl__egc *egc,
-                                 libxl__domain_suspend_state *dss, int ok);
+                                 libxl__logdirty_switch *lds, int ok);
 
 static void logdirty_init(libxl__logdirty_switch *lds)
 {
@@ -1122,12 +1122,10 @@ static void logdirty_init(libxl__logdirty_switch *lds)
 
 static void domain_suspend_switch_qemu_xen_traditional_logdirty
                                (int domid, unsigned enable,
-                                libxl__save_helper_state *shs)
+                                libxl__logdirty_switch *lds,
+                                libxl__egc *egc)
 {
-    libxl__egc *egc = shs->egc;
-    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
-    libxl__logdirty_switch *lds = &dss->logdirty;
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(lds->ao);
     int rc;
     xs_transaction_t t = 0;
     const char *got;
@@ -1188,64 +1186,85 @@ static void domain_suspend_switch_qemu_xen_traditional_logdirty
  out:
     LOG(ERROR,"logdirty switch failed (rc=%d), aborting suspend",rc);
     libxl__xs_transaction_abort(gc, &t);
-    switch_logdirty_done(egc,dss,-1);
+    switch_logdirty_done(egc,lds,-1);
 }
 
 static void domain_suspend_switch_qemu_xen_logdirty
                                (int domid, unsigned enable,
-                                libxl__save_helper_state *shs)
+                                libxl__logdirty_switch *lds,
+                                libxl__egc *egc)
 {
-    libxl__egc *egc = shs->egc;
-    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(lds->ao);
     int rc;
 
     rc = libxl__qmp_set_global_dirty_log(gc, domid, enable);
     if (!rc) {
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+        lds->callback(egc, lds, 0);
     } else {
         LOG(ERROR,"logdirty switch failed (rc=%d), aborting suspend",rc);
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, -1);
+        lds->callback(egc, lds, -1);
     }
 }
 
+static void libxl__domain_suspend_switch_qemu_logdirty_done
+                                (libxl__egc *egc,
+                                 libxl__logdirty_switch *lds,
+                                 int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(lds, *dss, logdirty);
+
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, rc);
+}
+
 void libxl__domain_suspend_common_switch_qemu_logdirty
                                (int domid, unsigned enable, void *user)
 {
     libxl__save_helper_state *shs = user;
     libxl__egc *egc = shs->egc;
     libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
-    STATE_AO_GC(dss->ao);
+
+    /* convenience aliases */
+    libxl__logdirty_switch *const lds = &dss->logdirty;
+
+    lds->callback = libxl__domain_suspend_switch_qemu_logdirty_done;
+
+    libxl__domain_common_switch_qemu_logdirty(domid, enable, lds, egc);
+}
+
+void libxl__domain_common_switch_qemu_logdirty(int domid, unsigned enable,
+                                               libxl__logdirty_switch *lds,
+                                               libxl__egc *egc)
+{
+    STATE_AO_GC(lds->ao);
 
     switch (libxl__device_model_version_running(gc, domid)) {
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
-        domain_suspend_switch_qemu_xen_traditional_logdirty(domid, enable, shs);
+        domain_suspend_switch_qemu_xen_traditional_logdirty(domid, enable,
+                                                            lds, egc);
         break;
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
-        domain_suspend_switch_qemu_xen_logdirty(domid, enable, shs);
+        domain_suspend_switch_qemu_xen_logdirty(domid, enable, lds, egc);
         break;
     default:
         LOG(ERROR,"logdirty switch failed"
             ", no valid device model version found, aborting suspend");
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, -1);
+        lds->callback(egc, lds, -1);
     }
 }
 static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
                                     const struct timeval *requested_abs)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(ev, *dss, logdirty.timeout);
-    STATE_AO_GC(dss->ao);
+    libxl__logdirty_switch *lds = CONTAINER_OF(ev, *lds, timeout);
+    STATE_AO_GC(lds->ao);
     LOG(ERROR,"logdirty switch: wait for device model timed out");
-    switch_logdirty_done(egc,dss,-1);
+    switch_logdirty_done(egc,lds,-1);
 }
 
 static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch *watch,
                             const char *watch_path, const char *event_path)
 {
-    libxl__domain_suspend_state *dss =
-        CONTAINER_OF(watch, *dss, logdirty.watch);
-    libxl__logdirty_switch *lds = &dss->logdirty;
-    STATE_AO_GC(dss->ao);
+    libxl__logdirty_switch *lds = CONTAINER_OF(watch, *lds, watch);
+    STATE_AO_GC(lds->ao);
     const char *got;
     xs_transaction_t t = 0;
     int rc;
@@ -1289,24 +1308,23 @@ static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch *watch,
     libxl__xs_transaction_abort(gc, &t);
 
     if (!rc) {
-        switch_logdirty_done(egc,dss,0);
+        switch_logdirty_done(egc,lds,0);
     } else if (rc < 0) {
         LOG(ERROR,"logdirty switch: failed (rc=%d)",rc);
-        switch_logdirty_done(egc,dss,-1);
+        switch_logdirty_done(egc,lds,-1);
     }
 }
 
 static void switch_logdirty_done(libxl__egc *egc,
-                                 libxl__domain_suspend_state *dss,
+                                 libxl__logdirty_switch *lds,
                                  int broke)
 {
-    STATE_AO_GC(dss->ao);
-    libxl__logdirty_switch *lds = &dss->logdirty;
+    STATE_AO_GC(lds->ao);
 
     libxl__ev_xswatch_deregister(gc, &lds->watch);
     libxl__ev_time_deregister(gc, &lds->timeout);
 
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, broke);
+    lds->callback(egc, lds, broke);
 }
 
 /*----- callbacks, called by xc_domain_save -----*/
@@ -2001,6 +2019,7 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
     libxl__domain_suspend_state2 *dss2 = &dss->dss2;
 
     logdirty_init(&dss->logdirty);
+    dss->logdirty.ao = ao;
     libxl__xswait_init(&dss2->pvcontrol);
     libxl__ev_evtchn_init(&dss2->guest_evtchn);
     libxl__ev_xswatch_init(&dss2->guest_watch);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 3b4e6c4..6470866 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2812,13 +2812,18 @@ typedef void libxl__domain_suspend_cb(libxl__egc*,
 typedef void libxl__save_device_model_cb(libxl__egc*,
                                          libxl__domain_suspend_state*, int rc);
 
-typedef struct libxl__logdirty_switch {
+typedef struct libxl__logdirty_switch libxl__logdirty_switch;
+struct libxl__logdirty_switch {
+    /* set by caller of libxl__domain_common_switch_qemu_logdirty */
+    libxl__ao *ao;
+    void (*callback)(libxl__egc *egc, libxl__logdirty_switch *lds, int rc);
+
     const char *cmd;
     const char *cmd_path;
     const char *ret_path;
     libxl__ev_xswatch watch;
     libxl__ev_time timeout;
-} libxl__logdirty_switch;
+};
 
 /*
  * libxl__domain_suspend_state is for saving guest, not
@@ -3162,6 +3167,9 @@ void libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc,
 
 _hidden void libxl__domain_suspend_common_switch_qemu_logdirty
                                (int domid, unsigned int enable, void *data);
+_hidden void libxl__domain_common_switch_qemu_logdirty
+                                (int domid, unsigned int enable,
+                                 libxl__logdirty_switch *lds, libxl__egc *egc);
 _hidden int libxl__toolstack_save(uint32_t domid, uint8_t **buf,
         uint32_t *len, void *data);
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 05/29] Introduce a new internal API libxl__domain_unpause()
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (3 preceding siblings ...)
  2015-04-01  6:41 ` [RFC PATCH COLO v5 04/29] Update libxl__domain_suspend_common_switch_qemu_logdirty() for colo Yang Hongyang
@ 2015-04-01  6:41 ` Yang Hongyang
  2015-04-08 18:12   ` Wei Liu
  2015-04-01  6:41 ` [RFC PATCH COLO v5 06/29] Update libxl__domain_unpause() to support qemu-xen Yang Hongyang
                   ` (23 subsequent siblings)
  28 siblings, 1 reply; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

The guest is paused after libxl_domain_create_restore().
Secondary vm is running in colo mode. So we need to unpause
the guest. The current API libxl_domain_unpause() is
not an internal API. Introduce a new API to support it.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl.c          | 21 +++++++++++++++------
 tools/libxl/libxl_internal.h |  1 +
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 6e55afc..c3898ce 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -1037,9 +1037,8 @@ out:
     return AO_INPROGRESS;
 }
 
-int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
+int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
 {
-    GC_INIT(ctx);
     char *path;
     char *state;
     int ret, rc = 0;
@@ -1059,12 +1058,22 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
                                          NULL, NULL, NULL);
         }
     }
-    ret = xc_domain_unpause(ctx->xch, domid);
-    if (ret<0) {
-        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unpausing domain %d", domid);
+
+    ret = xc_domain_unpause(CTX->xch, domid);
+    if (ret < 0) {
+        LOGE(ERROR, "unpausing domain %d", domid);
         rc = ERROR_FAIL;
     }
- out:
+
+out:
+    return rc;
+}
+
+int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
+{
+    GC_INIT(ctx);
+    int rc = libxl__domain_unpause(gc, domid);
+
     GC_FREE;
     return rc;
 }
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 6470866..538ac4b 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1037,6 +1037,7 @@ _hidden int libxl__userdata_store(libxl__gc *gc, uint32_t domid,
 _hidden int libxl__domain_restore(libxl__gc *gc, uint32_t domid);
 _hidden int libxl__domain_resume(libxl__gc *gc, uint32_t domid,
                                  int suspend_cancel);
+_hidden int libxl__domain_unpause(libxl__gc *gc, uint32_t domid);
 
 /* returns 0 or 1, or a libxl error code */
 _hidden int libxl__domain_pvcontrol_available(libxl__gc *gc, uint32_t domid);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 06/29] Update libxl__domain_unpause() to support qemu-xen
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (4 preceding siblings ...)
  2015-04-01  6:41 ` [RFC PATCH COLO v5 05/29] Introduce a new internal API libxl__domain_unpause() Yang Hongyang
@ 2015-04-01  6:41 ` Yang Hongyang
  2015-04-08 18:12   ` Wei Liu
  2015-04-01  6:41 ` [RFC PATCH COLO v5 07/29] support to resume uncooperative HVM guests Yang Hongyang
                   ` (22 subsequent siblings)
  28 siblings, 1 reply; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

Currently, libxl__domain_unpause() only supports
qemu-xen-traditional. Update it to support qemu-xen.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl.c          | 13 +++++--------
 tools/libxl/libxl_dom.c      | 25 +++++++++++++++++++++++++
 tools/libxl/libxl_internal.h |  2 ++
 3 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index c3898ce..58629ed 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -1039,8 +1039,6 @@ out:
 
 int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
 {
-    char *path;
-    char *state;
     int ret, rc = 0;
 
     libxl_domain_type type = libxl__domain_type(gc, domid);
@@ -1050,12 +1048,11 @@ int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
     }
 
     if (type == LIBXL_DOMAIN_TYPE_HVM) {
-        path = libxl__sprintf(gc, "/local/domain/0/device-model/%d/state", domid);
-        state = libxl__xs_read(gc, XBT_NULL, path);
-        if (state != NULL && !strcmp(state, "paused")) {
-            libxl__qemu_traditional_cmd(gc, domid, "continue");
-            libxl__wait_for_device_model_deprecated(gc, domid, "running",
-                                         NULL, NULL, NULL);
+        rc = libxl__domain_unpause_device_model(gc, domid);
+        if (rc < 0) {
+            LOG(ERROR, "failed to unpause device model for domain %u:%d",
+                domid, rc);
+            goto out;
         }
     }
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index eb4ed94..a3fce46 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -2272,6 +2272,31 @@ static void remus_teardown_done(libxl__egc *egc,
     dss->callback(egc, dss, rc);
 }
 
+int libxl__domain_unpause_device_model(libxl__gc *gc, uint32_t domid)
+{
+    char *path;
+    char *state;
+
+    switch (libxl__device_model_version_running(gc, domid)) {
+    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
+        path = libxl__sprintf(gc, "/local/domain/0/device-model/%d/state", domid);
+        state = libxl__xs_read(gc, XBT_NULL, path);
+        if (state != NULL && !strcmp(state, "paused")) {
+            libxl__qemu_traditional_cmd(gc, domid, "continue");
+            libxl__wait_for_device_model_deprecated(gc, domid, "running",
+                                         NULL, NULL, NULL);
+        }
+    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
+        if (libxl__qmp_resume(gc, domid))
+            return ERROR_FAIL;
+        break;
+    default:
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
 /*==================== Miscellaneous ====================*/
 
 char *libxl__uuid2string(libxl__gc *gc, const libxl_uuid uuid)
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 538ac4b..8d229ac 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1018,6 +1018,8 @@ _hidden int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
                                      uint32_t size, void *data);
 _hidden int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid);
 _hidden int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid);
+_hidden int libxl__domain_unpause_device_model(libxl__gc *gc,
+                                               uint32_t domid);
 
 _hidden const char *libxl__userdata_path(libxl__gc *gc, uint32_t domid,
                                          const char *userdata_userid,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 07/29] support to resume uncooperative HVM guests
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (5 preceding siblings ...)
  2015-04-01  6:41 ` [RFC PATCH COLO v5 06/29] Update libxl__domain_unpause() to support qemu-xen Yang Hongyang
@ 2015-04-01  6:41 ` Yang Hongyang
  2015-04-08 18:12   ` Wei Liu
  2015-04-22 14:54   ` Ian Campbell
  2015-04-01  6:41 ` [RFC PATCH COLO v5 08/29] tools/libxl: Introduce bitops macros Yang Hongyang
                   ` (21 subsequent siblings)
  28 siblings, 2 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

For PVHVM, the hypercall return code is 0, and it can be resumed
in a new domain context.

For HVM, do nothing.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxc/xc_resume.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
index e67bebd..b862ce3 100644
--- a/tools/libxc/xc_resume.c
+++ b/tools/libxc/xc_resume.c
@@ -109,6 +109,21 @@ static int xc_domain_resume_cooperative(xc_interface *xch, uint32_t domid)
     return do_domctl(xch, &domctl);
 }
 
+static int xc_domain_resume_hvm(xc_interface *xch, uint32_t domid)
+{
+    DECLARE_DOMCTL;
+
+    /*
+     * If it is PVHVM, the hypercall return code is 0, and resume
+     * it in a new domain context.
+     *
+     * If it is a HVM, do nothing.
+     */
+    domctl.cmd = XEN_DOMCTL_resumedomain;
+    domctl.domain = domid;
+    return do_domctl(xch, &domctl);
+}
+
 static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
 {
     DECLARE_DOMCTL;
@@ -138,10 +153,7 @@ static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
      */
 #if defined(__i386__) || defined(__x86_64__)
     if ( info.hvm )
-    {
-        ERROR("Cannot resume uncooperative HVM guests");
-        return rc;
-    }
+        return xc_domain_resume_hvm(xch, domid);
 
     if ( xc_domain_get_guest_width(xch, domid, &dinfo->guest_width) != 0 )
     {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 08/29] tools/libxl: Introduce bitops macros
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (6 preceding siblings ...)
  2015-04-01  6:41 ` [RFC PATCH COLO v5 07/29] support to resume uncooperative HVM guests Yang Hongyang
@ 2015-04-01  6:41 ` Yang Hongyang
  2015-04-22 15:10   ` Ian Campbell
  2015-04-01  6:41 ` [RFC PATCH COLO v5 09/29] move remus related codes to libxl_remus.c Yang Hongyang
                   ` (20 subsequent siblings)
  28 siblings, 1 reply; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

This is the same set used by libxc.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_bitops.h | 79 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)
 create mode 100644 tools/libxl/libxl_bitops.h

diff --git a/tools/libxl/libxl_bitops.h b/tools/libxl/libxl_bitops.h
new file mode 100644
index 0000000..c6ef8df
--- /dev/null
+++ b/tools/libxl/libxl_bitops.h
@@ -0,0 +1,79 @@
+#ifndef LIBXL_BITOPS_H
+#define LIBXL_BITOPS_H 1
+
+/* bitmap operations for single threaded access */
+
+#include <stdlib.h>
+#include <string.h>
+
+#define BITS_PER_LONG (sizeof(unsigned long) * 8)
+#define ORDER_LONG (sizeof(unsigned long) == 4 ? 5 : 6)
+
+#define BITMAP_ENTRY(_nr,_bmap) ((_bmap))[(_nr)/BITS_PER_LONG]
+#define BITMAP_SHIFT(_nr) ((_nr) % BITS_PER_LONG)
+
+/* calculate required space for number of longs needed to hold nr_bits */
+static inline int bitmap_size(int nr_bits)
+{
+    int nr_long, nr_bytes;
+    nr_long = (nr_bits + BITS_PER_LONG - 1) >> ORDER_LONG;
+    nr_bytes = nr_long * sizeof(unsigned long);
+    return nr_bytes;
+}
+
+static inline unsigned long *bitmap_alloc(int nr_bits)
+{
+    return calloc(1, bitmap_size(nr_bits));
+}
+
+static inline void bitmap_clear(unsigned long *addr, int nr_bits)
+{
+    memset(addr, 0, bitmap_size(nr_bits));
+}
+
+static inline int test_bit(int nr, unsigned long *addr)
+{
+    return (BITMAP_ENTRY(nr, addr) >> BITMAP_SHIFT(nr)) & 1;
+}
+
+static inline void clear_bit(int nr, unsigned long *addr)
+{
+    BITMAP_ENTRY(nr, addr) &= ~(1UL << BITMAP_SHIFT(nr));
+}
+
+static inline void set_bit(int nr, unsigned long *addr)
+{
+    BITMAP_ENTRY(nr, addr) |= (1UL << BITMAP_SHIFT(nr));
+}
+
+static inline int test_and_clear_bit(int nr, unsigned long *addr)
+{
+    int oldbit = test_bit(nr, addr);
+    clear_bit(nr, addr);
+    return oldbit;
+}
+
+static inline int test_and_set_bit(int nr, unsigned long *addr)
+{
+    int oldbit = test_bit(nr, addr);
+    set_bit(nr, addr);
+    return oldbit;
+}
+
+static inline void bitmap_or(unsigned long *dst, const unsigned long *other,
+                             int nr_bits)
+{
+    int i, nr_longs = (bitmap_size(nr_bits) / sizeof(unsigned long));
+    for ( i = 0; i < nr_longs; ++i )
+        dst[i] |= other[i];
+}
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 09/29] move remus related codes to libxl_remus.c
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (7 preceding siblings ...)
  2015-04-01  6:41 ` [RFC PATCH COLO v5 08/29] tools/libxl: Introduce bitops macros Yang Hongyang
@ 2015-04-01  6:41 ` Yang Hongyang
  2015-04-01  6:41 ` [RFC PATCH COLO v5 10/29] rename remus device to checkpoint device Yang Hongyang
                   ` (19 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

libxl_domain_remus_start() is external API, and is not moved.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca>
---
 tools/libxl/Makefile      |   2 +-
 tools/libxl/libxl.c       |  57 +--------
 tools/libxl/libxl_dom.c   | 220 +-------------------------------
 tools/libxl/libxl_remus.c | 318 ++++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_remus.h |  28 ++++
 5 files changed, 352 insertions(+), 273 deletions(-)
 create mode 100644 tools/libxl/libxl_remus.c
 create mode 100644 tools/libxl/libxl_remus.h

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 1b16598..7eeda0e 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -56,7 +56,7 @@ else
 LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
-LIBXL_OBJS-y += libxl_remus_device.o libxl_remus_disk_drbd.o
+LIBXL_OBJS-y += libxl_remus.o libxl_remus_device.o libxl_remus_disk_drbd.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 58629ed..bcbd961 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -17,6 +17,7 @@
 #include "libxl_osdeps.h"
 
 #include "libxl_internal.h"
+#include "libxl_remus.h"
 
 #define PAGE_TO_MEMKB(pages) ((pages) * 4)
 #define BACKEND_STRING_SIZE 5
@@ -842,11 +843,6 @@ out:
     GC_FREE;
     return ptr;
 }
-
-static void libxl__remus_setup_done(libxl__egc *egc,
-                                    libxl__remus_devices_state *rds, int rc);
-static void libxl__remus_setup_failed(libxl__egc *egc,
-                                      libxl__remus_devices_state *rds, int rc);
 static void remus_failover_cb(libxl__egc *egc,
                               libxl__domain_suspend_state *dss, int rc);
 
@@ -895,63 +891,14 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 
     assert(info);
 
-    /* Convenience aliases */
-    libxl__remus_devices_state *const rds = &dss->rds;
-
-    if (libxl_defbool_val(info->netbuf)) {
-        if (!libxl__netbuffer_enabled(gc)) {
-            LOG(ERROR, "Remus: No support for network buffering");
-            rc = ERROR_FAIL;
-            goto out;
-        }
-        rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
-    }
-
-    if (libxl_defbool_val(info->diskbuf))
-        rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VBD);
-
-    rds->ao = ao;
-    rds->domid = domid;
-    rds->callback = libxl__remus_setup_done;
-
     /* Point of no return */
-    libxl__remus_devices_setup(egc, rds);
+    libxl__remus_setup(egc, dss);
     return AO_INPROGRESS;
 
  out:
     return AO_ABORT(rc);
 }
 
-static void libxl__remus_setup_done(libxl__egc *egc,
-                                    libxl__remus_devices_state *rds, int rc)
-{
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
-    STATE_AO_GC(dss->ao);
-
-    if (!rc) {
-        libxl__domain_suspend(egc, dss);
-        return;
-    }
-
-    LOG(ERROR, "Remus: failed to setup device for guest with domid %u, rc %d",
-        dss->domid, rc);
-    rds->callback = libxl__remus_setup_failed;
-    libxl__remus_devices_teardown(egc, rds);
-}
-
-static void libxl__remus_setup_failed(libxl__egc *egc,
-                                      libxl__remus_devices_state *rds, int rc)
-{
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
-    STATE_AO_GC(dss->ao);
-
-    if (rc)
-        LOG(ERROR, "Remus: failed to teardown device after setup failed"
-            " for guest with domid %u, rc %d", dss->domid, rc);
-
-    dss->callback(egc, dss, rc);
-}
-
 static void remus_failover_cb(libxl__egc *egc,
                               libxl__domain_suspend_state *dss, int rc)
 {
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index a3fce46..4693d32 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -19,6 +19,7 @@
 
 #include "libxl_internal.h"
 #include "libxl_arch.h"
+#include "libxl_remus.h"
 
 #include <xc_dom.h>
 #include <xen/hvm/hvm_info_table.h>
@@ -1807,194 +1808,6 @@ static void domain_suspend_callback_common_done(libxl__egc *egc,
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
 }
 
-/*----- remus callbacks -----*/
-static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state2 *dss2, int ok);
-static void remus_devices_postsuspend_cb(libxl__egc *egc,
-                                         libxl__remus_devices_state *rds,
-                                         int rc);
-static void remus_devices_preresume_cb(libxl__egc *egc,
-                                       libxl__remus_devices_state *rds,
-                                       int rc);
-
-static void libxl__remus_domain_suspend_callback(void *data)
-{
-    libxl__save_helper_state *shs = data;
-    libxl__egc *egc = shs->egc;
-    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
-
-    /* Convenience aliases */
-    libxl__domain_suspend_state2 *const dss2 = &dss->dss2;
-
-    dss2->callback_common_done = remus_domain_suspend_callback_common_done;
-    domain_suspend_callback_common(egc, dss2);
-}
-
-static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state2 *dss2, int ok)
-{
-    libxl__domain_suspend_state *dss = CONTAINER_OF(dss2, *dss, dss2);
-
-    if (!ok)
-        goto out;
-
-    libxl__remus_devices_state *const rds = &dss->rds;
-    rds->callback = remus_devices_postsuspend_cb;
-    libxl__remus_devices_postsuspend(egc, rds);
-    return;
-
-out:
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
-}
-
-static void remus_devices_postsuspend_cb(libxl__egc *egc,
-                                         libxl__remus_devices_state *rds,
-                                         int rc)
-{
-    int ok = 0;
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
-
-    if (rc)
-        goto out;
-
-    ok = 1;
-
-out:
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
-}
-
-static void libxl__remus_domain_resume_callback(void *data)
-{
-    libxl__save_helper_state *shs = data;
-    libxl__egc *egc = shs->egc;
-    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
-    STATE_AO_GC(dss->ao);
-
-    libxl__remus_devices_state *const rds = &dss->rds;
-    rds->callback = remus_devices_preresume_cb;
-    libxl__remus_devices_preresume(egc, rds);
-}
-
-static void remus_devices_preresume_cb(libxl__egc *egc,
-                                       libxl__remus_devices_state *rds,
-                                       int rc)
-{
-    int ok = 0;
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
-    STATE_AO_GC(dss->ao);
-
-    if (rc)
-        goto out;
-
-    /* Resumes the domain and the device model */
-    rc = libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1);
-    if (rc)
-        goto out;
-
-    ok = 1;
-
-out:
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
-}
-
-/*----- remus asynchronous checkpoint callback -----*/
-
-static void remus_checkpoint_dm_saved(libxl__egc *egc,
-                                      libxl__domain_suspend_state *dss, int rc);
-static void remus_devices_commit_cb(libxl__egc *egc,
-                                    libxl__remus_devices_state *rds,
-                                    int rc);
-static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
-                                  const struct timeval *requested_abs);
-
-static void libxl__remus_domain_checkpoint_callback(void *data)
-{
-    libxl__save_helper_state *shs = data;
-    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
-    libxl__egc *egc = dss->shs.egc;
-    STATE_AO_GC(dss->ao);
-
-    /* This would go into tailbuf. */
-    if (dss->hvm) {
-        libxl__domain_save_device_model(egc, dss, remus_checkpoint_dm_saved);
-    } else {
-        remus_checkpoint_dm_saved(egc, dss, 0);
-    }
-}
-
-static void remus_checkpoint_dm_saved(libxl__egc *egc,
-                                      libxl__domain_suspend_state *dss, int rc)
-{
-    /* Convenience aliases */
-    libxl__remus_devices_state *const rds = &dss->rds;
-
-    STATE_AO_GC(dss->ao);
-
-    if (rc) {
-        LOG(ERROR, "Failed to save device model. Terminating Remus..");
-        goto out;
-    }
-
-    rds->callback = remus_devices_commit_cb;
-    libxl__remus_devices_commit(egc, rds);
-
-    return;
-
-out:
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
-}
-
-static void remus_devices_commit_cb(libxl__egc *egc,
-                                    libxl__remus_devices_state *rds,
-                                    int rc)
-{
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
-
-    STATE_AO_GC(dss->ao);
-
-    if (rc) {
-        LOG(ERROR, "Failed to do device commit op."
-            " Terminating Remus..");
-        goto out;
-    }
-
-    /*
-     * At this point, we have successfully checkpointed the guest and
-     * committed it at the backup. We'll come back after the checkpoint
-     * interval to checkpoint the guest again. Until then, let the guest
-     * continue execution.
-     */
-
-    /* Set checkpoint interval timeout */
-    rc = libxl__ev_time_register_rel(gc, &dss->checkpoint_timeout,
-                                     remus_next_checkpoint,
-                                     dss->interval);
-
-    if (rc)
-        goto out;
-
-    return;
-
-out:
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
-}
-
-static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
-                                  const struct timeval *requested_abs)
-{
-    libxl__domain_suspend_state *dss =
-                            CONTAINER_OF(ev, *dss, checkpoint_timeout);
-
-    STATE_AO_GC(dss->ao);
-
-    /*
-     * Time to checkpoint the guest again. We return 1 to libxc
-     * (xc_domain_save.c). in order to continue executing the infinite loop
-     * (suspend, checkpoint, resume) in xc_domain_save().
-     */
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);
-}
-
 /*----- main code for suspending, in order of execution -----*/
 void libxl__domain_suspend2(libxl__egc *egc,
                             libxl__domain_suspend_state2 *dss2)
@@ -2222,10 +2035,6 @@ static void save_device_model_datacopier_done(libxl__egc *egc,
     dss->save_dm_callback(egc, dss, our_rc);
 }
 
-static void remus_teardown_done(libxl__egc *egc,
-                                       libxl__remus_devices_state *rds,
-                                       int rc);
-
 static void domain_suspend_done(libxl__egc *egc,
                         libxl__domain_suspend_state *dss, int rc)
 {
@@ -2241,34 +2050,11 @@ static void domain_suspend_done(libxl__egc *egc,
         xc_suspend_evtchn_release(CTX->xch, CTX->xce, domid,
                            dss2->guest_evtchn.port, &dss2->guest_evtchn_lockfd);
 
-    if (!dss->remus) {
-        dss->callback(egc, dss, rc);
+    if (dss->remus) {
+        libxl__remus_teardown(egc, dss, rc);
         return;
     }
 
-    /*
-     * With Remus, if we reach this point, it means either
-     * backup died or some network error occurred preventing us
-     * from sending checkpoints. Teardown the network buffers and
-     * release netlink resources.  This is an async op.
-     */
-    LOG(WARN, "Remus: Domain suspend terminated with rc %d,"
-        " teardown Remus devices...", rc);
-    dss->rds.callback = remus_teardown_done;
-    libxl__remus_devices_teardown(egc, &dss->rds);
-}
-
-static void remus_teardown_done(libxl__egc *egc,
-                                       libxl__remus_devices_state *rds,
-                                       int rc)
-{
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
-    STATE_AO_GC(dss->ao);
-
-    if (rc)
-        LOG(ERROR, "Remus: failed to teardown device for guest with domid %u,"
-            " rc %d", dss->domid, rc);
-
     dss->callback(egc, dss, rc);
 }
 
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
new file mode 100644
index 0000000..b555715
--- /dev/null
+++ b/tools/libxl/libxl_remus.c
@@ -0,0 +1,318 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+#include "libxl_remus.h"
+
+
+/*----- remus: setup the environment -----*/
+static void libxl__remus_setup_done(libxl__egc *egc,
+                                    libxl__remus_devices_state *rds, int rc);
+static void libxl__remus_setup_failed(libxl__egc *egc,
+                                      libxl__remus_devices_state *rds, int rc);
+
+void libxl__remus_setup(libxl__egc *egc,
+                        libxl__domain_suspend_state *dss)
+{
+    /* Convenience aliases */
+    libxl__remus_devices_state *const rds = &dss->rds;
+    const libxl_domain_remus_info *const info = dss->remus;
+
+    STATE_AO_GC(dss->ao);
+
+    if (libxl_defbool_val(info->netbuf)) {
+        if (!libxl__netbuffer_enabled(gc)) {
+            LOG(ERROR, "Remus: No support for network buffering");
+            goto out;
+        }
+        rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
+    }
+
+    if (libxl_defbool_val(info->diskbuf))
+        rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VBD);
+
+    rds->ao = ao;
+    rds->domid = dss->domid;
+    rds->callback = libxl__remus_setup_done;
+
+    libxl__remus_devices_setup(egc, rds);
+    return;
+
+out:
+    libxl__remus_setup_failed(egc, rds, ERROR_FAIL);
+}
+
+static void libxl__remus_setup_done(libxl__egc *egc,
+                                    libxl__remus_devices_state *rds, int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    STATE_AO_GC(dss->ao);
+
+    if (!rc) {
+        libxl__domain_suspend(egc, dss);
+        return;
+    }
+
+    LOG(ERROR, "Remus: failed to setup device for guest with domid %u, rc %d",
+        dss->domid, rc);
+    rds->callback = libxl__remus_setup_failed;
+    libxl__remus_devices_teardown(egc, rds);
+}
+
+static void libxl__remus_setup_failed(libxl__egc *egc,
+                                      libxl__remus_devices_state *rds,
+                                      int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    STATE_AO_GC(dss->ao);
+
+    if (rc)
+        LOG(ERROR, "Remus: failed to teardown device after setup failed"
+            " for guest with domid %u, rc %d", dss->domid, rc);
+
+    dss->callback(egc, dss, rc);
+}
+
+
+/*----- remus: teardown the environment -----*/
+static void remus_teardown_done(libxl__egc *egc,
+                                libxl__remus_devices_state *rds,
+                                int rc);
+
+void libxl__remus_teardown(libxl__egc *egc,
+                           libxl__domain_suspend_state *dss,
+                           int rc)
+{
+    EGC_GC;
+
+    /*
+     * If we reach this point, it means either backup died or some
+     * network error occurred preventing us from sending checkpoints.
+     * Teardown the network buffers and release netlink resources.
+     * This is an async op.
+     */
+    LOG(WARN, "Remus: Domain suspend terminated with rc %d,"
+        " teardown Remus devices...", rc);
+    dss->rds.callback = remus_teardown_done;
+    libxl__remus_devices_teardown(egc, &dss->rds);
+}
+
+static void remus_teardown_done(libxl__egc *egc,
+                                libxl__remus_devices_state *rds,
+                                int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    STATE_AO_GC(dss->ao);
+
+    if (rc)
+        LOG(ERROR, "Remus: failed to teardown device for guest with domid %u,"
+            " rc %d", dss->domid, rc);
+
+    dss->callback(egc, dss, rc);
+}
+
+
+/*----- remus: suspend the guest -----*/
+static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
+                                libxl__domain_suspend_state2 *dss2, int ok);
+static void remus_devices_postsuspend_cb(libxl__egc *egc,
+                                         libxl__remus_devices_state *rds,
+                                         int rc);
+
+void libxl__remus_domain_suspend_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__egc *egc = shs->egc;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
+
+    /* Convenience aliases */
+    libxl__domain_suspend_state2 *const dss2 = &dss->dss2;
+
+    dss2->callback_common_done = remus_domain_suspend_callback_common_done;
+    libxl__domain_suspend2(egc, dss2);
+}
+
+static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
+                                libxl__domain_suspend_state2 *dss2, int ok)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(dss2, *dss, dss2);
+
+    if (!ok)
+        goto out;
+
+    libxl__remus_devices_state *const rds = &dss->rds;
+    rds->callback = remus_devices_postsuspend_cb;
+    libxl__remus_devices_postsuspend(egc, rds);
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
+
+static void remus_devices_postsuspend_cb(libxl__egc *egc,
+                                         libxl__remus_devices_state *rds,
+                                         int rc)
+{
+    int ok = 0;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+
+    if (rc)
+        goto out;
+
+    ok = 1;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
+
+
+/*----- remus: resume the guest -----*/
+static void remus_devices_preresume_cb(libxl__egc *egc,
+                                       libxl__remus_devices_state *rds,
+                                       int rc);
+
+void libxl__remus_domain_resume_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__egc *egc = shs->egc;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
+    STATE_AO_GC(dss->ao);
+
+    libxl__remus_devices_state *const rds = &dss->rds;
+    rds->callback = remus_devices_preresume_cb;
+    libxl__remus_devices_preresume(egc, rds);
+}
+
+static void remus_devices_preresume_cb(libxl__egc *egc,
+                                       libxl__remus_devices_state *rds,
+                                       int rc)
+{
+    int ok = 0;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    STATE_AO_GC(dss->ao);
+
+    if (rc)
+        goto out;
+
+    /* Resumes the domain and the device model */
+    rc = libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1);
+    if (rc)
+        goto out;
+
+    ok = 1;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
+
+
+/*----- remus: wait a new checkpoint -----*/
+static void remus_checkpoint_dm_saved(libxl__egc *egc,
+                                      libxl__domain_suspend_state *dss, int rc);
+static void remus_devices_commit_cb(libxl__egc *egc,
+                                    libxl__remus_devices_state *rds,
+                                    int rc);
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs);
+
+void libxl__remus_domain_checkpoint_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
+    libxl__egc *egc = dss->shs.egc;
+    STATE_AO_GC(dss->ao);
+
+    /* This would go into tailbuf. */
+    if (dss->hvm) {
+        libxl__domain_save_device_model(egc, dss, remus_checkpoint_dm_saved);
+    } else {
+        remus_checkpoint_dm_saved(egc, dss, 0);
+    }
+}
+
+static void remus_checkpoint_dm_saved(libxl__egc *egc,
+                                      libxl__domain_suspend_state *dss, int rc)
+{
+    /* Convenience aliases */
+    libxl__remus_devices_state *const rds = &dss->rds;
+
+    STATE_AO_GC(dss->ao);
+
+    if (rc) {
+        LOG(ERROR, "Failed to save device model. Terminating Remus..");
+        goto out;
+    }
+
+    rds->callback = remus_devices_commit_cb;
+    libxl__remus_devices_commit(egc, rds);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void remus_devices_commit_cb(libxl__egc *egc,
+                                    libxl__remus_devices_state *rds,
+                                    int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+
+    STATE_AO_GC(dss->ao);
+
+    if (rc) {
+        LOG(ERROR, "Failed to do device commit op."
+            " Terminating Remus..");
+        goto out;
+    }
+
+    /*
+     * At this point, we have successfully checkpointed the guest and
+     * committed it at the backup. We'll come back after the checkpoint
+     * interval to checkpoint the guest again. Until then, let the guest
+     * continue execution.
+     */
+
+    /* Set checkpoint interval timeout */
+    rc = libxl__ev_time_register_rel(gc, &dss->checkpoint_timeout,
+                                     remus_next_checkpoint,
+                                     dss->interval);
+
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs)
+{
+    libxl__domain_suspend_state *dss =
+                            CONTAINER_OF(ev, *dss, checkpoint_timeout);
+
+    STATE_AO_GC(dss->ao);
+
+    /*
+     * Time to checkpoint the guest again. We return 1 to libxc
+     * (xc_domain_save.c). in order to continue executing the infinite loop
+     * (suspend, checkpoint, resume) in xc_domain_save().
+     */
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);
+}
diff --git a/tools/libxl/libxl_remus.h b/tools/libxl/libxl_remus.h
new file mode 100644
index 0000000..53e5e81
--- /dev/null
+++ b/tools/libxl/libxl_remus.h
@@ -0,0 +1,28 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#ifndef LIBXL_REMUS_H
+#define LIBXL_REMUS_H
+
+void libxl__remus_setup(libxl__egc *egc,
+                        libxl__domain_suspend_state *dss);
+void libxl__remus_teardown(libxl__egc *egc,
+                           libxl__domain_suspend_state *dss,
+                           int rc);
+void libxl__remus_domain_suspend_callback(void *data);
+void libxl__remus_domain_resume_callback(void *data);
+void libxl__remus_domain_checkpoint_callback(void *data);
+
+#endif
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 10/29] rename remus device to checkpoint device
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (8 preceding siblings ...)
  2015-04-01  6:41 ` [RFC PATCH COLO v5 09/29] move remus related codes to libxl_remus.c Yang Hongyang
@ 2015-04-01  6:41 ` Yang Hongyang
  2015-04-01  6:41 ` [RFC PATCH COLO v5 11/29] adjust the indentation Yang Hongyang
                   ` (18 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

This patch is auto generated by the following commands:
 1. git mv tools/libxl/libxl_remus_device.c tools/libxl/libxl_checkpoint_device.c
 2. perl -pi -e 's/libxl_remus_device/libxl_checkpoint_device/g' tools/libxl/Makefile
 3. perl -pi -e 's/\blibxl__remus_devices/libxl__checkpoint_devices/g' tools/libxl/*.[ch]
 4. perl -pi -e 's/\blibxl__remus_device\b/libxl__checkpoint_device/g' tools/libxl/*.[ch]
 5. perl -pi -e 's/\blibxl__remus_device_instance_ops\b/libxl__checkpoint_device_instance_ops/g' tools/libxl/*.[ch]
 6. perl -pi -e 's/\blibxl__remus_callback\b/libxl__checkpoint_callback/g' tools/libxl/*.[ch]
 7. perl -pi -e 's/\bremus_device_init\b/checkpoint_device_init/g' tools/libxl/*.[ch]
 8. perl -pi -e 's/\bremus_devices_setup\b/checkpoint_devices_setup/g' tools/libxl/*.[ch]
 9. perl -pi -e 's/\bdefine_remus_checkpoint_api\b/define_checkpoint_api/g' tools/libxl/*.[ch]
10. perl -pi -e 's/\brds\b/cds/g' tools/libxl/*.[ch]
11. perl -pi -e 's/REMUS_DEVICE/CHECKPOINT_DEVICE/g' tools/libxl/*.[ch] tools/libxl/*.idl
12. perl -pi -e 's/REMUS_DEVOPS/CHECKPOINT_DEVOPS/g' tools/libxl/*.[ch] tools/libxl/*.idl
13. perl -pi -e 's/\bremus\b/checkpoint/g' tools/libxl/libxl_checkpoint_device.[ch]
14. perl -pi -e 's/\bremus device/checkpoint device/g' tools/libxl/libxl_internal.h
15. perl -pi -e 's/\bRemus device/checkpoint device/g' tools/libxl/libxl_internal.h
16. perl -pi -e 's/\bremus abstract/checkpoint abstract/g' tools/libxl/libxl_internal.h
17. perl -pi -e 's/\bremus invocation/checkpoint invocation/g' tools/libxl/libxl_internal.h
18. perl -pi -e 's/\blibxl__remus_device_\(/libxl__checkpoint_device_(/g' tools/libxl/libxl_internal.h

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca>
---
 tools/libxl/Makefile                  |   2 +-
 tools/libxl/libxl_checkpoint_device.c | 327 ++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h          | 112 ++++++------
 tools/libxl/libxl_netbuffer.c         | 108 +++++------
 tools/libxl/libxl_nonetbuffer.c       |  10 +-
 tools/libxl/libxl_remus.c             |  78 ++++----
 tools/libxl/libxl_remus_device.c      | 327 ----------------------------------
 tools/libxl/libxl_remus_disk_drbd.c   |  52 +++---
 tools/libxl/libxl_types.idl           |   4 +-
 9 files changed, 510 insertions(+), 510 deletions(-)
 create mode 100644 tools/libxl/libxl_checkpoint_device.c
 delete mode 100644 tools/libxl/libxl_remus_device.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 7eeda0e..1e27754 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -56,7 +56,7 @@ else
 LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
-LIBXL_OBJS-y += libxl_remus.o libxl_remus_device.o libxl_remus_disk_drbd.o
+LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl_checkpoint_device.c b/tools/libxl/libxl_checkpoint_device.c
new file mode 100644
index 0000000..109cd23
--- /dev/null
+++ b/tools/libxl/libxl_checkpoint_device.c
@@ -0,0 +1,327 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Yang Hongyang <yanghy@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+extern const libxl__checkpoint_device_instance_ops remus_device_nic;
+extern const libxl__checkpoint_device_instance_ops remus_device_drbd_disk;
+static const libxl__checkpoint_device_instance_ops *remus_ops[] = {
+    &remus_device_nic,
+    &remus_device_drbd_disk,
+    NULL,
+};
+
+/*----- helper functions -----*/
+
+static int init_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* init device subkind-specific state in the libxl ctx */
+    int rc;
+    STATE_AO_GC(cds->ao);
+
+    if (libxl__netbuffer_enabled(gc)) {
+        rc = init_subkind_nic(cds);
+        if (rc) goto out;
+    }
+
+    rc = init_subkind_drbd_disk(cds);
+    if (rc) goto out;
+
+    rc = 0;
+out:
+    return rc;
+}
+
+static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* cleanup device subkind-specific state in the libxl ctx */
+    STATE_AO_GC(cds->ao);
+
+    if (libxl__netbuffer_enabled(gc))
+        cleanup_subkind_nic(cds);
+
+    cleanup_subkind_drbd_disk(cds);
+}
+
+/*----- setup() and teardown() -----*/
+
+/* callbacks */
+
+static void all_devices_setup_cb(libxl__egc *egc,
+                                 libxl__multidev *multidev,
+                                 int rc);
+static void device_setup_iterate(libxl__egc *egc,
+                                 libxl__ao_device *aodev);
+static void devices_teardown_cb(libxl__egc *egc,
+                                libxl__multidev *multidev,
+                                int rc);
+
+/* checkpoint device setup and teardown */
+
+static libxl__checkpoint_device* checkpoint_device_init(libxl__egc *egc,
+                                              libxl__checkpoint_devices_state *cds,
+                                              libxl__device_kind kind,
+                                              void *libxl_dev)
+{
+    libxl__checkpoint_device *dev = NULL;
+
+    STATE_AO_GC(cds->ao);
+    GCNEW(dev);
+    dev->backend_dev = libxl_dev;
+    dev->kind = kind;
+    dev->cds = cds;
+
+    return dev;
+}
+
+static void checkpoint_devices_setup(libxl__egc *egc,
+                                libxl__checkpoint_devices_state *cds);
+
+void libxl__checkpoint_devices_setup(libxl__egc *egc, libxl__checkpoint_devices_state *cds)
+{
+    int i, rc;
+
+    STATE_AO_GC(cds->ao);
+
+    rc = init_device_subkind(cds);
+    if (rc)
+        goto out;
+
+    cds->num_devices = 0;
+    cds->num_nics = 0;
+    cds->num_disks = 0;
+
+    if (cds->device_kind_flags & (1 << LIBXL__DEVICE_KIND_VIF))
+        cds->nics = libxl_device_nic_list(CTX, cds->domid, &cds->num_nics);
+
+    if (cds->device_kind_flags & (1 << LIBXL__DEVICE_KIND_VBD))
+        cds->disks = libxl_device_disk_list(CTX, cds->domid, &cds->num_disks);
+
+    if (cds->num_nics == 0 && cds->num_disks == 0)
+        goto out;
+
+    GCNEW_ARRAY(cds->devs, cds->num_nics + cds->num_disks);
+
+    for (i = 0; i < cds->num_nics; i++) {
+        cds->devs[cds->num_devices++] = checkpoint_device_init(egc, cds,
+                                                LIBXL__DEVICE_KIND_VIF,
+                                                &cds->nics[i]);
+    }
+
+    for (i = 0; i < cds->num_disks; i++) {
+        cds->devs[cds->num_devices++] = checkpoint_device_init(egc, cds,
+                                                LIBXL__DEVICE_KIND_VBD,
+                                                &cds->disks[i]);
+    }
+
+    checkpoint_devices_setup(egc, cds);
+
+    return;
+
+out:
+    cds->callback(egc, cds, rc);
+}
+
+static void checkpoint_devices_setup(libxl__egc *egc,
+                                libxl__checkpoint_devices_state *cds)
+{
+    int i, rc;
+
+    STATE_AO_GC(cds->ao);
+
+    libxl__multidev_begin(ao, &cds->multidev);
+    cds->multidev.callback = all_devices_setup_cb;
+    for (i = 0; i < cds->num_devices; i++) {
+        libxl__checkpoint_device *dev = cds->devs[i];
+        dev->ops_index = -1;
+        libxl__multidev_prepare_with_aodev(&cds->multidev, &dev->aodev);
+
+        dev->aodev.rc = ERROR_CHECKPOINT_DEVICE_NOT_SUPPORTED;
+        dev->aodev.callback = device_setup_iterate;
+        device_setup_iterate(egc,&dev->aodev);
+    }
+
+    rc = 0;
+    libxl__multidev_prepared(egc, &cds->multidev, rc);
+}
+
+
+static void device_setup_iterate(libxl__egc *egc, libxl__ao_device *aodev)
+{
+    libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    EGC_GC;
+
+    if (aodev->rc != ERROR_CHECKPOINT_DEVICE_NOT_SUPPORTED &&
+        aodev->rc != ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH)
+        /* might be success or disaster */
+        goto out;
+
+    do {
+        dev->ops = remus_ops[++dev->ops_index];
+        if (!dev->ops) {
+            libxl_device_nic * nic = NULL;
+            libxl_device_disk * disk = NULL;
+            uint32_t domid;
+            int devid;
+            if (dev->kind == LIBXL__DEVICE_KIND_VIF) {
+                nic = (libxl_device_nic *)dev->backend_dev;
+                domid = nic->backend_domid;
+                devid = nic->devid;
+            } else if (dev->kind == LIBXL__DEVICE_KIND_VBD) {
+                disk = (libxl_device_disk *)dev->backend_dev;
+                domid = disk->backend_domid;
+                devid = libxl__device_disk_dev_number(disk->vdev, NULL, NULL);
+            } else {
+                LOG(ERROR,"device kind not handled by checkpoint: %s",
+                    libxl__device_kind_to_string(dev->kind));
+                aodev->rc = ERROR_FAIL;
+                goto out;
+            }
+            LOG(ERROR,"device not handled by checkpoint"
+                " (device=%s:%"PRId32"/%"PRId32")",
+                libxl__device_kind_to_string(dev->kind),
+                domid, devid);
+            aodev->rc = ERROR_CHECKPOINT_DEVICE_NOT_SUPPORTED;
+            goto out;
+        }
+    } while (dev->ops->kind != dev->kind);
+
+    /* found the next ops_index to try */
+    assert(dev->aodev.callback == device_setup_iterate);
+    dev->ops->setup(egc,dev);
+    return;
+
+ out:
+    libxl__multidev_one_callback(egc,aodev);
+}
+
+static void all_devices_setup_cb(libxl__egc *egc,
+                                 libxl__multidev *multidev,
+                                 int rc)
+{
+    STATE_AO_GC(multidev->ao);
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds =
+                            CONTAINER_OF(multidev, *cds, multidev);
+
+    cds->callback(egc, cds, rc);
+}
+
+void libxl__checkpoint_devices_teardown(libxl__egc *egc,
+                                   libxl__checkpoint_devices_state *cds)
+{
+    int i;
+    libxl__checkpoint_device *dev;
+
+    STATE_AO_GC(cds->ao);
+
+    libxl__multidev_begin(ao, &cds->multidev);
+    cds->multidev.callback = devices_teardown_cb;
+    for (i = 0; i < cds->num_devices; i++) {
+        dev = cds->devs[i];
+        if (!dev->ops || !dev->matched)
+            continue;
+
+        libxl__multidev_prepare_with_aodev(&cds->multidev, &dev->aodev);
+        dev->ops->teardown(egc,dev);
+    }
+
+    libxl__multidev_prepared(egc, &cds->multidev, 0);
+}
+
+static void devices_teardown_cb(libxl__egc *egc,
+                                libxl__multidev *multidev,
+                                int rc)
+{
+    int i;
+
+    STATE_AO_GC(multidev->ao);
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds =
+                            CONTAINER_OF(multidev, *cds, multidev);
+
+    /* clean nic */
+    for (i = 0; i < cds->num_nics; i++)
+        libxl_device_nic_dispose(&cds->nics[i]);
+    free(cds->nics);
+    cds->nics = NULL;
+    cds->num_nics = 0;
+
+    /* clean disk */
+    for (i = 0; i < cds->num_disks; i++)
+        libxl_device_disk_dispose(&cds->disks[i]);
+    free(cds->disks);
+    cds->disks = NULL;
+    cds->num_disks = 0;
+
+    cleanup_device_subkind(cds);
+
+    cds->callback(egc, cds, rc);
+}
+
+/*----- checkpointing APIs -----*/
+
+/* callbacks */
+
+static void devices_checkpoint_cb(libxl__egc *egc,
+                                  libxl__multidev *multidev,
+                                  int rc);
+
+/* API implementations */
+
+#define define_checkpoint_api(api)                                \
+void libxl__checkpoint_devices_##api(libxl__egc *egc,                        \
+                                libxl__checkpoint_devices_state *cds)        \
+{                                                                       \
+    int i;                                                              \
+    libxl__checkpoint_device *dev;                                           \
+                                                                        \
+    STATE_AO_GC(cds->ao);                                               \
+                                                                        \
+    libxl__multidev_begin(ao, &cds->multidev);                          \
+    cds->multidev.callback = devices_checkpoint_cb;                     \
+    for (i = 0; i < cds->num_devices; i++) {                            \
+        dev = cds->devs[i];                                             \
+        if (!dev->matched || !dev->ops->api)                            \
+            continue;                                                   \
+        libxl__multidev_prepare_with_aodev(&cds->multidev, &dev->aodev);\
+        dev->ops->api(egc,dev);                                         \
+    }                                                                   \
+                                                                        \
+    libxl__multidev_prepared(egc, &cds->multidev, 0);                   \
+}
+
+define_checkpoint_api(postsuspend);
+
+define_checkpoint_api(preresume);
+
+define_checkpoint_api(commit);
+
+static void devices_checkpoint_cb(libxl__egc *egc,
+                                  libxl__multidev *multidev,
+                                  int rc)
+{
+    STATE_AO_GC(multidev->ao);
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds =
+                            CONTAINER_OF(multidev, *cds, multidev);
+
+    cds->callback(egc, cds, rc);
+}
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 8d229ac..7e7c3b3 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2628,9 +2628,9 @@ typedef struct libxl__save_helper_state {
                       * marshalling and xc callback functions */
 } libxl__save_helper_state;
 
-/*----- remus device related state structure -----*/
+/*----- checkpoint device related state structure -----*/
 /*
- * The abstract Remus device layer exposes a common
+ * The abstract checkpoint device layer exposes a common
  * set of API to [external] libxl for manipulating devices attached to
  * a guest protected by Remus. The device layer also exposes a set of
  * [internal] interfaces that every device type must implement.
@@ -2638,34 +2638,34 @@ typedef struct libxl__save_helper_state {
  * The following API are exposed to libxl:
  *
  * One-time configuration operations:
- *  +libxl__remus_devices_setup
+ *  +libxl__checkpoint_devices_setup
  *    > Enable output buffering for NICs, setup disk replication, etc.
- *  +libxl__remus_devices_teardown
+ *  +libxl__checkpoint_devices_teardown
  *    > Disable output buffering and disk replication; teardown any
  *       associated external setups like qdiscs for NICs.
  *
  * Operations executed every checkpoint (in order of invocation):
- *  +libxl__remus_devices_postsuspend
- *  +libxl__remus_devices_preresume
- *  +libxl__remus_devices_commit
+ *  +libxl__checkpoint_devices_postsuspend
+ *  +libxl__checkpoint_devices_preresume
+ *  +libxl__checkpoint_devices_commit
  *
  * Each device type needs to implement the interfaces specified in
- * the libxl__remus_device_instance_ops if it wishes to support Remus.
+ * the libxl__checkpoint_device_instance_ops if it wishes to support Remus.
  *
- * The high-level control flow through the Remus device layer is shown below:
+ * The high-level control flow through the checkpoint device layer is shown below:
  *
  * xl remus
  *  |->  libxl_domain_remus_start
- *    |-> libxl__remus_devices_setup
- *      |-> Per-checkpoint libxl__remus_devices_[postsuspend,preresume,commit]
+ *    |-> libxl__checkpoint_devices_setup
+ *      |-> Per-checkpoint libxl__checkpoint_devices_[postsuspend,preresume,commit]
  *        ...
  *        |-> On backup failure, network error or other internal errors:
- *            libxl__remus_devices_teardown
+ *            libxl__checkpoint_devices_teardown
  */
 
-typedef struct libxl__remus_device libxl__remus_device;
-typedef struct libxl__remus_devices_state libxl__remus_devices_state;
-typedef struct libxl__remus_device_instance_ops libxl__remus_device_instance_ops;
+typedef struct libxl__checkpoint_device libxl__checkpoint_device;
+typedef struct libxl__checkpoint_devices_state libxl__checkpoint_devices_state;
+typedef struct libxl__checkpoint_device_instance_ops libxl__checkpoint_device_instance_ops;
 
 /*
  * Interfaces to be implemented by every device subkind that wishes to
@@ -2675,7 +2675,7 @@ typedef struct libxl__remus_device_instance_ops libxl__remus_device_instance_ops
  * synchronous and call dev->aodev.callback directly (as the last
  * thing they do).
  */
-struct libxl__remus_device_instance_ops {
+struct libxl__checkpoint_device_instance_ops {
     /* the device kind this ops belongs to... */
     libxl__device_kind kind;
 
@@ -2686,12 +2686,12 @@ struct libxl__remus_device_instance_ops {
      * Asynchronous.
      */
 
-    void (*postsuspend)(libxl__egc *egc, libxl__remus_device *dev);
-    void (*preresume)(libxl__egc *egc, libxl__remus_device *dev);
-    void (*commit)(libxl__egc *egc, libxl__remus_device *dev);
+    void (*postsuspend)(libxl__egc *egc, libxl__checkpoint_device *dev);
+    void (*preresume)(libxl__egc *egc, libxl__checkpoint_device *dev);
+    void (*commit)(libxl__egc *egc, libxl__checkpoint_device *dev);
 
     /*
-     * setup() and teardown() are refer to the actual remus device.
+     * setup() and teardown() are refer to the actual checkpoint device.
      * Asynchronous.
      * teardown is called even if setup fails.
      */
@@ -2700,45 +2700,45 @@ struct libxl__remus_device_instance_ops {
      * device. If matched, the device will then be managed with this set of
      * subkind operations.
      * Yields 0 if the device successfully set up.
-     * REMUS_DEVOPS_DOES_NOT_MATCH if the ops does not match the device.
+     * CHECKPOINT_DEVOPS_DOES_NOT_MATCH if the ops does not match the device.
      * any other rc indicates failure.
      */
-    void (*setup)(libxl__egc *egc, libxl__remus_device *dev);
-    void (*teardown)(libxl__egc *egc, libxl__remus_device *dev);
+    void (*setup)(libxl__egc *egc, libxl__checkpoint_device *dev);
+    void (*teardown)(libxl__egc *egc, libxl__checkpoint_device *dev);
 };
 
-int init_subkind_nic(libxl__remus_devices_state *rds);
-void cleanup_subkind_nic(libxl__remus_devices_state *rds);
-int init_subkind_drbd_disk(libxl__remus_devices_state *rds);
-void cleanup_subkind_drbd_disk(libxl__remus_devices_state *rds);
+int init_subkind_nic(libxl__checkpoint_devices_state *cds);
+void cleanup_subkind_nic(libxl__checkpoint_devices_state *cds);
+int init_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
+void cleanup_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
 
-typedef void libxl__remus_callback(libxl__egc *,
-                                   libxl__remus_devices_state *, int rc);
+typedef void libxl__checkpoint_callback(libxl__egc *,
+                                   libxl__checkpoint_devices_state *, int rc);
 
 /*
- * State associated with a remus invocation, including parameters
- * passed to the remus abstract device layer by the remus
+ * State associated with a checkpoint invocation, including parameters
+ * passed to the checkpoint abstract device layer by the remus
  * save/restore machinery.
  */
-struct libxl__remus_devices_state {
-    /*---- must be set by caller of libxl__remus_device_(setup|teardown) ----*/
+struct libxl__checkpoint_devices_state {
+    /*---- must be set by caller of libxl__checkpoint_device_(setup|teardown) ----*/
 
     libxl__ao *ao;
     uint32_t domid;
-    libxl__remus_callback *callback;
+    libxl__checkpoint_callback *callback;
     int device_kind_flags;
 
     /*----- private for abstract layer only -----*/
 
     int num_devices;
     /*
-     * this array is allocated before setup the remus devices by the
-     * remus abstract layer.
-     * devs may be NULL, means there's no remus devices that has been set up.
+     * this array is allocated before setup the checkpoint devices by the
+     * checkpoint abstract layer.
+     * devs may be NULL, means there's no checkpoint devices that has been set up.
      * the size of this array is 'num_devices', which is the total number
      * of libxl nic devices and disk devices(num_nics + num_disks).
      */
-    libxl__remus_device **devs;
+    libxl__checkpoint_device **devs;
 
     libxl_device_nic *nics;
     int num_nics;
@@ -2760,20 +2760,20 @@ struct libxl__remus_devices_state {
 
 /*
  * Information about a single device being handled by remus.
- * Allocated by the remus abstract layer.
+ * Allocated by the checkpoint abstract layer.
  */
-struct libxl__remus_device {
+struct libxl__checkpoint_device {
     /*----- shared between abstract and concrete layers -----*/
     /*
      * if this is true, that means the subkind ops match the device
      */
     bool matched;
 
-    /*----- set by remus device abstruct layer -----*/
-    /* libxl__device_* which this remus device related to */
+    /*----- set by checkpoint device abstruct layer -----*/
+    /* libxl__device_* which this checkpoint device related to */
     const void *backend_dev;
     libxl__device_kind kind;
-    libxl__remus_devices_state *rds;
+    libxl__checkpoint_devices_state *cds;
     libxl__ao_device aodev;
 
     /*----- private for abstract layer only -----*/
@@ -2784,7 +2784,7 @@ struct libxl__remus_device {
      * individual devices.
      */
     int ops_index;
-    const libxl__remus_device_instance_ops *ops;
+    const libxl__checkpoint_device_instance_ops *ops;
 
     /*----- private for concrete (device-specific) layer -----*/
 
@@ -2792,17 +2792,17 @@ struct libxl__remus_device {
     void *concrete_data;
 };
 
-/* the following 5 APIs are async ops, call rds->callback when done */
-_hidden void libxl__remus_devices_setup(libxl__egc *egc,
-                                        libxl__remus_devices_state *rds);
-_hidden void libxl__remus_devices_teardown(libxl__egc *egc,
-                                           libxl__remus_devices_state *rds);
-_hidden void libxl__remus_devices_postsuspend(libxl__egc *egc,
-                                              libxl__remus_devices_state *rds);
-_hidden void libxl__remus_devices_preresume(libxl__egc *egc,
-                                            libxl__remus_devices_state *rds);
-_hidden void libxl__remus_devices_commit(libxl__egc *egc,
-                                         libxl__remus_devices_state *rds);
+/* the following 5 APIs are async ops, call cds->callback when done */
+_hidden void libxl__checkpoint_devices_setup(libxl__egc *egc,
+                                        libxl__checkpoint_devices_state *cds);
+_hidden void libxl__checkpoint_devices_teardown(libxl__egc *egc,
+                                           libxl__checkpoint_devices_state *cds);
+_hidden void libxl__checkpoint_devices_postsuspend(libxl__egc *egc,
+                                              libxl__checkpoint_devices_state *cds);
+_hidden void libxl__checkpoint_devices_preresume(libxl__egc *egc,
+                                            libxl__checkpoint_devices_state *cds);
+_hidden void libxl__checkpoint_devices_commit(libxl__egc *egc,
+                                         libxl__checkpoint_devices_state *cds);
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
 /*----- Domain suspend (save) state structure -----*/
@@ -2866,7 +2866,7 @@ struct libxl__domain_suspend_state {
     libxl__domain_suspend_state2 dss2;
     int hvm;
     int xcflags;
-    libxl__remus_devices_state rds;
+    libxl__checkpoint_devices_state cds;
     libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
     int interval; /* checkpoint interval (for Remus) */
     libxl__save_helper_state shs;
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index edc6843..2d668dd 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -38,21 +38,21 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
     return 1;
 }
 
-int init_subkind_nic(libxl__remus_devices_state *rds)
+int init_subkind_nic(libxl__checkpoint_devices_state *cds)
 {
     int rc, ret;
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
-    rds->nlsock = nl_socket_alloc();
-    if (!rds->nlsock) {
+    cds->nlsock = nl_socket_alloc();
+    if (!cds->nlsock) {
         LOG(ERROR, "cannot allocate nl socket");
         rc = ERROR_FAIL;
         goto out;
     }
 
-    ret = nl_connect(rds->nlsock, NETLINK_ROUTE);
+    ret = nl_connect(cds->nlsock, NETLINK_ROUTE);
     if (ret) {
         LOG(ERROR, "failed to open netlink socket: %s",
             nl_geterror(ret));
@@ -61,7 +61,7 @@ int init_subkind_nic(libxl__remus_devices_state *rds)
     }
 
     /* get list of all qdiscs installed on network devs. */
-    ret = rtnl_qdisc_alloc_cache(rds->nlsock, &rds->qdisc_cache);
+    ret = rtnl_qdisc_alloc_cache(cds->nlsock, &cds->qdisc_cache);
     if (ret) {
         LOG(ERROR, "failed to allocate qdisc cache: %s",
             nl_geterror(ret));
@@ -70,9 +70,9 @@ int init_subkind_nic(libxl__remus_devices_state *rds)
     }
 
     if (dss->remus->netbufscript) {
-        rds->netbufscript = libxl__strdup(gc, dss->remus->netbufscript);
+        cds->netbufscript = libxl__strdup(gc, dss->remus->netbufscript);
     } else {
-        rds->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
+        cds->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
                                       libxl__xen_script_dir_path());
     }
 
@@ -82,22 +82,22 @@ out:
     return rc;
 }
 
-void cleanup_subkind_nic(libxl__remus_devices_state *rds)
+void cleanup_subkind_nic(libxl__checkpoint_devices_state *cds)
 {
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
     /* free qdisc cache */
-    if (rds->qdisc_cache) {
-        nl_cache_clear(rds->qdisc_cache);
-        nl_cache_free(rds->qdisc_cache);
-        rds->qdisc_cache = NULL;
+    if (cds->qdisc_cache) {
+        nl_cache_clear(cds->qdisc_cache);
+        nl_cache_free(cds->qdisc_cache);
+        cds->qdisc_cache = NULL;
     }
 
     /* close & free nlsock */
-    if (rds->nlsock) {
-        nl_close(rds->nlsock);
-        nl_socket_free(rds->nlsock);
-        rds->nlsock = NULL;
+    if (cds->nlsock) {
+        nl_close(cds->nlsock);
+        nl_socket_free(cds->nlsock);
+        cds->nlsock = NULL;
     }
 }
 
@@ -111,17 +111,17 @@ void cleanup_subkind_nic(libxl__remus_devices_state *rds)
  * it must ONLY be used for remus because if driver domains
  * were in use it would constitute a security vulnerability.
  */
-static const char *get_vifname(libxl__remus_device *dev,
+static const char *get_vifname(libxl__checkpoint_device *dev,
                                const libxl_device_nic *nic)
 {
     const char *vifname = NULL;
     const char *path;
     int rc;
 
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     /* Convenience aliases */
-    const uint32_t domid = dev->rds->domid;
+    const uint32_t domid = dev->cds->domid;
 
     path = GCSPRINTF("%s/backend/vif/%d/%d/vifname",
                      libxl__xs_get_dompath(gc, 0), domid, nic->devid);
@@ -144,19 +144,19 @@ static void free_qdisc(libxl__remus_device_nic *remus_nic)
     remus_nic->qdisc = NULL;
 }
 
-static int init_qdisc(libxl__remus_devices_state *rds,
+static int init_qdisc(libxl__checkpoint_devices_state *cds,
                       libxl__remus_device_nic *remus_nic)
 {
     int rc, ret, ifindex;
     struct rtnl_link *ifb = NULL;
     struct rtnl_qdisc *qdisc = NULL;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
     /* Now that we have brought up REMUS_IFB device with plug qdisc for
      * this vif, so we need to refill the qdisc cache.
      */
-    ret = nl_cache_refill(rds->nlsock, rds->qdisc_cache);
+    ret = nl_cache_refill(cds->nlsock, cds->qdisc_cache);
     if (ret) {
         LOG(ERROR, "cannot refill qdisc cache: %s", nl_geterror(ret));
         rc = ERROR_FAIL;
@@ -164,7 +164,7 @@ static int init_qdisc(libxl__remus_devices_state *rds,
     }
 
     /* get a handle to the REMUS_IFB interface */
-    ret = rtnl_link_get_kernel(rds->nlsock, 0, remus_nic->ifb, &ifb);
+    ret = rtnl_link_get_kernel(cds->nlsock, 0, remus_nic->ifb, &ifb);
     if (ret) {
         LOG(ERROR, "cannot obtain handle for %s: %s", remus_nic->ifb,
             nl_geterror(ret));
@@ -187,7 +187,7 @@ static int init_qdisc(libxl__remus_devices_state *rds,
      * There is no need to explicitly free this qdisc as its just a
      * reference from the qdisc cache we allocated earlier.
      */
-    qdisc = rtnl_qdisc_get_by_parent(rds->qdisc_cache, ifindex, TC_H_ROOT);
+    qdisc = rtnl_qdisc_get_by_parent(cds->qdisc_cache, ifindex, TC_H_ROOT);
     if (qdisc) {
         const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
         /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
@@ -231,19 +231,19 @@ static void netbuf_teardown_script_cb(libxl__egc *egc,
  * $REMUS_IFB (for teardown)
  * setup/teardown as command line arg.
  */
-static void setup_async_exec(libxl__remus_device *dev, char *op)
+static void setup_async_exec(libxl__checkpoint_device *dev, char *op)
 {
     int arraysize, nr = 0;
     char **env = NULL, **args = NULL;
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
-    libxl__remus_devices_state *rds = dev->rds;
+    libxl__checkpoint_devices_state *cds = dev->cds;
     libxl__async_exec_state *aes = &dev->aodev.aes;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
     /* Convenience aliases */
-    char *const script = libxl__strdup(gc, rds->netbufscript);
-    const uint32_t domid = rds->domid;
+    char *const script = libxl__strdup(gc, cds->netbufscript);
+    const uint32_t domid = cds->domid;
     const int dev_id = remus_nic->devid;
     const char *const vif = remus_nic->vif;
     const char *const ifb = remus_nic->ifb;
@@ -269,7 +269,7 @@ static void setup_async_exec(libxl__remus_device *dev, char *op)
     args[nr++] = NULL;
     assert(nr == arraysize);
 
-    aes->ao = dev->rds->ao;
+    aes->ao = dev->cds->ao;
     aes->what = GCSPRINTF("%s %s", args[0], args[1]);
     aes->env = env;
     aes->args = args;
@@ -286,13 +286,13 @@ static void setup_async_exec(libxl__remus_device *dev, char *op)
 
 /* setup() and teardown() */
 
-static void nic_setup(libxl__egc *egc, libxl__remus_device *dev)
+static void nic_setup(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
     int rc;
     libxl__remus_device_nic *remus_nic;
     const libxl_device_nic *nic = dev->backend_dev;
 
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     /*
      * thers's no subkind of nic devices, so nic ops is always matched
@@ -330,16 +330,16 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
                                    int status)
 {
     libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
-    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
-    libxl__remus_devices_state *rds = dev->rds;
+    libxl__checkpoint_devices_state *cds = dev->cds;
     const char *out_path_base, *hotplug_error = NULL;
     int rc;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
     /* Convenience aliases */
-    const uint32_t domid = rds->domid;
+    const uint32_t domid = cds->domid;
     const int devid = remus_nic->devid;
     const char *const vif = remus_nic->vif;
     const char **const ifb = &remus_nic->ifb;
@@ -373,7 +373,7 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
 
     if (hotplug_error) {
         LOG(ERROR, "netbuf script %s setup failed for vif %s: %s",
-            rds->netbufscript, vif, hotplug_error);
+            cds->netbufscript, vif, hotplug_error);
         rc = ERROR_FAIL;
         goto out;
     }
@@ -384,17 +384,17 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
     }
 
     LOG(DEBUG, "%s will buffer packets from vif %s", *ifb, vif);
-    rc = init_qdisc(rds, remus_nic);
+    rc = init_qdisc(cds, remus_nic);
 
 out:
     aodev->rc = rc;
     aodev->callback(egc, aodev);
 }
 
-static void nic_teardown(libxl__egc *egc, libxl__remus_device *dev)
+static void nic_teardown(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
     int rc;
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     setup_async_exec(dev, "teardown");
 
@@ -415,7 +415,7 @@ static void netbuf_teardown_script_cb(libxl__egc *egc,
 {
     int rc;
     libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
-    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
 
     if (status)
@@ -440,12 +440,12 @@ enum {
 /* API implementations */
 
 static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
-                           libxl__remus_devices_state *rds,
+                           libxl__checkpoint_devices_state *cds,
                            int buffer_op)
 {
     int rc, ret;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
     if (buffer_op == tc_buffer_start)
         ret = rtnl_qdisc_plug_buffer(remus_nic->qdisc);
@@ -457,7 +457,7 @@ static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
         goto out;
     }
 
-    ret = rtnl_qdisc_add(rds->nlsock, remus_nic->qdisc, NLM_F_REQUEST);
+    ret = rtnl_qdisc_add(cds->nlsock, remus_nic->qdisc, NLM_F_REQUEST);
     if (ret) {
         rc = ERROR_FAIL;
         goto out;
@@ -474,33 +474,33 @@ out:
     return rc;
 }
 
-static void nic_postsuspend(libxl__egc *egc, libxl__remus_device *dev)
+static void nic_postsuspend(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
     int rc;
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
 
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
-    rc = remus_netbuf_op(remus_nic, dev->rds, tc_buffer_start);
+    rc = remus_netbuf_op(remus_nic, dev->cds, tc_buffer_start);
 
     dev->aodev.rc = rc;
     dev->aodev.callback(egc, &dev->aodev);
 }
 
-static void nic_commit(libxl__egc *egc, libxl__remus_device *dev)
+static void nic_commit(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
     int rc;
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
 
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
-    rc = remus_netbuf_op(remus_nic, dev->rds, tc_buffer_release);
+    rc = remus_netbuf_op(remus_nic, dev->cds, tc_buffer_release);
 
     dev->aodev.rc = rc;
     dev->aodev.callback(egc, &dev->aodev);
 }
 
-const libxl__remus_device_instance_ops remus_device_nic = {
+const libxl__checkpoint_device_instance_ops remus_device_nic = {
     .kind = LIBXL__DEVICE_KIND_VIF,
     .setup = nic_setup,
     .teardown = nic_teardown,
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
index 3c659c2..4b68152 100644
--- a/tools/libxl/libxl_nonetbuffer.c
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -22,25 +22,25 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
     return 0;
 }
 
-int init_subkind_nic(libxl__remus_devices_state *rds)
+int init_subkind_nic(libxl__checkpoint_devices_state *cds)
 {
     return 0;
 }
 
-void cleanup_subkind_nic(libxl__remus_devices_state *rds)
+void cleanup_subkind_nic(libxl__checkpoint_devices_state *cds)
 {
     return;
 }
 
-static void nic_setup(libxl__egc *egc, libxl__remus_device *dev)
+static void nic_setup(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     dev->aodev.rc = ERROR_FAIL;
     dev->aodev.callback(egc, &dev->aodev);
 }
 
-const libxl__remus_device_instance_ops remus_device_nic = {
+const libxl__checkpoint_device_instance_ops remus_device_nic = {
     .kind = LIBXL__DEVICE_KIND_VIF,
     .setup = nic_setup,
 };
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index b555715..211216c 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -21,15 +21,15 @@
 
 /*----- remus: setup the environment -----*/
 static void libxl__remus_setup_done(libxl__egc *egc,
-                                    libxl__remus_devices_state *rds, int rc);
+                                    libxl__checkpoint_devices_state *cds, int rc);
 static void libxl__remus_setup_failed(libxl__egc *egc,
-                                      libxl__remus_devices_state *rds, int rc);
+                                      libxl__checkpoint_devices_state *cds, int rc);
 
 void libxl__remus_setup(libxl__egc *egc,
                         libxl__domain_suspend_state *dss)
 {
     /* Convenience aliases */
-    libxl__remus_devices_state *const rds = &dss->rds;
+    libxl__checkpoint_devices_state *const cds = &dss->cds;
     const libxl_domain_remus_info *const info = dss->remus;
 
     STATE_AO_GC(dss->ao);
@@ -39,27 +39,27 @@ void libxl__remus_setup(libxl__egc *egc,
             LOG(ERROR, "Remus: No support for network buffering");
             goto out;
         }
-        rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
+        cds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
     }
 
     if (libxl_defbool_val(info->diskbuf))
-        rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VBD);
+        cds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VBD);
 
-    rds->ao = ao;
-    rds->domid = dss->domid;
-    rds->callback = libxl__remus_setup_done;
+    cds->ao = ao;
+    cds->domid = dss->domid;
+    cds->callback = libxl__remus_setup_done;
 
-    libxl__remus_devices_setup(egc, rds);
+    libxl__checkpoint_devices_setup(egc, cds);
     return;
 
 out:
-    libxl__remus_setup_failed(egc, rds, ERROR_FAIL);
+    libxl__remus_setup_failed(egc, cds, ERROR_FAIL);
 }
 
 static void libxl__remus_setup_done(libxl__egc *egc,
-                                    libxl__remus_devices_state *rds, int rc)
+                                    libxl__checkpoint_devices_state *cds, int rc)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
     STATE_AO_GC(dss->ao);
 
     if (!rc) {
@@ -69,15 +69,15 @@ static void libxl__remus_setup_done(libxl__egc *egc,
 
     LOG(ERROR, "Remus: failed to setup device for guest with domid %u, rc %d",
         dss->domid, rc);
-    rds->callback = libxl__remus_setup_failed;
-    libxl__remus_devices_teardown(egc, rds);
+    cds->callback = libxl__remus_setup_failed;
+    libxl__checkpoint_devices_teardown(egc, cds);
 }
 
 static void libxl__remus_setup_failed(libxl__egc *egc,
-                                      libxl__remus_devices_state *rds,
+                                      libxl__checkpoint_devices_state *cds,
                                       int rc)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
     STATE_AO_GC(dss->ao);
 
     if (rc)
@@ -90,7 +90,7 @@ static void libxl__remus_setup_failed(libxl__egc *egc,
 
 /*----- remus: teardown the environment -----*/
 static void remus_teardown_done(libxl__egc *egc,
-                                libxl__remus_devices_state *rds,
+                                libxl__checkpoint_devices_state *cds,
                                 int rc);
 
 void libxl__remus_teardown(libxl__egc *egc,
@@ -107,15 +107,15 @@ void libxl__remus_teardown(libxl__egc *egc,
      */
     LOG(WARN, "Remus: Domain suspend terminated with rc %d,"
         " teardown Remus devices...", rc);
-    dss->rds.callback = remus_teardown_done;
-    libxl__remus_devices_teardown(egc, &dss->rds);
+    dss->cds.callback = remus_teardown_done;
+    libxl__checkpoint_devices_teardown(egc, &dss->cds);
 }
 
 static void remus_teardown_done(libxl__egc *egc,
-                                libxl__remus_devices_state *rds,
+                                libxl__checkpoint_devices_state *cds,
                                 int rc)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
     STATE_AO_GC(dss->ao);
 
     if (rc)
@@ -130,7 +130,7 @@ static void remus_teardown_done(libxl__egc *egc,
 static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
                                 libxl__domain_suspend_state2 *dss2, int ok);
 static void remus_devices_postsuspend_cb(libxl__egc *egc,
-                                         libxl__remus_devices_state *rds,
+                                         libxl__checkpoint_devices_state *cds,
                                          int rc);
 
 void libxl__remus_domain_suspend_callback(void *data)
@@ -154,9 +154,9 @@ static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
     if (!ok)
         goto out;
 
-    libxl__remus_devices_state *const rds = &dss->rds;
-    rds->callback = remus_devices_postsuspend_cb;
-    libxl__remus_devices_postsuspend(egc, rds);
+    libxl__checkpoint_devices_state *const cds = &dss->cds;
+    cds->callback = remus_devices_postsuspend_cb;
+    libxl__checkpoint_devices_postsuspend(egc, cds);
     return;
 
 out:
@@ -164,11 +164,11 @@ out:
 }
 
 static void remus_devices_postsuspend_cb(libxl__egc *egc,
-                                         libxl__remus_devices_state *rds,
+                                         libxl__checkpoint_devices_state *cds,
                                          int rc)
 {
     int ok = 0;
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
 
     if (rc)
         goto out;
@@ -182,7 +182,7 @@ out:
 
 /*----- remus: resume the guest -----*/
 static void remus_devices_preresume_cb(libxl__egc *egc,
-                                       libxl__remus_devices_state *rds,
+                                       libxl__checkpoint_devices_state *cds,
                                        int rc);
 
 void libxl__remus_domain_resume_callback(void *data)
@@ -192,17 +192,17 @@ void libxl__remus_domain_resume_callback(void *data)
     libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
     STATE_AO_GC(dss->ao);
 
-    libxl__remus_devices_state *const rds = &dss->rds;
-    rds->callback = remus_devices_preresume_cb;
-    libxl__remus_devices_preresume(egc, rds);
+    libxl__checkpoint_devices_state *const cds = &dss->cds;
+    cds->callback = remus_devices_preresume_cb;
+    libxl__checkpoint_devices_preresume(egc, cds);
 }
 
 static void remus_devices_preresume_cb(libxl__egc *egc,
-                                       libxl__remus_devices_state *rds,
+                                       libxl__checkpoint_devices_state *cds,
                                        int rc)
 {
     int ok = 0;
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
     STATE_AO_GC(dss->ao);
 
     if (rc)
@@ -224,7 +224,7 @@ out:
 static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc);
 static void remus_devices_commit_cb(libxl__egc *egc,
-                                    libxl__remus_devices_state *rds,
+                                    libxl__checkpoint_devices_state *cds,
                                     int rc);
 static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
                                   const struct timeval *requested_abs);
@@ -248,7 +248,7 @@ static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc)
 {
     /* Convenience aliases */
-    libxl__remus_devices_state *const rds = &dss->rds;
+    libxl__checkpoint_devices_state *const cds = &dss->cds;
 
     STATE_AO_GC(dss->ao);
 
@@ -257,8 +257,8 @@ static void remus_checkpoint_dm_saved(libxl__egc *egc,
         goto out;
     }
 
-    rds->callback = remus_devices_commit_cb;
-    libxl__remus_devices_commit(egc, rds);
+    cds->callback = remus_devices_commit_cb;
+    libxl__checkpoint_devices_commit(egc, cds);
 
     return;
 
@@ -267,10 +267,10 @@ out:
 }
 
 static void remus_devices_commit_cb(libxl__egc *egc,
-                                    libxl__remus_devices_state *rds,
+                                    libxl__checkpoint_devices_state *cds,
                                     int rc)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
 
     STATE_AO_GC(dss->ao);
 
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
deleted file mode 100644
index a6cb7f6..0000000
--- a/tools/libxl/libxl_remus_device.c
+++ /dev/null
@@ -1,327 +0,0 @@
-/*
- * Copyright (C) 2014 FUJITSU LIMITED
- * Author: Yang Hongyang <yanghy@cn.fujitsu.com>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published
- * by the Free Software Foundation; version 2.1 only. with the special
- * exception on linking described in file LICENSE.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU Lesser General Public License for more details.
- */
-
-#include "libxl_osdeps.h" /* must come before any other headers */
-
-#include "libxl_internal.h"
-
-extern const libxl__remus_device_instance_ops remus_device_nic;
-extern const libxl__remus_device_instance_ops remus_device_drbd_disk;
-static const libxl__remus_device_instance_ops *remus_ops[] = {
-    &remus_device_nic,
-    &remus_device_drbd_disk,
-    NULL,
-};
-
-/*----- helper functions -----*/
-
-static int init_device_subkind(libxl__remus_devices_state *rds)
-{
-    /* init device subkind-specific state in the libxl ctx */
-    int rc;
-    STATE_AO_GC(rds->ao);
-
-    if (libxl__netbuffer_enabled(gc)) {
-        rc = init_subkind_nic(rds);
-        if (rc) goto out;
-    }
-
-    rc = init_subkind_drbd_disk(rds);
-    if (rc) goto out;
-
-    rc = 0;
-out:
-    return rc;
-}
-
-static void cleanup_device_subkind(libxl__remus_devices_state *rds)
-{
-    /* cleanup device subkind-specific state in the libxl ctx */
-    STATE_AO_GC(rds->ao);
-
-    if (libxl__netbuffer_enabled(gc))
-        cleanup_subkind_nic(rds);
-
-    cleanup_subkind_drbd_disk(rds);
-}
-
-/*----- setup() and teardown() -----*/
-
-/* callbacks */
-
-static void all_devices_setup_cb(libxl__egc *egc,
-                                 libxl__multidev *multidev,
-                                 int rc);
-static void device_setup_iterate(libxl__egc *egc,
-                                 libxl__ao_device *aodev);
-static void devices_teardown_cb(libxl__egc *egc,
-                                libxl__multidev *multidev,
-                                int rc);
-
-/* remus device setup and teardown */
-
-static libxl__remus_device* remus_device_init(libxl__egc *egc,
-                                              libxl__remus_devices_state *rds,
-                                              libxl__device_kind kind,
-                                              void *libxl_dev)
-{
-    libxl__remus_device *dev = NULL;
-
-    STATE_AO_GC(rds->ao);
-    GCNEW(dev);
-    dev->backend_dev = libxl_dev;
-    dev->kind = kind;
-    dev->rds = rds;
-
-    return dev;
-}
-
-static void remus_devices_setup(libxl__egc *egc,
-                                libxl__remus_devices_state *rds);
-
-void libxl__remus_devices_setup(libxl__egc *egc, libxl__remus_devices_state *rds)
-{
-    int i, rc;
-
-    STATE_AO_GC(rds->ao);
-
-    rc = init_device_subkind(rds);
-    if (rc)
-        goto out;
-
-    rds->num_devices = 0;
-    rds->num_nics = 0;
-    rds->num_disks = 0;
-
-    if (rds->device_kind_flags & (1 << LIBXL__DEVICE_KIND_VIF))
-        rds->nics = libxl_device_nic_list(CTX, rds->domid, &rds->num_nics);
-
-    if (rds->device_kind_flags & (1 << LIBXL__DEVICE_KIND_VBD))
-        rds->disks = libxl_device_disk_list(CTX, rds->domid, &rds->num_disks);
-
-    if (rds->num_nics == 0 && rds->num_disks == 0)
-        goto out;
-
-    GCNEW_ARRAY(rds->devs, rds->num_nics + rds->num_disks);
-
-    for (i = 0; i < rds->num_nics; i++) {
-        rds->devs[rds->num_devices++] = remus_device_init(egc, rds,
-                                                LIBXL__DEVICE_KIND_VIF,
-                                                &rds->nics[i]);
-    }
-
-    for (i = 0; i < rds->num_disks; i++) {
-        rds->devs[rds->num_devices++] = remus_device_init(egc, rds,
-                                                LIBXL__DEVICE_KIND_VBD,
-                                                &rds->disks[i]);
-    }
-
-    remus_devices_setup(egc, rds);
-
-    return;
-
-out:
-    rds->callback(egc, rds, rc);
-}
-
-static void remus_devices_setup(libxl__egc *egc,
-                                libxl__remus_devices_state *rds)
-{
-    int i, rc;
-
-    STATE_AO_GC(rds->ao);
-
-    libxl__multidev_begin(ao, &rds->multidev);
-    rds->multidev.callback = all_devices_setup_cb;
-    for (i = 0; i < rds->num_devices; i++) {
-        libxl__remus_device *dev = rds->devs[i];
-        dev->ops_index = -1;
-        libxl__multidev_prepare_with_aodev(&rds->multidev, &dev->aodev);
-
-        dev->aodev.rc = ERROR_REMUS_DEVICE_NOT_SUPPORTED;
-        dev->aodev.callback = device_setup_iterate;
-        device_setup_iterate(egc,&dev->aodev);
-    }
-
-    rc = 0;
-    libxl__multidev_prepared(egc, &rds->multidev, rc);
-}
-
-
-static void device_setup_iterate(libxl__egc *egc, libxl__ao_device *aodev)
-{
-    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
-    EGC_GC;
-
-    if (aodev->rc != ERROR_REMUS_DEVICE_NOT_SUPPORTED &&
-        aodev->rc != ERROR_REMUS_DEVOPS_DOES_NOT_MATCH)
-        /* might be success or disaster */
-        goto out;
-
-    do {
-        dev->ops = remus_ops[++dev->ops_index];
-        if (!dev->ops) {
-            libxl_device_nic * nic = NULL;
-            libxl_device_disk * disk = NULL;
-            uint32_t domid;
-            int devid;
-            if (dev->kind == LIBXL__DEVICE_KIND_VIF) {
-                nic = (libxl_device_nic *)dev->backend_dev;
-                domid = nic->backend_domid;
-                devid = nic->devid;
-            } else if (dev->kind == LIBXL__DEVICE_KIND_VBD) {
-                disk = (libxl_device_disk *)dev->backend_dev;
-                domid = disk->backend_domid;
-                devid = libxl__device_disk_dev_number(disk->vdev, NULL, NULL);
-            } else {
-                LOG(ERROR,"device kind not handled by remus: %s",
-                    libxl__device_kind_to_string(dev->kind));
-                aodev->rc = ERROR_FAIL;
-                goto out;
-            }
-            LOG(ERROR,"device not handled by remus"
-                " (device=%s:%"PRId32"/%"PRId32")",
-                libxl__device_kind_to_string(dev->kind),
-                domid, devid);
-            aodev->rc = ERROR_REMUS_DEVICE_NOT_SUPPORTED;
-            goto out;
-        }
-    } while (dev->ops->kind != dev->kind);
-
-    /* found the next ops_index to try */
-    assert(dev->aodev.callback == device_setup_iterate);
-    dev->ops->setup(egc,dev);
-    return;
-
- out:
-    libxl__multidev_one_callback(egc,aodev);
-}
-
-static void all_devices_setup_cb(libxl__egc *egc,
-                                 libxl__multidev *multidev,
-                                 int rc)
-{
-    STATE_AO_GC(multidev->ao);
-
-    /* Convenience aliases */
-    libxl__remus_devices_state *const rds =
-                            CONTAINER_OF(multidev, *rds, multidev);
-
-    rds->callback(egc, rds, rc);
-}
-
-void libxl__remus_devices_teardown(libxl__egc *egc,
-                                   libxl__remus_devices_state *rds)
-{
-    int i;
-    libxl__remus_device *dev;
-
-    STATE_AO_GC(rds->ao);
-
-    libxl__multidev_begin(ao, &rds->multidev);
-    rds->multidev.callback = devices_teardown_cb;
-    for (i = 0; i < rds->num_devices; i++) {
-        dev = rds->devs[i];
-        if (!dev->ops || !dev->matched)
-            continue;
-
-        libxl__multidev_prepare_with_aodev(&rds->multidev, &dev->aodev);
-        dev->ops->teardown(egc,dev);
-    }
-
-    libxl__multidev_prepared(egc, &rds->multidev, 0);
-}
-
-static void devices_teardown_cb(libxl__egc *egc,
-                                libxl__multidev *multidev,
-                                int rc)
-{
-    int i;
-
-    STATE_AO_GC(multidev->ao);
-
-    /* Convenience aliases */
-    libxl__remus_devices_state *const rds =
-                            CONTAINER_OF(multidev, *rds, multidev);
-
-    /* clean nic */
-    for (i = 0; i < rds->num_nics; i++)
-        libxl_device_nic_dispose(&rds->nics[i]);
-    free(rds->nics);
-    rds->nics = NULL;
-    rds->num_nics = 0;
-
-    /* clean disk */
-    for (i = 0; i < rds->num_disks; i++)
-        libxl_device_disk_dispose(&rds->disks[i]);
-    free(rds->disks);
-    rds->disks = NULL;
-    rds->num_disks = 0;
-
-    cleanup_device_subkind(rds);
-
-    rds->callback(egc, rds, rc);
-}
-
-/*----- checkpointing APIs -----*/
-
-/* callbacks */
-
-static void devices_checkpoint_cb(libxl__egc *egc,
-                                  libxl__multidev *multidev,
-                                  int rc);
-
-/* API implementations */
-
-#define define_remus_checkpoint_api(api)                                \
-void libxl__remus_devices_##api(libxl__egc *egc,                        \
-                                libxl__remus_devices_state *rds)        \
-{                                                                       \
-    int i;                                                              \
-    libxl__remus_device *dev;                                           \
-                                                                        \
-    STATE_AO_GC(rds->ao);                                               \
-                                                                        \
-    libxl__multidev_begin(ao, &rds->multidev);                          \
-    rds->multidev.callback = devices_checkpoint_cb;                     \
-    for (i = 0; i < rds->num_devices; i++) {                            \
-        dev = rds->devs[i];                                             \
-        if (!dev->matched || !dev->ops->api)                            \
-            continue;                                                   \
-        libxl__multidev_prepare_with_aodev(&rds->multidev, &dev->aodev);\
-        dev->ops->api(egc,dev);                                         \
-    }                                                                   \
-                                                                        \
-    libxl__multidev_prepared(egc, &rds->multidev, 0);                   \
-}
-
-define_remus_checkpoint_api(postsuspend);
-
-define_remus_checkpoint_api(preresume);
-
-define_remus_checkpoint_api(commit);
-
-static void devices_checkpoint_cb(libxl__egc *egc,
-                                  libxl__multidev *multidev,
-                                  int rc)
-{
-    STATE_AO_GC(multidev->ao);
-
-    /* Convenience aliases */
-    libxl__remus_devices_state *const rds =
-                            CONTAINER_OF(multidev, *rds, multidev);
-
-    rds->callback(egc, rds, rc);
-}
diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
index afe9b61..50b897d 100644
--- a/tools/libxl/libxl_remus_disk_drbd.c
+++ b/tools/libxl/libxl_remus_disk_drbd.c
@@ -26,30 +26,30 @@ typedef struct libxl__remus_drbd_disk {
     int ackwait;
 } libxl__remus_drbd_disk;
 
-int init_subkind_drbd_disk(libxl__remus_devices_state *rds)
+int init_subkind_drbd_disk(libxl__checkpoint_devices_state *cds)
 {
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
-    rds->drbd_probe_script = GCSPRINTF("%s/block-drbd-probe",
+    cds->drbd_probe_script = GCSPRINTF("%s/block-drbd-probe",
                                        libxl__xen_script_dir_path());
 
     return 0;
 }
 
-void cleanup_subkind_drbd_disk(libxl__remus_devices_state *rds)
+void cleanup_subkind_drbd_disk(libxl__checkpoint_devices_state *cds)
 {
     return;
 }
 
 /*----- helper functions, for async calls -----*/
 static void drbd_async_call(libxl__egc *egc,
-                            libxl__remus_device *dev,
-                            void func(libxl__remus_device *),
+                            libxl__checkpoint_device *dev,
+                            void func(libxl__checkpoint_device *),
                             libxl__ev_child_callback callback)
 {
     int pid = -1, rc;
     libxl__ao_device *aodev = &dev->aodev;
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     /* Fork and call */
     pid = libxl__ev_child_fork(gc, &aodev->child, callback);
@@ -82,21 +82,21 @@ static void match_async_exec_cb(libxl__egc *egc,
 
 /* implementations */
 
-static void match_async_exec(libxl__egc *egc, libxl__remus_device *dev);
+static void match_async_exec(libxl__egc *egc, libxl__checkpoint_device *dev);
 
-static void drbd_setup(libxl__egc *egc, libxl__remus_device *dev)
+static void drbd_setup(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     match_async_exec(egc, dev);
 }
 
-static void match_async_exec(libxl__egc *egc, libxl__remus_device *dev)
+static void match_async_exec(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
     int arraysize, nr = 0, rc;
     const libxl_device_disk *disk = dev->backend_dev;
     libxl__async_exec_state *aes = &dev->aodev.aes;
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     /* setup env & args */
     arraysize = 1;
@@ -107,12 +107,12 @@ static void match_async_exec(libxl__egc *egc, libxl__remus_device *dev)
     arraysize = 3;
     nr = 0;
     GCNEW_ARRAY(aes->args, arraysize);
-    aes->args[nr++] = dev->rds->drbd_probe_script;
+    aes->args[nr++] = dev->cds->drbd_probe_script;
     aes->args[nr++] = disk->pdev_path;
     aes->args[nr++] = NULL;
     assert(nr <= arraysize);
 
-    aes->ao = dev->rds->ao;
+    aes->ao = dev->cds->ao;
     aes->what = GCSPRINTF("%s %s", aes->args[0], aes->args[1]);
     aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
     aes->callback = match_async_exec_cb;
@@ -137,14 +137,14 @@ static void match_async_exec_cb(libxl__egc *egc,
 {
     int rc;
     libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
-    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
     libxl__remus_drbd_disk *drbd_disk;
     const libxl_device_disk *disk = dev->backend_dev;
 
     STATE_AO_GC(aodev->ao);
 
     if (status) {
-        rc = ERROR_REMUS_DEVOPS_DOES_NOT_MATCH;
+        rc = ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH;
         /* BUG: seems to assume that any exit status means `no match' */
         /* BUG: exit status will have been logged as an error */
         goto out;
@@ -169,10 +169,10 @@ out:
     aodev->callback(egc, aodev);
 }
 
-static void drbd_teardown(libxl__egc *egc, libxl__remus_device *dev)
+static void drbd_teardown(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
     libxl__remus_drbd_disk *drbd_disk = dev->concrete_data;
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     close(drbd_disk->ctl_fd);
     dev->aodev.rc = 0;
@@ -189,9 +189,9 @@ static void checkpoint_async_call_done(libxl__egc *egc,
 /* API implementations */
 
 /* this op will not wait and block, so implement as sync op */
-static void drbd_postsuspend(libxl__egc *egc, libxl__remus_device *dev)
+static void drbd_postsuspend(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     libxl__remus_drbd_disk *rdd = dev->concrete_data;
 
@@ -205,16 +205,16 @@ static void drbd_postsuspend(libxl__egc *egc, libxl__remus_device *dev)
 }
 
 
-static void drbd_preresume_async(libxl__remus_device *dev);
+static void drbd_preresume_async(libxl__checkpoint_device *dev);
 
-static void drbd_preresume(libxl__egc *egc, libxl__remus_device *dev)
+static void drbd_preresume(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     drbd_async_call(egc, dev, drbd_preresume_async, checkpoint_async_call_done);
 }
 
-static void drbd_preresume_async(libxl__remus_device *dev)
+static void drbd_preresume_async(libxl__checkpoint_device *dev)
 {
     libxl__remus_drbd_disk *rdd = dev->concrete_data;
     int ackwait = rdd->ackwait;
@@ -233,7 +233,7 @@ static void checkpoint_async_call_done(libxl__egc *egc,
 {
     int rc;
     libxl__ao_device *aodev = CONTAINER_OF(child, *aodev, child);
-    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
     libxl__remus_drbd_disk *rdd = dev->concrete_data;
 
     STATE_AO_GC(aodev->ao);
@@ -251,7 +251,7 @@ out:
     aodev->callback(egc, aodev);
 }
 
-const libxl__remus_device_instance_ops remus_device_drbd_disk = {
+const libxl__checkpoint_device_instance_ops remus_device_drbd_disk = {
     .kind = LIBXL__DEVICE_KIND_VBD,
     .setup = drbd_setup,
     .teardown = drbd_teardown,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 0866433..fa85e5b 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -61,8 +61,8 @@ libxl_error = Enumeration("error", [
     (-15, "LOCK_FAIL"),
     (-16, "JSON_CONFIG_EMPTY"),
     (-17, "DEVICE_EXISTS"),
-    (-18, "REMUS_DEVOPS_DOES_NOT_MATCH"),
-    (-19, "REMUS_DEVICE_NOT_SUPPORTED"),
+    (-18, "CHECKPOINT_DEVOPS_DOES_NOT_MATCH"),
+    (-19, "CHECKPOINT_DEVICE_NOT_SUPPORTED"),
     (-20, "VNUMA_CONFIG_INVALID"),
     ], value_namespace = "")
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 11/29] adjust the indentation
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (9 preceding siblings ...)
  2015-04-01  6:41 ` [RFC PATCH COLO v5 10/29] rename remus device to checkpoint device Yang Hongyang
@ 2015-04-01  6:41 ` Yang Hongyang
  2015-04-22 15:20   ` Ian Campbell
  2015-04-01  6:41 ` [RFC PATCH COLO v5 12/29] don't touch remus in checkpoint_device Yang Hongyang
                   ` (17 subsequent siblings)
  28 siblings, 1 reply; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_checkpoint_device.c | 23 ++++++++++++-----------
 tools/libxl/libxl_internal.h          | 21 ++++++++++++---------
 tools/libxl/libxl_remus.c             | 12 ++++++++----
 3 files changed, 32 insertions(+), 24 deletions(-)

diff --git a/tools/libxl/libxl_checkpoint_device.c b/tools/libxl/libxl_checkpoint_device.c
index 109cd23..0cfabc3 100644
--- a/tools/libxl/libxl_checkpoint_device.c
+++ b/tools/libxl/libxl_checkpoint_device.c
@@ -73,9 +73,9 @@ static void devices_teardown_cb(libxl__egc *egc,
 /* checkpoint device setup and teardown */
 
 static libxl__checkpoint_device* checkpoint_device_init(libxl__egc *egc,
-                                              libxl__checkpoint_devices_state *cds,
-                                              libxl__device_kind kind,
-                                              void *libxl_dev)
+                libxl__checkpoint_devices_state *cds,
+                libxl__device_kind kind,
+                void *libxl_dev)
 {
     libxl__checkpoint_device *dev = NULL;
 
@@ -89,9 +89,10 @@ static libxl__checkpoint_device* checkpoint_device_init(libxl__egc *egc,
 }
 
 static void checkpoint_devices_setup(libxl__egc *egc,
-                                libxl__checkpoint_devices_state *cds);
+                                     libxl__checkpoint_devices_state *cds);
 
-void libxl__checkpoint_devices_setup(libxl__egc *egc, libxl__checkpoint_devices_state *cds)
+void libxl__checkpoint_devices_setup(libxl__egc *egc,
+                                     libxl__checkpoint_devices_state *cds)
 {
     int i, rc;
 
@@ -137,7 +138,7 @@ out:
 }
 
 static void checkpoint_devices_setup(libxl__egc *egc,
-                                libxl__checkpoint_devices_state *cds)
+                                     libxl__checkpoint_devices_state *cds)
 {
     int i, rc;
 
@@ -223,7 +224,7 @@ static void all_devices_setup_cb(libxl__egc *egc,
 }
 
 void libxl__checkpoint_devices_teardown(libxl__egc *egc,
-                                   libxl__checkpoint_devices_state *cds)
+                                        libxl__checkpoint_devices_state *cds)
 {
     int i;
     libxl__checkpoint_device *dev;
@@ -285,12 +286,12 @@ static void devices_checkpoint_cb(libxl__egc *egc,
 
 /* API implementations */
 
-#define define_checkpoint_api(api)                                \
-void libxl__checkpoint_devices_##api(libxl__egc *egc,                        \
-                                libxl__checkpoint_devices_state *cds)        \
+#define define_checkpoint_api(api)                                      \
+void libxl__checkpoint_devices_##api(libxl__egc *egc,                   \
+                                libxl__checkpoint_devices_state *cds)   \
 {                                                                       \
     int i;                                                              \
-    libxl__checkpoint_device *dev;                                           \
+    libxl__checkpoint_device *dev;                                      \
                                                                         \
     STATE_AO_GC(cds->ao);                                               \
                                                                         \
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 7e7c3b3..4b8590c 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2652,7 +2652,8 @@ typedef struct libxl__save_helper_state {
  * Each device type needs to implement the interfaces specified in
  * the libxl__checkpoint_device_instance_ops if it wishes to support Remus.
  *
- * The high-level control flow through the checkpoint device layer is shown below:
+ * The high-level control flow through the checkpoint device layer is shown
+ * below:
  *
  * xl remus
  *  |->  libxl_domain_remus_start
@@ -2713,7 +2714,8 @@ int init_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
 void cleanup_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
 
 typedef void libxl__checkpoint_callback(libxl__egc *,
-                                   libxl__checkpoint_devices_state *, int rc);
+                                        libxl__checkpoint_devices_state *,
+                                        int rc);
 
 /*
  * State associated with a checkpoint invocation, including parameters
@@ -2721,7 +2723,7 @@ typedef void libxl__checkpoint_callback(libxl__egc *,
  * save/restore machinery.
  */
 struct libxl__checkpoint_devices_state {
-    /*---- must be set by caller of libxl__checkpoint_device_(setup|teardown) ----*/
+    /*-- must be set by caller of libxl__checkpoint_device_(setup|teardown) --*/
 
     libxl__ao *ao;
     uint32_t domid;
@@ -2734,7 +2736,8 @@ struct libxl__checkpoint_devices_state {
     /*
      * this array is allocated before setup the checkpoint devices by the
      * checkpoint abstract layer.
-     * devs may be NULL, means there's no checkpoint devices that has been set up.
+     * devs may be NULL, means there's no checkpoint devices that has been
+     * set up.
      * the size of this array is 'num_devices', which is the total number
      * of libxl nic devices and disk devices(num_nics + num_disks).
      */
@@ -2794,15 +2797,15 @@ struct libxl__checkpoint_device {
 
 /* the following 5 APIs are async ops, call cds->callback when done */
 _hidden void libxl__checkpoint_devices_setup(libxl__egc *egc,
-                                        libxl__checkpoint_devices_state *cds);
+                libxl__checkpoint_devices_state *cds);
 _hidden void libxl__checkpoint_devices_teardown(libxl__egc *egc,
-                                           libxl__checkpoint_devices_state *cds);
+                libxl__checkpoint_devices_state *cds);
 _hidden void libxl__checkpoint_devices_postsuspend(libxl__egc *egc,
-                                              libxl__checkpoint_devices_state *cds);
+                libxl__checkpoint_devices_state *cds);
 _hidden void libxl__checkpoint_devices_preresume(libxl__egc *egc,
-                                            libxl__checkpoint_devices_state *cds);
+                libxl__checkpoint_devices_state *cds);
 _hidden void libxl__checkpoint_devices_commit(libxl__egc *egc,
-                                         libxl__checkpoint_devices_state *cds);
+                libxl__checkpoint_devices_state *cds);
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
 /*----- Domain suspend (save) state structure -----*/
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index 211216c..5afa618 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -21,9 +21,11 @@
 
 /*----- remus: setup the environment -----*/
 static void libxl__remus_setup_done(libxl__egc *egc,
-                                    libxl__checkpoint_devices_state *cds, int rc);
+                                    libxl__checkpoint_devices_state *cds,
+                                    int rc);
 static void libxl__remus_setup_failed(libxl__egc *egc,
-                                      libxl__checkpoint_devices_state *cds, int rc);
+                                      libxl__checkpoint_devices_state *cds,
+                                      int rc);
 
 void libxl__remus_setup(libxl__egc *egc,
                         libxl__domain_suspend_state *dss)
@@ -57,7 +59,8 @@ out:
 }
 
 static void libxl__remus_setup_done(libxl__egc *egc,
-                                    libxl__checkpoint_devices_state *cds, int rc)
+                                    libxl__checkpoint_devices_state *cds,
+                                    int rc)
 {
     libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
     STATE_AO_GC(dss->ao);
@@ -222,7 +225,8 @@ out:
 
 /*----- remus: wait a new checkpoint -----*/
 static void remus_checkpoint_dm_saved(libxl__egc *egc,
-                                      libxl__domain_suspend_state *dss, int rc);
+                                      libxl__domain_suspend_state *dss,
+                                      int rc);
 static void remus_devices_commit_cb(libxl__egc *egc,
                                     libxl__checkpoint_devices_state *cds,
                                     int rc);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 12/29] don't touch remus in checkpoint_device
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (10 preceding siblings ...)
  2015-04-01  6:41 ` [RFC PATCH COLO v5 11/29] adjust the indentation Yang Hongyang
@ 2015-04-01  6:41 ` Yang Hongyang
  2015-04-01  6:41 ` [RFC PATCH COLO v5 13/29] Update libxl_save_msgs_gen.pl to support return data from xl to xc Yang Hongyang
                   ` (16 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

Checkpoint device is an abstract layer to do checkpoint.
COLO can also use it to do checkpoint. But there are
still some codes in checkpoint device which touch remus:
1. remus_ops: we use remus ops directly in checkpoint
   device. Store it in checkpoint device state.
2. concrete layer's private member: add a new structure
   remus state, and move them to remus state.
3. init/cleanup device subkind: we call (init|cleanup)_subkind_nic
   and (init|cleanup)_subkind_drbd_disk directly in checkpoint
   device. Call them before calling libxl__checkpoint_devices_setup()
   or after calling libxl__checkpoint_devices_teardown().

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca>
---
 tools/libxl/libxl.c                   |  2 +-
 tools/libxl/libxl_checkpoint_device.c | 52 ++------------------
 tools/libxl/libxl_dom.c               |  3 +-
 tools/libxl/libxl_internal.h          | 37 ++++++++++-----
 tools/libxl/libxl_netbuffer.c         | 51 +++++++++++---------
 tools/libxl/libxl_remus.c             | 89 +++++++++++++++++++++++++++--------
 tools/libxl/libxl_remus.h             |  5 +-
 tools/libxl/libxl_remus_disk_drbd.c   |  9 ++--
 8 files changed, 136 insertions(+), 112 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index bcbd961..7bc4fc4 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -892,7 +892,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     assert(info);
 
     /* Point of no return */
-    libxl__remus_setup(egc, dss);
+    libxl__remus_setup(egc, &dss->rs);
     return AO_INPROGRESS;
 
  out:
diff --git a/tools/libxl/libxl_checkpoint_device.c b/tools/libxl/libxl_checkpoint_device.c
index 0cfabc3..2b7318f 100644
--- a/tools/libxl/libxl_checkpoint_device.c
+++ b/tools/libxl/libxl_checkpoint_device.c
@@ -17,46 +17,6 @@
 
 #include "libxl_internal.h"
 
-extern const libxl__checkpoint_device_instance_ops remus_device_nic;
-extern const libxl__checkpoint_device_instance_ops remus_device_drbd_disk;
-static const libxl__checkpoint_device_instance_ops *remus_ops[] = {
-    &remus_device_nic,
-    &remus_device_drbd_disk,
-    NULL,
-};
-
-/*----- helper functions -----*/
-
-static int init_device_subkind(libxl__checkpoint_devices_state *cds)
-{
-    /* init device subkind-specific state in the libxl ctx */
-    int rc;
-    STATE_AO_GC(cds->ao);
-
-    if (libxl__netbuffer_enabled(gc)) {
-        rc = init_subkind_nic(cds);
-        if (rc) goto out;
-    }
-
-    rc = init_subkind_drbd_disk(cds);
-    if (rc) goto out;
-
-    rc = 0;
-out:
-    return rc;
-}
-
-static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
-{
-    /* cleanup device subkind-specific state in the libxl ctx */
-    STATE_AO_GC(cds->ao);
-
-    if (libxl__netbuffer_enabled(gc))
-        cleanup_subkind_nic(cds);
-
-    cleanup_subkind_drbd_disk(cds);
-}
-
 /*----- setup() and teardown() -----*/
 
 /* callbacks */
@@ -94,14 +54,10 @@ static void checkpoint_devices_setup(libxl__egc *egc,
 void libxl__checkpoint_devices_setup(libxl__egc *egc,
                                      libxl__checkpoint_devices_state *cds)
 {
-    int i, rc;
+    int i;
 
     STATE_AO_GC(cds->ao);
 
-    rc = init_device_subkind(cds);
-    if (rc)
-        goto out;
-
     cds->num_devices = 0;
     cds->num_nics = 0;
     cds->num_disks = 0;
@@ -134,7 +90,7 @@ void libxl__checkpoint_devices_setup(libxl__egc *egc,
     return;
 
 out:
-    cds->callback(egc, cds, rc);
+    cds->callback(egc, cds, 0);
 }
 
 static void checkpoint_devices_setup(libxl__egc *egc,
@@ -172,7 +128,7 @@ static void device_setup_iterate(libxl__egc *egc, libxl__ao_device *aodev)
         goto out;
 
     do {
-        dev->ops = remus_ops[++dev->ops_index];
+        dev->ops = dev->cds->ops[++dev->ops_index];
         if (!dev->ops) {
             libxl_device_nic * nic = NULL;
             libxl_device_disk * disk = NULL;
@@ -271,8 +227,6 @@ static void devices_teardown_cb(libxl__egc *egc,
     cds->disks = NULL;
     cds->num_disks = 0;
 
-    cleanup_device_subkind(cds);
-
     cds->callback(egc, cds, rc);
 }
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 4693d32..e09a1eb 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1865,7 +1865,6 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
     dss2->save_dm = 1;
 
     if (r_info != NULL) {
-        dss->interval = r_info->interval;
         if (libxl_defbool_val(r_info->compression))
             dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
     }
@@ -2051,7 +2050,7 @@ static void domain_suspend_done(libxl__egc *egc,
                            dss2->guest_evtchn.port, &dss2->guest_evtchn_lockfd);
 
     if (dss->remus) {
-        libxl__remus_teardown(egc, dss, rc);
+        libxl__remus_teardown(egc, &dss->rs, rc);
         return;
     }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 4b8590c..e12c7b5 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2729,6 +2729,8 @@ struct libxl__checkpoint_devices_state {
     uint32_t domid;
     libxl__checkpoint_callback *callback;
     int device_kind_flags;
+    /* The ops must be pointer array, and the last ops must be NULL */
+    const libxl__checkpoint_device_instance_ops **ops;
 
     /*----- private for abstract layer only -----*/
 
@@ -2749,16 +2751,6 @@ struct libxl__checkpoint_devices_state {
     int num_disks;
 
     libxl__multidev multidev;
-
-    /*----- private for concrete (device-specific) layer only -----*/
-
-    /* private for nic device subkind ops */
-    char *netbufscript;
-    struct nl_sock *nlsock;
-    struct nl_cache *qdisc_cache;
-
-    /* private for drbd disk subkind ops */
-    char *drbd_probe_script;
 };
 
 /*
@@ -2806,6 +2798,27 @@ _hidden void libxl__checkpoint_devices_preresume(libxl__egc *egc,
                 libxl__checkpoint_devices_state *cds);
 _hidden void libxl__checkpoint_devices_commit(libxl__egc *egc,
                 libxl__checkpoint_devices_state *cds);
+
+/*----- Remus related state structure -----*/
+typedef struct libxl__remus_state libxl__remus_state;
+struct libxl__remus_state {
+    /* private */
+    libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
+    int interval; /* checkpoint interval */
+
+    /* abstract layer */
+    libxl__checkpoint_devices_state cds;
+
+    /*----- private for concrete (device-specific) layer only -----*/
+    /* private for nic device subkind ops */
+    char *netbufscript;
+    struct nl_sock *nlsock;
+    struct nl_cache *qdisc_cache;
+
+    /* private for drbd disk subkind ops */
+    char *drbd_probe_script;
+};
+
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
 /*----- Domain suspend (save) state structure -----*/
@@ -2869,9 +2882,7 @@ struct libxl__domain_suspend_state {
     libxl__domain_suspend_state2 dss2;
     int hvm;
     int xcflags;
-    libxl__checkpoint_devices_state cds;
-    libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
-    int interval; /* checkpoint interval (for Remus) */
+    libxl__remus_state rs;
     libxl__save_helper_state shs;
     libxl__logdirty_switch logdirty;
     /* private for libxl__domain_save_device_model */
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 2d668dd..69a261b 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -41,18 +41,19 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
 int init_subkind_nic(libxl__checkpoint_devices_state *cds)
 {
     int rc, ret;
-    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
+    libxl__remus_state *rs = CONTAINER_OF(cds, *rs, cds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
 
     STATE_AO_GC(cds->ao);
 
-    cds->nlsock = nl_socket_alloc();
-    if (!cds->nlsock) {
+    rs->nlsock = nl_socket_alloc();
+    if (!rs->nlsock) {
         LOG(ERROR, "cannot allocate nl socket");
         rc = ERROR_FAIL;
         goto out;
     }
 
-    ret = nl_connect(cds->nlsock, NETLINK_ROUTE);
+    ret = nl_connect(rs->nlsock, NETLINK_ROUTE);
     if (ret) {
         LOG(ERROR, "failed to open netlink socket: %s",
             nl_geterror(ret));
@@ -61,7 +62,7 @@ int init_subkind_nic(libxl__checkpoint_devices_state *cds)
     }
 
     /* get list of all qdiscs installed on network devs. */
-    ret = rtnl_qdisc_alloc_cache(cds->nlsock, &cds->qdisc_cache);
+    ret = rtnl_qdisc_alloc_cache(rs->nlsock, &rs->qdisc_cache);
     if (ret) {
         LOG(ERROR, "failed to allocate qdisc cache: %s",
             nl_geterror(ret));
@@ -70,10 +71,10 @@ int init_subkind_nic(libxl__checkpoint_devices_state *cds)
     }
 
     if (dss->remus->netbufscript) {
-        cds->netbufscript = libxl__strdup(gc, dss->remus->netbufscript);
+        rs->netbufscript = libxl__strdup(gc, dss->remus->netbufscript);
     } else {
-        cds->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
-                                      libxl__xen_script_dir_path());
+        rs->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
+                                     libxl__xen_script_dir_path());
     }
 
     rc = 0;
@@ -84,20 +85,22 @@ out:
 
 void cleanup_subkind_nic(libxl__checkpoint_devices_state *cds)
 {
+    libxl__remus_state *rs = CONTAINER_OF(cds, *rs, cds);
+
     STATE_AO_GC(cds->ao);
 
     /* free qdisc cache */
-    if (cds->qdisc_cache) {
-        nl_cache_clear(cds->qdisc_cache);
-        nl_cache_free(cds->qdisc_cache);
-        cds->qdisc_cache = NULL;
+    if (rs->qdisc_cache) {
+        nl_cache_clear(rs->qdisc_cache);
+        nl_cache_free(rs->qdisc_cache);
+        rs->qdisc_cache = NULL;
     }
 
     /* close & free nlsock */
-    if (cds->nlsock) {
-        nl_close(cds->nlsock);
-        nl_socket_free(cds->nlsock);
-        cds->nlsock = NULL;
+    if (rs->nlsock) {
+        nl_close(rs->nlsock);
+        nl_socket_free(rs->nlsock);
+        rs->nlsock = NULL;
     }
 }
 
@@ -150,13 +153,14 @@ static int init_qdisc(libxl__checkpoint_devices_state *cds,
     int rc, ret, ifindex;
     struct rtnl_link *ifb = NULL;
     struct rtnl_qdisc *qdisc = NULL;
+    libxl__remus_state *rs = CONTAINER_OF(cds, *rs, cds);
 
     STATE_AO_GC(cds->ao);
 
     /* Now that we have brought up REMUS_IFB device with plug qdisc for
      * this vif, so we need to refill the qdisc cache.
      */
-    ret = nl_cache_refill(cds->nlsock, cds->qdisc_cache);
+    ret = nl_cache_refill(rs->nlsock, rs->qdisc_cache);
     if (ret) {
         LOG(ERROR, "cannot refill qdisc cache: %s", nl_geterror(ret));
         rc = ERROR_FAIL;
@@ -164,7 +168,7 @@ static int init_qdisc(libxl__checkpoint_devices_state *cds,
     }
 
     /* get a handle to the REMUS_IFB interface */
-    ret = rtnl_link_get_kernel(cds->nlsock, 0, remus_nic->ifb, &ifb);
+    ret = rtnl_link_get_kernel(rs->nlsock, 0, remus_nic->ifb, &ifb);
     if (ret) {
         LOG(ERROR, "cannot obtain handle for %s: %s", remus_nic->ifb,
             nl_geterror(ret));
@@ -187,7 +191,7 @@ static int init_qdisc(libxl__checkpoint_devices_state *cds,
      * There is no need to explicitly free this qdisc as its just a
      * reference from the qdisc cache we allocated earlier.
      */
-    qdisc = rtnl_qdisc_get_by_parent(cds->qdisc_cache, ifindex, TC_H_ROOT);
+    qdisc = rtnl_qdisc_get_by_parent(rs->qdisc_cache, ifindex, TC_H_ROOT);
     if (qdisc) {
         const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
         /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
@@ -238,11 +242,12 @@ static void setup_async_exec(libxl__checkpoint_device *dev, char *op)
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
     libxl__checkpoint_devices_state *cds = dev->cds;
     libxl__async_exec_state *aes = &dev->aodev.aes;
+    libxl__remus_state *rs = CONTAINER_OF(cds, *rs, cds);
 
     STATE_AO_GC(cds->ao);
 
     /* Convenience aliases */
-    char *const script = libxl__strdup(gc, cds->netbufscript);
+    char *const script = libxl__strdup(gc, rs->netbufscript);
     const uint32_t domid = cds->domid;
     const int dev_id = remus_nic->devid;
     const char *const vif = remus_nic->vif;
@@ -333,6 +338,7 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
     libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
     libxl__checkpoint_devices_state *cds = dev->cds;
+    libxl__remus_state *rs = CONTAINER_OF(cds, *rs, cds);
     const char *out_path_base, *hotplug_error = NULL;
     int rc;
 
@@ -373,7 +379,7 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
 
     if (hotplug_error) {
         LOG(ERROR, "netbuf script %s setup failed for vif %s: %s",
-            cds->netbufscript, vif, hotplug_error);
+            rs->netbufscript, vif, hotplug_error);
         rc = ERROR_FAIL;
         goto out;
     }
@@ -444,6 +450,7 @@ static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
                            int buffer_op)
 {
     int rc, ret;
+    libxl__remus_state *rs = CONTAINER_OF(cds, *rs, cds);
 
     STATE_AO_GC(cds->ao);
 
@@ -457,7 +464,7 @@ static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
         goto out;
     }
 
-    ret = rtnl_qdisc_add(cds->nlsock, remus_nic->qdisc, NLM_F_REQUEST);
+    ret = rtnl_qdisc_add(rs->nlsock, remus_nic->qdisc, NLM_F_REQUEST);
     if (ret) {
         rc = ERROR_FAIL;
         goto out;
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index 5afa618..e393c2e 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -18,6 +18,45 @@
 #include "libxl_internal.h"
 #include "libxl_remus.h"
 
+extern const libxl__checkpoint_device_instance_ops remus_device_nic;
+extern const libxl__checkpoint_device_instance_ops remus_device_drbd_disk;
+static const libxl__checkpoint_device_instance_ops *remus_ops[] = {
+    &remus_device_nic,
+    &remus_device_drbd_disk,
+    NULL,
+};
+
+/*----- helper functions -----*/
+
+static int init_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* init device subkind-specific state in the libxl ctx */
+    int rc;
+    STATE_AO_GC(cds->ao);
+
+    if (libxl__netbuffer_enabled(gc)) {
+        rc = init_subkind_nic(cds);
+        if (rc) goto out;
+    }
+
+    rc = init_subkind_drbd_disk(cds);
+    if (rc) goto out;
+
+    rc = 0;
+out:
+    return rc;
+}
+
+static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* cleanup device subkind-specific state in the libxl ctx */
+    STATE_AO_GC(cds->ao);
+
+    if (libxl__netbuffer_enabled(gc))
+        cleanup_subkind_nic(cds);
+
+    cleanup_subkind_drbd_disk(cds);
+}
 
 /*----- remus: setup the environment -----*/
 static void libxl__remus_setup_done(libxl__egc *egc,
@@ -27,11 +66,12 @@ static void libxl__remus_setup_failed(libxl__egc *egc,
                                       libxl__checkpoint_devices_state *cds,
                                       int rc);
 
-void libxl__remus_setup(libxl__egc *egc,
-                        libxl__domain_suspend_state *dss)
+void libxl__remus_setup(libxl__egc *egc, libxl__remus_state *rs)
 {
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
+
     /* Convenience aliases */
-    libxl__checkpoint_devices_state *const cds = &dss->cds;
+    libxl__checkpoint_devices_state *const cds = &rs->cds;
     const libxl_domain_remus_info *const info = dss->remus;
 
     STATE_AO_GC(dss->ao);
@@ -50,19 +90,24 @@ void libxl__remus_setup(libxl__egc *egc,
     cds->ao = ao;
     cds->domid = dss->domid;
     cds->callback = libxl__remus_setup_done;
+    cds->ops = remus_ops;
+    rs->interval = info->interval;
+
+    if (init_device_subkind(cds))
+        goto out;
 
     libxl__checkpoint_devices_setup(egc, cds);
     return;
 
 out:
-    libxl__remus_setup_failed(egc, cds, ERROR_FAIL);
+    dss->callback(egc, dss, ERROR_FAIL);
 }
 
 static void libxl__remus_setup_done(libxl__egc *egc,
                                     libxl__checkpoint_devices_state *cds,
                                     int rc)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, rs.cds);
     STATE_AO_GC(dss->ao);
 
     if (!rc) {
@@ -80,13 +125,15 @@ static void libxl__remus_setup_failed(libxl__egc *egc,
                                       libxl__checkpoint_devices_state *cds,
                                       int rc)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, rs.cds);
     STATE_AO_GC(dss->ao);
 
     if (rc)
         LOG(ERROR, "Remus: failed to teardown device after setup failed"
             " for guest with domid %u, rc %d", dss->domid, rc);
 
+    cleanup_device_subkind(cds);
+
     dss->callback(egc, dss, rc);
 }
 
@@ -97,9 +144,11 @@ static void remus_teardown_done(libxl__egc *egc,
                                 int rc);
 
 void libxl__remus_teardown(libxl__egc *egc,
-                           libxl__domain_suspend_state *dss,
+                           libxl__remus_state *rs,
                            int rc)
 {
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
+
     EGC_GC;
 
     /*
@@ -110,21 +159,23 @@ void libxl__remus_teardown(libxl__egc *egc,
      */
     LOG(WARN, "Remus: Domain suspend terminated with rc %d,"
         " teardown Remus devices...", rc);
-    dss->cds.callback = remus_teardown_done;
-    libxl__checkpoint_devices_teardown(egc, &dss->cds);
+    dss->rs.cds.callback = remus_teardown_done;
+    libxl__checkpoint_devices_teardown(egc, &dss->rs.cds);
 }
 
 static void remus_teardown_done(libxl__egc *egc,
                                 libxl__checkpoint_devices_state *cds,
                                 int rc)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, rs.cds);
     STATE_AO_GC(dss->ao);
 
     if (rc)
         LOG(ERROR, "Remus: failed to teardown device for guest with domid %u,"
             " rc %d", dss->domid, rc);
 
+    cleanup_device_subkind(cds);
+
     dss->callback(egc, dss, rc);
 }
 
@@ -157,7 +208,7 @@ static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
     if (!ok)
         goto out;
 
-    libxl__checkpoint_devices_state *const cds = &dss->cds;
+    libxl__checkpoint_devices_state *const cds = &dss->rs.cds;
     cds->callback = remus_devices_postsuspend_cb;
     libxl__checkpoint_devices_postsuspend(egc, cds);
     return;
@@ -171,7 +222,7 @@ static void remus_devices_postsuspend_cb(libxl__egc *egc,
                                          int rc)
 {
     int ok = 0;
-    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, rs.cds);
 
     if (rc)
         goto out;
@@ -195,7 +246,7 @@ void libxl__remus_domain_resume_callback(void *data)
     libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
     STATE_AO_GC(dss->ao);
 
-    libxl__checkpoint_devices_state *const cds = &dss->cds;
+    libxl__checkpoint_devices_state *const cds = &dss->rs.cds;
     cds->callback = remus_devices_preresume_cb;
     libxl__checkpoint_devices_preresume(egc, cds);
 }
@@ -205,7 +256,7 @@ static void remus_devices_preresume_cb(libxl__egc *egc,
                                        int rc)
 {
     int ok = 0;
-    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, rs.cds);
     STATE_AO_GC(dss->ao);
 
     if (rc)
@@ -252,7 +303,7 @@ static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc)
 {
     /* Convenience aliases */
-    libxl__checkpoint_devices_state *const cds = &dss->cds;
+    libxl__checkpoint_devices_state *const cds = &dss->rs.cds;
 
     STATE_AO_GC(dss->ao);
 
@@ -274,7 +325,7 @@ static void remus_devices_commit_cb(libxl__egc *egc,
                                     libxl__checkpoint_devices_state *cds,
                                     int rc)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, rs.cds);
 
     STATE_AO_GC(dss->ao);
 
@@ -292,9 +343,9 @@ static void remus_devices_commit_cb(libxl__egc *egc,
      */
 
     /* Set checkpoint interval timeout */
-    rc = libxl__ev_time_register_rel(gc, &dss->checkpoint_timeout,
+    rc = libxl__ev_time_register_rel(gc, &dss->rs.checkpoint_timeout,
                                      remus_next_checkpoint,
-                                     dss->interval);
+                                     dss->rs.interval);
 
     if (rc)
         goto out;
@@ -309,7 +360,7 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
                                   const struct timeval *requested_abs)
 {
     libxl__domain_suspend_state *dss =
-                            CONTAINER_OF(ev, *dss, checkpoint_timeout);
+                            CONTAINER_OF(ev, *dss, rs.checkpoint_timeout);
 
     STATE_AO_GC(dss->ao);
 
diff --git a/tools/libxl/libxl_remus.h b/tools/libxl/libxl_remus.h
index 53e5e81..15bbbe8 100644
--- a/tools/libxl/libxl_remus.h
+++ b/tools/libxl/libxl_remus.h
@@ -16,10 +16,9 @@
 #ifndef LIBXL_REMUS_H
 #define LIBXL_REMUS_H
 
-void libxl__remus_setup(libxl__egc *egc,
-                        libxl__domain_suspend_state *dss);
+void libxl__remus_setup(libxl__egc *egc, libxl__remus_state *rs);
 void libxl__remus_teardown(libxl__egc *egc,
-                           libxl__domain_suspend_state *dss,
+                           libxl__remus_state *rs,
                            int rc);
 void libxl__remus_domain_suspend_callback(void *data);
 void libxl__remus_domain_resume_callback(void *data);
diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
index 50b897d..433e4b0 100644
--- a/tools/libxl/libxl_remus_disk_drbd.c
+++ b/tools/libxl/libxl_remus_disk_drbd.c
@@ -28,10 +28,12 @@ typedef struct libxl__remus_drbd_disk {
 
 int init_subkind_drbd_disk(libxl__checkpoint_devices_state *cds)
 {
+    libxl__remus_state *rs = CONTAINER_OF(cds, *rs, cds);
+
     STATE_AO_GC(cds->ao);
 
-    cds->drbd_probe_script = GCSPRINTF("%s/block-drbd-probe",
-                                       libxl__xen_script_dir_path());
+    rs->drbd_probe_script = GCSPRINTF("%s/block-drbd-probe",
+                                      libxl__xen_script_dir_path());
 
     return 0;
 }
@@ -96,6 +98,7 @@ static void match_async_exec(libxl__egc *egc, libxl__checkpoint_device *dev)
     int arraysize, nr = 0, rc;
     const libxl_device_disk *disk = dev->backend_dev;
     libxl__async_exec_state *aes = &dev->aodev.aes;
+    libxl__remus_state *rs = CONTAINER_OF(dev->cds, *rs, cds);
     STATE_AO_GC(dev->cds->ao);
 
     /* setup env & args */
@@ -107,7 +110,7 @@ static void match_async_exec(libxl__egc *egc, libxl__checkpoint_device *dev)
     arraysize = 3;
     nr = 0;
     GCNEW_ARRAY(aes->args, arraysize);
-    aes->args[nr++] = dev->cds->drbd_probe_script;
+    aes->args[nr++] = rs->drbd_probe_script;
     aes->args[nr++] = disk->pdev_path;
     aes->args[nr++] = NULL;
     assert(nr <= arraysize);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 13/29] Update libxl_save_msgs_gen.pl to support return data from xl to xc
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (11 preceding siblings ...)
  2015-04-01  6:41 ` [RFC PATCH COLO v5 12/29] don't touch remus in checkpoint_device Yang Hongyang
@ 2015-04-01  6:41 ` Yang Hongyang
  2015-04-01  6:41 ` [RFC PATCH COLO v5 14/29] Allow slave sends data to master Yang Hongyang
                   ` (15 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

 Currently, all callbacks return an integer value or void. We cannot
 return some data to xc via callback. Update libxl_save_msgs_gen.pl
 to support this case.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_internal.h       |  3 ++
 tools/libxl/libxl_save_callout.c   | 31 ++++++++++++++++++
 tools/libxl/libxl_save_helper.c    | 17 ++++++++++
 tools/libxl/libxl_save_msgs_gen.pl | 65 ++++++++++++++++++++++++++++++++++----
 4 files changed, 109 insertions(+), 7 deletions(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index e12c7b5..7bfabd8 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3180,6 +3180,9 @@ _hidden void libxl__xc_domain_save_done(libxl__egc*, void *dss_void,
  * When they are ready to indicate completion, they call this. */
 void libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc,
                            libxl__save_helper_state *shs, int return_value);
+void libxl__xc_domain_saverestore_async_callback_done_with_data(libxl__egc *egc,
+                           libxl__save_helper_state *shs,
+                           const void *data, uint64_t size);
 
 
 _hidden void libxl__domain_suspend_common_switch_qemu_logdirty
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 40b25e4..477e633 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -145,6 +145,15 @@ void libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc,
     shs->egc = 0;
 }
 
+void libxl__xc_domain_saverestore_async_callback_done_with_data(libxl__egc *egc,
+                           libxl__save_helper_state *shs,
+                           const void *data, uint64_t size)
+{
+    shs->egc = egc;
+    libxl__srm_callout_sendreply_data(data, size, shs);
+    shs->egc = 0;
+}
+
 /*----- helper execution -----*/
 
 static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
@@ -370,6 +379,28 @@ void libxl__srm_callout_sendreply(int r, void *user)
         helper_failed(egc, shs, ERROR_FAIL);
 }
 
+void libxl__srm_callout_sendreply_data(const void *data, uint64_t size, void *user)
+{
+    libxl__save_helper_state *shs = user;
+    libxl__egc *egc = shs->egc;
+    STATE_AO_GC(shs->ao);
+    int errnoval;
+
+    errnoval = libxl_write_exactly(CTX, libxl__carefd_fd(shs->pipes[0]),
+                                   &size, sizeof(size), shs->stdin_what,
+                                   "callback return data length");
+    if (errnoval)
+        goto out;
+
+    errnoval = libxl_write_exactly(CTX, libxl__carefd_fd(shs->pipes[0]),
+                                   data, size, shs->stdin_what,
+                                   "callback return data");
+
+out:
+    if (errnoval)
+        helper_failed(egc, shs, ERROR_FAIL);
+}
+
 void libxl__srm_callout_callback_log(uint32_t level, uint32_t errnoval,
                   const char *context, const char *formatted, void *user)
 {
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index 74826a1..44c5807 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -155,6 +155,23 @@ int helper_getreply(void *user)
     return v;
 }
 
+uint8_t *helper_getreply_data(void *user)
+{
+    uint64_t size;
+    int r = read_exactly(0, &size, sizeof(size));
+    uint8_t *data;
+
+    if (r <= 0)
+        exit(-2);
+
+    data = helper_allocbuf(size, user);
+    r = read_exactly(0, data, size);
+    if (r <= 0)
+        exit(-2);
+
+    return data;
+}
+
 /*----- other callbacks -----*/
 
 static int toolstack_save_fd;
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index 6b4b65e..41ee000 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -15,6 +15,7 @@ our @msgs = (
     #         and its null-ness needs to be passed through to the helper's xc
     #   W  - needs a return value; callback is synchronous
     #   A  - needs a return value; callback is asynchronous
+    #   B  - return value is an pointer
     [  1, 'sr',     "log",                   [qw(uint32_t level
                                                  uint32_t errnoval
                                                  STRING context
@@ -99,23 +100,28 @@ our $libxl = "libxl__srm";
 our $callback = "${libxl}_callout_callback";
 our $receiveds = "${libxl}_callout_received";
 our $sendreply = "${libxl}_callout_sendreply";
+our $sendreply_data = "${libxl}_callout_sendreply_data";
 our $getcallbacks = "${libxl}_callout_get_callbacks";
 our $enumcallbacks = "${libxl}_callout_enumcallbacks";
 sub cbtype ($) { "${libxl}_".$_[0]."_autogen_callbacks"; };
 
 f_decl($sendreply, 'callout', 'void', "(int r, void *user)");
+f_decl($sendreply_data, 'callout', 'void',
+       "(const void *data, uint64_t size, void *user)");
 
 our $helper = "helper";
 our $encode = "${helper}_stub";
 our $allocbuf = "${helper}_allocbuf";
 our $transmit = "${helper}_transmitmsg";
 our $getreply = "${helper}_getreply";
+our $getreply_data = "${helper}_getreply_data";
 our $setcallbacks = "${helper}_setcallbacks";
 
 f_decl($allocbuf, 'helper', 'unsigned char *', '(int len, void *user)');
 f_decl($transmit, 'helper', 'void',
        '(unsigned char *msg_freed, int len, void *user)');
 f_decl($getreply, 'helper', 'int', '(void *user)');
+f_decl($getreply_data, 'helper', 'uint8_t *', '(void *user)');
 
 sub typeid ($) { my ($t) = @_; $t =~ s/\W/_/; return $t; };
 
@@ -259,12 +265,36 @@ foreach my $msginfo (@msgs) {
 
     $f_more_sr->("    case $msgnum: { /* $name */\n");
     if ($flags =~ m/W/) {
-        $f_more_sr->("        int r;\n");
+        if ($flags =~ m/B/) {
+            $f_more_sr->("        uint8_t *data;\n".
+                         "        uint64_t size;\n");
+        } else {
+            $f_more_sr->("        int r;\n");
+        }
     }
 
-    my $c_rtype_helper = $flags =~ m/[WA]/ ? 'int' : 'void';
-    my $c_rtype_callout = $flags =~ m/W/ ? 'int' : 'void';
+    my $c_rtype_helper;
+    if ($flags =~ m/[WA]/) {
+        if ($flags =~ m/B/) {
+            $c_rtype_helper = 'uint8_t *'
+        } else {
+            $c_rtype_helper = 'int'
+        }
+    } else {
+        $c_rtype_helper = 'void';
+    }
+    my $c_rtype_callout;
+    if ($flags =~ m/W/) {
+        if ($flags =~ m/B/) {
+            $c_rtype_callout = 'uint8_t *';
+        } else {
+            $c_rtype_callout = 'int';
+        }
+    } else {
+        $c_rtype_callout = 'void';
+    }
     my $c_decl = '(';
+    my $c_helper_decl = '';
     my $c_callback_args = '';
 
     f_more("${encode}_$name",
@@ -305,7 +335,15 @@ END_ALWAYS
         f_more("${encode}_$name", "	${typeid}_put(buf, &len, $c_args);\n");
     }
     $f_more_sr->($c_recv);
+    $c_helper_decl = $c_decl;
+    if ($flags =~ m/W/ and $flags =~ m/B/) {
+        $c_decl .= "uint64_t *size, "
+    }
     $c_decl .= "void *user)";
+    $c_helper_decl .= "void *user)";
+    if ($flags =~ m/W/ and $flags =~ m/B/) {
+        $c_callback_args .= "&size, "
+    }
     $c_callback_args .= "user";
 
     $f_more_sr->("        if (msg != endmsg) return 0;\n");
@@ -326,10 +364,12 @@ END_ALWAYS
     my $c_make_callback = "$c_callback($c_callback_args)";
     if ($flags !~ m/W/) {
 	$f_more_sr->("        $c_make_callback;\n");
+    } elsif ($flags =~ m/B/) {
+        $f_more_sr->("        data = $c_make_callback;\n".
+                     "        $sendreply_data(data, size, user);\n");
     } else {
         $f_more_sr->("        r = $c_make_callback;\n".
                      "        $sendreply(r, user);\n");
-	f_decl($sendreply, 'callout', 'void', '(int r, void *user)');
     }
     if ($flags =~ m/x/) {
         my $c_v = "(1u<<$msgnum)";
@@ -340,7 +380,7 @@ END_ALWAYS
     }
     $f_more_sr->("        return 1;\n    }\n\n");
     f_decl("${callback}_$name", 'callout', $c_rtype_callout, $c_decl);
-    f_decl("${encode}_$name", 'helper', $c_rtype_helper, $c_decl);
+    f_decl("${encode}_$name", 'helper', $c_rtype_helper, $c_helper_decl);
     f_more("${encode}_$name",
 "        if (buf) break;
         buf = ${helper}_allocbuf(len, user);
@@ -352,12 +392,23 @@ END_ALWAYS
     ${transmit}(buf, len, user);
 ");
     if ($flags =~ m/[WA]/) {
-	f_more("${encode}_$name",
-               (<<END_ALWAYS.($debug ? <<END_DEBUG : '').<<END_ALWAYS));
+        if ($flags =~ m/B/) {
+            f_more("${encode}_$name",
+                   (<<END_ALWAYS.($debug ? <<END_DEBUG : '')));
+    uint8_t *r = ${helper}_getreply_data(user);
+END_ALWAYS
+    fprintf(stderr,"libxl-save-helper: $name got reply data\\n");
+END_DEBUG
+        } else {
+            f_more("${encode}_$name",
+                   (<<END_ALWAYS.($debug ? <<END_DEBUG : '')));
     int r = ${helper}_getreply(user);
 END_ALWAYS
     fprintf(stderr,"libxl-save-helper: $name got reply %d\\n",r);
 END_DEBUG
+    }
+
+    f_more("${encode}_$name", (<<END_ALWAYS));
     return r;
 END_ALWAYS
     }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 14/29] Allow slave sends data to master
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (12 preceding siblings ...)
  2015-04-01  6:41 ` [RFC PATCH COLO v5 13/29] Update libxl_save_msgs_gen.pl to support return data from xl to xc Yang Hongyang
@ 2015-04-01  6:41 ` Yang Hongyang
  2015-04-01  6:41 ` [RFC PATCH COLO v5 15/29] secondary vm suspend/resume/checkpoint code Yang Hongyang
                   ` (14 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

In colo mode, slave needs to send data to master, but the io_fd
only can be written in master, and only can be read in slave.
Save recv_fd in domain_suspend_state, and send_fd in
domain_create_state.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl.c          |  2 +-
 tools/libxl/libxl_create.c   | 14 ++++++++++----
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_types.idl  |  7 +++++++
 tools/libxl/xl_cmdimpl.c     |  7 +++++++
 5 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 7bc4fc4..40a49c7 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -883,7 +883,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     dss->callback = remus_failover_cb;
     dss->domid = domid;
     dss->fd = send_fd;
-    /* TODO do something with recv_fd */
+    dss->recv_fd = recv_fd;
     dss->type = type;
     dss->live = 1;
     dss->debug = 0;
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index af04248..392420f 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1513,8 +1513,8 @@ static void domain_create_cb(libxl__egc *egc,
                              int rc, uint32_t domid);
 
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
-                            uint32_t *domid,
-                            int restore_fd, int checkpointed_stream,
+                            uint32_t *domid, int restore_fd,
+                            int send_fd, int checkpointed_stream,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
@@ -1527,6 +1527,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     libxl_domain_config_init(&cdcs->dcs.guest_config_saved);
     libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
     cdcs->dcs.restore_fd = restore_fd;
+    cdcs->dcs.send_fd = send_fd;
     cdcs->dcs.callback = domain_create_cb;
     cdcs->dcs.checkpointed_stream = checkpointed_stream;
     libxl__ao_progress_gethow(&cdcs->dcs.aop_console_how, aop_console_how);
@@ -1555,7 +1556,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
-    return do_domain_create(ctx, d_config, domid, -1, 0,
+    return do_domain_create(ctx, d_config, domid, -1, -1, 0,
                             ao_how, aop_console_how);
 }
 
@@ -1565,7 +1566,12 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
 {
-    return do_domain_create(ctx, d_config, domid, restore_fd,
+    int send_fd = -1;
+
+    if (params->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO)
+        send_fd = params->send_fd;
+
+    return do_domain_create(ctx, d_config, domid, restore_fd, send_fd,
                             params->checkpointed_stream, ao_how, aop_console_how);
 }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 7bfabd8..971d975 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2874,6 +2874,7 @@ struct libxl__domain_suspend_state {
 
     uint32_t domid;
     int fd;
+    int recv_fd;
     libxl_domain_type type;
     int live;
     int debug;
@@ -3140,6 +3141,7 @@ struct libxl__domain_create_state {
     libxl_domain_config *guest_config;
     libxl_domain_config guest_config_saved; /* vanilla config */
     int restore_fd;
+    int send_fd;
     libxl__domain_create_cb *callback;
     libxl_asyncprogress_how aop_console_how;
     /* private to domain_create */
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index fa85e5b..292d754 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -196,6 +196,12 @@ libxl_viridian_enlightenment = Enumeration("viridian_enlightenment", [
     (3, "reference_tsc"),
     ])
 
+libxl_checkpointed_stream = Enumeration("checkpointed_stream", [
+    (0, "NONE"),
+    (1, "REMUS"),
+    (2, "COLO"),
+    ], init_val = 0)
+
 #
 # Complex libxl types
 #
@@ -344,6 +350,7 @@ libxl_domain_create_info = Struct("domain_create_info",[
 
 libxl_domain_restore_params = Struct("domain_restore_params", [
     ("checkpointed_stream", integer),
+    ("send_fd", integer),
     ])
 
 libxl_domain_sched_params = Struct("domain_sched_params",[
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 394b55d..4574d05 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -154,6 +154,7 @@ struct domain_create {
     const char *extra_config; /* extra config string */
     const char *restore_file;
     int migrate_fd; /* -1 means none */
+    int send_fd; /* -1 means none */
     char **migration_domname_r; /* from malloc */
 };
 
@@ -2510,6 +2511,7 @@ static uint32_t create_domain(struct domain_create *dom_info)
     void *config_data = 0;
     int config_len = 0;
     int restore_fd = -1;
+    int send_fd = -1;
     const libxl_asyncprogress_how *autoconnect_console_how;
     struct save_file_header hdr;
 
@@ -2526,6 +2528,7 @@ static uint32_t create_domain(struct domain_create *dom_info)
         if (migrate_fd >= 0) {
             restore_source = "<incoming migration stream>";
             restore_fd = migrate_fd;
+            send_fd = dom_info->send_fd;
         } else {
             restore_source = restore_file;
             restore_fd = open(restore_file, O_RDONLY);
@@ -2700,6 +2703,7 @@ start:
         libxl_domain_restore_params_init(&params);
 
         params.checkpointed_stream = dom_info->checkpointed_stream;
+        params.send_fd = send_fd;
         ret = libxl_domain_create_restore(ctx, &d_config,
                                           &domid, restore_fd,
                                           &params,
@@ -4243,6 +4247,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     dom_info.monitor = monitor;
     dom_info.paused = 1;
     dom_info.migrate_fd = recv_fd;
+    dom_info.send_fd = send_fd;
     dom_info.migration_domname_r = &migration_domname;
     dom_info.checkpointed_stream = remus;
 
@@ -4413,6 +4418,7 @@ int main_restore(int argc, char **argv)
     dom_info.config_file = config_file;
     dom_info.restore_file = checkpoint_file;
     dom_info.migrate_fd = -1;
+    dom_info.send_fd = -1;
     dom_info.vnc = vnc;
     dom_info.vncautopass = vncautopass;
     dom_info.console_autoconnect = console_autoconnect;
@@ -4859,6 +4865,7 @@ int main_create(int argc, char **argv)
     dom_info.config_file = filename;
     dom_info.extra_config = extra_config;
     dom_info.migrate_fd = -1;
+    dom_info.send_fd = -1;
     dom_info.vnc = vnc;
     dom_info.vncautopass = vncautopass;
     dom_info.console_autoconnect = console_autoconnect;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 15/29] secondary vm suspend/resume/checkpoint code
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (13 preceding siblings ...)
  2015-04-01  6:41 ` [RFC PATCH COLO v5 14/29] Allow slave sends data to master Yang Hongyang
@ 2015-04-01  6:41 ` Yang Hongyang
  2015-04-01  6:41 ` [RFC PATCH COLO v5 16/29] primary vm suspend/get_dirty_pfn/resume/checkpoint code Yang Hongyang
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

Secondary vm is running in colo mode. So we will do
the following things again and again:
1. Resume secondary vm
   a. Send LIBXL_COLO_SVM_READY to master.
   b. If it is not resumed the first time, call libxl__checkpoint_devices_preresume().
   c. If it is resumed the first time, call libxl__xc_domain_restore_done()
      to build the secondary vm. We should also enable secondary vm's logdirty.
      Otherwise, call libxl__domain_resume() to resume secondary vm.
   d. If it is resumed the first time, call libxl__checkpoint_devices_setup()
      to setup checkpoint devices.
   e. Send LIBXL_COLO_SVM_RESUMED to master.
2. Wait a new checkpoint
   a. Call libxl__checkpoint_devices_commit().
   a. Read LIBXL_COLO_NEW_CHECKPOINT from master.
3. Suspend secondary vm
   a. Suspend secondary vm.
   b. Call libxl__checkpoint_devices_postsuspend().
   c. Get secondary vm's dirty page information.
   d. Send LIBXL_COLO_SVM_SUSPENDED to master.
   e. Send secondary vm's dirty page information to master(count + pfn list).

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxc/include/xenguest.h     |   20 +
 tools/libxl/Makefile               |    1 +
 tools/libxl/libxl_colo.h           |   38 ++
 tools/libxl/libxl_colo_restore.c   | 1158 ++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_create.c         |  116 +++-
 tools/libxl/libxl_dom.c            |    2 +-
 tools/libxl/libxl_internal.h       |   23 +
 tools/libxl/libxl_save_callout.c   |    6 +-
 tools/libxl/libxl_save_msgs_gen.pl |    6 +-
 9 files changed, 1363 insertions(+), 7 deletions(-)
 create mode 100644 tools/libxl/libxl_colo.h
 create mode 100644 tools/libxl/libxl_colo_restore.c

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 601b108..6e621a6 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -93,6 +93,26 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
 
 /* callbacks provided by xc_domain_restore */
 struct restore_callbacks {
+    /* Called after a new checkpoint to suspend the guest.
+     */
+    int (*suspend)(void* data);
+
+    /* Called after the secondary vm is ready to resume.
+     * Callback function resumes the guest & the device model,
+     *  returns to xc_domain_restore.
+     */
+    int (*postcopy)(void* data);
+
+    /* callback to wait a new checkpoint
+     *
+     * returns:
+     * 0: terminate checkpointing gracefully
+     * 1: take another checkpoint */
+    int (*checkpoint)(void* data);
+
+    /* Enable qemu-dm logging dirty pages to xen */
+    int (*switch_qemu_logdirty)(int domid, unsigned enable, void *data); /* HVM only */
+
     /* callback to restore toolstack specific data */
     int (*toolstack_restore)(uint32_t domid, const uint8_t *buf,
             uint32_t size, void* data);
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 1e27754..8acfd5d 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -57,6 +57,7 @@ LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
 LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
+LIBXL_OBJS-y += libxl_colo_restore.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
new file mode 100644
index 0000000..91df275
--- /dev/null
+++ b/tools/libxl/libxl_colo.h
@@ -0,0 +1,38 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#ifndef LIBXL_COLO_H
+#define LIBXL_COLO_H
+
+/*
+ * values to control suspend/resume primary vm and secondary vm
+ * at the same time
+ */
+enum {
+    LIBXL_COLO_NEW_CHECKPOINT = 1,
+    LIBXL_COLO_SVM_SUSPENDED,
+    LIBXL_COLO_SVM_READY,
+    LIBXL_COLO_SVM_RESUMED,
+};
+
+extern void libxl__colo_restore_done(libxl__egc *egc, void *dcs_void,
+                                     int ret, int retval, int errnoval);
+extern void libxl__colo_restore_setup(libxl__egc *egc,
+                                      libxl__colo_restore_state *crs);
+extern void libxl__colo_restore_teardown(libxl__egc *egc,
+                                         libxl__colo_restore_state *crs,
+                                         int rc);
+
+#endif
diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
new file mode 100644
index 0000000..7b825d4
--- /dev/null
+++ b/tools/libxl/libxl_colo_restore.c
@@ -0,0 +1,1158 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+#include "libxl_colo.h"
+#include "libxl_bitops.h"
+
+#define XC_PAGE_SHIFT           12
+#define PAGE_SHIFT              XC_PAGE_SHIFT
+#define ROUNDUP(_x,_w) (((unsigned long)(_x)+(1UL<<(_w))-1) & ~((1UL<<(_w))-1))
+#define NRPAGES(x) (ROUNDUP(x, PAGE_SHIFT) >> PAGE_SHIFT)
+
+enum {
+    LIBXL_COLO_SETUPED,
+    LIBXL_COLO_SUSPENDED,
+    LIBXL_COLO_RESUMED,
+};
+
+typedef struct libxl__colo_restore_checkpoint_state libxl__colo_restore_checkpoint_state;
+struct libxl__colo_restore_checkpoint_state {
+    xc_hypercall_buffer_t _dirty_bitmap;
+    xc_hypercall_buffer_t *dirty_bitmap;
+    unsigned long p2m_size;
+    libxl__domain_suspend_state2 dss2;
+    libxl__datacopier_state dc;
+    uint8_t section;
+    libxl__logdirty_switch lds;
+    libxl__colo_restore_state *crs;
+    int status;
+    bool preresume;
+    /* used for teardown */
+    int teardown_devices;
+    int saved_rc;
+
+    void (*callback)(libxl__egc *,
+                     libxl__colo_restore_checkpoint_state *,
+                     int);
+
+    /*
+     * 0: secondary vm's dirty bitmap for domain @domid
+     * 1: secondary vm is ready(domain @domid)
+     * 2: secondary vm is resumed(domain @domid)
+     * 3. new checkpoint is triggered(domain @domid)
+     */
+    const char *copywhat[4];
+};
+
+
+static void libxl__colo_restore_domain_resume_callback(void *data);
+static void libxl__colo_restore_domain_checkpoint_callback(void *data);
+static void libxl__colo_restore_domain_suspend_callback(void *data);
+
+static const libxl__checkpoint_device_instance_ops *colo_restore_ops[] = {
+    NULL,
+};
+
+/* ===================== colo: common functions ===================== */
+static void colo_enable_logdirty(libxl__colo_restore_state *crs, libxl__egc *egc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    const uint32_t domid = crs->domid;
+    libxl__logdirty_switch *const lds = &crcs->lds;
+
+    STATE_AO_GC(crs->ao);
+
+    /* we need to know which pages are dirty to restore the guest */
+    if (xc_shadow_control(CTX->xch, domid,
+                          XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
+                          NULL, 0, NULL, 0, NULL) < 0) {
+        LOG(ERROR, "cannot enable secondary vm's logdirty");
+        lds->callback(egc, lds, ERROR_FAIL);
+        return;
+    }
+
+    if (crs->hvm) {
+        libxl__domain_common_switch_qemu_logdirty(domid, 1, lds, egc);
+        return;
+    }
+
+    lds->callback(egc, lds, 0);
+}
+
+static void colo_disable_logdirty(libxl__colo_restore_state *crs,
+                                  libxl__egc *egc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    const uint32_t domid = crs->domid;
+    libxl__logdirty_switch *const lds = &crcs->lds;
+
+    STATE_AO_GC(crs->ao);
+
+    /* we need to know which pages are dirty to restore the guest */
+    if (xc_shadow_control(CTX->xch, domid, XEN_DOMCTL_SHADOW_OP_OFF,
+                          NULL, 0, NULL, 0, NULL) < 0)
+        LOG(WARN, "cannot disable secondary vm's logdirty");
+
+    if (crs->hvm) {
+        libxl__domain_common_switch_qemu_logdirty(domid, 0, lds, egc);
+        return;
+    }
+
+    lds->callback(egc, lds, 0);
+}
+
+static void colo_resume_vm(libxl__egc *egc,
+                           libxl__colo_restore_checkpoint_state *crcs,
+                           int restore_device_model)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+    int rc;
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+
+    STATE_AO_GC(crs->ao);
+
+    if (!crs->saved_cb) {
+        /* TODO: sync mmu for hvm? */
+        if (restore_device_model) {
+            rc = libxl__domain_restore(gc, crs->domid);
+            if (rc) {
+                LOG(ERROR, "cannot restore device model for secondary vm");
+                crcs->callback(egc, crcs, rc);
+                return;
+            }
+        }
+        rc = libxl__domain_resume(gc, crs->domid, 0);
+        if (rc)
+            LOG(ERROR, "cannot resume secondary vm");
+
+        crcs->callback(egc, crcs, rc);
+        return;
+    }
+
+    /*
+     * TODO: get store mfn and console mfn
+     *  We should call the callback restore_results in
+     *  xc_domain_restore() before resuming the guest.
+     */
+    libxl__xc_domain_restore_done(egc, dcs, 0, 0, 0);
+
+    return;
+}
+
+static int init_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* init device subkind-specific state in the libxl ctx */
+    int rc;
+    STATE_AO_GC(cds->ao);
+
+    rc = 0;
+    return rc;
+}
+
+static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* cleanup device subkind-specific state in the libxl ctx */
+    STATE_AO_GC(cds->ao);
+}
+
+
+/* ================ colo: setup restore environment ================ */
+static void libxl__colo_domain_create_cb(libxl__egc *egc,
+                                         libxl__domain_create_state *dcs,
+                                         int rc, uint32_t domid);
+
+static int init_dss2(libxl__domain_suspend_state2 *dss2)
+{
+    int rc = ERROR_FAIL;
+    libxl_domain_type type;
+
+    STATE_AO_GC(dss2->ao);
+
+    type = libxl__domain_type(gc, dss2->domid);
+    if (type == LIBXL_DOMAIN_TYPE_INVALID)
+        goto out;
+
+    libxl__xswait_init(&dss2->pvcontrol);
+    libxl__ev_evtchn_init(&dss2->guest_evtchn);
+    libxl__ev_xswatch_init(&dss2->guest_watch);
+    libxl__ev_time_init(&dss2->guest_timeout);
+
+    if (type == LIBXL_DOMAIN_TYPE_HVM)
+        dss2->hvm = 1;
+    else
+        dss2->hvm = 0;
+
+    dss2->guest_evtchn.port = -1;
+    dss2->guest_evtchn_lockfd = -1;
+    dss2->guest_responded = 0;
+    dss2->dm_savefile = libxl__device_model_savefile(gc, dss2->domid);
+    dss2->save_dm = 0;
+
+    /* Secondary vm is not created, so we cannot get evtchn port */
+
+    rc = 0;
+
+out:
+    return rc;
+}
+
+void libxl__colo_restore_setup(libxl__egc *egc,
+                               libxl__colo_restore_state *crs)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs;
+    DECLARE_HYPERCALL_BUFFER(unsigned long, dirty_bitmap);
+    int rc = ERROR_FAIL;
+    int bsize;
+
+    /* Convenience aliases */
+    libxl__srm_restore_autogen_callbacks *const callbacks =
+        &dcs->shs.callbacks.restore.a;
+    const int domid = crs->domid;
+
+    STATE_AO_GC(crs->ao);
+
+    GCNEW(crcs);
+    crs->crcs = crcs;
+    crcs->crs = crs;
+
+    if (xc_domain_maximum_gpfn(CTX->xch, domid, &crcs->p2m_size) < 0) {
+        rc = ERROR_FAIL;
+        goto err;
+    }
+
+    crcs->copywhat[0] = GCSPRINTF("secondary vm's dirty bitmap for domain %"PRIu32,
+                                  domid);
+    crcs->copywhat[1] = GCSPRINTF("secondary vm is ready(domain %"PRIu32")",
+                                  domid);
+    crcs->copywhat[2] = GCSPRINTF("secondary vm is resumed(domain %"PRIu32")",
+                                  domid);
+    crcs->copywhat[3] = GCSPRINTF("new checkpoint is triggered(domain %"PRIu32")",
+                                  domid);
+
+    bsize = bitmap_size(crcs->p2m_size);
+    dirty_bitmap = xc_hypercall_buffer_alloc_pages(CTX->xch, dirty_bitmap,
+                                                   NRPAGES(bsize));
+    if (!dirty_bitmap) {
+        rc = ERROR_NOMEM;
+        goto err;
+    }
+    memset(dirty_bitmap, 0, bsize);
+    crcs->_dirty_bitmap = *HYPERCALL_BUFFER(dirty_bitmap);
+    crcs->dirty_bitmap = &crcs->_dirty_bitmap;
+
+    /* setup dss2 */
+    crcs->dss2.ao = ao;
+    crcs->dss2.domid = domid;
+    if (init_dss2(&crcs->dss2))
+        goto err_init_dss2;
+
+    callbacks->suspend = libxl__colo_restore_domain_suspend_callback;
+    callbacks->postcopy = libxl__colo_restore_domain_resume_callback;
+    callbacks->checkpoint = libxl__colo_restore_domain_checkpoint_callback;
+
+    /*
+     * Secondary vm is running in colo mode, so we need to call
+     * libxl__xc_domain_restore_done() to create secondary vm.
+     * But we will exit in domain_create_cb(). So replace the
+     * callback here.
+     */
+    crs->saved_cb = dcs->callback;
+    dcs->callback = libxl__colo_domain_create_cb;
+    crcs->status = LIBXL_COLO_SETUPED;
+
+    logdirty_init(&crcs->lds);
+    crcs->lds.ao = ao;
+
+    rc = 0;
+
+out:
+    crs->callback(egc, crs, rc);
+    return;
+
+err_init_dss2:
+    xc_hypercall_buffer_free_pages(CTX->xch, dirty_bitmap, NRPAGES(bsize));
+    crcs->dirty_bitmap = NULL;
+err:
+    goto out;
+}
+
+static void libxl__colo_domain_create_cb(libxl__egc *egc,
+                                         libxl__domain_create_state *dcs,
+                                         int rc, uint32_t domid)
+{
+    libxl__colo_restore_checkpoint_state *crcs = dcs->crs.crcs;
+
+    crcs->callback(egc, crcs, rc);
+}
+
+
+/* ================ colo: teardown restore environment ================ */
+static void colo_restore_teardown_done(libxl__egc *egc,
+                                       libxl__checkpoint_devices_state *cds,
+                                       int rc);
+static void do_failover_done(libxl__egc *egc,
+                             libxl__colo_restore_checkpoint_state* crcs,
+                             int rc);
+static void colo_disable_logdirty_done(libxl__egc *egc,
+                                       libxl__logdirty_switch *lds,
+                                       int rc);
+
+static void do_failover(libxl__egc *egc, libxl__colo_restore_state *crs)
+{
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    const int status = crcs->status;
+    libxl__logdirty_switch *const lds = &crcs->lds;
+
+    STATE_AO_GC(crs->ao);
+
+    switch(status) {
+    case LIBXL_COLO_SETUPED:
+        /* We don't enable logdirty now */
+        colo_resume_vm(egc, crcs, 0);
+        return;
+    case LIBXL_COLO_SUSPENDED:
+    case LIBXL_COLO_RESUMED:
+        /* disable logdirty first */
+        lds->callback = colo_disable_logdirty_done;
+        colo_disable_logdirty(crs, egc);
+        return;
+    default:
+        LOG(ERROR, "invalid status: %d", status);
+        crcs->callback(egc, crcs, ERROR_FAIL);
+    }
+}
+
+void libxl__colo_restore_teardown(libxl__egc *egc,
+                                  libxl__colo_restore_state *crs,
+                                  int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap, crcs->dirty_bitmap);
+    int bsize = bitmap_size(crcs->p2m_size);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+
+    EGC_GC;
+
+    if (!dirty_bitmap)
+        goto teardown_devices;
+
+    xc_hypercall_buffer_free_pages(CTX->xch, dirty_bitmap, NRPAGES(bsize));
+
+teardown_devices:
+    crcs->saved_rc = rc;
+    if (!crcs->teardown_devices) {
+        colo_restore_teardown_done(egc, &crs->cds, 0);
+        return;
+    }
+
+    crs->cds.callback = colo_restore_teardown_done;
+    libxl__checkpoint_devices_teardown(egc, &crs->cds);
+}
+
+static void colo_restore_teardown_done(libxl__egc *egc,
+                                       libxl__checkpoint_devices_state *cds,
+                                       int rc)
+{
+    libxl__colo_restore_state *crs = CONTAINER_OF(cds, *crs, cds);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+
+    EGC_GC;
+
+    if (rc)
+        LOG(ERROR, "COLO: failed to teardown device for guest with domid %u,"
+            " rc %d", cds->domid, rc);
+
+    if (crcs->teardown_devices)
+        cleanup_device_subkind(cds);
+
+    rc = crcs->saved_rc;
+    if (!rc) {
+        crcs->callback = do_failover_done;
+        do_failover(egc, crs);
+        return;
+    }
+
+    if (crs->saved_cb) {
+        dcs->callback = crs->saved_cb;
+        crs->saved_cb = NULL;
+    }
+    crs->callback(egc, crs, rc);
+}
+
+static void do_failover_done(libxl__egc *egc,
+                             libxl__colo_restore_checkpoint_state* crcs,
+                             int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+
+    STATE_AO_GC(crs->ao);
+
+    if (rc)
+        LOG(ERROR, "cannot do failover");
+
+    if (crs->saved_cb) {
+        dcs->callback = crs->saved_cb;
+        crs->saved_cb = NULL;
+    }
+
+    crs->callback(egc, crs, rc);
+}
+
+static void colo_disable_logdirty_done(libxl__egc *egc,
+                                       libxl__logdirty_switch *lds,
+                                       int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(lds, *crcs, lds);
+
+    STATE_AO_GC(lds->ao);
+
+    if (rc)
+        LOG(WARN, "cannot disable logdirty");
+
+    if (crcs->status == LIBXL_COLO_SUSPENDED) {
+        /*
+         * failover when reading state from master, so no need to
+         * call libxl__domain_restore().
+         */
+        colo_resume_vm(egc, crcs, 0);
+        return;
+    }
+
+    /* If we cannot disable logdirty, we still can do failover */
+    crcs->callback(egc, crcs, 0);
+}
+
+/*
+ * checkpoint callbacks are called in the following order:
+ * 1. resume
+ * 2. checkpoint
+ * 3. suspend
+ */
+static void colo_common_read_send_data_done(libxl__egc *egc,
+                                            libxl__datacopier_state *dc,
+                                            int onwrite, int errnoval);
+/* ===================== colo: resume secondary vm ===================== */
+/*
+ * Do the following things when resuming secondary vm the first time:
+ *  1. resume secondary vm
+ *  2. enable log dirty
+ *  3. setup checkpoint devices
+ *  4. write LIBXL_COLO_SVM_READY
+ *  5. unpause secondary vm
+ *  6. write LIBXL_COLO_SVM_RESUMED
+ *
+ * Do the following things when resuming secondary vm:
+ *  1. write LIBXL_COLO_SVM_READY
+ *  2. resume secondary vm
+ *  3. write LIBXL_COLO_SVM_RESUMED
+ */
+static void colo_send_svm_ready(libxl__egc *egc,
+                                libxl__colo_restore_checkpoint_state *crcs);
+static void colo_send_svm_ready_done(libxl__egc *egc,
+                                     libxl__colo_restore_checkpoint_state *crcs,
+                                     int rc);
+static void colo_restore_preresume_cb(libxl__egc *egc,
+                                      libxl__checkpoint_devices_state *cds,
+                                      int rc);
+static void colo_restore_resume_vm(libxl__egc *egc,
+                                   libxl__colo_restore_checkpoint_state *crcs);
+static void colo_resume_vm_done(libxl__egc *egc,
+                                libxl__colo_restore_checkpoint_state *crcs,
+                                int rc);
+static void colo_write_svm_resumed(libxl__egc *egc,
+                                   libxl__colo_restore_checkpoint_state *crcs);
+static void colo_enable_logdirty_done(libxl__egc *egc,
+                                      libxl__logdirty_switch *lds,
+                                      int retval);
+static void colo_reenable_logdirty(libxl__egc *egc,
+                                   libxl__logdirty_switch *lds,
+                                   int rc);
+static void colo_reenable_logdirty_done(libxl__egc *egc,
+                                        libxl__logdirty_switch *lds,
+                                        int rc);
+static void colo_setup_checkpoint_devices(libxl__egc *egc,
+                                          libxl__colo_restore_state *crs);
+static void colo_restore_setup_cds_done(libxl__egc *egc,
+                                        libxl__checkpoint_devices_state *cds,
+                                        int rc);
+static void colo_unpause_svm(libxl__egc *egc,
+                             libxl__colo_restore_checkpoint_state *crcs);
+
+static void libxl__colo_restore_domain_resume_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__domain_create_state *dcs = CONTAINER_OF(shs, *dcs, shs);
+    libxl__colo_restore_checkpoint_state *crcs = dcs->crs.crcs;
+
+    if (crcs->teardown_devices)
+        colo_send_svm_ready(shs->egc, crcs);
+    else
+        colo_restore_resume_vm(shs->egc, crcs);
+}
+
+static void colo_send_svm_ready(libxl__egc *egc,
+                               libxl__colo_restore_checkpoint_state *crcs)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+    uint8_t section = LIBXL_COLO_SVM_READY;
+    int rc;
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+    const int send_fd = crs->send_fd;
+    libxl__datacopier_state *const dc = &crcs->dc;
+    libxl__save_helper_state *const shs = &dcs->shs;
+
+    STATE_AO_GC(crs->ao);
+
+    memset(dc, 0, sizeof(*dc));
+    dc->ao = ao;
+    dc->readfd = -1;
+    dc->writefd = send_fd;
+    dc->maxsz = INT_MAX;
+    dc->copywhat = crcs->copywhat[1];
+    dc->writewhat = "colo stream";
+    dc->callback = colo_common_read_send_data_done;
+    crcs->callback = colo_send_svm_ready_done;
+
+    rc = libxl__datacopier_start(dc);
+    if (rc) {
+        LOG(ERROR, "libxl__datacopier_start() fails");
+        goto out;
+    }
+
+    /* tell master that secondary vm is ready */
+    libxl__datacopier_prefixdata(egc, dc, &section, sizeof(section));
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+static void colo_send_svm_ready_done(libxl__egc *egc,
+                                     libxl__colo_restore_checkpoint_state *crcs,
+                                     int rc)
+{
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *cds = &crcs->crs->cds;
+
+    if (!crcs->preresume) {
+        crcs->preresume = true;
+        colo_unpause_svm(egc, crcs);
+        return;
+    }
+
+    cds->callback = colo_restore_preresume_cb;
+    libxl__checkpoint_devices_preresume(egc, cds);
+}
+
+static void colo_restore_preresume_cb(libxl__egc *egc,
+                                      libxl__checkpoint_devices_state *cds,
+                                      int rc)
+{
+    libxl__colo_restore_state *crs = CONTAINER_OF(cds, *crs, cds);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    libxl__save_helper_state *const shs = &dcs->shs;
+
+    STATE_AO_GC(crs->ao);
+
+    if (rc) {
+        LOG(ERROR, "preresume fails");
+        goto out;
+    }
+
+    colo_restore_resume_vm(egc, crcs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+static void colo_restore_resume_vm(libxl__egc *egc,
+                                   libxl__colo_restore_checkpoint_state *crcs)
+{
+
+    crcs->callback = colo_resume_vm_done;
+    colo_resume_vm(egc, crcs, 1);
+}
+
+static void colo_resume_vm_done(libxl__egc *egc,
+                                libxl__colo_restore_checkpoint_state *crcs,
+                                int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+    libxl__logdirty_switch *const lds = &crcs->lds;
+    libxl__save_helper_state *const shs = &dcs->shs;
+
+    STATE_AO_GC(crs->ao);
+
+    if (rc) {
+        LOG(ERROR, "cannot resume secondary vm");
+        goto out;
+    }
+
+    crcs->status = LIBXL_COLO_RESUMED;
+
+    /* avoid calling libxl__xc_domain_restore_done() more than once */
+    if (crs->saved_cb) {
+        dcs->callback = crs->saved_cb;
+        crs->saved_cb = NULL;
+
+        lds->callback = colo_enable_logdirty_done;
+        colo_enable_logdirty(crs, egc);
+        return;
+    }
+
+    colo_write_svm_resumed(egc, crcs);
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+static void colo_write_svm_resumed(libxl__egc *egc,
+                                   libxl__colo_restore_checkpoint_state *crcs)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+    uint8_t section = LIBXL_COLO_SVM_RESUMED;
+    int rc;
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+    const int send_fd = crs->send_fd;
+    libxl__datacopier_state *const dc = &crcs->dc;
+    libxl__save_helper_state *const shs = &dcs->shs;
+
+    STATE_AO_GC(crs->ao);
+
+    memset(dc, 0, sizeof(*dc));
+    dc->ao = ao;
+    dc->readfd = -1;
+    dc->writefd = send_fd;
+    dc->maxsz = INT_MAX;
+    dc->copywhat = crcs->copywhat[2];
+    dc->writewhat = "colo stream";
+    dc->callback = colo_common_read_send_data_done;
+    crcs->callback = NULL;
+
+    rc = libxl__datacopier_start(dc);
+    if (rc) {
+        LOG(ERROR, "libxl__datacopier_start() fails");
+        goto out;
+    }
+
+    /* tell master that secondary vm is resumed */
+    libxl__datacopier_prefixdata(egc, dc, &section, sizeof(section));
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+static void colo_enable_logdirty_done(libxl__egc *egc,
+                                      libxl__logdirty_switch *lds,
+                                      int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(lds, *crcs, lds);
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+
+    STATE_AO_GC(crs->ao);
+
+    if (rc) {
+        /*
+         * log-dirty already enabled? There's no test op,
+         * so attempt to disable then reenable it
+         */
+        lds->callback = colo_reenable_logdirty;
+        colo_disable_logdirty(crs, egc);
+        return;
+    }
+
+    colo_setup_checkpoint_devices(egc, crs);
+}
+
+static void colo_reenable_logdirty(libxl__egc *egc,
+                                   libxl__logdirty_switch *lds,
+                                   int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(lds, *crcs, lds);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+    libxl__save_helper_state *const shs = &dcs->shs;
+
+    STATE_AO_GC(crs->ao);
+
+    if (rc) {
+        LOG(ERROR, "cannot enable logdirty");
+        goto out;
+    }
+
+    lds->callback = colo_reenable_logdirty_done;
+    colo_enable_logdirty(crs, egc);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+static void colo_reenable_logdirty_done(libxl__egc *egc,
+                                        libxl__logdirty_switch *lds,
+                                        int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(lds, *crcs, lds);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+
+    /* Convenience aliases */
+    libxl__save_helper_state *const shs = &dcs->shs;
+
+    STATE_AO_GC(crcs->crs->ao);
+
+    if (rc) {
+        LOG(ERROR, "cannot enable logdirty");
+        goto out;
+    }
+
+    colo_setup_checkpoint_devices(egc, crcs->crs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+/*
+ * We cannot setup checkpoint devices in libxl__colo_restore_setup(),
+ * because the guest is not ready.
+ */
+static void colo_setup_checkpoint_devices(libxl__egc *egc,
+                                          libxl__colo_restore_state *crs)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *cds = &crs->cds;
+    libxl__save_helper_state *const shs = &dcs->shs;
+
+    STATE_AO_GC(crs->ao);
+
+    /* TODO: disk/nic support */
+    cds->device_kind_flags = 0;
+    cds->callback = colo_restore_setup_cds_done;
+    cds->ao = ao;
+    cds->domid = crs->domid;
+    cds->ops = colo_restore_ops;
+
+    if (init_device_subkind(cds))
+        goto out;
+
+    crcs->teardown_devices = 1;
+
+    libxl__checkpoint_devices_setup(egc, cds);
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+static void colo_restore_setup_cds_done(libxl__egc *egc,
+                                        libxl__checkpoint_devices_state *cds,
+                                        int rc)
+{
+    libxl__colo_restore_state *crs = CONTAINER_OF(cds, *crs, cds);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    libxl__save_helper_state *const shs = &dcs->shs;
+
+    STATE_AO_GC(cds->ao);
+
+    if (rc) {
+        LOG(ERROR, "COLO: failed to setup device for guest with domid %u",
+            cds->domid);
+        goto out;
+    }
+
+    colo_send_svm_ready(egc, crcs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+static void colo_unpause_svm(libxl__egc *egc,
+                             libxl__colo_restore_checkpoint_state *crcs)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+    int rc;
+
+    /* Convenience aliases */
+    const uint32_t domid = crcs->crs->domid;
+    libxl__save_helper_state *const shs = &dcs->shs;
+
+    STATE_AO_GC(crcs->crs->ao);
+
+    /* We have enabled secondary vm's logdirty, so we can unpause it now */
+    rc = libxl__domain_unpause(gc, domid);
+    if (rc) {
+        LOG(ERROR, "cannot unpause secondary vm");
+        goto out;
+    }
+
+    colo_write_svm_resumed(egc, crcs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+
+/* ===================== colo: wait new checkpoint ===================== */
+static void colo_restore_commit_cb(libxl__egc *egc,
+                                   libxl__checkpoint_devices_state *cds,
+                                   int rc);
+static void colo_stream_read_done(libxl__egc *egc,
+                                  libxl__colo_restore_checkpoint_state *crcs,
+                                  int real_size);
+
+static void libxl__colo_restore_domain_checkpoint_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__domain_create_state *dcs = CONTAINER_OF(shs, *dcs, shs);
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *cds = &dcs->crs.cds;
+
+    cds->callback = colo_restore_commit_cb;
+    libxl__checkpoint_devices_commit(shs->egc, cds);
+}
+
+static void colo_restore_commit_cb(libxl__egc *egc,
+                                   libxl__checkpoint_devices_state *cds,
+                                   int rc)
+{
+    libxl__colo_restore_state *crs = CONTAINER_OF(cds, *crs, cds);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    const int recv_fd = dcs->crs.recv_fd;
+    libxl__save_helper_state *const shs = &dcs->shs;
+    libxl__datacopier_state *const dc = &crcs->dc;
+
+    STATE_AO_GC(cds->ao);
+
+    if (rc) {
+        LOG(ERROR, "commit fails");
+        goto out;
+    }
+
+    memset(dc, 0, sizeof(*dc));
+    dc->ao = ao;
+    dc->readfd = recv_fd;
+    dc->writefd = -1;
+    dc->maxsz = INT_MAX;
+    dc->copywhat = crcs->copywhat[3];
+    dc->readwhat = "colo stream";
+    dc->callback = colo_common_read_send_data_done;
+    dc->readbuf = &crcs->section;
+    dc->bytes_to_read = 1;
+    crcs->callback = colo_stream_read_done;
+
+    rc = libxl__datacopier_start(dc);
+    if (rc) {
+        LOG(ERROR, "libxl__datacopier_start() fails");
+        goto out;
+    }
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(shs->egc, shs, 0);
+}
+
+static void colo_stream_read_done(libxl__egc *egc,
+                                  libxl__colo_restore_checkpoint_state *crcs,
+                                  int real_size)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+    int ok = 0;
+
+    /* Convenience aliases */
+    libxl__save_helper_state *const shs = &dcs->shs;
+
+    STATE_AO_GC(dcs->ao);
+
+    if (real_size != 1) {
+        LOG(ERROR, "reading data fails: %lld", (long long)real_size);
+        goto out;
+    }
+
+    if (crcs->section != LIBXL_COLO_NEW_CHECKPOINT) {
+        LOG(ERROR, "invalid section: %d", crcs->section);
+        goto out;
+    }
+
+    ok = 1;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, ok);
+}
+
+
+/* ===================== colo: suspend secondary vm ===================== */
+/*
+ * Do the following things when resuming secondary vm:
+ *  1. suspend secondary vm
+ *  2. get secondary vm's dirty page information
+ *  3. send LIBXL_COLO_SVM_SUSPENDED
+ *  4. send secondary vm's dirty page information(count + pfn list)
+ */
+static void colo_suspend_vm_done(libxl__egc *egc,
+                                 libxl__domain_suspend_state2 *dss2,
+                                 int ok);
+static void colo_restore_postsuspend_cb(libxl__egc *egc,
+                                        libxl__checkpoint_devices_state *cds,
+                                        int rc);
+static void colo_append_pfn_type(libxl__egc *egc,
+                                 libxl__datacopier_state *dc,
+                                 unsigned long *dirty_bitmap,
+                                 unsigned long p2m_size);
+
+static void libxl__colo_restore_domain_suspend_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__domain_create_state *dcs = CONTAINER_OF(shs, *dcs, shs);
+    libxl__colo_restore_checkpoint_state *crcs = dcs->crs.crcs;
+
+    STATE_AO_GC(dcs->ao);
+
+    /* Convenience aliases */
+    libxl__domain_suspend_state2 *const dss2 = &crcs->dss2;
+
+    /* suspend secondary vm */
+    dss2->callback_common_done = colo_suspend_vm_done;
+
+    libxl__domain_suspend2(shs->egc, dss2);
+}
+
+static void colo_suspend_vm_done(libxl__egc *egc,
+                                 libxl__domain_suspend_state2 *dss2,
+                                 int ok)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(dss2, *crcs, dss2);
+    libxl__colo_restore_state *crs = crcs->crs;
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *cds = &crs->cds;
+
+    STATE_AO_GC(crs->ao);
+
+    if (!ok) {
+        LOG(ERROR, "cannot suspend secondary vm");
+        goto out;
+    }
+
+    crcs->status = LIBXL_COLO_SUSPENDED;
+
+    cds->callback = colo_restore_postsuspend_cb;
+    libxl__checkpoint_devices_postsuspend(egc, cds);
+
+    return;
+
+out:
+    ok = 0;
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dcs->shs, ok);
+}
+
+static void colo_restore_postsuspend_cb(libxl__egc *egc,
+                                        libxl__checkpoint_devices_state *cds,
+                                        int rc)
+{
+    libxl__colo_restore_state *crs = CONTAINER_OF(cds, *crs, cds);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap, crcs->dirty_bitmap);
+    uint8_t section = LIBXL_COLO_SVM_SUSPENDED;
+    int i, ok = 0;
+    uint64_t count;
+
+    /* Convenience aliases */
+    const int send_fd = crs->send_fd;
+    const unsigned long p2m_size = crcs->p2m_size;
+    const uint32_t domid = crs->domid;
+    libxl__datacopier_state *const dc = &crcs->dc;
+
+    STATE_AO_GC(crs->ao);
+
+    if (rc) {
+        LOG(ERROR, "postsuspend fails");
+        goto out;
+    }
+
+    /*
+     * Secondary vm is running, so there are some dirty pages
+     * that are non-dirty in master. Get dirty bitmap and
+     * send it to master.
+     */
+    if (xc_shadow_control(CTX->xch, domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
+                          HYPERCALL_BUFFER(dirty_bitmap), p2m_size,
+                          NULL, 0, NULL) != p2m_size) {
+        LOG(ERROR, "getting secondary vm's dirty bitmap fails");
+        goto out;
+    }
+
+    count = 0;
+    for (i = 0; i < p2m_size; i++) {
+        if (test_bit(i, dirty_bitmap))
+            count++;
+    }
+
+    memset(dc, 0, sizeof(*dc));
+    dc->ao = ao;
+    dc->readfd = -1;
+    dc->writefd = send_fd;
+    dc->maxsz = INT_MAX;
+    dc->copywhat = crcs->copywhat[0];
+    dc->writewhat = "colo stream";
+    dc->callback = colo_common_read_send_data_done;
+    crcs->callback = NULL;
+
+    rc = libxl__datacopier_start(dc);
+    if (rc) {
+        LOG(ERROR, "libxl__datacopier_start() fails");
+        goto out;
+    }
+
+    /* tell master that secondary vm is suspended */
+    libxl__datacopier_prefixdata(egc, dc, &section, sizeof(section));
+
+    /* send dirty pages to master */
+    libxl__datacopier_prefixdata(egc, dc, &count, sizeof(count));
+    colo_append_pfn_type(egc, dc, dirty_bitmap, p2m_size);
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dcs->shs, ok);
+}
+
+static void colo_append_pfn_type(libxl__egc *egc,
+                                 libxl__datacopier_state *dc,
+                                 unsigned long *dirty_bitmap,
+                                 unsigned long p2m_size)
+{
+    int i, count;
+    /* Hack, buf->buf is private member... */
+    libxl__datacopier_buf *buf = NULL;
+    int max_batch = sizeof(buf->buf) / sizeof(uint64_t);
+    int buf_size = max_batch * sizeof(uint64_t);
+    uint64_t *pfn;
+
+    STATE_AO_GC(dc->ao);
+
+    pfn = libxl__zalloc(NOGC, buf_size);
+
+    count = 0;
+    for (i = 0; i < p2m_size; i++) {
+        if (!test_bit(i, dirty_bitmap))
+            continue;
+
+        pfn[count++] = i;
+        if (count == max_batch) {
+            libxl__datacopier_prefixdata(egc, dc, pfn, buf_size);
+            count = 0;
+        }
+    }
+
+    if (count)
+        libxl__datacopier_prefixdata(egc, dc, pfn, count * sizeof(uint64_t));
+
+    free(pfn);
+}
+
+
+/* ===================== colo: common callback ===================== */
+static void colo_common_read_send_data_done(libxl__egc *egc,
+                                            libxl__datacopier_state *dc,
+                                            int onwrite, int errnoval)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(dc, *crcs, dc);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+    int ok;
+    STATE_AO_GC(dc->ao);
+
+    if (onwrite == -1) {
+        LOG(ERROR, "reading/sending data fails");
+        ok = 0;
+        goto out;
+    }
+
+    if (errnoval < 0 || (onwrite == 1 && errnoval)) {
+        /* failure happens when reading/writing, do failover? */
+        ok = 2;
+        goto out;
+    }
+
+    if (!crcs->callback) {
+        /* Everythins is OK */
+        ok = 1;
+        goto out;
+    }
+
+    if (onwrite == 0)
+        crcs->callback(egc, crcs, dc->used);
+    else
+        crcs->callback(egc, crcs, 0);
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dcs->shs, ok);
+}
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 392420f..ba6e1fe 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -19,6 +19,7 @@
 
 #include "libxl_internal.h"
 #include "libxl_arch.h"
+#include "libxl_colo.h"
 
 #include <xc_dom.h>
 #include <xenguest.h>
@@ -956,6 +957,96 @@ static void domcreate_console_available(libxl__egc *egc,
                                         dcs->aop_console_how.for_event));
 }
 
+static void libxl__colo_restore_teardown_done(libxl__egc *egc,
+                                              libxl__colo_restore_state *crs,
+                                              int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    STATE_AO_GC(crs->ao);
+
+    /* convenience aliases */
+    libxl__save_helper_state *const shs = &dcs->shs;
+    const int domid = crs->domid;
+    const libxl_ctx *const ctx = libxl__gc_owner(gc);
+    xc_interface *const xch = ctx->xch;
+
+    if (!rc)
+        /* failover, no need to destroy the secondary vm */
+        goto out;
+
+    if (shs->retval)
+        /*
+         * shs->retval stores the return value of xc_domain_restore().
+         * If it is not 0, we have destroyed the secondary vm in
+         * xc_domain_restore();
+         */
+        goto out;
+
+    xc_domain_destroy(xch, domid);
+
+out:
+    dcs->callback(egc, dcs, rc, crs->domid);
+}
+
+void libxl__colo_restore_done(libxl__egc *egc, void *dcs_void,
+                              int ret, int retval, int errnoval)
+{
+    libxl__domain_create_state *dcs = dcs_void;
+    int rc = 1;
+
+    /* convenience aliases */
+    libxl__colo_restore_state *const crs = &dcs->crs;
+    STATE_AO_GC(crs->ao);
+
+    /* teardown and failover */
+    crs->callback = libxl__colo_restore_teardown_done;
+
+    if (ret == 0 && retval == 0)
+        rc = 0;
+
+    LOG(INFO, "%s", rc ? "colo fails" : "failover");
+    libxl__colo_restore_teardown(egc, crs, rc);
+}
+
+static void libxl__colo_restore_cp_done(libxl__egc *egc,
+                                        libxl__colo_restore_state *crs,
+                                        int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    int ok = 0;
+
+    /* convenience aliases */
+    libxl__save_helper_state *const shs = &dcs->shs;
+
+    if (!rc)
+        ok = 1;
+
+    libxl__xc_domain_saverestore_async_callback_done(shs->egc, shs, ok);
+}
+
+static void libxl__colo_restore_setup_done(libxl__egc *egc,
+                                           libxl__colo_restore_state *crs,
+                                           int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+
+    /* convenience aliases */
+    const int hvm = crs->hvm;
+    const int superpages = crs->superpages;
+    const int pae = crs->pae;
+    STATE_AO_GC(crs->ao);
+
+    if (rc) {
+        LOG(ERROR, "colo restore setup fails: %d", rc);
+        libxl__xc_domain_restore_done(egc, dcs, rc, 0, 0);
+        return;
+    }
+
+    crs->callback = libxl__colo_restore_cp_done;
+    libxl__xc_domain_restore(egc, dcs,
+                             hvm, pae, superpages);
+}
+
 static void domcreate_bootloader_done(libxl__egc *egc,
                                       libxl__bootloader_state *bl,
                                       int rc)
@@ -971,6 +1062,8 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     libxl__domain_build_state *const state = &dcs->build_state;
     libxl__srm_restore_autogen_callbacks *const callbacks =
         &dcs->shs.callbacks.restore.a;
+    const int checkpointed_stream = dcs->checkpointed_stream;
+    libxl__colo_restore_state *const crs = &dcs->crs;
 
     if (rc) {
         domcreate_rebuild_done(egc, dcs, rc);
@@ -999,6 +1092,13 @@ static void domcreate_bootloader_done(libxl__egc *egc,
 
     /* Restore */
 
+    /* COLO only supports HVM now */
+    if (info->type != LIBXL_DOMAIN_TYPE_HVM &&
+        checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
     rc = libxl__build_pre(gc, domid, d_config, state);
     if (rc)
         goto out;
@@ -1021,8 +1121,20 @@ static void domcreate_bootloader_done(libxl__egc *egc,
         rc = ERROR_INVAL;
         goto out;
     }
-    libxl__xc_domain_restore(egc, dcs,
-                             hvm, pae, superpages);
+
+    if (checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
+        crs->ao = ao;
+        crs->domid = domid;
+        crs->send_fd = dcs->send_fd;
+        crs->recv_fd = restore_fd;
+        crs->hvm = hvm;
+        crs->superpages = superpages;
+        crs->pae = pae;
+        crs->callback = libxl__colo_restore_setup_done;
+        libxl__colo_restore_setup(egc, crs);
+    } else
+        libxl__xc_domain_restore(egc, dcs,
+                                 hvm, pae, superpages);
     return;
 
  out:
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index e09a1eb..0eb0f8c 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1114,7 +1114,7 @@ static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*,
 static void switch_logdirty_done(libxl__egc *egc,
                                  libxl__logdirty_switch *lds, int ok);
 
-static void logdirty_init(libxl__logdirty_switch *lds)
+void logdirty_init(libxl__logdirty_switch *lds)
 {
     lds->cmd_path = 0;
     libxl__ev_xswatch_init(&lds->watch);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 971d975..686a39e 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2843,6 +2843,7 @@ struct libxl__logdirty_switch {
     libxl__ev_xswatch watch;
     libxl__ev_time timeout;
 };
+_hidden void logdirty_init(libxl__logdirty_switch *lds);
 
 /*
  * libxl__domain_suspend_state is for saving guest, not
@@ -3135,6 +3136,27 @@ typedef void libxl__domain_create_cb(libxl__egc *egc,
                                      libxl__domain_create_state*,
                                      int rc, uint32_t domid);
 
+/* colo related structure */
+typedef struct libxl__colo_restore_state libxl__colo_restore_state;
+typedef void libxl__colo_callback(libxl__egc *,
+                                  libxl__colo_restore_state *, int rc);
+struct libxl__colo_restore_state {
+    /* must set by caller of libxl__colo_(setup|teardown) */
+    libxl__ao *ao;
+    uint32_t domid;
+    int send_fd;
+    int recv_fd;
+    int hvm;
+    int pae;
+    int superpages;
+    libxl__colo_callback *callback;
+
+    /* private, colo restore checkpoint state */
+    libxl__domain_create_cb *saved_cb;
+    void *crcs;
+    libxl__checkpoint_devices_state cds;
+};
+
 struct libxl__domain_create_state {
     /* filled in by user */
     libxl__ao *ao;
@@ -3148,6 +3170,7 @@ struct libxl__domain_create_state {
     int guest_domid;
     int checkpointed_stream;
     libxl__domain_build_state build_state;
+    libxl__colo_restore_state crs;
     libxl__bootloader_state bl;
     libxl__stub_dm_spawn_state dmss;
         /* If we're not doing stubdom, we use only dmss.dm,
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 477e633..9082b66 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -15,6 +15,7 @@
 #include "libxl_osdeps.h"
 
 #include "libxl_internal.h"
+#include "libxl_colo.h"
 
 /* stream_fd is as from the caller (eventually, the application).
  * It may be 0, 1 or 2, in which case we need to dup it elsewhere.
@@ -65,7 +66,10 @@ void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
     dcs->shs.ao = ao;
     dcs->shs.domid = domid;
     dcs->shs.recv_callback = libxl__srm_callout_received_restore;
-    dcs->shs.completion_callback = libxl__xc_domain_restore_done;
+    if (dcs->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO)
+        dcs->shs.completion_callback = libxl__colo_restore_done;
+    else
+        dcs->shs.completion_callback = libxl__xc_domain_restore_done;
     dcs->shs.caller_state = dcs;
     dcs->shs.need_results = 1;
     dcs->shs.toolstack_data_file = 0;
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index 41ee000..0239cac 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -24,9 +24,9 @@ our @msgs = (
                                                  STRING doing_what),
                                                 'unsigned long', 'done',
                                                 'unsigned long', 'total'] ],
-    [  3, 'scxA',   "suspend", [] ],
-    [  4, 'scxA',   "postcopy", [] ],
-    [  5, 'scxA',   "checkpoint", [] ],
+    [  3, 'srcxA',   "suspend", [] ],
+    [  4, 'srcxA',   "postcopy", [] ],
+    [  5, 'srcxA',   "checkpoint", [] ],
     [  6, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
                                               unsigned enable)] ],
     #                toolstack_save          done entirely `by hand'
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 16/29] primary vm suspend/get_dirty_pfn/resume/checkpoint code
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (14 preceding siblings ...)
  2015-04-01  6:41 ` [RFC PATCH COLO v5 15/29] secondary vm suspend/resume/checkpoint code Yang Hongyang
@ 2015-04-01  6:41 ` Yang Hongyang
  2015-04-01  6:41 ` [RFC PATCH COLO v5 17/29] xc_domain_save: flush cache before calling callbacks->postcopy() in colo mode Yang Hongyang
                   ` (12 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

We will do the following things again and again:
1. Suspend primary vm
   a. Suspend primary vm
   b. do postsuspend
   c. Read LIBXL_COLO_SVM_SUSPENDED to master
   d. Read secondary vm's dirty page information to master(count + pfn list)
2. Get dirty pfn list
   a. Return secondary vm's dirty pfn list
3. Resume primary vm
   a. Read LIBXL_COLO_SVM_READY from slave
   b. Do presume
   c. Resume primary vm
   d. Read LIBXL_COLO_SVM_RESUMED from slave
4. Wait a new checkpoint
    a. Wait a new checkpoint(not implemented)
    b. Send LIBXL_COLO_NEW_CHECKPOINT to slave

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxc/include/xenguest.h     |  12 +
 tools/libxl/Makefile               |   2 +-
 tools/libxl/libxl.c                |   6 +-
 tools/libxl/libxl_colo.h           |  10 +
 tools/libxl/libxl_colo_save.c      | 642 +++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_dom.c            |  13 +-
 tools/libxl/libxl_internal.h       |  31 +-
 tools/libxl/libxl_save_msgs_gen.pl |   1 +
 tools/libxl/libxl_types.idl        |   1 +
 9 files changed, 710 insertions(+), 8 deletions(-)
 create mode 100644 tools/libxl/libxl_colo_save.c

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 6e621a6..266d96b 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -74,6 +74,18 @@ struct save_callbacks {
      */
     int (*toolstack_save)(uint32_t domid, uint8_t **buf, uint32_t *len, void *data);
 
+    /* Called after the guest is suspended.
+     *
+     * returns the list of dirty pfn:
+     *  struct {
+     *      uint64_t count;
+     *      uint64_t pfn[];
+     *  };
+     *
+     *  Note: the caller must free the return value.
+     */
+    uint8_t *(*get_dirty_pfn)(void *data);
+
     /* to be provided as the last argument to each callback function */
     void* data;
 };
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 8acfd5d..b2eaf14 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -57,7 +57,7 @@ LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
 LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
-LIBXL_OBJS-y += libxl_colo_restore.o
+LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 40a49c7..b6c5429 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -18,6 +18,7 @@
 
 #include "libxl_internal.h"
 #include "libxl_remus.h"
+#include "libxl_colo.h"
 
 #define PAGE_TO_MEMKB(pages) ((pages) * 4)
 #define BACKEND_STRING_SIZE 5
@@ -892,7 +893,10 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     assert(info);
 
     /* Point of no return */
-    libxl__remus_setup(egc, &dss->rs);
+    if (libxl_defbool_val(info->colo))
+        libxl__colo_save_setup(egc, &dss->css);
+    else
+        libxl__remus_setup(egc, &dss->rs);
     return AO_INPROGRESS;
 
  out:
diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
index 91df275..26a2563 100644
--- a/tools/libxl/libxl_colo.h
+++ b/tools/libxl/libxl_colo.h
@@ -35,4 +35,14 @@ extern void libxl__colo_restore_teardown(libxl__egc *egc,
                                          libxl__colo_restore_state *crs,
                                          int rc);
 
+extern void libxl__colo_save_domain_suspend_callback(void *data);
+extern void libxl__colo_save_domain_resume_callback(void *data);
+extern void libxl__colo_save_domain_checkpoint_callback(void *data);
+extern void libxl__colo_save_get_dirty_pfn_callback(void *data);
+extern void libxl__colo_save_setup(libxl__egc *egc,
+                                   libxl__colo_save_state *css);
+extern void libxl__colo_save_teardown(libxl__egc *egc,
+                                      libxl__colo_save_state *css,
+                                      int rc);
+
 #endif
diff --git a/tools/libxl/libxl_colo_save.c b/tools/libxl/libxl_colo_save.c
new file mode 100644
index 0000000..bb5b434
--- /dev/null
+++ b/tools/libxl/libxl_colo_save.c
@@ -0,0 +1,642 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+#include "libxl_colo.h"
+
+static const libxl__checkpoint_device_instance_ops *colo_ops[] = {
+    NULL,
+};
+
+/* ================= helper functions ================= */
+static int init_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* init device subkind-specific state in the libxl ctx */
+    int rc;
+    STATE_AO_GC(cds->ao);
+
+    rc = 0;
+    return rc;
+}
+
+static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* cleanup device subkind-specific state in the libxl ctx */
+    STATE_AO_GC(cds->ao);
+}
+
+/* ================= colo: setup save environment ================= */
+static void colo_save_setup_done(libxl__egc *egc,
+                                 libxl__checkpoint_devices_state *cds,
+                                 int rc);
+static void colo_save_setup_failed(libxl__egc *egc,
+                                   libxl__checkpoint_devices_state *cds,
+                                   int rc);
+
+void libxl__colo_save_setup(libxl__egc *egc, libxl__colo_save_state *css)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds = &css->cds;
+
+    STATE_AO_GC(dss->ao);
+
+    if (dss->type != LIBXL_DOMAIN_TYPE_HVM) {
+        LOG(ERROR, "COLO only supports hvm now");
+        goto out;
+    }
+
+    css->send_fd = dss->fd;
+    css->recv_fd = dss->recv_fd;
+    css->svm_running = false;
+
+    /* TODO: disk/nic support */
+    cds->device_kind_flags = 0;
+    cds->ops = colo_ops;
+    cds->callback = colo_save_setup_done;
+    cds->ao = ao;
+    cds->domid = dss->domid;
+
+    if (init_device_subkind(cds))
+        goto out;
+
+    libxl__checkpoint_devices_setup(egc, &css->cds);
+
+    return;
+
+out:
+    libxl__ao_complete(egc, ao, ERROR_FAIL);
+}
+
+static void colo_save_setup_done(libxl__egc *egc,
+                                 libxl__checkpoint_devices_state *cds,
+                                 int rc)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+    STATE_AO_GC(cds->ao);
+
+    if (!rc) {
+        libxl__domain_suspend(egc, dss);
+        return;
+    }
+
+    LOG(ERROR, "COLO: failed to setup device for guest with domid %u",
+        dss->domid);
+    css->cds.callback = colo_save_setup_failed;
+    libxl__checkpoint_devices_teardown(egc, &css->cds);
+}
+
+static void colo_save_setup_failed(libxl__egc *egc,
+                                   libxl__checkpoint_devices_state *cds,
+                                   int rc)
+{
+    STATE_AO_GC(cds->ao);
+
+    if (rc)
+        LOG(ERROR, "COLO: failed to teardown device after setup failed"
+            " for guest with domid %u, rc %d", cds->domid, rc);
+
+    cleanup_device_subkind(cds);
+    libxl__ao_complete(egc, ao, rc);
+}
+
+
+/* ================= colo: teardown save environment ================= */
+static void colo_teardown_done(libxl__egc *egc,
+                               libxl__checkpoint_devices_state *cds,
+                               int rc);
+
+void libxl__colo_save_teardown(libxl__egc *egc,
+                               libxl__colo_save_state *css,
+                               int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+
+    STATE_AO_GC(css->cds.ao);
+
+    LOG(WARN, "COLO: Domain suspend terminated with rc %d,"
+        " teardown COLO devices...", rc);
+    dss->css.cds.callback = colo_teardown_done;
+    libxl__checkpoint_devices_teardown(egc, &dss->css.cds);
+    return;
+}
+
+static void colo_teardown_done(libxl__egc *egc,
+                               libxl__checkpoint_devices_state *cds,
+                               int rc)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+
+    cleanup_device_subkind(cds);
+    dss->callback(egc, dss, rc);
+}
+
+/*
+ * checkpoint callbacks are called in the following order:
+ * 1. suspend
+ * 2. resume
+ * 3. checkpoint
+ */
+static void colo_common_read_send_data_done(libxl__egc *egc,
+                                            libxl__datacopier_state *dc,
+                                            int onwrite, int errnoval);
+/* ===================== colo: suspend primary vm ===================== */
+/*
+ * Do the following things when suspending primary vm:
+ * 1. suspend primary vm
+ * 2. do postsuspend
+ * 3. read LIBXL_COLO_SVM_SUSPENDED
+ * 4. read secondary vm's dirty pages
+ */
+static void colo_suspend_primary_vm_done(libxl__egc *egc,
+                                         libxl__domain_suspend_state2 *dss2,
+                                         int ok);
+static void colo_postsuspend_cb(libxl__egc *egc,
+                                libxl__checkpoint_devices_state *cds,
+                                int rc);
+static void colo_read_pfn(libxl__egc *egc, libxl__colo_save_state *css);
+
+void libxl__colo_save_domain_suspend_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__egc *egc = shs->egc;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
+
+    /* Convenience aliases */
+    libxl__domain_suspend_state2 *dss2 = &dss->dss2;
+
+    dss2->callback_common_done = colo_suspend_primary_vm_done;
+    libxl__domain_suspend2(egc, dss2);
+}
+
+static void colo_suspend_primary_vm_done(libxl__egc *egc,
+                                         libxl__domain_suspend_state2 *dss2,
+                                         int ok)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(dss2, *dss, dss2);
+
+    STATE_AO_GC(dss2->ao);
+
+    if (!ok) {
+        LOG(ERROR, "cannot suspend primary vm");
+        goto out;
+    }
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds = &dss->css.cds;
+
+    cds->callback = colo_postsuspend_cb;
+    libxl__checkpoint_devices_postsuspend(egc, cds);
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
+
+static void colo_postsuspend_cb(libxl__egc *egc,
+                                libxl__checkpoint_devices_state *cds,
+                                int rc)
+{
+    int ok = 0;
+    libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+
+    /* Convenience aliases */
+    libxl__datacopier_state *const dc = &css->dc;
+
+    STATE_AO_GC(cds->ao);
+
+    if (rc) {
+        LOG(ERROR, "postsuspend fails");
+        goto out;
+    }
+
+    if (!css->svm_running) {
+        ok = 1;
+        goto out;
+    }
+
+    /*
+     * read LIBXL_COLO_SVM_SUSPENDED and the count of
+     * secondary vm's dirty pages.
+     */
+    memset(dc, 0, sizeof(*dc));
+    dc->ao = ao;
+    dc->readfd = css->recv_fd;
+    dc->writefd = -1;
+    dc->maxsz = INT_MAX;
+    dc->copywhat = "secondary vm is suspended";
+    dc->readwhat = "colo stream";
+    dc->callback = colo_common_read_send_data_done;
+    dc->readbuf = css->temp_buff;
+    dc->bytes_to_read = sizeof(css->temp_buff);
+    css->callback = colo_read_pfn;
+
+    rc = libxl__datacopier_start(dc);
+    if (rc) {
+        LOG(ERROR, "libxl__datacopier_start() fails");
+        goto out;
+    }
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
+
+static void colo_read_pfn(libxl__egc *egc, libxl__colo_save_state *css)
+{
+    int ok = 0;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+    int rc;
+
+    STATE_AO_GC(css->cds.ao);
+
+    /* Convenience aliases */
+    libxl__datacopier_state *const dc = &css->dc;
+
+    assert(!css->buff);
+    css->section = css->temp_buff[0];
+    css->count = *(uint64_t *)(&css->temp_buff[1]);
+
+    if (css->section != LIBXL_COLO_SVM_SUSPENDED) {
+        LOG(ERROR, "invalid section: %d, expected: %d",
+            css->section, LIBXL_COLO_SVM_SUSPENDED);
+        goto out;
+    }
+
+    css->buff = libxl__zalloc(NOGC, sizeof(uint64_t) * (css->count + 1));
+    css->buff[0] = css->count;
+
+    if (css->count == 0) {
+        /* no dirty pages */
+        ok = 1;
+        goto out;
+    }
+
+    /* read the pfn of secondary vm's dirty pages */
+    memset(dc, 0, sizeof(*dc));
+    dc->ao = ao;
+    dc->readfd = css->recv_fd;
+    dc->writefd = -1;
+    dc->maxsz = INT_MAX;
+    dc->copywhat = "secondary vm's dirty bitmap";
+    dc->readwhat = "colo stream";
+    dc->callback = colo_common_read_send_data_done;
+    dc->readbuf = css->buff + 1;
+    dc->bytes_to_read = css->count * sizeof(uint64_t);
+    css->callback = NULL;
+
+    rc = libxl__datacopier_start(dc);
+    if (rc) {
+        LOG(ERROR, "libxl__datacopier_start() fails");
+        goto out;
+    }
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
+
+
+/* ===================== colo: get dirty pfn ===================== */
+void libxl__colo_save_get_dirty_pfn_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__egc *egc = shs->egc;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
+    uint64_t size;
+
+    /* Convenience aliases */
+    libxl__colo_save_state *const css = &dss->css;
+
+    assert(css->buff);
+    size = sizeof(uint64_t) * (css->count + 1);
+
+    libxl__xc_domain_saverestore_async_callback_done_with_data(egc, shs,
+                                                               (uint8_t *)css->buff,
+                                                               size);
+    free(css->buff);
+    css->buff = NULL;
+}
+
+
+/* ===================== colo: resume primary vm ===================== */
+/*
+ * Do the following things when resuming primary vm:
+ *  1. read LIBXL_COLO_SVM_READY
+ *  2. do preresume
+ *  3. resume primary vm
+ *  4. read LIBXL_COLO_SVM_RESUMED
+ */
+static void colo_preresume_dm_saved(libxl__egc *egc,
+                                    libxl__domain_suspend_state *dss, int rc);
+static void colo_read_svm_ready_done(libxl__egc *egc,
+                                     libxl__colo_save_state *css);
+static void colo_preresume_cb(libxl__egc *egc,
+                              libxl__checkpoint_devices_state *cds,
+                              int rc);
+static void colo_read_svm_resumed_done(libxl__egc *egc,
+                                       libxl__colo_save_state *css);
+
+void libxl__colo_save_domain_resume_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__egc *egc = shs->egc;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
+
+    /* This would go into tailbuf. */
+    if (dss->hvm) {
+        libxl__domain_save_device_model(egc, dss, colo_preresume_dm_saved);
+    } else {
+        colo_preresume_dm_saved(egc, dss, 0);
+    }
+
+    return;
+}
+
+static void colo_preresume_dm_saved(libxl__egc *egc,
+                                    libxl__domain_suspend_state *dss, int rc)
+{
+    /* Convenience aliases */
+    libxl__colo_save_state *const css = &dss->css;
+    libxl__datacopier_state *const dc = &css->dc;
+
+    STATE_AO_GC(css->cds.ao);
+
+    if (rc) {
+        LOG(ERROR, "Failed to save device model. Terminating COLO..");
+        goto out;
+    }
+
+    /* read LIBXL_COLO_SVM_READY */
+    memset(dc, 0, sizeof(*dc));
+    dc->ao = ao;
+    dc->readfd = css->recv_fd;
+    dc->writefd = -1;
+    dc->maxsz = INT_MAX;
+    dc->copywhat = "secondary vm is ready";
+    dc->readwhat = "colo stream";
+    dc->callback = colo_common_read_send_data_done;
+    dc->readbuf = &css->section;
+    dc->bytes_to_read = sizeof(css->section);
+    css->callback = colo_read_svm_ready_done;
+
+    rc = libxl__datacopier_start(dc);
+    if (rc) {
+        LOG(ERROR, "libxl__datacopier_start() fails");
+        goto out;
+    }
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void colo_read_svm_ready_done(libxl__egc *egc,
+                                     libxl__colo_save_state *css)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+
+    STATE_AO_GC(css->cds.ao);
+
+    if (css->section != LIBXL_COLO_SVM_READY) {
+        LOG(ERROR, "invalid section: %d, expected: %d",
+            css->section, LIBXL_COLO_SVM_READY);
+        goto out;
+    }
+
+    css->svm_running = true;
+    css->cds.callback = colo_preresume_cb;
+    libxl__checkpoint_devices_preresume(egc, &css->cds);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void colo_preresume_cb(libxl__egc *egc,
+                              libxl__checkpoint_devices_state *cds,
+                              int rc)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+
+    /* Convenience aliases */
+    libxl__datacopier_state *const dc = &css->dc;
+
+    STATE_AO_GC(cds->ao);
+
+    if (rc) {
+        LOG(ERROR, "preresume fails");
+        goto out;
+    }
+
+    /* Resumes the domain and the device model */
+    if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1)) {
+        LOG(ERROR, "cannot resume primary vm");
+        goto out;
+    }
+
+    /* read LIBXL_COLO_SVM_RESUMED */
+    memset(dc, 0, sizeof(*dc));
+    dc->ao = ao;
+    dc->readfd = css->recv_fd;
+    dc->writefd = -1;
+    dc->maxsz = INT_MAX;
+    dc->copywhat = "secondary vm is resumed";
+    dc->readwhat = "colo stream";
+    dc->callback = colo_common_read_send_data_done;
+    dc->readbuf = &css->section;
+    dc->bytes_to_read = sizeof(css->section);
+    css->callback = colo_read_svm_resumed_done;
+
+    rc = libxl__datacopier_start(dc);
+    if (rc) {
+        LOG(ERROR, "libxl__datacopier_start() fails");
+        goto out;
+    }
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void colo_read_svm_resumed_done(libxl__egc *egc,
+                                       libxl__colo_save_state *css)
+{
+    int ok = 0;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+
+    STATE_AO_GC(css->cds.ao);
+
+    if (css->section != LIBXL_COLO_SVM_RESUMED) {
+        LOG(ERROR, "invalid section: %d, expected: %d",
+            css->section, LIBXL_COLO_SVM_RESUMED);
+        goto out;
+    }
+
+    ok = 1;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
+
+
+/* ===================== colo: wait new checkpoint ===================== */
+/*
+ * Do the following things:
+ * 1. do commit
+ * 2. wait for a new checkpoint
+ * 3. write LIBXL_COLO_NEW_CHECKPOINT
+ */
+static void colo_device_commit_cb(libxl__egc *egc,
+                                  libxl__checkpoint_devices_state *cds,
+                                  int rc);
+static void colo_start_new_checkpoint(libxl__egc *egc,
+                                      libxl__checkpoint_devices_state *cds,
+                                      int rc);
+
+void libxl__colo_save_domain_checkpoint_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
+    libxl__egc *egc = dss->shs.egc;
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds = &dss->css.cds;
+
+    cds->callback = colo_device_commit_cb;
+    libxl__checkpoint_devices_commit(egc, cds);
+}
+
+static void colo_device_commit_cb(libxl__egc *egc,
+                                  libxl__checkpoint_devices_state *cds,
+                                  int rc)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+
+    STATE_AO_GC(cds->ao);
+
+    if (rc) {
+        LOG(ERROR, "commit fails");
+        goto out;
+    }
+
+    /* TODO: wait a new checkpoint */
+    colo_start_new_checkpoint(egc, cds, 0);
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void colo_start_new_checkpoint(libxl__egc *egc,
+                                      libxl__checkpoint_devices_state *cds,
+                                      int rc)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+    uint8_t section = LIBXL_COLO_NEW_CHECKPOINT;
+
+    /* Convenience aliases */
+    libxl__datacopier_state *const dc = &css->dc;
+
+    STATE_AO_GC(cds->ao);
+
+    if (rc)
+        goto out;
+
+    /* write LIBXL_COLO_NEW_CHECKPOINT */
+    memset(dc, 0, sizeof(*dc));
+    dc->ao = ao;
+    dc->readfd = -1;
+    dc->writefd = css->send_fd;
+    dc->maxsz = INT_MAX;
+    dc->copywhat = "new checkpoint is triggered";
+    dc->writewhat = "colo stream";
+    dc->callback = colo_common_read_send_data_done;
+    css->callback = NULL;
+
+    rc = libxl__datacopier_start(dc);
+    if (rc) {
+        LOG(ERROR, "libxl__datacopier_start() fails");
+        goto out;
+    }
+
+    /* tell slave that a new checkpoint is triggered */
+    libxl__datacopier_prefixdata(egc, dc, &section, sizeof(section));
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+
+/* ===================== colo: common callback ===================== */
+static void colo_common_read_send_data_done(libxl__egc *egc,
+                                            libxl__datacopier_state *dc,
+                                            int onwrite, int errnoval)
+{
+    int ok = 0;
+    libxl__colo_save_state *css = CONTAINER_OF(dc, *css, dc);
+    libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
+
+    STATE_AO_GC(dc->ao);
+
+    if (onwrite == -1) {
+        LOG(ERROR, "reading/sending data fails");
+        ok = 0;
+        goto out;
+    }
+
+    if (errnoval < 0 || (onwrite == 1 && errnoval)) {
+        /* failure happens when reading/writing, do failover? */
+        ok = 2;
+        goto out;
+    }
+
+    if (dc->bytes_to_read != 0) {
+        /* EOF is read */
+        LOG(ERROR, "reading EOF unexpectedly");
+        ok = 0;
+        goto out;
+    }
+
+    if (!css->callback) {
+        /* Everything is OK */
+        ok = 1;
+        goto out;
+    }
+
+    if (onwrite == 0)
+        css->callback(egc, css);
+    else
+        css->callback(egc, css);
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 0eb0f8c..6e1f90b 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -20,6 +20,7 @@
 #include "libxl_internal.h"
 #include "libxl_arch.h"
 #include "libxl_remus.h"
+#include "libxl_colo.h"
 
 #include <xc_dom.h>
 #include <xen/hvm/hvm_info_table.h>
@@ -1887,7 +1888,12 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
     }
 
     memset(callbacks, 0, sizeof(*callbacks));
-    if (r_info != NULL) {
+    if (r_info != NULL && libxl_defbool_val(r_info->colo)) {
+        callbacks->suspend = libxl__colo_save_domain_suspend_callback;
+        callbacks->postcopy = libxl__colo_save_domain_resume_callback;
+        callbacks->checkpoint = libxl__colo_save_domain_checkpoint_callback;
+        callbacks->get_dirty_pfn = libxl__colo_save_get_dirty_pfn_callback;
+    } else if (r_info != NULL) {
         callbacks->suspend = libxl__remus_domain_suspend_callback;
         callbacks->postcopy = libxl__remus_domain_resume_callback;
         callbacks->checkpoint = libxl__remus_domain_checkpoint_callback;
@@ -2049,7 +2055,10 @@ static void domain_suspend_done(libxl__egc *egc,
         xc_suspend_evtchn_release(CTX->xch, CTX->xce, domid,
                            dss2->guest_evtchn.port, &dss2->guest_evtchn_lockfd);
 
-    if (dss->remus) {
+    if (dss->remus && libxl_defbool_val(dss->remus->colo)) {
+        libxl__colo_save_teardown(egc, &dss->css, rc);
+        return;
+    } else if (dss->remus) {
         libxl__remus_teardown(egc, &dss->rs, rc);
         return;
     }
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 686a39e..7065f71 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2632,7 +2632,7 @@ typedef struct libxl__save_helper_state {
 /*
  * The abstract checkpoint device layer exposes a common
  * set of API to [external] libxl for manipulating devices attached to
- * a guest protected by Remus. The device layer also exposes a set of
+ * a guest protected by Remus/COLO. The device layer also exposes a set of
  * [internal] interfaces that every device type must implement.
  *
  * The following API are exposed to libxl:
@@ -2650,7 +2650,7 @@ typedef struct libxl__save_helper_state {
  *  +libxl__checkpoint_devices_commit
  *
  * Each device type needs to implement the interfaces specified in
- * the libxl__checkpoint_device_instance_ops if it wishes to support Remus.
+ * the libxl__checkpoint_device_instance_ops if it wishes to support Remus/COLO.
  *
  * The high-level control flow through the checkpoint device layer is shown
  * below:
@@ -2670,7 +2670,7 @@ typedef struct libxl__checkpoint_device_instance_ops libxl__checkpoint_device_in
 
 /*
  * Interfaces to be implemented by every device subkind that wishes to
- * support Remus. Functions must be implemented unless otherwise
+ * support Remus/COLO. Functions must be implemented unless otherwise
  * stated. Many of these functions are asynchronous. They call
  * dev->aodev.callback when done.  The actual implementations may be
  * synchronous and call dev->aodev.callback directly (as the last
@@ -2821,6 +2821,24 @@ struct libxl__remus_state {
 
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
+/*----- colo related state structure -----*/
+typedef struct libxl__colo_save_state libxl__colo_save_state;
+struct libxl__colo_save_state {
+    libxl__checkpoint_devices_state cds;
+    int send_fd;
+    int recv_fd;
+
+    /* private */
+    libxl__datacopier_state dc;
+    uint8_t section;
+    uint64_t count;
+    uint64_t *buff;
+    /* read section and count, and then store it in temp_buff */
+    uint8_t temp_buff[9];
+    void (*callback)(libxl__egc *, libxl__colo_save_state *);
+    bool svm_running;
+};
+
 /*----- Domain suspend (save) state structure -----*/
 
 typedef struct libxl__domain_suspend_state libxl__domain_suspend_state;
@@ -2884,7 +2902,12 @@ struct libxl__domain_suspend_state {
     libxl__domain_suspend_state2 dss2;
     int hvm;
     int xcflags;
-    libxl__remus_state rs;
+    union {
+        /* for Remus */
+        libxl__remus_state rs;
+        /* for COLO */
+        libxl__colo_save_state css;
+    };
     libxl__save_helper_state shs;
     libxl__logdirty_switch logdirty;
     /* private for libxl__domain_save_device_model */
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index 0239cac..fbb2d67 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -36,6 +36,7 @@ our @msgs = (
                                               'unsigned long', 'console_mfn'] ],
     [  9, 'srW',    "complete",              [qw(int retval
                                                  int errnoval)] ],
+    [ 10, 'scxAB',  "get_dirty_pfn", [] ],
 );
 
 #----------------------------------------
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 292d754..e204c58 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -675,6 +675,7 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
     ("netbuf",       libxl_defbool),
     ("netbufscript", string),
     ("diskbuf",      libxl_defbool),
+    ("colo",         libxl_defbool)
     ])
 
 libxl_event_type = Enumeration("event_type", [
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 17/29] xc_domain_save: flush cache before calling callbacks->postcopy() in colo mode
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (15 preceding siblings ...)
  2015-04-01  6:41 ` [RFC PATCH COLO v5 16/29] primary vm suspend/get_dirty_pfn/resume/checkpoint code Yang Hongyang
@ 2015-04-01  6:41 ` Yang Hongyang
  2015-04-01  6:41 ` [RFC PATCH COLO v5 18/29] COLO: xc related codes Yang Hongyang
                   ` (11 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

In colo mode, secondary vm is running. We will use the io_fd to
ensure that both primary vm and secondary vm are resumed
at the same time. So we should call postcopy later.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxc/xc_domain_save.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index cef6995..045a050 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -2082,10 +2082,15 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
  out_rc:
     completed = 1;
 
-    if ( !rc && callbacks->postcopy )
+    /*
+     * COLO: secondary vm is running. We will use the io_fd to
+     * ensure that both primary vm and secondary vm are resumed
+     * at the same time. So we should call postcopy later.
+     */
+    if ( !rc && callbacks->postcopy && !callbacks->get_dirty_pfn )
         callbacks->postcopy(callbacks->data);
 
-    /* guest has been resumed. Now we can compress data
+    /* Remus: guest has been resumed. Now we can compress data
      * at our own pace.
      */
     if (!rc && compressing)
@@ -2113,6 +2118,13 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
 
     discard_file_cache(xch, io_fd, 1 /* flush */);
 
+    /*
+     * COLO: send qemu device state and resume both
+     * primary vm and secondary vm now.
+     */
+    if ( !rc && callbacks->postcopy && callbacks->get_dirty_pfn )
+        callbacks->postcopy(callbacks->data);
+
     /* Enable compression now, finally */
     compressing = (flags & XCFLAGS_CHECKPOINT_COMPRESS);
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 18/29] COLO: xc related codes
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (16 preceding siblings ...)
  2015-04-01  6:41 ` [RFC PATCH COLO v5 17/29] xc_domain_save: flush cache before calling callbacks->postcopy() in colo mode Yang Hongyang
@ 2015-04-01  6:41 ` Yang Hongyang
  2015-04-01  6:41 ` [RFC PATCH COLO v5 19/29] send store mfn and console mfn to xl before resuming secondary vm Yang Hongyang
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

Save:
1. send XC_SAVE_ID_LAST_CHECKPOINT, so secondary vm can be resumed
2. call callbacks->get_dirty_pfn() after suspend primary vm if we
   are doing checkpoint.

Restore:
1. call the callbacks resume/checkpoint/suspend if secondary vm's
   status is the same as primary vm's status.
2. zero out tdata because we will use it zero out pagebuf.tdata.
3. don't apply the secondary vm's state when we failed to get new
   secondary vm's state, because we have applied it every checkpoint.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxc/xc_domain_restore.c | 82 +++++++++++++++++++++++++++++++++++------
 tools/libxc/xc_domain_save.c    | 57 +++++++++++++++++++++++++++-
 2 files changed, 125 insertions(+), 14 deletions(-)

diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index a382701..5cad21c 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -1454,7 +1454,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
     int nraces = 0;
 
     /* The new domain's shared-info frame number. */
-    unsigned long shared_info_frame;
+    unsigned long shared_info_frame = 0;
     unsigned char shared_info_page[PAGE_SIZE]; /* saved contents from file */
     shared_info_any_t *old_shared_info = 
         (shared_info_any_t *)shared_info_page;
@@ -1504,6 +1504,8 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
 
     DPRINTF("%s: starting restore of new domid %u", __func__, dom);
 
+    n = m = 0;
+
     pagebuf_init(&pagebuf);
     memset(&tailbuf, 0, sizeof(tailbuf));
     tailbuf.ishvm = hvm;
@@ -1629,7 +1631,6 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
      * We uncanonicalise page tables as we go.
      */
 
-    n = m = 0;
  loadpages:
     for ( ; ; )
     {
@@ -1793,26 +1794,45 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
         goto finish;
     }
 
+new_checkpoint:
     // DPRINTF("Buffered checkpoint\n");
 
     if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) {
         PERROR("error when buffering batch, finishing");
-        /*
-         * Remus: discard the current incomplete checkpoint and restore
-         * backup from the last complete checkpoint.
-         */
-        goto finish;
+        if ( callbacks && callbacks->checkpoint )
+        {
+            /* COLO: discard the current incomplete checkpoint */
+            rc = 0;
+            goto failover;
+        }
+        else
+        {
+            /*
+             * Remus: discard the current incomplete checkpoint and restore
+             * backup from the last complete checkpoint.
+             */
+            goto finish;
+        }
     }
     memset(&tmptail, 0, sizeof(tmptail));
     tmptail.ishvm = hvm;
     if ( buffer_tail(xch, ctx, &tmptail, io_fd, max_vcpu_id, vcpumap,
                      ext_vcpucontext, vcpuextstate_size) < 0 ) {
         ERROR ("error buffering image tail, finishing");
-        /*
-         * Remus: discard the current incomplete checkpoint and restore
-         * backup from the last complete checkpoint.
-         */
-        goto finish;
+        if ( callbacks && callbacks->checkpoint )
+        {
+            /* COLO: discard the current incomplete checkpoint */
+            rc = 0;
+            goto failover;
+        }
+        else
+        {
+            /*
+             * Remus: discard the current incomplete checkpoint and restore
+             * backup from the last complete checkpoint.
+             */
+            goto finish;
+        }
     }
     tailbuf_free(&tailbuf);
     memcpy(&tailbuf, &tmptail, sizeof(tailbuf));
@@ -2301,6 +2321,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
             free(tdata.data);
             goto out;
         }
+        memset(&tdata, 0, sizeof(tdata));
     }
 
     /* Dump the QEMU state to a state file for QEMU to load */
@@ -2368,6 +2389,43 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
     rc = 0;
 
  out:
+    if ( !rc && callbacks && callbacks->checkpoint )
+    {
+#define HANDLE_CALLBACK_RETURN_VALUE(frc)                   \
+    do {                                                    \
+        if ( frc == 0 )                                     \
+        {                                                   \
+            /* Some internal error happens */               \
+            rc = 1;                                         \
+            goto out;                                       \
+        }                                                   \
+        else if ( frc == 2 )                                \
+        {                                                   \
+            /* Reading/writing error, do failover */        \
+            rc = 0;                                         \
+            goto failover;                                  \
+        }                                                   \
+    } while (0)
+        /* COLO */
+
+        /* TODO: call restore_results */
+
+        /* Resume secondary vm */
+        frc = callbacks->postcopy(callbacks->data);
+        HANDLE_CALLBACK_RETURN_VALUE(frc);
+
+        /* wait for new checkpoint */
+        frc = callbacks->checkpoint(callbacks->data);
+        HANDLE_CALLBACK_RETURN_VALUE(frc);
+
+        /* suspend secondary vm */
+        frc = callbacks->suspend(callbacks->data);
+        HANDLE_CALLBACK_RETURN_VALUE(frc);
+
+        goto new_checkpoint;
+    }
+
+failover:
     if ( (rc != 0) && (dom != 0) )
         xc_domain_destroy(xch, dom);
     xc_hypercall_buffer_free(xch, ctxt);
diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index 045a050..07a6560 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -377,6 +377,31 @@ static int suspend_and_state(int (*suspend)(void*), void* data,
     return 0;
 }
 
+static int update_dirty_bitmap(uint8_t *(*get_dirty_pfn)(void *), void *data,
+                               unsigned long p2m_size, unsigned long *to_send)
+{
+    uint64_t *pfn_list;
+    uint64_t count, i;
+    uint64_t pfn;
+
+    pfn_list = (uint64_t *)get_dirty_pfn(data);
+    assert(pfn_list);
+
+    count = pfn_list[0];
+    for (i = 0; i < count; i++) {
+        pfn = pfn_list[i + 1];
+        if (pfn > p2m_size) {
+            errno = EINVAL;
+            return -1;
+        }
+
+        set_bit(pfn, to_send);
+    }
+
+    free(pfn_list);
+    return 0;
+}
+
 /*
 ** Map the top-level page of MFNs from the guest. The guest might not have
 ** finished resuming from a previous restore operation, so we wait a while for
@@ -1773,11 +1798,14 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
         free(buf);
     }
 
-    if ( !callbacks->checkpoint )
+    if ( !callbacks->checkpoint || callbacks->get_dirty_pfn )
     {
         /*
          * If this is not a checkpointed save then this must be the first and
          * last checkpoint.
+         *
+         * If we are in colo mode, send last checkpoint to resume secondary
+         * vm.
          */
         i = XC_SAVE_ID_LAST_CHECKPOINT;
         if ( wrexact(io_fd, &i, sizeof(int)) )
@@ -2123,7 +2151,22 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
      * primary vm and secondary vm now.
      */
     if ( !rc && callbacks->postcopy && callbacks->get_dirty_pfn )
-        callbacks->postcopy(callbacks->data);
+    {
+        errno = 0;
+        if ( !callbacks->postcopy(callbacks->data) )
+        {
+            if (!errno)
+            {
+                errno = EIO;
+                ERROR("Postcopy request failed (without errno, using EINVAL)");
+            }
+            else
+            {
+                ERROR("Postcopy request failed");
+            }
+            rc = errno;
+        }
+    }
 
     /* Enable compression now, finally */
     compressing = (flags & XCFLAGS_CHECKPOINT_COMPRESS);
@@ -2152,6 +2195,16 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
             PERROR("Error flushing shadow PT");
         }
 
+        if ( callbacks->get_dirty_pfn )
+        {
+            if ( update_dirty_bitmap(callbacks->get_dirty_pfn, callbacks->data,
+                                     dinfo->p2m_size, to_send) )
+            {
+                ERROR("getting secondary vm's dirty pages failed");
+                goto out;
+            }
+        }
+
         goto copypages;
     }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 19/29] send store mfn and console mfn to xl before resuming secondary vm
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (17 preceding siblings ...)
  2015-04-01  6:41 ` [RFC PATCH COLO v5 18/29] COLO: xc related codes Yang Hongyang
@ 2015-04-01  6:41 ` Yang Hongyang
  2015-04-01  6:41 ` [RFC PATCH COLO v5 20/29] implement the cmdline for COLO Yang Hongyang
                   ` (9 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

We will call libxl__xc_domain_restore_done() to rebuild secondary vm. But
we need store mfn and console mfn when rebuilding secondary vm. So make
restore_results is a function pointers in callbacks struct and struct
{save,restore}_callbacks, and use this callback to send store mfn and
console mfn to xl.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxc/include/xenguest.h     | 8 ++++++++
 tools/libxc/xc_domain_restore.c    | 2 +-
 tools/libxl/libxl_colo_restore.c   | 5 -----
 tools/libxl/libxl_create.c         | 1 +
 tools/libxl/libxl_save_msgs_gen.pl | 2 +-
 5 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 266d96b..597f515 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -125,6 +125,14 @@ struct restore_callbacks {
     /* Enable qemu-dm logging dirty pages to xen */
     int (*switch_qemu_logdirty)(int domid, unsigned enable, void *data); /* HVM only */
 
+    /*
+     * callback to send store mfn and console mfn to xl
+     * if we want to resume vm before xc_domain_save()
+     * exits.
+     */
+    void (*restore_results)(unsigned long store_mfn, unsigned long console_mfn,
+                            void *data);
+
     /* callback to restore toolstack specific data */
     int (*toolstack_restore)(uint32_t domid, const uint8_t *buf,
             uint32_t size, void* data);
diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index 5cad21c..cc5c1ad 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -2408,7 +2408,7 @@ new_checkpoint:
     } while (0)
         /* COLO */
 
-        /* TODO: call restore_results */
+        callbacks->restore_results(*store_mfn, *console_mfn, callbacks->data);
 
         /* Resume secondary vm */
         frc = callbacks->postcopy(callbacks->data);
diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
index 7b825d4..554474d 100644
--- a/tools/libxl/libxl_colo_restore.c
+++ b/tools/libxl/libxl_colo_restore.c
@@ -152,11 +152,6 @@ static void colo_resume_vm(libxl__egc *egc,
         return;
     }
 
-    /*
-     * TODO: get store mfn and console mfn
-     *  We should call the callback restore_results in
-     *  xc_domain_restore() before resuming the guest.
-     */
     libxl__xc_domain_restore_done(egc, dcs, 0, 0, 0);
 
     return;
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index ba6e1fe..89c18dc 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1121,6 +1121,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
         rc = ERROR_INVAL;
         goto out;
     }
+    callbacks->restore_results = libxl__srm_callout_callback_restore_results;
 
     if (checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
         crs->ao = ao;
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index fbb2d67..2ecd25d 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -32,7 +32,7 @@ our @msgs = (
     #                toolstack_save          done entirely `by hand'
     [  7, 'rcxW',   "toolstack_restore",     [qw(uint32_t domid
                                                 BLOCK tsdata)] ],
-    [  8, 'r',      "restore_results",       ['unsigned long', 'store_mfn',
+    [  8, 'rcx',    "restore_results",       ['unsigned long', 'store_mfn',
                                               'unsigned long', 'console_mfn'] ],
     [  9, 'srW',    "complete",              [qw(int retval
                                                  int errnoval)] ],
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 20/29] implement the cmdline for COLO
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (18 preceding siblings ...)
  2015-04-01  6:41 ` [RFC PATCH COLO v5 19/29] send store mfn and console mfn to xl before resuming secondary vm Yang Hongyang
@ 2015-04-01  6:41 ` Yang Hongyang
  2015-04-01  6:41 ` [RFC PATCH COLO v5 21/29] tools: xc_doamin_restore: zero ioreq page only one time Yang Hongyang
                   ` (8 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

Add a new option -c to the command 'xl remus'. If you want
to use COLO HA instead of Remus HA, please use -c option.

Update man pages to reflect the addition of a new option to
'xl remus' command.

Also add a new option -c to the internal command 'xl migrate-receive'.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 docs/man/xl.pod.1         | 12 +++++++++--
 tools/libxl/libxl.c       | 16 ++++++++++++++
 tools/libxl/xl_cmdimpl.c  | 53 +++++++++++++++++++++++++++++++++++++++--------
 tools/libxl/xl_cmdtable.c |  4 +++-
 4 files changed, 73 insertions(+), 12 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 16783c8..adcbe37 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -440,12 +440,15 @@ Print huge (!) amount of debug during the migration process.
 
 =item B<remus> [I<OPTIONS>] I<domain-id> I<host>
 
-Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
-mechanism between the two hosts.
+Enable Remus HA or COLO HA for domain. By default B<xl> relies on ssh as a
+transport mechanism between the two hosts.
 
 N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
      Disk replication support is limited to DRBD disks.
 
+     COLO support in xl is still in experimental (proof-of-concept) phase.
+     There is no support for network or disk at the moment.
+
 B<OPTIONS>
 
 =over 4
@@ -491,6 +494,11 @@ Disable network output buffering. Requires enabling unsafe mode.
 
 Disable disk replication. Requires enabling unsafe mode.
 
+=item B<-c>
+
+Enable COLO HA. It is conflict with B<-i> and B<-b>, and memory
+checkpoint compression must be disabled.
+
 =back
 
 =item B<pause> I<domain-id>
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index b6c5429..afe0cc9 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -862,6 +862,22 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
         goto out;
     }
 
+    /* The caller must set this defbool */
+    if (libxl_defbool_is_default(info->colo)) {
+        LOG(ERROR, "colo mode must be enabled/disabled");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (libxl_defbool_val(info->colo)) {
+        libxl_defbool_setdefault(&info->compression, false);
+        if (libxl_defbool_val(info->compression)) {
+            LOG(ERROR, "cannot use memory checkpoint compression in COLO mode");
+            rc = ERROR_FAIL;
+            goto out;
+        }
+    }
+
     libxl_defbool_setdefault(&info->allow_unsafe, false);
     libxl_defbool_setdefault(&info->blackhole, false);
     libxl_defbool_setdefault(&info->compression, true);
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 4574d05..6c5b792 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -4250,6 +4250,9 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     dom_info.send_fd = send_fd;
     dom_info.migration_domname_r = &migration_domname;
     dom_info.checkpointed_stream = remus;
+    if (remus == LIBXL_CHECKPOINTED_STREAM_COLO)
+        /* COLO uses stdout to send control message to master */
+        dom_info.quiet = 1;
 
     rc = create_domain(&dom_info);
     if (rc < 0) {
@@ -4264,7 +4267,8 @@ static void migrate_receive(int debug, int daemonize, int monitor,
         /* If we are here, it means that the sender (primary) has crashed.
          * TODO: Split-Brain Check.
          */
-        fprintf(stderr, "migration target: Remus Failover for domain %u\n",
+        fprintf(stderr, "migration target: %s Failover for domain %u\n",
+                remus == LIBXL_CHECKPOINTED_STREAM_COLO ? "COLO" : "Remus",
                 domid);
 
         /*
@@ -4281,15 +4285,21 @@ static void migrate_receive(int debug, int daemonize, int monitor,
             rc = libxl_domain_rename(ctx, domid, migration_domname,
                                      common_domname);
             if (rc)
-                fprintf(stderr, "migration target (Remus): "
+                fprintf(stderr, "migration target (%s): "
                         "Failed to rename domain from %s to %s:%d\n",
+                        remus == LIBXL_CHECKPOINTED_STREAM_COLO ? "COLO" : "Remus",
                         migration_domname, common_domname, rc);
         }
 
+        if (remus == LIBXL_CHECKPOINTED_STREAM_COLO)
+            /* The guest is running after failover in COLO mode */
+            exit(rc ? -ERROR_FAIL: 0);
+
         rc = libxl_domain_unpause(ctx, domid);
         if (rc)
-            fprintf(stderr, "migration target (Remus): "
+            fprintf(stderr, "migration target (%s): "
                     "Failed to unpause domain %s (id: %u):%d\n",
+                    remus == LIBXL_CHECKPOINTED_STREAM_COLO ? "COLO" : "Remus",
                     common_domname, domid, rc);
 
         exit(rc ? -ERROR_FAIL: 0);
@@ -4435,7 +4445,7 @@ int main_migrate_receive(int argc, char **argv)
     int debug = 0, daemonize = 1, monitor = 1, remus = 0;
     int opt;
 
-    SWITCH_FOREACH_OPT(opt, "Fedr", NULL, "migrate-receive", 0) {
+    SWITCH_FOREACH_OPT(opt, "Fedrc", NULL, "migrate-receive", 0) {
     case 'F':
         daemonize = 0;
         break;
@@ -4447,8 +4457,10 @@ int main_migrate_receive(int argc, char **argv)
         debug = 1;
         break;
     case 'r':
-        remus = 1;
+        remus = LIBXL_CHECKPOINTED_STREAM_REMUS;
         break;
+    case 'c':
+        remus = LIBXL_CHECKPOINTED_STREAM_COLO;
     }
 
     if (argc-optind != 0) {
@@ -7892,15 +7904,18 @@ int main_remus(int argc, char **argv)
     pid_t child = -1;
     uint8_t *config_data;
     int config_len;
+    int interval = 0;
 
     memset(&r_info, 0, sizeof(libxl_domain_remus_info));
     /* Defaults */
     r_info.interval = 200;
     libxl_defbool_setdefault(&r_info.blackhole, false);
+    libxl_defbool_setdefault(&r_info.colo, false);
 
-    SWITCH_FOREACH_OPT(opt, "Fbundi:s:N:e", NULL, "remus", 2) {
+    SWITCH_FOREACH_OPT(opt, "Fbundi:s:N:ec", NULL, "remus", 2) {
     case 'i':
         r_info.interval = atoi(optarg);
+        interval = 1;
         break;
     case 'F':
         libxl_defbool_set(&r_info.allow_unsafe, true);
@@ -7926,11 +7941,28 @@ int main_remus(int argc, char **argv)
     case 'e':
         daemonize = 0;
         break;
+    case 'c':
+        libxl_defbool_set(&r_info.colo, true);
     }
 
     domid = find_domain(argv[optind]);
     host = argv[optind + 1];
 
+    if (libxl_defbool_val(r_info.colo)) {
+        if (!interval)
+            r_info.interval = 0;
+
+        if (r_info.interval || libxl_defbool_val(r_info.blackhole)) {
+            perror("option -c is conflict with -i or -b");
+            exit(-1);
+        }
+
+        if (libxl_defbool_is_default(r_info.compression)) {
+            perror("option -u must be specified when using COLO");
+            exit(-1);
+        }
+    }
+
     if (!r_info.netbufscript)
         r_info.netbufscript = default_remus_netbufscript;
 
@@ -7945,8 +7977,9 @@ int main_remus(int argc, char **argv)
         if (!ssh_command[0]) {
             rune = host;
         } else {
-            if (asprintf(&rune, "exec %s %s xl migrate-receive -r %s",
+            if (asprintf(&rune, "exec %s %s xl migrate-receive %s %s",
                          ssh_command, host,
+                         libxl_defbool_val(r_info.colo) ? "-c" : "-r",
                          daemonize ? "" : " -e") < 0)
                 return 1;
         }
@@ -7975,7 +8008,8 @@ int main_remus(int argc, char **argv)
      * domain to force failover
      */
     if (libxl_domain_info(ctx, 0, domid)) {
-        fprintf(stderr, "Remus: Primary domain has been destroyed.\n");
+        fprintf(stderr, "%s: Primary domain has been destroyed.\n",
+                libxl_defbool_val(r_info.colo) ? "COLO" : "Remus");
         close(send_fd);
         return 0;
     }
@@ -7987,7 +8021,8 @@ int main_remus(int argc, char **argv)
     if (rc == ERROR_GUEST_TIMEDOUT)
         fprintf(stderr, "Failed to suspend domain at primary.\n");
     else {
-        fprintf(stderr, "Remus: Backup failed? resuming domain at primary.\n");
+        fprintf(stderr, "%s: Backup failed? resuming domain at primary.\n",
+                libxl_defbool_val(r_info.colo) ? "COLO" : "Remus");
         libxl_domain_resume(ctx, domid, 1, 0);
     }
 
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 9284887..8ba256a 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -514,7 +514,9 @@ struct cmd_spec cmd_table[] = {
       "-b                      Replicate memory checkpoints to /dev/null (blackhole).\n"
       "                        Works only in unsafe mode.\n"
       "-n                      Disable network output buffering. Works only in unsafe mode.\n"
-      "-d                      Disable disk replication. Works only in unsafe mode."
+      "-d                      Disable disk replication. Works only in unsafe mode.\n"
+      "-c                      Enable COLO HA. It is conflict with -i and -b, and memory\n"
+      "                        checkpoint must be disabled"
     },
 #endif
     { "devd",
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 21/29] tools: xc_doamin_restore: zero ioreq page only one time
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (19 preceding siblings ...)
  2015-04-01  6:41 ` [RFC PATCH COLO v5 20/29] implement the cmdline for COLO Yang Hongyang
@ 2015-04-01  6:41 ` Yang Hongyang
  2015-04-01  6:55 ` [RFC PATCH COLO v5 22/29] Support colo mode for qemu disk Yang Hongyang
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:41 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

ioreq page contains evtchn which will be set when we resume the
secondary vm the first time. The hypervisor will check if the
evtchn is corrupted, so we cannot zero the ioreq page more
than one time.

The ioreq->state is always STATE_IOREQ_NONE after the vm is
suspended, so it is OK if we only zero it one time.
---
 tools/libxc/xc_domain_restore.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index cc5c1ad..276db37 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -1501,6 +1501,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
     struct restore_ctx _ctx;
     struct restore_ctx *ctx = &_ctx;
     struct domain_info_context *dinfo = &ctx->dinfo;
+    int skip_clear_ioreq_page = 0;
 
     DPRINTF("%s: starting restore of new domid %u", __func__, dom);
 
@@ -2331,13 +2332,30 @@ new_checkpoint:
     }
 
     /* These comms pages need to be zeroed at the start of day */
-    if ( xc_clear_domain_page(xch, dom, tailbuf.u.hvm.magicpfns[0]) ||
-         xc_clear_domain_page(xch, dom, tailbuf.u.hvm.magicpfns[1]) ||
-         xc_clear_domain_page(xch, dom, tailbuf.u.hvm.magicpfns[2]) )
+    if ( xc_clear_domain_page(xch, dom, tailbuf.u.hvm.magicpfns[2]) )
     {
         PERROR("error zeroing magic pages");
         goto out;
     }
+    if ( !skip_clear_ioreq_page )
+    {
+        if ( xc_clear_domain_page(xch, dom, tailbuf.u.hvm.magicpfns[0]) ||
+             xc_clear_domain_page(xch, dom, tailbuf.u.hvm.magicpfns[1]) )
+        {
+            PERROR("error zeroing magic pages");
+            goto out;
+        }
+        /*
+         * ioreq page contains evtchn which will be set when we resume the
+         * secondary vm the first time. The hypervisor will check if the
+         * evtchn is corrupted, so we cann't clear the ioreq page more
+         * than one time.
+         *
+         * The ioreq->state is always STATE_IOREQ_NONE after the vm is
+         * suspended, so it is OK if we only clear it one time.
+         */
+        skip_clear_ioreq_page = 1;
+    }
 
     if ( (frc = xc_hvm_param_set(xch, dom,
                                  HVM_PARAM_IOREQ_PFN, tailbuf.u.hvm.magicpfns[0]))
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 22/29] Support colo mode for qemu disk
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (20 preceding siblings ...)
  2015-04-01  6:41 ` [RFC PATCH COLO v5 21/29] tools: xc_doamin_restore: zero ioreq page only one time Yang Hongyang
@ 2015-04-01  6:55 ` Yang Hongyang
  2015-04-01  6:57 ` [RFC PATCH COLO v5 23/29] COLO: use qemu block replication Yang Hongyang
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:55 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

Usage: disk = ['...,colo,colo-params=xxx,active-disk=xxx,hidden-disk=xxx...']
The format of colo-params: host:port:exportname=xx

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 docs/man/xl.pod.1           |   2 +-
 tools/libxl/libxl.c         |  42 ++++++-
 tools/libxl/libxl_create.c  |  25 ++++-
 tools/libxl/libxl_device.c  |  38 +++++++
 tools/libxl/libxl_dm.c      | 262 ++++++++++++++++++++++++++++++++++++++++++--
 tools/libxl/libxl_types.idl |   5 +
 tools/libxl/libxlu_disk_l.l |   5 +
 7 files changed, 367 insertions(+), 12 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index adcbe37..431ef5e 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -447,7 +447,7 @@ N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
      Disk replication support is limited to DRBD disks.
 
      COLO support in xl is still in experimental (proof-of-concept) phase.
-     There is no support for network or disk at the moment.
+     There is no support for network at the moment.
 
 B<OPTIONS>
 
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index afe0cc9..08d68df 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -2296,6 +2296,8 @@ int libxl__device_disk_setdefault(libxl__gc *gc, libxl_device_disk *disk)
     int rc;
 
     libxl_defbool_setdefault(&disk->discard_enable, !!disk->readwrite);
+    libxl_defbool_setdefault(&disk->colo_enable, false);
+    libxl_defbool_setdefault(&disk->colo_restore_enable, false);
 
     rc = libxl__resolve_domid(gc, disk->backend_domname, &disk->backend_domid);
     if (rc < 0) return rc;
@@ -2496,6 +2498,14 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid,
                 flexarray_append(back, "params");
                 flexarray_append(back, libxl__sprintf(gc, "%s:%s",
                               libxl__device_disk_string_of_format(disk->format), disk->pdev_path));
+                if (libxl_defbool_val(disk->colo_enable)) {
+                    flexarray_append(back, "colo-params");
+                    flexarray_append(back, libxl__sprintf(gc, "%s", disk->colo_params));
+                    flexarray_append(back, "active-disk");
+                    flexarray_append(back, libxl__sprintf(gc, "%s", disk->active_disk));
+                    flexarray_append(back, "hidden-disk");
+                    flexarray_append(back, libxl__sprintf(gc, "%s", disk->hidden_disk));
+                }
                 assert(device->backend_kind == LIBXL__DEVICE_KIND_QDISK);
                 break;
             default:
@@ -2610,7 +2620,10 @@ static int libxl__device_disk_from_xs_be(libxl__gc *gc,
         goto cleanup;
     }
 
-    /* "params" may not be present; but everything else must be. */
+    /*
+     * "params" and "colo-params" may not be present; but everything
+     * else must be.
+     */
     tmp = xs_read(ctx->xsh, XBT_NULL,
                   libxl__sprintf(gc, "%s/params", be_path), &len);
     if (tmp && strchr(tmp, ':')) {
@@ -2620,6 +2633,33 @@ static int libxl__device_disk_from_xs_be(libxl__gc *gc,
         disk->pdev_path = tmp;
     }
 
+    tmp = xs_read(ctx->xsh, XBT_NULL,
+                  libxl__sprintf(gc, "%s/colo-params", be_path), &len);
+    if (tmp) {
+        libxl_defbool_set(&disk->colo_enable, true);
+        disk->colo_params = tmp;
+    } else {
+        libxl_defbool_set(&disk->colo_enable, false);
+    }
+
+    if (libxl_defbool_val(disk->colo_enable)) {
+        tmp = xs_read(ctx->xsh, XBT_NULL,
+                      libxl__sprintf(gc, "%s/active-disk", be_path), &len);
+        if (!tmp) {
+            LOG(ERROR, "Missing xenstore node %s/active-disk", be_path);
+            goto cleanup;
+        }
+        disk->active_disk = tmp;
+
+        tmp = xs_read(ctx->xsh, XBT_NULL,
+                      libxl__sprintf(gc, "%s/hidden-disk", be_path), &len);
+        if (!tmp) {
+            LOG(ERROR, "Missing xenstore node %s/hidden-disk", be_path);
+            goto cleanup;
+        }
+        disk->hidden_disk = tmp;
+    }
+
 
     tmp = libxl__xs_read(gc, XBT_NULL,
                          libxl__sprintf(gc, "%s/type", be_path));
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 89c18dc..1fae0a4 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1663,12 +1663,29 @@ static void domain_create_cb(libxl__egc *egc,
 
     libxl__ao_complete(egc, ao, rc);
 }
-    
+
+static void set_disk_colo_restore(libxl_domain_config *d_config)
+{
+    int i;
+
+    for (i = 0; i < d_config->num_disks; i++)
+        libxl_defbool_set(&d_config->disks[i].colo_restore_enable, true);
+}
+
+static void unset_disk_colo_restore(libxl_domain_config *d_config)
+{
+    int i;
+
+    for (i = 0; i < d_config->num_disks; i++)
+        libxl_defbool_set(&d_config->disks[i].colo_restore_enable, false);
+}
+
 int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             uint32_t *domid,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
+    unset_disk_colo_restore(d_config);
     return do_domain_create(ctx, d_config, domid, -1, -1, 0,
                             ao_how, aop_console_how);
 }
@@ -1681,8 +1698,12 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
 {
     int send_fd = -1;
 
-    if (params->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO)
+    if (params->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
         send_fd = params->send_fd;
+        set_disk_colo_restore(d_config);
+    } else {
+        unset_disk_colo_restore(d_config);
+    }
 
     return do_domain_create(ctx, d_config, domid, restore_fd, send_fd,
                             params->checkpointed_stream, ao_how, aop_console_how);
diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index 0f50d04..c45c767 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -196,6 +196,10 @@ static int disk_try_backend(disk_try_backend_args *a,
             goto bad_format;
         }
 
+        if (libxl_defbool_val(a->disk->colo_enable) ||
+            a->disk->active_disk || a->disk->hidden_disk)
+            goto bad_colo;
+
         if (a->disk->backend_domid != LIBXL_TOOLSTACK_DOMID) {
             LOG(DEBUG, "Disk vdev=%s, is using a storage driver domain, "
                        "skipping physical device check", a->disk->vdev);
@@ -218,6 +222,10 @@ static int disk_try_backend(disk_try_backend_args *a,
     case LIBXL_DISK_BACKEND_TAP:
         if (a->disk->script) goto bad_script;
 
+        if (libxl_defbool_val(a->disk->colo_enable) ||
+            a->disk->active_disk || a->disk->hidden_disk)
+            goto bad_colo;
+
         if (a->disk->is_cdrom) {
             LOG(DEBUG, "Disk vdev=%s, backend tap unsuitable for cdroms",
                        a->disk->vdev);
@@ -236,6 +244,16 @@ static int disk_try_backend(disk_try_backend_args *a,
 
     case LIBXL_DISK_BACKEND_QDISK:
         if (a->disk->script) goto bad_script;
+        if (libxl_defbool_val(a->disk->colo_enable)) {
+            if (!a->disk->colo_params)
+                goto bad_colo_params;
+
+            if (!a->disk->active_disk)
+                goto bad_active_disk;
+
+            if (!a->disk->hidden_disk)
+                goto bad_hidden_disk;
+        }
         return backend;
 
     default:
@@ -256,6 +274,26 @@ static int disk_try_backend(disk_try_backend_args *a,
     LOG(DEBUG, "Disk vdev=%s, backend %s not compatible with script=...",
         a->disk->vdev, libxl_disk_backend_to_string(backend));
     return 0;
+
+ bad_colo:
+    LOG(DEBUG, "Disk vdev=%s, backend %s not compatible with colo",
+        a->disk->vdev, libxl_disk_backend_to_string(backend));
+    return 0;
+
+ bad_colo_params:
+    LOG(DEBUG, "Disk vdev=%s, backend %s needs colo-params=... for colo",
+        a->disk->vdev, libxl_disk_backend_to_string(backend));
+    return 0;
+
+ bad_active_disk:
+    LOG(DEBUG, "Disk vdev=%s, backend %s needs active-disk=... for colo",
+        a->disk->vdev, libxl_disk_backend_to_string(backend));
+    return 0;
+
+ bad_hidden_disk:
+    LOG(DEBUG, "Disk vdev=%s, backend %s needs hidden-disk=... for colo",
+        a->disk->vdev, libxl_disk_backend_to_string(backend));
+    return 0;
 }
 
 int libxl__device_disk_set_backend(libxl__gc *gc, libxl_device_disk *disk) {
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 1a6333a..e227a0f 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -416,6 +416,211 @@ static char *dm_spice_options(libxl__gc *gc,
     return opt;
 }
 
+/* colo mode */
+enum {
+    LIBXL__COLO_NONE = 0,
+    LIBXL__COLO_PRIMARY,
+    LIBXL__COLO_SECONDARY,
+};
+
+/* The format of colo-params: host:port:exportname=xx */
+static int parse_colo_params(libxl__gc *gc, const char *colo_params,
+                             const char **host, const char **port,
+                             const char **exportname)
+{
+    const char *delim;
+
+    delim = strstr(colo_params, ":");
+    if (!delim)
+        return 1;
+    if (delim == colo_params)
+        return 1;
+    *host = libxl__strndup(gc, colo_params, delim - colo_params);
+    colo_params = delim + 1;
+
+    delim = strstr(colo_params, ":");
+    if (!delim)
+        return 1;
+    if (delim == colo_params)
+        return 1;
+    *port = libxl__strndup(gc, colo_params, delim - colo_params);
+    colo_params = delim + 1;
+
+    if (strncmp(colo_params, "exportname=", strlen("exportname=")))
+        return 1;
+    *exportname = colo_params + strlen("exportname=");
+    if ((*exportname)[0] == 0)
+        return 1;
+
+    return 0;
+}
+
+static char *qemu_disk_scsi_drive_string(libxl__gc *gc, const char *pdev_path,
+                                         int unit, const char *format,
+                                         const libxl_device_disk *disk,
+                                         const char *nbd_target,
+                                         int colo_mode)
+{
+    char *drive = NULL;
+    const char *host = NULL, *port = NULL, *exportname = NULL;
+    libxl_ctx *ctx = libxl__gc_owner(gc);
+    const char *colo_params = disk->colo_params;
+    const char *active_disk = disk->active_disk;
+    const char *hidden_disk = disk->hidden_disk;
+
+    switch (colo_mode) {
+    case LIBXL__COLO_NONE:
+        drive = libxl__sprintf
+            (gc, "file=%s,if=scsi,bus=0,unit=%d,format=%s,cache=writeback",
+             pdev_path, unit, format);
+        break;
+    case LIBXL__COLO_PRIMARY:
+        /*
+         * primary:
+         *  -dirve if=scsi,bus=0,unit=x,cache=writeback,driver=quorum,\
+         *  children.0.file.filename=pdev_path,\
+         *  children.0.driver=format,\
+         *  children.1.file.host=host,\
+         *  children.1.file.port=port,\
+         *  children.1.file.export=exportname,\
+         *  children.1.file.driver=nbd+colo,\
+         *  children.1.driver=raw,\
+         *  children.1.ignore-errors=on,\
+         *  read-pattern=fifo
+         */
+
+        if (parse_colo_params(gc, colo_params, &host, &port, &exportname))
+            break;
+
+        drive = libxl__sprintf
+            (gc, "if=scsi,bus=0,unit=%d,cache=writeback,driver=quorum,"
+                 "children.0.file.filename=%s,"
+                 "children.0.driver=%s,"
+                 "children.1.file.host=%s,"
+                 "children.1.file.port=%s,"
+                 "children.1.file.export=%s,"
+                 "children.1.file.driver=nbd+colo,"
+                 "children.1.driver=raw,"
+                 "children.1.ignore-errors=on,"
+                 "read-pattern=fifo",
+             unit, pdev_path, format, host, port, exportname);
+        break;
+    case LIBXL__COLO_SECONDARY:
+        /*
+         * secondary:
+         *  -drive if=scsi,bus=0,unit=x,cache=writeback,driver=qcow2+colo,\
+         *  file=active_disk,\
+         *  backing_reference.drive_id=nbd_target,\
+         *  backing_reference.hidden-disk.file.filename=hidden_disk,\
+         *  backing_reference.hidden-disk.allow-write-backing-file=on,\
+         *  export=exportname,
+         */
+
+        if (parse_colo_params(gc, colo_params, &host, &port, &exportname))
+            break;
+
+        drive = libxl__sprintf
+            (gc, "if=scsi,bus=0,unit=%d,cache=writeback,driver=qcow2+colo,"
+                 "file=%s,"
+                 "backing_reference.drive_id=%s,"
+                 "backing_reference.hidden-disk.file.filename=%s,"
+                 "backing_reference.hidden-disk.allow-write-backing-file=on,"
+                 "export=%s",
+             unit, active_disk, nbd_target, hidden_disk, exportname);
+        break;
+    default:
+        abort();
+    }
+
+    if (!drive)
+        LIBXL__LOG(ctx, LIBXL__LOG_WARNING,
+                   "colo-params is invalid for %s", pdev_path);
+    return drive;
+}
+
+static char *qemu_disk_ide_drive_string(libxl__gc *gc, const char *pdev_path,
+                                        int unit, const char *format,
+                                        const libxl_device_disk *disk,
+                                        const char *nbd_target,
+                                        int colo_mode)
+{
+    char *drive = NULL;
+    const char *host = NULL, *port = NULL, *exportname = NULL;
+    libxl_ctx *ctx = libxl__gc_owner(gc);
+    const char *colo_params = disk->colo_params;
+    const char *active_disk = disk->active_disk;
+    const char *hidden_disk = disk->hidden_disk;
+
+    switch (colo_mode) {
+    case LIBXL__COLO_NONE:
+        drive = libxl__sprintf
+            (gc, "file=%s,if=ide,index=%d,media=disk,format=%s,cache=writeback",
+             pdev_path, unit, format);
+        break;
+    case LIBXL__COLO_PRIMARY:
+        /*
+         * primary:
+         *  -dirve if=ide,index=x,media=disk,cache=writeback,driver=quorum,\
+         *  children.0.file.filename=pdev_path,\
+         *  children.0.driver=format,\
+         *  children.1.file.host=host,\
+         *  children.1.file.port=port,\
+         *  children.1.file.export=exportname,\
+         *  children.1.file.driver=nbd+colo,\
+         *  children.1.driver=raw,\
+         *  children.1.ignored-errors=on,\
+         *  read-pattern=fifo
+         */
+
+        if (parse_colo_params(gc, colo_params, &host, &port, &exportname))
+            break;
+
+        drive = libxl__sprintf
+            (gc, "if=ide,index=%d,media=disk,cache=writeback,driver=quorum,"
+                 "children.0.file.filename=%s,"
+                 "children.0.driver=%s,"
+                 "children.1.file.host=%s,"
+                 "children.1.file.port=%s,"
+                 "children.1.file.export=%s,"
+                 "children.1.file.driver=nbd+colo,"
+                 "children.1.driver=raw,"
+                 "children.1.ignore-errors=on,"
+                 "read-pattern=fifo",
+             unit, pdev_path, format, host, port, exportname);
+        break;
+    case LIBXL__COLO_SECONDARY:
+        /*
+         * secondary:
+         *  -drive if=ide,index=x,media=disk,cache=writeback,driver=qcow2+colo,\
+         *  file=active_disk,\
+         *  backing_reference.drive_id=nbd_target,\
+         *  backing_reference.hidden-disk.file.filename=hidden_disk,\
+         *  backing_reference.hidden-disk.allow-write-backing-file=on,\
+         *  export=exportname,
+         */
+
+        if (parse_colo_params(gc, colo_params, &host, &port, &exportname))
+            break;
+
+        drive = libxl__sprintf
+            (gc, "if=ide,index=%d,media=disk,cache=writeback,driver=qcow2+colo,"
+                 "file=%s,"
+                 "backing_reference.drive_id=%s,"
+                 "backing_reference.hidden-disk.file.filename=%s,"
+                 "backing_reference.hidden-disk.allow-write-backing-file=on,"
+                 "export=%s",
+             unit, active_disk, nbd_target, hidden_disk, exportname);
+        break;
+    default:
+        abort();
+    }
+
+    if (!drive)
+        LIBXL__LOG(ctx, LIBXL__LOG_WARNING,
+                   "colo-params is invalid for %s", pdev_path);
+    return drive;
+}
+
 static char ** libxl__build_device_model_args_new(libxl__gc *gc,
                                         const char *dm, int guest_domid,
                                         const libxl_domain_config *guest_config,
@@ -803,6 +1008,8 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc,
             const char *format = qemu_disk_format_string(disks[i].format);
             char *drive;
             const char *pdev_path;
+            int colo_mode;
+            char *drive_id;
 
             if (dev_number == -1) {
                 LIBXL__LOG(ctx, LIBXL__LOG_WARNING, "unable to determine"
@@ -846,16 +1053,55 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc,
                  * For other disks we translate devices 0..3 into
                  * hd[a-d] and ignore the rest.
                  */
-                if (strncmp(disks[i].vdev, "sd", 2) == 0)
-                    drive = libxl__sprintf
-                        (gc, "file=%s,if=scsi,bus=0,unit=%d,format=%s,cache=writeback",
-                         pdev_path, disk, format);
-                else if (disk < 4)
+                if (libxl_defbool_val(disks[i].colo_enable)) {
+                    if (libxl_defbool_val(disks[i].colo_restore_enable))
+                        colo_mode = LIBXL__COLO_SECONDARY;
+                    else
+                        colo_mode = LIBXL__COLO_PRIMARY;
+                } else {
+                    colo_mode = LIBXL__COLO_NONE;
+                }
+
+                if (colo_mode == LIBXL__COLO_SECONDARY) {
+                    /*
+                     * -drive if=none,driver=format,file=pdev_path,\
+                     * id=nbd_targetx
+                     */
+                    if (strncmp(disks[i].vdev, "sd", 2) == 0) {
+                        drive_id = libxl__sprintf(gc, "nbd_target%d", disk + 4);
+                    } else if (disk < 4) {
+                        drive_id = libxl__sprintf(gc, "nbd_target%d", disk);
+                    } else {
+                        continue; /* Do not emulate this disk */
+                    }
                     drive = libxl__sprintf
-                        (gc, "file=%s,if=ide,index=%d,media=disk,format=%s,cache=writeback",
-                         pdev_path, disk, format);
-                else
+                        (gc, "if=none,driver=%s,file=%s,id=%s",
+                         format, pdev_path, drive_id);
+
+                    flexarray_append(dm_args, "-drive");
+                    flexarray_append(dm_args, drive);
+                } else {
+                    drive_id = NULL;
+                }
+
+                if (strncmp(disks[i].vdev, "sd", 2) == 0) {
+                    drive = qemu_disk_scsi_drive_string(gc, pdev_path, disk,
+                                                        format,
+                                                        &disks[i],
+                                                        drive_id,
+                                                        colo_mode);
+                } else if (disk < 4) {
+                    drive = qemu_disk_ide_drive_string(gc, pdev_path, disk,
+                                                       format,
+                                                       &disks[i],
+                                                       drive_id,
+                                                       colo_mode);
+                } else {
                     continue; /* Do not emulate this disk */
+                }
+
+                if (!drive)
+                    continue;
             }
 
             flexarray_append(dm_args, "-drive");
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index e204c58..87c52b9 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -508,6 +508,11 @@ libxl_device_disk = Struct("device_disk", [
     ("is_cdrom", integer),
     ("direct_io_safe", bool),
     ("discard_enable", libxl_defbool),
+    ("colo_enable", libxl_defbool),
+    ("colo_restore_enable", libxl_defbool),
+    ("colo_params", string),
+    ("active_disk", string),
+    ("hidden_disk", string)
     ])
 
 libxl_device_nic = Struct("device_nic", [
diff --git a/tools/libxl/libxlu_disk_l.l b/tools/libxl/libxlu_disk_l.l
index 1a5deb5..566aa1e 100644
--- a/tools/libxl/libxlu_disk_l.l
+++ b/tools/libxl/libxlu_disk_l.l
@@ -176,6 +176,11 @@ script=[^,]*,?	{ STRIP(','); SAVESTRING("script", script, FROMEQUALS); }
 direct-io-safe,? { DPC->disk->direct_io_safe = 1; }
 discard,?	{ libxl_defbool_set(&DPC->disk->discard_enable, true); }
 no-discard,?	{ libxl_defbool_set(&DPC->disk->discard_enable, false); }
+colo,?		{ libxl_defbool_set(&DPC->disk->colo_enable, true); }
+no-colo,?	{ libxl_defbool_set(&DPC->disk->colo_enable, false); }
+colo-params=[^,]*,?	{ STRIP(','); SAVESTRING("colo-params", colo_params, FROMEQUALS); }
+active-disk=[^,]*,?	{ STRIP(','); SAVESTRING("active-disk", active_disk, FROMEQUALS); }
+hidden-disk=[^,]*,?	{ STRIP(','); SAVESTRING("hidden-disk", hidden_disk, FROMEQUALS); }
 
  /* the target magic parameter, eats the rest of the string */
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 23/29] COLO: use qemu block replication
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (21 preceding siblings ...)
  2015-04-01  6:55 ` [RFC PATCH COLO v5 22/29] Support colo mode for qemu disk Yang Hongyang
@ 2015-04-01  6:57 ` Yang Hongyang
  2015-04-01  6:57 ` [RFC PATCH COLO v5 24/29] COLO proxy: implement setup/teardown of COLO proxy module Yang Hongyang
                   ` (5 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:57 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

From: Wen Congyang <wency@cn.fujitsu.com>

The guest should be paused before doing COLO!!!

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/Makefile             |   1 +
 tools/libxl/libxl_colo_qdisk.c   | 209 +++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_colo_restore.c |  21 +++-
 tools/libxl/libxl_colo_save.c    |  36 ++++++-
 tools/libxl/libxl_internal.h     |  18 ++++
 tools/libxl/libxl_qmp.c          |  31 ++++++
 6 files changed, 312 insertions(+), 4 deletions(-)
 create mode 100644 tools/libxl/libxl_colo_qdisk.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index b2eaf14..12caf4c 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -58,6 +58,7 @@ endif
 
 LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
 LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o
+LIBXL_OBJS-y += libxl_colo_qdisk.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl_colo_qdisk.c b/tools/libxl/libxl_colo_qdisk.c
new file mode 100644
index 0000000..d73572e
--- /dev/null
+++ b/tools/libxl/libxl_colo_qdisk.c
@@ -0,0 +1,209 @@
+/*
+ * Copyright (C) 2015 FUJITSU LIMITED
+ * Author: Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+typedef struct libxl__colo_qdisk {
+    libxl__checkpoint_device *dev;
+} libxl__colo_qdisk;
+
+/* ========== init() and cleanup() ========== */
+int init_subkind_qdisk(libxl__checkpoint_devices_state *cds)
+{
+    /*
+     * We don't know if we use qemu block replication, so
+     * we cannot start block replication here.
+     */
+    return 0;
+}
+
+void cleanup_subkind_qdisk(libxl__checkpoint_devices_state *cds)
+{
+}
+
+/* ========== setup() and teardown() ========== */
+static void colo_qdisk_setup(libxl__egc *egc, libxl__checkpoint_device *dev,
+                             bool primary)
+{
+    const libxl_device_disk *disk = dev->backend_dev;
+    const char *addr = NULL;
+    const char *export_name;
+    int ret, rc = 0;
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds = dev->cds;
+    const char *colo_params = disk->colo_params;
+    const int domid = cds->domid;
+
+    EGC_GC;
+
+    if (disk->backend != LIBXL_DISK_BACKEND_QDISK ||
+        !libxl_defbool_val(disk->colo_enable)) {
+        rc = ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH;
+        goto out;
+    }
+
+    export_name = strstr(colo_params, ":exportname=");
+    if (!export_name) {
+        rc = ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH;
+        goto out;
+    }
+    export_name += strlen(":exportname=");
+    if (export_name[0] == 0) {
+        rc = ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH;
+        goto out;
+    }
+
+    dev->matched = 1;
+
+    if (primary) {
+        /* NBD server is not ready, so we cannot start block replication now */
+        goto out;
+    } else {
+        libxl__colo_restore_state *crs = CONTAINER_OF(cds, *crs, cds);
+        int len;
+
+        if (crs->qdisk_setuped)
+            goto out;
+
+        crs->qdisk_setuped = true;
+
+        len = export_name - strlen(":exportname=") - colo_params;
+        addr = libxl__strndup(gc, colo_params, len);
+    }
+
+    ret = libxl__qmp_block_start_replication(gc, domid, primary, addr);
+    if (ret)
+        rc = ERROR_FAIL;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void colo_qdisk_teardown(libxl__egc *egc, libxl__checkpoint_device *dev,
+                                bool primary)
+{
+    int ret, rc = 0;
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds = dev->cds;
+    const int domid = cds->domid;
+
+    EGC_GC;
+
+    if (primary) {
+        libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds);
+
+        if (!css->qdisk_setuped)
+            goto out;
+
+        css->qdisk_setuped = false;
+    } else {
+        libxl__colo_restore_state *crs = CONTAINER_OF(cds, *crs, cds);
+
+        if (!crs->qdisk_setuped)
+            goto out;
+
+        crs->qdisk_setuped = false;
+    }
+
+    ret = libxl__qmp_block_stop_replication(gc, domid, primary);
+    if (ret)
+        rc = ERROR_FAIL;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+/* ========== checkpointing APIs ========== */
+/* should be called after libxl__checkpoint_device_instance_ops.preresume */
+int colo_qdisk_preresume(libxl_ctx *ctx, domid_t domid)
+{
+    GC_INIT(ctx);
+    int ret;
+
+    ret = libxl__qmp_block_do_checkpoint(gc, domid);
+
+    GC_FREE;
+    return ret;
+}
+
+static void colo_qdisk_save_preresume(libxl__egc *egc,
+                                      libxl__checkpoint_device *dev)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(dev->cds, *css, cds);
+    int ret, rc = 0;
+
+    /* Convenience aliases */
+    const int domid = dev->cds->domid;
+
+    EGC_GC;
+
+    if (css->qdisk_setuped)
+        goto out;
+
+    css->qdisk_setuped = true;
+
+    ret = libxl__qmp_block_start_replication(gc, domid, true, NULL);
+    if (ret)
+        rc = ERROR_FAIL;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+/* ======== primary ======== */
+static void colo_qdisk_save_setup(libxl__egc *egc,
+                                  libxl__checkpoint_device *dev)
+{
+    colo_qdisk_setup(egc, dev, true);
+}
+
+static void colo_qdisk_save_teardown(libxl__egc *egc,
+                                   libxl__checkpoint_device *dev)
+{
+    colo_qdisk_teardown(egc, dev, true);
+}
+
+const libxl__checkpoint_device_instance_ops colo_save_device_qdisk = {
+    .kind = LIBXL__DEVICE_KIND_VBD,
+    .setup = colo_qdisk_save_setup,
+    .teardown = colo_qdisk_save_teardown,
+    .preresume = colo_qdisk_save_preresume,
+};
+
+/* ======== secondary ======== */
+static void colo_qdisk_restore_setup(libxl__egc *egc,
+                                     libxl__checkpoint_device *dev)
+{
+    colo_qdisk_setup(egc, dev, false);
+}
+
+static void colo_qdisk_restore_teardown(libxl__egc *egc,
+                                      libxl__checkpoint_device *dev)
+{
+    colo_qdisk_teardown(egc, dev, false);
+}
+
+const libxl__checkpoint_device_instance_ops colo_restore_device_qdisk = {
+    .kind = LIBXL__DEVICE_KIND_VBD,
+    .setup = colo_qdisk_restore_setup,
+    .teardown = colo_qdisk_restore_teardown,
+};
diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
index 554474d..28eb8ab 100644
--- a/tools/libxl/libxl_colo_restore.c
+++ b/tools/libxl/libxl_colo_restore.c
@@ -64,7 +64,10 @@ static void libxl__colo_restore_domain_resume_callback(void *data);
 static void libxl__colo_restore_domain_checkpoint_callback(void *data);
 static void libxl__colo_restore_domain_suspend_callback(void *data);
 
+extern const libxl__checkpoint_device_instance_ops colo_restore_device_qdisk;
+
 static const libxl__checkpoint_device_instance_ops *colo_restore_ops[] = {
+    &colo_restore_device_qdisk,
     NULL,
 };
 
@@ -163,7 +166,11 @@ static int init_device_subkind(libxl__checkpoint_devices_state *cds)
     int rc;
     STATE_AO_GC(cds->ao);
 
+    rc = init_subkind_qdisk(cds);
+    if (rc)  goto out;
+
     rc = 0;
+out:
     return rc;
 }
 
@@ -171,6 +178,8 @@ static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
 {
     /* cleanup device subkind-specific state in the libxl ctx */
     STATE_AO_GC(cds->ao);
+
+    cleanup_subkind_qdisk(cds);
 }
 
 
@@ -282,6 +291,8 @@ void libxl__colo_restore_setup(libxl__egc *egc,
     logdirty_init(&crcs->lds);
     crcs->lds.ao = ao;
 
+    crs->qdisk_setuped = false;
+
     rc = 0;
 
 out:
@@ -590,6 +601,12 @@ static void colo_restore_preresume_cb(libxl__egc *egc,
         goto out;
     }
 
+    rc = colo_qdisk_preresume(CTX, crs->domid);
+    if (rc) {
+        LOG(ERROR, "colo_qdisk_preresume() fails");
+        goto out;
+    }
+
     colo_restore_resume_vm(egc, crcs);
 
     return;
@@ -775,8 +792,8 @@ static void colo_setup_checkpoint_devices(libxl__egc *egc,
 
     STATE_AO_GC(crs->ao);
 
-    /* TODO: disk/nic support */
-    cds->device_kind_flags = 0;
+    /* TODO: nic support */
+    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VBD);
     cds->callback = colo_restore_setup_cds_done;
     cds->ao = ao;
     cds->domid = crs->domid;
diff --git a/tools/libxl/libxl_colo_save.c b/tools/libxl/libxl_colo_save.c
index bb5b434..de270e2 100644
--- a/tools/libxl/libxl_colo_save.c
+++ b/tools/libxl/libxl_colo_save.c
@@ -18,7 +18,10 @@
 #include "libxl_internal.h"
 #include "libxl_colo.h"
 
+extern const libxl__checkpoint_device_instance_ops colo_save_device_qdisk;
+
 static const libxl__checkpoint_device_instance_ops *colo_ops[] = {
+    &colo_save_device_qdisk,
     NULL,
 };
 
@@ -29,7 +32,11 @@ static int init_device_subkind(libxl__checkpoint_devices_state *cds)
     int rc;
     STATE_AO_GC(cds->ao);
 
+    rc = init_subkind_qdisk(cds);
+    if (rc) goto out;
+
     rc = 0;
+out:
     return rc;
 }
 
@@ -37,6 +44,8 @@ static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
 {
     /* cleanup device subkind-specific state in the libxl ctx */
     STATE_AO_GC(cds->ao);
+
+    cleanup_subkind_qdisk(cds);
 }
 
 /* ================= colo: setup save environment ================= */
@@ -64,9 +73,11 @@ void libxl__colo_save_setup(libxl__egc *egc, libxl__colo_save_state *css)
     css->send_fd = dss->fd;
     css->recv_fd = dss->recv_fd;
     css->svm_running = false;
+    css->paused = true;
+    css->qdisk_setuped = false;
 
-    /* TODO: disk/nic support */
-    cds->device_kind_flags = 0;
+    /* TODO: nic support */
+    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VBD);
     cds->ops = colo_ops;
     cds->callback = colo_save_setup_done;
     cds->ao = ao;
@@ -452,12 +463,33 @@ static void colo_preresume_cb(libxl__egc *egc,
         goto out;
     }
 
+    if (!css->paused) {
+        rc = colo_qdisk_preresume(CTX, dss->domid);
+        if (rc) {
+            LOG(ERROR, "colo_qdisk_preresume() fails");
+            goto out;
+        }
+    }
+
     /* Resumes the domain and the device model */
     if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1)) {
         LOG(ERROR, "cannot resume primary vm");
         goto out;
     }
 
+    /*
+     * The guest should be paused before doing colo because there is
+     * no disk migration.
+     */
+    if (css->paused) {
+        rc = libxl__domain_unpause(gc, dss->domid);
+        if (rc) {
+            LOG(ERROR, "cannot unpause primary vm");
+            goto out;
+        }
+        css->paused = false;
+    }
+
     /* read LIBXL_COLO_SVM_RESUMED */
     memset(dc, 0, sizeof(*dc));
     dc->ao = ao;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 7065f71..3a92cb9 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1641,6 +1641,14 @@ _hidden int libxl__qmp_set_global_dirty_log(libxl__gc *gc, int domid, bool enabl
 _hidden int libxl__qmp_insert_cdrom(libxl__gc *gc, int domid, const libxl_device_disk *disk);
 /* Add a virtual CPU */
 _hidden int libxl__qmp_cpu_add(libxl__gc *gc, int domid, int index);
+/* Start block replication */
+_hidden int libxl__qmp_block_start_replication(libxl__gc *gc, int domid,
+                                               bool primary, const char *addr);
+/* Do block checkpoint */
+_hidden int libxl__qmp_block_do_checkpoint(libxl__gc *gc, int domid);
+/* Stop block replication */
+_hidden int libxl__qmp_block_stop_replication(libxl__gc *gc, int domid,
+                                              bool primary);
 /* close and free the QMP handler */
 _hidden void libxl__qmp_close(libxl__qmp_handler *qmp);
 /* remove the socket file, if the file has already been removed,
@@ -2712,6 +2720,9 @@ int init_subkind_nic(libxl__checkpoint_devices_state *cds);
 void cleanup_subkind_nic(libxl__checkpoint_devices_state *cds);
 int init_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
 void cleanup_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
+int init_subkind_qdisk(libxl__checkpoint_devices_state *cds);
+void cleanup_subkind_qdisk(libxl__checkpoint_devices_state *cds);
+int colo_qdisk_preresume(libxl_ctx *ctx, domid_t domid);
 
 typedef void libxl__checkpoint_callback(libxl__egc *,
                                         libxl__checkpoint_devices_state *,
@@ -2837,6 +2848,10 @@ struct libxl__colo_save_state {
     uint8_t temp_buff[9];
     void (*callback)(libxl__egc *, libxl__colo_save_state *);
     bool svm_running;
+    bool paused;
+
+    /* private, used by qdisk block replication */
+    bool qdisk_setuped;
 };
 
 /*----- Domain suspend (save) state structure -----*/
@@ -3178,6 +3193,9 @@ struct libxl__colo_restore_state {
     libxl__domain_create_cb *saved_cb;
     void *crcs;
     libxl__checkpoint_devices_state cds;
+
+    /* private, used by qdisk block replication */
+    bool qdisk_setuped;
 };
 
 struct libxl__domain_create_state {
diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index 1b66d55..33e9eb4 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -965,6 +965,37 @@ int libxl__qmp_cpu_add(libxl__gc *gc, int domid, int idx)
     return qmp_run_command(gc, domid, "cpu-add", args, NULL, NULL);
 }
 
+int libxl__qmp_block_start_replication(libxl__gc *gc, int domid,
+                                       bool primary, const char *addr)
+{
+    libxl__json_object *args = NULL;
+
+    qmp_parameters_add_bool(gc, &args, "enable", true);
+    qmp_parameters_add_bool(gc, &args, "primary", primary);
+    if (!primary)
+        qmp_parameters_add_string(gc, &args, "addr", addr);
+
+    return qmp_run_command(gc, domid, "xen-set-block-replication", args,
+                           NULL, NULL);
+}
+
+int libxl__qmp_block_do_checkpoint(libxl__gc *gc, int domid)
+{
+    return qmp_run_command(gc, domid, "xen-do-block-checkpoint", NULL,
+                           NULL, NULL);
+}
+
+int libxl__qmp_block_stop_replication(libxl__gc *gc, int domid, bool primary)
+{
+    libxl__json_object *args = NULL;
+
+    qmp_parameters_add_bool(gc, &args, "enable", false);
+    qmp_parameters_add_bool(gc, &args, "primary", primary);
+
+    return qmp_run_command(gc, domid, "xen-set-block-replication", args,
+                           NULL, NULL);
+}
+
 int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
                                const libxl_domain_config *guest_config)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 24/29] COLO proxy: implement setup/teardown of COLO proxy module
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (22 preceding siblings ...)
  2015-04-01  6:57 ` [RFC PATCH COLO v5 23/29] COLO: use qemu block replication Yang Hongyang
@ 2015-04-01  6:57 ` Yang Hongyang
  2015-04-01  6:57 ` [RFC PATCH COLO v5 25/29] COLO proxy: preresume, postresume and checkpoint Yang Hongyang
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:57 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

setup/teardown of COLO proxy module.
we use netlink to communicate with proxy module.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxl/Makefile           |   1 +
 tools/libxl/libxl_colo.h       |   2 +
 tools/libxl/libxl_colo_proxy.c | 210 +++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h   |   9 ++
 4 files changed, 222 insertions(+)
 create mode 100644 tools/libxl/libxl_colo_proxy.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 12caf4c..c74ba79 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -59,6 +59,7 @@ endif
 LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
 LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o
 LIBXL_OBJS-y += libxl_colo_qdisk.o
+LIBXL_OBJS-y += libxl_colo_proxy.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
index 26a2563..5983aa0 100644
--- a/tools/libxl/libxl_colo.h
+++ b/tools/libxl/libxl_colo.h
@@ -45,4 +45,6 @@ extern void libxl__colo_save_teardown(libxl__egc *egc,
                                       libxl__colo_save_state *css,
                                       int rc);
 
+extern int colo_proxy_setup(libxl__colo_proxy_state *cps);
+extern void colo_proxy_teardown(libxl__colo_proxy_state *cps);
 #endif
diff --git a/tools/libxl/libxl_colo_proxy.c b/tools/libxl/libxl_colo_proxy.c
new file mode 100644
index 0000000..486ed73
--- /dev/null
+++ b/tools/libxl/libxl_colo_proxy.c
@@ -0,0 +1,210 @@
+/*
+ * Copyright (C) 2015 FUJITSU LIMITED
+ * Author: Yang Hongyang <yanghy@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+#include "libxl_colo.h"
+#include <linux/netlink.h>
+
+#define NETLINK_COLO 28
+
+enum colo_netlink_op {
+    COLO_QUERY_CHECKPOINT = (NLMSG_MIN_TYPE + 1),
+    COLO_CHECKPOINT,
+    COLO_FAILOVER,
+    COLO_PROXY_INIT,
+    COLO_PROXY_RESET, /* UNUSED, will be used for continuous FT */
+};
+
+/* ========= colo-proxy: helper functions ========== */
+
+static int colo_proxy_send(libxl__colo_proxy_state *cps, uint8_t *buff, uint64_t size, int type)
+{
+    struct sockaddr_nl sa;
+    struct nlmsghdr msg;
+    struct iovec iov;
+    struct msghdr mh;
+    int ret;
+
+    STATE_AO_GC(cps->ao);
+
+    memset(&sa, 0, sizeof(sa));
+    sa.nl_family = AF_NETLINK;
+    sa.nl_pid = 0;
+    sa.nl_groups = 0;
+
+    msg.nlmsg_len = NLMSG_SPACE(0);
+    msg.nlmsg_flags = NLM_F_REQUEST;
+    if (type == COLO_PROXY_INIT) {
+        msg.nlmsg_flags |= NLM_F_ACK;
+    }
+    msg.nlmsg_seq = 0;
+    /* This is untrusty */
+    msg.nlmsg_pid = cps->index;
+    msg.nlmsg_type = type;
+
+    iov.iov_base = &msg;
+    iov.iov_len = msg.nlmsg_len;
+
+    mh.msg_name = &sa;
+    mh.msg_namelen = sizeof(sa);
+    mh.msg_iov = &iov;
+    mh.msg_iovlen = 1;
+    mh.msg_control = NULL;
+    mh.msg_controllen = 0;
+    mh.msg_flags = 0;
+
+    ret = sendmsg(cps->sock_fd, &mh, 0);
+    if (ret <= 0) {
+        LOG(ERROR, "can't send msg to kernel by netlink: %s",
+            strerror(errno));
+    }
+
+    return ret;
+}
+
+/* error: return -1, otherwise return 0 */
+static int64_t colo_proxy_recv(libxl__colo_proxy_state *cps, uint8_t **buff, int flags)
+{
+    struct sockaddr_nl sa;
+    struct iovec iov;
+    struct msghdr mh = {
+        .msg_name = &sa,
+        .msg_namelen = sizeof(sa),
+        .msg_iov = &iov,
+        .msg_iovlen = 1,
+    };
+    uint32_t size = 16384;
+    int64_t len = 0;
+    int ret;
+
+    STATE_AO_GC(cps->ao);
+    uint8_t *tmp = libxl__malloc(gc, size);
+
+    iov.iov_base = tmp;
+    iov.iov_len = size;
+next:
+   ret = recvmsg(cps->sock_fd, &mh, flags);
+    if (ret <= 0) {
+        goto out;
+    }
+
+    len += ret;
+    if (mh.msg_flags & MSG_TRUNC) {
+        size += 16384;
+        tmp = libxl__realloc(gc, tmp, size);
+        iov.iov_base = tmp + len;
+        iov.iov_len = size - len;
+        goto next;
+    }
+
+    *buff = tmp;
+    return len;
+
+out:
+    free(tmp);
+    *buff = NULL;
+    return ret;
+}
+
+/* ========= colo-proxy: setup and teardown ========== */
+
+int colo_proxy_setup(libxl__colo_proxy_state *cps)
+{
+    int skfd = 0;
+    struct sockaddr_nl sa;
+    struct nlmsghdr *h;
+    struct timeval tv = {0, 500000}; /* timeout for recvmsg from kernel */
+    int i = 1;
+    int ret = ERROR_FAIL;
+    uint8_t *buff = NULL;
+    int64_t size;
+
+    STATE_AO_GC(cps->ao);
+
+    skfd = socket(PF_NETLINK, SOCK_RAW, NETLINK_COLO);
+    if (skfd < 0) {
+        LOG(ERROR, "can not create a netlink socket: %s", strerror(errno));
+        goto out;
+    }
+    cps->sock_fd = skfd;
+    memset(&sa, 0, sizeof(sa));
+    sa.nl_family = AF_NETLINK;
+    sa.nl_groups = 0;
+retry:
+    sa.nl_pid = i++;
+
+    if (i > 10) {
+        LOG(ERROR, "netlink bind error");
+        goto out;
+    }
+
+    ret = bind(skfd, (struct sockaddr *)&sa, sizeof(sa));
+    if (ret < 0 && errno == EADDRINUSE) {
+        LOG(ERROR, "colo index %d has already in used", sa.nl_pid);
+        goto retry;
+    }
+
+    cps->index = sa.nl_pid;
+    ret = colo_proxy_send(cps, NULL, 0, COLO_PROXY_INIT);
+    if (ret < 0) {
+        goto out;
+    }
+    setsockopt(cps->sock_fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
+    ret = -1;
+    size = colo_proxy_recv(cps, &buff, 0);
+    /* disable SO_RCVTIMEO */
+    tv.tv_usec = 0;
+    setsockopt(cps->sock_fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
+    if (size < 0) {
+        LOG(ERROR, "Can't recv msg from kernel by netlink: %s",
+            strerror(errno));
+        goto out;
+    }
+
+    if (size) {
+        h = (struct nlmsghdr *)buff;
+
+        if (h->nlmsg_type == NLMSG_ERROR) {
+            struct nlmsgerr *err = (struct nlmsgerr *)NLMSG_DATA(h);
+            if (size - sizeof(*h) < sizeof(*err)) {
+                goto out;
+            }
+            ret = -err->error;
+            if (ret) {
+                goto out;
+            }
+        }
+    }
+
+    ret = 0;
+
+out:
+    free(buff);
+    if (ret) {
+        close(cps->sock_fd);
+        cps->sock_fd = -1;
+    }
+    return ret;
+}
+
+void colo_proxy_teardown(libxl__colo_proxy_state *cps)
+{
+    if (cps->sock_fd >= 0) {
+        close(cps->sock_fd);
+        cps->sock_fd = -1;
+    }
+}
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 3a92cb9..b091958 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2833,6 +2833,15 @@ struct libxl__remus_state {
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
 /*----- colo related state structure -----*/
+typedef struct libxl__colo_proxy_state libxl__colo_proxy_state;
+struct libxl__colo_proxy_state {
+    /* set by caller of colo_proxy_setup */
+    libxl__ao *ao;
+
+    int sock_fd;
+    int index;
+};
+
 typedef struct libxl__colo_save_state libxl__colo_save_state;
 struct libxl__colo_save_state {
     libxl__checkpoint_devices_state cds;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 25/29] COLO proxy: preresume, postresume and checkpoint
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (23 preceding siblings ...)
  2015-04-01  6:57 ` [RFC PATCH COLO v5 24/29] COLO proxy: implement setup/teardown of COLO proxy module Yang Hongyang
@ 2015-04-01  6:57 ` Yang Hongyang
  2015-04-01  6:57 ` [RFC PATCH COLO v5 26/29] COLO nic: implement COLO nic subkind Yang Hongyang
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:57 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

preresume, postresume and checkpoint

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxl/libxl_colo.h       |  3 +++
 tools/libxl/libxl_colo_proxy.c | 57 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)

diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
index 5983aa0..872c652 100644
--- a/tools/libxl/libxl_colo.h
+++ b/tools/libxl/libxl_colo.h
@@ -47,4 +47,7 @@ extern void libxl__colo_save_teardown(libxl__egc *egc,
 
 extern int colo_proxy_setup(libxl__colo_proxy_state *cps);
 extern void colo_proxy_teardown(libxl__colo_proxy_state *cps);
+extern void colo_proxy_preresume(libxl__colo_proxy_state *cps);
+extern void colo_proxy_postresume(libxl__colo_proxy_state *cps);
+extern int colo_proxy_checkpoint(libxl__colo_proxy_state *cps);
 #endif
diff --git a/tools/libxl/libxl_colo_proxy.c b/tools/libxl/libxl_colo_proxy.c
index 486ed73..2483be3 100644
--- a/tools/libxl/libxl_colo_proxy.c
+++ b/tools/libxl/libxl_colo_proxy.c
@@ -208,3 +208,60 @@ void colo_proxy_teardown(libxl__colo_proxy_state *cps)
         cps->sock_fd = -1;
     }
 }
+
+/* ========= colo-proxy: preresume, postresume and checkpoint ========== */
+
+void colo_proxy_preresume(libxl__colo_proxy_state *cps)
+{
+    colo_proxy_send(cps, NULL, 0, COLO_CHECKPOINT);
+    /* TODO: need to handle if the call fails... */
+}
+
+void colo_proxy_postresume(libxl__colo_proxy_state *cps)
+{
+    /* nothing to do... */
+}
+
+
+typedef struct colo_msg {
+    bool is_checkpoint;
+} colo_msg;
+
+/*
+do checkpoint: return 1
+error: return -1
+do not checkpoint: return 0
+*/
+int colo_proxy_checkpoint(libxl__colo_proxy_state *cps)
+{
+    uint8_t *buff;
+    int64_t size;
+    struct nlmsghdr *h;
+    struct colo_msg *m;
+    int ret = -1;
+
+    size = colo_proxy_recv(cps, &buff, MSG_DONTWAIT);
+
+    /* timeout, return no checkpoint message. */
+    if (size <= 0) {
+        return 0;
+    }
+
+    h = (struct nlmsghdr *) buff;
+
+    if (h->nlmsg_type == NLMSG_ERROR) {
+        goto out;
+    }
+
+    if (h->nlmsg_len < NLMSG_LENGTH(sizeof(*m))) {
+        goto out;
+    }
+
+    m = NLMSG_DATA(h);
+
+    ret = m->is_checkpoint ? 1 : 0;
+
+out:
+    free(buff);
+    return ret;
+}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 26/29] COLO nic: implement COLO nic subkind
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (24 preceding siblings ...)
  2015-04-01  6:57 ` [RFC PATCH COLO v5 25/29] COLO proxy: preresume, postresume and checkpoint Yang Hongyang
@ 2015-04-01  6:57 ` Yang Hongyang
  2015-04-01  6:58 ` [RFC PATCH COLO v5 27/29] setup and control colo proxy on primary side Yang Hongyang
                   ` (2 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:57 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

implement COLO nic subkind.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/hotplug/Linux/Makefile         |   1 +
 tools/hotplug/Linux/colo-proxy-setup | 128 ++++++++++++++
 tools/libxl/Makefile                 |   1 +
 tools/libxl/libxl_colo_nic.c         | 313 +++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h         |   5 +
 tools/libxl/libxl_types.idl          |   2 +
 6 files changed, 450 insertions(+)
 create mode 100755 tools/hotplug/Linux/colo-proxy-setup
 create mode 100644 tools/libxl/libxl_colo_nic.c

diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index d94a9cb..1c28bea 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -25,6 +25,7 @@ XEN_SCRIPTS += vscsi
 XEN_SCRIPTS += block-iscsi
 XEN_SCRIPTS += block-drbd-probe
 XEN_SCRIPTS += $(XEN_SCRIPTS-y)
+XEN_SCRIPTS += colo-proxy-setup
 
 SUBDIRS-$(CONFIG_SYSTEMD) += systemd
 
diff --git a/tools/hotplug/Linux/colo-proxy-setup b/tools/hotplug/Linux/colo-proxy-setup
new file mode 100755
index 0000000..850f672
--- /dev/null
+++ b/tools/hotplug/Linux/colo-proxy-setup
@@ -0,0 +1,128 @@
+#! /bin/bash
+
+dir=$(dirname "$0")
+. "$dir/xen-hotplug-common.sh"
+. "$dir/hotplugpath.sh"
+. "$dir/xen-network-ft.sh"
+
+findCommand "$@"
+
+if [ "$command" != "setup" -a  "$command" != "teardown" ]
+then
+    echo "Invalid command: $command"
+    log err "Invalid command: $command"
+    exit 1
+fi
+
+evalVariables "$@"
+
+: ${vifname:?}
+: ${forwarddev:?}
+: ${mode:?}
+: ${forwardbr:?}
+: ${index:?}
+: ${bridge:?}
+
+if [ "$mode" != "primary" -a "$mode" != "secondary" ]
+then
+    echo "Invalid mode: $mode"
+    log err "Invalid mode: $mode"
+    exit 1
+fi
+
+if [ $index -lt 0 ] || [ $index -gt 100 ]; then
+    echo "index overflow"
+    exit 1
+fi
+
+function setup_primary()
+{
+    do_without_error tc qdisc add dev $vifname root handle 1: prio
+    do_without_error tc filter add dev $vifname parent 1: protocol ip prio 10 \
+        u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter add dev $vifname parent 1: protocol arp prio 11 \
+        u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter add dev $vifname parent 1: protocol ipv6 prio \
+        12 u32 match u32 0 0 flowid 1:2 action mirred egress mirror \
+        dev $forwarddev
+
+    do_without_error modprobe nf_conntrack_ipv4
+    do_without_error modprobe xt_PMYCOLO sec_dev=$forwarddev
+
+    do_without_error /usr/local/sbin/iptables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    do_without_error /usr/local/sbin/ip6tables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    do_without_error /usr/local/sbin/arptables -I INPUT -i $forwarddev -j MARK --set-mark $index
+}
+
+function teardown_primary()
+{
+    do_without_error tc filter del dev $vifname parent 1: protocol ip prio 10 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter del dev $vifname parent 1: protocol arp prio 11 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter del dev $vifname parent 1: protocol ipv6 prio 12 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc qdisc del dev $vifname root handle 1: prio
+
+    do_without_error /usr/local/sbin/iptables -t mangle -F
+    do_without_error /usr/local/sbin/ip6tables -t mangle -F
+    do_without_error /usr/local/sbin/arptables -F
+    do_without_error rmmod xt_PMYCOLO
+}
+
+function setup_secondary()
+{
+    do_without_error brctl delif $bridge $vifname
+    do_without_error brctl addif $forwardbr $vifname
+    do_without_error brctl addif $forwardbr $forwarddev
+    do_without_error modprobe xt_SECCOLO
+
+    do_without_error /usr/local/sbin/iptables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+    do_without_error /usr/local/sbin/ip6tables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+}
+
+function teardown_secondary()
+{
+    do_without_error brctl delif $forwardbr $forwarddev
+    do_without_error brctl delif $forwardbr $vifname
+    do_without_error brctl addif $bridge $vifname
+
+    do_without_error /usr/local/sbin/iptables -t mangle -F
+    do_without_error /usr/local/sbin/ip6tables -t mangle -F
+    do_without_error rmmod xt_SECCOLO
+}
+
+case "$command" in
+    setup)
+        if [ "$mode" = "primary" ]
+        then
+            setup_primary
+        else
+            setup_secondary
+        fi
+
+        success
+        ;;
+    teardown)
+        if [ "$mode" = "primary" ]
+        then
+            teardown_primary
+        else
+            teardown_secondary
+        fi
+        ;;
+esac
+
+if [ "$mode" = "primary" ]
+then
+    log debug "Successful colo-proxy-setup $command for $vifname." \
+              " vifname: $vifname, index: $index, forwarddev: $forwarddev."
+else
+    log debug "Successful colo-proxy-setup $command for $vifname." \
+              " vifname: $vifname, index: $index, forwarddev: $forwarddev,"\
+              " forwardbr: $forwardbr."
+fi
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index c74ba79..84d8278 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -60,6 +60,7 @@ LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
 LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o
 LIBXL_OBJS-y += libxl_colo_qdisk.o
 LIBXL_OBJS-y += libxl_colo_proxy.o
+LIBXL_OBJS-y += libxl_colo_nic.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl_colo_nic.c b/tools/libxl/libxl_colo_nic.c
new file mode 100644
index 0000000..3d2e493
--- /dev/null
+++ b/tools/libxl/libxl_colo_nic.c
@@ -0,0 +1,313 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+typedef struct libxl__colo_device_nic {
+    int devid;
+    const char *vif;
+} libxl__colo_device_nic;
+
+enum {
+    primary,
+    secondary,
+};
+
+
+/* ========== init() and cleanup() ========== */
+int init_subkind_colo_nic(libxl__checkpoint_devices_state *cds)
+{
+    return 0;
+}
+
+void cleanup_subkind_colo_nic(libxl__checkpoint_devices_state *cds)
+{
+}
+
+/* ========== helper functions ========== */
+static void colo_save_setup_script_cb(libxl__egc *egc,
+                                     libxl__async_exec_state *aes,
+                                     int status);
+static void colo_save_teardown_script_cb(libxl__egc *egc,
+                                         libxl__async_exec_state *aes,
+                                         int status);
+
+/*
+ * If the device has a vifname, then use that instead of
+ * the vifX.Y format.
+ * it must ONLY be used for remus because if driver domains
+ * were in use it would constitute a security vulnerability.
+ */
+static const char *get_vifname(libxl__checkpoint_device *dev,
+                               const libxl_device_nic *nic)
+{
+    const char *vifname = NULL;
+    const char *path;
+    int rc;
+
+    STATE_AO_GC(dev->cds->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = dev->cds->domid;
+
+    path = GCSPRINTF("%s/backend/vif/%d/%d/vifname",
+                     libxl__xs_get_dompath(gc, 0), domid, nic->devid);
+    rc = libxl__xs_read_checked(gc, XBT_NULL, path, &vifname);
+    if (!rc && !vifname) {
+        vifname = libxl__device_nic_devname(gc, domid,
+                                            nic->devid,
+                                            nic->nictype);
+    }
+
+    return vifname;
+}
+
+/*
+ * the script needs the following env & args
+ * $vifname
+ * $forwarddev
+ * $mode(primary/secondary)
+ * $forwardbr
+ * $index
+ * $bridge
+ * setup/teardown as command line arg.
+ */
+static void setup_async_exec(libxl__checkpoint_device *dev, char *op, int side,
+                             char *colo_proxy_script)
+{
+    int arraysize, nr = 0;
+    char **env = NULL, **args = NULL;
+    libxl__colo_device_nic *colo_nic = dev->concrete_data;
+    libxl__checkpoint_devices_state *cds = dev->cds;
+    libxl__async_exec_state *aes = &dev->aodev.aes;
+    const libxl_device_nic *nic = dev->backend_dev;
+    libxl__colo_save_state *css = CONTAINER_OF(dev->cds, *css, cds);
+
+    STATE_AO_GC(cds->ao);
+
+    /* Convenience aliases */
+    const char *const vif = colo_nic->vif;
+
+    arraysize = 13;
+    GCNEW_ARRAY(env, arraysize);
+    env[nr++] = "vifname";
+    env[nr++] = libxl__strdup(gc, vif);
+    env[nr++] = "forwarddev";
+    env[nr++] = libxl__strdup(gc, nic->forwarddev);
+    env[nr++] = "mode";
+    if (side == primary)
+        env[nr++] = "primary";
+    else
+        env[nr++] = "secondary";
+    env[nr++] = "forwardbr";
+    env[nr++] = libxl__strdup(gc, nic->forwardbr);
+    env[nr++] = "index";
+    env[nr++] = GCSPRINTF("%d", css->cps.index);
+    env[nr++] = "bridge";
+    env[nr++] = libxl__strdup(gc, nic->bridge);
+    env[nr++] = NULL;
+    assert(nr == arraysize);
+
+    arraysize = 3; nr = 0;
+    GCNEW_ARRAY(args, arraysize);
+    args[nr++] = colo_proxy_script;
+    args[nr++] = op;
+    args[nr++] = NULL;
+    assert(nr == arraysize);
+
+    aes->ao = dev->cds->ao;
+    aes->what = GCSPRINTF("%s %s", args[0], args[1]);
+    aes->env = env;
+    aes->args = args;
+    aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
+    aes->stdfds[0] = -1;
+    aes->stdfds[1] = -1;
+    aes->stdfds[2] = -1;
+
+    if (!strcmp(op, "teardown"))
+        aes->callback = colo_save_teardown_script_cb;
+    else
+        aes->callback = colo_save_setup_script_cb;
+}
+
+/* ========== setup() and teardown() ========== */
+static void colo_nic_setup(libxl__egc *egc, libxl__checkpoint_device *dev,
+                           int side, char *colo_proxy_script)
+{
+    int rc;
+    libxl__colo_device_nic *colo_nic;
+    const libxl_device_nic *nic = dev->backend_dev;
+
+    STATE_AO_GC(dev->cds->ao);
+
+    /*
+     * thers's no subkind of nic devices, so nic ops is always matched
+     * with nic devices, we begin to setup the nic device
+     */
+    dev->matched = 1;
+
+    if (!nic->forwarddev || !nic->forwardbr) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    GCNEW(colo_nic);
+    dev->concrete_data = colo_nic;
+    colo_nic->devid = nic->devid;
+    colo_nic->vif = get_vifname(dev, nic);
+    if (!colo_nic->vif) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    setup_async_exec(dev, "setup", side, colo_proxy_script);
+    rc = libxl__async_exec_start(gc, &dev->aodev.aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void colo_save_setup_script_cb(libxl__egc *egc,
+                                      libxl__async_exec_state *aes,
+                                      int status)
+{
+    libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+    libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__colo_device_nic *colo_nic = dev->concrete_data;
+    libxl__checkpoint_devices_state *cds = dev->cds;
+    const char *out_path_base, *hotplug_error = NULL;
+    int rc;
+
+    STATE_AO_GC(cds->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = cds->domid;
+    const int devid = colo_nic->devid;
+    const char *const vif = colo_nic->vif;
+
+    out_path_base = GCSPRINTF("%s/colo_proxy/%d",
+                              libxl__xs_libxl_path(gc, domid), devid);
+
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/hotplug-error", out_path_base),
+                                &hotplug_error);
+    if (rc)
+        goto out;
+
+    if (hotplug_error) {
+        LOG(ERROR, "colo_proxy script %s setup failed for vif %s: %s",
+            aes->args[0], vif, hotplug_error);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (status) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = 0;
+
+out:
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+static void colo_nic_teardown(libxl__egc *egc, libxl__checkpoint_device *dev,
+                              int side, char *colo_proxy_script)
+{
+    int rc;
+    STATE_AO_GC(dev->cds->ao);
+
+    setup_async_exec(dev, "teardown", side, colo_proxy_script);
+
+    rc = libxl__async_exec_start(gc, &dev->aodev.aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void colo_save_teardown_script_cb(libxl__egc *egc,
+                                         libxl__async_exec_state *aes,
+                                         int status)
+{
+    int rc;
+    libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+
+    if (status)
+        rc = ERROR_FAIL;
+    else
+        rc = 0;
+
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+/* ======== primary ======== */
+static void colo_nic_save_setup(libxl__egc *egc, libxl__checkpoint_device *dev)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(dev->cds, *css, cds);
+
+    colo_nic_setup(egc, dev, primary, css->colo_proxy_script);
+}
+
+static void colo_nic_save_teardown(libxl__egc *egc,
+                                   libxl__checkpoint_device *dev)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(dev->cds, *css, cds);
+
+    colo_nic_teardown(egc, dev, primary, css->colo_proxy_script);
+}
+
+const libxl__checkpoint_device_instance_ops colo_save_device_nic = {
+    .kind = LIBXL__DEVICE_KIND_VIF,
+    .setup = colo_nic_save_setup,
+    .teardown = colo_nic_save_teardown,
+};
+
+/* ======== secondary ======== */
+static void colo_nic_restore_setup(libxl__egc *egc,
+                                   libxl__checkpoint_device *dev)
+{
+    libxl__colo_restore_state *crs = CONTAINER_OF(dev->cds, *crs, cds);
+
+    colo_nic_setup(egc, dev, secondary, crs->colo_proxy_script);
+}
+
+static void colo_nic_restore_teardown(libxl__egc *egc,
+                                      libxl__checkpoint_device *dev)
+{
+    libxl__colo_restore_state *crs = CONTAINER_OF(dev->cds, *crs, cds);
+
+    colo_nic_teardown(egc, dev, secondary, crs->colo_proxy_script);
+}
+
+const libxl__checkpoint_device_instance_ops colo_restore_device_nic = {
+    .kind = LIBXL__DEVICE_KIND_VIF,
+    .setup = colo_nic_restore_setup,
+    .teardown = colo_nic_restore_teardown,
+};
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index b091958..63101b4 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2723,6 +2723,8 @@ void cleanup_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
 int init_subkind_qdisk(libxl__checkpoint_devices_state *cds);
 void cleanup_subkind_qdisk(libxl__checkpoint_devices_state *cds);
 int colo_qdisk_preresume(libxl_ctx *ctx, domid_t domid);
+int init_subkind_colo_nic(libxl__checkpoint_devices_state *cds);
+void cleanup_subkind_colo_nic(libxl__checkpoint_devices_state *cds);
 
 typedef void libxl__checkpoint_callback(libxl__egc *,
                                         libxl__checkpoint_devices_state *,
@@ -2847,6 +2849,7 @@ struct libxl__colo_save_state {
     libxl__checkpoint_devices_state cds;
     int send_fd;
     int recv_fd;
+    char *colo_proxy_script;
 
     /* private */
     libxl__datacopier_state dc;
@@ -3197,6 +3200,7 @@ struct libxl__colo_restore_state {
     int pae;
     int superpages;
     libxl__colo_callback *callback;
+    char *colo_proxy_script;
 
     /* private, colo restore checkpoint state */
     libxl__domain_create_cb *saved_cb;
@@ -3219,6 +3223,7 @@ struct libxl__domain_create_state {
     /* private to domain_create */
     int guest_domid;
     int checkpointed_stream;
+    const char *colo_proxy_script;
     libxl__domain_build_state build_state;
     libxl__colo_restore_state crs;
     libxl__bootloader_state bl;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 87c52b9..4013cd3 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -530,6 +530,8 @@ libxl_device_nic = Struct("device_nic", [
     ("rate_bytes_per_interval", uint64),
     ("rate_interval_usecs", uint32),
     ("gatewaydev", string),
+    ("forwarddev", string),
+    ("forwardbr", string)
     ])
 
 libxl_device_pci = Struct("device_pci", [
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 27/29] setup and control colo proxy on primary side
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (25 preceding siblings ...)
  2015-04-01  6:57 ` [RFC PATCH COLO v5 26/29] COLO nic: implement COLO nic subkind Yang Hongyang
@ 2015-04-01  6:58 ` Yang Hongyang
  2015-04-01  6:58 ` [RFC PATCH COLO v5 28/29] setup and control colo proxy on secondary side Yang Hongyang
  2015-04-01  6:58 ` [RFC PATCH COLO v5 29/29] cmdline switches and config vars to control colo-proxy Yang Hongyang
  28 siblings, 0 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:58 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

setup and control colo proxy on primary side

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxl/libxl_colo_save.c | 124 +++++++++++++++++++++++++++++++++++++++---
 tools/libxl/libxl_internal.h  |   4 ++
 2 files changed, 120 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl_colo_save.c b/tools/libxl/libxl_colo_save.c
index de270e2..e325cff 100644
--- a/tools/libxl/libxl_colo_save.c
+++ b/tools/libxl/libxl_colo_save.c
@@ -18,9 +18,11 @@
 #include "libxl_internal.h"
 #include "libxl_colo.h"
 
+extern const libxl__checkpoint_device_instance_ops colo_save_device_nic;
 extern const libxl__checkpoint_device_instance_ops colo_save_device_qdisk;
 
 static const libxl__checkpoint_device_instance_ops *colo_ops[] = {
+    &colo_save_device_nic,
     &colo_save_device_qdisk,
     NULL,
 };
@@ -32,9 +34,15 @@ static int init_device_subkind(libxl__checkpoint_devices_state *cds)
     int rc;
     STATE_AO_GC(cds->ao);
 
-    rc = init_subkind_qdisk(cds);
+    rc = init_subkind_colo_nic(cds);
     if (rc) goto out;
 
+    rc = init_subkind_qdisk(cds);
+    if (rc) {
+        cleanup_subkind_colo_nic(cds);
+        goto out;
+    }
+
     rc = 0;
 out:
     return rc;
@@ -45,6 +53,7 @@ static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
     /* cleanup device subkind-specific state in the libxl ctx */
     STATE_AO_GC(cds->ao);
 
+    cleanup_subkind_colo_nic(cds);
     cleanup_subkind_qdisk(cds);
 }
 
@@ -75,9 +84,16 @@ void libxl__colo_save_setup(libxl__egc *egc, libxl__colo_save_state *css)
     css->svm_running = false;
     css->paused = true;
     css->qdisk_setuped = false;
+    libxl__ev_child_init(&css->child);
 
-    /* TODO: nic support */
-    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VBD);
+    if (dss->remus->netbufscript)
+        css->colo_proxy_script = libxl__strdup(gc, dss->remus->netbufscript);
+    else
+        css->colo_proxy_script = GCSPRINTF("%s/colo-proxy-setup",
+                                           libxl__xen_script_dir_path());
+
+    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VIF) |
+                             (1 << LIBXL__DEVICE_KIND_VBD);
     cds->ops = colo_ops;
     cds->callback = colo_save_setup_done;
     cds->ao = ao;
@@ -103,12 +119,18 @@ static void colo_save_setup_done(libxl__egc *egc,
     STATE_AO_GC(cds->ao);
 
     if (!rc) {
+        css->cps.ao = ao;
+        rc = colo_proxy_setup(&css->cps);
+        if (rc)
+            goto failed;
         libxl__domain_suspend(egc, dss);
         return;
     }
 
     LOG(ERROR, "COLO: failed to setup device for guest with domid %u",
         dss->domid);
+
+failed:
     css->cds.callback = colo_save_setup_failed;
     libxl__checkpoint_devices_teardown(egc, &css->cds);
 }
@@ -156,6 +178,7 @@ static void colo_teardown_done(libxl__egc *egc,
     libxl__domain_suspend_state *dss = CONTAINER_OF(css, *dss, css);
 
     cleanup_device_subkind(cds);
+    colo_proxy_teardown(&css->cps);
     dss->callback(egc, dss, rc);
 }
 
@@ -436,6 +459,8 @@ static void colo_read_svm_ready_done(libxl__egc *egc,
         goto out;
     }
 
+    colo_proxy_preresume(&css->cps);
+
     css->svm_running = true;
     css->cds.callback = colo_preresume_cb;
     libxl__checkpoint_devices_preresume(egc, &css->cds);
@@ -529,6 +554,8 @@ static void colo_read_svm_resumed_done(libxl__egc *egc,
         goto out;
     }
 
+    colo_proxy_postresume(&css->cps);
+
     ok = 1;
 
 out:
@@ -537,6 +564,91 @@ out:
 
 
 /* ===================== colo: wait new checkpoint ===================== */
+
+static void colo_start_new_checkpoint(libxl__egc *egc,
+                                      libxl__checkpoint_devices_state *cds,
+                                      int rc);
+static void colo_proxy_async_wait_for_checkpoint(libxl__colo_save_state *css);
+static void colo_proxy_async_call_done(libxl__egc *egc,
+                                       libxl__ev_child *child,
+                                       int pid,
+                                       int status);
+
+static void colo_proxy_async_call(libxl__egc *egc,
+                                  libxl__colo_save_state *css,
+                                  void func(libxl__colo_save_state *),
+                                  libxl__ev_child_callback callback)
+{
+    int pid = -1, rc;
+
+    STATE_AO_GC(css->cds.ao);
+
+    /* Fork and call */
+    pid = libxl__ev_child_fork(gc, &css->child, callback);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (!pid) {
+        /* child */
+        func(css);
+        /* notreached */
+        abort();
+    }
+
+    return;
+
+out:
+    callback(egc, &css->child, -1, 1);
+}
+
+static void colo_proxy_wait_for_checkpoint(libxl__egc *egc,
+                                           libxl__colo_save_state *css)
+{
+    colo_proxy_async_call(egc, css,
+                          colo_proxy_async_wait_for_checkpoint,
+                          colo_proxy_async_call_done);
+}
+
+static void colo_proxy_async_wait_for_checkpoint(libxl__colo_save_state *css)
+{
+    int req;
+
+again:
+    req = colo_proxy_checkpoint(&css->cps);
+    if (req < 0) {
+        /* some error happens */
+        _exit(1);
+    } else if (!req) {
+        /* no checkpoint is needed, wait for 1ms and the check again */
+        usleep(1000);
+        goto again;
+    } else {
+        /* net packets is not consistent, we need to start a checkpoint */
+        _exit(0);
+    }
+}
+
+static void colo_proxy_async_call_done(libxl__egc *egc,
+                                       libxl__ev_child *child,
+                                       int pid,
+                                       int status)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(child, *css, child);
+
+    EGC_GC;
+
+    if (status) {
+        LOG(ERROR, "failed to wait for new checkpoint");
+        colo_start_new_checkpoint(egc, &css->cds, ERROR_FAIL);
+        return;
+    }
+
+    colo_start_new_checkpoint(egc, &css->cds, 0);
+}
+
 /*
  * Do the following things:
  * 1. do commit
@@ -546,9 +658,6 @@ out:
 static void colo_device_commit_cb(libxl__egc *egc,
                                   libxl__checkpoint_devices_state *cds,
                                   int rc);
-static void colo_start_new_checkpoint(libxl__egc *egc,
-                                      libxl__checkpoint_devices_state *cds,
-                                      int rc);
 
 void libxl__colo_save_domain_checkpoint_callback(void *data)
 {
@@ -577,8 +686,7 @@ static void colo_device_commit_cb(libxl__egc *egc,
         goto out;
     }
 
-    /* TODO: wait a new checkpoint */
-    colo_start_new_checkpoint(egc, cds, 0);
+    colo_proxy_wait_for_checkpoint(egc, css);
     return;
 
 out:
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 63101b4..a64efdc 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2864,6 +2864,10 @@ struct libxl__colo_save_state {
 
     /* private, used by qdisk block replication */
     bool qdisk_setuped;
+
+    /* private, used by colo-proxy */
+    libxl__colo_proxy_state cps;
+    libxl__ev_child child;
 };
 
 /*----- Domain suspend (save) state structure -----*/
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 28/29] setup and control colo proxy on secondary side
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (26 preceding siblings ...)
  2015-04-01  6:58 ` [RFC PATCH COLO v5 27/29] setup and control colo proxy on primary side Yang Hongyang
@ 2015-04-01  6:58 ` Yang Hongyang
  2015-04-01  6:58 ` [RFC PATCH COLO v5 29/29] cmdline switches and config vars to control colo-proxy Yang Hongyang
  28 siblings, 0 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:58 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

setup and control colo proxy on secondary side

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxl/libxl_colo_restore.c | 26 +++++++++++++++++++++++---
 tools/libxl/libxl_internal.h     |  3 +++
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
index 28eb8ab..151e7a5 100644
--- a/tools/libxl/libxl_colo_restore.c
+++ b/tools/libxl/libxl_colo_restore.c
@@ -64,9 +64,11 @@ static void libxl__colo_restore_domain_resume_callback(void *data);
 static void libxl__colo_restore_domain_checkpoint_callback(void *data);
 static void libxl__colo_restore_domain_suspend_callback(void *data);
 
+extern const libxl__checkpoint_device_instance_ops colo_restore_device_nic;
 extern const libxl__checkpoint_device_instance_ops colo_restore_device_qdisk;
 
 static const libxl__checkpoint_device_instance_ops *colo_restore_ops[] = {
+    &colo_restore_device_nic,
     &colo_restore_device_qdisk,
     NULL,
 };
@@ -166,8 +168,14 @@ static int init_device_subkind(libxl__checkpoint_devices_state *cds)
     int rc;
     STATE_AO_GC(cds->ao);
 
+    rc = init_subkind_colo_nic(cds);
+    if (rc) goto out;
+
     rc = init_subkind_qdisk(cds);
-    if (rc)  goto out;
+    if (rc) {
+        cleanup_subkind_colo_nic(cds);
+        goto out;
+    }
 
     rc = 0;
 out:
@@ -179,6 +187,7 @@ static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
     /* cleanup device subkind-specific state in the libxl ctx */
     STATE_AO_GC(cds->ao);
 
+    cleanup_subkind_colo_nic(cds);
     cleanup_subkind_qdisk(cds);
 }
 
@@ -293,6 +302,11 @@ void libxl__colo_restore_setup(libxl__egc *egc,
 
     crs->qdisk_setuped = false;
 
+    crs->cps.ao = ao;
+    rc = colo_proxy_setup(&crs->cps);
+    if (rc)
+        goto err_init_dss2;
+
     rc = 0;
 
 out:
@@ -398,6 +412,8 @@ static void colo_restore_teardown_done(libxl__egc *egc,
     if (crcs->teardown_devices)
         cleanup_device_subkind(cds);
 
+    colo_proxy_teardown(&crs->cps);
+
     rc = crcs->saved_rc;
     if (!rc) {
         crcs->callback = do_failover_done;
@@ -607,6 +623,8 @@ static void colo_restore_preresume_cb(libxl__egc *egc,
         goto out;
     }
 
+    colo_proxy_preresume(&crs->cps);
+
     colo_restore_resume_vm(egc, crcs);
 
     return;
@@ -643,6 +661,8 @@ static void colo_resume_vm_done(libxl__egc *egc,
 
     crcs->status = LIBXL_COLO_RESUMED;
 
+    colo_proxy_postresume(&crs->cps);
+
     /* avoid calling libxl__xc_domain_restore_done() more than once */
     if (crs->saved_cb) {
         dcs->callback = crs->saved_cb;
@@ -792,8 +812,8 @@ static void colo_setup_checkpoint_devices(libxl__egc *egc,
 
     STATE_AO_GC(crs->ao);
 
-    /* TODO: nic support */
-    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VBD);
+    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VIF) |
+                             (1 << LIBXL__DEVICE_KIND_VBD);
     cds->callback = colo_restore_setup_cds_done;
     cds->ao = ao;
     cds->domid = crs->domid;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index a64efdc..bd3c9e3 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3213,6 +3213,9 @@ struct libxl__colo_restore_state {
 
     /* private, used by qdisk block replication */
     bool qdisk_setuped;
+
+    /* private, used by colo proxy */
+    libxl__colo_proxy_state cps;
 };
 
 struct libxl__domain_create_state {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH COLO v5 29/29] cmdline switches and config vars to control colo-proxy
  2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (27 preceding siblings ...)
  2015-04-01  6:58 ` [RFC PATCH COLO v5 28/29] setup and control colo proxy on secondary side Yang Hongyang
@ 2015-04-01  6:58 ` Yang Hongyang
  28 siblings, 0 replies; 47+ messages in thread
From: Yang Hongyang @ 2015-04-01  6:58 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

Add cmdline switches to 'xl migrate-receive' command to specify
a domain-specific hotplug script to setup COLO proxy.

Add a new config var 'colo.default.agentscript' to xl.conf, that
allows the user to override the default global script used to
setup COLO proxy.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 docs/man/xl.conf.pod.5      |  6 ++++++
 docs/man/xl.pod.1           |  1 -
 tools/libxl/libxl.c         | 12 +++++++++++
 tools/libxl/libxl_create.c  | 14 +++++++++++--
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl.c            |  3 +++
 tools/libxl/xl.h            |  1 +
 tools/libxl/xl_cmdimpl.c    | 51 ++++++++++++++++++++++++++++++++++-----------
 8 files changed, 74 insertions(+), 15 deletions(-)

diff --git a/docs/man/xl.conf.pod.5 b/docs/man/xl.conf.pod.5
index 8ae19bb..8f7fd28 100644
--- a/docs/man/xl.conf.pod.5
+++ b/docs/man/xl.conf.pod.5
@@ -111,6 +111,12 @@ Configures the default script used by Remus to setup network buffering.
 
 Default: C</etc/xen/scripts/remus-netbuf-setup>
 
+=item B<colo.default.proxyscript="PATH">
+
+Configures the default script used by COLO to setup colo-proxy.
+
+Default: C</etc/xen/scripts/colo-proxy-setup>
+
 =item B<output_format="json|sxp">
 
 Configures the default output format used by xl when printing "machine
diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 431ef5e..47d58da 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -447,7 +447,6 @@ N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
      Disk replication support is limited to DRBD disks.
 
      COLO support in xl is still in experimental (proof-of-concept) phase.
-     There is no support for network at the moment.
 
 B<OPTIONS>
 
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 08d68df..f4079ee 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -3398,6 +3398,16 @@ void libxl__device_nic_add(libxl__egc *egc, uint32_t domid,
         flexarray_append(back, nic->ifname);
     }
 
+    if (nic->forwarddev) {
+        flexarray_append(back, "forwarddev");
+        flexarray_append(back, nic->forwarddev);
+    }
+
+    if (nic->forwardbr) {
+        flexarray_append(back, "forwardbr");
+        flexarray_append(back, nic->forwardbr);
+    }
+
     flexarray_append(back, "mac");
     flexarray_append(back,libxl__sprintf(gc,
                                     LIBXL_MAC_FMT, LIBXL_MAC_BYTES(nic->mac)));
@@ -3521,6 +3531,8 @@ static int libxl__device_nic_from_xs_be(libxl__gc *gc,
     nic->ip = READ_BACKEND(NOGC, "ip");
     nic->bridge = READ_BACKEND(NOGC, "bridge");
     nic->script = READ_BACKEND(NOGC, "script");
+    nic->forwarddev = READ_BACKEND(NOGC, "forwarddev");
+    nic->forwardbr = READ_BACKEND(NOGC, "forwardbr");
 
     /* vif_ioemu nics use the same xenstore entries as vif interfaces */
     tmp = READ_BACKEND(gc, "type");
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 1fae0a4..b1e9372 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1132,6 +1132,11 @@ static void domcreate_bootloader_done(libxl__egc *egc,
         crs->superpages = superpages;
         crs->pae = pae;
         crs->callback = libxl__colo_restore_setup_done;
+        if (dcs->colo_proxy_script)
+            crs->colo_proxy_script = libxl__strdup(gc, dcs->colo_proxy_script);
+        else
+            crs->colo_proxy_script = GCSPRINTF("%s/colo-proxy-setup",
+                                               libxl__xen_script_dir_path());
         libxl__colo_restore_setup(egc, crs);
     } else
         libxl__xc_domain_restore(egc, dcs,
@@ -1628,6 +1633,7 @@ static void domain_create_cb(libxl__egc *egc,
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
                             uint32_t *domid, int restore_fd,
                             int send_fd, int checkpointed_stream,
+                            const char *colo_proxy_script,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
@@ -1643,6 +1649,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     cdcs->dcs.send_fd = send_fd;
     cdcs->dcs.callback = domain_create_cb;
     cdcs->dcs.checkpointed_stream = checkpointed_stream;
+    cdcs->dcs.colo_proxy_script = colo_proxy_script;
     libxl__ao_progress_gethow(&cdcs->dcs.aop_console_how, aop_console_how);
     cdcs->domid_out = domid;
 
@@ -1686,7 +1693,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncprogress_how *aop_console_how)
 {
     unset_disk_colo_restore(d_config);
-    return do_domain_create(ctx, d_config, domid, -1, -1, 0,
+    return do_domain_create(ctx, d_config, domid, -1, -1, 0, NULL,
                             ao_how, aop_console_how);
 }
 
@@ -1697,16 +1704,19 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 const libxl_asyncprogress_how *aop_console_how)
 {
     int send_fd = -1;
+    char *colo_proxy_script = NULL;
 
     if (params->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
         send_fd = params->send_fd;
+        colo_proxy_script = params->colo_proxy_script;
         set_disk_colo_restore(d_config);
     } else {
         unset_disk_colo_restore(d_config);
     }
 
     return do_domain_create(ctx, d_config, domid, restore_fd, send_fd,
-                            params->checkpointed_stream, ao_how, aop_console_how);
+                            params->checkpointed_stream, colo_proxy_script,
+                            ao_how, aop_console_how);
 }
 
 /*
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 4013cd3..c69d046 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -351,6 +351,7 @@ libxl_domain_create_info = Struct("domain_create_info",[
 libxl_domain_restore_params = Struct("domain_restore_params", [
     ("checkpointed_stream", integer),
     ("send_fd", integer),
+    ("colo_proxy_script", string),
     ])
 
 libxl_domain_sched_params = Struct("domain_sched_params",[
diff --git a/tools/libxl/xl.c b/tools/libxl/xl.c
index f014306..f44f04f 100644
--- a/tools/libxl/xl.c
+++ b/tools/libxl/xl.c
@@ -45,6 +45,7 @@ char *default_bridge = NULL;
 char *default_gatewaydev = NULL;
 char *default_vifbackend = NULL;
 char *default_remus_netbufscript = NULL;
+char *default_colo_proxy_script = NULL;
 enum output_format default_output_format = OUTPUT_FORMAT_JSON;
 int claim_mode = 1;
 bool progress_use_cr = 0;
@@ -179,6 +180,8 @@ static void parse_global_config(const char *configfile,
 
     xlu_cfg_replace_string (config, "remus.default.netbufscript",
         &default_remus_netbufscript, 0);
+    xlu_cfg_replace_string (config, "colo.default.proxyscript",
+        &default_colo_proxy_script, 0);
 
     xlu_cfg_destroy(config);
 }
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 5bc138c..33f25d1 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -178,6 +178,7 @@ extern char *default_bridge;
 extern char *default_gatewaydev;
 extern char *default_vifbackend;
 extern char *default_remus_netbufscript;
+extern char *default_colo_proxy_script;
 extern char *blkdev_start;
 
 enum output_format {
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 6c5b792..fc4f64d 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -153,6 +153,7 @@ struct domain_create {
     const char *config_file;
     const char *extra_config; /* extra config string */
     const char *restore_file;
+    char *colo_proxy_script;
     int migrate_fd; /* -1 means none */
     int send_fd; /* -1 means none */
     char **migration_domname_r; /* from malloc */
@@ -982,6 +983,10 @@ static int parse_nic_config(libxl_device_nic *nic, XLU_Config **config, char *to
         replace_string(&nic->model, oparg);
     } else if (MATCH_OPTION("rate", token, oparg)) {
         parse_vif_rate(config, oparg, nic);
+    } else if (MATCH_OPTION("forwarddev", token, oparg)) {
+        replace_string(&nic->forwarddev, oparg);
+    } else if (MATCH_OPTION("forwardbr", token, oparg)) {
+        replace_string(&nic->forwardbr, oparg);
     } else if (MATCH_OPTION("accel", token, oparg)) {
         fprintf(stderr, "the accel parameter for vifs is currently not supported\n");
     } else {
@@ -2704,6 +2709,7 @@ start:
 
         params.checkpointed_stream = dom_info->checkpointed_stream;
         params.send_fd = send_fd;
+        params.colo_proxy_script = dom_info->colo_proxy_script;
         ret = libxl_domain_create_restore(ctx, &d_config,
                                           &domid, restore_fd,
                                           &params,
@@ -4223,7 +4229,8 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
 }
 
 static void migrate_receive(int debug, int daemonize, int monitor,
-                            int send_fd, int recv_fd, int remus)
+                            int send_fd, int recv_fd, int remus,
+                            char *colo_proxy_script)
 {
     uint32_t domid;
     int rc, rc2;
@@ -4250,6 +4257,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     dom_info.send_fd = send_fd;
     dom_info.migration_domname_r = &migration_domname;
     dom_info.checkpointed_stream = remus;
+    dom_info.colo_proxy_script = colo_proxy_script;
     if (remus == LIBXL_CHECKPOINTED_STREAM_COLO)
         /* COLO uses stdout to send control message to master */
         dom_info.quiet = 1;
@@ -4444,8 +4452,9 @@ int main_migrate_receive(int argc, char **argv)
 {
     int debug = 0, daemonize = 1, monitor = 1, remus = 0;
     int opt;
+    char *script = NULL;
 
-    SWITCH_FOREACH_OPT(opt, "Fedrc", NULL, "migrate-receive", 0) {
+    SWITCH_FOREACH_OPT(opt, "Fedrcn:", NULL, "migrate-receive", 0) {
     case 'F':
         daemonize = 0;
         break;
@@ -4461,6 +4470,8 @@ int main_migrate_receive(int argc, char **argv)
         break;
     case 'c':
         remus = LIBXL_CHECKPOINTED_STREAM_COLO;
+    case 'n':
+        script = optarg;
     }
 
     if (argc-optind != 0) {
@@ -4469,7 +4480,7 @@ int main_migrate_receive(int argc, char **argv)
     }
     migrate_receive(debug, daemonize, monitor,
                     STDOUT_FILENO, STDIN_FILENO,
-                    remus);
+                    remus, script);
 
     return 0;
 }
@@ -7952,8 +7963,10 @@ int main_remus(int argc, char **argv)
         if (!interval)
             r_info.interval = 0;
 
-        if (r_info.interval || libxl_defbool_val(r_info.blackhole)) {
-            perror("option -c is conflict with -i or -b");
+        if (r_info.interval || libxl_defbool_val(r_info.blackhole) ||
+            !libxl_defbool_is_default(r_info.netbuf) ||
+            !libxl_defbool_is_default(r_info.diskbuf)) {
+            perror("option -c is conflict with -i, -d, -n or -b");
             exit(-1);
         }
 
@@ -7963,8 +7976,12 @@ int main_remus(int argc, char **argv)
         }
     }
 
-    if (!r_info.netbufscript)
-        r_info.netbufscript = default_remus_netbufscript;
+    if (!r_info.netbufscript) {
+        if (libxl_defbool_val(r_info.colo))
+            r_info.netbufscript = default_colo_proxy_script;
+        else
+            r_info.netbufscript = default_remus_netbufscript;
+    }
 
     if (libxl_defbool_val(r_info.blackhole)) {
         send_fd = open("/dev/null", O_RDWR, 0644);
@@ -7977,11 +7994,21 @@ int main_remus(int argc, char **argv)
         if (!ssh_command[0]) {
             rune = host;
         } else {
-            if (asprintf(&rune, "exec %s %s xl migrate-receive %s %s",
-                         ssh_command, host,
-                         libxl_defbool_val(r_info.colo) ? "-c" : "-r",
-                         daemonize ? "" : " -e") < 0)
-                return 1;
+            if (!libxl_defbool_val(r_info.colo)) {
+                if (asprintf(&rune, "exec %s %s xl migrate-receive %s %s",
+                             ssh_command, host,
+                             "-r",
+                             daemonize ? "" : " -e") < 0)
+                    return 1;
+            } else {
+                if (asprintf(&rune, "exec %s %s xl migrate-receive %s %s %s %s",
+                             ssh_command, host,
+                             "-c",
+                             r_info.netbufscript ? "-n" : "",
+                             r_info.netbufscript ? r_info.netbufscript : "",
+                             daemonize ? "" : " -e") < 0)
+                    return 1;
+            }
         }
 
         save_domain_core_begin(domid, NULL, &config_data, &config_len);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH COLO v5 01/29] Add readme
  2015-04-01  6:41 ` [RFC PATCH COLO v5 01/29] Add readme Yang Hongyang
@ 2015-04-08 18:11   ` Wei Liu
  2015-04-14  4:06     ` Hongyang Yang
  0 siblings, 1 reply; 47+ messages in thread
From: Wei Liu @ 2015-04-08 18:11 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, xen-devel, rshriram

On Wed, Apr 01, 2015 at 02:41:37PM +0800, Yang Hongyang wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  docs/README.colo | 92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 92 insertions(+)
>  create mode 100644 docs/README.colo
> 
> diff --git a/docs/README.colo b/docs/README.colo
> new file mode 100644
> index 0000000..60f487d
> --- /dev/null
> +++ b/docs/README.colo
> @@ -0,0 +1,92 @@
> +COLO provides fault tolerance for virtual machines by sending continuous
> +checkpoints to a backup, which will activate if the target VM fails. It
> +only supports HVM guest(without pv extensions).
                                   ^ PV

> +
> +Requriements:
> +1. Hardware requriements
> +   There is at least one directly connected nic to forward the nic from client
> +   to secondary vm. The directly connected nic must not be used by any other
> +   purpose. If your guest has more than one nic, you should have directly
> +   connected nic for each guest nic. If you don't have enouth directly connected
> +   nic, you can use vlan.
> +2. Dom0 requirements
> +   - Support dom0
> +   - kernel module:
> +        sch_ingress
> +        cls_basic
> +        cls_tcindex
> +        cls_u32
> +        act_mirred
> +   - libnl-tools >= 3.0. This package provides the command nl-qdisc-list, and
> +     colo need this command.
        ^ COLO

> +   - If your host os has OEM-released xen tools, please uninstall it first.
                     ^OS                 ^ Xen

(and please fix other occurrences of wrong capitalisations as well)

This is a very broad statement and it is not very helpful from both
developers and users' point of view. Can you elaborate on what
functionalities that COLO needs to have exclusive access to?

> +   - You can load the module which is not provided by OEM.

What does this mean?

> +3. Guest requirements
> +   Only HVM guest(without pv extensions) is supported now. If you want to
> +   use OEM released guest os, please use SUSE. REDHAT and Ubuntu is not
> +   supported now because I don't find any way to disable pv extensions.
> +   If you want to use REDHAT or Ubuntu, you need to build the newest
> +   kernel which has the parameter xen_nopv.
> +

FWIW, does "xen_platform_pci=0" in your xl.cfg work for RH and Ubuntu
guests?

> +Network link topology
> +   Please refer to: http://wiki.qemu.org/Features/COLO#Network_link_topology
> +
> +The steps to setup COLO environment:
> +You need to recompile your host kernel because colo-proxy module need cooperate
> +with linux kernel.
> +Please refer to: http://wiki.qemu.org/Features/COLO#Test_environment_prepare
> +1. Build and install xen
> +2. Apply the patch for qemu xen, and rebuild xen tools:
> +    - cd tools/qemu-xen-dir
> +    - use git am to apply the patch:
> +      https://raw.githubusercontent.com/wencongyang/colo-files/master/patch_for_qemu/*.patch
> +    - make tools && make install-tools
> +    Note: You must use qemu-xen. qemu-xen-traditional is not supported.

Note that you will eventually need to upstream your changes to QEMU.

> +3. Install COLO proxy module:
> +    3.1 Download COLO proxy, compile and install it:
> +        https://github.com/gao-feng/colo-proxy.git
> +    3.2 Download iptables patch, it is based on v1.4.21 compile and install it:
> +        https://github.com/gao-feng/colo-proxy/blob/master/colo-patch-for-kernel.patch
> +4. Install the guest
> +    4.1 Add "xen_platform_pci=0" into the guest configfile
> +    4.2 If you use suse, please select physical machine
> +    4.3 copy the disk image to the secondary host
> +5. Update your guest config file for COLO:
> +    5.1 disk
> +        disk = [
> +        'format=raw,devtype=disk,access=w,vdev=hda,backendtype=qdisk,colo,colo-params=192.168.3.1:9000:exportname=qdisk1,active-disk=/mnt/ramfs/active_disk.img,hidden-disk=/mnt/ramfs/hidden_disk.img,target=/root/images/colo-hvm.img' ]

It's unclear which parts are updated compared to the original config,
i.e. can you list the additional bits to enable COLO? Presumably it's
only those options starting with "colo"?

> +    5.2 nic
> +        vif = [ 'mac=00:16:4f:00:00:11, bridge=br0, model=e1000, forwarddev=eth0, forwardbr=br1' ]
> +    Note:
> +    a. The ip/port in colo-params is the secondary host's IP. Don't use the
> +       directly connected nic's IP.
> +    b. forwarddev is the directly connected nic.
> +    c. If you have more than one disk, colo-params's host/port must be the same
> +       and colo-param's exportname must be different.
> +6. Run COLO:
> +    xl remus -c -u <domname> <secondary host IP>
> +    Note: The ip must not be the directly connected nic's IP.
> +Note:
> +Secondary host only need to do step 1-3.
> +
> +The known problem:
> +1. Secondary vm may crash due to triple fault.
> +2. The heartbeat is not reliable. If you want to test the performance,
> +   please disable the heartbeat(modify the xen codes). You can use the
> +   branch colo-v4-noheartbeat.
> +3. Suspending the vm fails, and the error message is:
> +    libxl: error: libxl_qmp.c:429:qmp_next: timeout
> +
> +Problem 1 and 3 don't happen every time. So you can run colo again to
> +avoid this problem.
> +
> +Virtio-Net:
> +1. If you want to get better performance, you can use virtio-net.
> +
> +Trouble shooting:
> +If there's some error happend when staritng COLO, you can do:
> +1. Make sure you have all necessary modules that DOM0 needed on both side.
> +2. Make sure you have followed all the instructions in this README.
> +3. Try to reboot both primary and secondary host.
> +4. If you still have problems, collect the error logs and contact
> +   Wen Congyang(wency@cn.fujitsu.com)/Yang Hongyang(yanghy@cn.fujitsu.com).

After reading this whole document I think it should be a wiki page
instead of an in-tree README.

Wei.

> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH COLO v5 02/29] Refactor domain_suspend_callback_common()
  2015-04-01  6:41 ` [RFC PATCH COLO v5 02/29] Refactor domain_suspend_callback_common() Yang Hongyang
@ 2015-04-08 18:11   ` Wei Liu
  2015-04-14  5:56     ` Wen Congyang
  2015-04-22 14:45   ` Ian Campbell
  1 sibling, 1 reply; 47+ messages in thread
From: Wei Liu @ 2015-04-08 18:11 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, ian.campbell, wency, eddie.dong, yunhong.jiang,
	ian.jackson, xen-devel, rshriram

On Wed, Apr 01, 2015 at 02:41:38PM +0800, Yang Hongyang wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
> 
> libxl__domain_suspend() is to save the guest. I think
> we should call it libxl__domain_save(), but I don't
> rename it.
> 

FWIW this is not public API so we have certain degree of liberty to
rename it if the new name is deemed more appropriate.

> Secondary vm is running in colo mode. So we will do
> the following things again and again:
> 1. suspend both primay vm and secondary vm
> 2. sync the state
> 3. resume both primary vm and secondary vm
> To suspend secondary vm, we need an independent API to
> suspend vm.
> 
[...]
>  
> +/*
> + * libxl__domain_suspend_state is for saving guest, not
> + * for suspending guest. We need to an independent API
> + * to suspend guest only.
> + */
> +struct libxl__domain_suspend_state2 {
> +    /* set by caller of libxl__domain_suspend2 */
> +    libxl__ao *ao;
> +
> +    uint32_t domid;
> +    libxl__ev_evtchn guest_evtchn;;
> +    int guest_evtchn_lockfd;
> +    int hvm;
> +    const char *dm_savefile;
> +    void (*callback_common_done)(libxl__egc*,
> +                                 libxl__domain_suspend_state2*, int ok);
> +    int save_dm;
> +    int guest_responded;
> +    libxl__xswait_state pvcontrol;
> +    libxl__ev_xswatch guest_watch;
> +    libxl__ev_time guest_timeout;
> +};
> +
>  struct libxl__domain_suspend_state {
>      /* set by caller of libxl__domain_suspend */
>      libxl__ao *ao;
> @@ -2827,22 +2851,14 @@ struct libxl__domain_suspend_state {
>      int debug;
>      const libxl_domain_remus_info *remus;
>      /* private */
> -    libxl__ev_evtchn guest_evtchn;
> -    int guest_evtchn_lockfd;
> +    libxl__domain_suspend_state2 dss2;
>      int hvm;
>      int xcflags;
> -    int guest_responded;
> -    libxl__xswait_state pvcontrol;
> -    libxl__ev_xswatch guest_watch;
> -    libxl__ev_time guest_timeout;
> -    const char *dm_savefile;
>      libxl__remus_devices_state rds;
>      libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
>      int interval; /* checkpoint interval (for Remus) */
>      libxl__save_helper_state shs;
>      libxl__logdirty_switch logdirty;
> -    void (*callback_common_done)(libxl__egc*,
> -                                 struct libxl__domain_suspend_state*, int ok);
>      /* private for libxl__domain_save_device_model */
>      libxl__save_device_model_cb *save_dm_callback;
>      libxl__datacopier_state save_dm_datacopier;

You moved dm_savefile to new struct but not these two fields, why?
Since your suspend2 function is just a wrapper around
domain_suspend_callback_common, it doesn't seem to need access to
dm_savefile.

Wei.

> @@ -3116,6 +3132,9 @@ struct libxl__domain_create_state {
>  
>  /*----- Domain suspend (save) functions -----*/
>  
> +/* calls dss2->callback_common_done when done */
> +_hidden void libxl__domain_suspend2(libxl__egc *egc,
> +                                    libxl__domain_suspend_state2 *dss2);
>  /* calls dss->callback when done */
>  _hidden void libxl__domain_suspend(libxl__egc *egc,
>                                     libxl__domain_suspend_state *dss);
> @@ -3155,7 +3174,7 @@ _hidden void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
>  
>  /* Each time the dm needs to be saved, we must call suspend and then save */
>  _hidden int libxl__domain_suspend_device_model(libxl__gc *gc,
> -                                           libxl__domain_suspend_state *dss);
> +                                           libxl__domain_suspend_state2 *dss2);
>  _hidden void libxl__domain_save_device_model(libxl__egc *egc,
>                                       libxl__domain_suspend_state *dss,
>                                       libxl__save_device_model_cb *callback);
> -- 
> 1.9.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH COLO v5 03/29] tools: libxl: introduce a new API libxl__domain_restore() to read qemu state
  2015-04-01  6:41 ` [RFC PATCH COLO v5 03/29] tools: libxl: introduce a new API libxl__domain_restore() to read qemu state Yang Hongyang
@ 2015-04-08 18:11   ` Wei Liu
  2015-04-15 13:19     ` Ian Jackson
  0 siblings, 1 reply; 47+ messages in thread
From: Wei Liu @ 2015-04-08 18:11 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, xen-devel, rshriram

On Wed, Apr 01, 2015 at 02:41:39PM +0800, Yang Hongyang wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
> 
> Secondary vm is running in colo mode. So we will do
> the following things again and again:
> 1. suspend both primay vm and secondary vm
> 2. sync the state
> 3. resume both primary vm and secondary vm
> We will send qemu's state each time in step2, and
> slave's qemu should read it each time before resuming
> secondary vm. Introduce a new API libxl__domain_restore()
> to do it. This API should be called before resuming
> secondary vm.
> 
> Note: we should update qemu to support it.
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
>  tools/libxl/libxl.c          | 18 ++++++++++++++++++
>  tools/libxl/libxl_dom.c      | 26 ++++++++++++++++++++++++++
>  tools/libxl/libxl_internal.h |  4 ++++
>  tools/libxl/libxl_qmp.c      | 10 ++++++++++
>  4 files changed, 58 insertions(+)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 2a735b3..6e55afc 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -510,6 +510,24 @@ int libxl_domain_rename(libxl_ctx *ctx, uint32_t domid,
>      return rc;
>  }
>  
> +int libxl__domain_restore(libxl__gc *gc, uint32_t domid)
> +{
> +    int rc = 0;
> +
> +    libxl_domain_type type = libxl__domain_type(gc, domid);
> +    if (type != LIBXL_DOMAIN_TYPE_HVM) {
> +        rc = ERROR_FAIL;

ERROR_INVAL?

> +        goto out;
> +    }
> +
> +    rc = libxl__domain_restore_device_model(gc, domid);
> +    if (rc)
> +        LOG(ERROR, "failed to restore device mode for domain %u:%d",
> +            domid, rc);
> +out:
> +    return rc;
> +}
> +
>  int libxl__domain_resume(libxl__gc *gc, uint32_t domid, int suspend_cancel)
>  {
>      int rc = 0;
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index d286851..fd0c5c2 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -1343,6 +1343,32 @@ int libxl__domain_suspend_device_model(libxl__gc *gc,
>      return ret;
>  }
>  
> +int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid)
> +{
> +    char *state_file;
> +    int rc;
> +
> +    switch (libxl__device_model_version_running(gc, domid)) {
> +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
> +        /* not supported now */
> +        return ERROR_FAIL;

           rc = ERROR_FAIL;
	   break;


Please return rc at the end of this function.

> +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
> +        /*
> +         * This function may be called too many times for the same gc,
> +         * so we use NOGC, and free the memory before return to avoid
> +         * OOM.
> +         */

Hmm...

> +        state_file = libxl__sprintf(NOGC,
> +                                    XC_DEVICE_MODEL_RESTORE_FILE".%d",
> +                                    domid);
> +        rc = libxl__qmp_restore(gc, domid, state_file);
> +        free(state_file);
> +        return rc;
> +    default:
> +        return ERROR_INVAL;
> +    }
> +}
> +
>  int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid)
>  {
>  
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index c1ea498..3b4e6c4 100644
[...]
> +int libxl__qmp_restore(libxl__gc *gc, int domid, const char *state_file)
> +{
> +    libxl__json_object *args = NULL;
> +
> +    qmp_parameters_add_string(gc, &args, "filename", (char *)state_file);

No need to cast state_file to char *.

Wei.

> +
> +    return qmp_run_command(gc, domid, "xen-load-devices-state", args,
> +                           NULL, NULL);
> +}
> +
>  static int qmp_change(libxl__gc *gc, libxl__qmp_handler *qmp,
>                        char *device, char *target, char *arg)
>  {
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH COLO v5 04/29] Update libxl__domain_suspend_common_switch_qemu_logdirty() for colo
  2015-04-01  6:41 ` [RFC PATCH COLO v5 04/29] Update libxl__domain_suspend_common_switch_qemu_logdirty() for colo Yang Hongyang
@ 2015-04-08 18:12   ` Wei Liu
  0 siblings, 0 replies; 47+ messages in thread
From: Wei Liu @ 2015-04-08 18:12 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, ian.campbell, wency, eddie.dong, yunhong.jiang,
	ian.jackson, xen-devel, rshriram

On Wed, Apr 01, 2015 at 02:41:40PM +0800, Yang Hongyang wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
> 
> Secondary vm is running in colo mode. So we need to
> send secondary vm's dirty page information to master.
> libxl__domain_suspend_common_switch_qemu_logdirty() is to enable
> qemu logdirty. But it uses domain_suspend_state, and calls
> libxl__xc_domain_saverestore_async_callback_done()
> before exits.
> 
> Introduce a new API libxl__domain_common_switch_qemu_logdirty().
> This API only uses libxl__logdirty_switch, and calls
> lds->callback before exits.
> 

I have some dumb questions. What is the purpose of this refactoring? How
are the refactoring and new API related to your intended
functionalities? You mention you need to send back secondary vm state to
master, where is it done / will it be done?

> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
[...]
>  
>  /*----- callbacks, called by xc_domain_save -----*/
> @@ -2001,6 +2019,7 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
>      libxl__domain_suspend_state2 *dss2 = &dss->dss2;
>  
>      logdirty_init(&dss->logdirty);
> +    dss->logdirty.ao = ao;
>      libxl__xswait_init(&dss2->pvcontrol);
>      libxl__ev_evtchn_init(&dss2->guest_evtchn);
>      libxl__ev_xswatch_init(&dss2->guest_watch);
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 3b4e6c4..6470866 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -2812,13 +2812,18 @@ typedef void libxl__domain_suspend_cb(libxl__egc*,
>  typedef void libxl__save_device_model_cb(libxl__egc*,
>                                           libxl__domain_suspend_state*, int rc);
>  
> -typedef struct libxl__logdirty_switch {
> +typedef struct libxl__logdirty_switch libxl__logdirty_switch;

This change looks unnecessary.

Wei.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH COLO v5 05/29] Introduce a new internal API libxl__domain_unpause()
  2015-04-01  6:41 ` [RFC PATCH COLO v5 05/29] Introduce a new internal API libxl__domain_unpause() Yang Hongyang
@ 2015-04-08 18:12   ` Wei Liu
  0 siblings, 0 replies; 47+ messages in thread
From: Wei Liu @ 2015-04-08 18:12 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, xen-devel, rshriram

On Wed, Apr 01, 2015 at 02:41:41PM +0800, Yang Hongyang wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
> 
> The guest is paused after libxl_domain_create_restore().
> Secondary vm is running in colo mode. So we need to unpause
> the guest. The current API libxl_domain_unpause() is
> not an internal API. Introduce a new API to support it.
> 

It would be good if you can say "No functional change" in commit log in
the future.

> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
>  tools/libxl/libxl.c          | 21 +++++++++++++++------
>  tools/libxl/libxl_internal.h |  1 +
>  2 files changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 6e55afc..c3898ce 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -1037,9 +1037,8 @@ out:
>      return AO_INPROGRESS;
>  }
>  
> -int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
> +int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
>  {
> -    GC_INIT(ctx);
>      char *path;
>      char *state;
>      int ret, rc = 0;
> @@ -1059,12 +1058,22 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
>                                           NULL, NULL, NULL);
>          }
>      }
> -    ret = xc_domain_unpause(ctx->xch, domid);
> -    if (ret<0) {
> -        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unpausing domain %d", domid);
> +
> +    ret = xc_domain_unpause(CTX->xch, domid);
> +    if (ret < 0) {
> +        LOGE(ERROR, "unpausing domain %d", domid);
>          rc = ERROR_FAIL;
>      }
> - out:
> +
> +out:

Nit, this change is not strictly necessary.

Wei.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH COLO v5 06/29] Update libxl__domain_unpause() to support qemu-xen
  2015-04-01  6:41 ` [RFC PATCH COLO v5 06/29] Update libxl__domain_unpause() to support qemu-xen Yang Hongyang
@ 2015-04-08 18:12   ` Wei Liu
  0 siblings, 0 replies; 47+ messages in thread
From: Wei Liu @ 2015-04-08 18:12 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, ian.campbell, wency, eddie.dong, yunhong.jiang,
	ian.jackson, xen-devel, rshriram

On Wed, Apr 01, 2015 at 02:41:42PM +0800, Yang Hongyang wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
> 
> Currently, libxl__domain_unpause() only supports
> qemu-xen-traditional. Update it to support qemu-xen.
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
>  tools/libxl/libxl.c          | 13 +++++--------
>  tools/libxl/libxl_dom.c      | 25 +++++++++++++++++++++++++
>  tools/libxl/libxl_internal.h |  2 ++
>  3 files changed, 32 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index c3898ce..58629ed 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -1039,8 +1039,6 @@ out:
>  
>  int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
>  {
> -    char *path;
> -    char *state;
>      int ret, rc = 0;
>  
>      libxl_domain_type type = libxl__domain_type(gc, domid);
> @@ -1050,12 +1048,11 @@ int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
>      }
>  
>      if (type == LIBXL_DOMAIN_TYPE_HVM) {
> -        path = libxl__sprintf(gc, "/local/domain/0/device-model/%d/state", domid);
> -        state = libxl__xs_read(gc, XBT_NULL, path);
> -        if (state != NULL && !strcmp(state, "paused")) {
> -            libxl__qemu_traditional_cmd(gc, domid, "continue");
> -            libxl__wait_for_device_model_deprecated(gc, domid, "running",
> -                                         NULL, NULL, NULL);
> +        rc = libxl__domain_unpause_device_model(gc, domid);
> +        if (rc < 0) {
> +            LOG(ERROR, "failed to unpause device model for domain %u:%d",
> +                domid, rc);
> +            goto out;

You need to rebase. This hunk is changed in staging.

>          }
>      }
>  
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index eb4ed94..a3fce46 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -2272,6 +2272,31 @@ static void remus_teardown_done(libxl__egc *egc,
>      dss->callback(egc, dss, rc);
>  }
>  
> +int libxl__domain_unpause_device_model(libxl__gc *gc, uint32_t domid)
> +{
> +    char *path;
> +    char *state;
> +
> +    switch (libxl__device_model_version_running(gc, domid)) {
> +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
> +        path = libxl__sprintf(gc, "/local/domain/0/device-model/%d/state", domid);
> +        state = libxl__xs_read(gc, XBT_NULL, path);
> +        if (state != NULL && !strcmp(state, "paused")) {
> +            libxl__qemu_traditional_cmd(gc, domid, "continue");
> +            libxl__wait_for_device_model_deprecated(gc, domid, "running",
> +                                         NULL, NULL, NULL);
> +        }

This needs to use the snippet in upstream. That snippet copes with
device model running inside a stubdom.

> +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
> +        if (libxl__qmp_resume(gc, domid))
> +            return ERROR_FAIL;
> +        break;

Could you add a check here to make sure device model is not running
inside a stubdom?

Wei.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH COLO v5 07/29] support to resume uncooperative HVM guests
  2015-04-01  6:41 ` [RFC PATCH COLO v5 07/29] support to resume uncooperative HVM guests Yang Hongyang
@ 2015-04-08 18:12   ` Wei Liu
  2015-04-23 12:09     ` Wen Congyang
  2015-04-22 14:54   ` Ian Campbell
  1 sibling, 1 reply; 47+ messages in thread
From: Wei Liu @ 2015-04-08 18:12 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, ian.campbell, wency, eddie.dong, yunhong.jiang,
	ian.jackson, xen-devel, rshriram

On Wed, Apr 01, 2015 at 02:41:43PM +0800, Yang Hongyang wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
> 
> For PVHVM, the hypercall return code is 0, and it can be resumed
> in a new domain context.
> 
> For HVM, do nothing.
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
>  tools/libxc/xc_resume.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
> index e67bebd..b862ce3 100644
> --- a/tools/libxc/xc_resume.c
> +++ b/tools/libxc/xc_resume.c
> @@ -109,6 +109,21 @@ static int xc_domain_resume_cooperative(xc_interface *xch, uint32_t domid)
>      return do_domctl(xch, &domctl);
>  }
>  
> +static int xc_domain_resume_hvm(xc_interface *xch, uint32_t domid)
> +{
> +    DECLARE_DOMCTL;
> +
> +    /*
> +     * If it is PVHVM, the hypercall return code is 0, and resume
> +     * it in a new domain context.
> +     *
> +     * If it is a HVM, do nothing.
> +     */
> +    domctl.cmd = XEN_DOMCTL_resumedomain;
> +    domctl.domain = domid;
> +    return do_domctl(xch, &domctl);
> +}
> +
>  static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
>  {
>      DECLARE_DOMCTL;
> @@ -138,10 +153,7 @@ static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
>       */
>  #if defined(__i386__) || defined(__x86_64__)
>      if ( info.hvm )
> -    {
> -        ERROR("Cannot resume uncooperative HVM guests");
> -        return rc;
> -    }
> +        return xc_domain_resume_hvm(xch, domid);
>  

You're going to skip the rest of the function. I don't think this is
right.

Wei.

>      if ( xc_domain_get_guest_width(xch, domid, &dinfo->guest_width) != 0 )
>      {
> -- 
> 1.9.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH COLO v5 01/29] Add readme
  2015-04-08 18:11   ` Wei Liu
@ 2015-04-14  4:06     ` Hongyang Yang
  0 siblings, 0 replies; 47+ messages in thread
From: Hongyang Yang @ 2015-04-14  4:06 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	xen-devel, rshriram

Wei,

   Thanks for the review and sorry for the late reply, was debugging some triple
fault bug. However, it is now fixed, and COLO running much more stable now.

   For the readme, it is kind of outdated, we will update it and also address
your comments, then we'll put it onto wiki page. The intree readme will be some
simple desciption and links to wiki pages.

On 04/09/2015 02:11 AM, Wei Liu wrote:
> On Wed, Apr 01, 2015 at 02:41:37PM +0800, Yang Hongyang wrote:
>> From: Wen Congyang <wency@cn.fujitsu.com>
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> ---
>>   docs/README.colo | 92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 92 insertions(+)
>>   create mode 100644 docs/README.colo
>>
>> diff --git a/docs/README.colo b/docs/README.colo
>> new file mode 100644
>> index 0000000..60f487d
>> --- /dev/null
>> +++ b/docs/README.colo
>> @@ -0,0 +1,92 @@
>> +COLO provides fault tolerance for virtual machines by sending continuous
>> +checkpoints to a backup, which will activate if the target VM fails. It
>> +only supports HVM guest(without pv extensions).
>                                     ^ PV
>
>> +
>> +Requriements:
>> +1. Hardware requriements
>> +   There is at least one directly connected nic to forward the nic from client
>> +   to secondary vm. The directly connected nic must not be used by any other
>> +   purpose. If your guest has more than one nic, you should have directly
>> +   connected nic for each guest nic. If you don't have enouth directly connected
>> +   nic, you can use vlan.
>> +2. Dom0 requirements
>> +   - Support dom0
>> +   - kernel module:
>> +        sch_ingress
>> +        cls_basic
>> +        cls_tcindex
>> +        cls_u32
>> +        act_mirred
>> +   - libnl-tools >= 3.0. This package provides the command nl-qdisc-list, and
>> +     colo need this command.
>          ^ COLO
>
>> +   - If your host os has OEM-released xen tools, please uninstall it first.
>                       ^OS                 ^ Xen
>
> (and please fix other occurrences of wrong capitalisations as well)

OK

>
> This is a very broad statement and it is not very helpful from both
> developers and users' point of view. Can you elaborate on what
> functionalities that COLO needs to have exclusive access to?
>
>> +   - You can load the module which is not provided by OEM.
>
> What does this mean?

Means you may need to compile the module yourself, will make it clear.

>
>> +3. Guest requirements
>> +   Only HVM guest(without pv extensions) is supported now. If you want to
>> +   use OEM released guest os, please use SUSE. REDHAT and Ubuntu is not
>> +   supported now because I don't find any way to disable pv extensions.
>> +   If you want to use REDHAT or Ubuntu, you need to build the newest
>> +   kernel which has the parameter xen_nopv.
>> +
>
> FWIW, does "xen_platform_pci=0" in your xl.cfg work for RH and Ubuntu
> guests?

It works only if you compile the newer kernel by yourself which support this
option.

>
>> +Network link topology
>> +   Please refer to: http://wiki.qemu.org/Features/COLO#Network_link_topology
>> +
>> +The steps to setup COLO environment:
>> +You need to recompile your host kernel because colo-proxy module need cooperate
>> +with linux kernel.
>> +Please refer to: http://wiki.qemu.org/Features/COLO#Test_environment_prepare
>> +1. Build and install xen
>> +2. Apply the patch for qemu xen, and rebuild xen tools:
>> +    - cd tools/qemu-xen-dir
>> +    - use git am to apply the patch:
>> +      https://raw.githubusercontent.com/wencongyang/colo-files/master/patch_for_qemu/*.patch
>> +    - make tools && make install-tools
>> +    Note: You must use qemu-xen. qemu-xen-traditional is not supported.
>
> Note that you will eventually need to upstream your changes to QEMU.

Sure, we have already posted the block patches to QEMU. It is under review now.
http://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg00399.html

>
>> +3. Install COLO proxy module:
>> +    3.1 Download COLO proxy, compile and install it:
>> +        https://github.com/gao-feng/colo-proxy.git
>> +    3.2 Download iptables patch, it is based on v1.4.21 compile and install it:
>> +        https://github.com/gao-feng/colo-proxy/blob/master/colo-patch-for-kernel.patch
>> +4. Install the guest
>> +    4.1 Add "xen_platform_pci=0" into the guest configfile
>> +    4.2 If you use suse, please select physical machine
>> +    4.3 copy the disk image to the secondary host
>> +5. Update your guest config file for COLO:
>> +    5.1 disk
>> +        disk = [
>> +        'format=raw,devtype=disk,access=w,vdev=hda,backendtype=qdisk,colo,colo-params=192.168.3.1:9000:exportname=qdisk1,active-disk=/mnt/ramfs/active_disk.img,hidden-disk=/mnt/ramfs/hidden_disk.img,target=/root/images/colo-hvm.img' ]
>
> It's unclear which parts are updated compared to the original config,
> i.e. can you list the additional bits to enable COLO? Presumably it's
> only those options starting with "colo"?

mostly, will update the doc to make it clear.

>
>> +    5.2 nic
>> +        vif = [ 'mac=00:16:4f:00:00:11, bridge=br0, model=e1000, forwarddev=eth0, forwardbr=br1' ]
>> +    Note:
>> +    a. The ip/port in colo-params is the secondary host's IP. Don't use the
>> +       directly connected nic's IP.
>> +    b. forwarddev is the directly connected nic.
>> +    c. If you have more than one disk, colo-params's host/port must be the same
>> +       and colo-param's exportname must be different.
>> +6. Run COLO:
>> +    xl remus -c -u <domname> <secondary host IP>
>> +    Note: The ip must not be the directly connected nic's IP.
>> +Note:
>> +Secondary host only need to do step 1-3.
>> +
>> +The known problem:
>> +1. Secondary vm may crash due to triple fault.
>> +2. The heartbeat is not reliable. If you want to test the performance,
>> +   please disable the heartbeat(modify the xen codes). You can use the
>> +   branch colo-v4-noheartbeat.
>> +3. Suspending the vm fails, and the error message is:
>> +    libxl: error: libxl_qmp.c:429:qmp_next: timeout
>> +
>> +Problem 1 and 3 don't happen every time. So you can run colo again to
>> +avoid this problem.
>> +
>> +Virtio-Net:
>> +1. If you want to get better performance, you can use virtio-net.
>> +
>> +Trouble shooting:
>> +If there's some error happend when staritng COLO, you can do:
>> +1. Make sure you have all necessary modules that DOM0 needed on both side.
>> +2. Make sure you have followed all the instructions in this README.
>> +3. Try to reboot both primary and secondary host.
>> +4. If you still have problems, collect the error logs and contact
>> +   Wen Congyang(wency@cn.fujitsu.com)/Yang Hongyang(yanghy@cn.fujitsu.com).
>
> After reading this whole document I think it should be a wiki page
> instead of an in-tree README.

Agreed.

>
> Wei.
>
>> --
>> 1.9.1
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH COLO v5 02/29] Refactor domain_suspend_callback_common()
  2015-04-08 18:11   ` Wei Liu
@ 2015-04-14  5:56     ` Wen Congyang
  0 siblings, 0 replies; 47+ messages in thread
From: Wen Congyang @ 2015-04-14  5:56 UTC (permalink / raw)
  To: Wei Liu, Yang Hongyang
  Cc: ian.campbell, ian.jackson, yunhong.jiang, eddie.dong, xen-devel,
	rshriram

On 04/09/2015 02:11 AM, Wei Liu wrote:
> On Wed, Apr 01, 2015 at 02:41:38PM +0800, Yang Hongyang wrote:
>> From: Wen Congyang <wency@cn.fujitsu.com>
>>
>> libxl__domain_suspend() is to save the guest. I think
>> we should call it libxl__domain_save(), but I don't
>> rename it.
>>
> 
> FWIW this is not public API so we have certain degree of liberty to
> rename it if the new name is deemed more appropriate.

OK

> 
>> Secondary vm is running in colo mode. So we will do
>> the following things again and again:
>> 1. suspend both primay vm and secondary vm
>> 2. sync the state
>> 3. resume both primary vm and secondary vm
>> To suspend secondary vm, we need an independent API to
>> suspend vm.
>>
> [...]
>>  
>> +/*
>> + * libxl__domain_suspend_state is for saving guest, not
>> + * for suspending guest. We need to an independent API
>> + * to suspend guest only.
>> + */
>> +struct libxl__domain_suspend_state2 {
>> +    /* set by caller of libxl__domain_suspend2 */
>> +    libxl__ao *ao;
>> +
>> +    uint32_t domid;
>> +    libxl__ev_evtchn guest_evtchn;;
>> +    int guest_evtchn_lockfd;
>> +    int hvm;
>> +    const char *dm_savefile;
>> +    void (*callback_common_done)(libxl__egc*,
>> +                                 libxl__domain_suspend_state2*, int ok);
>> +    int save_dm;
>> +    int guest_responded;
>> +    libxl__xswait_state pvcontrol;
>> +    libxl__ev_xswatch guest_watch;
>> +    libxl__ev_time guest_timeout;
>> +};
>> +
>>  struct libxl__domain_suspend_state {
>>      /* set by caller of libxl__domain_suspend */
>>      libxl__ao *ao;
>> @@ -2827,22 +2851,14 @@ struct libxl__domain_suspend_state {
>>      int debug;
>>      const libxl_domain_remus_info *remus;
>>      /* private */
>> -    libxl__ev_evtchn guest_evtchn;
>> -    int guest_evtchn_lockfd;
>> +    libxl__domain_suspend_state2 dss2;
>>      int hvm;
>>      int xcflags;
>> -    int guest_responded;
>> -    libxl__xswait_state pvcontrol;
>> -    libxl__ev_xswatch guest_watch;
>> -    libxl__ev_time guest_timeout;
>> -    const char *dm_savefile;
>>      libxl__remus_devices_state rds;
>>      libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
>>      int interval; /* checkpoint interval (for Remus) */
>>      libxl__save_helper_state shs;
>>      libxl__logdirty_switch logdirty;
>> -    void (*callback_common_done)(libxl__egc*,
>> -                                 struct libxl__domain_suspend_state*, int ok);
>>      /* private for libxl__domain_save_device_model */
>>      libxl__save_device_model_cb *save_dm_callback;
>>      libxl__datacopier_state save_dm_datacopier;
> 
> You moved dm_savefile to new struct but not these two fields, why?
> Since your suspend2 function is just a wrapper around
> domain_suspend_callback_common, it doesn't seem to need access to
> dm_savefile.

Hmm, there are two operations:
1. save qemu state to dm_savefile
2. transfer qemu state to dm_savefile

The function libxl__domain_suspend_device_model() does operation 1, and
the function libxl__domain_save_device_model() does operation 2.

The function libxl__domain_suspend_device_model() is called when we suspend
the guest, and it only uses the filed dm_savefile.

I think we should split it out: save qemu state to dm_savefile if the caller
needs to do it.

Thanks
Wen Congyang

> 
> Wei.
> 
>> @@ -3116,6 +3132,9 @@ struct libxl__domain_create_state {
>>  
>>  /*----- Domain suspend (save) functions -----*/
>>  
>> +/* calls dss2->callback_common_done when done */
>> +_hidden void libxl__domain_suspend2(libxl__egc *egc,
>> +                                    libxl__domain_suspend_state2 *dss2);
>>  /* calls dss->callback when done */
>>  _hidden void libxl__domain_suspend(libxl__egc *egc,
>>                                     libxl__domain_suspend_state *dss);
>> @@ -3155,7 +3174,7 @@ _hidden void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
>>  
>>  /* Each time the dm needs to be saved, we must call suspend and then save */
>>  _hidden int libxl__domain_suspend_device_model(libxl__gc *gc,
>> -                                           libxl__domain_suspend_state *dss);
>> +                                           libxl__domain_suspend_state2 *dss2);
>>  _hidden void libxl__domain_save_device_model(libxl__egc *egc,
>>                                       libxl__domain_suspend_state *dss,
>>                                       libxl__save_device_model_cb *callback);
>> -- 
>> 1.9.1
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
> .
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH COLO v5 03/29] tools: libxl: introduce a new API libxl__domain_restore() to read qemu state
  2015-04-08 18:11   ` Wei Liu
@ 2015-04-15 13:19     ` Ian Jackson
  0 siblings, 0 replies; 47+ messages in thread
From: Ian Jackson @ 2015-04-15 13:19 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, wency, yunhong.jiang, eddie.dong, xen-devel,
	rshriram, Yang Hongyang

Wei Liu writes ("Re: [RFC PATCH COLO v5 03/29] tools: libxl: introduce a new API libxl__domain_restore() to read qemu state"):
> On Wed, Apr 01, 2015 at 02:41:39PM +0800, Yang Hongyang wrote:
> > +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
> > +        /*
> > +         * This function may be called too many times for the same gc,
> > +         * so we use NOGC, and free the memory before return to avoid
> > +         * OOM.
> > +         */
> 
> Hmm...

I agree that this is doubtful.

The underlying problem is that in remus a migration send, as well as a
migration receive, are long-running.

I think more thought needs to be given to this area.  It may be
necessary to introduce a nested ao.

Ian.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH COLO v5 02/29] Refactor domain_suspend_callback_common()
  2015-04-01  6:41 ` [RFC PATCH COLO v5 02/29] Refactor domain_suspend_callback_common() Yang Hongyang
  2015-04-08 18:11   ` Wei Liu
@ 2015-04-22 14:45   ` Ian Campbell
  1 sibling, 0 replies; 47+ messages in thread
From: Ian Campbell @ 2015-04-22 14:45 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, wency, ian.jackson, yunhong.jiang, eddie.dong,
	xen-devel, rshriram

On Wed, 2015-04-01 at 14:41 +0800, Yang Hongyang wrote:
> The core function to suspend vm is domain_suspend_callback_common().
> So use a new structure libxl__domain_suspend_state2 to
> instead of libxl__domain_suspend_state. The dss's members that
> will be used in domain_suspend_callback_common() are
> moved to dss2.

Please can we try and find a more semantically meaningful name than
appending a 2 to everything.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH COLO v5 07/29] support to resume uncooperative HVM guests
  2015-04-01  6:41 ` [RFC PATCH COLO v5 07/29] support to resume uncooperative HVM guests Yang Hongyang
  2015-04-08 18:12   ` Wei Liu
@ 2015-04-22 14:54   ` Ian Campbell
  2015-04-23 12:08     ` Wen Congyang
  1 sibling, 1 reply; 47+ messages in thread
From: Ian Campbell @ 2015-04-22 14:54 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, wency, eddie.dong, yunhong.jiang, ian.jackson,
	xen-devel, rshriram

On Wed, 2015-04-01 at 14:41 +0800, Yang Hongyang wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
> 
> For PVHVM, the hypercall return code is 0, and it can be resumed
> in a new domain context.
> 
> For HVM, do nothing.
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
>  tools/libxc/xc_resume.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
> index e67bebd..b862ce3 100644
> --- a/tools/libxc/xc_resume.c
> +++ b/tools/libxc/xc_resume.c
> @@ -109,6 +109,21 @@ static int xc_domain_resume_cooperative(xc_interface *xch, uint32_t domid)
>      return do_domctl(xch, &domctl);
>  }
>  
> +static int xc_domain_resume_hvm(xc_interface *xch, uint32_t domid)
> +{
> +    DECLARE_DOMCTL;
> +
> +    /*
> +     * If it is PVHVM, the hypercall return code is 0, and resume
> +     * it in a new domain context.

What is a "new domain context"?

> +     *
> +     * If it is a HVM, do nothing.

If it is an HVM guest they you do something, you call a hypercall.

If that hypercall is somehow a NOP for an HVM domain then I think that
needs to be explained somewhere here.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH COLO v5 08/29] tools/libxl: Introduce bitops macros
  2015-04-01  6:41 ` [RFC PATCH COLO v5 08/29] tools/libxl: Introduce bitops macros Yang Hongyang
@ 2015-04-22 15:10   ` Ian Campbell
  2015-04-23 11:56     ` Wen Congyang
  0 siblings, 1 reply; 47+ messages in thread
From: Ian Campbell @ 2015-04-22 15:10 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, wency, eddie.dong, yunhong.jiang, ian.jackson,
	xen-devel, rshriram

On Wed, 2015-04-01 at 14:41 +0800, Yang Hongyang wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
> 
> This is the same set used by libxc.

What is this for?

libxl already exposes a fairly complete libxl_bitmap type and helpers
for use in its own interfaces and by its users.

For libxl's internal purposes (i.e. to interact with libxc) I don't see
why it can't just use xc_bitops.h.

Ian.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH COLO v5 11/29] adjust the indentation
  2015-04-01  6:41 ` [RFC PATCH COLO v5 11/29] adjust the indentation Yang Hongyang
@ 2015-04-22 15:20   ` Ian Campbell
  0 siblings, 0 replies; 47+ messages in thread
From: Ian Campbell @ 2015-04-22 15:20 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, wency, eddie.dong, yunhong.jiang, ian.jackson,
	xen-devel, rshriram

On Wed, 2015-04-01 at 14:41 +0800, Yang Hongyang wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>

I think this is just tidying up after the previous automatic renaming,
if that is the case please can you say so.

> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
>  tools/libxl/libxl_checkpoint_device.c | 23 ++++++++++++-----------
>  tools/libxl/libxl_internal.h          | 21 ++++++++++++---------
>  tools/libxl/libxl_remus.c             | 12 ++++++++----
>  3 files changed, 32 insertions(+), 24 deletions(-)
> 
> diff --git a/tools/libxl/libxl_checkpoint_device.c b/tools/libxl/libxl_checkpoint_device.c
> index 109cd23..0cfabc3 100644
> --- a/tools/libxl/libxl_checkpoint_device.c
> +++ b/tools/libxl/libxl_checkpoint_device.c
> @@ -73,9 +73,9 @@ static void devices_teardown_cb(libxl__egc *egc,
>  /* checkpoint device setup and teardown */
>  
>  static libxl__checkpoint_device* checkpoint_device_init(libxl__egc *egc,
> -                                              libxl__checkpoint_devices_state *cds,
> -                                              libxl__device_kind kind,
> -                                              void *libxl_dev)
> +                libxl__checkpoint_devices_state *cds,
> +                libxl__device_kind kind,
> +                void *libxl_dev)
>  {
>      libxl__checkpoint_device *dev = NULL;
>  
> @@ -89,9 +89,10 @@ static libxl__checkpoint_device* checkpoint_device_init(libxl__egc *egc,
>  }
>  
>  static void checkpoint_devices_setup(libxl__egc *egc,
> -                                libxl__checkpoint_devices_state *cds);
> +                                     libxl__checkpoint_devices_state *cds);
>  
> -void libxl__checkpoint_devices_setup(libxl__egc *egc, libxl__checkpoint_devices_state *cds)
> +void libxl__checkpoint_devices_setup(libxl__egc *egc,
> +                                     libxl__checkpoint_devices_state *cds)
>  {
>      int i, rc;
>  
> @@ -137,7 +138,7 @@ out:
>  }
>  
>  static void checkpoint_devices_setup(libxl__egc *egc,
> -                                libxl__checkpoint_devices_state *cds)
> +                                     libxl__checkpoint_devices_state *cds)
>  {
>      int i, rc;
>  
> @@ -223,7 +224,7 @@ static void all_devices_setup_cb(libxl__egc *egc,
>  }
>  
>  void libxl__checkpoint_devices_teardown(libxl__egc *egc,
> -                                   libxl__checkpoint_devices_state *cds)
> +                                        libxl__checkpoint_devices_state *cds)
>  {
>      int i;
>      libxl__checkpoint_device *dev;
> @@ -285,12 +286,12 @@ static void devices_checkpoint_cb(libxl__egc *egc,
>  
>  /* API implementations */
>  
> -#define define_checkpoint_api(api)                                \
> -void libxl__checkpoint_devices_##api(libxl__egc *egc,                        \
> -                                libxl__checkpoint_devices_state *cds)        \
> +#define define_checkpoint_api(api)                                      \
> +void libxl__checkpoint_devices_##api(libxl__egc *egc,                   \
> +                                libxl__checkpoint_devices_state *cds)   \
>  {                                                                       \
>      int i;                                                              \
> -    libxl__checkpoint_device *dev;                                           \
> +    libxl__checkpoint_device *dev;                                      \
>                                                                          \
>      STATE_AO_GC(cds->ao);                                               \
>                                                                          \
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 7e7c3b3..4b8590c 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -2652,7 +2652,8 @@ typedef struct libxl__save_helper_state {
>   * Each device type needs to implement the interfaces specified in
>   * the libxl__checkpoint_device_instance_ops if it wishes to support Remus.
>   *
> - * The high-level control flow through the checkpoint device layer is shown below:
> + * The high-level control flow through the checkpoint device layer is shown
> + * below:
>   *
>   * xl remus
>   *  |->  libxl_domain_remus_start
> @@ -2713,7 +2714,8 @@ int init_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
>  void cleanup_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
>  
>  typedef void libxl__checkpoint_callback(libxl__egc *,
> -                                   libxl__checkpoint_devices_state *, int rc);
> +                                        libxl__checkpoint_devices_state *,
> +                                        int rc);
>  
>  /*
>   * State associated with a checkpoint invocation, including parameters
> @@ -2721,7 +2723,7 @@ typedef void libxl__checkpoint_callback(libxl__egc *,
>   * save/restore machinery.
>   */
>  struct libxl__checkpoint_devices_state {
> -    /*---- must be set by caller of libxl__checkpoint_device_(setup|teardown) ----*/
> +    /*-- must be set by caller of libxl__checkpoint_device_(setup|teardown) --*/
>  
>      libxl__ao *ao;
>      uint32_t domid;
> @@ -2734,7 +2736,8 @@ struct libxl__checkpoint_devices_state {
>      /*
>       * this array is allocated before setup the checkpoint devices by the
>       * checkpoint abstract layer.
> -     * devs may be NULL, means there's no checkpoint devices that has been set up.
> +     * devs may be NULL, means there's no checkpoint devices that has been
> +     * set up.
>       * the size of this array is 'num_devices', which is the total number
>       * of libxl nic devices and disk devices(num_nics + num_disks).
>       */
> @@ -2794,15 +2797,15 @@ struct libxl__checkpoint_device {
>  
>  /* the following 5 APIs are async ops, call cds->callback when done */
>  _hidden void libxl__checkpoint_devices_setup(libxl__egc *egc,
> -                                        libxl__checkpoint_devices_state *cds);
> +                libxl__checkpoint_devices_state *cds);
>  _hidden void libxl__checkpoint_devices_teardown(libxl__egc *egc,
> -                                           libxl__checkpoint_devices_state *cds);
> +                libxl__checkpoint_devices_state *cds);
>  _hidden void libxl__checkpoint_devices_postsuspend(libxl__egc *egc,
> -                                              libxl__checkpoint_devices_state *cds);
> +                libxl__checkpoint_devices_state *cds);
>  _hidden void libxl__checkpoint_devices_preresume(libxl__egc *egc,
> -                                            libxl__checkpoint_devices_state *cds);
> +                libxl__checkpoint_devices_state *cds);
>  _hidden void libxl__checkpoint_devices_commit(libxl__egc *egc,
> -                                         libxl__checkpoint_devices_state *cds);
> +                libxl__checkpoint_devices_state *cds);
>  _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
>  
>  /*----- Domain suspend (save) state structure -----*/
> diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
> index 211216c..5afa618 100644
> --- a/tools/libxl/libxl_remus.c
> +++ b/tools/libxl/libxl_remus.c
> @@ -21,9 +21,11 @@
>  
>  /*----- remus: setup the environment -----*/
>  static void libxl__remus_setup_done(libxl__egc *egc,
> -                                    libxl__checkpoint_devices_state *cds, int rc);
> +                                    libxl__checkpoint_devices_state *cds,
> +                                    int rc);
>  static void libxl__remus_setup_failed(libxl__egc *egc,
> -                                      libxl__checkpoint_devices_state *cds, int rc);
> +                                      libxl__checkpoint_devices_state *cds,
> +                                      int rc);
>  
>  void libxl__remus_setup(libxl__egc *egc,
>                          libxl__domain_suspend_state *dss)
> @@ -57,7 +59,8 @@ out:
>  }
>  
>  static void libxl__remus_setup_done(libxl__egc *egc,
> -                                    libxl__checkpoint_devices_state *cds, int rc)
> +                                    libxl__checkpoint_devices_state *cds,
> +                                    int rc)
>  {
>      libxl__domain_suspend_state *dss = CONTAINER_OF(cds, *dss, cds);
>      STATE_AO_GC(dss->ao);
> @@ -222,7 +225,8 @@ out:
>  
>  /*----- remus: wait a new checkpoint -----*/
>  static void remus_checkpoint_dm_saved(libxl__egc *egc,
> -                                      libxl__domain_suspend_state *dss, int rc);
> +                                      libxl__domain_suspend_state *dss,
> +                                      int rc);
>  static void remus_devices_commit_cb(libxl__egc *egc,
>                                      libxl__checkpoint_devices_state *cds,
>                                      int rc);

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH COLO v5 08/29] tools/libxl: Introduce bitops macros
  2015-04-22 15:10   ` Ian Campbell
@ 2015-04-23 11:56     ` Wen Congyang
  0 siblings, 0 replies; 47+ messages in thread
From: Wen Congyang @ 2015-04-23 11:56 UTC (permalink / raw)
  To: Ian Campbell, Yang Hongyang
  Cc: wei.liu2, eddie.dong, yunhong.jiang, ian.jackson, xen-devel, rshriram

On 04/22/2015 11:10 PM, Ian Campbell wrote:
> On Wed, 2015-04-01 at 14:41 +0800, Yang Hongyang wrote:
>> From: Wen Congyang <wency@cn.fujitsu.com>
>>
>> This is the same set used by libxc.
> 
> What is this for?
> 
> libxl already exposes a fairly complete libxl_bitmap type and helpers
> for use in its own interfaces and by its users.
> 
> For libxl's internal purposes (i.e. to interact with libxc) I don't see
> why it can't just use xc_bitops.h.

I just need it to get/test the dirty page bitmaps. So libxl_bitmap cannot be
used directly.

If IIRC, we only can use the header file under the directory tools/libxc/include.
Is it OK to move xc_bitops.h to tools/libxc/include?

Thanks
Wen Congyang

> 
> Ian.
> 
> .
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH COLO v5 07/29] support to resume uncooperative HVM guests
  2015-04-22 14:54   ` Ian Campbell
@ 2015-04-23 12:08     ` Wen Congyang
  0 siblings, 0 replies; 47+ messages in thread
From: Wen Congyang @ 2015-04-23 12:08 UTC (permalink / raw)
  To: Ian Campbell, Yang Hongyang
  Cc: wei.liu2, eddie.dong, yunhong.jiang, ian.jackson, xen-devel, rshriram

On 04/22/2015 10:54 PM, Ian Campbell wrote:
> On Wed, 2015-04-01 at 14:41 +0800, Yang Hongyang wrote:
>> From: Wen Congyang <wency@cn.fujitsu.com>
>>
>> For PVHVM, the hypercall return code is 0, and it can be resumed
>> in a new domain context.
>>
>> For HVM, do nothing.
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> ---
>>  tools/libxc/xc_resume.c | 20 ++++++++++++++++----
>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
>> index e67bebd..b862ce3 100644
>> --- a/tools/libxc/xc_resume.c
>> +++ b/tools/libxc/xc_resume.c
>> @@ -109,6 +109,21 @@ static int xc_domain_resume_cooperative(xc_interface *xch, uint32_t domid)
>>      return do_domctl(xch, &domctl);
>>  }
>>  
>> +static int xc_domain_resume_hvm(xc_interface *xch, uint32_t domid)
>> +{
>> +    DECLARE_DOMCTL;
>> +
>> +    /*
>> +     * If it is PVHVM, the hypercall return code is 0, and resume
>> +     * it in a new domain context.
> 
> What is a "new domain context"?

Hmm, this patch is written some years ago. IIRC, we suspend PVHVM and
resume it is like this:
1. suspend it via evtchn
2. modifty the return code to 1
3. the guest know that the suspend is cancelled, we will use fast path
   to resume it.

But we may update the guest's state(modify memory, cpu's registers, device status...).
In this case, we cannot use the fast path to resume it. Keep the return code 0,
and use a very slow path to resume the guest. We have updated the guest
state, so I call it a new domain context...

> 
>> +     *
>> +     * If it is a HVM, do nothing.
> 
> If it is an HVM guest they you do something, you call a hypercall.
> 
> If that hypercall is somehow a NOP for an HVM domain then I think that
> needs to be explained somewhere here.

Hmm, IIRC, it is a NOP. For HVM, we only need that xc_domain_resume()
doesn't fail.

Thanks
Wen Congyang

> 
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH COLO v5 07/29] support to resume uncooperative HVM guests
  2015-04-08 18:12   ` Wei Liu
@ 2015-04-23 12:09     ` Wen Congyang
  0 siblings, 0 replies; 47+ messages in thread
From: Wen Congyang @ 2015-04-23 12:09 UTC (permalink / raw)
  To: Wei Liu, Yang Hongyang
  Cc: ian.campbell, ian.jackson, yunhong.jiang, eddie.dong, xen-devel,
	rshriram

On 04/09/2015 02:12 AM, Wei Liu wrote:
> On Wed, Apr 01, 2015 at 02:41:43PM +0800, Yang Hongyang wrote:
>> From: Wen Congyang <wency@cn.fujitsu.com>
>>
>> For PVHVM, the hypercall return code is 0, and it can be resumed
>> in a new domain context.
>>
>> For HVM, do nothing.
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> ---
>>  tools/libxc/xc_resume.c | 20 ++++++++++++++++----
>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
>> index e67bebd..b862ce3 100644
>> --- a/tools/libxc/xc_resume.c
>> +++ b/tools/libxc/xc_resume.c
>> @@ -109,6 +109,21 @@ static int xc_domain_resume_cooperative(xc_interface *xch, uint32_t domid)
>>      return do_domctl(xch, &domctl);
>>  }
>>  
>> +static int xc_domain_resume_hvm(xc_interface *xch, uint32_t domid)
>> +{
>> +    DECLARE_DOMCTL;
>> +
>> +    /*
>> +     * If it is PVHVM, the hypercall return code is 0, and resume
>> +     * it in a new domain context.
>> +     *
>> +     * If it is a HVM, do nothing.
>> +     */
>> +    domctl.cmd = XEN_DOMCTL_resumedomain;
>> +    domctl.domain = domid;
>> +    return do_domctl(xch, &domctl);
>> +}
>> +
>>  static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
>>  {
>>      DECLARE_DOMCTL;
>> @@ -138,10 +153,7 @@ static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
>>       */
>>  #if defined(__i386__) || defined(__x86_64__)
>>      if ( info.hvm )
>> -    {
>> -        ERROR("Cannot resume uncooperative HVM guests");
>> -        return rc;
>> -    }
>> +        return xc_domain_resume_hvm(xch, domid);
>>  
> 
> You're going to skip the rest of the function. I don't think this is
> right.

No, just use xc_domain_resume_hvm() to replace ERROR()...

Thanks
Wen Congyang

> 
> Wei.
> 
>>      if ( xc_domain_get_guest_width(xch, domid, &dinfo->guest_width) != 0 )
>>      {
>> -- 
>> 1.9.1
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
> .
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2015-04-23 12:09 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 01/29] Add readme Yang Hongyang
2015-04-08 18:11   ` Wei Liu
2015-04-14  4:06     ` Hongyang Yang
2015-04-01  6:41 ` [RFC PATCH COLO v5 02/29] Refactor domain_suspend_callback_common() Yang Hongyang
2015-04-08 18:11   ` Wei Liu
2015-04-14  5:56     ` Wen Congyang
2015-04-22 14:45   ` Ian Campbell
2015-04-01  6:41 ` [RFC PATCH COLO v5 03/29] tools: libxl: introduce a new API libxl__domain_restore() to read qemu state Yang Hongyang
2015-04-08 18:11   ` Wei Liu
2015-04-15 13:19     ` Ian Jackson
2015-04-01  6:41 ` [RFC PATCH COLO v5 04/29] Update libxl__domain_suspend_common_switch_qemu_logdirty() for colo Yang Hongyang
2015-04-08 18:12   ` Wei Liu
2015-04-01  6:41 ` [RFC PATCH COLO v5 05/29] Introduce a new internal API libxl__domain_unpause() Yang Hongyang
2015-04-08 18:12   ` Wei Liu
2015-04-01  6:41 ` [RFC PATCH COLO v5 06/29] Update libxl__domain_unpause() to support qemu-xen Yang Hongyang
2015-04-08 18:12   ` Wei Liu
2015-04-01  6:41 ` [RFC PATCH COLO v5 07/29] support to resume uncooperative HVM guests Yang Hongyang
2015-04-08 18:12   ` Wei Liu
2015-04-23 12:09     ` Wen Congyang
2015-04-22 14:54   ` Ian Campbell
2015-04-23 12:08     ` Wen Congyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 08/29] tools/libxl: Introduce bitops macros Yang Hongyang
2015-04-22 15:10   ` Ian Campbell
2015-04-23 11:56     ` Wen Congyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 09/29] move remus related codes to libxl_remus.c Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 10/29] rename remus device to checkpoint device Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 11/29] adjust the indentation Yang Hongyang
2015-04-22 15:20   ` Ian Campbell
2015-04-01  6:41 ` [RFC PATCH COLO v5 12/29] don't touch remus in checkpoint_device Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 13/29] Update libxl_save_msgs_gen.pl to support return data from xl to xc Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 14/29] Allow slave sends data to master Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 15/29] secondary vm suspend/resume/checkpoint code Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 16/29] primary vm suspend/get_dirty_pfn/resume/checkpoint code Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 17/29] xc_domain_save: flush cache before calling callbacks->postcopy() in colo mode Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 18/29] COLO: xc related codes Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 19/29] send store mfn and console mfn to xl before resuming secondary vm Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 20/29] implement the cmdline for COLO Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 21/29] tools: xc_doamin_restore: zero ioreq page only one time Yang Hongyang
2015-04-01  6:55 ` [RFC PATCH COLO v5 22/29] Support colo mode for qemu disk Yang Hongyang
2015-04-01  6:57 ` [RFC PATCH COLO v5 23/29] COLO: use qemu block replication Yang Hongyang
2015-04-01  6:57 ` [RFC PATCH COLO v5 24/29] COLO proxy: implement setup/teardown of COLO proxy module Yang Hongyang
2015-04-01  6:57 ` [RFC PATCH COLO v5 25/29] COLO proxy: preresume, postresume and checkpoint Yang Hongyang
2015-04-01  6:57 ` [RFC PATCH COLO v5 26/29] COLO nic: implement COLO nic subkind Yang Hongyang
2015-04-01  6:58 ` [RFC PATCH COLO v5 27/29] setup and control colo proxy on primary side Yang Hongyang
2015-04-01  6:58 ` [RFC PATCH COLO v5 28/29] setup and control colo proxy on secondary side Yang Hongyang
2015-04-01  6:58 ` [RFC PATCH COLO v5 29/29] cmdline switches and config vars to control colo-proxy Yang Hongyang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.