* [PATCH v7 00/18] Prerequisite patches for COLO
@ 2016-01-29  5:27 Wen Congyang
  2016-01-29  5:27 ` [PATCH v7 01/18] libxl/remus: init checkpoint_callback in Remus setup callback Wen Congyang
                   ` (18 more replies)
  0 siblings, 19 replies; 56+ messages in thread
From: Wen Congyang @ 2016-01-29  5:27 UTC
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Gui Jianfeng,
	Jiang Yunhong, Dong Eddie, Shriram Rajagopalan, Yang Hongyang

This patchset is a prerequisite for the COLO feature. Refer to:
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping

Patch status:
1. Acked patches: patch 2, 3, 4, 9, 10, 15, 16, 18
2. Reviewed patches: patch 1, 10, 13, 15, 16, 17, 18
3. New patches: none
Note: patch 4 is updated to fix a bug

You can get the code from here:
https://github.com/wencongyang/xen/tree/colo_pre_v7
The complete set of COLO-related patches is available here:
https://github.com/wencongyang/xen/tree/colo_v10

v6->v7:
 - Addressed comments from Konrad Rzeszutek Wilk

v5->v6:
 - Fixed some bugs found in testing

v4->v5:
 - Rebased to the latest xen
 - Addressed comments from last round

v3->v4:
 - Rebased to the latest migration v2 branch
 - Addressed comments from last round

v2->v3:
 - Merge '[PATCH v2 0/6] Misc cleanups for libxl' into this patchset
   for easy review
 - Addressed review comments
 - Add back channel to libxc
 - Introduce should_checkpoint callback
 - Introduce DIRTY_BITMAP record on libxc side
 - Introduce COLO_CONTEXT record on libxl side
 - Ported to Libxl migration v2

v1->v2:
 - Rebased to [PATCH v2 0/6] Misc cleanups for libxl
 - Add a bugfix for the error handling of process_record

Wen Congyang (18):
  libxl/remus: init checkpoint_callback in Remus setup callback
  tools/libxl: move remus code into libxl_remus.c
  tools/libxl: move save/restore code into libxl_dom_save.c
  libxl/save: Refactor libxl__domain_suspend_state
  tools/libxc: support to resume uncooperative HVM guests
  tools/libxl: introduce enum type libxl_checkpointed_stream
  migration/save: pass checkpointed_stream from libxl to libxc
  tools/libxl: introduce libxl__domain_restore_device_model to load qemu
    state
  tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty()
  tools/libxl: export logdirty_init
  tools/libxl: Add back channel to allow migration target send data back
  tools/libx{l,c}: add back channel to libxc
  tools/libxl: rename remus device to checkpoint device
  tools/libxl: fix backword compatibility after the automatic renaming
  tools/libxl: adjust the indentation
  tools/libxl: store remus_ops in checkpoint device state
  tools/libxl: move remus state into a seperate structure
  tools/libxl: seperate device init/cleanup from checkpoint device layer

 tools/libxc/include/xenguest.h        |   8 +-
 tools/libxc/xc_nomigrate.c            |   5 +-
 tools/libxc/xc_resume.c               |  25 +-
 tools/libxc/xc_sr_common.h            |  12 +-
 tools/libxc/xc_sr_restore.c           |   2 +-
 tools/libxc/xc_sr_save.c              |  18 +-
 tools/libxl/Makefile                  |   4 +-
 tools/libxl/libxl.c                   |  83 +---
 tools/libxl/libxl.h                   |  49 ++-
 tools/libxl/libxl_checkpoint_device.c | 282 +++++++++++++
 tools/libxl/libxl_create.c            |  50 +--
 tools/libxl/libxl_dom.c               | 740 ----------------------------------
 tools/libxl/libxl_dom_save.c          | 555 +++++++++++++++++++++++++
 tools/libxl/libxl_dom_suspend.c       | 207 ++++++----
 tools/libxl/libxl_internal.h          | 237 +++++++----
 tools/libxl/libxl_netbuffer.c         | 117 +++---
 tools/libxl/libxl_nonetbuffer.c       |  10 +-
 tools/libxl/libxl_qmp.c               |  10 +
 tools/libxl/libxl_remus.c             | 410 +++++++++++++++++++
 tools/libxl/libxl_remus_device.c      | 327 ---------------
 tools/libxl/libxl_remus_disk_drbd.c   |  56 +--
 tools/libxl/libxl_save_callout.c      |  43 +-
 tools/libxl/libxl_save_helper.c       |   9 +-
 tools/libxl/libxl_stream_read.c       |   7 +-
 tools/libxl/libxl_stream_write.c      |  18 +-
 tools/libxl/libxl_types.idl           |  10 +-
 tools/libxl/xl_cmdimpl.c              |  26 +-
 tools/ocaml/libs/xl/xenlight_stubs.c  |   2 +-
 28 files changed, 1836 insertions(+), 1486 deletions(-)
 create mode 100644 tools/libxl/libxl_checkpoint_device.c
 create mode 100644 tools/libxl/libxl_dom_save.c
 create mode 100644 tools/libxl/libxl_remus.c
 delete mode 100644 tools/libxl/libxl_remus_device.c

-- 
2.5.0


* [PATCH v7 01/18] libxl/remus: init checkpoint_callback in Remus setup callback
  2016-01-29  5:27 [PATCH v7 00/18] Prerequisite patches for COLO Wen Congyang
@ 2016-01-29  5:27 ` Wen Congyang
  2016-02-03 19:39   ` Wei Liu
  2016-01-29  5:27 ` [PATCH v7 02/18] tools/libxl: move remus code into libxl_remus.c Wen Congyang
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Wen Congyang @ 2016-01-29  5:27 UTC
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Ian Campbell, Wen Congyang,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Shriram Rajagopalan,
	Ian Jackson, Yang Hongyang

Initialize the checkpoint_callback of the stream {read/write} state in
the Remus setup callback. There is no functional change; this is just
refactoring so that we can move all Remus code into one file.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/libxl/libxl.c          |  2 ++
 tools/libxl/libxl_create.c   | 10 +++++++++-
 tools/libxl/libxl_dom.c      |  5 +----
 tools/libxl/libxl_internal.h |  4 ++++
 4 files changed, 16 insertions(+), 5 deletions(-)
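
Note (not part of the commit): the restore-side hunk below installs the
checkpoint callback from a dedicated setup function, and only when the
stream is checkpointed. A minimal standalone sketch of that pattern
follows. All struct and function names in it are hypothetical stand-ins
for libxl__domain_create_state, libxl__stream_read_state and
libxl__remus_restore_setup; it is not the libxl code itself.

```c
#include <stddef.h>

/* Hypothetical stand-in for libxl__stream_read_state. */
struct stream_state {
    void (*checkpoint_callback)(struct stream_state *s, int rc);
    int last_rc;                 /* records what the callback received */
};

/* Hypothetical stand-in for libxl__domain_create_state. */
struct create_state {
    int checkpointed_stream;     /* non-zero for a checkpointed (Remus) stream */
    struct stream_state srs;
};

/* Plays the role of remus_checkpoint_stream_done. */
void checkpoint_done(struct stream_state *s, int rc)
{
    s->last_rc = rc;
}

/* Plays the role of libxl__remus_restore_setup: the only place
 * that knows which callback a Remus restore needs. */
void remus_restore_setup(struct create_state *dcs)
{
    dcs->srs.checkpoint_callback = checkpoint_done;
}

/* Generic restore path: it no longer names the Remus callback,
 * it only decides whether Remus setup should run at all. */
void restore_start(struct create_state *dcs)
{
    if (dcs->checkpointed_stream)
        remus_restore_setup(dcs);
}
```

With this shape the generic path carries no reference to Remus
internals, which is what allows the Remus functions to move to their own
file later in the series.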

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 94b5656..5346a0c 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -917,6 +917,8 @@ static void libxl__remus_setup(libxl__egc *egc,
     rds->domid = dss->domid;
     rds->callback = remus_setup_done;
 
+    dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
+
     libxl__remus_devices_setup(egc, rds);
     return;
 
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index e491d83..8b1efe5 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -718,6 +718,12 @@ static void remus_checkpoint_stream_done(
     libxl__xc_domain_saverestore_async_callback_done(egc, &stream->shs, rc);
 }
 
+static void libxl__remus_restore_setup(libxl__egc *egc,
+                                       libxl__domain_create_state *dcs)
+{
+    dcs->srs.checkpoint_callback = remus_checkpoint_stream_done;
+}
+
 /*----- main domain creation -----*/
 
 /* We have a linear control flow; only one event callback is
@@ -1004,6 +1010,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     libxl__domain_build_state *const state = &dcs->build_state;
     libxl__srm_restore_autogen_callbacks *const callbacks =
         &dcs->srs.shs.callbacks.restore.a;
+    const int checkpointed_stream = dcs->restore_params.checkpointed_stream;
 
     if (rc) {
         domcreate_rebuild_done(egc, dcs, rc);
@@ -1042,9 +1049,10 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     dcs->srs.fd = restore_fd;
     dcs->srs.legacy = (dcs->restore_params.stream_version == 1);
     dcs->srs.completion_callback = domcreate_stream_done;
-    dcs->srs.checkpoint_callback = remus_checkpoint_stream_done;
 
     if (restore_fd >= 0) {
+        if (checkpointed_stream)
+            libxl__remus_restore_setup(egc, dcs);
         libxl__stream_read_start(egc, &dcs->srs);
         return;
     }
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 2269998..9e28bc4 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1569,8 +1569,6 @@ out:
 
 /*----- remus asynchronous checkpoint callback -----*/
 
-static void remus_checkpoint_stream_written(
-    libxl__egc *egc, libxl__stream_write_state *sws, int rc);
 static void remus_devices_commit_cb(libxl__egc *egc,
                                     libxl__remus_devices_state *rds,
                                     int rc);
@@ -1588,7 +1586,7 @@ static void libxl__remus_domain_save_checkpoint_callback(void *data)
     libxl__stream_write_start_checkpoint(egc, &dss->sws);
 }
 
-static void remus_checkpoint_stream_written(
+void remus_checkpoint_stream_written(
     libxl__egc *egc, libxl__stream_write_state *sws, int rc)
 {
     libxl__domain_suspend_state *dss = CONTAINER_OF(sws, *dss, sws);
@@ -1761,7 +1759,6 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss)
         callbacks->suspend = libxl__remus_domain_suspend_callback;
         callbacks->postcopy = libxl__remus_domain_resume_callback;
         callbacks->checkpoint = libxl__remus_domain_save_checkpoint_callback;
-        dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
     } else
         callbacks->suspend = libxl__domain_suspend_callback;
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index fc1b558..abc0eac 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3506,6 +3506,10 @@ _hidden void libxl__domain_suspend(libxl__egc *egc,
 /* used by libxc to suspend the guest during migration */
 _hidden void libxl__domain_suspend_callback(void *data);
 
+/* Remus callbacks for restore */
+_hidden void remus_checkpoint_stream_written(
+    libxl__egc *egc, libxl__stream_write_state *sws, int rc);
+
 
 /*
  * Convenience macros.
-- 
2.5.0


* [PATCH v7 02/18] tools/libxl: move remus code into libxl_remus.c
  2016-01-29  5:27 [PATCH v7 00/18] Prerequisite patches for COLO Wen Congyang
  2016-01-29  5:27 ` [PATCH v7 01/18] libxl/remus: init checkpoint_callback in Remus setup callback Wen Congyang
@ 2016-01-29  5:27 ` Wen Congyang
  2016-01-29 16:29   ` Konrad Rzeszutek Wilk
  2016-02-03 19:39   ` Wei Liu
  2016-01-29  5:27 ` [PATCH v7 03/18] tools/libxl: move save/restore code into libxl_dom_save.c Wen Congyang
                   ` (16 subsequent siblings)
  18 siblings, 2 replies; 56+ messages in thread
From: Wen Congyang @ 2016-01-29  5:27 UTC
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Ian Campbell, Wen Congyang,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Shriram Rajagopalan,
	Ian Jackson, Yang Hongyang

After the previous refactoring, we are now able to move all Remus code
into a separate file, libxl_remus.c.

Export the following functions for internal use:
- Remus callbacks
  * libxl__remus_domain_suspend_callback
  * libxl__remus_domain_resume_callback
  * libxl__remus_domain_save_checkpoint_callback
  * libxl__remus_domain_restore_checkpoint_callback
- setup/teardown Remus:
  * libxl__remus_setup
  * libxl__remus_teardown

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 tools/libxl/Makefile         |   2 +-
 tools/libxl/libxl.c          |  69 ---------
 tools/libxl/libxl_create.c   |  27 ----
 tools/libxl/libxl_dom.c      | 223 ---------------------------
 tools/libxl/libxl_internal.h |  15 +-
 tools/libxl/libxl_remus.c    | 348 +++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 362 insertions(+), 322 deletions(-)
 create mode 100644 tools/libxl/libxl_remus.c
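
Note (not part of the commit): the moved functions share one pattern.
Each callback receives the embedded device state, recovers the enclosing
suspend state from it, and on failure swaps the callback before starting
teardown. A minimal standalone sketch of that pattern follows. The
container_of macro here is the common offsetof idiom rather than libxl's
CONTAINER_OF, and all struct and function names are hypothetical
simplifications. The teardown is assumed to complete synchronously with
rc 0, which the real asynchronous code does not guarantee.

```c
#include <stddef.h>

/* Common idiom: recover the outer struct from a pointer to a member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for libxl__remus_devices_state. */
struct devices_state {
    void (*callback)(struct devices_state *rds, int rc);
};

/* Hypothetical stand-in for libxl__domain_suspend_state. */
struct suspend_state {
    unsigned domid;
    int saving;                  /* set once the save would be started */
    int torn_down;               /* set once teardown has completed */
    struct devices_state rds;    /* embedded, as in the real struct */
};

/* Runs after teardown that followed a failed setup. */
void setup_failed(struct devices_state *rds, int rc)
{
    struct suspend_state *dss = container_of(rds, struct suspend_state, rds);
    (void)rc;
    dss->torn_down = 1;
}

/* Simulated teardown: completes at once and reports success. */
void devices_teardown(struct devices_state *rds)
{
    rds->callback(rds, 0);
}

/* Runs after device setup: start saving, or unwind on error. */
void setup_done(struct devices_state *rds, int rc)
{
    struct suspend_state *dss = container_of(rds, struct suspend_state, rds);

    if (!rc) {
        dss->saving = 1;
        return;
    }
    rds->callback = setup_failed;   /* swap callback before teardown */
    devices_teardown(rds);
}
```

Because every callback starts from the embedded state, the functions
need no shared file-local variables, so they can be relocated together
without changing behavior.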

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 620720e..7d64ecc 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -64,7 +64,7 @@ else
 LIBXL_OBJS-y += libxl_no_convert_callout.o
 endif
 
-LIBXL_OBJS-y += libxl_remus_device.o libxl_remus_disk_drbd.o
+LIBXL_OBJS-y += libxl_remus.o libxl_remus_device.o libxl_remus_disk_drbd.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 5346a0c..6347097 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -831,12 +831,6 @@ out:
     return ptr;
 }
 
-static void libxl__remus_setup(libxl__egc *egc,
-                               libxl__domain_suspend_state *dss);
-static void remus_setup_done(libxl__egc *egc,
-                             libxl__remus_devices_state *rds, int rc);
-static void remus_setup_failed(libxl__egc *egc,
-                               libxl__remus_devices_state *rds, int rc);
 static void remus_failover_cb(libxl__egc *egc,
                               libxl__domain_suspend_state *dss, int rc);
 
@@ -893,69 +887,6 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     return AO_CREATE_FAIL(rc);
 }
 
-static void libxl__remus_setup(libxl__egc *egc,
-                               libxl__domain_suspend_state *dss)
-{
-    /* Convenience aliases */
-    libxl__remus_devices_state *const rds = &dss->rds;
-    const libxl_domain_remus_info *const info = dss->remus;
-
-    STATE_AO_GC(dss->ao);
-
-    if (libxl_defbool_val(info->netbuf)) {
-        if (!libxl__netbuffer_enabled(gc)) {
-            LOG(ERROR, "Remus: No support for network buffering");
-            goto out;
-        }
-        rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
-    }
-
-    if (libxl_defbool_val(info->diskbuf))
-        rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VBD);
-
-    rds->ao = ao;
-    rds->domid = dss->domid;
-    rds->callback = remus_setup_done;
-
-    dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
-
-    libxl__remus_devices_setup(egc, rds);
-    return;
-
-out:
-    dss->callback(egc, dss, ERROR_FAIL);
-}
-
-static void remus_setup_done(libxl__egc *egc,
-                             libxl__remus_devices_state *rds, int rc)
-{
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
-    STATE_AO_GC(dss->ao);
-
-    if (!rc) {
-        libxl__domain_save(egc, dss);
-        return;
-    }
-
-    LOG(ERROR, "Remus: failed to setup device for guest with domid %u, rc %d",
-        dss->domid, rc);
-    rds->callback = remus_setup_failed;
-    libxl__remus_devices_teardown(egc, rds);
-}
-
-static void remus_setup_failed(libxl__egc *egc,
-                               libxl__remus_devices_state *rds, int rc)
-{
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
-    STATE_AO_GC(dss->ao);
-
-    if (rc)
-        LOG(ERROR, "Remus: failed to teardown device after setup failed"
-            " for guest with domid %u, rc %d", dss->domid, rc);
-
-    dss->callback(egc, dss, rc);
-}
-
 static void remus_failover_cb(libxl__egc *egc,
                               libxl__domain_suspend_state *dss, int rc)
 {
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 8b1efe5..6f1cf93 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -697,33 +697,6 @@ static int store_libxl_entry(libxl__gc *gc, uint32_t domid,
                             libxl_device_model_version_to_string(b_info->device_model_version));
 }
 
-/*----- remus asynchronous checkpoint callback -----*/
-
-static void remus_checkpoint_stream_done(
-    libxl__egc *egc, libxl__stream_read_state *srs, int rc);
-
-static void libxl__remus_domain_restore_checkpoint_callback(void *data)
-{
-    libxl__save_helper_state *shs = data;
-    libxl__domain_create_state *dcs = shs->caller_state;
-    libxl__egc *egc = shs->egc;
-    STATE_AO_GC(dcs->ao);
-
-    libxl__stream_read_start_checkpoint(egc, &dcs->srs);
-}
-
-static void remus_checkpoint_stream_done(
-    libxl__egc *egc, libxl__stream_read_state *stream, int rc)
-{
-    libxl__xc_domain_saverestore_async_callback_done(egc, &stream->shs, rc);
-}
-
-static void libxl__remus_restore_setup(libxl__egc *egc,
-                                       libxl__domain_create_state *dcs)
-{
-    dcs->srs.checkpoint_callback = remus_checkpoint_stream_done;
-}
-
 /*----- main domain creation -----*/
 
 /* We have a linear control flow; only one event callback is
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 9e28bc4..81bd464 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1479,196 +1479,6 @@ int libxl__save_emulator_xenstore_data(libxl__domain_suspend_state *dss,
     return rc;
 }
 
-/*----- remus callbacks -----*/
-static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss, int ok);
-static void remus_devices_postsuspend_cb(libxl__egc *egc,
-                                         libxl__remus_devices_state *rds,
-                                         int rc);
-static void remus_devices_preresume_cb(libxl__egc *egc,
-                                       libxl__remus_devices_state *rds,
-                                       int rc);
-
-static void libxl__remus_domain_suspend_callback(void *data)
-{
-    libxl__save_helper_state *shs = data;
-    libxl__egc *egc = shs->egc;
-    libxl__domain_suspend_state *dss = shs->caller_state;
-
-    dss->callback_common_done = remus_domain_suspend_callback_common_done;
-    libxl__domain_suspend(egc, dss);
-}
-
-static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss, int rc)
-{
-    if (rc)
-        goto out;
-
-    libxl__remus_devices_state *const rds = &dss->rds;
-    rds->callback = remus_devices_postsuspend_cb;
-    libxl__remus_devices_postsuspend(egc, rds);
-    return;
-
-out:
-    dss->rc = rc;
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
-}
-
-static void remus_devices_postsuspend_cb(libxl__egc *egc,
-                                         libxl__remus_devices_state *rds,
-                                         int rc)
-{
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
-
-    if (rc)
-        goto out;
-
-    rc = 0;
-
-out:
-    if (rc)
-        dss->rc = rc;
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
-}
-
-static void libxl__remus_domain_resume_callback(void *data)
-{
-    libxl__save_helper_state *shs = data;
-    libxl__egc *egc = shs->egc;
-    libxl__domain_suspend_state *dss = shs->caller_state;
-    STATE_AO_GC(dss->ao);
-
-    libxl__remus_devices_state *const rds = &dss->rds;
-    rds->callback = remus_devices_preresume_cb;
-    libxl__remus_devices_preresume(egc, rds);
-}
-
-static void remus_devices_preresume_cb(libxl__egc *egc,
-                                       libxl__remus_devices_state *rds,
-                                       int rc)
-{
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
-    STATE_AO_GC(dss->ao);
-
-    if (rc)
-        goto out;
-
-    /* Resumes the domain and the device model */
-    rc = libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1);
-    if (rc)
-        goto out;
-
-    rc = 0;
-
-out:
-    if (rc)
-        dss->rc = rc;
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
-}
-
-/*----- remus asynchronous checkpoint callback -----*/
-
-static void remus_devices_commit_cb(libxl__egc *egc,
-                                    libxl__remus_devices_state *rds,
-                                    int rc);
-static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
-                                  const struct timeval *requested_abs,
-                                  int rc);
-
-static void libxl__remus_domain_save_checkpoint_callback(void *data)
-{
-    libxl__save_helper_state *shs = data;
-    libxl__domain_suspend_state *dss = shs->caller_state;
-    libxl__egc *egc = shs->egc;
-    STATE_AO_GC(dss->ao);
-
-    libxl__stream_write_start_checkpoint(egc, &dss->sws);
-}
-
-void remus_checkpoint_stream_written(
-    libxl__egc *egc, libxl__stream_write_state *sws, int rc)
-{
-    libxl__domain_suspend_state *dss = CONTAINER_OF(sws, *dss, sws);
-
-    /* Convenience aliases */
-    libxl__remus_devices_state *const rds = &dss->rds;
-
-    STATE_AO_GC(dss->ao);
-
-    if (rc) {
-        LOG(ERROR, "Failed to save device model. Terminating Remus..");
-        goto out;
-    }
-
-    rds->callback = remus_devices_commit_cb;
-    libxl__remus_devices_commit(egc, rds);
-
-    return;
-
-out:
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 0);
-}
-
-static void remus_devices_commit_cb(libxl__egc *egc,
-                                    libxl__remus_devices_state *rds,
-                                    int rc)
-{
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
-
-    STATE_AO_GC(dss->ao);
-
-    if (rc) {
-        LOG(ERROR, "Failed to do device commit op."
-            " Terminating Remus..");
-        goto out;
-    }
-
-    /*
-     * At this point, we have successfully checkpointed the guest and
-     * committed it at the backup. We'll come back after the checkpoint
-     * interval to checkpoint the guest again. Until then, let the guest
-     * continue execution.
-     */
-
-    /* Set checkpoint interval timeout */
-    rc = libxl__ev_time_register_rel(ao, &dss->checkpoint_timeout,
-                                     remus_next_checkpoint,
-                                     dss->interval);
-
-    if (rc)
-        goto out;
-
-    return;
-
-out:
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 0);
-}
-
-static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
-                                  const struct timeval *requested_abs,
-                                  int rc)
-{
-    libxl__domain_suspend_state *dss =
-                            CONTAINER_OF(ev, *dss, checkpoint_timeout);
-
-    STATE_AO_GC(dss->ao);
-
-    if (rc == ERROR_TIMEDOUT) /* As intended */
-        rc = 0;
-
-    /*
-     * Time to checkpoint the guest again. We return 1 to libxc
-     * (xc_domain_save.c). in order to continue executing the infinite loop
-     * (suspend, checkpoint, resume) in xc_domain_save().
-     */
-
-    if (rc)
-        dss->rc = rc;
-
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
-}
-
 /*----- main code for saving, in order of execution -----*/
 
 void libxl__domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss)
@@ -1782,13 +1592,6 @@ static void stream_done(libxl__egc *egc,
     domain_save_done(egc, sws->dss, rc);
 }
 
-static void libxl__remus_teardown(libxl__egc *egc,
-                                  libxl__domain_suspend_state *dss,
-                                  int rc);
-static void remus_teardown_done(libxl__egc *egc,
-                                       libxl__remus_devices_state *rds,
-                                       int rc);
-
 static void domain_save_done(libxl__egc *egc,
                              libxl__domain_suspend_state *dss, int rc)
 {
@@ -1817,32 +1620,6 @@ static void domain_save_done(libxl__egc *egc,
     dss->callback(egc, dss, rc);
 }
 
-static void libxl__remus_teardown(libxl__egc *egc,
-                                  libxl__domain_suspend_state *dss,
-                                  int rc)
-{
-    EGC_GC;
-
-    LOG(WARN, "Remus: Domain suspend terminated with rc %d,"
-        " teardown Remus devices...", rc);
-    dss->rds.callback = remus_teardown_done;
-    libxl__remus_devices_teardown(egc, &dss->rds);
-}
-
-static void remus_teardown_done(libxl__egc *egc,
-                                       libxl__remus_devices_state *rds,
-                                       int rc)
-{
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
-    STATE_AO_GC(dss->ao);
-
-    if (rc)
-        LOG(ERROR, "Remus: failed to teardown device for guest with domid %u,"
-            " rc %d", dss->domid, rc);
-
-    dss->callback(egc, dss, rc);
-}
-
 /*==================== Miscellaneous ====================*/
 
 char *libxl__uuid2string(libxl__gc *gc, const libxl_uuid uuid)
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index abc0eac..7005d6b 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3506,9 +3506,20 @@ _hidden void libxl__domain_suspend(libxl__egc *egc,
 /* used by libxc to suspend the guest during migration */
 _hidden void libxl__domain_suspend_callback(void *data);
 
+/* Remus callbacks for save */
+_hidden void libxl__remus_domain_suspend_callback(void *data);
+_hidden void libxl__remus_domain_resume_callback(void *data);
+_hidden void libxl__remus_domain_save_checkpoint_callback(void *data);
+/* Remus setup and teardown*/
+_hidden void libxl__remus_setup(libxl__egc *egc,
+                                libxl__domain_suspend_state *dss);
+_hidden void libxl__remus_teardown(libxl__egc *egc,
+                                   libxl__domain_suspend_state *dss,
+                                   int rc);
 /* Remus callbacks for restore */
-_hidden void remus_checkpoint_stream_written(
-    libxl__egc *egc, libxl__stream_write_state *sws, int rc);
+_hidden void libxl__remus_domain_restore_checkpoint_callback(void *data);
+_hidden void libxl__remus_restore_setup(libxl__egc *egc,
+                                        libxl__domain_create_state *dcs);
 
 
 /*
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
new file mode 100644
index 0000000..e3caf7d
--- /dev/null
+++ b/tools/libxl/libxl_remus.c
@@ -0,0 +1,348 @@
+/*
+ * Copyright (C) 2009      Citrix Ltd.
+ * Author Vincent Hanquez <vincent.hanquez@eu.citrix.com>
+ *        Yang Hongyang <hongyang.yang@easystack.cn>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+/*-------------------- Remus setup and teardown ---------------------*/
+
+static void remus_setup_done(libxl__egc *egc,
+                             libxl__remus_devices_state *rds, int rc);
+static void remus_setup_failed(libxl__egc *egc,
+                               libxl__remus_devices_state *rds, int rc);
+static void remus_checkpoint_stream_written(
+    libxl__egc *egc, libxl__stream_write_state *sws, int rc);
+
+void libxl__remus_setup(libxl__egc *egc,
+                        libxl__domain_suspend_state *dss)
+{
+    /* Convenience aliases */
+    libxl__remus_devices_state *const rds = &dss->rds;
+    const libxl_domain_remus_info *const info = dss->remus;
+
+    STATE_AO_GC(dss->ao);
+
+    if (libxl_defbool_val(info->netbuf)) {
+        if (!libxl__netbuffer_enabled(gc)) {
+            LOG(ERROR, "Remus: No support for network buffering");
+            goto out;
+        }
+        rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
+    }
+
+    if (libxl_defbool_val(info->diskbuf))
+        rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VBD);
+
+    rds->ao = ao;
+    rds->domid = dss->domid;
+    rds->callback = remus_setup_done;
+
+    dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
+
+    libxl__remus_devices_setup(egc, rds);
+    return;
+
+out:
+    dss->callback(egc, dss, ERROR_FAIL);
+}
+
+static void remus_setup_done(libxl__egc *egc,
+                             libxl__remus_devices_state *rds, int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    STATE_AO_GC(dss->ao);
+
+    if (!rc) {
+        libxl__domain_save(egc, dss);
+        return;
+    }
+
+    LOG(ERROR, "Remus: failed to setup device for guest with domid %u, rc %d",
+        dss->domid, rc);
+    rds->callback = remus_setup_failed;
+    libxl__remus_devices_teardown(egc, rds);
+}
+
+static void remus_setup_failed(libxl__egc *egc,
+                               libxl__remus_devices_state *rds, int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    STATE_AO_GC(dss->ao);
+
+    if (rc)
+        LOG(ERROR, "Remus: failed to teardown device after setup failed"
+            " for guest with domid %u, rc %d", dss->domid, rc);
+
+    dss->callback(egc, dss, rc);
+}
+
+static void remus_teardown_done(libxl__egc *egc,
+                                libxl__remus_devices_state *rds,
+                                int rc);
+void libxl__remus_teardown(libxl__egc *egc,
+                           libxl__domain_suspend_state *dss,
+                           int rc)
+{
+    EGC_GC;
+
+    LOG(WARN, "Remus: Domain suspend terminated with rc %d,"
+        " teardown Remus devices...", rc);
+    dss->rds.callback = remus_teardown_done;
+    libxl__remus_devices_teardown(egc, &dss->rds);
+}
+
+static void remus_teardown_done(libxl__egc *egc,
+                                libxl__remus_devices_state *rds,
+                                int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    STATE_AO_GC(dss->ao);
+
+    if (rc)
+        LOG(ERROR, "Remus: failed to teardown device for guest with domid %u,"
+            " rc %d", dss->domid, rc);
+
+    dss->callback(egc, dss, rc);
+}
+
+/*---------------------- remus callbacks (save) -----------------------*/
+
+static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
+                                libxl__domain_suspend_state *dss, int ok);
+static void remus_devices_postsuspend_cb(libxl__egc *egc,
+                                         libxl__remus_devices_state *rds,
+                                         int rc);
+static void remus_devices_preresume_cb(libxl__egc *egc,
+                                       libxl__remus_devices_state *rds,
+                                       int rc);
+
+void libxl__remus_domain_suspend_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__egc *egc = shs->egc;
+    libxl__domain_suspend_state *dss = shs->caller_state;
+
+    dss->callback_common_done = remus_domain_suspend_callback_common_done;
+    libxl__domain_suspend(egc, dss);
+}
+
+static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
+                                libxl__domain_suspend_state *dss, int rc)
+{
+    if (rc)
+        goto out;
+
+    libxl__remus_devices_state *const rds = &dss->rds;
+    rds->callback = remus_devices_postsuspend_cb;
+    libxl__remus_devices_postsuspend(egc, rds);
+    return;
+
+out:
+    dss->rc = rc;
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
+}
+
+static void remus_devices_postsuspend_cb(libxl__egc *egc,
+                                         libxl__remus_devices_state *rds,
+                                         int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+
+    if (rc)
+        goto out;
+
+    rc = 0;
+
+out:
+    if (rc)
+        dss->rc = rc;
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
+}
+
+void libxl__remus_domain_resume_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__egc *egc = shs->egc;
+    libxl__domain_suspend_state *dss = shs->caller_state;
+    STATE_AO_GC(dss->ao);
+
+    libxl__remus_devices_state *const rds = &dss->rds;
+    rds->callback = remus_devices_preresume_cb;
+    libxl__remus_devices_preresume(egc, rds);
+}
+
+static void remus_devices_preresume_cb(libxl__egc *egc,
+                                       libxl__remus_devices_state *rds,
+                                       int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    STATE_AO_GC(dss->ao);
+
+    if (rc)
+        goto out;
+
+    /* Resumes the domain and the device model */
+    rc = libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1);
+    if (rc)
+        goto out;
+
+    rc = 0;
+
+out:
+    if (rc)
+        dss->rc = rc;
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
+}
+
+/*----- remus asynchronous checkpoint callback -----*/
+
+static void remus_devices_commit_cb(libxl__egc *egc,
+                                    libxl__remus_devices_state *rds,
+                                    int rc);
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs,
+                                  int rc);
+
+void libxl__remus_domain_save_checkpoint_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__domain_suspend_state *dss = shs->caller_state;
+    libxl__egc *egc = shs->egc;
+    STATE_AO_GC(dss->ao);
+
+    libxl__stream_write_start_checkpoint(egc, &dss->sws);
+}
+
+static void remus_checkpoint_stream_written(
+    libxl__egc *egc, libxl__stream_write_state *sws, int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(sws, *dss, sws);
+
+    /* Convenience aliases */
+    libxl__remus_devices_state *const rds = &dss->rds;
+
+    STATE_AO_GC(dss->ao);
+
+    if (rc) {
+        LOG(ERROR, "Failed to save device model. Terminating Remus..");
+        goto out;
+    }
+
+    rds->callback = remus_devices_commit_cb;
+    libxl__remus_devices_commit(egc, rds);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 0);
+}
+
+static void remus_devices_commit_cb(libxl__egc *egc,
+                                    libxl__remus_devices_state *rds,
+                                    int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+
+    STATE_AO_GC(dss->ao);
+
+    if (rc) {
+        LOG(ERROR, "Failed to do device commit op."
+            " Terminating Remus..");
+        goto out;
+    }
+
+    /*
+     * At this point, we have successfully checkpointed the guest and
+     * committed it at the backup. We'll come back after the checkpoint
+     * interval to checkpoint the guest again. Until then, let the guest
+     * continue execution.
+     */
+
+    /* Set checkpoint interval timeout */
+    rc = libxl__ev_time_register_rel(ao, &dss->checkpoint_timeout,
+                                     remus_next_checkpoint,
+                                     dss->interval);
+
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 0);
+}
+
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs,
+                                  int rc)
+{
+    libxl__domain_suspend_state *dss =
+                            CONTAINER_OF(ev, *dss, checkpoint_timeout);
+
+    STATE_AO_GC(dss->ao);
+
+    if (rc == ERROR_TIMEDOUT) /* As intended */
+        rc = 0;
+
+    /*
+     * Time to checkpoint the guest again. We return 1 to libxc
+     * (xc_domain_save.c). in order to continue executing the infinite loop
+     * (suspend, checkpoint, resume) in xc_domain_save().
+     */
+
+    if (rc)
+        dss->rc = rc;
+
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
+}
+
+/*---------------------- remus callbacks (restore) -----------------------*/
+
+/*----- remus asynchronous checkpoint callback -----*/
+
+static void remus_checkpoint_stream_done(
+    libxl__egc *egc, libxl__stream_read_state *srs, int rc);
+
+void libxl__remus_domain_restore_checkpoint_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__domain_create_state *dcs = shs->caller_state;
+    libxl__egc *egc = shs->egc;
+    STATE_AO_GC(dcs->ao);
+
+    libxl__stream_read_start_checkpoint(egc, &dcs->srs);
+}
+
+static void remus_checkpoint_stream_done(
+    libxl__egc *egc, libxl__stream_read_state *stream, int rc)
+{
+    libxl__xc_domain_saverestore_async_callback_done(egc, &stream->shs, rc);
+}
+
+void libxl__remus_restore_setup(libxl__egc *egc,
+                                libxl__domain_create_state *dcs)
+{
+    dcs->srs.checkpoint_callback = remus_checkpoint_stream_done;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.5.0


* [PATCH v7 03/18] tools/libxl: move save/restore code into libxl_dom_save.c
  2016-01-29  5:27 [PATCH v7 00/18] Prerequisite patches for COLO Wen Congyang
  2016-01-29  5:27 ` [PATCH v7 01/18] libxl/remus: init checkpoint_callback in Remus setup callback Wen Congyang
  2016-01-29  5:27 ` [PATCH v7 02/18] tools/libxl: move remus code into libxl_remus.c Wen Congyang
@ 2016-01-29  5:27 ` Wen Congyang
  2016-01-29 16:30   ` Konrad Rzeszutek Wilk
  2016-02-03 19:39   ` Wei Liu
  2016-01-29  5:27 ` [PATCH v7 04/18] libxl/save: Refactor libxl__domain_suspend_state Wen Congyang
                   ` (15 subsequent siblings)
  18 siblings, 2 replies; 56+ messages in thread
From: Wen Congyang @ 2016-01-29  5:27 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Gui Jianfeng,
	Jiang Yunhong, Dong Eddie, Shriram Rajagopalan, Ian Jackson,
	Yang Hongyang

This is purely code motion.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
---
 tools/libxl/Makefile         |   2 +-
 tools/libxl/libxl_dom.c      | 514 ----------------------------------------
 tools/libxl/libxl_dom_save.c | 543 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 544 insertions(+), 515 deletions(-)
 create mode 100644 tools/libxl/libxl_dom_save.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 7d64ecc..263ea0e 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -105,7 +105,7 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
 			libxl_stream_read.o libxl_stream_write.o \
 			libxl_save_callout.o _libxl_save_msgs_callout.o \
 			libxl_qmp.o libxl_event.o libxl_fork.o \
-			libxl_dom_suspend.o $(LIBXL_OBJS-y)
+			libxl_dom_suspend.o libxl_dom_save.o $(LIBXL_OBJS-y)
 LIBXL_OBJS += libxl_genid.o
 LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 81bd464..664adad 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -24,7 +24,6 @@
 #include <xen/hvm/hvm_info_table.h>
 #include <xen/hvm/hvm_xs_strings.h>
 #include <xen/hvm/e820.h>
-#include <xen/errno.h>
 
 libxl_domain_type libxl__domain_type(libxl__gc *gc, uint32_t domid)
 {
@@ -1107,519 +1106,6 @@ int libxl__qemu_traditional_cmd(libxl__gc *gc, uint32_t domid,
     return libxl__xs_printf(gc, XBT_NULL, path, "%s", cmd);
 }
 
-/*
- * Inspect the buffer between start and end, and return a pointer to the
- * character following the NUL terminator of start, or NULL if start is not
- * terminated before end.
- */
-static const char *next_string(const char *start, const char *end)
-{
-    if (start >= end) return NULL;
-
-    size_t total_len = end - start;
-    size_t len = strnlen(start, total_len);
-
-    if (len == total_len)
-        return NULL;
-    else
-        return start + len + 1;
-}
-
-int libxl__restore_emulator_xenstore_data(libxl__domain_create_state *dcs,
-                                          const char *ptr, uint32_t size)
-{
-    STATE_AO_GC(dcs->ao);
-    const char *next = ptr, *end = ptr + size, *key, *val;
-    int rc;
-
-    const uint32_t domid = dcs->guest_domid;
-    const uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid);
-    const char *xs_root = libxl__device_model_xs_path(gc, dm_domid, domid, "");
-
-    while (next < end) {
-        key = next;
-        next = next_string(next, end);
-
-        /* Sanitise 'key'. */
-        if (!next) {
-            rc = ERROR_FAIL;
-            LOG(ERROR, "Key in xenstore data not NUL terminated");
-            goto out;
-        }
-        if (key[0] == '\0') {
-            rc = ERROR_FAIL;
-            LOG(ERROR, "empty key found in xenstore data");
-            goto out;
-        }
-        if (key[0] == '/') {
-            rc = ERROR_FAIL;
-            LOG(ERROR, "Key in xenstore data not relative");
-            goto out;
-        }
-
-        val = next;
-        next = next_string(next, end);
-
-        /* Sanitise 'val'. */
-        if (!next) {
-            rc = ERROR_FAIL;
-            LOG(ERROR, "Val in xenstore data not NUL terminated");
-            goto out;
-        }
-
-        libxl__xs_printf(gc, XBT_NULL,
-                         GCSPRINTF("%s/%s", xs_root, key),
-                         "%s", val);
-    }
-
-    rc = 0;
-
- out:
-    return rc;
-}
-
-/*==================== Domain suspend (save) ====================*/
-
-static void stream_done(libxl__egc *egc,
-                        libxl__stream_write_state *sws, int rc);
-static void domain_save_done(libxl__egc *egc,
-                             libxl__domain_suspend_state *dss, int rc);
-
-/*----- complicated callback, called by xc_domain_save -----*/
-
-/*
- * We implement the other end of protocol for controlling qemu-dm's
- * logdirty.  There is no documentation for this protocol, but our
- * counterparty's implementation is in
- * qemu-xen-traditional.git:xenstore.c in the function
- * xenstore_process_logdirty_event
- */
-
-static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
-                                    const struct timeval *requested_abs,
-                                    int rc);
-static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*,
-                            const char *watch_path, const char *event_path);
-static void switch_logdirty_done(libxl__egc *egc,
-                                 libxl__domain_suspend_state *dss, int rc);
-
-static void logdirty_init(libxl__logdirty_switch *lds)
-{
-    lds->cmd_path = 0;
-    libxl__ev_xswatch_init(&lds->watch);
-    libxl__ev_time_init(&lds->timeout);
-}
-
-static void domain_suspend_switch_qemu_xen_traditional_logdirty
-                               (int domid, unsigned enable,
-                                libxl__save_helper_state *shs)
-{
-    libxl__egc *egc = shs->egc;
-    libxl__domain_suspend_state *dss = shs->caller_state;
-    libxl__logdirty_switch *lds = &dss->logdirty;
-    STATE_AO_GC(dss->ao);
-    int rc;
-    xs_transaction_t t = 0;
-    const char *got;
-
-    if (!lds->cmd_path) {
-        uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid);
-        lds->cmd_path = libxl__device_model_xs_path(gc, dm_domid, domid,
-                                                    "/logdirty/cmd");
-        lds->ret_path = libxl__device_model_xs_path(gc, dm_domid, domid,
-                                                    "/logdirty/ret");
-    }
-    lds->cmd = enable ? "enable" : "disable";
-
-    rc = libxl__ev_xswatch_register(gc, &lds->watch,
-                                switch_logdirty_xswatch, lds->ret_path);
-    if (rc) goto out;
-
-    rc = libxl__ev_time_register_rel(ao, &lds->timeout,
-                                switch_logdirty_timeout, 10*1000);
-    if (rc) goto out;
-
-    for (;;) {
-        rc = libxl__xs_transaction_start(gc, &t);
-        if (rc) goto out;
-
-        rc = libxl__xs_read_checked(gc, t, lds->cmd_path, &got);
-        if (rc) goto out;
-
-        if (got) {
-            const char *got_ret;
-            rc = libxl__xs_read_checked(gc, t, lds->ret_path, &got_ret);
-            if (rc) goto out;
-
-            if (!got_ret || strcmp(got, got_ret)) {
-                LOG(ERROR,"controlling logdirty: qemu was already sent"
-                    " command `%s' (xenstore path `%s') but result is `%s'",
-                    got, lds->cmd_path, got_ret ? got_ret : "<none>");
-                rc = ERROR_FAIL;
-                goto out;
-            }
-            rc = libxl__xs_rm_checked(gc, t, lds->cmd_path);
-            if (rc) goto out;
-        }
-
-        rc = libxl__xs_rm_checked(gc, t, lds->ret_path);
-        if (rc) goto out;
-
-        rc = libxl__xs_write_checked(gc, t, lds->cmd_path, lds->cmd);
-        if (rc) goto out;
-
-        rc = libxl__xs_transaction_commit(gc, &t);
-        if (!rc) break;
-        if (rc<0) goto out;
-    }
-
-    /* OK, wait for some callback */
-    return;
-
- out:
-    LOG(ERROR,"logdirty switch failed (rc=%d), abandoning suspend",rc);
-    libxl__xs_transaction_abort(gc, &t);
-    switch_logdirty_done(egc,dss,rc);
-}
-
-static void domain_suspend_switch_qemu_xen_logdirty
-                               (int domid, unsigned enable,
-                                libxl__save_helper_state *shs)
-{
-    libxl__egc *egc = shs->egc;
-    libxl__domain_suspend_state *dss = shs->caller_state;
-    STATE_AO_GC(dss->ao);
-    int rc;
-
-    rc = libxl__qmp_set_global_dirty_log(gc, domid, enable);
-    if (!rc) {
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
-    } else {
-        LOG(ERROR,"logdirty switch failed (rc=%d), abandoning suspend",rc);
-        dss->rc = rc;
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, -1);
-    }
-}
-
-void libxl__domain_suspend_common_switch_qemu_logdirty
-                               (int domid, unsigned enable, void *user)
-{
-    libxl__save_helper_state *shs = user;
-    libxl__egc *egc = shs->egc;
-    libxl__domain_suspend_state *dss = shs->caller_state;
-    STATE_AO_GC(dss->ao);
-
-    switch (libxl__device_model_version_running(gc, domid)) {
-    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
-        domain_suspend_switch_qemu_xen_traditional_logdirty(domid, enable, shs);
-        break;
-    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
-        domain_suspend_switch_qemu_xen_logdirty(domid, enable, shs);
-        break;
-    case LIBXL_DEVICE_MODEL_VERSION_NONE:
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
-        break;
-    default:
-        LOG(ERROR,"logdirty switch failed"
-            ", no valid device model version found, abandoning suspend");
-        dss->rc = ERROR_FAIL;
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, -1);
-    }
-}
-static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
-                                    const struct timeval *requested_abs,
-                                    int rc)
-{
-    libxl__domain_suspend_state *dss = CONTAINER_OF(ev, *dss, logdirty.timeout);
-    STATE_AO_GC(dss->ao);
-    LOG(ERROR,"logdirty switch: wait for device model timed out");
-    switch_logdirty_done(egc,dss,ERROR_FAIL);
-}
-
-static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch *watch,
-                            const char *watch_path, const char *event_path)
-{
-    libxl__domain_suspend_state *dss =
-        CONTAINER_OF(watch, *dss, logdirty.watch);
-    libxl__logdirty_switch *lds = &dss->logdirty;
-    STATE_AO_GC(dss->ao);
-    const char *got;
-    xs_transaction_t t = 0;
-    int rc;
-
-    for (;;) {
-        rc = libxl__xs_transaction_start(gc, &t);
-        if (rc) goto out;
-
-        rc = libxl__xs_read_checked(gc, t, lds->ret_path, &got);
-        if (rc) goto out;
-
-        if (!got) {
-            rc = +1;
-            goto out;
-        }
-
-        if (strcmp(got, lds->cmd)) {
-            LOG(ERROR,"logdirty switch: sent command `%s' but got reply `%s'"
-                " (xenstore paths `%s' / `%s')", lds->cmd, got,
-                lds->cmd_path, lds->ret_path);
-            rc = ERROR_FAIL;
-            goto out;
-        }
-
-        rc = libxl__xs_rm_checked(gc, t, lds->cmd_path);
-        if (rc) goto out;
-
-        rc = libxl__xs_rm_checked(gc, t, lds->ret_path);
-        if (rc) goto out;
-
-        rc = libxl__xs_transaction_commit(gc, &t);
-        if (!rc) break;
-        if (rc<0) goto out;
-    }
-
- out:
-    /* rc < 0: error
-     * rc == 0: ok, we are done
-     * rc == +1: need to keep waiting
-     */
-    libxl__xs_transaction_abort(gc, &t);
-
-    if (rc <= 0) {
-        if (rc < 0)
-            LOG(ERROR,"logdirty switch: failed (rc=%d)",rc);
-        switch_logdirty_done(egc,dss,rc);
-    }
-}
-
-static void switch_logdirty_done(libxl__egc *egc,
-                                 libxl__domain_suspend_state *dss,
-                                 int rc)
-{
-    STATE_AO_GC(dss->ao);
-    libxl__logdirty_switch *lds = &dss->logdirty;
-
-    libxl__ev_xswatch_deregister(gc, &lds->watch);
-    libxl__ev_time_deregister(gc, &lds->timeout);
-
-    int broke;
-    if (rc) {
-        broke = -1;
-        dss->rc = rc;
-    } else {
-        broke = 0;
-    }
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, broke);
-}
-
-/*----- callbacks, called by xc_domain_save -----*/
-
-/*
- * Expand the buffer 'buf' of length 'len', to append 'str' including its NUL
- * terminator.
- */
-static void append_string(libxl__gc *gc, char **buf, uint32_t *len,
-                          const char *str)
-{
-    size_t extralen = strlen(str) + 1;
-    char *new = libxl__realloc(gc, *buf, *len + extralen);
-
-    *buf = new;
-    memcpy(new + *len, str, extralen);
-    *len += extralen;
-}
-
-int libxl__save_emulator_xenstore_data(libxl__domain_suspend_state *dss,
-                                       char **callee_buf,
-                                       uint32_t *callee_len)
-{
-    STATE_AO_GC(dss->ao);
-    const char *xs_root;
-    char **entries, *buf = NULL;
-    unsigned int nr_entries, i, j, len = 0;
-    int rc;
-
-    const uint32_t domid = dss->domid;
-    const uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid);
-
-    xs_root = libxl__device_model_xs_path(gc, dm_domid, domid, "");
-
-    entries = libxl__xs_directory(gc, 0, GCSPRINTF("%s/physmap", xs_root),
-                                  &nr_entries);
-    if (!entries || nr_entries == 0) { rc = 0; goto out; }
-
-    for (i = 0; i < nr_entries; ++i) {
-        static const char *const physmap_subkeys[] = {
-            "start_addr", "size", "name"
-        };
-
-        for (j = 0; j < ARRAY_SIZE(physmap_subkeys); ++j) {
-            const char *key = GCSPRINTF("physmap/%s/%s",
-                                        entries[i], physmap_subkeys[j]);
-
-            const char *val =
-                libxl__xs_read(gc, XBT_NULL,
-                               GCSPRINTF("%s/%s", xs_root, key));
-
-            if (!val) { rc = ERROR_FAIL; goto out; }
-
-            append_string(gc, &buf, &len, key);
-            append_string(gc, &buf, &len, val);
-        }
-    }
-
-    rc = 0;
-
- out:
-    if (!rc) {
-        *callee_buf = buf;
-        *callee_len = len;
-    }
-
-    return rc;
-}
-
-/*----- main code for saving, in order of execution -----*/
-
-void libxl__domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss)
-{
-    STATE_AO_GC(dss->ao);
-    int port;
-    int rc, ret;
-
-    /* Convenience aliases */
-    const uint32_t domid = dss->domid;
-    const libxl_domain_type type = dss->type;
-    const int live = dss->live;
-    const int debug = dss->debug;
-    const libxl_domain_remus_info *const r_info = dss->remus;
-    libxl__srm_save_autogen_callbacks *const callbacks =
-        &dss->sws.shs.callbacks.save.a;
-    unsigned int nr_vnodes = 0, nr_vmemranges = 0, nr_vcpus = 0;
-
-    dss->rc = 0;
-    logdirty_init(&dss->logdirty);
-    libxl__xswait_init(&dss->pvcontrol);
-    libxl__ev_evtchn_init(&dss->guest_evtchn);
-    libxl__ev_xswatch_init(&dss->guest_watch);
-    libxl__ev_time_init(&dss->guest_timeout);
-
-    switch (type) {
-    case LIBXL_DOMAIN_TYPE_HVM: {
-        dss->hvm = 1;
-        break;
-    }
-    case LIBXL_DOMAIN_TYPE_PV:
-        dss->hvm = 0;
-        break;
-    default:
-        abort();
-    }
-
-    dss->xcflags = (live ? XCFLAGS_LIVE : 0)
-          | (debug ? XCFLAGS_DEBUG : 0)
-          | (dss->hvm ? XCFLAGS_HVM : 0);
-
-    /* Disallow saving a guest with vNUMA configured because migration
-     * stream does not preserve node information.
-     *
-     * Reject any domain which has vnuma enabled, even if the
-     * configuration is empty. Only domains which have no vnuma
-     * configuration at all are supported.
-     */
-    ret = xc_domain_getvnuma(CTX->xch, domid, &nr_vnodes, &nr_vmemranges,
-                             &nr_vcpus, NULL, NULL, NULL);
-    if (ret != -1 || errno != XEN_EOPNOTSUPP) {
-        LOG(ERROR, "Cannot save a guest with vNUMA configured");
-        rc = ERROR_FAIL;
-        goto out;
-    }
-
-    dss->guest_evtchn.port = -1;
-    dss->guest_evtchn_lockfd = -1;
-    dss->guest_responded = 0;
-    dss->dm_savefile = libxl__device_model_savefile(gc, domid);
-
-    if (r_info != NULL) {
-        dss->interval = r_info->interval;
-        dss->xcflags |= XCFLAGS_CHECKPOINTED;
-        if (libxl_defbool_val(r_info->compression))
-            dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
-    }
-
-    port = xs_suspend_evtchn_port(dss->domid);
-
-    if (port >= 0) {
-        rc = libxl__ctx_evtchn_init(gc);
-        if (rc) goto out;
-
-        dss->guest_evtchn.port =
-            xc_suspend_evtchn_init_exclusive(CTX->xch, CTX->xce,
-                                  dss->domid, port, &dss->guest_evtchn_lockfd);
-
-        if (dss->guest_evtchn.port < 0) {
-            LOG(WARN, "Suspend event channel initialization failed");
-            rc = ERROR_FAIL;
-            goto out;
-        }
-    }
-
-    memset(callbacks, 0, sizeof(*callbacks));
-    if (r_info != NULL) {
-        callbacks->suspend = libxl__remus_domain_suspend_callback;
-        callbacks->postcopy = libxl__remus_domain_resume_callback;
-        callbacks->checkpoint = libxl__remus_domain_save_checkpoint_callback;
-    } else
-        callbacks->suspend = libxl__domain_suspend_callback;
-
-    callbacks->switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty;
-
-    dss->sws.ao  = dss->ao;
-    dss->sws.dss = dss;
-    dss->sws.fd  = dss->fd;
-    dss->sws.completion_callback = stream_done;
-
-    libxl__stream_write_start(egc, &dss->sws);
-    return;
-
- out:
-    domain_save_done(egc, dss, rc);
-}
-
-static void stream_done(libxl__egc *egc,
-                        libxl__stream_write_state *sws, int rc)
-{
-    domain_save_done(egc, sws->dss, rc);
-}
-
-static void domain_save_done(libxl__egc *egc,
-                             libxl__domain_suspend_state *dss, int rc)
-{
-    STATE_AO_GC(dss->ao);
-
-    /* Convenience aliases */
-    const uint32_t domid = dss->domid;
-
-    libxl__ev_evtchn_cancel(gc, &dss->guest_evtchn);
-
-    if (dss->guest_evtchn.port > 0)
-        xc_suspend_evtchn_release(CTX->xch, CTX->xce, domid,
-                           dss->guest_evtchn.port, &dss->guest_evtchn_lockfd);
-
-    if (dss->remus) {
-        /*
-         * With Remus, if we reach this point, it means either
-         * backup died or some network error occurred preventing us
-         * from sending checkpoints. Teardown the network buffers and
-         * release netlink resources.  This is an async op.
-         */
-        libxl__remus_teardown(egc, dss, rc);
-        return;
-    }
-
-    dss->callback(egc, dss, rc);
-}
-
 /*==================== Miscellaneous ====================*/
 
 char *libxl__uuid2string(libxl__gc *gc, const libxl_uuid uuid)
diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
new file mode 100644
index 0000000..27fd58b
--- /dev/null
+++ b/tools/libxl/libxl_dom_save.c
@@ -0,0 +1,543 @@
+/*
+ * Copyright (C) 2009      Citrix Ltd.
+ * Author Vincent Hanquez <vincent.hanquez@eu.citrix.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+#include <xen/errno.h>
+
+/*========================= Domain save ============================*/
+
+static void stream_done(libxl__egc *egc,
+                        libxl__stream_write_state *sws, int rc);
+static void domain_save_done(libxl__egc *egc,
+                             libxl__domain_suspend_state *dss, int rc);
+
+/*----- complicated callback, called by xc_domain_save -----*/
+
+/*
+ * We implement the other end of protocol for controlling qemu-dm's
+ * logdirty.  There is no documentation for this protocol, but our
+ * counterparty's implementation is in
+ * qemu-xen-traditional.git:xenstore.c in the function
+ * xenstore_process_logdirty_event
+ */
+
+static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
+                                    const struct timeval *requested_abs,
+                                    int rc);
+static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*,
+                            const char *watch_path, const char *event_path);
+static void switch_logdirty_done(libxl__egc *egc,
+                                 libxl__domain_suspend_state *dss, int rc);
+
+static void logdirty_init(libxl__logdirty_switch *lds)
+{
+    lds->cmd_path = 0;
+    libxl__ev_xswatch_init(&lds->watch);
+    libxl__ev_time_init(&lds->timeout);
+}
+
+static void domain_suspend_switch_qemu_xen_traditional_logdirty
+                               (int domid, unsigned enable,
+                                libxl__save_helper_state *shs)
+{
+    libxl__egc *egc = shs->egc;
+    libxl__domain_suspend_state *dss = shs->caller_state;
+    libxl__logdirty_switch *lds = &dss->logdirty;
+    STATE_AO_GC(dss->ao);
+    int rc;
+    xs_transaction_t t = 0;
+    const char *got;
+
+    if (!lds->cmd_path) {
+        uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid);
+        lds->cmd_path = libxl__device_model_xs_path(gc, dm_domid, domid,
+                                                    "/logdirty/cmd");
+        lds->ret_path = libxl__device_model_xs_path(gc, dm_domid, domid,
+                                                    "/logdirty/ret");
+    }
+    lds->cmd = enable ? "enable" : "disable";
+
+    rc = libxl__ev_xswatch_register(gc, &lds->watch,
+                                switch_logdirty_xswatch, lds->ret_path);
+    if (rc) goto out;
+
+    rc = libxl__ev_time_register_rel(ao, &lds->timeout,
+                                switch_logdirty_timeout, 10*1000);
+    if (rc) goto out;
+
+    for (;;) {
+        rc = libxl__xs_transaction_start(gc, &t);
+        if (rc) goto out;
+
+        rc = libxl__xs_read_checked(gc, t, lds->cmd_path, &got);
+        if (rc) goto out;
+
+        if (got) {
+            const char *got_ret;
+            rc = libxl__xs_read_checked(gc, t, lds->ret_path, &got_ret);
+            if (rc) goto out;
+
+            if (!got_ret || strcmp(got, got_ret)) {
+                LOG(ERROR,"controlling logdirty: qemu was already sent"
+                    " command `%s' (xenstore path `%s') but result is `%s'",
+                    got, lds->cmd_path, got_ret ? got_ret : "<none>");
+                rc = ERROR_FAIL;
+                goto out;
+            }
+            rc = libxl__xs_rm_checked(gc, t, lds->cmd_path);
+            if (rc) goto out;
+        }
+
+        rc = libxl__xs_rm_checked(gc, t, lds->ret_path);
+        if (rc) goto out;
+
+        rc = libxl__xs_write_checked(gc, t, lds->cmd_path, lds->cmd);
+        if (rc) goto out;
+
+        rc = libxl__xs_transaction_commit(gc, &t);
+        if (!rc) break;
+        if (rc<0) goto out;
+    }
+
+    /* OK, wait for some callback */
+    return;
+
+ out:
+    LOG(ERROR,"logdirty switch failed (rc=%d), abandoning suspend",rc);
+    libxl__xs_transaction_abort(gc, &t);
+    switch_logdirty_done(egc,dss,rc);
+}
+
+static void domain_suspend_switch_qemu_xen_logdirty
+                               (int domid, unsigned enable,
+                                libxl__save_helper_state *shs)
+{
+    libxl__egc *egc = shs->egc;
+    libxl__domain_suspend_state *dss = shs->caller_state;
+    STATE_AO_GC(dss->ao);
+    int rc;
+
+    rc = libxl__qmp_set_global_dirty_log(gc, domid, enable);
+    if (!rc) {
+        libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+    } else {
+        LOG(ERROR,"logdirty switch failed (rc=%d), abandoning suspend",rc);
+        dss->rc = rc;
+        libxl__xc_domain_saverestore_async_callback_done(egc, shs, -1);
+    }
+}
+
+void libxl__domain_suspend_common_switch_qemu_logdirty
+                               (int domid, unsigned enable, void *user)
+{
+    libxl__save_helper_state *shs = user;
+    libxl__egc *egc = shs->egc;
+    libxl__domain_suspend_state *dss = shs->caller_state;
+    STATE_AO_GC(dss->ao);
+
+    switch (libxl__device_model_version_running(gc, domid)) {
+    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
+        domain_suspend_switch_qemu_xen_traditional_logdirty(domid, enable, shs);
+        break;
+    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
+        domain_suspend_switch_qemu_xen_logdirty(domid, enable, shs);
+        break;
+    case LIBXL_DEVICE_MODEL_VERSION_NONE:
+        libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+        break;
+    default:
+        LOG(ERROR,"logdirty switch failed"
+            ", no valid device model version found, abandoning suspend");
+        dss->rc = ERROR_FAIL;
+        libxl__xc_domain_saverestore_async_callback_done(egc, shs, -1);
+    }
+}
+static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
+                                    const struct timeval *requested_abs,
+                                    int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(ev, *dss, logdirty.timeout);
+    STATE_AO_GC(dss->ao);
+    LOG(ERROR,"logdirty switch: wait for device model timed out");
+    switch_logdirty_done(egc,dss,ERROR_FAIL);
+}
+
+static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch *watch,
+                            const char *watch_path, const char *event_path)
+{
+    libxl__domain_suspend_state *dss =
+        CONTAINER_OF(watch, *dss, logdirty.watch);
+    libxl__logdirty_switch *lds = &dss->logdirty;
+    STATE_AO_GC(dss->ao);
+    const char *got;
+    xs_transaction_t t = 0;
+    int rc;
+
+    for (;;) {
+        rc = libxl__xs_transaction_start(gc, &t);
+        if (rc) goto out;
+
+        rc = libxl__xs_read_checked(gc, t, lds->ret_path, &got);
+        if (rc) goto out;
+
+        if (!got) {
+            rc = +1;
+            goto out;
+        }
+
+        if (strcmp(got, lds->cmd)) {
+            LOG(ERROR,"logdirty switch: sent command `%s' but got reply `%s'"
+                " (xenstore paths `%s' / `%s')", lds->cmd, got,
+                lds->cmd_path, lds->ret_path);
+            rc = ERROR_FAIL;
+            goto out;
+        }
+
+        rc = libxl__xs_rm_checked(gc, t, lds->cmd_path);
+        if (rc) goto out;
+
+        rc = libxl__xs_rm_checked(gc, t, lds->ret_path);
+        if (rc) goto out;
+
+        rc = libxl__xs_transaction_commit(gc, &t);
+        if (!rc) break;
+        if (rc<0) goto out;
+    }
+
+ out:
+    /* rc < 0: error
+     * rc == 0: ok, we are done
+     * rc == +1: need to keep waiting
+     */
+    libxl__xs_transaction_abort(gc, &t);
+
+    if (rc <= 0) {
+        if (rc < 0)
+            LOG(ERROR,"logdirty switch: failed (rc=%d)",rc);
+        switch_logdirty_done(egc,dss,rc);
+    }
+}
+
+static void switch_logdirty_done(libxl__egc *egc,
+                                 libxl__domain_suspend_state *dss,
+                                 int rc)
+{
+    STATE_AO_GC(dss->ao);
+    libxl__logdirty_switch *lds = &dss->logdirty;
+
+    libxl__ev_xswatch_deregister(gc, &lds->watch);
+    libxl__ev_time_deregister(gc, &lds->timeout);
+
+    int broke;
+    if (rc) {
+        broke = -1;
+        dss->rc = rc;
+    } else {
+        broke = 0;
+    }
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, broke);
+}
+
+/*----- callbacks, called by xc_domain_save -----*/
+
+/*
+ * Expand the buffer 'buf' of length 'len', to append 'str' including its NUL
+ * terminator.
+ */
+static void append_string(libxl__gc *gc, char **buf, uint32_t *len,
+                          const char *str)
+{
+    size_t extralen = strlen(str) + 1;
+    char *new = libxl__realloc(gc, *buf, *len + extralen);
+
+    *buf = new;
+    memcpy(new + *len, str, extralen);
+    *len += extralen;
+}
+
+int libxl__save_emulator_xenstore_data(libxl__domain_suspend_state *dss,
+                                       char **callee_buf,
+                                       uint32_t *callee_len)
+{
+    STATE_AO_GC(dss->ao);
+    const char *xs_root;
+    char **entries, *buf = NULL;
+    unsigned int nr_entries, i, j, len = 0;
+    int rc;
+
+    const uint32_t domid = dss->domid;
+    const uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid);
+
+    xs_root = libxl__device_model_xs_path(gc, dm_domid, domid, "");
+
+    entries = libxl__xs_directory(gc, 0, GCSPRINTF("%s/physmap", xs_root),
+                                  &nr_entries);
+    if (!entries || nr_entries == 0) { rc = 0; goto out; }
+
+    for (i = 0; i < nr_entries; ++i) {
+        static const char *const physmap_subkeys[] = {
+            "start_addr", "size", "name"
+        };
+
+        for (j = 0; j < ARRAY_SIZE(physmap_subkeys); ++j) {
+            const char *key = GCSPRINTF("physmap/%s/%s",
+                                        entries[i], physmap_subkeys[j]);
+
+            const char *val =
+                libxl__xs_read(gc, XBT_NULL,
+                               GCSPRINTF("%s/%s", xs_root, key));
+
+            if (!val) { rc = ERROR_FAIL; goto out; }
+
+            append_string(gc, &buf, &len, key);
+            append_string(gc, &buf, &len, val);
+        }
+    }
+
+    rc = 0;
+
+ out:
+    if (!rc) {
+        *callee_buf = buf;
+        *callee_len = len;
+    }
+
+    return rc;
+}
+
+/*----- main code for saving, in order of execution -----*/
+
+void libxl__domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss)
+{
+    STATE_AO_GC(dss->ao);
+    int port;
+    int rc, ret;
+
+    /* Convenience aliases */
+    const uint32_t domid = dss->domid;
+    const libxl_domain_type type = dss->type;
+    const int live = dss->live;
+    const int debug = dss->debug;
+    const libxl_domain_remus_info *const r_info = dss->remus;
+    libxl__srm_save_autogen_callbacks *const callbacks =
+        &dss->sws.shs.callbacks.save.a;
+    unsigned int nr_vnodes = 0, nr_vmemranges = 0, nr_vcpus = 0;
+
+    dss->rc = 0;
+    logdirty_init(&dss->logdirty);
+    libxl__xswait_init(&dss->pvcontrol);
+    libxl__ev_evtchn_init(&dss->guest_evtchn);
+    libxl__ev_xswatch_init(&dss->guest_watch);
+    libxl__ev_time_init(&dss->guest_timeout);
+
+    switch (type) {
+    case LIBXL_DOMAIN_TYPE_HVM: {
+        dss->hvm = 1;
+        break;
+    }
+    case LIBXL_DOMAIN_TYPE_PV:
+        dss->hvm = 0;
+        break;
+    default:
+        abort();
+    }
+
+    dss->xcflags = (live ? XCFLAGS_LIVE : 0)
+          | (debug ? XCFLAGS_DEBUG : 0)
+          | (dss->hvm ? XCFLAGS_HVM : 0);
+
+    /* Disallow saving a guest with vNUMA configured because migration
+     * stream does not preserve node information.
+     *
+     * Reject any domain which has vnuma enabled, even if the
+     * configuration is empty. Only domains which have no vnuma
+     * configuration at all are supported.
+     */
+    ret = xc_domain_getvnuma(CTX->xch, domid, &nr_vnodes, &nr_vmemranges,
+                             &nr_vcpus, NULL, NULL, NULL);
+    if (ret != -1 || errno != XEN_EOPNOTSUPP) {
+        LOG(ERROR, "Cannot save a guest with vNUMA configured");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    dss->guest_evtchn.port = -1;
+    dss->guest_evtchn_lockfd = -1;
+    dss->guest_responded = 0;
+    dss->dm_savefile = libxl__device_model_savefile(gc, domid);
+
+    if (r_info != NULL) {
+        dss->interval = r_info->interval;
+        dss->xcflags |= XCFLAGS_CHECKPOINTED;
+        if (libxl_defbool_val(r_info->compression))
+            dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
+    }
+
+    port = xs_suspend_evtchn_port(dss->domid);
+
+    if (port >= 0) {
+        rc = libxl__ctx_evtchn_init(gc);
+        if (rc) goto out;
+
+        dss->guest_evtchn.port =
+            xc_suspend_evtchn_init_exclusive(CTX->xch, CTX->xce,
+                                  dss->domid, port, &dss->guest_evtchn_lockfd);
+
+        if (dss->guest_evtchn.port < 0) {
+            LOG(WARN, "Suspend event channel initialization failed");
+            rc = ERROR_FAIL;
+            goto out;
+        }
+    }
+
+    memset(callbacks, 0, sizeof(*callbacks));
+    if (r_info != NULL) {
+        callbacks->suspend = libxl__remus_domain_suspend_callback;
+        callbacks->postcopy = libxl__remus_domain_resume_callback;
+        callbacks->checkpoint = libxl__remus_domain_save_checkpoint_callback;
+    } else
+        callbacks->suspend = libxl__domain_suspend_callback;
+
+    callbacks->switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty;
+
+    dss->sws.ao  = dss->ao;
+    dss->sws.dss = dss;
+    dss->sws.fd  = dss->fd;
+    dss->sws.completion_callback = stream_done;
+
+    libxl__stream_write_start(egc, &dss->sws);
+    return;
+
+ out:
+    domain_save_done(egc, dss, rc);
+}
+
+static void stream_done(libxl__egc *egc,
+                        libxl__stream_write_state *sws, int rc)
+{
+    domain_save_done(egc, sws->dss, rc);
+}
+
+static void domain_save_done(libxl__egc *egc,
+                             libxl__domain_suspend_state *dss, int rc)
+{
+    STATE_AO_GC(dss->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = dss->domid;
+
+    libxl__ev_evtchn_cancel(gc, &dss->guest_evtchn);
+
+    if (dss->guest_evtchn.port > 0)
+        xc_suspend_evtchn_release(CTX->xch, CTX->xce, domid,
+                           dss->guest_evtchn.port, &dss->guest_evtchn_lockfd);
+
+    if (dss->remus) {
+        /*
+         * With Remus, if we reach this point, it means either
+         * backup died or some network error occurred preventing us
+         * from sending checkpoints. Teardown the network buffers and
+         * release netlink resources.  This is an async op.
+         */
+        libxl__remus_teardown(egc, dss, rc);
+        return;
+    }
+
+    dss->callback(egc, dss, rc);
+}
+
+/*========================= Domain restore ============================*/
+
+/*
+ * Inspect the buffer between start and end, and return a pointer to the
+ * character following the NUL terminator of start, or NULL if start is not
+ * terminated before end.
+ */
+static const char *next_string(const char *start, const char *end)
+{
+    if (start >= end) return NULL;
+
+    size_t total_len = end - start;
+    size_t len = strnlen(start, total_len);
+
+    if (len == total_len)
+        return NULL;
+    else
+        return start + len + 1;
+}
+
+int libxl__restore_emulator_xenstore_data(libxl__domain_create_state *dcs,
+                                          const char *ptr, uint32_t size)
+{
+    STATE_AO_GC(dcs->ao);
+    const char *next = ptr, *end = ptr + size, *key, *val;
+    int rc;
+
+    const uint32_t domid = dcs->guest_domid;
+    const uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid);
+    const char *xs_root = libxl__device_model_xs_path(gc, dm_domid, domid, "");
+
+    while (next < end) {
+        key = next;
+        next = next_string(next, end);
+
+        /* Sanitise 'key'. */
+        if (!next) {
+            rc = ERROR_FAIL;
+            LOG(ERROR, "Key in xenstore data not NUL terminated");
+            goto out;
+        }
+        if (key[0] == '\0') {
+            rc = ERROR_FAIL;
+            LOG(ERROR, "empty key found in xenstore data");
+            goto out;
+        }
+        if (key[0] == '/') {
+            rc = ERROR_FAIL;
+            LOG(ERROR, "Key in xenstore data not relative");
+            goto out;
+        }
+
+        val = next;
+        next = next_string(next, end);
+
+        /* Sanitise 'val'. */
+        if (!next) {
+            rc = ERROR_FAIL;
+            LOG(ERROR, "Val in xenstore data not NUL terminated");
+            goto out;
+        }
+
+        libxl__xs_printf(gc, XBT_NULL,
+                         GCSPRINTF("%s/%s", xs_root, key),
+                         "%s", val);
+    }
+
+    rc = 0;
+
+ out:
+    return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.5.0


* [PATCH v7 04/18] libxl/save: Refactor libxl__domain_suspend_state
  2016-01-29  5:27 [PATCH v7 00/18] Prerequisite patches for COLO Wen Congyang
                   ` (2 preceding siblings ...)
  2016-01-29  5:27 ` [PATCH v7 03/18] tools/libxl: move save/restore code into libxl_dom_save.c Wen Congyang
@ 2016-01-29  5:27 ` Wen Congyang
  2016-01-29 16:31   ` Konrad Rzeszutek Wilk
  2016-02-03 19:39   ` Wei Liu
  2016-01-29  5:27 ` [PATCH v7 05/18] tools/libxc: support to resume uncooperative HVM guests Wen Congyang
                   ` (14 subsequent siblings)
  18 siblings, 2 replies; 56+ messages in thread
From: Wen Congyang @ 2016-01-29  5:27 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Ian Campbell, Wen Congyang,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Shriram Rajagopalan,
	Ian Jackson, Yang Hongyang

Currently struct libxl__domain_suspend_state contains two kinds of state:
save state and suspend state. This patch separates the two.
The motivation is that COLO will need to suspend and resume the guest
continuously, so we need a more generic suspend state.

After this change, dss stands for libxl__domain_save_state,
dsps stands for libxl__domain_suspend_state.

Also introduce libxl__domain_suspend_init to initialise the
libxl__domain_suspend_state.
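
To make the split concrete, here is a minimal, self-contained sketch of the
layout described above: the save state embeds the suspend state (reached as
dss->dsps in the patch), and suspend-side code that only holds a dsps pointer
recovers the enclosing save state with a CONTAINER_OF-style macro. Every
sketch_* name, the reduced field set, and the callback signature are
assumptions made for illustration; the real definitions are in
libxl_internal.h and differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for libxl__domain_suspend_state (not the real type). */
typedef struct sketch_suspend_state {
    uint32_t domid;
    int guest_responded;
    /* Assumed signature; the patch only says this hook is called when done. */
    void (*callback_common_done)(struct sketch_suspend_state *dsps, int rc);
} sketch_suspend_state;

/* Illustrative stand-in for libxl__domain_save_state (not the real type). */
typedef struct sketch_save_state {
    uint32_t domid;
    int rc;
    sketch_suspend_state dsps;  /* embedded suspend state, as dss->dsps */
} sketch_save_state;

/* Same idea as libxl's CONTAINER_OF, spelled out with offsetof. */
#define SKETCH_CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Save-side completion hook: map the dsps pointer back to its owner. */
static void sketch_suspend_done(sketch_suspend_state *dsps, int rc)
{
    sketch_save_state *dss =
        SKETCH_CONTAINER_OF(dsps, sketch_save_state, dsps);
    dss->rc = rc;
}

/* Suspend-side code sees only the suspend state, never the save state. */
static void sketch_suspend(sketch_suspend_state *dsps)
{
    dsps->guest_responded = 1;
    dsps->callback_common_done(dsps, 0);
}

/* The save path fills in the embedded suspend state before using it. */
static void sketch_save_init(sketch_save_state *dss, uint32_t domid)
{
    dss->domid = domid;
    dss->rc = -1;
    dss->dsps.domid = domid;
    dss->dsps.guest_responded = 0;
    dss->dsps.callback_common_done = sketch_suspend_done;
}
```

Because the suspend path in this sketch depends only on the suspend state, a
second user such as COLO could drive it repeatedly without carrying any
save-stream fields, which is the motivation given above.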

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 tools/libxl/libxl.c              |  10 +-
 tools/libxl/libxl_create.c       |  10 +-
 tools/libxl/libxl_dom_save.c     |  61 ++++--------
 tools/libxl/libxl_dom_suspend.c  | 207 ++++++++++++++++++++++++---------------
 tools/libxl/libxl_internal.h     |  61 +++++++-----
 tools/libxl/libxl_netbuffer.c    |   2 +-
 tools/libxl/libxl_remus.c        |  37 +++----
 tools/libxl/libxl_save_callout.c |   2 +-
 tools/libxl/libxl_stream_write.c |  16 +--
 9 files changed, 227 insertions(+), 179 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 6347097..8707b08 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -832,7 +832,7 @@ out:
 }
 
 static void remus_failover_cb(libxl__egc *egc,
-                              libxl__domain_suspend_state *dss, int rc);
+                              libxl__domain_save_state *dss, int rc);
 
 /* TODO: Explicit Checkpoint acknowledgements via recv_fd. */
 int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
@@ -840,7 +840,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
                              const libxl_asyncop_how *ao_how)
 {
     AO_CREATE(ctx, domid, ao_how);
-    libxl__domain_suspend_state *dss;
+    libxl__domain_save_state *dss;
     int rc;
 
     libxl_domain_type type = libxl__domain_type(gc, domid);
@@ -888,7 +888,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 }
 
 static void remus_failover_cb(libxl__egc *egc,
-                              libxl__domain_suspend_state *dss, int rc)
+                              libxl__domain_save_state *dss, int rc)
 {
     STATE_AO_GC(dss->ao);
     /*
@@ -900,7 +900,7 @@ static void remus_failover_cb(libxl__egc *egc,
 }
 
 static void domain_suspend_cb(libxl__egc *egc,
-                              libxl__domain_suspend_state *dss, int rc)
+                              libxl__domain_save_state *dss, int rc)
 {
     STATE_AO_GC(dss->ao);
     int flrc;
@@ -925,7 +925,7 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd, int flags,
         goto out_err;
     }
 
-    libxl__domain_suspend_state *dss;
+    libxl__domain_save_state *dss;
     GCNEW(dss);
 
     dss->ao = ao;
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 6f1cf93..91c78e5 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1549,7 +1549,7 @@ typedef struct {
 typedef struct {
     libxl__app_domain_create_state cdcs;
     libxl__domain_destroy_state dds;
-    libxl__domain_suspend_state dss;
+    libxl__domain_save_state dss;
     char *toolstack_buf;
     uint32_t toolstack_len;
 } libxl__domain_soft_reset_state;
@@ -1644,7 +1644,7 @@ static int do_domain_soft_reset(libxl_ctx *ctx,
     libxl__app_domain_create_state *cdcs;
     libxl__domain_create_state *dcs;
     libxl__domain_build_state *state;
-    libxl__domain_suspend_state *dss;
+    libxl__domain_save_state *dss;
     char *dom_path, *xs_store_mfn, *xs_console_mfn;
     uint32_t domid_out;
     int rc;
@@ -1688,8 +1688,8 @@ static int do_domain_soft_reset(libxl_ctx *ctx,
 
     dss->ao = ao;
     dss->domid = domid_soft_reset;
-    dss->dm_savefile = GCSPRINTF(LIBXL_DEVICE_MODEL_SAVE_FILE".%d",
-                                 domid_soft_reset);
+    dss->dsps.dm_savefile = GCSPRINTF(LIBXL_DEVICE_MODEL_SAVE_FILE".%d",
+                                      domid_soft_reset);
 
     rc = libxl__save_emulator_xenstore_data(dss, &srs->toolstack_buf,
                                             &srs->toolstack_len);
@@ -1698,7 +1698,7 @@ static int do_domain_soft_reset(libxl_ctx *ctx,
         goto out;
     }
 
-    rc = libxl__domain_suspend_device_model(gc, dss);
+    rc = libxl__domain_suspend_device_model(gc, &dss->dsps);
     if (rc) {
         LOG(ERROR, "failed to suspend device model.");
         goto out;
diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 27fd58b..02cc143 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -24,7 +24,7 @@
 static void stream_done(libxl__egc *egc,
                         libxl__stream_write_state *sws, int rc);
 static void domain_save_done(libxl__egc *egc,
-                             libxl__domain_suspend_state *dss, int rc);
+                             libxl__domain_save_state *dss, int rc);
 
 /*----- complicated callback, called by xc_domain_save -----*/
 
@@ -42,7 +42,7 @@ static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
 static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*,
                             const char *watch_path, const char *event_path);
 static void switch_logdirty_done(libxl__egc *egc,
-                                 libxl__domain_suspend_state *dss, int rc);
+                                 libxl__domain_save_state *dss, int rc);
 
 static void logdirty_init(libxl__logdirty_switch *lds)
 {
@@ -56,7 +56,7 @@ static void domain_suspend_switch_qemu_xen_traditional_logdirty
                                 libxl__save_helper_state *shs)
 {
     libxl__egc *egc = shs->egc;
-    libxl__domain_suspend_state *dss = shs->caller_state;
+    libxl__domain_save_state *dss = shs->caller_state;
     libxl__logdirty_switch *lds = &dss->logdirty;
     STATE_AO_GC(dss->ao);
     int rc;
@@ -128,7 +128,7 @@ static void domain_suspend_switch_qemu_xen_logdirty
                                 libxl__save_helper_state *shs)
 {
     libxl__egc *egc = shs->egc;
-    libxl__domain_suspend_state *dss = shs->caller_state;
+    libxl__domain_save_state *dss = shs->caller_state;
     STATE_AO_GC(dss->ao);
     int rc;
 
@@ -147,7 +147,7 @@ void libxl__domain_suspend_common_switch_qemu_logdirty
 {
     libxl__save_helper_state *shs = user;
     libxl__egc *egc = shs->egc;
-    libxl__domain_suspend_state *dss = shs->caller_state;
+    libxl__domain_save_state *dss = shs->caller_state;
     STATE_AO_GC(dss->ao);
 
     switch (libxl__device_model_version_running(gc, domid)) {
@@ -171,7 +171,7 @@ static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
                                     const struct timeval *requested_abs,
                                     int rc)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(ev, *dss, logdirty.timeout);
+    libxl__domain_save_state *dss = CONTAINER_OF(ev, *dss, logdirty.timeout);
     STATE_AO_GC(dss->ao);
     LOG(ERROR,"logdirty switch: wait for device model timed out");
     switch_logdirty_done(egc,dss,ERROR_FAIL);
@@ -180,7 +180,7 @@ static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
 static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch *watch,
                             const char *watch_path, const char *event_path)
 {
-    libxl__domain_suspend_state *dss =
+    libxl__domain_save_state *dss =
         CONTAINER_OF(watch, *dss, logdirty.watch);
     libxl__logdirty_switch *lds = &dss->logdirty;
     STATE_AO_GC(dss->ao);
@@ -234,7 +234,7 @@ static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch *watch,
 }
 
 static void switch_logdirty_done(libxl__egc *egc,
-                                 libxl__domain_suspend_state *dss,
+                                 libxl__domain_save_state *dss,
                                  int rc)
 {
     STATE_AO_GC(dss->ao);
@@ -270,7 +270,7 @@ static void append_string(libxl__gc *gc, char **buf, uint32_t *len,
     *len += extralen;
 }
 
-int libxl__save_emulator_xenstore_data(libxl__domain_suspend_state *dss,
+int libxl__save_emulator_xenstore_data(libxl__domain_save_state *dss,
                                        char **callee_buf,
                                        uint32_t *callee_len)
 {
@@ -322,10 +322,9 @@ int libxl__save_emulator_xenstore_data(libxl__domain_suspend_state *dss,
 
 /*----- main code for saving, in order of execution -----*/
 
-void libxl__domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss)
+void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
 {
     STATE_AO_GC(dss->ao);
-    int port;
     int rc, ret;
 
     /* Convenience aliases */
@@ -337,13 +336,14 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss)
     libxl__srm_save_autogen_callbacks *const callbacks =
         &dss->sws.shs.callbacks.save.a;
     unsigned int nr_vnodes = 0, nr_vmemranges = 0, nr_vcpus = 0;
+    libxl__domain_suspend_state *dsps = &dss->dsps;
 
     dss->rc = 0;
     logdirty_init(&dss->logdirty);
-    libxl__xswait_init(&dss->pvcontrol);
-    libxl__ev_evtchn_init(&dss->guest_evtchn);
-    libxl__ev_xswatch_init(&dss->guest_watch);
-    libxl__ev_time_init(&dss->guest_timeout);
+    dsps->ao = ao;
+    dsps->domid = domid;
+    rc = libxl__domain_suspend_init(egc, dsps, type);
+    if (rc) goto out;
 
     switch (type) {
     case LIBXL_DOMAIN_TYPE_HVM: {
@@ -376,11 +376,6 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss)
         goto out;
     }
 
-    dss->guest_evtchn.port = -1;
-    dss->guest_evtchn_lockfd = -1;
-    dss->guest_responded = 0;
-    dss->dm_savefile = libxl__device_model_savefile(gc, domid);
-
     if (r_info != NULL) {
         dss->interval = r_info->interval;
         dss->xcflags |= XCFLAGS_CHECKPOINTED;
@@ -388,23 +383,6 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss)
             dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
     }
 
-    port = xs_suspend_evtchn_port(dss->domid);
-
-    if (port >= 0) {
-        rc = libxl__ctx_evtchn_init(gc);
-        if (rc) goto out;
-
-        dss->guest_evtchn.port =
-            xc_suspend_evtchn_init_exclusive(CTX->xch, CTX->xce,
-                                  dss->domid, port, &dss->guest_evtchn_lockfd);
-
-        if (dss->guest_evtchn.port < 0) {
-            LOG(WARN, "Suspend event channel initialization failed");
-            rc = ERROR_FAIL;
-            goto out;
-        }
-    }
-
     memset(callbacks, 0, sizeof(*callbacks));
     if (r_info != NULL) {
         callbacks->suspend = libxl__remus_domain_suspend_callback;
@@ -434,18 +412,19 @@ static void stream_done(libxl__egc *egc,
 }
 
 static void domain_save_done(libxl__egc *egc,
-                             libxl__domain_suspend_state *dss, int rc)
+                             libxl__domain_save_state *dss, int rc)
 {
     STATE_AO_GC(dss->ao);
 
     /* Convenience aliases */
     const uint32_t domid = dss->domid;
+    libxl__domain_suspend_state *dsps = &dss->dsps;
 
-    libxl__ev_evtchn_cancel(gc, &dss->guest_evtchn);
+    libxl__ev_evtchn_cancel(gc, &dsps->guest_evtchn);
 
-    if (dss->guest_evtchn.port > 0)
+    if (dsps->guest_evtchn.port > 0)
         xc_suspend_evtchn_release(CTX->xch, CTX->xce, domid,
-                           dss->guest_evtchn.port, &dss->guest_evtchn_lockfd);
+                        dsps->guest_evtchn.port, &dsps->guest_evtchn_lockfd);
 
     if (dss->remus) {
         /*
diff --git a/tools/libxl/libxl_dom_suspend.c b/tools/libxl/libxl_dom_suspend.c
index 16f603f..cc0b217 100644
--- a/tools/libxl/libxl_dom_suspend.c
+++ b/tools/libxl/libxl_dom_suspend.c
@@ -19,14 +19,61 @@
 
 /*====================== Domain suspend =======================*/
 
+int libxl__domain_suspend_init(libxl__egc *egc,
+                               libxl__domain_suspend_state *dsps,
+                               libxl_domain_type type)
+{
+    STATE_AO_GC(dsps->ao);
+    int rc = ERROR_FAIL;
+    int port;
+
+    /* Convenience aliases */
+    const uint32_t domid = dsps->domid;
+
+    libxl__xswait_init(&dsps->pvcontrol);
+    libxl__ev_evtchn_init(&dsps->guest_evtchn);
+    libxl__ev_xswatch_init(&dsps->guest_watch);
+    libxl__ev_time_init(&dsps->guest_timeout);
+
+    if (type == LIBXL_DOMAIN_TYPE_INVALID) goto out;
+    dsps->type = type;
+
+    dsps->guest_evtchn.port = -1;
+    dsps->guest_evtchn_lockfd = -1;
+    dsps->guest_responded = 0;
+    dsps->dm_savefile = libxl__device_model_savefile(gc, domid);
+
+    port = xs_suspend_evtchn_port(domid);
+
+    if (port >= 0) {
+        rc = libxl__ctx_evtchn_init(gc);
+        if (rc) goto out;
+
+        dsps->guest_evtchn.port =
+            xc_suspend_evtchn_init_exclusive(CTX->xch, CTX->xce,
+                                    domid, port, &dsps->guest_evtchn_lockfd);
+
+        if (dsps->guest_evtchn.port < 0) {
+            LOG(WARN, "Suspend event channel initialization failed");
+            rc = ERROR_FAIL;
+            goto out;
+        }
+    }
+
+    rc = 0;
+
+out:
+    return rc;
+}
+
 /*----- callbacks, called by xc_domain_save -----*/
 
 int libxl__domain_suspend_device_model(libxl__gc *gc,
-                                       libxl__domain_suspend_state *dss)
+                                       libxl__domain_suspend_state *dsps)
 {
     int ret = 0;
-    uint32_t const domid = dss->domid;
-    const char *const filename = dss->dm_savefile;
+    uint32_t const domid = dsps->domid;
+    const char *const filename = dsps->dm_savefile;
 
     switch (libxl__device_model_version_running(gc, domid)) {
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: {
@@ -53,9 +100,9 @@ int libxl__domain_suspend_device_model(libxl__gc *gc,
 }
 
 static void domain_suspend_common_wait_guest(libxl__egc *egc,
-                                             libxl__domain_suspend_state *dss);
+                                             libxl__domain_suspend_state *dsps);
 static void domain_suspend_common_guest_suspended(libxl__egc *egc,
-                                         libxl__domain_suspend_state *dss);
+                                         libxl__domain_suspend_state *dsps);
 
 static void domain_suspend_common_pvcontrol_suspending(libxl__egc *egc,
       libxl__xswait_state *xswa, int rc, const char *state);
@@ -64,24 +111,24 @@ static void domain_suspend_common_wait_guest_evtchn(libxl__egc *egc,
 static void suspend_common_wait_guest_watch(libxl__egc *egc,
       libxl__ev_xswatch *xsw, const char *watch_path, const char *event_path);
 static void suspend_common_wait_guest_check(libxl__egc *egc,
-        libxl__domain_suspend_state *dss);
+        libxl__domain_suspend_state *dsps);
 static void suspend_common_wait_guest_timeout(libxl__egc *egc,
       libxl__ev_time *ev, const struct timeval *requested_abs, int rc);
 
 static void domain_suspend_common_done(libxl__egc *egc,
-                                       libxl__domain_suspend_state *dss,
+                                       libxl__domain_suspend_state *dsps,
                                        int rc);
 
 static void domain_suspend_callback_common(libxl__egc *egc,
-                                           libxl__domain_suspend_state *dss);
+                                           libxl__domain_suspend_state *dsps);
 static void domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss, int rc);
+                                libxl__domain_suspend_state *dsps, int rc);
 
-/* calls dss->callback_common_done when done */
+/* calls dsps->callback_common_done when done */
 void libxl__domain_suspend(libxl__egc *egc,
-                           libxl__domain_suspend_state *dss)
+                           libxl__domain_suspend_state *dsps)
 {
-    domain_suspend_callback_common(egc, dss);
+    domain_suspend_callback_common(egc, dsps);
 }
 
 static bool domain_suspend_pvcontrol_acked(const char *state) {
@@ -90,37 +137,37 @@ static bool domain_suspend_pvcontrol_acked(const char *state) {
     return strcmp(state,"suspend");
 }
 
-/* calls dss->callback_common_done when done */
+/* calls dsps->callback_common_done when done */
 static void domain_suspend_callback_common(libxl__egc *egc,
-                                           libxl__domain_suspend_state *dss)
+                                           libxl__domain_suspend_state *dsps)
 {
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(dsps->ao);
     uint64_t hvm_s_state = 0, hvm_pvdrv = 0;
     int ret, rc;
 
     /* Convenience aliases */
-    const uint32_t domid = dss->domid;
+    const uint32_t domid = dsps->domid;
 
-    if (dss->hvm) {
+    if (dsps->type == LIBXL_DOMAIN_TYPE_HVM) {
         xc_hvm_param_get(CTX->xch, domid, HVM_PARAM_CALLBACK_IRQ, &hvm_pvdrv);
         xc_hvm_param_get(CTX->xch, domid, HVM_PARAM_ACPI_S_STATE, &hvm_s_state);
     }
 
-    if ((hvm_s_state == 0) && (dss->guest_evtchn.port >= 0)) {
+    if ((hvm_s_state == 0) && (dsps->guest_evtchn.port >= 0)) {
         LOG(DEBUG, "issuing %s suspend request via event channel",
-            dss->hvm ? "PVHVM" : "PV");
-        ret = xenevtchn_notify(CTX->xce, dss->guest_evtchn.port);
+            dsps->type == LIBXL_DOMAIN_TYPE_HVM ? "PVHVM" : "PV");
+        ret = xenevtchn_notify(CTX->xce, dsps->guest_evtchn.port);
         if (ret < 0) {
             LOG(ERROR, "xenevtchn_notify failed ret=%d", ret);
             rc = ERROR_FAIL;
             goto err;
         }
 
-        dss->guest_evtchn.callback = domain_suspend_common_wait_guest_evtchn;
-        rc = libxl__ev_evtchn_wait(gc, &dss->guest_evtchn);
+        dsps->guest_evtchn.callback = domain_suspend_common_wait_guest_evtchn;
+        rc = libxl__ev_evtchn_wait(gc, &dsps->guest_evtchn);
         if (rc) goto err;
 
-        rc = libxl__ev_time_register_rel(ao, &dss->guest_timeout,
+        rc = libxl__ev_time_register_rel(ao, &dsps->guest_timeout,
                                          suspend_common_wait_guest_timeout,
                                          60*1000);
         if (rc) goto err;
@@ -128,7 +175,7 @@ static void domain_suspend_callback_common(libxl__egc *egc,
         return;
     }
 
-    if (dss->hvm && (!hvm_pvdrv || hvm_s_state)) {
+    if (dsps->type == LIBXL_DOMAIN_TYPE_HVM && (!hvm_pvdrv || hvm_s_state)) {
         LOG(DEBUG, "Calling xc_domain_shutdown on HVM domain");
         ret = xc_domain_shutdown(CTX->xch, domid, SHUTDOWN_suspend);
         if (ret < 0) {
@@ -137,55 +184,55 @@ static void domain_suspend_callback_common(libxl__egc *egc,
             goto err;
         }
         /* The guest does not (need to) respond to this sort of request. */
-        dss->guest_responded = 1;
-        domain_suspend_common_wait_guest(egc, dss);
+        dsps->guest_responded = 1;
+        domain_suspend_common_wait_guest(egc, dsps);
         return;
     }
 
     LOG(DEBUG, "issuing %s suspend request via XenBus control node",
-        dss->hvm ? "PVHVM" : "PV");
+        dsps->type == LIBXL_DOMAIN_TYPE_HVM ? "PVHVM" : "PV");
 
     libxl__domain_pvcontrol_write(gc, XBT_NULL, domid, "suspend");
 
-    dss->pvcontrol.path = libxl__domain_pvcontrol_xspath(gc, domid);
-    if (!dss->pvcontrol.path) { rc = ERROR_FAIL; goto err; }
+    dsps->pvcontrol.path = libxl__domain_pvcontrol_xspath(gc, domid);
+    if (!dsps->pvcontrol.path) { rc = ERROR_FAIL; goto err; }
 
-    dss->pvcontrol.ao = ao;
-    dss->pvcontrol.what = "guest acknowledgement of suspend request";
-    dss->pvcontrol.timeout_ms = 60 * 1000;
-    dss->pvcontrol.callback = domain_suspend_common_pvcontrol_suspending;
-    libxl__xswait_start(gc, &dss->pvcontrol);
+    dsps->pvcontrol.ao = ao;
+    dsps->pvcontrol.what = "guest acknowledgement of suspend request";
+    dsps->pvcontrol.timeout_ms = 60 * 1000;
+    dsps->pvcontrol.callback = domain_suspend_common_pvcontrol_suspending;
+    libxl__xswait_start(gc, &dsps->pvcontrol);
     return;
 
  err:
-    domain_suspend_common_done(egc, dss, rc);
+    domain_suspend_common_done(egc, dsps, rc);
 }
 
 static void domain_suspend_common_wait_guest_evtchn(libxl__egc *egc,
         libxl__ev_evtchn *evev)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(evev, *dss, guest_evtchn);
-    STATE_AO_GC(dss->ao);
+    libxl__domain_suspend_state *dsps = CONTAINER_OF(evev, *dsps, guest_evtchn);
+    STATE_AO_GC(dsps->ao);
     /* If we should be done waiting, suspend_common_wait_guest_check
      * will end up calling domain_suspend_common_guest_suspended or
      * domain_suspend_common_done, both of which cancel the evtchn
      * wait as needed.  So re-enable it now. */
-    libxl__ev_evtchn_wait(gc, &dss->guest_evtchn);
-    suspend_common_wait_guest_check(egc, dss);
+    libxl__ev_evtchn_wait(gc, &dsps->guest_evtchn);
+    suspend_common_wait_guest_check(egc, dsps);
 }
 
 static void domain_suspend_common_pvcontrol_suspending(libxl__egc *egc,
       libxl__xswait_state *xswa, int rc, const char *state)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(xswa, *dss, pvcontrol);
-    STATE_AO_GC(dss->ao);
+    libxl__domain_suspend_state *dsps = CONTAINER_OF(xswa, *dsps, pvcontrol);
+    STATE_AO_GC(dsps->ao);
     xs_transaction_t t = 0;
 
     if (!rc && !domain_suspend_pvcontrol_acked(state))
         /* keep waiting */
         return;
 
-    libxl__xswait_stop(gc, &dss->pvcontrol);
+    libxl__xswait_stop(gc, &dsps->pvcontrol);
 
     if (rc == ERROR_TIMEDOUT) {
         /*
@@ -228,56 +275,56 @@ static void domain_suspend_common_pvcontrol_suspending(libxl__egc *egc,
     LOG(DEBUG, "guest acknowledged suspend request");
 
     libxl__xs_transaction_abort(gc, &t);
-    dss->guest_responded = 1;
-    domain_suspend_common_wait_guest(egc,dss);
+    dsps->guest_responded = 1;
+    domain_suspend_common_wait_guest(egc,dsps);
     return;
 
  err:
     libxl__xs_transaction_abort(gc, &t);
-    domain_suspend_common_done(egc, dss, rc);
+    domain_suspend_common_done(egc, dsps, rc);
     return;
 }
 
 static void domain_suspend_common_wait_guest(libxl__egc *egc,
-                                             libxl__domain_suspend_state *dss)
+                                             libxl__domain_suspend_state *dsps)
 {
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(dsps->ao);
     int rc;
 
     LOG(DEBUG, "wait for the guest to suspend");
 
-    rc = libxl__ev_xswatch_register(gc, &dss->guest_watch,
+    rc = libxl__ev_xswatch_register(gc, &dsps->guest_watch,
                                     suspend_common_wait_guest_watch,
                                     "@releaseDomain");
     if (rc) goto err;
 
-    rc = libxl__ev_time_register_rel(ao, &dss->guest_timeout,
+    rc = libxl__ev_time_register_rel(ao, &dsps->guest_timeout,
                                      suspend_common_wait_guest_timeout,
                                      60*1000);
     if (rc) goto err;
     return;
 
  err:
-    domain_suspend_common_done(egc, dss, rc);
+    domain_suspend_common_done(egc, dsps, rc);
 }
 
 static void suspend_common_wait_guest_watch(libxl__egc *egc,
       libxl__ev_xswatch *xsw, const char *watch_path, const char *event_path)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(xsw, *dss, guest_watch);
-    suspend_common_wait_guest_check(egc, dss);
+    libxl__domain_suspend_state *dsps = CONTAINER_OF(xsw, *dsps, guest_watch);
+    suspend_common_wait_guest_check(egc, dsps);
 }
 
 static void suspend_common_wait_guest_check(libxl__egc *egc,
-        libxl__domain_suspend_state *dss)
+        libxl__domain_suspend_state *dsps)
 {
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(dsps->ao);
     xc_domaininfo_t info;
     int ret;
     int shutdown_reason;
 
     /* Convenience aliases */
-    const uint32_t domid = dss->domid;
+    const uint32_t domid = dsps->domid;
 
     ret = xc_domain_getinfolist(CTX->xch, domid, 1, &info);
     if (ret < 0) {
@@ -304,71 +351,73 @@ static void suspend_common_wait_guest_check(libxl__egc *egc,
     }
 
     LOG(DEBUG, "guest has suspended");
-    domain_suspend_common_guest_suspended(egc, dss);
+    domain_suspend_common_guest_suspended(egc, dsps);
     return;
 
  err:
-    domain_suspend_common_done(egc, dss, ERROR_FAIL);
+    domain_suspend_common_done(egc, dsps, ERROR_FAIL);
 }
 
 static void suspend_common_wait_guest_timeout(libxl__egc *egc,
       libxl__ev_time *ev, const struct timeval *requested_abs, int rc)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(ev, *dss, guest_timeout);
-    STATE_AO_GC(dss->ao);
+    libxl__domain_suspend_state *dsps = CONTAINER_OF(ev, *dsps, guest_timeout);
+    STATE_AO_GC(dsps->ao);
     if (rc == ERROR_TIMEDOUT) {
         LOG(ERROR, "guest did not suspend, timed out");
         rc = ERROR_GUEST_TIMEDOUT;
     }
-    domain_suspend_common_done(egc, dss, rc);
+    domain_suspend_common_done(egc, dsps, rc);
 }
 
 static void domain_suspend_common_guest_suspended(libxl__egc *egc,
-                                         libxl__domain_suspend_state *dss)
+                                         libxl__domain_suspend_state *dsps)
 {
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(dsps->ao);
     int rc;
 
-    libxl__ev_evtchn_cancel(gc, &dss->guest_evtchn);
-    libxl__ev_xswatch_deregister(gc, &dss->guest_watch);
-    libxl__ev_time_deregister(gc, &dss->guest_timeout);
+    libxl__ev_evtchn_cancel(gc, &dsps->guest_evtchn);
+    libxl__ev_xswatch_deregister(gc, &dsps->guest_watch);
+    libxl__ev_time_deregister(gc, &dsps->guest_timeout);
 
-    if (dss->hvm) {
-        rc = libxl__domain_suspend_device_model(gc, dss);
+    if (dsps->type == LIBXL_DOMAIN_TYPE_HVM) {
+        rc = libxl__domain_suspend_device_model(gc, dsps);
         if (rc) {
             LOG(ERROR, "libxl__domain_suspend_device_model failed ret=%d", rc);
-            domain_suspend_common_done(egc, dss, rc);
+            domain_suspend_common_done(egc, dsps, rc);
             return;
         }
     }
-    domain_suspend_common_done(egc, dss, 0);
+    domain_suspend_common_done(egc, dsps, 0);
 }
 
 static void domain_suspend_common_done(libxl__egc *egc,
-                                       libxl__domain_suspend_state *dss,
+                                       libxl__domain_suspend_state *dsps,
                                        int rc)
 {
     EGC_GC;
-    assert(!libxl__xswait_inuse(&dss->pvcontrol));
-    libxl__ev_evtchn_cancel(gc, &dss->guest_evtchn);
-    libxl__ev_xswatch_deregister(gc, &dss->guest_watch);
-    libxl__ev_time_deregister(gc, &dss->guest_timeout);
-    dss->callback_common_done(egc, dss, rc);
+    assert(!libxl__xswait_inuse(&dsps->pvcontrol));
+    libxl__ev_evtchn_cancel(gc, &dsps->guest_evtchn);
+    libxl__ev_xswatch_deregister(gc, &dsps->guest_watch);
+    libxl__ev_time_deregister(gc, &dsps->guest_timeout);
+    dsps->callback_common_done(egc, dsps, rc);
 }
 
 void libxl__domain_suspend_callback(void *data)
 {
     libxl__save_helper_state *shs = data;
     libxl__egc *egc = shs->egc;
-    libxl__domain_suspend_state *dss = shs->caller_state;
+    libxl__domain_save_state *dss = shs->caller_state;
+    libxl__domain_suspend_state *dsps = &dss->dsps;
 
-    dss->callback_common_done = domain_suspend_callback_common_done;
-    domain_suspend_callback_common(egc, dss);
+    dsps->callback_common_done = domain_suspend_callback_common_done;
+    domain_suspend_callback_common(egc, dsps);
 }
 
 static void domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss, int rc)
+                                libxl__domain_suspend_state *dsps, int rc)
 {
+    libxl__domain_save_state *dss = CONTAINER_OF(dsps, *dss, dsps);
     dss->rc = rc;
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
 }
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 7005d6b..bc48bec 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3009,11 +3009,12 @@ static inline bool libxl__conversion_helper_inuse
  */
 
 typedef struct libxl__domain_suspend_state libxl__domain_suspend_state;
+typedef struct libxl__domain_save_state libxl__domain_save_state;
 
-typedef void libxl__domain_suspend_cb(libxl__egc*,
-                                      libxl__domain_suspend_state*, int rc);
+typedef void libxl__domain_save_cb(libxl__egc*,
+                                   libxl__domain_save_state*, int rc);
 typedef void libxl__save_device_model_cb(libxl__egc*,
-                                         libxl__domain_suspend_state*, int rc);
+                                         libxl__domain_save_state*, int rc);
 
 /* State for writing a libxl migration v2 stream */
 typedef struct libxl__stream_write_state libxl__stream_write_state;
@@ -3022,7 +3023,7 @@ typedef void (*sws_record_done_cb)(libxl__egc *egc,
 struct libxl__stream_write_state {
     /* filled by the user */
     libxl__ao *ao;
-    libxl__domain_suspend_state *dss;
+    libxl__domain_save_state *dss;
     int fd;
     void (*completion_callback)(libxl__egc *egc,
                                 libxl__stream_write_state *sws,
@@ -3076,9 +3077,33 @@ typedef struct libxl__logdirty_switch {
 } libxl__logdirty_switch;
 
 struct libxl__domain_suspend_state {
+    /* set by caller of libxl__domain_suspend_init */
+    libxl__ao *ao;
+    uint32_t domid;
+
+    /* private */
+    libxl_domain_type type;
+
+    libxl__ev_evtchn guest_evtchn;
+    int guest_evtchn_lockfd;
+    int guest_responded;
+
+    libxl__xswait_state pvcontrol;
+    libxl__ev_xswatch guest_watch;
+    libxl__ev_time guest_timeout;
+
+    const char *dm_savefile;
+    void (*callback_common_done)(libxl__egc*,
+                                 struct libxl__domain_suspend_state*, int ok);
+};
+int libxl__domain_suspend_init(libxl__egc *egc,
+                               libxl__domain_suspend_state *dsps,
+                               libxl_domain_type type);
+
+struct libxl__domain_save_state {
     /* set by caller of libxl__domain_save */
     libxl__ao *ao;
-    libxl__domain_suspend_cb *callback;
+    libxl__domain_save_cb *callback;
 
     uint32_t domid;
     int fd;
@@ -3089,22 +3114,14 @@ struct libxl__domain_suspend_state {
     const libxl_domain_remus_info *remus;
     /* private */
     int rc;
-    libxl__ev_evtchn guest_evtchn;
-    int guest_evtchn_lockfd;
     int hvm;
     int xcflags;
-    int guest_responded;
-    libxl__xswait_state pvcontrol;
-    libxl__ev_xswatch guest_watch;
-    libxl__ev_time guest_timeout;
-    const char *dm_savefile;
+    libxl__domain_suspend_state dsps;
     libxl__remus_devices_state rds;
     libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
     int interval; /* checkpoint interval (for Remus) */
     libxl__stream_write_state sws;
     libxl__logdirty_switch logdirty;
-    void (*callback_common_done)(libxl__egc*,
-                                 struct libxl__domain_suspend_state*, int ok);
 };
 
 
@@ -3445,12 +3462,12 @@ struct libxl__domain_create_state {
 
 /* calls dss->callback when done */
 _hidden void libxl__domain_save(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss);
+                                libxl__domain_save_state *dss);
 
 
 /* calls libxl__xc_domain_suspend_done when done */
 _hidden void libxl__xc_domain_save(libxl__egc *egc,
-                                   libxl__domain_suspend_state *dss,
+                                   libxl__domain_save_state *dss,
                                    libxl__save_helper_state *shs);
 /* If rc==0 then retval is the return value from xc_domain_save
  * and errnoval is the errno value it provided.
@@ -3468,7 +3485,7 @@ void libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc,
 
 _hidden void libxl__domain_suspend_common_switch_qemu_logdirty
                                (int domid, unsigned int enable, void *data);
-_hidden int libxl__save_emulator_xenstore_data(libxl__domain_suspend_state *dss,
+_hidden int libxl__save_emulator_xenstore_data(libxl__domain_save_state *dss,
                                                char **buf, uint32_t *len);
 _hidden int libxl__restore_emulator_xenstore_data
     (libxl__domain_create_state *dcs, const char *ptr, uint32_t size);
@@ -3496,13 +3513,13 @@ static inline bool libxl__save_helper_inuse(const libxl__save_helper_state *shs)
 
 /* Each time the dm needs to be saved, we must call suspend and then save */
 _hidden int libxl__domain_suspend_device_model(libxl__gc *gc,
-                                           libxl__domain_suspend_state *dss);
+                                           libxl__domain_suspend_state *dsps);
 
 _hidden const char *libxl__device_model_savefile(libxl__gc *gc, uint32_t domid);
 
-/* calls dss->callback_common_done when done */
+/* calls dsps->callback_common_done when done */
 _hidden void libxl__domain_suspend(libxl__egc *egc,
-                                   libxl__domain_suspend_state *dss);
+                                   libxl__domain_suspend_state *dsps);
 /* used by libxc to suspend the guest during migration */
 _hidden void libxl__domain_suspend_callback(void *data);
 
@@ -3512,9 +3529,9 @@ _hidden void libxl__remus_domain_resume_callback(void *data);
 _hidden void libxl__remus_domain_save_checkpoint_callback(void *data);
 /* Remus setup and teardown*/
 _hidden void libxl__remus_setup(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss);
+                                libxl__domain_save_state *dss);
 _hidden void libxl__remus_teardown(libxl__egc *egc,
-                                   libxl__domain_suspend_state *dss,
+                                   libxl__domain_save_state *dss,
                                    int rc);
 /* Remus callbacks for restore */
 _hidden void libxl__remus_domain_restore_checkpoint_callback(void *data);
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 107e867..c245a4e 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -41,7 +41,7 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
 int init_subkind_nic(libxl__remus_devices_state *rds)
 {
     int rc, ret;
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_save_state *dss = CONTAINER_OF(rds, *dss, rds);
 
     STATE_AO_GC(rds->ao);
 
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index e3caf7d..fae2120 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -28,7 +28,7 @@ static void remus_checkpoint_stream_written(
     libxl__egc *egc, libxl__stream_write_state *sws, int rc);
 
 void libxl__remus_setup(libxl__egc *egc,
-                        libxl__domain_suspend_state *dss)
+                        libxl__domain_save_state *dss)
 {
     /* Convenience aliases */
     libxl__remus_devices_state *const rds = &dss->rds;
@@ -63,7 +63,7 @@ out:
 static void remus_setup_done(libxl__egc *egc,
                              libxl__remus_devices_state *rds, int rc)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_save_state *dss = CONTAINER_OF(rds, *dss, rds);
     STATE_AO_GC(dss->ao);
 
     if (!rc) {
@@ -80,7 +80,7 @@ static void remus_setup_done(libxl__egc *egc,
 static void remus_setup_failed(libxl__egc *egc,
                                libxl__remus_devices_state *rds, int rc)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_save_state *dss = CONTAINER_OF(rds, *dss, rds);
     STATE_AO_GC(dss->ao);
 
     if (rc)
@@ -94,7 +94,7 @@ static void remus_teardown_done(libxl__egc *egc,
                                 libxl__remus_devices_state *rds,
                                 int rc);
 void libxl__remus_teardown(libxl__egc *egc,
-                           libxl__domain_suspend_state *dss,
+                           libxl__domain_save_state *dss,
                            int rc)
 {
     EGC_GC;
@@ -109,7 +109,7 @@ static void remus_teardown_done(libxl__egc *egc,
                                 libxl__remus_devices_state *rds,
                                 int rc)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_save_state *dss = CONTAINER_OF(rds, *dss, rds);
     STATE_AO_GC(dss->ao);
 
     if (rc)
@@ -122,7 +122,7 @@ static void remus_teardown_done(libxl__egc *egc,
 /*---------------------- remus callbacks (save) -----------------------*/
 
 static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss, int ok);
+                                libxl__domain_suspend_state *dsps, int ok);
 static void remus_devices_postsuspend_cb(libxl__egc *egc,
                                          libxl__remus_devices_state *rds,
                                          int rc);
@@ -134,15 +134,18 @@ void libxl__remus_domain_suspend_callback(void *data)
 {
     libxl__save_helper_state *shs = data;
     libxl__egc *egc = shs->egc;
-    libxl__domain_suspend_state *dss = shs->caller_state;
+    libxl__domain_save_state *dss = shs->caller_state;
+    libxl__domain_suspend_state *dsps = &dss->dsps;
 
-    dss->callback_common_done = remus_domain_suspend_callback_common_done;
-    libxl__domain_suspend(egc, dss);
+    dsps->callback_common_done = remus_domain_suspend_callback_common_done;
+    libxl__domain_suspend(egc, dsps);
 }
 
 static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss, int rc)
+                                libxl__domain_suspend_state *dsps, int rc)
 {
+    libxl__domain_save_state *dss = CONTAINER_OF(dsps, *dss, dsps);
+
     if (rc)
         goto out;
 
@@ -160,7 +163,7 @@ static void remus_devices_postsuspend_cb(libxl__egc *egc,
                                          libxl__remus_devices_state *rds,
                                          int rc)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_save_state *dss = CONTAINER_OF(rds, *dss, rds);
 
     if (rc)
         goto out;
@@ -177,7 +180,7 @@ void libxl__remus_domain_resume_callback(void *data)
 {
     libxl__save_helper_state *shs = data;
     libxl__egc *egc = shs->egc;
-    libxl__domain_suspend_state *dss = shs->caller_state;
+    libxl__domain_save_state *dss = shs->caller_state;
     STATE_AO_GC(dss->ao);
 
     libxl__remus_devices_state *const rds = &dss->rds;
@@ -189,7 +192,7 @@ static void remus_devices_preresume_cb(libxl__egc *egc,
                                        libxl__remus_devices_state *rds,
                                        int rc)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_save_state *dss = CONTAINER_OF(rds, *dss, rds);
     STATE_AO_GC(dss->ao);
 
     if (rc)
@@ -220,7 +223,7 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
 void libxl__remus_domain_save_checkpoint_callback(void *data)
 {
     libxl__save_helper_state *shs = data;
-    libxl__domain_suspend_state *dss = shs->caller_state;
+    libxl__domain_save_state *dss = shs->caller_state;
     libxl__egc *egc = shs->egc;
     STATE_AO_GC(dss->ao);
 
@@ -230,7 +233,7 @@ void libxl__remus_domain_save_checkpoint_callback(void *data)
 static void remus_checkpoint_stream_written(
     libxl__egc *egc, libxl__stream_write_state *sws, int rc)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(sws, *dss, sws);
+    libxl__domain_save_state *dss = CONTAINER_OF(sws, *dss, sws);
 
     /* Convenience aliases */
     libxl__remus_devices_state *const rds = &dss->rds;
@@ -255,7 +258,7 @@ static void remus_devices_commit_cb(libxl__egc *egc,
                                     libxl__remus_devices_state *rds,
                                     int rc)
 {
-    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_save_state *dss = CONTAINER_OF(rds, *dss, rds);
 
     STATE_AO_GC(dss->ao);
 
@@ -290,7 +293,7 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
                                   const struct timeval *requested_abs,
                                   int rc)
 {
-    libxl__domain_suspend_state *dss =
+    libxl__domain_save_state *dss =
                             CONTAINER_OF(ev, *dss, checkpoint_timeout);
 
     STATE_AO_GC(dss->ao);
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 3af99af..2d06b42 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -75,7 +75,7 @@ void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
                argnums, ARRAY_SIZE(argnums));
 }
 
-void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss,
+void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_save_state *dss,
                            libxl__save_helper_state *shs)
 {
     STATE_AO_GC(dss->ao);
diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
index 21b4b51..9053146 100644
--- a/tools/libxl/libxl_stream_write.c
+++ b/tools/libxl/libxl_stream_write.c
@@ -216,7 +216,7 @@ void libxl__stream_write_start(libxl__egc *egc,
                                libxl__stream_write_state *stream)
 {
     libxl__datacopier_state *dc = &stream->dc;
-    libxl__domain_suspend_state *dss = stream->dss;
+    libxl__domain_save_state *dss = stream->dss;
     STATE_AO_GC(stream->ao);
     struct libxl__sr_hdr hdr;
     int rc = 0;
@@ -324,7 +324,7 @@ static void libxc_header_done(libxl__egc *egc,
 void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
                                 int rc, int retval, int errnoval)
 {
-    libxl__domain_suspend_state *dss = dss_void;
+    libxl__domain_save_state *dss = dss_void;
     libxl__stream_write_state *stream = &dss->sws;
     STATE_AO_GC(dss->ao);
 
@@ -333,10 +333,10 @@ void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
 
     if (retval) {
         LOGEV(ERROR, errnoval, "saving domain: %s",
-              dss->guest_responded ?
+              dss->dsps.guest_responded ?
               "domain responded to suspend request" :
               "domain did not respond to suspend request");
-        if (!dss->guest_responded)
+        if (!dss->dsps.guest_responded)
             rc = ERROR_GUEST_TIMEDOUT;
         else if (dss->rc)
             rc = dss->rc;
@@ -371,7 +371,7 @@ void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
 static void write_emulator_xenstore_record(libxl__egc *egc,
                                            libxl__stream_write_state *stream)
 {
-    libxl__domain_suspend_state *dss = stream->dss;
+    libxl__domain_save_state *dss = stream->dss;
     STATE_AO_GC(stream->ao);
     struct libxl__sr_rec_hdr rec;
     int rc;
@@ -410,7 +410,7 @@ static void write_emulator_xenstore_record(libxl__egc *egc,
 static void emulator_xenstore_record_done(libxl__egc *egc,
                                           libxl__stream_write_state *stream)
 {
-    libxl__domain_suspend_state *dss = stream->dss;
+    libxl__domain_save_state *dss = stream->dss;
 
     if (dss->type == LIBXL_DOMAIN_TYPE_HVM)
         write_emulator_context_record(egc, stream);
@@ -425,7 +425,7 @@ static void emulator_xenstore_record_done(libxl__egc *egc,
 static void write_emulator_context_record(libxl__egc *egc,
                                           libxl__stream_write_state *stream)
 {
-    libxl__domain_suspend_state *dss = stream->dss;
+    libxl__domain_save_state *dss = stream->dss;
     libxl__datacopier_state *dc = &stream->emu_dc;
     STATE_AO_GC(stream->ao);
     struct libxl__sr_rec_hdr *rec = &stream->emu_rec_hdr;
@@ -440,7 +440,7 @@ static void write_emulator_context_record(libxl__egc *egc,
     }
 
     /* Convenience aliases */
-    const char *const filename = dss->dm_savefile;
+    const char *const filename = dss->dsps.dm_savefile;
 
     libxl__carefd_begin();
     int readfd = open(filename, O_RDONLY);
-- 
2.5.0


* [PATCH v7 05/18] tools/libxc: support to resume uncooperative HVM guests
  2016-01-29  5:27 [PATCH v7 00/18] Prerequisite patches for COLO Wen Congyang
                   ` (3 preceding siblings ...)
  2016-01-29  5:27 ` [PATCH v7 04/18] libxl/save: Refactor libxl__domain_suspend_state Wen Congyang
@ 2016-01-29  5:27 ` Wen Congyang
  2016-01-29 16:30   ` Konrad Rzeszutek Wilk
  2016-02-03 19:40   ` Wei Liu
  2016-01-29  5:27 ` [PATCH v7 06/18] tools/libxl: introduce enum type libxl_checkpointed_stream Wen Congyang
                   ` (13 subsequent siblings)
  18 siblings, 2 replies; 56+ messages in thread
From: Wen Congyang @ 2016-01-29  5:27 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Gui Jianfeng,
	Jiang Yunhong, Dong Eddie, Shriram Rajagopalan, Yang Hongyang

Before this patch:
1. suspend
a. PVHVM and PV: both are suspended the same way, by sending a suspend
   request to the guest. If the guest doesn't support evtchn, the xenstore
   variant is used, suspending the guest via the XenBus control node.
b. pure HVM: we call xc_domain_shutdown(..., SHUTDOWN_suspend) to suspend
   the guest.

2. Resume:
a. fast path (fast=1)
   Do not change the guest state. We call libxl__domain_resume(..., 1),
   which calls xc_domain_resume(..., 1 /* fast=1 */) to resume the guest.
   PV:       modify the return code to 1, and then call the domctl:
             XEN_DOMCTL_resumedomain
   PVHVM:    same as PV
   pure HVM: do nothing in modify_returncode, and then call the domctl:
             XEN_DOMCTL_resumedomain
b. slow path (fast=0)
   Used when the guest's state has been changed. We call
   libxl__domain_resume(..., 0) to resume the guest.
   PV:       update start info, and reset all secondary CPU states. Then call
             the domctl: XEN_DOMCTL_resumedomain
   PVHVM:    cannot be resumed. You will get the following error message:
                 "Cannot resume uncooperative HVM guests"
   pure HVM: same as PVHVM

After this patch:
1. suspend
   unchanged

2. Resume
a. fast path:
   unchanged
b. slow path:
   PV:       unchanged
   PVHVM:    call XEN_DOMCTL_resumedomain to resume the guest. Because we
             don't modify the return code, the PV driver will disconnect
             and reconnect.
             The guest ends up doing the XENMAPSPACE_shared_info
             XENMEM_add_to_physmap hypercall and resetting all of its CPU
             states to point to the shared_info (except the ones past 32).
             That is, the Linux kernel does this regardless of whether
             SCHEDOP_shutdown:SHUTDOWN_suspend returns 1 or not.
   pure HVM: call XEN_DOMCTL_resumedomain to resume the guest.

Under COLO, we will update the guest's state (memory, CPU registers,
device status...). In this case, we cannot use the fast path to resume it.
Instead, we keep the return code at 0 and use the slow path to resume the
guest. Resuming an HVM guest via the slow path is currently not supported,
so this patch makes that resume call succeed instead of failing.
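
For illustration only (not part of the patch): the before/after behaviour
described above can be summarised as a small decision function. This is a
sketch under assumptions. The enum names and resume_action_for() are
hypothetical and exist only here; the real logic lives in
tools/libxc/xc_resume.c and is not structured this way.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical guest classification, used only in this sketch. */
enum guest_kind { GUEST_PV, GUEST_PVHVM, GUEST_PURE_HVM };

/* Hypothetical summary of what the resume path ends up doing. */
enum resume_action {
    RESUME_SET_RETCODE_THEN_DOMCTL, /* return code set to 1, then resumedomain */
    RESUME_DOMCTL_ONLY,             /* only XEN_DOMCTL_resumedomain */
    RESUME_REBUILD_PV_THEN_DOMCTL,  /* start info + secondary CPUs, then domctl */
    RESUME_FAIL                     /* "Cannot resume uncooperative HVM guests" */
};

/* 'patched' selects the behaviour after this patch; false means before it. */
enum resume_action resume_action_for(enum guest_kind kind, bool fast,
                                     bool patched)
{
    if (fast) {
        /* Fast path is unchanged by the patch. */
        if (kind == GUEST_PURE_HVM)
            return RESUME_DOMCTL_ONLY;
        return RESUME_SET_RETCODE_THEN_DOMCTL;
    }

    /* Slow path: PV is unchanged by the patch. */
    if (kind == GUEST_PV)
        return RESUME_REBUILD_PV_THEN_DOMCTL;

    /* PVHVM and pure HVM: an error before the patch, a plain domctl after. */
    return patched ? RESUME_DOMCTL_ONLY : RESUME_FAIL;
}
```

With this model, resume_action_for(GUEST_PVHVM, false, false) gives
RESUME_FAIL, while the same call with patched=true gives RESUME_DOMCTL_ONLY,
which is the only behavioural difference the commit message describes.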

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
---
 tools/libxc/xc_resume.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
index 87d4324..4a9b035 100644
--- a/tools/libxc/xc_resume.c
+++ b/tools/libxc/xc_resume.c
@@ -108,6 +108,26 @@ static int xc_domain_resume_cooperative(xc_interface *xch, uint32_t domid)
     return do_domctl(xch, &domctl);
 }
 
+static int xc_domain_resume_hvm(xc_interface *xch, uint32_t domid)
+{
+    DECLARE_DOMCTL;
+
+    /*
+     * The domctl XEN_DOMCTL_resumedomain unpauses each vcpu. After
+     * the domctl, the guest will run.
+     *
+     * If it is PVHVM, the guest called the hypercall
+     *    SCHEDOP_shutdown:SHUTDOWN_suspend
+     * to suspend itself. We don't modify the return code, so the PV driver
+     * will disconnect and reconnect.
+     *
+     * If it is a pure HVM guest, it will continue running.
+     */
+    domctl.cmd = XEN_DOMCTL_resumedomain;
+    domctl.domain = domid;
+    return do_domctl(xch, &domctl);
+}
+
 static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
 {
     DECLARE_DOMCTL;
@@ -137,10 +157,7 @@ static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
      */
 #if defined(__i386__) || defined(__x86_64__)
     if ( info.hvm )
-    {
-        ERROR("Cannot resume uncooperative HVM guests");
-        return rc;
-    }
+        return xc_domain_resume_hvm(xch, domid);
 
     if ( xc_domain_get_guest_width(xch, domid, &dinfo->guest_width) != 0 )
     {
-- 
2.5.0


* [PATCH v7 06/18] tools/libxl: introduce enum type libxl_checkpointed_stream
  2016-01-29  5:27 [PATCH v7 00/18] Prerequisite patches for COLO Wen Congyang
                   ` (4 preceding siblings ...)
  2016-01-29  5:27 ` [PATCH v7 05/18] tools/libxc: support to resume uncooperative HVM guests Wen Congyang
@ 2016-01-29  5:27 ` Wen Congyang
  2016-01-29 16:34   ` Konrad Rzeszutek Wilk
  2016-02-03 19:40   ` Wei Liu
  2016-01-29  5:27 ` [PATCH v7 07/18] migration/save: pass checkpointed_stream from libxl to libxc Wen Congyang
                   ` (12 subsequent siblings)
  18 siblings, 2 replies; 56+ messages in thread
From: Wen Congyang @ 2016-01-29  5:27 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Gui Jianfeng,
	Jiang Yunhong, Dong Eddie, Shriram Rajagopalan, Yang Hongyang

Introduce enum type libxl_checkpointed_stream in IDL.
Rename the last argument of migrate_receive from "remus" to
"checkpointed" since the semantics of this parameter have
changed.

NOTE:
 libxl_domain_restore_params and domain_create aren't changed here;
 checkpointed_stream is still an int there, because we will pass the
 value from libxl to libxc.
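
For illustration only (not part of the patch): a standalone sketch of how
the restore path dispatches on the new enum while the value itself is still
carried as an int. The SKETCH_* enum is a stand-in for the IDL-generated
type (only the values 0 and 1 are taken from the IDL hunk), and the STEP_*
flags and restore_steps() are hypothetical names that make the control flow
observable.

```c
#include <assert.h>

/* Stand-in for the IDL-generated enum; values mirror the IDL hunk. */
typedef enum {
    SKETCH_CHECKPOINTED_STREAM_NONE = 0,
    SKETCH_CHECKPOINTED_STREAM_REMUS = 1,
} sketch_checkpointed_stream;

/* Hypothetical flags recording which steps the dispatch would run. */
enum {
    STEP_REMUS_RESTORE_SETUP = 1 << 0,
    STEP_STREAM_READ_START   = 1 << 1,
};

/* The argument is an int, matching the NOTE above. */
int restore_steps(int checkpointed_stream)
{
    int steps = 0;

    switch (checkpointed_stream) {
    case SKETCH_CHECKPOINTED_STREAM_REMUS:
        steps |= STEP_REMUS_RESTORE_SETUP;
        /* fall through */
    case SKETCH_CHECKPOINTED_STREAM_NONE:
        steps |= STEP_STREAM_READ_START;
        break;
    default:
        /* Any other value starts nothing in this sketch. */
        break;
    }
    return steps;
}
```

The fall-through keeps the Remus case a strict superset of the plain
migration case, so a later stream type can be added as another case label
without touching the existing ones.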

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl.h             |  7 +++++++
 tools/libxl/libxl_create.c      |  8 ++++++--
 tools/libxl/libxl_stream_read.c |  7 +++++--
 tools/libxl/libxl_types.idl     |  5 +++++
 tools/libxl/xl_cmdimpl.c        | 18 ++++++++++++------
 5 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index fa87f53..6225db1 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -876,6 +876,13 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, libxl_mac *src);
  */
 #define LIBXL_HAVE_DEVICE_MODEL_VERSION_NONE 1
 
+/*
+ * LIBXL_HAVE_CHECKPOINTED_STREAM
+ *
+ * If this is defined, then libxl_checkpointed_stream exists.
+ */
+#define LIBXL_HAVE_CHECKPOINTED_STREAM 1
+
 typedef char **libxl_string_list;
 void libxl_string_list_dispose(libxl_string_list *sl);
 int libxl_string_list_length(const libxl_string_list *sl);
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 91c78e5..0d20c2d 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1024,9 +1024,13 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     dcs->srs.completion_callback = domcreate_stream_done;
 
     if (restore_fd >= 0) {
-        if (checkpointed_stream)
+        switch (checkpointed_stream) {
+        case LIBXL_CHECKPOINTED_STREAM_REMUS:
             libxl__remus_restore_setup(egc, dcs);
-        libxl__stream_read_start(egc, &dcs->srs);
+            /* fall through */
+        case LIBXL_CHECKPOINTED_STREAM_NONE:
+            libxl__stream_read_start(egc, &dcs->srs);
+        }
         return;
     }
 
diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index dac134e..f4781eb 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -794,19 +794,22 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
      * If the stream is not still alive, we must not continue any work.
      */
     if (libxl__stream_read_inuse(stream)) {
-        if (checkpointed_stream) {
+        switch (checkpointed_stream) {
+        case LIBXL_CHECKPOINTED_STREAM_REMUS:
             /*
              * Failover from primary. Domain state is currently at a
              * consistent checkpoint, complete the stream, and call
              * stream->completion_callback() to resume the guest.
              */
             stream_complete(egc, stream, 0);
-        } else {
+            break;
+        case LIBXL_CHECKPOINTED_STREAM_NONE:
             /*
              * Libxc has indicated that it is done with the stream.
              * Resume reading libxl records from it.
              */
             stream_continue(egc, stream);
+            break;
         }
     }
 }
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 9ad7eba..b8fb22f 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -228,6 +228,11 @@ libxl_hdtype = Enumeration("hdtype", [
     (2, "AHCI"),
     ], init_val = "LIBXL_HDTYPE_IDE")
 
+libxl_checkpointed_stream = Enumeration("checkpointed_stream", [
+    (0, "NONE"),
+    (1, "REMUS"),
+    ])
+
 #
 # Complex libxl types
 #
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 25507c7..04cbcf3 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -4424,7 +4424,8 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
 }
 
 static void migrate_receive(int debug, int daemonize, int monitor,
-                            int send_fd, int recv_fd, int remus)
+                            int send_fd, int recv_fd,
+                            libxl_checkpointed_stream checkpointed)
 {
     uint32_t domid;
     int rc, rc2;
@@ -4449,7 +4450,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     dom_info.paused = 1;
     dom_info.migrate_fd = recv_fd;
     dom_info.migration_domname_r = &migration_domname;
-    dom_info.checkpointed_stream = remus;
+    dom_info.checkpointed_stream = checkpointed;
 
     rc = create_domain(&dom_info);
     if (rc < 0) {
@@ -4460,7 +4461,8 @@ static void migrate_receive(int debug, int daemonize, int monitor,
 
     domid = rc;
 
-    if (remus) {
+    switch (checkpointed) {
+    case LIBXL_CHECKPOINTED_STREAM_REMUS:
         /* If we are here, it means that the sender (primary) has crashed.
          * TODO: Split-Brain Check.
          */
@@ -4493,6 +4495,9 @@ static void migrate_receive(int debug, int daemonize, int monitor,
                     common_domname, domid, rc);
 
         exit(rc ? -ERROR_FAIL: 0);
+    default:
+        /* do nothing */
+        break;
     }
 
     fprintf(stderr, "migration target: Transfer complete,"
@@ -4630,7 +4635,8 @@ int main_restore(int argc, char **argv)
 
 int main_migrate_receive(int argc, char **argv)
 {
-    int debug = 0, daemonize = 1, monitor = 1, remus = 0;
+    int debug = 0, daemonize = 1, monitor = 1;
+    libxl_checkpointed_stream checkpointed = LIBXL_CHECKPOINTED_STREAM_NONE;
     int opt;
 
     SWITCH_FOREACH_OPT(opt, "Fedr", NULL, "migrate-receive", 0) {
@@ -4645,7 +4651,7 @@ int main_migrate_receive(int argc, char **argv)
         debug = 1;
         break;
     case 'r':
-        remus = 1;
+        checkpointed = LIBXL_CHECKPOINTED_STREAM_REMUS;
         break;
     }
 
@@ -4655,7 +4661,7 @@ int main_migrate_receive(int argc, char **argv)
     }
     migrate_receive(debug, daemonize, monitor,
                     STDOUT_FILENO, STDIN_FILENO,
-                    remus);
+                    checkpointed);
 
     return 0;
 }
-- 
2.5.0

* [PATCH v7 07/18] migration/save: pass checkpointed_stream from libxl to libxc
  2016-01-29  5:27 [PATCH v7 00/18] Prerequisite patches for COLO Wen Congyang
                   ` (5 preceding siblings ...)
  2016-01-29  5:27 ` [PATCH v7 06/18] tools/libxl: introduce enum type libxl_checkpointed_stream Wen Congyang
@ 2016-01-29  5:27 ` Wen Congyang
  2016-01-29 16:35   ` Konrad Rzeszutek Wilk
  2016-02-03 19:40   ` Wei Liu
  2016-01-29  5:27 ` [PATCH v7 08/18] tools/libxl: introduce libxl__domain_restore_device_model to load qemu state Wen Congyang
                   ` (11 subsequent siblings)
  18 siblings, 2 replies; 56+ messages in thread
From: Wen Congyang @ 2016-01-29  5:27 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Ian Campbell, Wen Congyang,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Shriram Rajagopalan,
	Ian Jackson, Yang Hongyang

Pass checkpointed_stream from libxl to libxc.
This does not affect legacy migration, because legacy migration
does not use this parameter.
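
As a rough illustration only (not code from this patch), the sketch below
shows why the two enumerations must stay numerically consistent when the
libxl value is handed to libxc as a plain int. All sketch_* names are
hypothetical stand-ins for libxl_checkpointed_stream and the
migration_stream enum.

```c
#include <assert.h>

/* Hypothetical mirror of the libxl-side IDL enumeration. */
enum sketch_libxl_checkpointed_stream {
    SKETCH_LIBXL_CHECKPOINTED_STREAM_NONE  = 0,
    SKETCH_LIBXL_CHECKPOINTED_STREAM_REMUS = 1,
};

/* Hypothetical mirror of the libxc-side enum in xc_sr_context. */
enum sketch_mig_stream {
    SKETCH_MIG_STREAM_NONE,   /* plain stream */
    SKETCH_MIG_STREAM_REMUS,
};

/* Returns 1 if the int received by the save side is a known stream type. */
static inline int sketch_stream_type_valid(int checkpointed_stream)
{
    return checkpointed_stream == SKETCH_MIG_STREAM_NONE ||
           checkpointed_stream == SKETCH_MIG_STREAM_REMUS;
}

/* Returns 1 if the save loop should keep sending checkpoints. */
static inline int sketch_save_loops(int checkpointed_stream)
{
    assert(sketch_stream_type_valid(checkpointed_stream));
    return checkpointed_stream != SKETCH_MIG_STREAM_NONE;
}
```

Because the value crosses the helper boundary as an integer, a libxl
constant is only meaningful on the libxc side while both lists keep the
same numbering.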

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tools/libxc/include/xenguest.h   |  6 ++++--
 tools/libxc/xc_nomigrate.c       |  3 ++-
 tools/libxc/xc_sr_common.h       | 12 +++++++++++-
 tools/libxc/xc_sr_save.c         | 18 ++++++++++++------
 tools/libxl/libxl.c              |  2 ++
 tools/libxl/libxl_dom_save.c     | 11 ++++++++---
 tools/libxl/libxl_internal.h     |  1 +
 tools/libxl/libxl_save_callout.c |  2 +-
 tools/libxl/libxl_save_helper.c  |  3 ++-
 tools/libxl/libxl_stream_write.c |  2 +-
 tools/libxl/libxl_types.idl      |  1 +
 11 files changed, 45 insertions(+), 16 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index d48b3ff..affc42b 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -29,7 +29,6 @@
 #define XCFLAGS_HVM       (1 << 2)
 #define XCFLAGS_STDVGA    (1 << 3)
 #define XCFLAGS_CHECKPOINT_COMPRESS    (1 << 4)
-#define XCFLAGS_CHECKPOINTED    (1 << 5)
 
 #define X86_64_B_SIZE   64 
 #define X86_32_B_SIZE   32
@@ -82,11 +81,14 @@ struct save_callbacks {
  * @parm xch a handle to an open hypervisor interface
  * @parm fd the file descriptor to save a domain to
  * @parm dom the id of the domain
+ * @param checkpointed_stream MIG_STREAM_NONE if the far end of the stream
+ *        doesn't use checkpointing
  * @return 0 on success, -1 on failure
  */
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
                    uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
-                   struct save_callbacks* callbacks, int hvm);
+                   struct save_callbacks* callbacks, int hvm,
+                   int checkpointed_stream);
 
 /* callbacks provided by xc_domain_restore */
 struct restore_callbacks {
diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c
index 902429e..c9124df 100644
--- a/tools/libxc/xc_nomigrate.c
+++ b/tools/libxc/xc_nomigrate.c
@@ -22,7 +22,8 @@
 
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
                    uint32_t max_factor, uint32_t flags,
-                   struct save_callbacks* callbacks, int hvm)
+                   struct save_callbacks* callbacks, int hvm,
+                   int checkpointed_stream)
 {
     errno = ENOSYS;
     return -1;
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 60b43e8..66f595f 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -180,6 +180,16 @@ struct xc_sr_context
 
     xc_dominfo_t dominfo;
 
+    /*
+     * migration stream
+     * 0: Plain VM
+     * 1: Remus
+     */
+    enum {
+        MIG_STREAM_NONE, /* plain stream */
+        MIG_STREAM_REMUS,
+    } migration_stream;
+
     union /* Common save or restore data. */
     {
         struct /* Save data. */
@@ -191,7 +201,7 @@ struct xc_sr_context
             bool live;
 
             /* Plain VM, or checkpoints over time. */
-            bool checkpointed;
+            int checkpointed;
 
             /* Further debugging information in the stream. */
             bool debug;
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index ccb000e..0bea97e 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -629,7 +629,8 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
     if ( rc )
         goto out;
 
-    if ( ctx->save.debug && !ctx->save.checkpointed )
+    if ( ctx->save.debug &&
+         ctx->save.checkpointed == MIG_STREAM_NONE )
     {
         rc = verify_frames(ctx);
         if ( rc )
@@ -758,7 +759,7 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
 
         if ( ctx->save.live )
             rc = send_domain_memory_live(ctx);
-        else if ( ctx->save.checkpointed )
+        else if ( ctx->save.checkpointed != MIG_STREAM_NONE )
             rc = send_domain_memory_checkpointed(ctx);
         else
             rc = send_domain_memory_nonlive(ctx);
@@ -778,7 +779,7 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
         if ( rc )
             goto err;
 
-        if ( ctx->save.checkpointed )
+        if ( ctx->save.checkpointed != MIG_STREAM_NONE )
         {
             /*
              * We have now completed the initial live portion of the checkpoint
@@ -799,7 +800,7 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
             if ( rc <= 0 )
                 goto err;
         }
-    } while ( ctx->save.checkpointed );
+    } while ( ctx->save.checkpointed != MIG_STREAM_NONE );
 
     xc_report_progress_single(xch, "End of stream");
 
@@ -829,7 +830,8 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
 
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
                    uint32_t max_iters, uint32_t max_factor, uint32_t flags,
-                   struct save_callbacks* callbacks, int hvm)
+                   struct save_callbacks* callbacks, int hvm,
+                   int checkpointed_stream)
 {
     struct xc_sr_context ctx =
         {
@@ -841,7 +843,11 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
     ctx.save.callbacks = callbacks;
     ctx.save.live  = !!(flags & XCFLAGS_LIVE);
     ctx.save.debug = !!(flags & XCFLAGS_DEBUG);
-    ctx.save.checkpointed = !!(flags & XCFLAGS_CHECKPOINTED);
+    ctx.save.checkpointed = checkpointed_stream;
+
+    /* If altering migration_stream update this assert too. */
+    assert(checkpointed_stream == MIG_STREAM_NONE ||
+           checkpointed_stream == MIG_STREAM_REMUS);
 
     /*
      * TODO: Find some time to better tweak the live migration algorithm.
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 8707b08..fc7844d 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -876,6 +876,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     dss->live = 1;
     dss->debug = 0;
     dss->remus = info;
+    dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_REMUS;
 
     assert(info);
 
@@ -936,6 +937,7 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd, int flags,
     dss->type = type;
     dss->live = flags & LIBXL_SUSPEND_LIVE;
     dss->debug = flags & LIBXL_SUSPEND_DEBUG;
+    dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_NONE;
 
     rc = libxl__fd_flags_modify_save(gc, dss->fd,
                                      ~(O_NONBLOCK|O_NDELAY), 0,
diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 02cc143..cd2e7de 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -338,6 +338,12 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
     unsigned int nr_vnodes = 0, nr_vmemranges = 0, nr_vcpus = 0;
     libxl__domain_suspend_state *dsps = &dss->dsps;
 
+    if (dss->checkpointed_stream != LIBXL_CHECKPOINTED_STREAM_NONE && !r_info) {
+        LOG(ERROR, "Migration stream is checkpointed, but there's no "
+                   "checkpoint info!");
+        goto out;
+    }
+
     dss->rc = 0;
     logdirty_init(&dss->logdirty);
     dsps->ao = ao;
@@ -376,15 +382,14 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
         goto out;
     }
 
-    if (r_info != NULL) {
+    if (dss->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_REMUS) {
         dss->interval = r_info->interval;
-        dss->xcflags |= XCFLAGS_CHECKPOINTED;
         if (libxl_defbool_val(r_info->compression))
             dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
     }
 
     memset(callbacks, 0, sizeof(*callbacks));
-    if (r_info != NULL) {
+    if (dss->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_REMUS) {
         callbacks->suspend = libxl__remus_domain_suspend_callback;
         callbacks->postcopy = libxl__remus_domain_resume_callback;
         callbacks->checkpoint = libxl__remus_domain_save_checkpoint_callback;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index bc48bec..fbd1acb 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3111,6 +3111,7 @@ struct libxl__domain_save_state {
     libxl_domain_type type;
     int live;
     int debug;
+    int checkpointed_stream;
     const libxl_domain_remus_info *remus;
     /* private */
     int rc;
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 2d06b42..416b318 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -85,7 +85,7 @@ void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_save_state *dss,
 
     const unsigned long argnums[] = {
         dss->domid, 0, 0, dss->xcflags, dss->hvm,
-        cbflags,
+        cbflags, dss->checkpointed_stream,
     };
 
     shs->ao = ao;
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index 39038f9..6bdcf13 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -253,6 +253,7 @@ int main(int argc, char **argv)
         uint32_t flags =           strtoul(NEXTARG,0,10);
         int hvm =                  atoi(NEXTARG);
         unsigned cbflags =         strtoul(NEXTARG,0,10);
+        int checkpointed_stream =  strtoul(NEXTARG,0,10);
         assert(!*++argv);
 
         helper_setcallbacks_save(&helper_save_callbacks, cbflags);
@@ -261,7 +262,7 @@ int main(int argc, char **argv)
         setup_signals(save_signal_handler);
 
         r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags,
-                           &helper_save_callbacks, hvm);
+                           &helper_save_callbacks, hvm, checkpointed_stream);
         complete(r);
 
     } else if (!strcmp(mode,"--restore-domain")) {
diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
index 9053146..f6ea55d 100644
--- a/tools/libxl/libxl_stream_write.c
+++ b/tools/libxl/libxl_stream_write.c
@@ -355,7 +355,7 @@ void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
      * If the stream is not still alive, we must not continue any work.
      */
     if (libxl__stream_write_inuse(stream)) {
-        if (dss->remus)
+        if (dss->checkpointed_stream != LIBXL_CHECKPOINTED_STREAM_NONE)
             /*
              * For remus, if libxl__xc_domain_save_done() completes,
              * there was an error sending data to the secondary.
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index b8fb22f..605fb9a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -228,6 +228,7 @@ libxl_hdtype = Enumeration("hdtype", [
     (2, "AHCI"),
     ], init_val = "LIBXL_HDTYPE_IDE")
 
+# Consistent with the values defined for migration_stream.
 libxl_checkpointed_stream = Enumeration("checkpointed_stream", [
     (0, "NONE"),
     (1, "REMUS"),
-- 
2.5.0

* [PATCH v7 08/18] tools/libxl: introduce libxl__domain_restore_device_model to load qemu state
  2016-01-29  5:27 [PATCH v7 00/18] Prerequisite patches for COLO Wen Congyang
                   ` (6 preceding siblings ...)
  2016-01-29  5:27 ` [PATCH v7 07/18] migration/save: pass checkpointed_stream from libxl to libxc Wen Congyang
@ 2016-01-29  5:27 ` Wen Congyang
  2016-01-29 16:34   ` Konrad Rzeszutek Wilk
  2016-02-03 19:40   ` Wei Liu
  2016-01-29  5:27 ` [PATCH v7 09/18] tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty() Wen Congyang
                   ` (10 subsequent siblings)
  18 siblings, 2 replies; 56+ messages in thread
From: Wen Congyang @ 2016-01-29  5:27 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Gui Jianfeng,
	Jiang Yunhong, Dong Eddie, Anthony Perard, Shriram Rajagopalan,
	Yang Hongyang

In normal migration, the qemu state is passed to qemu as a parameter.
With COLO, the secondary vm is already running, so at every checkpoint
we do the following steps:
1. suspend both primary vm and secondary vm
2. sync the state
3. resume both primary vm and secondary vm
The primary sends qemu's state in step 2, and the secondary's qemu must
read it and restore the state before it is resumed. We cannot pass
the state to qemu as a parameter because the secondary QEMU has already
started at this point, so we introduce libxl__domain_restore_device_model()
to do it. This API MUST be called before resuming the secondary vm.
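
For orientation, here is a minimal sketch of the kind of QMP request that
the new helper asks libxl to send. It assumes the usual QMP wire shape
("execute" plus "arguments") and leaves out the request id and any JSON
escaping of the filename; sketch_format_load_state() is a hypothetical
helper, not part of libxl.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a simplified xen-load-devices-state request into buf.
 * Returns 0 on success, -1 if the request does not fit in len bytes.
 * Assumption: filename contains no characters that need JSON escaping.
 */
static inline int sketch_format_load_state(char *buf, size_t len,
                                           const char *filename)
{
    int n;

    assert(buf != NULL && filename != NULL);

    n = snprintf(buf, len,
                 "{\"execute\":\"xen-load-devices-state\","
                 "\"arguments\":{\"filename\":\"%s\"}}",
                 filename);

    /* snprintf reports the untruncated length, so n >= len means overflow. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

In the real code the request is built with qmp_parameters_add_string()
and sent with qmp_run_command(), as the diff below shows.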

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Anthony Perard <anthony.perard@citrix.com>
---
 tools/libxl/libxl_dom_save.c | 20 ++++++++++++++++++++
 tools/libxl/libxl_internal.h |  4 ++++
 tools/libxl/libxl_qmp.c      | 10 ++++++++++
 3 files changed, 34 insertions(+)

diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index cd2e7de..7383d2d 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -518,6 +518,26 @@ int libxl__restore_emulator_xenstore_data(libxl__domain_create_state *dcs,
     return rc;
 }
 
+int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid,
+                                       const char *restore_file)
+{
+    int rc;
+
+    switch (libxl__device_model_version_running(gc, domid)) {
+    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
+        /* Will never be supported. */
+        rc = ERROR_INVAL;
+        break;
+    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
+        rc = libxl__qmp_restore(gc, domid, restore_file);
+        break;
+    default:
+        rc = ERROR_INVAL;
+    }
+
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index fbd1acb..896c119 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1117,6 +1117,8 @@ _hidden int libxl__domain_rename(libxl__gc *gc, uint32_t domid,
                                  const char *old_name, const char *new_name,
                                  xs_transaction_t trans);
 
+_hidden int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid,
+                                               const char *restore_file);
 _hidden int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid);
 
 _hidden const char *libxl__userdata_path(libxl__gc *gc, uint32_t domid,
@@ -1760,6 +1762,8 @@ _hidden int libxl__qmp_stop(libxl__gc *gc, int domid);
 _hidden int libxl__qmp_resume(libxl__gc *gc, int domid);
 /* Save current QEMU state into fd. */
 _hidden int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename);
+/* Load QEMU state from the given file. */
+_hidden int libxl__qmp_restore(libxl__gc *gc, int domid, const char *filename);
 /* Set dirty bitmap logging status */
 _hidden int libxl__qmp_set_global_dirty_log(libxl__gc *gc, int domid, bool enable);
 _hidden int libxl__qmp_insert_cdrom(libxl__gc *gc, int domid, const libxl_device_disk *disk);
diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index 714038b..eec8a44 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -905,6 +905,16 @@ int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename)
                            NULL, NULL);
 }
 
+int libxl__qmp_restore(libxl__gc *gc, int domid, const char *state_file)
+{
+    libxl__json_object *args = NULL;
+
+    qmp_parameters_add_string(gc, &args, "filename", state_file);
+
+    return qmp_run_command(gc, domid, "xen-load-devices-state", args,
+                           NULL, NULL);
+}
+
 static int qmp_change(libxl__gc *gc, libxl__qmp_handler *qmp,
                       char *device, char *target, char *arg)
 {
-- 
2.5.0

* [PATCH v7 09/18] tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty()
  2016-01-29  5:27 [PATCH v7 00/18] Prerequisite patches for COLO Wen Congyang
                   ` (7 preceding siblings ...)
  2016-01-29  5:27 ` [PATCH v7 08/18] tools/libxl: introduce libxl__domain_restore_device_model to load qemu state Wen Congyang
@ 2016-01-29  5:27 ` Wen Congyang
  2016-01-29 16:34   ` Konrad Rzeszutek Wilk
  2016-02-03 19:40   ` Wei Liu
  2016-01-29  5:27 ` [PATCH v7 10/18] tools/libxl: export logdirty_init Wen Congyang
                   ` (9 subsequent siblings)
  18 siblings, 2 replies; 56+ messages in thread
From: Wen Congyang @ 2016-01-29  5:27 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Gui Jianfeng,
	Jiang Yunhong, Dong Eddie, Shriram Rajagopalan, Yang Hongyang

The secondary vm is running in COLO mode, and we need to send the
secondary vm's dirty page information to the primary host at each
checkpoint, so we have to enable qemu logdirty on the secondary.

libxl__domain_suspend_common_switch_qemu_logdirty() enables qemu
logdirty, but it uses libxl__domain_save_state and calls
libxl__xc_domain_saverestore_async_callback_done() before it exits,
so it cannot be used for the secondary vm.

Update libxl__domain_suspend_common_switch_qemu_logdirty() and
introduce a new API, libxl__domain_common_switch_qemu_logdirty().
The new API only uses libxl__logdirty_switch, and calls
lds->callback before it exits. It will be used by the patch:
  secondary vm suspend/resume/checkpoint code
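
The decoupling described above can be sketched in isolation as follows.
This is only an illustration with hypothetical sketch_* types: the
generic switch sees nothing but the embedded logdirty structure, and the
save-side callback recovers its own outer state from it. The macro uses
the conventional (pointer, type, member) form, which differs from
libxl's own CONTAINER_OF.

```c
#include <assert.h>
#include <stddef.h>

/* Conventional container-of: outer struct pointer from a member pointer. */
#define SKETCH_CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct sketch_logdirty_switch {
    /* Set by the caller; invoked once when the switch has finished. */
    void (*callback)(struct sketch_logdirty_switch *lds, int rc);
};

struct sketch_save_state {
    int rc;
    int helper_result;  /* stands in for the value given to the save helper */
    struct sketch_logdirty_switch logdirty;
};

/* Save-side completion: maps rc onto the save state, as the patch does. */
static inline void sketch_save_logdirty_done(struct sketch_logdirty_switch *lds,
                                             int rc)
{
    struct sketch_save_state *dss =
        SKETCH_CONTAINER_OF(lds, struct sketch_save_state, logdirty);

    if (rc) {
        dss->rc = rc;
        dss->helper_result = -1;
    } else {
        dss->helper_result = 0;
    }
}

/* Generic switch: knows only lds; simulated_rc replaces the real qemu call. */
static inline void sketch_common_switch(struct sketch_logdirty_switch *lds,
                                        int simulated_rc)
{
    assert(lds->callback != NULL);
    lds->callback(lds, simulated_rc);
}
```

A secondary-side caller could embed the same structure in a different
outer state and install its own callback, without touching the save path.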

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 tools/libxl/libxl_dom_save.c | 95 ++++++++++++++++++++++++--------------------
 tools/libxl/libxl_internal.h |  8 ++++
 2 files changed, 60 insertions(+), 43 deletions(-)

diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 7383d2d..8bcc3ff 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -42,7 +42,7 @@ static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
 static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*,
                             const char *watch_path, const char *event_path);
 static void switch_logdirty_done(libxl__egc *egc,
-                                 libxl__domain_save_state *dss, int rc);
+                                 libxl__logdirty_switch *lds, int rc);
 
 static void logdirty_init(libxl__logdirty_switch *lds)
 {
@@ -52,13 +52,10 @@ static void logdirty_init(libxl__logdirty_switch *lds)
 }
 
 static void domain_suspend_switch_qemu_xen_traditional_logdirty
-                               (int domid, unsigned enable,
-                                libxl__save_helper_state *shs)
+                               (libxl__egc *egc, int domid, unsigned enable,
+                                libxl__logdirty_switch *lds)
 {
-    libxl__egc *egc = shs->egc;
-    libxl__domain_save_state *dss = shs->caller_state;
-    libxl__logdirty_switch *lds = &dss->logdirty;
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(lds->ao);
     int rc;
     xs_transaction_t t = 0;
     const char *got;
@@ -120,26 +117,34 @@ static void domain_suspend_switch_qemu_xen_traditional_logdirty
  out:
     LOG(ERROR,"logdirty switch failed (rc=%d), abandoning suspend",rc);
     libxl__xs_transaction_abort(gc, &t);
-    switch_logdirty_done(egc,dss,rc);
+    switch_logdirty_done(egc,lds,rc);
 }
 
 static void domain_suspend_switch_qemu_xen_logdirty
-                               (int domid, unsigned enable,
-                                libxl__save_helper_state *shs)
+                               (libxl__egc *egc, int domid, unsigned enable,
+                                libxl__logdirty_switch *lds)
 {
-    libxl__egc *egc = shs->egc;
-    libxl__domain_save_state *dss = shs->caller_state;
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(lds->ao);
     int rc;
 
     rc = libxl__qmp_set_global_dirty_log(gc, domid, enable);
-    if (!rc) {
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
-    } else {
+    if (rc)
         LOG(ERROR,"logdirty switch failed (rc=%d), abandoning suspend",rc);
+
+    lds->callback(egc, lds, rc);
+}
+
+static void domain_suspend_switch_qemu_logdirty_done
+                        (libxl__egc *egc, libxl__logdirty_switch *lds, int rc)
+{
+    libxl__domain_save_state *dss = CONTAINER_OF(lds, *dss, logdirty);
+
+    if (rc) {
         dss->rc = rc;
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, -1);
-    }
+        libxl__xc_domain_saverestore_async_callback_done(egc,
+                                                         &dss->sws.shs, -1);
+    } else
+        libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 0);
 }
 
 void libxl__domain_suspend_common_switch_qemu_logdirty
@@ -148,42 +153,52 @@ void libxl__domain_suspend_common_switch_qemu_logdirty
     libxl__save_helper_state *shs = user;
     libxl__egc *egc = shs->egc;
     libxl__domain_save_state *dss = shs->caller_state;
-    STATE_AO_GC(dss->ao);
+
+    /* Convenience aliases. */
+    libxl__logdirty_switch *const lds = &dss->logdirty;
+
+    lds->callback = domain_suspend_switch_qemu_logdirty_done;
+    libxl__domain_common_switch_qemu_logdirty(egc, domid, enable, lds);
+}
+
+void libxl__domain_common_switch_qemu_logdirty(libxl__egc *egc,
+                                               int domid, unsigned enable,
+                                               libxl__logdirty_switch *lds)
+{
+    STATE_AO_GC(lds->ao);
 
     switch (libxl__device_model_version_running(gc, domid)) {
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
-        domain_suspend_switch_qemu_xen_traditional_logdirty(domid, enable, shs);
+        domain_suspend_switch_qemu_xen_traditional_logdirty(egc, domid, enable,
+                                                            lds);
         break;
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
-        domain_suspend_switch_qemu_xen_logdirty(domid, enable, shs);
+        domain_suspend_switch_qemu_xen_logdirty(egc, domid, enable, lds);
         break;
     case LIBXL_DEVICE_MODEL_VERSION_NONE:
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+        lds->callback(egc, lds, 0);
         break;
     default:
         LOG(ERROR,"logdirty switch failed"
             ", no valid device model version found, abandoning suspend");
-        dss->rc = ERROR_FAIL;
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, -1);
+        lds->callback(egc, lds, ERROR_FAIL);
     }
 }
 static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
                                     const struct timeval *requested_abs,
                                     int rc)
 {
-    libxl__domain_save_state *dss = CONTAINER_OF(ev, *dss, logdirty.timeout);
-    STATE_AO_GC(dss->ao);
+    libxl__logdirty_switch *lds = CONTAINER_OF(ev, *lds, timeout);
+    STATE_AO_GC(lds->ao);
     LOG(ERROR,"logdirty switch: wait for device model timed out");
-    switch_logdirty_done(egc,dss,ERROR_FAIL);
+    switch_logdirty_done(egc,lds,ERROR_FAIL);
 }
 
 static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch *watch,
                             const char *watch_path, const char *event_path)
 {
-    libxl__domain_save_state *dss =
-        CONTAINER_OF(watch, *dss, logdirty.watch);
-    libxl__logdirty_switch *lds = &dss->logdirty;
-    STATE_AO_GC(dss->ao);
+    libxl__logdirty_switch *lds = CONTAINER_OF(watch, *lds, watch);
+    STATE_AO_GC(lds->ao);
     const char *got;
     xs_transaction_t t = 0;
     int rc;
@@ -229,28 +244,20 @@ static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch *watch,
     if (rc <= 0) {
         if (rc < 0)
             LOG(ERROR,"logdirty switch: failed (rc=%d)",rc);
-        switch_logdirty_done(egc,dss,rc);
+        switch_logdirty_done(egc,lds,rc);
     }
 }
 
 static void switch_logdirty_done(libxl__egc *egc,
-                                 libxl__domain_save_state *dss,
+                                 libxl__logdirty_switch *lds,
                                  int rc)
 {
-    STATE_AO_GC(dss->ao);
-    libxl__logdirty_switch *lds = &dss->logdirty;
+    STATE_AO_GC(lds->ao);
 
     libxl__ev_xswatch_deregister(gc, &lds->watch);
     libxl__ev_time_deregister(gc, &lds->timeout);
 
-    int broke;
-    if (rc) {
-        broke = -1;
-        dss->rc = rc;
-    } else {
-        broke = 0;
-    }
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, broke);
+    lds->callback(egc, lds, rc);
 }
 
 /*----- callbacks, called by xc_domain_save -----*/
@@ -346,6 +353,8 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
 
     dss->rc = 0;
     logdirty_init(&dss->logdirty);
+    dss->logdirty.ao = ao;
+
     dsps->ao = ao;
     dsps->domid = domid;
     rc = libxl__domain_suspend_init(egc, dsps, type);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 896c119..dd710cc 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3073,6 +3073,11 @@ libxl__stream_write_inuse(const libxl__stream_write_state *stream)
 }
 
 typedef struct libxl__logdirty_switch {
+    /* Set by caller of libxl__domain_common_switch_qemu_logdirty */
+    libxl__ao *ao;
+    void (*callback)(libxl__egc *egc, struct libxl__logdirty_switch *lds,
+                     int rc);
+
     const char *cmd;
     const char *cmd_path;
     const char *ret_path;
@@ -3490,6 +3495,9 @@ void libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc,
 
 _hidden void libxl__domain_suspend_common_switch_qemu_logdirty
                                (int domid, unsigned int enable, void *data);
+_hidden void libxl__domain_common_switch_qemu_logdirty(libxl__egc *egc,
+                                               int domid, unsigned enable,
+                                               libxl__logdirty_switch *lds);
 _hidden int libxl__save_emulator_xenstore_data(libxl__domain_save_state *dss,
                                                char **buf, uint32_t *len);
 _hidden int libxl__restore_emulator_xenstore_data
-- 
2.5.0

* [PATCH v7 10/18] tools/libxl: export logdirty_init
  2016-01-29  5:27 [PATCH v7 00/18] Prerequisite patches for COLO Wen Congyang
                   ` (8 preceding siblings ...)
  2016-01-29  5:27 ` [PATCH v7 09/18] tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty() Wen Congyang
@ 2016-01-29  5:27 ` Wen Congyang
  2016-02-03 19:40   ` Wei Liu
  2016-01-29  5:27 ` [PATCH v7 11/18] tools/libxl: Add back channel to allow migration target send data back Wen Congyang
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Wen Congyang @ 2016-01-29  5:27 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Gui Jianfeng,
	Jiang Yunhong, Dong Eddie, Shriram Rajagopalan, Yang Hongyang

We need to enable logdirty on the secondary, so export logdirty_init
for internal use and rename it to libxl__logdirty_init.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/libxl/libxl_dom_save.c | 4 ++--
 tools/libxl/libxl_internal.h | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 8bcc3ff..ab043f9 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -44,7 +44,7 @@ static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*,
 static void switch_logdirty_done(libxl__egc *egc,
                                  libxl__logdirty_switch *lds, int rc);
 
-static void logdirty_init(libxl__logdirty_switch *lds)
+void libxl__logdirty_init(libxl__logdirty_switch *lds)
 {
     lds->cmd_path = 0;
     libxl__ev_xswatch_init(&lds->watch);
@@ -352,7 +352,7 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
     }
 
     dss->rc = 0;
-    logdirty_init(&dss->logdirty);
+    libxl__logdirty_init(&dss->logdirty);
     dss->logdirty.ao = ao;
 
     dsps->ao = ao;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index dd710cc..adc426a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3085,6 +3085,8 @@ typedef struct libxl__logdirty_switch {
     libxl__ev_time timeout;
 } libxl__logdirty_switch;
 
+_hidden void libxl__logdirty_init(libxl__logdirty_switch *lds);
+
 struct libxl__domain_suspend_state {
     /* set by caller of libxl__domain_suspend_init */
     libxl__ao *ao;
-- 
2.5.0

* [PATCH v7 11/18] tools/libxl: Add back channel to allow migration target send data back
  2016-01-29  5:27 [PATCH v7 00/18] Prerequisite patches for COLO Wen Congyang
                   ` (9 preceding siblings ...)
  2016-01-29  5:27 ` [PATCH v7 10/18] tools/libxl: export logdirty_init Wen Congyang
@ 2016-01-29  5:27 ` Wen Congyang
  2016-02-03 19:40   ` Wei Liu
  2016-01-29  5:27 ` [PATCH v7 12/18] tools/libx{l, c}: add back channel to libxc Wen Congyang
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Wen Congyang @ 2016-01-29  5:27 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Gui Jianfeng,
	Jiang Yunhong, Dong Eddie, Shriram Rajagopalan, Yang Hongyang

In COLO mode, the secondary needs to send the following data to the primary:
1. In libxl
   The secondary sends the following CHECKPOINT_CONTEXT to the primary:
   CHECKPOINT_SVM_SUSPENDED, CHECKPOINT_SVM_READY and CHECKPOINT_SVM_RESUMED
2. In libxc
   The secondary sends the dirty pfn list to the primary

However, io_fd can only be written on the primary and can only be read on
the secondary. Save recv_fd in domain_save_state, and send_fd in
domain_create_state. Extend the libxl_domain_create_restore API with a
send_fd param, and add LIBXL_HAVE_DOMAIN_CREATE_RESTORE_SEND_FD to indicate
the API change.
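
As a rough illustration of how a caller might key off the new feature macro,
here is a minimal sketch. It does not include libxl.h: the macro is defined
locally so the file compiles on its own, and struct restore_args and
choose_restore_fds() are hypothetical names invented for this example, not
part of libxl.

```c
#include <assert.h>

/* Assumption: in real code this macro comes from <libxl.h>. It is defined
 * here only so that the sketch compiles without the Xen headers. */
#define LIBXL_HAVE_DOMAIN_CREATE_RESTORE_SEND_FD 1

/* Hypothetical container for the two fds a caller would hand to
 * libxl_domain_create_restore(). Not a libxl type. */
struct restore_args {
    int restore_fd; /* migration stream, read by the secondary */
    int send_fd;    /* back channel to the primary, -1 means none */
};

/* Hypothetical helper: decide which fds to pass depending on whether the
 * libxl being built against has the send_fd parameter. */
static struct restore_args choose_restore_fds(int restore_fd,
                                              int back_channel_fd)
{
    struct restore_args args;

    assert(restore_fd >= 0);
    args.restore_fd = restore_fd;
#ifdef LIBXL_HAVE_DOMAIN_CREATE_RESTORE_SEND_FD
    /* New API: forward the back channel (or -1 when there is none). */
    args.send_fd = back_channel_fd;
#else
    /* Old API: there is no send_fd parameter, so nothing to forward. */
    (void)back_channel_fd;
    args.send_fd = -1;
#endif
    return args;
}
```

Callers that do not need a back channel would pass -1, which is what the
compatibility wrappers and the OCaml stub in this patch do.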

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
---
 tools/libxl/libxl.c                  |  2 +-
 tools/libxl/libxl.h                  | 30 ++++++++++++++++++++++++++++--
 tools/libxl/libxl_create.c           |  9 +++++----
 tools/libxl/libxl_internal.h         |  2 ++
 tools/libxl/xl_cmdimpl.c             |  8 +++++++-
 tools/ocaml/libs/xl/xenlight_stubs.c |  2 +-
 6 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index fc7844d..e286329 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -871,7 +871,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     dss->callback = remus_failover_cb;
     dss->domid = domid;
     dss->fd = send_fd;
-    /* TODO do something with recv_fd */
+    dss->recv_fd = recv_fd;
     dss->type = type;
     dss->live = 1;
     dss->debug = 0;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 6225db1..5e4aede 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -639,6 +639,15 @@ typedef struct libxl__ctx libxl_ctx;
 #define LIBXL_HAVE_DOMAIN_CREATE_RESTORE_PARAMS 1
 
 /*
+ * LIBXL_HAVE_DOMAIN_CREATE_RESTORE_SEND_FD 1
+ *
+ * If this is defined, libxl_domain_create_restore()'s API has changed to
+ * include a send_fd param which is used for libxl migration back channel
+ * during COLO.
+ */
+#define LIBXL_HAVE_DOMAIN_CREATE_RESTORE_SEND_FD 1
+
+/*
  * LIBXL_HAVE_CREATEINFO_PVH
  * If this is defined, then libxl supports creation of a PVH guest.
  */
@@ -1152,7 +1161,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncprogress_how *aop_console_how)
                             LIBXL_EXTERNAL_CALLERS_ONLY;
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
-                                uint32_t *domid, int restore_fd,
+                                uint32_t *domid, int restore_fd, int send_fd,
                                 const libxl_domain_restore_params *params,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
@@ -1173,7 +1182,7 @@ int static inline libxl_domain_create_restore_0x040200(
     libxl_domain_restore_params_init(&params);
 
     ret = libxl_domain_create_restore(
-        ctx, d_config, domid, restore_fd, &params, ao_how, aop_console_how);
+        ctx, d_config, domid, restore_fd, -1, &params, ao_how, aop_console_how);
 
     libxl_domain_restore_params_dispose(&params);
     return ret;
@@ -1181,6 +1190,23 @@ int static inline libxl_domain_create_restore_0x040200(
 
 #define libxl_domain_create_restore libxl_domain_create_restore_0x040200
 
+#elif defined(LIBXL_API_VERSION) && LIBXL_API_VERSION >= 0x040400 \
+                                 && LIBXL_API_VERSION < 0x040700
+
+int static inline libxl_domain_create_restore_0x040400(
+    libxl_ctx *ctx, libxl_domain_config *d_config,
+    uint32_t *domid, int restore_fd,
+    const libxl_domain_restore_params *params,
+    const libxl_asyncop_how *ao_how,
+    const libxl_asyncprogress_how *aop_console_how)
+    LIBXL_EXTERNAL_CALLERS_ONLY
+{
+    return libxl_domain_create_restore(ctx, d_config, domid, restore_fd,
+                                       -1, params, ao_how, aop_console_how);
+}
+
+#define libxl_domain_create_restore libxl_domain_create_restore_0x040400
+
 #endif
 
 int libxl_domain_soft_reset(libxl_ctx *ctx,
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 0d20c2d..eb869ea 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1563,7 +1563,7 @@ static void domain_create_cb(libxl__egc *egc,
                              int rc, uint32_t domid);
 
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
-                            uint32_t *domid, int restore_fd,
+                            uint32_t *domid, int restore_fd, int send_fd,
                             const libxl_domain_restore_params *params,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
@@ -1578,6 +1578,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     libxl_domain_config_init(&cdcs->dcs.guest_config_saved);
     libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
     cdcs->dcs.restore_fd = cdcs->dcs.libxc_fd = restore_fd;
+    cdcs->dcs.send_fd = send_fd;
     if (restore_fd > -1) {
         cdcs->dcs.restore_params = *params;
         rc = libxl__fd_flags_modify_save(gc, cdcs->dcs.restore_fd,
@@ -1756,17 +1757,17 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
-    return do_domain_create(ctx, d_config, domid, -1, NULL,
+    return do_domain_create(ctx, d_config, domid, -1, -1, NULL,
                             ao_how, aop_console_how);
 }
 
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
-                                uint32_t *domid, int restore_fd,
+                                uint32_t *domid, int restore_fd, int send_fd,
                                 const libxl_domain_restore_params *params,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
 {
-    return do_domain_create(ctx, d_config, domid, restore_fd, params,
+    return do_domain_create(ctx, d_config, domid, restore_fd, send_fd, params,
                             ao_how, aop_console_how);
 }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index adc426a..8e9e57d 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3119,6 +3119,7 @@ struct libxl__domain_save_state {
     uint32_t domid;
     int fd;
     int fdfl; /* original flags on fd */
+    int recv_fd;
     libxl_domain_type type;
     int live;
     int debug;
@@ -3453,6 +3454,7 @@ struct libxl__domain_create_state {
     libxl_domain_config guest_config_saved; /* vanilla config */
     int restore_fd, libxc_fd;
     int restore_fdfl; /* original flags of restore_fd */
+    int send_fd;
     libxl_domain_restore_params restore_params;
     uint32_t domid_soft_reset;
     libxl__domain_create_cb *callback;
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 04cbcf3..59182b7 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -159,6 +159,7 @@ struct domain_create {
     char *extra_config; /* extra config string */
     const char *restore_file;
     int migrate_fd; /* -1 means none */
+    int send_fd; /* -1 means none */
     char **migration_domname_r; /* from malloc */
 };
 
@@ -2686,6 +2687,7 @@ static uint32_t create_domain(struct domain_create *dom_info)
     int config_len = 0;
     int restore_fd = -1;
     int restore_fd_to_close = -1;
+    int send_fd = -1;
     const libxl_asyncprogress_how *autoconnect_console_how;
     struct save_file_header hdr;
     uint32_t domid_soft_reset = INVALID_DOMID;
@@ -2703,6 +2705,7 @@ static uint32_t create_domain(struct domain_create *dom_info)
         if (migrate_fd >= 0) {
             restore_source = "<incoming migration stream>";
             restore_fd = migrate_fd;
+            send_fd = dom_info->send_fd;
         } else {
             restore_source = restore_file;
             restore_fd = open(restore_file, O_RDONLY);
@@ -2893,7 +2896,7 @@ start:
 
         ret = libxl_domain_create_restore(ctx, &d_config,
                                           &domid, restore_fd,
-                                          &params,
+                                          send_fd, &params,
                                           0, autoconnect_console_how);
 
         libxl_domain_restore_params_dispose(&params);
@@ -4449,6 +4452,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     dom_info.monitor = monitor;
     dom_info.paused = 1;
     dom_info.migrate_fd = recv_fd;
+    dom_info.send_fd = send_fd;
     dom_info.migration_domname_r = &migration_domname;
     dom_info.checkpointed_stream = checkpointed;
 
@@ -4622,6 +4626,7 @@ int main_restore(int argc, char **argv)
     dom_info.config_file = config_file;
     dom_info.restore_file = checkpoint_file;
     dom_info.migrate_fd = -1;
+    dom_info.send_fd = -1;
     dom_info.vnc = vnc;
     dom_info.vncautopass = vncautopass;
     dom_info.console_autoconnect = console_autoconnect;
@@ -5089,6 +5094,7 @@ int main_create(int argc, char **argv)
     dom_info.quiet = quiet;
     dom_info.config_file = filename;
     dom_info.migrate_fd = -1;
+    dom_info.send_fd = -1;
     dom_info.vnc = vnc;
     dom_info.vncautopass = vncautopass;
     dom_info.console_autoconnect = console_autoconnect;
diff --git a/tools/ocaml/libs/xl/xenlight_stubs.c b/tools/ocaml/libs/xl/xenlight_stubs.c
index 4133527..1c52c2a 100644
--- a/tools/ocaml/libs/xl/xenlight_stubs.c
+++ b/tools/ocaml/libs/xl/xenlight_stubs.c
@@ -537,7 +537,7 @@ value stub_libxl_domain_create_restore(value ctx, value domain_config, value par
 	restore_fd = Int_val(Field(params, 0));
 
 	caml_enter_blocking_section();
-	ret = libxl_domain_create_restore(CTX, &c_dconfig, &c_domid, restore_fd,
+	ret = libxl_domain_create_restore(CTX, &c_dconfig, &c_domid, restore_fd, -1,
 		&c_params, ao_how, NULL);
 	caml_leave_blocking_section();
 
-- 
2.5.0

* [PATCH v7 12/18] tools/libx{l, c}: add back channel to libxc
  2016-01-29  5:27 [PATCH v7 00/18] Prerequisite patches for COLO Wen Congyang
                   ` (10 preceding siblings ...)
  2016-01-29  5:27 ` [PATCH v7 11/18] tools/libxl: Add back channel to allow migration target send data back Wen Congyang
@ 2016-01-29  5:27 ` Wen Congyang
  2016-01-29 16:38   ` Konrad Rzeszutek Wilk
  2016-02-03 19:40   ` Wei Liu
  2016-01-29  5:27 ` [PATCH v7 13/18] tools/libxl: rename remus device to checkpoint device Wen Congyang
                   ` (6 subsequent siblings)
  18 siblings, 2 replies; 56+ messages in thread
From: Wen Congyang @ 2016-01-29  5:27 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Ian Campbell, Wen Congyang,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Shriram Rajagopalan,
	Ian Jackson, Yang Hongyang

In COLO mode, both VMs are running, and are considered in sync if the
visible network traffic is identical.  After some time, they fall out of
sync.

At this point, the two VMs have definitely diverged.  Let's call the
primary dirty bitmap set A, and the secondary dirty bitmap set B.

Sets A and B are different.

Under normal migration, the page data for set A will be sent from the
primary to the secondary.

However, the set difference B - A (the pages in B but not in A, let's
call this C) is out-of-date on the secondary (with respect to the
primary) and will not be sent by the primary (to the secondary), as it
was not memory dirtied by the primary. The secondary needs the page data
for C to reconstruct an exact copy of the primary at the checkpoint.

The secondary cannot calculate C as it doesn't know A.  Instead, the
secondary must send B to the primary, at which point the primary
calculates the union of A and B (let's call this D), which is all the
pages dirtied by either the primary or the secondary, and sends all page
data covered by D.

In the general case, D is a superset of both A and B.  Without the
backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid
copy of the primary.
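
The set arithmetic above can be sketched with plain bitmaps. This is only
an illustration under stated assumptions: one bit per pfn stored in an array
of unsigned long, and the helper names bitmap_set, bitmap_test, bitmap_union
and bitmap_diff are made up for this example. They are not the libxc
helpers and this is not the code used by the series.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Mark page `pfn` as dirty in bitmap `bm`. */
static void bitmap_set(unsigned long *bm, size_t pfn)
{
    bm[pfn / BITS_PER_WORD] |= 1UL << (pfn % BITS_PER_WORD);
}

/* Return non-zero if page `pfn` is marked dirty in bitmap `bm`. */
static int bitmap_test(const unsigned long *bm, size_t pfn)
{
    return (int)((bm[pfn / BITS_PER_WORD] >> (pfn % BITS_PER_WORD)) & 1UL);
}

/* D = A | B: every page dirtied by the primary or the secondary.
 * This is what the primary has to send at a checkpoint. */
static void bitmap_union(unsigned long *d, const unsigned long *a,
                         const unsigned long *b, size_t nr_words)
{
    size_t i;

    assert(d && a && b);
    for (i = 0; i < nr_words; i++)
        d[i] = a[i] | b[i];
}

/* C = B & ~A: pages dirtied only by the secondary. The secondary cannot
 * compute this itself because it does not know A. */
static void bitmap_diff(unsigned long *c, const unsigned long *b,
                        const unsigned long *a, size_t nr_words)
{
    size_t i;

    assert(c && a && b);
    for (i = 0; i < nr_words; i++)
        c[i] = b[i] & ~a[i];
}
```

For example, with A = {1, 2} and B = {2, 5}, D covers pages 1, 2 and 5,
while C contains only page 5, which a normal migration would never resend.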

We transfer the dirty bitmap on the libxc side, so we need to introduce a
back channel to libxc.

Note: this differs from the paper. We changed the original design to
the current one because of the following concerns:
1. The original design needs extra memory on the Secondary host. When there
   are multiple backups on one host, the memory cost is high.
2. The memory cache code would be another 1k+, which would make the review
   more time consuming.

Note: the back channel will be used in the patch
 libxc/restore: send dirty pfn list to primary when checkpoint under COLO
to send the dirty pfn list from the secondary to the primary. That patch is
posted in another series.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxc/include/xenguest.h   |  4 ++--
 tools/libxc/xc_nomigrate.c       |  4 ++--
 tools/libxc/xc_sr_restore.c      |  2 +-
 tools/libxc/xc_sr_save.c         |  2 +-
 tools/libxl/libxl_save_callout.c | 39 ++++++++++++++++++++++++++-------------
 tools/libxl/libxl_save_helper.c  |  8 ++++++--
 6 files changed, 38 insertions(+), 21 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index affc42b..ff230a4 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -88,7 +88,7 @@ struct save_callbacks {
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
                    uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
                    struct save_callbacks* callbacks, int hvm,
-                   int checkpointed_stream);
+                   int checkpointed_stream, int back_fd);
 
 /* callbacks provided by xc_domain_restore */
 struct restore_callbacks {
@@ -127,7 +127,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
                       unsigned long *console_mfn, domid_t console_domid,
                       unsigned int hvm, unsigned int pae, int superpages,
                       int checkpointed_stream,
-                      struct restore_callbacks *callbacks);
+                      struct restore_callbacks *callbacks, int back_fd);
 
 /**
  * This function will create a domain for a paravirtualized Linux
diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c
index c9124df..089f767 100644
--- a/tools/libxc/xc_nomigrate.c
+++ b/tools/libxc/xc_nomigrate.c
@@ -23,7 +23,7 @@
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
                    uint32_t max_factor, uint32_t flags,
                    struct save_callbacks* callbacks, int hvm,
-                   int checkpointed_stream)
+                   int checkpointed_stream, int back_fd)
 {
     errno = ENOSYS;
     return -1;
@@ -35,7 +35,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
                       unsigned long *console_mfn, domid_t console_domid,
                       unsigned int hvm, unsigned int pae, int superpages,
                       int checkpointed_stream,
-                      struct restore_callbacks *callbacks)
+                      struct restore_callbacks *callbacks, int back_fd)
 {
     errno = ENOSYS;
     return -1;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index d4d33fd..b0f47b5 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -726,7 +726,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
                       unsigned long *console_gfn, domid_t console_domid,
                       unsigned int hvm, unsigned int pae, int superpages,
                       int checkpointed_stream,
-                      struct restore_callbacks *callbacks)
+                      struct restore_callbacks *callbacks, int back_fd)
 {
     struct xc_sr_context ctx =
         {
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 0bea97e..2cc5b45 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -831,7 +831,7 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
                    uint32_t max_iters, uint32_t max_factor, uint32_t flags,
                    struct save_callbacks* callbacks, int hvm,
-                   int checkpointed_stream)
+                   int checkpointed_stream, int back_fd)
 {
     struct xc_sr_context ctx =
         {
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 416b318..631e3e2 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -27,7 +27,7 @@
  */
 static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
                        const char *mode_arg,
-                       int stream_fd,
+                       int stream_fd, int back_fd,
                        const int *preserve_fds, int num_preserve_fds,
                        const unsigned long *argnums, int num_argnums);
 
@@ -50,6 +50,7 @@ void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
     /* Convenience aliases */
     const uint32_t domid = dcs->guest_domid;
     const int restore_fd = dcs->libxc_fd;
+    const int send_fd = dcs->send_fd;
     libxl__domain_build_state *const state = &dcs->build_state;
 
     unsigned cbflags =
@@ -71,7 +72,7 @@ void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
     shs->caller_state = dcs;
     shs->need_results = 1;
 
-    run_helper(egc, shs, "--restore-domain", restore_fd, 0, 0,
+    run_helper(egc, shs, "--restore-domain", restore_fd, send_fd, 0, 0,
                argnums, ARRAY_SIZE(argnums));
 }
 
@@ -95,7 +96,7 @@ void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_save_state *dss,
     shs->caller_state = dss;
     shs->need_results = 0;
 
-    run_helper(egc, shs, "--save-domain", dss->fd,
+    run_helper(egc, shs, "--save-domain", dss->fd, dss->recv_fd,
                NULL, 0,
                argnums, ARRAY_SIZE(argnums));
     return;
@@ -118,14 +119,29 @@ void libxl__save_helper_init(libxl__save_helper_state *shs)
 }
 
 /*----- helper execution -----*/
+static int dup_fd_helper(libxl__gc *gc, int fd, const char *what)
+{
+    int dup_fd = fd;
+
+    if (fd <= 2) {
+        dup_fd = dup(fd);
+        if (dup_fd < 0) {
+            LOGE(ERROR,"dup %s", what);
+            exit(-1);
+        }
+    }
+    libxl_fd_set_cloexec(CTX, dup_fd, 0);
+
+    return dup_fd;
+}
 
 static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
-                       const char *mode_arg, int stream_fd,
+                       const char *mode_arg, int stream_fd, int back_fd,
                        const int *preserve_fds, int num_preserve_fds,
                        const unsigned long *argnums, int num_argnums)
 {
     STATE_AO_GC(shs->ao);
-    const char *args[4 + num_argnums];
+    const char *args[5 + num_argnums];
     const char **arg = args;
     int i, rc;
 
@@ -153,6 +169,7 @@ static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
     *arg++ = getenv("LIBXL_SAVE_HELPER") ?: LIBEXEC_BIN "/" "libxl-save-helper";
     *arg++ = mode_arg;
     const char **stream_fd_arg = arg++;
+    const char **back_fd_arg = arg++;
     for (i=0; i<num_argnums; i++)
         *arg++ = GCSPRINTF("%lu", argnums[i]);
     *arg++ = 0;
@@ -177,16 +194,12 @@ static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
 
     pid_t pid = libxl__ev_child_fork(gc, &shs->child, helper_exited);
     if (!pid) {
-        if (stream_fd <= 2) {
-            stream_fd = dup(stream_fd);
-            if (stream_fd < 0) {
-                LOGE(ERROR,"dup migration stream fd");
-                exit(-1);
-            }
-        }
-        libxl_fd_set_cloexec(CTX, stream_fd, 0);
+        stream_fd = dup_fd_helper(gc, stream_fd, "migration stream fd");
         *stream_fd_arg = GCSPRINTF("%d", stream_fd);
 
+        back_fd = dup_fd_helper(gc, back_fd, "migration back channel fd");
+        *back_fd_arg = GCSPRINTF("%d", back_fd);
+
         for (i=0; i<num_preserve_fds; i++)
             if (preserve_fds[i] >= 0) {
                 assert(preserve_fds[i] > 2);
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index 6bdcf13..9bdcf41 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -238,6 +238,7 @@ static struct restore_callbacks helper_restore_callbacks;
 int main(int argc, char **argv)
 {
     int r;
+    int back_fd;
 
 #define NEXTARG (++argv, assert(*argv), *argv)
 
@@ -247,6 +248,7 @@ int main(int argc, char **argv)
     if (!strcmp(mode,"--save-domain")) {
 
         io_fd =                    atoi(NEXTARG);
+        back_fd =                  atoi(NEXTARG);
         uint32_t dom =             strtoul(NEXTARG,0,10);
         uint32_t max_iters =       strtoul(NEXTARG,0,10);
         uint32_t max_factor =      strtoul(NEXTARG,0,10);
@@ -262,12 +264,14 @@ int main(int argc, char **argv)
         setup_signals(save_signal_handler);
 
         r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags,
-                           &helper_save_callbacks, hvm, checkpointed_stream);
+                           &helper_save_callbacks, hvm, checkpointed_stream,
+                           back_fd);
         complete(r);
 
     } else if (!strcmp(mode,"--restore-domain")) {
 
         io_fd =                    atoi(NEXTARG);
+        back_fd =                  atoi(NEXTARG);
         uint32_t dom =             strtoul(NEXTARG,0,10);
         unsigned store_evtchn =    strtoul(NEXTARG,0,10);
         domid_t store_domid =      strtoul(NEXTARG,0,10);
@@ -292,7 +296,7 @@ int main(int argc, char **argv)
                               store_domid, console_evtchn, &console_mfn,
                               console_domid, hvm, pae, superpages,
                               checkpointed,
-                              &helper_restore_callbacks);
+                              &helper_restore_callbacks, back_fd);
         helper_stub_restore_results(store_mfn,console_mfn,0);
         complete(r);
 
-- 
2.5.0

* [PATCH v7 13/18] tools/libxl: rename remus device to checkpoint device
  2016-01-29  5:27 [PATCH v7 00/18] Prerequisite patches for COLO Wen Congyang
                   ` (11 preceding siblings ...)
  2016-01-29  5:27 ` [PATCH v7 12/18] tools/libx{l, c}: add back channel to libxc Wen Congyang
@ 2016-01-29  5:27 ` Wen Congyang
  2016-02-03 19:40   ` Wei Liu
  2016-01-29  5:27 ` [PATCH v7 14/18] tools/libxl: fix backword compatibility after the automatic renaming Wen Congyang
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Wen Congyang @ 2016-01-29  5:27 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Gui Jianfeng,
	Jiang Yunhong, Dong Eddie, Shriram Rajagopalan, Yang Hongyang

This patch is auto-generated by the following commands:
 1. git mv tools/libxl/libxl_remus_device.c tools/libxl/libxl_checkpoint_device.c
 2. perl -pi -e 's/libxl_remus_device/libxl_checkpoint_device/g' tools/libxl/Makefile
 3. perl -pi -e 's/\blibxl__remus_devices/libxl__checkpoint_devices/g' tools/libxl/*.[ch]
 4. perl -pi -e 's/\blibxl__remus_device\b/libxl__checkpoint_device/g' tools/libxl/*.[ch]
 5. perl -pi -e 's/\blibxl__remus_device_instance_ops\b/libxl__checkpoint_device_instance_ops/g' tools/libxl/*.[ch]
 6. perl -pi -e 's/\blibxl__remus_callback\b/libxl__checkpoint_callback/g' tools/libxl/*.[ch]
 7. perl -pi -e 's/\bremus_device_init\b/checkpoint_device_init/g' tools/libxl/*.[ch]
 8. perl -pi -e 's/\bremus_devices_setup\b/checkpoint_devices_setup/g' tools/libxl/*.[ch]
 9. perl -pi -e 's/\bdefine_remus_checkpoint_api\b/define_checkpoint_api/g' tools/libxl/*.[ch]
10. perl -pi -e 's/\brds\b/cds/g' tools/libxl/*.[ch]
11. perl -pi -e 's/REMUS_DEVICE/CHECKPOINT_DEVICE/g' tools/libxl/*.[ch] tools/libxl/*.idl
12. perl -pi -e 's/REMUS_DEVOPS/CHECKPOINT_DEVOPS/g' tools/libxl/*.[ch] tools/libxl/*.idl
13. perl -pi -e 's/\bremus\b/checkpoint/g' tools/libxl/libxl_checkpoint_device.[ch]
14. perl -pi -e 's/\bremus device/checkpoint device/g' tools/libxl/libxl_internal.h
15. perl -pi -e 's/\bRemus device/checkpoint device/g' tools/libxl/libxl_internal.h
16. perl -pi -e 's/\bremus abstract/checkpoint abstract/g' tools/libxl/libxl_internal.h
17. perl -pi -e 's/\bremus invocation/checkpoint invocation/g' tools/libxl/libxl_internal.h
18. perl -pi -e 's/\blibxl__remus_device_\(/libxl__checkpoint_device_(/g' tools/libxl/libxl_internal.h

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Reviewed-Lightly-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/libxl/Makefile                               |   2 +-
 ...xl_remus_device.c => libxl_checkpoint_device.c} | 198 ++++++++++-----------
 tools/libxl/libxl_internal.h                       | 112 ++++++------
 tools/libxl/libxl_netbuffer.c                      | 108 +++++------
 tools/libxl/libxl_nonetbuffer.c                    |  10 +-
 tools/libxl/libxl_remus.c                          |  76 ++++----
 tools/libxl/libxl_remus_disk_drbd.c                |  52 +++---
 tools/libxl/libxl_types.idl                        |   4 +-
 8 files changed, 281 insertions(+), 281 deletions(-)
 rename tools/libxl/{libxl_remus_device.c => libxl_checkpoint_device.c} (52%)

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 263ea0e..789a12e 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -64,7 +64,7 @@ else
 LIBXL_OBJS-y += libxl_no_convert_callout.o
 endif
 
-LIBXL_OBJS-y += libxl_remus.o libxl_remus_device.o libxl_remus_disk_drbd.o
+LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_checkpoint_device.c
similarity index 52%
rename from tools/libxl/libxl_remus_device.c
rename to tools/libxl/libxl_checkpoint_device.c
index a6cb7f6..109cd23 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_checkpoint_device.c
@@ -17,9 +17,9 @@
 
 #include "libxl_internal.h"
 
-extern const libxl__remus_device_instance_ops remus_device_nic;
-extern const libxl__remus_device_instance_ops remus_device_drbd_disk;
-static const libxl__remus_device_instance_ops *remus_ops[] = {
+extern const libxl__checkpoint_device_instance_ops remus_device_nic;
+extern const libxl__checkpoint_device_instance_ops remus_device_drbd_disk;
+static const libxl__checkpoint_device_instance_ops *remus_ops[] = {
     &remus_device_nic,
     &remus_device_drbd_disk,
     NULL,
@@ -27,18 +27,18 @@ static const libxl__remus_device_instance_ops *remus_ops[] = {
 
 /*----- helper functions -----*/
 
-static int init_device_subkind(libxl__remus_devices_state *rds)
+static int init_device_subkind(libxl__checkpoint_devices_state *cds)
 {
     /* init device subkind-specific state in the libxl ctx */
     int rc;
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
     if (libxl__netbuffer_enabled(gc)) {
-        rc = init_subkind_nic(rds);
+        rc = init_subkind_nic(cds);
         if (rc) goto out;
     }
 
-    rc = init_subkind_drbd_disk(rds);
+    rc = init_subkind_drbd_disk(cds);
     if (rc) goto out;
 
     rc = 0;
@@ -46,15 +46,15 @@ out:
     return rc;
 }
 
-static void cleanup_device_subkind(libxl__remus_devices_state *rds)
+static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
 {
     /* cleanup device subkind-specific state in the libxl ctx */
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
     if (libxl__netbuffer_enabled(gc))
-        cleanup_subkind_nic(rds);
+        cleanup_subkind_nic(cds);
 
-    cleanup_subkind_drbd_disk(rds);
+    cleanup_subkind_drbd_disk(cds);
 }
 
 /*----- setup() and teardown() -----*/
@@ -70,103 +70,103 @@ static void devices_teardown_cb(libxl__egc *egc,
                                 libxl__multidev *multidev,
                                 int rc);
 
-/* remus device setup and teardown */
+/* checkpoint device setup and teardown */
 
-static libxl__remus_device* remus_device_init(libxl__egc *egc,
-                                              libxl__remus_devices_state *rds,
+static libxl__checkpoint_device* checkpoint_device_init(libxl__egc *egc,
+                                              libxl__checkpoint_devices_state *cds,
                                               libxl__device_kind kind,
                                               void *libxl_dev)
 {
-    libxl__remus_device *dev = NULL;
+    libxl__checkpoint_device *dev = NULL;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
     GCNEW(dev);
     dev->backend_dev = libxl_dev;
     dev->kind = kind;
-    dev->rds = rds;
+    dev->cds = cds;
 
     return dev;
 }
 
-static void remus_devices_setup(libxl__egc *egc,
-                                libxl__remus_devices_state *rds);
+static void checkpoint_devices_setup(libxl__egc *egc,
+                                libxl__checkpoint_devices_state *cds);
 
-void libxl__remus_devices_setup(libxl__egc *egc, libxl__remus_devices_state *rds)
+void libxl__checkpoint_devices_setup(libxl__egc *egc, libxl__checkpoint_devices_state *cds)
 {
     int i, rc;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
-    rc = init_device_subkind(rds);
+    rc = init_device_subkind(cds);
     if (rc)
         goto out;
 
-    rds->num_devices = 0;
-    rds->num_nics = 0;
-    rds->num_disks = 0;
+    cds->num_devices = 0;
+    cds->num_nics = 0;
+    cds->num_disks = 0;
 
-    if (rds->device_kind_flags & (1 << LIBXL__DEVICE_KIND_VIF))
-        rds->nics = libxl_device_nic_list(CTX, rds->domid, &rds->num_nics);
+    if (cds->device_kind_flags & (1 << LIBXL__DEVICE_KIND_VIF))
+        cds->nics = libxl_device_nic_list(CTX, cds->domid, &cds->num_nics);
 
-    if (rds->device_kind_flags & (1 << LIBXL__DEVICE_KIND_VBD))
-        rds->disks = libxl_device_disk_list(CTX, rds->domid, &rds->num_disks);
+    if (cds->device_kind_flags & (1 << LIBXL__DEVICE_KIND_VBD))
+        cds->disks = libxl_device_disk_list(CTX, cds->domid, &cds->num_disks);
 
-    if (rds->num_nics == 0 && rds->num_disks == 0)
+    if (cds->num_nics == 0 && cds->num_disks == 0)
         goto out;
 
-    GCNEW_ARRAY(rds->devs, rds->num_nics + rds->num_disks);
+    GCNEW_ARRAY(cds->devs, cds->num_nics + cds->num_disks);
 
-    for (i = 0; i < rds->num_nics; i++) {
-        rds->devs[rds->num_devices++] = remus_device_init(egc, rds,
+    for (i = 0; i < cds->num_nics; i++) {
+        cds->devs[cds->num_devices++] = checkpoint_device_init(egc, cds,
                                                 LIBXL__DEVICE_KIND_VIF,
-                                                &rds->nics[i]);
+                                                &cds->nics[i]);
     }
 
-    for (i = 0; i < rds->num_disks; i++) {
-        rds->devs[rds->num_devices++] = remus_device_init(egc, rds,
+    for (i = 0; i < cds->num_disks; i++) {
+        cds->devs[cds->num_devices++] = checkpoint_device_init(egc, cds,
                                                 LIBXL__DEVICE_KIND_VBD,
-                                                &rds->disks[i]);
+                                                &cds->disks[i]);
     }
 
-    remus_devices_setup(egc, rds);
+    checkpoint_devices_setup(egc, cds);
 
     return;
 
 out:
-    rds->callback(egc, rds, rc);
+    cds->callback(egc, cds, rc);
 }
 
-static void remus_devices_setup(libxl__egc *egc,
-                                libxl__remus_devices_state *rds)
+static void checkpoint_devices_setup(libxl__egc *egc,
+                                libxl__checkpoint_devices_state *cds)
 {
     int i, rc;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
-    libxl__multidev_begin(ao, &rds->multidev);
-    rds->multidev.callback = all_devices_setup_cb;
-    for (i = 0; i < rds->num_devices; i++) {
-        libxl__remus_device *dev = rds->devs[i];
+    libxl__multidev_begin(ao, &cds->multidev);
+    cds->multidev.callback = all_devices_setup_cb;
+    for (i = 0; i < cds->num_devices; i++) {
+        libxl__checkpoint_device *dev = cds->devs[i];
         dev->ops_index = -1;
-        libxl__multidev_prepare_with_aodev(&rds->multidev, &dev->aodev);
+        libxl__multidev_prepare_with_aodev(&cds->multidev, &dev->aodev);
 
-        dev->aodev.rc = ERROR_REMUS_DEVICE_NOT_SUPPORTED;
+        dev->aodev.rc = ERROR_CHECKPOINT_DEVICE_NOT_SUPPORTED;
         dev->aodev.callback = device_setup_iterate;
         device_setup_iterate(egc,&dev->aodev);
     }
 
     rc = 0;
-    libxl__multidev_prepared(egc, &rds->multidev, rc);
+    libxl__multidev_prepared(egc, &cds->multidev, rc);
 }
 
 
 static void device_setup_iterate(libxl__egc *egc, libxl__ao_device *aodev)
 {
-    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
     EGC_GC;
 
-    if (aodev->rc != ERROR_REMUS_DEVICE_NOT_SUPPORTED &&
-        aodev->rc != ERROR_REMUS_DEVOPS_DOES_NOT_MATCH)
+    if (aodev->rc != ERROR_CHECKPOINT_DEVICE_NOT_SUPPORTED &&
+        aodev->rc != ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH)
         /* might be success or disaster */
         goto out;
 
@@ -186,16 +186,16 @@ static void device_setup_iterate(libxl__egc *egc, libxl__ao_device *aodev)
                 domid = disk->backend_domid;
                 devid = libxl__device_disk_dev_number(disk->vdev, NULL, NULL);
             } else {
-                LOG(ERROR,"device kind not handled by remus: %s",
+                LOG(ERROR,"device kind not handled by checkpoint: %s",
                     libxl__device_kind_to_string(dev->kind));
                 aodev->rc = ERROR_FAIL;
                 goto out;
             }
-            LOG(ERROR,"device not handled by remus"
+            LOG(ERROR,"device not handled by checkpoint"
                 " (device=%s:%"PRId32"/%"PRId32")",
                 libxl__device_kind_to_string(dev->kind),
                 domid, devid);
-            aodev->rc = ERROR_REMUS_DEVICE_NOT_SUPPORTED;
+            aodev->rc = ERROR_CHECKPOINT_DEVICE_NOT_SUPPORTED;
             goto out;
         }
     } while (dev->ops->kind != dev->kind);
@@ -216,32 +216,32 @@ static void all_devices_setup_cb(libxl__egc *egc,
     STATE_AO_GC(multidev->ao);
 
     /* Convenience aliases */
-    libxl__remus_devices_state *const rds =
-                            CONTAINER_OF(multidev, *rds, multidev);
+    libxl__checkpoint_devices_state *const cds =
+                            CONTAINER_OF(multidev, *cds, multidev);
 
-    rds->callback(egc, rds, rc);
+    cds->callback(egc, cds, rc);
 }
 
-void libxl__remus_devices_teardown(libxl__egc *egc,
-                                   libxl__remus_devices_state *rds)
+void libxl__checkpoint_devices_teardown(libxl__egc *egc,
+                                   libxl__checkpoint_devices_state *cds)
 {
     int i;
-    libxl__remus_device *dev;
+    libxl__checkpoint_device *dev;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
-    libxl__multidev_begin(ao, &rds->multidev);
-    rds->multidev.callback = devices_teardown_cb;
-    for (i = 0; i < rds->num_devices; i++) {
-        dev = rds->devs[i];
+    libxl__multidev_begin(ao, &cds->multidev);
+    cds->multidev.callback = devices_teardown_cb;
+    for (i = 0; i < cds->num_devices; i++) {
+        dev = cds->devs[i];
         if (!dev->ops || !dev->matched)
             continue;
 
-        libxl__multidev_prepare_with_aodev(&rds->multidev, &dev->aodev);
+        libxl__multidev_prepare_with_aodev(&cds->multidev, &dev->aodev);
         dev->ops->teardown(egc,dev);
     }
 
-    libxl__multidev_prepared(egc, &rds->multidev, 0);
+    libxl__multidev_prepared(egc, &cds->multidev, 0);
 }
 
 static void devices_teardown_cb(libxl__egc *egc,
@@ -253,26 +253,26 @@ static void devices_teardown_cb(libxl__egc *egc,
     STATE_AO_GC(multidev->ao);
 
     /* Convenience aliases */
-    libxl__remus_devices_state *const rds =
-                            CONTAINER_OF(multidev, *rds, multidev);
+    libxl__checkpoint_devices_state *const cds =
+                            CONTAINER_OF(multidev, *cds, multidev);
 
     /* clean nic */
-    for (i = 0; i < rds->num_nics; i++)
-        libxl_device_nic_dispose(&rds->nics[i]);
-    free(rds->nics);
-    rds->nics = NULL;
-    rds->num_nics = 0;
+    for (i = 0; i < cds->num_nics; i++)
+        libxl_device_nic_dispose(&cds->nics[i]);
+    free(cds->nics);
+    cds->nics = NULL;
+    cds->num_nics = 0;
 
     /* clean disk */
-    for (i = 0; i < rds->num_disks; i++)
-        libxl_device_disk_dispose(&rds->disks[i]);
-    free(rds->disks);
-    rds->disks = NULL;
-    rds->num_disks = 0;
+    for (i = 0; i < cds->num_disks; i++)
+        libxl_device_disk_dispose(&cds->disks[i]);
+    free(cds->disks);
+    cds->disks = NULL;
+    cds->num_disks = 0;
 
-    cleanup_device_subkind(rds);
+    cleanup_device_subkind(cds);
 
-    rds->callback(egc, rds, rc);
+    cds->callback(egc, cds, rc);
 }
 
 /*----- checkpointing APIs -----*/
@@ -285,33 +285,33 @@ static void devices_checkpoint_cb(libxl__egc *egc,
 
 /* API implementations */
 
-#define define_remus_checkpoint_api(api)                                \
-void libxl__remus_devices_##api(libxl__egc *egc,                        \
-                                libxl__remus_devices_state *rds)        \
+#define define_checkpoint_api(api)                                      \
+void libxl__checkpoint_devices_##api(libxl__egc *egc,                   \
+                                libxl__checkpoint_devices_state *cds)   \
 {                                                                       \
     int i;                                                              \
-    libxl__remus_device *dev;                                           \
+    libxl__checkpoint_device *dev;                                      \
                                                                         \
-    STATE_AO_GC(rds->ao);                                               \
+    STATE_AO_GC(cds->ao);                                               \
                                                                         \
-    libxl__multidev_begin(ao, &rds->multidev);                          \
-    rds->multidev.callback = devices_checkpoint_cb;                     \
-    for (i = 0; i < rds->num_devices; i++) {                            \
-        dev = rds->devs[i];                                             \
+    libxl__multidev_begin(ao, &cds->multidev);                          \
+    cds->multidev.callback = devices_checkpoint_cb;                     \
+    for (i = 0; i < cds->num_devices; i++) {                            \
+        dev = cds->devs[i];                                             \
         if (!dev->matched || !dev->ops->api)                            \
             continue;                                                   \
-        libxl__multidev_prepare_with_aodev(&rds->multidev, &dev->aodev);\
+        libxl__multidev_prepare_with_aodev(&cds->multidev, &dev->aodev);\
         dev->ops->api(egc,dev);                                         \
     }                                                                   \
                                                                         \
-    libxl__multidev_prepared(egc, &rds->multidev, 0);                   \
+    libxl__multidev_prepared(egc, &cds->multidev, 0);                   \
 }
 
-define_remus_checkpoint_api(postsuspend);
+define_checkpoint_api(postsuspend);
 
-define_remus_checkpoint_api(preresume);
+define_checkpoint_api(preresume);
 
-define_remus_checkpoint_api(commit);
+define_checkpoint_api(commit);
 
 static void devices_checkpoint_cb(libxl__egc *egc,
                                   libxl__multidev *multidev,
@@ -320,8 +320,8 @@ static void devices_checkpoint_cb(libxl__egc *egc,
     STATE_AO_GC(multidev->ao);
 
     /* Convenience aliases */
-    libxl__remus_devices_state *const rds =
-                            CONTAINER_OF(multidev, *rds, multidev);
+    libxl__checkpoint_devices_state *const cds =
+                            CONTAINER_OF(multidev, *cds, multidev);
 
-    rds->callback(egc, rds, rc);
+    cds->callback(egc, cds, rc);
 }
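The renamed define_checkpoint_api() macro above stamps out one dispatcher per
checkpoint hook. A minimal, self-contained sketch of that pattern follows. It
is not the libxl code: the struct names, the define_dispatch() macro and the
count_call() helper are hypothetical stand-ins, the hooks run synchronously,
and the egc/multidev machinery is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the libxl types; not the real definitions. */
struct device;

struct device_ops {
    /* Any hook may be NULL if the device kind does not need it. */
    void (*postsuspend)(struct device *dev);
    void (*preresume)(struct device *dev);
    void (*commit)(struct device *dev);
};

struct device {
    bool matched;                  /* ops were matched during setup */
    const struct device_ops *ops;  /* may be NULL when not matched */
    int calls;                     /* number of hooks run on this device */
};

struct devices_state {
    int num_devices;
    struct device **devs;
};

/*
 * Generate one dispatcher per hook, in the spirit of
 * define_checkpoint_api(). Each dispatcher skips devices that are
 * unmatched or that do not implement the hook, and returns the number
 * of devices the hook was invoked on.
 */
#define define_dispatch(api)                                            \
int devices_##api(struct devices_state *cds)                            \
{                                                                       \
    int i, n = 0;                                                       \
                                                                        \
    assert(cds != NULL);                                                \
    for (i = 0; i < cds->num_devices; i++) {                            \
        struct device *dev = cds->devs[i];                              \
        if (!dev->matched || !dev->ops->api)                            \
            continue;                                                   \
        dev->ops->api(dev);                                             \
        n++;                                                            \
    }                                                                   \
    return n;                                                           \
}

define_dispatch(postsuspend)
define_dispatch(preresume)
define_dispatch(commit)

/* Hypothetical hook used only to observe that a dispatcher ran. */
static void count_call(struct device *dev) { dev->calls++; }
```

Checking dev->matched before dev->ops->api matters: an unmatched device may
have no ops at all, and the short-circuit keeps the dispatcher from
dereferencing a NULL ops pointer.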
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 8e9e57d..0380408 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2796,9 +2796,9 @@ typedef struct libxl__save_helper_state {
                       * marshalling and xc callback functions */
 } libxl__save_helper_state;
 
-/*----- remus device related state structure -----*/
+/*----- checkpoint device related state structure -----*/
 /*
- * The abstract Remus device layer exposes a common
+ * The abstract checkpoint device layer exposes a common
  * set of API to [external] libxl for manipulating devices attached to
  * a guest protected by Remus. The device layer also exposes a set of
  * [internal] interfaces that every device type must implement.
@@ -2806,34 +2806,34 @@ typedef struct libxl__save_helper_state {
  * The following API are exposed to libxl:
  *
  * One-time configuration operations:
- *  +libxl__remus_devices_setup
+ *  +libxl__checkpoint_devices_setup
  *    > Enable output buffering for NICs, setup disk replication, etc.
- *  +libxl__remus_devices_teardown
+ *  +libxl__checkpoint_devices_teardown
  *    > Disable output buffering and disk replication; teardown any
  *       associated external setups like qdiscs for NICs.
  *
  * Operations executed every checkpoint (in order of invocation):
- *  +libxl__remus_devices_postsuspend
- *  +libxl__remus_devices_preresume
- *  +libxl__remus_devices_commit
+ *  +libxl__checkpoint_devices_postsuspend
+ *  +libxl__checkpoint_devices_preresume
+ *  +libxl__checkpoint_devices_commit
  *
  * Each device type needs to implement the interfaces specified in
- * the libxl__remus_device_instance_ops if it wishes to support Remus.
+ * the libxl__checkpoint_device_instance_ops if it wishes to support Remus.
  *
- * The high-level control flow through the Remus device layer is shown below:
+ * The high-level control flow through the checkpoint device layer is shown below:
  *
  * xl remus
  *  |->  libxl_domain_remus_start
- *    |-> libxl__remus_devices_setup
- *      |-> Per-checkpoint libxl__remus_devices_[postsuspend,preresume,commit]
+ *    |-> libxl__checkpoint_devices_setup
+ *      |-> Per-checkpoint libxl__checkpoint_devices_[postsuspend,preresume,commit]
  *        ...
  *        |-> On backup failure, network error or other internal errors:
- *            libxl__remus_devices_teardown
+ *            libxl__checkpoint_devices_teardown
  */
 
-typedef struct libxl__remus_device libxl__remus_device;
-typedef struct libxl__remus_devices_state libxl__remus_devices_state;
-typedef struct libxl__remus_device_instance_ops libxl__remus_device_instance_ops;
+typedef struct libxl__checkpoint_device libxl__checkpoint_device;
+typedef struct libxl__checkpoint_devices_state libxl__checkpoint_devices_state;
+typedef struct libxl__checkpoint_device_instance_ops libxl__checkpoint_device_instance_ops;
 
 /*
  * Interfaces to be implemented by every device subkind that wishes to
@@ -2843,7 +2843,7 @@ typedef struct libxl__remus_device_instance_ops libxl__remus_device_instance_ops
  * synchronous and call dev->aodev.callback directly (as the last
  * thing they do).
  */
-struct libxl__remus_device_instance_ops {
+struct libxl__checkpoint_device_instance_ops {
     /* the device kind this ops belongs to... */
     libxl__device_kind kind;
 
@@ -2854,12 +2854,12 @@ struct libxl__remus_device_instance_ops {
      * Asynchronous.
      */
 
-    void (*postsuspend)(libxl__egc *egc, libxl__remus_device *dev);
-    void (*preresume)(libxl__egc *egc, libxl__remus_device *dev);
-    void (*commit)(libxl__egc *egc, libxl__remus_device *dev);
+    void (*postsuspend)(libxl__egc *egc, libxl__checkpoint_device *dev);
+    void (*preresume)(libxl__egc *egc, libxl__checkpoint_device *dev);
+    void (*commit)(libxl__egc *egc, libxl__checkpoint_device *dev);
 
     /*
-     * setup() and teardown() are refer to the actual remus device.
+     * setup() and teardown() refer to the actual checkpoint device.
      * Asynchronous.
      * teardown is called even if setup fails.
      */
@@ -2868,45 +2868,45 @@ struct libxl__remus_device_instance_ops {
      * device. If matched, the device will then be managed with this set of
      * subkind operations.
      * Yields 0 if the device successfully set up.
-     * REMUS_DEVOPS_DOES_NOT_MATCH if the ops does not match the device.
+     * ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH if the ops do not match the device.
      * any other rc indicates failure.
      */
-    void (*setup)(libxl__egc *egc, libxl__remus_device *dev);
-    void (*teardown)(libxl__egc *egc, libxl__remus_device *dev);
+    void (*setup)(libxl__egc *egc, libxl__checkpoint_device *dev);
+    void (*teardown)(libxl__egc *egc, libxl__checkpoint_device *dev);
 };
 
-int init_subkind_nic(libxl__remus_devices_state *rds);
-void cleanup_subkind_nic(libxl__remus_devices_state *rds);
-int init_subkind_drbd_disk(libxl__remus_devices_state *rds);
-void cleanup_subkind_drbd_disk(libxl__remus_devices_state *rds);
+int init_subkind_nic(libxl__checkpoint_devices_state *cds);
+void cleanup_subkind_nic(libxl__checkpoint_devices_state *cds);
+int init_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
+void cleanup_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
 
-typedef void libxl__remus_callback(libxl__egc *,
-                                   libxl__remus_devices_state *, int rc);
+typedef void libxl__checkpoint_callback(libxl__egc *,
+                                   libxl__checkpoint_devices_state *, int rc);
 
 /*
- * State associated with a remus invocation, including parameters
- * passed to the remus abstract device layer by the remus
+ * State associated with a checkpoint invocation, including parameters
+ * passed to the checkpoint abstract device layer by the remus
  * save/restore machinery.
  */
-struct libxl__remus_devices_state {
-    /*---- must be set by caller of libxl__remus_device_(setup|teardown) ----*/
+struct libxl__checkpoint_devices_state {
+    /*---- must be set by caller of libxl__checkpoint_devices_(setup|teardown) ----*/
 
     libxl__ao *ao;
     uint32_t domid;
-    libxl__remus_callback *callback;
+    libxl__checkpoint_callback *callback;
     int device_kind_flags;
 
     /*----- private for abstract layer only -----*/
 
     int num_devices;
     /*
-     * this array is allocated before setup the remus devices by the
-     * remus abstract layer.
-     * devs may be NULL, means there's no remus devices that has been set up.
+     * this array is allocated by the checkpoint abstract layer before
+     * the checkpoint devices are set up.
+     * devs may be NULL, which means no checkpoint devices have been set up.
      * the size of this array is 'num_devices', which is the total number
      * of libxl nic devices and disk devices(num_nics + num_disks).
      */
-    libxl__remus_device **devs;
+    libxl__checkpoint_device **devs;
 
     libxl_device_nic *nics;
     int num_nics;
@@ -2928,20 +2928,20 @@ struct libxl__remus_devices_state {
 
 /*
  * Information about a single device being handled by remus.
- * Allocated by the remus abstract layer.
+ * Allocated by the checkpoint abstract layer.
  */
-struct libxl__remus_device {
+struct libxl__checkpoint_device {
     /*----- shared between abstract and concrete layers -----*/
     /*
      * if this is true, that means the subkind ops match the device
      */
     bool matched;
 
-    /*----- set by remus device abstruct layer -----*/
-    /* libxl__device_* which this remus device related to */
+    /*----- set by checkpoint device abstract layer -----*/
+    /* libxl__device_* which this checkpoint device is related to */
     const void *backend_dev;
     libxl__device_kind kind;
-    libxl__remus_devices_state *rds;
+    libxl__checkpoint_devices_state *cds;
     libxl__ao_device aodev;
 
     /*----- private for abstract layer only -----*/
@@ -2952,7 +2952,7 @@ struct libxl__remus_device {
      * individual devices.
      */
     int ops_index;
-    const libxl__remus_device_instance_ops *ops;
+    const libxl__checkpoint_device_instance_ops *ops;
 
     /*----- private for concrete (device-specific) layer -----*/
 
@@ -2960,17 +2960,17 @@ struct libxl__remus_device {
     void *concrete_data;
 };
 
-/* the following 5 APIs are async ops, call rds->callback when done */
-_hidden void libxl__remus_devices_setup(libxl__egc *egc,
-                                        libxl__remus_devices_state *rds);
-_hidden void libxl__remus_devices_teardown(libxl__egc *egc,
-                                           libxl__remus_devices_state *rds);
-_hidden void libxl__remus_devices_postsuspend(libxl__egc *egc,
-                                              libxl__remus_devices_state *rds);
-_hidden void libxl__remus_devices_preresume(libxl__egc *egc,
-                                            libxl__remus_devices_state *rds);
-_hidden void libxl__remus_devices_commit(libxl__egc *egc,
-                                         libxl__remus_devices_state *rds);
+/* the following 5 APIs are async ops, call cds->callback when done */
+_hidden void libxl__checkpoint_devices_setup(libxl__egc *egc,
+                                        libxl__checkpoint_devices_state *cds);
+_hidden void libxl__checkpoint_devices_teardown(libxl__egc *egc,
+                                           libxl__checkpoint_devices_state *cds);
+_hidden void libxl__checkpoint_devices_postsuspend(libxl__egc *egc,
+                                              libxl__checkpoint_devices_state *cds);
+_hidden void libxl__checkpoint_devices_preresume(libxl__egc *egc,
+                                            libxl__checkpoint_devices_state *cds);
+_hidden void libxl__checkpoint_devices_commit(libxl__egc *egc,
+                                         libxl__checkpoint_devices_state *cds);
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
 /*----- Legacy conversion helper -----*/
@@ -3130,7 +3130,7 @@ struct libxl__domain_save_state {
     int hvm;
     int xcflags;
     libxl__domain_suspend_state dsps;
-    libxl__remus_devices_state rds;
+    libxl__checkpoint_devices_state cds;
     libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
     int interval; /* checkpoint interval (for Remus) */
     libxl__stream_write_state sws;
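The devices state is embedded by value in libxl__domain_save_state, and code
such as init_subkind_nic() recovers the enclosing structure with
CONTAINER_OF(cds, *dss, cds). The last argument there names the member, which
is why renaming the field from rds to cds forces every such call site to
change. A minimal sketch of the idea follows, assuming a simplified
offsetof-based macro. SKETCH_CONTAINER_OF, save_state_of() and both structs
are hypothetical stand-ins; libxl's real CONTAINER_OF is defined differently
and takes different arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified container-of; libxl's real CONTAINER_OF macro differs. */
#define SKETCH_CONTAINER_OF(inner_ptr, outer_type, member) \
    ((outer_type *)((char *)(inner_ptr) - offsetof(outer_type, member)))

/* Stand-in for libxl__checkpoint_devices_state. */
struct devices_state {
    uint32_t domid;
};

/* Stand-in for libxl__domain_save_state. */
struct save_state {
    int interval;
    struct devices_state cds;   /* embedded by value, not a pointer */
};

/* Recover the enclosing save state from a pointer to its embedded cds. */
struct save_state *save_state_of(struct devices_state *cds)
{
    assert(cds != NULL);
    return SKETCH_CONTAINER_OF(cds, struct save_state, cds);
}
```

This only works when the pointer really does point at the member embedded in
the outer structure. A devices state allocated on its own has no enclosing
save state to recover.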
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index c245a4e..33c2a42 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -38,21 +38,21 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
     return 1;
 }
 
-int init_subkind_nic(libxl__remus_devices_state *rds)
+int init_subkind_nic(libxl__checkpoint_devices_state *cds)
 {
     int rc, ret;
-    libxl__domain_save_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
-    rds->nlsock = nl_socket_alloc();
-    if (!rds->nlsock) {
+    cds->nlsock = nl_socket_alloc();
+    if (!cds->nlsock) {
         LOG(ERROR, "cannot allocate nl socket");
         rc = ERROR_FAIL;
         goto out;
     }
 
-    ret = nl_connect(rds->nlsock, NETLINK_ROUTE);
+    ret = nl_connect(cds->nlsock, NETLINK_ROUTE);
     if (ret) {
         LOG(ERROR, "failed to open netlink socket: %s",
             nl_geterror(ret));
@@ -61,7 +61,7 @@ int init_subkind_nic(libxl__remus_devices_state *rds)
     }
 
     /* get list of all qdiscs installed on network devs. */
-    ret = rtnl_qdisc_alloc_cache(rds->nlsock, &rds->qdisc_cache);
+    ret = rtnl_qdisc_alloc_cache(cds->nlsock, &cds->qdisc_cache);
     if (ret) {
         LOG(ERROR, "failed to allocate qdisc cache: %s",
             nl_geterror(ret));
@@ -70,9 +70,9 @@ int init_subkind_nic(libxl__remus_devices_state *rds)
     }
 
     if (dss->remus->netbufscript) {
-        rds->netbufscript = libxl__strdup(gc, dss->remus->netbufscript);
+        cds->netbufscript = libxl__strdup(gc, dss->remus->netbufscript);
     } else {
-        rds->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
+        cds->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
                                       libxl__xen_script_dir_path());
     }
 
@@ -82,22 +82,22 @@ out:
     return rc;
 }
 
-void cleanup_subkind_nic(libxl__remus_devices_state *rds)
+void cleanup_subkind_nic(libxl__checkpoint_devices_state *cds)
 {
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
     /* free qdisc cache */
-    if (rds->qdisc_cache) {
-        nl_cache_clear(rds->qdisc_cache);
-        nl_cache_free(rds->qdisc_cache);
-        rds->qdisc_cache = NULL;
+    if (cds->qdisc_cache) {
+        nl_cache_clear(cds->qdisc_cache);
+        nl_cache_free(cds->qdisc_cache);
+        cds->qdisc_cache = NULL;
     }
 
     /* close & free nlsock */
-    if (rds->nlsock) {
-        nl_close(rds->nlsock);
-        nl_socket_free(rds->nlsock);
-        rds->nlsock = NULL;
+    if (cds->nlsock) {
+        nl_close(cds->nlsock);
+        nl_socket_free(cds->nlsock);
+        cds->nlsock = NULL;
     }
 }
 
@@ -111,17 +111,17 @@ void cleanup_subkind_nic(libxl__remus_devices_state *rds)
  * it must ONLY be used for remus because if driver domains
  * were in use it would constitute a security vulnerability.
  */
-static const char *get_vifname(libxl__remus_device *dev,
+static const char *get_vifname(libxl__checkpoint_device *dev,
                                const libxl_device_nic *nic)
 {
     const char *vifname = NULL;
     const char *path;
     int rc;
 
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     /* Convenience aliases */
-    const uint32_t domid = dev->rds->domid;
+    const uint32_t domid = dev->cds->domid;
 
     path = GCSPRINTF("%s/backend/vif/%d/%d/vifname",
                      libxl__xs_get_dompath(gc, 0), domid, nic->devid);
@@ -144,19 +144,19 @@ static void free_qdisc(libxl__remus_device_nic *remus_nic)
     remus_nic->qdisc = NULL;
 }
 
-static int init_qdisc(libxl__remus_devices_state *rds,
+static int init_qdisc(libxl__checkpoint_devices_state *cds,
                       libxl__remus_device_nic *remus_nic)
 {
     int rc, ret, ifindex;
     struct rtnl_link *ifb = NULL;
     struct rtnl_qdisc *qdisc = NULL;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
     /* Now that we have brought up REMUS_IFB device with plug qdisc for
      * this vif, so we need to refill the qdisc cache.
      */
-    ret = nl_cache_refill(rds->nlsock, rds->qdisc_cache);
+    ret = nl_cache_refill(cds->nlsock, cds->qdisc_cache);
     if (ret) {
         LOG(ERROR, "cannot refill qdisc cache: %s", nl_geterror(ret));
         rc = ERROR_FAIL;
@@ -164,7 +164,7 @@ static int init_qdisc(libxl__remus_devices_state *rds,
     }
 
     /* get a handle to the REMUS_IFB interface */
-    ret = rtnl_link_get_kernel(rds->nlsock, 0, remus_nic->ifb, &ifb);
+    ret = rtnl_link_get_kernel(cds->nlsock, 0, remus_nic->ifb, &ifb);
     if (ret) {
         LOG(ERROR, "cannot obtain handle for %s: %s", remus_nic->ifb,
             nl_geterror(ret));
@@ -187,7 +187,7 @@ static int init_qdisc(libxl__remus_devices_state *rds,
      * There is no need to explicitly free this qdisc as its just a
      * reference from the qdisc cache we allocated earlier.
      */
-    qdisc = rtnl_qdisc_get_by_parent(rds->qdisc_cache, ifindex, TC_H_ROOT);
+    qdisc = rtnl_qdisc_get_by_parent(cds->qdisc_cache, ifindex, TC_H_ROOT);
     if (qdisc) {
         const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
         /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
@@ -231,19 +231,19 @@ static void netbuf_teardown_script_cb(libxl__egc *egc,
  * $REMUS_IFB (for teardown)
  * setup/teardown as command line arg.
  */
-static void setup_async_exec(libxl__remus_device *dev, char *op)
+static void setup_async_exec(libxl__checkpoint_device *dev, char *op)
 {
     int arraysize, nr = 0;
     char **env = NULL, **args = NULL;
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
-    libxl__remus_devices_state *rds = dev->rds;
+    libxl__checkpoint_devices_state *cds = dev->cds;
     libxl__async_exec_state *aes = &dev->aodev.aes;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
     /* Convenience aliases */
-    char *const script = libxl__strdup(gc, rds->netbufscript);
-    const uint32_t domid = rds->domid;
+    char *const script = libxl__strdup(gc, cds->netbufscript);
+    const uint32_t domid = cds->domid;
     const int dev_id = remus_nic->devid;
     const char *const vif = remus_nic->vif;
     const char *const ifb = remus_nic->ifb;
@@ -269,7 +269,7 @@ static void setup_async_exec(libxl__remus_device *dev, char *op)
     args[nr++] = NULL;
     assert(nr == arraysize);
 
-    aes->ao = dev->rds->ao;
+    aes->ao = dev->cds->ao;
     aes->what = GCSPRINTF("%s %s", args[0], args[1]);
     aes->env = env;
     aes->args = args;
@@ -286,13 +286,13 @@ static void setup_async_exec(libxl__remus_device *dev, char *op)
 
 /* setup() and teardown() */
 
-static void nic_setup(libxl__egc *egc, libxl__remus_device *dev)
+static void nic_setup(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
     int rc;
     libxl__remus_device_nic *remus_nic;
     const libxl_device_nic *nic = dev->backend_dev;
 
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     /*
      * thers's no subkind of nic devices, so nic ops is always matched
@@ -330,15 +330,15 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
                                    int rc, int status)
 {
     libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
-    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
-    libxl__remus_devices_state *rds = dev->rds;
+    libxl__checkpoint_devices_state *cds = dev->cds;
     const char *out_path_base, *hotplug_error = NULL;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
     /* Convenience aliases */
-    const uint32_t domid = rds->domid;
+    const uint32_t domid = cds->domid;
     const int devid = remus_nic->devid;
     const char *const vif = remus_nic->vif;
     const char **const ifb = &remus_nic->ifb;
@@ -377,7 +377,7 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
 
     if (hotplug_error) {
         LOG(ERROR, "netbuf script %s setup failed for vif %s: %s",
-            rds->netbufscript, vif, hotplug_error);
+            cds->netbufscript, vif, hotplug_error);
         rc = ERROR_FAIL;
         goto out;
     }
@@ -388,17 +388,17 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
     }
 
     LOG(DEBUG, "%s will buffer packets from vif %s", *ifb, vif);
-    rc = init_qdisc(rds, remus_nic);
+    rc = init_qdisc(cds, remus_nic);
 
 out:
     aodev->rc = rc;
     aodev->callback(egc, aodev);
 }
 
-static void nic_teardown(libxl__egc *egc, libxl__remus_device *dev)
+static void nic_teardown(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
     int rc;
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     setup_async_exec(dev, "teardown");
 
@@ -418,7 +418,7 @@ static void netbuf_teardown_script_cb(libxl__egc *egc,
                                       int rc, int status)
 {
     libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
-    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
 
     if (status && !rc)
@@ -441,12 +441,12 @@ enum {
 /* API implementations */
 
 static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
-                           libxl__remus_devices_state *rds,
+                           libxl__checkpoint_devices_state *cds,
                            int buffer_op)
 {
     int rc, ret;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
     if (buffer_op == tc_buffer_start)
         ret = rtnl_qdisc_plug_buffer(remus_nic->qdisc);
@@ -458,7 +458,7 @@ static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
         goto out;
     }
 
-    ret = rtnl_qdisc_add(rds->nlsock, remus_nic->qdisc, NLM_F_REQUEST);
+    ret = rtnl_qdisc_add(cds->nlsock, remus_nic->qdisc, NLM_F_REQUEST);
     if (ret) {
         rc = ERROR_FAIL;
         goto out;
@@ -475,33 +475,33 @@ out:
     return rc;
 }
 
-static void nic_postsuspend(libxl__egc *egc, libxl__remus_device *dev)
+static void nic_postsuspend(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
     int rc;
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
 
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
-    rc = remus_netbuf_op(remus_nic, dev->rds, tc_buffer_start);
+    rc = remus_netbuf_op(remus_nic, dev->cds, tc_buffer_start);
 
     dev->aodev.rc = rc;
     dev->aodev.callback(egc, &dev->aodev);
 }
 
-static void nic_commit(libxl__egc *egc, libxl__remus_device *dev)
+static void nic_commit(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
     int rc;
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
 
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
-    rc = remus_netbuf_op(remus_nic, dev->rds, tc_buffer_release);
+    rc = remus_netbuf_op(remus_nic, dev->cds, tc_buffer_release);
 
     dev->aodev.rc = rc;
     dev->aodev.callback(egc, &dev->aodev);
 }
 
-const libxl__remus_device_instance_ops remus_device_nic = {
+const libxl__checkpoint_device_instance_ops remus_device_nic = {
     .kind = LIBXL__DEVICE_KIND_VIF,
     .setup = nic_setup,
     .teardown = nic_teardown,
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
index 3c659c2..4b68152 100644
--- a/tools/libxl/libxl_nonetbuffer.c
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -22,25 +22,25 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
     return 0;
 }
 
-int init_subkind_nic(libxl__remus_devices_state *rds)
+int init_subkind_nic(libxl__checkpoint_devices_state *cds)
 {
     return 0;
 }
 
-void cleanup_subkind_nic(libxl__remus_devices_state *rds)
+void cleanup_subkind_nic(libxl__checkpoint_devices_state *cds)
 {
     return;
 }
 
-static void nic_setup(libxl__egc *egc, libxl__remus_device *dev)
+static void nic_setup(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     dev->aodev.rc = ERROR_FAIL;
     dev->aodev.callback(egc, &dev->aodev);
 }
 
-const libxl__remus_device_instance_ops remus_device_nic = {
+const libxl__checkpoint_device_instance_ops remus_device_nic = {
     .kind = LIBXL__DEVICE_KIND_VIF,
     .setup = nic_setup,
 };
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index fae2120..d088dad 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -21,9 +21,9 @@
 /*-------------------- Remus setup and teardown ---------------------*/
 
 static void remus_setup_done(libxl__egc *egc,
-                             libxl__remus_devices_state *rds, int rc);
+                             libxl__checkpoint_devices_state *cds, int rc);
 static void remus_setup_failed(libxl__egc *egc,
-                               libxl__remus_devices_state *rds, int rc);
+                               libxl__checkpoint_devices_state *cds, int rc);
 static void remus_checkpoint_stream_written(
     libxl__egc *egc, libxl__stream_write_state *sws, int rc);
 
@@ -31,7 +31,7 @@ void libxl__remus_setup(libxl__egc *egc,
                         libxl__domain_save_state *dss)
 {
     /* Convenience aliases */
-    libxl__remus_devices_state *const rds = &dss->rds;
+    libxl__checkpoint_devices_state *const cds = &dss->cds;
     const libxl_domain_remus_info *const info = dss->remus;
 
     STATE_AO_GC(dss->ao);
@@ -41,19 +41,19 @@ void libxl__remus_setup(libxl__egc *egc,
             LOG(ERROR, "Remus: No support for network buffering");
             goto out;
         }
-        rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
+        cds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
     }
 
     if (libxl_defbool_val(info->diskbuf))
-        rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VBD);
+        cds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VBD);
 
-    rds->ao = ao;
-    rds->domid = dss->domid;
-    rds->callback = remus_setup_done;
+    cds->ao = ao;
+    cds->domid = dss->domid;
+    cds->callback = remus_setup_done;
 
     dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
 
-    libxl__remus_devices_setup(egc, rds);
+    libxl__checkpoint_devices_setup(egc, cds);
     return;
 
 out:
@@ -61,9 +61,9 @@ out:
 }
 
 static void remus_setup_done(libxl__egc *egc,
-                             libxl__remus_devices_state *rds, int rc)
+                             libxl__checkpoint_devices_state *cds, int rc)
 {
-    libxl__domain_save_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
     STATE_AO_GC(dss->ao);
 
     if (!rc) {
@@ -73,14 +73,14 @@ static void remus_setup_done(libxl__egc *egc,
 
     LOG(ERROR, "Remus: failed to setup device for guest with domid %u, rc %d",
         dss->domid, rc);
-    rds->callback = remus_setup_failed;
-    libxl__remus_devices_teardown(egc, rds);
+    cds->callback = remus_setup_failed;
+    libxl__checkpoint_devices_teardown(egc, cds);
 }
 
 static void remus_setup_failed(libxl__egc *egc,
-                               libxl__remus_devices_state *rds, int rc)
+                               libxl__checkpoint_devices_state *cds, int rc)
 {
-    libxl__domain_save_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
     STATE_AO_GC(dss->ao);
 
     if (rc)
@@ -91,7 +91,7 @@ static void remus_setup_failed(libxl__egc *egc,
 }
 
 static void remus_teardown_done(libxl__egc *egc,
-                                libxl__remus_devices_state *rds,
+                                libxl__checkpoint_devices_state *cds,
                                 int rc);
 void libxl__remus_teardown(libxl__egc *egc,
                            libxl__domain_save_state *dss,
@@ -101,15 +101,15 @@ void libxl__remus_teardown(libxl__egc *egc,
 
     LOG(WARN, "Remus: Domain suspend terminated with rc %d,"
         " teardown Remus devices...", rc);
-    dss->rds.callback = remus_teardown_done;
-    libxl__remus_devices_teardown(egc, &dss->rds);
+    dss->cds.callback = remus_teardown_done;
+    libxl__checkpoint_devices_teardown(egc, &dss->cds);
 }
 
 static void remus_teardown_done(libxl__egc *egc,
-                                libxl__remus_devices_state *rds,
+                                libxl__checkpoint_devices_state *cds,
                                 int rc)
 {
-    libxl__domain_save_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
     STATE_AO_GC(dss->ao);
 
     if (rc)
@@ -124,10 +124,10 @@ static void remus_teardown_done(libxl__egc *egc,
 static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
                                 libxl__domain_suspend_state *dsps, int ok);
 static void remus_devices_postsuspend_cb(libxl__egc *egc,
-                                         libxl__remus_devices_state *rds,
+                                         libxl__checkpoint_devices_state *cds,
                                          int rc);
 static void remus_devices_preresume_cb(libxl__egc *egc,
-                                       libxl__remus_devices_state *rds,
+                                       libxl__checkpoint_devices_state *cds,
                                        int rc);
 
 void libxl__remus_domain_suspend_callback(void *data)
@@ -149,9 +149,9 @@ static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
     if (rc)
         goto out;
 
-    libxl__remus_devices_state *const rds = &dss->rds;
-    rds->callback = remus_devices_postsuspend_cb;
-    libxl__remus_devices_postsuspend(egc, rds);
+    libxl__checkpoint_devices_state *const cds = &dss->cds;
+    cds->callback = remus_devices_postsuspend_cb;
+    libxl__checkpoint_devices_postsuspend(egc, cds);
     return;
 
 out:
@@ -160,10 +160,10 @@ out:
 }
 
 static void remus_devices_postsuspend_cb(libxl__egc *egc,
-                                         libxl__remus_devices_state *rds,
+                                         libxl__checkpoint_devices_state *cds,
                                          int rc)
 {
-    libxl__domain_save_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
 
     if (rc)
         goto out;
@@ -183,16 +183,16 @@ void libxl__remus_domain_resume_callback(void *data)
     libxl__domain_save_state *dss = shs->caller_state;
     STATE_AO_GC(dss->ao);
 
-    libxl__remus_devices_state *const rds = &dss->rds;
-    rds->callback = remus_devices_preresume_cb;
-    libxl__remus_devices_preresume(egc, rds);
+    libxl__checkpoint_devices_state *const cds = &dss->cds;
+    cds->callback = remus_devices_preresume_cb;
+    libxl__checkpoint_devices_preresume(egc, cds);
 }
 
 static void remus_devices_preresume_cb(libxl__egc *egc,
-                                       libxl__remus_devices_state *rds,
+                                       libxl__checkpoint_devices_state *cds,
                                        int rc)
 {
-    libxl__domain_save_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
     STATE_AO_GC(dss->ao);
 
     if (rc)
@@ -214,7 +214,7 @@ out:
 /*----- remus asynchronous checkpoint callback -----*/
 
 static void remus_devices_commit_cb(libxl__egc *egc,
-                                    libxl__remus_devices_state *rds,
+                                    libxl__checkpoint_devices_state *cds,
                                     int rc);
 static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
                                   const struct timeval *requested_abs,
@@ -236,7 +236,7 @@ static void remus_checkpoint_stream_written(
     libxl__domain_save_state *dss = CONTAINER_OF(sws, *dss, sws);
 
     /* Convenience aliases */
-    libxl__remus_devices_state *const rds = &dss->rds;
+    libxl__checkpoint_devices_state *const cds = &dss->cds;
 
     STATE_AO_GC(dss->ao);
 
@@ -245,8 +245,8 @@ static void remus_checkpoint_stream_written(
         goto out;
     }
 
-    rds->callback = remus_devices_commit_cb;
-    libxl__remus_devices_commit(egc, rds);
+    cds->callback = remus_devices_commit_cb;
+    libxl__checkpoint_devices_commit(egc, cds);
 
     return;
 
@@ -255,10 +255,10 @@ out:
 }
 
 static void remus_devices_commit_cb(libxl__egc *egc,
-                                    libxl__remus_devices_state *rds,
+                                    libxl__checkpoint_devices_state *cds,
                                     int rc)
 {
-    libxl__domain_save_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
 
     STATE_AO_GC(dss->ao);
 
diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
index 1c3a88a..4dddc58 100644
--- a/tools/libxl/libxl_remus_disk_drbd.c
+++ b/tools/libxl/libxl_remus_disk_drbd.c
@@ -26,30 +26,30 @@ typedef struct libxl__remus_drbd_disk {
     int ackwait;
 } libxl__remus_drbd_disk;
 
-int init_subkind_drbd_disk(libxl__remus_devices_state *rds)
+int init_subkind_drbd_disk(libxl__checkpoint_devices_state *cds)
 {
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
-    rds->drbd_probe_script = GCSPRINTF("%s/block-drbd-probe",
+    cds->drbd_probe_script = GCSPRINTF("%s/block-drbd-probe",
                                        libxl__xen_script_dir_path());
 
     return 0;
 }
 
-void cleanup_subkind_drbd_disk(libxl__remus_devices_state *rds)
+void cleanup_subkind_drbd_disk(libxl__checkpoint_devices_state *cds)
 {
     return;
 }
 
 /*----- helper functions, for async calls -----*/
 static void drbd_async_call(libxl__egc *egc,
-                            libxl__remus_device *dev,
-                            void func(libxl__remus_device *),
+                            libxl__checkpoint_device *dev,
+                            void func(libxl__checkpoint_device *),
                             libxl__ev_child_callback callback)
 {
     int pid, rc;
     libxl__ao_device *aodev = &dev->aodev;
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     /* Fork and call */
     pid = libxl__ev_child_fork(gc, &aodev->child, callback);
@@ -82,21 +82,21 @@ static void match_async_exec_cb(libxl__egc *egc,
 
 /* implementations */
 
-static void match_async_exec(libxl__egc *egc, libxl__remus_device *dev);
+static void match_async_exec(libxl__egc *egc, libxl__checkpoint_device *dev);
 
-static void drbd_setup(libxl__egc *egc, libxl__remus_device *dev)
+static void drbd_setup(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     match_async_exec(egc, dev);
 }
 
-static void match_async_exec(libxl__egc *egc, libxl__remus_device *dev)
+static void match_async_exec(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
     int arraysize, nr = 0, rc;
     const libxl_device_disk *disk = dev->backend_dev;
     libxl__async_exec_state *aes = &dev->aodev.aes;
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     /* setup env & args */
     arraysize = 1;
@@ -107,12 +107,12 @@ static void match_async_exec(libxl__egc *egc, libxl__remus_device *dev)
     arraysize = 3;
     nr = 0;
     GCNEW_ARRAY(aes->args, arraysize);
-    aes->args[nr++] = dev->rds->drbd_probe_script;
+    aes->args[nr++] = dev->cds->drbd_probe_script;
     aes->args[nr++] = disk->pdev_path;
     aes->args[nr++] = NULL;
     assert(nr <= arraysize);
 
-    aes->ao = dev->rds->ao;
+    aes->ao = dev->cds->ao;
     aes->what = GCSPRINTF("%s %s", aes->args[0], aes->args[1]);
     aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
     aes->callback = match_async_exec_cb;
@@ -136,7 +136,7 @@ static void match_async_exec_cb(libxl__egc *egc,
                                 int rc, int status)
 {
     libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
-    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
     libxl__remus_drbd_disk *drbd_disk;
     const libxl_device_disk *disk = dev->backend_dev;
 
@@ -146,7 +146,7 @@ static void match_async_exec_cb(libxl__egc *egc,
         goto out;
 
     if (status) {
-        rc = ERROR_REMUS_DEVOPS_DOES_NOT_MATCH;
+        rc = ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH;
         /* BUG: seems to assume that any exit status means `no match' */
         /* BUG: exit status will have been logged as an error */
         goto out;
@@ -171,10 +171,10 @@ out:
     aodev->callback(egc, aodev);
 }
 
-static void drbd_teardown(libxl__egc *egc, libxl__remus_device *dev)
+static void drbd_teardown(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
     libxl__remus_drbd_disk *drbd_disk = dev->concrete_data;
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     close(drbd_disk->ctl_fd);
     dev->aodev.rc = 0;
@@ -191,9 +191,9 @@ static void checkpoint_async_call_done(libxl__egc *egc,
 /* API implementations */
 
 /* this op will not wait and block, so implement as sync op */
-static void drbd_postsuspend(libxl__egc *egc, libxl__remus_device *dev)
+static void drbd_postsuspend(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     libxl__remus_drbd_disk *rdd = dev->concrete_data;
 
@@ -207,16 +207,16 @@ static void drbd_postsuspend(libxl__egc *egc, libxl__remus_device *dev)
 }
 
 
-static void drbd_preresume_async(libxl__remus_device *dev);
+static void drbd_preresume_async(libxl__checkpoint_device *dev);
 
-static void drbd_preresume(libxl__egc *egc, libxl__remus_device *dev)
+static void drbd_preresume(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     drbd_async_call(egc, dev, drbd_preresume_async, checkpoint_async_call_done);
 }
 
-static void drbd_preresume_async(libxl__remus_device *dev)
+static void drbd_preresume_async(libxl__checkpoint_device *dev)
 {
     libxl__remus_drbd_disk *rdd = dev->concrete_data;
     int ackwait = rdd->ackwait;
@@ -235,7 +235,7 @@ static void checkpoint_async_call_done(libxl__egc *egc,
 {
     int rc;
     libxl__ao_device *aodev = CONTAINER_OF(child, *aodev, child);
-    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
     libxl__remus_drbd_disk *rdd = dev->concrete_data;
 
     STATE_AO_GC(aodev->ao);
@@ -253,7 +253,7 @@ out:
     aodev->callback(egc, aodev);
 }
 
-const libxl__remus_device_instance_ops remus_device_drbd_disk = {
+const libxl__checkpoint_device_instance_ops remus_device_drbd_disk = {
     .kind = LIBXL__DEVICE_KIND_VBD,
     .setup = drbd_setup,
     .teardown = drbd_teardown,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 605fb9a..632c009 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -61,8 +61,8 @@ libxl_error = Enumeration("error", [
     (-15, "LOCK_FAIL"),
     (-16, "JSON_CONFIG_EMPTY"),
     (-17, "DEVICE_EXISTS"),
-    (-18, "REMUS_DEVOPS_DOES_NOT_MATCH"),
-    (-19, "REMUS_DEVICE_NOT_SUPPORTED"),
+    (-18, "CHECKPOINT_DEVOPS_DOES_NOT_MATCH"),
+    (-19, "CHECKPOINT_DEVICE_NOT_SUPPORTED"),
     (-20, "VNUMA_CONFIG_INVALID"),
     (-21, "DOMAIN_NOTFOUND"),
     (-22, "ABORTED"),
-- 
2.5.0


* [PATCH v7 14/18] tools/libxl: fix backword compatibility after the automatic renaming
  2016-01-29  5:27 [PATCH v7 00/18] Prerequisite patches for COLO Wen Congyang
                   ` (12 preceding siblings ...)
  2016-01-29  5:27 ` [PATCH v7 13/18] tools/libxl: rename remus device to checkpoint device Wen Congyang
@ 2016-01-29  5:27 ` Wen Congyang
  2016-01-29 16:32   ` Konrad Rzeszutek Wilk
  2016-01-29  5:27 ` [PATCH v7 15/18] tools/libxl: adjust the indentation Wen Congyang
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Wen Congyang @ 2016-01-29  5:27 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Gui Jianfeng,
	Jiang Yunhong, Dong Eddie, Shriram Rajagopalan, Yang Hongyang

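The ERROR_REMUS_XXX error codes were introduced in Xen 4.5 and were
renamed to ERROR_CHECKPOINT_XXX by the previous patch.
This patch restores backward compatibility by defining the old names
for callers that request an API version from 4.5 up to, but not
including, 4.7.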
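
As a rough illustration of the intended effect (not part of this patch):
an application written against the Xen 4.5 API that defines
LIBXL_API_VERSION should keep compiling with the old names. The sketch
is self-contained, so the enum below is only a stand-in for the generated
libxl error values (-18 and -19, as in libxl_types.idl), and classify()
is a hypothetical caller.

```c
/* Application requests the Xen 4.5 API (normally defined before
 * including libxl.h). */
#define LIBXL_API_VERSION 0x040500

/* Stand-in for the generated libxl error enum after the renaming. */
enum {
    ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH = -18,
    ERROR_CHECKPOINT_DEVICE_NOT_SUPPORTED = -19,
};

/* Same guard as the one this patch adds to libxl.h. */
#if defined(LIBXL_API_VERSION) && LIBXL_API_VERSION >= 0x040500 \
                               && LIBXL_API_VERSION < 0x040700
#define ERROR_REMUS_DEVOPS_DOES_NOT_MATCH \
        ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH
#define ERROR_REMUS_DEVICE_NOT_SUPPORTED \
        ERROR_CHECKPOINT_DEVICE_NOT_SUPPORTED
#endif

/* Hypothetical caller that is still written against the old names. */
static const char *classify(int rc)
{
    switch (rc) {
    case ERROR_REMUS_DEVOPS_DOES_NOT_MATCH: return "devops do not match";
    case ERROR_REMUS_DEVICE_NOT_SUPPORTED:  return "device not supported";
    default:                                return "other";
    }
}
```

With LIBXL_API_VERSION set to 0x040700 or later, or left undefined, the
old names are not defined and such a caller has to switch to the
ERROR_CHECKPOINT_XXX names.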

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 5e4aede..f380adb 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -892,6 +892,18 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, libxl_mac *src);
  */
 #define LIBXL_HAVE_CHECKPOINTED_STREAM 1
 
+/*
+ * The ERROR_REMUS_XXX error codes exist only in Xen 4.5 and Xen 4.6;
+ * they are renamed to ERROR_CHECKPOINT_XXX in Xen 4.7.
+ */
+#if defined(LIBXL_API_VERSION) && LIBXL_API_VERSION >= 0x040500 \
+                               && LIBXL_API_VERSION < 0x040700
+#define ERROR_REMUS_DEVOPS_DOES_NOT_MATCH \
+        ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH
+#define ERROR_REMUS_DEVICE_NOT_SUPPORTED \
+        ERROR_CHECKPOINT_DEVICE_NOT_SUPPORTED
+#endif
+
 typedef char **libxl_string_list;
 void libxl_string_list_dispose(libxl_string_list *sl);
 int libxl_string_list_length(const libxl_string_list *sl);
-- 
2.5.0


* [PATCH v7 15/18] tools/libxl: adjust the indentation
  2016-01-29  5:27 [PATCH v7 00/18] Prerequisite patches for COLO Wen Congyang
                   ` (13 preceding siblings ...)
  2016-01-29  5:27 ` [PATCH v7 14/18] tools/libxl: fix backword compatibility after the automatic renaming Wen Congyang
@ 2016-01-29  5:27 ` Wen Congyang
  2016-02-03 19:40   ` Wei Liu
  2016-01-29  5:27 ` [PATCH v7 16/18] tools/libxl: store remus_ops in checkpoint device state Wen Congyang
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Wen Congyang @ 2016-01-29  5:27 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Gui Jianfeng,
	Jiang Yunhong, Dong Eddie, Shriram Rajagopalan, Yang Hongyang

This just tidies up the indentation after the automatic renaming done
by the "tools/libxl: rename remus device to checkpoint device" patch.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/libxl/libxl_checkpoint_device.c | 21 +++++++++++----------
 tools/libxl/libxl_internal.h          | 19 +++++++++++--------
 2 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/tools/libxl/libxl_checkpoint_device.c b/tools/libxl/libxl_checkpoint_device.c
index 109cd23..226f159 100644
--- a/tools/libxl/libxl_checkpoint_device.c
+++ b/tools/libxl/libxl_checkpoint_device.c
@@ -73,9 +73,9 @@ static void devices_teardown_cb(libxl__egc *egc,
 /* checkpoint device setup and teardown */
 
 static libxl__checkpoint_device* checkpoint_device_init(libxl__egc *egc,
-                                              libxl__checkpoint_devices_state *cds,
-                                              libxl__device_kind kind,
-                                              void *libxl_dev)
+                                        libxl__checkpoint_devices_state *cds,
+                                        libxl__device_kind kind,
+                                        void *libxl_dev)
 {
     libxl__checkpoint_device *dev = NULL;
 
@@ -89,9 +89,10 @@ static libxl__checkpoint_device* checkpoint_device_init(libxl__egc *egc,
 }
 
 static void checkpoint_devices_setup(libxl__egc *egc,
-                                libxl__checkpoint_devices_state *cds);
+                                     libxl__checkpoint_devices_state *cds);
 
-void libxl__checkpoint_devices_setup(libxl__egc *egc, libxl__checkpoint_devices_state *cds)
+void libxl__checkpoint_devices_setup(libxl__egc *egc,
+                                     libxl__checkpoint_devices_state *cds)
 {
     int i, rc;
 
@@ -137,7 +138,7 @@ out:
 }
 
 static void checkpoint_devices_setup(libxl__egc *egc,
-                                libxl__checkpoint_devices_state *cds)
+                                     libxl__checkpoint_devices_state *cds)
 {
     int i, rc;
 
@@ -285,12 +286,12 @@ static void devices_checkpoint_cb(libxl__egc *egc,
 
 /* API implementations */
 
-#define define_checkpoint_api(api)                                \
-void libxl__checkpoint_devices_##api(libxl__egc *egc,                        \
-                                libxl__checkpoint_devices_state *cds)        \
+#define define_checkpoint_api(api)                                      \
+void libxl__checkpoint_devices_##api(libxl__egc *egc,                   \
+                                libxl__checkpoint_devices_state *cds)   \
 {                                                                       \
     int i;                                                              \
-    libxl__checkpoint_device *dev;                                           \
+    libxl__checkpoint_device *dev;                                      \
                                                                         \
     STATE_AO_GC(cds->ao);                                               \
                                                                         \
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 0380408..0f2c96b 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2820,7 +2820,8 @@ typedef struct libxl__save_helper_state {
  * Each device type needs to implement the interfaces specified in
  * the libxl__checkpoint_device_instance_ops if it wishes to support Remus.
  *
- * The high-level control flow through the checkpoint device layer is shown below:
+ * The high-level control flow through the checkpoint device layer is shown
+ * below:
  *
  * xl remus
  *  |->  libxl_domain_remus_start
@@ -2881,7 +2882,8 @@ int init_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
 void cleanup_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
 
 typedef void libxl__checkpoint_callback(libxl__egc *,
-                                   libxl__checkpoint_devices_state *, int rc);
+                                        libxl__checkpoint_devices_state *,
+                                        int rc);
 
 /*
  * State associated with a checkpoint invocation, including parameters
@@ -2889,7 +2891,7 @@ typedef void libxl__checkpoint_callback(libxl__egc *,
  * save/restore machinery.
  */
 struct libxl__checkpoint_devices_state {
-    /*---- must be set by caller of libxl__checkpoint_device_(setup|teardown) ----*/
+    /*-- must be set by caller of libxl__checkpoint_device_(setup|teardown) --*/
 
     libxl__ao *ao;
     uint32_t domid;
@@ -2902,7 +2904,8 @@ struct libxl__checkpoint_devices_state {
     /*
      * this array is allocated before setup the checkpoint devices by the
      * checkpoint abstract layer.
-     * devs may be NULL, means there's no checkpoint devices that has been set up.
+     * devs may be NULL, means there's no checkpoint devices that has been
+     * set up.
      * the size of this array is 'num_devices', which is the total number
      * of libxl nic devices and disk devices(num_nics + num_disks).
      */
@@ -2964,13 +2967,13 @@ struct libxl__checkpoint_device {
 _hidden void libxl__checkpoint_devices_setup(libxl__egc *egc,
                                         libxl__checkpoint_devices_state *cds);
 _hidden void libxl__checkpoint_devices_teardown(libxl__egc *egc,
-                                           libxl__checkpoint_devices_state *cds);
+                                        libxl__checkpoint_devices_state *cds);
 _hidden void libxl__checkpoint_devices_postsuspend(libxl__egc *egc,
-                                              libxl__checkpoint_devices_state *cds);
+                                        libxl__checkpoint_devices_state *cds);
 _hidden void libxl__checkpoint_devices_preresume(libxl__egc *egc,
-                                            libxl__checkpoint_devices_state *cds);
+                                        libxl__checkpoint_devices_state *cds);
 _hidden void libxl__checkpoint_devices_commit(libxl__egc *egc,
-                                         libxl__checkpoint_devices_state *cds);
+                                        libxl__checkpoint_devices_state *cds);
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
 /*----- Legacy conversion helper -----*/
-- 
2.5.0


* [PATCH v7 16/18] tools/libxl: store remus_ops in checkpoint device state
  2016-01-29  5:27 [PATCH v7 00/18] Prerequisite patches for COLO Wen Congyang
                   ` (14 preceding siblings ...)
  2016-01-29  5:27 ` [PATCH v7 15/18] tools/libxl: adjust the indentation Wen Congyang
@ 2016-01-29  5:27 ` Wen Congyang
  2016-02-03 19:40   ` Wei Liu
  2016-01-29  5:27 ` [PATCH v7 17/18] tools/libxl: move remus state into a seperate structure Wen Congyang
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Wen Congyang @ 2016-01-29  5:27 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Gui Jianfeng,
	Jiang Yunhong, Dong Eddie, Shriram Rajagopalan, Yang Hongyang

The checkpoint device layer is an abstraction for performing
checkpoints, and COLO can use it as well. However, some code in the
checkpoint device layer still refers to Remus directly.

This patch, together with:
 tools/libxl: move remus state into a seperate structure
 tools/libxl: seperate device init/cleanup from checkpoint device layer
separates Remus from the checkpoint device layer.

The checkpoint device layer currently uses remus_ops directly. Store
the ops in the checkpoint device state instead, so that the checkpoint
device layer no longer needs to know about remus_ops.

This is pure refactoring with no functional changes.
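
The idea can be sketched outside libxl as follows. This is a simplified
illustration, not the libxl code: device_ops, devices_state, find_ops()
and the synchronous match() hook are hypothetical stand-ins (the real
layer tries each ops' asynchronous setup in turn). What it shows is the
generic layer walking a NULL-terminated table that the caller stored in
the state, without naming any concrete table itself.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the libxl types. */
typedef struct device_ops {
    const char *name;
    int (*match)(int kind);   /* returns 1 if these ops handle 'kind' */
} device_ops;

typedef struct devices_state {
    /* Array of pointers; the last entry must be NULL. */
    const device_ops **ops;
} devices_state;

static int nic_match(int kind)  { return kind == 1; }
static int disk_match(int kind) { return kind == 2; }

static const device_ops nic_ops  = { "nic",  nic_match  };
static const device_ops disk_ops = { "disk", disk_match };

/* Owned by the concrete layer (Remus in this series), not by the
 * generic layer. */
static const device_ops *remus_like_ops[] = { &nic_ops, &disk_ops, NULL };

/* Generic layer: walks whatever table the caller put in the state. */
static const device_ops *find_ops(const devices_state *cds, int kind)
{
    for (size_t i = 0; cds->ops[i] != NULL; i++)
        if (cds->ops[i]->match(kind))
            return cds->ops[i];
    return NULL; /* no ops matched this device kind */
}
```

A second user of the layer, such as COLO, could then presumably supply
its own table through the same field without the generic code changing.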

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/libxl/libxl_checkpoint_device.c | 10 +---------
 tools/libxl/libxl_internal.h          |  2 ++
 tools/libxl/libxl_remus.c             |  9 +++++++++
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/tools/libxl/libxl_checkpoint_device.c b/tools/libxl/libxl_checkpoint_device.c
index 226f159..bbc6dc4 100644
--- a/tools/libxl/libxl_checkpoint_device.c
+++ b/tools/libxl/libxl_checkpoint_device.c
@@ -17,14 +17,6 @@
 
 #include "libxl_internal.h"
 
-extern const libxl__checkpoint_device_instance_ops remus_device_nic;
-extern const libxl__checkpoint_device_instance_ops remus_device_drbd_disk;
-static const libxl__checkpoint_device_instance_ops *remus_ops[] = {
-    &remus_device_nic,
-    &remus_device_drbd_disk,
-    NULL,
-};
-
 /*----- helper functions -----*/
 
 static int init_device_subkind(libxl__checkpoint_devices_state *cds)
@@ -172,7 +164,7 @@ static void device_setup_iterate(libxl__egc *egc, libxl__ao_device *aodev)
         goto out;
 
     do {
-        dev->ops = remus_ops[++dev->ops_index];
+        dev->ops = dev->cds->ops[++dev->ops_index];
         if (!dev->ops) {
             libxl_device_nic * nic = NULL;
             libxl_device_disk * disk = NULL;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 0f2c96b..ee415fd 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2897,6 +2897,8 @@ struct libxl__checkpoint_devices_state {
     uint32_t domid;
     libxl__checkpoint_callback *callback;
     int device_kind_flags;
+    /* The ops must be a pointer array, and the last entry must be NULL. */
+    const libxl__checkpoint_device_instance_ops **ops;
 
     /*----- private for abstract layer only -----*/
 
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index d088dad..3375331 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -18,6 +18,14 @@
 
 #include "libxl_internal.h"
 
+extern const libxl__checkpoint_device_instance_ops remus_device_nic;
+extern const libxl__checkpoint_device_instance_ops remus_device_drbd_disk;
+static const libxl__checkpoint_device_instance_ops *remus_ops[] = {
+    &remus_device_nic,
+    &remus_device_drbd_disk,
+    NULL,
+};
+
 /*-------------------- Remus setup and teardown ---------------------*/
 
 static void remus_setup_done(libxl__egc *egc,
@@ -50,6 +58,7 @@ void libxl__remus_setup(libxl__egc *egc,
     cds->ao = ao;
     cds->domid = dss->domid;
     cds->callback = remus_setup_done;
+    cds->ops = remus_ops;
 
     dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
 
-- 
2.5.0


* [PATCH v7 17/18] tools/libxl: move remus state into a seperate structure
  2016-01-29  5:27 [PATCH v7 00/18] Prerequisite patches for COLO Wen Congyang
                   ` (15 preceding siblings ...)
  2016-01-29  5:27 ` [PATCH v7 16/18] tools/libxl: store remus_ops in checkpoint device state Wen Congyang
@ 2016-01-29  5:27 ` Wen Congyang
  2016-02-03 19:41   ` Wei Liu
  2016-01-29  5:27 ` [PATCH v7 18/18] tools/libxl: seperate device init/cleanup from checkpoint device layer Wen Congyang
  2016-01-29 16:43 ` [PATCH v7 00/18] Prerequisite patches for COLO Konrad Rzeszutek Wilk
  18 siblings, 1 reply; 56+ messages in thread
From: Wen Congyang @ 2016-01-29  5:27 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Gui Jianfeng,
	Jiang Yunhong, Dong Eddie, Shriram Rajagopalan, Yang Hongyang

Add a new structure, the remus state, and move the concrete layer's
private members into it.
This is pure refactoring with no functional changes.
Initialise interval in libxl__remus_setup(). It is safe to move this
initialisation because the value is used only by Remus, and Remus uses
it only after libxl__remus_setup() has run.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/libxl/libxl.c                 |  2 +-
 tools/libxl/libxl_dom_save.c        |  3 +--
 tools/libxl/libxl_internal.h        | 35 +++++++++++++++-----------
 tools/libxl/libxl_netbuffer.c       | 49 +++++++++++++++++++++----------------
 tools/libxl/libxl_remus.c           | 24 ++++++++++++------
 tools/libxl/libxl_remus_disk_drbd.c |  8 +++---
 6 files changed, 72 insertions(+), 49 deletions(-)
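
For readers following the refactoring, here is a minimal sketch of the pattern
this patch relies on: the Remus state is embedded in the save state, the
abstract layer only carries an opaque concrete_data pointer, and the concrete
layer recovers its own state from that pointer. All demo_ names are
hypothetical stand-ins, and DEMO_CONTAINER_OF is a plain offsetof-based macro,
not libxl's CONTAINER_OF.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container-of helper (assumption: not the libxl macro). */
#define DEMO_CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for libxl__remus_state: private to the concrete layer. */
struct demo_remus_state {
    int interval;             /* checkpoint interval */
    const char *netbufscript; /* nic subkind private data */
};

/* Stand-in for libxl__checkpoint_devices_state: the abstract layer. */
struct demo_checkpoint_devices_state {
    void *concrete_data;      /* opaque to the abstract layer */
};

/* Stand-in for libxl__domain_save_state: embeds both structures. */
struct demo_save_state {
    int domid;
    struct demo_remus_state rs;
    struct demo_checkpoint_devices_state cds;
};

/* Mirrors the shape of libxl__remus_setup(): takes rs, recovers dss. */
void demo_remus_setup(struct demo_remus_state *rs, int interval)
{
    struct demo_save_state *dss =
        DEMO_CONTAINER_OF(rs, struct demo_save_state, rs);

    dss->cds.concrete_data = rs; /* hand the concrete state to the layer */
    rs->interval = interval;     /* initialised here, before any use */
}

/* A device op only sees cds and reaches its state via concrete_data. */
int demo_device_interval(const struct demo_checkpoint_devices_state *cds)
{
    const struct demo_remus_state *rs = cds->concrete_data;

    assert(rs != NULL);
    return rs->interval;
}
```

The point of the indirection is that the abstract layer no longer needs to
know the layout of any Remus-specific field.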

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index e286329..d08e3b1 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -881,7 +881,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     assert(info);
 
     /* Point of no return */
-    libxl__remus_setup(egc, dss);
+    libxl__remus_setup(egc, &dss->rs);
     return AO_INPROGRESS;
 
  out:
diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index ab043f9..7dc1d44 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -392,7 +392,6 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
     }
 
     if (dss->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_REMUS) {
-        dss->interval = r_info->interval;
         if (libxl_defbool_val(r_info->compression))
             dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
     }
@@ -447,7 +446,7 @@ static void domain_save_done(libxl__egc *egc,
          * from sending checkpoints. Teardown the network buffers and
          * release netlink resources.  This is an async op.
          */
-        libxl__remus_teardown(egc, dss, rc);
+        libxl__remus_teardown(egc, &dss->rs, rc);
         return;
     }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index ee415fd..2492a03 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2896,6 +2896,7 @@ struct libxl__checkpoint_devices_state {
     libxl__ao *ao;
     uint32_t domid;
     libxl__checkpoint_callback *callback;
+    void *concrete_data;
     int device_kind_flags;
     /* The ops must be pointer array, and the last ops must be NULL. */
     const libxl__checkpoint_device_instance_ops **ops;
@@ -2919,16 +2920,6 @@ struct libxl__checkpoint_devices_state {
     int num_disks;
 
     libxl__multidev multidev;
-
-    /*----- private for concrete (device-specific) layer only -----*/
-
-    /* private for nic device subkind ops */
-    char *netbufscript;
-    struct nl_sock *nlsock;
-    struct nl_cache *qdisc_cache;
-
-    /* private for drbd disk subkind ops */
-    char *drbd_probe_script;
 };
 
 /*
@@ -2976,6 +2967,23 @@ _hidden void libxl__checkpoint_devices_preresume(libxl__egc *egc,
                                         libxl__checkpoint_devices_state *cds);
 _hidden void libxl__checkpoint_devices_commit(libxl__egc *egc,
                                         libxl__checkpoint_devices_state *cds);
+
+/*----- Remus related state structure -----*/
+typedef struct libxl__remus_state libxl__remus_state;
+struct libxl__remus_state {
+    /* private */
+    libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
+    int interval; /* checkpoint interval */
+
+    /*----- private for concrete (device-specific) layer only -----*/
+    /* private for nic device subkind ops */
+    char *netbufscript;
+    struct nl_sock *nlsock;
+    struct nl_cache *qdisc_cache;
+
+    /* private for drbd disk subkind ops */
+    char *drbd_probe_script;
+};
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
 /*----- Legacy conversion helper -----*/
@@ -3135,9 +3143,8 @@ struct libxl__domain_save_state {
     int hvm;
     int xcflags;
     libxl__domain_suspend_state dsps;
+    libxl__remus_state rs;
     libxl__checkpoint_devices_state cds;
-    libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
-    int interval; /* checkpoint interval (for Remus) */
     libxl__stream_write_state sws;
     libxl__logdirty_switch logdirty;
 };
@@ -3551,9 +3558,9 @@ _hidden void libxl__remus_domain_resume_callback(void *data);
 _hidden void libxl__remus_domain_save_checkpoint_callback(void *data);
 /* Remus setup and teardown*/
 _hidden void libxl__remus_setup(libxl__egc *egc,
-                                libxl__domain_save_state *dss);
+                                libxl__remus_state *rs);
 _hidden void libxl__remus_teardown(libxl__egc *egc,
-                                   libxl__domain_save_state *dss,
+                                   libxl__remus_state *rs,
                                    int rc);
 /* Remus callbacks for restore */
 _hidden void libxl__remus_domain_restore_checkpoint_callback(void *data);
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 33c2a42..5c7e8a2 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -42,17 +42,18 @@ int init_subkind_nic(libxl__checkpoint_devices_state *cds)
 {
     int rc, ret;
     libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
+    libxl__remus_state *rs = cds->concrete_data;
 
     STATE_AO_GC(cds->ao);
 
-    cds->nlsock = nl_socket_alloc();
-    if (!cds->nlsock) {
+    rs->nlsock = nl_socket_alloc();
+    if (!rs->nlsock) {
         LOG(ERROR, "cannot allocate nl socket");
         rc = ERROR_FAIL;
         goto out;
     }
 
-    ret = nl_connect(cds->nlsock, NETLINK_ROUTE);
+    ret = nl_connect(rs->nlsock, NETLINK_ROUTE);
     if (ret) {
         LOG(ERROR, "failed to open netlink socket: %s",
             nl_geterror(ret));
@@ -61,7 +62,7 @@ int init_subkind_nic(libxl__checkpoint_devices_state *cds)
     }
 
     /* get list of all qdiscs installed on network devs. */
-    ret = rtnl_qdisc_alloc_cache(cds->nlsock, &cds->qdisc_cache);
+    ret = rtnl_qdisc_alloc_cache(rs->nlsock, &rs->qdisc_cache);
     if (ret) {
         LOG(ERROR, "failed to allocate qdisc cache: %s",
             nl_geterror(ret));
@@ -70,10 +71,10 @@ int init_subkind_nic(libxl__checkpoint_devices_state *cds)
     }
 
     if (dss->remus->netbufscript) {
-        cds->netbufscript = libxl__strdup(gc, dss->remus->netbufscript);
+        rs->netbufscript = libxl__strdup(gc, dss->remus->netbufscript);
     } else {
-        cds->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
-                                      libxl__xen_script_dir_path());
+        rs->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
+                                     libxl__xen_script_dir_path());
     }
 
     rc = 0;
@@ -84,20 +85,22 @@ out:
 
 void cleanup_subkind_nic(libxl__checkpoint_devices_state *cds)
 {
+    libxl__remus_state *rs = cds->concrete_data;
+
     STATE_AO_GC(cds->ao);
 
     /* free qdisc cache */
-    if (cds->qdisc_cache) {
-        nl_cache_clear(cds->qdisc_cache);
-        nl_cache_free(cds->qdisc_cache);
-        cds->qdisc_cache = NULL;
+    if (rs->qdisc_cache) {
+        nl_cache_clear(rs->qdisc_cache);
+        nl_cache_free(rs->qdisc_cache);
+        rs->qdisc_cache = NULL;
     }
 
     /* close & free nlsock */
-    if (cds->nlsock) {
-        nl_close(cds->nlsock);
-        nl_socket_free(cds->nlsock);
-        cds->nlsock = NULL;
+    if (rs->nlsock) {
+        nl_close(rs->nlsock);
+        nl_socket_free(rs->nlsock);
+        rs->nlsock = NULL;
     }
 }
 
@@ -150,13 +153,14 @@ static int init_qdisc(libxl__checkpoint_devices_state *cds,
     int rc, ret, ifindex;
     struct rtnl_link *ifb = NULL;
     struct rtnl_qdisc *qdisc = NULL;
+    libxl__remus_state *rs = cds->concrete_data;
 
     STATE_AO_GC(cds->ao);
 
     /* Now that we have brought up REMUS_IFB device with plug qdisc for
      * this vif, so we need to refill the qdisc cache.
      */
-    ret = nl_cache_refill(cds->nlsock, cds->qdisc_cache);
+    ret = nl_cache_refill(rs->nlsock, rs->qdisc_cache);
     if (ret) {
         LOG(ERROR, "cannot refill qdisc cache: %s", nl_geterror(ret));
         rc = ERROR_FAIL;
@@ -164,7 +168,7 @@ static int init_qdisc(libxl__checkpoint_devices_state *cds,
     }
 
     /* get a handle to the REMUS_IFB interface */
-    ret = rtnl_link_get_kernel(cds->nlsock, 0, remus_nic->ifb, &ifb);
+    ret = rtnl_link_get_kernel(rs->nlsock, 0, remus_nic->ifb, &ifb);
     if (ret) {
         LOG(ERROR, "cannot obtain handle for %s: %s", remus_nic->ifb,
             nl_geterror(ret));
@@ -187,7 +191,7 @@ static int init_qdisc(libxl__checkpoint_devices_state *cds,
      * There is no need to explicitly free this qdisc as its just a
      * reference from the qdisc cache we allocated earlier.
      */
-    qdisc = rtnl_qdisc_get_by_parent(cds->qdisc_cache, ifindex, TC_H_ROOT);
+    qdisc = rtnl_qdisc_get_by_parent(rs->qdisc_cache, ifindex, TC_H_ROOT);
     if (qdisc) {
         const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
         /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
@@ -238,11 +242,12 @@ static void setup_async_exec(libxl__checkpoint_device *dev, char *op)
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
     libxl__checkpoint_devices_state *cds = dev->cds;
     libxl__async_exec_state *aes = &dev->aodev.aes;
+    libxl__remus_state *rs = cds->concrete_data;
 
     STATE_AO_GC(cds->ao);
 
     /* Convenience aliases */
-    char *const script = libxl__strdup(gc, cds->netbufscript);
+    char *const script = libxl__strdup(gc, rs->netbufscript);
     const uint32_t domid = cds->domid;
     const int dev_id = remus_nic->devid;
     const char *const vif = remus_nic->vif;
@@ -333,6 +338,7 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
     libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
     libxl__checkpoint_devices_state *cds = dev->cds;
+    libxl__remus_state *rs = cds->concrete_data;
     const char *out_path_base, *hotplug_error = NULL;
 
     STATE_AO_GC(cds->ao);
@@ -377,7 +383,7 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
 
     if (hotplug_error) {
         LOG(ERROR, "netbuf script %s setup failed for vif %s: %s",
-            cds->netbufscript, vif, hotplug_error);
+            rs->netbufscript, vif, hotplug_error);
         rc = ERROR_FAIL;
         goto out;
     }
@@ -445,6 +451,7 @@ static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
                            int buffer_op)
 {
     int rc, ret;
+    libxl__remus_state *rs = cds->concrete_data;
 
     STATE_AO_GC(cds->ao);
 
@@ -458,7 +465,7 @@ static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
         goto out;
     }
 
-    ret = rtnl_qdisc_add(cds->nlsock, remus_nic->qdisc, NLM_F_REQUEST);
+    ret = rtnl_qdisc_add(rs->nlsock, remus_nic->qdisc, NLM_F_REQUEST);
     if (ret) {
         rc = ERROR_FAIL;
         goto out;
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index 3375331..00e3c80 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -35,9 +35,10 @@ static void remus_setup_failed(libxl__egc *egc,
 static void remus_checkpoint_stream_written(
     libxl__egc *egc, libxl__stream_write_state *sws, int rc);
 
-void libxl__remus_setup(libxl__egc *egc,
-                        libxl__domain_save_state *dss)
+void libxl__remus_setup(libxl__egc *egc, libxl__remus_state *rs)
 {
+    libxl__domain_save_state *dss = CONTAINER_OF(rs, *dss, rs);
+
     /* Convenience aliases */
     libxl__checkpoint_devices_state *const cds = &dss->cds;
     const libxl_domain_remus_info *const info = dss->remus;
@@ -59,6 +60,8 @@ void libxl__remus_setup(libxl__egc *egc,
     cds->domid = dss->domid;
     cds->callback = remus_setup_done;
     cds->ops = remus_ops;
+    cds->concrete_data = rs;
+    rs->interval = info->interval;
 
     dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
 
@@ -103,15 +106,20 @@ static void remus_teardown_done(libxl__egc *egc,
                                 libxl__checkpoint_devices_state *cds,
                                 int rc);
 void libxl__remus_teardown(libxl__egc *egc,
-                           libxl__domain_save_state *dss,
+                           libxl__remus_state *rs,
                            int rc)
 {
+    libxl__domain_save_state *dss = CONTAINER_OF(rs, *dss, rs);
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds = &dss->cds;
+
     EGC_GC;
 
     LOG(WARN, "Remus: Domain suspend terminated with rc %d,"
         " teardown Remus devices...", rc);
-    dss->cds.callback = remus_teardown_done;
-    libxl__checkpoint_devices_teardown(egc, &dss->cds);
+    cds->callback = remus_teardown_done;
+    libxl__checkpoint_devices_teardown(egc, cds);
 }
 
 static void remus_teardown_done(libxl__egc *egc,
@@ -285,9 +293,9 @@ static void remus_devices_commit_cb(libxl__egc *egc,
      */
 
     /* Set checkpoint interval timeout */
-    rc = libxl__ev_time_register_rel(ao, &dss->checkpoint_timeout,
+    rc = libxl__ev_time_register_rel(ao, &dss->rs.checkpoint_timeout,
                                      remus_next_checkpoint,
-                                     dss->interval);
+                                     dss->rs.interval);
 
     if (rc)
         goto out;
@@ -303,7 +311,7 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
                                   int rc)
 {
     libxl__domain_save_state *dss =
-                            CONTAINER_OF(ev, *dss, checkpoint_timeout);
+                            CONTAINER_OF(ev, *dss, rs.checkpoint_timeout);
 
     STATE_AO_GC(dss->ao);
 
diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
index 4dddc58..844dd66 100644
--- a/tools/libxl/libxl_remus_disk_drbd.c
+++ b/tools/libxl/libxl_remus_disk_drbd.c
@@ -28,10 +28,11 @@ typedef struct libxl__remus_drbd_disk {
 
 int init_subkind_drbd_disk(libxl__checkpoint_devices_state *cds)
 {
+    libxl__remus_state *rs = cds->concrete_data;
     STATE_AO_GC(cds->ao);
 
-    cds->drbd_probe_script = GCSPRINTF("%s/block-drbd-probe",
-                                       libxl__xen_script_dir_path());
+    rs->drbd_probe_script = GCSPRINTF("%s/block-drbd-probe",
+                                      libxl__xen_script_dir_path());
 
     return 0;
 }
@@ -96,6 +97,7 @@ static void match_async_exec(libxl__egc *egc, libxl__checkpoint_device *dev)
     int arraysize, nr = 0, rc;
     const libxl_device_disk *disk = dev->backend_dev;
     libxl__async_exec_state *aes = &dev->aodev.aes;
+    libxl__remus_state *rs = dev->cds->concrete_data;
     STATE_AO_GC(dev->cds->ao);
 
     /* setup env & args */
@@ -107,7 +109,7 @@ static void match_async_exec(libxl__egc *egc, libxl__checkpoint_device *dev)
     arraysize = 3;
     nr = 0;
     GCNEW_ARRAY(aes->args, arraysize);
-    aes->args[nr++] = dev->cds->drbd_probe_script;
+    aes->args[nr++] = rs->drbd_probe_script;
     aes->args[nr++] = disk->pdev_path;
     aes->args[nr++] = NULL;
     assert(nr <= arraysize);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v7 18/18] tools/libxl: seperate device init/cleanup from checkpoint device layer
  2016-01-29  5:27 [PATCH v7 00/18] Prerequisite patches for COLO Wen Congyang
                   ` (16 preceding siblings ...)
  2016-01-29  5:27 ` [PATCH v7 17/18] tools/libxl: move remus state into a seperate structure Wen Congyang
@ 2016-01-29  5:27 ` Wen Congyang
  2016-02-03 19:41   ` Wei Liu
  2016-01-29 16:43 ` [PATCH v7 00/18] Prerequisite patches for COLO Konrad Rzeszutek Wilk
  18 siblings, 1 reply; 56+ messages in thread
From: Wen Congyang @ 2016-01-29  5:27 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Gui Jianfeng,
	Jiang Yunhong, Dong Eddie, Shriram Rajagopalan, Yang Hongyang

We call (init|cleanup)_subkind_nic and (init|cleanup)_subkind_drbd_disk
directly in the checkpoint device layer. Move them to libxl_remus.c and call
the init functions before libxl__checkpoint_devices_setup() and the cleanup
functions after libxl__checkpoint_devices_teardown().
This is pure refactoring with no functional changes.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/libxl/libxl_checkpoint_device.c | 42 ++---------------------------------
 tools/libxl/libxl_remus.c             | 42 +++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+), 40 deletions(-)
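
The call ordering described above can be sketched as follows. This is a
synchronous simplification under stated assumptions: the real code is
asynchronous and runs the cleanup from the teardown callbacks, and every
demo_ function here is a hypothetical stand-in that only records that it was
called. What the real init-failure path does beyond skipping setup is not
modelled.

```c
#include <assert.h>
#include <string.h>

/* Records the order of calls, one letter per step. */
static char demo_trace[16];

static void demo_note(char step)
{
    size_t len = strlen(demo_trace);

    assert(len + 1 < sizeof(demo_trace));
    demo_trace[len] = step;
    demo_trace[len + 1] = '\0';
}

void demo_reset_trace(void) { demo_trace[0] = '\0'; }
const char *demo_get_trace(void) { return demo_trace; }

/* Stand-ins for the subkind helpers now living in the Remus layer. */
static int demo_init_device_subkind(int fail)
{
    demo_note('I');
    return fail ? -1 : 0;
}
static void demo_cleanup_device_subkind(void) { demo_note('C'); }

/* Stand-ins for the abstract checkpoint device layer. */
static void demo_checkpoint_devices_setup(void) { demo_note('S'); }
static void demo_checkpoint_devices_teardown(void) { demo_note('T'); }

/* Init runs before the abstract setup; setup is skipped if init fails. */
int demo_remus_setup_ordered(int fail_init)
{
    if (demo_init_device_subkind(fail_init))
        return -1;
    demo_checkpoint_devices_setup();
    return 0;
}

/* Cleanup runs only after the abstract teardown has finished. */
void demo_remus_teardown_ordered(void)
{
    demo_checkpoint_devices_teardown();
    demo_cleanup_device_subkind();
}
```

On the success path the trace reads init, setup, teardown, cleanup, which is
the ordering the commit message asks for.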

diff --git a/tools/libxl/libxl_checkpoint_device.c b/tools/libxl/libxl_checkpoint_device.c
index bbc6dc4..0a16dbb 100644
--- a/tools/libxl/libxl_checkpoint_device.c
+++ b/tools/libxl/libxl_checkpoint_device.c
@@ -17,38 +17,6 @@
 
 #include "libxl_internal.h"
 
-/*----- helper functions -----*/
-
-static int init_device_subkind(libxl__checkpoint_devices_state *cds)
-{
-    /* init device subkind-specific state in the libxl ctx */
-    int rc;
-    STATE_AO_GC(cds->ao);
-
-    if (libxl__netbuffer_enabled(gc)) {
-        rc = init_subkind_nic(cds);
-        if (rc) goto out;
-    }
-
-    rc = init_subkind_drbd_disk(cds);
-    if (rc) goto out;
-
-    rc = 0;
-out:
-    return rc;
-}
-
-static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
-{
-    /* cleanup device subkind-specific state in the libxl ctx */
-    STATE_AO_GC(cds->ao);
-
-    if (libxl__netbuffer_enabled(gc))
-        cleanup_subkind_nic(cds);
-
-    cleanup_subkind_drbd_disk(cds);
-}
-
 /*----- setup() and teardown() -----*/
 
 /* callbacks */
@@ -86,14 +54,10 @@ static void checkpoint_devices_setup(libxl__egc *egc,
 void libxl__checkpoint_devices_setup(libxl__egc *egc,
                                      libxl__checkpoint_devices_state *cds)
 {
-    int i, rc;
+    int i;
 
     STATE_AO_GC(cds->ao);
 
-    rc = init_device_subkind(cds);
-    if (rc)
-        goto out;
-
     cds->num_devices = 0;
     cds->num_nics = 0;
     cds->num_disks = 0;
@@ -126,7 +90,7 @@ void libxl__checkpoint_devices_setup(libxl__egc *egc,
     return;
 
 out:
-    cds->callback(egc, cds, rc);
+    cds->callback(egc, cds, 0);
 }
 
 static void checkpoint_devices_setup(libxl__egc *egc,
@@ -263,8 +227,6 @@ static void devices_teardown_cb(libxl__egc *egc,
     cds->disks = NULL;
     cds->num_disks = 0;
 
-    cleanup_device_subkind(cds);
-
     cds->callback(egc, cds, rc);
 }
 
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index 00e3c80..07a1699 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -26,6 +26,38 @@ static const libxl__checkpoint_device_instance_ops *remus_ops[] = {
     NULL,
 };
 
+/*----- helper functions -----*/
+
+static int init_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* init device subkind-specific state in the libxl ctx */
+    int rc;
+    STATE_AO_GC(cds->ao);
+
+    if (libxl__netbuffer_enabled(gc)) {
+        rc = init_subkind_nic(cds);
+        if (rc) goto out;
+    }
+
+    rc = init_subkind_drbd_disk(cds);
+    if (rc) goto out;
+
+    rc = 0;
+out:
+    return rc;
+}
+
+static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* cleanup device subkind-specific state in the libxl ctx */
+    STATE_AO_GC(cds->ao);
+
+    if (libxl__netbuffer_enabled(gc))
+        cleanup_subkind_nic(cds);
+
+    cleanup_subkind_drbd_disk(cds);
+}
+
 /*-------------------- Remus setup and teardown ---------------------*/
 
 static void remus_setup_done(libxl__egc *egc,
@@ -63,6 +95,12 @@ void libxl__remus_setup(libxl__egc *egc, libxl__remus_state *rs)
     cds->concrete_data = rs;
     rs->interval = info->interval;
 
+    if (init_device_subkind(cds)) {
+        LOG(ERROR, "Remus: failed to init device subkind for guest %u",
+            dss->domid);
+        goto out;
+    }
+
     dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
 
     libxl__checkpoint_devices_setup(egc, cds);
@@ -99,6 +137,8 @@ static void remus_setup_failed(libxl__egc *egc,
         LOG(ERROR, "Remus: failed to teardown device after setup failed"
             " for guest with domid %u, rc %d", dss->domid, rc);
 
+    cleanup_device_subkind(cds);
+
     dss->callback(egc, dss, rc);
 }
 
@@ -133,6 +173,8 @@ static void remus_teardown_done(libxl__egc *egc,
         LOG(ERROR, "Remus: failed to teardown device for guest with domid %u,"
             " rc %d", dss->domid, rc);
 
+    cleanup_device_subkind(cds);
+
     dss->callback(egc, dss, rc);
 }
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v7 02/18] tools/libxl: move remus code into libxl_remus.c
  2016-01-29  5:27 ` [PATCH v7 02/18] tools/libxl: move remus code into libxl_remus.c Wen Congyang
@ 2016-01-29 16:29   ` Konrad Rzeszutek Wilk
  2016-02-03 19:39   ` Wei Liu
  1 sibling, 0 replies; 56+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-29 16:29 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Dong Eddie, xen devel, Gui Jianfeng,
	Shriram Rajagopalan, Ian Jackson, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:18PM +0800, Wen Congyang wrote:
> After previous refactoring, we are now able to move all remus code
> into a separate file libxl_remus.c.
> 
> Export following functions for internal use:
> - Remus callbacks
>   * libxl__remus_domain_suspend_callback
>   * libxl__remus_domain_resume_callback
>   * libxl__remus_domain_save_checkpoint_callback
>   * libxl__remus_domain_restore_checkpoint_callback
> - setup/teardown Remus:
>   * libxl__remus_setup
>   * libxl__remus_teardown
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v7 05/18] tools/libxc: support to resume uncooperative HVM guests
  2016-01-29  5:27 ` [PATCH v7 05/18] tools/libxc: support to resume uncooperative HVM guests Wen Congyang
@ 2016-01-29 16:30   ` Konrad Rzeszutek Wilk
  2016-02-03 19:40   ` Wei Liu
  1 sibling, 0 replies; 56+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-29 16:30 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Dong Eddie, xen devel, Gui Jianfeng,
	Shriram Rajagopalan, Ian Jackson, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:21PM +0800, Wen Congyang wrote:
> Before this patch:
> 1. suspend
> a. PVHVM and PV: we use the same way to suspend the guest (send the suspend
>    request to the guest). If the guest doesn't support evtchn, the xenstore
>    variant will be used, suspending the guest via XenBus control node.
> b. pure HVM: we call xc_domain_shutdown(..., SHUTDOWN_suspend) to suspend
>    the guest
> 
> 2. Resume:
> a. fast path(fast=1)
>    Do not change the guest state. We call libxl__domain_resume(.., 1) which
>    calls xc_domain_resume(..., 1 /* fast=1*/) to resume the guest.
>    PV:       modify the return code to 1, and then call the domctl:
>              XEN_DOMCTL_resumedomain
>    PVHVM:    same as PV
>    pure HVM: do nothing in modify_returncode, and then call the domctl:
>              XEN_DOMCTL_resumedomain
> b. slow
>    Used when the guest's state has been changed. Will call
>    libxl__domain_resume(..., 0) to resume the guest.
>    PV:       update start info, and reset all secondary CPU states. Then call
>              the domctl: XEN_DOMCTL_resumedomain
>    PVHVM:    cannot be resumed. You will get the following error message:
>                  "Cannot resume uncooperative HVM guests"
>    pure HVM: same as PVHVM
> 
> After this patch:
> 1. suspend
>    unchanged
> 
> 2. Resume
> a. fast path:
>    unchanged
> b. slow
>    PV:       unchanged
>    PVHVM:    call XEN_DOMCTL_resumedomain to resume the guest. Because we
>              don't modify the return code, the PV driver will disconnect
>              and reconnect.
>              The guest ends up doing the XENMAPSPACE_shared_info
>              XENMEM_add_to_physmap hypercall and resetting all of its CPU
>              states to point to the shared_info (well, except the ones past 32).
>              That is what the Linux kernel does, regardless of whether
>              SCHEDOP_shutdown:SHUTDOWN_suspend returns 1 or not.
>    Pure HVM: call XEN_DOMCTL_resumedomain to resume the guest.
> 
> Under COLO, we will update the guest's state (modify memory, CPU registers,
> device status...). In this case, we cannot use the fast path to resume it.
> Keep the return code 0, and use the slow path to resume the guest. Resuming
> HVM guests via the slow path is not supported currently, so this patch
> makes that resume call not fail.
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
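
As a reading aid, the before/after matrix in the quoted description can be
written down as a small decision function. This is only a sketch: the enum and
function names are hypothetical, and the returned bitmask merely labels the
steps named in the text rather than performing them.

```c
#include <assert.h>

/* Guest kinds as distinguished in the commit message. */
enum demo_guest_kind { DEMO_PV, DEMO_PVHVM, DEMO_PURE_HVM };

/* Labels for the steps the description mentions. */
enum demo_resume_step {
    DEMO_STEP_MODIFY_RETURNCODE = 1 << 0, /* set the return code to 1 */
    DEMO_STEP_UPDATE_START_INFO = 1 << 1, /* start info + secondary CPUs */
    DEMO_STEP_RESUME_DOMCTL     = 1 << 2, /* XEN_DOMCTL_resumedomain */
    DEMO_STEP_FAIL              = 1 << 3  /* "Cannot resume uncooperative..." */
};

/*
 * Returns the set of steps taken for a resume request.
 * fast:    nonzero for the fast path (guest state unchanged).
 * patched: nonzero to model the behaviour after this patch.
 */
unsigned demo_resume_steps(enum demo_guest_kind kind, int fast, int patched)
{
    assert(kind == DEMO_PV || kind == DEMO_PVHVM || kind == DEMO_PURE_HVM);

    if (fast) {
        /* The fast path is the same before and after the patch. */
        if (kind == DEMO_PURE_HVM)
            return DEMO_STEP_RESUME_DOMCTL;
        return DEMO_STEP_MODIFY_RETURNCODE | DEMO_STEP_RESUME_DOMCTL;
    }

    /* Slow path for PV is unchanged by the patch. */
    if (kind == DEMO_PV)
        return DEMO_STEP_UPDATE_START_INFO | DEMO_STEP_RESUME_DOMCTL;

    /* Slow path for HVM: an error before, a plain domctl afterwards. */
    return patched ? DEMO_STEP_RESUME_DOMCTL : DEMO_STEP_FAIL;
}
```

Only the last line differs between the two behaviours, which matches the
claim that suspend, the fast path and the PV slow path are unchanged.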

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v7 03/18] tools/libxl: move save/restore code into libxl_dom_save.c
  2016-01-29  5:27 ` [PATCH v7 03/18] tools/libxl: move save/restore code into libxl_dom_save.c Wen Congyang
@ 2016-01-29 16:30   ` Konrad Rzeszutek Wilk
  2016-02-03 19:39   ` Wei Liu
  1 sibling, 0 replies; 56+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-29 16:30 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Dong Eddie, xen devel, Gui Jianfeng,
	Shriram Rajagopalan, Ian Jackson, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:19PM +0800, Wen Congyang wrote:
> This is purely code motion.
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v7 04/18] libxl/save: Refactor libxl__domain_suspend_state
  2016-01-29  5:27 ` [PATCH v7 04/18] libxl/save: Refactor libxl__domain_suspend_state Wen Congyang
@ 2016-01-29 16:31   ` Konrad Rzeszutek Wilk
  2016-02-03 19:39   ` Wei Liu
  1 sibling, 0 replies; 56+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-29 16:31 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Dong Eddie, xen devel, Gui Jianfeng,
	Shriram Rajagopalan, Ian Jackson, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:20PM +0800, Wen Congyang wrote:
> Currently struct libxl__domain_suspend_state contains two types of state:
> save state and suspend state. This patch separates the two.
> The motivation is that COLO needs to suspend and resume
> continuously, so we need a more common suspend state.
> 
> After this change, dss stands for libxl__domain_save_state,
> dsps stands for libxl__domain_suspend_state.
> 
> Also introduce libxl__domain_suspend_init to initialise the
> libxl__domain_suspend_state.
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v7 14/18] tools/libxl: fix backword compatibility after the automatic renaming
  2016-01-29  5:27 ` [PATCH v7 14/18] tools/libxl: fix backword compatibility after the automatic renaming Wen Congyang
@ 2016-01-29 16:32   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 56+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-29 16:32 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Dong Eddie, xen devel, Gui Jianfeng,
	Shriram Rajagopalan, Ian Jackson, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:30PM +0800, Wen Congyang wrote:
> The error codes ERROR_REMUS_XXX were introduced in Xen 4.5 and
> were changed to ERROR_CHECKPOINT_XXX by the previous renaming.
> This patch restores backward compatibility.
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
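
One common way to keep old names working after such a rename is to alias them
to the new ones. The sketch below assumes that approach; the DEMO_ names and
numeric values are made up for illustration and are not the actual libxl
error codes, and the quoted text does not say how the patch itself does it.

```c
#include <assert.h>

/* New names after the rename (hypothetical names and values). */
enum demo_error {
    DEMO_ERROR_CHECKPOINT_DEVICE_NOT_SUPPORTED = -25,
    DEMO_ERROR_CHECKPOINT_DEVOPS_TIMEOUT = -26
};

/* Old names kept as aliases so existing callers still compile. */
#define DEMO_ERROR_REMUS_DEVICE_NOT_SUPPORTED \
    DEMO_ERROR_CHECKPOINT_DEVICE_NOT_SUPPORTED
#define DEMO_ERROR_REMUS_DEVOPS_TIMEOUT \
    DEMO_ERROR_CHECKPOINT_DEVOPS_TIMEOUT

/* Compile-time check that an alias and its new name agree. */
static_assert(DEMO_ERROR_REMUS_DEVOPS_TIMEOUT ==
              DEMO_ERROR_CHECKPOINT_DEVOPS_TIMEOUT,
              "old and new error names must have the same value");

/* A caller written against the old names keeps working unchanged. */
int demo_legacy_caller_is_timeout(int rc)
{
    return rc == DEMO_ERROR_REMUS_DEVOPS_TIMEOUT;
}
```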

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v7 08/18] tools/libxl: introduce libxl__domain_restore_device_model to load qemu state
  2016-01-29  5:27 ` [PATCH v7 08/18] tools/libxl: introduce libxl__domain_restore_device_model to load qemu state Wen Congyang
@ 2016-01-29 16:34   ` Konrad Rzeszutek Wilk
  2016-02-03 19:40   ` Wei Liu
  1 sibling, 0 replies; 56+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-29 16:34 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Dong Eddie, xen devel, Anthony Perard,
	Gui Jianfeng, Shriram Rajagopalan, Ian Jackson, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:24PM +0800, Wen Congyang wrote:
> In normal migration, the qemu state is passed to qemu as a parameter.
> With COLO, the secondary vm is running, so we do the following steps
> at every checkpoint:
> 1. suspend both primary vm and secondary vm
> 2. sync the state
> 3. resume both primary vm and secondary vm
> The primary sends qemu's state in step 2, and the secondary's qemu should
> read it and restore the state before it is resumed. We cannot pass
> the state to qemu as a parameter because the secondary QEMU has already
> started at this point, so we introduce libxl__domain_restore_device_model()
> to do it. This API MUST be called before resuming the secondary vm.
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Cc: Anthony Perard <anthony.perard@citrix.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
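
The ordering constraint in the quoted steps (restore the device model state
before resuming the secondary) can be modelled with a tiny state check. This
is a sketch under assumptions: struct demo_vm and the demo_ functions are
hypothetical, and no real state is transferred.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of a vm taking part in a checkpoint. */
struct demo_vm {
    bool suspended;
    bool dm_state_loaded; /* meaningful for the secondary only */
};

/* Step 1: suspending invalidates any previously loaded qemu state. */
void demo_vm_suspend(struct demo_vm *vm)
{
    vm->suspended = true;
    vm->dm_state_loaded = false;
}

/* Step 2 (secondary side): load the qemu state sent by the primary. */
void demo_restore_device_model(struct demo_vm *secondary)
{
    assert(secondary->suspended);
    secondary->dm_state_loaded = true;
}

/* Step 3 (secondary side): refuse to resume without restored state. */
int demo_secondary_resume(struct demo_vm *secondary)
{
    if (!secondary->suspended || !secondary->dm_state_loaded)
        return -1;
    secondary->suspended = false;
    return 0;
}

/* One full checkpoint in the order given in the description. */
int demo_checkpoint(struct demo_vm *primary, struct demo_vm *secondary)
{
    demo_vm_suspend(primary);
    demo_vm_suspend(secondary);
    demo_restore_device_model(secondary); /* sync the state */
    primary->suspended = false;
    return demo_secondary_resume(secondary);
}
```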

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v7 09/18] tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty()
  2016-01-29  5:27 ` [PATCH v7 09/18] tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty() Wen Congyang
@ 2016-01-29 16:34   ` Konrad Rzeszutek Wilk
  2016-02-03 19:40   ` Wei Liu
  1 sibling, 0 replies; 56+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-29 16:34 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Dong Eddie, xen devel, Gui Jianfeng,
	Shriram Rajagopalan, Ian Jackson, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:25PM +0800, Wen Congyang wrote:
> In COLO mode the secondary vm is running, and we need to send the
> secondary vm's dirty page information to the primary host at each
> checkpoint, so we have to enable qemu logdirty on the secondary.
> 
> libxl__domain_suspend_common_switch_qemu_logdirty() enables
> qemu logdirty, but it uses libxl__domain_save_state and calls
> libxl__xc_domain_saverestore_async_callback_done() before it exits,
> so it cannot be used for the secondary vm.
> 
> Update libxl__domain_suspend_common_switch_qemu_logdirty() to
> introduce a new API libxl__domain_common_switch_qemu_logdirty().
> This API only uses libxl__logdirty_switch, and calls
> lds->callback before it exits. This new API will be used by the patch:
>   secondary vm suspend/resume/checkpoint code
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

* Re: [PATCH v7 06/18] tools/libxl: introduce enum type libxl_checkpointed_stream
  2016-01-29  5:27 ` [PATCH v7 06/18] tools/libxl: introduce enum type libxl_checkpointed_stream Wen Congyang
@ 2016-01-29 16:34   ` Konrad Rzeszutek Wilk
  2016-02-03 19:40   ` Wei Liu
  1 sibling, 0 replies; 56+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-29 16:34 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Dong Eddie, xen devel, Gui Jianfeng,
	Shriram Rajagopalan, Ian Jackson, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:22PM +0800, Wen Congyang wrote:
> Introduce enum type libxl_checkpointed_stream in IDL.
> rename the last argument of migrate_receive from "remus" to
> "checkpointed" since the semantics of this parameter has
> changed.
> 
> NOTE:
>  libxl_domain_restore_params and domain_create aren't changed here,
>  checkpointed_stream is still an int. Because we will pass the
>  value from libxl to libxc.
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

* Re: [PATCH v7 07/18] migration/save: pass checkpointed_stream from libxl to libxc
  2016-01-29  5:27 ` [PATCH v7 07/18] migration/save: pass checkpointed_stream from libxl to libxc Wen Congyang
@ 2016-01-29 16:35   ` Konrad Rzeszutek Wilk
  2016-02-03 19:40   ` Wei Liu
  1 sibling, 0 replies; 56+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-29 16:35 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Dong Eddie, xen devel, Gui Jianfeng,
	Shriram Rajagopalan, Ian Jackson, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:23PM +0800, Wen Congyang wrote:
> Pass checkpointed_stream from libxl to libxc.
> It won't affect legacy migration because legacy migration
> won't use this param.
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

* Re: [PATCH v7 12/18] tools/libx{l, c}: add back channel to libxc
  2016-01-29  5:27 ` [PATCH v7 12/18] tools/libx{l, c}: add back channel to libxc Wen Congyang
@ 2016-01-29 16:38   ` Konrad Rzeszutek Wilk
  2016-02-01  5:39     ` Wen Congyang
  2016-02-03 19:40   ` Wei Liu
  1 sibling, 1 reply; 56+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-29 16:38 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Dong Eddie, xen devel, Gui Jianfeng,
	Shriram Rajagopalan, Ian Jackson, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:28PM +0800, Wen Congyang wrote:
> In COLO mode, both VMs are running, and are considered in sync if the
> visible network traffic is identical.  After some time, they fall out of
> sync.
> 
> At this point, the two VMs have definitely diverged.  Let's call the
> primary dirty bitmap set A, while the secondary dirty bitmap set B.
> 
> Sets A and B are different.
> 
> Under normal migration, the page data for set A will be sent from the
> primary to the secondary.
> 
> However, the set difference B - A (the one in B but not in A, let's
> call this C) is out-of-date on the secondary (with respect to the
> primary) and will not be sent by the primary (to secondary), as it
> was not memory dirtied by the primary. The secondary needs C page data
> to reconstruct an exact copy of the primary at the checkpoint.
> 
> The secondary cannot calculate C as it doesn't know A.  Instead, the
> secondary must send B to the primary, at which point the primary
> calculates the union of A and B (let's call this D) which is all the
> pages dirtied by either the primary or the secondary, and sends all page
> data covered by D.
> 
> In the general case, D is a superset of both A and B.  Without the
> backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid
> copy of the primary.
> 
> We transfer the dirty bitmap on libxc side, so we need to introduce back
> channel to libxc.
> 
> Note: it is different from the paper. We change the original design to
> the current one, according to our following concerns:
> 1. The original design needs extra memory on Secondary host. When there's
>    multiple backups on one host, the memory cost is high.
> 2. The memory cache code will be another 1k+, it will make the review
>    more time consuming.
> 
> Note: the back channel will be used in the patch
>  libxc/restore: send dirty pfn list to primary when checkpoint under COLO
> to send dirty pfn list from secondary to primary. The patch is posted in
> another series.
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>

It is a bit confusing to have 'back_fd' and then 'send_fd'. 

Could you change the 'send_fd' (in this patch) to be called 
'send_back_fd' so that the connection between:
 tools/libxl: Add back channel to allow migration target send data back
and this patch is clear?

Or perhaps also state in the commit description that you are using
the 'send_fd' provided by 'tools/libxl: Add back channel to allow migration target send data back'.

Otherwise: Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
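
The set reasoning in the quoted commit message can be made concrete with
a small sketch. It is an illustration under assumptions only: bitmaps are
arrays of 64-bit words with one bit per pfn, and the helper names
dirty_bitmap_union() and count_dirty() are hypothetical, not libxc
functions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: OR the secondary's dirty bitmap (B) into the
 * primary's dirty bitmap (A), leaving D = A | B in 'a'.
 */
static void dirty_bitmap_union(uint64_t *a, const uint64_t *b,
                               size_t nr_words)
{
    for (size_t i = 0; i < nr_words; i++)
        a[i] |= b[i];
}

/* Count set bits, i.e. the number of pages the primary must send. */
static unsigned int count_dirty(const uint64_t *bm, size_t nr_words)
{
    unsigned int n = 0;

    for (size_t i = 0; i < nr_words; i++)
        for (uint64_t w = bm[i]; w; w &= w - 1)  /* clear lowest set bit */
            n++;
    return n;
}
```

With A = {pfn 0, pfn 1} (word 0x3) and B = {pfn 1, pfn 2} (word 0x6), the
union D is 0x7: three pages are sent, and pfn 2 is the B - A page that
the primary would never send without the back channel.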

* Re: [PATCH v7 00/18] Prerequisite patches for COLO
  2016-01-29  5:27 [PATCH v7 00/18] Prerequisite patches for COLO Wen Congyang
                   ` (17 preceding siblings ...)
  2016-01-29  5:27 ` [PATCH v7 18/18] tools/libxl: seperate device init/cleanup from checkpoint device layer Wen Congyang
@ 2016-01-29 16:43 ` Konrad Rzeszutek Wilk
  18 siblings, 0 replies; 56+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-29 16:43 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Dong Eddie, xen devel, Gui Jianfeng,
	Shriram Rajagopalan, Ian Jackson, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:16PM +0800, Wen Congyang wrote:
> This patchset is Prerequisite for COLO feature. Refer to:
> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
> 
> Patch status:
> 1. Acked patches: patch 2, 3, 4, 9, 10, 15, 16, 18
> 2. Reviewed patches: patch 1, 10, 13, 15, 16, 17, 18
> 3. New patches: none
> Note: patch 4 is updated to fix a bug
> 
> You can get the code from here:
> https://github.com/wencongyang/xen/tree/colo_pre_v7

Fantastic!

It made it much easier to review. Thank you.

I've sent a Reviewed-by on all the patches, and asked for one patch:
"[PATCH v7 12/18] tools/libx{l,c}: add back channel to libxc"

to be modified a bit. I think that may cement the relationship
between the back_fd and the send_fd.

I am not a maintainer of libxl or libxc, so I cannot commit
it. However, some of the patches have an Ack from Ian, which
means they could be committed. But the acked patches are not
contiguous, which means that

#1, #5-#8, #11, #12, #13-#14, #17 need an Ack from the Ians or Wei.

Hopefully my review will make it easier for them.

* Re: [PATCH v7 12/18] tools/libx{l, c}: add back channel to libxc
  2016-01-29 16:38   ` Konrad Rzeszutek Wilk
@ 2016-02-01  5:39     ` Wen Congyang
  0 siblings, 0 replies; 56+ messages in thread
From: Wen Congyang @ 2016-02-01  5:39 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Dong Eddie, xen devel, Gui Jianfeng,
	Shriram Rajagopalan, Ian Jackson, Yang Hongyang

On 01/30/2016 12:38 AM, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 29, 2016 at 01:27:28PM +0800, Wen Congyang wrote:
>> In COLO mode, both VMs are running, and are considered in sync if the
>> visible network traffic is identical.  After some time, they fall out of
>> sync.
>>
>> At this point, the two VMs have definitely diverged.  Let's call the
>> primary dirty bitmap set A, while the secondary dirty bitmap set B.
>>
>> Sets A and B are different.
>>
>> Under normal migration, the page data for set A will be sent from the
>> primary to the secondary.
>>
>> However, the set difference B - A (the one in B but not in A, let's
>> call this C) is out-of-date on the secondary (with respect to the
>> primary) and will not be sent by the primary (to secondary), as it
>> was not memory dirtied by the primary. The secondary needs C page data
>> to reconstruct an exact copy of the primary at the checkpoint.
>>
>> The secondary cannot calculate C as it doesn't know A.  Instead, the
>> secondary must send B to the primary, at which point the primary
>> calculates the union of A and B (let's call this D) which is all the
>> pages dirtied by either the primary or the secondary, and sends all page
>> data covered by D.
>>
>> In the general case, D is a superset of both A and B.  Without the
>> backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid
>> copy of the primary.
>>
>> We transfer the dirty bitmap on libxc side, so we need to introduce back
>> channel to libxc.
>>
>> Note: it is different from the paper. We change the original design to
>> the current one, according to our following concerns:
>> 1. The original design needs extra memory on Secondary host. When there's
>>    multiple backups on one host, the memory cost is high.
>> 2. The memory cache code will be another 1k+, it will make the review
>>    more time consuming.
>>
>> Note: the back channel will be used in the patch
>>  libxc/restore: send dirty pfn list to primary when checkpoint under COLO
>> to send dirty pfn list from secondary to primary. The patch is posted in
>> another series.
>>
>> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
> 
> It is a bit confusing to have 'back_fd' and then 'send_fd'. 
> 
> Could you change the 'send_fd' (in this patch) to be called 
> 'send_back_fd' so that the connection between:
>  tools/libxl: Add back channel to allow migration target send data back
> and this patch is clear?
> 
> Or perhaps also state in the commit description that you are using
> the 'send_fd' provided by 'tools/libxl: Add back channel to allow migration target send data back'.

Before this series:
In libxl:
we have send_fd/recv_fd (libxl_domain_remus_start()), but only
restore_fd (libxl_domain_create_restore()).
In libxc:
we have io_fd (xc_domain_save()/xc_domain_restore()).
The fd in libxc is provided by libxl.

I think after this series, we can add the following fds:
1. add a send_back_fd in libxl_domain_create_restore()
2. add a recv_fd in xc_domain_save()
3. add a send_back_fd in xc_domain_restore()
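
The naming in the list above can be sketched as follows. This is not the
libxl/libxc interface: the two structs and the helper functions are
hypothetical, and plain pipes stand in for whatever transport the
toolstack really uses. It only shows which end of each channel the
proposed names would refer to.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: descriptors held by the saving (primary) side. */
struct save_side_fds {
    int io_fd;    /* primary -> secondary: migration stream (write end) */
    int recv_fd;  /* secondary -> primary: back channel (read end) */
};

/* Hypothetical: descriptors held by the restoring (secondary) side. */
struct restore_side_fds {
    int io_fd;         /* migration stream (read end) */
    int send_back_fd;  /* back channel (write end) */
};

/* Build both channels from two pipes. Returns 0 on success, -1 on error. */
static int channels_open(struct save_side_fds *s, struct restore_side_fds *r)
{
    int fwd[2], back[2];

    if (pipe(fwd) < 0)
        return -1;
    if (pipe(back) < 0) {
        close(fwd[0]);
        close(fwd[1]);
        return -1;
    }
    s->io_fd = fwd[1];
    r->io_fd = fwd[0];
    r->send_back_fd = back[1];
    s->recv_fd = back[0];
    return 0;
}

/* Secondary: send one word of dirty-page information back. */
static int send_back_u64(const struct restore_side_fds *r, uint64_t v)
{
    return write(r->send_back_fd, &v, sizeof(v)) == (ssize_t)sizeof(v) ? 0 : -1;
}

/* Primary: receive one word from the back channel. */
static int recv_back_u64(const struct save_side_fds *s, uint64_t *v)
{
    return read(s->recv_fd, v, sizeof(*v)) == (ssize_t)sizeof(*v) ? 0 : -1;
}
```

In this model the secondary's send_back_fd and the primary's recv_fd are
the two ends of the same channel, which is the link between the libxl and
libxc patches that the review asks to make visible.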

What about this?

Thanks
Wen Congyang

> 
> Otherwise: Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

* Re: [PATCH v7 01/18] libxl/remus: init checkpoint_callback in Remus setup callback
  2016-01-29  5:27 ` [PATCH v7 01/18] libxl/remus: init checkpoint_callback in Remus setup callback Wen Congyang
@ 2016-02-03 19:39   ` Wei Liu
  2016-02-04  5:17     ` Wen Congyang
  0 siblings, 1 reply; 56+ messages in thread
From: Wei Liu @ 2016-02-03 19:39 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Dong Eddie, Gui Jianfeng,
	Shriram Rajagopalan, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:17PM +0800, Wen Congyang wrote:
> init stream {read/write} state checkpoint_callback in Remus setup callback.
> There's no functional change, it's just refactoring so that we can move
> all remus code into one file.
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  tools/libxl/libxl.c          |  2 ++
>  tools/libxl/libxl_create.c   | 10 +++++++++-
>  tools/libxl/libxl_dom.c      |  5 +----
>  tools/libxl/libxl_internal.h |  4 ++++
>  4 files changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 94b5656..5346a0c 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -917,6 +917,8 @@ static void libxl__remus_setup(libxl__egc *egc,
>      rds->domid = dss->domid;
>      rds->callback = remus_setup_done;
>  
> +    dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
> +
>      libxl__remus_devices_setup(egc, rds);
>      return;
>  
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index e491d83..8b1efe5 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -718,6 +718,12 @@ static void remus_checkpoint_stream_done(
>      libxl__xc_domain_saverestore_async_callback_done(egc, &stream->shs, rc);
>  }
>  
> +static void libxl__remus_restore_setup(libxl__egc *egc,
> +                                       libxl__domain_create_state *dcs)
> +{
> +    dcs->srs.checkpoint_callback = remus_checkpoint_stream_done;
> +}
> +
>  /*----- main domain creation -----*/
>  
>  /* We have a linear control flow; only one event callback is
> @@ -1004,6 +1010,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
>      libxl__domain_build_state *const state = &dcs->build_state;
>      libxl__srm_restore_autogen_callbacks *const callbacks =
>          &dcs->srs.shs.callbacks.restore.a;
> +    const int checkpointed_stream = dcs->restore_params.checkpointed_stream;
>  
>      if (rc) {
>          domcreate_rebuild_done(egc, dcs, rc);
> @@ -1042,9 +1049,10 @@ static void domcreate_bootloader_done(libxl__egc *egc,

A few lines above in this function, there is a line like:

    /* Restore */
    callbacks->checkpoint = libxl__remus_domain_restore_checkpoint_callback;

Do you not need to move this into libxl__remus_restore_setup as well? As
far as I can tell that's only useful for remus.

>      dcs->srs.fd = restore_fd;
>      dcs->srs.legacy = (dcs->restore_params.stream_version == 1);
>      dcs->srs.completion_callback = domcreate_stream_done;
> -    dcs->srs.checkpoint_callback = remus_checkpoint_stream_done;
>  
>      if (restore_fd >= 0) {
> +        if (checkpointed_stream)
> +            libxl__remus_restore_setup(egc, dcs);
>          libxl__stream_read_start(egc, &dcs->srs);
>          return;
>      }
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 2269998..9e28bc4 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -1569,8 +1569,6 @@ out:
>  
>  /*----- remus asynchronous checkpoint callback -----*/
>  
> -static void remus_checkpoint_stream_written(
> -    libxl__egc *egc, libxl__stream_write_state *sws, int rc);
>  static void remus_devices_commit_cb(libxl__egc *egc,
>                                      libxl__remus_devices_state *rds,
>                                      int rc);
> @@ -1588,7 +1586,7 @@ static void libxl__remus_domain_save_checkpoint_callback(void *data)
>      libxl__stream_write_start_checkpoint(egc, &dss->sws);
>  }
>  
> -static void remus_checkpoint_stream_written(
> +void remus_checkpoint_stream_written(
>      libxl__egc *egc, libxl__stream_write_state *sws, int rc)
>  {
>      libxl__domain_suspend_state *dss = CONTAINER_OF(sws, *dss, sws);
> @@ -1761,7 +1759,6 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss)
>          callbacks->suspend = libxl__remus_domain_suspend_callback;
>          callbacks->postcopy = libxl__remus_domain_resume_callback;
>          callbacks->checkpoint = libxl__remus_domain_save_checkpoint_callback;

Do you not want to move this to libxl__remus_setup?


Wei.

* Re: [PATCH v7 02/18] tools/libxl: move remus code into libxl_remus.c
  2016-01-29  5:27 ` [PATCH v7 02/18] tools/libxl: move remus code into libxl_remus.c Wen Congyang
  2016-01-29 16:29   ` Konrad Rzeszutek Wilk
@ 2016-02-03 19:39   ` Wei Liu
  1 sibling, 0 replies; 56+ messages in thread
From: Wei Liu @ 2016-02-03 19:39 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Dong Eddie, Gui Jianfeng,
	Shriram Rajagopalan, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:18PM +0800, Wen Congyang wrote:
> After previous refactoring, we are now able to move all remus code
> into a separate file libxl_remus.c.
> 
> Export following functions for internal use:
> - Remus callbacks
>   * libxl__remus_domain_suspend_callback
>   * libxl__remus_domain_resume_callback
>   * libxl__remus_domain_save_checkpoint_callback
>   * libxl__remus_domain_restore_checkpoint_callback
> - setup/teardown Remus:
>   * libxl__remus_setup
>   * libxl__remus_teardown
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Acked-by:Ian Campbell <ian.campbell@citrix.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

* Re: [PATCH v7 03/18] tools/libxl: move save/restore code into libxl_dom_save.c
  2016-01-29  5:27 ` [PATCH v7 03/18] tools/libxl: move save/restore code into libxl_dom_save.c Wen Congyang
  2016-01-29 16:30   ` Konrad Rzeszutek Wilk
@ 2016-02-03 19:39   ` Wei Liu
  1 sibling, 0 replies; 56+ messages in thread
From: Wei Liu @ 2016-02-03 19:39 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Dong Eddie, Gui Jianfeng,
	Shriram Rajagopalan, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:19PM +0800, Wen Congyang wrote:
> This is purely code motion.
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

* Re: [PATCH v7 04/18] libxl/save: Refactor libxl__domain_suspend_state
  2016-01-29  5:27 ` [PATCH v7 04/18] libxl/save: Refactor libxl__domain_suspend_state Wen Congyang
  2016-01-29 16:31   ` Konrad Rzeszutek Wilk
@ 2016-02-03 19:39   ` Wei Liu
  1 sibling, 0 replies; 56+ messages in thread
From: Wei Liu @ 2016-02-03 19:39 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Dong Eddie, Gui Jianfeng,
	Shriram Rajagopalan, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:20PM +0800, Wen Congyang wrote:
> Currently struct libxl__domain_suspend_state contains 2 type of states,
> one is save state, another is suspend state. This patch separates those
> two out.
> The motivation of this is that COLO will need to do suspend/resume
> continuously, we need a more common suspend state.
> 
> After this change, dss stands for libxl__domain_save_state,
> dsps stands for libxl__domain_suspend_state.
> 
> Also introduce libxl__domain_suspend_init to initialise the
> libxl__domain_suspend_state.
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by:Ian Campbell <ian.campbell@citrix.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

* Re: [PATCH v7 05/18] tools/libxc: support to resume uncooperative HVM guests
  2016-01-29  5:27 ` [PATCH v7 05/18] tools/libxc: support to resume uncooperative HVM guests Wen Congyang
  2016-01-29 16:30   ` Konrad Rzeszutek Wilk
@ 2016-02-03 19:40   ` Wei Liu
  2016-02-04  5:30     ` Wen Congyang
  1 sibling, 1 reply; 56+ messages in thread
From: Wei Liu @ 2016-02-03 19:40 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Dong Eddie, Gui Jianfeng,
	Shriram Rajagopalan, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:21PM +0800, Wen Congyang wrote:
> Before this patch:
> 1. suspend
> a. PVHVM and PV: we use the same way to suspend the guest (send the suspend
>    request to the guest). If the guest doesn't support evtchn, the xenstore
>    variant will be used, suspending the guest via XenBus control node.
> b. pure HVM: we call xc_domain_shutdown(..., SHUTDOWN_suspend) to suspend
>    the guest
> 
> 2. Resume:
> a. fast path(fast=1)
>    Do not change the guest state. We call libxl__domain_resume(.., 1) which
>    calls xc_domain_resume(..., 1 /* fast=1*/) to resume the guest.
>    PV:       modify the return code to 1, and than call the domctl:
>              XEN_DOMCTL_resumedomain
>    PVHVM:    same with PV
>    pure HVM: do nothing in modify_returncode, and than call the domctl:

"then"

>              XEN_DOMCTL_resumedomain
> b. slow
>    Used when the guest's state have been changed. Will call
>    libxl__domain_resume(..., 0) to resume the guest.
>    PV:       update start info, and reset all secondary CPU states. Than call
>              the domctl: XEN_DOMCTL_resumedomain
>    PVHVM:    can not be resumed. You will get the following error message:
>                  "Cannot resume uncooperative HVM guests"
>    purt HVM: same with PVHVM

"pure"

> 
> After this patch:
> 1. suspend
>    unchanged
> 
> 2. Resume
> a. fast path:
>    unchanged
> b. slow
>    PV:       unchanged
>    PVHVM:    call XEN_DOMCTL_resumedomain to resume the guest. Because we
>              don't modify the return code, the PV driver will disconnect
>              and reconnect.
>              The guest ends up doing the XENMAPSPACE_shared_info
>              XENMEM_add_to_physmap hypercall and resetting all of its CPU
>              states to point to the shared_info(well except the ones past 32).
>              That is the Linux kernel does that - regardless whether the
>              SCHEDOP_shutdown:SHUTDOWN_suspend returns 1 or not.
>    Pure HVM: call XEN_DOMCTL_resumedomain to resume the guest.

In summary, this patch only changes slow path resume. Furthermore, it
only affects the PVHVM and pure HVM variants.

With your patch, pure HVM is able to resume with effectively the same
path via XEN_DOMCTL_resumedomain, albeit it is done in two functions
(_cooperative and _any).

And according to the recent change in documentation, slow path is
always safe.
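
For reference, the post-patch behaviour described in the quoted commit
message could be tabulated as below. This is a sketch for discussion
only: the enums and pick_resume_action() are illustrative names, not
libxc code, and the mapping is taken from the commit message rather than
from xc_resume.c itself.

```c
#include <stdbool.h>

/* Illustrative guest classification used in the commit message. */
enum guest_type { GUEST_PV, GUEST_PVHVM, GUEST_HVM };

/* What is done before/with XEN_DOMCTL_resumedomain. */
enum resume_action {
    RESUME_MODIFY_RC_THEN_DOMCTL,  /* fast path: set the return code to 1 */
    RESUME_REBUILD_PV_THEN_DOMCTL, /* slow path, PV: update start info and
                                    * reset secondary CPU states */
    RESUME_DOMCTL_ONLY,            /* domctl without touching guest state */
};

static enum resume_action pick_resume_action(enum guest_type t, bool fast)
{
    if (fast)
        /* Pure HVM has no return code to modify. */
        return t == GUEST_HVM ? RESUME_DOMCTL_ONLY
                              : RESUME_MODIFY_RC_THEN_DOMCTL;

    /* Slow path: only PV needs its state rebuilt; with this patch
     * PVHVM and pure HVM no longer fail here. */
    return t == GUEST_PV ? RESUME_REBUILD_PV_THEN_DOMCTL
                         : RESUME_DOMCTL_ONLY;
}
```

Only the slow-path PVHVM and pure HVM entries differ from the pre-patch
behaviour, where they produced "Cannot resume uncooperative HVM guests".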

I think the commit message can be simplified a bit. This assumes that
using XEN_DOMCTL_resumedomain to resume (PV)HVM in the slow path is safe.

===

Use XEN_DOMCTL_resumedomain to resume (PV)HVM guests in slow path

Previously it was not possible to resume PVHVM or pure HVM guests in the
slow path because libxc didn't support that.

Using XEN_DOMCTL_resumedomain without modifying guest state to resume a
guest is considered to be always safe.  Introduce a function to do that
for (PV)HVM guests in slow path resume.

This patch fixes a bug that denies (PV)HVM slow path resume.  This will
enable COLO to work properly:  COLO requires an HVM guest to start in the
new context that has been set up by COLO, hence slow path resume is
required.

===

Does this sound right? Especially the wording about safety.

Ian and Ian, you seem to have suggested that Congyang write the above
commit message. What do you think about my updated one?

> 
> Under COLO, we will update the guest's state(modify memory, cpu's registers,
> device status...). In this case, we cannot use the fast path to resume it.
> Keep the return code 0, and use a slow path to resume the guest. While
> resuming HVM using slow path is not supported currently, this patch is to
> make the resume call to not fail.
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> ---
>  tools/libxc/xc_resume.c | 25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
> index 87d4324..4a9b035 100644
> --- a/tools/libxc/xc_resume.c
> +++ b/tools/libxc/xc_resume.c
> @@ -108,6 +108,26 @@ static int xc_domain_resume_cooperative(xc_interface *xch, uint32_t domid)
>      return do_domctl(xch, &domctl);
>  }
>  
> +static int xc_domain_resume_hvm(xc_interface *xch, uint32_t domid)
> +{
> +    DECLARE_DOMCTL;
> +
> +    /*
> +     * The domctl XEN_DOMCTL_resumedomain unpause each vcpu. After
> +     * the domctl, the guest will run.
> +     *
> +     * If it is PVHVM, the guest called the hypercall
> +     *    SCHEDOP_shutdown:SHUTDOWN_suspend
> +     * to suspend itself. We don't modify the return code, so the PV driver
> +     * will disconnect and reconnect.
> +     *
> +     * If it is a HVM, the guest will continue running.
> +     */
> +    domctl.cmd = XEN_DOMCTL_resumedomain;
> +    domctl.domain = domid;
> +    return do_domctl(xch, &domctl);
> +}
> +
>  static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
>  {
>      DECLARE_DOMCTL;
> @@ -137,10 +157,7 @@ static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
>       */
>  #if defined(__i386__) || defined(__x86_64__)
>      if ( info.hvm )
> -    {
> -        ERROR("Cannot resume uncooperative HVM guests");
> -        return rc;
> -    }
> +        return xc_domain_resume_hvm(xch, domid);
>  
>      if ( xc_domain_get_guest_width(xch, domid, &dinfo->guest_width) != 0 )
>      {
> -- 
> 2.5.0

* Re: [PATCH v7 06/18] tools/libxl: introduce enum type libxl_checkpointed_stream
  2016-01-29  5:27 ` [PATCH v7 06/18] tools/libxl: introduce enum type libxl_checkpointed_stream Wen Congyang
  2016-01-29 16:34   ` Konrad Rzeszutek Wilk
@ 2016-02-03 19:40   ` Wei Liu
  1 sibling, 0 replies; 56+ messages in thread
From: Wei Liu @ 2016-02-03 19:40 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Dong Eddie, Gui Jianfeng,
	Shriram Rajagopalan, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:22PM +0800, Wen Congyang wrote:
> Introduce enum type libxl_checkpointed_stream in IDL.
> rename the last argument of migrate_receive from "remus" to
> "checkpointed" since the semantics of this parameter has
> changed.
> 
> NOTE:
>  libxl_domain_restore_params and domain_create aren't changed here,
>  checkpointed_stream is still an int. Because we will pass the
>  value from libxl to libxc.
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

* Re: [PATCH v7 07/18] migration/save: pass checkpointed_stream from libxl to libxc
  2016-01-29  5:27 ` [PATCH v7 07/18] migration/save: pass checkpointed_stream from libxl to libxc Wen Congyang
  2016-01-29 16:35   ` Konrad Rzeszutek Wilk
@ 2016-02-03 19:40   ` Wei Liu
  2016-02-04  5:18     ` Wen Congyang
  1 sibling, 1 reply; 56+ messages in thread
From: Wei Liu @ 2016-02-03 19:40 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Dong Eddie, Gui Jianfeng,
	Shriram Rajagopalan, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:23PM +0800, Wen Congyang wrote:
> Pass checkpointed_stream from libxl to libxc.
> It won't affect legacy migration because legacy migration
> won't use this param.
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

With one nit below.

>  
> -    if ( ctx->save.debug && !ctx->save.checkpointed )
> +    if ( ctx->save.debug &&
> +         ctx->save.checkpointed != MIG_STREAM_NONE )

You can fold this line to previous one.

* Re: [PATCH v7 08/18] tools/libxl: introduce libxl__domain_restore_device_model to load qemu state
  2016-01-29  5:27 ` [PATCH v7 08/18] tools/libxl: introduce libxl__domain_restore_device_model to load qemu state Wen Congyang
  2016-01-29 16:34   ` Konrad Rzeszutek Wilk
@ 2016-02-03 19:40   ` Wei Liu
  2016-02-04  5:24     ` Wen Congyang
  1 sibling, 1 reply; 56+ messages in thread
From: Wei Liu @ 2016-02-03 19:40 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Shriram Rajagopalan,
	Dong Eddie, Gui Jianfeng, Anthony Perard, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:24PM +0800, Wen Congyang wrote:
> In normal migration, the qemu state is passed to qemu as a parameter.
> With COLO, secondary vm is running. So we will do the following steps
> at every checkpoint:
> 1. suspend both primary vm and secondary vm
> 2. sync the state
> 3. resume both primary vm and secondary vm
> Primary will send qemu's state in step2, and secondary's qemu should
> read it and restore the state before it is resumed. We can not pass
> the state to qemu as a parameter because secondary QEMU already started
> at this point, so we introduce libxl__domain_restore_device_model() to
> do it. This API MUST be called before resuming secondary vm.
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Cc: Anthony Perard <anthony.perard@citrix.com>
> ---
>  tools/libxl/libxl_dom_save.c | 20 ++++++++++++++++++++
>  tools/libxl/libxl_internal.h |  4 ++++
>  tools/libxl/libxl_qmp.c      | 10 ++++++++++
>  3 files changed, 34 insertions(+)
> 
> diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
> index cd2e7de..7383d2d 100644
> --- a/tools/libxl/libxl_dom_save.c
> +++ b/tools/libxl/libxl_dom_save.c
> @@ -518,6 +518,26 @@ int libxl__restore_emulator_xenstore_data(libxl__domain_create_state *dcs,
>      return rc;
>  }
>  
> +int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid,
> +                                       const char *restore_file)
> +{
> +    int rc;
> +
> +    switch (libxl__device_model_version_running(gc, domid)) {
> +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
> +        /* Will never be supported. */
> +        rc = ERROR_INVAL;
> +        break;

I'm not entirely sure if this statement would be true. The function name
is generic enough to indicate this case should be supported.

However, this function is not used anywhere in this series, so I don't
know whether my comment makes sense.

One way of moving forward is to stick this patch to COLO series itself.
Let's skip this in this prerequisite series.

> +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
> +        rc = libxl__qmp_restore(gc, domid, restore_file);
> +        break;
> +    default:
> +        rc = ERROR_INVAL;
> +    }
> +
> +    return rc;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index fbd1acb..896c119 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -1117,6 +1117,8 @@ _hidden int libxl__domain_rename(libxl__gc *gc, uint32_t domid,
>                                   const char *old_name, const char *new_name,
>                                   xs_transaction_t trans);
>  
> +_hidden int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid,
> +                                               const char *restore_file);
>  _hidden int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid);
>  
>  _hidden const char *libxl__userdata_path(libxl__gc *gc, uint32_t domid,
> @@ -1760,6 +1762,8 @@ _hidden int libxl__qmp_stop(libxl__gc *gc, int domid);
>  _hidden int libxl__qmp_resume(libxl__gc *gc, int domid);
>  /* Save current QEMU state into fd. */
>  _hidden int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename);
> +/* Load current QEMU state from fd. */

This comment is wrong, it loads QEMU state from file, not fd.

> +_hidden int libxl__qmp_restore(libxl__gc *gc, int domid, const char *filename);
>  /* Set dirty bitmap logging status */
>  _hidden int libxl__qmp_set_global_dirty_log(libxl__gc *gc, int domid, bool enable);
>  _hidden int libxl__qmp_insert_cdrom(libxl__gc *gc, int domid, const libxl_device_disk *disk);
> diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
> index 714038b..eec8a44 100644
> --- a/tools/libxl/libxl_qmp.c
> +++ b/tools/libxl/libxl_qmp.c
> @@ -905,6 +905,16 @@ int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename)
>                             NULL, NULL);
>  }
>  
> +int libxl__qmp_restore(libxl__gc *gc, int domid, const char *state_file)
> +{
> +    libxl__json_object *args = NULL;
> +
> +    qmp_parameters_add_string(gc, &args, "filename", state_file);
> +
> +    return qmp_run_command(gc, domid, "xen-load-devices-state", args,
> +                           NULL, NULL);
> +}
> +

This looks correct FWIW.

>  static int qmp_change(libxl__gc *gc, libxl__qmp_handler *qmp,
>                        char *device, char *target, char *arg)
>  {
> -- 
> 2.5.0
> 
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

* Re: [PATCH v7 09/18] tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty()
  2016-01-29  5:27 ` [PATCH v7 09/18] tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty() Wen Congyang
  2016-01-29 16:34   ` Konrad Rzeszutek Wilk
@ 2016-02-03 19:40   ` Wei Liu
  1 sibling, 0 replies; 56+ messages in thread
From: Wei Liu @ 2016-02-03 19:40 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Dong Eddie, Gui Jianfeng,
	Shriram Rajagopalan, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:25PM +0800, Wen Congyang wrote:
> Secondary vm is running in COLO mode, we need to send secondary
> vm's dirty page information to primary host at checkpoint, so we
> have to enable qemu logdirty on secondary.
> 
> libxl__domain_suspend_common_switch_qemu_logdirty() is to enable
> qemu logdirty. But it uses libxl__domain_save_state, and calls
> libxl__xc_domain_saverestore_async_callback_done() before exits.
> This can not be used for secondary vm.
> 
> Update libxl__domain_suspend_common_switch_qemu_logdirty() to
> introduce a new API libxl__domain_common_switch_qemu_logdirty().
> This API only uses libxl__logdirty_switch, and calls
> lds->callback before exits. This new API will be used by the patch:
>   secondary vm suspend/resume/checkpoint code
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

* Re: [PATCH v7 10/18] tools/libxl: export logdirty_init
  2016-01-29  5:27 ` [PATCH v7 10/18] tools/libxl: export logdirty_init Wen Congyang
@ 2016-02-03 19:40   ` Wei Liu
  0 siblings, 0 replies; 56+ messages in thread
From: Wei Liu @ 2016-02-03 19:40 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Dong Eddie, Gui Jianfeng,
	Shriram Rajagopalan, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:26PM +0800, Wen Congyang wrote:
> We need to enable logdirty on secondary, so we export logdirty_init
> for internal use. Rename it to libxl__logdirty_init.
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

* Re: [PATCH v7 11/18] tools/libxl: Add back channel to allow migration target send data back
  2016-01-29  5:27 ` [PATCH v7 11/18] tools/libxl: Add back channel to allow migration target send data back Wen Congyang
@ 2016-02-03 19:40   ` Wei Liu
  0 siblings, 0 replies; 56+ messages in thread
From: Wei Liu @ 2016-02-03 19:40 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Dong Eddie, Gui Jianfeng,
	Shriram Rajagopalan, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:27PM +0800, Wen Congyang wrote:
> In COLO mode, secondary needs to send the following data to primary:
> 1. In libxl
>    Secondary sends the following CHECKPOINT_CONTEXT to primary:
>    CHECKPOINT_SVM_SUSPENDED, CHECKPOINT_SVM_READY and CHECKPOINT_SVM_RESUMED
> 2. In libxc
>    Secondary sends the dirty pfn list to primary
> 
> But the io_fd only can be written in primary, and only can be read in
> secondary. Save recv_fd in domain_suspend_state, and send_fd in
> domain_create_state. Extend libxl_domain_create_restore API, add a
> send_fd param to it. Add LIBXL_HAVE_CREATE_RESTORE_SEND_FD to indicate
> the API change.
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> ---
>  tools/libxl/libxl.c                  |  2 +-
>  tools/libxl/libxl.h                  | 30 ++++++++++++++++++++++++++++--
>  tools/libxl/libxl_create.c           |  9 +++++----
>  tools/libxl/libxl_internal.h         |  2 ++
>  tools/libxl/xl_cmdimpl.c             |  8 +++++++-
>  tools/ocaml/libs/xl/xenlight_stubs.c |  2 +-
>  6 files changed, 44 insertions(+), 9 deletions(-)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index fc7844d..e286329 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -871,7 +871,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,

The TODO before this function can also be deleted now.

This patch is mostly about stashing the recv_fd. Assuming that is going
to be used later:

Acked-by: Wei Liu <wei.liu2@citrix.com>

* Re: [PATCH v7 12/18] tools/libx{l, c}: add back channel to libxc
  2016-01-29  5:27 ` [PATCH v7 12/18] tools/libx{l, c}: add back channel to libxc Wen Congyang
  2016-01-29 16:38   ` Konrad Rzeszutek Wilk
@ 2016-02-03 19:40   ` Wei Liu
  2016-02-04  5:28     ` Wen Congyang
  1 sibling, 1 reply; 56+ messages in thread
From: Wei Liu @ 2016-02-03 19:40 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Dong Eddie, Gui Jianfeng,
	Shriram Rajagopalan, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:28PM +0800, Wen Congyang wrote:
> In COLO mode, both VMs are running, and are considered in sync if the
> visible network traffic is identical.  After some time, they fall out of
> sync.
> 
> At this point, the two VMs have definitely diverged.  Lets call the
> primary dirty bitmap set A, while the secondary dirty bitmap set B.
> 
> Sets A and B are different.
> 
> Under normal migration, the page data for set A will be sent from the
> primary to the secondary.
> 
> However, the set difference B - A (the one in B but not in A, lets
> call this C) is out-of-date on the secondary (with respect to the
> primary) and will not be sent by the primary (to secondary), as it
> was not memory dirtied by the primary. The secondary needs C page data
> to reconstruct an exact copy of the primary at the checkpoint.
> 
> The secondary cannot calculate C as it doesn't know A.  Instead, the
> secondary must send B to the primary, at which point the primary
> calculates the union of A and B (lets call this D) which is all the
> pages dirtied by both the primary and the secondary, and sends all page
> data covered by D.
> 
> In the general case, D is a superset of both A and B.  Without the
> backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid
> copy of the primary.
> 
> We transfer the dirty bitmap on libxc side, so we need to introduce back
> channel to libxc.
> 
> Note: it is different from the paper. We change the original design to
> the current one, according to our following concerns:
> 1. The original design needs extra memory on Secondary host. When there's
>    multiple backups on one host, the memory cost is high.
> 2. The memory cache code will be another 1k+, it will make the review
>    more time consuming.
> 
> Note: the back channel will be used in the patch

"will not be used" ?

I don't see any read / write to the newly introduced fd.
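
As a side note on the set reasoning in the commit message above, here is
a minimal sketch of what the primary would compute once it has received
B over the back channel. It assumes a dirty bitmap is a plain array of
unsigned long with one bit per pfn; the helper names (set_pfn, test_pfn,
dirty_bitmap_union, dirty_bitmap_missing) are hypothetical and are not
libxc functions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Mark one pfn as dirty in a bitmap (hypothetical helper). */
void set_pfn(unsigned long *bm, size_t pfn)
{
    bm[pfn / BITS_PER_WORD] |= 1UL << (pfn % BITS_PER_WORD);
}

/* Return 1 if the pfn is marked dirty, 0 otherwise. */
int test_pfn(const unsigned long *bm, size_t pfn)
{
    return (int)((bm[pfn / BITS_PER_WORD] >> (pfn % BITS_PER_WORD)) & 1UL);
}

/* D = A | B: every page dirtied by the primary or by the secondary. */
void dirty_bitmap_union(unsigned long *d, const unsigned long *a,
                        const unsigned long *b, size_t nr_words)
{
    assert(d && a && b);
    for (size_t i = 0; i < nr_words; i++)
        d[i] = a[i] | b[i];
}

/*
 * C = B & ~A: pages dirtied only by the secondary. Plain migration would
 * never send these, which is why the primary needs to be told about B.
 */
void dirty_bitmap_missing(unsigned long *c, const unsigned long *a,
                          const unsigned long *b, size_t nr_words)
{
    assert(c && a && b);
    for (size_t i = 0; i < nr_words; i++)
        c[i] = b[i] & ~a[i];
}
```

With A = {1, 2} and B = {2, 5}, C contains only pfn 5 and D contains
pfns 1, 2 and 5. Sending D rather than A is what covers pfn 5.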

>  libxc/restore: send dirty pfn list to primary when checkpoint under COLO
> to send dirty pfn list from secondary to primary. The patch is posted in
> another series.
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> ---
[...]
>  
>  /*----- helper execution -----*/
> +static int dup_fd_helper(libxl__gc *gc, int fd, const char *what)
> +{
> +    int dup_fd = fd;
> +
> +    if (fd <= 2) {
> +        dup_fd = dup(fd);
> +        if (dup_fd < 0) {
> +            LOGE(ERROR,"dup %s", what);
> +            exit(-1);
> +        }
> +    }
> +    libxl_fd_set_cloexec(CTX, dup_fd, 0);
> +
> +    return dup_fd;
> +}
>  

It would be better if the introduction of this helper were split out
into a separate patch.
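
For illustration only, a standalone variant of such a helper, outside
libxl, might look like the sketch below. It is not the code from the
patch: the name fd_for_helper is hypothetical, it returns -1 instead of
calling exit(), and it uses fcntl(F_DUPFD, 3) rather than dup() so that
the duplicate is guaranteed to land above the standard descriptors.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Return a descriptor that refers to the same open file as fd, does not
 * collide with stdin/stdout/stderr, and stays open across exec().
 * Returns -1 on failure with errno set by the failing call.
 */
int fd_for_helper(int fd)
{
    int out = fd;
    int flags;

    assert(fd >= 0);

    if (fd <= 2) {
        /* F_DUPFD picks the lowest free descriptor that is >= 3. */
        out = fcntl(fd, F_DUPFD, 3);
        if (out < 0)
            return -1;
    }

    /* Clear close-on-exec so that the exec'd helper inherits it. */
    flags = fcntl(out, F_GETFD);
    if (flags < 0 || fcntl(out, F_SETFD, flags & ~FD_CLOEXEC) < 0) {
        if (out != fd)
            close(out);
        return -1;
    }

    return out;
}
```

A plain dup() returns the lowest free descriptor, which is only above 2
when descriptors 0 to 2 are all in use, so the F_DUPFD form avoids
relying on that assumption.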

* Re: [PATCH v7 13/18] tools/libxl: rename remus device to checkpoint device
  2016-01-29  5:27 ` [PATCH v7 13/18] tools/libxl: rename remus device to checkpoint device Wen Congyang
@ 2016-02-03 19:40   ` Wei Liu
  0 siblings, 0 replies; 56+ messages in thread
From: Wei Liu @ 2016-02-03 19:40 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Dong Eddie, Gui Jianfeng,
	Shriram Rajagopalan, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:29PM +0800, Wen Congyang wrote:
> This patch is auto generated by the following commands:
>  1. git mv tools/libxl/libxl_remus_device.c tools/libxl/libxl_checkpoint_device.c
>  2. perl -pi -e 's/libxl_remus_device/libxl_checkpoint_device/g' tools/libxl/Makefile
>  3. perl -pi -e 's/\blibxl__remus_devices/libxl__checkpoint_devices/g' tools/libxl/*.[ch]
>  4. perl -pi -e 's/\blibxl__remus_device\b/libxl__checkpoint_device/g' tools/libxl/*.[ch]
>  5. perl -pi -e 's/\blibxl__remus_device_instance_ops\b/libxl__checkpoint_device_instance_ops/g' tools/libxl/*.[ch]
>  6. perl -pi -e 's/\blibxl__remus_callback\b/libxl__checkpoint_callback/g' tools/libxl/*.[ch]
>  7. perl -pi -e 's/\bremus_device_init\b/checkpoint_device_init/g' tools/libxl/*.[ch]
>  8. perl -pi -e 's/\bremus_devices_setup\b/checkpoint_devices_setup/g' tools/libxl/*.[ch]
>  9. perl -pi -e 's/\bdefine_remus_checkpoint_api\b/define_checkpoint_api/g' tools/libxl/*.[ch]
> 10. perl -pi -e 's/\brds\b/cds/g' tools/libxl/*.[ch]
> 11. perl -pi -e 's/REMUS_DEVICE/CHECKPOINT_DEVICE/g' tools/libxl/*.[ch] tools/libxl/*.idl
> 12. perl -pi -e 's/REMUS_DEVOPS/CHECKPOINT_DEVOPS/g' tools/libxl/*.[ch] tools/libxl/*.idl
> 13. perl -pi -e 's/\bremus\b/checkpoint/g' tools/libxl/libxl_checkpoint_device.[ch]
> 14. perl -pi -e 's/\bremus device/checkpoint device/g' tools/libxl/libxl_internal.h
> 15. perl -pi -e 's/\bRemus device/checkpoint device/g' tools/libxl/libxl_internal.h
> 16. perl -pi -e 's/\bremus abstract/checkpoint abstract/g' tools/libxl/libxl_internal.h
> 17. perl -pi -e 's/\bremus invocation/checkpoint invocation/g' tools/libxl/libxl_internal.h
> 18. perl -pi -e 's/\blibxl__remus_device_\(/libxl__checkpoint_device_(/g' tools/libxl/libxl_internal.h
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Reviewed-Lightly-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Only skimmed through:

Acked-by: Wei Liu <wei.liu2@citrix.com>

And please fold your next patch into this one so that it doesn't break
application code.

* Re: [PATCH v7 15/18] tools/libxl: adjust the indentation
  2016-01-29  5:27 ` [PATCH v7 15/18] tools/libxl: adjust the indentation Wen Congyang
@ 2016-02-03 19:40   ` Wei Liu
  0 siblings, 0 replies; 56+ messages in thread
From: Wei Liu @ 2016-02-03 19:40 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Dong Eddie, Gui Jianfeng,
	Shriram Rajagopalan, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:31PM +0800, Wen Congyang wrote:
> This is just tidying up after the "tools/libxl: rename remus device
> to checkpoint device" patch automatic renaming.
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

* Re: [PATCH v7 16/18] tools/libxl: store remus_ops in checkpoint device state
  2016-01-29  5:27 ` [PATCH v7 16/18] tools/libxl: store remus_ops in checkpoint device state Wen Congyang
@ 2016-02-03 19:40   ` Wei Liu
  0 siblings, 0 replies; 56+ messages in thread
From: Wei Liu @ 2016-02-03 19:40 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Dong Eddie, Gui Jianfeng,
	Shriram Rajagopalan, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:32PM +0800, Wen Congyang wrote:
> Checkpoint device is an abstract layer to do checkpoint.
> COLO can also use it to do checkpoint. But there are
> still some codes in checkpoint device which touch remus.
> 
> This patch and:
>  tools/libxl: move remus state into a seperate structure
>  tools/libxl: seperate device init/cleanup from checkpoint device layer
> will seperate remus from checkpoint device layer.
> 
> We use remus ops directly in checkpoint device. Store it
> in checkpoint device state so that we do not aware of
> remus_ops in the checkpoint device layer.
> 
> It is pure refactoring and no functional changes.
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Acked-by:Ian Campbell <ian.campbell@citrix.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

* Re: [PATCH v7 17/18] tools/libxl: move remus state into a seperate structure
  2016-01-29  5:27 ` [PATCH v7 17/18] tools/libxl: move remus state into a seperate structure Wen Congyang
@ 2016-02-03 19:41   ` Wei Liu
  0 siblings, 0 replies; 56+ messages in thread
From: Wei Liu @ 2016-02-03 19:41 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Dong Eddie, Gui Jianfeng,
	Shriram Rajagopalan, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:33PM +0800, Wen Congyang wrote:
> Add a new structure remus state, and move concrete layer's private
> member to remus state.
> it is pure refactoring and no functional changes.
> Init interval in libxl__remus_setup(). It is safe to move this initialisation,
> because this value is only used for remus, and remus will use this value after
> libxl__remus_setup().
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

I only skimmed through this patch. I think it mostly touches remus state
and does what it says it does. So

Acked-by: Wei Liu <wei.liu2@citrix.com>

* Re: [PATCH v7 18/18] tools/libxl: seperate device init/cleanup from checkpoint device layer
  2016-01-29  5:27 ` [PATCH v7 18/18] tools/libxl: seperate device init/cleanup from checkpoint device layer Wen Congyang
@ 2016-02-03 19:41   ` Wei Liu
  0 siblings, 0 replies; 56+ messages in thread
From: Wei Liu @ 2016-02-03 19:41 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Dong Eddie, Gui Jianfeng,
	Shriram Rajagopalan, Yang Hongyang

On Fri, Jan 29, 2016 at 01:27:34PM +0800, Wen Congyang wrote:
> we call (init|cleanup)_subkind_nic and (init|cleanup)_subkind_drbd_disk
> directly in checkpoint device. Move them to libxl_remus.c, Call them before
> calling libxl__checkpoint_devices_setup() or after calling
> libxl__checkpoint_devices_teardown().
> it is pure refactoring and no functional changes.
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

* Re: [PATCH v7 01/18] libxl/remus: init checkpoint_callback in Remus setup callback
  2016-02-03 19:39   ` Wei Liu
@ 2016-02-04  5:17     ` Wen Congyang
  0 siblings, 0 replies; 56+ messages in thread
From: Wen Congyang @ 2016-02-04  5:17 UTC (permalink / raw)
  To: Wei Liu
  Cc: Lars Kurth, Changlong Xie, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Dong Eddie, Gui Jianfeng,
	Shriram Rajagopalan, Yang Hongyang

On 02/04/2016 03:39 AM, Wei Liu wrote:
> On Fri, Jan 29, 2016 at 01:27:17PM +0800, Wen Congyang wrote:
>> init stream {read/write} state checkpoint_callback in Remus setup callback.
>> There's no functional change, it's just refactoring so that we can move
>> all remus code into one file.
>>
>> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> ---
>>  tools/libxl/libxl.c          |  2 ++
>>  tools/libxl/libxl_create.c   | 10 +++++++++-
>>  tools/libxl/libxl_dom.c      |  5 +----
>>  tools/libxl/libxl_internal.h |  4 ++++
>>  4 files changed, 16 insertions(+), 5 deletions(-)
>>
>> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
>> index 94b5656..5346a0c 100644
>> --- a/tools/libxl/libxl.c
>> +++ b/tools/libxl/libxl.c
>> @@ -917,6 +917,8 @@ static void libxl__remus_setup(libxl__egc *egc,
>>      rds->domid = dss->domid;
>>      rds->callback = remus_setup_done;
>>  
>> +    dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
>> +
>>      libxl__remus_devices_setup(egc, rds);
>>      return;
>>  
>> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
>> index e491d83..8b1efe5 100644
>> --- a/tools/libxl/libxl_create.c
>> +++ b/tools/libxl/libxl_create.c
>> @@ -718,6 +718,12 @@ static void remus_checkpoint_stream_done(
>>      libxl__xc_domain_saverestore_async_callback_done(egc, &stream->shs, rc);
>>  }
>>  
>> +static void libxl__remus_restore_setup(libxl__egc *egc,
>> +                                       libxl__domain_create_state *dcs)
>> +{
>> +    dcs->srs.checkpoint_callback = remus_checkpoint_stream_done;
>> +}
>> +
>>  /*----- main domain creation -----*/
>>  
>>  /* We have a linear control flow; only one event callback is
>> @@ -1004,6 +1010,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
>>      libxl__domain_build_state *const state = &dcs->build_state;
>>      libxl__srm_restore_autogen_callbacks *const callbacks =
>>          &dcs->srs.shs.callbacks.restore.a;
>> +    const int checkpointed_stream = dcs->restore_params.checkpointed_stream;
>>  
>>      if (rc) {
>>          domcreate_rebuild_done(egc, dcs, rc);
>> @@ -1042,9 +1049,10 @@ static void domcreate_bootloader_done(libxl__egc *egc,
> 
> A few lines above in this function, there is a line like:
> 
>     /* Restore */
>     callbacks->checkpoint = libxl__remus_domain_restore_checkpoint_callback;
> 
> Do you not need to move this into libxl__remus_restore_setup as well? As
> far as I can tell that's only useful for remus.
> 
>>      dcs->srs.fd = restore_fd;
>>      dcs->srs.legacy = (dcs->restore_params.stream_version == 1);
>>      dcs->srs.completion_callback = domcreate_stream_done;
>> -    dcs->srs.checkpoint_callback = remus_checkpoint_stream_done;
>>  
>>      if (restore_fd >= 0) {
>> +        if (checkpointed_stream)
>> +            libxl__remus_restore_setup(egc, dcs);
>>          libxl__stream_read_start(egc, &dcs->srs);
>>          return;
>>      }
>> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
>> index 2269998..9e28bc4 100644
>> --- a/tools/libxl/libxl_dom.c
>> +++ b/tools/libxl/libxl_dom.c
>> @@ -1569,8 +1569,6 @@ out:
>>  
>>  /*----- remus asynchronous checkpoint callback -----*/
>>  
>> -static void remus_checkpoint_stream_written(
>> -    libxl__egc *egc, libxl__stream_write_state *sws, int rc);
>>  static void remus_devices_commit_cb(libxl__egc *egc,
>>                                      libxl__remus_devices_state *rds,
>>                                      int rc);
>> @@ -1588,7 +1586,7 @@ static void libxl__remus_domain_save_checkpoint_callback(void *data)
>>      libxl__stream_write_start_checkpoint(egc, &dss->sws);
>>  }
>>  
>> -static void remus_checkpoint_stream_written(
>> +void remus_checkpoint_stream_written(
>>      libxl__egc *egc, libxl__stream_write_state *sws, int rc)
>>  {
>>      libxl__domain_suspend_state *dss = CONTAINER_OF(sws, *dss, sws);
>> @@ -1761,7 +1759,6 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss)
>>          callbacks->suspend = libxl__remus_domain_suspend_callback;
>>          callbacks->postcopy = libxl__remus_domain_resume_callback;
>>          callbacks->checkpoint = libxl__remus_domain_save_checkpoint_callback;
> 
> Do you not want to move this to libxl__remus_setup?

I think so, and will fix these two in the next version.

Thanks
Wen Congyang

> 
> 
> Wei.
> 
> 
> .
> 

* Re: [PATCH v7 07/18] migration/save: pass checkpointed_stream from libxl to libxc
  2016-02-03 19:40   ` Wei Liu
@ 2016-02-04  5:18     ` Wen Congyang
  0 siblings, 0 replies; 56+ messages in thread
From: Wen Congyang @ 2016-02-04  5:18 UTC (permalink / raw)
  To: Wei Liu
  Cc: Lars Kurth, Changlong Xie, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Dong Eddie, Gui Jianfeng,
	Shriram Rajagopalan, Yang Hongyang

On 02/04/2016 03:40 AM, Wei Liu wrote:
> On Fri, Jan 29, 2016 at 01:27:23PM +0800, Wen Congyang wrote:
>> Pass checkpointed_stream from libxl to libxc.
>> It won't affact legacy migration because legacy migration
>> won't use this param.
>>
>> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> Acked-by: Wei Liu <wei.liu2@citrix.com>
> 
> With one nit below.
> 
>>  
>> -    if ( ctx->save.debug && !ctx->save.checkpointed )
>> +    if ( ctx->save.debug &&
>> +         ctx->save.checkpointed != MIG_STREAM_NONE )
> 
> You can fold this line to previous one.

Will fix it in the next version.

Thanks
Wen Congyang

> 
> 
> .
> 

* Re: [PATCH v7 08/18] tools/libxl: introduce libxl__domain_restore_device_model to load qemu state
  2016-02-03 19:40   ` Wei Liu
@ 2016-02-04  5:24     ` Wen Congyang
  2016-02-04  9:41       ` Wei Liu
  0 siblings, 1 reply; 56+ messages in thread
From: Wen Congyang @ 2016-02-04  5:24 UTC (permalink / raw)
  To: Wei Liu
  Cc: Lars Kurth, Changlong Xie, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Shriram Rajagopalan,
	Dong Eddie, Gui Jianfeng, Anthony Perard, Yang Hongyang

On 02/04/2016 03:40 AM, Wei Liu wrote:
> On Fri, Jan 29, 2016 at 01:27:24PM +0800, Wen Congyang wrote:
>> In normal migration, the qemu state is passed to qemu as a parameter.
>> With COLO, secondary vm is running. So we will do the following steps
>> at every checkpoint:
>> 1. suspend both primary vm and secondary vm
>> 2. sync the state
>> 3. resume both primary vm and secondary vm
>> Primary will send qemu's state in step2, and secondary's qemu should
>> read it and restore the state before it is resumed. We can not pass
>> the state to qemu as a parameter because secondary QEMU already started
>> at this point, so we introduce libxl__domain_restore_device_model() to
>> do it. This API MUST be called before resuming secondary vm.
>>
>> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Cc: Anthony Perard <anthony.perard@citrix.com>
>> ---
>>  tools/libxl/libxl_dom_save.c | 20 ++++++++++++++++++++
>>  tools/libxl/libxl_internal.h |  4 ++++
>>  tools/libxl/libxl_qmp.c      | 10 ++++++++++
>>  3 files changed, 34 insertions(+)
>>
>> diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
>> index cd2e7de..7383d2d 100644
>> --- a/tools/libxl/libxl_dom_save.c
>> +++ b/tools/libxl/libxl_dom_save.c
>> @@ -518,6 +518,26 @@ int libxl__restore_emulator_xenstore_data(libxl__domain_create_state *dcs,
>>      return rc;
>>  }
>>  
>> +int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid,
>> +                                       const char *restore_file)
>> +{
>> +    int rc;
>> +
>> +    switch (libxl__device_model_version_running(gc, domid)) {
>> +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
>> +        /* Will never be supported. */
>> +        rc = ERROR_INVAL;
>> +        break;
> 
> I'm not entirely sure if this statement would be true. The function name
> is generic enough to indicate this case should be supported.
> 
> However, this function is not used anywhere in this series, so I don't
> know whether my comment makes sense.
> 
> One way of moving forward is to stick this patch to COLO series itself.
> Let's skip this in this prerequisite series.

OK, I will put it in the COLO series itself.
This API is used for COLO, and COLO requires the newest qemu with block
replication. Block replication is still on the way. The traditional qemu
doesn't support block replication, and it is hard to backport it.

> 
>> +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
>> +        rc = libxl__qmp_restore(gc, domid, restore_file);
>> +        break;
>> +    default:
>> +        rc = ERROR_INVAL;
>> +    }
>> +
>> +    return rc;
>> +}
>> +
>>  /*
>>   * Local variables:
>>   * mode: C
>> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
>> index fbd1acb..896c119 100644
>> --- a/tools/libxl/libxl_internal.h
>> +++ b/tools/libxl/libxl_internal.h
>> @@ -1117,6 +1117,8 @@ _hidden int libxl__domain_rename(libxl__gc *gc, uint32_t domid,
>>                                   const char *old_name, const char *new_name,
>>                                   xs_transaction_t trans);
>>  
>> +_hidden int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid,
>> +                                               const char *restore_file);
>>  _hidden int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid);
>>  
>>  _hidden const char *libxl__userdata_path(libxl__gc *gc, uint32_t domid,
>> @@ -1760,6 +1762,8 @@ _hidden int libxl__qmp_stop(libxl__gc *gc, int domid);
>>  _hidden int libxl__qmp_resume(libxl__gc *gc, int domid);
>>  /* Save current QEMU state into fd. */
>>  _hidden int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename);
>> +/* Load current QEMU state from fd. */
> 
> This comment is wrong, it loads QEMU state from file, not fd.

Will fix it in the next version.

Thanks
Wen Congyang

> 
>> +_hidden int libxl__qmp_restore(libxl__gc *gc, int domid, const char *filename);
>>  /* Set dirty bitmap logging status */
>>  _hidden int libxl__qmp_set_global_dirty_log(libxl__gc *gc, int domid, bool enable);
>>  _hidden int libxl__qmp_insert_cdrom(libxl__gc *gc, int domid, const libxl_device_disk *disk);
>> diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
>> index 714038b..eec8a44 100644
>> --- a/tools/libxl/libxl_qmp.c
>> +++ b/tools/libxl/libxl_qmp.c
>> @@ -905,6 +905,16 @@ int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename)
>>                             NULL, NULL);
>>  }
>>  
>> +int libxl__qmp_restore(libxl__gc *gc, int domid, const char *state_file)
>> +{
>> +    libxl__json_object *args = NULL;
>> +
>> +    qmp_parameters_add_string(gc, &args, "filename", state_file);
>> +
>> +    return qmp_run_command(gc, domid, "xen-load-devices-state", args,
>> +                           NULL, NULL);
>> +}
>> +
> 
> This looks correct FWIW.
> 
>>  static int qmp_change(libxl__gc *gc, libxl__qmp_handler *qmp,
>>                        char *device, char *target, char *arg)
>>  {
>> -- 
>> 2.5.0
>>
>>
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
> 
> 
> .
> 

* Re: [PATCH v7 12/18] tools/libx{l, c}: add back channel to libxc
  2016-02-03 19:40   ` Wei Liu
@ 2016-02-04  5:28     ` Wen Congyang
  2016-02-04  9:25       ` Wei Liu
  0 siblings, 1 reply; 56+ messages in thread
From: Wen Congyang @ 2016-02-04  5:28 UTC (permalink / raw)
  To: Wei Liu
  Cc: Lars Kurth, Changlong Xie, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Dong Eddie, xen devel, Gui Jianfeng,
	Shriram Rajagopalan, Ian Jackson, Yang Hongyang

On 02/04/2016 03:40 AM, Wei Liu wrote:
> On Fri, Jan 29, 2016 at 01:27:28PM +0800, Wen Congyang wrote:
>> In COLO mode, both VMs are running, and are considered in sync if the
>> visible network traffic is identical.  After some time, they fall out of
>> sync.
>>
>> At this point, the two VMs have definitely diverged.  Lets call the
>> primary dirty bitmap set A, while the secondary dirty bitmap set B.
>>
>> Sets A and B are different.
>>
>> Under normal migration, the page data for set A will be sent from the
>> primary to the secondary.
>>
>> However, the set difference B - A (the one in B but not in A, lets
>> call this C) is out-of-date on the secondary (with respect to the
>> primary) and will not be sent by the primary (to secondary), as it
>> was not memory dirtied by the primary. The secondary needs C page data
>> to reconstruct an exact copy of the primary at the checkpoint.
>>
>> The secondary cannot calculate C as it doesn't know A.  Instead, the
>> secondary must send B to the primary, at which point the primary
>> calculates the union of A and B (lets call this D) which is all the
>> pages dirtied by both the primary and the secondary, and sends all page
>> data covered by D.
>>
>> In the general case, D is a superset of both A and B.  Without the
>> backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid
>> copy of the primary.
>>
>> We transfer the dirty bitmap on libxc side, so we need to introduce back
>> channel to libxc.
>>
>> Note: it is different from the paper. We change the original design to
>> the current one, according to our following concerns:
>> 1. The original design needs extra memory on Secondary host. When there's
>>    multiple backups on one host, the memory cost is high.
>> 2. The memory cache code will be another 1k+, it will make the review
>>    more time consuming.
>>
>> Note: the back channel will be used in the patch
> 
> "will not be used" ?
> 
> I don't see any read / write to the newly introduced fd.

It is used in the COLO series.

Some patches in this series just introduce an API. These APIs will be used
in the COLO series. Do you mean that these patches should be put in the
COLO series? If so, I will check all patches.

> 
>>  libxc/restore: send dirty pfn list to primary when checkpoint under COLO
>> to send dirty pfn list from secondary to primary. The patch is posted in
>> another series.
>>
>> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> ---
> [...]
>>  
>>  /*----- helper execution -----*/
>> +static int dup_fd_helper(libxl__gc *gc, int fd, const char *what)
>> +{
>> +    int dup_fd = fd;
>> +
>> +    if (fd <= 2) {
>> +        dup_fd = dup(fd);
>> +        if (dup_fd < 0) {
>> +            LOGE(ERROR,"dup %s", what);
>> +            exit(-1);
>> +        }
>> +    }
>> +    libxl_fd_set_cloexec(CTX, dup_fd, 0);
>> +
>> +    return dup_fd;
>> +}
>>  
> 
> It would be better if introduction of this helper to be separated into a
> different patch.

OK, will fix it in the next version.

Thanks
Wen Congyang

> 
> 
> .
> 


* Re: [PATCH v7 05/18] tools/libxc: support to resume uncooperative HVM guests
  2016-02-03 19:40   ` Wei Liu
@ 2016-02-04  5:30     ` Wen Congyang
  0 siblings, 0 replies; 56+ messages in thread
From: Wen Congyang @ 2016-02-04  5:30 UTC (permalink / raw)
  To: Wei Liu
  Cc: Lars Kurth, Changlong Xie, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Dong Eddie, Gui Jianfeng,
	Shriram Rajagopalan, Yang Hongyang

On 02/04/2016 03:40 AM, Wei Liu wrote:
> On Fri, Jan 29, 2016 at 01:27:21PM +0800, Wen Congyang wrote:
>> Before this patch:
>> 1. suspend
>> a. PVHVM and PV: we use the same way to suspend the guest (send the suspend
>>    request to the guest). If the guest doesn't support evtchn, the xenstore
>>    variant will be used, suspending the guest via XenBus control node.
>> b. pure HVM: we call xc_domain_shutdown(..., SHUTDOWN_suspend) to suspend
>>    the guest
>>
>> 2. Resume:
>> a. fast path(fast=1)
>>    Do not change the guest state. We call libxl__domain_resume(.., 1) which
>>    calls xc_domain_resume(..., 1 /* fast=1*/) to resume the guest.
>>    PV:       modify the return code to 1, and than call the domctl:
>>              XEN_DOMCTL_resumedomain
>>    PVHVM:    same with PV
>>    pure HVM: do nothing in modify_returncode, and than call the domctl:
> 
> "then"
> 
>>              XEN_DOMCTL_resumedomain
>> b. slow
>>    Used when the guest's state have been changed. Will call
>>    libxl__domain_resume(..., 0) to resume the guest.
>>    PV:       update start info, and reset all secondary CPU states. Than call
>>              the domctl: XEN_DOMCTL_resumedomain
>>    PVHVM:    can not be resumed. You will get the following error message:
>>                  "Cannot resume uncooperative HVM guests"
>>    purt HVM: same with PVHVM
> 
> "pure"
> 
>>
>> After this patch:
>> 1. suspend
>>    unchanged
>>
>> 2. Resume
>> a. fast path:
>>    unchanged
>> b. slow
>>    PV:       unchanged
>>    PVHVM:    call XEN_DOMCTL_resumedomain to resume the guest. Because we
>>              don't modify the return code, the PV driver will disconnect
>>              and reconnect.
>>              The guest ends up doing the XENMAPSPACE_shared_info
>>              XENMEM_add_to_physmap hypercall and resetting all of its CPU
>>              states to point to the shared_info(well except the ones past 32).
>>              That is the Linux kernel does that - regardless whether the
>>              SCHEDOP_shutdown:SHUTDOWN_suspend returns 1 or not.
>>    Pure HVM: call XEN_DOMCTL_resumedomain to resume the guest.
> 
> In summary, this patch only changes slow path resume. Furthermore, it
> only affects PVHVM and pure HVM variants.
> 
> With your patch, pure HVM is able to resume with effectively the same
> path via XEN_DOMCTL_resumedomain, albeit it is done in two functions
> (_cooperative and _any).
> 
> And according to the recent change in documentation, slow path is
> always safe.
> 
> I think the commit message can be simplified a bit. This is assuming
> using XEN_DOMCTL_resumedomain to resume (PV)HVM in slow path is safe.
> 
> ===
> 
> Use XEN_DOMCTL_resumedomain to resume (PV)HVM guest in slow path
> 
> Previously it was not possible to resume PVHVM or pure HVM guest in slow
> path because libxc didn't support that.
> 
> Using XEN_DOMCTL_resumedomain without modifying guest state  to resume a
> guest is considered to be always safe.  Introduce a function to do that
> for (PV)HVM guests in slow path resume.
> 
> This patch fixes a bug that denies (PV)HVM slow path resume.  This will
> enable COLO to work properly:  COLO requires HVM guest to start in the
> new context that has been set up by COLO, hence slow path resume is
> required.
> 
> ===
> 
> Does this sound right? Especially the wording about safety.

It sounds right.

Thanks
Wen Congyang

> 
> Ian and Ian, you seemed to have suggested Congyang to write the above
> commit message. What do you think about my updated one?
> 
>>
>> Under COLO, we will update the guest's state(modify memory, cpu's registers,
>> device status...). In this case, we cannot use the fast path to resume it.
>> Keep the return code 0, and use a slow path to resume the guest. While
>> resuming HVM using slow path is not supported currently, this patch is to
>> make the resume call to not fail.
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
>> ---
>>  tools/libxc/xc_resume.c | 25 +++++++++++++++++++++----
>>  1 file changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
>> index 87d4324..4a9b035 100644
>> --- a/tools/libxc/xc_resume.c
>> +++ b/tools/libxc/xc_resume.c
>> @@ -108,6 +108,26 @@ static int xc_domain_resume_cooperative(xc_interface *xch, uint32_t domid)
>>      return do_domctl(xch, &domctl);
>>  }
>>  
>> +static int xc_domain_resume_hvm(xc_interface *xch, uint32_t domid)
>> +{
>> +    DECLARE_DOMCTL;
>> +
>> +    /*
>> +     * The domctl XEN_DOMCTL_resumedomain unpause each vcpu. After
>> +     * the domctl, the guest will run.
>> +     *
>> +     * If it is PVHVM, the guest called the hypercall
>> +     *    SCHEDOP_shutdown:SHUTDOWN_suspend
>> +     * to suspend itself. We don't modify the return code, so the PV driver
>> +     * will disconnect and reconnect.
>> +     *
>> +     * If it is a HVM, the guest will continue running.
>> +     */
>> +    domctl.cmd = XEN_DOMCTL_resumedomain;
>> +    domctl.domain = domid;
>> +    return do_domctl(xch, &domctl);
>> +}
>> +
>>  static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
>>  {
>>      DECLARE_DOMCTL;
>> @@ -137,10 +157,7 @@ static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
>>       */
>>  #if defined(__i386__) || defined(__x86_64__)
>>      if ( info.hvm )
>> -    {
>> -        ERROR("Cannot resume uncooperative HVM guests");
>> -        return rc;
>> -    }
>> +        return xc_domain_resume_hvm(xch, domid);
>>  
>>      if ( xc_domain_get_guest_width(xch, domid, &dinfo->guest_width) != 0 )
>>      {
>> -- 
>> 2.5.0
> 
> 
> .
> 


* Re: [PATCH v7 12/18] tools/libx{l, c}: add back channel to libxc
  2016-02-04  5:28     ` Wen Congyang
@ 2016-02-04  9:25       ` Wei Liu
  0 siblings, 0 replies; 56+ messages in thread
From: Wei Liu @ 2016-02-04  9:25 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Dong Eddie, Gui Jianfeng,
	Shriram Rajagopalan, Yang Hongyang

On Thu, Feb 04, 2016 at 01:28:14PM +0800, Wen Congyang wrote:
> On 02/04/2016 03:40 AM, Wei Liu wrote:
> > On Fri, Jan 29, 2016 at 01:27:28PM +0800, Wen Congyang wrote:
> >> In COLO mode, both VMs are running, and are considered in sync if the
> >> visible network traffic is identical.  After some time, they fall out of
> >> sync.
> >>
> >> At this point, the two VMs have definitely diverged.  Lets call the
> >> primary dirty bitmap set A, while the secondary dirty bitmap set B.
> >>
> >> Sets A and B are different.
> >>
> >> Under normal migration, the page data for set A will be sent from the
> >> primary to the secondary.
> >>
> >> However, the set difference B - A (the one in B but not in A, lets
> >> call this C) is out-of-date on the secondary (with respect to the
> >> primary) and will not be sent by the primary (to secondary), as it
> >> was not memory dirtied by the primary. The secondary needs C page data
> >> to reconstruct an exact copy of the primary at the checkpoint.
> >>
> >> The secondary cannot calculate C as it doesn't know A.  Instead, the
> >> secondary must send B to the primary, at which point the primary
> >> calculates the union of A and B (lets call this D) which is all the
> >> pages dirtied by both the primary and the secondary, and sends all page
> >> data covered by D.
> >>
> >> In the general case, D is a superset of both A and B.  Without the
> >> backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid
> >> copy of the primary.
> >>
> >> We transfer the dirty bitmap on libxc side, so we need to introduce back
> >> channel to libxc.
> >>
> >> Note: it is different from the paper. We change the original design to
> >> the current one, according to our following concerns:
> >> 1. The original design needs extra memory on Secondary host. When there's
> >>    multiple backups on one host, the memory cost is high.
> >> 2. The memory cache code will be another 1k+, it will make the review
> >>    more time consuming.
> >>
> >> Note: the back channel will be used in the patch
> > 
> > "will not be used" ?
> > 
> > I don't see any read / write to the newly introduced fd.
> 
> It is used in the COLO series.
> 
> Some patches in this series just introduce an API. These APIs will be used
> in the COLO series. Do you mean that these patches should be put in the
> COLO series? If so, I will check all patches.
> 

Fine by me.

Wei.


* Re: [PATCH v7 08/18] tools/libxl: introduce libxl__domain_restore_device_model to load qemu state
  2016-02-04  5:24     ` Wen Congyang
@ 2016-02-04  9:41       ` Wei Liu
  2016-02-04  9:46         ` Wei Liu
  0 siblings, 1 reply; 56+ messages in thread
From: Wei Liu @ 2016-02-04  9:41 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Anthony Perard,
	Dong Eddie, Gui Jianfeng, Shriram Rajagopalan, Yang Hongyang

On Thu, Feb 04, 2016 at 01:24:41PM +0800, Wen Congyang wrote:
> On 02/04/2016 03:40 AM, Wei Liu wrote:
> > On Fri, Jan 29, 2016 at 01:27:24PM +0800, Wen Congyang wrote:
> >> In normal migration, the qemu state is passed to qemu as a parameter.
> >> With COLO, secondary vm is running. So we will do the following steps
> >> at every checkpoint:
> >> 1. suspend both primary vm and secondary vm
> >> 2. sync the state
> >> 3. resume both primary vm and secondary vm
> >> Primary will send qemu's state in step2, and secondary's qemu should
> >> read it and restore the state before it is resumed. We can not pass
> >> the state to qemu as a parameter because secondary QEMU already started
> >> at this point, so we introduce libxl__domain_restore_device_model() to
> >> do it. This API MUST be called before resuming secondary vm.
> >>
> >> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> >> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> >> Cc: Anthony Perard <anthony.perard@citrix.com>
> >> ---
> >>  tools/libxl/libxl_dom_save.c | 20 ++++++++++++++++++++
> >>  tools/libxl/libxl_internal.h |  4 ++++
> >>  tools/libxl/libxl_qmp.c      | 10 ++++++++++
> >>  3 files changed, 34 insertions(+)
> >>
> >> diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
> >> index cd2e7de..7383d2d 100644
> >> --- a/tools/libxl/libxl_dom_save.c
> >> +++ b/tools/libxl/libxl_dom_save.c
> >> @@ -518,6 +518,26 @@ int libxl__restore_emulator_xenstore_data(libxl__domain_create_state *dcs,
> >>      return rc;
> >>  }
> >>  
> >> +int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid,
> >> +                                       const char *restore_file)
> >> +{
> >> +    int rc;
> >> +
> >> +    switch (libxl__device_model_version_running(gc, domid)) {
> >> +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
> >> +        /* Will never be supported. */
> >> +        rc = ERROR_INVAL;
> >> +        break;
> > 
> > I'm not entirely sure if this statement would be true. The function name
> > is generic enough to indicate this case should be supported.
> > 
> > However, this function is not used anywhere in this series, so I don't
> > know whether my comment makes sense.
> > 
> > One way of moving forward is to stick this patch to COLO series itself.
> > Let's skip this in this prerequisite series.
> 
> OK, I will put it in the COLO series itself.
> This API is used for COLO, and COLO requires the newest qemu with block replication.
> The block replication is still in the way. The traditional qemu doesn't support
> block replication and it is hard to backport it.
> 

OK. I'm asking you to support qemu-trad in COLO -- definitely not.

What I was getting at was: this function has a very generic name, which
suggests it will be used to consolidate some code inside libxl. I was
confused because it didn't handle the qemu-trad path (after all,
restoring device model is something you can do with qemu-trad).

As discussed, this patch will be moved to COLO series, let's discuss
that when you post this again.

Wei.


* Re: [PATCH v7 08/18] tools/libxl: introduce libxl__domain_restore_device_model to load qemu state
  2016-02-04  9:41       ` Wei Liu
@ 2016-02-04  9:46         ` Wei Liu
  0 siblings, 0 replies; 56+ messages in thread
From: Wei Liu @ 2016-02-04  9:46 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen devel, Anthony Perard,
	Dong Eddie, Gui Jianfeng, Shriram Rajagopalan, Yang Hongyang

On Thu, Feb 04, 2016 at 09:41:54AM +0000, Wei Liu wrote:
> On Thu, Feb 04, 2016 at 01:24:41PM +0800, Wen Congyang wrote:
> > On 02/04/2016 03:40 AM, Wei Liu wrote:
> > > On Fri, Jan 29, 2016 at 01:27:24PM +0800, Wen Congyang wrote:
> > >> In normal migration, the qemu state is passed to qemu as a parameter.
> > >> With COLO, secondary vm is running. So we will do the following steps
> > >> at every checkpoint:
> > >> 1. suspend both primary vm and secondary vm
> > >> 2. sync the state
> > >> 3. resume both primary vm and secondary vm
> > >> Primary will send qemu's state in step2, and secondary's qemu should
> > >> read it and restore the state before it is resumed. We can not pass
> > >> the state to qemu as a parameter because secondary QEMU already started
> > >> at this point, so we introduce libxl__domain_restore_device_model() to
> > >> do it. This API MUST be called before resuming secondary vm.
> > >>
> > >> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> > >> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> > >> Cc: Anthony Perard <anthony.perard@citrix.com>
> > >> ---
> > >>  tools/libxl/libxl_dom_save.c | 20 ++++++++++++++++++++
> > >>  tools/libxl/libxl_internal.h |  4 ++++
> > >>  tools/libxl/libxl_qmp.c      | 10 ++++++++++
> > >>  3 files changed, 34 insertions(+)
> > >>
> > >> diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
> > >> index cd2e7de..7383d2d 100644
> > >> --- a/tools/libxl/libxl_dom_save.c
> > >> +++ b/tools/libxl/libxl_dom_save.c
> > >> @@ -518,6 +518,26 @@ int libxl__restore_emulator_xenstore_data(libxl__domain_create_state *dcs,
> > >>      return rc;
> > >>  }
> > >>  
> > >> +int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid,
> > >> +                                       const char *restore_file)
> > >> +{
> > >> +    int rc;
> > >> +
> > >> +    switch (libxl__device_model_version_running(gc, domid)) {
> > >> +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
> > >> +        /* Will never be supported. */
> > >> +        rc = ERROR_INVAL;
> > >> +        break;
> > > 
> > > I'm not entirely sure if this statement would be true. The function name
> > > is generic enough to indicate this case should be supported.
> > > 
> > > However, this function is not used anywhere in this series, so I don't
> > > know whether my comment makes sense.
> > > 
> > > One way of moving forward is to stick this patch to COLO series itself.
> > > Let's skip this in this prerequisite series.
> > 
> > OK, I will put it in the COLO series itself.
> > This API is used for COLO, and COLO requires the newest qemu with block replication.
> > The block replication is still in the way. The traditional qemu doesn't support
> > block replication and it is hard to backport it.
> > 
> 
> OK. I'm asking you to support qemu-trad in COLO -- definitely not.
         ^ NOT

(Not enough caffeine in the morning, sorry.)


Thread overview: 56+ messages
2016-01-29  5:27 [PATCH v7 00/18] Prerequisite patches for COLO Wen Congyang
2016-01-29  5:27 ` [PATCH v7 01/18] libxl/remus: init checkpoint_callback in Remus setup callback Wen Congyang
2016-02-03 19:39   ` Wei Liu
2016-02-04  5:17     ` Wen Congyang
2016-01-29  5:27 ` [PATCH v7 02/18] tools/libxl: move remus code into libxl_remus.c Wen Congyang
2016-01-29 16:29   ` Konrad Rzeszutek Wilk
2016-02-03 19:39   ` Wei Liu
2016-01-29  5:27 ` [PATCH v7 03/18] tools/libxl: move save/restore code into libxl_dom_save.c Wen Congyang
2016-01-29 16:30   ` Konrad Rzeszutek Wilk
2016-02-03 19:39   ` Wei Liu
2016-01-29  5:27 ` [PATCH v7 04/18] libxl/save: Refactor libxl__domain_suspend_state Wen Congyang
2016-01-29 16:31   ` Konrad Rzeszutek Wilk
2016-02-03 19:39   ` Wei Liu
2016-01-29  5:27 ` [PATCH v7 05/18] tools/libxc: support to resume uncooperative HVM guests Wen Congyang
2016-01-29 16:30   ` Konrad Rzeszutek Wilk
2016-02-03 19:40   ` Wei Liu
2016-02-04  5:30     ` Wen Congyang
2016-01-29  5:27 ` [PATCH v7 06/18] tools/libxl: introduce enum type libxl_checkpointed_stream Wen Congyang
2016-01-29 16:34   ` Konrad Rzeszutek Wilk
2016-02-03 19:40   ` Wei Liu
2016-01-29  5:27 ` [PATCH v7 07/18] migration/save: pass checkpointed_stream from libxl to libxc Wen Congyang
2016-01-29 16:35   ` Konrad Rzeszutek Wilk
2016-02-03 19:40   ` Wei Liu
2016-02-04  5:18     ` Wen Congyang
2016-01-29  5:27 ` [PATCH v7 08/18] tools/libxl: introduce libxl__domain_restore_device_model to load qemu state Wen Congyang
2016-01-29 16:34   ` Konrad Rzeszutek Wilk
2016-02-03 19:40   ` Wei Liu
2016-02-04  5:24     ` Wen Congyang
2016-02-04  9:41       ` Wei Liu
2016-02-04  9:46         ` Wei Liu
2016-01-29  5:27 ` [PATCH v7 09/18] tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty() Wen Congyang
2016-01-29 16:34   ` Konrad Rzeszutek Wilk
2016-02-03 19:40   ` Wei Liu
2016-01-29  5:27 ` [PATCH v7 10/18] tools/libxl: export logdirty_init Wen Congyang
2016-02-03 19:40   ` Wei Liu
2016-01-29  5:27 ` [PATCH v7 11/18] tools/libxl: Add back channel to allow migration target send data back Wen Congyang
2016-02-03 19:40   ` Wei Liu
2016-01-29  5:27 ` [PATCH v7 12/18] tools/libx{l, c}: add back channel to libxc Wen Congyang
2016-01-29 16:38   ` Konrad Rzeszutek Wilk
2016-02-01  5:39     ` Wen Congyang
2016-02-03 19:40   ` Wei Liu
2016-02-04  5:28     ` Wen Congyang
2016-02-04  9:25       ` Wei Liu
2016-01-29  5:27 ` [PATCH v7 13/18] tools/libxl: rename remus device to checkpoint device Wen Congyang
2016-02-03 19:40   ` Wei Liu
2016-01-29  5:27 ` [PATCH v7 14/18] tools/libxl: fix backword compatibility after the automatic renaming Wen Congyang
2016-01-29 16:32   ` Konrad Rzeszutek Wilk
2016-01-29  5:27 ` [PATCH v7 15/18] tools/libxl: adjust the indentation Wen Congyang
2016-02-03 19:40   ` Wei Liu
2016-01-29  5:27 ` [PATCH v7 16/18] tools/libxl: store remus_ops in checkpoint device state Wen Congyang
2016-02-03 19:40   ` Wei Liu
2016-01-29  5:27 ` [PATCH v7 17/18] tools/libxl: move remus state into a seperate structure Wen Congyang
2016-02-03 19:41   ` Wei Liu
2016-01-29  5:27 ` [PATCH v7 18/18] tools/libxl: seperate device init/cleanup from checkpoint device layer Wen Congyang
2016-02-03 19:41   ` Wei Liu
2016-01-29 16:43 ` [PATCH v7 00/18] Prerequisite patches for COLO Konrad Rzeszutek Wilk
