xen-devel.lists.xenproject.org archive mirror
* [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
@ 2016-03-25  6:44 Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 01/26] tools/libxl: introduction of libxl__qmp_restore to load qemu state Changlong Xie
                   ` (28 more replies)
  0 siblings, 29 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

This patchset implements the COLO feature for Xen.
For details on installing and using COLO, refer to:
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping

You can get the code from here:
https://github.com/Pating/xen/tree/changlox/colo_v13

Changelog from v12 to v13
1. Rebase to the upstream xen
2. Address comments from Ian and Liu Wei.
p7, Add A-B
p8, Add A-B
p10, Add A-B
p11, Add A-B
p12, Add LOG(ERROR, ) 
p13, Add A-B
p14, Remove libxl__ao_complete(xxx)
p15, Add A-B
p16, Add A-B
p17, Add A-B, replace "-c" with "--colo" for migrate-receive()
p19, Add A-B, introduce "switch ... case ..." 
p21, Add A-B
p22, Add A-B
p23, replace "forwarddev" with "coloft_fowarddev" 
p24, Add A-B
p25, Add A-B
p26, replace "--script" with "--coloft-script" 

Changelog from v11 to v12
1. Rebase to the upstream xen
2. Address comments from Ian, Liu Wei and Konrad.
Removed old p12,p13; introduce a new p13 that is split out from old p15, introduce
a new p19 that is split out from old p20.
p1, add A-B, and will update commit message when "xen-load-devices-state" relevant
patch merged on qemu side
p3, update comments, add assert() in libxl_domain_create_restore() 
p4, rename "dup_fd_helper" as "dup_cloexec", add missed newline
p5, add A-B
p7, remove repeated commit message, update the specification of libxl 
p8, update the specification of libxc 
p9, add A-B
p10, update commit message, fix blank line issue
p12, merged from old p12,p13(restore_callbacks wait_checkpoint/postcopy/suspend), fix blank
line issues, update comments about why COLO only supports HVM
p13, move stream read manipulations to right place in libxl_internal.h
p14, merged from old p12(save_callbacks wait_checkpoint), fix blank line issues, update Copyright(C)
p16, add "colo_" prefix for merge_secondary_dirty_bitmap()
p17, update COLO description part on man page
p18, fix long line issue
p19, just introduce colo mode and refactor relevant functions
p20, fix repetitive code in libxl__device_disk_from_xs_be(), make colo_port an int,
remove unnecessary comments in libxl__build_device_model_args_new(), simplify
disk_try_backend() and move the main part into colo_qdisk_setup() in p21
p21, fix blank line issue, update Copyright(C)
p22, merged from old p22,p23, update Copyright(C), add comments for NETLINK_COLO, remove unnecessary
'{ }', update url in commit message
p23, fix blank line issue, add some comments for "forwarddev", update Copyright(C)
p24, introduce COLO_PROXY_CHECKPOINT_TIMEOUT, ASYNC_CALL 
p26, move colo_proxy_script setup codes to libxl__colo_restore_setup(), introduce long options
for main_migrate_receive() 

Changelog from v10 to v11
1. Rebased to the upstream xen
2. Address comments from Liu Wei 
p1, update commit message and remove libxl__domain_restore_device_model
p4, add A-B
p5, update commit message
p6, add A-B
p7,p8 add email address and direction info
p10, merged from old p10,p11 and update comments
p11, merged from old p12,p13 and update comments
p14,p15 move colo structures and functions into libxl_colo.h, and list callbacks
in order, also update commit message
p16, merged from old p18,p19,p20 and remove TODOs
p17, use original code for checking postcopy return value
p18, simplify *if* logic, fix wrong comments, and unset dom_info.quiet in COLO 
p19, add A-B 
p20, fix code style, update comments and man page 
p21,p22,p23,p24 move colo structures and functions into libxl_colo.h 

Changelog from v9 to v10
1. Rebased to the upstream xen
2. Fix one bug found in the test
3. Merge some patches from prepare series
4. Split patch 5 to two patches(patch 4 and 5) according to the comments from
   Wei Liu

Changelog from v8 to v9:
1. Rebased to the upstream xen
2. Fix some bugs found in the test

Changelog from v7 to v8:
1. Rebased to the latest libxl migration v2.

Changelog from v6 to v7:
1. Ported to Libxl migration v2
2. Send dirty bitmap from secondary to primary on libxc side
3. Address review comments

Changelog from v5 to v6:
1. based on migration v2(libxc)
2. split the patchset into prerequisite patchset and this main patchset.

Changelog from v4 to v5:
1. rebase to the latest xen upstream
2. disk replication: blktap2->qdisk
3. nic replication: colo-agent->colo-proxy

Changelog from v3 to v4:
1. rebase to newest xen
2. bug fix

Changelog from v2 to v3:
1. rebase to newest remus
2. add nic replication support

Changelog from v1 to v2:
1. rebase to newest remus
2. add disk replication support

Changlong Xie (2):
  libxl_internal: move stream read manipulations to right place
  Introduce COLO mode and refactor relevant function

Wen Congyang (24):
  tools/libxl: introduction of libxl__qmp_restore to load qemu state
  tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty()
  tools/libxl: Add back channel to allow migration target send data back
  tools/libxl: Introduce new helper function dup_fd_helper()
  tools/libx{l,c}: add back channel to libxc
  docs: add colo readme
  docs/libxl: Introduce CHECKPOINT_CONTEXT to support migration v2 colo
    streams
  libxc/migration: Specification update for DIRTY_PFN_LIST records
  libxc/migration: export read_record for common use
  tools/libxl: add back channel support to write stream
  tools/libxl: add back channel support to read stream
  secondary vm suspend/resume/checkpoint code
  primary vm suspend/resume/checkpoint code
  libxc/restore: support COLO restore
  libxc/save: support COLO save
  implement the cmdline for COLO
  COLO: introduce new API to prepare/start/do/get_error/stop replication
  Support colo mode for qemu disk
  COLO: use qemu block replication
  COLO proxy: implement setup/teardown/preresume/postresume/checkpoint
  COLO nic: implement COLO nic subkind
  setup and control colo proxy on primary side
  setup and control colo proxy on secondary side
  cmdline switches and config vars to control colo-proxy

 docs/README.colo                         |    9 +
 docs/man/xl.conf.pod.5                   |    6 +
 docs/man/xl.pod.1                        |   48 +-
 docs/misc/xl-disk-configuration.txt      |   53 ++
 docs/specs/libxc-migration-stream.pandoc |   27 +-
 docs/specs/libxl-migration-stream.pandoc |   59 +-
 tools/hotplug/Linux/Makefile             |    1 +
 tools/hotplug/Linux/colo-proxy-setup     |  135 ++++
 tools/libxc/include/xenguest.h           |   41 +-
 tools/libxc/xc_nomigrate.c               |    4 +-
 tools/libxc/xc_sr_common.c               |   80 ++-
 tools/libxc/xc_sr_common.h               |   24 +-
 tools/libxc/xc_sr_restore.c              |  246 +++++--
 tools/libxc/xc_sr_save.c                 |  100 ++-
 tools/libxc/xc_sr_stream_format.h        |   31 +-
 tools/libxl/Makefile                     |    4 +
 tools/libxl/libxl.c                      |   87 ++-
 tools/libxl/libxl.h                      |   29 +-
 tools/libxl/libxl_colo.h                 |  143 ++++
 tools/libxl/libxl_colo_nic.c             |  320 +++++++++
 tools/libxl/libxl_colo_proxy.c           |  277 ++++++++
 tools/libxl/libxl_colo_qdisk.c           |  230 +++++++
 tools/libxl/libxl_colo_restore.c         | 1087 ++++++++++++++++++++++++++++++
 tools/libxl/libxl_colo_save.c            |  696 +++++++++++++++++++
 tools/libxl/libxl_create.c               |   90 ++-
 tools/libxl/libxl_device.c               |   11 +
 tools/libxl/libxl_dm.c                   |  176 ++++-
 tools/libxl/libxl_dom_save.c             |  103 +--
 tools/libxl/libxl_internal.h             |  216 ++++--
 tools/libxl/libxl_qmp.c                  |  106 +++
 tools/libxl/libxl_remus_disk_drbd.c      |   38 +-
 tools/libxl/libxl_save_callout.c         |   53 +-
 tools/libxl/libxl_save_helper.c          |    8 +-
 tools/libxl/libxl_save_msgs_gen.pl       |   13 +-
 tools/libxl/libxl_sr_stream_format.h     |   11 +
 tools/libxl/libxl_stream_read.c          |  106 ++-
 tools/libxl/libxl_stream_write.c         |  100 ++-
 tools/libxl/libxl_types.idl              |   11 +
 tools/libxl/libxlu_disk_l.l              |   17 +
 tools/libxl/xl.c                         |    3 +
 tools/libxl/xl.h                         |    1 +
 tools/libxl/xl_cmdimpl.c                 |  109 ++-
 tools/libxl/xl_cmdtable.c                |    4 +-
 tools/ocaml/libs/xl/xenlight_stubs.c     |    2 +-
 tools/python/xen/migration/libxc.py      |   68 +-
 tools/python/xen/migration/libxl.py      |    9 +
 46 files changed, 4618 insertions(+), 374 deletions(-)
 create mode 100644 docs/README.colo
 create mode 100755 tools/hotplug/Linux/colo-proxy-setup
 create mode 100644 tools/libxl/libxl_colo.h
 create mode 100644 tools/libxl/libxl_colo_nic.c
 create mode 100644 tools/libxl/libxl_colo_proxy.c
 create mode 100644 tools/libxl/libxl_colo_qdisk.c
 create mode 100644 tools/libxl/libxl_colo_restore.c
 create mode 100644 tools/libxl/libxl_colo_save.c

-- 
1.9.3




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v13 01/26] tools/libxl: introduction of libxl__qmp_restore to load qemu state
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 02/26] tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty() Changlong Xie
                   ` (27 subsequent siblings)
  28 siblings, 0 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

In normal migration, the qemu state is passed to qemu as a parameter.
With COLO, the secondary vm is already running, so we do the following
steps at every checkpoint:
1. suspend both primary vm and secondary vm
2. sync the state
3. resume both primary vm and secondary vm
The primary sends qemu's state in step 2, and the secondary's qemu must
read it and restore the state before it is resumed. We cannot pass the
state to qemu as a parameter because the secondary QEMU is already started
at this point, so we introduce libxl__qmp_restore() to do it.
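
For illustration only, the request that libxl__qmp_restore() issues can
be pictured as the QMP message below. This is a minimal sketch, not
libxl code: format_qmp_restore() is a hypothetical helper, the path in
the usage note is made up, and no JSON escaping is done. Only the
command name "xen-load-devices-state" and the "filename" argument come
from the patch itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libxl): format the QMP request that
 * asks QEMU to load device state from state_file.
 * No JSON escaping is performed, so state_file must not contain quotes
 * or backslashes. Returns 0 on success, -1 if buf is too small.
 */
int format_qmp_restore(char *buf, size_t len, const char *state_file)
{
    int n;

    assert(buf != NULL && state_file != NULL);

    n = snprintf(buf, len,
                 "{\"execute\": \"xen-load-devices-state\", "
                 "\"arguments\": {\"filename\": \"%s\"}}",
                 state_file);

    /* snprintf reports the length it wanted; reject truncated output. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling format_qmp_restore(buf, sizeof(buf), "/tmp/state") would fill
buf with a single "execute"/"arguments" object. In libxl itself the
message is built by qmp_parameters_add_string() and sent by
qmp_run_command(), as the diff below shows.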

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Cc: Anthony Perard <anthony.perard@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_qmp.c      | 10 ++++++++++
 2 files changed, 12 insertions(+)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 345a764..fc3426a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1765,6 +1765,8 @@ _hidden int libxl__qmp_stop(libxl__gc *gc, int domid);
 _hidden int libxl__qmp_resume(libxl__gc *gc, int domid);
 /* Save current QEMU state into fd. */
 _hidden int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename);
+/* Load current QEMU state from file. */
+_hidden int libxl__qmp_restore(libxl__gc *gc, int domid, const char *filename);
 /* Set dirty bitmap logging status */
 _hidden int libxl__qmp_set_global_dirty_log(libxl__gc *gc, int domid, bool enable);
 _hidden int libxl__qmp_insert_cdrom(libxl__gc *gc, int domid, const libxl_device_disk *disk);
diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index c45702e..c0bdfcb 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -906,6 +906,16 @@ int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename)
                            NULL, NULL);
 }
 
+int libxl__qmp_restore(libxl__gc *gc, int domid, const char *state_file)
+{
+    libxl__json_object *args = NULL;
+
+    qmp_parameters_add_string(gc, &args, "filename", state_file);
+
+    return qmp_run_command(gc, domid, "xen-load-devices-state", args,
+                           NULL, NULL);
+}
+
 static int qmp_change(libxl__gc *gc, libxl__qmp_handler *qmp,
                       char *device, char *target, char *arg)
 {
-- 
1.9.3





* [PATCH v13 02/26] tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty()
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 01/26] tools/libxl: introduction of libxl__qmp_restore to load qemu state Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 03/26] tools/libxl: Add back channel to allow migration target send data back Changlong Xie
                   ` (26 subsequent siblings)
  28 siblings, 0 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

The secondary vm is running in COLO mode, and we need to send its
dirty page information to the primary host at each checkpoint, so we
have to enable qemu logdirty on the secondary.

libxl__domain_suspend_common_switch_qemu_logdirty() enables
qemu logdirty, but it uses libxl__domain_save_state and calls
libxl__xc_domain_saverestore_async_callback_done() before it exits,
so it cannot be used for the secondary vm.

Update libxl__domain_suspend_common_switch_qemu_logdirty() to
introduce a new API, libxl__domain_common_switch_qemu_logdirty().
This API only uses libxl__logdirty_switch and calls
lds->callback before it exits. The new API will be used by the patch:
  secondary vm suspend/resume/checkpoint code
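
The decoupling described above can be sketched outside libxl as
follows. All names here are simplified stand-ins invented for
illustration (struct save_state, CONTAINER_OF_SKETCH and so on); the
real code uses libxl__logdirty_switch, libxl__domain_save_state and
libxl's CONTAINER_OF. The point is that the generic switch only knows
about its callback, and the save-side callback recovers its enclosing
state from the embedded member.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for libxl__logdirty_switch: it only knows its callback. */
struct logdirty_switch {
    void (*callback)(struct logdirty_switch *lds, int rc);
};

/* Stand-in for libxl__domain_save_state, which embeds the switch. */
struct save_state {
    int rc;          /* first error seen, 0 if none */
    int done_calls;  /* how many times completion was reported */
    struct logdirty_switch logdirty;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define CONTAINER_OF_SKETCH(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Save-side completion: maps the generic result back onto save_state. */
void save_logdirty_done(struct logdirty_switch *lds, int rc)
{
    struct save_state *ss =
        CONTAINER_OF_SKETCH(lds, struct save_state, logdirty);

    if (rc)
        ss->rc = rc;
    ss->done_calls++;
}

/*
 * Generic switch: does its work (simulated here by simulated_rc) and
 * reports through the callback, without knowing who the caller is.
 */
void common_switch_logdirty(struct logdirty_switch *lds, int simulated_rc)
{
    assert(lds->callback != NULL);
    lds->callback(lds, simulated_rc);
}
```

A secondary-side caller would embed the same switch in its own state
structure and install a different callback, which is what lets the
restore path reuse the logic.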

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_dom_save.c | 95 ++++++++++++++++++++++++--------------------
 tools/libxl/libxl_internal.h |  8 ++++
 2 files changed, 60 insertions(+), 43 deletions(-)

diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index f3288b9..4df86c3 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -42,7 +42,7 @@ static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
 static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*,
                             const char *watch_path, const char *event_path);
 static void switch_logdirty_done(libxl__egc *egc,
-                                 libxl__domain_save_state *dss, int rc);
+                                 libxl__logdirty_switch *lds, int rc);
 
 void libxl__logdirty_init(libxl__logdirty_switch *lds)
 {
@@ -52,13 +52,10 @@ void libxl__logdirty_init(libxl__logdirty_switch *lds)
 }
 
 static void domain_suspend_switch_qemu_xen_traditional_logdirty
-                               (int domid, unsigned enable,
-                                libxl__save_helper_state *shs)
+                               (libxl__egc *egc, int domid, unsigned enable,
+                                libxl__logdirty_switch *lds)
 {
-    libxl__egc *egc = shs->egc;
-    libxl__domain_save_state *dss = shs->caller_state;
-    libxl__logdirty_switch *lds = &dss->logdirty;
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(lds->ao);
     int rc;
     xs_transaction_t t = 0;
     const char *got;
@@ -120,26 +117,34 @@ static void domain_suspend_switch_qemu_xen_traditional_logdirty
  out:
     LOG(ERROR,"logdirty switch failed (rc=%d), abandoning suspend",rc);
     libxl__xs_transaction_abort(gc, &t);
-    switch_logdirty_done(egc,dss,rc);
+    switch_logdirty_done(egc,lds,rc);
 }
 
 static void domain_suspend_switch_qemu_xen_logdirty
-                               (int domid, unsigned enable,
-                                libxl__save_helper_state *shs)
+                               (libxl__egc *egc, int domid, unsigned enable,
+                                libxl__logdirty_switch *lds)
 {
-    libxl__egc *egc = shs->egc;
-    libxl__domain_save_state *dss = shs->caller_state;
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(lds->ao);
     int rc;
 
     rc = libxl__qmp_set_global_dirty_log(gc, domid, enable);
-    if (!rc) {
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
-    } else {
+    if (rc)
         LOG(ERROR,"logdirty switch failed (rc=%d), abandoning suspend",rc);
+
+    lds->callback(egc, lds, rc);
+}
+
+static void domain_suspend_switch_qemu_logdirty_done
+                        (libxl__egc *egc, libxl__logdirty_switch *lds, int rc)
+{
+    libxl__domain_save_state *dss = CONTAINER_OF(lds, *dss, logdirty);
+
+    if (rc) {
         dss->rc = rc;
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, -1);
-    }
+        libxl__xc_domain_saverestore_async_callback_done(egc,
+                                                         &dss->sws.shs, -1);
+    } else
+        libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 0);
 }
 
 void libxl__domain_suspend_common_switch_qemu_logdirty
@@ -148,42 +153,52 @@ void libxl__domain_suspend_common_switch_qemu_logdirty
     libxl__save_helper_state *shs = user;
     libxl__egc *egc = shs->egc;
     libxl__domain_save_state *dss = shs->caller_state;
-    STATE_AO_GC(dss->ao);
+
+    /* Convenience aliases. */
+    libxl__logdirty_switch *const lds = &dss->logdirty;
+
+    lds->callback = domain_suspend_switch_qemu_logdirty_done;
+    libxl__domain_common_switch_qemu_logdirty(egc, domid, enable, lds);
+}
+
+void libxl__domain_common_switch_qemu_logdirty(libxl__egc *egc,
+                                               int domid, unsigned enable,
+                                               libxl__logdirty_switch *lds)
+{
+    STATE_AO_GC(lds->ao);
 
     switch (libxl__device_model_version_running(gc, domid)) {
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
-        domain_suspend_switch_qemu_xen_traditional_logdirty(domid, enable, shs);
+        domain_suspend_switch_qemu_xen_traditional_logdirty(egc, domid, enable,
+                                                            lds);
         break;
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
-        domain_suspend_switch_qemu_xen_logdirty(domid, enable, shs);
+        domain_suspend_switch_qemu_xen_logdirty(egc, domid, enable, lds);
         break;
     case LIBXL_DEVICE_MODEL_VERSION_NONE:
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+        lds->callback(egc, lds, 0);
         break;
     default:
         LOG(ERROR,"logdirty switch failed"
             ", no valid device model version found, abandoning suspend");
-        dss->rc = ERROR_FAIL;
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, -1);
+        lds->callback(egc, lds, ERROR_FAIL);
     }
 }
 static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
                                     const struct timeval *requested_abs,
                                     int rc)
 {
-    libxl__domain_save_state *dss = CONTAINER_OF(ev, *dss, logdirty.timeout);
-    STATE_AO_GC(dss->ao);
+    libxl__logdirty_switch *lds = CONTAINER_OF(ev, *lds, timeout);
+    STATE_AO_GC(lds->ao);
     LOG(ERROR,"logdirty switch: wait for device model timed out");
-    switch_logdirty_done(egc,dss,ERROR_FAIL);
+    switch_logdirty_done(egc,lds,ERROR_FAIL);
 }
 
 static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch *watch,
                             const char *watch_path, const char *event_path)
 {
-    libxl__domain_save_state *dss =
-        CONTAINER_OF(watch, *dss, logdirty.watch);
-    libxl__logdirty_switch *lds = &dss->logdirty;
-    STATE_AO_GC(dss->ao);
+    libxl__logdirty_switch *lds = CONTAINER_OF(watch, *lds, watch);
+    STATE_AO_GC(lds->ao);
     const char *got;
     xs_transaction_t t = 0;
     int rc;
@@ -229,28 +244,20 @@ static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch *watch,
     if (rc <= 0) {
         if (rc < 0)
             LOG(ERROR,"logdirty switch: failed (rc=%d)",rc);
-        switch_logdirty_done(egc,dss,rc);
+        switch_logdirty_done(egc,lds,rc);
     }
 }
 
 static void switch_logdirty_done(libxl__egc *egc,
-                                 libxl__domain_save_state *dss,
+                                 libxl__logdirty_switch *lds,
                                  int rc)
 {
-    STATE_AO_GC(dss->ao);
-    libxl__logdirty_switch *lds = &dss->logdirty;
+    STATE_AO_GC(lds->ao);
 
     libxl__ev_xswatch_deregister(gc, &lds->watch);
     libxl__ev_time_deregister(gc, &lds->timeout);
 
-    int broke;
-    if (rc) {
-        broke = -1;
-        dss->rc = rc;
-    } else {
-        broke = 0;
-    }
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, broke);
+    lds->callback(egc, lds, rc);
 }
 
 /*----- callbacks, called by xc_domain_save -----*/
@@ -347,6 +354,8 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
 
     dss->rc = 0;
     libxl__logdirty_init(&dss->logdirty);
+    dss->logdirty.ao = ao;
+
     dsps->ao = ao;
     dsps->domid = domid;
     rc = libxl__domain_suspend_init(egc, dsps, type);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index fc3426a..de65d85 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3104,6 +3104,11 @@ libxl__stream_write_inuse(const libxl__stream_write_state *stream)
 }
 
 typedef struct libxl__logdirty_switch {
+    /* Set by caller of libxl__domain_common_switch_qemu_logdirty */
+    libxl__ao *ao;
+    void (*callback)(libxl__egc *egc, struct libxl__logdirty_switch *lds,
+                     int rc);
+
     const char *cmd;
     const char *cmd_path;
     const char *ret_path;
@@ -3530,6 +3535,9 @@ void libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc,
 
 _hidden void libxl__domain_suspend_common_switch_qemu_logdirty
                                (int domid, unsigned int enable, void *data);
+_hidden void libxl__domain_common_switch_qemu_logdirty(libxl__egc *egc,
+                                               int domid, unsigned enable,
+                                               libxl__logdirty_switch *lds);
 _hidden int libxl__save_emulator_xenstore_data(libxl__domain_save_state *dss,
                                                char **buf, uint32_t *len);
 _hidden int libxl__restore_emulator_xenstore_data
-- 
1.9.3





* [PATCH v13 03/26] tools/libxl: Add back channel to allow migration target send data back
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 01/26] tools/libxl: introduction of libxl__qmp_restore to load qemu state Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 02/26] tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty() Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-04-04 12:07   ` Olaf Hering
  2016-03-25  6:44 ` [PATCH v13 04/26] tools/libxl: Introduce new helper function dup_fd_helper() Changlong Xie
                   ` (25 subsequent siblings)
  28 siblings, 1 reply; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

In COLO mode, the secondary needs to send the following data to the primary:
1. In libxl
   Secondary sends the following CHECKPOINT_CONTEXT to primary:
   CHECKPOINT_SVM_SUSPENDED, CHECKPOINT_SVM_READY and CHECKPOINT_SVM_RESUMED
2. In libxc
   Secondary sends the dirty pfn list to primary

But the io_fd can only be written on the primary and can only be read on
the secondary. Save recv_fd in libxl__domain_save_state, and send_back_fd
in libxl__domain_create_state. Extend the libxl_domain_create_restore() API
with a send_back_fd param, and add
LIBXL_HAVE_DOMAIN_CREATE_RESTORE_SEND_BACK_FD to indicate the API change.
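
As a rough picture of the back channel, the sketch below passes one
message from a "secondary" writer to a "primary" reader over a second
file descriptor pair. It is an assumption-laden toy, not the libxl
stream format: the function names are hypothetical, a pipe stands in
for the real transport, and SKETCH_SVM_SUSPENDED is an illustrative
value rather than the real CHECKPOINT_SVM_SUSPENDED id.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Illustrative id only; the real record ids are defined by the series. */
enum { SKETCH_SVM_SUSPENDED = 1 };

/* Secondary side: write one id on the back channel. */
int secondary_send(int send_back_fd, uint32_t id)
{
    return write(send_back_fd, &id, sizeof(id)) == (ssize_t)sizeof(id)
           ? 0 : -1;
}

/* Primary side: read one id from the back channel. */
int primary_recv(int recv_fd, uint32_t *id)
{
    return read(recv_fd, id, sizeof(*id)) == (ssize_t)sizeof(*id)
           ? 0 : -1;
}

/* Wire up a pipe as the back channel and pass a single message over it. */
int back_channel_roundtrip(uint32_t id, uint32_t *out)
{
    int fds[2];
    int rc;

    assert(out != NULL);

    if (pipe(fds) < 0)
        return -1;

    /* fds[1] plays the role of send_back_fd, fds[0] that of recv_fd. */
    rc = secondary_send(fds[1], id);
    if (!rc)
        rc = primary_recv(fds[0], out);

    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

The forward migration stream keeps flowing over io_fd as before; the
extra descriptor only carries data in the opposite direction.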

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl.c                  |  3 +--
 tools/libxl/libxl.h                  | 29 ++++++++++++++++++++++++++++-
 tools/libxl/libxl_create.c           | 11 +++++++----
 tools/libxl/libxl_internal.h         |  2 ++
 tools/libxl/xl_cmdimpl.c             |  8 +++++++-
 tools/ocaml/libs/xl/xenlight_stubs.c |  2 +-
 6 files changed, 46 insertions(+), 9 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 3471c4c..6bc46cb 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -834,7 +834,6 @@ out:
 static void remus_failover_cb(libxl__egc *egc,
                               libxl__domain_save_state *dss, int rc);
 
-/* TODO: Explicit Checkpoint acknowledgements via recv_fd. */
 int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
                              uint32_t domid, int send_fd, int recv_fd,
                              const libxl_asyncop_how *ao_how)
@@ -871,7 +870,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     dss->callback = remus_failover_cb;
     dss->domid = domid;
     dss->fd = send_fd;
-    /* TODO do something with recv_fd */
+    dss->recv_fd = recv_fd;
     dss->type = type;
     dss->live = 1;
     dss->debug = 0;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index f61bc4b..a569286 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -651,6 +651,15 @@ typedef struct libxl__ctx libxl_ctx;
 #define LIBXL_HAVE_DOMAIN_CREATE_RESTORE_PARAMS 1
 
 /*
+ * LIBXL_HAVE_DOMAIN_CREATE_RESTORE_SEND_BACK_FD 1
+ *
+ * If this is defined, libxl_domain_create_restore()'s API includes the
+ * send_back_fd param. This is used only with COLO, for the libxl migration
+ * back channel; other callers should pass -1.
+ */
+#define LIBXL_HAVE_DOMAIN_CREATE_RESTORE_SEND_BACK_FD 1
+
+/*
  * LIBXL_HAVE_CREATEINFO_PVH
  * If this is defined, then libxl supports creation of a PVH guest.
  */
@@ -1177,6 +1186,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             LIBXL_EXTERNAL_CALLERS_ONLY;
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 uint32_t *domid, int restore_fd,
+                                int send_back_fd,
                                 const libxl_domain_restore_params *params,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
@@ -1197,7 +1207,7 @@ int static inline libxl_domain_create_restore_0x040200(
     libxl_domain_restore_params_init(&params);
 
     ret = libxl_domain_create_restore(
-        ctx, d_config, domid, restore_fd, &params, ao_how, aop_console_how);
+        ctx, d_config, domid, restore_fd, -1, &params, ao_how, aop_console_how);
 
     libxl_domain_restore_params_dispose(&params);
     return ret;
@@ -1205,6 +1215,23 @@ int static inline libxl_domain_create_restore_0x040200(
 
 #define libxl_domain_create_restore libxl_domain_create_restore_0x040200
 
+#elif defined(LIBXL_API_VERSION) && LIBXL_API_VERSION >= 0x040400 \
+                                 && LIBXL_API_VERSION < 0x040700
+
+int static inline libxl_domain_create_restore_0x040400(
+    libxl_ctx *ctx, libxl_domain_config *d_config,
+    uint32_t *domid, int restore_fd,
+    const libxl_domain_restore_params *params,
+    const libxl_asyncop_how *ao_how,
+    const libxl_asyncprogress_how *aop_console_how)
+    LIBXL_EXTERNAL_CALLERS_ONLY
+{
+    return libxl_domain_create_restore(ctx, d_config, domid, restore_fd,
+                                       -1, params, ao_how, aop_console_how);
+}
+
+#define libxl_domain_create_restore libxl_domain_create_restore_0x040400
+
 #endif
 
 int libxl_domain_soft_reset(libxl_ctx *ctx,
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 61b5c01..09f2f13 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1639,7 +1639,7 @@ static void domain_create_cb(libxl__egc *egc,
                              int rc, uint32_t domid);
 
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
-                            uint32_t *domid, int restore_fd,
+                            uint32_t *domid, int restore_fd, int send_back_fd,
                             const libxl_domain_restore_params *params,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
@@ -1654,6 +1654,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     libxl_domain_config_init(&cdcs->dcs.guest_config_saved);
     libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
     cdcs->dcs.restore_fd = cdcs->dcs.libxc_fd = restore_fd;
+    cdcs->dcs.send_back_fd = send_back_fd;
     if (restore_fd > -1) {
         cdcs->dcs.restore_params = *params;
         rc = libxl__fd_flags_modify_save(gc, cdcs->dcs.restore_fd,
@@ -1832,18 +1833,20 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
-    return do_domain_create(ctx, d_config, domid, -1, NULL,
+    return do_domain_create(ctx, d_config, domid, -1, -1, NULL,
                             ao_how, aop_console_how);
 }
 
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 uint32_t *domid, int restore_fd,
+                                int send_back_fd,
                                 const libxl_domain_restore_params *params,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
 {
-    return do_domain_create(ctx, d_config, domid, restore_fd, params,
-                            ao_how, aop_console_how);
+    assert(send_back_fd == -1);
+    return do_domain_create(ctx, d_config, domid, restore_fd, send_back_fd,
+                            params, ao_how, aop_console_how);
 }
 
 int libxl_domain_soft_reset(libxl_ctx *ctx,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index de65d85..42d8bbd 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3150,6 +3150,7 @@ struct libxl__domain_save_state {
     uint32_t domid;
     int fd;
     int fdfl; /* original flags on fd */
+    int recv_fd;
     libxl_domain_type type;
     int live;
     int debug;
@@ -3491,6 +3492,7 @@ struct libxl__domain_create_state {
     libxl_domain_config guest_config_saved; /* vanilla config */
     int restore_fd, libxc_fd;
     int restore_fdfl; /* original flags of restore_fd */
+    int send_back_fd;
     libxl_domain_restore_params restore_params;
     uint32_t domid_soft_reset;
     libxl__domain_create_cb *callback;
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index a3610fc..2e64f44 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -159,6 +159,7 @@ struct domain_create {
     char *extra_config; /* extra config string */
     const char *restore_file;
     int migrate_fd; /* -1 means none */
+    int send_back_fd; /* -1 means none */
     char **migration_domname_r; /* from malloc */
 };
 
@@ -2796,6 +2797,7 @@ static uint32_t create_domain(struct domain_create *dom_info)
     int config_len = 0;
     int restore_fd = -1;
     int restore_fd_to_close = -1;
+    int send_back_fd = -1;
     const libxl_asyncprogress_how *autoconnect_console_how;
     struct save_file_header hdr;
     uint32_t domid_soft_reset = INVALID_DOMID;
@@ -2813,6 +2815,7 @@ static uint32_t create_domain(struct domain_create *dom_info)
         if (migrate_fd >= 0) {
             restore_source = "<incoming migration stream>";
             restore_fd = migrate_fd;
+            send_back_fd = dom_info->send_back_fd;
         } else {
             restore_source = restore_file;
             restore_fd = open(restore_file, O_RDONLY);
@@ -3001,7 +3004,7 @@ start:
 
         ret = libxl_domain_create_restore(ctx, &d_config,
                                           &domid, restore_fd,
-                                          &params,
+                                          send_back_fd, &params,
                                           0, autoconnect_console_how);
 
         libxl_domain_restore_params_dispose(&params);
@@ -4754,6 +4757,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     dom_info.monitor = monitor;
     dom_info.paused = 1;
     dom_info.migrate_fd = recv_fd;
+    dom_info.send_back_fd = -1;
     dom_info.migration_domname_r = &migration_domname;
     dom_info.checkpointed_stream = checkpointed;
 
@@ -4927,6 +4931,7 @@ int main_restore(int argc, char **argv)
     dom_info.config_file = config_file;
     dom_info.restore_file = checkpoint_file;
     dom_info.migrate_fd = -1;
+    dom_info.send_back_fd = -1;
     dom_info.vnc = vnc;
     dom_info.vncautopass = vncautopass;
     dom_info.console_autoconnect = console_autoconnect;
@@ -5394,6 +5399,7 @@ int main_create(int argc, char **argv)
     dom_info.quiet = quiet;
     dom_info.config_file = filename;
     dom_info.migrate_fd = -1;
+    dom_info.send_back_fd = -1;
     dom_info.vnc = vnc;
     dom_info.vncautopass = vncautopass;
     dom_info.console_autoconnect = console_autoconnect;
diff --git a/tools/ocaml/libs/xl/xenlight_stubs.c b/tools/ocaml/libs/xl/xenlight_stubs.c
index 4133527..98b52b9 100644
--- a/tools/ocaml/libs/xl/xenlight_stubs.c
+++ b/tools/ocaml/libs/xl/xenlight_stubs.c
@@ -538,7 +538,7 @@ value stub_libxl_domain_create_restore(value ctx, value domain_config, value par
 
 	caml_enter_blocking_section();
 	ret = libxl_domain_create_restore(CTX, &c_dconfig, &c_domid, restore_fd,
-		&c_params, ao_how, NULL);
+		-1, &c_params, ao_how, NULL);
 	caml_leave_blocking_section();
 
 	free(ao_how);
-- 
1.9.3




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v13 04/26] tools/libxl: Introduce new helper function dup_fd_helper()
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (2 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 03/26] tools/libxl: Add back channel to allow migration target send data back Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 05/26] tools/libx{l, c}: add back channel to libxc Changlong Xie
                   ` (24 subsequent siblings)
  28 siblings, 0 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

This is pure refactoring with no functional change.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_save_callout.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 7f1f5d4..06967df 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -119,6 +119,23 @@ void libxl__save_helper_init(libxl__save_helper_state *shs)
 
 /*----- helper execution -----*/
 
+/* This function can not fail. */
+static int dup_cloexec(libxl__gc *gc, int fd, const char *what)
+{
+    int dup_fd = fd;
+
+    if (fd <= 2) {
+        dup_fd = dup(fd);
+        if (dup_fd < 0) {
+            LOGE(ERROR,"dup %s", what);
+            exit(-1);
+        }
+    }
+    libxl_fd_set_cloexec(CTX, dup_fd, 0);
+
+    return dup_fd;
+}
+
 /*
  * Both save and restore share four parameters:
  * 1) Path to libxl-save-helper.
@@ -186,14 +203,7 @@ static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
 
     pid_t pid = libxl__ev_child_fork(gc, &shs->child, helper_exited);
     if (!pid) {
-        if (stream_fd <= 2) {
-            stream_fd = dup(stream_fd);
-            if (stream_fd < 0) {
-                LOGE(ERROR,"dup migration stream fd");
-                exit(-1);
-            }
-        }
-        libxl_fd_set_cloexec(CTX, stream_fd, 0);
+        stream_fd = dup_cloexec(gc, stream_fd, "migration stream fd");
         *stream_fd_arg = GCSPRINTF("%d", stream_fd);
 
         for (i=0; i<num_preserve_fds; i++)
-- 
1.9.3
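
The helper factored out in this patch does two things: it moves a descriptor away from the stdio range (0-2), and it clears close-on-exec so the descriptor survives into the exec'd helper. A minimal stand-alone sketch of the same idea follows. The name keep_fd_across_exec() is hypothetical, and plain POSIX fcntl()/perror() stand in for libxl_fd_set_cloexec() and LOGE(), so this is an illustration rather than the libxl code.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical analogue of dup_cloexec(): returns a descriptor that
 * refers to the same open file as fd and will stay open across exec().
 * Like the original, it exits instead of returning an error.
 */
static int keep_fd_across_exec(int fd, const char *what)
{
    int flags;

    assert(what != NULL);

    /* Descriptors 0..2 are duplicated so the stdio slots stay free. */
    if (fd <= 2) {
        fd = dup(fd);
        if (fd < 0) {
            perror(what);
            exit(EXIT_FAILURE);
        }
    }

    /* Clear FD_CLOEXEC so the child program inherits the descriptor. */
    flags = fcntl(fd, F_GETFD);
    if (flags < 0 || fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0) {
        perror(what);
        exit(EXIT_FAILURE);
    }

    return fd;
}
```

A caller would use the returned value in place of the original descriptor, for example when formatting it into the helper's argument list.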





* [PATCH v13 05/26] tools/libx{l, c}: add back channel to libxc
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (3 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 04/26] tools/libxl: Introduce new helper function dup_fd_helper() Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 06/26] docs: add colo readme Changlong Xie
                   ` (23 subsequent siblings)
  28 siblings, 0 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

In COLO mode, both VMs are running, and are considered in sync if the
visible network traffic is identical.  After some time, they fall out of
sync.

At this point, the two VMs have definitely diverged.  Let's call the
primary's dirty bitmap set A and the secondary's dirty bitmap set B.

Sets A and B are different.

Under normal migration, the page data for set A will be sent from the
primary to the secondary.

However, the set difference B - A (the pages in B but not in A; let's
call this C) is out-of-date on the secondary (with respect to the
primary) and will not be sent by the primary, as those pages were not
dirtied by the primary. The secondary needs the page data for C to
reconstruct an exact copy of the primary at the checkpoint.

The secondary cannot calculate C as it doesn't know A.  Instead, the
secondary must send B to the primary, at which point the primary
calculates the union of A and B (let's call this D), which is all the
pages dirtied by either the primary or the secondary, and sends all page
data covered by D.

In the general case, D is a superset of both A and B.  Without the
backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid
copy of the primary.
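
The set arithmetic above can be sketched on plain word-array bitmaps. This is only an illustration: the function names are hypothetical and the bitmaps are simple uint64_t arrays, not the dirty bitmap representation libxc actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * D = A | B: every page dirtied by either side. This is what the
 * primary has to send once it has received B over the back channel.
 */
static void dirty_bitmap_union(uint64_t *d, const uint64_t *a,
                               const uint64_t *b, size_t nr_words)
{
    assert(d && a && b);
    for (size_t i = 0; i < nr_words; i++)
        d[i] = a[i] | b[i];
}

/*
 * C = B - A: pages dirtied only by the secondary. Without the back
 * channel these would stay stale on the secondary.
 */
static void dirty_bitmap_diff(uint64_t *c, const uint64_t *b,
                              const uint64_t *a, size_t nr_words)
{
    assert(c && a && b);
    for (size_t i = 0; i < nr_words; i++)
        c[i] = b[i] & ~a[i];
}
```

With A = 0b101 and B = 0b110, D comes out as 0b111 and C as 0b010. Page 1 was dirtied only by the secondary, so it is only sent because B was merged in.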

We transfer the dirty bitmap on the libxc side, so we need to introduce a
back channel to libxc.

Note: this differs from the paper. We changed the original design to
the current one because of the following concerns:
1. The original design needs extra memory on the secondary host. When
   there are multiple backups on one host, the memory cost is high.
2. The memory cache code would add another 1k+ lines, which would make
   the review more time-consuming.

Note: this patch merely adds new parameters to various prototypes and
functions. The new parameters are used by a later patch, "libxc/restore:
send dirty pfn list to primary when checkpoint under COLO".

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxc/include/xenguest.h   |  4 ++--
 tools/libxc/xc_nomigrate.c       |  4 ++--
 tools/libxc/xc_sr_restore.c      |  2 +-
 tools/libxc/xc_sr_save.c         |  2 +-
 tools/libxl/libxl_save_callout.c | 21 +++++++++++++++------
 tools/libxl/libxl_save_helper.c  |  8 ++++++--
 6 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 4f0b06e..b4f4bfb 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -93,7 +93,7 @@ typedef enum {
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
                    uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
                    struct save_callbacks* callbacks, int hvm,
-                   xc_migration_stream_t stream_type);
+                   xc_migration_stream_t stream_type, int recv_fd);
 
 /* callbacks provided by xc_domain_restore */
 struct restore_callbacks {
@@ -132,7 +132,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
                       unsigned long *console_mfn, domid_t console_domid,
                       unsigned int hvm, unsigned int pae, int superpages,
                       xc_migration_stream_t stream_type,
-                      struct restore_callbacks *callbacks);
+                      struct restore_callbacks *callbacks, int send_back_fd);
 
 /**
  * This function will create a domain for a paravirtualized Linux
diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c
index 08e1f8c..15c838f 100644
--- a/tools/libxc/xc_nomigrate.c
+++ b/tools/libxc/xc_nomigrate.c
@@ -23,7 +23,7 @@
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
                    uint32_t max_factor, uint32_t flags,
                    struct save_callbacks* callbacks, int hvm,
-                   xc_migration_stream_t stream_type)
+                   xc_migration_stream_t stream_type, int recv_fd)
 {
     errno = ENOSYS;
     return -1;
@@ -35,7 +35,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
                       unsigned long *console_mfn, domid_t console_domid,
                       unsigned int hvm, unsigned int pae, int superpages,
                       xc_migration_stream_t stream_type,
-                      struct restore_callbacks *callbacks)
+                      struct restore_callbacks *callbacks, int send_back_fd)
 {
     errno = ENOSYS;
     return -1;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 819401d..2b9a0ea 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -726,7 +726,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
                       unsigned long *console_gfn, domid_t console_domid,
                       unsigned int hvm, unsigned int pae, int superpages,
                       xc_migration_stream_t stream_type,
-                      struct restore_callbacks *callbacks)
+                      struct restore_callbacks *callbacks, int send_back_fd)
 {
     struct xc_sr_context ctx =
         {
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 388ae7f..1ccdbbb 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -830,7 +830,7 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
                    uint32_t max_iters, uint32_t max_factor, uint32_t flags,
                    struct save_callbacks* callbacks, int hvm,
-                   xc_migration_stream_t stream_type)
+                   xc_migration_stream_t stream_type, int recv_fd)
 {
     struct xc_sr_context ctx =
         {
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 06967df..f15c235 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -27,7 +27,7 @@
  */
 static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
                        const char *mode_arg,
-                       int stream_fd,
+                       int stream_fd, int back_channel_fd,
                        const int *preserve_fds, int num_preserve_fds,
                        const unsigned long *argnums, int num_argnums);
 
@@ -50,6 +50,7 @@ void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
     /* Convenience aliases */
     const uint32_t domid = dcs->guest_domid;
     const int restore_fd = dcs->libxc_fd;
+    const int send_back_fd = dcs->send_back_fd;
     libxl__domain_build_state *const state = &dcs->build_state;
 
     unsigned cbflags =
@@ -71,7 +72,7 @@ void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
     shs->caller_state = dcs;
     shs->need_results = 1;
 
-    run_helper(egc, shs, "--restore-domain", restore_fd, 0, 0,
+    run_helper(egc, shs, "--restore-domain", restore_fd, send_back_fd, 0, 0,
                argnums, ARRAY_SIZE(argnums));
 }
 
@@ -95,7 +96,7 @@ void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_save_state *dss,
     shs->caller_state = dss;
     shs->need_results = 0;
 
-    run_helper(egc, shs, "--save-domain", dss->fd,
+    run_helper(egc, shs, "--save-domain", dss->fd, dss->recv_fd,
                NULL, 0,
                argnums, ARRAY_SIZE(argnums));
     return;
@@ -141,12 +142,14 @@ static int dup_cloexec(libxl__gc *gc, int fd, const char *what)
  * 1) Path to libxl-save-helper.
  * 2) --[restore|save]-domain.
  * 3) stream file descriptor.
+ * 4) back channel file descriptor.
  * n) save/restore specific parameters.
- * 4) A \0 at the end.
+ * 5) A \0 at the end.
  */
-#define HELPER_NR_ARGS 4
+#define HELPER_NR_ARGS 5
 static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
-                       const char *mode_arg, int stream_fd,
+                       const char *mode_arg,
+                       int stream_fd, int back_channel_fd,
                        const int *preserve_fds, int num_preserve_fds,
                        const unsigned long *argnums, int num_argnums)
 {
@@ -179,6 +182,7 @@ static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
     *arg++ = getenv("LIBXL_SAVE_HELPER") ?: LIBEXEC_BIN "/" "libxl-save-helper";
     *arg++ = mode_arg;
     const char **stream_fd_arg = arg++;
+    const char **back_channel_fd_arg = arg++;
     for (i=0; i<num_argnums; i++)
         *arg++ = GCSPRINTF("%lu", argnums[i]);
     *arg++ = 0;
@@ -206,6 +210,11 @@ static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
         stream_fd = dup_cloexec(gc, stream_fd, "migration stream fd");
         *stream_fd_arg = GCSPRINTF("%d", stream_fd);
 
+        if (back_channel_fd >= 0)
+            back_channel_fd = dup_cloexec(gc, back_channel_fd,
+                                          "migration back channel fd");
+        *back_channel_fd_arg = GCSPRINTF("%d", back_channel_fd);
+
         for (i=0; i<num_preserve_fds; i++)
             if (preserve_fds[i] >= 0) {
                 assert(preserve_fds[i] > 2);
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index 0fd7022..5fe642a 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -238,6 +238,7 @@ static struct restore_callbacks helper_restore_callbacks;
 int main(int argc, char **argv)
 {
     int r;
+    int send_back_fd, recv_fd;
 
 #define NEXTARG (++argv, assert(*argv), *argv)
 
@@ -247,6 +248,7 @@ int main(int argc, char **argv)
     if (!strcmp(mode,"--save-domain")) {
 
         io_fd =                             atoi(NEXTARG);
+        recv_fd =                           atoi(NEXTARG);
         uint32_t dom =                      strtoul(NEXTARG,0,10);
         uint32_t max_iters =                strtoul(NEXTARG,0,10);
         uint32_t max_factor =               strtoul(NEXTARG,0,10);
@@ -262,12 +264,14 @@ int main(int argc, char **argv)
         setup_signals(save_signal_handler);
 
         r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags,
-                           &helper_save_callbacks, hvm, stream_type);
+                           &helper_save_callbacks, hvm, stream_type,
+                           recv_fd);
         complete(r);
 
     } else if (!strcmp(mode,"--restore-domain")) {
 
         io_fd =                             atoi(NEXTARG);
+        send_back_fd =                      atoi(NEXTARG);
         uint32_t dom =                      strtoul(NEXTARG,0,10);
         unsigned store_evtchn =             strtoul(NEXTARG,0,10);
         domid_t store_domid =               strtoul(NEXTARG,0,10);
@@ -292,7 +296,7 @@ int main(int argc, char **argv)
                               store_domid, console_evtchn, &console_mfn,
                               console_domid, hvm, pae, superpages,
                               stream_type,
-                              &helper_restore_callbacks);
+                              &helper_restore_callbacks, send_back_fd);
         helper_stub_restore_results(store_mfn,console_mfn,0);
         complete(r);
 
-- 
1.9.3
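
With this patch the helper's positional arguments become: helper path, mode, stream fd, back channel fd, then the mode-specific numbers. The sketch below shows one way a receiver could parse the two descriptor arguments, treating -1 as "no back channel". It is an assumption-laden illustration: struct helper_fds, parse_fd() and parse_helper_fds() are hypothetical names, and strtol() with validation is swapped in for the atoi() calls used in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

struct helper_fds {
    int stream_fd;        /* migration stream, must be a valid fd */
    int back_channel_fd;  /* -1 when no back channel is in use */
};

/* Parse one decimal fd argument; returns 0 on success, -1 on bad input. */
static int parse_fd(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (errno || end == s || *end != '\0' || v < -1 || v > INT_MAX)
        return -1;
    *out = (int)v;
    return 0;
}

/*
 * argv[0] = helper path, argv[1] = mode, argv[2] = stream fd,
 * argv[3] = back channel fd; anything after that is mode-specific.
 */
static int parse_helper_fds(int argc, char **argv, struct helper_fds *fds)
{
    assert(fds != NULL);

    if (argc < 4)
        return -1;
    if (strcmp(argv[1], "--save-domain") &&
        strcmp(argv[1], "--restore-domain"))
        return -1;
    if (parse_fd(argv[2], &fds->stream_fd) || fds->stream_fd < 0)
        return -1;
    return parse_fd(argv[3], &fds->back_channel_fd);
}
```

Rejecting malformed numbers up front is a design choice of this sketch only; the patch itself relies on libxl always generating well-formed arguments.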





* [PATCH v13 06/26] docs: add colo readme
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (4 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 05/26] tools/libx{l, c}: add back channel to libxc Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 07/26] docs/libxl: Introduce CHECKPOINT_CONTEXT to support migration v2 colo streams Changlong Xie
                   ` (22 subsequent siblings)
  28 siblings, 0 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

Add a COLO README. For details, refer to
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
 docs/README.colo | 9 +++++++++
 1 file changed, 9 insertions(+)
 create mode 100644 docs/README.colo

diff --git a/docs/README.colo b/docs/README.colo
new file mode 100644
index 0000000..466eb72
--- /dev/null
+++ b/docs/README.colo
@@ -0,0 +1,9 @@
+COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop Service)
+is a high availability solution. The primary VM (PVM) and the secondary VM
+(SVM) run in parallel. They receive the same requests from the client and
+generate responses in parallel. If the response packets from the PVM and SVM
+are identical, they are released immediately. Otherwise, a VM checkpoint (on
+demand) is conducted.
+
+See the website at http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
+for details.
-- 
1.9.3





* [PATCH v13 07/26] docs/libxl: Introduce CHECKPOINT_CONTEXT to support migration v2 colo streams
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (5 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 06/26] docs: add colo readme Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 08/26] libxc/migration: Specification update for DIRTY_PFN_LIST records Changlong Xie
                   ` (21 subsequent siblings)
  28 siblings, 0 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 docs/specs/libxl-migration-stream.pandoc | 59 ++++++++++++++++++++++++++++++--
 tools/libxl/libxl_sr_stream_format.h     | 11 ++++++
 tools/python/xen/migration/libxl.py      |  9 +++++
 3 files changed, 77 insertions(+), 2 deletions(-)

diff --git a/docs/specs/libxl-migration-stream.pandoc b/docs/specs/libxl-migration-stream.pandoc
index 2c97d86..a1ba1ac 100644
--- a/docs/specs/libxl-migration-stream.pandoc
+++ b/docs/specs/libxl-migration-stream.pandoc
@@ -1,6 +1,8 @@
 % LibXenLight Domain Image Format
 % Andrew Cooper <<andrew.cooper3@citrix.com>>
-% Revision 1
+  Wen Congyang <<wency@cn.fujitsu.com>>
+  Yang Hongyang <<hongyang.yang@easystack.cn>>
+% Revision 2
 
 Introduction
 ============
@@ -119,7 +121,9 @@ type         0x00000000: END
 
              0x00000004: CHECKPOINT_END
 
-             0x00000005 - 0x7FFFFFFF: Reserved for future _mandatory_
+             0x00000005: CHECKPOINT_STATE
+
+             0x00000006 - 0x7FFFFFFF: Reserved for future _mandatory_
              records.
 
              0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
@@ -249,6 +253,57 @@ A checkpoint end record marks the end of a checkpoint in the image.
 The end record contains no fields; its body_length is 0.
 
 
+CHECKPOINT\_STATE
+--------------
+
+A checkpoint state record contains the control information for a checkpoint.
+It is only used by COLO; for more detail, see README.colo.
+
+     0     1     2     3     4     5     6     7 octet
+    +------------------------+------------------------+
+    | control_id             | padding                |
+    +------------------------+------------------------+
+
+--------------------------------------------------------------------
+Field            Description
+------------     ---------------------------------------------------
+control_id       0x00000000: Secondary VM is out of sync, start a new checkpoint
+                 (Primary -> Secondary)
+
+                 0x00000001: Secondary VM is suspended (Secondary -> Primary)
+
+                 0x00000002: Secondary VM is ready (Secondary -> Primary)
+
+                 0x00000003: Secondary VM is resumed (Secondary -> Primary)
+
+--------------------------------------------------------------------
+
+In COLO, the primary runs the following loop:
+
+1. Suspend primary vm
+    a. Suspend primary vm
+    b. Read _CHECKPOINT\_SVM\_SUSPENDED_ sent by secondary
+2. Checkpoint
+3. Resume primary vm
+    a. Read _CHECKPOINT\_SVM\_READY_ from secondary
+    b. Resume primary vm
+    c. Read _CHECKPOINT\_SVM\_RESUMED_ from secondary
+4. Wait for a new checkpoint
+    a. Send _CHECKPOINT\_NEW_ to secondary
+
+Meanwhile, the secondary runs the following loop:
+
+1. Resume secondary vm
+    a. Send _CHECKPOINT\_SVM\_READY_ to primary
+    b. Resume secondary vm
+    c. Send _CHECKPOINT\_SVM\_RESUMED_ to primary
+2. Wait for a new checkpoint
+    a. Read _CHECKPOINT\_NEW_ from primary
+3. Suspend secondary vm
+    a. Suspend secondary vm
+    b. Send _CHECKPOINT\_SVM\_SUSPENDED_ to primary
+4. Checkpoint
+
 Future Extensions
 =================
 
diff --git a/tools/libxl/libxl_sr_stream_format.h b/tools/libxl/libxl_sr_stream_format.h
index 54da360..75f5190 100644
--- a/tools/libxl/libxl_sr_stream_format.h
+++ b/tools/libxl/libxl_sr_stream_format.h
@@ -36,6 +36,7 @@ typedef struct libxl__sr_rec_hdr
 #define REC_TYPE_EMULATOR_XENSTORE_DATA 0x00000002U
 #define REC_TYPE_EMULATOR_CONTEXT       0x00000003U
 #define REC_TYPE_CHECKPOINT_END         0x00000004U
+#define REC_TYPE_CHECKPOINT_STATE       0x00000005U
 
 typedef struct libxl__sr_emulator_hdr
 {
@@ -47,6 +48,16 @@ typedef struct libxl__sr_emulator_hdr
 #define EMULATOR_QEMU_TRADITIONAL    0x00000001U
 #define EMULATOR_QEMU_UPSTREAM       0x00000002U
 
+typedef struct libxl_sr_checkpoint_state
+{
+    uint32_t id;
+} libxl_sr_checkpoint_state;
+
+#define CHECKPOINT_NEW               0x00000000U
+#define CHECKPOINT_SVM_SUSPENDED     0x00000001U
+#define CHECKPOINT_SVM_READY         0x00000002U
+#define CHECKPOINT_SVM_RESUMED       0x00000003U
+
 #endif /* LIBXL__SR_STREAM_FORMAT_H */
 
 /*
diff --git a/tools/python/xen/migration/libxl.py b/tools/python/xen/migration/libxl.py
index fc0acf6..d5f54dc 100644
--- a/tools/python/xen/migration/libxl.py
+++ b/tools/python/xen/migration/libxl.py
@@ -37,6 +37,7 @@ REC_TYPE_libxc_context          = 0x00000001
 REC_TYPE_emulator_xenstore_data = 0x00000002
 REC_TYPE_emulator_context       = 0x00000003
 REC_TYPE_checkpoint_end         = 0x00000004
+REC_TYPE_checkpoint_state       = 0x00000005
 
 rec_type_to_str = {
     REC_TYPE_end                    : "End",
@@ -44,6 +45,7 @@ rec_type_to_str = {
     REC_TYPE_emulator_xenstore_data : "Emulator xenstore data",
     REC_TYPE_emulator_context       : "Emulator context",
     REC_TYPE_checkpoint_end         : "Checkpoint end",
+    REC_TYPE_checkpoint_state       : "Checkpoint state"
 }
 
 # emulator_* header
@@ -212,6 +214,11 @@ class VerifyLibxl(VerifyBase):
         if len(content) != 0:
             raise RecordError("Checkpoint end record with non-zero length")
 
+    def verify_record_checkpoint_state(self, content):
+        """ Checkpoint state """
+        if len(content) == 0:
+            raise RecordError("Checkpoint state record with zero length")
+
 
 record_verifiers = {
     REC_TYPE_end:
@@ -224,4 +231,6 @@ record_verifiers = {
         VerifyLibxl.verify_record_emulator_context,
     REC_TYPE_checkpoint_end:
         VerifyLibxl.verify_record_checkpoint_end,
+    REC_TYPE_checkpoint_state:
+        VerifyLibxl.verify_record_checkpoint_state,
 }
-- 
1.9.3





* [PATCH v13 08/26] libxc/migration: Specification update for DIRTY_PFN_LIST records
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (6 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 07/26] docs/libxl: Introduce CHECKPOINT_CONTEXT to support migration v2 colo streams Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 09/26] libxc/migration: export read_record for common use Changlong Xie
                   ` (20 subsequent siblings)
  28 siblings, 0 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

Used by the secondary to send its dirty bitmap to the primary under COLO.
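
The record body specified by this patch is a bare array of 64-bit PFNs, with the count derived as record->length/sizeof(uint64_t). A minimal sketch of decoding such a body follows. The helper names are hypothetical, and the PFNs are assumed to already be in host byte order; this is not the libxc implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Number of PFNs in a record body of `length` bytes, or -1 if the
 * length is not a whole number of 64-bit entries.
 */
static long dirty_pfn_list_count(size_t length)
{
    if (length % sizeof(uint64_t))
        return -1;
    return (long)(length / sizeof(uint64_t));
}

/*
 * Fetch pfn[i] from the record body. memcpy() is used so the body
 * buffer does not need to be 8-byte aligned.
 */
static uint64_t dirty_pfn_at(const void *body, size_t i)
{
    uint64_t pfn;

    assert(body != NULL);
    memcpy(&pfn, (const unsigned char *)body + i * sizeof(pfn), sizeof(pfn));
    return pfn;
}
```

A receiver would first validate the length with dirty_pfn_list_count() and then walk indices 0 to count-1, marking each PFN in its own dirty bitmap.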

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 docs/specs/libxc-migration-stream.pandoc | 27 ++++++++++++-
 tools/libxc/xc_sr_common.c               | 31 ++++++++-------
 tools/libxc/xc_sr_stream_format.h        | 31 ++++++++-------
 tools/python/xen/migration/libxc.py      | 68 ++++++++++++++++++--------------
 4 files changed, 96 insertions(+), 61 deletions(-)

diff --git a/docs/specs/libxc-migration-stream.pandoc b/docs/specs/libxc-migration-stream.pandoc
index 8cd678f..31eba10 100644
--- a/docs/specs/libxc-migration-stream.pandoc
+++ b/docs/specs/libxc-migration-stream.pandoc
@@ -1,6 +1,8 @@
 % LibXenCtrl Domain Image Format
 % David Vrabel <<david.vrabel@citrix.com>>
   Andrew Cooper <<andrew.cooper3@citrix.com>>
+  Wen Congyang <<wency@cn.fujitsu.com>>
+  Yang Hongyang <<hongyang.yang@easystack.cn>>
 % Revision 1
 
 Introduction
@@ -227,7 +229,9 @@ type         0x00000000: END
 
              0x0000000E: CHECKPOINT
 
-             0x0000000F - 0x7FFFFFFF: Reserved for future _mandatory_
+             0x0000000F: CHECKPOINT_DIRTY_PFN_LIST (Secondary -> Primary)
+
+             0x00000010 - 0x7FFFFFFF: Reserved for future _mandatory_
              records.
 
              0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
@@ -599,6 +603,27 @@ CHECKPOINT record or an END record.
 
 \clearpage
 
+CHECKPOINT_DIRTY_PFN_LIST
+-------------------------
+
+A checkpoint dirty pfn list record is used to convey information about
+dirty memory in the VM. It is an unordered list of PFNs. It is currently
+only applicable in the backchannel of a checkpointed stream and is only
+used by COLO; for more detail, see README.colo.
+
+     0     1     2     3     4     5     6     7 octet
+    +-------------------------------------------------+
+    | pfn[0]                                          |
+    +-------------------------------------------------+
+    ...
+    +-------------------------------------------------+
+    | pfn[C-1]                                        |
+    +-------------------------------------------------+
+
+The count of pfns is: record->length/sizeof(uint64_t).
+
+\clearpage
+
 Layout
 ======
 
diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 945cfa6..3313a90 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -20,21 +20,22 @@ const char *dhdr_type_to_str(uint32_t type)
 
 static const char *mandatory_rec_types[] =
 {
-    [REC_TYPE_END]                  = "End",
-    [REC_TYPE_PAGE_DATA]            = "Page data",
-    [REC_TYPE_X86_PV_INFO]          = "x86 PV info",
-    [REC_TYPE_X86_PV_P2M_FRAMES]    = "x86 PV P2M frames",
-    [REC_TYPE_X86_PV_VCPU_BASIC]    = "x86 PV vcpu basic",
-    [REC_TYPE_X86_PV_VCPU_EXTENDED] = "x86 PV vcpu extended",
-    [REC_TYPE_X86_PV_VCPU_XSAVE]    = "x86 PV vcpu xsave",
-    [REC_TYPE_SHARED_INFO]          = "Shared info",
-    [REC_TYPE_TSC_INFO]             = "TSC info",
-    [REC_TYPE_HVM_CONTEXT]          = "HVM context",
-    [REC_TYPE_HVM_PARAMS]           = "HVM params",
-    [REC_TYPE_TOOLSTACK]            = "Toolstack",
-    [REC_TYPE_X86_PV_VCPU_MSRS]     = "x86 PV vcpu msrs",
-    [REC_TYPE_VERIFY]               = "Verify",
-    [REC_TYPE_CHECKPOINT]           = "Checkpoint",
+    [REC_TYPE_END]                          = "End",
+    [REC_TYPE_PAGE_DATA]                    = "Page data",
+    [REC_TYPE_X86_PV_INFO]                  = "x86 PV info",
+    [REC_TYPE_X86_PV_P2M_FRAMES]            = "x86 PV P2M frames",
+    [REC_TYPE_X86_PV_VCPU_BASIC]            = "x86 PV vcpu basic",
+    [REC_TYPE_X86_PV_VCPU_EXTENDED]         = "x86 PV vcpu extended",
+    [REC_TYPE_X86_PV_VCPU_XSAVE]            = "x86 PV vcpu xsave",
+    [REC_TYPE_SHARED_INFO]                  = "Shared info",
+    [REC_TYPE_TSC_INFO]                     = "TSC info",
+    [REC_TYPE_HVM_CONTEXT]                  = "HVM context",
+    [REC_TYPE_HVM_PARAMS]                   = "HVM params",
+    [REC_TYPE_TOOLSTACK]                    = "Toolstack",
+    [REC_TYPE_X86_PV_VCPU_MSRS]             = "x86 PV vcpu msrs",
+    [REC_TYPE_VERIFY]                       = "Verify",
+    [REC_TYPE_CHECKPOINT]                   = "Checkpoint",
+    [REC_TYPE_CHECKPOINT_DIRTY_PFN_LIST]    = "Checkpoint dirty pfn list",
 };
 
 const char *rec_type_to_str(uint32_t type)
diff --git a/tools/libxc/xc_sr_stream_format.h b/tools/libxc/xc_sr_stream_format.h
index 6d0f8fd..3291b25 100644
--- a/tools/libxc/xc_sr_stream_format.h
+++ b/tools/libxc/xc_sr_stream_format.h
@@ -60,21 +60,22 @@ struct xc_sr_rhdr
 /* Somewhat arbitrary - 8MB */
 #define REC_LENGTH_MAX                (8U << 20)
 
-#define REC_TYPE_END                  0x00000000U
-#define REC_TYPE_PAGE_DATA            0x00000001U
-#define REC_TYPE_X86_PV_INFO          0x00000002U
-#define REC_TYPE_X86_PV_P2M_FRAMES    0x00000003U
-#define REC_TYPE_X86_PV_VCPU_BASIC    0x00000004U
-#define REC_TYPE_X86_PV_VCPU_EXTENDED 0x00000005U
-#define REC_TYPE_X86_PV_VCPU_XSAVE    0x00000006U
-#define REC_TYPE_SHARED_INFO          0x00000007U
-#define REC_TYPE_TSC_INFO             0x00000008U
-#define REC_TYPE_HVM_CONTEXT          0x00000009U
-#define REC_TYPE_HVM_PARAMS           0x0000000aU
-#define REC_TYPE_TOOLSTACK            0x0000000bU
-#define REC_TYPE_X86_PV_VCPU_MSRS     0x0000000cU
-#define REC_TYPE_VERIFY               0x0000000dU
-#define REC_TYPE_CHECKPOINT           0x0000000eU
+#define REC_TYPE_END                        0x00000000U
+#define REC_TYPE_PAGE_DATA                  0x00000001U
+#define REC_TYPE_X86_PV_INFO                0x00000002U
+#define REC_TYPE_X86_PV_P2M_FRAMES          0x00000003U
+#define REC_TYPE_X86_PV_VCPU_BASIC          0x00000004U
+#define REC_TYPE_X86_PV_VCPU_EXTENDED       0x00000005U
+#define REC_TYPE_X86_PV_VCPU_XSAVE          0x00000006U
+#define REC_TYPE_SHARED_INFO                0x00000007U
+#define REC_TYPE_TSC_INFO                   0x00000008U
+#define REC_TYPE_HVM_CONTEXT                0x00000009U
+#define REC_TYPE_HVM_PARAMS                 0x0000000aU
+#define REC_TYPE_TOOLSTACK                  0x0000000bU
+#define REC_TYPE_X86_PV_VCPU_MSRS           0x0000000cU
+#define REC_TYPE_VERIFY                     0x0000000dU
+#define REC_TYPE_CHECKPOINT                 0x0000000eU
+#define REC_TYPE_CHECKPOINT_DIRTY_PFN_LIST  0x0000000fU
 
 #define REC_TYPE_OPTIONAL             0x80000000U
 
diff --git a/tools/python/xen/migration/libxc.py b/tools/python/xen/migration/libxc.py
index b0255ac..85a78f4 100644
--- a/tools/python/xen/migration/libxc.py
+++ b/tools/python/xen/migration/libxc.py
@@ -45,38 +45,40 @@ dhdr_type_to_str = {
 # Records
 RH_FORMAT = "II"
 
-REC_TYPE_end                  = 0x00000000
-REC_TYPE_page_data            = 0x00000001
-REC_TYPE_x86_pv_info          = 0x00000002
-REC_TYPE_x86_pv_p2m_frames    = 0x00000003
-REC_TYPE_x86_pv_vcpu_basic    = 0x00000004
-REC_TYPE_x86_pv_vcpu_extended = 0x00000005
-REC_TYPE_x86_pv_vcpu_xsave    = 0x00000006
-REC_TYPE_shared_info          = 0x00000007
-REC_TYPE_tsc_info             = 0x00000008
-REC_TYPE_hvm_context          = 0x00000009
-REC_TYPE_hvm_params           = 0x0000000a
-REC_TYPE_toolstack            = 0x0000000b
-REC_TYPE_x86_pv_vcpu_msrs     = 0x0000000c
-REC_TYPE_verify               = 0x0000000d
-REC_TYPE_checkpoint           = 0x0000000e
+REC_TYPE_end                        = 0x00000000
+REC_TYPE_page_data                  = 0x00000001
+REC_TYPE_x86_pv_info                = 0x00000002
+REC_TYPE_x86_pv_p2m_frames          = 0x00000003
+REC_TYPE_x86_pv_vcpu_basic          = 0x00000004
+REC_TYPE_x86_pv_vcpu_extended       = 0x00000005
+REC_TYPE_x86_pv_vcpu_xsave          = 0x00000006
+REC_TYPE_shared_info                = 0x00000007
+REC_TYPE_tsc_info                   = 0x00000008
+REC_TYPE_hvm_context                = 0x00000009
+REC_TYPE_hvm_params                 = 0x0000000a
+REC_TYPE_toolstack                  = 0x0000000b
+REC_TYPE_x86_pv_vcpu_msrs           = 0x0000000c
+REC_TYPE_verify                     = 0x0000000d
+REC_TYPE_checkpoint                 = 0x0000000e
+REC_TYPE_checkpoint_dirty_pfn_list  = 0x0000000f
 
 rec_type_to_str = {
-    REC_TYPE_end                  : "End",
-    REC_TYPE_page_data            : "Page data",
-    REC_TYPE_x86_pv_info          : "x86 PV info",
-    REC_TYPE_x86_pv_p2m_frames    : "x86 PV P2M frames",
-    REC_TYPE_x86_pv_vcpu_basic    : "x86 PV vcpu basic",
-    REC_TYPE_x86_pv_vcpu_extended : "x86 PV vcpu extended",
-    REC_TYPE_x86_pv_vcpu_xsave    : "x86 PV vcpu xsave",
-    REC_TYPE_shared_info          : "Shared info",
-    REC_TYPE_tsc_info             : "TSC info",
-    REC_TYPE_hvm_context          : "HVM context",
-    REC_TYPE_hvm_params           : "HVM params",
-    REC_TYPE_toolstack            : "Toolstack",
-    REC_TYPE_x86_pv_vcpu_msrs     : "x86 PV vcpu msrs",
-    REC_TYPE_verify               : "Verify",
-    REC_TYPE_checkpoint           : "Checkpoint",
+    REC_TYPE_end                        : "End",
+    REC_TYPE_page_data                  : "Page data",
+    REC_TYPE_x86_pv_info                : "x86 PV info",
+    REC_TYPE_x86_pv_p2m_frames          : "x86 PV P2M frames",
+    REC_TYPE_x86_pv_vcpu_basic          : "x86 PV vcpu basic",
+    REC_TYPE_x86_pv_vcpu_extended       : "x86 PV vcpu extended",
+    REC_TYPE_x86_pv_vcpu_xsave          : "x86 PV vcpu xsave",
+    REC_TYPE_shared_info                : "Shared info",
+    REC_TYPE_tsc_info                   : "TSC info",
+    REC_TYPE_hvm_context                : "HVM context",
+    REC_TYPE_hvm_params                 : "HVM params",
+    REC_TYPE_toolstack                  : "Toolstack",
+    REC_TYPE_x86_pv_vcpu_msrs           : "x86 PV vcpu msrs",
+    REC_TYPE_verify                     : "Verify",
+    REC_TYPE_checkpoint                 : "Checkpoint",
+    REC_TYPE_checkpoint_dirty_pfn_list  : "Checkpoint dirty pfn list",
 }
 
 # page_data
@@ -403,6 +405,10 @@ class VerifyLibxc(VerifyBase):
         if len(content) != 0:
             raise RecordError("Checkpoint record with non-zero length")
 
+    def verify_record_checkpoint_dirty_pfn_list(self, content):
+        """ checkpoint dirty pfn list """
+        raise RecordError("Found checkpoint dirty pfn list record in stream")
+
 
 record_verifiers = {
     REC_TYPE_end:
@@ -443,4 +449,6 @@ record_verifiers = {
         VerifyLibxc.verify_record_verify,
     REC_TYPE_checkpoint:
         VerifyLibxc.verify_record_checkpoint,
+    REC_TYPE_checkpoint_dirty_pfn_list:
+        VerifyLibxc.verify_record_checkpoint_dirty_pfn_list,
     }
-- 
1.9.3
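
The record layout described in the specification hunk above can be
exercised with a short Python sketch. It is illustrative only and not
part of this patch: the helper names parse_dirty_pfn_list() and
build_dirty_pfn_list() are hypothetical, and little-endian byte order
is an assumption made for the example.

```python
import struct

REC_TYPE_checkpoint_dirty_pfn_list = 0x0000000f  # value added by this patch
RH_FORMAT = "II"  # record header: type, length (as in libxc.py)


def parse_dirty_pfn_list(content):
    """Return the PFNs carried in a CHECKPOINT_DIRTY_PFN_LIST body.

    The count of pfns is record->length / sizeof(uint64_t).
    """
    if len(content) % 8 != 0:
        raise ValueError("body length %d is not a multiple of 8"
                         % len(content))
    count = len(content) // 8
    # Assumption: little-endian uint64_t values.
    return list(struct.unpack("<%dQ" % count, content))


def build_dirty_pfn_list(pfns):
    """Build header + body for a list of PFNs (hypothetical helper)."""
    body = struct.pack("<%dQ" % len(pfns), *pfns)
    header = struct.pack("<" + RH_FORMAT,
                         REC_TYPE_checkpoint_dirty_pfn_list, len(body))
    return header + body
```

Because every entry is exactly 8 octets, the body is already a
multiple of 8 octets long and needs no trailing padding.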




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v13 09/26] libxc/migration: export read_record for common use
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (7 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 08/26] libxc/migration: Specification update for DIRTY_PFN_LIST records Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 10/26] tools/libxl: add back channel support to write stream Changlong Xie
                   ` (19 subsequent siblings)
  28 siblings, 0 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

read_record() can be used by the primary to read the dirty bitmap
record sent by the secondary under COLO.
On the xc save side, the backchannel fd must be passed to
read_record() instead of ctx->fd, so add an fd parameter to it.
No functional changes.

CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
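For readers who want to see the read path outside of C, the logic of
read_record() with an explicit fd can be sketched in Python. This is
an illustration under stated assumptions, not the libxc code: the
8-octet alignment (REC_ALIGN) and the little-endian header are
assumptions, and the sketch trims the padding from the returned body,
whereas the C function keeps the padded buffer.

```python
import io
import struct

REC_LENGTH_MAX = 8 << 20  # from xc_sr_stream_format.h
REC_ALIGN = 8             # assumption: 1 << REC_ALIGN_ORDER, order 3


def read_exact(f, size):
    """Read exactly size bytes from f or raise IOError."""
    data = f.read(size)
    if len(data) != size:
        raise IOError("short read: wanted %d bytes, got %d"
                      % (size, len(data)))
    return data


def read_record(f):
    """Read one record from the file object f; return (type, body)."""
    rtype, length = struct.unpack("<II", read_exact(f, 8))
    if length > REC_LENGTH_MAX:
        raise ValueError("record 0x%08x length %#x exceeds max %#x"
                         % (rtype, length, REC_LENGTH_MAX))
    # Round the length up to the alignment, as ROUNDUP() does in C.
    padded = (length + REC_ALIGN - 1) & ~(REC_ALIGN - 1)
    body = read_exact(f, padded)[:length] if padded else b""
    return rtype, body


# The caller chooses which stream to read from, mirroring the new fd
# parameter: here a stand-in for the backchannel.
backchannel = io.BytesIO(struct.pack("<II", 0x0000000f, 8) +
                         struct.pack("<Q", 42))
rec_type, rec_body = read_record(backchannel)
```

Passing the file object explicitly is the point of the change: the
same reader serves both ctx->fd on the restore side and the
backchannel on the save side.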
 tools/libxc/xc_sr_common.c  | 49 +++++++++++++++++++++++++++++++++++
 tools/libxc/xc_sr_common.h  | 14 ++++++++++
 tools/libxc/xc_sr_restore.c | 63 +--------------------------------------------
 3 files changed, 64 insertions(+), 62 deletions(-)

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 3313a90..b228a15 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -89,6 +89,55 @@ int write_split_record(struct xc_sr_context *ctx, struct xc_sr_record *rec,
     return -1;
 }
 
+int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec)
+{
+    xc_interface *xch = ctx->xch;
+    struct xc_sr_rhdr rhdr;
+    size_t datasz;
+
+    if ( read_exact(fd, &rhdr, sizeof(rhdr)) )
+    {
+        PERROR("Failed to read Record Header from stream");
+        return -1;
+    }
+    else if ( rhdr.length > REC_LENGTH_MAX )
+    {
+        ERROR("Record (0x%08x, %s) length %#x exceeds max (%#x)", rhdr.type,
+              rec_type_to_str(rhdr.type), rhdr.length, REC_LENGTH_MAX);
+        return -1;
+    }
+
+    datasz = ROUNDUP(rhdr.length, REC_ALIGN_ORDER);
+
+    if ( datasz )
+    {
+        rec->data = malloc(datasz);
+
+        if ( !rec->data )
+        {
+            ERROR("Unable to allocate %zu bytes for record data (0x%08x, %s)",
+                  datasz, rhdr.type, rec_type_to_str(rhdr.type));
+            return -1;
+        }
+
+        if ( read_exact(fd, rec->data, datasz) )
+        {
+            free(rec->data);
+            rec->data = NULL;
+            PERROR("Failed to read %zu bytes of data for record (0x%08x, %s)",
+                   datasz, rhdr.type, rec_type_to_str(rhdr.type));
+            return -1;
+        }
+    }
+    else
+        rec->data = NULL;
+
+    rec->type   = rhdr.type;
+    rec->length = rhdr.length;
+
+    return 0;
+};
+
 static void __attribute__((unused)) build_assertions(void)
 {
     XC_BUILD_BUG_ON(sizeof(struct xc_sr_ihdr) != 24);
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index e7568b5..c990664 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -372,6 +372,20 @@ static inline int write_record(struct xc_sr_context *ctx,
 }
 
 /*
+ * Reads a record from the stream, and fills in the record structure.
+ *
+ * Returns 0 on success and non-0 on failure.
+ *
+ * On success, the record's type and size shall be valid.
+ * - If size is 0, data shall be NULL.
+ * - If size is non-0, data shall be a buffer allocated by malloc() which must
+ *   be passed to free() by the caller.
+ *
+ * On failure, the contents of the record structure are undefined.
+ */
+int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);
+
+/*
  * This would ideally be private in restore.c, but is needed by
  * x86_pv_localise_page() if we receive pagetables frames ahead of the
  * contents of the frames they point at.
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 2b9a0ea..3e4ca7f 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -69,67 +69,6 @@ static int read_headers(struct xc_sr_context *ctx)
 }
 
 /*
- * Reads a record from the stream, and fills in the record structure.
- *
- * Returns 0 on success and non-0 on failure.
- *
- * On success, the records type and size shall be valid.
- * - If size is 0, data shall be NULL.
- * - If size is non-0, data shall be a buffer allocated by malloc() which must
- *   be passed to free() by the caller.
- *
- * On failure, the contents of the record structure are undefined.
- */
-static int read_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
-{
-    xc_interface *xch = ctx->xch;
-    struct xc_sr_rhdr rhdr;
-    size_t datasz;
-
-    if ( read_exact(ctx->fd, &rhdr, sizeof(rhdr)) )
-    {
-        PERROR("Failed to read Record Header from stream");
-        return -1;
-    }
-    else if ( rhdr.length > REC_LENGTH_MAX )
-    {
-        ERROR("Record (0x%08x, %s) length %#x exceeds max (%#x)", rhdr.type,
-              rec_type_to_str(rhdr.type), rhdr.length, REC_LENGTH_MAX);
-        return -1;
-    }
-
-    datasz = ROUNDUP(rhdr.length, REC_ALIGN_ORDER);
-
-    if ( datasz )
-    {
-        rec->data = malloc(datasz);
-
-        if ( !rec->data )
-        {
-            ERROR("Unable to allocate %zu bytes for record data (0x%08x, %s)",
-                  datasz, rhdr.type, rec_type_to_str(rhdr.type));
-            return -1;
-        }
-
-        if ( read_exact(ctx->fd, rec->data, datasz) )
-        {
-            free(rec->data);
-            rec->data = NULL;
-            PERROR("Failed to read %zu bytes of data for record (0x%08x, %s)",
-                   datasz, rhdr.type, rec_type_to_str(rhdr.type));
-            return -1;
-        }
-    }
-    else
-        rec->data = NULL;
-
-    rec->type   = rhdr.type;
-    rec->length = rhdr.length;
-
-    return 0;
-};
-
-/*
  * Is a pfn populated?
  */
 static bool pfn_is_populated(const struct xc_sr_context *ctx, xen_pfn_t pfn)
@@ -650,7 +589,7 @@ static int restore(struct xc_sr_context *ctx)
 
     do
     {
-        rc = read_record(ctx, &rec);
+        rc = read_record(ctx, ctx->fd, &rec);
         if ( rc )
         {
             if ( ctx->restore.buffer_all_records )
-- 
1.9.3





* [PATCH v13 10/26] tools/libxl: add back channel support to write stream
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (8 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 09/26] libxc/migration: export read_record for common use Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 11/26] tools/libxl: add back channel support to read stream Changlong Xie
                   ` (18 subsequent siblings)
  28 siblings, 0 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

Add back channel support to the write stream. A back channel write
stream is used by the Secondary to send records back to the Primary.

Note: The function libxl__stream_write_checkpoint_state() is used by
the later patches "secondary vm suspend/resume/checkpoint code" and
"primary vm suspend/resume/checkpoint code".

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
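The control flow this patch adds can be modelled in a few lines of
Python. This is a simplified, synchronous sketch and not the libxl
implementation: the class WriteStream and its "written" list are
hypothetical, and the real code writes asynchronously through a
datacopier before the callback runs.

```python
class WriteStream:
    """Toy model of libxl__stream_write_state for the back channel."""

    def __init__(self, back_channel, checkpoint_callback):
        self.back_channel = back_channel
        self.checkpoint_callback = checkpoint_callback
        self.running = False
        self.in_checkpoint = False
        self.in_checkpoint_state = False
        self.written = []  # hypothetical stand-in for stream->fd

    def start(self):
        # Mirrors libxl__stream_write_start(): a back channel stream
        # is only put into the running state; nothing is written.
        self.running = True
        if self.back_channel:
            return
        self.written.append("stream header")

    def write_checkpoint_state(self, state_id):
        # Mirrors libxl__stream_write_checkpoint_state().
        assert self.running
        assert not self.in_checkpoint
        assert not self.in_checkpoint_state
        self.in_checkpoint_state = True
        self.written.append(("checkpoint state", state_id))
        self._checkpoint_state_done(0)

    def _checkpoint_state_done(self, rc):
        # Mirrors checkpoint_state_done(): clear the flag, then hand
        # the result to the caller's checkpoint callback.
        assert self.in_checkpoint_state
        self.in_checkpoint_state = False
        self.checkpoint_callback(self, rc)


results = []
stream = WriteStream(True, lambda s, rc: results.append(rc))
stream.start()
stream.write_checkpoint_state(3)
```

The in_checkpoint_state flag is what lets stream_complete() route a
failure to the checkpoint callback instead of tearing the stream down.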
 tools/libxl/libxl_dom_save.c     |   1 +
 tools/libxl/libxl_internal.h     |   6 +++
 tools/libxl/libxl_stream_write.c | 100 +++++++++++++++++++++++++++++++++++----
 3 files changed, 97 insertions(+), 10 deletions(-)

diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 4df86c3..cd324bb 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -405,6 +405,7 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
     dss->sws.ao  = dss->ao;
     dss->sws.dss = dss;
     dss->sws.fd  = dss->fd;
+    dss->sws.back_channel = false;
     dss->sws.completion_callback = stream_done;
 
     libxl__stream_write_start(egc, &dss->sws);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 42d8bbd..0e5f9f8 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3060,6 +3060,7 @@ struct libxl__stream_write_state {
     libxl__ao *ao;
     libxl__domain_save_state *dss;
     int fd;
+    bool back_channel;
     void (*completion_callback)(libxl__egc *egc,
                                 libxl__stream_write_state *sws,
                                 int rc);
@@ -3071,6 +3072,7 @@ struct libxl__stream_write_state {
     bool running;
     bool in_checkpoint;
     bool sync_teardown;  /* Only used to coordinate shutdown on error path. */
+    bool in_checkpoint_state;
     libxl__save_helper_state shs;
 
     /* Main stream-writing data. */
@@ -3094,6 +3096,10 @@ _hidden void libxl__stream_write_start(libxl__egc *egc,
 _hidden void
 libxl__stream_write_start_checkpoint(libxl__egc *egc,
                                      libxl__stream_write_state *stream);
+_hidden void
+libxl__stream_write_checkpoint_state(libxl__egc *egc,
+                                     libxl__stream_write_state *stream,
+                                     libxl_sr_checkpoint_state *srcs);
 _hidden void libxl__stream_write_abort(libxl__egc *egc,
                                        libxl__stream_write_state *stream,
                                        int rc);
diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
index f6ea55d..aba554b 100644
--- a/tools/libxl/libxl_stream_write.c
+++ b/tools/libxl/libxl_stream_write.c
@@ -49,6 +49,13 @@
  *  - if (hvm)
  *      - Emulator context record
  *  - Checkpoint end record
+ *
+ * For back channel stream:
+ * - libxl__stream_write_start()
+ *    - Set up the stream to running state
+ *
+ * - libxl__stream_write_checkpoint_state()
+ *    - Write the record, then call stream->checkpoint_callback().
  */
 
 /* Success/error/cleanup handling. */
@@ -91,6 +98,12 @@ static void write_checkpoint_end_record(libxl__egc *egc,
 static void checkpoint_end_record_done(libxl__egc *egc,
                                        libxl__stream_write_state *stream);
 
+/* checkpoint state */
+static void write_checkpoint_state_done(libxl__egc *egc,
+                                        libxl__stream_write_state *stream);
+static void checkpoint_state_done(libxl__egc *egc,
+                                  libxl__stream_write_state *stream, int rc);
+
 /*----- Helpers -----*/
 
 static void write_done(libxl__egc *egc,
@@ -225,6 +238,17 @@ void libxl__stream_write_start(libxl__egc *egc,
 
     stream->running = true;
 
+    dc->ao        = ao;
+    dc->readfd    = -1;
+    dc->writewhat = "stream header";
+    dc->copywhat  = "save v2 stream";
+    dc->writefd   = stream->fd;
+    dc->maxsz     = -1;
+    dc->callback  = stream_header_done;
+
+    if (stream->back_channel)
+        return;
+
     if (dss->type == LIBXL_DOMAIN_TYPE_HVM) {
         stream->device_model_version =
             libxl__device_model_version_running(gc, dss->domid);
@@ -249,14 +273,6 @@ void libxl__stream_write_start(libxl__egc *egc,
         stream->emu_sub_hdr.index = 0;
     }
 
-    dc->ao        = ao;
-    dc->readfd    = -1;
-    dc->writewhat = "stream header";
-    dc->copywhat  = "save v2 stream";
-    dc->writefd   = stream->fd;
-    dc->maxsz     = -1;
-    dc->callback  = stream_header_done;
-
     rc = libxl__datacopier_start(dc);
     if (rc)
         goto err;
@@ -279,6 +295,7 @@ void libxl__stream_write_start_checkpoint(libxl__egc *egc,
 {
     assert(stream->running);
     assert(!stream->in_checkpoint);
+    assert(!stream->back_channel);
     stream->in_checkpoint = true;
 
     write_emulator_xenstore_record(egc, stream);
@@ -577,6 +594,21 @@ static void stream_complete(libxl__egc *egc,
         return;
     }
 
+    if (stream->in_checkpoint_state) {
+        assert(rc);
+
+        /*
+         * If an error is encountered while in a checkpoint, pass it
+         * back to libxc.  The failure will come back around to us via
+         * 1. normal stream
+         *    libxl__xc_domain_save_done()
+         * 2. back_channel stream
+         *    libxl__stream_write_abort()
+         */
+        checkpoint_state_done(egc, stream, rc);
+        return;
+    }
+
     stream_done(egc, stream, rc);
 }
 
@@ -584,13 +616,24 @@ static void stream_done(libxl__egc *egc,
                         libxl__stream_write_state *stream, int rc)
 {
     assert(stream->running);
+    assert(!stream->in_checkpoint_state);
     stream->running = false;
 
     if (stream->emu_carefd)
         libxl__carefd_close(stream->emu_carefd);
     free(stream->emu_body);
 
-    check_all_finished(egc, stream, rc);
+    if (!stream->back_channel) {
+        /*
+         * A back channel write stream is only used by the secondary
+         * (restore side) to send records back, so it has no save
+         * helper.  stream->running was set to false above, so the
+         * stream itself is no longer in use either.  With nothing
+         * left to wait for, check_all_finished() only needs to be
+         * invoked for a normal stream.
+         */
+        check_all_finished(egc, stream, rc);
+    }
 }
 
 static void checkpoint_done(libxl__egc *egc,
@@ -642,7 +685,44 @@ static void check_all_finished(libxl__egc *egc,
         libxl__save_helper_inuse(&stream->shs))
         return;
 
-    stream->completion_callback(egc, stream, stream->rc);
+    if (stream->completion_callback)
+        /* back channel stream doesn't have completion_callback() */
+        stream->completion_callback(egc, stream, stream->rc);
+}
+
+/*----- checkpoint state -----*/
+
+void libxl__stream_write_checkpoint_state(libxl__egc *egc,
+                                          libxl__stream_write_state *stream,
+                                          libxl_sr_checkpoint_state *srcs)
+{
+    struct libxl__sr_rec_hdr rec;
+
+    assert(stream->running);
+    assert(!stream->in_checkpoint);
+    assert(!stream->in_checkpoint_state);
+    stream->in_checkpoint_state = true;
+
+    FILLZERO(rec);
+    rec.type = REC_TYPE_CHECKPOINT_STATE;
+    rec.length = sizeof(*srcs);
+
+    setup_write(egc, stream, "checkpoint state", &rec,
+                srcs, write_checkpoint_state_done);
+}
+
+static void write_checkpoint_state_done(libxl__egc *egc,
+                                        libxl__stream_write_state *stream)
+{
+    checkpoint_state_done(egc, stream, 0);
+}
+
+static void checkpoint_state_done(libxl__egc *egc,
+                                  libxl__stream_write_state *stream, int rc)
+{
+    assert(stream->in_checkpoint_state);
+    stream->in_checkpoint_state = false;
+    stream->checkpoint_callback(egc, stream, rc);
 }
 
 /*
-- 
1.9.3





* [PATCH v13 11/26] tools/libxl: add back channel support to read stream
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (9 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 10/26] tools/libxl: add back channel support to write stream Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 12/26] secondary vm suspend/resume/checkpoint code Changlong Xie
                   ` (17 subsequent siblings)
  28 siblings, 0 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

This is used by the primary to read records sent by the secondary.

Note: The function libxl__stream_read_checkpoint_state() is used by
the later patches "secondary vm suspend/resume/checkpoint code" and
"primary vm suspend/resume/checkpoint code".

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
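The read-side handling of a CHECKPOINT_STATE record can be modelled
with a small Python sketch. It is an illustration only: the class
ReadStream is hypothetical, and the numeric values used for
REC_TYPE_CHECKPOINT_STATE and ERROR_FAIL are placeholders chosen for
the example rather than values taken from this patch.

```python
REC_TYPE_CHECKPOINT_STATE = 0x05  # placeholder value (assumption)
ERROR_FAIL = -3                   # placeholder error code (assumption)


class ReadStream:
    """Toy model of the back channel path in libxl_stream_read.c."""

    def __init__(self, checkpoint_callback):
        self.running = True
        self.in_checkpoint_state = False
        self.checkpoint_callback = checkpoint_callback

    def read_checkpoint_state(self):
        # Mirrors libxl__stream_read_checkpoint_state(): arm the
        # stream so that the next record may be a CHECKPOINT_STATE.
        assert self.running
        assert not self.in_checkpoint_state
        self.in_checkpoint_state = True

    def process_record(self, rec_type, state_id):
        # Only the CHECKPOINT_STATE case is modelled; the record is
        # rejected unless the stream was armed first.
        if (rec_type == REC_TYPE_CHECKPOINT_STATE
                and self.in_checkpoint_state):
            self._checkpoint_state_done(state_id)
        else:
            self._stream_complete(ERROR_FAIL)

    def _stream_complete(self, rc):
        # While armed, an error goes to the checkpoint callback;
        # otherwise the stream simply stops running.
        if self.in_checkpoint_state:
            self._checkpoint_state_done(rc)
        else:
            self.running = False

    def _checkpoint_state_done(self, value):
        assert self.in_checkpoint_state
        self.in_checkpoint_state = False
        self.checkpoint_callback(value)


seen = []
stream = ReadStream(seen.append)
stream.read_checkpoint_state()
stream.process_record(REC_TYPE_CHECKPOINT_STATE, 7)
```

Note that on success the patch passes srcs->id to the checkpoint
callback in the position otherwise used for rc, which the sketch
reproduces by forwarding state_id.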
 tools/libxl/libxl_create.c      |  1 +
 tools/libxl/libxl_internal.h    |  4 ++
 tools/libxl/libxl_stream_read.c | 94 +++++++++++++++++++++++++++++++++++++----
 3 files changed, 91 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 09f2f13..4d2b95c 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1034,6 +1034,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     dcs->srs.dcs = dcs;
     dcs->srs.fd = restore_fd;
     dcs->srs.legacy = (dcs->restore_params.stream_version == 1);
+    dcs->srs.back_channel = false;
     dcs->srs.completion_callback = domcreate_stream_done;
 
     if (restore_fd >= 0) {
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 0e5f9f8..1fafba8 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3444,6 +3444,7 @@ struct libxl__stream_read_state {
     libxl__domain_create_state *dcs;
     int fd;
     bool legacy;
+    bool back_channel;
     void (*completion_callback)(libxl__egc *egc,
                                 libxl__stream_read_state *srs,
                                 int rc);
@@ -3455,6 +3456,7 @@ struct libxl__stream_read_state {
     bool running;
     bool in_checkpoint;
     bool sync_teardown; /* Only used to coordinate shutdown on error path. */
+    bool in_checkpoint_state;
     libxl__save_helper_state shs;
     libxl__conversion_helper_state chs;
 
@@ -3482,6 +3484,8 @@ _hidden void libxl__stream_read_start(libxl__egc *egc,
                                       libxl__stream_read_state *stream);
 _hidden void libxl__stream_read_start_checkpoint(libxl__egc *egc,
                                                  libxl__stream_read_state *stream);
+_hidden void libxl__stream_read_checkpoint_state(libxl__egc *egc,
+                                                 libxl__stream_read_state *stream);
 _hidden void libxl__stream_read_abort(libxl__egc *egc,
                                       libxl__stream_read_state *stream, int rc);
 static inline bool
diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index f4781eb..302ae53 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -118,6 +118,15 @@
  *    record, and therefore the buffered state is inconsistent. In
  *    libxl__xc_domain_restore_done(), we just complete the stream and
  *    stream->completion_callback() will be called to resume the guest
+ *
+ * For back channel stream:
+ * - libxl__stream_read_start()
+ *    - Set up the stream to running state
+ *
+ * - libxl__stream_read_continue()
+ *     - Set up reading the next record from a started stream.
+ *       Handle the record in process_record(), then call
+ *       stream->checkpoint_callback() to return.
  */
 
 /* Success/error/cleanup handling. */
@@ -157,6 +166,10 @@ static void write_emulator_done(libxl__egc *egc,
                                 libxl__datacopier_state *dc,
                                 int rc, int onwrite, int errnoval);
 
+/* Handlers for checkpoint state mini-loop */
+static void checkpoint_state_done(libxl__egc *egc,
+                                  libxl__stream_read_state *stream, int rc);
+
 /*----- Helpers -----*/
 
 /* Helper to set up reading some data from the stream. */
@@ -221,6 +234,14 @@ void libxl__stream_read_start(libxl__egc *egc,
     stream->running = true;
     stream->phase   = SRS_PHASE_NORMAL;
 
+    dc->ao       = stream->ao;
+    dc->copywhat = "restore v2 stream";
+    dc->readfd   = stream->fd;
+    dc->writefd  = -1;
+
+    if (stream->back_channel)
+        return;
+
     if (stream->legacy) {
         /* Convert the legacy stream. */
         libxl__conversion_helper_state *chs = &stream->chs;
@@ -241,12 +262,6 @@ void libxl__stream_read_start(libxl__egc *egc,
         stream->fd = libxl__carefd_fd(stream->chs.v2_carefd);
         stream->dcs->libxc_fd = stream->fd;
     }
-    /* stream->fd is now a v2 stream. */
-
-    dc->ao       = stream->ao;
-    dc->copywhat = "restore v2 stream";
-    dc->readfd   = stream->fd;
-    dc->writefd  = -1;
 
     /* Start reading the stream header. */
     rc = setup_read(stream, "stream header",
@@ -532,6 +547,7 @@ static bool process_record(libxl__egc *egc,
     STATE_AO_GC(stream->ao);
     libxl__domain_create_state *dcs = stream->dcs;
     libxl__sr_record_buf *rec;
+    libxl_sr_checkpoint_state *srcs;
     bool further_action_needed = false;
     int rc = 0;
 
@@ -602,6 +618,17 @@ static bool process_record(libxl__egc *egc,
         checkpoint_done(egc, stream, 0);
         break;
 
+    case REC_TYPE_CHECKPOINT_STATE:
+        if (!stream->in_checkpoint_state) {
+            LOG(ERROR, "Unexpected CHECKPOINT_STATE record in stream");
+            rc = ERROR_FAIL;
+            goto err;
+        }
+
+        srcs = rec->body;
+        checkpoint_state_done(egc, stream, srcs->id);
+        break;
+
     default:
         LOG(ERROR, "Unrecognised record 0x%08x", rec->hdr.type);
         rc = ERROR_FAIL;
@@ -713,6 +740,21 @@ static void stream_complete(libxl__egc *egc,
         return;
     }
 
+    if (stream->in_checkpoint_state) {
+        assert(rc);
+
+        /*
+         * If an error is encountered while in a checkpoint, pass it
+         * back to libxc.  The failure will come back around to us via
+         * 1. normal stream
+         *    libxl__xc_domain_restore_done()
+         * 2. back_channel stream
+         *    libxl__stream_read_abort()
+         */
+        checkpoint_state_done(egc, stream, rc);
+        return;
+    }
+
     stream_done(egc, stream, rc);
 }
 
@@ -743,6 +785,7 @@ static void stream_done(libxl__egc *egc,
 
     assert(stream->running);
     assert(!stream->in_checkpoint);
+    assert(!stream->in_checkpoint_state);
     stream->running = false;
 
     if (stream->incoming_record)
@@ -762,7 +805,19 @@ static void stream_done(libxl__egc *egc,
     LIBXL_STAILQ_FOREACH_SAFE(rec, &stream->record_queue, entry, trec)
         free_record(rec);
 
-    check_all_finished(egc, stream, rc);
+    if (!stream->back_channel) {
+        /*
+         * 1. In stream_done(), stream->running is set to false, so
+         *    the stream itself is not in use.
+         * 2. Read stream is a back channel stream, this means it is
+         *    only used by primary(save side) to read records sent by
+         *    secondary(restore side), so it doesn't have restore helper.
+         * 3. Back channel stream doesn't support legacy stream, so
+         *    there is no conversion helper.
+         * So a back channel stream need not invoke check_all_finished().
+         */
+        check_all_finished(egc, stream, rc);
+    }
 }
 
 void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
@@ -862,7 +917,30 @@ static void check_all_finished(libxl__egc *egc,
         libxl__conversion_helper_inuse(&stream->chs))
         return;
 
-    stream->completion_callback(egc, stream, stream->rc);
+    if (stream->completion_callback)
+        /* back channel stream doesn't have completion_callback() */
+        stream->completion_callback(egc, stream, stream->rc);
+}
+
+/*----- Checkpoint state handlers -----*/
+
+void libxl__stream_read_checkpoint_state(libxl__egc *egc,
+                                         libxl__stream_read_state *stream)
+{
+    assert(stream->running);
+    assert(!stream->in_checkpoint);
+    assert(!stream->in_checkpoint_state);
+    stream->in_checkpoint_state = true;
+
+    setup_read_record(egc, stream);
+}
+
+static void checkpoint_state_done(libxl__egc *egc,
+                                  libxl__stream_read_state *stream, int rc)
+{
+    assert(stream->in_checkpoint_state);
+    stream->in_checkpoint_state = false;
+    stream->checkpoint_callback(egc, stream, rc);
 }
 
 /*
-- 
1.9.3





* [PATCH v13 12/26] secondary vm suspend/resume/checkpoint code
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (10 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 11/26] tools/libxl: add back channel support to read stream Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-30 14:07   ` Ian Jackson
  2016-03-25  6:44 ` [PATCH v13 13/26] libxl_internal: move stream read manipulations to right place Changlong Xie
                   ` (16 subsequent siblings)
  28 siblings, 1 reply; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

The secondary VM runs in COLO mode, so the following cycle is repeated
for every checkpoint:
1. Resume the secondary VM
   a. Send CHECKPOINT_SVM_READY to the master.
   b. If this is not the first resume, call libxl__checkpoint_devices_preresume().
   c. If this is the first resume (the resume right after live migration):
      - call libxl__xc_domain_restore_done() to build the secondary VM.
      - enable logdirty for the secondary VM.
      - call libxl__domain_resume() to resume the secondary VM.
      - call libxl__checkpoint_devices_setup() to set up the checkpoint devices.
   d. Send CHECKPOINT_SVM_RESUMED to the master.
2. Wait for a new checkpoint
   a. Call libxl__checkpoint_devices_commit().
   b. Read CHECKPOINT_NEW from the master.
3. Suspend the secondary VM
   a. Suspend the secondary VM.
   b. Call libxl__checkpoint_devices_postsuspend().
   c. Send CHECKPOINT_SVM_SUSPENDED to the master.
4. Checkpoint
   a. Read the emulator xenstore data and the emulator context.
   b. REC_TYPE_CHECKPOINT_END
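
For illustration only, the cycle above can be modelled as a small state
machine. This is a simplified sketch and not code from this patch: the
names svm_phase, svm_msg, svm_model, svm_model_init() and svm_model_step()
are hypothetical, and the real implementation is asynchronous and driven
by the libxc restore callbacks rather than by a synchronous step function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical phase names for the secondary side; not libxl identifiers. */
enum svm_phase { SVM_RESUME, SVM_WAIT_CHECKPOINT, SVM_SUSPEND, SVM_CHECKPOINT };

/* Control messages the secondary sends back to the master. */
enum svm_msg { MSG_NONE, MSG_SVM_READY, MSG_SVM_RESUMED, MSG_SVM_SUSPENDED };

struct svm_model {
    enum svm_phase phase;
    bool first_resume;    /* true until the resume after live migration is done */
    bool devices_setup;   /* set once the checkpoint devices have been set up */
};

void svm_model_init(struct svm_model *m)
{
    m->phase = SVM_RESUME;
    m->first_resume = true;
    m->devices_setup = false;
}

/*
 * Run one phase of the cycle. Messages sent to the master during the
 * phase are stored in out[]; the return value is how many were sent.
 */
size_t svm_model_step(struct svm_model *m, enum svm_msg out[2])
{
    size_t n = 0;

    out[0] = out[1] = MSG_NONE;

    switch (m->phase) {
    case SVM_RESUME:
        out[n++] = MSG_SVM_READY;
        if (m->first_resume) {
            /* build the VM, enable logdirty, resume, set up devices */
            m->devices_setup = true;
            m->first_resume = false;
        }
        /* on later resumes the device preresume hook would run here */
        out[n++] = MSG_SVM_RESUMED;
        m->phase = SVM_WAIT_CHECKPOINT;
        break;
    case SVM_WAIT_CHECKPOINT:
        /* commit devices, then block until CHECKPOINT_NEW arrives */
        m->phase = SVM_SUSPEND;
        break;
    case SVM_SUSPEND:
        /* suspend the VM, run the device postsuspend hook */
        out[n++] = MSG_SVM_SUSPENDED;
        m->phase = SVM_CHECKPOINT;
        break;
    case SVM_CHECKPOINT:
        /* read emulator state up to the checkpoint end record */
        m->phase = SVM_RESUME;
        break;
    }

    return n;
}
```

Calling svm_model_step() four times walks through one full cycle and
returns the model to the resume phase.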

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
 tools/libxc/include/xenguest.h     |   20 +
 tools/libxc/xc_sr_save.c           |    3 +-
 tools/libxl/Makefile               |    1 +
 tools/libxl/libxl_colo.h           |   55 ++
 tools/libxl/libxl_colo_restore.c   | 1029 ++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_create.c         |   45 ++
 tools/libxl/libxl_internal.h       |   10 +-
 tools/libxl/libxl_save_callout.c   |    6 +-
 tools/libxl/libxl_save_msgs_gen.pl |   11 +-
 tools/libxl/libxl_stream_read.c    |   12 +
 tools/libxl/libxl_types.idl        |    1 +
 11 files changed, 1180 insertions(+), 13 deletions(-)
 create mode 100644 tools/libxl/libxl_colo.h
 create mode 100644 tools/libxl/libxl_colo_restore.c

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index b4f4bfb..3193d0f 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -78,6 +78,7 @@ struct save_callbacks {
 typedef enum {
     XC_MIG_STREAM_NONE, /* plain stream */
     XC_MIG_STREAM_REMUS,
+    XC_MIG_STREAM_COLO,
 } xc_migration_stream_t;
 
 /**
@@ -97,6 +98,16 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
 
 /* callbacks provided by xc_domain_restore */
 struct restore_callbacks {
+    /* Called after a new checkpoint to suspend the guest.
+     */
+    int (*suspend)(void* data);
+
+    /* Called after the secondary vm is ready to resume.
+     * Callback function resumes the guest & the device model,
+     * returns to xc_domain_restore.
+     */
+    int (*postcopy)(void* data);
+
     /* A checkpoint record has been found in the stream.
      * returns: */
 #define XGR_CHECKPOINT_ERROR    0 /* Terminate processing */
@@ -104,6 +115,15 @@ struct restore_callbacks {
 #define XGR_CHECKPOINT_FAILOVER 2 /* Failover and resume VM */
     int (*checkpoint)(void* data);
 
+    /*
+     * Called after the checkpoint callback.
+     *
+     * returns:
+     * 0: terminate checkpointing gracefully
+     * 1: take another checkpoint
+     */
+    int (*wait_checkpoint)(void* data);
+
     /* to be provided as the last argument to each callback function */
     void* data;
 };
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 1ccdbbb..d3d95d4 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -846,7 +846,8 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
 
     /* If altering migration_stream update this assert too. */
     assert(stream_type == XC_MIG_STREAM_NONE ||
-           stream_type == XC_MIG_STREAM_REMUS);
+           stream_type == XC_MIG_STREAM_REMUS ||
+           stream_type == XC_MIG_STREAM_COLO);
 
     /*
      * TODO: Find some time to better tweak the live migration algorithm.
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 8fa7b87..35a07a7 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -65,6 +65,7 @@ LIBXL_OBJS-y += libxl_no_convert_callout.o
 endif
 
 LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
+LIBXL_OBJS-y += libxl_colo_restore.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
new file mode 100644
index 0000000..f2b98cc
--- /dev/null
+++ b/tools/libxl/libxl_colo.h
@@ -0,0 +1,55 @@
+/*
+ * Copyright (C) 2016 FUJITSU LIMITED
+ * Author: Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#ifndef LIBXL_COLO_H
+#define LIBXL_COLO_H
+
+struct libxl__ao;
+struct libxl__egc;
+
+enum {
+    LIBXL_COLO_SETUPED,
+    LIBXL_COLO_SUSPENDED,
+    LIBXL_COLO_RESUMED,
+};
+
+typedef struct libxl__domain_create_state libxl__domain_create_state;
+typedef void libxl__domain_create_cb(struct libxl__egc *egc,
+                                     libxl__domain_create_state *dcs,
+                                     int rc, uint32_t domid);
+
+typedef struct libxl__colo_restore_state libxl__colo_restore_state;
+typedef void libxl__colo_callback(struct libxl__egc *egc,
+                                  libxl__colo_restore_state *crs, int rc);
+
+struct libxl__colo_restore_state {
+    /* must be set by the caller of libxl__colo_restore_(setup|teardown) */
+    struct libxl__ao *ao;
+    uint32_t domid;
+    int send_back_fd;
+    int recv_fd;
+    int hvm;
+    libxl__colo_callback *callback;
+
+    /* private, colo restore checkpoint state */
+    libxl__domain_create_cb *saved_cb;
+    void *crcs;
+};
+
+extern void libxl__colo_restore_setup(struct libxl__egc *egc,
+                                      libxl__colo_restore_state *crs);
+extern void libxl__colo_restore_teardown(struct libxl__egc *egc, void *dcs_void,
+                                         int ret, int retval, int errnoval);
+#endif
diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
new file mode 100644
index 0000000..a8f74a7
--- /dev/null
+++ b/tools/libxl/libxl_colo_restore.c
@@ -0,0 +1,1029 @@
+/*
+ * Copyright (C) 2016 FUJITSU LIMITED
+ * Author: Wen Congyang <wency@cn.fujitsu.com>
+ *         Yang Hongyang <hongyang.yang@easystack.cn>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+#include "libxl_sr_stream_format.h"
+
+typedef struct libxl__colo_restore_checkpoint_state libxl__colo_restore_checkpoint_state;
+struct libxl__colo_restore_checkpoint_state {
+    libxl__domain_suspend_state dsps;
+    libxl__logdirty_switch lds;
+    libxl__colo_restore_state *crs;
+    libxl__stream_write_state sws;
+    int status;
+    bool preresume;
+    /* used for teardown */
+    int teardown_devices;
+    int saved_rc;
+    char *state_file;
+
+    void (*callback)(libxl__egc *,
+                     libxl__colo_restore_checkpoint_state *,
+                     int);
+};
+
+static const libxl__checkpoint_device_instance_ops *colo_restore_ops[] = {
+    NULL,
+};
+
+/* ===================== colo: common functions ===================== */
+
+static void colo_enable_logdirty(libxl__colo_restore_state *crs, libxl__egc *egc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    const uint32_t domid = crs->domid;
+    libxl__logdirty_switch *const lds = &crcs->lds;
+
+    EGC_GC;
+
+    /* we need to know which pages are dirty to restore the guest */
+    if (xc_shadow_control(CTX->xch, domid,
+                          XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
+                          NULL, 0, NULL, 0, NULL) < 0) {
+        LOG(ERROR, "cannot enable secondary vm's logdirty");
+        lds->callback(egc, lds, ERROR_FAIL);
+        return;
+    }
+
+    if (crs->hvm) {
+        libxl__domain_common_switch_qemu_logdirty(egc, domid, 1, lds);
+        return;
+    }
+
+    lds->callback(egc, lds, 0);
+}
+
+static void colo_disable_logdirty(libxl__colo_restore_state *crs,
+                                  libxl__egc *egc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    const uint32_t domid = crs->domid;
+    libxl__logdirty_switch *const lds = &crcs->lds;
+
+    EGC_GC;
+
+    /* dirty page tracking is no longer needed, so switch logdirty off */
+    if (xc_shadow_control(CTX->xch, domid, XEN_DOMCTL_SHADOW_OP_OFF,
+                          NULL, 0, NULL, 0, NULL) < 0)
+        LOG(WARN, "cannot disable secondary vm's logdirty");
+
+    if (crs->hvm) {
+        libxl__domain_common_switch_qemu_logdirty(egc, domid, 0, lds);
+        return;
+    }
+
+    lds->callback(egc, lds, 0);
+}
+
+static void colo_resume_vm(libxl__egc *egc,
+                           libxl__colo_restore_checkpoint_state *crcs,
+                           int restore_device_model)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+    int rc;
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+
+    EGC_GC;
+
+    if (!crs->saved_cb) {
+        /* TODO: sync mmu for hvm? */
+        if (restore_device_model) {
+            rc = libxl__qmp_restore(gc, crs->domid, crcs->state_file);
+            if (rc) {
+                LOG(ERROR, "cannot restore device model for secondary vm");
+                crcs->callback(egc, crcs, rc);
+                return;
+            }
+        }
+        rc = libxl__domain_resume(gc, crs->domid, 0);
+        if (rc)
+            LOG(ERROR, "cannot resume secondary vm");
+
+        crcs->callback(egc, crcs, rc);
+        return;
+    }
+
+    /*
+     * TODO: get store gfn and console gfn
+     *  We should call the callback restore_results in
+     *  xc_domain_restore() before resuming the guest.
+     */
+    libxl__xc_domain_restore_done(egc, dcs, 0, 0, 0);
+
+    return;
+}
+
+static int init_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* init device subkind-specific state in the libxl ctx */
+    int rc;
+    STATE_AO_GC(cds->ao);
+
+    rc = 0;
+    return rc;
+}
+
+static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* cleanup device subkind-specific state in the libxl ctx */
+    STATE_AO_GC(cds->ao);
+}
+
+/* ================ colo: setup restore environment ================ */
+
+static void libxl__colo_domain_create_cb(libxl__egc *egc,
+                                         libxl__domain_create_state *dcs,
+                                         int rc, uint32_t domid);
+
+static int init_dsps(libxl__domain_suspend_state *dsps)
+{
+    int rc = ERROR_FAIL;
+    libxl_domain_type type;
+
+    STATE_AO_GC(dsps->ao);
+
+    libxl__xswait_init(&dsps->pvcontrol);
+    libxl__ev_evtchn_init(&dsps->guest_evtchn);
+    libxl__ev_xswatch_init(&dsps->guest_watch);
+    libxl__ev_time_init(&dsps->guest_timeout);
+
+    type = libxl__domain_type(gc, dsps->domid);
+    if (type == LIBXL_DOMAIN_TYPE_INVALID)
+        goto out;
+
+    dsps->type = type;
+
+    dsps->guest_evtchn.port = -1;
+    dsps->guest_evtchn_lockfd = -1;
+    dsps->guest_responded = 0;
+    dsps->dm_savefile = libxl__device_model_savefile(gc, dsps->domid);
+
+    /* Secondary vm is not created yet, so we cannot get evtchn port */
+
+    rc = 0;
+
+out:
+    return rc;
+}
+
+/*
+ * checkpoint callbacks are called in the following order:
+ * 1. resume
+ * 2. wait checkpoint
+ * 3. suspend
+ * 4. checkpoint
+ */
+static void libxl__colo_restore_domain_resume_callback(void *data);
+static void libxl__colo_restore_domain_wait_checkpoint_callback(void *data);
+static void libxl__colo_restore_domain_suspend_callback(void *data);
+static void libxl__colo_restore_domain_checkpoint_callback(void *data);
+
+void libxl__colo_restore_setup(libxl__egc *egc,
+                               libxl__colo_restore_state *crs)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs;
+    int rc = ERROR_FAIL;
+
+    /* Convenience aliases */
+    libxl__srm_restore_autogen_callbacks *const callbacks =
+        &dcs->srs.shs.callbacks.restore.a;
+    const int domid = crs->domid;
+
+    STATE_AO_GC(crs->ao);
+
+    GCNEW(crcs);
+    crs->crcs = crcs;
+    crcs->crs = crs;
+
+    /* setup dsps */
+    crcs->dsps.ao = ao;
+    crcs->dsps.domid = domid;
+    if (init_dsps(&crcs->dsps))
+        goto out;
+
+    callbacks->postcopy = libxl__colo_restore_domain_resume_callback;
+    callbacks->wait_checkpoint = libxl__colo_restore_domain_wait_checkpoint_callback;
+    callbacks->suspend = libxl__colo_restore_domain_suspend_callback;
+    callbacks->checkpoint = libxl__colo_restore_domain_checkpoint_callback;
+
+    /*
+     * Secondary vm is running in colo mode, so we need to call
+     * libxl__xc_domain_restore_done() to create secondary vm.
+     * But we will exit in domain_create_cb(). So replace the
+     * callback here.
+     */
+    crs->saved_cb = dcs->callback;
+    dcs->callback = libxl__colo_domain_create_cb;
+    crcs->state_file = GCSPRINTF(LIBXL_DEVICE_MODEL_RESTORE_FILE".%d", domid);
+    crcs->status = LIBXL_COLO_SETUPED;
+
+    libxl__logdirty_init(&crcs->lds);
+    crcs->lds.ao = ao;
+
+    crcs->sws.fd = crs->send_back_fd;
+    crcs->sws.ao = ao;
+    crcs->sws.back_channel = true;
+
+    dcs->cds.concrete_data = crs;
+
+    libxl__stream_write_start(egc, &crcs->sws);
+
+    rc = 0;
+
+out:
+    crs->callback(egc, crs, rc);
+    return;
+}
+
+static void libxl__colo_domain_create_cb(libxl__egc *egc,
+                                         libxl__domain_create_state *dcs,
+                                         int rc, uint32_t domid)
+{
+    libxl__colo_restore_checkpoint_state *crcs = dcs->crs.crcs;
+
+    crcs->callback(egc, crcs, rc);
+}
+
+/* ================ colo: teardown restore environment ================ */
+
+static void colo_restore_teardown_devices_done(libxl__egc *egc,
+    libxl__checkpoint_devices_state *cds, int rc);
+static void do_failover(libxl__egc *egc, libxl__colo_restore_state *crs);
+static void do_failover_done(libxl__egc *egc,
+                             libxl__colo_restore_checkpoint_state* crcs,
+                             int rc);
+static void colo_disable_logdirty_done(libxl__egc *egc,
+                                       libxl__logdirty_switch *lds,
+                                       int rc);
+static void libxl__colo_restore_teardown_done(libxl__egc *egc,
+                                              libxl__colo_restore_state *crs,
+                                              int rc);
+
+void libxl__colo_restore_teardown(libxl__egc *egc, void *dcs_void,
+                                  int ret, int retval, int errnoval)
+{
+    libxl__domain_create_state *dcs = dcs_void;
+    libxl__colo_restore_checkpoint_state *crcs = dcs->crs.crcs;
+    int rc = 1;
+
+    /* convenience aliases */
+    libxl__colo_restore_state *const crs = &dcs->crs;
+    EGC_GC;
+
+    if (ret == 0 && retval == 0)
+        rc = 0;
+
+    LOG(INFO, "%s", rc ? "colo fails" : "failover");
+
+    libxl__stream_write_abort(egc, &crcs->sws, 1);
+    if (crs->saved_cb) {
+        /* crcs->status is LIBXL_COLO_SETUPED */
+        dcs->srs.completion_callback = NULL;
+    }
+    libxl__xc_domain_restore_done(egc, dcs, ret, retval, errnoval);
+
+    crcs->saved_rc = rc;
+    if (!crcs->teardown_devices) {
+        colo_restore_teardown_devices_done(egc, &dcs->cds, 0);
+        return;
+    }
+
+    dcs->cds.callback = colo_restore_teardown_devices_done;
+    libxl__checkpoint_devices_teardown(egc, &dcs->cds);
+}
+
+static void colo_restore_teardown_devices_done(libxl__egc *egc,
+    libxl__checkpoint_devices_state *cds, int rc)
+{
+    libxl__colo_restore_state *crs = cds->concrete_data;
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+
+    EGC_GC;
+
+    if (rc)
+        LOG(ERROR, "COLO: failed to teardown device for guest with domid %u,"
+            " rc %d", cds->domid, rc);
+
+    if (crcs->teardown_devices)
+        cleanup_device_subkind(cds);
+
+    rc = crcs->saved_rc;
+    if (!rc) {
+        crcs->callback = do_failover_done;
+        do_failover(egc, crs);
+        return;
+    }
+
+    libxl__colo_restore_teardown_done(egc, crs, rc);
+}
+
+static void do_failover(libxl__egc *egc, libxl__colo_restore_state *crs)
+{
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    const int status = crcs->status;
+    libxl__logdirty_switch *const lds = &crcs->lds;
+
+    EGC_GC;
+
+    switch(status) {
+    case LIBXL_COLO_SETUPED:
+        /*
+         * We will come here only when reading emulator xenstore data or
+         * emulator context fails, and libxl__xc_domain_restore_done()
+         * is not called. In this case, the migration is not finished,
+         * so we cannot do failover.
+         */
+        LOG(ERROR, "migration fails");
+        crcs->callback(egc, crcs, ERROR_FAIL);
+        return;
+    case LIBXL_COLO_SUSPENDED:
+    case LIBXL_COLO_RESUMED:
+        /* disable logdirty first */
+        lds->callback = colo_disable_logdirty_done;
+        colo_disable_logdirty(crs, egc);
+        return;
+    default:
+        LOG(ERROR, "invalid status: %d", status);
+        crcs->callback(egc, crcs, ERROR_FAIL);
+    }
+}
+
+static void do_failover_done(libxl__egc *egc,
+                             libxl__colo_restore_checkpoint_state* crcs,
+                             int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+
+    EGC_GC;
+
+    if (rc)
+        LOG(ERROR, "cannot do failover");
+
+    libxl__colo_restore_teardown_done(egc, crs, rc);
+}
+
+static void colo_disable_logdirty_done(libxl__egc *egc,
+                                       libxl__logdirty_switch *lds,
+                                       int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(lds, *crcs, lds);
+
+    EGC_GC;
+
+    if (rc)
+        LOG(WARN, "cannot disable logdirty");
+
+    if (crcs->status == LIBXL_COLO_SUSPENDED) {
+        /*
+         * failover when reading state from master, so no need to
+         * call libxl__qmp_restore().
+         */
+        colo_resume_vm(egc, crcs, 0);
+        return;
+    }
+
+    /* If we cannot disable logdirty, we still can do failover */
+    crcs->callback(egc, crcs, 0);
+}
+
+static void libxl__colo_restore_teardown_done(libxl__egc *egc,
+                                              libxl__colo_restore_state *crs,
+                                              int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    EGC_GC;
+
+    /* convenience aliases */
+    const int domid = crs->domid;
+    const libxl_ctx *const ctx = libxl__gc_owner(gc);
+    xc_interface *const xch = ctx->xch;
+
+    if (!rc)
+        /* failover, no need to destroy the secondary vm */
+        goto out;
+
+    xc_domain_destroy(xch, domid);
+
+out:
+    if (crs->saved_cb) {
+        dcs->callback = crs->saved_cb;
+        crs->saved_cb = NULL;
+    }
+
+    dcs->callback(egc, dcs, rc, crs->domid);
+}
+
+static void colo_common_write_stream_done(libxl__egc *egc,
+                                          libxl__stream_write_state *stream,
+                                          int rc);
+static void colo_common_read_stream_done(libxl__egc *egc,
+                                         libxl__stream_read_state *stream,
+                                         int rc);
+
+/* ======================== colo: checkpoint ======================= */
+
+/*
+ * Do the following things when handling a checkpoint:
+ *  1. read emulator xenstore data
+ *  2. read emulator context
+ *  3. REC_TYPE_CHECKPOINT_END
+ */
+static void libxl__colo_restore_domain_checkpoint_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__stream_read_state *srs = CONTAINER_OF(shs, *srs, shs);
+    libxl__domain_create_state *dcs = CONTAINER_OF(srs, *dcs, srs);
+    libxl__colo_restore_checkpoint_state *crcs = dcs->crs.crcs;
+
+    crcs->callback = NULL;
+    dcs->srs.checkpoint_callback = colo_common_read_stream_done;
+    libxl__stream_read_start_checkpoint(shs->egc, &dcs->srs);
+}
+
+/* ===================== colo: resume secondary vm ===================== */
+
+/*
+ * Do the following things when resuming secondary vm the first time:
+ *  1. resume secondary vm
+ *  2. enable log dirty
+ *  3. setup checkpoint devices
+ *  4. write CHECKPOINT_SVM_READY
+ *  5. unpause secondary vm
+ *  6. write CHECKPOINT_SVM_RESUMED
+ *
+ * Do the following things on every later resume of secondary vm:
+ *  1. write CHECKPOINT_SVM_READY
+ *  2. resume secondary vm
+ *  3. write CHECKPOINT_SVM_RESUMED
+ */
+static void colo_send_svm_ready(libxl__egc *egc,
+                                libxl__colo_restore_checkpoint_state *crcs);
+static void colo_send_svm_ready_done(libxl__egc *egc,
+                                     libxl__colo_restore_checkpoint_state *crcs,
+                                     int rc);
+static void colo_restore_preresume_cb(libxl__egc *egc,
+                                      libxl__checkpoint_devices_state *cds,
+                                      int rc);
+static void colo_restore_resume_vm(libxl__egc *egc,
+                                   libxl__colo_restore_checkpoint_state *crcs);
+static void colo_resume_vm_done(libxl__egc *egc,
+                                libxl__colo_restore_checkpoint_state *crcs,
+                                int rc);
+static void colo_write_svm_resumed(libxl__egc *egc,
+                                   libxl__colo_restore_checkpoint_state *crcs);
+static void colo_enable_logdirty_done(libxl__egc *egc,
+                                      libxl__logdirty_switch *lds,
+                                      int retval);
+static void colo_reenable_logdirty(libxl__egc *egc,
+                                   libxl__logdirty_switch *lds,
+                                   int rc);
+static void colo_reenable_logdirty_done(libxl__egc *egc,
+                                        libxl__logdirty_switch *lds,
+                                        int rc);
+static void colo_setup_checkpoint_devices(libxl__egc *egc,
+                                          libxl__colo_restore_state *crs);
+static void colo_restore_setup_cds_done(libxl__egc *egc,
+                                        libxl__checkpoint_devices_state *cds,
+                                        int rc);
+static void colo_unpause_svm(libxl__egc *egc,
+                             libxl__colo_restore_checkpoint_state *crcs);
+
+static void libxl__colo_restore_domain_resume_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__stream_read_state *srs = CONTAINER_OF(shs, *srs, shs);
+    libxl__domain_create_state *dcs = CONTAINER_OF(srs, *dcs, srs);
+    libxl__colo_restore_checkpoint_state *crcs = dcs->crs.crcs;
+
+    if (crcs->teardown_devices)
+        colo_send_svm_ready(shs->egc, crcs);
+    else
+        colo_restore_resume_vm(shs->egc, crcs);
+}
+
+static void colo_send_svm_ready(libxl__egc *egc,
+                               libxl__colo_restore_checkpoint_state *crcs)
+{
+    libxl_sr_checkpoint_state srcs = { .id = CHECKPOINT_SVM_READY };
+
+    crcs->callback = colo_send_svm_ready_done;
+    crcs->sws.checkpoint_callback = colo_common_write_stream_done;
+    libxl__stream_write_checkpoint_state(egc, &crcs->sws, &srcs);
+}
+
+static void colo_send_svm_ready_done(libxl__egc *egc,
+                                     libxl__colo_restore_checkpoint_state *crcs,
+                                     int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *cds = &dcs->cds;
+
+    if (!crcs->preresume) {
+        crcs->preresume = true;
+        colo_unpause_svm(egc, crcs);
+        return;
+    }
+
+    cds->callback = colo_restore_preresume_cb;
+    libxl__checkpoint_devices_preresume(egc, cds);
+}
+
+static void colo_restore_preresume_cb(libxl__egc *egc,
+                                      libxl__checkpoint_devices_state *cds,
+                                      int rc)
+{
+    libxl__colo_restore_state *crs = cds->concrete_data;
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    libxl__save_helper_state *const shs = &dcs->srs.shs;
+
+    EGC_GC;
+
+    if (rc) {
+        LOG(ERROR, "preresume fails");
+        goto out;
+    }
+
+    colo_restore_resume_vm(egc, crcs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+static void colo_restore_resume_vm(libxl__egc *egc,
+                                   libxl__colo_restore_checkpoint_state *crcs)
+{
+
+    crcs->callback = colo_resume_vm_done;
+    colo_resume_vm(egc, crcs, 1);
+}
+
+static void colo_resume_vm_done(libxl__egc *egc,
+                                libxl__colo_restore_checkpoint_state *crcs,
+                                int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+    libxl__logdirty_switch *const lds = &crcs->lds;
+    libxl__save_helper_state *const shs = &dcs->srs.shs;
+
+    EGC_GC;
+
+    if (rc) {
+        LOG(ERROR, "cannot resume secondary vm");
+        goto out;
+    }
+
+    crcs->status = LIBXL_COLO_RESUMED;
+
+    /* avoid calling stream->completion_callback() more than once */
+    if (crs->saved_cb) {
+        dcs->callback = crs->saved_cb;
+        crs->saved_cb = NULL;
+
+        dcs->srs.completion_callback = NULL;
+
+        lds->callback = colo_enable_logdirty_done;
+        colo_enable_logdirty(crs, egc);
+        return;
+    }
+
+    colo_write_svm_resumed(egc, crcs);
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+static void colo_write_svm_resumed(libxl__egc *egc,
+                                   libxl__colo_restore_checkpoint_state *crcs)
+{
+    libxl_sr_checkpoint_state srcs = { .id = CHECKPOINT_SVM_RESUMED };
+
+    crcs->callback = NULL;
+    crcs->sws.checkpoint_callback = colo_common_write_stream_done;
+    libxl__stream_write_checkpoint_state(egc, &crcs->sws, &srcs);
+}
+
+static void colo_enable_logdirty_done(libxl__egc *egc,
+                                      libxl__logdirty_switch *lds,
+                                      int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(lds, *crcs, lds);
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+
+    EGC_GC;
+
+    if (rc) {
+        /*
+         * log-dirty already enabled? There's no test op,
+         * so attempt to disable then reenable it
+         */
+        lds->callback = colo_reenable_logdirty;
+        colo_disable_logdirty(crs, egc);
+        return;
+    }
+
+    colo_setup_checkpoint_devices(egc, crs);
+}
+
+static void colo_reenable_logdirty(libxl__egc *egc,
+                                   libxl__logdirty_switch *lds,
+                                   int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(lds, *crcs, lds);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+    libxl__save_helper_state *const shs = &dcs->srs.shs;
+
+    EGC_GC;
+
+    if (rc) {
+        LOG(ERROR, "cannot enable logdirty");
+        goto out;
+    }
+
+    lds->callback = colo_reenable_logdirty_done;
+    colo_enable_logdirty(crs, egc);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+static void colo_reenable_logdirty_done(libxl__egc *egc,
+                                        libxl__logdirty_switch *lds,
+                                        int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(lds, *crcs, lds);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+
+    /* Convenience aliases */
+    libxl__save_helper_state *const shs = &dcs->srs.shs;
+
+    EGC_GC;
+
+    if (rc) {
+        LOG(ERROR, "cannot enable logdirty");
+        goto out;
+    }
+
+    colo_setup_checkpoint_devices(egc, crcs->crs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+/*
+ * We cannot set up checkpoint devices in libxl__colo_restore_setup(),
+ * because the guest is not ready.
+ */
+static void colo_setup_checkpoint_devices(libxl__egc *egc,
+                                          libxl__colo_restore_state *crs)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *cds = &dcs->cds;
+    libxl__save_helper_state *const shs = &dcs->srs.shs;
+
+    STATE_AO_GC(crs->ao);
+
+    /* TODO: disk/nic support */
+    cds->device_kind_flags = 0;
+    cds->callback = colo_restore_setup_cds_done;
+    cds->ao = ao;
+    cds->domid = crs->domid;
+    cds->ops = colo_restore_ops;
+
+    if (init_device_subkind(cds))
+        goto out;
+
+    crcs->teardown_devices = 1;
+
+    libxl__checkpoint_devices_setup(egc, cds);
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+static void colo_restore_setup_cds_done(libxl__egc *egc,
+                                        libxl__checkpoint_devices_state *cds,
+                                        int rc)
+{
+    libxl__colo_restore_state *crs = cds->concrete_data;
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    libxl__save_helper_state *const shs = &dcs->srs.shs;
+
+    EGC_GC;
+
+    if (rc) {
+        LOG(ERROR, "COLO: failed to setup device for guest with domid %u",
+            cds->domid);
+        goto out;
+    }
+
+    colo_send_svm_ready(egc, crcs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+static void colo_unpause_svm(libxl__egc *egc,
+                             libxl__colo_restore_checkpoint_state *crcs)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+    int rc;
+
+    /* Convenience aliases */
+    const uint32_t domid = crcs->crs->domid;
+    libxl__save_helper_state *const shs = &dcs->srs.shs;
+
+    EGC_GC;
+
+    /* We have enabled secondary vm's logdirty, so we can unpause it now */
+    rc = libxl_domain_unpause(CTX, domid);
+    if (rc) {
+        LOG(ERROR, "cannot unpause secondary vm");
+        goto out;
+    }
+
+    colo_write_svm_resumed(egc, crcs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+/* ===================== colo: wait new checkpoint ===================== */
+
+static void colo_restore_commit_cb(libxl__egc *egc,
+                                   libxl__checkpoint_devices_state *cds,
+                                   int rc);
+static void colo_stream_read_done(libxl__egc *egc,
+                                  libxl__colo_restore_checkpoint_state *crcs,
+                                  int id);
+
+static void libxl__colo_restore_domain_wait_checkpoint_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__stream_read_state *srs = CONTAINER_OF(shs, *srs, shs);
+    libxl__domain_create_state *dcs = CONTAINER_OF(srs, *dcs, srs);
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *cds = &dcs->cds;
+
+    cds->callback = colo_restore_commit_cb;
+    libxl__checkpoint_devices_commit(shs->egc, cds);
+}
+
+static void colo_restore_commit_cb(libxl__egc *egc,
+                                   libxl__checkpoint_devices_state *cds,
+                                   int rc)
+{
+    libxl__colo_restore_state *crs = cds->concrete_data;
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    EGC_GC;
+
+    if (rc) {
+        LOG(ERROR, "commit fails");
+        goto out;
+    }
+
+    crcs->callback = colo_stream_read_done;
+    dcs->srs.checkpoint_callback = colo_common_read_stream_done;
+    libxl__stream_read_checkpoint_state(egc, &dcs->srs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dcs->srs.shs, 0);
+}
+
+static void colo_stream_read_done(libxl__egc *egc,
+                                  libxl__colo_restore_checkpoint_state *crcs,
+                                  int id)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+    int ok = 0;
+
+    EGC_GC;
+
+    if (id != CHECKPOINT_NEW) {
+        LOG(ERROR, "invalid section: %d", id);
+        goto out;
+    }
+
+    ok = 1;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dcs->srs.shs, ok);
+}
+
+/* ===================== colo: suspend secondary vm ===================== */
+
+/*
+ * Do the following things when suspending secondary vm:
+ *  1. suspend secondary vm
+ *  2. send CHECKPOINT_SVM_SUSPENDED
+ */
+static void colo_suspend_vm_done(libxl__egc *egc,
+                                 libxl__domain_suspend_state *dsps,
+                                 int rc);
+static void colo_restore_postsuspend_cb(libxl__egc *egc,
+                                        libxl__checkpoint_devices_state *cds,
+                                        int rc);
+
+static void libxl__colo_restore_domain_suspend_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__stream_read_state *srs = CONTAINER_OF(shs, *srs, shs);
+    libxl__domain_create_state *dcs = CONTAINER_OF(srs, *dcs, srs);
+    libxl__colo_restore_checkpoint_state *crcs = dcs->crs.crcs;
+
+    STATE_AO_GC(dcs->ao);
+
+    /* Convenience aliases */
+    libxl__domain_suspend_state *const dsps = &crcs->dsps;
+
+    /* suspend secondary vm */
+    dsps->callback_common_done = colo_suspend_vm_done;
+
+    libxl__domain_suspend(shs->egc, dsps);
+}
+
+static void colo_suspend_vm_done(libxl__egc *egc,
+                                 libxl__domain_suspend_state *dsps,
+                                 int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(dsps, *crcs, dsps);
+    libxl__colo_restore_state *crs = crcs->crs;
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *cds = &dcs->cds;
+
+    EGC_GC;
+
+    if (rc) {
+        LOG(ERROR, "cannot suspend secondary vm");
+        goto out;
+    }
+
+    crcs->status = LIBXL_COLO_SUSPENDED;
+
+    cds->callback = colo_restore_postsuspend_cb;
+    libxl__checkpoint_devices_postsuspend(egc, cds);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dcs->srs.shs, !rc);
+}
+
+static void colo_restore_postsuspend_cb(libxl__egc *egc,
+                                        libxl__checkpoint_devices_state *cds,
+                                        int rc)
+{
+    libxl__colo_restore_state *crs = cds->concrete_data;
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+    libxl_sr_checkpoint_state srcs = { .id = CHECKPOINT_SVM_SUSPENDED };
+
+    EGC_GC;
+
+    if (rc) {
+        LOG(ERROR, "postsuspend fails");
+        goto out;
+    }
+
+    crcs->callback = NULL;
+    crcs->sws.checkpoint_callback = colo_common_write_stream_done;
+    libxl__stream_write_checkpoint_state(egc, &crcs->sws, &srcs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dcs->srs.shs, !rc);
+}
+
+/* ===================== colo: common callback ===================== */
+
+static void colo_common_write_stream_done(libxl__egc *egc,
+                                          libxl__stream_write_state *stream,
+                                          int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs =
+        CONTAINER_OF(stream, *crcs, sws);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+    int ok;
+
+    EGC_GC;
+
+    if (rc < 0) {
+        /* TODO: it may be an internal error, but we don't know */
+        LOG(ERROR, "sending data fails");
+        ok = 2;
+        goto out;
+    }
+
+    if (!crcs->callback) {
+        /* Everything is OK */
+        ok = 1;
+        goto out;
+    }
+
+    crcs->callback(egc, crcs, 0);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dcs->srs.shs, ok);
+}
+
+static void colo_common_read_stream_done(libxl__egc *egc,
+                                         libxl__stream_read_state *stream,
+                                         int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(stream, *dcs, srs);
+    libxl__colo_restore_checkpoint_state *crcs = dcs->crs.crcs;
+    int ok;
+
+    EGC_GC;
+
+    if (rc < 0) {
+        /* TODO: it may be an internal error, but we don't know */
+        LOG(ERROR, "reading data fails");
+        ok = 2;
+        goto out;
+    }
+
+    if (!crcs->callback) {
+        /* Everything is OK */
+        ok = 1;
+        goto out;
+    }
+
+    /* rc contains the id */
+    crcs->callback(egc, crcs, rc);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dcs->srs.shs, ok);
+}
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 4d2b95c..c58dd7e 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -985,6 +985,23 @@ static void domcreate_console_available(libxl__egc *egc,
                                         dcs->aop_console_how.for_event));
 }
 
+static void libxl__colo_restore_setup_done(libxl__egc *egc,
+                                           libxl__colo_restore_state *crs,
+                                           int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+
+    EGC_GC;
+
+    if (rc) {
+        LOG(ERROR, "colo restore setup fails: %d", rc);
+        domcreate_stream_done(egc, &dcs->srs, rc);
+        return;
+    }
+
+    libxl__stream_read_start(egc, &dcs->srs);
+}
+
 static void domcreate_bootloader_done(libxl__egc *egc,
                                       libxl__bootloader_state *bl,
                                       int rc)
@@ -998,6 +1015,8 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     const int restore_fd = dcs->restore_fd;
     libxl__domain_build_state *const state = &dcs->build_state;
     const int checkpointed_stream = dcs->restore_params.checkpointed_stream;
+    libxl__colo_restore_state *const crs = &dcs->crs;
+    libxl_domain_build_info *const info = &d_config->b_info;
 
     if (rc) {
         domcreate_rebuild_done(egc, dcs, rc);
@@ -1026,6 +1045,22 @@ static void domcreate_bootloader_done(libxl__egc *egc,
 
     /* Restore */
 
+    /* COLO only supports HVM now because it does not work very
+     * well with pv drivers:
+     * 1. We need to resume vm in the slow path. In this case we
+     *    need to disconnect/reconnect backend and frontend, which
+     *    takes too much time and hurts performance.
+     * 2. PV disk cannot reuse block replication that is implemented
+     *    in QEMU.
+     */
+    if (info->type != LIBXL_DOMAIN_TYPE_HVM &&
+        checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
+        LOG(ERROR, "COLO only supports HVM, unable to restore domain %d",
+            domid);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
     rc = libxl__build_pre(gc, domid, d_config, state);
     if (rc)
         goto out;
@@ -1039,6 +1074,16 @@ static void domcreate_bootloader_done(libxl__egc *egc,
 
     if (restore_fd >= 0) {
         switch (checkpointed_stream) {
+        case LIBXL_CHECKPOINTED_STREAM_COLO:
+            /* colo restore setup */
+            crs->ao = ao;
+            crs->domid = domid;
+            crs->send_back_fd = dcs->send_back_fd;
+            crs->recv_fd = restore_fd;
+            crs->hvm = (info->type == LIBXL_DOMAIN_TYPE_HVM);
+            crs->callback = libxl__colo_restore_setup_done;
+            libxl__colo_restore_setup(egc, crs);
+            break;
         case LIBXL_CHECKPOINTED_STREAM_REMUS:
             libxl__remus_restore_setup(egc, dcs);
             /* fall through */
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 1fafba8..83ac20a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -87,6 +87,8 @@
 #include "_libxl_types_internal.h"
 #include "_libxl_types_internal_json.h"
 
+#include "libxl_colo.h"
+
 #define LIBXL_INIT_TIMEOUT 10
 #define LIBXL_DESTROY_TIMEOUT 10
 #define LIBXL_HOTPLUG_TIMEOUT 40
@@ -3422,12 +3424,6 @@ _hidden int libxl__destroy_qdisk_backend(libxl__gc *gc, uint32_t domid);
 
 /*----- Domain creation -----*/
 
-typedef struct libxl__domain_create_state libxl__domain_create_state;
-
-typedef void libxl__domain_create_cb(libxl__egc *egc,
-                                     libxl__domain_create_state*,
-                                     int rc, uint32_t domid);
-
 /* State for manipulating a libxl migration v2 stream */
 typedef struct libxl__stream_read_state libxl__stream_read_state;
 
@@ -3510,6 +3506,8 @@ struct libxl__domain_create_state {
     /* private to domain_create */
     int guest_domid;
     libxl__domain_build_state build_state;
+    libxl__colo_restore_state crs;
+    libxl__checkpoint_devices_state cds;
     libxl__bootloader_state bl;
     libxl__stub_dm_spawn_state dmss;
         /* If we're not doing stubdom, we use only dmss.dm,
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index f15c235..2e6267d 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -68,7 +68,11 @@ void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
     shs->ao = ao;
     shs->domid = domid;
     shs->recv_callback = libxl__srm_callout_received_restore;
-    shs->completion_callback = libxl__xc_domain_restore_done;
+    if (dcs->restore_params.checkpointed_stream ==
+        LIBXL_CHECKPOINTED_STREAM_COLO)
+        shs->completion_callback = libxl__colo_restore_teardown;
+    else
+        shs->completion_callback = libxl__xc_domain_restore_done;
     shs->caller_state = dcs;
     shs->need_results = 1;
 
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index d6d2967..cbb6ca1 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -23,14 +23,15 @@ our @msgs = (
                                                  STRING doing_what),
                                                 'unsigned long', 'done',
                                                 'unsigned long', 'total'] ],
-    [  3, 'scxA',   "suspend", [] ],
-    [  4, 'scxA',   "postcopy", [] ],
+    [  3, 'srcxA',  "suspend", [] ],
+    [  4, 'srcxA',  "postcopy", [] ],
     [  5, 'srcxA',  "checkpoint", [] ],
-    [  6, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
+    [  6, 'rcxA',   "wait_checkpoint", [] ],
+    [  7, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
                                               unsigned enable)] ],
-    [  7, 'r',      "restore_results",       ['unsigned long', 'store_mfn',
+    [  8, 'r',      "restore_results",       ['unsigned long', 'store_mfn',
                                               'unsigned long', 'console_mfn'] ],
-    [  8, 'srW',    "complete",              [qw(int retval
+    [  9, 'srW',    "complete",              [qw(int retval
                                                  int errnoval)] ],
 );
 
diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 302ae53..9659051 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -850,6 +850,18 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
      */
     if (libxl__stream_read_inuse(stream)) {
         switch (checkpointed_stream) {
+        case LIBXL_CHECKPOINTED_STREAM_COLO:
+            if (stream->completion_callback) {
+                /*
+                 * restore, just build the secondary vm, don't close
+                 * the stream
+                 */
+                stream->completion_callback(egc, stream, 0);
+            } else {
+                /* failover, just close the stream */
+                stream_complete(egc, stream, 0);
+            }
+            break;
         case LIBXL_CHECKPOINTED_STREAM_REMUS:
             /*
              * Failover from primary. Domain state is currently at a
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 59b183c..4717517 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -233,6 +233,7 @@ libxl_hdtype = Enumeration("hdtype", [
 libxl_checkpointed_stream = Enumeration("checkpointed_stream", [
     (0, "NONE"),
     (1, "REMUS"),
+    (2, "COLO"),
     ])
 
 #
-- 
1.9.3




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v13 13/26] libxl_internal: move stream read manipulations to right place
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (11 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 12/26] secondary vm suspend/resume/checkpoint code Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 14/26] primary vm suspend/resume/checkpoint code Changlong Xie
                   ` (15 subsequent siblings)
  28 siblings, 0 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

No functional changes. This cleanup keeps the later patch,
"primary vm suspend/resume/checkpoint code", from becoming
too complicated.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxl/libxl_internal.h | 132 +++++++++++++++++++++----------------------
 1 file changed, 66 insertions(+), 66 deletions(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 83ac20a..c7f9b97 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3035,6 +3035,72 @@ static inline bool libxl__conversion_helper_inuse
                     (const libxl__conversion_helper_state *chs)
 { return libxl__ev_child_inuse(&chs->child); }
 
+/* State for reading a libxl migration v2 stream */
+typedef struct libxl__stream_read_state libxl__stream_read_state;
+
+typedef struct libxl__sr_record_buf {
+    /* private to stream read helper */
+    LIBXL_STAILQ_ENTRY(struct libxl__sr_record_buf) entry;
+    libxl__sr_rec_hdr hdr;
+    void *body; /* iff hdr.length != 0 */
+} libxl__sr_record_buf;
+
+struct libxl__stream_read_state {
+    /* filled by the user */
+    libxl__ao *ao;
+    libxl__domain_create_state *dcs;
+    int fd;
+    bool legacy;
+    bool back_channel;
+    void (*completion_callback)(libxl__egc *egc,
+                                libxl__stream_read_state *srs,
+                                int rc);
+    void (*checkpoint_callback)(libxl__egc *egc,
+                                libxl__stream_read_state *srs,
+                                int rc);
+    /* Private */
+    int rc;
+    bool running;
+    bool in_checkpoint;
+    bool sync_teardown; /* Only used to coordinate shutdown on error path. */
+    bool in_checkpoint_state;
+    libxl__save_helper_state shs;
+    libxl__conversion_helper_state chs;
+
+    /* Main stream-reading data. */
+    libxl__datacopier_state dc; /* Only used when reading a record */
+    libxl__sr_hdr hdr;
+    LIBXL_STAILQ_HEAD(, libxl__sr_record_buf) record_queue; /* NOGC */
+    enum {
+        SRS_PHASE_NORMAL,
+        SRS_PHASE_BUFFERING,
+        SRS_PHASE_UNBUFFERING,
+    } phase;
+    bool recursion_guard;
+
+    /* Only used while actively reading a record from the stream. */
+    libxl__sr_record_buf *incoming_record; /* NOGC */
+
+    /* Both only used when processing an EMULATOR record. */
+    libxl__datacopier_state emu_dc;
+    libxl__carefd *emu_carefd;
+};
+
+_hidden void libxl__stream_read_init(libxl__stream_read_state *stream);
+_hidden void libxl__stream_read_start(libxl__egc *egc,
+                                      libxl__stream_read_state *stream);
+_hidden void libxl__stream_read_start_checkpoint(libxl__egc *egc,
+                                                 libxl__stream_read_state *stream);
+_hidden void libxl__stream_read_checkpoint_state(libxl__egc *egc,
+                                                 libxl__stream_read_state *stream);
+_hidden void libxl__stream_read_abort(libxl__egc *egc,
+                                      libxl__stream_read_state *stream, int rc);
+static inline bool
+libxl__stream_read_inuse(const libxl__stream_read_state *stream)
+{
+    return stream->running;
+}
+
 
 /*----- Domain suspend (save) state structure -----*/
 /*
@@ -3424,72 +3490,6 @@ _hidden int libxl__destroy_qdisk_backend(libxl__gc *gc, uint32_t domid);
 
 /*----- Domain creation -----*/
 
-/* State for manipulating a libxl migration v2 stream */
-typedef struct libxl__stream_read_state libxl__stream_read_state;
-
-typedef struct libxl__sr_record_buf {
-    /* private to stream read helper */
-    LIBXL_STAILQ_ENTRY(struct libxl__sr_record_buf) entry;
-    libxl__sr_rec_hdr hdr;
-    void *body; /* iff hdr.length != 0 */
-} libxl__sr_record_buf;
-
-struct libxl__stream_read_state {
-    /* filled by the user */
-    libxl__ao *ao;
-    libxl__domain_create_state *dcs;
-    int fd;
-    bool legacy;
-    bool back_channel;
-    void (*completion_callback)(libxl__egc *egc,
-                                libxl__stream_read_state *srs,
-                                int rc);
-    void (*checkpoint_callback)(libxl__egc *egc,
-                                libxl__stream_read_state *srs,
-                                int rc);
-    /* Private */
-    int rc;
-    bool running;
-    bool in_checkpoint;
-    bool sync_teardown; /* Only used to coordinate shutdown on error path. */
-    bool in_checkpoint_state;
-    libxl__save_helper_state shs;
-    libxl__conversion_helper_state chs;
-
-    /* Main stream-reading data. */
-    libxl__datacopier_state dc; /* Only used when reading a record */
-    libxl__sr_hdr hdr;
-    LIBXL_STAILQ_HEAD(, libxl__sr_record_buf) record_queue; /* NOGC */
-    enum {
-        SRS_PHASE_NORMAL,
-        SRS_PHASE_BUFFERING,
-        SRS_PHASE_UNBUFFERING,
-    } phase;
-    bool recursion_guard;
-
-    /* Only used while actively reading a record from the stream. */
-    libxl__sr_record_buf *incoming_record; /* NOGC */
-
-    /* Both only used when processing an EMULATOR record. */
-    libxl__datacopier_state emu_dc;
-    libxl__carefd *emu_carefd;
-};
-
-_hidden void libxl__stream_read_init(libxl__stream_read_state *stream);
-_hidden void libxl__stream_read_start(libxl__egc *egc,
-                                      libxl__stream_read_state *stream);
-_hidden void libxl__stream_read_start_checkpoint(libxl__egc *egc,
-                                                 libxl__stream_read_state *stream);
-_hidden void libxl__stream_read_checkpoint_state(libxl__egc *egc,
-                                                 libxl__stream_read_state *stream);
-_hidden void libxl__stream_read_abort(libxl__egc *egc,
-                                      libxl__stream_read_state *stream, int rc);
-static inline bool
-libxl__stream_read_inuse(const libxl__stream_read_state *stream)
-{
-    return stream->running;
-}
-
 
 struct libxl__domain_create_state {
     /* filled in by user */
-- 
1.9.3





* [PATCH v13 14/26] primary vm suspend/resume/checkpoint code
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (12 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 13/26] libxl_internal: move stream read manipulations to right place Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-30 14:10   ` Ian Jackson
  2016-03-25  6:44 ` [PATCH v13 15/26] libxc/restore: support COLO restore Changlong Xie
                   ` (14 subsequent siblings)
  28 siblings, 1 reply; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

The following steps are repeated for every checkpoint:
1. Suspend primary vm
   a. Suspend primary vm
   b. Do postsuspend
   c. Read CHECKPOINT_SVM_SUSPENDED sent by secondary
2. Checkpoint
   a. Write emulator xenstore data and emulator context
   b. Write checkpoint end record
3. Resume primary vm
   a. Read CHECKPOINT_SVM_READY from secondary
   b. Do preresume
   c. Resume primary vm
   d. Read CHECKPOINT_SVM_RESUMED from secondary
4. Wait for a new checkpoint
   a. Wait for a new checkpoint (not implemented)
   b. Send CHECKPOINT_NEW to secondary
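
The ordering of the four steps above can be sketched as a small standalone
model. This is only an illustration and not the libxl/libxc code: the types
colo_cb_model and colo_trace, the cb_* helpers and colo_model_run() are all
hypothetical names, and the convention that a callback returns nonzero to
continue is an assumption made for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative model only: these are NOT the libxc/libxl definitions. */
struct colo_cb_model {
    int (*suspend)(void *data);          /* step 1 */
    int (*checkpoint)(void *data);       /* step 2 */
    int (*resume)(void *data);           /* step 3, hooked up as "postcopy" in the patch */
    int (*wait_checkpoint)(void *data);  /* step 4 */
};

/* Hypothetical trace state: one letter is appended per callback invoked. */
struct colo_trace {
    char steps[128];
    int rounds_left;
};

/* Append one letter to the trace; always reports success (1). */
static int note(struct colo_trace *t, char c)
{
    size_t n = strlen(t->steps);

    if (n + 1 < sizeof(t->steps)) {
        t->steps[n] = c;
        t->steps[n + 1] = '\0';
    }
    return 1;
}

static int cb_suspend(void *d)    { return note(d, 'S'); }
static int cb_checkpoint(void *d) { return note(d, 'C'); }
static int cb_resume(void *d)     { return note(d, 'R'); }

/* Returns 1 to take another checkpoint, 0 to stop. */
static int cb_wait(void *d)
{
    struct colo_trace *t = d;

    note(t, 'W');
    return --t->rounds_left > 0;
}

/*
 * Run suspend -> checkpoint -> resume -> wait_checkpoint until a
 * callback returns 0. Returns the number of rounds in which the
 * resume step completed.
 */
static int colo_model_run(const struct colo_cb_model *cb, void *data)
{
    int rounds = 0;

    for (;;) {
        if (!cb->suspend(data))
            break;
        if (!cb->checkpoint(data))
            break;
        if (!cb->resume(data))
            break;
        rounds++;
        if (!cb->wait_checkpoint(data))
            break;
    }
    return rounds;
}
```

To try it, initialise a colo_trace with an empty steps string and the desired
rounds_left, fill a colo_cb_model with the four cb_* helpers, and call
colo_model_run(); the steps string then records the order in which the
callbacks fired.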

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
 tools/libxc/include/xenguest.h     |   9 +
 tools/libxl/Makefile               |   2 +-
 tools/libxl/libxl.c                |   5 +-
 tools/libxl/libxl_colo.h           |   6 +
 tools/libxl/libxl_colo_save.c      | 568 +++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_dom_save.c       |   7 +-
 tools/libxl/libxl_internal.h       |  25 +-
 tools/libxl/libxl_save_msgs_gen.pl |   2 +-
 tools/libxl/libxl_types.idl        |   1 +
 9 files changed, 616 insertions(+), 9 deletions(-)
 create mode 100644 tools/libxl/libxl_colo_save.c

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 3193d0f..8ea5a3c 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -68,6 +68,15 @@ struct save_callbacks {
      * 1: take another checkpoint */
     int (*checkpoint)(void* data);
 
+    /*
+     * Called after the checkpoint callback.
+     *
+     * returns:
+     * 0: terminate checkpointing gracefully
+     * 1: take another checkpoint
+     */
+    int (*wait_checkpoint)(void* data);
+
     /* Enable qemu-dm logging dirty pages to xen */
     int (*switch_qemu_logdirty)(int domid, unsigned enable, void *data); /* HVM only */
 
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 35a07a7..c5ef3f0 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -65,7 +65,7 @@ LIBXL_OBJS-y += libxl_no_convert_callout.o
 endif
 
 LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
-LIBXL_OBJS-y += libxl_colo_restore.o
+LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 6bc46cb..272c6a5 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -880,7 +880,10 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     assert(info);
 
     /* Point of no return */
-    libxl__remus_setup(egc, &dss->rs);
+    if (libxl_defbool_val(info->colo))
+        libxl__colo_save_setup(egc, &dss->css);
+    else
+        libxl__remus_setup(egc, &dss->rs);
     return AO_INPROGRESS;
 
  out:
diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
index f2b98cc..feec7f1 100644
--- a/tools/libxl/libxl_colo.h
+++ b/tools/libxl/libxl_colo.h
@@ -18,6 +18,7 @@
 
 struct libxl__ao;
 struct libxl__egc;
+struct libxl__colo_save_state;
 
 enum {
     LIBXL_COLO_SETUPED,
@@ -52,4 +53,9 @@ extern void libxl__colo_restore_setup(struct libxl__egc *egc,
                                       libxl__colo_restore_state *crs);
 extern void libxl__colo_restore_teardown(struct libxl__egc *egc, void *dcs_void,
                                          int ret, int retval, int errnoval);
+extern void libxl__colo_save_setup(struct libxl__egc *egc,
+                                   struct libxl__colo_save_state *css);
+extern void libxl__colo_save_teardown(struct libxl__egc *egc,
+                                      struct libxl__colo_save_state *css,
+                                      int rc);
 #endif
diff --git a/tools/libxl/libxl_colo_save.c b/tools/libxl/libxl_colo_save.c
new file mode 100644
index 0000000..cca6bde
--- /dev/null
+++ b/tools/libxl/libxl_colo_save.c
@@ -0,0 +1,568 @@
+/*
+ * Copyright (C) 2016 FUJITSU LIMITED
+ * Author: Wen Congyang <wency@cn.fujitsu.com>
+ *         Yang Hongyang <hongyang.yang@easystack.cn>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+static const libxl__checkpoint_device_instance_ops *colo_ops[] = {
+    NULL,
+};
+
+/* ================= helper functions ================= */
+
+static int init_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* init device subkind-specific state in the libxl ctx */
+    int rc;
+    STATE_AO_GC(cds->ao);
+
+    rc = 0;
+    return rc;
+}
+
+static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* cleanup device subkind-specific state in the libxl ctx */
+    STATE_AO_GC(cds->ao);
+}
+
+/* ================= colo: setup save environment ================= */
+
+static void colo_save_setup_done(libxl__egc *egc,
+                                 libxl__checkpoint_devices_state *cds,
+                                 int rc);
+static void colo_save_setup_failed(libxl__egc *egc,
+                                   libxl__checkpoint_devices_state *cds,
+                                   int rc);
+/*
+ * checkpoint callbacks are called in the following order:
+ * 1. suspend
+ * 2. checkpoint
+ * 3. resume
+ * 4. wait checkpoint
+ */
+static void libxl__colo_save_domain_suspend_callback(void *data);
+static void libxl__colo_save_domain_checkpoint_callback(void *data);
+static void libxl__colo_save_domain_resume_callback(void *data);
+static void libxl__colo_save_domain_wait_checkpoint_callback(void *data);
+
+void libxl__colo_save_setup(libxl__egc *egc, libxl__colo_save_state *css)
+{
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds = &dss->cds;
+    libxl__srm_save_autogen_callbacks *const callbacks =
+        &dss->sws.shs.callbacks.save.a;
+
+    STATE_AO_GC(dss->ao);
+
+    if (dss->type != LIBXL_DOMAIN_TYPE_HVM) {
+        LOG(ERROR, "COLO only supports HVM now");
+        goto out;
+    }
+
+    css->send_fd = dss->fd;
+    css->recv_fd = dss->recv_fd;
+    css->svm_running = false;
+
+    /* TODO: disk/nic support */
+    cds->device_kind_flags = 0;
+    cds->ops = colo_ops;
+    cds->callback = colo_save_setup_done;
+    cds->ao = ao;
+    cds->domid = dss->domid;
+    cds->concrete_data = css;
+
+    css->srs.ao = ao;
+    css->srs.fd = css->recv_fd;
+    css->srs.back_channel = true;
+    libxl__stream_read_start(egc, &css->srs);
+
+    if (init_device_subkind(cds))
+        goto out;
+
+    callbacks->suspend = libxl__colo_save_domain_suspend_callback;
+    callbacks->checkpoint = libxl__colo_save_domain_checkpoint_callback;
+    callbacks->postcopy = libxl__colo_save_domain_resume_callback;
+    callbacks->wait_checkpoint = libxl__colo_save_domain_wait_checkpoint_callback;
+
+    libxl__checkpoint_devices_setup(egc, &dss->cds);
+
+    return;
+
+out:
+    dss->callback(egc, dss, ERROR_FAIL);
+}
+
+static void colo_save_setup_done(libxl__egc *egc,
+                                 libxl__checkpoint_devices_state *cds,
+                                 int rc)
+{
+    libxl__colo_save_state *css = cds->concrete_data;
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+    EGC_GC;
+
+    if (!rc) {
+        libxl__domain_save(egc, dss);
+        return;
+    }
+
+    LOG(ERROR, "COLO: failed to setup device for guest with domid %u",
+        dss->domid);
+    cds->callback = colo_save_setup_failed;
+    libxl__checkpoint_devices_teardown(egc, cds);
+}
+
+static void colo_save_setup_failed(libxl__egc *egc,
+                                   libxl__checkpoint_devices_state *cds,
+                                   int rc)
+{
+    libxl__colo_save_state *css = cds->concrete_data;
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+    STATE_AO_GC(cds->ao);
+
+    if (rc)
+        LOG(ERROR, "COLO: failed to teardown device after setup failed"
+            " for guest with domid %u, rc %d", cds->domid, rc);
+
+    cleanup_device_subkind(cds);
+    dss->callback(egc, dss, rc);
+}
+
+/* ================= colo: teardown save environment ================= */
+
+static void colo_teardown_done(libxl__egc *egc,
+                               libxl__checkpoint_devices_state *cds,
+                               int rc);
+
+void libxl__colo_save_teardown(libxl__egc *egc,
+                               libxl__colo_save_state *css,
+                               int rc)
+{
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+
+    EGC_GC;
+
+    LOG(WARN, "COLO: Domain suspend terminated with rc %d,"
+        " teardown COLO devices...", rc);
+
+    libxl__stream_read_abort(egc, &css->srs, 1);
+
+    dss->cds.callback = colo_teardown_done;
+    libxl__checkpoint_devices_teardown(egc, &dss->cds);
+    return;
+}
+
+static void colo_teardown_done(libxl__egc *egc,
+                               libxl__checkpoint_devices_state *cds,
+                               int rc)
+{
+    libxl__colo_save_state *css = cds->concrete_data;
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+
+    cleanup_device_subkind(cds);
+    dss->callback(egc, dss, rc);
+}
+
+static void colo_common_write_stream_done(libxl__egc *egc,
+                                          libxl__stream_write_state *stream,
+                                          int rc);
+static void colo_common_read_stream_done(libxl__egc *egc,
+                                         libxl__stream_read_state *stream,
+                                         int rc);
+
+/* ===================== colo: suspend primary vm ===================== */
+
+static void colo_read_svm_suspended_done(libxl__egc *egc,
+                                         libxl__colo_save_state *css,
+                                         int id);
+/*
+ * Do the following things when suspending primary vm:
+ * 1. suspend primary vm
+ * 2. do postsuspend
+ * 3. read CHECKPOINT_SVM_SUSPENDED
+ * 4. read secondary vm's dirty pages
+ */
+static void colo_suspend_primary_vm_done(libxl__egc *egc,
+                                         libxl__domain_suspend_state *dsps,
+                                         int rc);
+static void colo_postsuspend_cb(libxl__egc *egc,
+                                libxl__checkpoint_devices_state *cds,
+                                int rc);
+
+static void libxl__colo_save_domain_suspend_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__egc *egc = shs->egc;
+    libxl__stream_write_state *sws = CONTAINER_OF(shs, *sws, shs);
+    libxl__domain_save_state *dss = sws->dss;
+
+    /* Convenience aliases */
+    libxl__domain_suspend_state *dsps = &dss->dsps;
+
+    dsps->callback_common_done = colo_suspend_primary_vm_done;
+    libxl__domain_suspend(egc, dsps);
+}
+
+static void colo_suspend_primary_vm_done(libxl__egc *egc,
+                                         libxl__domain_suspend_state *dsps,
+                                         int rc)
+{
+    libxl__domain_save_state *dss = CONTAINER_OF(dsps, *dss, dsps);
+
+    EGC_GC;
+
+    if (rc) {
+        LOG(ERROR, "cannot suspend primary vm");
+        goto out;
+    }
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds = &dss->cds;
+
+    cds->callback = colo_postsuspend_cb;
+    libxl__checkpoint_devices_postsuspend(egc, cds);
+    return;
+
+out:
+    dss->rc = rc;
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
+}
+
+static void colo_postsuspend_cb(libxl__egc *egc,
+                                libxl__checkpoint_devices_state *cds,
+                                int rc)
+{
+    libxl__colo_save_state *css = cds->concrete_data;
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+
+    EGC_GC;
+
+    if (rc) {
+        LOG(ERROR, "postsuspend fails");
+        goto out;
+    }
+
+    if (!css->svm_running) {
+        rc = 0;
+        goto out;
+    }
+
+    /*
+     * read CHECKPOINT_SVM_SUSPENDED
+     */
+    css->callback = colo_read_svm_suspended_done;
+    css->srs.checkpoint_callback = colo_common_read_stream_done;
+    libxl__stream_read_checkpoint_state(egc, &css->srs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
+}
+
+static void colo_read_svm_suspended_done(libxl__egc *egc,
+                                         libxl__colo_save_state *css,
+                                         int id)
+{
+    int ok = 0;
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+
+    EGC_GC;
+
+    if (id != CHECKPOINT_SVM_SUSPENDED) {
+        LOG(ERROR, "invalid section: %d, expected: %d", id,
+            CHECKPOINT_SVM_SUSPENDED);
+        goto out;
+    }
+
+    ok = 1;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, ok);
+}
+
+/* ===================== colo: send tailbuf ========================== */
+
+static void libxl__colo_save_domain_checkpoint_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__stream_write_state *sws = CONTAINER_OF(shs, *sws, shs);
+    libxl__domain_save_state *dss = sws->dss;
+
+    /* Convenience aliases */
+    libxl__colo_save_state *const css = &dss->css;
+
+    /* write emulator xenstore data, emulator context, and checkpoint end */
+    css->callback = NULL;
+    dss->sws.checkpoint_callback = colo_common_write_stream_done;
+    libxl__stream_write_start_checkpoint(shs->egc, &dss->sws);
+}
+
+/* ===================== colo: resume primary vm ===================== */
+
+/*
+ * Do the following things when resuming primary vm:
+ *  1. read CHECKPOINT_SVM_READY
+ *  2. do preresume
+ *  3. resume primary vm
+ *  4. read CHECKPOINT_SVM_RESUMED
+ */
+static void colo_read_svm_ready_done(libxl__egc *egc,
+                                     libxl__colo_save_state *css,
+                                     int id);
+static void colo_preresume_cb(libxl__egc *egc,
+                              libxl__checkpoint_devices_state *cds,
+                              int rc);
+static void colo_read_svm_resumed_done(libxl__egc *egc,
+                                       libxl__colo_save_state *css,
+                                       int id);
+
+static void libxl__colo_save_domain_resume_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__egc *egc = shs->egc;
+    libxl__stream_write_state *sws = CONTAINER_OF(shs, *sws, shs);
+    libxl__domain_save_state *dss = sws->dss;
+
+    /* Convenience aliases */
+    libxl__colo_save_state *const css = &dss->css;
+
+    EGC_GC;
+
+    /* read CHECKPOINT_SVM_READY */
+    css->callback = colo_read_svm_ready_done;
+    css->srs.checkpoint_callback = colo_common_read_stream_done;
+    libxl__stream_read_checkpoint_state(egc, &css->srs);
+}
+
+static void colo_read_svm_ready_done(libxl__egc *egc,
+                                     libxl__colo_save_state *css,
+                                     int id)
+{
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+
+    EGC_GC;
+
+    if (id != CHECKPOINT_SVM_READY) {
+        LOG(ERROR, "invalid section: %d, expected: %d", id,
+            CHECKPOINT_SVM_READY);
+        goto out;
+    }
+
+    css->svm_running = true;
+    dss->cds.callback = colo_preresume_cb;
+    libxl__checkpoint_devices_preresume(egc, &dss->cds);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 0);
+}
+
+static void colo_preresume_cb(libxl__egc *egc,
+                              libxl__checkpoint_devices_state *cds,
+                              int rc)
+{
+    libxl__colo_save_state *css = cds->concrete_data;
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+
+    EGC_GC;
+
+    if (rc) {
+        LOG(ERROR, "preresume fails");
+        goto out;
+    }
+
+    /* Resumes the domain and the device model */
+    if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1)) {
+        LOG(ERROR, "cannot resume primary vm");
+        goto out;
+    }
+
+    /* read CHECKPOINT_SVM_RESUMED */
+    css->callback = colo_read_svm_resumed_done;
+    css->srs.checkpoint_callback = colo_common_read_stream_done;
+    libxl__stream_read_checkpoint_state(egc, &css->srs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 0);
+}
+
+static void colo_read_svm_resumed_done(libxl__egc *egc,
+                                       libxl__colo_save_state *css,
+                                       int id)
+{
+    int ok = 0;
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+
+    EGC_GC;
+
+    if (id != CHECKPOINT_SVM_RESUMED) {
+        LOG(ERROR, "invalid section: %d, expected: %d", id,
+            CHECKPOINT_SVM_RESUMED);
+        goto out;
+    }
+
+    ok = 1;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, ok);
+}
+
+/* ===================== colo: wait new checkpoint ===================== */
+
+/*
+ * Do the following things:
+ * 1. do commit
+ * 2. wait for a new checkpoint
+ * 3. write CHECKPOINT_NEW
+ */
+static void colo_device_commit_cb(libxl__egc *egc,
+                                  libxl__checkpoint_devices_state *cds,
+                                  int rc);
+static void colo_start_new_checkpoint(libxl__egc *egc,
+                                      libxl__checkpoint_devices_state *cds,
+                                      int rc);
+
+static void libxl__colo_save_domain_wait_checkpoint_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__stream_write_state *sws = CONTAINER_OF(shs, *sws, shs);
+    libxl__domain_save_state *dss = sws->dss;
+    libxl__egc *egc = dss->sws.shs.egc;
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds = &dss->cds;
+
+    cds->callback = colo_device_commit_cb;
+    libxl__checkpoint_devices_commit(egc, cds);
+}
+
+static void colo_device_commit_cb(libxl__egc *egc,
+                                  libxl__checkpoint_devices_state *cds,
+                                  int rc)
+{
+    libxl__colo_save_state *css = cds->concrete_data;
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+
+    EGC_GC;
+
+    if (rc) {
+        LOG(ERROR, "commit fails");
+        goto out;
+    }
+
+    /* TODO: wait for a new checkpoint */
+    colo_start_new_checkpoint(egc, cds, 0);
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 0);
+}
+
+static void colo_start_new_checkpoint(libxl__egc *egc,
+                                      libxl__checkpoint_devices_state *cds,
+                                      int rc)
+{
+    libxl__colo_save_state *css = cds->concrete_data;
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+    libxl_sr_checkpoint_state srcs = { .id = CHECKPOINT_NEW };
+
+    if (rc)
+        goto out;
+
+    /* write CHECKPOINT_NEW */
+    css->callback = NULL;
+    dss->sws.checkpoint_callback = colo_common_write_stream_done;
+    libxl__stream_write_checkpoint_state(egc, &dss->sws, &srcs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 0);
+}
+
+/* ===================== colo: common callback ===================== */
+
+static void colo_common_write_stream_done(libxl__egc *egc,
+                                          libxl__stream_write_state *stream,
+                                          int rc)
+{
+    libxl__domain_save_state *dss = CONTAINER_OF(stream, *dss, sws);
+    int ok;
+
+    /* Convenience aliases */
+    libxl__colo_save_state *const css = &dss->css;
+
+    EGC_GC;
+
+    if (rc < 0) {
+        /* TODO: it may be an internal error, but we don't know */
+        LOG(ERROR, "sending data fails");
+        ok = 0;
+        goto out;
+    }
+
+    if (!css->callback) {
+        /* Everything is OK */
+        ok = 1;
+        goto out;
+    }
+
+    css->callback(egc, css, 0);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, ok);
+}
+
+static void colo_common_read_stream_done(libxl__egc *egc,
+                                         libxl__stream_read_state *stream,
+                                         int rc)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(stream, *css, srs);
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+    int ok;
+
+    EGC_GC;
+
+    if (rc < 0) {
+        /* TODO: it may be an internal error, but we don't know */
+        LOG(ERROR, "reading data fails");
+        ok = 0;
+        goto out;
+    }
+
+    if (!css->callback) {
+        /* Everything is OK */
+        ok = 1;
+        goto out;
+    }
+
+    /* rc contains the id */
+    css->callback(egc, css, rc);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, ok);
+}
diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index cd324bb..821f862 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -438,12 +438,15 @@ static void domain_save_done(libxl__egc *egc,
 
     if (dss->remus) {
         /*
-         * With Remus, if we reach this point, it means either
+         * With Remus/COLO, if we reach this point, it means either
          * backup died or some network error occurred preventing us
          * from sending checkpoints. Teardown the network buffers and
          * release netlink resources.  This is an async op.
          */
-        libxl__remus_teardown(egc, &dss->rs, rc);
+        if (libxl_defbool_val(dss->remus->colo))
+            libxl__colo_save_teardown(egc, &dss->css, rc);
+        else
+            libxl__remus_teardown(egc, &dss->rs, rc);
         return;
     }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index c7f9b97..b3bb479 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2820,7 +2820,7 @@ typedef struct libxl__save_helper_state {
 /*
  * The abstract checkpoint device layer exposes a common
  * set of API to [external] libxl for manipulating devices attached to
- * a guest protected by Remus. The device layer also exposes a set of
+ * a guest protected by Remus/COLO. The device layer also exposes a set of
  * [internal] interfaces that every device type must implement.
  *
  * The following API are exposed to libxl:
@@ -2838,7 +2838,7 @@ typedef struct libxl__save_helper_state {
  *  +libxl__checkpoint_devices_commit
  *
  * Each device type needs to implement the interfaces specified in
- * the libxl__checkpoint_device_instance_ops if it wishes to support Remus.
+ * the libxl__checkpoint_device_instance_ops if it wishes to support Remus/COLO.
  *
  * The high-level control flow through the checkpoint device layer is shown
  * below:
@@ -2858,7 +2858,7 @@ typedef struct libxl__checkpoint_device_instance_ops libxl__checkpoint_device_in
 
 /*
  * Interfaces to be implemented by every device subkind that wishes to
- * support Remus. Functions must be implemented unless otherwise
+ * support Remus/COLO. Functions must be implemented unless otherwise
  * stated. Many of these functions are asynchronous. They call
  * dev->aodev.callback when done.  The actual implementations may be
  * synchronous and call dev->aodev.callback directly (as the last
@@ -3177,6 +3177,18 @@ libxl__stream_write_inuse(const libxl__stream_write_state *stream)
     return stream->running;
 }
 
+/*----- colo related state structure -----*/
+typedef struct libxl__colo_save_state libxl__colo_save_state;
+struct libxl__colo_save_state {
+    int send_fd;
+    int recv_fd;
+
+    /* private */
+    libxl__stream_read_state srs;
+    void (*callback)(libxl__egc *, libxl__colo_save_state *, int);
+    bool svm_running;
+};
+
 typedef struct libxl__logdirty_switch {
     /* Set by caller of libxl__domain_common_switch_qemu_logdirty */
     libxl__ao *ao;
@@ -3235,7 +3247,12 @@ struct libxl__domain_save_state {
     int hvm;
     int xcflags;
     libxl__domain_suspend_state dsps;
-    libxl__remus_state rs;
+    union {
+        /* for Remus */
+        libxl__remus_state rs;
+        /* for COLO */
+        libxl__colo_save_state css;
+    };
     libxl__checkpoint_devices_state cds;
     libxl__stream_write_state sws;
     libxl__logdirty_switch logdirty;
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index cbb6ca1..6016706 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -26,7 +26,7 @@ our @msgs = (
     [  3, 'srcxA',  "suspend", [] ],
     [  4, 'srcxA',  "postcopy", [] ],
     [  5, 'srcxA',  "checkpoint", [] ],
-    [  6, 'rcxA',   "wait_checkpoint", [] ],
+    [  6, 'srcxA',  "wait_checkpoint", [] ],
     [  7, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
                                               unsigned enable)] ],
     [  8, 'r',      "restore_results",       ['unsigned long', 'store_mfn',
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 4717517..95efd82 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -810,6 +810,7 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
     ("netbuf",       libxl_defbool),
     ("netbufscript", string),
     ("diskbuf",      libxl_defbool),
+    ("colo",         libxl_defbool)
     ])
 
 libxl_event_type = Enumeration("event_type", [
-- 
1.9.3
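Note on the read side of the handshake in libxl_colo_save.c above: the
primary skips the CHECKPOINT_SVM_SUSPENDED read while svm_running is still
false (the first round), and expects SVM_READY followed by SVM_RESUMED in
every round. The standalone sketch below only illustrates that ordering.
The enum values and all demo_* names are hypothetical and are not part of
libxl; the real CHECKPOINT_* ids come from the libxl stream format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ids, used only for this illustration. */
enum demo_section {
    DEMO_SVM_SUSPENDED,
    DEMO_SVM_READY,
    DEMO_SVM_RESUMED,
};

/*
 * Fill 'expect' with the sections the primary reads during one
 * checkpoint round and return how many there are.
 */
static size_t demo_expected_reads(bool svm_running,
                                  enum demo_section expect[3])
{
    size_t n = 0;

    assert(expect);

    /* Only read SVM_SUSPENDED once the secondary has been started. */
    if (svm_running)
        expect[n++] = DEMO_SVM_SUSPENDED;
    expect[n++] = DEMO_SVM_READY;
    expect[n++] = DEMO_SVM_RESUMED;

    return n;
}

/* Return true if 'got' matches the expected sequence exactly. */
static bool demo_round_ok(bool svm_running,
                          const enum demo_section *got, size_t nr_got)
{
    enum demo_section expect[3];
    size_t i, n = demo_expected_reads(svm_running, expect);

    if (nr_got != n)
        return false;

    for (i = 0; i < n; i++)
        if (got[i] != expect[i])
            return false;

    return true;
}
```

In the patch itself each read is asynchronous and the id check lives in
the colo_read_svm_*_done() callbacks; the sketch flattens that into a
single comparison purely for readability.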





* [PATCH v13 15/26] libxc/restore: support COLO restore
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (13 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 14/26] primary vm suspend/resume/checkpoint code Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 16/26] libxc/save: support COLO save Changlong Xie
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

a. Call the resume (postcopy), wait_checkpoint and suspend callbacks
   once the secondary VM's state is consistent with the primary's.
b. Send the dirty pfn list to the primary at each checkpoint under COLO.
c. Send the store gfn and console gfn to xl before resuming the
   secondary VM.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
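Note on item (b): the dirty pfn list is built by walking the logdirty
bitmap twice, once to count the set bits and once to record their
indices. The standalone sketch below shows that two-pass conversion
outside of libxc. It is not the code from this patch: demo_test_bit() and
demo_collect_dirty_pfns() are hypothetical names, and the bitmap is a
plain array rather than a hypercall buffer.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for a test_bit() helper on an unsigned long map. */
static int demo_test_bit(uint64_t nr, const unsigned long *map)
{
    return (map[nr / DEMO_BITS_PER_LONG] >> (nr % DEMO_BITS_PER_LONG)) & 1UL;
}

/*
 * Collect the indices of all set bits in map[0..nr_bits) into a newly
 * allocated array. Returns the array (caller frees) and stores the
 * number of entries in *count_out; returns NULL on allocation failure.
 */
static uint64_t *demo_collect_dirty_pfns(const unsigned long *map,
                                         uint64_t nr_bits,
                                         size_t *count_out)
{
    uint64_t i, *pfns;
    size_t count = 0, written = 0;

    assert(map && count_out);

    /* Pass 1: count the dirty pfns so the list can be sized exactly. */
    for (i = 0; i < nr_bits; i++)
        if (demo_test_bit(i, map))
            count++;

    /* Allocate at least one element so a clean bitmap avoids malloc(0). */
    pfns = malloc((count ? count : 1) * sizeof(*pfns));
    if (!pfns)
        return NULL;

    /* Pass 2: record the index of every dirty pfn. */
    for (i = 0; i < nr_bits; i++) {
        if (!demo_test_bit(i, map))
            continue;
        assert(written < count); /* bound check before the store */
        pfns[written++] = i;
    }

    *count_out = count;
    return pfns;
}
```

The payload length of the resulting record would then be
count * sizeof(uint64_t), matching how rec.length is computed in
send_checkpoint_dirty_pfn_list().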
 tools/libxc/include/xenguest.h     |   8 ++
 tools/libxc/xc_sr_common.h         |   8 +-
 tools/libxc/xc_sr_restore.c        | 181 +++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_colo_restore.c   |   5 -
 tools/libxl/libxl_create.c         |   3 +
 tools/libxl/libxl_save_msgs_gen.pl |   4 +-
 6 files changed, 200 insertions(+), 9 deletions(-)
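Note on item (a): in handle_checkpoint() each callback result is mapped
the same way: 1 means success, 2 means the channel to the primary is
broken, and anything else is an unspecified error. The sketch below
restates that mapping as a plain function and runs a list of callbacks in
order, stopping at the first failure. All demo_* names are hypothetical,
and DEMO_BROKEN_CHANNEL is a placeholder for libxc's BROKEN_CHANNEL
constant, whose real value is not assumed here.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder; stands in for libxc's BROKEN_CHANNEL constant. */
#define DEMO_BROKEN_CHANNEL 2

/* Map a callback return value to an rc, as the macro in the patch does. */
static int demo_map_callback_ret(int ret)
{
    switch (ret) {
    case 1:
        return 0;                    /* success */
    case 2:
        return DEMO_BROKEN_CHANNEL;  /* lost the primary */
    default:
        return -1;                   /* some unspecified error */
    }
}

/* Run callbacks in order; stop at the first one that does not succeed. */
static int demo_run_steps(int (*const steps[])(void *), int nr, void *data)
{
    int i, rc = 0;

    assert(steps != NULL && nr > 0);

    for (i = 0; i < nr; i++) {
        rc = demo_map_callback_ret(steps[i](data));
        if (rc)
            break;
    }

    return rc;
}

/* Trivial demo callbacks covering the three possible outcomes. */
static int demo_ok(void *data)     { (void)data; return 1; }
static int demo_broken(void *data) { (void)data; return 2; }
static int demo_fail(void *data)   { (void)data; return 0; }
```

With COLO the three steps would be postcopy (resume), wait_checkpoint and
suspend, in that order, followed by sending the dirty pfn list.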

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 8ea5a3c..40902ee 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -133,6 +133,14 @@ struct restore_callbacks {
      */
     int (*wait_checkpoint)(void* data);
 
+    /*
+     * Callback to send the store gfn and console gfn to xl
+     * if we want to resume the VM before xc_domain_restore()
+     * exits.
+     */
+    void (*restore_results)(xen_pfn_t store_gfn, xen_pfn_t console_gfn,
+                            void *data);
+
     /* to be provided as the last argument to each callback function */
     void* data;
 };
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index c990664..cf32ab8 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -214,6 +214,10 @@ struct xc_sr_context
             struct xc_sr_restore_ops ops;
             struct restore_callbacks *callbacks;
 
+            int send_back_fd;
+            unsigned long p2m_size;
+            xc_hypercall_buffer_t dirty_bitmap_hbuf;
+
             /* From Image Header. */
             uint32_t format_version;
 
@@ -222,13 +226,13 @@ struct xc_sr_context
             uint32_t guest_page_size;
 
             /* Plain VM, or checkpoints over time. */
-            bool checkpointed;
+            int checkpointed;
 
             /* Currently buffering records between a checkpoint */
             bool buffer_all_records;
 
 /*
- * With Remus, we buffer the records sent by the primary at checkpoint,
+ * With Remus/COLO, we buffer the records sent by the primary at checkpoint,
  * in case the primary will fail, we can recover from the last
  * checkpoint state.
  * This should be enough for most of the cases because primary only send
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 3e4ca7f..728edbc 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -411,6 +411,92 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
     return rc;
 }
 
+/*
+ * Send checkpoint dirty pfn list to primary.
+ */
+static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx)
+{
+    xc_interface *xch = ctx->xch;
+    int rc = -1;
+    unsigned count, written;
+    uint64_t i, *pfns = NULL;
+    struct iovec *iov = NULL;
+    xc_shadow_op_stats_t stats = { 0, ctx->restore.p2m_size };
+    struct xc_sr_record rec =
+    {
+        .type = REC_TYPE_CHECKPOINT_DIRTY_PFN_LIST,
+    };
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->restore.dirty_bitmap_hbuf);
+
+    if ( xc_shadow_control(
+             xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
+             HYPERCALL_BUFFER(dirty_bitmap), ctx->restore.p2m_size,
+             NULL, 0, &stats) != ctx->restore.p2m_size )
+    {
+        PERROR("Failed to retrieve logdirty bitmap");
+        goto err;
+    }
+
+    for ( i = 0, count = 0; i < ctx->restore.p2m_size; i++ )
+    {
+        if ( test_bit(i, dirty_bitmap) )
+            count++;
+    }
+
+
+    pfns = malloc(count * sizeof(*pfns));
+    if ( !pfns )
+    {
+        ERROR("Unable to allocate %zu bytes of memory for dirty pfn list",
+              count * sizeof(*pfns));
+        goto err;
+    }
+
+    for ( i = 0, written = 0; i < ctx->restore.p2m_size; ++i )
+    {
+        if ( !test_bit(i, dirty_bitmap) )
+            continue;
+
+        if ( written >= count )
+        {
+            ERROR("Dirty pfn list exceeds the expected count");
+            goto err;
+        }
+
+        pfns[written++] = i;
+    }
+
+    /* iovec[] for writev(). */
+    iov = malloc(3 * sizeof(*iov));
+    if ( !iov )
+    {
+        ERROR("Unable to allocate memory for sending dirty bitmap");
+        goto err;
+    }
+
+    rec.length = count * sizeof(*pfns);
+
+    iov[0].iov_base = &rec.type;
+    iov[0].iov_len = sizeof(rec.type);
+
+    iov[1].iov_base = &rec.length;
+    iov[1].iov_len = sizeof(rec.length);
+
+    iov[2].iov_base = pfns;
+    iov[2].iov_len = count * sizeof(*pfns);
+
+    if ( writev_exact(ctx->restore.send_back_fd, iov, 3) )
+    {
+        PERROR("Failed to write dirty bitmap to stream");
+        goto err;
+    }
+
+    rc = 0;
+ err:
+    return rc;
+}
+
 static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec);
 static int handle_checkpoint(struct xc_sr_context *ctx)
 {
@@ -460,6 +546,53 @@ static int handle_checkpoint(struct xc_sr_context *ctx)
     else
         ctx->restore.buffer_all_records = true;
 
+    if ( ctx->restore.checkpointed == XC_MIG_STREAM_COLO )
+    {
+#define HANDLE_CALLBACK_RETURN_VALUE(ret)                   \
+    do {                                                    \
+        if ( ret == 1 )                                     \
+            rc = 0; /* Success */                           \
+        else                                                \
+        {                                                   \
+            if ( ret == 2 )                                 \
+                rc = BROKEN_CHANNEL;                        \
+            else                                            \
+                rc = -1; /* Some unspecified error */       \
+            goto err;                                       \
+        }                                                   \
+    } while (0)
+
+        /* COLO */
+
+        /* We need to resume guest */
+        rc = ctx->restore.ops.stream_complete(ctx);
+        if ( rc )
+            goto err;
+
+        ctx->restore.callbacks->restore_results(ctx->restore.xenstore_gfn,
+                                                ctx->restore.console_gfn,
+                                                ctx->restore.callbacks->data);
+
+        /* Resume secondary vm */
+        ret = ctx->restore.callbacks->postcopy(ctx->restore.callbacks->data);
+        HANDLE_CALLBACK_RETURN_VALUE(ret);
+
+        /* Wait for a new checkpoint */
+        ret = ctx->restore.callbacks->wait_checkpoint(
+                                                ctx->restore.callbacks->data);
+        HANDLE_CALLBACK_RETURN_VALUE(ret);
+
+        /* Suspend secondary vm */
+        ret = ctx->restore.callbacks->suspend(ctx->restore.callbacks->data);
+        HANDLE_CALLBACK_RETURN_VALUE(ret);
+
+#undef HANDLE_CALLBACK_RETURN_VALUE
+
+        rc = send_checkpoint_dirty_pfn_list(ctx);
+        if ( rc )
+            goto err;
+    }
+
  err:
     return rc;
 }
@@ -529,6 +662,21 @@ static int setup(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
     int rc;
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->restore.dirty_bitmap_hbuf);
+
+    if ( ctx->restore.checkpointed == XC_MIG_STREAM_COLO )
+    {
+        dirty_bitmap = xc_hypercall_buffer_alloc_pages(xch, dirty_bitmap,
+                                NRPAGES(bitmap_size(ctx->restore.p2m_size)));
+
+        if ( !dirty_bitmap )
+        {
+            ERROR("Unable to allocate memory for dirty bitmap");
+            rc = -1;
+            goto err;
+        }
+    }
 
     rc = ctx->restore.ops.setup(ctx);
     if ( rc )
@@ -562,10 +710,15 @@ static void cleanup(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
     unsigned i;
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->restore.dirty_bitmap_hbuf);
 
     for ( i = 0; i < ctx->restore.buffered_rec_num; i++ )
         free(ctx->restore.buffered_records[i].data);
 
+    if ( ctx->restore.checkpointed == XC_MIG_STREAM_COLO )
+        xc_hypercall_buffer_free_pages(xch, dirty_bitmap,
+                                   NRPAGES(bitmap_size(ctx->restore.p2m_size)));
     free(ctx->restore.buffered_records);
     free(ctx->restore.populated_pfns);
     if ( ctx->restore.ops.cleanup(ctx) )
@@ -631,6 +784,15 @@ static int restore(struct xc_sr_context *ctx)
     } while ( rec.type != REC_TYPE_END );
 
  remus_failover:
+
+    if ( ctx->restore.checkpointed == XC_MIG_STREAM_COLO )
+    {
+        /* With COLO, we have already called stream_complete */
+        rc = 0;
+        IPRINTF("COLO Failover");
+        goto done;
+    }
+
     /*
      * With Remus, if we reach here, there must be some error on primary,
      * failover from the last checkpoint state.
@@ -667,6 +829,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
                       xc_migration_stream_t stream_type,
                       struct restore_callbacks *callbacks, int send_back_fd)
 {
+    xen_pfn_t nr_pfns;
     struct xc_sr_context ctx =
         {
             .xch = xch,
@@ -680,11 +843,21 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
     ctx.restore.xenstore_domid = store_domid;
     ctx.restore.checkpointed = stream_type;
     ctx.restore.callbacks = callbacks;
+    ctx.restore.send_back_fd = send_back_fd;
 
     /* Sanity checks for callbacks. */
     if ( stream_type )
         assert(callbacks->checkpoint);
 
+    if ( ctx.restore.checkpointed == XC_MIG_STREAM_COLO )
+    {
+        /* this is COLO restore */
+        assert(callbacks->suspend &&
+               callbacks->postcopy &&
+               callbacks->wait_checkpoint &&
+               callbacks->restore_results);
+    }
+
     DPRINTF("fd %d, dom %u, hvm %u, pae %u, superpages %d"
             ", stream_type %d", io_fd, dom, hvm, pae,
             superpages, stream_type);
@@ -706,6 +879,14 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
     if ( read_headers(&ctx) )
         return -1;
 
+    if ( xc_domain_nr_gpfns(xch, dom, &nr_pfns) < 0 )
+    {
+        PERROR("Unable to obtain the guest p2m size");
+        return -1;
+    }
+
+    ctx.restore.p2m_size = nr_pfns;
+
     if ( ctx.dominfo.hvm )
     {
         ctx.restore.ops = restore_ops_x86_hvm;
diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
index a8f74a7..04b02d8 100644
--- a/tools/libxl/libxl_colo_restore.c
+++ b/tools/libxl/libxl_colo_restore.c
@@ -126,11 +126,6 @@ static void colo_resume_vm(libxl__egc *egc,
         return;
     }
 
-    /*
-     * TODO: get store gfn and console gfn
-     *  We should call the callback restore_results in
-     *  xc_domain_restore() before resuming the guest.
-     */
     libxl__xc_domain_restore_done(egc, dcs, 0, 0, 0);
 
     return;
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index c58dd7e..d6c794e 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1017,6 +1017,8 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     const int checkpointed_stream = dcs->restore_params.checkpointed_stream;
     libxl__colo_restore_state *const crs = &dcs->crs;
     libxl_domain_build_info *const info = &d_config->b_info;
+    libxl__srm_restore_autogen_callbacks *const callbacks =
+        &dcs->srs.shs.callbacks.restore.a;
 
     if (rc) {
         domcreate_rebuild_done(egc, dcs, rc);
@@ -1044,6 +1046,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     }
 
     /* Restore */
+    callbacks->restore_results = libxl__srm_callout_callback_restore_results;
 
     /* COLO only supports HVM now because it does not work very
      * well with pv drivers:
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index 6016706..c2243f2 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -29,8 +29,8 @@ our @msgs = (
     [  6, 'srcxA',  "wait_checkpoint", [] ],
     [  7, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
                                               unsigned enable)] ],
-    [  8, 'r',      "restore_results",       ['unsigned long', 'store_mfn',
-                                              'unsigned long', 'console_mfn'] ],
+    [  8, 'rcx',    "restore_results",       ['unsigned long', 'store_gfn',
+                                              'unsigned long', 'console_gfn'] ],
     [  9, 'srW',    "complete",              [qw(int retval
                                                  int errnoval)] ],
 );
-- 
1.9.3





* [PATCH v13 16/26] libxc/save: support COLO save
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (14 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 15/26] libxc/restore: support COLO restore Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 17/26] implement the cmdline for COLO Changlong Xie
                   ` (12 subsequent siblings)
  28 siblings, 0 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

After suspending the primary VM, get the dirty page list from the
secondary VM, merge it into the primary's dirty bitmap, and send every
page that is dirty on either side to the secondary.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxc/xc_sr_common.h |  2 +
 tools/libxc/xc_sr_save.c   | 95 +++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 95 insertions(+), 2 deletions(-)
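
For readers who want the merge step in isolation, here is a minimal
standalone sketch of folding a list of dirty pfns into a bitmap. It is
not the libxc code: the helper names are made up for illustration, the
bitmap is a plain array of unsigned long, and the sketch assumes that
any pfn at or beyond p2m_size is invalid.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Set bit nr in a bitmap stored as an array of unsigned long. */
static void bitmap_set_bit(uint64_t nr, unsigned long *bitmap)
{
    bitmap[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

/* Return non-zero if bit nr is set. */
static int bitmap_test_bit(uint64_t nr, const unsigned long *bitmap)
{
    return (bitmap[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1;
}

/*
 * Merge a list of dirty pfns (hypothetical helper) into a bitmap that
 * covers pfns 0 .. p2m_size - 1.  Returns 0 on success, or -1 if a pfn
 * is out of range; bits set before the bad entry stay set.
 */
static int merge_dirty_pfns(unsigned long *bitmap, uint64_t p2m_size,
                            const uint64_t *pfns, size_t count)
{
    size_t i;

    assert(bitmap && (pfns || count == 0));

    for ( i = 0; i < count; i++ )
    {
        if ( pfns[i] >= p2m_size )
            return -1;
        bitmap_set_bit(pfns[i], bitmap);
    }

    return 0;
}
```

Because the pfns are OR-ed into the existing bitmap, a page ends up
marked if it was dirtied on either side, which is what lets the primary
overwrite pages that only the secondary modified.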

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index cf32ab8..a83f22a 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -184,6 +184,8 @@ struct xc_sr_context
     {
         struct /* Save data. */
         {
+            int recv_fd;
+
             struct xc_sr_save_ops ops;
             struct save_callbacks *callbacks;
 
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index d3d95d4..f574993 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -517,6 +517,58 @@ static int send_memory_live(struct xc_sr_context *ctx)
     return rc;
 }
 
+static int colo_merge_secondary_dirty_bitmap(struct xc_sr_context *ctx)
+{
+    xc_interface *xch = ctx->xch;
+    struct xc_sr_record rec;
+    uint64_t *pfns = NULL;
+    uint64_t pfn;
+    unsigned count, i;
+    int rc;
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->save.dirty_bitmap_hbuf);
+
+    rc = read_record(ctx, ctx->save.recv_fd, &rec);
+    if ( rc )
+        goto err;
+
+    if ( rec.type != REC_TYPE_CHECKPOINT_DIRTY_PFN_LIST )
+    {
+        PERROR("Expect dirty bitmap record, but received %u", rec.type );
+        rc = -1;
+        goto err;
+    }
+
+    if ( rec.length % sizeof(*pfns) )
+    {
+        PERROR("Invalid dirty pfn list record length %u", rec.length );
+        rc = -1;
+        goto err;
+    }
+
+    count = rec.length / sizeof(*pfns);
+    pfns = rec.data;
+
+    for ( i = 0; i < count; i++ )
+    {
+        pfn = pfns[i];
+        if ( pfn >= ctx->save.p2m_size )
+        {
+            PERROR("Invalid pfn %#lx", pfn );
+            rc = -1;
+            goto err;
+        }
+
+        set_bit(pfn, dirty_bitmap);
+    }
+
+    rc = 0;
+
+ err:
+    free(rec.data);
+    return rc;
+}
+
 /*
  * Suspend the domain and send dirty memory.
  * This is the last iteration of the live migration and the
@@ -558,6 +610,16 @@ static int suspend_and_send_dirty(struct xc_sr_context *ctx)
 
     bitmap_or(dirty_bitmap, ctx->save.deferred_pages, ctx->save.p2m_size);
 
+    if ( !ctx->save.live && ctx->save.checkpointed == XC_MIG_STREAM_COLO )
+    {
+        rc = colo_merge_secondary_dirty_bitmap(ctx);
+        if ( rc )
+        {
+            PERROR("Failed to get secondary vm's dirty pages");
+            goto out;
+        }
+    }
+
     rc = send_dirty_pages(ctx, stats.dirty_count + ctx->save.nr_deferred_pages);
     if ( rc )
         goto out;
@@ -791,13 +853,39 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
             if ( rc )
                 goto err;
 
+            if ( ctx->save.checkpointed == XC_MIG_STREAM_COLO )
+            {
+                rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
+                if ( !rc )
+                {
+                    rc = -1;
+                    goto err;
+                }
+            }
+
             rc = ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
             if ( rc <= 0 )
                 goto err;
 
-            rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
-            if ( rc <= 0 )
+            if ( ctx->save.checkpointed == XC_MIG_STREAM_COLO )
+            {
+                rc = ctx->save.callbacks->wait_checkpoint(
+                    ctx->save.callbacks->data);
+                if ( rc <= 0 )
+                    goto err;
+            }
+            else if ( ctx->save.checkpointed == XC_MIG_STREAM_REMUS )
+            {
+                rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
+                if ( rc <= 0 )
+                    goto err;
+            }
+            else
+            {
+                ERROR("Unknown checkpointed stream");
+                rc = -1;
                 goto err;
+            }
         }
     } while ( ctx->save.checkpointed != XC_MIG_STREAM_NONE );
 
@@ -843,6 +931,7 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
     ctx.save.live  = !!(flags & XCFLAGS_LIVE);
     ctx.save.debug = !!(flags & XCFLAGS_DEBUG);
     ctx.save.checkpointed = stream_type;
+    ctx.save.recv_fd = recv_fd;
 
     /* If altering migration_stream update this assert too. */
     assert(stream_type == XC_MIG_STREAM_NONE ||
@@ -863,6 +952,8 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
         assert(callbacks->switch_qemu_logdirty);
     if ( ctx.save.checkpointed )
         assert(callbacks->checkpoint && callbacks->postcopy);
+    if ( ctx.save.checkpointed == XC_MIG_STREAM_COLO )
+        assert(callbacks->wait_checkpoint);
 
     DPRINTF("fd %d, dom %u, max_iters %u, max_factor %u, flags %u, hvm %d",
             io_fd, dom, max_iters, max_factor, flags, hvm);
-- 
1.9.3





* [PATCH v13 17/26] implement the cmdline for COLO
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (15 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 16/26] libxc/save: support COLO save Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 18/26] COLO: introduce new API to prepare/start/do/get_error/stop replication Changlong Xie
                   ` (11 subsequent siblings)
  28 siblings, 0 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

Add a new option, -c, to the 'xl remus' command. Use -c to select
COLO HA instead of Remus HA.

Update the man page to document the new option.

Also add a new option --colo to the internal command 'xl migrate-receive'.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 docs/man/xl.pod.1          | 13 ++++++++--
 tools/libxl/libxl.c        | 22 ++++++++++++++--
 tools/libxl/libxl_create.c |  1 -
 tools/libxl/xl_cmdimpl.c   | 65 +++++++++++++++++++++++++++++++++++-----------
 tools/libxl/xl_cmdtable.c  |  4 ++-
 5 files changed, 84 insertions(+), 21 deletions(-)
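
The option rules this patch enforces (-c conflicts with -i and -b, and
COLO runs with compression off) can be summarised in a small standalone
sketch. The struct and function names below are hypothetical and only
model the checks done across main_remus() and
libxl_domain_remus_start(); they are not part of xl or libxl.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state, standing in for libxl_defbool. */
enum tristate { TS_DEFAULT = 0, TS_FALSE, TS_TRUE };

struct ha_opts {
    bool colo;                  /* -c given */
    int interval;               /* -i value, 0 if not given */
    bool blackhole;             /* -b given */
    enum tristate compression;  /* explicit setting, or TS_DEFAULT */
};

/*
 * Apply defaults and check for conflicts.
 * Returns 0 on success, -1 if the combination is rejected.
 */
static int ha_opts_finalise(struct ha_opts *o)
{
    assert(o);

    if (o->colo) {
        if (o->interval || o->blackhole)
            return -1;          /* -c conflicts with -i and -b */
        if (o->compression == TS_TRUE)
            return -1;          /* COLO cannot use compression */
        o->compression = TS_FALSE;
        return 0;
    }

    if (!o->interval)
        o->interval = 200;      /* Remus default interval, in ms */
    if (o->compression == TS_DEFAULT)
        o->compression = TS_TRUE;

    return 0;
}
```

Keeping compression as a tri-state is what allows the default to depend
on the mode: left unset, it resolves to on for Remus and off for COLO.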

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index dc6213e..a992a45 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -447,12 +447,16 @@ Print huge (!) amount of debug during the migration process.
 
 =item B<remus> [I<OPTIONS>] I<domain-id> I<host>
 
-Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
-mechanism between the two hosts.
+Enable Remus HA or COLO HA for domain. By default B<xl> relies on ssh as a
+transport mechanism between the two hosts.
 
 N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
      Disk replication support is limited to DRBD disks.
 
+     COLO support in xl is still in experimental (proof-of-concept) phase.
+     There is no support for network or disk, so the guest will corrupt its
+     disk and confuse its network peers at the moment.
+
 B<OPTIONS>
 
 =over 4
@@ -498,6 +502,11 @@ Disable network output buffering. Requires enabling unsafe mode.
 
 Disable disk replication. Requires enabling unsafe mode.
 
+=item B<-c>
+
+Enable COLO HA. This conflicts with B<-i> and B<-b>, and memory
+checkpoint compression must be disabled.
+
 =back
 
 =item B<pause> I<domain-id>
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 272c6a5..349a3c6 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -848,12 +848,27 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
         goto out;
     }
 
+    /* The caller must set this defbool */
+    if (libxl_defbool_is_default(info->colo)) {
+        LOG(ERROR, "colo mode must be enabled/disabled");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
     libxl_defbool_setdefault(&info->allow_unsafe, false);
     libxl_defbool_setdefault(&info->blackhole, false);
-    libxl_defbool_setdefault(&info->compression, true);
+    libxl_defbool_setdefault(&info->compression,
+                             !libxl_defbool_val(info->colo));
     libxl_defbool_setdefault(&info->netbuf, true);
     libxl_defbool_setdefault(&info->diskbuf, true);
 
+    if (libxl_defbool_val(info->colo) &&
+        libxl_defbool_val(info->compression)) {
+        LOG(ERROR, "cannot use memory checkpoint compression in COLO mode");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
     if (!libxl_defbool_val(info->allow_unsafe) &&
         (libxl_defbool_val(info->blackhole) ||
          !libxl_defbool_val(info->netbuf) ||
@@ -875,7 +890,10 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     dss->live = 1;
     dss->debug = 0;
     dss->remus = info;
-    dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_REMUS;
+    if (libxl_defbool_val(info->colo))
+        dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_COLO;
+    else
+        dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_REMUS;
 
     assert(info);
 
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index d6c794e..be604e5 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1893,7 +1893,6 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
 {
-    assert(send_back_fd == -1);
     return do_domain_create(ctx, d_config, domid, restore_fd, send_back_fd,
                             params, ao_how, aop_console_how);
 }
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 2e64f44..25bd81a 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -4740,6 +4740,8 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     char rc_buf;
     char *migration_domname;
     struct domain_create dom_info;
+    const char *ha = checkpointed == LIBXL_CHECKPOINTED_STREAM_COLO ?
+                     "COLO" : "Remus";
 
     signal(SIGPIPE, SIG_IGN);
     /* if we get SIGPIPE we'd rather just have it as an error */
@@ -4757,7 +4759,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     dom_info.monitor = monitor;
     dom_info.paused = 1;
     dom_info.migrate_fd = recv_fd;
-    dom_info.send_back_fd = -1;
+    dom_info.send_back_fd = send_fd;
     dom_info.migration_domname_r = &migration_domname;
     dom_info.checkpointed_stream = checkpointed;
 
@@ -4772,11 +4774,12 @@ static void migrate_receive(int debug, int daemonize, int monitor,
 
     switch (checkpointed) {
     case LIBXL_CHECKPOINTED_STREAM_REMUS:
+    case LIBXL_CHECKPOINTED_STREAM_COLO:
         /* If we are here, it means that the sender (primary) has crashed.
          * TODO: Split-Brain Check.
          */
-        fprintf(stderr, "migration target: Remus Failover for domain %u\n",
-                domid);
+        fprintf(stderr, "migration target: %s Failover for domain %u\n",
+                ha, domid);
 
         /*
          * If domain renaming fails, lets just continue (as we need the domain
@@ -4792,16 +4795,20 @@ static void migrate_receive(int debug, int daemonize, int monitor,
             rc = libxl_domain_rename(ctx, domid, migration_domname,
                                      common_domname);
             if (rc)
-                fprintf(stderr, "migration target (Remus): "
+                fprintf(stderr, "migration target (%s): "
                         "Failed to rename domain from %s to %s:%d\n",
-                        migration_domname, common_domname, rc);
+                        ha, migration_domname, common_domname, rc);
         }
 
+        if (checkpointed == LIBXL_CHECKPOINTED_STREAM_COLO)
+            /* The guest is running after failover in COLO mode */
+            exit(rc ? -ERROR_FAIL: 0);
+
         rc = libxl_domain_unpause(ctx, domid);
         if (rc)
-            fprintf(stderr, "migration target (Remus): "
+            fprintf(stderr, "migration target (%s): "
                     "Failed to unpause domain %s (id: %u):%d\n",
-                    common_domname, domid, rc);
+                    ha, common_domname, domid, rc);
 
         exit(rc ? -ERROR_FAIL: 0);
     default:
@@ -4948,8 +4955,12 @@ int main_migrate_receive(int argc, char **argv)
     int debug = 0, daemonize = 1, monitor = 1;
     libxl_checkpointed_stream checkpointed = LIBXL_CHECKPOINTED_STREAM_NONE;
     int opt;
+    static struct option opts[] = {
+        {"colo", 0, 0, 0x100},
+        COMMON_LONG_OPTS
+    };
 
-    SWITCH_FOREACH_OPT(opt, "Fedr", NULL, "migrate-receive", 0) {
+    SWITCH_FOREACH_OPT(opt, "Fedr", opts, "migrate-receive", 0) {
     case 'F':
         daemonize = 0;
         break;
@@ -4963,6 +4974,9 @@ int main_migrate_receive(int argc, char **argv)
     case 'r':
         checkpointed = LIBXL_CHECKPOINTED_STREAM_REMUS;
         break;
+    case 0x100:
+        checkpointed = LIBXL_CHECKPOINTED_STREAM_COLO;
+        break;
     }
 
     if (argc-optind != 0) {
@@ -8338,11 +8352,8 @@ int main_remus(int argc, char **argv)
     int config_len;
 
     memset(&r_info, 0, sizeof(libxl_domain_remus_info));
-    /* Defaults */
-    r_info.interval = 200;
-    libxl_defbool_setdefault(&r_info.blackhole, false);
 
-    SWITCH_FOREACH_OPT(opt, "Fbundi:s:N:e", NULL, "remus", 2) {
+    SWITCH_FOREACH_OPT(opt, "Fbundi:s:N:ec", NULL, "remus", 2) {
     case 'i':
         r_info.interval = atoi(optarg);
         break;
@@ -8370,11 +8381,32 @@ int main_remus(int argc, char **argv)
     case 'e':
         daemonize = 0;
         break;
+    case 'c':
+        libxl_defbool_set(&r_info.colo, true);
     }
 
     domid = find_domain(argv[optind]);
     host = argv[optind + 1];
 
+    /* Defaults */
+    libxl_defbool_setdefault(&r_info.blackhole, false);
+    libxl_defbool_setdefault(&r_info.colo, false);
+    if (!libxl_defbool_val(r_info.colo) && !r_info.interval)
+        r_info.interval = 200;
+
+    if (libxl_defbool_val(r_info.colo)) {
+        if (r_info.interval || libxl_defbool_val(r_info.blackhole)) {
+            fprintf(stderr, "Option -c conflicts with -i or -b\n");
+            exit(-1);
+        }
+
+        if (libxl_defbool_is_default(r_info.compression)) {
+            fprintf(stderr, "COLO can't be used with memory compression. "
+                    "Disabling memory checkpoint compression.\n");
+            libxl_defbool_set(&r_info.compression, false);
+        }
+    }
+
     if (!r_info.netbufscript)
         r_info.netbufscript = default_remus_netbufscript;
 
@@ -8389,8 +8421,9 @@ int main_remus(int argc, char **argv)
         if (!ssh_command[0]) {
             rune = host;
         } else {
-            xasprintf(&rune, "exec %s %s xl migrate-receive -r %s",
+            xasprintf(&rune, "exec %s %s xl migrate-receive %s %s",
                       ssh_command, host,
+                      libxl_defbool_val(r_info.colo) ? "--colo" : "-r",
                       daemonize ? "" : " -e");
         }
 
@@ -8418,7 +8451,8 @@ int main_remus(int argc, char **argv)
      * domain to force failover
      */
     if (libxl_domain_info(ctx, 0, domid)) {
-        fprintf(stderr, "Remus: Primary domain has been destroyed.\n");
+        fprintf(stderr, "%s: Primary domain has been destroyed.\n",
+                libxl_defbool_val(r_info.colo) ? "COLO" : "Remus");
         close(send_fd);
         return 0;
     }
@@ -8430,7 +8464,8 @@ int main_remus(int argc, char **argv)
     if (rc == ERROR_GUEST_TIMEDOUT)
         fprintf(stderr, "Failed to suspend domain at primary.\n");
     else {
-        fprintf(stderr, "Remus: Backup failed? resuming domain at primary.\n");
+        fprintf(stderr, "%s: Backup failed? resuming domain at primary.\n",
+                libxl_defbool_val(r_info.colo) ? "COLO" : "Remus");
         libxl_domain_resume(ctx, domid, 1, 0);
     }
 
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index b14b881..5911ea8 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -499,7 +499,9 @@ struct cmd_spec cmd_table[] = {
       "-b                      Replicate memory checkpoints to /dev/null (blackhole).\n"
       "                        Works only in unsafe mode.\n"
       "-n                      Disable network output buffering. Works only in unsafe mode.\n"
-      "-d                      Disable disk replication. Works only in unsafe mode."
+      "-d                      Disable disk replication. Works only in unsafe mode.\n"
+      "-c                      Enable COLO HA. Conflicts with -i and -b, and memory\n"
+      "                        checkpoint compression must be disabled."
     },
 #endif
     { "devd",
-- 
1.9.3





* [PATCH v13 18/26] COLO: introduce new API to prepare/start/do/get_error/stop replication
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (16 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 17/26] implement the cmdline for COLO Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 19/26] Introduce COLO mode and refactor relevant function Changlong Xie
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

COLO uses QEMU block replication. QEMU provides QMP commands to start
replication, query replication errors, perform a checkpoint, and stop
replication, as well as to start and stop the NBD server. Introduce new
internal APIs to execute these QMP commands.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_internal.h | 24 +++++++++++
 tools/libxl/libxl_qmp.c      | 96 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 120 insertions(+)
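
As an illustration of the nested 'addr' argument described in the
comment in libxl__qmp_nbd_server_start(), the sketch below formats the
equivalent QMP request as a string. It is standalone and does not use
the libxl JSON helpers; the function name is hypothetical, the "id"
field that libxl adds to requests is omitted, and no JSON escaping is
done on host or port.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the QMP request for nbd-server-start into buf.
 * Returns 0 on success, -1 if buf is too small.
 * host and port are inserted verbatim: no JSON escaping is done.
 */
static int format_nbd_server_start(char *buf, size_t len,
                                   const char *host, const char *port)
{
    int n;

    assert(buf && host && port);

    n = snprintf(buf, len,
                 "{\"execute\":\"nbd-server-start\",\"arguments\":"
                 "{\"addr\":{\"type\":\"inet\",\"data\":"
                 "{\"host\":\"%s\",\"port\":\"%s\"}}}}",
                 host, port);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Note that the port is passed as a JSON string rather than a number,
matching the qmp_parameters_add_string() call used for it in the patch.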

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index b3bb479..148df05 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1774,6 +1774,30 @@ _hidden int libxl__qmp_set_global_dirty_log(libxl__gc *gc, int domid, bool enabl
 _hidden int libxl__qmp_insert_cdrom(libxl__gc *gc, int domid, const libxl_device_disk *disk);
 /* Add a virtual CPU */
 _hidden int libxl__qmp_cpu_add(libxl__gc *gc, int domid, int index);
+/* Start NBD server */
+_hidden int libxl__qmp_nbd_server_start(libxl__gc *gc, int domid,
+                                        const char *host, const char *port);
+/* Add a disk to NBD server */
+_hidden int libxl__qmp_nbd_server_add(libxl__gc *gc, int domid,
+                                      const char *disk);
+/* Start replication */
+_hidden int libxl__qmp_start_replication(libxl__gc *gc, int domid,
+                                         bool primary);
+/* Get replication error that occurs when the vm is running */
+_hidden int libxl__qmp_get_replication_error(libxl__gc *gc, int domid);
+/* Do checkpoint */
+_hidden int libxl__qmp_do_checkpoint(libxl__gc *gc, int domid);
+/* Stop replication */
+_hidden int libxl__qmp_stop_replication(libxl__gc *gc, int domid,
+                                        bool primary);
+/* Stop NBD server */
+_hidden int libxl__qmp_nbd_server_stop(libxl__gc *gc, int domid);
+/* Add or remove a child to/from quorum */
+_hidden int libxl__qmp_x_blockdev_change(libxl__gc *gc, int domid,
+                                         const char *parent,
+                                         const char *child, const char *node);
+/* run a hmp command in qmp mode */
+_hidden int libxl__qmp_hmp(libxl__gc *gc, int domid, const char *command_line);
 /* close and free the QMP handler */
 _hidden void libxl__qmp_close(libxl__qmp_handler *qmp);
 /* remove the socket file, if the file has already been removed,
diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index c0bdfcb..3eb279a 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -979,6 +979,102 @@ int libxl__qmp_cpu_add(libxl__gc *gc, int domid, int idx)
     return qmp_run_command(gc, domid, "cpu-add", args, NULL, NULL);
 }
 
+int libxl__qmp_nbd_server_start(libxl__gc *gc, int domid,
+                                const char *host, const char *port)
+{
+    libxl__json_object *args = NULL;
+    libxl__json_object *addr = NULL;
+    libxl__json_object *data = NULL;
+
+    /* 'addr': {
+     *   'type': 'inet',
+     *   'data': {
+     *     'host': '$nbd_host',
+     *     'port': '$nbd_port'
+     *   }
+     * }
+     */
+    qmp_parameters_add_string(gc, &data, "host", host);
+    qmp_parameters_add_string(gc, &data, "port", port);
+
+    qmp_parameters_add_string(gc, &addr, "type", "inet");
+    qmp_parameters_common_add(gc, &addr, "data", data);
+
+    qmp_parameters_common_add(gc, &args, "addr", addr);
+
+    return qmp_run_command(gc, domid, "nbd-server-start", args, NULL, NULL);
+}
+
+int libxl__qmp_nbd_server_add(libxl__gc *gc, int domid, const char *disk)
+{
+    libxl__json_object *args = NULL;
+
+    qmp_parameters_add_string(gc, &args, "device", disk);
+    qmp_parameters_add_bool(gc, &args, "writable", true);
+
+    return qmp_run_command(gc, domid, "nbd-server-add", args, NULL, NULL);
+}
+
+int libxl__qmp_start_replication(libxl__gc *gc, int domid, bool primary)
+{
+    libxl__json_object *args = NULL;
+
+    qmp_parameters_add_bool(gc, &args, "enable", true);
+    qmp_parameters_add_bool(gc, &args, "primary", primary);
+
+    return qmp_run_command(gc, domid, "xen-set-replication", args, NULL, NULL);
+}
+
+int libxl__qmp_get_replication_error(libxl__gc *gc, int domid)
+{
+    return qmp_run_command(gc, domid, "xen-get-replication-error", NULL,
+                           NULL, NULL);
+}
+
+int libxl__qmp_do_checkpoint(libxl__gc *gc, int domid)
+{
+    return qmp_run_command(gc, domid, "xen-do-checkpoint", NULL, NULL, NULL);
+}
+
+int libxl__qmp_stop_replication(libxl__gc *gc, int domid, bool primary)
+{
+    libxl__json_object *args = NULL;
+
+    qmp_parameters_add_bool(gc, &args, "enable", false);
+    qmp_parameters_add_bool(gc, &args, "primary", primary);
+
+    return qmp_run_command(gc, domid, "xen-set-replication", args, NULL, NULL);
+}
+
+int libxl__qmp_nbd_server_stop(libxl__gc *gc, int domid)
+{
+    return qmp_run_command(gc, domid, "nbd-server-stop", NULL, NULL, NULL);
+}
+
+int libxl__qmp_x_blockdev_change(libxl__gc *gc, int domid, const char *parent,
+                                 const char *child, const char *node)
+{
+    libxl__json_object *args = NULL;
+
+    qmp_parameters_add_string(gc, &args, "parent", parent);
+    if (child)
+        qmp_parameters_add_string(gc, &args, "child", child);
+    if (node)
+        qmp_parameters_add_string(gc, &args, "node", node);
+
+    return qmp_run_command(gc, domid, "x-blockdev-change", args, NULL, NULL);
+}
+
+int libxl__qmp_hmp(libxl__gc *gc, int domid, const char *command_line)
+{
+    libxl__json_object *args = NULL;
+
+    qmp_parameters_add_string(gc, &args, "command-line", command_line);
+
+    return qmp_run_command(gc, domid, "human-monitor-command", args,
+                           NULL, NULL);
+}
+
 int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
                                const libxl_domain_config *guest_config)
 {
-- 
1.9.3





* [PATCH v13 19/26] Introduce COLO mode and refactor relevant function
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (17 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 18/26] COLO: introduce new API to prepare/start/do/get_error/stop replication Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 20/26] Support colo mode for qemu disk Changlong Xie
                   ` (9 subsequent siblings)
  28 siblings, 0 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

No functional changes.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxl/libxl_dm.c | 65 +++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 59 insertions(+), 6 deletions(-)
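
The shape of the refactoring is a per-bus helper that switches on the
COLO mode, so that later patches can add cases without touching the
caller. A standalone sketch of the IDE variant follows. It writes into a
caller-supplied buffer instead of using the libxl gc allocator, and it
returns -1 for an unknown mode where the patch calls abort(); both are
simplifications for illustration, and the names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the enum in the patch; only the non-COLO mode exists so far. */
enum { COLO_NONE = 0 };

/*
 * Write the -drive argument for an emulated IDE disk into buf.
 * Returns 0 on success, -1 for an unknown mode or a too-small buffer.
 */
static int ide_drive_string(char *buf, size_t len, const char *path,
                            int unit, const char *format, int colo_mode)
{
    int n;

    assert(buf && path && format);

    switch (colo_mode) {
    case COLO_NONE:
        n = snprintf(buf, len,
                     "file=%s,if=ide,index=%d,media=disk,format=%s,"
                     "cache=writeback", path, unit, format);
        break;
    default:
        return -1;
    }

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With a single case the switch looks redundant, but it gives each future
COLO mode one obvious place to build its own drive string.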

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index cfda24c..1d1b25b 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -751,6 +751,51 @@ static int libxl__dm_runas_helper(libxl__gc *gc, const char *username)
     }
 }
 
+/* colo mode */
+enum {
+    LIBXL__COLO_NONE = 0,
+};
+
+static char *qemu_disk_scsi_drive_string(libxl__gc *gc, const char *pdev_path,
+                                         int unit, const char *format,
+                                         const libxl_device_disk *disk,
+                                         int colo_mode)
+{
+    char *drive = NULL;
+
+    switch (colo_mode) {
+    case LIBXL__COLO_NONE:
+        drive = libxl__sprintf
+            (gc, "file=%s,if=scsi,bus=0,unit=%d,format=%s,readonly=%s,cache=writeback",
+             pdev_path, unit, format, disk->readwrite ? "off" : "on");
+        break;
+    default:
+         abort();
+    }
+
+    return drive;
+}
+
+static char *qemu_disk_ide_drive_string(libxl__gc *gc, const char *pdev_path,
+                                        int unit, const char *format,
+                                        const libxl_device_disk *disk,
+                                        int colo_mode)
+{
+    char *drive = NULL;
+
+    switch (colo_mode) {
+    case LIBXL__COLO_NONE:
+        drive = GCSPRINTF
+            ("file=%s,if=ide,index=%d,media=disk,format=%s,cache=writeback",
+             pdev_path, unit, format);
+        break;
+    default:
+         abort();
+    }
+
+    return drive;
+}
+
 static int libxl__build_device_model_args_new(libxl__gc *gc,
                                         const char *dm, int guest_domid,
                                         const libxl_domain_config *guest_config,
@@ -1170,6 +1215,7 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
             const char *format = qemu_disk_format_string(disks[i].format);
             char *drive;
             const char *pdev_path;
+            int colo_mode;
 
             if (dev_number == -1) {
                 LOG(WARN, "unable to determine"" disk number for %s",
@@ -1214,10 +1260,13 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
                  * For other disks we translate devices 0..3 into
                  * hd[a-d] and ignore the rest.
                  */
+
+                colo_mode = LIBXL__COLO_NONE;
                 if (strncmp(disks[i].vdev, "sd", 2) == 0) {
-                    drive = libxl__sprintf
-                        (gc, "file=%s,if=scsi,bus=0,unit=%d,format=%s,readonly=%s,cache=writeback",
-                         pdev_path, disk, format, disks[i].readwrite ? "off" : "on");
+                    drive = qemu_disk_scsi_drive_string(gc, pdev_path, disk,
+                                                        format,
+                                                        &disks[i],
+                                                        colo_mode);
                 } else if (strncmp(disks[i].vdev, "xvd", 3) == 0) {
                     /*
                      * Do not add any emulated disk when PV disk are
@@ -1240,12 +1289,16 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
                         LOG(ERROR, "qemu-xen doesn't support read-only IDE disk drivers");
                         return ERROR_INVAL;
                     }
-                    drive = libxl__sprintf
-                        (gc, "file=%s,if=ide,index=%d,media=disk,format=%s,cache=writeback",
-                         pdev_path, disk, format);
+                    drive = qemu_disk_ide_drive_string(gc, pdev_path, disk,
+                                                       format,
+                                                       &disks[i],
+                                                       colo_mode);
                 } else {
                     continue; /* Do not emulate this disk */
                 }
+
+                if (!drive)
+                    continue;
             }
 
             flexarray_append(dm_args, "-drive");
-- 
1.9.3





* [PATCH v13 20/26] Support colo mode for qemu disk
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (18 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 19/26] Introduce COLO mode and refactor relevant function Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-28  3:46   ` [PATCH v13.1 " Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 21/26] COLO: use qemu block replication Changlong Xie
                   ` (8 subsequent siblings)
  28 siblings, 1 reply; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

Usage: disk = ['...,colo,colo-host=xxx,colo-port=xxx,colo-export=xxx,active-disk=xxx,hidden-disk=xxx...']
For QEMU block replication details:
http://wiki.qemu.org/Features/BlockReplication

Note: this patch only introduces the COLO framework; it does not
implement COLO operations.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
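Reviewer note (not part of the commit): the documentation in this patch marks
all five COLO disk parameters as mandatory once "colo" is given. A minimal
sketch of that rule follows. `struct colo_disk_opts` and `colo_opts_valid()`
are hypothetical names used for illustration only, not libxl API, and the
upper port bound of 65535 is an assumption (the patch itself only rejects a
port of 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the COLO fields added to libxl_device_disk. */
struct colo_disk_opts {
    bool colo_enable;
    const char *colo_host;
    int colo_port;
    const char *colo_export;
    const char *active_disk;
    const char *hidden_disk;
};

/*
 * Returns true if the options are usable: either COLO is disabled, or
 * every parameter documented as mandatory is present.
 */
bool colo_opts_valid(const struct colo_disk_opts *o)
{
    assert(o != NULL);

    if (!o->colo_enable)
        return true;
    if (!o->colo_host || !o->colo_export)
        return false;
    if (o->colo_port <= 0 || o->colo_port > 65535) /* range is assumed */
        return false;
    return o->active_disk != NULL && o->hidden_disk != NULL;
}
```

A check of this shape would reject a disk specification that enables colo but
omits, say, hidden-disk, before any QEMU argument is built.
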
 docs/man/xl.pod.1                   |  38 ++++++++++--
 docs/misc/xl-disk-configuration.txt |  53 +++++++++++++++++
 tools/libxl/libxl.c                 |  51 +++++++++++++++-
 tools/libxl/libxl_create.c          |  26 ++++++++-
 tools/libxl/libxl_device.c          |  11 ++++
 tools/libxl/libxl_dm.c              | 113 +++++++++++++++++++++++++++++++++++-
 tools/libxl/libxl_types.idl         |   7 +++
 tools/libxl/libxlu_disk_l.l         |  17 ++++++
 8 files changed, 308 insertions(+), 8 deletions(-)
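
Reviewer note: the mapping from COLO mode to the QEMU -drive argument, which
the libxl_dm.c hunks below implement with GCSPRINTF, can be sketched as a
stand-alone function. This is an illustration under assumptions, not the
patch's code: `enum colo_mode` and `ide_drive_string()` are hypothetical
names, and plain snprintf() replaces the libxl gc allocator. The format
strings are copied from the IDE variant in this patch.

```c
#include <assert.h>
#include <stdio.h>

/* Mirrors the enum in libxl_dm.c; these names are illustrative only. */
enum colo_mode { COLO_NONE = 0, COLO_PRIMARY, COLO_SECONDARY };

/*
 * Writes the IDE -drive argument for the given mode into buf.
 * Returns the snprintf() result, or -1 for an unknown mode.
 */
int ide_drive_string(char *buf, size_t len, enum colo_mode mode,
                     int unit, const char *pdev_path, const char *format,
                     const char *exportname, const char *active_disk,
                     const char *hidden_disk)
{
    assert(buf != NULL && len > 0);

    switch (mode) {
    case COLO_NONE:
        return snprintf(buf, len,
            "file=%s,if=ide,index=%d,media=disk,format=%s,cache=writeback",
            pdev_path, unit, format);
    case COLO_PRIMARY:
        /* quorum with a single child; the NBD child is added later */
        return snprintf(buf, len,
            "if=ide,index=%d,media=disk,cache=writeback,driver=quorum,"
            "id=%s,children.0.file.filename=%s,children.0.driver=%s,"
            "read-pattern=fifo,vote-threshold=1",
            unit, exportname, pdev_path, format);
    case COLO_SECONDARY:
        /* active disk on top of hidden disk on top of the exported disk */
        return snprintf(buf, len,
            "if=ide,index=%d,media=disk,cache=writeback,driver=replication,"
            "mode=secondary,file.driver=qcow2,file.file.filename=%s,"
            "file.backing.driver=qcow2,file.backing.file.filename=%s,"
            "file.backing.backing=%s",
            unit, active_disk, hidden_disk, exportname);
    }
    return -1;
}
```

Returning from every case keeps the branches from falling through into one
another, which matters here because the default branch in the patch aborts.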

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index a992a45..6788465 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -450,12 +450,40 @@ Print huge (!) amount of debug during the migration process.
 Enable Remus HA or COLO HA for domain. By default B<xl> relies on ssh as a
 transport mechanism between the two hosts.
 
-N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
-     Disk replication support is limited to DRBD disks.
+B<NOTES>
+
+=over 4
+
+Remus support in xl is still experimental (proof of concept).
+Disk replication support is limited to DRBD disks.
+
+COLO support in xl is still experimental (proof of concept).
+There is no network support yet, so the guest will currently confuse
+its network peers.
+
+=back
+
+B<EXAMPLE>
 
-     COLO support in xl is still in experimental (proof-of-concept) phase.
-     There is no support for network or disk, so the guest will corrupt its
-     disk and confuse its network peers at the moment.
+=over 4
+
+(a) An example of a COLO replication configuration:
+disk = ['...,colo,colo-host=xxx,colo-port=xxx,colo-export=xxx,active-disk=xxx,hidden-disk=xxx...']
+
+=item B<colo-host>      :Secondary host's IP address.
+
+=item B<colo-port>      :Secondary host's port. An NBD server runs on the
+secondary host and listens on this port.
+
+=item B<colo-export>    :Disk export name of the NBD server on the secondary host.
+
+=item B<active-disk>    :Used by the secondary. The secondary guest's writes are
+buffered in this disk.
+
+=item B<hidden-disk>    :Used by the secondary. The original content that the
+primary modifies is buffered in this disk.
+
+=back
 
 B<OPTIONS>
 
diff --git a/docs/misc/xl-disk-configuration.txt b/docs/misc/xl-disk-configuration.txt
index 29f6ddb..6e73975 100644
--- a/docs/misc/xl-disk-configuration.txt
+++ b/docs/misc/xl-disk-configuration.txt
@@ -234,6 +234,59 @@ were intentionally created non-sparse to avoid fragmentation of the
 file.
 
 
+===============
+COLO PARAMETERS
+===============
+
+
+colo
+----
+
+Enable COLO HA for the disk. For details of block replication in
+QEMU, refer to:
+http://wiki.qemu.org/Features/BlockReplication
+
+
+colo-host
+---------
+
+Description:           Secondary host's address
+Mandatory:             Yes when COLO enabled
+
+
+colo-port
+---------
+
+Description:           Secondary host's port.
+                       An NBD server runs on the secondary host
+                       and listens on this port.
+Mandatory:             Yes when COLO enabled
+
+
+colo-export
+-----------
+
+Description:           Disk export name of the NBD server that
+                       runs on the secondary host.
+Mandatory:             Yes when COLO enabled
+
+
+active-disk
+-----------
+
+Description:           Used by the secondary. The secondary guest's
+                       writes are buffered in this disk.
+Mandatory:             Yes when COLO enabled
+
+
+hidden-disk
+-----------
+
+Description:           Used by the secondary. It buffers the original
+                       content that is modified by the primary VM.
+Mandatory:             Yes when COLO enabled
+
+
 ============================================
 DEPRECATED PARAMETERS, PREFIXES AND SYNTAXES
 ============================================
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 349a3c6..63fbe16 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -2306,6 +2306,8 @@ int libxl__device_disk_setdefault(libxl__gc *gc, libxl_device_disk *disk)
     int rc;
 
     libxl_defbool_setdefault(&disk->discard_enable, !!disk->readwrite);
+    libxl_defbool_setdefault(&disk->colo_enable, false);
+    libxl_defbool_setdefault(&disk->colo_restore_enable, false);
 
     rc = libxl__resolve_domid(gc, disk->backend_domname, &disk->backend_domid);
     if (rc < 0) return rc;
@@ -2504,6 +2506,18 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid,
                 flexarray_append(back, "params");
                 flexarray_append(back, GCSPRINTF("%s:%s",
                               libxl__device_disk_string_of_format(disk->format), disk->pdev_path));
+                if (libxl_defbool_val(disk->colo_enable)) {
+                    flexarray_append(back, "colo-host");
+                    flexarray_append(back, libxl__sprintf(gc, "%s", disk->colo_host));
+                    flexarray_append(back, "colo-port");
+                    flexarray_append(back, libxl__sprintf(gc, "%d", disk->colo_port));
+                    flexarray_append(back, "colo-export");
+                    flexarray_append(back, libxl__sprintf(gc, "%s", disk->colo_export));
+                    flexarray_append(back, "active-disk");
+                    flexarray_append(back, libxl__sprintf(gc, "%s", disk->active_disk));
+                    flexarray_append(back, "hidden-disk");
+                    flexarray_append(back, libxl__sprintf(gc, "%s", disk->hidden_disk));
+                }
                 assert(device->backend_kind == LIBXL__DEVICE_KIND_QDISK);
                 break;
             default:
@@ -2619,7 +2633,12 @@ static int libxl__device_disk_from_xs_be(libxl__gc *gc,
         goto cleanup;
     }
 
-    /* "params" may not be present; but everything else must be. */
+    /*
+     * "params" may not be present; but everything else must be.
+     * COLO-related entries (colo-host, colo-port, colo-export,
+     * active-disk and hidden-disk) are present only if colo is
+     * enabled.
+     */
     tmp = xs_read(ctx->xsh, XBT_NULL,
                   GCSPRINTF("%s/params", be_path), &len);
     if (tmp && strchr(tmp, ':')) {
@@ -2629,6 +2648,36 @@ static int libxl__device_disk_from_xs_be(libxl__gc *gc,
         disk->pdev_path = tmp;
     }
 
+    tmp = xs_read(ctx->xsh, XBT_NULL,
+                  GCSPRINTF("%s/colo-host", be_path), &len);
+    if (tmp) {
+        libxl_defbool_set(&disk->colo_enable, true);
+        disk->colo_host = tmp;
+
+        tmp = xs_read(ctx->xsh, XBT_NULL,
+                      GCSPRINTF("%s/colo-port", be_path), &len);
+        if (!tmp) {
+            LOG(ERROR, "Missing xenstore node %s/colo-port", be_path);
+            goto cleanup;
+        }
+        disk->colo_port = atoi(tmp);
+
+#define XS_READ_COLO(param, item) do {                                  \
+        tmp = xs_read(ctx->xsh, XBT_NULL,                               \
+                      GCSPRINTF("%s/"#param"", be_path), &len);         \
+        if (!tmp) {                                                     \
+            LOG(ERROR, "Missing xenstore node %s/"#param"", be_path);   \
+            goto cleanup;                                               \
+        }                                                               \
+        disk->item = tmp;                                               \
+} while (0)
+        XS_READ_COLO(colo-export, colo_export);
+        XS_READ_COLO(active-disk, active_disk);
+        XS_READ_COLO(hidden-disk, hidden_disk);
+#undef XS_READ_COLO
+    } else {
+        libxl_defbool_set(&disk->colo_enable, false);
+    }
 
     tmp = libxl__xs_read(gc, XBT_NULL,
                          GCSPRINTF("%s/type", be_path));
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index be604e5..e2ec25c 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1876,12 +1876,30 @@ static void domain_create_cb(libxl__egc *egc,
 
     libxl__ao_complete(egc, ao, rc);
 }
-    
+
+
+static void set_disk_colo_restore(libxl_domain_config *d_config)
+{
+    int i;
+
+    for (i = 0; i < d_config->num_disks; i++)
+        libxl_defbool_set(&d_config->disks[i].colo_restore_enable, true);
+}
+
+static void unset_disk_colo_restore(libxl_domain_config *d_config)
+{
+    int i;
+
+    for (i = 0; i < d_config->num_disks; i++)
+        libxl_defbool_set(&d_config->disks[i].colo_restore_enable, false);
+}
+
 int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             uint32_t *domid,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
+    unset_disk_colo_restore(d_config);
     return do_domain_create(ctx, d_config, domid, -1, -1, NULL,
                             ao_how, aop_console_how);
 }
@@ -1893,6 +1911,12 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
 {
+    if (params->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
+        set_disk_colo_restore(d_config);
+    } else {
+        unset_disk_colo_restore(d_config);
+    }
+
     return do_domain_create(ctx, d_config, domid, restore_fd, send_back_fd,
                             params, ao_how, aop_console_how);
 }
diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index 4ced9b6..6a411c6 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -196,6 +196,9 @@ static int disk_try_backend(disk_try_backend_args *a,
             goto bad_format;
         }
 
+        if (libxl_defbool_val(a->disk->colo_enable))
+            goto bad_colo;
+
         if (a->disk->backend_domid != LIBXL_TOOLSTACK_DOMID) {
             LOG(DEBUG, "Disk vdev=%s, is using a storage driver domain, "
                        "skipping physical device check", a->disk->vdev);
@@ -218,6 +221,9 @@ static int disk_try_backend(disk_try_backend_args *a,
     case LIBXL_DISK_BACKEND_TAP:
         if (a->disk->script) goto bad_script;
 
+        if (libxl_defbool_val(a->disk->colo_enable))
+            goto bad_colo;
+
         if (a->disk->is_cdrom) {
             LOG(DEBUG, "Disk vdev=%s, backend tap unsuitable for cdroms",
                        a->disk->vdev);
@@ -256,6 +262,11 @@ static int disk_try_backend(disk_try_backend_args *a,
     LOG(DEBUG, "Disk vdev=%s, backend %s not compatible with script=...",
         a->disk->vdev, libxl_disk_backend_to_string(backend));
     return 0;
+
+ bad_colo:
+    LOG(DEBUG, "Disk vdev=%s, backend %s not compatible with colo",
+        a->disk->vdev, libxl_disk_backend_to_string(backend));
+    return 0;
 }
 
 int libxl__device_disk_set_backend(libxl__gc *gc, libxl_device_disk *disk) {
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 1d1b25b..4c3dff8 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -754,6 +754,8 @@ static int libxl__dm_runas_helper(libxl__gc *gc, const char *username)
 /* colo mode */
 enum {
     LIBXL__COLO_NONE = 0,
+    LIBXL__COLO_PRIMARY,
+    LIBXL__COLO_SECONDARY,
 };
 
 static char *qemu_disk_scsi_drive_string(libxl__gc *gc, const char *pdev_path,
@@ -762,6 +764,9 @@ static char *qemu_disk_scsi_drive_string(libxl__gc *gc, const char *pdev_path,
                                          int colo_mode)
 {
     char *drive = NULL;
+    const char *exportname = disk->colo_export;
+    const char *active_disk = disk->active_disk;
+    const char *hidden_disk = disk->hidden_disk;
 
     switch (colo_mode) {
     case LIBXL__COLO_NONE:
@@ -769,6 +774,45 @@ static char *qemu_disk_scsi_drive_string(libxl__gc *gc, const char *pdev_path,
             (gc, "file=%s,if=scsi,bus=0,unit=%d,format=%s,cache=writeback",
              pdev_path, unit, format);
         break;
+    case LIBXL__COLO_PRIMARY:
+        /*
+         * primary:
+         *  -drive if=scsi,bus=0,unit=x,cache=writeback,driver=quorum,\
+         *  id=exportname,\
+         *  children.0.file.filename=pdev_path,\
+         *  children.0.driver=format,\
+         *  read-pattern=fifo,\
+         *  vote-threshold=1
+         */
+        drive = GCSPRINTF(
+            "if=scsi,bus=0,unit=%d,cache=writeback,driver=quorum,"
+            "id=%s,"
+            "children.0.file.filename=%s,"
+            "children.0.driver=%s,"
+            "read-pattern=fifo,"
+            "vote-threshold=1",
+            unit, exportname, pdev_path, format);
+        break;
+    case LIBXL__COLO_SECONDARY:
+        /*
+         * secondary:
+         *  -drive if=scsi,bus=0,unit=x,cache=writeback,driver=replication,\
+         *  mode=secondary,\
+         *  file.driver=qcow2,\
+         *  file.file.filename=active_disk,\
+         *  file.backing.driver=qcow2,\
+         *  file.backing.file.filename=hidden_disk,\
+         *  file.backing.backing=exportname,
+         */
+        drive = GCSPRINTF(
+            "if=scsi,bus=0,unit=%d,cache=writeback,driver=replication,"
+            "mode=secondary,file.driver=qcow2,"
+            "file.file.filename=%s,"
+            "file.backing.driver=qcow2,"
+            "file.backing.file.filename=%s,"
+            "file.backing.backing=%s",
+            unit, active_disk, hidden_disk, exportname);
+        break;
     default:
          abort();
     }
@@ -782,6 +826,9 @@ static char *qemu_disk_ide_drive_string(libxl__gc *gc, const char *pdev_path,
                                         int colo_mode)
 {
     char *drive = NULL;
+    const char *exportname = disk->colo_export;
+    const char *active_disk = disk->active_disk;
+    const char *hidden_disk = disk->hidden_disk;
 
     switch (colo_mode) {
     case LIBXL__COLO_NONE:
@@ -789,6 +836,46 @@ static char *qemu_disk_ide_drive_string(libxl__gc *gc, const char *pdev_path,
             ("file=%s,if=ide,index=%d,media=disk,format=%s,cache=writeback",
              pdev_path, unit, format);
         break;
+    case LIBXL__COLO_PRIMARY:
+        /*
+         * primary:
+         *  -drive if=ide,index=x,media=disk,cache=writeback,driver=quorum,\
+         *  id=exportname,\
+         *  children.0.file.filename=pdev_path,\
+         *  children.0.driver=format,\
+         *  read-pattern=fifo,\
+         *  vote-threshold=1
+         */
+        drive = GCSPRINTF(
+            "if=ide,index=%d,media=disk,cache=writeback,driver=quorum,"
+            "id=%s,"
+            "children.0.file.filename=%s,"
+            "children.0.driver=%s,"
+            "read-pattern=fifo,"
+            "vote-threshold=1",
+             unit, exportname, pdev_path, format);
+        break;
+    case LIBXL__COLO_SECONDARY:
+        /*
+         * secondary:
+         *  -drive if=ide,index=x,media=disk,cache=writeback,driver=replication,\
+         *  mode=secondary,\
+         *  file.driver=qcow2,\
+         *  file.file.filename=active_disk,\
+         *  file.backing.driver=qcow2,\
+         *  file.backing.file.filename=hidden_disk,\
+         *  file.backing.backing=exportname,
+         */
+        drive = GCSPRINTF(
+            "if=ide,index=%d,media=disk,cache=writeback,driver=replication,"
+            "mode=secondary,"
+            "file.driver=qcow2,"
+            "file.file.filename=%s,"
+            "file.backing.driver=qcow2,"
+            "file.backing.file.filename=%s,"
+            "file.backing.backing=%s",
+            unit, active_disk, hidden_disk, exportname);
+        break;
     default:
          abort();
     }
@@ -1261,8 +1348,24 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
                  * hd[a-d] and ignore the rest.
                  */
 
-                colo_mode = LIBXL__COLO_NONE;
+                if (libxl_defbool_val(disks[i].colo_enable)) {
+                    if (libxl_defbool_val(disks[i].colo_restore_enable))
+                        colo_mode = LIBXL__COLO_SECONDARY;
+                    else
+                        colo_mode = LIBXL__COLO_PRIMARY;
+                } else {
+                    colo_mode = LIBXL__COLO_NONE;
+                }
+
                 if (strncmp(disks[i].vdev, "sd", 2) == 0) {
+                    if (colo_mode == LIBXL__COLO_SECONDARY) {
+                        drive = libxl__sprintf
+                            (gc, "if=none,driver=%s,file=%s,id=%s",
+                             format, pdev_path, disks[i].colo_export);
+
+                        flexarray_append(dm_args, "-drive");
+                        flexarray_append(dm_args, drive);
+                    }
                     drive = qemu_disk_scsi_drive_string(gc, pdev_path, disk,
                                                         format,
                                                         &disks[i],
@@ -1289,6 +1392,14 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
                         LOG(ERROR, "qemu-xen doesn't support read-only IDE disk drivers");
                         return ERROR_INVAL;
                     }
+                    if (colo_mode == LIBXL__COLO_SECONDARY) {
+                        drive = libxl__sprintf
+                            (gc, "if=none,driver=%s,file=%s,id=%s",
+                             format, pdev_path, disks[i].colo_export);
+
+                        flexarray_append(dm_args, "-drive");
+                        flexarray_append(dm_args, drive);
+                    }
                     drive = qemu_disk_ide_drive_string(gc, pdev_path, disk,
                                                        format,
                                                        &disks[i],
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 95efd82..0470423 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -576,6 +576,13 @@ libxl_device_disk = Struct("device_disk", [
     ("is_cdrom", integer),
     ("direct_io_safe", bool),
     ("discard_enable", libxl_defbool),
+    ("colo_enable", libxl_defbool),
+    ("colo_restore_enable", libxl_defbool),
+    ("colo_host", string),
+    ("colo_port", integer),
+    ("colo_export", string),
+    ("active_disk", string),
+    ("hidden_disk", string)
     ])
 
 libxl_device_nic = Struct("device_nic", [
diff --git a/tools/libxl/libxlu_disk_l.l b/tools/libxl/libxlu_disk_l.l
index 1a5deb5..cf2eec2 100644
--- a/tools/libxl/libxlu_disk_l.l
+++ b/tools/libxl/libxlu_disk_l.l
@@ -113,6 +113,16 @@ static void setbackendtype(DiskParseContext *dpc, const char *str) {
     else xlu__disk_err(dpc,str,"unknown value for backendtype");
 }
 
+/* Sets ->colo_port from the string.  COLO needs this. */
+static void setcoloport(DiskParseContext *dpc, const char *str) {
+    int port = atoi(str);
+    if (port) {
+        dpc->disk->colo_port = port;
+    } else {
+        xlu__disk_err(dpc,str,"unknown value for colo-port");
+    }
+}
+
 #define DEPRECATE(usewhatinstead) /* not currently reported */
 
 /* Handles a vdev positional parameter which includes a devtype. */
@@ -176,6 +186,13 @@ script=[^,]*,?	{ STRIP(','); SAVESTRING("script", script, FROMEQUALS); }
 direct-io-safe,? { DPC->disk->direct_io_safe = 1; }
 discard,?	{ libxl_defbool_set(&DPC->disk->discard_enable, true); }
 no-discard,?	{ libxl_defbool_set(&DPC->disk->discard_enable, false); }
+colo,?		{ libxl_defbool_set(&DPC->disk->colo_enable, true); }
+no-colo,?	{ libxl_defbool_set(&DPC->disk->colo_enable, false); }
+colo-host=[^,]*,?	{ STRIP(','); SAVESTRING("colo-host", colo_host, FROMEQUALS); }
+colo-port=[^,]*,?	{ STRIP(','); setcoloport(DPC, FROMEQUALS); }
+colo-export=[^,]*,?	{ STRIP(','); SAVESTRING("colo-export", colo_export, FROMEQUALS); }
+active-disk=[^,]*,?	{ STRIP(','); SAVESTRING("active-disk", active_disk, FROMEQUALS); }
+hidden-disk=[^,]*,?	{ STRIP(','); SAVESTRING("hidden-disk", hidden_disk, FROMEQUALS); }
 
  /* the target magic parameter, eats the rest of the string */
 
-- 
1.9.3





* [PATCH v13 21/26] COLO: use qemu block replication
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (19 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 20/26] Support colo mode for qemu disk Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 22/26] COLO proxy: implement setup/teardown/preresume/postresume/checkpoint Changlong Xie
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

Use qemu block replication as our block replication solution.
Note that the guest must be paused before starting COLO; otherwise
the disks won't be consistent between primary and secondary.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
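Reviewer note (not part of the commit): on the primary, the preresume hook in
this patch attaches an NBD-backed replication node through an HMP "drive_add"
command. A minimal sketch of how that command string is assembled follows. It
assumes a caller-supplied buffer instead of the libxl gc allocator, and
`colo_drive_add_cmd()` is a hypothetical name; the format string is copied
from the patch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Formats the HMP command that attaches the NBD-backed replication node.
 * node_name corresponds to "colo_node<N>" in the patch. Returns the
 * snprintf() result; a value >= len means the command was truncated.
 */
int colo_drive_add_cmd(char *buf, size_t len, const char *host, int port,
                       const char *export_name, const char *node_name)
{
    assert(buf && host && export_name && node_name);

    return snprintf(buf, len,
        "drive_add buddy driver=replication,mode=primary,"
        "file.driver=nbd,file.host=%s,file.port=%d,"
        "file.export=%s,node-name=%s,if=none",
        host, port, export_name, node_name);
}
```

With a fixed-size buffer the caller has to compare the return value against
the buffer length; the gc-allocated string used in the patch avoids that check.
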
 tools/libxl/Makefile             |   1 +
 tools/libxl/libxl_colo.h         |  15 +++
 tools/libxl/libxl_colo_qdisk.c   | 230 +++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_colo_restore.c |  42 ++++++-
 tools/libxl/libxl_colo_save.c    |  54 ++++++++-
 tools/libxl/libxl_internal.h     |   5 +
 6 files changed, 342 insertions(+), 5 deletions(-)
 create mode 100644 tools/libxl/libxl_colo_qdisk.c
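
Reviewer note: on the secondary, the setup path starts one NBD server for the
first COLO disk and then requires every later disk to name the same host and
port. A minimal sketch of that bookkeeping follows. `struct nbd_state` and
`nbd_state_check()` are hypothetical names, not the real
libxl__colo_restore_state, and the fixed buffer sizes are assumptions. The
state keeps its own copies of the strings, so the caller's buffers may be
temporaries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-domain state for the single NBD server. */
struct nbd_state {
    bool server_started;
    char host[64];
    char port[16];
};

/*
 * Records the endpoint for the first disk and checks that later disks
 * use the same one. Returns 0 on success, -1 on a mismatch or if the
 * host name does not fit.
 */
int nbd_state_check(struct nbd_state *st, const char *host, int port)
{
    char portstr[16];

    assert(st && host);
    snprintf(portstr, sizeof(portstr), "%d", port);

    if (!st->server_started) {
        if (strlen(host) >= sizeof(st->host))
            return -1;
        strcpy(st->host, host);       /* length checked above */
        strcpy(st->port, portstr);    /* same size as portstr */
        st->server_started = true;
        return 0;
    }
    return (strcmp(st->host, host) || strcmp(st->port, portstr)) ? -1 : 0;
}
```

Calling this once per disk gives the same outcome as the check in
colo_qdisk_setup(): the first disk defines the endpoint and any disk that
disagrees is rejected.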

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index c5ef3f0..701c069 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -66,6 +66,7 @@ endif
 
 LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
 LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o
+LIBXL_OBJS-y += libxl_colo_qdisk.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
index feec7f1..90345f4 100644
--- a/tools/libxl/libxl_colo.h
+++ b/tools/libxl/libxl_colo.h
@@ -19,6 +19,7 @@
 struct libxl__ao;
 struct libxl__egc;
 struct libxl__colo_save_state;
+struct libxl__checkpoint_devices_state;
 
 enum {
     LIBXL_COLO_SETUPED,
@@ -26,6 +27,10 @@ enum {
     LIBXL_COLO_RESUMED,
 };
 
+typedef struct libxl__colo_qdisk {
+    bool setuped;
+} libxl__colo_qdisk;
+
 typedef struct libxl__domain_create_state libxl__domain_create_state;
 typedef void libxl__domain_create_cb(struct libxl__egc *egc,
                                      libxl__domain_create_state *dcs,
@@ -47,8 +52,18 @@ struct libxl__colo_restore_state {
     /* private, colo restore checkpoint state */
     libxl__domain_create_cb *saved_cb;
     void *crcs;
+
+    /* private, used by qdisk block replication */
+    bool qdisk_used;
+    bool qdisk_setuped;
+    const char *host;
+    const char *port;
 };
 
+int init_subkind_qdisk(struct libxl__checkpoint_devices_state *cds);
+
+void cleanup_subkind_qdisk(struct libxl__checkpoint_devices_state *cds);
+
 extern void libxl__colo_restore_setup(struct libxl__egc *egc,
                                       libxl__colo_restore_state *crs);
 extern void libxl__colo_restore_teardown(struct libxl__egc *egc, void *dcs_void,
diff --git a/tools/libxl/libxl_colo_qdisk.c b/tools/libxl/libxl_colo_qdisk.c
new file mode 100644
index 0000000..c23b81b
--- /dev/null
+++ b/tools/libxl/libxl_colo_qdisk.c
@@ -0,0 +1,230 @@
+/*
+ * Copyright (C) 2016 FUJITSU LIMITED
+ * Author: Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+/* ========== init() and cleanup() ========== */
+
+int init_subkind_qdisk(libxl__checkpoint_devices_state *cds)
+{
+    /*
+     * We don't know if we use qemu block replication, so
+     * we cannot start block replication here.
+     */
+    return 0;
+}
+
+void cleanup_subkind_qdisk(libxl__checkpoint_devices_state *cds)
+{
+}
+
+/* ========== setup() and teardown() ========== */
+
+static void colo_qdisk_setup(libxl__egc *egc, libxl__checkpoint_device *dev,
+                             bool primary)
+{
+    const libxl_device_disk *disk = dev->backend_dev;
+    int ret, rc = 0;
+    libxl__colo_qdisk *colo_qdisk = NULL;
+    char port[32];
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds = dev->cds;
+    const char *host = disk->colo_host;
+    const char *export_name = disk->colo_export;
+    const int domid = cds->domid;
+
+    STATE_AO_GC(dev->cds->ao);
+
+    if (disk->backend != LIBXL_DISK_BACKEND_QDISK ||
+        !libxl_defbool_val(disk->colo_enable) ||
+        !host || !export_name || (disk->colo_port <= 0) ||
+        !disk->active_disk || !disk->hidden_disk) {
+        rc = ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH;
+        goto out;
+    }
+
+    dev->matched = true;
+
+    GCNEW(colo_qdisk);
+    dev->concrete_data = colo_qdisk;
+
+    if (primary) {
+        libxl__colo_save_state *css = cds->concrete_data;
+
+        css->qdisk_used = true;
+        /* NBD server is not ready, so we cannot start block replication now */
+        goto out;
+    } else {
+        libxl__colo_restore_state *crs = cds->concrete_data;
+        sprintf(port, "%d", disk->colo_port);
+
+        if (!crs->qdisk_used) {
+            /* start nbd server */
+            ret = libxl__qmp_nbd_server_start(gc, domid, host, port);
+            if (ret) {
+                rc = ERROR_FAIL;
+                goto out;
+            }
+            crs->host = host;
+            crs->port = libxl__strdup(gc, port); /* port[] is on the stack */
+        } else {
+            if (strcmp(crs->host, host) || strcmp(crs->port, port)) {
+                LOG(ERROR, "The host and port of all disks must be the same");
+                rc = ERROR_FAIL;
+                goto out;
+            }
+        }
+
+        crs->qdisk_used = true;
+
+        ret = libxl__qmp_nbd_server_add(gc, domid, export_name);
+        if (ret)
+            rc = ERROR_FAIL;
+
+        colo_qdisk->setuped = true;
+    }
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void colo_qdisk_teardown(libxl__egc *egc, libxl__checkpoint_device *dev,
+                                bool primary)
+{
+    int ret, rc = 0;
+    const libxl__colo_qdisk *colo_qdisk = dev->concrete_data;
+    const libxl_device_disk *disk = dev->backend_dev;
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds = dev->cds;
+    const int domid = cds->domid;
+    const char *export_name = disk->colo_export;
+
+    EGC_GC;
+
+    if (primary) {
+        if (!colo_qdisk->setuped)
+            goto out;
+
+        /*
+         * There is no way to get the child name, but we know it is children.1
+         */
+        ret = libxl__qmp_x_blockdev_change(gc, domid, export_name,
+                                           "children.1", NULL);
+        if (ret)
+            rc = ERROR_FAIL;
+    } else {
+        libxl__colo_restore_state *crs = cds->concrete_data;
+
+        if (crs->qdisk_used) {
+            ret = libxl__qmp_nbd_server_stop(gc, domid);
+            if (ret)
+                rc = ERROR_FAIL;
+        }
+    }
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+/* ========== checkpointing APIs ========== */
+
+static void colo_qdisk_save_preresume(libxl__egc *egc,
+                                      libxl__checkpoint_device *dev)
+{
+    libxl__colo_qdisk *colo_qdisk = dev->concrete_data;
+    const libxl_device_disk *disk = dev->backend_dev;
+    int ret, rc = 0;
+    char *node = NULL;
+    char *cmd = NULL;
+
+    /* Convenience aliases */
+    const int domid = dev->cds->domid;
+    const char *host = disk->colo_host;
+    int port = disk->colo_port;
+    const char *export_name = disk->colo_export;
+
+    EGC_GC;
+
+    if (colo_qdisk->setuped)
+        goto out;
+
+    /* qmp command doesn't support the driver "nbd" */
+    node = GCSPRINTF("colo_node%d",
+                     libxl__device_disk_dev_number(disk->vdev, NULL, NULL));
+    cmd = GCSPRINTF("drive_add buddy driver=replication,mode=primary,"
+                    "file.driver=nbd,file.host=%s,file.port=%d,"
+                    "file.export=%s,node-name=%s,if=none",
+                    host, port, export_name, node);
+    ret = libxl__qmp_hmp(gc, domid, cmd);
+    if (ret)
+        rc = ERROR_FAIL;
+
+    ret = libxl__qmp_x_blockdev_change(gc, domid, export_name, NULL, node);
+    if (ret)
+        rc = ERROR_FAIL;
+
+    colo_qdisk->setuped = true;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+/* ======== primary ======== */
+
+static void colo_qdisk_save_setup(libxl__egc *egc,
+                                  libxl__checkpoint_device *dev)
+{
+    colo_qdisk_setup(egc, dev, true);
+}
+
+static void colo_qdisk_save_teardown(libxl__egc *egc,
+                                   libxl__checkpoint_device *dev)
+{
+    colo_qdisk_teardown(egc, dev, true);
+}
+
+const libxl__checkpoint_device_instance_ops colo_save_device_qdisk = {
+    .kind = LIBXL__DEVICE_KIND_VBD,
+    .setup = colo_qdisk_save_setup,
+    .teardown = colo_qdisk_save_teardown,
+    .preresume = colo_qdisk_save_preresume,
+};
+
+/* ======== secondary ======== */
+
+static void colo_qdisk_restore_setup(libxl__egc *egc,
+                                     libxl__checkpoint_device *dev)
+{
+    colo_qdisk_setup(egc, dev, false);
+}
+
+static void colo_qdisk_restore_teardown(libxl__egc *egc,
+                                      libxl__checkpoint_device *dev)
+{
+    colo_qdisk_teardown(egc, dev, false);
+}
+
+const libxl__checkpoint_device_instance_ops colo_restore_device_qdisk = {
+    .kind = LIBXL__DEVICE_KIND_VBD,
+    .setup = colo_qdisk_restore_setup,
+    .teardown = colo_qdisk_restore_teardown,
+};
diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
index 04b02d8..2ab69ed 100644
--- a/tools/libxl/libxl_colo_restore.c
+++ b/tools/libxl/libxl_colo_restore.c
@@ -37,7 +37,10 @@ struct libxl__colo_restore_checkpoint_state {
                      int);
 };
 
+extern const libxl__checkpoint_device_instance_ops colo_restore_device_qdisk;
+
 static const libxl__checkpoint_device_instance_ops *colo_restore_ops[] = {
+    &colo_restore_device_qdisk,
     NULL,
 };
 
@@ -137,7 +140,11 @@ static int init_device_subkind(libxl__checkpoint_devices_state *cds)
     int rc;
     STATE_AO_GC(cds->ao);
 
+    rc = init_subkind_qdisk(cds);
+    if (rc) goto out;
+
     rc = 0;
+out:
     return rc;
 }
 
@@ -145,6 +152,8 @@ static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
 {
     /* cleanup device subkind-specific state in the libxl ctx */
     STATE_AO_GC(cds->ao);
+
+    cleanup_subkind_qdisk(cds);
 }
 
 /* ================ colo: setup restore environment ================ */
@@ -213,6 +222,8 @@ void libxl__colo_restore_setup(libxl__egc *egc,
     GCNEW(crcs);
     crs->crcs = crcs;
     crcs->crs = crs;
+    crs->qdisk_setuped = false;
+    crs->qdisk_used = false;
 
     /* setup dsps */
     crcs->dsps.ao = ao;
@@ -301,6 +312,11 @@ void libxl__colo_restore_teardown(libxl__egc *egc, void *dcs_void,
     }
     libxl__xc_domain_restore_done(egc, dcs, ret, retval, errnoval);
 
+    if (crs->qdisk_setuped) {
+        libxl__qmp_stop_replication(gc, crs->domid, false);
+        crs->qdisk_setuped = false;
+    }
+
     crcs->saved_rc = rc;
     if (!crcs->teardown_devices) {
         colo_restore_teardown_devices_done(egc, &dcs->cds, 0);
@@ -573,6 +589,13 @@ static void colo_restore_preresume_cb(libxl__egc *egc,
         goto out;
     }
 
+    if (crs->qdisk_setuped) {
+        if (libxl__qmp_do_checkpoint(gc, crs->domid)) {
+            LOG(ERROR, "doing checkpoint fails");
+            goto out;
+        }
+    }
+
     colo_restore_resume_vm(egc, crcs);
 
     return;
@@ -730,8 +753,8 @@ static void colo_setup_checkpoint_devices(libxl__egc *egc,
 
     STATE_AO_GC(crs->ao);
 
-    /* TODO: disk/nic support */
-    cds->device_kind_flags = 0;
+    /* TODO: nic support */
+    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VBD);
     cds->callback = colo_restore_setup_cds_done;
     cds->ao = ao;
     cds->domid = crs->domid;
@@ -768,6 +791,14 @@ static void colo_restore_setup_cds_done(libxl__egc *egc,
         goto out;
     }
 
+    if (crs->qdisk_used && !crs->qdisk_setuped) {
+        if (libxl__qmp_start_replication(gc, crs->domid, false)) {
+            LOG(ERROR, "starting replication fails");
+            goto out;
+        }
+        crs->qdisk_setuped = true;
+    }
+
     colo_send_svm_ready(egc, crcs);
 
     return;
@@ -922,13 +953,18 @@ static void colo_suspend_vm_done(libxl__egc *egc,
 
     crcs->status = LIBXL_COLO_SUSPENDED;
 
+    if (libxl__qmp_get_replication_error(gc, crs->domid)) {
+        LOG(ERROR, "replication error occurs when secondary vm is running");
+        goto out;
+    }
+
     cds->callback = colo_restore_postsuspend_cb;
     libxl__checkpoint_devices_postsuspend(egc, cds);
 
     return;
 
 out:
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dcs->srs.shs, !rc);
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dcs->srs.shs, 0);
 }
 
 static void colo_restore_postsuspend_cb(libxl__egc *egc,
diff --git a/tools/libxl/libxl_colo_save.c b/tools/libxl/libxl_colo_save.c
index cca6bde..d73e632 100644
--- a/tools/libxl/libxl_colo_save.c
+++ b/tools/libxl/libxl_colo_save.c
@@ -18,7 +18,10 @@
 
 #include "libxl_internal.h"
 
+extern const libxl__checkpoint_device_instance_ops colo_save_device_qdisk;
+
 static const libxl__checkpoint_device_instance_ops *colo_ops[] = {
+    &colo_save_device_qdisk,
     NULL,
 };
 
@@ -30,7 +33,11 @@ static int init_device_subkind(libxl__checkpoint_devices_state *cds)
     int rc;
     STATE_AO_GC(cds->ao);
 
+    rc = init_subkind_qdisk(cds);
+    if (rc) goto out;
+
     rc = 0;
+out:
     return rc;
 }
 
@@ -38,6 +45,8 @@ static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
 {
     /* cleanup device subkind-specific state in the libxl ctx */
     STATE_AO_GC(cds->ao);
+
+    cleanup_subkind_qdisk(cds);
 }
 
 /* ================= colo: setup save environment ================= */
@@ -79,9 +88,12 @@ void libxl__colo_save_setup(libxl__egc *egc, libxl__colo_save_state *css)
     css->send_fd = dss->fd;
     css->recv_fd = dss->recv_fd;
     css->svm_running = false;
+    css->paused = true;
+    css->qdisk_setuped = false;
+    css->qdisk_used = false;
 
-    /* TODO: disk/nic support */
-    cds->device_kind_flags = 0;
+    /* TODO: nic support */
+    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VBD);
     cds->ops = colo_ops;
     cds->callback = colo_save_setup_done;
     cds->ao = ao;
@@ -163,6 +175,11 @@ void libxl__colo_save_teardown(libxl__egc *egc,
 
     libxl__stream_read_abort(egc, &css->srs, 1);
 
+    if (css->qdisk_setuped) {
+        libxl__qmp_stop_replication(gc, dss->domid, true);
+        css->qdisk_setuped = false;
+    }
+
     dss->cds.callback = colo_teardown_done;
     libxl__checkpoint_devices_teardown(egc, &dss->cds);
     return;
@@ -291,6 +308,11 @@ static void colo_read_svm_suspended_done(libxl__egc *egc,
         goto out;
     }
 
+    if (!css->paused && libxl__qmp_get_replication_error(gc, dss->domid)) {
+        LOG(ERROR, "replication error occurs when primary vm is running");
+        goto out;
+    }
+
     ok = 1;
 
 out:
@@ -389,12 +411,40 @@ static void colo_preresume_cb(libxl__egc *egc,
         goto out;
     }
 
+    if (css->qdisk_used && !css->qdisk_setuped) {
+        if (libxl__qmp_start_replication(gc, dss->domid, true)) {
+            LOG(ERROR, "starting replication fails");
+            goto out;
+        }
+        css->qdisk_setuped = true;
+    }
+
+    if (!css->paused) {
+        if (libxl__qmp_do_checkpoint(gc, dss->domid)) {
+            LOG(ERROR, "doing checkpoint fails");
+            goto out;
+        }
+    }
+
     /* Resumes the domain and the device model */
     if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1)) {
         LOG(ERROR, "cannot resume primary vm");
         goto out;
     }
 
+    /*
+     * The guest should be paused before doing colo because there is
+     * no disk migration.
+     */
+    if (css->paused) {
+        rc = libxl_domain_unpause(CTX, dss->domid);
+        if (rc) {
+            LOG(ERROR, "cannot unpause primary vm");
+            goto out;
+        }
+        css->paused = false;
+    }
+
     /* read CHECKPOINT_SVM_RESUMED */
     css->callback = colo_read_svm_resumed_done;
     css->srs.checkpoint_callback = colo_common_read_stream_done;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 148df05..c3366d7 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3211,6 +3211,11 @@ struct libxl__colo_save_state {
     libxl__stream_read_state srs;
     void (*callback)(libxl__egc *, libxl__colo_save_state *, int);
     bool svm_running;
+    bool paused;
+
+    /* private, used by qdisk block replication */
+    bool qdisk_used;
+    bool qdisk_setuped;
 };
 
 typedef struct libxl__logdirty_switch {
-- 
1.9.3





* [PATCH v13 22/26] COLO proxy: implement setup/teardown/preresume/postresume/checkpoint
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (20 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 21/26] COLO: use qemu block replication Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 23/26] COLO nic: implement COLO nic subkind Changlong Xie
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

Implement setup/teardown/preresume/postresume/checkpoint for the COLO proxy
module. Netlink is used to communicate with the proxy module.
About colo-proxy module:
http://www.spinics.net/lists/netdev/msg333520.html
https://github.com/wencongyang/colo-proxy
How to use:
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
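
For reference, a minimal sketch (not part of this patch) of the netlink
framing described above: a request is a bare nlmsghdr with no payload, and
only COLO_PROXY_INIT asks the kernel for an ack, which arrives as an
NLMSG_ERROR message whose error field is 0 on success. The enum values are
copied from libxl_colo.h in this patch; the helper names colo_build_request
and colo_check_ack are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Values copied from this patch's libxl_colo.h; the kernel side must agree. */
enum colo_netlink_op {
    COLO_QUERY_CHECKPOINT = (NLMSG_MIN_TYPE + 1),
    COLO_CHECKPOINT,
    COLO_FAILOVER,
    COLO_PROXY_INIT,
    COLO_PROXY_RESET,
};

/* Hypothetical helper: fill a payload-less request header. */
static void colo_build_request(struct nlmsghdr *h, int type,
                               unsigned int index)
{
    memset(h, 0, sizeof(*h));
    h->nlmsg_len = NLMSG_LENGTH(0);   /* header only, no payload */
    h->nlmsg_type = type;
    h->nlmsg_flags = NLM_F_REQUEST;
    if (type == COLO_PROXY_INIT)
        h->nlmsg_flags |= NLM_F_ACK;  /* only INIT waits for an ack */
    h->nlmsg_pid = index;             /* the bound netlink address */
}

/*
 * Hypothetical helper: interpret a reply of `size` bytes.
 * Returns 0 for a clean ack or any non-error message, and a negative
 * value otherwise (-1 for a truncated reply, else the kernel's error).
 */
static int colo_check_ack(const void *buf, size_t size)
{
    const struct nlmsghdr *h = buf;
    const struct nlmsgerr *err;

    if (size < sizeof(*h))
        return -1;
    if (h->nlmsg_type != NLMSG_ERROR)
        return 0;
    if (size < NLMSG_LENGTH(sizeof(*err)))
        return -1;

    err = (const struct nlmsgerr *)NLMSG_DATA(h);
    return err->error;
}
```

The sketch uses NLMSG_LENGTH(0) where the patch uses NLMSG_SPACE(0); the two
are equal for an empty payload because the header size is already aligned.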

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxl/Makefile           |   1 +
 tools/libxl/libxl_colo.h       |  32 +++++
 tools/libxl/libxl_colo_proxy.c | 277 +++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h   |   3 +
 4 files changed, 313 insertions(+)
 create mode 100644 tools/libxl/libxl_colo_proxy.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 701c069..72f3b1a 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -67,6 +67,7 @@ endif
 LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
 LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o
 LIBXL_OBJS-y += libxl_colo_qdisk.o
+LIBXL_OBJS-y += libxl_colo_proxy.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
index 90345f4..a529ce8 100644
--- a/tools/libxl/libxl_colo.h
+++ b/tools/libxl/libxl_colo.h
@@ -16,21 +16,43 @@
 #ifndef LIBXL_COLO_H
 #define LIBXL_COLO_H
 
+#include <linux/netlink.h>
+
 struct libxl__ao;
 struct libxl__egc;
 struct libxl__colo_save_state;
 struct libxl__checkpoint_devices_state;
 
+/* Must match the COLO netlink channel number used on the kernel side */
+#define NETLINK_COLO 28
+
 enum {
     LIBXL_COLO_SETUPED,
     LIBXL_COLO_SUSPENDED,
     LIBXL_COLO_RESUMED,
 };
 
+enum colo_netlink_op {
+    COLO_QUERY_CHECKPOINT = (NLMSG_MIN_TYPE + 1),
+    COLO_CHECKPOINT,
+    COLO_FAILOVER,
+    COLO_PROXY_INIT,
+    COLO_PROXY_RESET, /* UNUSED, will be used for continuous FT */
+};
+
 typedef struct libxl__colo_qdisk {
     bool setuped;
 } libxl__colo_qdisk;
 
+typedef struct libxl__colo_proxy_state libxl__colo_proxy_state;
+struct libxl__colo_proxy_state {
+    /* set by caller of colo_proxy_setup */
+    struct libxl__ao *ao;
+
+    int sock_fd;
+    int index;
+};
+
 typedef struct libxl__domain_create_state libxl__domain_create_state;
 typedef void libxl__domain_create_cb(struct libxl__egc *egc,
                                      libxl__domain_create_state *dcs,
@@ -58,6 +80,9 @@ struct libxl__colo_restore_state {
     bool qdisk_setuped;
     const char *host;
     const char *port;
+
+    /* private, used by colo-proxy */
+    libxl__colo_proxy_state cps;
 };
 
 int init_subkind_qdisk(struct libxl__checkpoint_devices_state *cds);
@@ -73,4 +98,11 @@ extern void libxl__colo_save_setup(struct libxl__egc *egc,
 extern void libxl__colo_save_teardown(struct libxl__egc *egc,
                                       struct libxl__colo_save_state *css,
                                       int rc);
+extern int colo_proxy_setup(libxl__colo_proxy_state *cps);
+extern void colo_proxy_teardown(libxl__colo_proxy_state *cps);
+extern void colo_proxy_preresume(libxl__colo_proxy_state *cps);
+extern void colo_proxy_postresume(libxl__colo_proxy_state *cps);
+extern int colo_proxy_checkpoint(libxl__colo_proxy_state *cps,
+                                 unsigned int timeout_us);
+
 #endif
diff --git a/tools/libxl/libxl_colo_proxy.c b/tools/libxl/libxl_colo_proxy.c
new file mode 100644
index 0000000..991bd0d
--- /dev/null
+++ b/tools/libxl/libxl_colo_proxy.c
@@ -0,0 +1,277 @@
+/*
+ * Copyright (C) 2016 FUJITSU LIMITED
+ * Author: Yang Hongyang <hongyang.yang@easystack.cn>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+/* ========= colo-proxy: helper functions ========== */
+
+static int colo_proxy_send(libxl__colo_proxy_state *cps, uint8_t *buff,
+                           uint64_t size, int type)
+{
+    struct sockaddr_nl sa;
+    struct nlmsghdr msg;
+    struct iovec iov;
+    struct msghdr mh;
+    int ret;
+
+    STATE_AO_GC(cps->ao);
+
+    memset(&sa, 0, sizeof(sa));
+    sa.nl_family = AF_NETLINK;
+    sa.nl_pid = 0;
+    sa.nl_groups = 0;
+
+    msg.nlmsg_len = NLMSG_SPACE(0);
+    msg.nlmsg_flags = NLM_F_REQUEST;
+    if (type == COLO_PROXY_INIT)
+        msg.nlmsg_flags |= NLM_F_ACK;
+    msg.nlmsg_seq = 0;
+    msg.nlmsg_pid = cps->index;
+    msg.nlmsg_type = type;
+
+    iov.iov_base = &msg;
+    iov.iov_len = msg.nlmsg_len;
+
+    mh.msg_name = &sa;
+    mh.msg_namelen = sizeof(sa);
+    mh.msg_iov = &iov;
+    mh.msg_iovlen = 1;
+    mh.msg_control = NULL;
+    mh.msg_controllen = 0;
+    mh.msg_flags = 0;
+
+    ret = sendmsg(cps->sock_fd, &mh, 0);
+    if (ret <= 0) {
+        LOG(ERROR, "can't send msg to kernel by netlink: %s",
+            strerror(errno));
+    }
+
+    return ret;
+}
+
+/* error or timeout: return <= 0 and set *buff to NULL; else bytes received */
+static int64_t colo_proxy_recv(libxl__colo_proxy_state *cps, uint8_t **buff,
+                               unsigned int timeout_us)
+{
+    struct sockaddr_nl sa;
+    struct iovec iov;
+    struct msghdr mh = {
+        .msg_name = &sa,
+        .msg_namelen = sizeof(sa),
+        .msg_iov = &iov,
+        .msg_iovlen = 1,
+    };
+    struct timeval tv;
+    uint32_t size = 16384;
+    int64_t len = 0;
+    int ret;
+
+    STATE_AO_GC(cps->ao);
+    uint8_t *tmp = libxl__malloc(NOGC, size);
+
+    if (timeout_us) {
+        tv.tv_sec = timeout_us / 1000000;
+        tv.tv_usec = timeout_us % 1000000;
+        setsockopt(cps->sock_fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
+    }
+
+    iov.iov_base = tmp;
+    iov.iov_len = size;
+next:
+    ret = recvmsg(cps->sock_fd, &mh, 0);
+    if (ret <= 0) {
+        if (errno != EAGAIN && errno != EWOULDBLOCK)
+            LOGE(ERROR, "can't recv msg from kernel by netlink");
+        goto err;
+    }
+
+    len += ret;
+    if (mh.msg_flags & MSG_TRUNC) {
+        size += 16384;
+        tmp = libxl__realloc(NOGC, tmp, size);
+        iov.iov_base = tmp + len;
+        iov.iov_len = size - len;
+        goto next;
+    }
+
+    *buff = tmp;
+    ret = len;
+    goto out;
+
+err:
+    free(tmp);
+    *buff = NULL;
+
+out:
+    if (timeout_us) {
+        tv.tv_sec = 0;
+        tv.tv_usec = 0;
+        setsockopt(cps->sock_fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
+    }
+    return ret;
+}
+
+/* ========= colo-proxy: setup and teardown ========== */
+
+int colo_proxy_setup(libxl__colo_proxy_state *cps)
+{
+    int skfd = 0;
+    struct sockaddr_nl sa;
+    struct nlmsghdr *h;
+    int i = 1;
+    int ret = ERROR_FAIL;
+    uint8_t *buff = NULL;
+    int64_t size;
+
+    STATE_AO_GC(cps->ao);
+
+    skfd = socket(PF_NETLINK, SOCK_RAW, NETLINK_COLO);
+    if (skfd < 0) {
+        LOG(ERROR, "can not create a netlink socket: %s", strerror(errno));
+        goto out;
+    }
+    cps->sock_fd = skfd;
+    memset(&sa, 0, sizeof(sa));
+    sa.nl_family = AF_NETLINK;
+    sa.nl_groups = 0;
+retry:
+    sa.nl_pid = i++;
+
+    if (i > 10) {
+        LOG(ERROR, "netlink bind error");
+        goto out;
+    }
+
+    ret = bind(skfd, (struct sockaddr *)&sa, sizeof(sa));
+    if (ret < 0 && errno == EADDRINUSE) {
+        LOG(ERROR, "colo index %d is already in use", sa.nl_pid);
+        goto retry;
+    } else if (ret < 0) {
+        LOG(ERROR, "netlink bind error");
+        goto out;
+    }
+
+    cps->index = sa.nl_pid;
+    ret = colo_proxy_send(cps, NULL, 0, COLO_PROXY_INIT);
+    if (ret < 0)
+        goto out;
+
+    /* receive ack */
+    size = colo_proxy_recv(cps, &buff, 500000);
+    if (size < 0) {
+        LOG(ERROR, "Can't recv msg from kernel by netlink: %s",
+            strerror(errno));
+        goto out;
+    }
+
+    if (size) {
+        h = (struct nlmsghdr *)buff;
+        if (h->nlmsg_type == NLMSG_ERROR) {
+            /* ack's type is NLMSG_ERROR */
+            struct nlmsgerr *err = (struct nlmsgerr *)NLMSG_DATA(h);
+
+            if (size - sizeof(*h) < sizeof(*err)) {
+                LOG(ERROR, "NLMSG_LENGTH is too short");
+                goto out;
+            }
+
+            if (err->error) {
+                LOG(ERROR, "NLMSG_ERROR contains error %d", err->error);
+                goto out;
+            }
+        }
+    }
+
+    ret = 0;
+
+out:
+    free(buff);
+    if (ret) {
+        close(cps->sock_fd);
+        cps->sock_fd = -1;
+    }
+    return ret;
+}
+
+void colo_proxy_teardown(libxl__colo_proxy_state *cps)
+{
+    if (cps->sock_fd >= 0) {
+        close(cps->sock_fd);
+        cps->sock_fd = -1;
+    }
+}
+
+/* ========= colo-proxy: preresume, postresume and checkpoint ========== */
+
+void colo_proxy_preresume(libxl__colo_proxy_state *cps)
+{
+    colo_proxy_send(cps, NULL, 0, COLO_CHECKPOINT);
+    /* TODO: need to handle if the call fails... */
+}
+
+void colo_proxy_postresume(libxl__colo_proxy_state *cps)
+{
+    /* nothing to do... */
+}
+
+typedef struct colo_msg {
+    bool is_checkpoint;
+} colo_msg;
+
+/*
+ * Return value:
+ * -1: error
+ *  0: no checkpoint event is received before timeout
+ *  1: do checkpoint
+ */
+int colo_proxy_checkpoint(libxl__colo_proxy_state *cps,
+                          unsigned int timeout_us)
+{
+    uint8_t *buff;
+    int64_t size;
+    struct nlmsghdr *h;
+    struct colo_msg *m;
+    int ret = -1;
+
+    STATE_AO_GC(cps->ao);
+
+    size = colo_proxy_recv(cps, &buff, timeout_us);
+
+    /* timeout, return no checkpoint message. */
+    if (size <= 0)
+        return 0;
+
+    h = (struct nlmsghdr *) buff;
+
+    if (h->nlmsg_type == NLMSG_ERROR) {
+        LOG(ERROR, "receive NLMSG_ERROR");
+        goto out;
+    }
+
+    if (h->nlmsg_len < NLMSG_LENGTH(sizeof(*m))) {
+        LOG(ERROR, "NLMSG_LENGTH is too short");
+        goto out;
+    }
+
+    m = NLMSG_DATA(h);
+
+    ret = m->is_checkpoint ? 1 : 0;
+
+out:
+    free(buff);
+    return ret;
+}
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index c3366d7..8f02222 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3216,6 +3216,9 @@ struct libxl__colo_save_state {
     /* private, used by qdisk block replication */
     bool qdisk_used;
     bool qdisk_setuped;
+
+    /* private, used by colo-proxy */
+    libxl__colo_proxy_state cps;
 };
 
 typedef struct libxl__logdirty_switch {
-- 
1.9.3





* [PATCH v13 23/26] COLO nic: implement COLO nic subkind
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (21 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 22/26] COLO proxy: implement setup/teardown/preresume/postresume/checkpoint Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-25 12:56   ` Wei Liu
  2016-03-28  3:46   ` [PATCH v13.1 " Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 24/26] setup and control colo proxy on primary side Changlong Xie
                   ` (5 subsequent siblings)
  28 siblings, 2 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

Implement the COLO nic subkind.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
 tools/hotplug/Linux/Makefile         |   1 +
 tools/hotplug/Linux/colo-proxy-setup | 135 +++++++++++++++
 tools/libxl/Makefile                 |   1 +
 tools/libxl/libxl_colo.h             |  10 ++
 tools/libxl/libxl_colo_nic.c         | 320 +++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h         |   2 +
 tools/libxl/libxl_types.idl          |   1 +
 7 files changed, 470 insertions(+)
 create mode 100755 tools/hotplug/Linux/colo-proxy-setup
 create mode 100644 tools/libxl/libxl_colo_nic.c

diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index 6e10118..9bb852b 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -28,6 +28,7 @@ XEN_SCRIPTS += block-iscsi
 XEN_SCRIPTS += block-tap
 XEN_SCRIPTS += block-drbd-probe
 XEN_SCRIPTS += $(XEN_SCRIPTS-y)
+XEN_SCRIPTS += colo-proxy-setup
 
 SUBDIRS-$(CONFIG_SYSTEMD) += systemd
 
diff --git a/tools/hotplug/Linux/colo-proxy-setup b/tools/hotplug/Linux/colo-proxy-setup
new file mode 100755
index 0000000..94e2034
--- /dev/null
+++ b/tools/hotplug/Linux/colo-proxy-setup
@@ -0,0 +1,135 @@
+#! /bin/bash
+
+dir=$(dirname "$0")
+. "$dir/xen-hotplug-common.sh"
+. "$dir/hotplugpath.sh"
+
+findCommand "$@"
+
+if [ "$command" != "setup" -a  "$command" != "teardown" ]
+then
+    echo "Invalid command: $command"
+    log err "Invalid command: $command"
+    exit 1
+fi
+
+evalVariables "$@"
+
+: ${vifname:?}
+: ${forwarddev:?}
+: ${mode:?}
+: ${index:?}
+: ${bridge:?}
+
+forwardbr="colobr0"
+
+if [ "$mode" != "primary" -a "$mode" != "secondary" ]
+then
+    echo "Invalid mode: $mode"
+    log err "Invalid mode: $mode"
+    exit 1
+fi
+
+if [ $index -lt 0 ] || [ $index -gt 100 ]; then
+    echo "index overflow"
+    exit 1
+fi
+
+function setup_primary()
+{
+    do_without_error tc qdisc add dev $vifname root handle 1: prio
+    do_without_error tc filter add dev $vifname parent 1: protocol ip prio 10 \
+        u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter add dev $vifname parent 1: protocol arp prio 11 \
+        u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter add dev $vifname parent 1: protocol ipv6 prio \
+        12 u32 match u32 0 0 flowid 1:2 action mirred egress mirror \
+        dev $forwarddev
+
+    do_without_error modprobe nf_conntrack_ipv4
+    do_without_error modprobe xt_PMYCOLO sec_dev=$forwarddev
+
+    iptables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    ip6tables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    do_without_error arptables -I INPUT -i $forwarddev -j MARK --set-mark $index
+}
+
+function teardown_primary()
+{
+    do_without_error tc filter del dev $vifname parent 1: protocol ip prio 10 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter del dev $vifname parent 1: protocol arp prio 11 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter del dev $vifname parent 1: protocol ipv6 prio 12 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc qdisc del dev $vifname root handle 1: prio
+
+    do_without_error iptables -t mangle -D PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    do_without_error ip6tables -t mangle -D PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    do_without_error arptables -F
+    do_without_error rmmod xt_PMYCOLO
+}
+
+function setup_secondary()
+{
+    do_without_error brctl delif $bridge $vifname
+    do_without_error brctl addbr $forwardbr
+    do_without_error brctl addif $forwardbr $vifname
+    do_without_error brctl addif $forwardbr $forwarddev
+    do_without_error ip link set dev $forwardbr up
+    do_without_error modprobe xt_SECCOLO
+
+    iptables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+    ip6tables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+}
+
+function teardown_secondary()
+{
+    do_without_error brctl delif $forwardbr $forwarddev
+    do_without_error brctl delif $forwardbr $vifname
+    do_without_error brctl delbr $forwardbr
+    do_without_error brctl addif $bridge $vifname
+
+    do_without_error iptables -t mangle -D PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+    do_without_error ip6tables -t mangle -D PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+    do_without_error rmmod xt_SECCOLO
+}
+
+case "$command" in
+    setup)
+        if [ "$mode" = "primary" ]
+        then
+            setup_primary
+        else
+            setup_secondary
+        fi
+
+        success
+        ;;
+    teardown)
+        if [ "$mode" = "primary" ]
+        then
+            teardown_primary
+        else
+            teardown_secondary
+        fi
+        ;;
+esac
+
+if [ "$mode" = "primary" ]
+then
+    log debug "Successful colo-proxy-setup $command for $vifname." \
+              " vifname: $vifname, index: $index, forwarddev: $forwarddev."
+else
+    log debug "Successful colo-proxy-setup $command for $vifname." \
+              " vifname: $vifname, index: $index, forwarddev: $forwarddev,"\
+              " forwardbr: $forwardbr."
+fi
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 72f3b1a..a433aaa 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -68,6 +68,7 @@ LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
 LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o
 LIBXL_OBJS-y += libxl_colo_qdisk.o
 LIBXL_OBJS-y += libxl_colo_proxy.o
+LIBXL_OBJS-y += libxl_colo_nic.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
index a529ce8..5fbb659 100644
--- a/tools/libxl/libxl_colo.h
+++ b/tools/libxl/libxl_colo.h
@@ -40,6 +40,11 @@ enum colo_netlink_op {
     COLO_PROXY_RESET, /* UNUSED, will be used for continuous FT */
 };
 
+typedef struct libxl__colo_device_nic {
+    int devid;
+    const char *vif;
+} libxl__colo_device_nic;
+
 typedef struct libxl__colo_qdisk {
     bool setuped;
 } libxl__colo_qdisk;
@@ -70,6 +75,7 @@ struct libxl__colo_restore_state {
     int recv_fd;
     int hvm;
     libxl__colo_callback *callback;
+    char *colo_proxy_script;
 
     /* private, colo restore checkpoint state */
     libxl__domain_create_cb *saved_cb;
@@ -89,6 +95,10 @@ int init_subkind_qdisk(struct libxl__checkpoint_devices_state *cds);
 
 void cleanup_subkind_qdisk(struct libxl__checkpoint_devices_state *cds);
 
+int init_subkind_colo_nic(struct libxl__checkpoint_devices_state *cds);
+
+void cleanup_subkind_colo_nic(struct libxl__checkpoint_devices_state *cds);
+
 extern void libxl__colo_restore_setup(struct libxl__egc *egc,
                                       libxl__colo_restore_state *crs);
 extern void libxl__colo_restore_teardown(struct libxl__egc *egc, void *dcs_void,
diff --git a/tools/libxl/libxl_colo_nic.c b/tools/libxl/libxl_colo_nic.c
new file mode 100644
index 0000000..2e00c28
--- /dev/null
+++ b/tools/libxl/libxl_colo_nic.c
@@ -0,0 +1,320 @@
+/*
+ * Copyright (C) 2016 FUJITSU LIMITED
+ * Author: Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+enum {
+    primary,
+    secondary,
+};
+
+/* ========== init() and cleanup() ========== */
+
+int init_subkind_colo_nic(libxl__checkpoint_devices_state *cds)
+{
+    return 0;
+}
+
+void cleanup_subkind_colo_nic(libxl__checkpoint_devices_state *cds)
+{
+}
+
+/* ========== helper functions ========== */
+
+static void colo_save_setup_script_cb(libxl__egc *egc,
+                                     libxl__async_exec_state *aes,
+                                     int rc, int status);
+static void colo_save_teardown_script_cb(libxl__egc *egc,
+                                         libxl__async_exec_state *aes,
+                                         int rc, int status);
+
+/*
+ * If the device has a vifname, then use that instead of
+ * the vifX.Y format.
+ * It must ONLY be used for remus because if driver domains
+ * were in use it would constitute a security vulnerability.
+ */
+static const char *get_vifname(libxl__checkpoint_device *dev,
+                               const libxl_device_nic *nic)
+{
+    const char *vifname = NULL;
+    const char *path;
+    int rc;
+
+    STATE_AO_GC(dev->cds->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = dev->cds->domid;
+
+    path = GCSPRINTF("%s/backend/vif/%d/%d/vifname",
+                     libxl__xs_get_dompath(gc, 0), domid, nic->devid);
+    rc = libxl__xs_read_checked(gc, XBT_NULL, path, &vifname);
+    if (!rc && !vifname) {
+        vifname = libxl__device_nic_devname(gc, domid,
+                                            nic->devid,
+                                            nic->nictype);
+    }
+
+    return vifname;
+}
+
+/*
+ * the script needs the following env & args
+ * $vifname
+ * $forwarddev
+ * $mode(primary/secondary)
+ * $index
+ * $bridge
+ * setup/teardown as command line arg.
+ */
+static void setup_async_exec(libxl__checkpoint_device *dev, char *op,
+                             libxl__colo_proxy_state *cps, int side,
+                             char *colo_proxy_script)
+{
+    int arraysize, nr = 0;
+    char **env = NULL, **args = NULL;
+    libxl__colo_device_nic *colo_nic = dev->concrete_data;
+    libxl__checkpoint_devices_state *cds = dev->cds;
+    libxl__async_exec_state *aes = &dev->aodev.aes;
+    const libxl_device_nic *nic = dev->backend_dev;
+
+    STATE_AO_GC(cds->ao);
+
+    /* Convenience aliases */
+    const char *const vif = colo_nic->vif;
+
+    arraysize = 11;
+    GCNEW_ARRAY(env, arraysize);
+    env[nr++] = "vifname";
+    env[nr++] = libxl__strdup(gc, vif);
+    env[nr++] = "forwarddev";
+    env[nr++] = libxl__strdup(gc, nic->coloft_forwarddev);
+    env[nr++] = "mode";
+    if (side == primary)
+        env[nr++] = "primary";
+    else
+        env[nr++] = "secondary";
+    env[nr++] = "index";
+    env[nr++] = GCSPRINTF("%d", cps->index);
+    env[nr++] = "bridge";
+    env[nr++] = libxl__strdup(gc, nic->bridge);
+    env[nr++] = NULL;
+    assert(nr == arraysize);
+
+    arraysize = 3; nr = 0;
+    GCNEW_ARRAY(args, arraysize);
+    args[nr++] = colo_proxy_script;
+    args[nr++] = op;
+    args[nr++] = NULL;
+    assert(nr == arraysize);
+
+    aes->ao = dev->cds->ao;
+    aes->what = GCSPRINTF("%s %s", args[0], args[1]);
+    aes->env = env;
+    aes->args = args;
+    aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
+    aes->stdfds[0] = -1;
+    aes->stdfds[1] = -1;
+    aes->stdfds[2] = -1;
+
+    if (!strcmp(op, "teardown"))
+        aes->callback = colo_save_teardown_script_cb;
+    else
+        aes->callback = colo_save_setup_script_cb;
+}
+
+/* ========== setup() and teardown() ========== */
+
+static void colo_nic_setup(libxl__egc *egc, libxl__checkpoint_device *dev,
+                           libxl__colo_proxy_state *cps, int side,
+                           char *colo_proxy_script)
+{
+    int rc;
+    libxl__colo_device_nic *colo_nic;
+    const libxl_device_nic *nic = dev->backend_dev;
+
+    STATE_AO_GC(dev->cds->ao);
+
+    /*
+     * there's no subkind of nic devices, so nic ops always match
+     * nic devices; we begin to set up the nic device
+     */
+    dev->matched = 1;
+
+    if (!nic->coloft_forwarddev) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    GCNEW(colo_nic);
+    dev->concrete_data = colo_nic;
+    colo_nic->devid = nic->devid;
+    colo_nic->vif = get_vifname(dev, nic);
+    if (!colo_nic->vif) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    setup_async_exec(dev, "setup", cps, side, colo_proxy_script);
+    rc = libxl__async_exec_start(&dev->aodev.aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void colo_save_setup_script_cb(libxl__egc *egc,
+                                      libxl__async_exec_state *aes,
+                                      int rc, int status)
+{
+    libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+    libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__colo_device_nic *colo_nic = dev->concrete_data;
+    libxl__checkpoint_devices_state *cds = dev->cds;
+    const char *out_path_base, *hotplug_error = NULL;
+
+    EGC_GC;
+
+    /* Convenience aliases */
+    const uint32_t domid = cds->domid;
+    const int devid = colo_nic->devid;
+    const char *const vif = colo_nic->vif;
+
+    if (status && !rc)
+        rc = ERROR_FAIL;
+    if (rc)
+        goto out;
+
+    out_path_base = GCSPRINTF("%s/colo_proxy/%d",
+                              libxl__xs_libxl_path(gc, domid), devid);
+
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/hotplug-error", out_path_base),
+                                &hotplug_error);
+    if (rc)
+        goto out;
+
+    if (hotplug_error) {
+        LOG(ERROR, "colo_proxy script %s setup failed for vif %s: %s",
+            aes->args[0], vif, hotplug_error);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (status) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = 0;
+
+out:
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+static void colo_nic_teardown(libxl__egc *egc, libxl__checkpoint_device *dev,
+                              libxl__colo_proxy_state *cps, int side,
+                              char *colo_proxy_script)
+{
+    int rc;
+    libxl__colo_device_nic *colo_nic = dev->concrete_data;
+
+    if (!colo_nic || !colo_nic->vif) {
+        /* colo nic has not yet been set up, just return */
+        rc = 0;
+        goto out;
+    }
+
+    setup_async_exec(dev, "teardown", cps, side, colo_proxy_script);
+
+    rc = libxl__async_exec_start(&dev->aodev.aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void colo_save_teardown_script_cb(libxl__egc *egc,
+                                         libxl__async_exec_state *aes,
+                                         int rc, int status)
+{
+    libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+
+    if (status && !rc)
+        rc = ERROR_FAIL;
+    else
+        rc = 0;
+
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+/* ======== primary ======== */
+
+static void colo_nic_save_setup(libxl__egc *egc, libxl__checkpoint_device *dev)
+{
+    libxl__colo_save_state *css = dev->cds->concrete_data;
+
+    colo_nic_setup(egc, dev, &css->cps, primary, css->colo_proxy_script);
+}
+
+static void colo_nic_save_teardown(libxl__egc *egc,
+                                   libxl__checkpoint_device *dev)
+{
+    libxl__colo_save_state *css = dev->cds->concrete_data;
+
+    colo_nic_teardown(egc, dev, &css->cps, primary, css->colo_proxy_script);
+}
+
+const libxl__checkpoint_device_instance_ops colo_save_device_nic = {
+    .kind = LIBXL__DEVICE_KIND_VIF,
+    .setup = colo_nic_save_setup,
+    .teardown = colo_nic_save_teardown,
+};
+
+/* ======== secondary ======== */
+
+static void colo_nic_restore_setup(libxl__egc *egc,
+                                   libxl__checkpoint_device *dev)
+{
+    libxl__colo_restore_state *crs = dev->cds->concrete_data;
+
+    colo_nic_setup(egc, dev, &crs->cps, secondary, crs->colo_proxy_script);
+}
+
+static void colo_nic_restore_teardown(libxl__egc *egc,
+                                      libxl__checkpoint_device *dev)
+{
+    libxl__colo_restore_state *crs = dev->cds->concrete_data;
+
+    colo_nic_teardown(egc, dev, &crs->cps, secondary, crs->colo_proxy_script);
+}
+
+const libxl__checkpoint_device_instance_ops colo_restore_device_nic = {
+    .kind = LIBXL__DEVICE_KIND_VIF,
+    .setup = colo_nic_restore_setup,
+    .teardown = colo_nic_restore_teardown,
+};
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 8f02222..759b8d0 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3206,6 +3206,7 @@ typedef struct libxl__colo_save_state libxl__colo_save_state;
 struct libxl__colo_save_state {
     int send_fd;
     int recv_fd;
+    char *colo_proxy_script;
 
     /* private */
     libxl__stream_read_state srs;
@@ -3554,6 +3555,7 @@ struct libxl__domain_create_state {
     libxl_asyncprogress_how aop_console_how;
     /* private to domain_create */
     int guest_domid;
+    const char *colo_proxy_script;
     libxl__domain_build_state build_state;
     libxl__colo_restore_state crs;
     libxl__checkpoint_devices_state cds;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 0470423..6893732 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -600,6 +600,7 @@ libxl_device_nic = Struct("device_nic", [
     ("rate_bytes_per_interval", uint64),
     ("rate_interval_usecs", uint32),
     ("gatewaydev", string),
+    ("coloft_forwarddev", string)
     ])
 
 libxl_device_pci = Struct("device_pci", [
-- 
1.9.3





* [PATCH v13 24/26] setup and control colo proxy on primary side
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (22 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 23/26] COLO nic: implement COLO nic subkind Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-25  6:44 ` [PATCH v13 25/26] setup and control colo proxy on secondary side Changlong Xie
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxl/libxl_colo.h            | 25 ++++++++++
 tools/libxl/libxl_colo_save.c       | 94 +++++++++++++++++++++++++++++++++----
 tools/libxl/libxl_internal.h        |  1 +
 tools/libxl/libxl_remus_disk_drbd.c | 38 ++-------------
 4 files changed, 115 insertions(+), 43 deletions(-)

diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
index 5fbb659..30fd1dc 100644
--- a/tools/libxl/libxl_colo.h
+++ b/tools/libxl/libxl_colo.h
@@ -26,6 +26,31 @@ struct libxl__checkpoint_devices_state;
 /* Consistent with the new COLO netlink channel in kernel side */
 #define NETLINK_COLO 28
 
+/* Maximum time (5s) to wait for colo proxy checkpoint */
+#define COLO_PROXY_CHECKPOINT_TIMEOUT 5000000
+
+#define ASYNC_CALL(egc, ao, child, param, func, callback) do {          \
+    int pid = -1;                                                       \
+    STATE_AO_GC(ao);                                                    \
+                                                                        \
+    pid = libxl__ev_child_fork(gc, child, callback);                    \
+    if (pid == -1) {                                                    \
+        LOG(ERROR, "unable to fork");                                   \
+        goto out;                                                       \
+    }                                                                   \
+                                                                        \
+    if (!pid) {                                                         \
+        /* child */                                                     \
+        func(param);                                                    \
+        /* notreached */                                                \
+        abort();                                                        \
+    }                                                                   \
+                                                                        \
+    return;                                                             \
+out:                                                                    \
+    callback(egc, child, -1, 1);                                        \
+} while (0)
+
 enum {
     LIBXL_COLO_SETUPED,
     LIBXL_COLO_SUSPENDED,
diff --git a/tools/libxl/libxl_colo_save.c b/tools/libxl/libxl_colo_save.c
index d73e632..e2fdc4b 100644
--- a/tools/libxl/libxl_colo_save.c
+++ b/tools/libxl/libxl_colo_save.c
@@ -18,9 +18,11 @@
 
 #include "libxl_internal.h"
 
+extern const libxl__checkpoint_device_instance_ops colo_save_device_nic;
 extern const libxl__checkpoint_device_instance_ops colo_save_device_qdisk;
 
 static const libxl__checkpoint_device_instance_ops *colo_ops[] = {
+    &colo_save_device_nic,
     &colo_save_device_qdisk,
     NULL,
 };
@@ -33,9 +35,15 @@ static int init_device_subkind(libxl__checkpoint_devices_state *cds)
     int rc;
     STATE_AO_GC(cds->ao);
 
-    rc = init_subkind_qdisk(cds);
+    rc = init_subkind_colo_nic(cds);
     if (rc) goto out;
 
+    rc = init_subkind_qdisk(cds);
+    if (rc) {
+        cleanup_subkind_colo_nic(cds);
+        goto out;
+    }
+
     rc = 0;
 out:
     return rc;
@@ -46,6 +54,7 @@ static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
     /* cleanup device subkind-specific state in the libxl ctx */
     STATE_AO_GC(cds->ao);
 
+    cleanup_subkind_colo_nic(cds);
     cleanup_subkind_qdisk(cds);
 }
 
@@ -91,9 +100,16 @@ void libxl__colo_save_setup(libxl__egc *egc, libxl__colo_save_state *css)
     css->paused = true;
     css->qdisk_setuped = false;
     css->qdisk_used = false;
+    libxl__ev_child_init(&css->child);
+
+    if (dss->remus->netbufscript)
+        css->colo_proxy_script = libxl__strdup(gc, dss->remus->netbufscript);
+    else
+        css->colo_proxy_script = GCSPRINTF("%s/colo-proxy-setup",
+                                           libxl__xen_script_dir_path());
 
-    /* TODO: nic support */
-    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VBD);
+    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VIF) |
+                             (1 << LIBXL__DEVICE_KIND_VBD);
     cds->ops = colo_ops;
     cds->callback = colo_save_setup_done;
     cds->ao = ao;
@@ -104,6 +120,12 @@ void libxl__colo_save_setup(libxl__egc *egc, libxl__colo_save_state *css)
     css->srs.fd = css->recv_fd;
     css->srs.back_channel = true;
     libxl__stream_read_start(egc, &css->srs);
+    css->cps.ao = ao;
+    if (colo_proxy_setup(&css->cps)) {
+        LOG(ERROR, "COLO: failed to setup colo proxy for guest with domid %u",
+            cds->domid);
+        goto out;
+    }
 
     if (init_device_subkind(cds))
         goto out;
@@ -193,6 +215,7 @@ static void colo_teardown_done(libxl__egc *egc,
     libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
 
     cleanup_device_subkind(cds);
+    colo_proxy_teardown(&css->cps);
     dss->callback(egc, dss, rc);
 }
 
@@ -387,6 +410,8 @@ static void colo_read_svm_ready_done(libxl__egc *egc,
         goto out;
     }
 
+    colo_proxy_preresume(&css->cps);
+
     css->svm_running = true;
     dss->cds.callback = colo_preresume_cb;
     libxl__checkpoint_devices_preresume(egc, &dss->cds);
@@ -471,6 +496,8 @@ static void colo_read_svm_resumed_done(libxl__egc *egc,
         goto out;
     }
 
+    colo_proxy_postresume(&css->cps);
+
     ok = 1;
 
 out:
@@ -479,6 +506,61 @@ out:
 
 /* ===================== colo: wait new checkpoint ===================== */
 
+static void colo_start_new_checkpoint(libxl__egc *egc,
+                                      libxl__checkpoint_devices_state *cds,
+                                      int rc);
+static void colo_proxy_async_wait_for_checkpoint(libxl__colo_save_state *css);
+static void colo_proxy_async_call_done(libxl__egc *egc,
+                                       libxl__ev_child *child,
+                                       int pid,
+                                       int status);
+
+static void colo_proxy_wait_for_checkpoint(libxl__egc *egc,
+                                           libxl__colo_save_state *css)
+{
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+
+    ASYNC_CALL(egc, dss->cds.ao, &css->child, css,
+               colo_proxy_async_wait_for_checkpoint,
+               colo_proxy_async_call_done);
+}
+
+static void colo_proxy_async_wait_for_checkpoint(libxl__colo_save_state *css)
+{
+    int req;
+
+    req = colo_proxy_checkpoint(&css->cps, COLO_PROXY_CHECKPOINT_TIMEOUT);
+    if (req < 0) {
+        /* an error occurred */
+        _exit(1);
+    } else if (!req) {
+        /* no checkpoint is needed, do a checkpoint every 5s */
+        _exit(0);
+    } else {
+        /* net packets are not consistent, we need to start a checkpoint */
+        _exit(0);
+    }
+}
+
+static void colo_proxy_async_call_done(libxl__egc *egc,
+                                       libxl__ev_child *child,
+                                       int pid,
+                                       int status)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(child, *css, child);
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+
+    EGC_GC;
+
+    if (status) {
+        LOG(ERROR, "failed to wait for new checkpoint");
+        colo_start_new_checkpoint(egc, &dss->cds, ERROR_FAIL);
+        return;
+    }
+
+    colo_start_new_checkpoint(egc, &dss->cds, 0);
+}
+
 /*
  * Do the following things:
  * 1. do commit
@@ -488,9 +570,6 @@ out:
 static void colo_device_commit_cb(libxl__egc *egc,
                                   libxl__checkpoint_devices_state *cds,
                                   int rc);
-static void colo_start_new_checkpoint(libxl__egc *egc,
-                                      libxl__checkpoint_devices_state *cds,
-                                      int rc);
 
 static void libxl__colo_save_domain_wait_checkpoint_callback(void *data)
 {
@@ -520,8 +599,7 @@ static void colo_device_commit_cb(libxl__egc *egc,
         goto out;
     }
 
-    /* TODO: wait a new checkpoint */
-    colo_start_new_checkpoint(egc, cds, 0);
+    colo_proxy_wait_for_checkpoint(egc, css);
     return;
 
 out:
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 759b8d0..e3c919d 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3220,6 +3220,7 @@ struct libxl__colo_save_state {
 
     /* private, used by colo-proxy */
     libxl__colo_proxy_state cps;
+    libxl__ev_child child;
 };
 
 typedef struct libxl__logdirty_switch {
diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
index 844dd66..d08e470 100644
--- a/tools/libxl/libxl_remus_disk_drbd.c
+++ b/tools/libxl/libxl_remus_disk_drbd.c
@@ -42,38 +42,6 @@ void cleanup_subkind_drbd_disk(libxl__checkpoint_devices_state *cds)
     return;
 }
 
-/*----- helper functions, for async calls -----*/
-static void drbd_async_call(libxl__egc *egc,
-                            libxl__checkpoint_device *dev,
-                            void func(libxl__checkpoint_device *),
-                            libxl__ev_child_callback callback)
-{
-    int pid, rc;
-    libxl__ao_device *aodev = &dev->aodev;
-    STATE_AO_GC(dev->cds->ao);
-
-    /* Fork and call */
-    pid = libxl__ev_child_fork(gc, &aodev->child, callback);
-    if (pid == -1) {
-        LOG(ERROR, "unable to fork");
-        rc = ERROR_FAIL;
-        goto out;
-    }
-
-    if (!pid) {
-        /* child */
-        func(dev);
-        /* notreached */
-        abort();
-    }
-
-    return;
-
-out:
-    aodev->rc = rc;
-    aodev->callback(egc, aodev);
-}
-
 /*----- match(), setup() and teardown() -----*/
 
 /* callbacks */
@@ -213,9 +181,9 @@ static void drbd_preresume_async(libxl__checkpoint_device *dev);
 
 static void drbd_preresume(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
-    STATE_AO_GC(dev->cds->ao);
-
-    drbd_async_call(egc, dev, drbd_preresume_async, checkpoint_async_call_done);
+    ASYNC_CALL(egc, dev->cds->ao, &dev->aodev.child, dev,
+               drbd_preresume_async,
+               checkpoint_async_call_done);
 }
 
 static void drbd_preresume_async(libxl__checkpoint_device *dev)
-- 
1.9.3





* [PATCH v13 25/26] setup and control colo proxy on secondary side
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (23 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 24/26] setup and control colo proxy on primary side Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-30 14:24   ` Ian Jackson
  2016-03-25  6:44 ` [PATCH v13 26/26] cmdline switches and config vars to control colo-proxy Changlong Xie
                   ` (3 subsequent siblings)
  28 siblings, 1 reply; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
 tools/libxl/libxl_colo_restore.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
index 2ab69ed..c8ad796 100644
--- a/tools/libxl/libxl_colo_restore.c
+++ b/tools/libxl/libxl_colo_restore.c
@@ -37,9 +37,11 @@ struct libxl__colo_restore_checkpoint_state {
                      int);
 };
 
+extern const libxl__checkpoint_device_instance_ops colo_restore_device_nic;
 extern const libxl__checkpoint_device_instance_ops colo_restore_device_qdisk;
 
 static const libxl__checkpoint_device_instance_ops *colo_restore_ops[] = {
+    &colo_restore_device_nic,
     &colo_restore_device_qdisk,
     NULL,
 };
@@ -140,8 +142,14 @@ static int init_device_subkind(libxl__checkpoint_devices_state *cds)
     int rc;
     STATE_AO_GC(cds->ao);
 
+    rc = init_subkind_colo_nic(cds);
+    if (rc) goto out;
+
     rc = init_subkind_qdisk(cds);
-    if (rc)  goto out;
+    if (rc) {
+        cleanup_subkind_colo_nic(cds);
+        goto out;
+    }
 
     rc = 0;
 out:
@@ -153,6 +161,7 @@ static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
     /* cleanup device subkind-specific state in the libxl ctx */
     STATE_AO_GC(cds->ao);
 
+    cleanup_subkind_colo_nic(cds);
     cleanup_subkind_qdisk(cds);
 }
 
@@ -343,6 +352,8 @@ static void colo_restore_teardown_devices_done(libxl__egc *egc,
     if (crcs->teardown_devices)
         cleanup_device_subkind(cds);
 
+    colo_proxy_teardown(&crs->cps);
+
     rc = crcs->saved_rc;
     if (!rc) {
         crcs->callback = do_failover_done;
@@ -596,6 +607,8 @@ static void colo_restore_preresume_cb(libxl__egc *egc,
         }
     }
 
+    colo_proxy_preresume(&crs->cps);
+
     colo_restore_resume_vm(egc, crcs);
 
     return;
@@ -632,6 +645,8 @@ static void colo_resume_vm_done(libxl__egc *egc,
 
     crcs->status = LIBXL_COLO_RESUMED;
 
+    colo_proxy_postresume(&crs->cps);
+
     /* avoid calling stream->completion_callback() more than once */
     if (crs->saved_cb) {
         dcs->callback = crs->saved_cb;
@@ -753,13 +768,20 @@ static void colo_setup_checkpoint_devices(libxl__egc *egc,
 
     STATE_AO_GC(crs->ao);
 
-    /* TODO: nic support */
-    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VBD);
+    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VIF) |
+                             (1 << LIBXL__DEVICE_KIND_VBD);
     cds->callback = colo_restore_setup_cds_done;
     cds->ao = ao;
     cds->domid = crs->domid;
     cds->ops = colo_restore_ops;
 
+    crs->cps.ao = ao;
+    if (colo_proxy_setup(&crs->cps)) {
+        LOG(ERROR, "COLO: failed to setup colo proxy for guest with domid %u",
+            cds->domid);
+        goto out;
+    }
+
     if (init_device_subkind(cds))
         goto out;
 
-- 
1.9.3





* [PATCH v13 26/26] cmdline switches and config vars to control colo-proxy
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (24 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 25/26] setup and control colo proxy on secondary side Changlong Xie
@ 2016-03-25  6:44 ` Changlong Xie
  2016-03-28  3:47   ` [PATCH v13.1 " Changlong Xie
  2016-03-25 15:51 ` [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wei Liu
                   ` (2 subsequent siblings)
  28 siblings, 1 reply; 55+ messages in thread
From: Changlong Xie @ 2016-03-25  6:44 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Changlong Xie, Wen Congyang, Li Zhijian,
	Gui Jianfeng, Jiang Yunhong, Dong Eddie, Anthony Perard,
	Shriram Rajagopalan, Yang Hongyang

From: Wen Congyang <wency@cn.fujitsu.com>

Add cmdline switches to the 'xl migrate-receive' command to specify
a domain-specific hotplug script to set up the COLO proxy.

Add a new config var 'colo.default.proxyscript' to xl.conf that
allows the user to override the default global script used to
set up the COLO proxy.
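
For illustration only (not part of this patch): the script selection done in
libxl__colo_restore_setup() can be sketched as standalone C. The helper name
pick_colo_proxy_script() and the use of malloc() instead of the libxl gc are
assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libxl: return a malloc()ed path to the
 * COLO proxy script.  A non-NULL per-domain override wins; otherwise fall
 * back to "<script_dir>/colo-proxy-setup".  The caller frees the result.
 * Returns NULL on allocation failure.
 */
char *pick_colo_proxy_script(const char *override, const char *script_dir)
{
    static const char default_name[] = "colo-proxy-setup";
    size_t len;
    char *path;

    assert(script_dir != NULL);

    if (override) {
        /* Domain-specific script given: copy it verbatim. */
        len = strlen(override) + 1;
        path = malloc(len);
        if (path)
            memcpy(path, override, len);
        return path;
    }

    /* +1 for the '/' separator; sizeof() already counts the trailing NUL. */
    len = strlen(script_dir) + 1 + sizeof(default_name);
    path = malloc(len);
    if (path)
        snprintf(path, len, "%s/%s", script_dir, default_name);
    return path;
}
```

With a script directory of /etc/xen/scripts and no override, the helper
returns /etc/xen/scripts/colo-proxy-setup, which is the default documented
in xl.conf(5) by this patch.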

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
 docs/man/xl.conf.pod.5           |  6 ++++++
 docs/man/xl.pod.1                |  7 ++++--
 tools/libxl/libxl.c              |  6 ++++++
 tools/libxl/libxl_colo_restore.c |  5 +++++
 tools/libxl/libxl_create.c       |  9 ++++++--
 tools/libxl/libxl_types.idl      |  1 +
 tools/libxl/xl.c                 |  3 +++
 tools/libxl/xl.h                 |  1 +
 tools/libxl/xl_cmdimpl.c         | 46 +++++++++++++++++++++++++++++++---------
 9 files changed, 70 insertions(+), 14 deletions(-)

diff --git a/docs/man/xl.conf.pod.5 b/docs/man/xl.conf.pod.5
index 8ae19bb..8f7fd28 100644
--- a/docs/man/xl.conf.pod.5
+++ b/docs/man/xl.conf.pod.5
@@ -111,6 +111,12 @@ Configures the default script used by Remus to setup network buffering.
 
 Default: C</etc/xen/scripts/remus-netbuf-setup>
 
+=item B<colo.default.proxyscript="PATH">
+
+Configures the default script used by COLO to setup colo-proxy.
+
+Default: C</etc/xen/scripts/colo-proxy-setup>
+
 =item B<output_format="json|sxp">
 
 Configures the default output format used by xl when printing "machine
diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 6788465..6dac73e 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -458,8 +458,6 @@ Remus support in xl is still in experimental (proof-of-concept) phase.
 Disk replication support is limited to DRBD disks.
 
 COLO support in xl is still in experimental (proof-of-concept) phase.
-There is no support for network, so the guest will confuse its network
-peers at the moment.
 
 =back
 
@@ -483,6 +481,11 @@ and it's used by secondary.
 =item B<hidden-disk>    :Primary's modified contents will be buffered in this
 disk, and it's used by secondary.
 
+(b) An example for COLO network configuration: vif =[ '...,forwarddev=xxx,...']
+
+=item B<forwarddev>     :Forward devices for primary and secondary; they are
+directly connected.
+
 =back
 
 B<OPTIONS>
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 63fbe16..aabf3a7 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -3378,6 +3378,11 @@ void libxl__device_nic_add(libxl__egc *egc, uint32_t domid,
         flexarray_append(back, nic->ifname);
     }
 
+    if (nic->coloft_forwarddev) {
+        flexarray_append(back, "forwarddev");
+        flexarray_append(back, nic->coloft_forwarddev);
+    }
+
     flexarray_append(back, "mac");
     flexarray_append(back,GCSPRINTF(LIBXL_MAC_FMT, LIBXL_MAC_BYTES(nic->mac)));
     if (nic->ip) {
@@ -3500,6 +3505,7 @@ static int libxl__device_nic_from_xs_be(libxl__gc *gc,
     nic->ip = READ_BACKEND(NOGC, "ip");
     nic->bridge = READ_BACKEND(NOGC, "bridge");
     nic->script = READ_BACKEND(NOGC, "script");
+    nic->coloft_forwarddev = READ_BACKEND(NOGC, "forwarddev");
 
     /* vif_ioemu nics use the same xenstore entries as vif interfaces */
     tmp = READ_BACKEND(gc, "type");
diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
index c8ad796..3483f39 100644
--- a/tools/libxl/libxl_colo_restore.c
+++ b/tools/libxl/libxl_colo_restore.c
@@ -233,6 +233,11 @@ void libxl__colo_restore_setup(libxl__egc *egc,
     crcs->crs = crs;
     crs->qdisk_setuped = false;
     crs->qdisk_used = false;
+    if (dcs->colo_proxy_script)
+        crs->colo_proxy_script = libxl__strdup(gc, dcs->colo_proxy_script);
+    else
+        crs->colo_proxy_script = GCSPRINTF("%s/colo-proxy-setup",
+                                           libxl__xen_script_dir_path());
 
     /* setup dsps */
     crcs->dsps.ao = ao;
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index e2ec25c..d6028aa 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1690,6 +1690,7 @@ static void domain_create_cb(libxl__egc *egc,
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
                             uint32_t *domid, int restore_fd, int send_back_fd,
                             const libxl_domain_restore_params *params,
+                            const char *colo_proxy_script,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
@@ -1713,6 +1714,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     }
     cdcs->dcs.callback = domain_create_cb;
     cdcs->dcs.domid_soft_reset = INVALID_DOMID;
+    cdcs->dcs.colo_proxy_script = colo_proxy_script;
     libxl__ao_progress_gethow(&cdcs->dcs.aop_console_how, aop_console_how);
     cdcs->domid_out = domid;
 
@@ -1900,7 +1902,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncprogress_how *aop_console_how)
 {
     unset_disk_colo_restore(d_config);
-    return do_domain_create(ctx, d_config, domid, -1, -1, NULL,
+    return do_domain_create(ctx, d_config, domid, -1, -1, NULL, NULL,
                             ao_how, aop_console_how);
 }
 
@@ -1911,14 +1913,17 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
 {
+    char *colo_proxy_script = NULL;
+
     if (params->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
+        colo_proxy_script = params->colo_proxy_script;
         set_disk_colo_restore(d_config);
     } else {
         unset_disk_colo_restore(d_config);
     }
 
     return do_domain_create(ctx, d_config, domid, restore_fd, send_back_fd,
-                            params, ao_how, aop_console_how);
+                            params, colo_proxy_script, ao_how, aop_console_how);
 }
 
 int libxl_domain_soft_reset(libxl_ctx *ctx,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 6893732..03eb860 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -386,6 +386,7 @@ libxl_domain_create_info = Struct("domain_create_info",[
 libxl_domain_restore_params = Struct("domain_restore_params", [
     ("checkpointed_stream", integer),
     ("stream_version", uint32, {'init_val': '1'}),
+    ("colo_proxy_script", string),
     ])
 
 libxl_domain_sched_params = Struct("domain_sched_params",[
diff --git a/tools/libxl/xl.c b/tools/libxl/xl.c
index dfae84a..a272258 100644
--- a/tools/libxl/xl.c
+++ b/tools/libxl/xl.c
@@ -45,6 +45,7 @@ char *default_bridge = NULL;
 char *default_gatewaydev = NULL;
 char *default_vifbackend = NULL;
 char *default_remus_netbufscript = NULL;
+char *default_colo_proxy_script = NULL;
 enum output_format default_output_format = OUTPUT_FORMAT_JSON;
 int claim_mode = 1;
 bool progress_use_cr = 0;
@@ -179,6 +180,8 @@ static void parse_global_config(const char *configfile,
 
     xlu_cfg_replace_string (config, "remus.default.netbufscript",
         &default_remus_netbufscript, 0);
+    xlu_cfg_replace_string (config, "colo.default.proxyscript",
+        &default_colo_proxy_script, 0);
 
     xlu_cfg_destroy(config);
 }
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 309627a..e601ca1 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -194,6 +194,7 @@ extern char *default_bridge;
 extern char *default_gatewaydev;
 extern char *default_vifbackend;
 extern char *default_remus_netbufscript;
+extern char *default_colo_proxy_script;
 extern char *blkdev_start;
 
 enum output_format {
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 25bd81a..ba77862 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -158,6 +158,7 @@ struct domain_create {
     const char *config_file;
     char *extra_config; /* extra config string */
     const char *restore_file;
+    char *colo_proxy_script;
     int migrate_fd; /* -1 means none */
     int send_back_fd; /* -1 means none */
     char **migration_domname_r; /* from malloc */
@@ -1053,6 +1054,8 @@ static int parse_nic_config(libxl_device_nic *nic, XLU_Config **config, char *to
         replace_string(&nic->model, oparg);
     } else if (MATCH_OPTION("rate", token, oparg)) {
         parse_vif_rate(config, oparg, nic);
+    } else if (MATCH_OPTION("forwarddev", token, oparg)) {
+        replace_string(&nic->coloft_forwarddev, oparg);
     } else if (MATCH_OPTION("accel", token, oparg)) {
         fprintf(stderr, "the accel parameter for vifs is currently not supported\n");
     } else {
@@ -3001,6 +3004,7 @@ start:
         params.checkpointed_stream = dom_info->checkpointed_stream;
         params.stream_version =
             (hdr.mandatory_flags & XL_MANDATORY_FLAG_STREAMv2) ? 2 : 1;
+        params.colo_proxy_script = dom_info->colo_proxy_script;
 
         ret = libxl_domain_create_restore(ctx, &d_config,
                                           &domid, restore_fd,
@@ -4733,7 +4737,8 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
 
 static void migrate_receive(int debug, int daemonize, int monitor,
                             int send_fd, int recv_fd,
-                            libxl_checkpointed_stream checkpointed)
+                            libxl_checkpointed_stream checkpointed,
+                            char *colo_proxy_script)
 {
     uint32_t domid;
     int rc, rc2;
@@ -4762,6 +4767,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     dom_info.send_back_fd = send_fd;
     dom_info.migration_domname_r = &migration_domname;
     dom_info.checkpointed_stream = checkpointed;
+    dom_info.colo_proxy_script = colo_proxy_script;
 
     rc = create_domain(&dom_info);
     if (rc < 0) {
@@ -4955,8 +4961,10 @@ int main_migrate_receive(int argc, char **argv)
     int debug = 0, daemonize = 1, monitor = 1;
     libxl_checkpointed_stream checkpointed = LIBXL_CHECKPOINTED_STREAM_NONE;
     int opt;
+    char *script = NULL;
     static struct option opts[] = {
         {"colo", 0, 0, 0x100},
+        {"coloft-script", 1, 0, 0x200},
         COMMON_LONG_OPTS
     };
 
@@ -4977,6 +4985,9 @@ int main_migrate_receive(int argc, char **argv)
     case 0x100:
         checkpointed = LIBXL_CHECKPOINTED_STREAM_COLO;
         break;
+    case 0x200:
+        script = optarg;
+        break;
     }
 
     if (argc-optind != 0) {
@@ -4985,7 +4996,7 @@ int main_migrate_receive(int argc, char **argv)
     }
     migrate_receive(debug, daemonize, monitor,
                     STDOUT_FILENO, STDIN_FILENO,
-                    checkpointed);
+                    checkpointed, script);
 
     return 0;
 }
@@ -8395,8 +8406,10 @@ int main_remus(int argc, char **argv)
         r_info.interval = 200;
 
     if (libxl_defbool_val(r_info.colo)) {
-        if (r_info.interval || libxl_defbool_val(r_info.blackhole)) {
-            perror("Option -c conflicts with -i or -b");
+        if (r_info.interval || libxl_defbool_val(r_info.blackhole) ||
+            !libxl_defbool_is_default(r_info.netbuf) ||
+            !libxl_defbool_is_default(r_info.diskbuf)) {
+            perror("Option -c conflicts with -i, -d, -n or -b");
             exit(-1);
         }
 
@@ -8407,8 +8420,12 @@ int main_remus(int argc, char **argv)
         }
     }
 
-    if (!r_info.netbufscript)
-        r_info.netbufscript = default_remus_netbufscript;
+    if (!r_info.netbufscript) {
+        if (libxl_defbool_val(r_info.colo))
+            r_info.netbufscript = default_colo_proxy_script;
+        else
+            r_info.netbufscript = default_remus_netbufscript;
+    }
 
     if (libxl_defbool_val(r_info.blackhole)) {
         send_fd = open("/dev/null", O_RDWR, 0644);
@@ -8421,10 +8438,19 @@ int main_remus(int argc, char **argv)
         if (!ssh_command[0]) {
             rune = host;
         } else {
-            xasprintf(&rune, "exec %s %s xl migrate-receive %s %s",
-                      ssh_command, host,
-                      libxl_defbool_val(r_info.colo) ? "-c" : "-r",
-                      daemonize ? "" : " -e");
+            if (!libxl_defbool_val(r_info.colo)) {
+                xasprintf(&rune, "exec %s %s xl migrate-receive %s %s",
+                          ssh_command, host,
+                          "-r",
+                          daemonize ? "" : " -e");
+            } else {
+                xasprintf(&rune, "exec %s %s xl migrate-receive %s %s %s %s",
+                          ssh_command, host,
+                          "--colo",
+                          r_info.netbufscript ? "--coloft-script" : "",
+                          r_info.netbufscript ? r_info.netbufscript : "",
+                          daemonize ? "" : " -e");
+            }
         }
 
         save_domain_core_begin(domid, NULL, &config_data, &config_len);
-- 
1.9.3




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 55+ messages in thread
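
The receive-side command assembled in main_remus() above can be sketched in isolation. This is only an illustration under stated assumptions: build_receive_rune() is a hypothetical helper that is not part of xl, and it uses plain snprintf() instead of xasprintf(). It appends the optional "--coloft-script <script>" and " -e" parts only when they apply, so the resulting string carries no empty placeholders.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of xl): build the receive-side command.
 * Optional parts are appended only when present, so the string contains
 * no empty arguments or doubled spaces.
 * Returns 0 on success, -1 if buf is too small.
 */
static int build_receive_rune(char *buf, size_t len,
                              const char *ssh_command, const char *host,
                              int colo, const char *script, int daemonize)
{
    int n, m;

    assert(buf && len > 0 && ssh_command && host);

    /* Fixed prefix: "--colo" for COLO, "-r" for plain Remus. */
    n = snprintf(buf, len, "exec %s %s xl migrate-receive %s",
                 ssh_command, host, colo ? "--colo" : "-r");
    if (n < 0 || (size_t)n >= len) return -1;

    /* The proxy script is only meaningful in COLO mode. */
    if (colo && script) {
        m = snprintf(buf + n, len - n, " --coloft-script %s", script);
        if (m < 0 || (size_t)m >= len - n) return -1;
        n += m;
    }

    /* "-e" is passed when the receiver should not daemonize. */
    if (!daemonize) {
        m = snprintf(buf + n, len - n, " -e");
        if (m < 0 || (size_t)m >= len - n) return -1;
    }
    return 0;
}
```

For example, calling it with ssh_command "ssh", host "host2", colo set and a script path of "/path/to/proxy-script" (a made-up path) would give "exec ssh host2 xl migrate-receive --colo --coloft-script /path/to/proxy-script".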

* Re: [PATCH v13 23/26] COLO nic: implement COLO nic subkind
  2016-03-25  6:44 ` [PATCH v13 23/26] COLO nic: implement COLO nic subkind Changlong Xie
@ 2016-03-25 12:56   ` Wei Liu
  2016-03-28  3:46   ` [PATCH v13.1 " Changlong Xie
  1 sibling, 0 replies; 55+ messages in thread
From: Wei Liu @ 2016-03-25 12:56 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Lars Kurth, Li Zhijian, Wei Liu, Ian Campbell, Wen Congyang,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, xen devel,
	Anthony Perard, Dong Eddie, Gui Jianfeng, Shriram Rajagopalan,
	Yang Hongyang

On Fri, Mar 25, 2016 at 02:44:30PM +0800, Changlong Xie wrote:
[...]
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 0470423..6893732 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -600,6 +600,7 @@ libxl_device_nic = Struct("device_nic", [
>      ("rate_bytes_per_interval", uint64),
>      ("rate_interval_usecs", uint32),
>      ("gatewaydev", string),
> +    ("coloft_forwarddev", string)

This lacks the stability warning text. See my reply to your earlier
inquiry.

    Note that the COLO configuration settings should be considered
    unstable.  They may change incompatibly in future versions of
    Xen.

You can submit a new patch with the required change in a reply to this
one, and title it

 [PATCH v13.1 23/26] COLO nic: implement COLO nic subkind

I checked, this hunk seems to be the only place that needs adding that
snippet -- there is no modification to xl manpage and libxl.h in this
patch.

Wei.

>      ])
>  
>  libxl_device_pci = Struct("device_pci", [
> -- 
> 1.9.3
> 
> 
> 


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (25 preceding siblings ...)
  2016-03-25  6:44 ` [PATCH v13 26/26] cmdline switches and config vars to control colo-proxy Changlong Xie
@ 2016-03-25 15:51 ` Wei Liu
  2016-03-28  3:52   ` Changlong Xie
  2016-03-30 14:50 ` Ian Jackson
  2016-03-31  2:28 ` Changlong Xie
  28 siblings, 1 reply; 55+ messages in thread
From: Wei Liu @ 2016-03-25 15:51 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Lars Kurth, Li Zhijian, Wei Liu, Ian Campbell, Wen Congyang,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, xen devel,
	Anthony Perard, Dong Eddie, Gui Jianfeng, Shriram Rajagopalan,
	Yang Hongyang

On Fri, Mar 25, 2016 at 02:44:07PM +0800, Changlong Xie wrote:
> This patchset implemented the COLO feature for Xen.
> For detail/install/use of COLO feature, refer to:
> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
> 
> You can get the codes from here:
> https://github.com/Pating/xen/tree/changlox/colo_v13
> 
> Changlog from v12 to v13
> 1. Rebase to the upstream xen
> 2. Address commnets from Ian and Liu Wei.
> p7, Add A-B
> p8, Add A-B
> p10, Add A-B
> p11, Add A-B
> p12, Add LOG(ERROR, ) 
> p13, Add A-B
> p14, Remove libxl__ao_complete(xxx)
> p15, Add A-B
> p16, Add A-B
> p17, Add A-B, replace "-c" with "--colo" for migrate-receive()
> p19, Add A-B, introduce "switch ... case ..." 
> p21, Add A-B
> p22, Add A-B
> p23, replace "forwarddev" with "coloft_fowarddev" 
> p24, Add A-B
> p25, Add A-B
> p26, replace "--script" with "--coloft-script" 

I went over those unacked patches. The major thing I found is that you
didn't add in the warning text as Ian suggested. I've pointed out one
instance where you should add that. However, the xl manpage and libxl
header file changes are spread across multiple commits, so I'm not quite
sure which particular commit you should add the warning text to.

I propose that you submit a separate patch to the xl manpage and libxl
header file to add the warning text after the majority of this series
is merged.

Wei.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v13.1 20/26] Support colo mode for qemu disk
  2016-03-25  6:44 ` [PATCH v13 20/26] Support colo mode for qemu disk Changlong Xie
@ 2016-03-28  3:46   ` Changlong Xie
  2016-03-30 14:17     ` Ian Jackson
  2016-03-30 14:36     ` Ian Jackson
  0 siblings, 2 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-28  3:46 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Wen Congyang, Li Zhijian, Gui Jianfeng,
	Jiang Yunhong, Dong Eddie, Anthony Perard, Shriram Rajagopalan,
	Yang Hongyang

 From 468ff9fb2f6699314c28f30a7d7d09eac9aa6756 Mon Sep 17 00:00:00 2001
From: Wen Congyang <wency@cn.fujitsu.com>
Date: Mon, 21 Mar 2016 15:38:30 +0800
Subject: [PATCH v13.1 20/26] Support colo mode for qemu disk

Usage: disk = ['...,colo,colo-host=xxx,colo-port=xxx,colo-export=xxx,active-disk=xxx,hidden-disk=xxx...']
For QEMU block replication details:
http://wiki.qemu.org/Features/BlockReplication

Note: this patch only introduces the COLO framework; it does not
implement the COLO operations.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
  docs/man/xl.pod.1                   |  41 +++++++++++--
  docs/misc/xl-disk-configuration.txt |  55 ++++++++++++++++++
  tools/libxl/libxl.c                 |  51 +++++++++++++++-
  tools/libxl/libxl_create.c          |  26 ++++++++-
  tools/libxl/libxl_device.c          |  11 ++++
  tools/libxl/libxl_dm.c              | 113 +++++++++++++++++++++++++++++++++++-
  tools/libxl/libxl_types.idl         |   9 +++
  tools/libxl/libxlu_disk_l.l         |  19 ++++++
  8 files changed, 317 insertions(+), 8 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index a992a45..2664402 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -450,12 +450,43 @@ Print huge (!) amount of debug during the migration process.
  Enable Remus HA or COLO HA for domain. By default B<xl> relies on ssh as a
  transport mechanism between the two hosts.

-N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
-     Disk replication support is limited to DRBD disks.
+B<NOTES>
+
+=over 4
+
+Remus support in xl is still in experimental (proof-of-concept) phase.
+Disk replication support is limited to DRBD disks.
+
+COLO support in xl is still in experimental (proof-of-concept) phase.
+There is no support for network, so the guest will confuse its network
+peers at the moment.

-     COLO support in xl is still in experimental (proof-of-concept) phase.
-     There is no support for network or disk, so the guest will corrupt its
-     disk and confuse its network peers at the moment.
+=back
+
+B<EXAMPLE>
+
+=over 4
+
+(a) An example of a COLO replication configuration:
+disk = ['...,colo,colo-host=xxx,colo-port=xxx,colo-export=xxx,active-disk=xxx,hidden-disk=xxx...']
+
+=item B<colo-host>      :Secondary host's IP address.
+
+=item B<colo-port>      :Secondary host's port. An NBD server runs on the
+secondary host and listens on this port.
+
+=item B<colo-export>    :Disk export name of the secondary host's NBD server.
+
+=item B<active-disk>    :Used by the secondary. The secondary guest's writes
+are buffered in this disk.
+
+=item B<hidden-disk>    :Used by the secondary. It buffers the original content
+that is modified by the primary.
+
+Note that the COLO configuration settings should be considered unstable. They
+may change incompatibly in future versions of Xen.
+
+=back

  B<OPTIONS>

diff --git a/docs/misc/xl-disk-configuration.txt b/docs/misc/xl-disk-configuration.txt
index 29f6ddb..b3402bc 100644
--- a/docs/misc/xl-disk-configuration.txt
+++ b/docs/misc/xl-disk-configuration.txt
@@ -234,6 +234,61 @@ were intentionally created non-sparse to avoid fragmentation of the
  file.


+===============
+COLO PARAMETERS
+===============
+
+
+colo
+----
+
+Enable COLO HA for the disk. For details of block replication in
+QEMU, refer to:
+http://wiki.qemu.org/Features/BlockReplication
+Note that the COLO configuration settings should be considered unstable.
+They may change incompatibly in future versions of Xen.
+
+
+colo-host
+---------
+
+Description:           Secondary host's address
+Mandatory:             Yes when COLO enabled
+
+
+colo-port
+---------
+
+Description:           Secondary host's port.
+                       An NBD server runs on the secondary host
+                       and listens on this port.
+Mandatory:             Yes when COLO enabled
+
+
+colo-export
+-----------
+
+Description:           Disk export name of the NBD server that runs
+                       on the secondary host.
+Mandatory:             Yes when COLO enabled
+
+
+active-disk
+-----------
+
+Description:           Used by the secondary. The secondary guest's
+                       writes are buffered in this disk.
+Mandatory:             Yes when COLO enabled
+
+
+hidden-disk
+-----------
+
+Description:           This is used by secondary. It buffers the original
+                       content that is modified by the primary VM.
+Mandatory:             Yes when COLO enabled
+
+
  ============================================
  DEPRECATED PARAMETERS, PREFIXES AND SYNTAXES
  ============================================
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 349a3c6..63fbe16 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -2306,6 +2306,8 @@ int libxl__device_disk_setdefault(libxl__gc *gc, libxl_device_disk *disk)
      int rc;

      libxl_defbool_setdefault(&disk->discard_enable, !!disk->readwrite);
+    libxl_defbool_setdefault(&disk->colo_enable, false);
+    libxl_defbool_setdefault(&disk->colo_restore_enable, false);

      rc = libxl__resolve_domid(gc, disk->backend_domname, &disk->backend_domid);
      if (rc < 0) return rc;
@@ -2504,6 +2506,18 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid,
                  flexarray_append(back, "params");
                  flexarray_append(back, GCSPRINTF("%s:%s",
                                libxl__device_disk_string_of_format(disk->format), disk->pdev_path));
+                if (libxl_defbool_val(disk->colo_enable)) {
+                    flexarray_append(back, "colo-host");
+                    flexarray_append(back, libxl__sprintf(gc, "%s", disk->colo_host));
+                    flexarray_append(back, "colo-port");
+                    flexarray_append(back, libxl__sprintf(gc, "%d", disk->colo_port));
+                    flexarray_append(back, "colo-export");
+                    flexarray_append(back, libxl__sprintf(gc, "%s", disk->colo_export));
+                    flexarray_append(back, "active-disk");
+                    flexarray_append(back, libxl__sprintf(gc, "%s", disk->active_disk));
+                    flexarray_append(back, "hidden-disk");
+                    flexarray_append(back, libxl__sprintf(gc, "%s", disk->hidden_disk));
+                }
                  assert(device->backend_kind == LIBXL__DEVICE_KIND_QDISK);
                  break;
              default:
@@ -2619,7 +2633,12 @@ static int libxl__device_disk_from_xs_be(libxl__gc *gc,
          goto cleanup;
      }

-    /* "params" may not be present; but everything else must be. */
+    /*
+     * "params" may not be present; but everything else must be.
+     * colo related entries (colo-host, colo-port, colo-export,
+     * active-disk and hidden-disk) are present only if colo is
+     * enabled.
+     */
      tmp = xs_read(ctx->xsh, XBT_NULL,
                    GCSPRINTF("%s/params", be_path), &len);
      if (tmp && strchr(tmp, ':')) {
@@ -2629,6 +2648,36 @@ static int libxl__device_disk_from_xs_be(libxl__gc *gc,
          disk->pdev_path = tmp;
      }

+    tmp = xs_read(ctx->xsh, XBT_NULL,
+                  GCSPRINTF("%s/colo-host", be_path), &len);
+    if (tmp) {
+        libxl_defbool_set(&disk->colo_enable, true);
+        disk->colo_host = tmp;
+
+        tmp = xs_read(ctx->xsh, XBT_NULL,
+                      GCSPRINTF("%s/colo-port", be_path), &len);
+        if (!tmp) {
+            LOG(ERROR, "Missing xenstore node %s/colo-port", be_path);
+            goto cleanup;
+        }
+        disk->colo_port = atoi(tmp);
+
+#define XS_READ_COLO(param, item) do {                                  \
+        tmp = xs_read(ctx->xsh, XBT_NULL,                               \
+                      GCSPRINTF("%s/"#param"", be_path), &len);         \
+        if (!tmp) {                                                     \
+            LOG(ERROR, "Missing xenstore node %s/"#param"", be_path);   \
+            goto cleanup;                                               \
+        }                                                               \
+        disk->item = tmp;                                               \
+} while (0)
+        XS_READ_COLO(colo-export, colo_export);
+        XS_READ_COLO(active-disk, active_disk);
+        XS_READ_COLO(hidden-disk, hidden_disk);
+#undef XS_READ_COLO
+    } else {
+        libxl_defbool_set(&disk->colo_enable, false);
+    }

      tmp = libxl__xs_read(gc, XBT_NULL,
                           GCSPRINTF("%s/type", be_path));
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index be604e5..e2ec25c 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1876,12 +1876,30 @@ static void domain_create_cb(libxl__egc *egc,

      libxl__ao_complete(egc, ao, rc);
  }
-
+
+
+static void set_disk_colo_restore(libxl_domain_config *d_config)
+{
+    int i;
+
+    for (i = 0; i < d_config->num_disks; i++)
+        libxl_defbool_set(&d_config->disks[i].colo_restore_enable, true);
+}
+
+static void unset_disk_colo_restore(libxl_domain_config *d_config)
+{
+    int i;
+
+    for (i = 0; i < d_config->num_disks; i++)
+        libxl_defbool_set(&d_config->disks[i].colo_restore_enable, false);
+}
+
  int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                              uint32_t *domid,
                              const libxl_asyncop_how *ao_how,
                              const libxl_asyncprogress_how *aop_console_how)
  {
+    unset_disk_colo_restore(d_config);
      return do_domain_create(ctx, d_config, domid, -1, -1, NULL,
                              ao_how, aop_console_how);
  }
@@ -1893,6 +1911,12 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                  const libxl_asyncop_how *ao_how,
                                  const libxl_asyncprogress_how *aop_console_how)
  {
+    if (params->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
+        set_disk_colo_restore(d_config);
+    } else {
+        unset_disk_colo_restore(d_config);
+    }
+
      return do_domain_create(ctx, d_config, domid, restore_fd, send_back_fd,
                              params, ao_how, aop_console_how);
  }
diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index 4ced9b6..6a411c6 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -196,6 +196,9 @@ static int disk_try_backend(disk_try_backend_args *a,
              goto bad_format;
          }

+        if (libxl_defbool_val(a->disk->colo_enable))
+            goto bad_colo;
+
          if (a->disk->backend_domid != LIBXL_TOOLSTACK_DOMID) {
              LOG(DEBUG, "Disk vdev=%s, is using a storage driver domain, "
                         "skipping physical device check", a->disk->vdev);
@@ -218,6 +221,9 @@ static int disk_try_backend(disk_try_backend_args *a,
      case LIBXL_DISK_BACKEND_TAP:
          if (a->disk->script) goto bad_script;

+        if (libxl_defbool_val(a->disk->colo_enable))
+            goto bad_colo;
+
          if (a->disk->is_cdrom) {
              LOG(DEBUG, "Disk vdev=%s, backend tap unsuitable for cdroms",
                         a->disk->vdev);
@@ -256,6 +262,11 @@ static int disk_try_backend(disk_try_backend_args *a,
      LOG(DEBUG, "Disk vdev=%s, backend %s not compatible with script=...",
          a->disk->vdev, libxl_disk_backend_to_string(backend));
      return 0;
+
+ bad_colo:
+    LOG(DEBUG, "Disk vdev=%s, backend %s not compatible with colo",
+        a->disk->vdev, libxl_disk_backend_to_string(backend));
+    return 0;
  }

  int libxl__device_disk_set_backend(libxl__gc *gc, libxl_device_disk *disk) {
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 1d1b25b..4c3dff8 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -754,6 +754,8 @@ static int libxl__dm_runas_helper(libxl__gc *gc, const char *username)
  /* colo mode */
  enum {
      LIBXL__COLO_NONE = 0,
+    LIBXL__COLO_PRIMARY,
+    LIBXL__COLO_SECONDARY,
  };

  static char *qemu_disk_scsi_drive_string(libxl__gc *gc, const char *pdev_path,
@@ -762,6 +764,9 @@ static char *qemu_disk_scsi_drive_string(libxl__gc *gc, const char *pdev_path,
                                           int colo_mode)
  {
      char *drive = NULL;
+    const char *exportname = disk->colo_export;
+    const char *active_disk = disk->active_disk;
+    const char *hidden_disk = disk->hidden_disk;

      switch (colo_mode) {
      case LIBXL__COLO_NONE:
@@ -769,6 +774,46 @@ static char *qemu_disk_scsi_drive_string(libxl__gc *gc, const char *pdev_path,
              (gc, "file=%s,if=scsi,bus=0,unit=%d,format=%s,cache=writeback",
               pdev_path, unit, format);
          break;
+    case LIBXL__COLO_PRIMARY:
+        /*
+         * primary:
+         *  -drive if=scsi,bus=0,unit=x,cache=writeback,driver=quorum,\
+         *  id=exportname,\
+         *  children.0.file.filename=pdev_path,\
+         *  children.0.driver=format,\
+         *  read-pattern=fifo,\
+         *  vote-threshold=1
+         */
+        drive = GCSPRINTF(
+            "if=scsi,bus=0,unit=%d,cache=writeback,driver=quorum,"
+            "id=%s,"
+            "children.0.file.filename=%s,"
+            "children.0.driver=%s,"
+            "read-pattern=fifo,"
+            "vote-threshold=1",
+            unit, exportname, pdev_path, format);
+        break;
+    case LIBXL__COLO_SECONDARY:
+        /*
+         * secondary:
+         *  -drive if=scsi,bus=0,unit=x,cache=writeback,driver=replication,\
+         *  mode=secondary,\
+         *  file.driver=qcow2,\
+         *  file.file.filename=active_disk,\
+         *  file.backing.driver=qcow2,\
+         *  file.backing.file.filename=hidden_disk,\
+         *  file.backing.backing=exportname,
+         */
+        drive = GCSPRINTF(
+            "if=scsi,bus=0,unit=%d,cache=writeback,driver=replication,"
+            "mode=secondary,"
+            "file.driver=qcow2,"
+            "file.file.filename=%s,"
+            "file.backing.driver=qcow2,"
+            "file.backing.file.filename=%s,"
+            "file.backing.backing=%s",
+            unit, active_disk, hidden_disk, exportname);
+        break;
      default:
           abort();
      }
      }
@@ -782,6 +826,9 @@ static char *qemu_disk_ide_drive_string(libxl__gc *gc, const char *pdev_path,
                                          int colo_mode)
  {
      char *drive = NULL;
+    const char *exportname = disk->colo_export;
+    const char *active_disk = disk->active_disk;
+    const char *hidden_disk = disk->hidden_disk;

      switch (colo_mode) {
      case LIBXL__COLO_NONE:
@@ -789,6 +836,46 @@ static char *qemu_disk_ide_drive_string(libxl__gc *gc, const char *pdev_path,
              ("file=%s,if=ide,index=%d,media=disk,format=%s,cache=writeback",
               pdev_path, unit, format);
          break;
+    case LIBXL__COLO_PRIMARY:
+        /*
+         * primary:
+         *  -drive if=ide,index=x,media=disk,cache=writeback,driver=quorum,\
+         *  id=exportname,\
+         *  children.0.file.filename=pdev_path,\
+         *  children.0.driver=format,\
+         *  read-pattern=fifo,\
+         *  vote-threshold=1
+         */
+        drive = GCSPRINTF(
+            "if=ide,index=%d,media=disk,cache=writeback,driver=quorum,"
+            "id=%s,"
+            "children.0.file.filename=%s,"
+            "children.0.driver=%s,"
+            "read-pattern=fifo,"
+            "vote-threshold=1",
+             unit, exportname, pdev_path, format);
+        break;
+    case LIBXL__COLO_SECONDARY:
+        /*
+         * secondary:
+         *  -drive if=ide,index=x,media=disk,cache=writeback,driver=replication,\
+         *  mode=secondary,\
+         *  file.driver=qcow2,\
+         *  file.file.filename=active_disk,\
+         *  file.backing.driver=qcow2,\
+         *  file.backing.file.filename=hidden_disk,\
+         *  file.backing.backing=exportname,
+         */
+        drive = GCSPRINTF(
+            "if=ide,index=%d,media=disk,cache=writeback,driver=replication,"
+            "mode=secondary,"
+            "file.driver=qcow2,"
+            "file.file.filename=%s,"
+            "file.backing.driver=qcow2,"
+            "file.backing.file.filename=%s,"
+            "file.backing.backing=%s",
+            unit, active_disk, hidden_disk, exportname);
+        break;
      default:
           abort();
      }
@@ -1261,8 +1348,24 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
                   * hd[a-d] and ignore the rest.
                   */

-                colo_mode = LIBXL__COLO_NONE;
+                if (libxl_defbool_val(disks[i].colo_enable)) {
+                    if (libxl_defbool_val(disks[i].colo_restore_enable))
+                        colo_mode = LIBXL__COLO_SECONDARY;
+                    else
+                        colo_mode = LIBXL__COLO_PRIMARY;
+                } else {
+                    colo_mode = LIBXL__COLO_NONE;
+                }
+
                  if (strncmp(disks[i].vdev, "sd", 2) == 0) {
+                    if (colo_mode == LIBXL__COLO_SECONDARY) {
+                        drive = libxl__sprintf
+                            (gc, "if=none,driver=%s,file=%s,id=%s",
+                             format, pdev_path, disks[i].colo_export);
+
+                        flexarray_append(dm_args, "-drive");
+                        flexarray_append(dm_args, drive);
+                    }
                      drive = qemu_disk_scsi_drive_string(gc, pdev_path, disk,
                                                          format,
                                                          &disks[i],
@@ -1289,6 +1392,14 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
                          LOG(ERROR, "qemu-xen doesn't support read-only IDE disk drivers");
                          return ERROR_INVAL;
                      }
+                    if (colo_mode == LIBXL__COLO_SECONDARY) {
+                        drive = libxl__sprintf
+                            (gc, "if=none,driver=%s,file=%s,id=%s",
+                             format, pdev_path, disks[i].colo_export);
+
+                        flexarray_append(dm_args, "-drive");
+                        flexarray_append(dm_args, drive);
+                    }
                      drive = qemu_disk_ide_drive_string(gc, pdev_path, disk,
                                                         format,
                                                         &disks[i],
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 95efd82..8335291 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -576,6 +576,15 @@ libxl_device_disk = Struct("device_disk", [
      ("is_cdrom", integer),
      ("direct_io_safe", bool),
      ("discard_enable", libxl_defbool),
+    # Note that the COLO configuration settings should be considered unstable.
+    # They may change incompatibly in future versions of Xen.
+    ("colo_enable", libxl_defbool),
+    ("colo_restore_enable", libxl_defbool),
+    ("colo_host", string),
+    ("colo_port", integer),
+    ("colo_export", string),
+    ("active_disk", string),
+    ("hidden_disk", string)
      ])

  libxl_device_nic = Struct("device_nic", [
diff --git a/tools/libxl/libxlu_disk_l.l b/tools/libxl/libxlu_disk_l.l
index 1a5deb5..5b6db22 100644
--- a/tools/libxl/libxlu_disk_l.l
+++ b/tools/libxl/libxlu_disk_l.l
@@ -113,6 +113,16 @@ static void setbackendtype(DiskParseContext *dpc, const char *str) {
      else xlu__disk_err(dpc,str,"unknown value for backendtype");
  }

+/* Sets ->colo_port from the string.  COLO needs this. */
+static void setcoloport(DiskParseContext *dpc, const char *str) {
+    int port = atoi(str);
+    if (port) {
+        dpc->disk->colo_port = port;
+    } else {
+        xlu__disk_err(dpc,str,"unknown value for colo-port");
+    }
+}
+
  #define DEPRECATE(usewhatinstead) /* not currently reported */

  /* Handles a vdev positional parameter which includes a devtype. */
@@ -176,6 +186,15 @@ script=[^,]*,?	{ STRIP(','); SAVESTRING("script", 
script, FROMEQUALS); }
  direct-io-safe,? { DPC->disk->direct_io_safe = 1; }
  discard,?	{ libxl_defbool_set(&DPC->disk->discard_enable, true); }
  no-discard,?	{ libxl_defbool_set(&DPC->disk->discard_enable, false); }
+ /* Note that the COLO configuration settings should be considered unstable.
+  * They may change incompatibly in future versions of Xen. */
+colo,?		{ libxl_defbool_set(&DPC->disk->colo_enable, true); }
+no-colo,?	{ libxl_defbool_set(&DPC->disk->colo_enable, false); }
+colo-host=[^,]*,?	{ STRIP(','); SAVESTRING("colo-host", colo_host, FROMEQUALS); }
+colo-port=[^,]*,?	{ STRIP(','); setcoloport(DPC, FROMEQUALS); }
+colo-export=[^,]*,?	{ STRIP(','); SAVESTRING("colo-export", colo_export, FROMEQUALS); }
+active-disk=[^,]*,?	{ STRIP(','); SAVESTRING("active-disk", active_disk, FROMEQUALS); }
+hidden-disk=[^,]*,?	{ STRIP(','); SAVESTRING("hidden-disk", hidden_disk, FROMEQUALS); }

   /* the target magic parameter, eats the rest of the string */

-- 
1.9.3
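
Note on setcoloport() in the libxlu_disk_l.l hunk above: atoi() cannot
distinguish "0" from a parse failure, accepts trailing junk such as "80abc",
and does no range check. A stricter parse could look roughly like the sketch
below. parse_port() is a hypothetical helper name, not part of this patch or
of libxlu, and the 1..65535 range is an assumption about what colo-port
should accept.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of the patch: parse a TCP port strictly.
 * Returns 0 and stores the port in *out on success, -1 on any error.
 */
static int parse_port(const char *str, int *out)
{
    char *end;
    long val;

    assert(str != NULL && out != NULL);

    errno = 0;
    val = strtol(str, &end, 10);
    if (errno || end == str || *end != '\0')
        return -1;      /* overflow, no digits at all, or trailing junk */
    if (val < 1 || val > 65535)
        return -1;      /* outside the assumed valid port range */

    *out = (int)val;
    return 0;
}
```

With such a helper, setcoloport() would call xlu__disk_err() whenever
parse_port() returns -1 and store the result in colo_port otherwise.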




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v13.1 23/26] COLO nic: implement COLO nic subkind
  2016-03-25  6:44 ` [PATCH v13 23/26] COLO nic: implement COLO nic subkind Changlong Xie
  2016-03-25 12:56   ` Wei Liu
@ 2016-03-28  3:46   ` Changlong Xie
  2016-03-30 14:22     ` Ian Jackson
  2016-03-30 14:38     ` Ian Jackson
  1 sibling, 2 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-28  3:46 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Wen Congyang, Li Zhijian, Gui Jianfeng,
	Jiang Yunhong, Dong Eddie, Anthony Perard, Shriram Rajagopalan,
	Yang Hongyang

 From 699f20d46fcce0bcce8fd7f7063551088a425254 Mon Sep 17 00:00:00 2001
From: Wen Congyang <wency@cn.fujitsu.com>
Date: Wed, 15 Jul 2015 17:18:53 +0800
Subject: [PATCH v13.1 23/26] COLO nic: implement COLO nic subkind

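Implement the COLO nic subkind. For each vif, it runs the colo-proxy-setup
hotplug script to set up and tear down the COLO proxy on the primary and
secondary sides.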

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
  tools/hotplug/Linux/Makefile         |   1 +
  tools/hotplug/Linux/colo-proxy-setup | 135 +++++++++++++++
  tools/libxl/Makefile                 |   1 +
  tools/libxl/libxl_colo.h             |  10 ++
  tools/libxl/libxl_colo_nic.c         | 320 
+++++++++++++++++++++++++++++++++++
  tools/libxl/libxl_internal.h         |   2 +
  tools/libxl/libxl_types.idl          |   3 +
  7 files changed, 472 insertions(+)
  create mode 100755 tools/hotplug/Linux/colo-proxy-setup
  create mode 100644 tools/libxl/libxl_colo_nic.c
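
For readers of setup_async_exec() in libxl_colo_nic.c: the script parameters
(vifname, forwarddev, mode, index, bridge) are handed over as a flat array of
alternating keys and values, terminated by NULL. The stand-alone sketch below
mirrors that layout outside of libxl. build_proxy_env() and proxy_env_get()
are hypothetical names used only for illustration, and the reading of the
array as key/value pairs is an assumption based on how the patch fills it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PROXY_ENV_SLOTS 11  /* 5 key/value pairs plus a terminating NULL */

/* Hypothetical stand-alone version of the env layout built by the patch. */
static void build_proxy_env(const char *env[PROXY_ENV_SLOTS],
                            const char *vifname, const char *forwarddev,
                            int is_primary, const char *index,
                            const char *bridge)
{
    int nr = 0;

    env[nr++] = "vifname";    env[nr++] = vifname;
    env[nr++] = "forwarddev"; env[nr++] = forwarddev;
    env[nr++] = "mode";       env[nr++] = is_primary ? "primary" : "secondary";
    env[nr++] = "index";      env[nr++] = index;
    env[nr++] = "bridge";     env[nr++] = bridge;
    env[nr++] = NULL;
    assert(nr == PROXY_ENV_SLOTS);
}

/* Return the value stored for key, or NULL if the key is absent. */
static const char *proxy_env_get(const char *env[], const char *key)
{
    size_t i;

    for (i = 0; env[i] != NULL && env[i + 1] != NULL; i += 2)
        if (strcmp(env[i], key) == 0)
            return env[i + 1];
    return NULL;
}
```

The script side then sees these as plain variables, which is why
colo-proxy-setup can check them with ": ${vifname:?}" and friends.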

diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index 6e10118..9bb852b 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -28,6 +28,7 @@ XEN_SCRIPTS += block-iscsi
  XEN_SCRIPTS += block-tap
  XEN_SCRIPTS += block-drbd-probe
  XEN_SCRIPTS += $(XEN_SCRIPTS-y)
+XEN_SCRIPTS += colo-proxy-setup

  SUBDIRS-$(CONFIG_SYSTEMD) += systemd

diff --git a/tools/hotplug/Linux/colo-proxy-setup 
b/tools/hotplug/Linux/colo-proxy-setup
new file mode 100755
index 0000000..94e2034
--- /dev/null
+++ b/tools/hotplug/Linux/colo-proxy-setup
@@ -0,0 +1,135 @@
+#! /bin/bash
+
+dir=$(dirname "$0")
+. "$dir/xen-hotplug-common.sh"
+. "$dir/hotplugpath.sh"
+
+findCommand "$@"
+
+if [ "$command" != "setup" -a  "$command" != "teardown" ]
+then
+    echo "Invalid command: $command"
+    log err "Invalid command: $command"
+    exit 1
+fi
+
+evalVariables "$@"
+
+: ${vifname:?}
+: ${forwarddev:?}
+: ${mode:?}
+: ${index:?}
+: ${bridge:?}
+
+forwardbr="colobr0"
+
+if [ "$mode" != "primary" -a "$mode" != "secondary" ]
+then
+    echo "Invalid mode: $mode"
+    log err "Invalid mode: $mode"
+    exit 1
+fi
+
+if [ $index -lt 0 ] || [ $index -gt 100 ]; then
+    echo "index overflow"
+    exit 1
+fi
+
+function setup_primary()
+{
+    do_without_error tc qdisc add dev $vifname root handle 1: prio
+    do_without_error tc filter add dev $vifname parent 1: protocol ip prio 10 \
+        u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter add dev $vifname parent 1: protocol arp prio 11 \
+        u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter add dev $vifname parent 1: protocol ipv6 prio \
+        12 u32 match u32 0 0 flowid 1:2 action mirred egress mirror \
+        dev $forwarddev
+
+    do_without_error modprobe nf_conntrack_ipv4
+    do_without_error modprobe xt_PMYCOLO sec_dev=$forwarddev
+
+    iptables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    ip6tables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    do_without_error arptables -I INPUT -i $forwarddev -j MARK --set-mark $index
+}
+
+function teardown_primary()
+{
+    do_without_error tc filter del dev $vifname parent 1: protocol ip prio 10 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter del dev $vifname parent 1: protocol arp prio 11 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter del dev $vifname parent 1: protocol ipv6 prio 12 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc qdisc del dev $vifname root handle 1: prio
+
+    do_without_error iptables -t mangle -D PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    do_without_error ip6tables -t mangle -D PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    do_without_error arptables -F
+    do_without_error rmmod xt_PMYCOLO
+}
+
+function setup_secondary()
+{
+    do_without_error brctl delif $bridge $vifname
+    do_without_error brctl addbr $forwardbr
+    do_without_error brctl addif $forwardbr $vifname
+    do_without_error brctl addif $forwardbr $forwarddev
+    do_without_error ip link set dev $forwardbr up
+    do_without_error modprobe xt_SECCOLO
+
+    iptables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+    ip6tables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+}
+
+function teardown_secondary()
+{
+    do_without_error brctl delif $forwardbr $forwarddev
+    do_without_error brctl delif $forwardbr $vifname
+    do_without_error brctl delbr $forwardbr
+    do_without_error brctl addif $bridge $vifname
+
+    do_without_error iptables -t mangle -D PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+    do_without_error ip6tables -t mangle -D PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+    do_without_error rmmod xt_SECCOLO
+}
+
+case "$command" in
+    setup)
+        if [ "$mode" = "primary" ]
+        then
+            setup_primary
+        else
+            setup_secondary
+        fi
+
+        success
+        ;;
+    teardown)
+        if [ "$mode" = "primary" ]
+        then
+            teardown_primary
+        else
+            teardown_secondary
+        fi
+        ;;
+esac
+
+if [ "$mode" = "primary" ]
+then
+    log debug "Successful colo-proxy-setup $command for $vifname." \
+              " vifname: $vifname, index: $index, forwarddev: $forwarddev."
+else
+    log debug "Successful colo-proxy-setup $command for $vifname." \
+              " vifname: $vifname, index: $index, forwarddev: $forwarddev,"\
+              " forwardbr: $forwardbr."
+fi
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 72f3b1a..a433aaa 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -68,6 +68,7 @@ LIBXL_OBJS-y += libxl_remus.o 
libxl_checkpoint_device.o libxl_remus_disk_drbd.o
  LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o
  LIBXL_OBJS-y += libxl_colo_qdisk.o
  LIBXL_OBJS-y += libxl_colo_proxy.o
+LIBXL_OBJS-y += libxl_colo_nic.o

  LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
  LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o 
libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
index a529ce8..5fbb659 100644
--- a/tools/libxl/libxl_colo.h
+++ b/tools/libxl/libxl_colo.h
@@ -40,6 +40,11 @@ enum colo_netlink_op {
      COLO_PROXY_RESET, /* UNUSED, will be used for continuous FT */
  };

+typedef struct libxl__colo_device_nic {
+    int devid;
+    const char *vif;
+} libxl__colo_device_nic;
+
  typedef struct libxl__colo_qdisk {
      bool setuped;
  } libxl__colo_qdisk;
@@ -70,6 +75,7 @@ struct libxl__colo_restore_state {
      int recv_fd;
      int hvm;
      libxl__colo_callback *callback;
+    char *colo_proxy_script;

      /* private, colo restore checkpoint state */
      libxl__domain_create_cb *saved_cb;
@@ -89,6 +95,10 @@ int init_subkind_qdisk(struct 
libxl__checkpoint_devices_state *cds);

  void cleanup_subkind_qdisk(struct libxl__checkpoint_devices_state *cds);

+int init_subkind_colo_nic(struct libxl__checkpoint_devices_state *cds);
+
+void cleanup_subkind_colo_nic(struct libxl__checkpoint_devices_state *cds);
+
  extern void libxl__colo_restore_setup(struct libxl__egc *egc,
                                        libxl__colo_restore_state *crs);
  extern void libxl__colo_restore_teardown(struct libxl__egc *egc, void 
*dcs_void,
diff --git a/tools/libxl/libxl_colo_nic.c b/tools/libxl/libxl_colo_nic.c
new file mode 100644
index 0000000..2e00c28
--- /dev/null
+++ b/tools/libxl/libxl_colo_nic.c
@@ -0,0 +1,320 @@
+/*
+ * Copyright (C) 2016 FUJITSU LIMITED
+ * Author: Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+enum {
+    primary,
+    secondary,
+};
+
+/* ========== init() and cleanup() ========== */
+
+int init_subkind_colo_nic(libxl__checkpoint_devices_state *cds)
+{
+    return 0;
+}
+
+void cleanup_subkind_colo_nic(libxl__checkpoint_devices_state *cds)
+{
+}
+
+/* ========== helper functions ========== */
+
+static void colo_save_setup_script_cb(libxl__egc *egc,
+                                     libxl__async_exec_state *aes,
+                                     int rc, int status);
+static void colo_save_teardown_script_cb(libxl__egc *egc,
+                                         libxl__async_exec_state *aes,
+                                         int rc, int status);
+
+/*
+ * If the device has a vifname, then use that instead of
+ * the vifX.Y format.
+ * it must ONLY be used for remus because if driver domains
+ * were in use it would constitute a security vulnerability.
+ */
+static const char *get_vifname(libxl__checkpoint_device *dev,
+                               const libxl_device_nic *nic)
+{
+    const char *vifname = NULL;
+    const char *path;
+    int rc;
+
+    STATE_AO_GC(dev->cds->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = dev->cds->domid;
+
+    path = GCSPRINTF("%s/backend/vif/%d/%d/vifname",
+                     libxl__xs_get_dompath(gc, 0), domid, nic->devid);
+    rc = libxl__xs_read_checked(gc, XBT_NULL, path, &vifname);
+    if (!rc && !vifname) {
+        vifname = libxl__device_nic_devname(gc, domid,
+                                            nic->devid,
+                                            nic->nictype);
+    }
+
+    return vifname;
+}
+
+/*
+ * the script needs the following env & args
+ * $vifname
+ * $forwarddev
+ * $mode(primary/secondary)
+ * $index
+ * $bridge
+ * setup/teardown as command line arg.
+ */
+static void setup_async_exec(libxl__checkpoint_device *dev, char *op,
+                             libxl__colo_proxy_state *cps, int side,
+                             char *colo_proxy_script)
+{
+    int arraysize, nr = 0;
+    char **env = NULL, **args = NULL;
+    libxl__colo_device_nic *colo_nic = dev->concrete_data;
+    libxl__checkpoint_devices_state *cds = dev->cds;
+    libxl__async_exec_state *aes = &dev->aodev.aes;
+    const libxl_device_nic *nic = dev->backend_dev;
+
+    STATE_AO_GC(cds->ao);
+
+    /* Convenience aliases */
+    const char *const vif = colo_nic->vif;
+
+    arraysize = 11;
+    GCNEW_ARRAY(env, arraysize);
+    env[nr++] = "vifname";
+    env[nr++] = libxl__strdup(gc, vif);
+    env[nr++] = "forwarddev";
+    env[nr++] = libxl__strdup(gc, nic->coloft_forwarddev);
+    env[nr++] = "mode";
+    if (side == primary)
+        env[nr++] = "primary";
+    else
+        env[nr++] = "secondary";
+    env[nr++] = "index";
+    env[nr++] = GCSPRINTF("%d", cps->index);
+    env[nr++] = "bridge";
+    env[nr++] = libxl__strdup(gc, nic->bridge);
+    env[nr++] = NULL;
+    assert(nr == arraysize);
+
+    arraysize = 3; nr = 0;
+    GCNEW_ARRAY(args, arraysize);
+    args[nr++] = colo_proxy_script;
+    args[nr++] = op;
+    args[nr++] = NULL;
+    assert(nr == arraysize);
+
+    aes->ao = dev->cds->ao;
+    aes->what = GCSPRINTF("%s %s", args[0], args[1]);
+    aes->env = env;
+    aes->args = args;
+    aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
+    aes->stdfds[0] = -1;
+    aes->stdfds[1] = -1;
+    aes->stdfds[2] = -1;
+
+    if (!strcmp(op, "teardown"))
+        aes->callback = colo_save_teardown_script_cb;
+    else
+        aes->callback = colo_save_setup_script_cb;
+}
+
+/* ========== setup() and teardown() ========== */
+
+static void colo_nic_setup(libxl__egc *egc, libxl__checkpoint_device *dev,
+                           libxl__colo_proxy_state *cps, int side,
+                           char *colo_proxy_script)
+{
+    int rc;
+    libxl__colo_device_nic *colo_nic;
+    const libxl_device_nic *nic = dev->backend_dev;
+
+    STATE_AO_GC(dev->cds->ao);
+
+    /*
+     * There's no subkind of nic devices, so nic ops always match
+     * nic devices; begin setting up the nic device.
+     */
+    dev->matched = 1;
+
+    if (!nic->coloft_forwarddev) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    GCNEW(colo_nic);
+    dev->concrete_data = colo_nic;
+    colo_nic->devid = nic->devid;
+    colo_nic->vif = get_vifname(dev, nic);
+    if (!colo_nic->vif) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    setup_async_exec(dev, "setup", cps, side, colo_proxy_script);
+    rc = libxl__async_exec_start(&dev->aodev.aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void colo_save_setup_script_cb(libxl__egc *egc,
+                                      libxl__async_exec_state *aes,
+                                      int rc, int status)
+{
+    libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+    libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__colo_device_nic *colo_nic = dev->concrete_data;
+    libxl__checkpoint_devices_state *cds = dev->cds;
+    const char *out_path_base, *hotplug_error = NULL;
+
+    EGC_GC;
+
+    /* Convenience aliases */
+    const uint32_t domid = cds->domid;
+    const int devid = colo_nic->devid;
+    const char *const vif = colo_nic->vif;
+
+    if (status && !rc)
+        rc = ERROR_FAIL;
+    if (rc)
+        goto out;
+
+    out_path_base = GCSPRINTF("%s/colo_proxy/%d",
+                              libxl__xs_libxl_path(gc, domid), devid);
+
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/hotplug-error", out_path_base),
+                                &hotplug_error);
+    if (rc)
+        goto out;
+
+    if (hotplug_error) {
+        LOG(ERROR, "colo_proxy script %s setup failed for vif %s: %s",
+            aes->args[0], vif, hotplug_error);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (status) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = 0;
+
+out:
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+static void colo_nic_teardown(libxl__egc *egc, libxl__checkpoint_device *dev,
+                              libxl__colo_proxy_state *cps, int side,
+                              char *colo_proxy_script)
+{
+    int rc;
+    libxl__colo_device_nic *colo_nic = dev->concrete_data;
+
+    if (!colo_nic || !colo_nic->vif) {
+        /* colo nic has not yet been set up, just return */
+        rc = 0;
+        goto out;
+    }
+
+    setup_async_exec(dev, "teardown", cps, side, colo_proxy_script);
+
+    rc = libxl__async_exec_start(&dev->aodev.aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void colo_save_teardown_script_cb(libxl__egc *egc,
+                                         libxl__async_exec_state *aes,
+                                         int rc, int status)
+{
+    libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+
+    if (status && !rc)
+        rc = ERROR_FAIL;
+    else
+        rc = 0;
+
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+/* ======== primary ======== */
+
+static void colo_nic_save_setup(libxl__egc *egc, libxl__checkpoint_device *dev)
+{
+    libxl__colo_save_state *css = dev->cds->concrete_data;
+
+    colo_nic_setup(egc, dev, &css->cps, primary, css->colo_proxy_script);
+}
+
+static void colo_nic_save_teardown(libxl__egc *egc,
+                                   libxl__checkpoint_device *dev)
+{
+    libxl__colo_save_state *css = dev->cds->concrete_data;
+
+    colo_nic_teardown(egc, dev, &css->cps, primary, css->colo_proxy_script);
+}
+
+const libxl__checkpoint_device_instance_ops colo_save_device_nic = {
+    .kind = LIBXL__DEVICE_KIND_VIF,
+    .setup = colo_nic_save_setup,
+    .teardown = colo_nic_save_teardown,
+};
+
+/* ======== secondary ======== */
+
+static void colo_nic_restore_setup(libxl__egc *egc,
+                                   libxl__checkpoint_device *dev)
+{
+    libxl__colo_restore_state *crs = dev->cds->concrete_data;
+
+    colo_nic_setup(egc, dev, &crs->cps, secondary, crs->colo_proxy_script);
+}
+
+static void colo_nic_restore_teardown(libxl__egc *egc,
+                                      libxl__checkpoint_device *dev)
+{
+    libxl__colo_restore_state *crs = dev->cds->concrete_data;
+
+    colo_nic_teardown(egc, dev, &crs->cps, secondary, crs->colo_proxy_script);
+}
+
+const libxl__checkpoint_device_instance_ops colo_restore_device_nic = {
+    .kind = LIBXL__DEVICE_KIND_VIF,
+    .setup = colo_nic_restore_setup,
+    .teardown = colo_nic_restore_teardown,
+};
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 8f02222..759b8d0 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3206,6 +3206,7 @@ typedef struct libxl__colo_save_state 
libxl__colo_save_state;
  struct libxl__colo_save_state {
      int send_fd;
      int recv_fd;
+    char *colo_proxy_script;

      /* private */
      libxl__stream_read_state srs;
@@ -3554,6 +3555,7 @@ struct libxl__domain_create_state {
      libxl_asyncprogress_how aop_console_how;
      /* private to domain_create */
      int guest_domid;
+    const char *colo_proxy_script;
      libxl__domain_build_state build_state;
      libxl__colo_restore_state crs;
      libxl__checkpoint_devices_state cds;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 8335291..f88fae0 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -602,6 +602,9 @@ libxl_device_nic = Struct("device_nic", [
      ("rate_bytes_per_interval", uint64),
      ("rate_interval_usecs", uint32),
      ("gatewaydev", string),
+    # Note that the COLO configuration settings should be considered unstable.
+    # They may change incompatibly in future versions of Xen.
+    ("coloft_forwarddev", string)
      ])

  libxl_device_pci = Struct("device_pci", [
-- 
1.9.3





* [PATCH v13.1 26/26] cmdline switches and config vars to control colo-proxy
  2016-03-25  6:44 ` [PATCH v13 26/26] cmdline switches and config vars to control colo-proxy Changlong Xie
@ 2016-03-28  3:47   ` Changlong Xie
  2016-03-30 14:28     ` Ian Jackson
  2016-03-30 14:42     ` Ian Jackson
  0 siblings, 2 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-28  3:47 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Wen Congyang, Li Zhijian, Gui Jianfeng,
	Jiang Yunhong, Dong Eddie, Anthony Perard, Shriram Rajagopalan,
	Yang Hongyang

 From 1bfd14622455635c6cae6130396250996e49facc Mon Sep 17 00:00:00 2001
From: Wen Congyang <wency@cn.fujitsu.com>
Date: Wed, 15 Jul 2015 17:18:56 +0800
Subject: [PATCH v13.1 26/26] cmdline switches and config vars to control 
colo-proxy

Add a '--coloft-script' cmdline switch to the 'xl migrate-receive' command
to specify a domain-specific hotplug script to set up the COLO proxy.

Add a new config var 'colo.default.proxyscript' to xl.conf that
allows the user to override the default global script used to
set up the COLO proxy.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
  docs/man/xl.conf.pod.5           |  6 +++++
  docs/man/xl.pod.1                |  7 ++++--
  tools/libxl/libxl.c              |  6 +++++
  tools/libxl/libxl_colo_restore.c |  5 +++++
  tools/libxl/libxl_create.c       |  9 ++++++--
  tools/libxl/libxl_types.idl      |  1 +
  tools/libxl/xl.c                 |  3 +++
  tools/libxl/xl.h                 |  1 +
  tools/libxl/xl_cmdimpl.c         | 47 
+++++++++++++++++++++++++++++++---------
  9 files changed, 71 insertions(+), 14 deletions(-)
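
The fallback added to libxl__colo_restore_setup() further down in this patch
(use the caller-supplied script if there is one, else colo-proxy-setup in the
Xen script directory) can be sketched in isolation as follows.
resolve_proxy_script() is a hypothetical name, and the script directory is
passed in explicitly here only because this sketch does not link against
libxl.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-alone version of the script selection:
 * prefer the override, otherwise fall back to <script_dir>/colo-proxy-setup.
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int resolve_proxy_script(const char *override, const char *script_dir,
                                char *buf, size_t len)
{
    int n;

    assert(script_dir != NULL && buf != NULL);

    if (override)
        n = snprintf(buf, len, "%s", override);
    else
        n = snprintf(buf, len, "%s/colo-proxy-setup", script_dir);

    /* snprintf returns the untruncated length; reject truncated paths. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Called with a NULL override and "/etc/xen/scripts" as the directory, this
yields the default path documented in xl.conf.pod.5 below.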

diff --git a/docs/man/xl.conf.pod.5 b/docs/man/xl.conf.pod.5
index 8ae19bb..8f7fd28 100644
--- a/docs/man/xl.conf.pod.5
+++ b/docs/man/xl.conf.pod.5
@@ -111,6 +111,12 @@ Configures the default script used by Remus to 
setup network buffering.

  Default: C</etc/xen/scripts/remus-netbuf-setup>

+=item B<colo.default.proxyscript="PATH">
+
+Configures the default script used by COLO to set up colo-proxy.
+
+Default: C</etc/xen/scripts/colo-proxy-setup>
+
  =item B<output_format="json|sxp">

  Configures the default output format used by xl when printing "machine
diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 2664402..9df3302 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -458,8 +458,6 @@ Remus support in xl is still in experimental 
(proof-of-concept) phase.
  Disk replication support is limited to DRBD disks.

  COLO support in xl is still in experimental (proof-of-concept) phase.
-There is no support for network, so the guest will confuse its network
-peers at the moment.

  =back

@@ -483,6 +481,11 @@ and it's used by secondary.
  =item B<hidden-disk>    :Primary's modified contents will be buffered 
in this
  disk, and it's used by secondary.

+(b) An example for COLO network configuration: vif =[ '...,forwarddev=xxx,...']
+
+=item B<forwarddev>     :Forward devices for primary and secondary; they are
+directly connected.
+
  Note that the COLO configuration settings should be considered 
unstable. They
  may change incompatibly in future versions of Xen.

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 63fbe16..aabf3a7 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -3378,6 +3378,11 @@ void libxl__device_nic_add(libxl__egc *egc, 
uint32_t domid,
          flexarray_append(back, nic->ifname);
      }

+    if (nic->coloft_forwarddev) {
+        flexarray_append(back, "forwarddev");
+        flexarray_append(back, nic->coloft_forwarddev);
+    }
+
      flexarray_append(back, "mac");
      flexarray_append(back,GCSPRINTF(LIBXL_MAC_FMT, 
LIBXL_MAC_BYTES(nic->mac)));
      if (nic->ip) {
@@ -3500,6 +3505,7 @@ static int libxl__device_nic_from_xs_be(libxl__gc *gc,
      nic->ip = READ_BACKEND(NOGC, "ip");
      nic->bridge = READ_BACKEND(NOGC, "bridge");
      nic->script = READ_BACKEND(NOGC, "script");
+    nic->coloft_forwarddev = READ_BACKEND(NOGC, "forwarddev");

      /* vif_ioemu nics use the same xenstore entries as vif interfaces */
      tmp = READ_BACKEND(gc, "type");
diff --git a/tools/libxl/libxl_colo_restore.c 
b/tools/libxl/libxl_colo_restore.c
index c8ad796..3483f39 100644
--- a/tools/libxl/libxl_colo_restore.c
+++ b/tools/libxl/libxl_colo_restore.c
@@ -233,6 +233,11 @@ void libxl__colo_restore_setup(libxl__egc *egc,
      crcs->crs = crs;
      crs->qdisk_setuped = false;
      crs->qdisk_used = false;
+    if (dcs->colo_proxy_script)
+        crs->colo_proxy_script = libxl__strdup(gc, dcs->colo_proxy_script);
+    else
+        crs->colo_proxy_script = GCSPRINTF("%s/colo-proxy-setup",
+                                           libxl__xen_script_dir_path());

      /* setup dsps */
      crcs->dsps.ao = ao;
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index e2ec25c..d6028aa 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1690,6 +1690,7 @@ static void domain_create_cb(libxl__egc *egc,
  static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
                              uint32_t *domid, int restore_fd, int 
send_back_fd,
                              const libxl_domain_restore_params *params,
+                            const char *colo_proxy_script,
                              const libxl_asyncop_how *ao_how,
                              const libxl_asyncprogress_how 
*aop_console_how)
  {
@@ -1713,6 +1714,7 @@ static int do_domain_create(libxl_ctx *ctx, 
libxl_domain_config *d_config,
      }
      cdcs->dcs.callback = domain_create_cb;
      cdcs->dcs.domid_soft_reset = INVALID_DOMID;
+    cdcs->dcs.colo_proxy_script = colo_proxy_script;
      libxl__ao_progress_gethow(&cdcs->dcs.aop_console_how, 
aop_console_how);
      cdcs->domid_out = domid;

@@ -1900,7 +1902,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, 
libxl_domain_config *d_config,
                              const libxl_asyncprogress_how 
*aop_console_how)
  {
      unset_disk_colo_restore(d_config);
-    return do_domain_create(ctx, d_config, domid, -1, -1, NULL,
+    return do_domain_create(ctx, d_config, domid, -1, -1, NULL, NULL,
                              ao_how, aop_console_how);
  }

@@ -1911,14 +1913,17 @@ int libxl_domain_create_restore(libxl_ctx *ctx, 
libxl_domain_config *d_config,
                                  const libxl_asyncop_how *ao_how,
                                  const libxl_asyncprogress_how 
*aop_console_how)
  {
+    char *colo_proxy_script = NULL;
+
      if (params->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
+        colo_proxy_script = params->colo_proxy_script;
          set_disk_colo_restore(d_config);
      } else {
          unset_disk_colo_restore(d_config);
      }

      return do_domain_create(ctx, d_config, domid, restore_fd, 
send_back_fd,
-                            params, ao_how, aop_console_how);
+                            params, colo_proxy_script, ao_how, 
aop_console_how);
  }

  int libxl_domain_soft_reset(libxl_ctx *ctx,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index f88fae0..165b788 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -386,6 +386,7 @@ libxl_domain_create_info = Struct("domain_create_info",[
  libxl_domain_restore_params = Struct("domain_restore_params", [
      ("checkpointed_stream", integer),
      ("stream_version", uint32, {'init_val': '1'}),
+    ("colo_proxy_script", string),
      ])

  libxl_domain_sched_params = Struct("domain_sched_params",[
diff --git a/tools/libxl/xl.c b/tools/libxl/xl.c
index dfae84a..a272258 100644
--- a/tools/libxl/xl.c
+++ b/tools/libxl/xl.c
@@ -45,6 +45,7 @@ char *default_bridge = NULL;
  char *default_gatewaydev = NULL;
  char *default_vifbackend = NULL;
  char *default_remus_netbufscript = NULL;
+char *default_colo_proxy_script = NULL;
  enum output_format default_output_format = OUTPUT_FORMAT_JSON;
  int claim_mode = 1;
  bool progress_use_cr = 0;
@@ -179,6 +180,8 @@ static void parse_global_config(const char *configfile,

      xlu_cfg_replace_string (config, "remus.default.netbufscript",
          &default_remus_netbufscript, 0);
+    xlu_cfg_replace_string (config, "colo.default.proxyscript",
+        &default_colo_proxy_script, 0);

      xlu_cfg_destroy(config);
  }
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 309627a..e601ca1 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -194,6 +194,7 @@ extern char *default_bridge;
  extern char *default_gatewaydev;
  extern char *default_vifbackend;
  extern char *default_remus_netbufscript;
+extern char *default_colo_proxy_script;
  extern char *blkdev_start;

  enum output_format {
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 25bd81a..91dcb63 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -158,6 +158,7 @@ struct domain_create {
      const char *config_file;
      char *extra_config; /* extra config string */
      const char *restore_file;
+    char *colo_proxy_script;
      int migrate_fd; /* -1 means none */
      int send_back_fd; /* -1 means none */
      char **migration_domname_r; /* from malloc */
@@ -1053,6 +1054,8 @@ static int parse_nic_config(libxl_device_nic *nic, 
XLU_Config **config, char *to
          replace_string(&nic->model, oparg);
      } else if (MATCH_OPTION("rate", token, oparg)) {
          parse_vif_rate(config, oparg, nic);
+    } else if (MATCH_OPTION("forwarddev", token, oparg)) {
+        replace_string(&nic->coloft_forwarddev, oparg);
      } else if (MATCH_OPTION("accel", token, oparg)) {
          fprintf(stderr, "the accel parameter for vifs is currently not 
supported\n");
      } else {
@@ -3001,6 +3004,7 @@ start:
          params.checkpointed_stream = dom_info->checkpointed_stream;
          params.stream_version =
              (hdr.mandatory_flags & XL_MANDATORY_FLAG_STREAMv2) ? 2 : 1;
+        params.colo_proxy_script = dom_info->colo_proxy_script;

          ret = libxl_domain_create_restore(ctx, &d_config,
                                            &domid, restore_fd,
@@ -4733,7 +4737,8 @@ static void migrate_domain(uint32_t domid, const 
char *rune, int debug,

  static void migrate_receive(int debug, int daemonize, int monitor,
                              int send_fd, int recv_fd,
-                            libxl_checkpointed_stream checkpointed)
+                            libxl_checkpointed_stream checkpointed,
+                            char *colo_proxy_script)
  {
      uint32_t domid;
      int rc, rc2;
@@ -4762,6 +4767,7 @@ static void migrate_receive(int debug, int 
daemonize, int monitor,
      dom_info.send_back_fd = send_fd;
      dom_info.migration_domname_r = &migration_domname;
      dom_info.checkpointed_stream = checkpointed;
+    dom_info.colo_proxy_script = colo_proxy_script;

      rc = create_domain(&dom_info);
      if (rc < 0) {
@@ -4955,8 +4961,11 @@ int main_migrate_receive(int argc, char **argv)
      int debug = 0, daemonize = 1, monitor = 1;
      libxl_checkpointed_stream checkpointed = LIBXL_CHECKPOINTED_STREAM_NONE;
      int opt;
+    char *script = NULL;
      static struct option opts[] = {
          {"colo", 0, 0, 0x100},
+        /* It is a shame that the management code for disk is not here. */
+        {"coloft-script", 1, 0, 0x200},
          COMMON_LONG_OPTS
      };

@@ -4977,6 +4986,9 @@ int main_migrate_receive(int argc, char **argv)
      case 0x100:
          checkpointed = LIBXL_CHECKPOINTED_STREAM_COLO;
          break;
+    case 0x200:
+        script = optarg;
+        break;
      }

      if (argc-optind != 0) {
@@ -4985,7 +4997,7 @@ int main_migrate_receive(int argc, char **argv)
      }
      migrate_receive(debug, daemonize, monitor,
                      STDOUT_FILENO, STDIN_FILENO,
-                    checkpointed);
+                    checkpointed, script);

      return 0;
  }
@@ -8395,8 +8407,10 @@ int main_remus(int argc, char **argv)
          r_info.interval = 200;

      if (libxl_defbool_val(r_info.colo)) {
-        if (r_info.interval || libxl_defbool_val(r_info.blackhole)) {
-            perror("Option -c conflicts with -i or -b");
+        if (r_info.interval || libxl_defbool_val(r_info.blackhole) ||
+            !libxl_defbool_is_default(r_info.netbuf) ||
+            !libxl_defbool_is_default(r_info.diskbuf)) {
+            perror("Option -c conflicts with -i, -d, -n or -b");
              exit(-1);
          }

@@ -8407,8 +8421,12 @@ int main_remus(int argc, char **argv)
          }
      }

-    if (!r_info.netbufscript)
-        r_info.netbufscript = default_remus_netbufscript;
+    if (!r_info.netbufscript) {
+        if (libxl_defbool_val(r_info.colo))
+            r_info.netbufscript = default_colo_proxy_script;
+        else
+            r_info.netbufscript = default_remus_netbufscript;
+    }

      if (libxl_defbool_val(r_info.blackhole)) {
          send_fd = open("/dev/null", O_RDWR, 0644);
@@ -8421,10 +8439,19 @@ int main_remus(int argc, char **argv)
          if (!ssh_command[0]) {
              rune = host;
          } else {
-            xasprintf(&rune, "exec %s %s xl migrate-receive %s %s",
-                      ssh_command, host,
-                      libxl_defbool_val(r_info.colo) ? "-c" : "-r",
-                      daemonize ? "" : " -e");
+            if (!libxl_defbool_val(r_info.colo)) {
+                xasprintf(&rune, "exec %s %s xl migrate-receive %s %s",
+                          ssh_command, host,
+                          "-r",
+                          daemonize ? "" : " -e");
+            } else {
+                xasprintf(&rune, "exec %s %s xl migrate-receive %s %s %s %s",
+                          ssh_command, host,
+                          "--colo",
+                          r_info.netbufscript ? "--coloft-script" : "",
+                          r_info.netbufscript ? r_info.netbufscript : "",
+                          daemonize ? "" : " -e");
+            }
          }

          save_domain_core_begin(domid, NULL, &config_data, &config_len);
-- 
1.9.3
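
For readers following the last hunk: the remote command ("rune") differs between Remus and COLO only in the flags passed to xl migrate-receive. A minimal sketch in C of that selection, assuming a hypothetical helper build_rune() and plain snprintf() instead of xl's xasprintf(); it also tidies the spacing, so the output is not byte-identical to what the patch produces:

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of xl: format the command that the
 * sending side runs on the receiving host.  Returns what snprintf()
 * returns.  "script" may be NULL, in which case no --coloft-script
 * option is emitted.
 */
static int build_rune(char *buf, size_t len, const char *ssh_command,
                      const char *host, int colo, const char *script,
                      int daemonize)
{
    if (!colo) {
        /* Remus: the receiver is started with -r. */
        return snprintf(buf, len, "exec %s %s xl migrate-receive -r%s",
                        ssh_command, host, daemonize ? "" : " -e");
    }

    /* COLO: --colo, plus the proxy script if one was selected. */
    return snprintf(buf, len, "exec %s %s xl migrate-receive --colo%s%s%s",
                    ssh_command, host,
                    script ? " --coloft-script " : "",
                    script ? script : "",
                    daemonize ? "" : " -e");
}
```

Calling build_rune(buf, sizeof buf, "ssh", "host2", 1, "/tmp/s", 0) would fill buf with a COLO command line ending in " -e"; the host name and script path here are placeholders.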




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
  2016-03-25 15:51 ` [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wei Liu
@ 2016-03-28  3:52   ` Changlong Xie
  2016-03-30 14:52     ` Ian Jackson
  0 siblings, 1 reply; 55+ messages in thread
From: Changlong Xie @ 2016-03-28  3:52 UTC (permalink / raw)
  To: Wei Liu
  Cc: Lars Kurth, Li Zhijian, Ian Campbell, Wen Congyang,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen devel,
	Anthony Perard, Gui Jianfeng, Shriram Rajagopalan, Ian Jackson,
	Yang Hongyang

On 03/25/2016 11:51 PM, Wei Liu wrote:
> On Fri, Mar 25, 2016 at 02:44:07PM +0800, Changlong Xie wrote:
>> This patchset implements the COLO feature for Xen.
>> For details on installing and using the COLO feature, refer to:
>> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
>>
>> You can get the code from here:
>> https://github.com/Pating/xen/tree/changlox/colo_v13
>>
>> Changelog from v12 to v13
>> 1. Rebase to the upstream xen
>> 2. Address comments from Ian and Liu Wei.
>> p7, Add A-B
>> p8, Add A-B
>> p10, Add A-B
>> p11, Add A-B
>> p12, Add LOG(ERROR, )
>> p13, Add A-B
>> p14, Remove libxl__ao_complete(xxx)
>> p15, Add A-B
>> p16, Add A-B
>> p17, Add A-B, replace "-c" with "--colo" for migrate-receive()
>> p19, Add A-B, introduce "switch ... case ..."
>> p21, Add A-B
>> p22, Add A-B
>> p23, replace "forwarddev" with "coloft_forwarddev"
>> p24, Add A-B
>> p25, Add A-B
>> p26, replace "--script" with "--coloft-script"
>
> I went over those unacked patches. The major thing I found is that you
> didn't add in the warning text as Ian suggested. I've pointed out one
> instance where you should add that. However, xl manpage and libxl header
> file changes are spread across multiple commits, so I'm not quite sure
> which particular commit you should add the warning text to.
>

https://github.com/Pating/xen/tree/changlox/colo_v13_fixup

I have just updated p20, p23 and p26 as Ian suggested.

Thanks
	-Xie
> I propose that you submit a separate patch to xl manpage and libxl
> header file for adding warning text after the majority part of this
> series is merged.
>
> Wei.



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 12/26] secondary vm suspend/resume/checkpoint code
  2016-03-25  6:44 ` [PATCH v13 12/26] secondary vm suspend/resume/checkpoint code Changlong Xie
@ 2016-03-30 14:07   ` Ian Jackson
  0 siblings, 0 replies; 55+ messages in thread
From: Ian Jackson @ 2016-03-30 14:07 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Lars Kurth, Li Zhijian, Wei Liu, Ian Campbell, Wen Congyang,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen devel,
	Anthony Perard, Gui Jianfeng, Shriram Rajagopalan, Yang Hongyang

Changlong Xie writes ("[PATCH v13 12/26] secondary vm suspend/resume/checkpoint code"):
> From: Wen Congyang <wency@cn.fujitsu.com>
> 
> Secondary vm is running in colo mode. So we will do
> the following things again and again:
> 1. Resume secondary vm
>    a. Send CHECKPOINT_SVM_READY to master.
>    b. If it is not the first resume, call libxl__checkpoint_devices_preresume().
>    c. If it is the first resume (resume right after live migration),
>       - call libxl__xc_domain_restore_done() to build the secondary vm.
>       - enable secondary vm's logdirty.
>       - call libxl__domain_resume() to resume secondary vm.
>       - call libxl__checkpoint_devices_setup() to set up checkpoint devices.
>    d. Send CHECKPOINT_SVM_RESUMED to master.
> 2. Wait for a new checkpoint
>    a. Call libxl__checkpoint_devices_commit().
>    b. Read CHECKPOINT_NEW from master.
> 3. Suspend secondary vm
>    a. Suspend secondary vm.
>    b. Call libxl__checkpoint_devices_postsuspend().
>    c. Send CHECKPOINT_SVM_SUSPENDED to master.
> 4. Checkpoint
>    a. Read emulator xenstore data and emulator context
>    b. REC_TYPE_CHECKPOINT_END
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
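
The cycle in the quoted commit message can be read as a four-phase loop. A minimal sketch in C, assuming hypothetical names (svm_phase, svm_next, svm_reply) that do not appear in the series; only the CHECKPOINT_* message names are taken from the commit message, and the real code drives these steps through libxl callbacks rather than a plain state table:

```c
#include <stddef.h>

/* Hypothetical phase names; the series does not define this enum. */
enum svm_phase {
    SVM_RESUME,          /* step 1: resume secondary vm */
    SVM_WAIT_CHECKPOINT, /* step 2: wait for CHECKPOINT_NEW */
    SVM_SUSPEND,         /* step 3: suspend secondary vm */
    SVM_CHECKPOINT       /* step 4: read emulator state */
};

/* The secondary repeats the four phases in order, forever. */
static enum svm_phase svm_next(enum svm_phase p)
{
    switch (p) {
    case SVM_RESUME:          return SVM_WAIT_CHECKPOINT;
    case SVM_WAIT_CHECKPOINT: return SVM_SUSPEND;
    case SVM_SUSPEND:         return SVM_CHECKPOINT;
    case SVM_CHECKPOINT:      return SVM_RESUME;
    }
    return SVM_RESUME; /* not reached for valid input */
}

/*
 * Message the secondary sends to the master once a phase has finished,
 * or NULL where the commit message lists none.
 */
static const char *svm_reply(enum svm_phase p)
{
    switch (p) {
    case SVM_RESUME:  return "CHECKPOINT_SVM_RESUMED";
    case SVM_SUSPEND: return "CHECKPOINT_SVM_SUSPENDED";
    default:          return NULL;
    }
}
```

The sketch leaves out CHECKPOINT_SVM_READY, which per the commit message is sent at the start of the resume phase rather than at its end.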

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 14/26] primary vm suspend/resume/checkpoint code
  2016-03-25  6:44 ` [PATCH v13 14/26] primary vm suspend/resume/checkpoint code Changlong Xie
@ 2016-03-30 14:10   ` Ian Jackson
  0 siblings, 0 replies; 55+ messages in thread
From: Ian Jackson @ 2016-03-30 14:10 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Lars Kurth, Li Zhijian, Wei Liu, Ian Campbell, Wen Congyang,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen devel,
	Anthony Perard, Gui Jianfeng, Shriram Rajagopalan, Yang Hongyang

Changlong Xie writes ("[PATCH v13 14/26] primary vm suspend/resume/checkpoint code"):
> From: Wen Congyang <wency@cn.fujitsu.com>
> 
> We will do the following things again and again:

Thanks for changing the error paths and callback handling.  I haven't
reviewed the complete flow in detail, but the general pattern looks
plausible AFAICT.

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13.1 20/26] Support colo mode for qemu disk
  2016-03-28  3:46   ` [PATCH v13.1 " Changlong Xie
@ 2016-03-30 14:17     ` Ian Jackson
  2016-03-30 14:36     ` Ian Jackson
  1 sibling, 0 replies; 55+ messages in thread
From: Ian Jackson @ 2016-03-30 14:17 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Lars Kurth, Li Zhijian, Wei Liu, Ian Campbell, Wen Congyang,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen devel,
	Anthony Perard, Gui Jianfeng, Shriram Rajagopalan, Yang Hongyang

Changlong Xie writes ("[PATCH v13.1 20/26] Support colo mode for qemu disk"):
>  From 468ff9fb2f6699314c28f30a7d7d09eac9aa6756 Mon Sep 17 00:00:00 2001
> From: Wen Congyang <wency@cn.fujitsu.com>
> Date: Mon, 21 Mar 2016 15:38:30 +0800
> Subject: [PATCH v13.1 20/26] Support colo mode for qemu disk
> 
> Usage: disk = ['...,colo,colo-host=xxx,colo-port=xxx,colo-export=xxx,active-disk=xxx,hidden-disk=xxx...']
> For QEMU block replication details:
> http://wiki.qemu.org/Features/BlockReplication
> 
> Note: we just introduce COLO framework, but don't implement COLO
> operations in this patch.
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>

Thanks.

I diffed this patch from v13 and got a bunch of formatting
differences.  I think maybe this patch has gone through an unusual
email/cut-and-paste/whatever path.

I.e., I think this revised patch has probably been mangled, but it is
probably fine in its un-mangled form.

Can you provide the whole of the v13.1 series as a git branch ?

Ian.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13.1 23/26] COLO nic: implement COLO nic subkind
  2016-03-28  3:46   ` [PATCH v13.1 " Changlong Xie
@ 2016-03-30 14:22     ` Ian Jackson
  2016-03-30 14:38     ` Ian Jackson
  1 sibling, 0 replies; 55+ messages in thread
From: Ian Jackson @ 2016-03-30 14:22 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Lars Kurth, Li Zhijian, Wei Liu, Ian Campbell, Wen Congyang,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen devel,
	Anthony Perard, Gui Jianfeng, Shriram Rajagopalan, Yang Hongyang

Changlong Xie writes ("[PATCH v13.1 23/26] COLO nic: implement COLO nic subkind"):
>  From 699f20d46fcce0bcce8fd7f7063551088a425254 Mon Sep 17 00:00:00 2001
> From: Wen Congyang <wency@cn.fujitsu.com>
> Date: Wed, 15 Jul 2015 17:18:53 +0800
> Subject: [PATCH v13.1 23/26] COLO nic: implement COLO nic subkind
> 
> implement COLO nic subkind.

Again, this patch seems to have been mangled somehow.

Sorry,
Ian.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 25/26] setup and control colo proxy on secondary side
  2016-03-25  6:44 ` [PATCH v13 25/26] setup and control colo proxy on secondary side Changlong Xie
@ 2016-03-30 14:24   ` Ian Jackson
  2016-03-31  2:19     ` Changlong Xie
  0 siblings, 1 reply; 55+ messages in thread
From: Ian Jackson @ 2016-03-30 14:24 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Lars Kurth, Li Zhijian, Wei Liu, Ian Campbell, Wen Congyang,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen devel,
	Anthony Perard, Gui Jianfeng, Shriram Rajagopalan, Yang Hongyang

Changlong Xie writes ("[PATCH v13 25/26] setup and control colo proxy on secondary side"):
> From: Wen Congyang <wency@cn.fujitsu.com>
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>

I think I acked this in v12.  I guess you probably overlooked that,
but your v13 00/26 says

  p25, Add A-B

Can you please check that you didn't mistakenly add the acked-by to
the wrong patch ?

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13.1 26/26] cmdline switches and config vars to control colo-proxy
  2016-03-28  3:47   ` [PATCH v13.1 " Changlong Xie
@ 2016-03-30 14:28     ` Ian Jackson
  2016-03-30 14:42     ` Ian Jackson
  1 sibling, 0 replies; 55+ messages in thread
From: Ian Jackson @ 2016-03-30 14:28 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Lars Kurth, Li Zhijian, Wei Liu, Ian Campbell, Wen Congyang,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen devel,
	Anthony Perard, Gui Jianfeng, Shriram Rajagopalan, Yang Hongyang

Changlong Xie writes ("[PATCH v13.1 26/26] cmdline switches and config vars to control colo-proxy"):
>  From 1bfd14622455635c6cae6130396250996e49facc Mon Sep 17 00:00:00 2001
> From: Wen Congyang <wency@cn.fujitsu.com>
> Date: Wed, 15 Jul 2015 17:18:56 +0800
> Subject: [PATCH v13.1 26/26] cmdline switches and config vars to control colo-proxy
> 
> Add cmdline switches to the 'xl migrate-receive' command to specify
> a domain-specific hotplug script to set up the COLO proxy.
> 
> Add a new config var 'colo.default.agentscript' to xl.conf that
> allows the user to override the default global script used to
> set up the COLO proxy.

Thanks.

Again, this v13.1 has been mangled somehow.  I think this is probably
fine, but it's hard to tell.

In general, it would have been helpful to know what intentional
changes there were between each v13 and the corresponding v13.1.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13.1 20/26] Support colo mode for qemu disk
  2016-03-28  3:46   ` [PATCH v13.1 " Changlong Xie
  2016-03-30 14:17     ` Ian Jackson
@ 2016-03-30 14:36     ` Ian Jackson
  1 sibling, 0 replies; 55+ messages in thread
From: Ian Jackson @ 2016-03-30 14:36 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Lars Kurth, Li Zhijian, Wei Liu, Ian Campbell, Wen Congyang,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen devel,
	Anthony Perard, Gui Jianfeng, Shriram Rajagopalan, Yang Hongyang

Changlong Xie writes ("[PATCH v13.1 20/26] Support colo mode for qemu disk"):
>  From 468ff9fb2f6699314c28f30a7d7d09eac9aa6756 Mon Sep 17 00:00:00 2001
> From: Wen Congyang <wency@cn.fujitsu.com>
> Date: Mon, 21 Mar 2016 15:38:30 +0800
> Subject: [PATCH v13.1 20/26] Support colo mode for qemu disk

I found a proper copy of this in:
  https://github.com/Pating/xen.git#b226be1b0751

That commit is

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Ian.

From b226be1b0751d675dcd604c4a1dbeec5757bb051 Mon Sep 17 00:00:00 2001
From: Wen Congyang <wency@cn.fujitsu.com>
Date: Mon, 21 Mar 2016 15:38:30 +0800
Subject: [PATCH 21/28] Support colo mode for qemu disk

Usage: disk = ['...,colo,colo-host=xxx,colo-port=xxx,colo-export=xxx,active-disk=xxx,hidden-disk=xxx...']
For QEMU block replication details:
http://wiki.qemu.org/Features/BlockReplication

Note: we just introduce COLO framework, but don't implement COLO
operations in this patch.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
 docs/man/xl.pod.1                   |   41 +++++++++++--
 docs/misc/xl-disk-configuration.txt |   55 +++++++++++++++++
 tools/libxl/libxl.c                 |   51 +++++++++++++++-
 tools/libxl/libxl_create.c          |   26 +++++++-
 tools/libxl/libxl_device.c          |   11 ++++
 tools/libxl/libxl_dm.c              |  113 ++++++++++++++++++++++++++++++++++-
 tools/libxl/libxl_types.idl         |    9 +++
 tools/libxl/libxlu_disk_l.l         |   19 ++++++
 8 files changed, 317 insertions(+), 8 deletions(-)
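
As a reading aid for the libxl_dm.c hunks below: each disk is classified as primary, secondary or non-COLO from the two defbools, and the -drive string is chosen accordingly. A minimal standalone sketch in C, assuming hypothetical names (colo_mode, pick_colo_mode, ide_drive_string) and snprintf() in place of GCSPRINTF(); the format strings are copied from the IDE case of the patch:

```c
#include <stdbool.h>
#include <stdio.h>

/* Mirrors the LIBXL__COLO_* values; the names here are hypothetical. */
enum colo_mode { COLO_NONE = 0, COLO_PRIMARY, COLO_SECONDARY };

/* colo_restore_enable is only consulted when colo_enable is set. */
static enum colo_mode pick_colo_mode(bool colo_enable,
                                     bool colo_restore_enable)
{
    if (!colo_enable)
        return COLO_NONE;
    return colo_restore_enable ? COLO_SECONDARY : COLO_PRIMARY;
}

/* Every case returns, so no case can fall through into another. */
static int ide_drive_string(char *buf, size_t len, enum colo_mode mode,
                            int unit, const char *pdev_path,
                            const char *format, const char *exportname,
                            const char *active_disk,
                            const char *hidden_disk)
{
    switch (mode) {
    case COLO_NONE:
        return snprintf(buf, len,
            "file=%s,if=ide,index=%d,media=disk,format=%s,cache=writeback",
            pdev_path, unit, format);
    case COLO_PRIMARY:
        return snprintf(buf, len,
            "if=ide,index=%d,media=disk,cache=writeback,driver=quorum,"
            "id=%s,children.0.file.filename=%s,children.0.driver=%s,"
            "read-pattern=fifo,vote-threshold=1",
            unit, exportname, pdev_path, format);
    case COLO_SECONDARY:
        return snprintf(buf, len,
            "if=ide,index=%d,media=disk,cache=writeback,driver=replication,"
            "mode=secondary,file.driver=qcow2,file.file.filename=%s,"
            "file.backing.driver=qcow2,file.backing.file.filename=%s,"
            "file.backing.backing=%s",
            unit, active_disk, hidden_disk, exportname);
    }
    return -1; /* unknown mode */
}
```

Returning from each case is a design choice of this sketch only; the patch assigns to a local and relies on a break per case.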

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index a992a45..2664402 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -450,12 +450,43 @@ Print huge (!) amount of debug during the migration process.
 Enable Remus HA or COLO HA for domain. By default B<xl> relies on ssh as a
 transport mechanism between the two hosts.
 
-N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
-     Disk replication support is limited to DRBD disks.
+B<NOTES>
+
+=over 4
+
+Remus support in xl is still in experimental (proof-of-concept) phase.
+Disk replication support is limited to DRBD disks.
+
+COLO support in xl is still in experimental (proof-of-concept) phase.
+There is no support for network, so the guest will confuse its network
+peers at the moment.
 
-     COLO support in xl is still in experimental (proof-of-concept) phase.
-     There is no support for network or disk, so the guest will corrupt its
-     disk and confuse its network peers at the moment.
+=back
+
+B<EXAMPLE>
+
+=over 4
+
+(a) An example of a COLO replication configuration:
+disk = ['...,colo,colo-host=xxx,colo-port=xxx,colo-export=xxx,active-disk=xxx,hidden-disk=xxx...']
+
+=item B<colo-host>      :Secondary host's IP address.
+
+=item B<colo-port>      :Secondary host's port. An NBD server runs on the
+secondary host and listens on this port.
+
+=item B<colo-export>    :Disk export name of the NBD server on the secondary host.
+
+=item B<active-disk>    :The secondary guest's writes are buffered in this disk;
+it is used by the secondary.
+
+=item B<hidden-disk>    :The primary's modified contents are buffered in this
+disk; it is used by the secondary.
+
+Note that the COLO configuration settings should be considered unstable. They
+may change incompatibly in future versions of Xen.
+
+=back
 
 B<OPTIONS>
 
diff --git a/docs/misc/xl-disk-configuration.txt b/docs/misc/xl-disk-configuration.txt
index 29f6ddb..b3402bc 100644
--- a/docs/misc/xl-disk-configuration.txt
+++ b/docs/misc/xl-disk-configuration.txt
@@ -234,6 +234,61 @@ were intentionally created non-sparse to avoid fragmentation of the
 file.
 
 
+===============
+COLO PARAMETERS
+===============
+
+
+colo
+----
+
+Enable COLO HA for disk. For a better understanding of block replication
+in QEMU, please refer to:
+http://wiki.qemu.org/Features/BlockReplication
+Note that the COLO configuration settings should be considered unstable.
+They may change incompatibly in future versions of Xen.
+
+
+colo-host
+---------
+
+Description:           Secondary host's address
+Mandatory:             Yes when COLO enabled
+
+
+colo-port
+---------
+
+Description:           Secondary port
+                       We will run an NBD server on the secondary host,
+                       and the NBD server will listen on this port.
+Mandatory:             Yes when COLO enabled
+
+
+colo-export
+-----------
+
+Description:           We will run an NBD server on the secondary host;
+                       colo-export is the NBD server's disk export name.
+Mandatory:             Yes when COLO enabled
+
+
+active-disk
+-----------
+
+Description:           This is used by secondary. Secondary guest's write
+                       will be buffered in this disk.
+Mandatory:             Yes when COLO enabled
+
+
+hidden-disk
+-----------
+
+Description:           This is used by secondary. It buffers the original
+                       content that is modified by the primary VM.
+Mandatory:             Yes when COLO enabled
+
+
 ============================================
 DEPRECATED PARAMETERS, PREFIXES AND SYNTAXES
 ============================================
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 349a3c6..63fbe16 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -2306,6 +2306,8 @@ int libxl__device_disk_setdefault(libxl__gc *gc, libxl_device_disk *disk)
     int rc;
 
     libxl_defbool_setdefault(&disk->discard_enable, !!disk->readwrite);
+    libxl_defbool_setdefault(&disk->colo_enable, false);
+    libxl_defbool_setdefault(&disk->colo_restore_enable, false);
 
     rc = libxl__resolve_domid(gc, disk->backend_domname, &disk->backend_domid);
     if (rc < 0) return rc;
@@ -2504,6 +2506,18 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid,
                 flexarray_append(back, "params");
                 flexarray_append(back, GCSPRINTF("%s:%s",
                               libxl__device_disk_string_of_format(disk->format), disk->pdev_path));
+                if (libxl_defbool_val(disk->colo_enable)) {
+                    flexarray_append(back, "colo-host");
+                    flexarray_append(back, libxl__sprintf(gc, "%s", disk->colo_host));
+                    flexarray_append(back, "colo-port");
+                    flexarray_append(back, libxl__sprintf(gc, "%d", disk->colo_port));
+                    flexarray_append(back, "colo-export");
+                    flexarray_append(back, libxl__sprintf(gc, "%s", disk->colo_export));
+                    flexarray_append(back, "active-disk");
+                    flexarray_append(back, libxl__sprintf(gc, "%s", disk->active_disk));
+                    flexarray_append(back, "hidden-disk");
+                    flexarray_append(back, libxl__sprintf(gc, "%s", disk->hidden_disk));
+                }
                 assert(device->backend_kind == LIBXL__DEVICE_KIND_QDISK);
                 break;
             default:
@@ -2619,7 +2633,12 @@ static int libxl__device_disk_from_xs_be(libxl__gc *gc,
         goto cleanup;
     }
 
-    /* "params" may not be present; but everything else must be. */
+    /*
+     * "params" may not be present; but everything else must be.
+     * COLO-related entries (colo-host, colo-port, colo-export,
+     * active-disk and hidden-disk) are present only if colo is
+     * enabled.
+     */
     tmp = xs_read(ctx->xsh, XBT_NULL,
                   GCSPRINTF("%s/params", be_path), &len);
     if (tmp && strchr(tmp, ':')) {
@@ -2629,6 +2648,36 @@ static int libxl__device_disk_from_xs_be(libxl__gc *gc,
         disk->pdev_path = tmp;
     }
 
+    tmp = xs_read(ctx->xsh, XBT_NULL,
+                  GCSPRINTF("%s/colo-host", be_path), &len);
+    if (tmp) {
+        libxl_defbool_set(&disk->colo_enable, true);
+        disk->colo_host = tmp;
+
+        tmp = xs_read(ctx->xsh, XBT_NULL,
+                      GCSPRINTF("%s/colo-port", be_path), &len);
+        if (!tmp) {
+            LOG(ERROR, "Missing xenstore node %s/colo-port", be_path);
+            goto cleanup;
+        }
+        disk->colo_port = atoi(tmp);
+
+#define XS_READ_COLO(param, item) do {                                  \
+        tmp = xs_read(ctx->xsh, XBT_NULL,                               \
+                      GCSPRINTF("%s/"#param"", be_path), &len);         \
+        if (!tmp) {                                                     \
+            LOG(ERROR, "Missing xenstore node %s/"#param"", be_path);   \
+            goto cleanup;                                               \
+        }                                                               \
+        disk->item = tmp;                                               \
+} while (0)
+        XS_READ_COLO(colo-export, colo_export);
+        XS_READ_COLO(active-disk, active_disk);
+        XS_READ_COLO(hidden-disk, hidden_disk);
+#undef XS_READ_COLO
+    } else {
+        libxl_defbool_set(&disk->colo_enable, false);
+    }
 
     tmp = libxl__xs_read(gc, XBT_NULL,
                          GCSPRINTF("%s/type", be_path));
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index be604e5..e2ec25c 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1876,12 +1876,30 @@ static void domain_create_cb(libxl__egc *egc,
 
     libxl__ao_complete(egc, ao, rc);
 }
-    
+
+
+static void set_disk_colo_restore(libxl_domain_config *d_config)
+{
+    int i;
+
+    for (i = 0; i < d_config->num_disks; i++)
+        libxl_defbool_set(&d_config->disks[i].colo_restore_enable, true);
+}
+
+static void unset_disk_colo_restore(libxl_domain_config *d_config)
+{
+    int i;
+
+    for (i = 0; i < d_config->num_disks; i++)
+        libxl_defbool_set(&d_config->disks[i].colo_restore_enable, false);
+}
+
 int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             uint32_t *domid,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
+    unset_disk_colo_restore(d_config);
     return do_domain_create(ctx, d_config, domid, -1, -1, NULL,
                             ao_how, aop_console_how);
 }
@@ -1893,6 +1911,12 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
 {
+    if (params->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
+        set_disk_colo_restore(d_config);
+    } else {
+        unset_disk_colo_restore(d_config);
+    }
+
     return do_domain_create(ctx, d_config, domid, restore_fd, send_back_fd,
                             params, ao_how, aop_console_how);
 }
diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index 4ced9b6..6a411c6 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -196,6 +196,9 @@ static int disk_try_backend(disk_try_backend_args *a,
             goto bad_format;
         }
 
+        if (libxl_defbool_val(a->disk->colo_enable))
+            goto bad_colo;
+
         if (a->disk->backend_domid != LIBXL_TOOLSTACK_DOMID) {
             LOG(DEBUG, "Disk vdev=%s, is using a storage driver domain, "
                        "skipping physical device check", a->disk->vdev);
@@ -218,6 +221,9 @@ static int disk_try_backend(disk_try_backend_args *a,
     case LIBXL_DISK_BACKEND_TAP:
         if (a->disk->script) goto bad_script;
 
+        if (libxl_defbool_val(a->disk->colo_enable))
+            goto bad_colo;
+
         if (a->disk->is_cdrom) {
             LOG(DEBUG, "Disk vdev=%s, backend tap unsuitable for cdroms",
                        a->disk->vdev);
@@ -256,6 +262,11 @@ static int disk_try_backend(disk_try_backend_args *a,
     LOG(DEBUG, "Disk vdev=%s, backend %s not compatible with script=...",
         a->disk->vdev, libxl_disk_backend_to_string(backend));
     return 0;
+
+ bad_colo:
+    LOG(DEBUG, "Disk vdev=%s, backend %s not compatible with colo",
+        a->disk->vdev, libxl_disk_backend_to_string(backend));
+    return 0;
 }
 
 int libxl__device_disk_set_backend(libxl__gc *gc, libxl_device_disk *disk) {
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 1d1b25b..4c3dff8 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -754,6 +754,8 @@ static int libxl__dm_runas_helper(libxl__gc *gc, const char *username)
 /* colo mode */
 enum {
     LIBXL__COLO_NONE = 0,
+    LIBXL__COLO_PRIMARY,
+    LIBXL__COLO_SECONDARY,
 };
 
 static char *qemu_disk_scsi_drive_string(libxl__gc *gc, const char *pdev_path,
@@ -762,6 +764,9 @@ static char *qemu_disk_scsi_drive_string(libxl__gc *gc, const char *pdev_path,
                                          int colo_mode)
 {
     char *drive = NULL;
+    const char *exportname = disk->colo_export;
+    const char *active_disk = disk->active_disk;
+    const char *hidden_disk = disk->hidden_disk;
 
     switch (colo_mode) {
     case LIBXL__COLO_NONE:
@@ -769,6 +774,46 @@ static char *qemu_disk_scsi_drive_string(libxl__gc *gc, const char *pdev_path,
             (gc, "file=%s,if=scsi,bus=0,unit=%d,format=%s,cache=writeback",
              pdev_path, unit, format);
         break;
+    case LIBXL__COLO_PRIMARY:
+        /*
+         * primary:
+         *  -drive if=scsi,bus=0,unit=x,cache=writeback,driver=quorum,\
+         *  id=exportname,\
+         *  children.0.file.filename=pdev_path,\
+         *  children.0.driver=format,\
+         *  read-pattern=fifo,\
+         *  vote-threshold=1
+         */
+        drive = GCSPRINTF(
+            "if=scsi,bus=0,unit=%d,cache=writeback,driver=quorum,"
+            "id=%s,"
+            "children.0.file.filename=%s,"
+            "children.0.driver=%s,"
+            "read-pattern=fifo,"
+            "vote-threshold=1",
+            unit, exportname, pdev_path, format);
+        break;
+    case LIBXL__COLO_SECONDARY:
+        /*
+         * secondary:
+         *  -drive if=scsi,bus=0,unit=x,cache=writeback,driver=replication,\
+         *  mode=secondary,\
+         *  file.driver=qcow2,\
+         *  file.file.filename=active_disk,\
+         *  file.backing.driver=qcow2,\
+         *  file.backing.file.filename=hidden_disk,\
+         *  file.backing.backing=exportname,
+         */
+        drive = GCSPRINTF(
+            "if=scsi,bus=0,unit=%d,cache=writeback,driver=replication,"
+            "mode=secondary,"
+            "file.driver=qcow2,"
+            "file.file.filename=%s,"
+            "file.backing.driver=qcow2,"
+            "file.backing.file.filename=%s,"
+            "file.backing.backing=%s",
+            unit, active_disk, hidden_disk, exportname);
+        break;
     default:
          abort();
     }
@@ -782,6 +826,9 @@ static char *qemu_disk_ide_drive_string(libxl__gc *gc, const char *pdev_path,
                                         int colo_mode)
 {
     char *drive = NULL;
+    const char *exportname = disk->colo_export;
+    const char *active_disk = disk->active_disk;
+    const char *hidden_disk = disk->hidden_disk;
 
     switch (colo_mode) {
     case LIBXL__COLO_NONE:
@@ -789,6 +836,46 @@ static char *qemu_disk_ide_drive_string(libxl__gc *gc, const char *pdev_path,
             ("file=%s,if=ide,index=%d,media=disk,format=%s,cache=writeback",
              pdev_path, unit, format);
         break;
+    case LIBXL__COLO_PRIMARY:
+        /*
+         * primary:
+         *  -drive if=ide,index=x,media=disk,cache=writeback,driver=quorum,\
+         *  id=exportname,\
+         *  children.0.file.filename=pdev_path,\
+         *  children.0.driver=format,\
+         *  read-pattern=fifo,\
+         *  vote-threshold=1
+         */
+        drive = GCSPRINTF(
+            "if=ide,index=%d,media=disk,cache=writeback,driver=quorum,"
+            "id=%s,"
+            "children.0.file.filename=%s,"
+            "children.0.driver=%s,"
+            "read-pattern=fifo,"
+            "vote-threshold=1",
+             unit, exportname, pdev_path, format);
+        break;
+    case LIBXL__COLO_SECONDARY:
+        /*
+         * secondary:
+         *  -drive if=ide,index=x,media=disk,cache=writeback,driver=replication,\
+         *  mode=secondary,\
+         *  file.driver=qcow2,\
+         *  file.file.filename=active_disk,\
+         *  file.backing.driver=qcow2,\
+         *  file.backing.file.filename=hidden_disk,\
+         *  file.backing.backing=exportname,
+         */
+        drive = GCSPRINTF(
+            "if=ide,index=%d,media=disk,cache=writeback,driver=replication,"
+            "mode=secondary,"
+            "file.driver=qcow2,"
+            "file.file.filename=%s,"
+            "file.backing.driver=qcow2,"
+            "file.backing.file.filename=%s,"
+            "file.backing.backing=%s",
+            unit, active_disk, hidden_disk, exportname);
+        break;
     default:
          abort();
     }
@@ -1261,8 +1348,24 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
                  * hd[a-d] and ignore the rest.
                  */
 
-                colo_mode = LIBXL__COLO_NONE;
+                if (libxl_defbool_val(disks[i].colo_enable)) {
+                    if (libxl_defbool_val(disks[i].colo_restore_enable))
+                        colo_mode = LIBXL__COLO_SECONDARY;
+                    else
+                        colo_mode = LIBXL__COLO_PRIMARY;
+                } else {
+                    colo_mode = LIBXL__COLO_NONE;
+                }
+
                 if (strncmp(disks[i].vdev, "sd", 2) == 0) {
+                    if (colo_mode == LIBXL__COLO_SECONDARY) {
+                        drive = libxl__sprintf
+                            (gc, "if=none,driver=%s,file=%s,id=%s",
+                             format, pdev_path, disks[i].colo_export);
+
+                        flexarray_append(dm_args, "-drive");
+                        flexarray_append(dm_args, drive);
+                    }
                     drive = qemu_disk_scsi_drive_string(gc, pdev_path, disk,
                                                         format,
                                                         &disks[i],
@@ -1289,6 +1392,14 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
                         LOG(ERROR, "qemu-xen doesn't support read-only IDE disk drivers");
                         return ERROR_INVAL;
                     }
+                    if (colo_mode == LIBXL__COLO_SECONDARY) {
+                        drive = libxl__sprintf
+                            (gc, "if=none,driver=%s,file=%s,id=%s",
+                             format, pdev_path, disks[i].colo_export);
+
+                        flexarray_append(dm_args, "-drive");
+                        flexarray_append(dm_args, drive);
+                    }
                     drive = qemu_disk_ide_drive_string(gc, pdev_path, disk,
                                                        format,
                                                        &disks[i],
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 95efd82..8335291 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -576,6 +576,15 @@ libxl_device_disk = Struct("device_disk", [
     ("is_cdrom", integer),
     ("direct_io_safe", bool),
     ("discard_enable", libxl_defbool),
+    # Note that the COLO configuration settings should be considered unstable.
+    # They may change incompatibly in future versions of Xen.
+    ("colo_enable", libxl_defbool),
+    ("colo_restore_enable", libxl_defbool),
+    ("colo_host", string),
+    ("colo_port", integer),
+    ("colo_export", string),
+    ("active_disk", string),
+    ("hidden_disk", string)
     ])
 
 libxl_device_nic = Struct("device_nic", [
diff --git a/tools/libxl/libxlu_disk_l.l b/tools/libxl/libxlu_disk_l.l
index 1a5deb5..5b6db22 100644
--- a/tools/libxl/libxlu_disk_l.l
+++ b/tools/libxl/libxlu_disk_l.l
@@ -113,6 +113,16 @@ static void setbackendtype(DiskParseContext *dpc, const char *str) {
     else xlu__disk_err(dpc,str,"unknown value for backendtype");
 }
 
+/* Sets ->colo_port from the string.  COLO needs this. */
+static void setcoloport(DiskParseContext *dpc, const char *str) {
+    int port = atoi(str);
+    if (port) {
+        dpc->disk->colo_port = port;
+    } else {
+        xlu__disk_err(dpc,str,"unknown value for colo_port");
+    }
+}
+
 #define DEPRECATE(usewhatinstead) /* not currently reported */
 
 /* Handles a vdev positional parameter which includes a devtype. */
@@ -176,6 +186,15 @@ script=[^,]*,?	{ STRIP(','); SAVESTRING("script", script, FROMEQUALS); }
 direct-io-safe,? { DPC->disk->direct_io_safe = 1; }
 discard,?	{ libxl_defbool_set(&DPC->disk->discard_enable, true); }
 no-discard,?	{ libxl_defbool_set(&DPC->disk->discard_enable, false); }
+ /* Note that the COLO configuration settings should be considered unstable.
+  * They may change incompatibly in future versions of Xen. */
+colo,?		{ libxl_defbool_set(&DPC->disk->colo_enable, true); }
+no-colo,?	{ libxl_defbool_set(&DPC->disk->colo_enable, false); }
+colo-host=[^,]*,?	{ STRIP(','); SAVESTRING("colo-host", colo_host, FROMEQUALS); }
+colo-port=[^,]*,?	{ STRIP(','); setcoloport(DPC, FROMEQUALS); }
+colo-export=[^,]*,?	{ STRIP(','); SAVESTRING("colo-export", colo_export, FROMEQUALS); }
+active-disk=[^,]*,?	{ STRIP(','); SAVESTRING("active-disk", active_disk, FROMEQUALS); }
+hidden-disk=[^,]*,?	{ STRIP(','); SAVESTRING("hidden-disk", hidden_disk, FROMEQUALS); }
 
  /* the target magic parameter, eats the rest of the string */
 
-- 
1.7.10.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH v13.1 23/26] COLO nic: implement COLO nic subkind
  2016-03-28  3:46   ` [PATCH v13.1 " Changlong Xie
  2016-03-30 14:22     ` Ian Jackson
@ 2016-03-30 14:38     ` Ian Jackson
  2016-03-30 14:40       ` Ian Jackson
  1 sibling, 1 reply; 55+ messages in thread
From: Ian Jackson @ 2016-03-30 14:38 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Lars Kurth, Li Zhijian, Wei Liu, Ian Campbell, Wen Congyang,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen devel,
	Anthony Perard, Gui Jianfeng, Shriram Rajagopalan, Yang Hongyang

Changlong Xie writes ("[PATCH v13.1 23/26] COLO nic: implement COLO nic subkind"):
>  From 699f20d46fcce0bcce8fd7f7063551088a425254 Mon Sep 17 00:00:00 2001
> From: Wen Congyang <wency@cn.fujitsu.com>
> Date: Wed, 15 Jul 2015 17:18:53 +0800
> Subject: [PATCH v13.1 23/26] COLO nic: implement COLO nic subkind
> 
> implement COLO nic subkind.

Changlong Xie writes ("[PATCH v13.1 20/26] Support colo mode for qemu disk"):
>  From 468ff9fb2f6699314c28f30a7d7d09eac9aa6756 Mon Sep 17 00:00:00 2001
> From: Wen Congyang <wency@cn.fujitsu.com>
> Date: Mon, 21 Mar 2016 15:38:30 +0800
> Subject: [PATCH v13.1 20/26] Support colo mode for qemu disk

I found a proper copy of this in:
  https://github.com/Pating/xen.git#c8284df4eb79

That commit is

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Ian.

From c8284df4eb79e2f0e2bc82a7f784b4c3d4464f05 Mon Sep 17 00:00:00 2001
From: Wen Congyang <wency@cn.fujitsu.com>
Date: Wed, 15 Jul 2015 17:18:53 +0800
Subject: [PATCH 24/28] COLO nic: implement COLO nic subkind

implement COLO nic subkind.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
 tools/hotplug/Linux/Makefile         |    1 +
 tools/hotplug/Linux/colo-proxy-setup |  135 ++++++++++++++
 tools/libxl/Makefile                 |    1 +
 tools/libxl/libxl_colo.h             |   10 ++
 tools/libxl/libxl_colo_nic.c         |  320 ++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h         |    2 +
 tools/libxl/libxl_types.idl          |    3 +
 7 files changed, 472 insertions(+)
 create mode 100755 tools/hotplug/Linux/colo-proxy-setup
 create mode 100644 tools/libxl/libxl_colo_nic.c
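For readers of setup_async_exec() further down: the environment passed to the
hotplug script is a flat array of alternating names and values, ended by a
NULL entry, which is why arraysize is 11 (five pairs plus the terminator).
The standalone sketch below shows that layout only. struct colo_env and
colo_env_fill() are hypothetical names, and a fixed buffer stands in for the
gc-allocated strings that libxl uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define COLO_ENV_PAIRS 5
#define COLO_ENV_SLOTS (2 * COLO_ENV_PAIRS + 1)   /* pairs + NULL terminator */

/* Hypothetical container; libxl itself allocates these from the gc. */
struct colo_env {
    const char *slots[COLO_ENV_SLOTS];
    char index_str[16];                /* backing storage for "index" */
};

/* Fill the name/value array in the order the script expects. */
static size_t colo_env_fill(struct colo_env *e, const char *vifname,
                            const char *forwarddev, int is_primary,
                            int index, const char *bridge)
{
    size_t nr = 0;

    snprintf(e->index_str, sizeof(e->index_str), "%d", index);

    e->slots[nr++] = "vifname";
    e->slots[nr++] = vifname;
    e->slots[nr++] = "forwarddev";
    e->slots[nr++] = forwarddev;
    e->slots[nr++] = "mode";
    e->slots[nr++] = is_primary ? "primary" : "secondary";
    e->slots[nr++] = "index";
    e->slots[nr++] = e->index_str;
    e->slots[nr++] = "bridge";
    e->slots[nr++] = bridge;
    e->slots[nr++] = NULL;

    /* Same sanity check as the patch: every slot must be used. */
    assert(nr == COLO_ENV_SLOTS);
    return nr;
}
```

Keeping the slot count in one macro means that adding a sixth variable only
needs COLO_ENV_PAIRS to change, and the assert catches a mismatch.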

diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index 6e10118..9bb852b 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -28,6 +28,7 @@ XEN_SCRIPTS += block-iscsi
 XEN_SCRIPTS += block-tap
 XEN_SCRIPTS += block-drbd-probe
 XEN_SCRIPTS += $(XEN_SCRIPTS-y)
+XEN_SCRIPTS += colo-proxy-setup
 
 SUBDIRS-$(CONFIG_SYSTEMD) += systemd
 
diff --git a/tools/hotplug/Linux/colo-proxy-setup b/tools/hotplug/Linux/colo-proxy-setup
new file mode 100755
index 0000000..94e2034
--- /dev/null
+++ b/tools/hotplug/Linux/colo-proxy-setup
@@ -0,0 +1,135 @@
+#! /bin/bash
+
+dir=$(dirname "$0")
+. "$dir/xen-hotplug-common.sh"
+. "$dir/hotplugpath.sh"
+
+findCommand "$@"
+
+if [ "$command" != "setup" -a  "$command" != "teardown" ]
+then
+    echo "Invalid command: $command"
+    log err "Invalid command: $command"
+    exit 1
+fi
+
+evalVariables "$@"
+
+: ${vifname:?}
+: ${forwarddev:?}
+: ${mode:?}
+: ${index:?}
+: ${bridge:?}
+
+forwardbr="colobr0"
+
+if [ "$mode" != "primary" -a "$mode" != "secondary" ]
+then
+    echo "Invalid mode: $mode"
+    log err "Invalid mode: $mode"
+    exit 1
+fi
+
+if [ $index -lt 0 ] || [ $index -gt 100 ]; then
+    echo "index overflow"
+    exit 1
+fi
+
+function setup_primary()
+{
+    do_without_error tc qdisc add dev $vifname root handle 1: prio
+    do_without_error tc filter add dev $vifname parent 1: protocol ip prio 10 \
+        u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter add dev $vifname parent 1: protocol arp prio 11 \
+        u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter add dev $vifname parent 1: protocol ipv6 prio \
+        12 u32 match u32 0 0 flowid 1:2 action mirred egress mirror \
+        dev $forwarddev
+
+    do_without_error modprobe nf_conntrack_ipv4
+    do_without_error modprobe xt_PMYCOLO sec_dev=$forwarddev
+
+    iptables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    ip6tables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    do_without_error arptables -I INPUT -i $forwarddev -j MARK --set-mark $index
+}
+
+function teardown_primary()
+{
+    do_without_error tc filter del dev $vifname parent 1: protocol ip prio 10 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter del dev $vifname parent 1: protocol arp prio 11 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter del dev $vifname parent 1: protocol ipv6 prio 12 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc qdisc del dev $vifname root handle 1: prio
+
+    do_without_error iptables -t mangle -D PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    do_without_error ip6tables -t mangle -D PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    do_without_error arptables -F
+    do_without_error rmmod xt_PMYCOLO
+}
+
+function setup_secondary()
+{
+    do_without_error brctl delif $bridge $vifname
+    do_without_error brctl addbr $forwardbr
+    do_without_error brctl addif $forwardbr $vifname
+    do_without_error brctl addif $forwardbr $forwarddev
+    do_without_error ip link set dev $forwardbr up
+    do_without_error modprobe xt_SECCOLO
+
+    iptables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+    ip6tables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+}
+
+function teardown_secondary()
+{
+    do_without_error brctl delif $forwardbr $forwarddev
+    do_without_error brctl delif $forwardbr $vifname
+    do_without_error brctl delbr $forwardbr
+    do_without_error brctl addif $bridge $vifname
+
+    do_without_error iptables -t mangle -D PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+    do_without_error ip6tables -t mangle -D PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+    do_without_error rmmod xt_SECCOLO
+}
+
+case "$command" in
+    setup)
+        if [ "$mode" = "primary" ]
+        then
+            setup_primary
+        else
+            setup_secondary
+        fi
+
+        success
+        ;;
+    teardown)
+        if [ "$mode" = "primary" ]
+        then
+            teardown_primary
+        else
+            teardown_secondary
+        fi
+        ;;
+esac
+
+if [ "$mode" = "primary" ]
+then
+    log debug "Successful colo-proxy-setup $command for $vifname." \
+              " vifname: $vifname, index: $index, forwarddev: $forwarddev."
+else
+    log debug "Successful colo-proxy-setup $command for $vifname." \
+              " vifname: $vifname, index: $index, forwarddev: $forwarddev,"\
+              " forwardbr: $forwardbr."
+fi
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 72f3b1a..a433aaa 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -68,6 +68,7 @@ LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
 LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o
 LIBXL_OBJS-y += libxl_colo_qdisk.o
 LIBXL_OBJS-y += libxl_colo_proxy.o
+LIBXL_OBJS-y += libxl_colo_nic.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
index a529ce8..5fbb659 100644
--- a/tools/libxl/libxl_colo.h
+++ b/tools/libxl/libxl_colo.h
@@ -40,6 +40,11 @@ enum colo_netlink_op {
     COLO_PROXY_RESET, /* UNUSED, will be used for continuous FT */
 };
 
+typedef struct libxl__colo_device_nic {
+    int devid;
+    const char *vif;
+} libxl__colo_device_nic;
+
 typedef struct libxl__colo_qdisk {
     bool setuped;
 } libxl__colo_qdisk;
@@ -70,6 +75,7 @@ struct libxl__colo_restore_state {
     int recv_fd;
     int hvm;
     libxl__colo_callback *callback;
+    char *colo_proxy_script;
 
     /* private, colo restore checkpoint state */
     libxl__domain_create_cb *saved_cb;
@@ -89,6 +95,10 @@ int init_subkind_qdisk(struct libxl__checkpoint_devices_state *cds);
 
 void cleanup_subkind_qdisk(struct libxl__checkpoint_devices_state *cds);
 
+int init_subkind_colo_nic(struct libxl__checkpoint_devices_state *cds);
+
+void cleanup_subkind_colo_nic(struct libxl__checkpoint_devices_state *cds);
+
 extern void libxl__colo_restore_setup(struct libxl__egc *egc,
                                       libxl__colo_restore_state *crs);
 extern void libxl__colo_restore_teardown(struct libxl__egc *egc, void *dcs_void,
diff --git a/tools/libxl/libxl_colo_nic.c b/tools/libxl/libxl_colo_nic.c
new file mode 100644
index 0000000..2e00c28
--- /dev/null
+++ b/tools/libxl/libxl_colo_nic.c
@@ -0,0 +1,320 @@
+/*
+ * Copyright (C) 2016 FUJITSU LIMITED
+ * Author: Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+enum {
+    primary,
+    secondary,
+};
+
+/* ========== init() and cleanup() ========== */
+
+int init_subkind_colo_nic(libxl__checkpoint_devices_state *cds)
+{
+    return 0;
+}
+
+void cleanup_subkind_colo_nic(libxl__checkpoint_devices_state *cds)
+{
+}
+
+/* ========== helper functions ========== */
+
+static void colo_save_setup_script_cb(libxl__egc *egc,
+                                     libxl__async_exec_state *aes,
+                                     int rc, int status);
+static void colo_save_teardown_script_cb(libxl__egc *egc,
+                                         libxl__async_exec_state *aes,
+                                         int rc, int status);
+
+/*
+ * If the device has a vifname, then use that instead of
+ * the vifX.Y format.
+ * it must ONLY be used for remus because if driver domains
+ * were in use it would constitute a security vulnerability.
+ */
+static const char *get_vifname(libxl__checkpoint_device *dev,
+                               const libxl_device_nic *nic)
+{
+    const char *vifname = NULL;
+    const char *path;
+    int rc;
+
+    STATE_AO_GC(dev->cds->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = dev->cds->domid;
+
+    path = GCSPRINTF("%s/backend/vif/%d/%d/vifname",
+                     libxl__xs_get_dompath(gc, 0), domid, nic->devid);
+    rc = libxl__xs_read_checked(gc, XBT_NULL, path, &vifname);
+    if (!rc && !vifname) {
+        vifname = libxl__device_nic_devname(gc, domid,
+                                            nic->devid,
+                                            nic->nictype);
+    }
+
+    return vifname;
+}
+
+/*
+ * the script needs the following env & args
+ * $vifname
+ * $forwarddev
+ * $mode(primary/secondary)
+ * $index
+ * $bridge
+ * setup/teardown as command line arg.
+ */
+static void setup_async_exec(libxl__checkpoint_device *dev, char *op,
+                             libxl__colo_proxy_state *cps, int side,
+                             char *colo_proxy_script)
+{
+    int arraysize, nr = 0;
+    char **env = NULL, **args = NULL;
+    libxl__colo_device_nic *colo_nic = dev->concrete_data;
+    libxl__checkpoint_devices_state *cds = dev->cds;
+    libxl__async_exec_state *aes = &dev->aodev.aes;
+    const libxl_device_nic *nic = dev->backend_dev;
+
+    STATE_AO_GC(cds->ao);
+
+    /* Convenience aliases */
+    const char *const vif = colo_nic->vif;
+
+    arraysize = 11;
+    GCNEW_ARRAY(env, arraysize);
+    env[nr++] = "vifname";
+    env[nr++] = libxl__strdup(gc, vif);
+    env[nr++] = "forwarddev";
+    env[nr++] = libxl__strdup(gc, nic->coloft_forwarddev);
+    env[nr++] = "mode";
+    if (side == primary)
+        env[nr++] = "primary";
+    else
+        env[nr++] = "secondary";
+    env[nr++] = "index";
+    env[nr++] = GCSPRINTF("%d", cps->index);
+    env[nr++] = "bridge";
+    env[nr++] = libxl__strdup(gc, nic->bridge);
+    env[nr++] = NULL;
+    assert(nr == arraysize);
+
+    arraysize = 3; nr = 0;
+    GCNEW_ARRAY(args, arraysize);
+    args[nr++] = colo_proxy_script;
+    args[nr++] = op;
+    args[nr++] = NULL;
+    assert(nr == arraysize);
+
+    aes->ao = dev->cds->ao;
+    aes->what = GCSPRINTF("%s %s", args[0], args[1]);
+    aes->env = env;
+    aes->args = args;
+    aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
+    aes->stdfds[0] = -1;
+    aes->stdfds[1] = -1;
+    aes->stdfds[2] = -1;
+
+    if (!strcmp(op, "teardown"))
+        aes->callback = colo_save_teardown_script_cb;
+    else
+        aes->callback = colo_save_setup_script_cb;
+}
+
+/* ========== setup() and teardown() ========== */
+
+static void colo_nic_setup(libxl__egc *egc, libxl__checkpoint_device *dev,
+                           libxl__colo_proxy_state *cps, int side,
+                           char *colo_proxy_script)
+{
+    int rc;
+    libxl__colo_device_nic *colo_nic;
+    const libxl_device_nic *nic = dev->backend_dev;
+
+    STATE_AO_GC(dev->cds->ao);
+
+    /*
+     * there's no subkind of nic devices, so nic ops are always matched
+     * with nic devices; we begin to set up the nic device
+     */
+    dev->matched = 1;
+
+    if (!nic->coloft_forwarddev) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    GCNEW(colo_nic);
+    dev->concrete_data = colo_nic;
+    colo_nic->devid = nic->devid;
+    colo_nic->vif = get_vifname(dev, nic);
+    if (!colo_nic->vif) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    setup_async_exec(dev, "setup", cps, side, colo_proxy_script);
+    rc = libxl__async_exec_start(&dev->aodev.aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void colo_save_setup_script_cb(libxl__egc *egc,
+                                      libxl__async_exec_state *aes,
+                                      int rc, int status)
+{
+    libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+    libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__colo_device_nic *colo_nic = dev->concrete_data;
+    libxl__checkpoint_devices_state *cds = dev->cds;
+    const char *out_path_base, *hotplug_error = NULL;
+
+    EGC_GC;
+
+    /* Convenience aliases */
+    const uint32_t domid = cds->domid;
+    const int devid = colo_nic->devid;
+    const char *const vif = colo_nic->vif;
+
+    if (status && !rc)
+        rc = ERROR_FAIL;
+    if (rc)
+        goto out;
+
+    out_path_base = GCSPRINTF("%s/colo_proxy/%d",
+                              libxl__xs_libxl_path(gc, domid), devid);
+
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/hotplug-error", out_path_base),
+                                &hotplug_error);
+    if (rc)
+        goto out;
+
+    if (hotplug_error) {
+        LOG(ERROR, "colo_proxy script %s setup failed for vif %s: %s",
+            aes->args[0], vif, hotplug_error);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (status) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = 0;
+
+out:
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+static void colo_nic_teardown(libxl__egc *egc, libxl__checkpoint_device *dev,
+                              libxl__colo_proxy_state *cps, int side,
+                              char *colo_proxy_script)
+{
+    int rc;
+    libxl__colo_device_nic *colo_nic = dev->concrete_data;
+
+    if (!colo_nic || !colo_nic->vif) {
+        /* colo nic has not yet been set up, just return */
+        rc = 0;
+        goto out;
+    }
+
+    setup_async_exec(dev, "teardown", cps, side, colo_proxy_script);
+
+    rc = libxl__async_exec_start(&dev->aodev.aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void colo_save_teardown_script_cb(libxl__egc *egc,
+                                         libxl__async_exec_state *aes,
+                                         int rc, int status)
+{
+    libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+
+    if (status && !rc)
+        rc = ERROR_FAIL;
+    else
+        rc = 0;
+
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+/* ======== primary ======== */
+
+static void colo_nic_save_setup(libxl__egc *egc, libxl__checkpoint_device *dev)
+{
+    libxl__colo_save_state *css = dev->cds->concrete_data;
+
+    colo_nic_setup(egc, dev, &css->cps, primary, css->colo_proxy_script);
+}
+
+static void colo_nic_save_teardown(libxl__egc *egc,
+                                   libxl__checkpoint_device *dev)
+{
+    libxl__colo_save_state *css = dev->cds->concrete_data;
+
+    colo_nic_teardown(egc, dev, &css->cps, primary, css->colo_proxy_script);
+}
+
+const libxl__checkpoint_device_instance_ops colo_save_device_nic = {
+    .kind = LIBXL__DEVICE_KIND_VIF,
+    .setup = colo_nic_save_setup,
+    .teardown = colo_nic_save_teardown,
+};
+
+/* ======== secondary ======== */
+
+static void colo_nic_restore_setup(libxl__egc *egc,
+                                   libxl__checkpoint_device *dev)
+{
+    libxl__colo_restore_state *crs = dev->cds->concrete_data;
+
+    colo_nic_setup(egc, dev, &crs->cps, secondary, crs->colo_proxy_script);
+}
+
+static void colo_nic_restore_teardown(libxl__egc *egc,
+                                      libxl__checkpoint_device *dev)
+{
+    libxl__colo_restore_state *crs = dev->cds->concrete_data;
+
+    colo_nic_teardown(egc, dev, &crs->cps, secondary, crs->colo_proxy_script);
+}
+
+const libxl__checkpoint_device_instance_ops colo_restore_device_nic = {
+    .kind = LIBXL__DEVICE_KIND_VIF,
+    .setup = colo_nic_restore_setup,
+    .teardown = colo_nic_restore_teardown,
+};
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 8f02222..759b8d0 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3206,6 +3206,7 @@ typedef struct libxl__colo_save_state libxl__colo_save_state;
 struct libxl__colo_save_state {
     int send_fd;
     int recv_fd;
+    char *colo_proxy_script;
 
     /* private */
     libxl__stream_read_state srs;
@@ -3554,6 +3555,7 @@ struct libxl__domain_create_state {
     libxl_asyncprogress_how aop_console_how;
     /* private to domain_create */
     int guest_domid;
+    const char *colo_proxy_script;
     libxl__domain_build_state build_state;
     libxl__colo_restore_state crs;
     libxl__checkpoint_devices_state cds;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 8335291..f88fae0 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -602,6 +602,9 @@ libxl_device_nic = Struct("device_nic", [
     ("rate_bytes_per_interval", uint64),
     ("rate_interval_usecs", uint32),
     ("gatewaydev", string),
+    # Note that the COLO configuration settings should be considered unstable.
+    # They may change incompatibly in future versions of Xen.
+    ("coloft_forwarddev", string)
     ])
 
 libxl_device_pci = Struct("device_pci", [
-- 
1.7.10.4



* Re: [PATCH v13.1 23/26] COLO nic: implement COLO nic subkind
  2016-03-30 14:38     ` Ian Jackson
@ 2016-03-30 14:40       ` Ian Jackson
  0 siblings, 0 replies; 55+ messages in thread
From: Ian Jackson @ 2016-03-30 14:40 UTC (permalink / raw)
  To: Changlong Xie, xen devel, Konrad Rzeszutek Wilk, Andrew Cooper,
	Ian Campbell, Wei Liu, Dong Eddie, Jiang Yunhong, Wen Congyang,
	Li Zhijian, Gui Jianfeng, Shriram Rajagopalan, Yang Hongyang,
	Lars Kurth, Anthony Perard

Ian Jackson writes ("Re: [PATCH v13.1 23/26] COLO nic: implement COLO nic subkind"):

Sorry, this mail was unclear.  I meant to refer to this:

> Changlong Xie writes ("[PATCH v13.1 23/26] COLO nic: implement COLO nic subkind"):
> >  From 699f20d46fcce0bcce8fd7f7063551088a425254 Mon Sep 17 00:00:00 2001
> > From: Wen Congyang <wency@cn.fujitsu.com>
> > Date: Wed, 15 Jul 2015 17:18:53 +0800
> > Subject: [PATCH v13.1 23/26] COLO nic: implement COLO nic subkind
> > 
> > implement COLO nic subkind.

The reference to this:

> > Subject: [PATCH v13.1 20/26] Support colo mode for qemu disk

is a c&p mistake.


> I found a proper copy of this in:
>   https://github.com/Pating/xen.git#c8284df4eb79
> 
> That commit is
> 
> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Thanks,
Ian.


* Re: [PATCH v13.1 26/26] cmdline switches and config vars to control colo-proxy
  2016-03-28  3:47   ` [PATCH v13.1 " Changlong Xie
  2016-03-30 14:28     ` Ian Jackson
@ 2016-03-30 14:42     ` Ian Jackson
  1 sibling, 0 replies; 55+ messages in thread
From: Ian Jackson @ 2016-03-30 14:42 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Lars Kurth, Li Zhijian, Wei Liu, Ian Campbell, Wen Congyang,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen devel,
	Anthony Perard, Gui Jianfeng, Shriram Rajagopalan, Yang Hongyang

Changlong Xie writes ("[PATCH v13.1 26/26] cmdline switches and config vars to control colo-proxy"):
>  From 1bfd14622455635c6cae6130396250996e49facc Mon Sep 17 00:00:00 2001
> From: Wen Congyang <wency@cn.fujitsu.com>
> Date: Wed, 15 Jul 2015 17:18:56 +0800
> Subject: [PATCH v13.1 26/26] cmdline switches and config vars to control 
> colo-proxy

I found a proper copy of this in:
  https://github.com/Pating/xen.git#27e64fe8a495

That commit is

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Ian.

From 27e64fe8a4954b5a5628696160471f89a9e6ae6e Mon Sep 17 00:00:00 2001
From: Wen Congyang <wency@cn.fujitsu.com>
Date: Wed, 15 Jul 2015 17:18:56 +0800
Subject: [PATCH 27/28] cmdline switches and config vars to control colo-proxy

Add cmdline switches to the 'xl migrate-receive' command to specify
a domain-specific hotplug script to set up the COLO proxy.

Add a new config var 'colo.default.proxyscript' to xl.conf that
allows the user to override the default global script used to
set up the COLO proxy.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
 docs/man/xl.conf.pod.5           |    6 +++++
 docs/man/xl.pod.1                |    7 ++++--
 tools/libxl/libxl.c              |    6 +++++
 tools/libxl/libxl_colo_restore.c |    5 ++++
 tools/libxl/libxl_create.c       |    9 ++++++--
 tools/libxl/libxl_types.idl      |    1 +
 tools/libxl/xl.c                 |    3 +++
 tools/libxl/xl.h                 |    1 +
 tools/libxl/xl_cmdimpl.c         |   47 ++++++++++++++++++++++++++++++--------
 9 files changed, 71 insertions(+), 14 deletions(-)
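The script selection in libxl__colo_restore_setup() below is a simple
fallback: a script passed in by the caller wins, otherwise the path is built
from the Xen script directory plus "colo-proxy-setup". The sketch below shows
that rule in standalone form. colo_proxy_script_path() is a hypothetical name,
and malloc() stands in for the gc allocation that libxl uses.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not in the patch).  Returns a heap-allocated path
 * that the caller must free(), or NULL if allocation fails.
 */
static char *colo_proxy_script_path(const char *override,
                                    const char *script_dir)
{
    static const char name[] = "colo-proxy-setup";
    size_t len;
    char *path;

    assert(script_dir != NULL);

    if (override != NULL) {
        /* A caller-supplied script takes precedence. */
        len = strlen(override) + 1;
        path = malloc(len);
        if (path != NULL)
            memcpy(path, override, len);
        return path;
    }

    /* Otherwise fall back to <script_dir>/colo-proxy-setup. */
    len = strlen(script_dir) + 1 + strlen(name) + 1;
    path = malloc(len);
    if (path != NULL)
        snprintf(path, len, "%s/%s", script_dir, name);
    return path;
}
```

With script_dir set to /etc/xen/scripts and no override, this yields the
default documented in the xl.conf hunk, /etc/xen/scripts/colo-proxy-setup.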

diff --git a/docs/man/xl.conf.pod.5 b/docs/man/xl.conf.pod.5
index 8ae19bb..8f7fd28 100644
--- a/docs/man/xl.conf.pod.5
+++ b/docs/man/xl.conf.pod.5
@@ -111,6 +111,12 @@ Configures the default script used by Remus to setup network buffering.
 
 Default: C</etc/xen/scripts/remus-netbuf-setup>
 
+=item B<colo.default.proxyscript="PATH">
+
+Configures the default script used by COLO to setup colo-proxy.
+
+Default: C</etc/xen/scripts/colo-proxy-setup>
+
 =item B<output_format="json|sxp">
 
 Configures the default output format used by xl when printing "machine
diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 2664402..9df3302 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -458,8 +458,6 @@ Remus support in xl is still in experimental (proof-of-concept) phase.
 Disk replication support is limited to DRBD disks.
 
 COLO support in xl is still in experimental (proof-of-concept) phase.
-There is no support for network, so the guest will confuse its network
-peers at the moment.
 
 =back
 
@@ -483,6 +481,11 @@ and it's used by secondary.
 =item B<hidden-disk>    :Primary's modified contents will be buffered in this
 disk, and it's used by secondary.
 
+(b) An example for COLO network configuration: vif =[ '...,forwarddev=xxx,...']
+
+=item B<forwarddev>     :Forward devices for primary and secondary, they are
+directly connected.
+
 Note that the COLO configuration settings should be considered unstable. They
 may change incompatibly in future versions of Xen.
 
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 63fbe16..aabf3a7 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -3378,6 +3378,11 @@ void libxl__device_nic_add(libxl__egc *egc, uint32_t domid,
         flexarray_append(back, nic->ifname);
     }
 
+    if (nic->coloft_forwarddev) {
+        flexarray_append(back, "forwarddev");
+        flexarray_append(back, nic->coloft_forwarddev);
+    }
+
     flexarray_append(back, "mac");
     flexarray_append(back,GCSPRINTF(LIBXL_MAC_FMT, LIBXL_MAC_BYTES(nic->mac)));
     if (nic->ip) {
@@ -3500,6 +3505,7 @@ static int libxl__device_nic_from_xs_be(libxl__gc *gc,
     nic->ip = READ_BACKEND(NOGC, "ip");
     nic->bridge = READ_BACKEND(NOGC, "bridge");
     nic->script = READ_BACKEND(NOGC, "script");
+    nic->coloft_forwarddev = READ_BACKEND(NOGC, "forwarddev");
 
     /* vif_ioemu nics use the same xenstore entries as vif interfaces */
     tmp = READ_BACKEND(gc, "type");
diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
index c8ad796..3483f39 100644
--- a/tools/libxl/libxl_colo_restore.c
+++ b/tools/libxl/libxl_colo_restore.c
@@ -233,6 +233,11 @@ void libxl__colo_restore_setup(libxl__egc *egc,
     crcs->crs = crs;
     crs->qdisk_setuped = false;
     crs->qdisk_used = false;
+    if (dcs->colo_proxy_script)
+        crs->colo_proxy_script = libxl__strdup(gc, dcs->colo_proxy_script);
+    else
+        crs->colo_proxy_script = GCSPRINTF("%s/colo-proxy-setup",
+                                           libxl__xen_script_dir_path());
 
     /* setup dsps */
     crcs->dsps.ao = ao;
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index e2ec25c..d6028aa 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1690,6 +1690,7 @@ static void domain_create_cb(libxl__egc *egc,
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
                             uint32_t *domid, int restore_fd, int send_back_fd,
                             const libxl_domain_restore_params *params,
+                            const char *colo_proxy_script,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
@@ -1713,6 +1714,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     }
     cdcs->dcs.callback = domain_create_cb;
     cdcs->dcs.domid_soft_reset = INVALID_DOMID;
+    cdcs->dcs.colo_proxy_script = colo_proxy_script;
     libxl__ao_progress_gethow(&cdcs->dcs.aop_console_how, aop_console_how);
     cdcs->domid_out = domid;
 
@@ -1900,7 +1902,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncprogress_how *aop_console_how)
 {
     unset_disk_colo_restore(d_config);
-    return do_domain_create(ctx, d_config, domid, -1, -1, NULL,
+    return do_domain_create(ctx, d_config, domid, -1, -1, NULL, NULL,
                             ao_how, aop_console_how);
 }
 
@@ -1911,14 +1913,17 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
 {
+    char *colo_proxy_script = NULL;
+
     if (params->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
+        colo_proxy_script = params->colo_proxy_script;
         set_disk_colo_restore(d_config);
     } else {
         unset_disk_colo_restore(d_config);
     }
 
     return do_domain_create(ctx, d_config, domid, restore_fd, send_back_fd,
-                            params, ao_how, aop_console_how);
+                            params, colo_proxy_script, ao_how, aop_console_how);
 }
 
 int libxl_domain_soft_reset(libxl_ctx *ctx,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index f88fae0..165b788 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -386,6 +386,7 @@ libxl_domain_create_info = Struct("domain_create_info",[
 libxl_domain_restore_params = Struct("domain_restore_params", [
     ("checkpointed_stream", integer),
     ("stream_version", uint32, {'init_val': '1'}),
+    ("colo_proxy_script", string),
     ])
 
 libxl_domain_sched_params = Struct("domain_sched_params",[
diff --git a/tools/libxl/xl.c b/tools/libxl/xl.c
index dfae84a..a272258 100644
--- a/tools/libxl/xl.c
+++ b/tools/libxl/xl.c
@@ -45,6 +45,7 @@ char *default_bridge = NULL;
 char *default_gatewaydev = NULL;
 char *default_vifbackend = NULL;
 char *default_remus_netbufscript = NULL;
+char *default_colo_proxy_script = NULL;
 enum output_format default_output_format = OUTPUT_FORMAT_JSON;
 int claim_mode = 1;
 bool progress_use_cr = 0;
@@ -179,6 +180,8 @@ static void parse_global_config(const char *configfile,
 
     xlu_cfg_replace_string (config, "remus.default.netbufscript",
         &default_remus_netbufscript, 0);
+    xlu_cfg_replace_string (config, "colo.default.proxyscript",
+        &default_colo_proxy_script, 0);
 
     xlu_cfg_destroy(config);
 }
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 309627a..e601ca1 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -194,6 +194,7 @@ extern char *default_bridge;
 extern char *default_gatewaydev;
 extern char *default_vifbackend;
 extern char *default_remus_netbufscript;
+extern char *default_colo_proxy_script;
 extern char *blkdev_start;
 
 enum output_format {
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 25bd81a..91dcb63 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -158,6 +158,7 @@ struct domain_create {
     const char *config_file;
     char *extra_config; /* extra config string */
     const char *restore_file;
+    char *colo_proxy_script;
     int migrate_fd; /* -1 means none */
     int send_back_fd; /* -1 means none */
     char **migration_domname_r; /* from malloc */
@@ -1053,6 +1054,8 @@ static int parse_nic_config(libxl_device_nic *nic, XLU_Config **config, char *to
         replace_string(&nic->model, oparg);
     } else if (MATCH_OPTION("rate", token, oparg)) {
         parse_vif_rate(config, oparg, nic);
+    } else if (MATCH_OPTION("forwarddev", token, oparg)) {
+        replace_string(&nic->coloft_forwarddev, oparg);
     } else if (MATCH_OPTION("accel", token, oparg)) {
         fprintf(stderr, "the accel parameter for vifs is currently not supported\n");
     } else {
@@ -3001,6 +3004,7 @@ start:
         params.checkpointed_stream = dom_info->checkpointed_stream;
         params.stream_version =
             (hdr.mandatory_flags & XL_MANDATORY_FLAG_STREAMv2) ? 2 : 1;
+        params.colo_proxy_script = dom_info->colo_proxy_script;
 
         ret = libxl_domain_create_restore(ctx, &d_config,
                                           &domid, restore_fd,
@@ -4733,7 +4737,8 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
 
 static void migrate_receive(int debug, int daemonize, int monitor,
                             int send_fd, int recv_fd,
-                            libxl_checkpointed_stream checkpointed)
+                            libxl_checkpointed_stream checkpointed,
+                            char *colo_proxy_script)
 {
     uint32_t domid;
     int rc, rc2;
@@ -4762,6 +4767,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     dom_info.send_back_fd = send_fd;
     dom_info.migration_domname_r = &migration_domname;
     dom_info.checkpointed_stream = checkpointed;
+    dom_info.colo_proxy_script = colo_proxy_script;
 
     rc = create_domain(&dom_info);
     if (rc < 0) {
@@ -4955,8 +4961,11 @@ int main_migrate_receive(int argc, char **argv)
     int debug = 0, daemonize = 1, monitor = 1;
     libxl_checkpointed_stream checkpointed = LIBXL_CHECKPOINTED_STREAM_NONE;
     int opt;
+    char *script = NULL;
     static struct option opts[] = {
         {"colo", 0, 0, 0x100},
+        /* It is a shame that the management code for disk is not here. */
+        {"coloft-script", 1, 0, 0x200},
         COMMON_LONG_OPTS
     };
 
@@ -4977,6 +4986,9 @@ int main_migrate_receive(int argc, char **argv)
     case 0x100:
         checkpointed = LIBXL_CHECKPOINTED_STREAM_COLO;
         break;
+    case 0x200:
+        script = optarg;
+        break;
     }
 
     if (argc-optind != 0) {
@@ -4985,7 +4997,7 @@ int main_migrate_receive(int argc, char **argv)
     }
     migrate_receive(debug, daemonize, monitor,
                     STDOUT_FILENO, STDIN_FILENO,
-                    checkpointed);
+                    checkpointed, script);
 
     return 0;
 }
@@ -8395,8 +8407,10 @@ int main_remus(int argc, char **argv)
         r_info.interval = 200;
 
     if (libxl_defbool_val(r_info.colo)) {
-        if (r_info.interval || libxl_defbool_val(r_info.blackhole)) {
-            perror("Option -c conflicts with -i or -b");
+        if (r_info.interval || libxl_defbool_val(r_info.blackhole) ||
+            !libxl_defbool_is_default(r_info.netbuf) ||
+            !libxl_defbool_is_default(r_info.diskbuf)) {
+            perror("option -c is conflict with -i, -d, -n or -b");
             exit(-1);
         }
 
@@ -8407,8 +8421,12 @@ int main_remus(int argc, char **argv)
         }
     }
 
-    if (!r_info.netbufscript)
-        r_info.netbufscript = default_remus_netbufscript;
+    if (!r_info.netbufscript) {
+        if (libxl_defbool_val(r_info.colo))
+            r_info.netbufscript = default_colo_proxy_script;
+        else
+            r_info.netbufscript = default_remus_netbufscript;
+    }
 
     if (libxl_defbool_val(r_info.blackhole)) {
         send_fd = open("/dev/null", O_RDWR, 0644);
@@ -8421,10 +8439,19 @@ int main_remus(int argc, char **argv)
         if (!ssh_command[0]) {
             rune = host;
         } else {
-            xasprintf(&rune, "exec %s %s xl migrate-receive %s %s",
-                      ssh_command, host,
-                      libxl_defbool_val(r_info.colo) ? "-c" : "-r",
-                      daemonize ? "" : " -e");
+            if (!libxl_defbool_val(r_info.colo)) {
+                xasprintf(&rune, "exec %s %s xl migrate-receive %s %s",
+                          ssh_command, host,
+                          "-r",
+                          daemonize ? "" : " -e");
+            } else {
+                xasprintf(&rune, "exec %s %s xl migrate-receive %s %s %s %s",
+                          ssh_command, host,
+                          "--colo",
+                          r_info.netbufscript ? "--coloft-script" : "",
+                          r_info.netbufscript ? r_info.netbufscript : "",
+                          daemonize ? "" : " -e");
+            }
         }
 
         save_domain_core_begin(domid, NULL, &config_data, &config_len);
-- 
1.7.10.4
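
For readers following the command-line plumbing in this patch, here is a
minimal, standalone sketch of how long-only options such as --colo and
--coloft-script can be parsed with getopt_long(), using option values above
the char range (0x100, 0x200) as the opts[] table in main_migrate_receive()
does. The struct receive_opts type and the parse_receive_opts() helper are
hypothetical names introduced only for illustration; they are not part of xl,
and xl's COMMON_LONG_OPTS and short options are deliberately left out.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type for this sketch; not part of xl. */
struct receive_opts {
    bool colo;           /* true if --colo was given */
    const char *script;  /* argument of --coloft-script, or NULL */
};

/*
 * Parse long-only options. Values such as 0x100 and 0x200 lie outside the
 * range of a char, so they cannot collide with short option letters.
 * Returns 0 on success, -1 on an unknown option or a stray argument.
 */
static int parse_receive_opts(int argc, char **argv, struct receive_opts *out)
{
    static const struct option opts[] = {
        {"colo",          no_argument,       NULL, 0x100},
        {"coloft-script", required_argument, NULL, 0x200},
        {NULL, 0, NULL, 0}
    };
    int opt;

    assert(out != NULL);
    out->colo = false;
    out->script = NULL;
    optind = 1;  /* restart scanning so the helper can be called again */

    while ((opt = getopt_long(argc, argv, "", opts, NULL)) != -1) {
        switch (opt) {
        case 0x100:
            out->colo = true;
            break;
        case 0x200:
            out->script = optarg;  /* points into argv, not copied */
            break;
        default:
            return -1;             /* unknown option or missing argument */
        }
    }

    /* Like the patch, reject any leftover non-option arguments. */
    return (argc - optind != 0) ? -1 : 0;
}
```

Note that the script pointer refers to storage owned by argv, which is why
the patch can pass it straight through to migrate_receive() without copying.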


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (26 preceding siblings ...)
  2016-03-25 15:51 ` [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wei Liu
@ 2016-03-30 14:50 ` Ian Jackson
  2016-03-31  1:26   ` Wen Congyang
  2016-03-31  2:28 ` Changlong Xie
  28 siblings, 1 reply; 55+ messages in thread
From: Ian Jackson @ 2016-03-30 14:50 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Lars Kurth, Li Zhijian, Wei Liu, Ian Campbell, Wen Congyang,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen devel,
	Anthony Perard, Gui Jianfeng, Shriram Rajagopalan, Yang Hongyang

Changlong Xie writes ("[PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service"):
> This patchset implemented the COLO feature for Xen.
> For detail/install/use of COLO feature, refer to:
> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
> 
> You can get the codes from here:
> https://github.com/Pating/xen/tree/changlox/colo_v13

I fetched the branches `colo_v13' and `colo_v13_fixup' and it seems
that I can get the proper versions of v13.1 from the latter.  I have
acked them accordingly.

Can you confirm that that branch is what you intend for upstream ?

If so, I have a question about it:

There are two patches in it which have not been posted as part of your
series and are marked as "[DO NOT MERGE]".  (Thanks for your admirably
clear marking, btw.)

They are
  [DONT MERGE] don't create default ioreq server
  [DONT MERGE] tools/libxc: support to resume uncooperative HVM guests


The latter patch "support to resume uncooperative HVM guests" seems to
have been posted a number of times and AFAICT most recently as
  [PATCH v8 05/13] tools/libxc: support to resume uncooperative HVM guests
on the 18th of February.

Is that not required for COLO ?  We need to sort this out, I think.


What is the status of the default ioreq server patch ?  Why is it in
your git branch ?


Thanks,
Ian.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
  2016-03-28  3:52   ` Changlong Xie
@ 2016-03-30 14:52     ` Ian Jackson
  0 siblings, 0 replies; 55+ messages in thread
From: Ian Jackson @ 2016-03-30 14:52 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Lars Kurth, Li Zhijian, Wei Liu, Ian Campbell, Wen Congyang,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen devel,
	Anthony Perard, Gui Jianfeng, Shriram Rajagopalan, Yang Hongyang

Changlong Xie writes ("Re: [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service"):
> On 03/25/2016 11:51 PM, Wei Liu wrote:
> > I went over those unacked patches. The major thing I found is that you
> > didn't add in the warning text as Ian suggested. I've pointed out one
> > instance where you should add that. However, xl manage and libxl header
> > file changes are spread across multiple commits, so I'm not quite sure
> > which particular commit you should add in warning text.
> >
> 
> https://github.com/Pating/xen/tree/changlox/colo_v13_fixup
> 
> I just update p20, p23, p26 as Ian suggested

Oh, hi, thanks, I see our emails have crossed.  Thanks for confirming
that colo_v13_fixup is what I should be looking at.

Please see my other email in response to 00/26.

Ian.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
  2016-03-30 14:50 ` Ian Jackson
@ 2016-03-31  1:26   ` Wen Congyang
  0 siblings, 0 replies; 55+ messages in thread
From: Wen Congyang @ 2016-03-31  1:26 UTC (permalink / raw)
  To: Ian Jackson, Changlong Xie
  Cc: Lars Kurth, Wei Liu, Ian Campbell, Li Zhijian, Andrew Cooper,
	Jiang Yunhong, Dong Eddie, xen devel, Anthony Perard,
	Gui Jianfeng, Shriram Rajagopalan, Yang Hongyang

On 03/30/2016 10:50 PM, Ian Jackson wrote:
> Changlong Xie writes ("[PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service"):
>> This patchset implemented the COLO feature for Xen.
>> For detail/install/use of COLO feature, refer to:
>> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
>>
>> You can get the codes from here:
>> https://github.com/Pating/xen/tree/changlox/colo_v13
> 
> I fetched the branches `colo_v13' and `colo_v13_fixup' and it seems
> that I can get the proper versions of v13.1 from the latter.  I have
> acked them accordingly.
> 
> Can you confirm that that branch is what you intend for upstream ?
> 
> If so, I have a question about it:
> 
> There are two patches in it which have not been posted as part of your
> series and are marked as "[DO NOT MERGE]".  (Thanks for your admirably
> clear marking, btw.)
> 
> They are
>   [DONT MERGE] don't create default ioreq server
>   [DONT MERGE] tools/libxc: support to resume uncooperative HVM guests
> 
> 
> The latter patch "support to resume uncooperative HVM guests" seems to
> have been posted a number of times and AFAICT most recently as
>   [PATCH v8 05/13] tools/libxc: support to resume uncooperative HVM guests
> on the 18th of February.
> 
> Is that not required for COLO ?  We need to sort this out, I think.

Yes, it is required for COLO.

> 
> 
> What is the status of the default ioreq server patch ?  Why is it in
> your git branch ?

I reported this bug last year:
http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg02850.html

This is just a temporary patch.

Thanks
Wen Congyang

> 
> 
> Thanks,
> Ian.
> 
> 
> .
> 





^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 25/26] setup and control colo proxy on secondary side
  2016-03-30 14:24   ` Ian Jackson
@ 2016-03-31  2:19     ` Changlong Xie
  0 siblings, 0 replies; 55+ messages in thread
From: Changlong Xie @ 2016-03-31  2:19 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Lars Kurth, Li Zhijian, Wei Liu, Ian Campbell, Wen Congyang,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen devel,
	Anthony Perard, Gui Jianfeng, Shriram Rajagopalan, Yang Hongyang

On 03/30/2016 10:24 PM, Ian Jackson wrote:
> Changlong Xie writes ("[PATCH v13 25/26] setup and control colo proxy on secondary side"):
>> From: Wen Congyang <wency@cn.fujitsu.com>
>>
>> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
>
> I think I acked this in v12.  I guess you probably overlooked that,
> but your v13 00/26 says
>
>    p25, Add A-B
>
> Can you please check that you didn't mistakenly add the acked-by to
> the wrong patch ?

Hi Ian

I've checked all patches. It's my fault; I just forgot to add the A-B to
this patch.

Thanks
	-Xie
>
> Thanks,
> Ian.
>
>
> .
>




^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
  2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
                   ` (27 preceding siblings ...)
  2016-03-30 14:50 ` Ian Jackson
@ 2016-03-31  2:28 ` Changlong Xie
  2016-03-31 14:22   ` Wei Liu
  28 siblings, 1 reply; 55+ messages in thread
From: Changlong Xie @ 2016-03-31  2:28 UTC (permalink / raw)
  To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
	Ian Jackson, Wei Liu
  Cc: Lars Kurth, Wen Congyang, Li Zhijian, Gui Jianfeng,
	Jiang Yunhong, Dong Eddie, Anthony Perard, Shriram Rajagopalan,
	Yang Hongyang

I've checked all patches in this thread after Ian's comments; it seems
we can add A-B to p12, p14, p20, p23, p25 and p26 now.

All in all, *all patches are now acked*.

Thanks
	-Xie

On 03/25/2016 02:44 PM, Changlong Xie wrote:
> This patchset implemented the COLO feature for Xen.
> For detail/install/use of COLO feature, refer to:
> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
>
> You can get the codes from here:
> https://github.com/Pating/xen/tree/changlox/colo_v13
>
> Changlog from v12 to v13
> 1. Rebase to the upstream xen
> 2. Address commnets from Ian and Liu Wei.
> p7, Add A-B
> p8, Add A-B
> p10, Add A-B
> p11, Add A-B
> p12, Add LOG(ERROR, )
> p13, Add A-B
> p14, Remove libxl__ao_complete(xxx)
> p15, Add A-B
> p16, Add A-B
> p17, Add A-B, replace "-c" with "--colo" for migrate-receive()
> p19, Add A-B, introduce "switch ... case ..."
> p21, Add A-B
> p22, Add A-B
> p23, replace "forwarddev" with "coloft_fowarddev"
> p24, Add A-B
> p25, Add A-B
> p26, replace "--script" with "--coloft-script"
>
> Changlog from v11 to v12
> 1. Rebase to the upstream xen
> 2. Address commnets from Ian, Liu Wei and Konard.
> Removed old p12,p13; introduce a new p13 what is splited out from old p15, introduce
> a new p19 what is splited out from old p20.
> p1, add A-B, and will update commit message when "xen-load-devices-state" relevant
> patch merged on qemu side
> p3, update comments, add assert() in libxl_domain_create_restore()
> p4, rename "dup_fd_helper" as "dup_cloexec", add missed newline
> p5, add A-B
> p7, remove repeated commit message, update the specification of libxl
> p8, update the specification of libxc
> p9, add A-B
> p10, update commit message, fix blank line issue
> p12, merged by old p12,p13(restore_callbacks wait_checkpoit/postcopy/suspend), fix blank
> line issues, update comments about why COLO only supports HVM
> p13, move stream read manipulations to right place in libxl_internal.h
> p14, merged by old p12(save_callbacks wait_checkpoint), fix blank line issues, update Copyright(C)
> p16, add "colo_" prefix for merge_secondary_dirty_bitmap()
> p17, update COLO description part on man page
> p18, fix long line issue
> p19, just introduce colo mode and refactor relevant functions
> p20, fix repetitive code in libxl__device_disk_from_xs_be(), make colo_port as int,
> remove unnecessary comments in libxl__build_device_model_args_new(), simplify
> disk_try_backend() and move the main part to in colo_qdisk_setup() in p21
> p21, fix blank line issue, update Copyright(C)
> p22, merged by old p22,p23, update Copyright(C), add commets for NETLINK_COLO, remove unnecessary
> '{ }', update url in commit message
> p23, fix blank line issue, add some comments for "forwarddev", update Copyright(C)
> p24, introduce COLO_PROXY_CHECKPOINT_TIMEOUT, ASYNC_CALL
> p26, move colo_proxy_script setup codes to libxl__colo_restore_setup(), introduce long options
> for main_migrate_receive()
>
> Changlog from v10 to v11
> 1. Rebased to then upstream xen
> 2. Address comments from Liu Wei
> p1, update commit message and remove libxl__domain_restore_device_model
> p4, add A-B
> p5, update commit message
> p6, add A-B
> p7,p8 add email address and direction info
> p10, merged by old p10,p11 and update comments
> p11, merged by old p12,p13 and update comments
> p14,p15 move colo structures and functions into libxl_colo.h, and list callbacks
> in order, also update commit message
> p16, merged by old p18,p19,p20 and remove TODOs
> p17, use original code for checking postcopy return value
> p18, simplify *if* logic, fix wrong comments, and unset dom_info.quiet in COLO
> p19, add A-B
> p20, fix code style, update comments and man page
> p21,p22,p23,p24 move colo structures and functions into libxl_colo.h
>
> Changlog from v9 to v10
> 1. Rebased to the upstream xen
> 2. Fix one bug found in the test
> 3. Merge some patches from prepare series
> 4. Split patch 5 to two patches(patch 4 and 5) according to the comments from
>     Wei Liu
>
> Changlog from v8 to v9:
> 1. Rebased to the upstream xen
> 2. Fix some bugs found in the test
>
> Changelog from v7 to v8:
> 1. Rebased to the latest libxl migration v2.
>
> Changelog from v6 to v7:
> 1. Ported to Libxl migration v2
> 2. Send dirty bitmap from secondary to primary on libxc side
> 3. Address review comments
>
> Changelog from v5 to v6:
> 1. based on migration v2(libxc)
> 2. split the patchset into prerequisite patchset and this main patchset.
>
> Changelog from v4 to v5:
> 1. rebase to the latest xen upstream
> 2. disk replication: blktap2->qdisk
> 3. nic replication: colo-agent->colo-proxy
>
> Changelog from v3 to v4:
> 1. rebase to newest xen
> 2. bug fix
>
> Changlog from v2 to v3:
> 1. rebase to newest remus
> 2. add nic replication support
>
> Changlog from v1 to v2:
> 1. rebase to newest remus
> 2. add disk replication support
>
> Changlong Xie (2):
>    libxl_internal: move stream read manipulations to right place
>    Introduce COLO mode and refactor relevant function
>
> Wen Congyang (24):
>    tools/libxl: introduction of libxl__qmp_restore to load qemu state
>    tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty()
>    tools/libxl: Add back channel to allow migration target send data back
>    tools/libxl: Introduce new helper function dup_fd_helper()
>    tools/libx{l,c}: add back channel to libxc
>    docs: add colo readme
>    docs/libxl: Introduce CHECKPOINT_CONTEXT to support migration v2 colo
>      streams
>    libxc/migration: Specification update for DIRTY_PFN_LIST records
>    libxc/migration: export read_record for common use
>    tools/libxl: add back channel support to write stream
>    tools/libxl: add back channel support to read stream
>    secondary vm suspend/resume/checkpoint code
>    primary vm suspend/resume/checkpoint code
>    libxc/restore: support COLO restore
>    libxc/save: support COLO save
>    implement the cmdline for COLO
>    COLO: introduce new API to prepare/start/do/get_error/stop replication
>    Support colo mode for qemu disk
>    COLO: use qemu block replication
>    COLO proxy: implement setup/teardown/preresume/postresume/checkpoint
>    COLO nic: implement COLO nic subkind
>    setup and control colo proxy on primary side
>    setup and control colo proxy on secondary side
>    cmdline switches and config vars to control colo-proxy
>
>   docs/README.colo                         |    9 +
>   docs/man/xl.conf.pod.5                   |    6 +
>   docs/man/xl.pod.1                        |   48 +-
>   docs/misc/xl-disk-configuration.txt      |   53 ++
>   docs/specs/libxc-migration-stream.pandoc |   27 +-
>   docs/specs/libxl-migration-stream.pandoc |   59 +-
>   tools/hotplug/Linux/Makefile             |    1 +
>   tools/hotplug/Linux/colo-proxy-setup     |  135 ++++
>   tools/libxc/include/xenguest.h           |   41 +-
>   tools/libxc/xc_nomigrate.c               |    4 +-
>   tools/libxc/xc_sr_common.c               |   80 ++-
>   tools/libxc/xc_sr_common.h               |   24 +-
>   tools/libxc/xc_sr_restore.c              |  246 +++++--
>   tools/libxc/xc_sr_save.c                 |  100 ++-
>   tools/libxc/xc_sr_stream_format.h        |   31 +-
>   tools/libxl/Makefile                     |    4 +
>   tools/libxl/libxl.c                      |   87 ++-
>   tools/libxl/libxl.h                      |   29 +-
>   tools/libxl/libxl_colo.h                 |  143 ++++
>   tools/libxl/libxl_colo_nic.c             |  320 +++++++++
>   tools/libxl/libxl_colo_proxy.c           |  277 ++++++++
>   tools/libxl/libxl_colo_qdisk.c           |  230 +++++++
>   tools/libxl/libxl_colo_restore.c         | 1087 ++++++++++++++++++++++++++++++
>   tools/libxl/libxl_colo_save.c            |  696 +++++++++++++++++++
>   tools/libxl/libxl_create.c               |   90 ++-
>   tools/libxl/libxl_device.c               |   11 +
>   tools/libxl/libxl_dm.c                   |  176 ++++-
>   tools/libxl/libxl_dom_save.c             |  103 +--
>   tools/libxl/libxl_internal.h             |  216 ++++--
>   tools/libxl/libxl_qmp.c                  |  106 +++
>   tools/libxl/libxl_remus_disk_drbd.c      |   38 +-
>   tools/libxl/libxl_save_callout.c         |   53 +-
>   tools/libxl/libxl_save_helper.c          |    8 +-
>   tools/libxl/libxl_save_msgs_gen.pl       |   13 +-
>   tools/libxl/libxl_sr_stream_format.h     |   11 +
>   tools/libxl/libxl_stream_read.c          |  106 ++-
>   tools/libxl/libxl_stream_write.c         |  100 ++-
>   tools/libxl/libxl_types.idl              |   11 +
>   tools/libxl/libxlu_disk_l.l              |   17 +
>   tools/libxl/xl.c                         |    3 +
>   tools/libxl/xl.h                         |    1 +
>   tools/libxl/xl_cmdimpl.c                 |  109 ++-
>   tools/libxl/xl_cmdtable.c                |    4 +-
>   tools/ocaml/libs/xl/xenlight_stubs.c     |    2 +-
>   tools/python/xen/migration/libxc.py      |   68 +-
>   tools/python/xen/migration/libxl.py      |    9 +
>   46 files changed, 4618 insertions(+), 374 deletions(-)
>   create mode 100644 docs/README.colo
>   create mode 100755 tools/hotplug/Linux/colo-proxy-setup
>   create mode 100644 tools/libxl/libxl_colo.h
>   create mode 100644 tools/libxl/libxl_colo_nic.c
>   create mode 100644 tools/libxl/libxl_colo_proxy.c
>   create mode 100644 tools/libxl/libxl_colo_qdisk.c
>   create mode 100644 tools/libxl/libxl_colo_restore.c
>   create mode 100644 tools/libxl/libxl_colo_save.c
>




^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
  2016-03-31  2:28 ` Changlong Xie
@ 2016-03-31 14:22   ` Wei Liu
  2016-04-01  1:59     ` Changlong Xie
  0 siblings, 1 reply; 55+ messages in thread
From: Wei Liu @ 2016-03-31 14:22 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Lars Kurth, Li Zhijian, Wei Liu, Ian Campbell, Wen Congyang,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, xen devel,
	Anthony Perard, Dong Eddie, Gui Jianfeng, Shriram Rajagopalan,
	Yang Hongyang

On Thu, Mar 31, 2016 at 10:28:47AM +0800, Changlong Xie wrote:
> I've checked all patches in this thread after Ian's comments, it seems
> we can give A-B to p12, p14, p20, p23, p25, p26 now.
> 
> All in all, *all patches are acked-by*.
> 

Hello, can you rebase your branch on top of staging branch (minus the
two "DONT MERGE" patches) and post the git branch somewhere?

Wei.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
  2016-03-31 14:22   ` Wei Liu
@ 2016-04-01  1:59     ` Changlong Xie
  2016-04-01 13:47       ` Ian Jackson
  0 siblings, 1 reply; 55+ messages in thread
From: Changlong Xie @ 2016-04-01  1:59 UTC (permalink / raw)
  To: Wei Liu
  Cc: Lars Kurth, Li Zhijian, Ian Campbell, Wen Congyang,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen devel,
	Anthony Perard, Gui Jianfeng, Shriram Rajagopalan, Ian Jackson,
	Yang Hongyang

On 03/31/2016 10:22 PM, Wei Liu wrote:
> On Thu, Mar 31, 2016 at 10:28:47AM +0800, Changlong Xie wrote:
>> I've checked all patches in this thread after Ian's comments, it seems
>> we can give A-B to p12, p14, p20, p23, p25, p26 now.
>>
>> All in all, *all patches are acked-by*.
>>
>
> Hello, can you rebase your branch on top of staging branch (minus the
> two "DONT MERGE" patches) and post the git branch somewhere?
>
> Wei.

Hi Wei

https://github.com/Pating/xen/tree/changlox/colo_v14

This version just adds A-B for p12, p14, p20, p23, p25 and p26; there are no
other changes.

*Note*: don't merge
https://github.com/Pating/xen/commit/1ab693c8b64309da96aa6cac96dc480f4ba31d6a;
I'll send it out separately.

Thanks
	-Xie
>
>
> .
>




^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
  2016-04-01  1:59     ` Changlong Xie
@ 2016-04-01 13:47       ` Ian Jackson
  2016-04-01 14:37         ` Changlong Xie
  0 siblings, 1 reply; 55+ messages in thread
From: Ian Jackson @ 2016-04-01 13:47 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Lars Kurth, Li Zhijian, Wei Liu, Ian Campbell, Wen Congyang,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen devel,
	Anthony Perard, Gui Jianfeng, Shriram Rajagopalan, Yang Hongyang

Changlong Xie writes ("Re: [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service"):
> https://github.com/Pating/xen/tree/changlox/colo_v14
> 
> Just add A-B for p12,p14,20,p23,p25,p26 in this version, no other changes.
> 
> *Note*, dont merge 
> https://github.com/Pating/xen/commit/1ab693c8b64309da96aa6cac96dc480f4ba31d6a, 
> i'll send out it separately

Thanks.  I have pushed this to staging.

I found that it didn't build on i386.  I folded the change below into
"libxc/save: support COLO save".

Ian.

diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 3c0c86d..b861c7d 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -555,7 +555,7 @@ static int colo_merge_secondary_dirty_bitmap(struct xc_sr_context *ctx)
         pfn = pfns[i];
         if (pfn > ctx->save.p2m_size)
         {
-            PERROR("Invalid pfn %#lx", pfn );
+            PERROR("Invalid pfn 0x%" PRIpfn "", (unsigned long)pfn );
             rc = -1;
             goto err;
         }
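
As background on this kind of i386 build failure: a hard-coded "%lx" only
matches a 64-bit integer where long is 64 bits wide, so it trips format
checking on 32-bit targets. The sketch below shows the generic C99 way to
format a 64-bit value portably with PRIx64 from <inttypes.h>. It assumes the
value is a uint64_t and does not reproduce the actual fix above, which uses
Xen's own PRIpfn macro together with a cast; format_pfn() is a hypothetical
helper, not a libxc function.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit frame number as hex into buf.
 * PRIx64 expands to the correct conversion for uint64_t on the target
 * (for example "lx" on LP64 and "llx" on ILP32), so the same source
 * builds cleanly on both 32-bit and 64-bit systems.
 * Returns the number of characters snprintf() would have written.
 */
static int format_pfn(char *buf, size_t len, uint64_t pfn)
{
    assert(buf != NULL);
    return snprintf(buf, len, "Invalid pfn 0x%" PRIx64, pfn);
}
```

The alternative taken in the patch, casting to a type that matches the
chosen format macro, is equally valid as long as the cast cannot truncate
the value on the target in question.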


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
  2016-04-01 13:47       ` Ian Jackson
@ 2016-04-01 14:37         ` Changlong Xie
  0 siblings, 0 replies; 55+ messages in thread
From: Changlong Xie @ 2016-04-01 14:37 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Lars Kurth, Li Zhijian, Wei Liu, Ian Campbell, Wen Congyang,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen devel,
	Anthony Perard, Gui Jianfeng, Shriram Rajagopalan, Yang Hongyang

On 04/01/2016 09:47 PM, Ian Jackson wrote:
> Changlong Xie writes ("Re: [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service"):
>> https://github.com/Pating/xen/tree/changlox/colo_v14
>>
>> Just added A-B for p12, p14, p20, p23, p25, p26 in this version; no other changes.
>>
>> *Note*: don't merge
>> https://github.com/Pating/xen/commit/1ab693c8b64309da96aa6cac96dc480f4ba31d6a,
>> I'll send it out separately.
>
> Thanks.  I have pushed this to staging.

Good news :)

>
> I found that it didn't build on i386.  I folded the change below into
> "libxc/save: support COLO save".
>

Thanks for your correction

> Ian.
>
> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> index 3c0c86d..b861c7d 100644
> --- a/tools/libxc/xc_sr_save.c
> +++ b/tools/libxc/xc_sr_save.c
> @@ -555,7 +555,7 @@ static int colo_merge_secondary_dirty_bitmap(struct xc_sr_context *ctx)
>           pfn = pfns[i];
>           if (pfn > ctx->save.p2m_size)
>           {
> -            PERROR("Invalid pfn %#lx", pfn );
> +            PERROR("Invalid pfn 0x%" PRIpfn "", (unsigned long)pfn );
>               rc = -1;
>               goto err;
>           }
>
>
> .
>




^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 03/26] tools/libxl: Add back channel to allow migration target send data back
  2016-03-25  6:44 ` [PATCH v13 03/26] tools/libxl: Add back channel to allow migration target send data back Changlong Xie
@ 2016-04-04 12:07   ` Olaf Hering
  2016-04-04 13:02     ` Wei Liu
  0 siblings, 1 reply; 55+ messages in thread
From: Olaf Hering @ 2016-04-04 12:07 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Lars Kurth, Li Zhijian, Wei Liu, Ian Campbell, Wen Congyang,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, xen devel,
	Shriram Rajagopalan, Dong Eddie, Gui Jianfeng, Anthony Perard,
	Yang Hongyang

On Fri, Mar 25, Changlong Xie wrote:

> +#elif defined(LIBXL_API_VERSION) && LIBXL_API_VERSION >= 0x040400 \
> +                                 && LIBXL_API_VERSION < 0x040700

Is this supposed to work? libvirt.git fails to build now:

libxl/libxl_domain.c: In function 'libxlDomainStart':
libxl/libxl_domain.c:1077: warning: passing argument 5 of 'libxl_domain_create_restore' makes integer from pointer without a cast
libxl/libxl_domain.c:1077: warning: passing argument 7 of 'libxl_domain_create_restore' from incompatible pointer type
libxl/libxl_domain.c:1077: error: too few arguments to function 'libxl_domain_create_restore'
make[3]: *** [libxl/libvirt_driver_libxl_impl_la-libxl_domain.lo] Error 1

Olaf


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 03/26] tools/libxl: Add back channel to allow migration target send data back
  2016-04-04 12:07   ` Olaf Hering
@ 2016-04-04 13:02     ` Wei Liu
  2016-04-04 15:29       ` Olaf Hering
  0 siblings, 1 reply; 55+ messages in thread
From: Wei Liu @ 2016-04-04 13:02 UTC (permalink / raw)
  To: Olaf Hering
  Cc: Lars Kurth, Li Zhijian, Changlong Xie, Wei Liu, Ian Campbell,
	Wen Congyang, Andrew Cooper, Jiang Yunhong, Ian Jackson,
	xen devel, Shriram Rajagopalan, Jim Fehlig, Dong Eddie,
	Gui Jianfeng, Anthony Perard, Yang Hongyang

CC Jim

On Mon, Apr 04, 2016 at 02:07:28PM +0200, Olaf Hering wrote:
> On Fri, Mar 25, Changlong Xie wrote:
> 
> > +#elif defined(LIBXL_API_VERSION) && LIBXL_API_VERSION >= 0x040400 \
> > +                                 && LIBXL_API_VERSION < 0x040700
> 
> Is this supposed to work? libvirt.git fails to build now:
> 
> libxl/libxl_domain.c: In function 'libxlDomainStart':
> libxl/libxl_domain.c:1077: warning: passing argument 5 of 'libxl_domain_create_restore' makes integer from pointer without a cast
> libxl/libxl_domain.c:1077: warning: passing argument 7 of 'libxl_domain_create_restore' from incompatible pointer type
> libxl/libxl_domain.c:1077: error: too few arguments to function 'libxl_domain_create_restore'
> make[3]: *** [libxl/libvirt_driver_libxl_impl_la-libxl_domain.lo] Error 1
> 

From the look of it that's because libvirt doesn't have
LIBXL_API_VERSION defined before including libxl.h, so it always gets
the latest API.

The fix is to patch libvirt. Looking at libvirt code I think I need to
patch Makefile.in to pass in an explicit LIBXL_API_VERSION number.

Jim, does that sound right?

Wei.

> Olaf
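
[Editorial sketch, not part of the original mail.] The mechanism described above can be illustrated with a small stand-in header: a consumer that defines an API version before the include gets the old signature through a compatibility wrapper, and a consumer that defines nothing gets the newest signature. All names below (DEMO_API_VERSION, demo_restore and its suffixed variants) are hypothetical and only mimic the pattern. This is not the text of libxl.h.

```c
/*
 * Hypothetical versioned header. A consumer pins an API level by
 * defining DEMO_API_VERSION before this point (for example with
 * -DDEMO_API_VERSION=0x040500 on the compiler command line). If it
 * does not, the newest level is assumed.
 */
#ifndef DEMO_API_VERSION
#define DEMO_API_VERSION 0x040700
#endif

/* Newest signature: takes an extra back-channel fd.
 * Returns 1 when a back channel is in use, 0 otherwise. */
static int demo_restore_0x040700(int restore_fd, int send_back_fd)
{
    (void)restore_fd;            /* unused in this sketch */
    return send_back_fd >= 0;
}

/* Compatibility wrapper with the older, shorter signature:
 * it forwards to the new function with no back channel (-1). */
static inline int demo_restore_0x040400(int restore_fd)
{
    return demo_restore_0x040700(restore_fd, -1);
}

/* Map the public name to the variant matching the requested level. */
#if DEMO_API_VERSION >= 0x040700
#define demo_restore demo_restore_0x040700
#elif DEMO_API_VERSION >= 0x040400
#define demo_restore demo_restore_0x040400
#else
#error "DEMO_API_VERSION too old for this sketch"
#endif
```

With no version defined, a caller written against the old one-argument form fails with "too few arguments", which matches the shape of the error quoted above. Compiling the same caller with -DDEMO_API_VERSION=0x040500 selects the wrapper instead.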


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 03/26] tools/libxl: Add back channel to allow migration target send data back
  2016-04-04 13:02     ` Wei Liu
@ 2016-04-04 15:29       ` Olaf Hering
  0 siblings, 0 replies; 55+ messages in thread
From: Olaf Hering @ 2016-04-04 15:29 UTC (permalink / raw)
  To: Wei Liu
  Cc: Lars Kurth, Li Zhijian, Changlong Xie, Ian Campbell,
	Wen Congyang, Andrew Cooper, Jiang Yunhong, Ian Jackson,
	xen devel, Anthony Perard, Jim Fehlig, Dong Eddie, Gui Jianfeng,
	Shriram Rajagopalan, Yang Hongyang

On Mon, Apr 04, Wei Liu wrote:

> The fix is to patch libvirt. Looking at libvirt code I think I need to
> patch Makefile.in to pass in an explicit LIBXL_API_VERSION number.

That might be true.

But shouldn't libxl.h at the same time get a change to recognize
0x040700? Perhaps this will be part of the release management.

For the time being this change fixes it for me:

+++ libvirt.spec        (working copy)
@@ -1163,6 +1163,7 @@

 autoreconf -f -i
 export CFLAGS="$RPM_OPT_FLAGS"
+export CFLAGS="$CFLAGS -DLIBXL_API_VERSION=0x040500"
 %configure --disable-static --with-pic \
            %{?_without_xen} \
            %{?_without_qemu} \

Olaf


^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread (newest message: 2016-04-04 15:29 UTC)

Thread overview: 55+ messages
2016-03-25  6:44 [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Changlong Xie
2016-03-25  6:44 ` [PATCH v13 01/26] tools/libxl: introduction of libxl__qmp_restore to load qemu state Changlong Xie
2016-03-25  6:44 ` [PATCH v13 02/26] tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty() Changlong Xie
2016-03-25  6:44 ` [PATCH v13 03/26] tools/libxl: Add back channel to allow migration target send data back Changlong Xie
2016-04-04 12:07   ` Olaf Hering
2016-04-04 13:02     ` Wei Liu
2016-04-04 15:29       ` Olaf Hering
2016-03-25  6:44 ` [PATCH v13 04/26] tools/libxl: Introduce new helper function dup_fd_helper() Changlong Xie
2016-03-25  6:44 ` [PATCH v13 05/26] tools/libx{l, c}: add back channel to libxc Changlong Xie
2016-03-25  6:44 ` [PATCH v13 06/26] docs: add colo readme Changlong Xie
2016-03-25  6:44 ` [PATCH v13 07/26] docs/libxl: Introduce CHECKPOINT_CONTEXT to support migration v2 colo streams Changlong Xie
2016-03-25  6:44 ` [PATCH v13 08/26] libxc/migration: Specification update for DIRTY_PFN_LIST records Changlong Xie
2016-03-25  6:44 ` [PATCH v13 09/26] libxc/migration: export read_record for common use Changlong Xie
2016-03-25  6:44 ` [PATCH v13 10/26] tools/libxl: add back channel support to write stream Changlong Xie
2016-03-25  6:44 ` [PATCH v13 11/26] tools/libxl: add back channel support to read stream Changlong Xie
2016-03-25  6:44 ` [PATCH v13 12/26] secondary vm suspend/resume/checkpoint code Changlong Xie
2016-03-30 14:07   ` Ian Jackson
2016-03-25  6:44 ` [PATCH v13 13/26] libxl_internal: move stream read manipulations to right place Changlong Xie
2016-03-25  6:44 ` [PATCH v13 14/26] primary vm suspend/resume/checkpoint code Changlong Xie
2016-03-30 14:10   ` Ian Jackson
2016-03-25  6:44 ` [PATCH v13 15/26] libxc/restore: support COLO restore Changlong Xie
2016-03-25  6:44 ` [PATCH v13 16/26] libxc/save: support COLO save Changlong Xie
2016-03-25  6:44 ` [PATCH v13 17/26] implement the cmdline for COLO Changlong Xie
2016-03-25  6:44 ` [PATCH v13 18/26] COLO: introduce new API to prepare/start/do/get_error/stop replication Changlong Xie
2016-03-25  6:44 ` [PATCH v13 19/26] Introduce COLO mode and refactor relevant function Changlong Xie
2016-03-25  6:44 ` [PATCH v13 20/26] Support colo mode for qemu disk Changlong Xie
2016-03-28  3:46   ` [PATCH v13.1 " Changlong Xie
2016-03-30 14:17     ` Ian Jackson
2016-03-30 14:36     ` Ian Jackson
2016-03-25  6:44 ` [PATCH v13 21/26] COLO: use qemu block replication Changlong Xie
2016-03-25  6:44 ` [PATCH v13 22/26] COLO proxy: implement setup/teardown/preresume/postresume/checkpoint Changlong Xie
2016-03-25  6:44 ` [PATCH v13 23/26] COLO nic: implement COLO nic subkind Changlong Xie
2016-03-25 12:56   ` Wei Liu
2016-03-28  3:46   ` [PATCH v13.1 " Changlong Xie
2016-03-30 14:22     ` Ian Jackson
2016-03-30 14:38     ` Ian Jackson
2016-03-30 14:40       ` Ian Jackson
2016-03-25  6:44 ` [PATCH v13 24/26] setup and control colo proxy on primary side Changlong Xie
2016-03-25  6:44 ` [PATCH v13 25/26] setup and control colo proxy on secondary side Changlong Xie
2016-03-30 14:24   ` Ian Jackson
2016-03-31  2:19     ` Changlong Xie
2016-03-25  6:44 ` [PATCH v13 26/26] cmdline switches and config vars to control colo-proxy Changlong Xie
2016-03-28  3:47   ` [PATCH v13.1 " Changlong Xie
2016-03-30 14:28     ` Ian Jackson
2016-03-30 14:42     ` Ian Jackson
2016-03-25 15:51 ` [PATCH v13 00/26] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Wei Liu
2016-03-28  3:52   ` Changlong Xie
2016-03-30 14:52     ` Ian Jackson
2016-03-30 14:50 ` Ian Jackson
2016-03-31  1:26   ` Wen Congyang
2016-03-31  2:28 ` Changlong Xie
2016-03-31 14:22   ` Wei Liu
2016-04-01  1:59     ` Changlong Xie
2016-04-01 13:47       ` Ian Jackson
2016-04-01 14:37         ` Changlong Xie
