From: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
To: xen devel <xen-devel@lists.xen.org>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Ian Campbell <ian.campbell@citrix.com>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	Wei Liu <wei.liu2@citrix.com>
Cc: Lars Kurth <lars.kurth@citrix.com>,
	Changlong Xie <xiecl.fnst@cn.fujitsu.com>,
	Wen Congyang <wency@cn.fujitsu.com>,
	Li Zhijian <lizhijian@cn.fujitsu.com>,
	Gui Jianfeng <guijianfeng@cn.fujitsu.com>,
	Jiang Yunhong <yunhong.jiang@intel.com>,
	Dong Eddie <eddie.dong@intel.com>,
	Anthony Perard <anthony.perard@citrix.com>,
	Shriram Rajagopalan <rshriram@cs.ubc.ca>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	Yang Hongyang <hongyang.yang@easystack.cn>
Subject: [PATCH v12 05/26] tools/libx{l, c}: add back channel to libxc
Date: Wed, 23 Mar 2016 16:06:19 +0800
Message-ID: <1458720400-4699-6-git-send-email-xiecl.fnst@cn.fujitsu.com>
In-Reply-To: <1458720400-4699-1-git-send-email-xiecl.fnst@cn.fujitsu.com>

From: Wen Congyang <wency@cn.fujitsu.com>

In COLO mode, both VMs are running, and are considered in sync if the
visible network traffic is identical.  After some time, they fall out of
sync.

At this point, the two VMs have definitely diverged.  Let's call the
primary's dirty bitmap set A and the secondary's dirty bitmap set B.

Sets A and B are different.

Under normal migration, the page data for set A will be sent from the
primary to the secondary.

However, the set difference B - A (the pages in B but not in A; let's
call this C) is out of date on the secondary (with respect to the
primary) and will not be sent by the primary to the secondary, as these
pages were not dirtied by the primary. The secondary needs the page data
for C to reconstruct an exact copy of the primary at the checkpoint.

The secondary cannot calculate C as it doesn't know A.  Instead, the
secondary must send B to the primary, at which point the primary
calculates the union of A and B (let's call this D), which is all the
pages dirtied by either the primary or the secondary, and sends all page
data covered by D.

In the general case, D is a superset of both A and B.  Without the
backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid
copy of the primary.
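
The set arithmetic above can be sketched on plain bitmaps. This is only
an illustration: the helper names below are hypothetical and are not
libxc functions, and each bit is assumed to stand for one guest page
frame. It shows C = B - A (what the secondary is missing) and
D = A | B (what the primary sends once it has received B).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of bits held by one bitmap word. */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: mark page frame 'pfn' as dirty in 'bm'. */
static void set_bit_ul(unsigned long *bm, size_t pfn)
{
    assert(bm);
    bm[pfn / BITS_PER_WORD] |= 1UL << (pfn % BITS_PER_WORD);
}

/* Hypothetical helper: return 1 if 'pfn' is marked dirty in 'bm'. */
static int test_bit_ul(const unsigned long *bm, size_t pfn)
{
    assert(bm);
    return (int)((bm[pfn / BITS_PER_WORD] >> (pfn % BITS_PER_WORD)) & 1UL);
}

/* D = A | B: every page dirtied by either side. */
static void bitmap_or(unsigned long *dst, const unsigned long *a,
                      const unsigned long *b, size_t nr_words)
{
    for (size_t i = 0; i < nr_words; i++)
        dst[i] = a[i] | b[i];
}

/* C = B - A: pages dirtied only by the secondary. */
static void bitmap_andnot(unsigned long *dst, const unsigned long *b,
                          const unsigned long *a, size_t nr_words)
{
    for (size_t i = 0; i < nr_words; i++)
        dst[i] = b[i] & ~a[i];
}
```

With A = {1, 5} and B = {5, 70}, C contains only page 70 and D contains
pages 1, 5 and 70. Sending A alone would leave page 70 stale on the
secondary.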

We transfer the dirty bitmap on the libxc side, so we need to introduce
a back channel to libxc.

Note: this differs from the paper. We changed the original design to
the current one because of the following concerns:
1. The original design needs extra memory on the secondary host. When
   there are multiple backups on one host, the memory cost is high.
2. The memory cache code would be another 1k+, which would make the
   review more time consuming.

Note: this patch merely adds new parameters to various prototypes and
functions. The new parameters are used in a later patch called
"libxc/restore: send dirty pfn list to primary when checkpoint under
COLO".

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxc/include/xenguest.h   |  4 ++--
 tools/libxc/xc_nomigrate.c       |  4 ++--
 tools/libxc/xc_sr_restore.c      |  2 +-
 tools/libxc/xc_sr_save.c         |  2 +-
 tools/libxl/libxl_save_callout.c | 21 +++++++++++++++------
 tools/libxl/libxl_save_helper.c  |  8 ++++++--
 6 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 4f0b06e..b4f4bfb 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -93,7 +93,7 @@ typedef enum {
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
                    uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
                    struct save_callbacks* callbacks, int hvm,
-                   xc_migration_stream_t stream_type);
+                   xc_migration_stream_t stream_type, int recv_fd);
 
 /* callbacks provided by xc_domain_restore */
 struct restore_callbacks {
@@ -132,7 +132,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
                       unsigned long *console_mfn, domid_t console_domid,
                       unsigned int hvm, unsigned int pae, int superpages,
                       xc_migration_stream_t stream_type,
-                      struct restore_callbacks *callbacks);
+                      struct restore_callbacks *callbacks, int send_back_fd);
 
 /**
  * This function will create a domain for a paravirtualized Linux
diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c
index 08e1f8c..15c838f 100644
--- a/tools/libxc/xc_nomigrate.c
+++ b/tools/libxc/xc_nomigrate.c
@@ -23,7 +23,7 @@
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
                    uint32_t max_factor, uint32_t flags,
                    struct save_callbacks* callbacks, int hvm,
-                   xc_migration_stream_t stream_type)
+                   xc_migration_stream_t stream_type, int recv_fd)
 {
     errno = ENOSYS;
     return -1;
@@ -35,7 +35,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
                       unsigned long *console_mfn, domid_t console_domid,
                       unsigned int hvm, unsigned int pae, int superpages,
                       xc_migration_stream_t stream_type,
-                      struct restore_callbacks *callbacks)
+                      struct restore_callbacks *callbacks, int send_back_fd)
 {
     errno = ENOSYS;
     return -1;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 819401d..2b9a0ea 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -726,7 +726,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
                       unsigned long *console_gfn, domid_t console_domid,
                       unsigned int hvm, unsigned int pae, int superpages,
                       xc_migration_stream_t stream_type,
-                      struct restore_callbacks *callbacks)
+                      struct restore_callbacks *callbacks, int send_back_fd)
 {
     struct xc_sr_context ctx =
         {
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 388ae7f..1ccdbbb 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -830,7 +830,7 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
                    uint32_t max_iters, uint32_t max_factor, uint32_t flags,
                    struct save_callbacks* callbacks, int hvm,
-                   xc_migration_stream_t stream_type)
+                   xc_migration_stream_t stream_type, int recv_fd)
 {
     struct xc_sr_context ctx =
         {
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 06967df..f15c235 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -27,7 +27,7 @@
  */
 static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
                        const char *mode_arg,
-                       int stream_fd,
+                       int stream_fd, int back_channel_fd,
                        const int *preserve_fds, int num_preserve_fds,
                        const unsigned long *argnums, int num_argnums);
 
@@ -50,6 +50,7 @@ void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
     /* Convenience aliases */
     const uint32_t domid = dcs->guest_domid;
     const int restore_fd = dcs->libxc_fd;
+    const int send_back_fd = dcs->send_back_fd;
     libxl__domain_build_state *const state = &dcs->build_state;
 
     unsigned cbflags =
@@ -71,7 +72,7 @@ void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
     shs->caller_state = dcs;
     shs->need_results = 1;
 
-    run_helper(egc, shs, "--restore-domain", restore_fd, 0, 0,
+    run_helper(egc, shs, "--restore-domain", restore_fd, send_back_fd, 0, 0,
                argnums, ARRAY_SIZE(argnums));
 }
 
@@ -95,7 +96,7 @@ void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_save_state *dss,
     shs->caller_state = dss;
     shs->need_results = 0;
 
-    run_helper(egc, shs, "--save-domain", dss->fd,
+    run_helper(egc, shs, "--save-domain", dss->fd, dss->recv_fd,
                NULL, 0,
                argnums, ARRAY_SIZE(argnums));
     return;
@@ -141,12 +142,14 @@ static int dup_cloexec(libxl__gc *gc, int fd, const char *what)
  * 1) Path to libxl-save-helper.
  * 2) --[restore|save]-domain.
  * 3) stream file descriptor.
+ * 4) back channel file descriptor.
  * n) save/restore specific parameters.
- * 4) A \0 at the end.
+ * 5) A \0 at the end.
  */
-#define HELPER_NR_ARGS 4
+#define HELPER_NR_ARGS 5
 static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
-                       const char *mode_arg, int stream_fd,
+                       const char *mode_arg,
+                       int stream_fd, int back_channel_fd,
                        const int *preserve_fds, int num_preserve_fds,
                        const unsigned long *argnums, int num_argnums)
 {
@@ -179,6 +182,7 @@ static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
     *arg++ = getenv("LIBXL_SAVE_HELPER") ?: LIBEXEC_BIN "/" "libxl-save-helper";
     *arg++ = mode_arg;
     const char **stream_fd_arg = arg++;
+    const char **back_channel_fd_arg = arg++;
     for (i=0; i<num_argnums; i++)
         *arg++ = GCSPRINTF("%lu", argnums[i]);
     *arg++ = 0;
@@ -206,6 +210,11 @@ static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
         stream_fd = dup_cloexec(gc, stream_fd, "migration stream fd");
         *stream_fd_arg = GCSPRINTF("%d", stream_fd);
 
+        if (back_channel_fd >= 0)
+            back_channel_fd = dup_cloexec(gc, back_channel_fd,
+                                          "migration back channel fd");
+        *back_channel_fd_arg = GCSPRINTF("%d", back_channel_fd);
+
         for (i=0; i<num_preserve_fds; i++)
             if (preserve_fds[i] >= 0) {
                 assert(preserve_fds[i] > 2);
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index 0fd7022..5fe642a 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -238,6 +238,7 @@ static struct restore_callbacks helper_restore_callbacks;
 int main(int argc, char **argv)
 {
     int r;
+    int send_back_fd, recv_fd;
 
 #define NEXTARG (++argv, assert(*argv), *argv)
 
@@ -247,6 +248,7 @@ int main(int argc, char **argv)
     if (!strcmp(mode,"--save-domain")) {
 
         io_fd =                             atoi(NEXTARG);
+        recv_fd =                           atoi(NEXTARG);
         uint32_t dom =                      strtoul(NEXTARG,0,10);
         uint32_t max_iters =                strtoul(NEXTARG,0,10);
         uint32_t max_factor =               strtoul(NEXTARG,0,10);
@@ -262,12 +264,14 @@ int main(int argc, char **argv)
         setup_signals(save_signal_handler);
 
         r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags,
-                           &helper_save_callbacks, hvm, stream_type);
+                           &helper_save_callbacks, hvm, stream_type,
+                           recv_fd);
         complete(r);
 
     } else if (!strcmp(mode,"--restore-domain")) {
 
         io_fd =                             atoi(NEXTARG);
+        send_back_fd =                      atoi(NEXTARG);
         uint32_t dom =                      strtoul(NEXTARG,0,10);
         unsigned store_evtchn =             strtoul(NEXTARG,0,10);
         domid_t store_domid =               strtoul(NEXTARG,0,10);
@@ -292,7 +296,7 @@ int main(int argc, char **argv)
                               store_domid, console_evtchn, &console_mfn,
                               console_domid, hvm, pae, superpages,
                               stream_type,
-                              &helper_restore_callbacks);
+                              &helper_restore_callbacks, send_back_fd);
         helper_stub_restore_results(store_mfn,console_mfn,0);
         complete(r);
 
-- 
1.9.3
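
For readers following the argv change, here is a minimal standalone
sketch of the new fixed argument order (helper path, mode, stream fd,
back channel fd). It is not the libxl code: the struct and function
names are hypothetical, and the real helper consumes its arguments with
the NEXTARG macro. Like the patch, it formats a negative back channel fd
as-is, so "no back channel" reaches the helper as a negative number.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the two descriptors the helper receives. */
struct helper_fds {
    int io_fd;            /* main migration stream */
    int back_channel_fd;  /* negative when no back channel is in use */
};

/*
 * Fill in the fixed leading arguments in the order the patch documents.
 * 'fdbuf' provides storage for the two formatted descriptors.
 * Returns the number of entries written before the terminating NULL.
 */
static int build_fixed_args(const char **argv, char fdbuf[2][16],
                            const char *helper_path, const char *mode,
                            int stream_fd, int back_channel_fd)
{
    int n = 0;

    snprintf(fdbuf[0], sizeof(fdbuf[0]), "%d", stream_fd);
    snprintf(fdbuf[1], sizeof(fdbuf[1]), "%d", back_channel_fd);

    argv[n++] = helper_path;
    argv[n++] = mode;
    argv[n++] = fdbuf[0];
    argv[n++] = fdbuf[1];
    argv[n] = NULL;
    return n;
}

/* Parse the same fixed arguments back; returns 0 on success, -1 on error. */
static int parse_fixed_args(const char *const *argv, struct helper_fds *out)
{
    if (!argv[0] || !argv[1] || !argv[2] || !argv[3])
        return -1;
    if (strcmp(argv[1], "--save-domain") != 0 &&
        strcmp(argv[1], "--restore-domain") != 0)
        return -1;

    out->io_fd = atoi(argv[2]);
    out->back_channel_fd = atoi(argv[3]);
    return 0;
}
```

In this sketch, building the arguments for "--save-domain" with stream
fd 7 and no back channel yields the strings "7" and "-1" in positions
2 and 3, and parsing them returns the same two integers.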



