* [PATCH 00/10 V7] Remus/Libxl: Network buffering support
@ 2014-02-10  9:19   ` Lai Jiangshan
  2014-02-10  9:19     ` [PATCH 01/10 V7] remus: add libnl3 dependency to autoconf scripts Lai Jiangshan
                       ` (11 more replies)
  0 siblings, 12 replies; 89+ messages in thread
From: Lai Jiangshan @ 2014-02-10  9:19 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, Lai Jiangshan,
	Dong Eddie, Shriram Rajagopalan, Roger Pau Monne

This patch series adds support for network buffering in the Remus
codebase in libxl. 

Changes in V7:
  Applied the comments (by IanJ) that were previously missed.
  Applied Shriram's comments.

  Merged the tangled netbuffer setup/teardown code into one patch.
  (2/6/8 in V6 => 5 in V7; 9/10 in V6 => 7 in V7)

Changes in V6:
  Applied Ian Jackson's comments on the V5 series.
  [PATCH 2/4 V5] is split into smaller patches, one per piece of
  functionality.

  [PATCH 4/4 V5] --> [PATCH 13/13]; netbuffer is enabled by default.

Changes in V5:

Merge hotplug script patch (2/5) and hotplug script setup/teardown
patch (3/5) into a single patch.

Changes in V4:

[1/5] Remove check for libnl command line utils in autoconf checks

[2/5] minor nits

[3/5] define LIBXL_HAVE_REMUS_NETBUF in libxl.h

[4/5] clean ups. Make the usleep in checkpoint callback asynchronous

[5/5] minor nits

Changes in V3:
[1/5] Fix redundant checks in configure scripts
      (based on Ian Campbell's suggestions)

[2/5] Introduce locking in the script, during IFB setup.
      Add xenstore paths used by netbuf scripts
      to xenstore-paths.markdown

[3/5] Hotplug scripts setup/teardown invocations are now asynchronous
      following IanJ's feedback.  However, the invocations are still
      sequential. 

[5/5] Allow per-domain specification of netbuffer scripts in the xl remus
      command.

And minor nits throughout the series based on feedback from
the last version

Changes in V2:
[1/5] Configure script will automatically enable/disable network
      buffer support depending on the availability of the appropriate
      libnl3 version. [If libnl3 is unavailable, a warning message will be
      printed to let the user know that the feature has been disabled.]

      use macros from pkg.m4 instead of pkg-config commands
      removed redundant checks for libnl3 libraries.

[3,4/5] - Minor nits.

Version 1:

[1/5] Changes to autoconf scripts to check for libnl3. Add linker flags
      to libxl Makefile.

[2/5] External script to setup/teardown network buffering using libnl3's
      CLI. This script will be invoked by libxl before starting Remus.
      The script's main job is to bring up an IFB device with plug qdisc
      attached to it.  It then re-routes egress traffic from the guest's
      vif to the IFB device.

[3/5] Libxl code to invoke the external setup script, followed by netlink
      related setup to obtain a handle on the output buffers attached
      to each vif.

[4/5] Libxl interaction with network buffer module in the kernel via
      libnl3 API.

[5/5] xl cmdline switch to explicitly enable network buffering when
      starting remus.


  A few things to note (by Shriram):

    a) Based on previous email discussions, the setup/teardown task has
    been moved to a hotplug style shell script which can be customized as
    desired, instead of implementing it as C code inside libxl.

    b) Libnl3 is not available on NetBSD. Nor is it available on CentOS
    (Linux).  So I have made network buffering support an optional feature
    so that it can be disabled if desired.

    c) NetBSD does not have libnl3. So I have put the setup script under
    the tools/hotplug/Linux folder.

thanks
Lai



Shriram Rajagopalan (8):
  remus: add libnl3 dependency to autoconf scripts
  tools/libxl: update libxl_domain_remus_info
  tools/libxl: introduce a new structure libxl__remus_state
  remus: introduce a function to check whether network buffering is
    enabled
  remus: Remus network buffering core and APIs to setup/teardown
  remus: implement the APIs to buffer/release packages
  libxl: use the APIs to setup/teardown network buffering
  libxl: rename remus_failover_cb() to remus_replication_failure_cb()
  libxl: control network buffering in remus callbacks
  libxl: network buffering cmdline switch

 README                                 |    4 +
 config/Tools.mk.in                     |    3 +
 docs/man/xl.conf.pod.5                 |    6 +
 docs/man/xl.pod.1                      |   11 +-
 docs/misc/xenstore-paths.markdown      |    4 +
 tools/configure.ac                     |   15 +
 tools/hotplug/Linux/Makefile           |    1 +
 tools/hotplug/Linux/remus-netbuf-setup |  183 +++++++++++
 tools/libxl/Makefile                   |   11 +
 tools/libxl/libxl.c                    |   48 ++-
 tools/libxl/libxl.h                    |   13 +
 tools/libxl/libxl_dom.c                |  118 ++++++--
 tools/libxl/libxl_internal.h           |   54 +++-
 tools/libxl/libxl_netbuffer.c          |  561 ++++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c        |   56 ++++
 tools/libxl/libxl_remus.c              |   64 ++++
 tools/libxl/libxl_types.idl            |    2 +
 tools/libxl/xl.c                       |    4 +
 tools/libxl/xl.h                       |    1 +
 tools/libxl/xl_cmdimpl.c               |   28 ++-
 tools/libxl/xl_cmdtable.c              |    3 +
 tools/remus/README                     |    6 +
 22 files changed, 1155 insertions(+), 41 deletions(-)
 create mode 100644 tools/hotplug/Linux/remus-netbuf-setup
 create mode 100644 tools/libxl/libxl_netbuffer.c
 create mode 100644 tools/libxl/libxl_nonetbuffer.c
 create mode 100644 tools/libxl/libxl_remus.c


* [PATCH 01/10 V7] remus: add libnl3 dependency to autoconf scripts
  2014-02-10  9:19   ` [PATCH 00/10 V7] Remus/Libxl: Network buffering support Lai Jiangshan
@ 2014-02-10  9:19     ` Lai Jiangshan
  2014-02-10  9:19     ` [PATCH 02/10 V7] tools/libxl: update libxl_domain_remus_info Lai Jiangshan
                       ` (10 subsequent siblings)
  11 siblings, 0 replies; 89+ messages in thread
From: Lai Jiangshan @ 2014-02-10  9:19 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, Dong Eddie,
	Shriram Rajagopalan, Roger Pau Monne

From: Shriram Rajagopalan <rshriram@cs.ubc.ca>

Libnl3 is required for controlling Remus network buffering.
This patch adds a dependency on libnl3 (>= 3.2.8) to the autoconf scripts.
It also makes it possible to configure the tools without libnl3, in which
case network buffering support is disabled.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 README               |    4 ++++
 config/Tools.mk.in   |    3 +++
 tools/configure.ac   |   15 +++++++++++++++
 tools/libxl/Makefile |    2 ++
 tools/remus/README   |    6 ++++++
 5 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/README b/README
index 4148a26..7bb25fb 100644
--- a/README
+++ b/README
@@ -72,6 +72,10 @@ disabled at compile time:
     * cmake (if building vtpm stub domains)
     * markdown
     * figlet (for generating the traditional Xen start of day banner)
+    * Development install of libnl3 (e.g., libnl-3-200,
+      libnl-3-dev, etc).  Required if network buffering is desired
+      when using Remus with libxl.  See tools/remus/README for detailed
+      information.
 
 Second, you need to acquire a suitable kernel for use in domain 0. If
 possible you should use a kernel provided by your OS distributor. If
diff --git a/config/Tools.mk.in b/config/Tools.mk.in
index d9d3239..81802b3 100644
--- a/config/Tools.mk.in
+++ b/config/Tools.mk.in
@@ -38,6 +38,8 @@ PTHREAD_LIBS        := @PTHREAD_LIBS@
 
 PTYFUNCS_LIBS       := @PTYFUNCS_LIBS@
 
+LIBNL3_LIBS         := @LIBNL3_LIBS@
+LIBNL3_CFLAGS       := @LIBNL3_CFLAGS@
 # Download GIT repositories via HTTP or GIT's own protocol?
 # GIT's protocol is faster and more robust, when it works at all (firewalls
 # may block it). We make it the default, but if your GIT repository downloads
@@ -56,6 +58,7 @@ CONFIG_QEMU_TRAD    := @qemu_traditional@
 CONFIG_QEMU_XEN     := @qemu_xen@
 CONFIG_XEND         := @xend@
 CONFIG_BLKTAP1      := @blktap1@
+CONFIG_REMUS_NETBUF := @remus_netbuf@
 
 #System options
 ZLIB                := @zlib@
diff --git a/tools/configure.ac b/tools/configure.ac
index 0754f0e..f95956d 100644
--- a/tools/configure.ac
+++ b/tools/configure.ac
@@ -236,6 +236,21 @@ esac
 # Checks for header files.
 AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h])
 
+# Check for libnl3 >=3.2.8. If present enable remus network buffering.
+PKG_CHECK_MODULES(LIBNL3, [libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8],
+		[libnl3_lib="y"], [libnl3_lib="n"])
+
+AS_IF([test "x$libnl3_lib" = "xn" ], [
+	    AC_MSG_WARN([Disabling support for Remus network buffering.
+	    Please install libnl3 libraries, command line tools and devel
+	    headers - version 3.2.8 or higher])
+	    AC_SUBST(remus_netbuf, [n])
+	    ],[
+	    AC_SUBST(LIBNL3_LIBS)
+	    AC_SUBST(LIBNL3_CFLAGS)
+	    AC_SUBST(remus_netbuf, [y])
+])
+
 AC_OUTPUT()
 
 AS_IF([test "x$xend" = "xy" ], [
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index d8495bb..da27c84 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -21,11 +21,13 @@ endif
 
 LIBXL_LIBS =
 LIBXL_LIBS = $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(LDLIBS_libxenstore) $(LDLIBS_libblktapctl) $(PTYFUNCS_LIBS) $(LIBUUID_LIBS)
+LIBXL_LIBS += $(LIBNL3_LIBS)
 
 CFLAGS_LIBXL += $(CFLAGS_libxenctrl)
 CFLAGS_LIBXL += $(CFLAGS_libxenguest)
 CFLAGS_LIBXL += $(CFLAGS_libxenstore)
 CFLAGS_LIBXL += $(CFLAGS_libblktapctl) 
+CFLAGS_LIBXL += $(LIBNL3_CFLAGS)
 CFLAGS_LIBXL += -Wshadow
 
 LIBXL_LIBS-$(CONFIG_ARM) += -lfdt
diff --git a/tools/remus/README b/tools/remus/README
index 9e8140b..4736252 100644
--- a/tools/remus/README
+++ b/tools/remus/README
@@ -2,3 +2,9 @@ Remus provides fault tolerance for virtual machines by sending continuous
 checkpoints to a backup, which will activate if the target VM fails.
 
 See the website at http://nss.cs.ubc.ca/remus/ for details.
+
+Using Remus with libxl on Xen 4.4 and higher:
+ To enable network buffering, you need libnl 3.2.8
+ or higher along with the development headers and command line utilities.
+ If your distro does not have the appropriate libnl3 version, you can find
+ the latest source tarball of libnl3 at http://www.carisma.slowglass.com/~tgr/libnl/
-- 
1.7.1


* [PATCH 02/10 V7] tools/libxl: update libxl_domain_remus_info
  2014-02-10  9:19   ` [PATCH 00/10 V7] Remus/Libxl: Network buffering support Lai Jiangshan
  2014-02-10  9:19     ` [PATCH 01/10 V7] remus: add libnl3 dependency to autoconf scripts Lai Jiangshan
@ 2014-02-10  9:19     ` Lai Jiangshan
  2014-03-03 16:33       ` Ian Jackson
  2014-02-10  9:19     ` [PATCH 03/10 V7] tools/libxl: introduce a new structure libxl__remus_state Lai Jiangshan
                       ` (9 subsequent siblings)
  11 siblings, 1 reply; 89+ messages in thread
From: Lai Jiangshan @ 2014-02-10  9:19 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, Lai Jiangshan,
	Dong Eddie, Shriram Rajagopalan, Roger Pau Monne

From: Shriram Rajagopalan <rshriram@cs.ubc.ca>

Add two members:
1. netbuf: whether netbuf is enabled
2. netbufscript: the path of the script which will be run to set up
     and tear down the guest's interface.
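
As a rough illustration of how a caller might use the two new fields, the
sketch below guards on the feature macro and fills in a stand-in structure.
The struct `remus_info_sketch` and the helper `request_netbuf()` are
hypothetical names invented for this example; only the field names and
`LIBXL_HAVE_REMUS_NETBUF` are taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Defined here only so the sketch compiles on its own; a real caller
 * would get this macro from libxl.h. */
#define LIBXL_HAVE_REMUS_NETBUF 1

/* Hypothetical stand-in for libxl_domain_remus_info. */
struct remus_info_sketch {
    int interval;
    bool blackhole;
    bool compression;
    bool netbuf;              /* new: enable network buffering */
    const char *netbufscript; /* new: script path, NULL for the default */
};

/* Hypothetical helper: request network buffering when the feature macro
 * is present. Returns true if the request was recorded. */
static bool request_netbuf(struct remus_info_sketch *info, const char *script)
{
    assert(info != NULL);
#ifdef LIBXL_HAVE_REMUS_NETBUF
    info->netbuf = true;
    info->netbufscript = script;
    return true;
#else
    (void)script;
    return false;
#endif
}
```

Passing NULL as the script leaves the choice of script to libxl, which is
how the later patches in this series treat an unset netbufscript.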

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl.h         |   13 +++++++++++++
 tools/libxl/libxl_types.idl |    2 ++
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 12d6c31..d89ad0a 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -409,6 +409,19 @@
  */
 #define LIBXL_HAVE_DRIVER_DOMAIN_CREATION 1
 
+/*
+ * LIBXL_HAVE_REMUS_NETBUF 1
+ *
+ * If this is defined, then the libxl_domain_remus_info structure will
+ * have a boolean field (netbuf) and a string field (netbufscript).
+ *
+ * netbuf, if true, indicates that network buffering should be enabled.
+ *
+ * netbufscript, if set, indicates the path to the hotplug script to
+ * setup or teardown network buffers.
+ */
+#define LIBXL_HAVE_REMUS_NETBUF 1
+
 /* Functions annotated with LIBXL_EXTERNAL_CALLERS_ONLY may not be
  * called from within libxl itself. Callers outside libxl, who
  * do not #include libxl_internal.h, are fine. */
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 649ce50..e49945a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -561,6 +561,8 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
     ("interval",     integer),
     ("blackhole",    bool),
     ("compression",  bool),
+    ("netbuf",       bool),
+    ("netbufscript", string),
     ])
 
 libxl_event_type = Enumeration("event_type", [
-- 
1.7.1


* [PATCH 03/10 V7] tools/libxl: introduce a new structure libxl__remus_state
  2014-02-10  9:19   ` [PATCH 00/10 V7] Remus/Libxl: Network buffering support Lai Jiangshan
  2014-02-10  9:19     ` [PATCH 01/10 V7] remus: add libnl3 dependency to autoconf scripts Lai Jiangshan
  2014-02-10  9:19     ` [PATCH 02/10 V7] tools/libxl: update libxl_domain_remus_info Lai Jiangshan
@ 2014-02-10  9:19     ` Lai Jiangshan
  2014-03-03 16:38       ` Ian Jackson
  2014-02-10  9:19     ` [PATCH 04/10 V7] remus: introduce a function to check whether network buffering is enabled Lai Jiangshan
                       ` (8 subsequent siblings)
  11 siblings, 1 reply; 89+ messages in thread
From: Lai Jiangshan @ 2014-02-10  9:19 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, Lai Jiangshan,
	Dong Eddie, Shriram Rajagopalan, Roger Pau Monne

From: Shriram Rajagopalan <rshriram@cs.ubc.ca>

libxl_domain_remus_info contains only the arguments of the 'xl remus'
command, so introduce a new structure, libxl__remus_state, to hold the
Remus state.
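
The copy from caller arguments into internal state, including the fallback
to a default script path, can be sketched roughly as follows. All `sketch_*`
names are hypothetical, the fixed-size buffer replaces libxl's gc-allocated
strings, and `script_dir` stands in for the value that
libxl__xen_script_dir_path() returns in the real code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the caller-supplied arguments. */
struct sketch_remus_args {
    int interval;
    int compression;
    int netbuf;
    const char *netbufscript;   /* may be NULL */
};

/* Hypothetical stand-in for the internal Remus state. */
struct sketch_remus_state {
    int interval;
    int compression;
    char netbufscript[256];     /* empty when buffering is off */
};

/* Copy the arguments into the state and pick the script to run:
 * the caller's script if given, else <script_dir>/remus-netbuf-setup. */
static void sketch_state_init(struct sketch_remus_state *st,
                              const struct sketch_remus_args *args,
                              const char *script_dir)
{
    assert(st && args && script_dir);

    memset(st, 0, sizeof(*st));
    st->interval = args->interval;
    st->compression = args->compression;

    if (!args->netbuf)
        return;

    if (args->netbufscript)
        snprintf(st->netbufscript, sizeof(st->netbufscript),
                 "%s", args->netbufscript);
    else
        snprintf(st->netbufscript, sizeof(st->netbufscript),
                 "%s/remus-netbuf-setup", script_dir);
}
```

Unlike the patch, this sketch silently truncates an over-long path; it is
meant only to show the order of the decisions.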

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl.c          |   25 +++++++++++++++++++++++--
 tools/libxl/libxl_dom.c      |   12 ++++--------
 tools/libxl/libxl_internal.h |   22 ++++++++++++++++++++--
 3 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 2845ca4..25af816 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -729,11 +729,32 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     dss->type = type;
     dss->live = 1;
     dss->debug = 0;
-    dss->remus = info;
 
     assert(info);
 
-    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
+    GCNEW(dss->remus_state);
+
+    /* convenience shorthand */
+    libxl__remus_state *remus_state = dss->remus_state;
+    remus_state->blackhole = info->blackhole;
+    remus_state->interval = info->interval;
+    remus_state->compression = info->compression;
+    remus_state->dss = dss;
+    libxl__ev_child_init(&remus_state->child);
+
+    /* TODO: enable disk buffering */
+
+    /* Setup network buffering */
+    if (info->netbuf) {
+        if (info->netbufscript) {
+            remus_state->netbufscript =
+                libxl__strdup(gc, info->netbufscript);
+        } else {
+            remus_state->netbufscript =
+                GCSPRINTF("%s/remus-netbuf-setup",
+                          libxl__xen_script_dir_path());
+        }
+    }
 
     /* Point of no return */
     libxl__domain_suspend(egc, dss);
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 55f74b2..8d63f90 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1290,7 +1290,7 @@ static void remus_checkpoint_dm_saved(libxl__egc *egc,
     /* REMUS TODO: Wait for disk and memory ack, release network buffer */
     /* REMUS TODO: make this asynchronous */
     assert(!rc); /* REMUS TODO handle this error properly */
-    usleep(dss->interval * 1000);
+    usleep(dss->remus_state->interval * 1000);
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);
 }
 
@@ -1308,7 +1308,6 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
     const libxl_domain_type type = dss->type;
     const int live = dss->live;
     const int debug = dss->debug;
-    const libxl_domain_remus_info *const r_info = dss->remus;
     libxl__srm_save_autogen_callbacks *const callbacks =
         &dss->shs.callbacks.save.a;
 
@@ -1343,11 +1342,8 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
     dss->guest_responded = 0;
     dss->dm_savefile = libxl__device_model_savefile(gc, domid);
 
-    if (r_info != NULL) {
-        dss->interval = r_info->interval;
-        if (r_info->compression)
-            dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
-    }
+    if (dss->remus_state && dss->remus_state->compression)
+        dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
 
     dss->xce = xc_evtchn_open(NULL, 0);
     if (dss->xce == NULL)
@@ -1366,7 +1362,7 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
     }
 
     memset(callbacks, 0, sizeof(*callbacks));
-    if (r_info != NULL) {
+    if (dss->remus_state != NULL) {
         callbacks->suspend = libxl__remus_domain_suspend_callback;
         callbacks->postcopy = libxl__remus_domain_resume_callback;
         callbacks->checkpoint = libxl__remus_domain_checkpoint_callback;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 1bd23ff..9970780 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2292,6 +2292,25 @@ typedef struct libxl__logdirty_switch {
     libxl__ev_time timeout;
 } libxl__logdirty_switch;
 
+typedef struct libxl__remus_state {
+    /* filled by the user */
+    /* checkpoint interval */
+    int interval;
+    int blackhole;
+    int compression;
+    /* Script to setup/teardown network buffers */
+    const char *netbufscript;
+    libxl__domain_suspend_state *dss;
+
+    /* private */
+    int saved_rc;
+    int dev_id;
+    /* Opaque context containing network buffer related stuff */
+    void *netbuf_state;
+    libxl__ev_time timeout;
+    libxl__ev_child child;
+} libxl__remus_state;
+
 struct libxl__domain_suspend_state {
     /* set by caller of libxl__domain_suspend */
     libxl__ao *ao;
@@ -2302,7 +2321,7 @@ struct libxl__domain_suspend_state {
     libxl_domain_type type;
     int live;
     int debug;
-    const libxl_domain_remus_info *remus;
+    libxl__remus_state *remus_state;
     /* private */
     xc_evtchn *xce; /* event channel handle */
     int suspend_eventchn;
@@ -2310,7 +2329,6 @@ struct libxl__domain_suspend_state {
     int xcflags;
     int guest_responded;
     const char *dm_savefile;
-    int interval; /* checkpoint interval (for Remus) */
     libxl__save_helper_state shs;
     libxl__logdirty_switch logdirty;
     /* private for libxl__domain_save_device_model */
-- 
1.7.1


* [PATCH 04/10 V7] remus: introduce a function to check whether network buffering is enabled
  2014-02-10  9:19   ` [PATCH 00/10 V7] Remus/Libxl: Network buffering support Lai Jiangshan
                       ` (2 preceding siblings ...)
  2014-02-10  9:19     ` [PATCH 03/10 V7] tools/libxl: introduce a new structure libxl__remus_state Lai Jiangshan
@ 2014-02-10  9:19     ` Lai Jiangshan
  2014-03-03 16:39       ` Ian Jackson
  2014-02-10  9:19     ` [PATCH 05/10 V7] remus: Remus network buffering core and APIs to setup/teardown Lai Jiangshan
                       ` (7 subsequent siblings)
  11 siblings, 1 reply; 89+ messages in thread
From: Lai Jiangshan @ 2014-02-10  9:19 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, Lai Jiangshan,
	Dong Eddie, Shriram Rajagopalan, Roger Pau Monne

From: Shriram Rajagopalan <rshriram@cs.ubc.ca>

libxl__netbuffer_enabled() returns 1 when network buffering support is
compiled in, and 0 when it is not.

If network buffering support is not compiled in and the user asks for it,
report an error and exit.
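
A minimal single-file sketch of the same idea is shown below. It uses a
preprocessor switch, whereas the patch itself selects between two source
files in the Makefile. `SKETCH_REMUS_NETBUF` and every `sketch_*` name are
assumptions made for this example and do not exist in libxl.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical build switch standing in for CONFIG_REMUS_NETBUF. */
#ifndef SKETCH_REMUS_NETBUF
#define SKETCH_REMUS_NETBUF 0
#endif

/* Plays the role of libxl__netbuffer_enabled(): 1 if compiled in, else 0. */
static int sketch_netbuffer_enabled(void)
{
    return SKETCH_REMUS_NETBUF ? 1 : 0;
}

/* Hypothetical request carrying only the flag of interest. */
struct sketch_request {
    int netbuf;
};

/* Returns 0 on success, -1 if buffering was requested but is unavailable. */
static int sketch_remus_start(const struct sketch_request *req)
{
    assert(req != NULL);

    if (req->netbuf && !sketch_netbuffer_enabled()) {
        fprintf(stderr, "Remus: No support for network buffering\n");
        return -1;
    }
    return 0;
}
```

Keeping the check behind a function, rather than an #ifdef at each call
site, lets the caller stay identical in both build configurations.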

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/Makefile            |    7 +++++++
 tools/libxl/libxl.c             |    5 +++++
 tools/libxl/libxl_internal.h    |    2 ++
 tools/libxl/libxl_netbuffer.c   |   31 +++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c |   31 +++++++++++++++++++++++++++++++
 5 files changed, 76 insertions(+), 0 deletions(-)
 create mode 100644 tools/libxl/libxl_netbuffer.c
 create mode 100644 tools/libxl/libxl_nonetbuffer.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index da27c84..84a467c 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -45,6 +45,13 @@ LIBXL_OBJS-y += libxl_blktap2.o
 else
 LIBXL_OBJS-y += libxl_noblktap2.o
 endif
+
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+LIBXL_OBJS-y += libxl_netbuffer.o
+else
+LIBXL_OBJS-y += libxl_nonetbuffer.o
+endif
+
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
 
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 25af816..026206a 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -746,6 +746,11 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 
     /* Setup network buffering */
     if (info->netbuf) {
+        if (!libxl__netbuffer_enabled(gc)) {
+            LOG(ERROR, "Remus: No support for network buffering");
+            goto out;
+        }
+
         if (info->netbufscript) {
             remus_state->netbufscript =
                 libxl__strdup(gc, info->netbufscript);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 9970780..2f64382 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2311,6 +2311,8 @@ typedef struct libxl__remus_state {
     libxl__ev_child child;
 } libxl__remus_state;
 
+_hidden int libxl__netbuffer_enabled(libxl__gc *gc);
+
 struct libxl__domain_suspend_state {
     /* set by caller of libxl__domain_suspend */
     libxl__ao *ao;
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
new file mode 100644
index 0000000..8e23d75
--- /dev/null
+++ b/tools/libxl/libxl_netbuffer.c
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2013
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+int libxl__netbuffer_enabled(libxl__gc *gc)
+{
+    return 1;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
new file mode 100644
index 0000000..6aa4bf1
--- /dev/null
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2013
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+int libxl__netbuffer_enabled(libxl__gc *gc)
+{
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.1


* [PATCH 05/10 V7] remus: Remus network buffering core and APIs to setup/teardown
  2014-02-10  9:19   ` [PATCH 00/10 V7] Remus/Libxl: Network buffering support Lai Jiangshan
                       ` (3 preceding siblings ...)
  2014-02-10  9:19     ` [PATCH 04/10 V7] remus: introduce a function to check whether network buffering is enabled Lai Jiangshan
@ 2014-02-10  9:19     ` Lai Jiangshan
  2014-03-03 17:44       ` Ian Jackson
  2014-02-10  9:19     ` [PATCH 06/10 V7] remus: implement the API to buffer/release packages Lai Jiangshan
                       ` (6 subsequent siblings)
  11 siblings, 1 reply; 89+ messages in thread
From: Lai Jiangshan @ 2014-02-10  9:19 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, Lai Jiangshan,
	Dong Eddie, Shriram Rajagopalan, Roger Pau Monne

From: Shriram Rajagopalan <rshriram@cs.ubc.ca>

This patch introduces the remus-netbuf-setup hotplug script, which is
responsible for setting up and tearing down the infrastructure required for
network output buffering in Remus.  The script is intended to be invoked
by libxl for each guest interface when starting or stopping Remus.

Apart from returning success/failure indication via the usual hotplug
entries in xenstore, this script also writes to xenstore, the name of
the IFB device to be used to control the vif's network output.

The script relies on libnl3 command line utilities to perform various
setup/teardown functions. The script is confined to Linux platforms only
since NetBSD does not seem to have libnl3.

The following steps are taken during setup:
 a) Call the hotplug script for each vif to set up its network buffer.

 b) Establish a dedicated remus context containing libnl related
    state (netlink sockets, qdisc caches, etc.).

 c) Obtain handles to the plug qdiscs installed on the IFB devices
    chosen by the hotplug scripts.

And during teardown, the netlink resources are released, followed by
invocation of hotplug scripts to remove the IFB devices.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 docs/misc/xenstore-paths.markdown      |    4 +
 tools/hotplug/Linux/Makefile           |    1 +
 tools/hotplug/Linux/remus-netbuf-setup |  183 ++++++++++++
 tools/libxl/Makefile                   |    2 +
 tools/libxl/libxl_dom.c                |    7 +-
 tools/libxl/libxl_internal.h           |   17 ++
 tools/libxl/libxl_netbuffer.c          |  481 ++++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c        |   11 +
 tools/libxl/libxl_remus.c              |   41 +++
 9 files changed, 742 insertions(+), 5 deletions(-)
 create mode 100644 tools/hotplug/Linux/remus-netbuf-setup
 create mode 100644 tools/libxl/libxl_remus.c

diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown
index 70ab7f4..7a0d2c9 100644
--- a/docs/misc/xenstore-paths.markdown
+++ b/docs/misc/xenstore-paths.markdown
@@ -385,6 +385,10 @@ The guest's virtual time offset from UTC in seconds.
 
 The device model version for a domain.
 
+#### /libxl/$DOMID/remus/netbuf/$DEVID/ifb = STRING [n,INTERNAL]
+
+IFB device used by Remus to buffer network output from the associated vif.
+
 [BLKIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,blkif.h.html
 [FBIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,fbif.h.html
 [HVMPARAMS]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,hvm,params.h.html
diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index 47655f6..6139c1f 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -16,6 +16,7 @@ XEN_SCRIPTS += network-nat vif-nat
 XEN_SCRIPTS += vif-openvswitch
 XEN_SCRIPTS += vif2
 XEN_SCRIPTS += vif-setup
+XEN_SCRIPTS-$(CONFIG_REMUS_NETBUF) += remus-netbuf-setup
 XEN_SCRIPTS += block
 XEN_SCRIPTS += block-enbd block-nbd
 XEN_SCRIPTS-$(CONFIG_BLKTAP1) += blktap
diff --git a/tools/hotplug/Linux/remus-netbuf-setup b/tools/hotplug/Linux/remus-netbuf-setup
new file mode 100644
index 0000000..3467db2
--- /dev/null
+++ b/tools/hotplug/Linux/remus-netbuf-setup
@@ -0,0 +1,183 @@
+#!/bin/bash
+#============================================================================
+# ${XEN_SCRIPT_DIR}/remus-netbuf-setup
+#
+# Script for attaching a network buffer to the specified vif (in any mode).
+# The hotplugging system will call this script when starting remus via libxl
+# API, libxl_domain_remus_start.
+#
+# Usage:
+# remus-netbuf-setup (setup|teardown)
+#
+# Environment vars:
+# vifname     vif interface name (required).
+# XENBUS_PATH path in Xenstore, where the IFB device details will be stored
+#                      or read from (required).
+#             (libxl passes /libxl/<domid>/remus/netbuf/<devid>)
+# IFB         ifb interface to be cleaned up (required). [for teardown op only]
+
+# Written to the store: (setup operation)
+# XENBUS_PATH/ifb=<ifbdevName> the IFB device serving
+#  as the intermediate buffer through which the interface's network output
+#  can be controlled.
+#
+# To install a network buffer on a guest vif (vif1.0) using ifb (ifb0)
+# we need to do the following
+#
+#  ip link set dev ifb0 up
+#  tc qdisc add dev vif1.0 ingress
+#  tc filter add dev vif1.0 parent ffff: proto ip \
+#    prio 10 u32 match u32 0 0 action mirred egress redirect dev ifb0
+#  nl-qdisc-add --dev=ifb0 --parent root plug
+#  nl-qdisc-add --dev=ifb0 --parent root --update plug --limit=10000000
+#                                                (10MB limit on buffer)
+#
+# So order of operations when installing a network buffer on vif1.0
+# 1. find a free ifb and bring up the device
+# 2. redirect traffic from vif1.0 to ifb:
+#   2.1 add ingress qdisc to vif1.0 (to capture outgoing packets from guest)
+#   2.2 use tc filter command with actions mirred egress + redirect
+# 3. install plug_qdisc on ifb device, with which we can buffer/release
+#    guest's network output from vif1.0
+#
+#
+
+#============================================================================
+
+# Unlike other vif scripts, vif-common is not needed here as it executes vif
+#specific setup code such as renaming.
+dir=$(dirname "$0")
+. "$dir/xen-hotplug-common.sh"
+
+findCommand "$@"
+
+if [ "$command" != "setup" -a  "$command" != "teardown" ]
+then
+  echo "Invalid command: $command"
+  log err "Invalid command: $command"
+  exit 1
+fi
+
+evalVariables "$@"
+
+: ${vifname:?}
+: ${XENBUS_PATH:?}
+
+check_libnl_tools() {
+    if ! command -v nl-qdisc-list > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-list tool"
+    fi
+    if ! command -v nl-qdisc-add > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-add tool"
+    fi
+    if ! command -v nl-qdisc-delete > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-delete tool"
+    fi
+}
+
+# We only check for modules. We don't load them.
+# User/Admin is supposed to load ifb during boot time,
+# ensuring that there are enough free ifbs in the system.
+# Other modules will be loaded automatically by tc commands.
+check_modules() {
+    for m in ifb sch_plug sch_ingress act_mirred cls_u32
+    do
+        if ! modinfo $m > /dev/null 2>&1; then
+            fatal "Unable to find $m kernel module"
+        fi
+    done
+}
+
+setup_ifb() {
+
+    for ifb in `ifconfig -a -s|egrep ^ifb|cut -d ' ' -f1`
+    do
+        local installed=`nl-qdisc-list -d $ifb`
+        [ -n "$installed" ] && continue
+        IFB="$ifb"
+        break
+    done
+
+    if [ -z "$IFB" ]
+    then
+        fatal "Unable to find a free IFB device for $vifname"
+    fi
+
+    do_or_die ip link set dev "$IFB" up
+}
+
+redirect_vif_traffic() {
+    local vif=$1
+    local ifb=$2
+
+    do_or_die tc qdisc add dev "$vif" ingress
+
+    tc filter add dev "$vif" parent ffff: proto ip prio 10 \
+        u32 match u32 0 0 action mirred egress redirect dev "$ifb" >/dev/null 2>&1
+
+    if [ $? -ne 0 ]
+    then
+        do_without_error tc qdisc del dev "$vif" ingress
+        fatal "Failed to redirect traffic from $vif to $ifb"
+    fi
+}
+
+add_plug_qdisc() {
+    local vif=$1
+    local ifb=$2
+
+    nl-qdisc-add --dev="$ifb" --parent root plug >/dev/null 2>&1
+    if [ $? -ne 0 ]
+    then
+        do_without_error tc qdisc del dev "$vif" ingress
+        fatal "Failed to add plug qdisc to $ifb"
+    fi
+
+    # Set the ifb buffering limit in bytes. It's okay if this command fails.
+    nl-qdisc-add --dev="$ifb" --parent root \
+        --update plug --limit=10000000 >/dev/null 2>&1
+}
+
+teardown_netbuf() {
+    local vif=$1
+    local ifb=$2
+
+    if [ "$ifb" ]; then
+        do_without_error ip link set dev "$ifb" down
+        do_without_error nl-qdisc-delete --dev="$ifb" --parent root plug >/dev/null 2>&1
+        xenstore-rm -t "$XENBUS_PATH/ifb" 2>/dev/null || true
+    fi
+    do_without_error tc qdisc del dev "$vif" ingress
+    xenstore-rm -t "$XENBUS_PATH/hotplug-status" 2>/dev/null || true
+}
+
+xs_write_failed() {
+    local vif=$1
+    local ifb=$2
+    teardown_netbuf "$vifname" "$IFB"
+    fatal "failed to write ifb name to xenstore"
+}
+
+case "$command" in
+    setup)
+        check_libnl_tools
+        check_modules
+
+        claim_lock "pickifb"
+        setup_ifb
+        redirect_vif_traffic "$vifname" "$IFB"
+        add_plug_qdisc "$vifname" "$IFB"
+        release_lock "pickifb"
+
+        #not using xenstore_write that automatically exits on error
+        #because we need to cleanup
+        _xenstore_write "$XENBUS_PATH/ifb" "$IFB" || xs_write_failed "$vifname" "$IFB"
+        success
+        ;;
+    teardown)
+        : ${IFB:?}
+        teardown_netbuf "$vifname" "$IFB"
+        ;;
+esac
+
+log debug "Successful remus-netbuf-setup $command for $vifname, ifb $IFB."
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 84a467c..218f55e 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -52,6 +52,8 @@ else
 LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
+LIBXL_OBJS-y += libxl_remus.o
+
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 8d63f90..e3e9f6f 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -753,9 +753,6 @@ int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
 
 /*==================== Domain suspend (save) ====================*/
 
-static void domain_suspend_done(libxl__egc *egc,
-                        libxl__domain_suspend_state *dss, int rc);
-
 /*----- complicated callback, called by xc_domain_save -----*/
 
 /*
@@ -1508,8 +1505,8 @@ static void save_device_model_datacopier_done(libxl__egc *egc,
     dss->save_dm_callback(egc, dss, our_rc);
 }
 
-static void domain_suspend_done(libxl__egc *egc,
-                        libxl__domain_suspend_state *dss, int rc)
+void domain_suspend_done(libxl__egc *egc,
+                         libxl__domain_suspend_state *dss, int rc)
 {
     STATE_AO_GC(dss->ao);
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 2f64382..4006174 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2313,6 +2313,23 @@ typedef struct libxl__remus_state {
 
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
+_hidden void domain_suspend_done(libxl__egc *egc,
+                                 libxl__domain_suspend_state *dss,
+                                 int rc);
+
+_hidden void libxl__remus_setup_done(libxl__egc *egc,
+                                     libxl__domain_suspend_state *dss,
+                                     int rc);
+
+_hidden void libxl__remus_netbuf_setup(libxl__egc *egc,
+                                       libxl__domain_suspend_state *dss);
+
+_hidden void libxl__remus_teardown_done(libxl__egc *egc,
+                                        libxl__domain_suspend_state *dss);
+
+_hidden void libxl__remus_netbuf_teardown(libxl__egc *egc,
+                                          libxl__domain_suspend_state *dss);
+
 struct libxl__domain_suspend_state {
     /* set by caller of libxl__domain_suspend */
     libxl__ao *ao;
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 8e23d75..2c77076 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -17,11 +17,492 @@
 
 #include "libxl_internal.h"
 
+#include <netlink/cache.h>
+#include <netlink/socket.h>
+#include <netlink/attr.h>
+#include <netlink/route/link.h>
+#include <netlink/route/route.h>
+#include <netlink/route/qdisc.h>
+#include <netlink/route/qdisc/plug.h>
+
+typedef struct libxl__remus_netbuf_state {
+    struct rtnl_qdisc **netbuf_qdisc_list;
+    struct nl_sock *nlsock;
+    struct nl_cache *qdisc_cache;
+    const char **vif_list;
+    const char **ifb_list;
+    uint32_t num_netbufs;
+    uint32_t unused;
+} libxl__remus_netbuf_state;
+
 int libxl__netbuffer_enabled(libxl__gc *gc)
 {
     return 1;
 }
 
+/* If the device has a vifname, then use that instead of
+ * the vifX.Y format.
+ */
+static const char *get_vifname(libxl__gc *gc, uint32_t domid,
+                               libxl_device_nic *nic)
+{
+    const char *vifname = NULL;
+    const char *path;
+    int rc;
+
+    path = libxl__sprintf(gc, "%s/backend/vif/%d/%d/vifname",
+                          libxl__xs_get_dompath(gc, 0), domid, nic->devid);
+    rc = libxl__xs_read_checked(gc, XBT_NULL, path, &vifname);
+    if (!rc && !vifname) {
+        /* use the default name */
+        vifname = libxl__device_nic_devname(gc, domid,
+                                            nic->devid,
+                                            nic->nictype);
+    }
+
+    return vifname;
+}
+
+static const char **get_guest_vif_list(libxl__gc *gc, uint32_t domid,
+                                       int *num_vifs)
+{
+    libxl_device_nic *nics = NULL;
+    int nb, i = 0;
+    const char **vif_list = NULL;
+
+    *num_vifs = 0;
+    nics = libxl_device_nic_list(CTX, domid, &nb);
+    if (!nics)
+        return NULL;
+
+    /* Ensure that none of the vifs are backed by driver domains */
+    for (i = 0; i < nb; i++) {
+        if (nics[i].backend_domid != LIBXL_TOOLSTACK_DOMID) {
+            const char *vifname = get_vifname(gc, domid, &nics[i]);
+
+            if (!vifname)
+              vifname = "(unknown)";
+            LOG(ERROR, "vif %s has driver domain (%u) as its backend. "
+                "Network buffering is not supported with driver domains",
+                vifname, nics[i].backend_domid);
+            *num_vifs = -1;
+            goto out;
+        }
+    }
+
+    GCNEW_ARRAY(vif_list, nb);
+    for (i = 0; i < nb; ++i) {
+        vif_list[i] = get_vifname(gc, domid, &nics[i]);
+        if (!vif_list[i]) {
+            vif_list = NULL;
+            goto out;
+        }
+    }
+    *num_vifs = nb;
+
+ out:
+    for (i = 0; i < nb; i++)
+        libxl_device_nic_dispose(&nics[i]);
+    free(nics);
+    return vif_list;
+}
+
+static void free_qdiscs(libxl__remus_netbuf_state *netbuf_state)
+{
+    int i;
+    struct rtnl_qdisc *qdisc = NULL;
+
+    /* free qdiscs */
+    for (i = 0; i < netbuf_state->num_netbufs; i++) {
+        qdisc = netbuf_state->netbuf_qdisc_list[i];
+        if (!qdisc)
+            break;
+
+        nl_object_put((struct nl_object *)qdisc);
+        netbuf_state->netbuf_qdisc_list[i] = NULL;
+    }
+
+    /* free qdisc cache */
+    if (netbuf_state->qdisc_cache) {
+      nl_cache_clear(netbuf_state->qdisc_cache);
+      nl_cache_free(netbuf_state->qdisc_cache);
+      netbuf_state->qdisc_cache = NULL;
+    }
+
+    /* close & free nlsock */
+    if (netbuf_state->nlsock) {
+      nl_close(netbuf_state->nlsock);
+      nl_socket_free(netbuf_state->nlsock);
+      netbuf_state->nlsock = NULL;
+    }
+}
+
+static int init_qdiscs(libxl__gc *gc,
+                       libxl__remus_state *remus_state)
+{
+    int i, ret, ifindex;
+    struct rtnl_link *ifb = NULL;
+    struct rtnl_qdisc *qdisc = NULL;
+
+    /* Convenience aliases */
+    libxl__remus_netbuf_state * const netbuf_state = remus_state->netbuf_state;
+    const int num_netbufs = netbuf_state->num_netbufs;
+    const char ** const ifb_list = netbuf_state->ifb_list;
+
+    /* Now that we have brought up IFB devices with plug qdisc for
+     * each vif, let's get a netlink handle on the plug qdisc for use
+     * during checkpointing.
+     */
+    netbuf_state->nlsock = nl_socket_alloc();
+    if (!netbuf_state->nlsock) {
+        LOG(ERROR, "cannot allocate nl socket");
+        goto out;
+    }
+
+    ret = nl_connect(netbuf_state->nlsock, NETLINK_ROUTE);
+    if (ret) {
+        LOG(ERROR, "failed to open netlink socket: %s",
+            nl_geterror(ret));
+        goto out;
+    }
+
+    /* get list of all qdiscs installed on network devs. */
+    ret = rtnl_qdisc_alloc_cache(netbuf_state->nlsock,
+                                 &netbuf_state->qdisc_cache);
+    if (ret) {
+        LOG(ERROR, "failed to allocate qdisc cache: %s",
+            nl_geterror(ret));
+        goto out;
+    }
+
+    /* list of handles to plug qdiscs */
+    GCNEW_ARRAY(netbuf_state->netbuf_qdisc_list, num_netbufs);
+
+    for (i = 0; i < num_netbufs; ++i) {
+
+        /* get a handle to the IFB interface */
+        ifb = NULL;
+        ret = rtnl_link_get_kernel(netbuf_state->nlsock, 0,
+                                   ifb_list[i], &ifb);
+        if (ret) {
+            LOG(ERROR, "cannot obtain handle for %s: %s", ifb_list[i],
+                nl_geterror(ret));
+            goto out;
+        }
+
+        ifindex = rtnl_link_get_ifindex(ifb);
+        if (!ifindex) {
+            LOG(ERROR, "interface %s has no index", ifb_list[i]);
+            goto out;
+        }
+
+        /* Get a reference to the root qdisc installed on the IFB, by
+         * querying the qdisc list we obtained earlier. The netbufscript
+         * sets up the plug qdisc as the root qdisc, so we don't have to
+         * search the entire qdisc tree on the IFB dev.
+         *
+         * There is no need to explicitly free this qdisc as it's just a
+         * reference from the qdisc cache we allocated earlier.
+         */
+        qdisc = rtnl_qdisc_get_by_parent(netbuf_state->qdisc_cache, ifindex,
+                                         TC_H_ROOT);
+
+        if (qdisc) {
+            const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
+            /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
+            if (!tc_kind || strcmp(tc_kind, "plug")) {
+                nl_object_put((struct nl_object *)qdisc);
+                LOG(ERROR, "plug qdisc is not installed on %s", ifb_list[i]);
+                goto out;
+            }
+            netbuf_state->netbuf_qdisc_list[i] = qdisc;
+        } else {
+            LOG(ERROR, "Cannot get qdisc handle from ifb %s", ifb_list[i]);
+            goto out;
+        }
+        rtnl_link_put(ifb);
+    }
+
+    return 0;
+
+ out:
+    if (ifb)
+        rtnl_link_put(ifb);
+    free_qdiscs(netbuf_state);
+    return ERROR_FAIL;
+}
+
+static void netbuf_setup_timeout_cb(libxl__egc *egc,
+                                    libxl__ev_time *ev,
+                                    const struct timeval *requested_abs)
+{
+    libxl__remus_state *remus_state = CONTAINER_OF(ev, *remus_state, timeout);
+
+    /* Convenience aliases */
+    const int devid = remus_state->dev_id;
+    libxl__remus_netbuf_state *const netbuf_state = remus_state->netbuf_state;
+    const char *const vif = netbuf_state->vif_list[devid];
+
+    STATE_AO_GC(remus_state->dss->ao);
+
+    libxl__ev_time_deregister(gc, &remus_state->timeout);
+    assert(libxl__ev_child_inuse(&remus_state->child));
+
+    LOG(DEBUG, "killing hotplug script %s (on vif %s) because of timeout",
+        remus_state->netbufscript, vif);
+
+    if (kill(remus_state->child.pid, SIGKILL)) {
+        LOGEV(ERROR, errno, "unable to kill hotplug script %s [%lu]",
+              remus_state->netbufscript,
+              (unsigned long)remus_state->child.pid);
+    }
+
+    return;
+}
+
+/* the script needs the following env & args
+ * $vifname
+ * $XENBUS_PATH (/libxl/<domid>/remus/netbuf/<devid>/)
+ * $IFB (for teardown)
+ * setup/teardown as command line arg.
+ * In return, the script writes the name of IFB device (during setup) to be
+ * used for output buffering into XENBUS_PATH/ifb
+ */
+static int exec_netbuf_script(libxl__gc *gc, libxl__remus_state *remus_state,
+                              char *op, libxl__ev_child_callback *death)
+{
+    int arraysize, nr = 0;
+    char **env = NULL, **args = NULL;
+    pid_t pid;
+
+    /* Convenience aliases */
+    libxl__ev_child *const child = &remus_state->child;
+    libxl__ev_time *const timeout = &remus_state->timeout;
+    char *const script = libxl__strdup(gc, remus_state->netbufscript);
+    const uint32_t domid = remus_state->dss->domid;
+    const int devid = remus_state->dev_id;
+    libxl__remus_netbuf_state *const netbuf_state = remus_state->netbuf_state;
+    const char *const vif = netbuf_state->vif_list[devid];
+    const char *const ifb = netbuf_state->ifb_list[devid];
+
+    arraysize = 7;
+    GCNEW_ARRAY(env, arraysize);
+    env[nr++] = "vifname";
+    env[nr++] = libxl__strdup(gc, vif);
+    env[nr++] = "XENBUS_PATH";
+    env[nr++] = GCSPRINTF("%s/remus/netbuf/%d",
+                          libxl__xs_libxl_path(gc, domid), devid);
+    if (!strcmp(op, "teardown")) {
+        env[nr++] = "IFB";
+        env[nr++] = libxl__strdup(gc, ifb);
+    }
+    env[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    arraysize = 3; nr = 0;
+    GCNEW_ARRAY(args, arraysize);
+    args[nr++] = script;
+    args[nr++] = op;
+    args[nr++] = NULL;
+    assert(nr == arraysize);
+
+    /* Set hotplug timeout */
+    if (libxl__ev_time_register_rel(gc, timeout,
+                                    netbuf_setup_timeout_cb,
+                                    LIBXL_HOTPLUG_TIMEOUT * 1000)) {
+        LOG(ERROR, "unable to register timeout for "
+            "netbuf setup script %s on vif %s", script, vif);
+        return ERROR_FAIL;
+    }
+
+    LOG(DEBUG, "Calling netbuf script: %s %s on vif %s",
+        script, op, vif);
+
+    /* Fork and exec netbuf script */
+    pid = libxl__ev_child_fork(gc, child, death);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork netbuf script %s", script);
+        return ERROR_FAIL;
+    }
+
+    if (!pid) {
+        /* child: Launch netbuf script */
+        libxl__exec(gc, -1, -1, -1, args[0], args, env);
+        /* notreached */
+        abort();
+    }
+
+    return 0;
+}
+
+static void netbuf_setup_script_cb(libxl__egc *egc,
+                                   libxl__ev_child *child,
+                                   pid_t pid, int status)
+{
+    libxl__remus_state *remus_state = CONTAINER_OF(child, *remus_state, child);
+    const char *out_path_base, *hotplug_error = NULL;
+    int rc = ERROR_FAIL;
+
+    /* Convenience aliases */
+    const uint32_t domid = remus_state->dss->domid;
+    const int devid = remus_state->dev_id;
+    libxl__remus_netbuf_state *const netbuf_state = remus_state->netbuf_state;
+    const char *const vif = netbuf_state->vif_list[devid];
+    const char **const ifb = &netbuf_state->ifb_list[devid];
+
+    STATE_AO_GC(remus_state->dss->ao);
+
+    libxl__ev_time_deregister(gc, &remus_state->timeout);
+
+    out_path_base = GCSPRINTF("%s/remus/netbuf/%d",
+                              libxl__xs_libxl_path(gc, domid), devid);
+
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/hotplug-error", out_path_base),
+                                &hotplug_error);
+    if (rc)
+        goto out;
+
+    if (hotplug_error) {
+        LOG(ERROR, "netbuf script %s setup failed for vif %s: %s",
+            remus_state->netbufscript,
+            netbuf_state->vif_list[devid], hotplug_error);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (status) {
+        libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR,
+                                      remus_state->netbufscript,
+                                      pid, status);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/remus/netbuf/%d/ifb",
+                                          libxl__xs_libxl_path(gc, domid),
+                                          devid),
+                                ifb);
+    if (rc)
+        goto out;
+
+    if (!(*ifb)) {
+        LOG(ERROR, "Cannot get ifb dev name for domain %u dev %s",
+            domid, vif);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    LOG(DEBUG, "%s will buffer packets from vif %s", *ifb, vif);
+    remus_state->dev_id++;
+    if (remus_state->dev_id < netbuf_state->num_netbufs) {
+        rc = exec_netbuf_script(gc, remus_state,
+                                "setup", netbuf_setup_script_cb);
+        if (rc)
+            goto out;
+
+        return;
+    }
+
+    rc = init_qdiscs(gc, remus_state);
+ out:
+    libxl__remus_setup_done(egc, remus_state->dss, rc);
+}
+
+/* Scan through the list of vifs belonging to domid and
+ * invoke the netbufscript to setup the IFB device & plug qdisc
+ * for each vif. Then scan through the list of IFB devices to obtain
+ * a handle on the plug qdisc installed on these IFB devices.
+ * Network output buffering is controlled via these qdiscs.
+ */
+void libxl__remus_netbuf_setup(libxl__egc *egc,
+                               libxl__domain_suspend_state *dss)
+{
+    libxl__remus_netbuf_state *netbuf_state = NULL;
+    int num_netbufs = 0;
+    int rc = ERROR_FAIL;
+
+    /* Convenience aliases */
+    const uint32_t domid = dss->domid;
+    libxl__remus_state *const remus_state = dss->remus_state;
+
+    STATE_AO_GC(dss->ao);
+
+    GCNEW(netbuf_state);
+    netbuf_state->vif_list = get_guest_vif_list(gc, domid, &num_netbufs);
+    if (!num_netbufs) {
+        rc = 0;
+        goto out;
+    }
+
+    if (num_netbufs < 0) goto out;
+
+    GCNEW_ARRAY(netbuf_state->ifb_list, num_netbufs);
+    netbuf_state->num_netbufs = num_netbufs;
+    remus_state->netbuf_state = netbuf_state;
+    remus_state->dev_id = 0;
+    if (exec_netbuf_script(gc, remus_state, "setup",
+                           netbuf_setup_script_cb))
+        goto out;
+    return;
+
+ out:
+    libxl__remus_setup_done(egc, dss, rc);
+}
+
+static void netbuf_teardown_script_cb(libxl__egc *egc,
+                                      libxl__ev_child *child,
+                                      pid_t pid, int status)
+{
+    libxl__remus_state *remus_state = CONTAINER_OF(child, *remus_state, child);
+
+    /* Convenience aliases */
+    libxl__remus_netbuf_state *const netbuf_state = remus_state->netbuf_state;
+
+    STATE_AO_GC(remus_state->dss->ao);
+
+    libxl__ev_time_deregister(gc, &remus_state->timeout);
+
+    if (status) {
+        libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR,
+                                      remus_state->netbufscript,
+                                      pid, status);
+    }
+
+    remus_state->dev_id++;
+    if (remus_state->dev_id < netbuf_state->num_netbufs) {
+        if (exec_netbuf_script(gc, remus_state,
+                               "teardown", netbuf_teardown_script_cb))
+            goto out;
+        return;
+    }
+
+ out:
+    libxl__remus_teardown_done(egc, remus_state->dss);
+}
+
+/* Note: This function will be called in the same gc context as
+ * libxl__remus_netbuf_setup, created during the libxl_domain_remus_start
+ * API call.
+ */
+void libxl__remus_netbuf_teardown(libxl__egc *egc,
+                                  libxl__domain_suspend_state *dss)
+{
+    /* Convenience aliases */
+    libxl__remus_state *const remus_state = dss->remus_state;
+    libxl__remus_netbuf_state *const netbuf_state = remus_state->netbuf_state;
+
+    STATE_AO_GC(dss->ao);
+
+    free_qdiscs(netbuf_state);
+
+    remus_state->dev_id = 0;
+    if (exec_netbuf_script(gc, remus_state, "teardown",
+                           netbuf_teardown_script_cb))
+        libxl__remus_teardown_done(egc, dss);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
index 6aa4bf1..559d0a6 100644
--- a/tools/libxl/libxl_nonetbuffer.c
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -22,6 +22,17 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
     return 0;
 }
 
+/* Remus network buffer related stubs */
+void libxl__remus_netbuf_setup(libxl__egc *egc,
+                               libxl__domain_suspend_state *dss)
+{
+}
+
+void libxl__remus_netbuf_teardown(libxl__egc *egc,
+                                  libxl__domain_suspend_state *dss)
+{
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
new file mode 100644
index 0000000..4e40412
--- /dev/null
+++ b/tools/libxl/libxl_remus.c
@@ -0,0 +1,41 @@
+/*
+ * Copyright (C) 2014
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+/*----- remus setup/teardown code -----*/
+
+void libxl__remus_setup_done(libxl__egc *egc,
+                             libxl__domain_suspend_state *dss,
+                             int rc)
+{
+    STATE_AO_GC(dss->ao);
+    if (!rc) {
+        libxl__domain_suspend(egc, dss);
+        return;
+    }
+
+    LOG(ERROR, "Remus: failed to setup network buffering"
+        " for guest with domid %u", dss->domid);
+    domain_suspend_done(egc, dss, rc);
+}
+
+void libxl__remus_teardown_done(libxl__egc *egc,
+                                libxl__domain_suspend_state *dss)
+{
+    dss->callback(egc, dss, dss->remus_state->saved_rc);
+}
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 06/10 V7] remus: implement the API to buffer/release packages
  2014-02-10  9:19   ` [PATCH 00/10 V7] Remus/Libxl: Network buffering support Lai Jiangshan
                       ` (4 preceding siblings ...)
  2014-02-10  9:19     ` [PATCH 05/10 V7] remus: Remus network buffering core and APIs to setup/teardown Lai Jiangshan
@ 2014-02-10  9:19     ` Lai Jiangshan
  2014-03-03 17:48       ` Ian Jackson
  2014-02-10  9:19     ` [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering Lai Jiangshan
                       ` (5 subsequent siblings)
  11 siblings, 1 reply; 89+ messages in thread
From: Lai Jiangshan @ 2014-02-10  9:19 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, Lai Jiangshan,
	Dong Eddie, Shriram Rajagopalan, Roger Pau Monne

From: Shriram Rajagopalan <rshriram@cs.ubc.ca>

This patch implements two APIs:
1. libxl__remus_netbuf_start_new_epoch()
   It marks a new epoch. Packets sent before this epoch will
   be flushed, and packets sent after it will be buffered.
   It is called after the guest is suspended.
2. libxl__remus_netbuf_release_prev_epoch()
   It releases the buffered packets to clients, and it is
   called when a checkpoint finishes.
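
As a rough illustration of the epoch semantics described above, here is a
toy model in C. It is only a sketch under stated assumptions: the toy_*
names and the counter-based queue are hypothetical and are not part of
libxl, libnl, or the kernel's plug qdisc. It only shows that packets
queued before an epoch mark stay held until the matching release.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of an epoch-based output buffer (hypothetical, not libxl code). */
struct toy_plug {
    unsigned current_epoch; /* packets queued since the last epoch mark */
    unsigned prev_epoch;    /* packets held until the checkpoint finishes */
    unsigned released;      /* packets allowed out to the network so far */
};

/* The guest emits one packet; it always lands in the current epoch. */
static void toy_enqueue(struct toy_plug *q)
{
    assert(q != NULL);
    q->current_epoch++;
}

/* Mark a new epoch: everything queued so far now awaits a release. */
static void toy_start_new_epoch(struct toy_plug *q)
{
    assert(q != NULL);
    q->prev_epoch += q->current_epoch;
    q->current_epoch = 0;
}

/* Checkpoint finished: let the packets of the previous epoch go out. */
static void toy_release_prev_epoch(struct toy_plug *q)
{
    assert(q != NULL);
    q->released += q->prev_epoch;
    q->prev_epoch = 0;
}
```

In this model, packets queued after toy_start_new_epoch() are not affected
by the following toy_release_prev_epoch(); they wait for the next
mark/release pair, which mirrors the ordering of the two APIs above.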

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_internal.h    |    6 ++++
 tools/libxl/libxl_netbuffer.c   |   49 +++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c |   14 +++++++++++
 3 files changed, 69 insertions(+), 0 deletions(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 4006174..c13296b 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2330,6 +2330,12 @@ _hidden void libxl__remus_teardown_done(libxl__egc *egc,
 _hidden void libxl__remus_netbuf_teardown(libxl__egc *egc,
                                           libxl__domain_suspend_state *dss);
 
+_hidden int libxl__remus_netbuf_start_new_epoch(libxl__gc *gc, uint32_t domid,
+                                               libxl__remus_state *remus_state);
+
+_hidden int libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t domid,
+                                                  libxl__remus_state *remus_state);
+
 struct libxl__domain_suspend_state {
     /* set by caller of libxl__domain_suspend */
     libxl__ao *ao;
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 2c77076..f358f4b 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -503,6 +503,55 @@ void libxl__remus_netbuf_teardown(libxl__egc *egc,
         libxl__remus_teardown_done(egc, dss);
 }
 
+/* The buffer_op's value, not the value passed to kernel */
+enum {
+    tc_buffer_start,
+    tc_buffer_release
+};
+
+static int remus_netbuf_op(libxl__gc *gc, uint32_t domid,
+                           libxl__remus_state *remus_state,
+                           int buffer_op)
+{
+    int i, ret;
+
+    /* Convenience aliases */
+    libxl__remus_netbuf_state *const netbuf_state = remus_state->netbuf_state;
+
+    for (i = 0; i < netbuf_state->num_netbufs; ++i) {
+        if (buffer_op == tc_buffer_start)
+            ret = rtnl_qdisc_plug_buffer(netbuf_state->netbuf_qdisc_list[i]);
+        else
+            ret = rtnl_qdisc_plug_release_one(netbuf_state->netbuf_qdisc_list[i]);
+
+        if (!ret)
+            ret = rtnl_qdisc_add(netbuf_state->nlsock,
+                                 netbuf_state->netbuf_qdisc_list[i],
+                                 NLM_F_REQUEST);
+        if (ret) {
+            LOG(ERROR, "Remus: cannot do netbuf op %s on %s:%s",
+                ((buffer_op == tc_buffer_start) ?
+                 "start_new_epoch" : "release_prev_epoch"),
+                netbuf_state->ifb_list[i], nl_geterror(ret));
+            return ERROR_FAIL;
+        }
+    }
+
+    return 0;
+}
+
+int libxl__remus_netbuf_start_new_epoch(libxl__gc *gc, uint32_t domid,
+                                        libxl__remus_state *remus_state)
+{
+    return remus_netbuf_op(gc, domid, remus_state, tc_buffer_start);
+}
+
+int libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t domid,
+                                           libxl__remus_state *remus_state)
+{
+    return remus_netbuf_op(gc, domid, remus_state, tc_buffer_release);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
index 559d0a6..92f35bc 100644
--- a/tools/libxl/libxl_nonetbuffer.c
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -33,6 +33,20 @@ void libxl__remus_netbuf_teardown(libxl__egc *egc,
 {
 }
 
+int libxl__remus_netbuf_start_new_epoch(libxl__gc *gc, uint32_t domid,
+                                        libxl__remus_state *remus_state)
+{
+    LOG(ERROR, "Remus: No support for network buffering");
+    return ERROR_FAIL;
+}
+
+int libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t domid,
+                                           libxl__remus_state *remus_state)
+{
+    LOG(ERROR, "Remus: No support for network buffering");
+    return ERROR_FAIL;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering
  2014-02-10  9:19   ` [PATCH 00/10 V7] Remus/Libxl: Network buffering support Lai Jiangshan
                       ` (5 preceding siblings ...)
  2014-02-10  9:19     ` [PATCH 06/10 V7] remus: implement the API to buffer/release packages Lai Jiangshan
@ 2014-02-10  9:19     ` Lai Jiangshan
  2014-03-03 17:51       ` Ian Jackson
  2014-02-10  9:19     ` [PATCH 08/10 V7] libxl: rename remus_failover_cb() to remus_replication_failure_cb() Lai Jiangshan
                       ` (4 subsequent siblings)
  11 siblings, 1 reply; 89+ messages in thread
From: Lai Jiangshan @ 2014-02-10  9:19 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, Lai Jiangshan,
	Dong Eddie, Shriram Rajagopalan, Roger Pau Monne

From: Shriram Rajagopalan <rshriram@cs.ubc.ca>

If a network buffering hotplug script is configured, call
libxl__remus_netbuf_setup() to set up network buffering
and libxl__remus_netbuf_teardown() to tear it down.
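
The control flow described above can be sketched roughly as follows. This
is a toy model, not libxl code: the toy_* names, the struct, and the step
enum are hypothetical stand-ins, and the real functions are asynchronous
and take an egc and a libxl__domain_suspend_state.

```c
#include <assert.h>
#include <stddef.h>

/* Which step the toy state machine reached last (hypothetical). */
enum toy_step { TOY_NONE, TOY_NETBUF_SETUP, TOY_SUSPEND };

struct toy_remus_state {
    const char *netbufscript; /* NULL: no network buffering requested */
    int saved_rc;             /* rc stashed before teardown starts */
    enum toy_step last_step;
};

/* On success, setup completion proceeds to the domain suspend step. */
static void toy_setup_done(struct toy_remus_state *rs, int rc)
{
    assert(rs != NULL);
    if (!rc)
        rs->last_step = TOY_SUSPEND;
}

/* Without a script, skip netbuf setup and complete setup immediately. */
static void toy_setup_initiate(struct toy_remus_state *rs)
{
    assert(rs != NULL);
    if (!rs->netbufscript)
        toy_setup_done(rs, 0);
    else
        rs->last_step = TOY_NETBUF_SETUP;
}

/* Remember the original rc so teardown cannot overwrite it. */
static void toy_teardown_initiate(struct toy_remus_state *rs, int rc)
{
    assert(rs != NULL);
    rs->saved_rc = rc;
}

/* Teardown completion reports the stashed rc to the caller. */
static int toy_teardown_done(const struct toy_remus_state *rs)
{
    assert(rs != NULL);
    return rs->saved_rc;
}
```

Stashing rc before teardown means the final callback sees the result of
the replication attempt rather than the result of the cleanup scripts.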

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl.c          |    6 +-----
 tools/libxl/libxl_dom.c      |   11 +++++++++++
 tools/libxl/libxl_internal.h |    7 +++++++
 tools/libxl/libxl_remus.c    |   23 +++++++++++++++++++++++
 4 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 026206a..83d3772 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -762,7 +762,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     }
 
     /* Point of no return */
-    libxl__domain_suspend(egc, dss);
+    libxl__remus_setup_initiate(egc, dss);
     return AO_INPROGRESS;
 
  out:
@@ -778,10 +778,6 @@ static void remus_failover_cb(libxl__egc *egc,
      * backup died or some network error occurred preventing us
      * from sending checkpoints.
      */
-
-    /* TBD: Remus cleanup - i.e. detach qdisc, release other
-     * resources.
-     */
     libxl__ao_complete(egc, ao, rc);
 }
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index e3e9f6f..912a6e4 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1519,6 +1519,17 @@ void domain_suspend_done(libxl__egc *egc,
     if (dss->xce != NULL)
         xc_evtchn_close(dss->xce);
 
+    if (dss->remus_state) {
+        /*
+         * With Remus, if we reach this point, it means either
+         * backup died or some network error occurred preventing us
+         * from sending checkpoints. Teardown the network buffers and
+         * release netlink resources.  This is an async op.
+         */
+        libxl__remus_teardown_initiate(egc, dss, rc);
+        return;
+    }
+
     dss->callback(egc, dss, rc);
 }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index c13296b..1bd2bba 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2336,6 +2336,13 @@ _hidden int libxl__remus_netbuf_start_new_epoch(libxl__gc *gc, uint32_t domid,
 _hidden int libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t domid,
                                                   libxl__remus_state *remus_state);
 
+_hidden void libxl__remus_setup_initiate(libxl__egc *egc,
+                                         libxl__domain_suspend_state *dss);
+
+_hidden void libxl__remus_teardown_initiate(libxl__egc *egc,
+                                            libxl__domain_suspend_state *dss,
+                                            int rc);
+
 struct libxl__domain_suspend_state {
     /* set by caller of libxl__domain_suspend */
     libxl__ao *ao;
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index 4e40412..cdc1c16 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -19,6 +19,16 @@
 
 /*----- remus setup/teardown code -----*/
 
+void libxl__remus_setup_initiate(libxl__egc *egc,
+                                 libxl__domain_suspend_state *dss)
+{
+    libxl__ev_time_init(&dss->remus_state->timeout);
+    if (!dss->remus_state->netbufscript)
+        libxl__remus_setup_done(egc, dss, 0);
+    else
+        libxl__remus_netbuf_setup(egc, dss);
+}
+
 void libxl__remus_setup_done(libxl__egc *egc,
                              libxl__domain_suspend_state *dss,
                              int rc)
@@ -34,6 +44,19 @@ void libxl__remus_setup_done(libxl__egc *egc,
     domain_suspend_done(egc, dss, rc);
 }
 
+void libxl__remus_teardown_initiate(libxl__egc *egc,
+                                    libxl__domain_suspend_state *dss,
+                                    int rc)
+{
+    /* stash rc somewhere before invoking teardown ops. */
+    dss->remus_state->saved_rc = rc;
+
+    if (!dss->remus_state->netbuf_state)
+        libxl__remus_teardown_done(egc, dss);
+    else
+        libxl__remus_netbuf_teardown(egc, dss);
+}
+
 void libxl__remus_teardown_done(libxl__egc *egc,
                                 libxl__domain_suspend_state *dss)
 {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 08/10 V7] libxl: rename remus_failover_cb() to remus_replication_failure_cb()
  2014-02-10  9:19   ` [PATCH 00/10 V7] Remus/Libxl: Network buffering support Lai Jiangshan
                       ` (6 preceding siblings ...)
  2014-02-10  9:19     ` [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering Lai Jiangshan
@ 2014-02-10  9:19     ` Lai Jiangshan
  2014-03-03 17:52       ` Ian Jackson
  2014-02-10  9:19     ` [PATCH 09/10 V7] libxl: control network buffering in remus callbacks Lai Jiangshan
                       ` (3 subsequent siblings)
  11 siblings, 1 reply; 89+ messages in thread
From: Lai Jiangshan @ 2014-02-10  9:19 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, Lai Jiangshan,
	Dong Eddie, Shriram Rajagopalan, Roger Pau Monne

From: Shriram Rajagopalan <rshriram@cs.ubc.ca>

Failover means that the machine running the primary VM has gone down
and the secondary VM must be started to take over from it.
remus_failover_cb() is called when Remus replication fails, not when
a failover is needed, so rename it to remus_replication_failure_cb().

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl.c |   12 +++++++-----
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 83d3772..70e34c0 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -702,8 +702,9 @@ out:
     return ptr;
 }
 
-static void remus_failover_cb(libxl__egc *egc,
-                              libxl__domain_suspend_state *dss, int rc);
+static void remus_replication_failure_cb(libxl__egc *egc,
+                                         libxl__domain_suspend_state *dss,
+                                         int rc);
 
 /* TODO: Explicit Checkpoint acknowledgements via recv_fd. */
 int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
@@ -722,7 +723,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 
     GCNEW(dss);
     dss->ao = ao;
-    dss->callback = remus_failover_cb;
+    dss->callback = remus_replication_failure_cb;
     dss->domid = domid;
     dss->fd = send_fd;
     /* TODO do something with recv_fd */
@@ -769,8 +770,9 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     return AO_ABORT(rc);
 }
 
-static void remus_failover_cb(libxl__egc *egc,
-                              libxl__domain_suspend_state *dss, int rc)
+static void remus_replication_failure_cb(libxl__egc *egc,
+                                         libxl__domain_suspend_state *dss,
+                                         int rc)
 {
     STATE_AO_GC(dss->ao);
     /*
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 09/10 V7] libxl: control network buffering in remus callbacks
  2014-02-10  9:19   ` [PATCH 00/10 V7] Remus/Libxl: Network buffering support Lai Jiangshan
                       ` (7 preceding siblings ...)
  2014-02-10  9:19     ` [PATCH 08/10 V7] libxl: rename remus_failover_cb() to remus_replication_failure_cb() Lai Jiangshan
@ 2014-02-10  9:19     ` Lai Jiangshan
  2014-03-03 17:54       ` Ian Jackson
  2014-02-10  9:19     ` [PATCH 10/10 V7] libxl: network buffering cmdline switch Lai Jiangshan
                       ` (2 subsequent siblings)
  11 siblings, 1 reply; 89+ messages in thread
From: Lai Jiangshan @ 2014-02-10  9:19 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, Lai Jiangshan,
	Dong Eddie, Shriram Rajagopalan, Roger Pau Monne

From: Shriram Rajagopalan <rshriram@cs.ubc.ca>

This patch constitutes the core network buffering logic
and does the following:
 a) creates a new network buffer when the domain is suspended
    (libxl__remus_domain_suspend_callback)
 b) releases the previous network buffer pertaining to the
    committed checkpoint (remus_checkpoint_dm_saved)

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
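The two steps above can be modelled with plain counters. This is a
minimal sketch under stated assumptions, not the libnl3/plug-qdisc code
used by the series: the struct and all sketch_* functions are hypothetical,
and packets are reduced to counts. It illustrates why output produced
during an epoch is held back until the checkpoint covering it has been
committed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the output buffer (assumption for illustration). */
struct sketch_netbuf {
    unsigned cur_epoch;   /* packets sent since the last suspend */
    unsigned prev_epoch;  /* packets held until their checkpoint commits */
    unsigned released;    /* packets that have actually left the host */
};

/* The guest emits n packets; they are buffered, not transmitted. */
static void sketch_guest_sends(struct sketch_netbuf *b, unsigned n)
{
    assert(b != NULL);
    b->cur_epoch += n;
}

/* At suspend: close the running epoch and start buffering a new one. */
static void sketch_start_new_epoch(struct sketch_netbuf *b)
{
    assert(b != NULL);
    b->prev_epoch += b->cur_epoch;
    b->cur_epoch = 0;
}

/* After the checkpoint is committed: release the closed epoch only. */
static void sketch_release_prev_epoch(struct sketch_netbuf *b)
{
    assert(b != NULL);
    b->released += b->prev_epoch;
    b->prev_epoch = 0;
}
```

In this model, packets sent after the suspend stay buffered when the
previous epoch is released, so the outside world never sees output from
state that the backup does not yet hold.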
 tools/libxl/libxl_dom.c |   90 ++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 82 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 912a6e4..a4ffdfd 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1243,8 +1243,30 @@ int libxl__toolstack_save(uint32_t domid, uint8_t **buf,
 
 static int libxl__remus_domain_suspend_callback(void *data)
 {
-    /* REMUS TODO: Issue disk and network checkpoint reqs. */
-    return libxl__domain_suspend_common_callback(data);
+    libxl__save_helper_state *shs = data;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
+
+    /* Convenience aliases */
+    libxl__remus_state *const remus_state = dss->remus_state;
+
+    STATE_AO_GC(dss->ao);
+
+    /* REMUS TODO: Issue disk checkpoint reqs. */
+    int ok = libxl__domain_suspend_common_callback(data);
+
+    if (!remus_state->netbuf_state || !ok) goto out;
+
+    /* The domain was suspended successfully. Start a new network
+     * buffer for the next epoch. If this operation fails, then act
+     * as though domain suspend failed -- libxc exits its infinite
+     * loop and ultimately, the replication stops.
+     */
+    if (libxl__remus_netbuf_start_new_epoch(gc, dss->domid,
+                                            remus_state))
+        ok = 0;
+
+ out:
+    return ok;
 }
 
 static int libxl__remus_domain_resume_callback(void *data)
@@ -1257,7 +1279,7 @@ static int libxl__remus_domain_resume_callback(void *data)
     if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
         return 0;
 
-    /* REMUS TODO: Deal with disk. Start a new network output buffer */
+    /* REMUS TODO: Deal with disk. */
     return 1;
 }
 
@@ -1266,11 +1288,17 @@ static int libxl__remus_domain_resume_callback(void *data)
 static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc);
 
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs);
+
 static void libxl__remus_domain_checkpoint_callback(void *data)
 {
     libxl__save_helper_state *shs = data;
     libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
-    libxl__egc *egc = dss->shs.egc;
+
+    /* Convenience aliases */
+    libxl__egc *const egc = dss->shs.egc;
+
     STATE_AO_GC(dss->ao);
 
     /* This would go into tailbuf. */
@@ -1284,10 +1312,56 @@ static void libxl__remus_domain_checkpoint_callback(void *data)
 static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc)
 {
-    /* REMUS TODO: Wait for disk and memory ack, release network buffer */
-    /* REMUS TODO: make this asynchronous */
-    assert(!rc); /* REMUS TODO handle this error properly */
-    usleep(dss->remus_state->interval * 1000);
+    /*
+     * REMUS TODO: Wait for disk and explicit memory ack (through restore
+     * callback from remote) before releasing network buffer.
+     */
+    /* Convenience aliases */
+    libxl__remus_state *const remus_state = dss->remus_state;
+
+    STATE_AO_GC(dss->ao);
+
+    if (rc) {
+        LOG(ERROR, "Failed to save device model. Terminating Remus..");
+        goto out;
+    }
+
+    if (remus_state->netbuf_state) {
+        rc = libxl__remus_netbuf_release_prev_epoch(gc, dss->domid,
+                                                    remus_state);
+        if (rc) {
+            LOG(ERROR, "Failed to release network buffer."
+                " Terminating Remus..");
+            goto out;
+        }
+    }
+
+    /* Set checkpoint interval timeout */
+    rc = libxl__ev_time_register_rel(gc, &remus_state->timeout,
+                                     remus_next_checkpoint,
+                                     dss->remus_state->interval);
+    if (rc) {
+        LOG(ERROR, "unable to register timeout for next epoch."
+            " Terminating Remus..");
+        goto out;
+    }
+    return;
+
+ out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs)
+{
+    libxl__remus_state *remus_state = CONTAINER_OF(ev, *remus_state, timeout);
+
+    /* Convenience aliases */
+    libxl__domain_suspend_state *const dss = remus_state->dss;
+
+    STATE_AO_GC(dss->ao);
+
+    libxl__ev_time_deregister(gc, &remus_state->timeout);
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 10/10 V7] libxl: network buffering cmdline switch
  2014-02-10  9:19   ` [PATCH 00/10 V7] Remus/Libxl: Network buffering support Lai Jiangshan
                       ` (8 preceding siblings ...)
  2014-02-10  9:19     ` [PATCH 09/10 V7] libxl: control network buffering in remus callbacks Lai Jiangshan
@ 2014-02-10  9:19     ` Lai Jiangshan
  2014-03-03 17:58       ` Ian Jackson
  2014-02-26  2:31     ` [PATCH 00/10 V7] Remus/Libxl: Network buffering support Lai Jiangshan
  2014-02-26  2:53     ` [PATCH RFC] remus: implement remus replicated checkpointing disk Lai Jiangshan
  11 siblings, 1 reply; 89+ messages in thread
From: Lai Jiangshan @ 2014-02-10  9:19 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, Lai Jiangshan,
	Dong Eddie, Shriram Rajagopalan, Roger Pau Monne

From: Shriram Rajagopalan <rshriram@cs.ubc.ca>

Add command line switches to the 'xl remus' command to control network
buffering, and pass the settings on to libxl so that it can act
accordingly. Also update the man pages to reflect the new options.

Note: network buffering is enabled by default. To disable it, use the
-n option. The -N option selects a non-default setup script.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 docs/man/xl.conf.pod.5    |    6 ++++++
 docs/man/xl.pod.1         |   11 ++++++++++-
 tools/libxl/xl.c          |    4 ++++
 tools/libxl/xl.h          |    1 +
 tools/libxl/xl_cmdimpl.c  |   28 ++++++++++++++++++++++------
 tools/libxl/xl_cmdtable.c |    3 +++
 6 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.conf.pod.5 b/docs/man/xl.conf.pod.5
index 7c43bde..8ae19bb 100644
--- a/docs/man/xl.conf.pod.5
+++ b/docs/man/xl.conf.pod.5
@@ -105,6 +105,12 @@ Configures the default gateway device to set for virtual network devices.
 
 Default: C<None>
 
+=item B<remus.default.netbufscript="PATH">
+
+Configures the default script used by Remus to setup network buffering.
+
+Default: C</etc/xen/scripts/remus-netbuf-setup>
+
 =item B<output_format="json|sxp">
 
 Configures the default output format used by xl when printing "machine
diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index e7b9de2..3c5f246 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -399,7 +399,7 @@ Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
 mechanism between the two hosts.
 
 N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
-     There is no support for network or disk buffering at the moment.
+     There is no support for disk buffering at the moment.
 
 B<OPTIONS>
 
@@ -418,6 +418,15 @@ Generally useful for debugging.
 
 Disable memory checkpoint compression.
 
+=item B<-n>
+
+Disable network output buffering.
+
+=item B<-N> I<netbufscript>
+
+Use <netbufscript> to set up network buffering instead of the default
+(/etc/xen/scripts/remus-netbuf-setup).
+
 =item B<-s> I<sshcommand>
 
 Use <sshcommand> instead of ssh.  String will be passed to sh.
diff --git a/tools/libxl/xl.c b/tools/libxl/xl.c
index 657610b..e02a618 100644
--- a/tools/libxl/xl.c
+++ b/tools/libxl/xl.c
@@ -46,6 +46,7 @@ char *default_vifscript = NULL;
 char *default_bridge = NULL;
 char *default_gatewaydev = NULL;
 char *default_vifbackend = NULL;
+char *default_remus_netbufscript = NULL;
 enum output_format default_output_format = OUTPUT_FORMAT_JSON;
 int claim_mode = 1;
 
@@ -177,6 +178,9 @@ static void parse_global_config(const char *configfile,
     if (!xlu_cfg_get_long (config, "claim_mode", &l, 0))
         claim_mode = l;
 
+    xlu_cfg_replace_string (config, "remus.default.netbufscript",
+                            &default_remus_netbufscript, 0);
+
     xlu_cfg_destroy(config);
 }
 
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index c876a33..d991fd3 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -153,6 +153,7 @@ extern char *default_vifscript;
 extern char *default_bridge;
 extern char *default_gatewaydev;
 extern char *default_vifbackend;
+extern char *default_remus_netbufscript;
 extern char *blkdev_start;
 
 enum output_format {
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index aff6f90..6d41775 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7265,8 +7265,9 @@ int main_remus(int argc, char **argv)
     r_info.interval = 200;
     r_info.blackhole = 0;
     r_info.compression = 1;
+    r_info.netbuf = 1;
 
-    SWITCH_FOREACH_OPT(opt, "bui:s:e", NULL, "remus", 2) {
+    SWITCH_FOREACH_OPT(opt, "buni:s:N:e", NULL, "remus", 2) {
     case 'i':
         r_info.interval = atoi(optarg);
         break;
@@ -7276,6 +7277,12 @@ int main_remus(int argc, char **argv)
     case 'u':
         r_info.compression = 0;
         break;
+    case 'n':
+        r_info.netbuf = 0;
+        break;
+    case 'N':
+        r_info.netbufscript = optarg;
+        break;
     case 's':
         ssh_command = optarg;
         break;
@@ -7287,6 +7294,9 @@ int main_remus(int argc, char **argv)
     domid = find_domain(argv[optind]);
     host = argv[optind + 1];
 
+    if (!r_info.netbufscript)
+        r_info.netbufscript = default_remus_netbufscript;
+
     if (r_info.blackhole) {
         send_fd = open("/dev/null", O_RDWR, 0644);
         if (send_fd < 0) {
@@ -7324,13 +7334,19 @@ int main_remus(int argc, char **argv)
     /* Point of no return */
     rc = libxl_domain_remus_start(ctx, &r_info, domid, send_fd, recv_fd, 0);
 
-    /* If we are here, it means backup has failed/domain suspend failed.
-     * Try to resume the domain and exit gracefully.
-     * TODO: Split-Brain check.
+    /* check if the domain exists. User may have xl destroyed the
+     * domain to force failover
      */
-    fprintf(stderr, "remus sender: libxl_domain_suspend failed"
-            " (rc=%d)\n", rc);
+    if (libxl_domain_info(ctx, 0, domid)) {
+        fprintf(stderr, "Remus: Primary domain has been destroyed.\n");
+        close(send_fd);
+        return 0;
+    }
 
+    /* If we are here, it means remus setup/domain suspend/backup has
+     * failed. Try to resume the domain and exit gracefully.
+     * TODO: Split-Brain check.
+     */
     if (rc == ERROR_GUEST_TIMEDOUT)
         fprintf(stderr, "Failed to suspend domain at primary.\n");
     else {
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index ebe0220..9b7104c 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -481,6 +481,9 @@ struct cmd_spec cmd_table[] = {
       "-i MS                   Checkpoint domain memory every MS milliseconds (def. 200ms).\n"
       "-b                      Replicate memory checkpoints to /dev/null (blackhole)\n"
       "-u                      Disable memory checkpoint compression.\n"
+      "-n                      Disable network output buffering.\n"
+      "-N <netbufscript>       Use netbufscript to setup network buffering instead of the\n"
+      "                        default (/etc/xen/scripts/remus-netbuf-setup).\n"
       "-s <sshcommand>         Use <sshcommand> instead of ssh.  String will be passed\n"
       "                        to sh. If empty, run <host> instead of \n"
       "                        ssh <host> xl migrate-receive -r [-e]\n"
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* Re: [PATCH 00/10 V7] Remus/Libxl: Network buffering support
  2014-02-10  9:19   ` [PATCH 00/10 V7] Remus/Libxl: Network buffering support Lai Jiangshan
                       ` (9 preceding siblings ...)
  2014-02-10  9:19     ` [PATCH 10/10 V7] libxl: network buffering cmdline switch Lai Jiangshan
@ 2014-02-26  2:31     ` Lai Jiangshan
  2014-02-26  2:53     ` [PATCH RFC] remus: implement remus replicated checkpointing disk Lai Jiangshan
  11 siblings, 0 replies; 89+ messages in thread
From: Lai Jiangshan @ 2014-02-26  2:31 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, xen-devel, Dong Eddie,
	Shriram Rajagopalan, Roger Pau Monne

Hi, Ian Campbell, Ian Jackson

Ping.
This patchset is ready for 4.5-unstable.
It has gone through several rounds and received decent reviews and comments.

Could you merge it for 4.5-unstable, or do you have any further comments?

Thanks,
Lai

On 02/10/2014 05:19 PM, Lai Jiangshan wrote:
> This patch series adds support for network buffering in the Remus
> codebase in libxl. 
> 
> Changes in V7:
>   Applied missing comments(by IanJ).
>   Applied Shriram comments.
> 
>   Merged the tangled netbuffering setup/teardown code into one patch.
>   (2/6/8 in V6 => 5 in V7. 9/10 in V6 => 7 in V7)
> 
> Changes in V6:
>   Applied Ian Jackson's comments of V5 series.
>   the [PATCH 2/4 V5] is split by small functionalities.
> 
>   [PATCH 4/4 V5] --> [PATCH 13/13] netbuffer is default enabled.
> 
> Changes in V5:
> 
> Merge hotplug script patch (2/5) and hotplug script setup/teardown
> patch (3/5) into a single patch.
> 
> Changes in V4:
> 
> [1/5] Remove check for libnl command line utils in autoconf checks
> 
> [2/5] minor nits
> 
> [3/5] define LIBXL_HAVE_REMUS_NETBUF in libxl.h
> 
> [4/5] clean ups. Make the usleep in checkpoint callback asynchronous
> 
> [5/5] minor nits
> 
> Changes in V3:
> [1/5] Fix redundant checks in configure scripts
>       (based on Ian Campbell's suggestions)
> 
> [2/5] Introduce locking in the script, during IFB setup.
>       Add xenstore paths used by netbuf scripts
>       to xenstore-paths.markdown
> 
> [3/5] Hotplug scripts setup/teardown invocations are now asynchronous
>       following IanJ's feedback.  However, the invocations are still
>       sequential. 
> 
> [5/5] Allow per-domain specification of netbuffer scripts in xl remus
>       command.
> 
> And minor nits throughout the series based on feedback from
> the last version
> 
> Changes in V2:
> [1/5] Configure script will automatically enable/disable network
>       buffer support depending on the availability of the appropriate
>       libnl3 version. [If libnl3 is unavailable, a warning message will be
>       printed to let the user know that the feature has been disabled.]
> 
>       use macros from pkg.m4 instead of pkg-config commands
>       removed redundant checks for libnl3 libraries.
> 
> [3,4/5] - Minor nits.
> 
> Version 1:
> 
> [1/5] Changes to autoconf scripts to check for libnl3. Add linker flags
>       to libxl Makefile.
> 
> [2/5] External script to setup/teardown network buffering using libnl3's
>       CLI. This script will be invoked by libxl before starting Remus.
>       The script's main job is to bring up an IFB device with plug qdisc
>       attached to it.  It then re-routes egress traffic from the guest's
>       vif to the IFB device.
> 
> [3/5] Libxl code to invoke the external setup script, followed by netlink
>       related setup to obtain a handle on the output buffers attached
>       to each vif.
> 
> [4/5] Libxl interaction with network buffer module in the kernel via
>       libnl3 API.
> 
> [5/5] xl cmdline switch to explicitly enable network buffering when
>       starting remus.
> 
> 
>   Few things to note(by shriram): 
> 
>     a) Based on previous email discussions, the setup/teardown task has
>     been moved to a hotplug style shell script which can be customized as
>     desired, instead of implementing it as C code inside libxl.
> 
>     b) Libnl3 is not available on NetBSD. Nor is it available on CentOS
>    (Linux).  So I have made network buffering support an optional feature
>    so that it can be disabled if desired.
> 
>    c) NetBSD does not have libnl3. So I have put the setup script under
>    tools/hotplug/Linux folder.
> 
> thanks
> Lai
> 
> 
> 
> Shriram Rajagopalan (8):
>   remus: add libnl3 dependency to autoconf scripts
>   tools/libxl: update libxl_domain_remus_info
>   tools/libxl: introduce a new structure libxl__remus_state
>   remus: introduce a function to check whether network buffering is
>     enabled
>   remus: Remus network buffering core and APIs to setup/teardown
>   remus: implement the APIs to buffer/release packages
>   libxl: use the APIs to setup/teardown network buffering
>   libxl: rename remus_failover_cb() to remus_replication_failure_cb()
>   libxl: control network buffering in remus callbacks
>   libxl: network buffering cmdline switch
> 
>  README                                 |    4 +
>  config/Tools.mk.in                     |    3 +
>  docs/man/xl.conf.pod.5                 |    6 +
>  docs/man/xl.pod.1                      |   11 +-
>  docs/misc/xenstore-paths.markdown      |    4 +
>  tools/configure.ac                     |   15 +
>  tools/hotplug/Linux/Makefile           |    1 +
>  tools/hotplug/Linux/remus-netbuf-setup |  183 +++++++++++
>  tools/libxl/Makefile                   |   11 +
>  tools/libxl/libxl.c                    |   48 ++-
>  tools/libxl/libxl.h                    |   13 +
>  tools/libxl/libxl_dom.c                |  118 ++++++--
>  tools/libxl/libxl_internal.h           |   54 +++-
>  tools/libxl/libxl_netbuffer.c          |  561 ++++++++++++++++++++++++++++++++
>  tools/libxl/libxl_nonetbuffer.c        |   56 ++++
>  tools/libxl/libxl_remus.c              |   64 ++++
>  tools/libxl/libxl_types.idl            |    2 +
>  tools/libxl/xl.c                       |    4 +
>  tools/libxl/xl.h                       |    1 +
>  tools/libxl/xl_cmdimpl.c               |   28 ++-
>  tools/libxl/xl_cmdtable.c              |    3 +
>  tools/remus/README                     |    6 +
>  22 files changed, 1155 insertions(+), 41 deletions(-)
>  create mode 100644 tools/hotplug/Linux/remus-netbuf-setup
>  create mode 100644 tools/libxl/libxl_netbuffer.c
>  create mode 100644 tools/libxl/libxl_nonetbuffer.c
>  create mode 100644 tools/libxl/libxl_remus.c
> 
> 

^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH RFC] remus: implement remus replicated checkpointing disk
  2014-02-10  9:19   ` [PATCH 00/10 V7] Remus/Libxl: Network buffering support Lai Jiangshan
                       ` (10 preceding siblings ...)
  2014-02-26  2:31     ` [PATCH 00/10 V7] Remus/Libxl: Network buffering support Lai Jiangshan
@ 2014-02-26  2:53     ` Lai Jiangshan
  2014-03-10 11:28       ` Ian Jackson
  2014-03-11 18:10       ` Shriram Rajagopalan
  11 siblings, 2 replies; 89+ messages in thread
From: Lai Jiangshan @ 2014-02-26  2:53 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, Lai Jiangshan,
	Dong Eddie, Shriram Rajagopalan, Roger Pau Monne

This patch implements Remus replicated checkpointing disks.
It includes two parts:
  a generic framework for Remus replicated checkpointing disks
  DRBD replicated checkpointing disks
They will be split into separate files in the next round.

The patch is still simple because the disk setup/teardown script is
still being implemented. I need to use libxl_ao to implement it,
but libxl_ao is hard to use: the work sequence has to be split, rather
awkwardly, into several callbacks, as in device_hotplug().

Because the Remus disk script is not implemented yet, the drbd_setup() code
cannot check the disk. For now it just assumes the user has configured the
disk correctly.

This patch is *UNTESTED*
(there is a problem with xl and DRBD, even without Remus, on my machines).

I would welcome as many *comments* as possible.

Thanks,
Lai

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
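The split between the generic framework and the DRBD backend can be
sketched as an ops table plus an embedded base struct. This is a
simplified model, not the code in this patch: all sketch_* names are
hypothetical, and the DRBD ioctl calls are replaced by a plain flag so
the example stays self-contained. It shows the generic layer iterating
over disks and the backend recovering its own state from the base pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Recover the enclosing struct from a pointer to one of its members. */
#define SKETCH_CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct sketch_disk;

/* Per-backend checkpoint hooks (hypothetical subset). */
struct sketch_disk_type {
    int (*postsuspend)(struct sketch_disk *d);
    int (*preresume)(struct sketch_disk *d);
};

/* Generic disk handle seen by the framework. */
struct sketch_disk {
    const struct sketch_disk_type *type;
};

/* Backend state embedding the generic handle. */
struct sketch_drbd_disk {
    struct sketch_disk base;
    int ackwait;  /* 1 while a checkpoint ack is outstanding */
};

static int sketch_drbd_postsuspend(struct sketch_disk *d)
{
    struct sketch_drbd_disk *drbd =
        SKETCH_CONTAINER_OF(d, struct sketch_drbd_disk, base);
    drbd->ackwait = 1;  /* stands in for sending the checkpoint request */
    return 0;
}

static int sketch_drbd_preresume(struct sketch_disk *d)
{
    struct sketch_drbd_disk *drbd =
        SKETCH_CONTAINER_OF(d, struct sketch_drbd_disk, base);
    drbd->ackwait = 0;  /* stands in for waiting for the ack */
    return 0;
}

static const struct sketch_disk_type sketch_drbd_type = {
    sketch_drbd_postsuspend,
    sketch_drbd_preresume
};

/* Generic layer: run a hook on every disk, stopping at the first error. */
static int sketch_disks_postsuspend(struct sketch_disk **disks, size_t n)
{
    size_t i;

    assert(disks != NULL || n == 0);
    for (i = 0; i < n; i++) {
        int rc = disks[i]->type->postsuspend(disks[i]);
        if (rc)
            return rc;
    }
    return 0;
}

static int sketch_disks_preresume(struct sketch_disk **disks, size_t n)
{
    size_t i;

    assert(disks != NULL || n == 0);
    for (i = 0; i < n; i++) {
        int rc = disks[i]->type->preresume(disks[i]);
        if (rc)
            return rc;
    }
    return 0;
}
```

With this layout, another backend would only need to supply its own
struct and ops table; the generic loops stay unchanged.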
 tools/libxl/Makefile                      |    1 +
 tools/libxl/libxl_dom.c                   |   19 +++-
 tools/libxl/libxl_internal.h              |   10 ++
 tools/libxl/libxl_remus.c                 |    2 +
 tools/libxl/libxl_remus_replicated_disk.c |  219 +++++++++++++++++++++++++++++
 5 files changed, 249 insertions(+), 2 deletions(-)
 create mode 100644 tools/libxl/libxl_remus_replicated_disk.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 218f55e..dbf5dd9 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -53,6 +53,7 @@ LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
 LIBXL_OBJS-y += libxl_remus.o
+LIBXL_OBJS-y += libxl_remus_replicated_disk.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index a4ffdfd..858f5be 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1251,9 +1251,14 @@ static int libxl__remus_domain_suspend_callback(void *data)
 
     STATE_AO_GC(dss->ao);
 
-    /* REMUS TODO: Issue disk checkpoint reqs. */
     int ok = libxl__domain_suspend_common_callback(data);
 
+    /* Issue disk checkpoint reqs. */
+    if (libxl__remus_disks_postsuspend(remus_state)) {
+        ok = 0;
+        goto out;
+    }
+
     if (!remus_state->netbuf_state || !ok) goto out;
 
     /* The domain was suspended successfully. Start a new network
@@ -1279,7 +1284,10 @@ static int libxl__remus_domain_resume_callback(void *data)
     if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
         return 0;
 
-    /* REMUS TODO: Deal with disk. */
+    /* Deal with disk. */
+    if (libxl__remus_disks_preresume(dss->remus_state))
+        return 0;
+
     return 1;
 }
 
@@ -1326,6 +1334,13 @@ static void remus_checkpoint_dm_saved(libxl__egc *egc,
         goto out;
     }
 
+    rc = libxl__remus_disks_commit(remus_state);
+    if (rc) {
+        LOG(ERROR, "Failed to commit disks state."
+            " Terminating Remus..");
+        goto out;
+    }
+
     if (remus_state->netbuf_state) {
         rc = libxl__remus_netbuf_release_prev_epoch(gc, dss->domid,
                                                     remus_state);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 1bd2bba..8933e5f 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2309,6 +2309,10 @@ typedef struct libxl__remus_state {
     void *netbuf_state;
     libxl__ev_time timeout;
     libxl__ev_child child;
+
+    /* remus disks state */
+    uint32_t nr_disks;
+    struct libxl__remus_disk **disks;
 } libxl__remus_state;
 
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
@@ -2336,6 +2340,12 @@ _hidden int libxl__remus_netbuf_start_new_epoch(libxl__gc *gc, uint32_t domid,
 _hidden int libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t domid,
                                                   libxl__remus_state *remus_state);
 
+_hidden int libxl__remus_disks_postsuspend(libxl__remus_state *state);
+_hidden int libxl__remus_disks_preresume(libxl__remus_state *state);
+_hidden int libxl__remus_disks_commit(libxl__remus_state *state);
+_hidden int libxl__remus_disks_setup(libxl__egc *egc, libxl__domain_suspend_state *dss);
+_hidden void libxl__remus_disks_teardown(libxl__remus_state *state);
+
 _hidden void libxl__remus_setup_initiate(libxl__egc *egc,
                                          libxl__domain_suspend_state *dss);
 
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index cdc1c16..92eb36a 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -23,6 +23,7 @@ void libxl__remus_setup_initiate(libxl__egc *egc,
                                  libxl__domain_suspend_state *dss)
 {
     libxl__ev_time_init(&dss->remus_state->timeout);
+    libxl__remus_disks_setup(egc, dss);
     if (!dss->remus_state->netbufscript)
         libxl__remus_setup_done(egc, dss, 0);
     else
@@ -51,6 +52,7 @@ void libxl__remus_teardown_initiate(libxl__egc *egc,
     /* stash rc somewhere before invoking teardown ops. */
     dss->remus_state->saved_rc = rc;
 
+    libxl__remus_disks_teardown(dss->remus_state);
     if (!dss->remus_state->netbuf_state)
         libxl__remus_teardown_done(egc, dss);
     else
diff --git a/tools/libxl/libxl_remus_replicated_disk.c b/tools/libxl/libxl_remus_replicated_disk.c
new file mode 100644
index 0000000..4b16403
--- /dev/null
+++ b/tools/libxl/libxl_remus_replicated_disk.c
@@ -0,0 +1,219 @@
+/*
+ * Copyright (C) 2013
+ * Author Lai Jiangshan <laijs@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+typedef struct libxl__remus_disk
+{
+    const struct libxl_device_disk *disk;
+    const struct libxl__remus_disk_type *type;
+
+    /* ao callbacks for setup & teardown script */
+    int (*setup_cb)(struct libxl__remus_disk *d);
+    int (*teardown_cb)(struct libxl__remus_disk *d);
+} libxl__remus_disk;
+
+typedef struct libxl__remus_disk_type
+{
+    /* checkpointing */
+    int (*postsuspend)(libxl__remus_disk *d);
+    int (*preresume)(libxl__remus_disk *d);
+    int (*commit)(libxl__remus_disk *d);
+
+    /* setup & teardown */
+    libxl__remus_disk *(*setup)(libxl__gc *gc, libxl_device_disk *disk);
+    void (*teardown)(libxl__remus_disk *d);
+} libxl__remus_disk_type;
+
+
+/*** drbd implementation ***/
+const int DRBD_SEND_CHECKPOINT = 20;
+const int DRBD_WAIT_CHECKPOINT_ACK = 30;
+typedef struct libxl__remus_drbd_disk
+{
+    libxl__remus_disk remus_disk;
+    int ctl_fd;
+    int ackwait;
+} libxl__remus_drbd_disk;
+
+static int drbd_postsuspend(libxl__remus_disk *d)
+{
+    struct libxl__remus_drbd_disk *drbd = CONTAINER_OF(d, *drbd, remus_disk);
+
+    if (!drbd->ackwait) {
+        if (ioctl(drbd->ctl_fd, DRBD_SEND_CHECKPOINT, 0) <= 0)
+            drbd->ackwait = 1;
+    }
+
+    return 0;
+}
+
+static int drbd_preresume(libxl__remus_disk *d)
+{
+    struct libxl__remus_drbd_disk *drbd = CONTAINER_OF(d, *drbd, remus_disk);
+
+    if (drbd->ackwait) {
+        ioctl(drbd->ctl_fd, DRBD_WAIT_CHECKPOINT_ACK, 0);
+        drbd->ackwait = 0;
+    }
+
+    return 0;
+}
+
+static int drbd_commit(libxl__remus_disk *d)
+{
+    /* nothing to do, all work are done by DRBD's protocal-D. */
+    return 0;
+}
+
+static libxl__remus_disk *drbd_setup(libxl__gc *gc, libxl_device_disk *disk)
+{
+    libxl__remus_drbd_disk *drbd;
+    //if (!(drbd && protocal-D)) // TODO: need to run script async to check
+    //  return NULL
+
+    GCNEW(drbd);
+
+    drbd->ctl_fd = open(GCSPRINTF("/dev/drbd/by-res/%s", disk->pdev_path), O_RDONLY);
+    drbd->ackwait = 0;
+
+    if (drbd->ctl_fd < 0)
+        return NULL;
+
+    return &drbd->remus_disk;
+}
+
+static void drbd_teardown(libxl__remus_disk *d)
+{
+    struct libxl__remus_drbd_disk *drbd = CONTAINER_OF(d, *drbd, remus_disk);
+
+    close(drbd->ctl_fd);
+}
+
+static const libxl__remus_disk_type drbd_disk_type = {
+  .postsuspend = drbd_postsuspend,
+  .preresume = drbd_preresume,
+  .commit = drbd_commit,
+  .setup = drbd_setup,
+  .teardown = drbd_teardown,
+};
+
+/*** checkpoint disks states and callbacks ***/
+static const libxl__remus_disk_type *remus_disk_types[] =
+{
+    &drbd_disk_type,
+};
+
+int libxl__remus_disks_postsuspend(libxl__remus_state *state)
+{
+    int i;
+    int rc = 0;
+
+    for (i = 0; rc == 0 && i < state->nr_disks; i++)
+        rc = state->disks[i]->type->postsuspend(state->disks[i]);
+
+    return rc;
+}
+
+int libxl__remus_disks_preresume(libxl__remus_state *state)
+{
+    int i;
+    int rc = 0;
+
+    for (i = 0; rc == 0 && i < state->nr_disks; i++)
+        rc = state->disks[i]->type->preresume(state->disks[i]);
+
+    return rc;
+}
+
+int libxl__remus_disks_commit(libxl__remus_state *state)
+{
+    int i;
+    int rc = 0;
+
+    for (i = 0; rc == 0 && i < state->nr_disks; i++)
+        rc = state->disks[i]->type->commit(state->disks[i]);
+
+    return rc;
+}
+
+#if 0
+/* TODO: implement disk setup/teardown script */
+static void disk_exec_timeout_cb(libxl__egc *egc, libxl__ev_time *ev,
+                                      const struct timeval *requested_abs)
+{
+    libxl__remus_disks_state *state = CONTAINER_OF(ev, *aodev, timeout);
+    STATE_AO_GC(state->ao);
+
+    libxl__ev_time_deregister(gc, &state->timeout);
+
+    assert(libxl__ev_child_inuse(&state->child));
+    if (kill(state->child.pid, SIGKILL)) {
+    }
+
+    return;
+}
+
+int libxl__remus_disks_exec_script(libxl__gc *gc,
+    libxl__remus_disks_state *state)
+{
+}
+#endif
+
+int libxl__remus_disks_setup(libxl__egc *egc, libxl__domain_suspend_state *dss)
+{
+    libxl__remus_state *remus_state = dss->remus_state;
+    int i, j, nr_disks;
+    libxl_device_disk *disks;
+    libxl__remus_disk *remus_disk;
+    const libxl__remus_disk_type *type;
+
+    STATE_AO_GC(dss->ao);
+    disks = libxl_device_disk_list(CTX, dss->domid, &nr_disks);
+    remus_state->nr_disks = nr_disks;
+    GCNEW_ARRAY(remus_state->disks, nr_disks);
+
+    for (i = 0; i < nr_disks; i++) {
+        remus_disk = NULL;
+        for (j = 0; j < ARRAY_SIZE(remus_disk_types); j++) {
+            type = remus_disk_types[j];
+            remus_disk = type->setup(gc, &disks[i]);
+            if (!remus_disk)
+                break;
+
+            remus_state->disks[i] = remus_disk;
+            remus_disk->disk = &disks[i];
+            remus_disk->type = type;
+        }
+        if (!remus_disk) {
+            remus_state->nr_disks = i;
+            libxl__remus_disks_teardown(remus_state);
+            return -1;
+        }
+    }
+    return 0;
+}
+
+void libxl__remus_disks_teardown(libxl__remus_state *state)
+{
+    int i;
+
+    for (i = 0; i < state->nr_disks; i++)
+        state->disks[i]->type->teardown(state->disks[i]);
+    state->nr_disks = 0;
+}
+
-- 
1.7.1

* Re: [PATCH 02/10 V7] tools/libxl: update libxl_domain_remus_info
  2014-02-10  9:19     ` [PATCH 02/10 V7] tools/libxl: update libxl_domain_remus_info Lai Jiangshan
@ 2014-03-03 16:33       ` Ian Jackson
  0 siblings, 0 replies; 89+ messages in thread
From: Ian Jackson @ 2014-03-03 16:33 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel,
	Shriram Rajagopalan, Roger Pau Monne

Lai Jiangshan writes ("[PATCH 02/10 V7] tools/libxl: update libxl_domain_remus_info"):
> From: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> 
> Add two members:
> 1. netbuf: whether netbuf is enabled
> 2. netbufscript: the path of the script which will be run to setup
>      and tear down the guest's interface.
> 
> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>

Thanks.  But I think these parameters should be introduced at the same
time (ie in the same patch) as their implementation.

Regards,
Ian.

* Re: [PATCH 03/10 V7] tools/libxl: introduce a new structure libxl__remus_state
  2014-02-10  9:19     ` [PATCH 03/10 V7] tools/libxl: introduce a new structure libxl__remus_state Lai Jiangshan
@ 2014-03-03 16:38       ` Ian Jackson
  0 siblings, 0 replies; 89+ messages in thread
From: Ian Jackson @ 2014-03-03 16:38 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel,
	Shriram Rajagopalan, Roger Pau Monne

Lai Jiangshan writes ("[PATCH 03/10 V7] tools/libxl: introduce a new structure libxl__remus_state"):
> libxl_domain_remus_info only contains the argument of the command
> 'xl remus'. So introduce a new structure libxl__remus_state to save
> the remus state.

I appreciate that you've probably split this up to try to make the
review easier, but I think there would probably be a way to do this
that made the patches make more sense when reviewed in isolation.

For this one:

> +    /* convenience shorthand */
> +    libxl__remus_state *remus_state = dss->remus_state;
> +    remus_state->blackhole = info->blackhole;
> +    remus_state->interval = info->interval;
> +    remus_state->compression = info->compression;
> +    remus_state->dss = dss;
> +    libxl__ev_child_init(&remus_state->child);

AFAICT the main point of this patch seems to be to copy a bunch of
configuration options from libxl_domain_remus_start's info argument
into dss->remus_state.

I don't understand why this is desirable.  Does the info argument not
have a sufficient lifetime ?

Thanks,
Ian.

* Re: [PATCH 04/10 V7] remus: introduce a function to check whether network buffering is enabled
  2014-02-10  9:19     ` [PATCH 04/10 V7] remus: introduce a function to check whether network buffering is enabled Lai Jiangshan
@ 2014-03-03 16:39       ` Ian Jackson
  0 siblings, 0 replies; 89+ messages in thread
From: Ian Jackson @ 2014-03-03 16:39 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel,
	Shriram Rajagopalan, Roger Pau Monne

Lai Jiangshan writes ("[PATCH 04/10 V7] remus: introduce a function to check whether network buffering is enabled"):
> From: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> 
> libxl__netbuffer_enabled() returns 1 when network buffering is compiled,
> or returns 0 when network buffering is not compiled.
> 
> If network buffering is not compiled, and the user wants to use it, report
> an error and exit.

The code here seems plausible but again it looks like it would be more
sensibly part of some larger patch.

Thanks,
Ian.

* Re: [PATCH 05/10 V7] remus: Remus network buffering core and APIs to setup/teardown
  2014-02-10  9:19     ` [PATCH 05/10 V7] remus: Remus network buffering core and APIs to setup/teardown Lai Jiangshan
@ 2014-03-03 17:44       ` Ian Jackson
  2014-04-03 14:06         ` [PATCH 05/10 V7] remus: Remus network buffering core and APIs to setup/teardown [and 1 more messages] Ian Jackson
  0 siblings, 1 reply; 89+ messages in thread
From: Ian Jackson @ 2014-03-03 17:44 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel,
	Shriram Rajagopalan, Roger Pau Monne

Lai Jiangshan writes ("[PATCH 05/10 V7] remus: Remus network buffering core and APIs to setup/teardown"):


Thanks.  I have reviewed much of this in some detail and have some
comments.


> --- a/docs/misc/xenstore-paths.markdown
> +++ b/docs/misc/xenstore-paths.markdown
> @@ -385,6 +385,10 @@ The guest's virtual time offset from UTC in seconds.
>  
>  The device model version for a domain.
>  
> +#### /libxl/$DOMID/remus/netbuf/$DEVID/ifb = STRING [n,INTERNAL]
> +
> +IFB device used by Remus to buffer network output from the associated vif.

I think this should expand "IFB".  Also, are these interface buffer
devices really called "IFB" and not "ifb" ?

> diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
> index 84a467c..218f55e 100644
> --- a/tools/libxl/Makefile
> +++ b/tools/libxl/Makefile
> @@ -52,6 +52,8 @@ else
>  LIBXL_OBJS-y += libxl_nonetbuffer.o
>  endif
>  
> +LIBXL_OBJS-y += libxl_remus.o
> +
>  LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
>  LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
>  
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 8d63f90..e3e9f6f 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -753,9 +753,6 @@ int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
>  
>  /*==================== Domain suspend (save) ====================*/
>  
> -static void domain_suspend_done(libxl__egc *egc,
> -                        libxl__domain_suspend_state *dss, int rc);
> -
>  /*----- complicated callback, called by xc_domain_save -----*/
>  
>  /*
> @@ -1508,8 +1505,8 @@ static void save_device_model_datacopier_done(libxl__egc *egc,
>      dss->save_dm_callback(egc, dss, our_rc);
>  }
>  
> -static void domain_suspend_done(libxl__egc *egc,
> -                        libxl__domain_suspend_state *dss, int rc)
> +void domain_suspend_done(libxl__egc *egc,
> +                         libxl__domain_suspend_state *dss, int rc)
>  {
>      STATE_AO_GC(dss->ao);
>  
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 2f64382..4006174 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -2313,6 +2313,23 @@ typedef struct libxl__remus_state {
>  
>  _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
>  
> +_hidden void domain_suspend_done(libxl__egc *egc,
> +                                 libxl__domain_suspend_state *dss,
> +                                 int rc);
> +
> +_hidden void libxl__remus_setup_done(libxl__egc *egc,
> +                                     libxl__domain_suspend_state *dss,
> +                                     int rc);
> +
> +_hidden void libxl__remus_netbuf_setup(libxl__egc *egc,
> +                                       libxl__domain_suspend_state *dss);
> +
> +_hidden void libxl__remus_teardown_done(libxl__egc *egc,
> +                                        libxl__domain_suspend_state *dss);
> +
> +_hidden void libxl__remus_netbuf_teardown(libxl__egc *egc,
> +                                          libxl__domain_suspend_state *dss);
> +
>  struct libxl__domain_suspend_state {
>      /* set by caller of libxl__domain_suspend */
>      libxl__ao *ao;
> diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
> index 8e23d75..2c77076 100644
> --- a/tools/libxl/libxl_netbuffer.c
> +++ b/tools/libxl/libxl_netbuffer.c
> @@ -17,11 +17,492 @@
>  
>  #include "libxl_internal.h"
>  
> +#include <netlink/cache.h>
> +#include <netlink/socket.h>
> +#include <netlink/attr.h>
> +#include <netlink/route/link.h>
> +#include <netlink/route/route.h>
> +#include <netlink/route/qdisc.h>
> +#include <netlink/route/qdisc/plug.h>
> +
> +typedef struct libxl__remus_netbuf_state {
> +    struct rtnl_qdisc **netbuf_qdisc_list;
> +    struct nl_sock *nlsock;
> +    struct nl_cache *qdisc_cache;
> +    const char **vif_list;
> +    const char **ifb_list;
> +    uint32_t num_netbufs;
> +    uint32_t unused;
> +} libxl__remus_netbuf_state;
> +
>  int libxl__netbuffer_enabled(libxl__gc *gc)
>  {
>      return 1;
>  }
>  
> +/* If the device has a vifname, then use that instead of
> + * the vifX.Y format.
> + */
> +static const char *get_vifname(libxl__gc *gc, uint32_t domid,
> +                               libxl_device_nic *nic)
> +{
> +    const char *vifname = NULL;
> +    const char *path;
> +    int rc;
> +
> +    path = libxl__sprintf(gc, "%s/backend/vif/%d/%d/vifname",
> +                          libxl__xs_get_dompath(gc, 0), domid, nic->devid);
> +    rc = libxl__xs_read_checked(gc, XBT_NULL, path, &vifname);
> +    if (!rc && !vifname) {
> +        /* use the default name */
> +        vifname = libxl__device_nic_devname(gc, domid,
> +                                            nic->devid,
> +                                            nic->nictype);
> +    }
> +
> +    return vifname;
> +}

I think the error handling here is rather odd.  It would be better to
use the "goto out" style.  And the callers should treat NULL from this
function as a fatal error.
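
A minimal sketch of the "goto out" style suggested above, assuming a
hypothetical fake_xs_read() in place of libxl__xs_read_checked() and a
caller-supplied buffer in place of gc allocation.  The names are
illustrative only; the point is that a read error yields NULL, while an
absent key falls back to the default name:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for libxl__xs_read_checked(): returns 0 on
 * success (leaving *result NULL when the key is absent) and nonzero
 * on a read error. */
static int fake_xs_read(const char *path, const char **result)
{
    *result = NULL;
    if (!path)
        return -1;                      /* simulated xenstore failure */
    if (strcmp(path, "named") == 0)
        *result = "custom0";            /* key present */
    return 0;                           /* an absent key is not an error */
}

/* Returns the interface name, or NULL on error (fatal for the caller). */
static const char *vifname_sketch(const char *path, int domid, int devid,
                                  char *buf, size_t len)
{
    const char *vifname = NULL;
    int rc;

    rc = fake_xs_read(path, &vifname);
    if (rc) {
        vifname = NULL;
        goto out;
    }

    if (!vifname) {
        /* no explicit name: fall back to the default vifX.Y form */
        snprintf(buf, len, "vif%d.%d", domid, devid);
        vifname = buf;
    }

 out:
    return vifname;
}
```

A caller would then treat a NULL return as a hard failure rather than
substituting a placeholder name.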

> +static const char **get_guest_vif_list(libxl__gc *gc, uint32_t domid,
> +                                       int *num_vifs)
> +{
> +    libxl_device_nic *nics = NULL;
> +    int nb, i = 0;
> +    const char **vif_list = NULL;
> +
> +    *num_vifs = 0;
> +    nics = libxl_device_nic_list(CTX, domid, &nb);
> +    if (!nics)
> +        return NULL;

It would be clearer IMO if this said "goto out";

> +
> +    /* Ensure that none of the vifs are backed by driver domains */
> +    for (i = 0; i < nb; i++) {
> +        if (nics[i].backend_domid != LIBXL_TOOLSTACK_DOMID) {
> +            const char *vifname = get_vifname(gc, domid, &nics[i]);
> +
> +            if (!vifname)
> +              vifname = "(unknown)";
> +            LOG(ERROR, "vif %s has driver domain (%u) as its backend. "
> +                "Network buffering is not supported with driver domains",
> +                vifname, nics[i].backend_domid);
> +            *num_vifs = -1;
> +            goto out;

The error handling return style of this function is very odd and not
documented.

> +static void free_qdiscs(libxl__remus_netbuf_state *netbuf_state)
> +{
> +    int i;
> +    struct rtnl_qdisc *qdisc = NULL;
> +
> +    /* free qdiscs */
> +    for (i = 0; i < netbuf_state->num_netbufs; i++) {
> +        qdisc = netbuf_state->netbuf_qdisc_list[i];

If you made this
           qdisc = &netbuf_state->netbuf_qdisc_list[i];
then you could say
           *qdisc = NULL;
and not have to write out the long expression again.
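
As a rough illustration of the pointer-to-slot idiom described above,
assuming a hypothetical struct item in place of struct rtnl_qdisc and
plain free() in place of the libnl release call:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical element type standing in for struct rtnl_qdisc. */
struct item { int id; };

static void free_items(struct item **list, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        struct item **slot = &list[i];  /* points at the array slot */

        if (!*slot)
            continue;
        free(*slot);                    /* stand-in for the real release */
        *slot = NULL;                   /* clears list[i] via the pointer */
    }
}
```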

> +    /* free qdisc cache */
> +    if (netbuf_state->qdisc_cache) {
> +      nl_cache_clear(netbuf_state->qdisc_cache);
> +      nl_cache_free(netbuf_state->qdisc_cache);

Wrong indent level.

> +static int init_qdiscs(libxl__gc *gc,
> +                       libxl__remus_state *remus_state)
> +{
...
> +    libxl__remus_netbuf_state * const netbuf_state = remus_state->netbuf_state;
                                  ^
Coding style (extra space).

> +static void netbuf_setup_timeout_cb(libxl__egc *egc,
> +                                    libxl__ev_time *ev,
> +                                    const struct timeval *requested_abs)
> +{
> +    libxl__remus_state *remus_state = CONTAINER_OF(ev, *remus_state, timeout);
> +
> +    /* Convenience aliases */
> +    const int devid = remus_state->dev_id;
> +    libxl__remus_netbuf_state *const netbuf_state = remus_state->netbuf_state;
> +    const char *const vif = netbuf_state->vif_list[devid];
> +
> +    STATE_AO_GC(remus_state->dss->ao);
> +
> +    libxl__ev_time_deregister(gc, &remus_state->timeout);
> +    assert(libxl__ev_child_inuse(&remus_state->child));
> +
> +    LOG(DEBUG, "killing hotplug script %s (on vif %s) because of timeout",
> +        remus_state->netbufscript, vif);
> +
> +    if (kill(remus_state->child.pid, SIGKILL)) {
> +        LOGEV(ERROR, errno, "unable to kill hotplug script %s [%ld]",
> +              remus_state->netbufscript,
> +              (unsigned long)remus_state->child.pid);
> +    }
> +
> +    return;
> +}

This function bears a striking resemblance to
device_hotplug_timeout_cb.  Likewise parts of exec_netbuf_script look
very much like parts of device_hotplug, etc.

You should arrange to reuse code rather than clone-and-hacking it,
refactoring if necessary.  If refactoring is necessary, that should be
brought out into a pre-patch with no functional change.

Thanks,
Ian.

* Re: [PATCH 06/10 V7] remus: implement the API to buffer/release packages
  2014-02-10  9:19     ` [PATCH 06/10 V7] remus: implement the API to buffer/release packages Lai Jiangshan
@ 2014-03-03 17:48       ` Ian Jackson
  0 siblings, 0 replies; 89+ messages in thread
From: Ian Jackson @ 2014-03-03 17:48 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel,
	Shriram Rajagopalan, Roger Pau Monne

Lai Jiangshan writes ("[PATCH 06/10 V7] remus: implement the API to buffer/release packages"):
> This patch implements two APIs:
> 1. libxl__remus_netbuf_start_new_epoch()
>    It marks a new epoch. The packages before this epoch will
>    be flushed, and the packages after this epoch will be buffered.
>    It will be called after the guest is suspended.
> 2. libxl__remus_netbuf_release_prev_epoch()
>    It flushes the buffered packages to client, and it will be
>    called when a checkpoint finishes.

Thanks.

> +_hidden int libxl__remus_netbuf_start_new_epoch(libxl__gc *gc, uint32_t dom\
id,
> +                                               libxl__remus_state *remus_st\
ate);
> +
> +_hidden int libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t \
domid,
> +                                                  libxl__remus_state *remus\
_state);

I'd really appreciate it if you could wrap or otherwise reformat the
long lines in these patches.  As you can see, they appear on my screen
with pretty bad wrap damage otherwise.

> +        if (buffer_op == tc_buffer_start)
> +            ret = rtnl_qdisc_plug_buffer(netbuf_state->netbuf_qdisc_list[i]);
> +        else
> +            ret = rtnl_qdisc_plug_release_one(netbuf_state->netbuf_qdisc_list[i]);
> +
> +        if (!ret)
> +            ret = rtnl_qdisc_add(netbuf_state->nlsock,
> +                                 netbuf_state->netbuf_qdisc_list[i],
> +                                 NLM_F_REQUEST);

This error handling approach is unconventional for libxl.  The correct
approach would be to explicitly check after the first call.  If you
really want to have only one logging point, you can do that in your
"out:" section (reached via "goto out").

I confess I don't understand why this patch is broken out at this
point.  It provides these two functions but not yet any callers.  Is
that right ?
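
For the error handling point above, a minimal sketch of checking each
call explicitly with a single logging point under "out:".  The
plug_buffer(), plug_release() and qdisc_commit() helpers are
hypothetical stand-ins for the libnl calls, not their real signatures:

```c
#include <stdio.h>

enum buffer_op { OP_START, OP_RELEASE };

/* Hypothetical stand-ins: 0 on success, negative on failure. */
static int plug_buffer(int dev)  { return dev < 0 ? -1 : 0; }
static int plug_release(int dev) { return dev < 0 ? -1 : 0; }
static int qdisc_commit(int dev) { return dev == 99 ? -2 : 0; }

static int apply_op(int dev, enum buffer_op op)
{
    int ret, rc = -1;

    ret = (op == OP_START) ? plug_buffer(dev) : plug_release(dev);
    if (ret)
        goto out;               /* explicit check after the first call */

    ret = qdisc_commit(dev);
    if (ret)
        goto out;

    rc = 0;

 out:
    if (rc)                     /* the only logging point */
        fprintf(stderr, "buffer op failed on device %d: %d\n", dev, ret);
    return rc;
}
```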

Thanks,
Ian.

* Re: [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering
  2014-02-10  9:19     ` [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering Lai Jiangshan
@ 2014-03-03 17:51       ` Ian Jackson
  2014-04-23 16:02         ` [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering [and 1 more messages] Ian Jackson
  0 siblings, 1 reply; 89+ messages in thread
From: Ian Jackson @ 2014-03-03 17:51 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel,
	Shriram Rajagopalan, Roger Pau Monne

Lai Jiangshan writes ("[PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering"):
> From: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> 
> If there are network buffering hotplug scripts, call
> libxl__remus_netbuf_setup() to setup the network
> buffering and libxl__remus_netbuf_teardown() to
> teardown network buffering.

> +    if (dss->remus_state) {
> +        /*
> +         * With Remus, if we reach this point, it means either
> +         * backup died or some network error occurred preventing us
> +         * from sending checkpoints. Teardown the network buffers and
> +         * release netlink resources.  This is an async op.
> +         */
> +        libxl__remus_teardown_initiate(egc, dss, rc);
> +        return;
> +    }

This patch seems plausible.  But I wonder if it might not be better to
provide a firmer interface between the remus code and the rest of the
save/restore machinery.  That is, have an explicit callback function
recorded by the save/restore code which is called back by the remus
machinery when it has done its work.  What do you think ?

I think having the flow of control spring off into libxl_remus.c and
magically come back by libxl_remus.c knowing to call
domain_suspend_done is rather opaque.
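
One possible shape for such an interface, sketched with hypothetical
type and function names (none of these exist in libxl): the
save/restore side records the completion callback, and the remus side
only ever calls back through it.

```c
#include <stddef.h>

typedef struct suspend_state suspend_state;

/* Completion callback type recorded by the save/restore code. */
typedef void remus_done_cb(suspend_state *ss, int rc);

typedef struct remus_state {
    suspend_state *ss;
    remus_done_cb *teardown_done;   /* set by the caller, not hard-coded */
} remus_state;

struct suspend_state {
    remus_state remus;
    int finished;
    int final_rc;
};

/* Remus side: does its work, then reports through the callback. */
static void remus_teardown(remus_state *rs, int rc)
{
    /* ... release buffers and other resources here ... */
    rs->teardown_done(rs->ss, rc);
}

/* Save/restore side: the function that finishes the suspend. */
static void suspend_done(suspend_state *ss, int rc)
{
    ss->finished = 1;
    ss->final_rc = rc;
}

/* Save/restore side: hands control to remus with an explicit return path. */
static void suspend_fail(suspend_state *ss, int rc)
{
    ss->remus.ss = ss;
    ss->remus.teardown_done = suspend_done;
    remus_teardown(&ss->remus, rc);
}
```

With this arrangement the remus code needs no knowledge of which
function completes the suspend operation.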

Thanks,
Ian.

* Re: [PATCH 08/10 V7] libxl: rename remus_failover_cb() to remus_replication_failure_cb()
  2014-02-10  9:19     ` [PATCH 08/10 V7] libxl: rename remus_failover_cb() to remus_replication_failure_cb() Lai Jiangshan
@ 2014-03-03 17:52       ` Ian Jackson
  0 siblings, 0 replies; 89+ messages in thread
From: Ian Jackson @ 2014-03-03 17:52 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel,
	Shriram Rajagopalan, Roger Pau Monne

Lai Jiangshan writes ("[PATCH 08/10 V7] libxl: rename remus_failover_cb() to remus_replication_failure_cb()"):
> From: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> 
> Failover means that: the machine on which primary vm is running is
> down, and we need to start the secondary vm to take over the primary
> vm. remus_failover_cb() is called when remus fails, not when we need
> to do failover. So rename it to remus_replication_failure_cb()
> 
> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>

Thanks.

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

* Re: [PATCH 09/10 V7] libxl: control network buffering in remus callbacks
  2014-02-10  9:19     ` [PATCH 09/10 V7] libxl: control network buffering in remus callbacks Lai Jiangshan
@ 2014-03-03 17:54       ` Ian Jackson
  0 siblings, 0 replies; 89+ messages in thread
From: Ian Jackson @ 2014-03-03 17:54 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel,
	Shriram Rajagopalan, Roger Pau Monne

Lai Jiangshan writes ("[PATCH 09/10 V7] libxl: control network buffering in remus callbacks"):
> From: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> 
> This patch constitutes the core network buffering logic
> and does the following:
>  a) create a new network buffer when the domain is suspended
>     (remus_domain_suspend_callback)
>  b) release the previous network buffer pertaining to the
>     committed checkpoint (remus_domain_checkpoint_dm_saved)
> 
> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>

Thanks.  This looks plausible but I would like to review it in the
context of the whole series.  This would be a lot easier if I had the
whole series available to me as a git repo.

So when you repost this, can you also please provide me with a public
git url to fetch the branch from ?

Thanks,
Ian.

* Re: [PATCH 10/10 V7] libxl: network buffering cmdline switch
  2014-02-10  9:19     ` [PATCH 10/10 V7] libxl: network buffering cmdline switch Lai Jiangshan
@ 2014-03-03 17:58       ` Ian Jackson
  0 siblings, 0 replies; 89+ messages in thread
From: Ian Jackson @ 2014-03-03 17:58 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel,
	Shriram Rajagopalan, Roger Pau Monne

Lai Jiangshan writes ("[PATCH 10/10 V7] libxl: network buffering cmdline switch"):
> From: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> 
> Command line switch to 'xl remus' command, to enable network buffering.
> Pass on this flag to libxl so that it can act accordingly.
> Also update man pages to reflect the addition of a new option to
> 'xl remus' command.
> 
> Note: the network buffering is enabled as default. If you want to
> disable it, please use -n option.
...
> diff --git a/docs/man/xl.conf.pod.5 b/docs/man/xl.conf.pod.5
> index 7c43bde..8ae19bb 100644
> --- a/docs/man/xl.conf.pod.5
> +++ b/docs/man/xl.conf.pod.5
> @@ -105,6 +105,12 @@ Configures the default gateway device to set for virtual network devices.
>  
>  Default: C<None>
>  
> +=item B<remus.default.netbufscript="PATH">
> +
> +Configures the default script used by Remus to setup network buffering.

You provide a global option to control the script, but no per-domain
config option.  Why ?

> @@ -7287,6 +7294,9 @@ int main_remus(int argc, char **argv)
>      domid = find_domain(argv[optind]);
>      host = argv[optind + 1];
>  
> +    if(!r_info.netbufscript)
         ^
Coding style (missing space).

...
> +    if (libxl_domain_info(ctx, 0, domid)) {
> +        fprintf(stderr, "Remus: Primary domain has been destroyed.\n");
> +        close(send_fd);
> +        return 0;

You should not assume that any error means that the domain has been
destroyed.  Rather you should check the error code.
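
A small sketch of distinguishing the cases, assuming hypothetical
error codes (the real values and names come from libxl.h and are not
reproduced here):

```c
/* Hypothetical error codes used only for this illustration. */
enum { SKETCH_OK = 0, SKETCH_ERR_NOTFOUND = -1, SKETCH_ERR_FAIL = -2 };

enum domain_status { DOMAIN_ALIVE, DOMAIN_GONE, DOMAIN_UNKNOWN };

/* Maps the return value of a domain info query to what we may conclude. */
static enum domain_status classify_info_rc(int rc)
{
    switch (rc) {
    case SKETCH_OK:
        return DOMAIN_ALIVE;
    case SKETCH_ERR_NOTFOUND:
        return DOMAIN_GONE;         /* only this case means "destroyed" */
    default:
        return DOMAIN_UNKNOWN;      /* any other error: do not assume */
    }
}
```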

> +    /* If we are here, it means remus setup/domain suspend/backup has
> +     * failed. Try to resume the domain and exit gracefully.
> +     * TODO: Split-Brain check.

What are your plans for the split brain check ?

Thanks,
Ian.

* Re: [PATCH RFC] remus: implement remus replicated checkpointing disk
  2014-02-26  2:53     ` [PATCH RFC] remus: implement remus replicated checkpointing disk Lai Jiangshan
@ 2014-03-10 11:28       ` Ian Jackson
  2014-03-10 12:34         ` Lai Jiangshan
  2014-03-11 18:10       ` Shriram Rajagopalan
  1 sibling, 1 reply; 89+ messages in thread
From: Ian Jackson @ 2014-03-10 11:28 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel,
	Shriram Rajagopalan, Roger Pau Monne

Lai Jiangshan writes ("[PATCH RFC] remus: implement remus replicated checkpointing disk"):
> This patch implements remus replicated checkpointing disk.
> It includes two parts:
...
> I request *comments* as many as possible.

Thanks for posting this so early.  It's very helpful to be able to
review it before it's been polished.  Sorry it's taken a while to
reply:

> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index a4ffdfd..858f5be 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -1251,9 +1251,14 @@ static int libxl__remus_domain_suspend_callback(void *data)

These parts seem reasonable.

> +    rc = libxl__remus_disks_commit(remus_state);
> +    if (rc) {
> +        LOG(ERROR, "Failed to commit disks state"
> +            " Terminating Remus..");

Why do we log a message here but not in the other
libxl__remus_disks_foo failure cases ?

> diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
> index cdc1c16..92eb36a 100644
> --- a/tools/libxl/libxl_remus.c
> +++ b/tools/libxl/libxl_remus.c
> @@ -23,6 +23,7 @@ void libxl__remus_setup_initiate(libxl__egc *egc,
>                                   libxl__domain_suspend_state *dss)
>  {
>      libxl__ev_time_init(&dss->remus_state->timeout);
> +    libxl__remus_disks_setup(egc, dss);

I think this is going to have to be an asynchronous function (ie, use
a callback style), as it's going to want to run scripts.  Likewise the
teardown.

> +/*** drbd implementation ***/
> +const int DRBD_SEND_CHECKPOINT = 20;
> +const int DRBD_WAIT_CHECKPOINT_ACK = 30;

These should be "static" as well as "const".

> +typedef struct libxl__remus_drbd_disk
> +{

Our coding style reserves "{" in the LH column for functions, so your
struct definitions should have the "{" on the end of the previous
line.  See libxl__device and libxl__ev_watch_slot for examples.

> +static int drbd_postsuspend(libxl__remus_disk *d)
> +{
> +    struct libxl__remus_drbd_disk *drbd = CONTAINER_OF(d, *drbd, remus_disk);
> +
> +    if (!drbd->ackwait) {
> +        if (ioctl(drbd->ctl_fd, DRBD_SEND_CHECKPOINT, 0) <= 0)
> +            drbd->ackwait = 1;

This seems to make some assumption about return values, or lack of
errors, or something.  I would expect to see some error handling
here.

> +static int drbd_commit(libxl__remus_disk *d)
> +{
> +    /* nothing to do, all work are done by DRBD's protocal-D. */
> +    return 0;
> +}

I'm not sure I understand how this can be true.  Can you point me at
an explanation of the supposed semantics of the remus disk commit ?
(Eg in a remus design document or even a paper.)  I suspect something
ought to be done here.

> +static libxl__remus_disk *drbd_setup(libxl__gc *gc, libxl_device_disk *disk)
...
> +    drbd->ctl_fd = open(GCSPRINTF("/dev/drbd/by-res/%s", disk->pdev_path), O_RDONLY);

This line could do with wrapping.  And your error handling is a bit
nugatory, I think - surely something should be logged here ?

> +static const libxl__remus_disk_type drbd_disk_type = {
> +  .postsuspend = drbd_postsuspend,
> +  .preresume = drbd_preresume,
> +  .commit = drbd_commit,
> +  .setup = drbd_setup,
> +  .teardown = drbd_teardown,
> +};

I like this vtable approach.

> +int libxl__remus_disks_postsuspend(libxl__remus_state *state)
> +{
> +    int i;
> +    int rc = 0;
> +
> +    for (i = 0; rc == 0 && i < state->nr_disks; i++)
> +        rc = state->disks[i]->type->postsuspend(state->disks[i]);
> +
> +    return rc;
> +}

I think the error handling in these functions isn't correct.

Also, there are several almost-identical functions.  Can you consider
whether you can write a macro to define them, or perhaps use offsetof
to write a generic version of the function, or something ?
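
A sketch of the macro variant, under simplified stand-in types (the remus_* names below are hypothetical, not the libxl ones). It only removes the duplication; it keeps the patch's stop-at-first-error loop and does not try to address the error handling concern. ok_op() and fail_op() are demo callbacks.

```c
#include <assert.h>
#include <stddef.h>

typedef struct remus_disk remus_disk;

typedef struct remus_disk_type {
    int (*postsuspend)(remus_disk *d);
    int (*preresume)(remus_disk *d);
    int (*commit)(remus_disk *d);
} remus_disk_type;

struct remus_disk {
    const remus_disk_type *type;
};

typedef struct remus_state {
    int nr_disks;
    remus_disk **disks;
} remus_state;

/* Defines remus_disks_<op>(): call <op> on every disk, stop at first error. */
#define DEFINE_REMUS_DISKS_OP(op)                                   \
    static int remus_disks_##op(remus_state *state)                 \
    {                                                               \
        int i, rc = 0;                                              \
        assert(state != NULL);                                      \
        for (i = 0; rc == 0 && i < state->nr_disks; i++)            \
            rc = state->disks[i]->type->op(state->disks[i]);        \
        return rc;                                                  \
    }

DEFINE_REMUS_DISKS_OP(postsuspend)
DEFINE_REMUS_DISKS_OP(preresume)
DEFINE_REMUS_DISKS_OP(commit)

/* Demo callbacks for trying the generated functions. */
static int ok_op(remus_disk *d)   { (void)d; return 0; }
static int fail_op(remus_disk *d) { (void)d; return -1; }
```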

> +#if 0
> +/* TODO: implement disk setup/teardown script */
> +static void disk_exec_timeout_cb(libxl__egc *egc, libxl__ev_time *ev,
> +                                      const struct timeval *requested_abs)

This will probably be easier after the refactoring needed to tease out
the common script invocation code for the network buffering.

> +int libxl__remus_disks_setup(libxl__egc *egc, libxl__domain_suspend_state *dss)
> +{
> +    libxl__remus_state *remus_state = dss->remus_state;
> +    int i, j, nr_disks;
> +    libxl_device_disk *disks;
> +    libxl__remus_disk *remus_disk;
> +    const libxl__remus_disk_type *type;
> +
> +    STATE_AO_GC(dss->ao);
> +    disks = libxl_device_disk_list(CTX, dss->domid, &nr_disks);

disks doesn't come from the gc, so you need to free it.  You should
initialise it to 0 (NULL), and use the "goto out" error handling
style.
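
The "goto out" shape, sketched with a stand-in for libxl_device_disk_list() (list_disks() below is hypothetical and just returns a heap array). Note that the patch keeps pointers into disks (remus_disk->disk = &disks[i]), so the real function would have to tie the array's lifetime to remus_state instead of freeing it on the success path; the sketch ignores that.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for libxl_device_disk_list(): returns a
 * heap-allocated array the caller must free, or NULL on failure. */
static int *list_disks(int want, int *nr_out)
{
    int *disks;
    int i;

    assert(nr_out != NULL);
    if (want < 0)
        return NULL;
    disks = calloc(want ? want : 1, sizeof(*disks));
    if (!disks)
        return NULL;
    for (i = 0; i < want; i++)
        disks[i] = i;
    *nr_out = want;
    return disks;
}

static int disks_setup(int want)
{
    int *disks = NULL;          /* initialised so "out" can always free it */
    int nr = 0, i, rc;

    disks = list_disks(want, &nr);
    if (!disks) {
        rc = -1;
        goto out;
    }

    for (i = 0; i < nr; i++) {
        if (disks[i] == 3) {    /* pretend disk 3 cannot be set up */
            rc = -1;
            goto out;
        }
    }
    rc = 0;

out:
    free(disks);                /* free(NULL) is a no-op */
    return rc;
}
```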

> +    remus_state->nr_disks = nr_disks;
> +    GCNEW_ARRAY(remus_state->disks, nr_disks);
> +
> +    for (i = 0; i < nr_disks; i++) {
> +        remus_disk = NULL;
> +        for (j = 0; j < ARRAY_SIZE(remus_disk_types); j++) {
> +            type = remus_disk_types[j];
> +            remus_disk = type->setup(gc, &disks[i]);
> +            if (!remus_disk)
> +                break;
> +
> +            remus_state->disks[i] = remus_disk;
> +            remus_disk->disk = &disks[i];
> +            remus_disk->type = type;
> +        }

I think this code is wrong.  It appears to call all of the setup
functions, not just one, and overwrite remus_disk with their
successive results.

> +        if (!remus_disk) {
> +            remus_state->nr_disks = i;

You may find this easier to write with the "goto found" / "found:"
search loop idiom.  See "childproc_checkall" in libxl_fork.c for an
example.
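
For reference, the idiom looks roughly like this. This is a sketch with made-up disk types (drbd_like_setup() and other_setup() are hypothetical), not the code from libxl_fork.c: the inner loop tries each type for one disk, jumps to "found" on the first type that accepts it, and falls through to the failure path only when no type does.

```c
#include <assert.h>
#include <stddef.h>

/* A setup function returns a handle if it accepts the disk, else NULL. */
typedef const char *(*setup_fn)(int disk);

static const char *drbd_like_setup(int disk)
{
    return disk % 2 == 0 ? "drbd-like" : NULL;
}

static const char *other_setup(int disk)
{
    return disk == 1 ? "other" : NULL;
}

static const setup_fn disk_types[] = { drbd_like_setup, other_setup };
#define NR_TYPES (sizeof(disk_types) / sizeof(disk_types[0]))

/* Returns 0 and fills handles[] if every disk is accepted by some type,
 * -1 as soon as one disk is accepted by none. */
static int setup_all(int nr_disks, const char **handles)
{
    int i;
    size_t j;
    const char *h;

    assert(handles != NULL);

    for (i = 0; i < nr_disks; i++) {
        for (j = 0; j < NR_TYPES; j++) {
            h = disk_types[j](i);
            if (h)
                goto found;
        }
        return -1;              /* no type accepted disk i */

    found:
        handles[i] = h;
    }
    return 0;
}
```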

Thanks,
Ian.


* Re: [PATCH RFC] remus: implement remus replicated checkpointing disk
  2014-03-10 11:28       ` Ian Jackson
@ 2014-03-10 12:34         ` Lai Jiangshan
  2014-03-10 16:19           ` Ian Jackson
  0 siblings, 1 reply; 89+ messages in thread
From: Lai Jiangshan @ 2014-03-10 12:34 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel,
	Shriram Rajagopalan, Roger Pau Monne

On 03/10/2014 07:28 PM, Ian Jackson wrote:
> Lai Jiangshan writes ("[PATCH RFC] remus: implement remus replicated checkpointing disk"):
>> This patch implements remus replicated checkpointing disk.
>> It includes two parts:
> ...
>> I request *comments* as many as possible.
> 
> Thanks for posting this so early.  It's very helpful to be able to
> review it before it's been polished.  Sorry it's taken a while to
> reply:
> 
>> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
>> index a4ffdfd..858f5be 100644
>> --- a/tools/libxl/libxl_dom.c
>> +++ b/tools/libxl/libxl_dom.c
>> @@ -1251,9 +1251,14 @@ static int libxl__remus_domain_suspend_callback(void *data)
> 
> These parts seem reasonable.
> 
>> +    rc = libxl__remus_disks_commit(remus_state);
>> +    if (rc) {
>> +        LOG(ERROR, "Failed to commit disks state"
>> +            " Terminating Remus..");
> 
> Why do we log a message here but not in the other
> libxl__remus_disks_foo failure cases ?
> 
>> diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
>> index cdc1c16..92eb36a 100644
>> --- a/tools/libxl/libxl_remus.c
>> +++ b/tools/libxl/libxl_remus.c
>> @@ -23,6 +23,7 @@ void libxl__remus_setup_initiate(libxl__egc *egc,
>>                                   libxl__domain_suspend_state *dss)
>>  {
>>      libxl__ev_time_init(&dss->remus_state->timeout);
>> +    libxl__remus_disks_setup(egc, dss);
> 
> I think this is going to have to be an asynchronous function (ie, use
> a callback style), as it's going to want to run scripts.  Likewise the
> teardown.
> 
>> +/*** drbd implementation ***/
>> +const int DRBD_SEND_CHECKPOINT = 20;
>> +const int DRBD_WAIT_CHECKPOINT_ACK = 30;
> 
> These should be "static" as well as "const".
> 
>> +typedef struct libxl__remus_drbd_disk
>> +{
> 
> Our coding style reserves "{" in the LH column for functions, so your
> struct definitions should have the "{" on the end of the previous
> line.  See libxl__device and libxl__ev_watch_slot for examples.
> 
>> +static int drbd_postsuspend(libxl__remus_disk *d)
>> +{
>> +    struct libxl__remus_drbd_disk *drbd = CONTAINER_OF(d, *drbd, remus_disk);
>> +
>> +    if (!drbd->ackwait) {
>> +        if (ioctl(drbd->ctl_fd, DRBD_SEND_CHECKPOINT, 0) <= 0)
>> +            drbd->ackwait = 1;
> 
> This seems to make some assumption about return values, or lack of
> errors, or something.  I would expect to see some error handling
> here.
> 
>> +static int drbd_commit(libxl__remus_disk *d)
>> +{
>> +    /* nothing to do, all work are done by DRBD's protocal-D. */
>> +    return 0;
>> +}
> 
> I'm not sure I understand how this can be true.  Can you point me at
> an explanation of the supposed semantics of the remus disk commit ?
> (Eg in a remus design document or even a paper.)  I suspect something
> ought to be done here.


In the drbd-remus case, DRBD_SEND_CHECKPOINT (issued from
drbd_postsuspend()) does the commit work asynchronously.

tools/python/xen/remus/device.py:
    def commit(self):
        if not self.is_drbd:
            msg = os.read(self.msgfd.fileno(), 4)
            if msg != 'done':
                print 'Unknown message: %s' % msg

> 
>> +static libxl__remus_disk *drbd_setup(libxl__gc *gc, libxl_device_disk *disk)
> ...
>> +    drbd->ctl_fd = open(GCSPRINTF("/dev/drbd/by-res/%s", disk->pdev_path), O_RDONLY);
> 
> This line could do with wrapping.  And your error handling is a bit
> nugatory, I think - surely something should be logged here ?
> 
>> +static const libxl__remus_disk_type drbd_disk_type = {
>> +  .postsuspend = drbd_postsuspend,
>> +  .preresume = drbd_preresume,
>> +  .commit = drbd_commit,
>> +  .setup = drbd_setup,
>> +  .teardown = drbd_teardown,
>> +};
> 
> I like this vtable approach.
> 
>> +int libxl__remus_disks_postsuspend(libxl__remus_state *state)
>> +{
>> +    int i;
>> +    int rc = 0;
>> +
>> +    for (i = 0; rc == 0 && i < state->nr_disks; i++)
>> +        rc = state->disks[i]->type->postsuspend(state->disks[i]);
>> +
>> +    return rc;
>> +}
> 
> I think the error handling in these functions isn't correct.
> 
> Also, there are several almost-identical functions.  Can you consider
> whether you can write a macro to define them, or perhaps use offsetof
> to write a generic version of the function, or something ?
> 
>> +#if 0
>> +/* TODO: implement disk setup/teardown script */
>> +static void disk_exec_timeout_cb(libxl__egc *egc, libxl__ev_time *ev,
>> +                                      const struct timeval *requested_abs)
> 
> This will probably be easier after the refactoring needed to tease out
> the common script invocation code for the network buffering.
> 
>> +int libxl__remus_disks_setup(libxl__egc *egc, libxl__domain_suspend_state *dss)
>> +{
>> +    libxl__remus_state *remus_state = dss->remus_state;
>> +    int i, j, nr_disks;
>> +    libxl_device_disk *disks;
>> +    libxl__remus_disk *remus_disk;
>> +    const libxl__remus_disk_type *type;
>> +
>> +    STATE_AO_GC(dss->ao);
>> +    disks = libxl_device_disk_list(CTX, dss->domid, &nr_disks);
> 
> disks doesn't come from the gc, so you need to free it.  You should
> initialise it to 0 (NULL), and use the "goto out" error handling
> style.
> 
>> +    remus_state->nr_disks = nr_disks;
>> +    GCNEW_ARRAY(remus_state->disks, nr_disks);
>> +
>> +    for (i = 0; i < nr_disks; i++) {
>> +        remus_disk = NULL;
>> +        for (j = 0; j < ARRAY_SIZE(remus_disk_types); j++) {
>> +            type = remus_disk_types[j];
>> +            remus_disk = type->setup(gc, &disks[i]);
>> +            if (!remus_disk)
>> +                break;
>> +
>> +            remus_state->disks[i] = remus_disk;
>> +            remus_disk->disk = &disks[i];
>> +            remus_disk->type = type;
>> +        }
> 
> I think this code is wrong.  It appears to call all of the setup
> functions, not just one, and overwrite remus_disk with their
> successive results.

If the user uses remus disk replication, all of the disks are required
to support remus disk replication.

So we call setup on every disk. If any disk doesn't support remus, or
any disk fails to set up, libxl__remus_disks_setup() should fail too.

tools/python/xen/remus/device.py:
    def __init__(self, disk):
        if disk.uname.startswith('tap:remus:') or disk.uname.startswith('tap:tapdisk:remus:'):
            ...
        elif disk.uname.startswith('drbd:'):
        else:
            raise ReplicatedDiskException('Disk is not replicated: %s' %
                                        str(disk))


> 
>> +        if (!remus_disk) {
>> +            remus_state->nr_disks = i;
> 
> You may find this easier to write with the "goto found" / "found:"
> search loop idiom.  See "childproc_checkall" in libxl_fork.c for an
> example.
> 
> Thanks,
> Ian.
> 


* Re: [PATCH RFC] remus: implement remus replicated checkpointing disk
  2014-03-10 12:34         ` Lai Jiangshan
@ 2014-03-10 16:19           ` Ian Jackson
  0 siblings, 0 replies; 89+ messages in thread
From: Ian Jackson @ 2014-03-10 16:19 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel,
	Shriram Rajagopalan, Roger Pau Monne

Lai Jiangshan writes ("Re: [PATCH RFC] remus: implement remus replicated checkpointing disk"):
> On 03/10/2014 07:28 PM, Ian Jackson wrote:
> > I'm not sure I understand how this can be true.  Can you point me at
> > an explanation of the supposed semantics of the remus disk commit ?
> > (Eg in a remus design document or even a paper.)  I suspect something
> > ought to be done here.
> 
> In the drbd-remus case, DRBD_SEND_CHECKPOINT (issued from
> drbd_postsuspend()) does the commit work asynchronously.

Um, I'm still not convinced that this is right.  Can you point me
to the relevant design documentation for remus (as I say above) and
the relevant documentation for the drbd checkpoint facility ?  That
will allow me to understand this and check that it's correct.

> >> +    for (i = 0; i < nr_disks; i++) {
> >> +        remus_disk = NULL;
> >> +        for (j = 0; j < ARRAY_SIZE(remus_disk_types); j++) {
> >> +            type = remus_disk_types[j];
> >> +            remus_disk = type->setup(gc, &disks[i]);
> >> +            if (!remus_disk)
> >> +                break;
> >> +
> >> +            remus_state->disks[i] = remus_disk;
> >> +            remus_disk->disk = &disks[i];
> >> +            remus_disk->type = type;
> >> +        }
> > 
> > I think this code is wrong.  It appears to call all of the setup
> > functions, not just one, and overwrite remus_disk with their
> > successive results.
> 
> If the user uses remus disk replication, all of the disks are required
> to support remus disk replication.

Oh, I see.

> So we call setup on every disk. If any disk doesn't support remus, or
> any disk fails to set up, libxl__remus_disks_setup() should fail too.

Right.  I think this deserves a long message.  And in that case,
I think:

+        if (!remus_disk) {
+            remus_state->nr_disks = i;
+            libxl__remus_disks_teardown(remus_state);
+            return -1;
+        }

Instead of rewinding nr_disks, it would be better to make

+void libxl__remus_disks_teardown(libxl__remus_state *state)
+{
+    int i;
+
+    for (i = 0; i < state->nr_disks; i++)
+        state->disks[i]->type->teardown(state->disks[i]);

this code not mind if disks[i] == NULL.
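
A sketch of what that could look like, with simplified stand-in types (the remus_* names and mark_torn_down() are hypothetical). It assumes the disks array starts out zeroed, so that entries which setup never reached are NULL.

```c
#include <assert.h>
#include <stddef.h>

typedef struct remus_disk remus_disk;

typedef struct remus_disk_type {
    void (*teardown)(remus_disk *d);
} remus_disk_type;

struct remus_disk {
    const remus_disk_type *type;
    int torn_down;              /* demo counter, not in the patch */
};

typedef struct remus_state {
    int nr_disks;
    remus_disk **disks;
} remus_state;

/* Tears down every disk that was set up; NULL slots are skipped, so the
 * caller does not need to rewind nr_disks after a partial setup. */
static void remus_disks_teardown(remus_state *state)
{
    int i;

    assert(state != NULL);

    for (i = 0; i < state->nr_disks; i++) {
        remus_disk *d = state->disks[i];

        if (!d)
            continue;           /* setup never got this far */
        d->type->teardown(d);
        state->disks[i] = NULL; /* makes a second call harmless */
    }
}

/* Demo teardown callback that just counts invocations. */
static void mark_torn_down(remus_disk *d)
{
    d->torn_down++;
}
```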

Thanks,
Ian.


* Re: [PATCH RFC] remus: implement remus replicated checkpointing disk
  2014-02-26  2:53     ` [PATCH RFC] remus: implement remus replicated checkpointing disk Lai Jiangshan
  2014-03-10 11:28       ` Ian Jackson
@ 2014-03-11 18:10       ` Shriram Rajagopalan
  2014-03-12  2:35         ` Lai Jiangshan
  2014-03-12 10:06         ` Ian Campbell
  1 sibling, 2 replies; 89+ messages in thread
From: Shriram Rajagopalan @ 2014-03-11 18:10 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, xen-devel, Dong Eddie,
	Roger Pau Monne


On Tue, Feb 25, 2014 at 6:53 PM, Lai Jiangshan <laijs@cn.fujitsu.com> wrote:

> This patch implements remus replicated checkpointing disk.
> It includes two parts:
>   generic remus replicated checkpointing disks framework
>   drbd replicated checkpointing disks
> They will be split into different files in next round.
>
> The patch is still simple due to disk-setup-teardown-script is
> still under implementing. I need to use libxl_ao to implement it,
> but libxl_ao is hard to use. The work sequence is needed to ugly split
> to several callbacks like device_hotplug().
>
> And because the remus disk script is unimplemented, the drbd_setup() code
> can't check the disk now. So it just assumes the user config the disk
> correctly.
>
> This patch is *UNTESTED*.
> (there is a problem with xl&drbd(without remus) in my BOXes).
>
> I request *comments* as many as possible.
>
> Thanks,
> Lai
>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>


Hi,

Sorry for the delayed response, and thanks a lot for this initiative.
Apart from the inline feedback, there are a few things to consider
first before going down this route.

1. The drbd kernel module required for Remus is still out of tree,
currently hosted on a wiki page. Unfortunately, the drbd folks didn't
want to include the changes in their code, as they were offering the
same functionality to one of their paid customers. This is what they
told me back in 2011 or so.

To streamline the storage replication module installation, is there a
chance of hosting the code in xen.org's repos? That way, we could
script the download and installation process, like the qemu stuff.

2. The tapdisk based replication unfortunately is outdated. Please
correct me if I have got this wrong. Haven't we decided to get rid of
blktap2 and go with the qemu disk models? In which case, the tapdisk
remus code has to be ported into some qemu disk variant.

Without getting a resolution to the above two, my stance is that we
shouldn't pollute xl with functionality that requires out-of-band
modules that may prove pretty painful to install for the majority of
folks out there.

Based on the experience from the last 3 years, most average users of
Remus tend to skip disk replication altogether.  They install the
distro's default drbd, use the disk replication provided with it and
then complain that Remus crashes or fails.  Some have ventured into
tapdisk replication but it unfortunately seemed to get difficult as
xend/blktap2 started getting deprecated.



> ---
>  tools/libxl/Makefile                      |    1 +
>  tools/libxl/libxl_dom.c                   |   19 +++-
>  tools/libxl/libxl_internal.h              |   10 ++
>  tools/libxl/libxl_remus.c                 |    2 +
>  tools/libxl/libxl_remus_replicated_disk.c |  219 +++++++++++++++++++++++++++++
>  5 files changed, 249 insertions(+), 2 deletions(-)
>  create mode 100644 tools/libxl/libxl_remus_replicated_disk.c
>
> diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
> index 218f55e..dbf5dd9 100644
> --- a/tools/libxl/Makefile
> +++ b/tools/libxl/Makefile
> @@ -53,6 +53,7 @@ LIBXL_OBJS-y += libxl_nonetbuffer.o
>  endif
>
>  LIBXL_OBJS-y += libxl_remus.o
> +LIBXL_OBJS-y += libxl_remus_replicated_disk.o
>
>
So I think this part will also require some autoconf based stuff.
Especially, if DRBD & tapdisk are not present, then this whole
thing gets disabled, just like libxl_netbuffer and libxl_nonetbuffer.

Given that both of these (netbuffer and disk) are associated with Remus
and both are required for Remus to work "correctly", we might as well
have noremus.c and remus.c. Of course it can be modularized a bit to
have netbuffer but no disk replication or vice versa, as long as the
person installing or compiling this stuff is made to state explicitly
that he/she does not want Remus, but only a subset of its functionality
for some other purpose.


> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index a4ffdfd..858f5be 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
>



>      /* The domain was suspended successfully. Start a new network
> @@ -1279,7 +1284,10 @@ static int libxl__remus_domain_resume_callback(void *data)
>      if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
>          return 0;
>
> -    /* REMUS TODO: Deal with disk. */
> +    /* Deal with disk. */
> +    if (libxl__remus_disks_preresume(dss->remus_state))
> +        return 0;
> +
>

Bug. This should go before the resume call. Also, I would suggest
changing the comment to something more meaningful, e.g., "commit disk
changes..".
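
A sketch of the suggested ordering, with hypothetical stubs that only record the order in which they are called (disks_preresume() and domain_resume() stand in for libxl__remus_disks_preresume() and libxl__domain_resume()). It keeps the callback's convention from the patch of returning 1 on success and 0 on failure.

```c
#include <assert.h>

enum { CALL_DISKS_PRERESUME = 1, CALL_DOMAIN_RESUME = 2 };

/* Hypothetical call recorder, used only to make the ordering visible. */
static int call_log[4];
static int call_count;

static int disks_preresume(void)
{
    assert(call_count < 4);
    call_log[call_count++] = CALL_DISKS_PRERESUME;
    return 0;
}

static int domain_resume(void)
{
    assert(call_count < 4);
    call_log[call_count++] = CALL_DOMAIN_RESUME;
    return 0;
}

/* Returns 1 on success and 0 on failure, as the callback in the patch does. */
static int resume_callback(void)
{
    /* Disk work first, so the guest never runs ahead of it. */
    if (disks_preresume())
        return 0;

    if (domain_resume())
        return 0;

    return 1;
}
```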


>      return 1;
>  }
>
> @@ -1326,6 +1334,13 @@ static void remus_checkpoint_dm_saved(libxl__egc *egc,
>          goto out;
>      }
>
> +    rc = libxl__remus_disks_commit(remus_state);
> +    if (rc) {
> +        LOG(ERROR, "Failed to commit disks state"
> +            " Terminating Remus..");
> +        goto out;
> +    }
> +
>

Now might be a good time to use the restore callbacks offered by the
toolstack to get an explicit ack from the backup that it has received
the memory checkpoint too, before the network buffer is released. I
think I put in a comment related to that somewhere.



>      if (remus_state->netbuf_state) {
>          rc = libxl__remus_netbuf_release_prev_epoch(gc, dss->domid,
>                                                      remus_state);
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 1bd2bba..8933e5f 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -2309,6 +2309,10 @@ typedef struct libxl__remus_state {
>      void *netbuf_state;
>      libxl__ev_time timeout;
>      libxl__ev_child child;
> +
> +    /* remus disks state */
> +    uint32_t nr_disks;
> +    struct libxl__remus_disk **disks;
>  } libxl__remus_state;
>
>  _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
> @@ -2336,6 +2340,12 @@ _hidden int libxl__remus_netbuf_start_new_epoch(libxl__gc *gc, uint32_t domid,
>  _hidden int libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t domid,
>                                                    libxl__remus_state *remus_state);
>
> +_hidden int libxl__remus_disks_postsuspend(libxl__remus_state *state);
> +_hidden int libxl__remus_disks_preresume(libxl__remus_state *state);
> +_hidden int libxl__remus_disks_commit(libxl__remus_state *state);
> +_hidden int libxl__remus_disks_setup(libxl__egc *egc, libxl__domain_suspend_state *dss);
> +_hidden void libxl__remus_disks_teardown(libxl__remus_state *state);
> +
>  _hidden void libxl__remus_setup_initiate(libxl__egc *egc,
>                                           libxl__domain_suspend_state *dss);
>
> diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
> index cdc1c16..92eb36a 100644
> --- a/tools/libxl/libxl_remus.c
> +++ b/tools/libxl/libxl_remus.c
> @@ -23,6 +23,7 @@ void libxl__remus_setup_initiate(libxl__egc *egc,
>                                   libxl__domain_suspend_state *dss)
>  {
>      libxl__ev_time_init(&dss->remus_state->timeout);
> +    libxl__remus_disks_setup(egc, dss);
>      if (!dss->remus_state->netbufscript)
>          libxl__remus_setup_done(egc, dss, 0);
>      else
> @@ -51,6 +52,7 @@ void libxl__remus_teardown_initiate(libxl__egc *egc,
>      /* stash rc somewhere before invoking teardown ops. */
>      dss->remus_state->saved_rc = rc;
>
> +    libxl__remus_disks_teardown(dss->remus_state);
>      if (!dss->remus_state->netbuf_state)
>          libxl__remus_teardown_done(egc, dss);
>      else
> diff --git a/tools/libxl/libxl_remus_replicated_disk.c b/tools/libxl/libxl_remus_replicated_disk.c
> new file mode 100644
> index 0000000..4b16403
> --- /dev/null
> +++ b/tools/libxl/libxl_remus_replicated_disk.c
> @@ -0,0 +1,219 @@
> +/*
> + * Copyright (C) 2013
> + * Author Lai Jiangshan <laijs@cn.fujitsu.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU Lesser General Public License as published
> + * by the Free Software Foundation; version 2.1 only. with the special
> + * exception on linking described in file LICENSE.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU Lesser General Public License for more details.
> + */
> +
> +#include "libxl_osdeps.h" /* must come before any other headers */
> +
> +#include "libxl_internal.h"
> +
> +typedef struct libxl__remus_disk
> +{
> +    const struct libxl_device_disk *disk;
> +    const struct libxl__remus_disk_type *type;
> +
> +    /* ao callbacks for setup & teardown script */
> +    int (*setup_cb)(struct libxl__remus_disk *d);
> +    int (*teardown_cb)(struct libxl__remus_disk *d);
> +} libxl__remus_disk;
> +
> +typedef struct libxl__remus_disk_type
> +{
> +    /* checkpointing */
> +    int (*postsuspend)(libxl__remus_disk *d);
> +    int (*preresume)(libxl__remus_disk *d);
> +    int (*commit)(libxl__remus_disk *d);
> +
>

I would also suggest renaming these to something else, not associated with
suspend but associated with checkpoints. start_disk_sync, finish_disk_sync,
start_new_epoch or something along those lines.



> +    /* setup & teardown */
> +    libxl__remus_disk *(*setup)(libxl__gc *gc, libxl_device_disk *disk);
> +    void (*teardown)(libxl__remus_disk *d);

> +} libxl__remus_disk_type;
> +
> +
> +/*** drbd implementation ***/
> +const int DRBD_SEND_CHECKPOINT = 20;
> +const int DRBD_WAIT_CHECKPOINT_ACK = 30;
> +typedef struct libxl__remus_drbd_disk
> +{
> +    libxl__remus_disk remus_disk;
> +    int ctl_fd;
> +    int ackwait;
> +} libxl__remus_drbd_disk;
> +
> +static int drbd_postsuspend(libxl__remus_disk *d)
> +{
> +    struct libxl__remus_drbd_disk *drbd = CONTAINER_OF(d, *drbd, remus_disk);
> +
> +    if (!drbd->ackwait) {
> +        if (ioctl(drbd->ctl_fd, DRBD_SEND_CHECKPOINT, 0) <= 0)
> +            drbd->ackwait = 1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int drbd_preresume(libxl__remus_disk *d)
> +{
> +    struct libxl__remus_drbd_disk *drbd = CONTAINER_OF(d, *drbd, remus_disk);
> +
> +    if (drbd->ackwait) {
> +        ioctl(drbd->ctl_fd, DRBD_WAIT_CHECKPOINT_ACK, 0);
> +        drbd->ackwait = 0;
> +    }
> +
> +    return 0;
> +}
> +
> +static int drbd_commit(libxl__remus_disk *d)
> +{
> +    /* nothing to do, all work are done by DRBD's protocal-D. */
> +    return 0;
> +}
> +
> +static libxl__remus_disk *drbd_setup(libxl__gc *gc, libxl_device_disk *disk)
> +{
> +    libxl__remus_drbd_disk *drbd;
> +    //if (!(drbd && protocal-D)) // TODO: need to run script async to check
> +    //  return NULL
> +
>

We don't need to run any scripts for DRBD (or tapdisk for that matter).

DRBD scripts will get activated when the domain boots and that's the
end of it. On the backup side, it gets activated during the initial
phase of Remus, which is the same as live migration.  Since xl already
supports live migration with drbd based disks, we don't need any script
related code at all.

With regard to tapdisk-remus (at least with blktap2), you can't boot the
domain fully unless you start Remus too. This in turn forces the backup
to start the tapdisk-remus receiving end.  Once again in this case, in
Xend, the live migration infrastructure did all the script setup work.




> +    GCNEW(drbd);
> +
> +    drbd->ctl_fd = open(GCSPRINTF("/dev/drbd/by-res/%s", disk->pdev_path), O_RDONLY);
> +    drbd->ackwait = 0;
> +
> +    if (drbd->ctl_fd < 0)
> +        return NULL;
> +
> +    return &drbd->remus_disk;
> +}
> +
> +static void drbd_teardown(libxl__remus_disk *d)
> +{
> +    struct libxl__remus_drbd_disk *drbd = CONTAINER_OF(d, *drbd, remus_disk);
> +
> +    close(drbd->ctl_fd);
> +}
> +
> +static const libxl__remus_disk_type drbd_disk_type = {
> +  .postsuspend = drbd_postsuspend,
> +  .preresume = drbd_preresume,
> +  .commit = drbd_commit,
> +  .setup = drbd_setup,
> +  .teardown = drbd_teardown,
> +};
> +
> +/*** checkpoint disks states and callbacks ***/
> +static const libxl__remus_disk_type *remus_disk_types[] =
> +{
> +    &drbd_disk_type,
> +};
> +
> +int libxl__remus_disks_postsuspend(libxl__remus_state *state)
> +{
> +    int i;
> +    int rc = 0;
> +
> +    for (i = 0; rc == 0 && i < state->nr_disks; i++)
> +        rc = state->disks[i]->type->postsuspend(state->disks[i]);
> +
> +    return rc;
> +}
> +
> +int libxl__remus_disks_preresume(libxl__remus_state *state)
> +{
> +    int i;
> +    int rc = 0;
> +
> +    for (i = 0; rc == 0 && i < state->nr_disks; i++)
> +        rc = state->disks[i]->type->preresume(state->disks[i]);
> +
> +    return rc;
> +}
> +
> +int libxl__remus_disks_commit(libxl__remus_state *state)
> +{
> +    int i;
> +    int rc = 0;
> +
> +    for (i = 0; rc == 0 && i < state->nr_disks; i++)
> +        rc = state->disks[i]->type->commit(state->disks[i]);
> +
> +    return rc;
> +}
> +
> +#if 0
> +/* TODO: implement disk setup/teardown script */
> +static void disk_exec_timeout_cb(libxl__egc *egc, libxl__ev_time *ev,
> +                                      const struct timeval *requested_abs)
> +{
> +    libxl__remus_disks_state *state = CONTAINER_OF(ev, *aodev, timeout);
> +    STATE_AO_GC(state->ao);
> +
> +    libxl__ev_time_deregister(gc, &state->timeout);
> +
> +    assert(libxl__ev_child_inuse(&state->child));
> +    if (kill(state->child.pid, SIGKILL)) {
> +    }
> +
> +    return;
> +}
> +
> +int libxl__remus_disks_exec_script(libxl__gc *gc,
> +    libxl__remus_disks_state *state)
> +{
> +}
> +#endif
> +


I don't know if this is needed at all, given that we don't have disk script
setup issues.


>

> +int libxl__remus_disks_setup(libxl__egc *egc, libxl__domain_suspend_state *dss)
> +{
> +    libxl__remus_state *remus_state = dss->remus_state;
> +    int i, j, nr_disks;
> +    libxl_device_disk *disks;
> +    libxl__remus_disk *remus_disk;
> +    const libxl__remus_disk_type *type;
> +
> +    STATE_AO_GC(dss->ao);
> +    disks = libxl_device_disk_list(CTX, dss->domid, &nr_disks);
> +    remus_state->nr_disks = nr_disks;
> +    GCNEW_ARRAY(remus_state->disks, nr_disks);
> +
> +    for (i = 0; i < nr_disks; i++) {
> +        remus_disk = NULL;
> +        for (j = 0; j < ARRAY_SIZE(remus_disk_types); j++) {
> +            type = remus_disk_types[j];
> +            remus_disk = type->setup(gc, &disks[i]);
> +            if (!remus_disk)
> +                break;
> +
> +            remus_state->disks[i] = remus_disk;
> +            remus_disk->disk = &disks[i];
> +            remus_disk->type = type;
> +        }
> +        if (!remus_disk) {
> +            remus_state->nr_disks = i;
> +            libxl__remus_disks_teardown(remus_state);
> +            return -1;
> +        }
> +    }
> +    return 0;
> +}
> +
> +void libxl__remus_disks_teardown(libxl__remus_state *state)
> +{
> +    int i;
> +
> +    for (i = 0; i < state->nr_disks; i++)
> +        state->disks[i]->type->teardown(state->disks[i]);
> +    state->nr_disks = 0;
> +}
> +
> --
> 1.7.1
>
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH RFC] remus: implement remus replicated checkpointing disk
  2014-03-11 18:10       ` Shriram Rajagopalan
@ 2014-03-12  2:35         ` Lai Jiangshan
  2014-03-12  6:23           ` Shriram Rajagopalan
  2014-03-12 10:07           ` Ian Campbell
  2014-03-12 10:06         ` Ian Campbell
  1 sibling, 2 replies; 89+ messages in thread
From: Lai Jiangshan @ 2014-03-12  2:35 UTC (permalink / raw)
  To: rshriram
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, xen-devel, Dong Eddie,
	Roger Pau Monne

On 03/12/2014 02:10 AM, Shriram Rajagopalan wrote:
> On Tue, Feb 25, 2014 at 6:53 PM, Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> 
>     This patch implements remus replicated checkpointing disk.
>     It includes two parts:
>       generic remus replicated checkpointing disks framework
>       drbd replicated checkpointing disks
>     They will be split into different files in next round.
> 
>     The patch is still simple due to disk-setup-teardown-script is
>     still under implementing. I need to use libxl_ao to implement it,
>     but libxl_ao is hard to use. The work sequence is needed to ugly split
>     to several callbacks like device_hotplug().
> 
>     And because the remus disk script is unimplemented, the drbd_setup() code
>     can't check the disk now. So it just assumes the user config the disk correctly.
> 
>     This patch is *UNTESTED*.
>     (there is a problem with xl&drbd(without remus) in my BOXes).
> 
>     I request *comments* as many as possible.
> 
>     Thanks,
>     Lai
> 
>     Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> 
> 
> 
> Hi
>  sorry for the delayed response. And thanks a lot for this initiative. Apart from the inline feedback,
> there are a few things to consider first before going down this route. 
> 
> 1. The drbd kernel module required for Remus is still out of tree, currently hosted on a wiki page.
> The drbd folks didn't want to include the changes into their code unfortunately, as they were offering the
> same functionality to one of their paid customers. This is what they told me back in 2011 or so.
> 
> To streamline the storage replication module installation, is there a chance of hosting the code in
> xen.org's repos? That way, we could script the download and installation process. Like the qemu
> stuff.
> 
> 2. The tapdisk based replication unfortunately is outdated. Please correct me if I have got this wrong.
> Haven't we decided to get rid of blktap2 and go with the qemu disk models? In which case, the tapdisk
> remus code has to be ported into some qemu disk variant.

We are implementing a *qemu* replicated checkpointing disk, but we can't
make it public even once it is done. We need to delay publication
because a paying customer is funding the implementation.

> 
> Without getting a resolution to the above two, my stance is that we shouldn't pollute xl with functionality
> that requires out-of-band modules that may prove pretty painful to install for the majority of folks out there.

I'm also concerned about out-of-band modules. Since remus-drbd can't be
merged upstream, there would be no value in adding the remus-drbd
replicated checkpointing disk to xl.

What's the status of blktap3 now? (I am asking the Xen community.)

> 
> Based on the experience from the last 3 years, most average users of Remus tend to skip disk replication 
> altogether.  They install the distro's default drbd, use the disk replication provided with it and then complain
> that Remus crashes or fails.  Some have ventured into tapdisk replication but it unfortunately seemed to 
> get difficult as xend/blktap2 started getting deprecated.
> 
>  
> 
>     ---
>      tools/libxl/Makefile                      |    1 +
>      tools/libxl/libxl_dom.c                   |   19 +++-
>      tools/libxl/libxl_internal.h              |   10 ++
>      tools/libxl/libxl_remus.c                 |    2 +
>      tools/libxl/libxl_remus_replicated_disk.c |  219 +++++++++++++++++++++++++++++
>      5 files changed, 249 insertions(+), 2 deletions(-)
>      create mode 100644 tools/libxl/libxl_remus_replicated_disk.c
> 
>     diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
>     index 218f55e..dbf5dd9 100644
>     --- a/tools/libxl/Makefile
>     +++ b/tools/libxl/Makefile
>     @@ -53,6 +53,7 @@ LIBXL_OBJS-y += libxl_nonetbuffer.o
>      endif
> 
>      LIBXL_OBJS-y += libxl_remus.o
>     +LIBXL_OBJS-y += libxl_remus_replicated_disk.o
> 
> 
> So I think this part will also require some autoconf based stuff.
> Especially, if DRBD & tapdisk are not present, then this whole
> thing gets disabled. Just like the libxl_netbuffer and libxl_nonetbuffer
> 
> Given that both of these (netbuffer and disk) are associated with Remus
> and both are required for Remus to work "correctly", we might as well have
> noremus.c and remus.c . Ofcourse it can be modularized a bit to have
> netbuffer but no disk replication or vice versa. As long as the person installing
> or compiling this stuff is made to state explicitly that he/she does not want
> Remus, but only a subset of its functionality for some other purpose.
>  
> 
>     diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
>     index a4ffdfd..858f5be 100644
>     --- a/tools/libxl/libxl_dom.c
>     +++ b/tools/libxl/libxl_dom.c
> 
> 
>  
> 
>          /* The domain was suspended successfully. Start a new network
>     @@ -1279,7 +1284,10 @@ static int libxl__remus_domain_resume_callback(void *data)
>          if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
>              return 0;
> 
>     -    /* REMUS TODO: Deal with disk. */
>     +    /* Deal with disk. */
>     +    if (libxl__remus_disks_preresume(dss->remus_state))
>     +        return 0;
>     +
> 
> 
> Bug. This should go before the resume call. Also, I would suggest changing the comment 
> to something more meaningful, e.g., "commit disk changes.."
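A minimal sketch of the ordering suggested here, assuming stand-in functions: `disks_preresume`, `domain_resume` and `resume_callback` are hypothetical names for illustration, not the libxl API. The disk step runs first, and the domain is resumed only if that step succeeds.

```c
#include <assert.h>
#include <string.h>

/* Records the order in which the stand-in steps run. */
static char call_log[64];

static void log_step(const char *name)
{
    assert(name != NULL);
    if (call_log[0] != '\0')
        strncat(call_log, ",", sizeof(call_log) - strlen(call_log) - 1);
    strncat(call_log, name, sizeof(call_log) - strlen(call_log) - 1);
}

/* Hypothetical stand-in for libxl__remus_disks_preresume(); 0 on success. */
static int disks_preresume(int fail)
{
    log_step("disks_preresume");
    return fail ? -1 : 0;
}

/* Hypothetical stand-in for libxl__domain_resume(); 0 on success. */
static int domain_resume(void)
{
    log_step("domain_resume");
    return 0;
}

/* Returns 1 to continue checkpointing and 0 to stop, like the callback in the patch. */
static int resume_callback(int disk_fail)
{
    /* Commit disk changes before the guest runs again. */
    if (disks_preresume(disk_fail))
        return 0;

    if (domain_resume())
        return 0;

    return 1;
}
```

With this ordering, a failed disk step leaves the guest suspended instead of letting it run against a disk state the backup has not acknowledged.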
> 
> 
>          return 1;
>      }
> 
>     @@ -1326,6 +1334,13 @@ static void remus_checkpoint_dm_saved(libxl__egc *egc,
>              goto out;
>          }
> 
>     +    rc = libxl__remus_disks_commit(remus_state);
>     +    if (rc) {
>     +        LOG(ERROR, "Failed to commit disks state"
>     +            " Terminating Remus..");
>     +        goto out;
>     +    }
>     +
> 
> 
> Now might be a good time to use the restore callbacks offered by the toolstack to 
> get an explicit ack from the backup that it has received the memory checkpoint too
> before the network buffer is released. I think I put in a comment related to that somewhere.
> 
>  
> 
>          if (remus_state->netbuf_state) {
>              rc = libxl__remus_netbuf_release_prev_epoch(gc, dss->domid,
>                                                          remus_state);
>     diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
>     index 1bd2bba..8933e5f 100644
>     --- a/tools/libxl/libxl_internal.h
>     +++ b/tools/libxl/libxl_internal.h
>     @@ -2309,6 +2309,10 @@ typedef struct libxl__remus_state {
>          void *netbuf_state;
>          libxl__ev_time timeout;
>          libxl__ev_child child;
>     +
>     +    /* remus disks state */
>     +    uint32_t nr_disks;
>     +    struct libxl__remus_disk **disks;
>      } libxl__remus_state;
> 
>      _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
>     @@ -2336,6 +2340,12 @@ _hidden int libxl__remus_netbuf_start_new_epoch(libxl__gc *gc, uint32_t domid,
>      _hidden int libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t domid,
>                                                        libxl__remus_state *remus_state);
> 
>     +_hidden int libxl__remus_disks_postsuspend(libxl__remus_state *state);
>     +_hidden int libxl__remus_disks_preresume(libxl__remus_state *state);
>     +_hidden int libxl__remus_disks_commit(libxl__remus_state *state);
>     +_hidden int libxl__remus_disks_setup(libxl__egc *egc, libxl__domain_suspend_state *dss);
>     +_hidden void libxl__remus_disks_teardown(libxl__remus_state *state);
>     +
>      _hidden void libxl__remus_setup_initiate(libxl__egc *egc,
>                                               libxl__domain_suspend_state *dss);
> 
>     diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
>     index cdc1c16..92eb36a 100644
>     --- a/tools/libxl/libxl_remus.c
>     +++ b/tools/libxl/libxl_remus.c
>     @@ -23,6 +23,7 @@ void libxl__remus_setup_initiate(libxl__egc *egc,
>                                       libxl__domain_suspend_state *dss)
>      {
>          libxl__ev_time_init(&dss->remus_state->timeout);
>     +    libxl__remus_disks_setup(egc, dss);
>          if (!dss->remus_state->netbufscript)
>              libxl__remus_setup_done(egc, dss, 0);
>          else
>     @@ -51,6 +52,7 @@ void libxl__remus_teardown_initiate(libxl__egc *egc,
>          /* stash rc somewhere before invoking teardown ops. */
>          dss->remus_state->saved_rc = rc;
> 
>     +    libxl__remus_disks_teardown(dss->remus_state);
>          if (!dss->remus_state->netbuf_state)
>              libxl__remus_teardown_done(egc, dss);
>          else
>     diff --git a/tools/libxl/libxl_remus_replicated_disk.c b/tools/libxl/libxl_remus_replicated_disk.c
>     new file mode 100644
>     index 0000000..4b16403
>     --- /dev/null
>     +++ b/tools/libxl/libxl_remus_replicated_disk.c
>     @@ -0,0 +1,219 @@
>     +/*
>     + * Copyright (C) 2013
>     + * Author Lai Jiangshan <laijs@cn.fujitsu.com>
>     + *
>     + * This program is free software; you can redistribute it and/or modify
>     + * it under the terms of the GNU Lesser General Public License as published
>     + * by the Free Software Foundation; version 2.1 only. with the special
>     + * exception on linking described in file LICENSE.
>     + *
>     + * This program is distributed in the hope that it will be useful,
>     + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>     + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>     + * GNU Lesser General Public License for more details.
>     + */
>     +
>     +#include "libxl_osdeps.h" /* must come before any other headers */
>     +
>     +#include "libxl_internal.h"
>     +
>     +typedef struct libxl__remus_disk
>     +{
>     +    const struct libxl_device_disk *disk;
>     +    const struct libxl__remus_disk_type *type;
>     +
>     +    /* ao callbacks for setup & teardown script */
>     +    int (*setup_cb)(struct libxl__remus_disk *d);
>     +    int (*teardown_cb)(struct libxl__remus_disk *d);
>     +} libxl__remus_disk;
>     +
>     +typedef struct libxl__remus_disk_type
>     +{
>     +    /* checkpointing */
>     +    int (*postsuspend)(libxl__remus_disk *d);
>     +    int (*preresume)(libxl__remus_disk *d);
>     +    int (*commit)(libxl__remus_disk *d);
>     +
> 
> 
> I would also suggest renaming these to something else, not associated with
> suspend but associated with checkpoints. start_disk_sync, finish_disk_sync,
> start_new_epoch or something along those lines.
> 
>  
> 
>     +    /* setup & teardown */
>     +    libxl__remus_disk *(*setup)(libxl__gc *gc, libxl_device_disk *disk);
>     +    void (*teardown)(libxl__remus_disk *d); 
> 
>     +} libxl__remus_disk_type;
>     +
>     +
>     +/*** drbd implementation ***/
>     +const int DRBD_SEND_CHECKPOINT = 20;
>     +const int DRBD_WAIT_CHECKPOINT_ACK = 30;
>     +typedef struct libxl__remus_drbd_disk
>     +{
>     +    libxl__remus_disk remus_disk;
>     +    int ctl_fd;
>     +    int ackwait;
>     +} libxl__remus_drbd_disk;
>     +
>     +static int drbd_postsuspend(libxl__remus_disk *d)
>     +{
>     +    struct libxl__remus_drbd_disk *drbd = CONTAINER_OF(d, *drbd, remus_disk);
>     +
>     +    if (!drbd->ackwait) {
>     +        if (ioctl(drbd->ctl_fd, DRBD_SEND_CHECKPOINT, 0) <= 0)
>     +            drbd->ackwait = 1;
>     +    }
>     +
>     +    return 0;
>     +}
>     +
>     +static int drbd_preresume(libxl__remus_disk *d)
>     +{
>     +    struct libxl__remus_drbd_disk *drbd = CONTAINER_OF(d, *drbd, remus_disk);
>     +
>     +    if (drbd->ackwait) {
>     +        ioctl(drbd->ctl_fd, DRBD_WAIT_CHECKPOINT_ACK, 0);
>     +        drbd->ackwait = 0;
>     +    }
>     +
>     +    return 0;
>     +}
>     +
>     +static int drbd_commit(libxl__remus_disk *d)
>     +{
>     +    /* nothing to do, all work are done by DRBD's protocal-D. */
>     +    return 0;
>     +}
>     +
>     +static libxl__remus_disk *drbd_setup(libxl__gc *gc, libxl_device_disk *disk)
>     +{
>     +    libxl__remus_drbd_disk *drbd;
>     +    //if (!(drbd && protocal-D)) // TODO: need to run script async to check
>     +    //  return NULL
>     +
> 
> 
> We don't need to run any scripts for DRBD (or tapdisk for that matter).

It is hard to check the status/configuration of a DRBD disk in C code;
it requires the drbd C header files (with Remus support).

I found no way to do it, so we are trying to use scripts.

Thank you, and thanks for your detailed reply.
Lai

> 
> DRBD scripts will get activated when the domain boots and thats the end of it.
> On the backup side, it gets activated during the initial phase of Remus, which
> is same as live migration.  Since xl already supports live migration with drbd
> based disks, we don't need any script related code at all.
> 
> With regard to tapdisk-remus (atleast with blktap2), you cant boot the domain
> fully unless you start Remus too. This in turn forces the backup to start the 
> tapdisk-remus receiving end.  Once again in this case, in Xend, the live migration
> infrastructure did all the script setup work.
> 
> 
>  
> 
>     +    GCNEW(drbd);
>     +
>     +    drbd->ctl_fd = open(GCSPRINTF("/dev/drbd/by-res/%s", disk->pdev_path), O_RDONLY);
>     +    drbd->ackwait = 0;
>     +
>     +    if (drbd->ctl_fd < 0)
>     +        return NULL;
>     +
>     +    return &drbd->remus_disk;
>     +}
>     +
>     +static void drbd_teardown(libxl__remus_disk *d)
>     +{
>     +    struct libxl__remus_drbd_disk *drbd = CONTAINER_OF(d, *drbd, remus_disk);
>     +
>     +    close(drbd->ctl_fd);
>     +}
>     +
>     +static const libxl__remus_disk_type drbd_disk_type = {
>     +  .postsuspend = drbd_postsuspend,
>     +  .preresume = drbd_preresume,
>     +  .commit = drbd_commit,
>     +  .setup = drbd_setup,
>     +  .teardown = drbd_teardown,
>     +};
>     +
>     +/*** checkpoint disks states and callbacks ***/
>     +static const libxl__remus_disk_type *remus_disk_types[] =
>     +{
>     +    &drbd_disk_type,
>     +};
>     +
>     +int libxl__remus_disks_postsuspend(libxl__remus_state *state)
>     +{
>     +    int i;
>     +    int rc = 0;
>     +
>     +    for (i = 0; rc == 0 && i < state->nr_disks; i++)
>     +        rc = state->disks[i]->type->postsuspend(state->disks[i]);
>     +
>     +    return rc;
>     +}
>     +
>     +int libxl__remus_disks_preresume(libxl__remus_state *state)
>     +{
>     +    int i;
>     +    int rc = 0;
>     +
>     +    for (i = 0; rc == 0 && i < state->nr_disks; i++)
>     +        rc = state->disks[i]->type->preresume(state->disks[i]);
>     +
>     +    return rc;
>     +}
>     +
>     +int libxl__remus_disks_commit(libxl__remus_state *state)
>     +{
>     +    int i;
>     +    int rc = 0;
>     +
>     +    for (i = 0; rc == 0 && i < state->nr_disks; i++)
>     +        rc = state->disks[i]->type->commit(state->disks[i]);
>     +
>     +    return rc;
>     +}
>     +
>     +#if 0
>     +/* TODO: implement disk setup/teardown script */
>     +static void disk_exec_timeout_cb(libxl__egc *egc, libxl__ev_time *ev,
>     +                                      const struct timeval *requested_abs)
>     +{
>     +    libxl__remus_disks_state *state = CONTAINER_OF(ev, *aodev, timeout);
>     +    STATE_AO_GC(state->ao);
>     +
>     +    libxl__ev_time_deregister(gc, &state->timeout);
>     +
>     +    assert(libxl__ev_child_inuse(&state->child));
>     +    if (kill(state->child.pid, SIGKILL)) {
>     +    }
>     +
>     +    return;
>     +}
>     +
>     +int libxl__remus_disks_exec_script(libxl__gc *gc,
>     +    libxl__remus_disks_state *state)
>     +{
>     +}
>     +#endif
>     +
> 
> 
> I don't know if this is needed at all, given that we don't have disk script setup issues.
>  
> 
>      
> 
>     +int libxl__remus_disks_setup(libxl__egc *egc, libxl__domain_suspend_state *dss)
>     +{
>     +    libxl__remus_state *remus_state = dss->remus_state;
>     +    int i, j, nr_disks;
>     +    libxl_device_disk *disks;
>     +    libxl__remus_disk *remus_disk;
>     +    const libxl__remus_disk_type *type;
>     +
>     +    STATE_AO_GC(dss->ao);
>     +    disks = libxl_device_disk_list(CTX, dss->domid, &nr_disks);
>     +    remus_state->nr_disks = nr_disks;
>     +    GCNEW_ARRAY(remus_state->disks, nr_disks);
>     +
>     +    for (i = 0; i < nr_disks; i++) {
>     +        remus_disk = NULL;
>     +        for (j = 0; j < ARRAY_SIZE(remus_disk_types); j++) {
>     +            type = remus_disk_types[j];
>     +            remus_disk = type->setup(gc, &disks[i]);
>     +            if (!remus_disk)
>     +                break;
>     +
>     +            remus_state->disks[i] = remus_disk;
>     +            remus_disk->disk = &disks[i];
>     +            remus_disk->type = type;
>     +        }
>     +        if (!remus_disk) {
>     +            remus_state->nr_disks = i;
>     +            libxl__remus_disks_teardown(remus_state);
>     +            return -1;
>     +        }
>     +    }
>     +    return 0;
>     +}
>     +
>     +void libxl__remus_disks_teardown(libxl__remus_state *state)
>     +{
>     +    int i;
>     +
>     +    for (i = 0; i < state->nr_disks; i++)
>     +        state->disks[i]->type->teardown(state->disks[i]);
>     +    state->nr_disks = 0;
>     +}
>     +
>     --
>     1.7.1
> 
> 
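In the setup loop of the patch, the inner loop breaks when a type's setup() returns NULL, which only behaves as a type probe while remus_disk_types has a single entry. A minimal sketch of a first-match probe follows, under stated assumptions: the structs, the `tapdisk` entry and matching on a path prefix are all hypothetical stand-ins (the patch itself opens a control fd rather than matching on a path).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for libxl__remus_disk and its type table. */
struct disk_type;

struct remus_disk {
    const char *path;
    const struct disk_type *type;
};

struct disk_type {
    const char *name;
    /* Returns 1 if this type can handle the disk at `path`, else 0. */
    int (*probe)(const char *path);
};

static int drbd_probe(const char *path)
{
    return strncmp(path, "/dev/drbd", 9) == 0;
}

static int tapdisk_probe(const char *path)
{
    return strncmp(path, "tap:", 4) == 0;
}

static const struct disk_type disk_types[] = {
    { "drbd", drbd_probe },
    { "tapdisk", tapdisk_probe },
};

/* Binds `d` to the first type whose probe accepts it; returns 0, or -1 if none does. */
static int bind_disk_type(struct remus_disk *d)
{
    size_t j;

    assert(d != NULL && d->path != NULL);
    d->type = NULL;
    for (j = 0; j < sizeof(disk_types) / sizeof(disk_types[0]); j++) {
        if (disk_types[j].probe(d->path)) {
            d->type = &disk_types[j];
            return 0;   /* stop at the first match */
        }
    }
    return -1;
}
```

The loop exits on the first type that accepts the disk, so adding a second type does not change the result for disks the first type already handles.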


* Re: [PATCH RFC] remus: implement remus replicated checkpointing disk
  2014-03-12  2:35         ` Lai Jiangshan
@ 2014-03-12  6:23           ` Shriram Rajagopalan
  2014-03-12 10:07           ` Ian Campbell
  1 sibling, 0 replies; 89+ messages in thread
From: Shriram Rajagopalan @ 2014-03-12  6:23 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, xen-devel, Dong Eddie,
	Roger Pau Monne


>
> >     +
> >     +static libxl__remus_disk *drbd_setup(libxl__gc *gc, libxl_device_disk *disk)
> >     +{
> >     +    libxl__remus_drbd_disk *drbd;
> >     +    //if (!(drbd && protocal-D)) // TODO: need to run script async to check
> >     +    //  return NULL
> >     +
> >
> >
> > We don't need to run any scripts for DRBD (or tapdisk for that matter).
>
> It is hard to check the status/configuration of DRBD-disk in C code,
> it requires drbd C-header files(with remus supported).
>
> I find no way to do it. So we try to use scripts.
>
>

Okay. So these are not setup scripts, but merely sanity-check scripts to ensure
that the proper versions of the kernel module and userspace tools for DRBD are
installed. In which case, yes, a script makes more sense and it's much easier.


shriram
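As an illustration of what such a sanity check might look at, here is a minimal sketch that inspects one status line in the textual style of /proc/drbd. The line layout, and the idea that a Remus-enabled module would report its protocol as the letter D in that position, are assumptions; `drbd_line_protocol` and `drbd_line_ok_for_remus` are hypothetical helper names.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Returns the protocol letter from a status line shaped like
 *   " 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate D r-----"
 * or 0 if the line does not have that shape. The layout is an assumption.
 */
static char drbd_line_protocol(const char *line)
{
    const char *p;

    assert(line != NULL);
    p = strstr(line, " ds:");
    if (!p)
        return 0;
    p += 4;
    while (*p && !isspace((unsigned char)*p))   /* skip the ds: value */
        p++;
    while (isspace((unsigned char)*p))          /* skip the separator */
        p++;
    if (!isupper((unsigned char)*p))
        return 0;
    if (p[1] && !isspace((unsigned char)p[1]))  /* must be a single letter */
        return 0;
    return *p;
}

/* Returns 1 if the line reports a connected resource using protocol D. */
static int drbd_line_ok_for_remus(const char *line)
{
    return strstr(line, " cs:Connected ") != NULL &&
           drbd_line_protocol(line) == 'D';
}
```

A script doing the same check would not need the drbd headers either, which is the point made above; the sketch only shows which fields the check depends on.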


* Re: [PATCH RFC] remus: implement remus replicated checkpointing disk
  2014-03-11 18:10       ` Shriram Rajagopalan
  2014-03-12  2:35         ` Lai Jiangshan
@ 2014-03-12 10:06         ` Ian Campbell
  2014-03-12 12:21           ` Lai Jiangshan
  1 sibling, 1 reply; 89+ messages in thread
From: Ian Campbell @ 2014-03-12 10:06 UTC (permalink / raw)
  To: rshriram
  Cc: Lai Jiangshan, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, xen-devel, Dong Eddie,
	Roger Pau Monne

On Tue, 2014-03-11 at 11:10 -0700, Shriram Rajagopalan wrote:
> On Tue, Feb 25, 2014 at 6:53 PM, Lai Jiangshan <laijs@cn.fujitsu.com>
> wrote:
> 
>         This patch implements remus replicated checkpointing disk.
>         It includes two parts:
>           generic remus replicated checkpointing disks framework
>           drbd replicated checkpointing disks
>         They will be split into different files in next round.
>         
>         The patch is still simple due to disk-setup-teardown-script is
>         still under implementing. I need to use libxl_ao to implement
>         it,
>         but libxl_ao is hard to use. The work sequence is needed to
>         ugly split
>         to serveral callbacks like device_hotplug().
>         
>         And becuase the remus disk script is unimplemented, the
>         drbd_setup() code
>         can't check the disk now. So it just assumes the user config
>         the disk correctly.
>         
>         This patch is *UNTESTED*.
>         (there is a problem with xl&drbd(without remus) in my BOXes).
>         
>         I request *comments* as many as possible.
>         
>         Thanks,
>         Lai
>         
>         Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> 
> 
> 
> 
> Hi
>  sorry for the delayed response. And thanks a lot for this initiative.
> Apart from the inline feedback,
> there are a few things to consider first before going down this
> route. 
> 
> 
> 1. The drbd kernel module required for Remus is still out of tree,
> currently hosted on a wiki page.
> The drbd folks didnt want to include the changes into their code
> unfortunately, as they were offering the
> same functionality to one of their paid customers. This is what they
> told me back in 2011 or so.

That's rather sad.

Is there a more community-contribution-friendly project which provides
similar functionality? A community drbd fork perhaps?

I don't know how invasive the changes are, but one approach might be to
ask various distro package maintainers if they would be willing to carry
a patch which you maintain out of the main drbd tree. You'd only need a
few of the big ones to say yes for this to be worthwhile.

> To streamline the storage replication module installation, is there a
> chance of hosting the code in 
> xen.org's repos? That way, we could script the download and
> installation process. Like the qemu
> stuff.

I'm very reluctant to add more downloading to the Xen build system, but
that doesn't rule out hosting something on xenbits. There are also
things like gitorious and other hosting services.

> 2. The tapdisk based replication unfortunately is outdated. Please
> correct me if I have got this wrong.
> Haven't we decided to get rid of blktap2 and go with the qemu disk
> models?

"decided" in so much as no one is interested in maintaining blktap2.
qemu is where people are willing to invest the effort so that is where
things are heading.

>  In which case, the tapdisk
> remus code has to be ported into some qemu disk variant.

Right. I think qemu has some amount of snapshot stuff, but how close it
is to what remus wants I don't know.

> Without getting a resolution to the above two, my stance is that we
> shouldn't pollute xl with functionality
> that requires out-of-band modules that may prove pretty painful to
> install for the majority of folks out there.

This sounds reasonable.

> Based on the experience from the last 3 years, most average users of
> Remus tend to skip disk replication 
> altogether.  They install the distro's default drbd, use the disk
> replication provided with it and then complain
> that Remus crashes or fails.

Remus should probably complain more stridently about the lack of disk
replication and require a --i-know-my-data-is-at-risk type flag.

Ian.
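A minimal sketch of the kind of check described above, assuming a hypothetical options struct: neither field is an existing xl or libxl setting, and the flag name is only the placeholder used in the message. Remus starts without disk replication only when the user has explicitly acknowledged the risk.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical options; neither field is an existing xl or libxl setting. */
struct remus_opts {
    int nr_replicated_disks;   /* disks with checkpointed replication */
    int data_at_risk_ack;      /* set by a --i-know-my-data-is-at-risk style flag */
};

/* Returns 0 if Remus may start, -1 if it must refuse. */
static int remus_check_disk_replication(const struct remus_opts *o, FILE *err)
{
    assert(o != NULL && err != NULL);
    if (o->nr_replicated_disks > 0)
        return 0;
    if (o->data_at_risk_ack) {
        fprintf(err, "warning: no disk replication; data may be lost on failover\n");
        return 0;
    }
    fprintf(err, "error: no disk replication configured; refusing to start\n");
    return -1;
}
```

Refusing by default makes the unsafe configuration an explicit choice instead of something users reach by skipping the disk setup.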


* Re: [PATCH RFC] remus: implement remus replicated checkpointing disk
  2014-03-12  2:35         ` Lai Jiangshan
  2014-03-12  6:23           ` Shriram Rajagopalan
@ 2014-03-12 10:07           ` Ian Campbell
  2014-03-12 11:57             ` Lai Jiangshan
  1 sibling, 1 reply; 89+ messages in thread
From: Ian Campbell @ 2014-03-12 10:07 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Dong Eddie, FNST-Wen Congyang, Stefano Stabellini, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen-devel, rshriram, Roger Pau Monne

On Wed, 2014-03-12 at 10:35 +0800, Lai Jiangshan wrote:
> > 2. The tapdisk based replication unfortunately is outdated. Please correct me if I have got this wrong.
> > Haven't we decided to get rid of blktap2 and go with the qemu disk models? In which case, the tapdisk
> > remus code has to be ported into some qemu disk variant.
> 
> We are implementing *qemu* replicated checkpointing disk, but we can't make it public even we have done,
> we need to delay the publication due to we are paid to implement it by a paid customer.

Are you saying you are never going to be able to make this code public?
Or just that it will be delayed by some months?

> 
> > 
> > Without getting a resolution to the above two, my stance is that we shouldn't pollute xl with functionality
> > that requires out-of-band modules that may prove pretty painful to install for the majority of folks out there.
> 
> I'm also concern with out-of-band modules, since remus-drbd can't be merged upstream,
> It will be valueless to apply remus-drbd replicated checkpointing disk to xl.
> 
> What's the status of blktap3 now? (I am asking to xen community)

AFAIK the XenServer folks are not continuing down the blktap3 path and are
instead planning to switch to qemu qdisk as a backend.


* Re: [PATCH RFC] remus: implement remus replicated checkpointing disk
  2014-03-12 10:07           ` Ian Campbell
@ 2014-03-12 11:57             ` Lai Jiangshan
  2014-03-12 12:17               ` Ian Campbell
  0 siblings, 1 reply; 89+ messages in thread
From: Lai Jiangshan @ 2014-03-12 11:57 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Dong Eddie, FNST-Wen Congyang, Stefano Stabellini, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen-devel, rshriram, Roger Pau Monne

On 03/12/2014 06:07 PM, Ian Campbell wrote:
> On Wed, 2014-03-12 at 10:35 +0800, Lai Jiangshan wrote:
>>> 2. The tapdisk based replication unfortunately is outdated. Please correct me if I have got this wrong.
>>> Haven't we decided to get rid of blktap2 and go with the qemu disk models? In which case, the tapdisk
>>> remus code has to be ported into some qemu disk variant.
>>
>> We are implementing *qemu* replicated checkpointing disk, but we can't make it public even we have done,
>> we need to delay the publication due to we are paid to implement it by a paid customer.
> 
> Are you saying you are never going to be able to make this code public?
> Or just that it will be delayed by some months?

It will just be delayed; it will be made public eventually.

The private code is still being implemented and is far from mature.
I hope the community will also put effort into it.

See also: http://wiki.qemu.org/Features/MicroCheckpointing


> 
>>
>>>
>>> Without getting a resolution to the above two, my stance is that we shouldn't pollute xl with functionality
>>> that requires out-of-band modules that may prove pretty painful to install for the majority of folks out there.
>>
>> I'm also concern with out-of-band modules, since remus-drbd can't be merged upstream,
>> It will be valueless to apply remus-drbd replicated checkpointing disk to xl.
>>
>> What's the status of blktap3 now? (I am asking to xen community)
> 
> AFAIK the XenServer are not continuing down the blktap3 path and are
> instead planning to switch to qemu qdisk as a backend.
> 
> 
> 


* Re: [PATCH RFC] remus: implement remus replicated checkpointing disk
  2014-03-12 11:57             ` Lai Jiangshan
@ 2014-03-12 12:17               ` Ian Campbell
  2014-03-12 12:28                 ` Lai Jiangshan
  0 siblings, 1 reply; 89+ messages in thread
From: Ian Campbell @ 2014-03-12 12:17 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Dong Eddie, FNST-Wen Congyang, Stefano Stabellini, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen-devel, rshriram, Roger Pau Monne

On Wed, 2014-03-12 at 19:57 +0800, Lai Jiangshan wrote:
> On 03/12/2014 06:07 PM, Ian Campbell wrote:
> > On Wed, 2014-03-12 at 10:35 +0800, Lai Jiangshan wrote:
> >>> 2. The tapdisk based replication unfortunately is outdated. Please correct me if I have got this wrong.
> >>> Haven't we decided to get rid of blktap2 and go with the qemu disk models? In which case, the tapdisk
> >>> remus code has to be ported into some qemu disk variant.
> >>
> >> We are implementing *qemu* replicated checkpointing disk, but we can't make it public even we have done,
> >> we need to delay the publication due to we are paid to implement it by a paid customer.
> > 
> > Are you saying you are never going to be able to make this code public?
> > Or just that it will be delayed by some months?
> 
> It will be just delayed, but it will be public finally.
> 
> This private code is just under implementing, it is far from mature.
> I hope the community also makes efforts to it.

Are you asking that people work on this feature in parallel with you
building the same thing privately? That doesn't seem likely.

Ian.


* Re: [PATCH RFC] remus: implement remus replicated checkpointing disk
  2014-03-12 10:06         ` Ian Campbell
@ 2014-03-12 12:21           ` Lai Jiangshan
  0 siblings, 0 replies; 89+ messages in thread
From: Lai Jiangshan @ 2014-03-12 12:21 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Dong Eddie, FNST-Wen Congyang, Stefano Stabellini, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen-devel, rshriram, Roger Pau Monne

On 03/12/2014 06:06 PM, Ian Campbell wrote:
> On Tue, 2014-03-11 at 11:10 -0700, Shriram Rajagopalan wrote:
>> On Tue, Feb 25, 2014 at 6:53 PM, Lai Jiangshan <laijs@cn.fujitsu.com>
>> wrote:
>>
>>         This patch implements remus replicated checkpointing disk.
>>         It includes two parts:
>>           generic remus replicated checkpointing disks framework
>>           drbd replicated checkpointing disks
>>         They will be split into different files in next round.
>>         
>>         The patch is still simple due to disk-setup-teardown-script is
>>         still under implementing. I need to use libxl_ao to implement
>>         it,
>>         but libxl_ao is hard to use. The work sequence is needed to
>>         ugly split
>>         to serveral callbacks like device_hotplug().
>>         
>>         And becuase the remus disk script is unimplemented, the
>>         drbd_setup() code
>>         can't check the disk now. So it just assumes the user config
>>         the disk correctly.
>>         
>>         This patch is *UNTESTED*.
>>         (there is a problem with xl&drbd(without remus) in my BOXes).
>>         
>>         I request *comments* as many as possible.
>>         
>>         Thanks,
>>         Lai
>>         
>>         Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>
>>
>>
>>
>> Hi
>>  sorry for the delayed response. And thanks a lot for this initiative.
>> Apart from the inline feedback,
>> there are a few things to consider first before going down this
>> route. 
>>
>>
>> 1. The drbd kernel module required for Remus is still out of tree,
>> currently hosted on a wiki page.
>> The drbd folks didnt want to include the changes into their code
>> unfortunately, as they were offering the
>> same functionality to one of their paid customers. This is what they
>> told me back in 2011 or so.
> 
> That's rather sad.
> 
> Is there a more community contribution friendly project which provides
> similar functionality? A community drdb fork perhaps?
> 
> I don't know how invasive the changes are, but one approach might be to
> ask various distro package maintainers if they would be willing to carry
> a patch which you maintain out of the main drdb tree. You'd only need a
> few of the big ones to say yes for this to be worthwhile.
> 
>> To streamline the storage replication module installation, is there a
>> chance of hosting the code in 
>> xen.org's repos? That way, we could script the download and
>> installation process. Like the qemu
>> stuff.
> 
> I'm very reluctant to add more downloading to the Xen build system, but
> that doesn't rule out hosting something on xenbits. There are also
> things like gitorious and other hosting services.
> 
>> 2. The tapdisk based replication unfortunately is outdated. Please
>> correct me if I have got this wrong.
>> Haven't we decided to get rid of blktap2 and go with the qemu disk
>> models?
> 
> "decided" in so much as noone is interesting in maintaining blktap2.
> qemu is where people are willing to invest the effort so that is where
> things are heading.
> 
>>  In which case, the tapdisk
>> remus code has to be ported into some qemu disk variant.
> 
> Right. I think qemu has some amount of snapshot stuff, but how close it
> is to what remus wants I don't know.
> 
>> Without getting a resolution to the above two, my stance is that we
>> shouldn't pollute xl with functionality
>> that requires out-of-band modules that may prove pretty painful to
>> install for the majority of folks out there.
> 
> This sounds reasonable.

But we shouldn't stop porting Remus from xm to xl.
We can mark xl-remus as experimental until a good
Remus replicated checkpointing disk is ready (qemu qdisk).

I will keep working on it (with drbd for disks).

Thanks,
Lai


* Re: [PATCH RFC] remus: implement remus replicated checkpointing disk
  2014-03-12 12:17               ` Ian Campbell
@ 2014-03-12 12:28                 ` Lai Jiangshan
  0 siblings, 0 replies; 89+ messages in thread
From: Lai Jiangshan @ 2014-03-12 12:28 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Dong Eddie, FNST-Wen Congyang, Stefano Stabellini, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen-devel, rshriram, Roger Pau Monne

On 03/12/2014 08:17 PM, Ian Campbell wrote:
> On Wed, 2014-03-12 at 19:57 +0800, Lai Jiangshan wrote:
>> On 03/12/2014 06:07 PM, Ian Campbell wrote:
>>> On Wed, 2014-03-12 at 10:35 +0800, Lai Jiangshan wrote:
>>>>> 2. The tapdisk based replication unfortunately is outdated. Please correct me if I have got this wrong.
>>>>> Haven't we decided to get rid of blktap2 and go with the qemu disk models? In which case, the tapdisk
>>>>> remus code has to be ported into some qemu disk variant.
>>>>
>>>> We are implementing *qemu* replicated checkpointing disk, but we can't make it public even we have done,
>>>> we need to delay the publication due to we are paid to implement it by a paid customer.
>>>
>>> Are you saying you are never going to be able to make this code public?
>>> Or just that it will be delayed by some months?
>>
>> It will be just delayed, but it will be public finally.
>>
>> This private code is just under implementing, it is far from mature.
>> I hope the community also makes efforts to it.
> 
> Are you asking that people work on this feature in parallel with you
> building the same thing privately? That doesn't seem likely.
> 

It may or may not be the same. Our private implementation is progressing too slowly.
If the community takes this on, we will definitely join in.


* [PATCH V8 0/8] Remus/Libxl: Network buffering support
@ 2014-04-02 11:04 Yang Hongyang
  2014-04-02 11:04 ` [PATCH V8 1/8] remus: add libnl3 dependency to autoconf scripts Yang Hongyang
                   ` (7 more replies)
  0 siblings, 8 replies; 89+ messages in thread
From: Yang Hongyang @ 2014-04-02 11:04 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

This patch series adds support for network buffering in the Remus
codebase in libxl. 

Changes in V8:
  Applied some comments (by IanJ).
  Merged some struct definitions into their implementations.
  (2/3/5 in V7 => 3 in V8)

Changes in V7:
  Applied missing comments (by IanJ).
  Applied Shriram's comments.

  Merged the tangled netbuffering setup/teardown code into one patch.
  (2/6/8 in V6 => 5 in V7. 9/10 in V6 => 7 in V7)

Changes in V6:
  Applied Ian Jackson's comments of V5 series.
  [PATCH 2/4 V5] is split into smaller patches by functionality.

  [PATCH 4/4 V5] --> [PATCH 13/13] netbuffer is enabled by default.

Changes in V5:

Merge hotplug script patch (2/5) and hotplug script setup/teardown
patch (3/5) into a single patch.

Changes in V4:

[1/5] Remove check for libnl command line utils in autoconf checks

[2/5] minor nits

[3/5] define LIBXL_HAVE_REMUS_NETBUF in libxl.h

[4/5] clean ups. Make the usleep in checkpoint callback asynchronous

[5/5] minor nits

Changes in V3:
[1/5] Fix redundant checks in configure scripts
      (based on Ian Campbell's suggestions)

[2/5] Introduce locking in the script, during IFB setup.
      Add xenstore paths used by netbuf scripts
      to xenstore-paths.markdown

[3/5] Hotplug scripts setup/teardown invocations are now asynchronous
      following IanJ's feedback.  However, the invocations are still
      sequential. 

[5/5] Allow per-domain specification of netbuffer scripts in xl remus
      command.

And minor nits throughout the series based on feedback from
the last version

Changes in V2:
[1/5] Configure script will automatically enable/disable network
      buffer support depending on the availability of the appropriate
      libnl3 version. [If libnl3 is unavailable, a warning message will be
      printed to let the user know that the feature has been disabled.]

      use macros from pkg.m4 instead of pkg-config commands
      removed redundant checks for libnl3 libraries.

[3,4/5] - Minor nits.

Version 1:

[1/5] Changes to autoconf scripts to check for libnl3. Add linker flags
      to libxl Makefile.

[2/5] External script to setup/teardown network buffering using libnl3's
      CLI. This script will be invoked by libxl before starting Remus.
      The script's main job is to bring up an IFB device with plug qdisc
      attached to it.  It then re-routes egress traffic from the guest's
      vif to the IFB device.

[3/5] Libxl code to invoke the external setup script, followed by netlink
      related setup to obtain a handle on the output buffers attached
      to each vif.

[4/5] Libxl interaction with network buffer module in the kernel via
      libnl3 API.

[5/5] xl cmdline switch to explicitly enable network buffering when
      starting remus.


  A few things to note (by Shriram):

    a) Based on previous email discussions, the setup/teardown task has
    been moved to a hotplug style shell script which can be customized as
    desired, instead of implementing it as C code inside libxl.

    b) Libnl3 is not available on NetBSD. Nor is it available on CentOS
    (Linux).  So I have made network buffering support an optional feature
    so that it can be disabled if desired.

    c) NetBSD does not have libnl3. So I have put the setup script under
    the tools/hotplug/Linux folder.

thanks


Shriram Rajagopalan (8):
  remus: add libnl3 dependency to autoconf scripts
  libxl: rename remus_failover_cb() to remus_replication_failure_cb()
  libxl: network buffering cmdline switch
  remus: introduce a function to check whether network buffering is
    enabled
  remus: Remus network buffering core and APIs to setup/teardown
  remus: implement the API to buffer/release packages
  libxl: use the API to setup/teardown network buffering
  libxl: control network buffering in remus callbacks

 README                                 |   4 +
 config/Tools.mk.in                     |   3 +
 docs/man/xl.conf.pod.5                 |   6 +
 docs/man/xl.pod.1                      |  11 +-
 docs/misc/xenstore-paths.markdown      |   4 +
 tools/configure.ac                     |  15 +
 tools/hotplug/Linux/Makefile           |   1 +
 tools/hotplug/Linux/remus-netbuf-setup | 184 +++++++++++
 tools/libxl/Makefile                   |  11 +
 tools/libxl/libxl.c                    |  44 ++-
 tools/libxl/libxl.h                    |  13 +
 tools/libxl/libxl_dom.c                |  92 +++++-
 tools/libxl/libxl_internal.h           |  48 +++
 tools/libxl/libxl_netbuffer.c          | 570 +++++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c        |  56 ++++
 tools/libxl/libxl_remus.c              |  63 ++++
 tools/libxl/libxl_types.idl            |   2 +
 tools/libxl/xl.c                       |   4 +
 tools/libxl/xl.h                       |   1 +
 tools/libxl/xl_cmdimpl.c               |  28 +-
 tools/libxl/xl_cmdtable.c              |   3 +
 tools/remus/README                     |   6 +
 22 files changed, 1142 insertions(+), 27 deletions(-)
 create mode 100644 tools/hotplug/Linux/remus-netbuf-setup
 create mode 100644 tools/libxl/libxl_netbuffer.c
 create mode 100644 tools/libxl/libxl_nonetbuffer.c
 create mode 100644 tools/libxl/libxl_remus.c

-- 
1.8.3.2


* [PATCH V8 1/8] remus: add libnl3 dependency to autoconf scripts
  2014-04-02 11:04 [PATCH V8 0/8] " Yang Hongyang
@ 2014-04-02 11:04 ` Yang Hongyang
  2014-04-02 11:04 ` [PATCH V8 2/8] remus: introduce a function to check whether network buffering is enabled Yang Hongyang
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 89+ messages in thread
From: Yang Hongyang @ 2014-04-02 11:04 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

From: Shriram Rajagopalan <rshriram@cs.ubc.ca>

Libnl3 is required for controlling Remus network buffering.
This patch adds a dependency on libnl3 (>= 3.2.8) to the autoconf scripts.
It also provides the ability to configure the tools without libnl3 support,
that is, without network buffering support.
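
For anyone who wants to check by hand whether a build host would pass this
test, the following is a minimal sketch and not part of the patch. It assumes
pkg-config is installed and relies on its --atleast-version option, queried
against the same two modules that configure.ac checks. The have_libnl3 helper
and the PKG_CONFIG override variable are names made up for this illustration.

```shell
#!/bin/bash
# Sketch only: mirrors the intent of the PKG_CHECK_MODULES test by hand.
# PKG_CONFIG can be overridden; it defaults to the pkg-config binary.
PKG_CONFIG=${PKG_CONFIG:-pkg-config}

# have_libnl3 [MIN_VERSION]
# Succeeds only if both libnl-3.0 and libnl-route-3.0 are at least
# MIN_VERSION (default 3.2.8, the version named in this patch).
have_libnl3() {
    local min=${1:-3.2.8} mod
    for mod in libnl-3.0 libnl-route-3.0; do
        "$PKG_CONFIG" --atleast-version="$min" "$mod" 2>/dev/null || return 1
    done
    return 0
}

# Print the value that configure would substitute for remus_netbuf.
if have_libnl3 3.2.8; then
    echo "remus_netbuf=y"
else
    echo "remus_netbuf=n"
fi
```

If this prints remus_netbuf=n, configure should be expected to emit the
warning above and build libxl without network buffering.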

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 README               |  4 ++++
 config/Tools.mk.in   |  3 +++
 tools/configure.ac   | 15 +++++++++++++++
 tools/libxl/Makefile |  2 ++
 tools/remus/README   |  6 ++++++
 5 files changed, 30 insertions(+)

diff --git a/README b/README
index 9bbe734..e770932 100644
--- a/README
+++ b/README
@@ -72,6 +72,10 @@ disabled at compile time:
     * cmake (if building vtpm stub domains)
     * markdown
     * figlet (for generating the traditional Xen start of day banner)
+    * Development install of libnl3 (e.g., libnl-3-200,
+      libnl-3-dev, etc).  Required if network buffering is desired
+      when using Remus with libxl.  See tools/remus/README for detailed
+      information.
 
 Second, you need to acquire a suitable kernel for use in domain 0. If
 possible you should use a kernel provided by your OS distributor. If
diff --git a/config/Tools.mk.in b/config/Tools.mk.in
index d9d3239..81802b3 100644
--- a/config/Tools.mk.in
+++ b/config/Tools.mk.in
@@ -38,6 +38,8 @@ PTHREAD_LIBS        := @PTHREAD_LIBS@
 
 PTYFUNCS_LIBS       := @PTYFUNCS_LIBS@
 
+LIBNL3_LIBS         := @LIBNL3_LIBS@
+LIBNL3_CFLAGS       := @LIBNL3_CFLAGS@
 # Download GIT repositories via HTTP or GIT's own protocol?
 # GIT's protocol is faster and more robust, when it works at all (firewalls
 # may block it). We make it the default, but if your GIT repository downloads
@@ -56,6 +58,7 @@ CONFIG_QEMU_TRAD    := @qemu_traditional@
 CONFIG_QEMU_XEN     := @qemu_xen@
 CONFIG_XEND         := @xend@
 CONFIG_BLKTAP1      := @blktap1@
+CONFIG_REMUS_NETBUF := @remus_netbuf@
 
 #System options
 ZLIB                := @zlib@
diff --git a/tools/configure.ac b/tools/configure.ac
index a62faf8..c03c2d2 100644
--- a/tools/configure.ac
+++ b/tools/configure.ac
@@ -243,6 +243,21 @@ esac
 # Checks for header files.
 AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h])
 
+# Check for libnl3 >=3.2.8. If present enable remus network buffering.
+PKG_CHECK_MODULES(LIBNL3, [libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8],
+		[libnl3_lib="y"], [libnl3_lib="n"])
+
+AS_IF([test "x$libnl3_lib" = "xn" ], [
+	    AC_MSG_WARN([Disabling support for Remus network buffering.
+	    Please install libnl3 libraries, command line tools and devel
+	    headers - version 3.2.8 or higher])
+	    AC_SUBST(remus_netbuf, [n])
+	    ],[
+	    AC_SUBST(LIBNL3_LIBS)
+	    AC_SUBST(LIBNL3_CFLAGS)
+	    AC_SUBST(remus_netbuf, [y])
+])
+
 AC_OUTPUT()
 
 AS_IF([test "x$xend" = "xy" ], [
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 755b666..3647a2a 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -21,11 +21,13 @@ endif
 
 LIBXL_LIBS =
 LIBXL_LIBS = $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(LDLIBS_libxenstore) $(LDLIBS_libblktapctl) $(PTYFUNCS_LIBS) $(LIBUUID_LIBS)
+LIBXL_LIBS += $(LIBNL3_LIBS)
 
 CFLAGS_LIBXL += $(CFLAGS_libxenctrl)
 CFLAGS_LIBXL += $(CFLAGS_libxenguest)
 CFLAGS_LIBXL += $(CFLAGS_libxenstore)
 CFLAGS_LIBXL += $(CFLAGS_libblktapctl) 
+CFLAGS_LIBXL += $(LIBNL3_CFLAGS)
 CFLAGS_LIBXL += -Wshadow
 
 LIBXL_LIBS-$(CONFIG_ARM) += -lfdt
diff --git a/tools/remus/README b/tools/remus/README
index 9e8140b..4736252 100644
--- a/tools/remus/README
+++ b/tools/remus/README
@@ -2,3 +2,9 @@ Remus provides fault tolerance for virtual machines by sending continuous
 checkpoints to a backup, which will activate if the target VM fails.
 
 See the website at http://nss.cs.ubc.ca/remus/ for details.
+
+Using Remus with libxl on Xen 4.4 and higher:
+ To enable network buffering, you need libnl 3.2.8
+ or higher along with the development headers and command line utilities.
+ If your distro does not have the appropriate libnl3 version, you can find
+ the latest source tarball of libnl3 at http://www.carisma.slowglass.com/~tgr/libnl/
-- 
1.8.3.2


* [PATCH V8 2/8] remus: introduce a function to check whether network buffering is enabled
  2014-04-02 11:04 [PATCH V8 0/8] " Yang Hongyang
  2014-04-02 11:04 ` [PATCH V8 1/8] remus: add libnl3 dependency to autoconf scripts Yang Hongyang
@ 2014-04-02 11:04 ` Yang Hongyang
  2014-04-02 11:04 ` [PATCH V8 3/8] remus: Remus network buffering core and APIs to setup/teardown Yang Hongyang
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 89+ messages in thread
From: Yang Hongyang @ 2014-04-02 11:04 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

From: Shriram Rajagopalan <rshriram@cs.ubc.ca>

libxl__netbuffer_enabled() returns 1 when network buffering support is
compiled in, and 0 when it is not.

If network buffering is not compiled in and the user asks for it, report
an error and exit.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/Makefile            |  7 +++++++
 tools/libxl/libxl_internal.h    |  2 ++
 tools/libxl/libxl_netbuffer.c   | 31 +++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c | 31 +++++++++++++++++++++++++++++++
 4 files changed, 71 insertions(+)
 create mode 100644 tools/libxl/libxl_netbuffer.c
 create mode 100644 tools/libxl/libxl_nonetbuffer.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 3647a2a..a29c505 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -45,6 +45,13 @@ LIBXL_OBJS-y += libxl_blktap2.o
 else
 LIBXL_OBJS-y += libxl_noblktap2.o
 endif
+
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+LIBXL_OBJS-y += libxl_netbuffer.o
+else
+LIBXL_OBJS-y += libxl_nonetbuffer.o
+endif
+
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index b3a200d..79c536d 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2432,6 +2432,8 @@ typedef struct libxl__logdirty_switch {
     libxl__ev_time timeout;
 } libxl__logdirty_switch;
 
+_hidden int libxl__netbuffer_enabled(libxl__gc *gc);
+
 struct libxl__domain_suspend_state {
     /* set by caller of libxl__domain_suspend */
     libxl__ao *ao;
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
new file mode 100644
index 0000000..8e23d75
--- /dev/null
+++ b/tools/libxl/libxl_netbuffer.c
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2013
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+int libxl__netbuffer_enabled(libxl__gc *gc)
+{
+    return 1;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
new file mode 100644
index 0000000..6aa4bf1
--- /dev/null
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2013
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+int libxl__netbuffer_enabled(libxl__gc *gc)
+{
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.8.3.2


* [PATCH V8 3/8] remus: Remus network buffering core and APIs to setup/teardown
  2014-04-02 11:04 [PATCH V8 0/8] " Yang Hongyang
  2014-04-02 11:04 ` [PATCH V8 1/8] remus: add libnl3 dependency to autoconf scripts Yang Hongyang
  2014-04-02 11:04 ` [PATCH V8 2/8] remus: introduce a function to check whether network buffering is enabled Yang Hongyang
@ 2014-04-02 11:04 ` Yang Hongyang
  2014-02-10  9:19   ` [PATCH 00/10 V7] Remus/Libxl: Network buffering support Lai Jiangshan
  2014-04-02 11:04 ` [PATCH V8 4/8] remus: implement the API to buffer/release packages Yang Hongyang
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 89+ messages in thread
From: Yang Hongyang @ 2014-04-02 11:04 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

1. Add two members to libxl_domain_remus_info:
    netbuf: whether netbuf is enabled
    netbufscript: the path of the script which will be run to set up
       and tear down the guest's interface.
2. Introduce a new structure, libxl__remus_state, to save the remus state.
3. Introduce the remus-netbuf-setup hotplug script, responsible for
  setting up and tearing down the necessary infrastructure required for
  network output buffering in Remus.  This script is intended to be invoked
  by libxl for each guest interface, when starting or stopping Remus.

  Apart from returning a success/failure indication via the usual hotplug
  entries in xenstore, this script also writes to xenstore the name of
  the IFB device to be used to control the vif's network output.

  The script relies on libnl3 command line utilities to perform various
  setup/teardown functions. The script is confined to Linux platforms only
  since NetBSD does not seem to have libnl3.

  The following steps are taken during setup:
    a) Call the hotplug script for each vif to set up its network buffer.

    b) Establish a dedicated remus context containing libnl related
       state (netlink sockets, qdisc caches, etc.).

    c) Obtain handles to plug qdiscs installed on the IFB devices
       chosen by the hotplug scripts.

  And during teardown, the netlink resources are released, followed by
  invocation of hotplug scripts to remove the ifb devices.
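
  As a rough illustration of the per-vif part of that sequence, here is a
  dry-run sketch that only prints the commands listed in the header comment
  of remus-netbuf-setup; it does not execute them. The function names and
  the vif1.0/ifb0 arguments are hypothetical, and the real script
  additionally searches for a free IFB, takes a lock, and reports status
  through xenstore.

```shell
#!/bin/bash
# Dry-run sketch: prints the per-vif commands, never runs them.

# netbuf_setup_cmds VIF IFB [LIMIT_BYTES]
# Lists the setup commands in the order the script applies them.
netbuf_setup_cmds() {
    local vif=$1 ifb=$2 limit=${3:-10000000}
    cat <<EOF
ip link set dev $ifb up
tc qdisc add dev $vif ingress
tc filter add dev $vif parent ffff: proto ip prio 10 u32 match u32 0 0 action mirred egress redirect dev $ifb
nl-qdisc-add --dev=$ifb --parent root plug
nl-qdisc-add --dev=$ifb --parent root --update plug --limit=$limit
EOF
}

# netbuf_teardown_cmds VIF IFB
# Lists the teardown commands in the order used by teardown_netbuf.
netbuf_teardown_cmds() {
    local vif=$1 ifb=$2
    cat <<EOF
ip link set dev $ifb down
nl-qdisc-delete --dev=$ifb --parent root plug
tc qdisc del dev $vif ingress
EOF
}

netbuf_setup_cmds vif1.0 ifb0
netbuf_teardown_cmds vif1.0 ifb0
```

  Printing rather than executing keeps the sketch safe to run on a machine
  that has no IFB devices or libnl3 command line tools installed.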

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 docs/misc/xenstore-paths.markdown      |   4 +
 tools/hotplug/Linux/Makefile           |   1 +
 tools/hotplug/Linux/remus-netbuf-setup | 184 +++++++++++++
 tools/libxl/Makefile                   |   2 +
 tools/libxl/libxl.c                    |  26 +-
 tools/libxl/libxl.h                    |  13 +
 tools/libxl/libxl_dom.c                |   4 +-
 tools/libxl/libxl_internal.h           |  32 +++
 tools/libxl/libxl_netbuffer.c          | 487 +++++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c        |  11 +
 tools/libxl/libxl_remus.c              |  41 +++
 tools/libxl/libxl_types.idl            |   2 +
 12 files changed, 803 insertions(+), 4 deletions(-)
 create mode 100644 tools/hotplug/Linux/remus-netbuf-setup
 create mode 100644 tools/libxl/libxl_remus.c

diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown
index 70ab7f4..039eaea 100644
--- a/docs/misc/xenstore-paths.markdown
+++ b/docs/misc/xenstore-paths.markdown
@@ -385,6 +385,10 @@ The guest's virtual time offset from UTC in seconds.
 
 The device model version for a domain.
 
+#### /libxl/$DOMID/remus/netbuf/$DEVID/ifb = STRING [n,INTERNAL]
+
+ifb device used by Remus to buffer network output from the associated vif.
+
 [BLKIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,blkif.h.html
 [FBIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,fbif.h.html
 [HVMPARAMS]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,hvm,params.h.html
diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index 47655f6..6139c1f 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -16,6 +16,7 @@ XEN_SCRIPTS += network-nat vif-nat
 XEN_SCRIPTS += vif-openvswitch
 XEN_SCRIPTS += vif2
 XEN_SCRIPTS += vif-setup
+XEN_SCRIPTS-$(CONFIG_REMUS_NETBUF) += remus-netbuf-setup
 XEN_SCRIPTS += block
 XEN_SCRIPTS += block-enbd block-nbd
 XEN_SCRIPTS-$(CONFIG_BLKTAP1) += blktap
diff --git a/tools/hotplug/Linux/remus-netbuf-setup b/tools/hotplug/Linux/remus-netbuf-setup
new file mode 100644
index 0000000..1c13185
--- /dev/null
+++ b/tools/hotplug/Linux/remus-netbuf-setup
@@ -0,0 +1,184 @@
+#!/bin/bash
+#============================================================================
+# ${XEN_SCRIPT_DIR}/remus-netbuf-setup
+#
+# Script for attaching a network buffer to the specified vif (in any mode).
+# The hotplugging system will call this script when starting remus via libxl
+# API, libxl_domain_remus_start.
+#
+# Usage:
+# remus-netbuf-setup (setup|teardown)
+#
+# Environment vars:
+# vifname     vif interface name (required).
+# XENBUS_PATH path in Xenstore, where the IFB device details will be stored
+#                      or read from (required).
+#             (libxl passes /libxl/<domid>/remus/netbuf/<devid>)
+# IFB         ifb interface to be cleaned up (required). [for teardown op only]
+
+# Written to the store: (setup operation)
+# XENBUS_PATH/ifb=<ifbdevName> the IFB device serving
+#  as the intermediate buffer through which the interface's network output
+#  can be controlled.
+#
+# To install a network buffer on a guest vif (vif1.0) using ifb (ifb0)
+# we need to do the following
+#
+#  ip link set dev ifb0 up
+#  tc qdisc add dev vif1.0 ingress
+#  tc filter add dev vif1.0 parent ffff: proto ip \
+#    prio 10 u32 match u32 0 0 action mirred egress redirect dev ifb0
+#  nl-qdisc-add --dev=ifb0 --parent root plug
+#  nl-qdisc-add --dev=ifb0 --parent root --update plug --limit=10000000
+#                                                (10MB limit on buffer)
+#
+# So order of operations when installing a network buffer on vif1.0
+# 1. find a free ifb and bring up the device
+# 2. redirect traffic from vif1.0 to ifb:
+#   2.1 add ingress qdisc to vif1.0 (to capture outgoing packets from guest)
+#   2.2 use tc filter command with actions mirred egress + redirect
+# 3. install plug_qdisc on ifb device, with which we can buffer/release
+#    guest's network output from vif1.0
+#
+#
+
+#============================================================================
+
+# Unlike other vif scripts, vif-common is not needed here as it executes vif
+# specific setup code such as renaming.
+dir=$(dirname "$0")
+. "$dir/xen-hotplug-common.sh"
+
+findCommand "$@"
+
+if [ "$command" != "setup" -a  "$command" != "teardown" ]
+then
+  echo "Invalid command: $command"
+  log err "Invalid command: $command"
+  exit 1
+fi
+
+evalVariables "$@"
+
+: ${vifname:?}
+: ${XENBUS_PATH:?}
+
+check_libnl_tools() {
+    if ! command -v nl-qdisc-list > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-list tool"
+    fi
+    if ! command -v nl-qdisc-add > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-add tool"
+    fi
+    if ! command -v nl-qdisc-delete > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-delete tool"
+    fi
+}
+
+# We only check for modules. We don't load them.
+# User/Admin is supposed to load ifb during boot time,
+# ensuring that there are enough free ifbs in the system.
+# Other modules will be loaded automatically by tc commands.
+check_modules() {
+    for m in ifb sch_plug sch_ingress act_mirred cls_u32
+    do
+        if ! modinfo $m > /dev/null 2>&1; then
+            fatal "Unable to find $m kernel module"
+        fi
+    done
+}
+
+setup_ifb() {
+
+    for ifb in `ifconfig -a -s|egrep ^ifb|cut -d ' ' -f1`
+    do
+        local installed=`nl-qdisc-list -d $ifb`
+        [ -n "$installed" ] && continue
+        IFB="$ifb"
+        break
+    done
+
+    if [ -z "$IFB" ]
+    then
+        fatal "Unable to find a free IFB device for $vifname"
+    fi
+
+    do_or_die ip link set dev "$IFB" up
+}
+
+redirect_vif_traffic() {
+    local vif=$1
+    local ifb=$2
+
+    do_or_die tc qdisc add dev "$vif" ingress
+
+    tc filter add dev "$vif" parent ffff: proto ip prio 10 \
+        u32 match u32 0 0 action mirred egress redirect dev "$ifb" >/dev/null 2>&1
+
+    if [ $? -ne 0 ]
+    then
+        do_without_error tc qdisc del dev "$vif" ingress
+        fatal "Failed to redirect traffic from $vif to $ifb"
+    fi
+}
+
+add_plug_qdisc() {
+    local vif=$1
+    local ifb=$2
+
+    nl-qdisc-add --dev="$ifb" --parent root plug >/dev/null 2>&1
+    if [ $? -ne 0 ]
+    then
+        do_without_error tc qdisc del dev "$vif" ingress
+        fatal "Failed to add plug qdisc to $ifb"
+    fi
+
+    # Set the ifb buffering limit in bytes. It's okay if this command fails.
+    nl-qdisc-add --dev="$ifb" --parent root \
+        --update plug --limit=10000000 >/dev/null 2>&1 || true
+}
+
+teardown_netbuf() {
+    local vif=$1
+    local ifb=$2
+
+    if [ "$ifb" ]; then
+        do_without_error ip link set dev "$ifb" down
+        do_without_error nl-qdisc-delete --dev="$ifb" --parent root plug >/dev/null 2>&1
+        xenstore-rm -t "$XENBUS_PATH/ifb" 2>/dev/null || true
+    fi
+    do_without_error tc qdisc del dev "$vif" ingress
+    xenstore-rm -t "$XENBUS_PATH/hotplug-status" 2>/dev/null || true
+    xenstore-rm -t "$XENBUS_PATH/hotplug-error" 2>/dev/null || true
+}
+
+xs_write_failed() {
+    local vif=$1
+    local ifb=$2
+    teardown_netbuf "$vifname" "$IFB"
+    fatal "failed to write ifb name to xenstore"
+}
+
+case "$command" in
+    setup)
+        check_libnl_tools
+        check_modules
+
+        claim_lock "pickifb"
+        setup_ifb
+        redirect_vif_traffic "$vifname" "$IFB"
+        add_plug_qdisc "$vifname" "$IFB"
+        release_lock "pickifb"
+
+        #not using xenstore_write that automatically exits on error
+        #because we need to cleanup
+        _xenstore_write "$XENBUS_PATH/ifb" "$IFB" || xs_write_failed "$vifname" "$IFB"
+        success
+        ;;
+    teardown)
+        : ${IFB:?}
+        teardown_netbuf "$vifname" "$IFB"
+        ;;
+esac
+
+log debug "Successful remus-netbuf-setup $command for $vifname, ifb $IFB."
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index a29c505..670a2bc 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -52,6 +52,8 @@ else
 LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
+LIBXL_OBJS-y += libxl_remus.o
+
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
 
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 30b0b06..f7696a3 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -741,7 +741,31 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 
     assert(info);
 
-    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
+    GCNEW(dss->remus_state);
+
+    /* convenience shorthand */
+    libxl__remus_state *remus_state = dss->remus_state;
+    remus_state->dss = dss;
+    libxl__ev_child_init(&remus_state->child);
+
+    /* TBD: enable disk buffering */
+
+    /* Setup network buffering */
+    if (info->netbuf) {
+        if (!libxl__netbuffer_enabled(gc)) {
+            LOG(ERROR, "Remus: No support for network buffering");
+            goto out;
+        }
+
+        if (info->netbufscript) {
+            remus_state->netbufscript =
+                libxl__strdup(gc, info->netbufscript);
+        } else {
+            remus_state->netbufscript =
+                GCSPRINTF("%s/remus-netbuf-setup",
+                libxl__xen_script_dir_path());
+        }
+    }
 
     /* Point of no return */
     libxl__domain_suspend(egc, dss);
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index b2c3015..62f7dd4 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -410,6 +410,19 @@
 #define LIBXL_HAVE_DRIVER_DOMAIN_CREATION 1
 
 /*
+ * LIBXL_HAVE_REMUS_NETBUF 1
+ *
+ * If this is defined, then the libxl_domain_remus_info structure will
+ * have a boolean field (netbuf) and a string field (netbufscript).
+ *
+ * netbuf, if true, indicates that network buffering should be enabled.
+ *
+ * netbufscript, if set, indicates the path to the hotplug script to
+ * setup or teardown network buffers.
+ */
+#define LIBXL_HAVE_REMUS_NETBUF 1
+
+/*
  * LIBXL_HAVE_SIGCHLD_SELECTIVE_REAP
  *
  * If this is defined:
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 36e70b5..20aaec8 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -753,8 +753,6 @@ int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
 
 /*==================== Domain suspend (save) ====================*/
 
-static void domain_suspend_done(libxl__egc *egc,
-                        libxl__domain_suspend_state *dss, int rc);
 static void domain_suspend_callback_common_done(libxl__egc *egc,
                                 libxl__domain_suspend_state *dss, int ok);
 static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
@@ -1703,7 +1701,7 @@ static void save_device_model_datacopier_done(libxl__egc *egc,
     dss->save_dm_callback(egc, dss, our_rc);
 }
 
-static void domain_suspend_done(libxl__egc *egc,
+void domain_suspend_done(libxl__egc *egc,
                         libxl__domain_suspend_state *dss, int rc)
 {
     STATE_AO_GC(dss->ao);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 79c536d..a6e5cff 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2432,8 +2432,39 @@ typedef struct libxl__logdirty_switch {
     libxl__ev_time timeout;
 } libxl__logdirty_switch;
 
+typedef struct libxl__remus_state {
+    /* Script to setup/teardown network buffers */
+    const char *netbufscript;
+    libxl__domain_suspend_state *dss;
+
+    /* private */
+    int saved_rc;
+    int dev_id;
+    /* Opaque context containing network buffer related stuff */
+    void *netbuf_state;
+    libxl__ev_time timeout;
+    libxl__ev_child child;
+} libxl__remus_state;
+
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
+_hidden void domain_suspend_done(libxl__egc *egc,
+                                 libxl__domain_suspend_state *dss,
+                                 int rc);
+
+_hidden void libxl__remus_setup_done(libxl__egc *egc,
+                                     libxl__domain_suspend_state *dss,
+                                     int rc);
+
+_hidden void libxl__remus_netbuf_setup(libxl__egc *egc,
+                                       libxl__domain_suspend_state *dss);
+
+_hidden void libxl__remus_teardown_done(libxl__egc *egc,
+                                        libxl__domain_suspend_state *dss);
+
+_hidden void libxl__remus_netbuf_teardown(libxl__egc *egc,
+                                          libxl__domain_suspend_state *dss);
+
 struct libxl__domain_suspend_state {
     /* set by caller of libxl__domain_suspend */
     libxl__ao *ao;
@@ -2445,6 +2476,7 @@ struct libxl__domain_suspend_state {
     int live;
     int debug;
     const libxl_domain_remus_info *remus;
+    libxl__remus_state *remus_state;
     /* private */
     libxl__ev_evtchn guest_evtchn;
     int guest_evtchn_lockfd;
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 8e23d75..865cbb2 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -17,11 +17,498 @@
 
 #include "libxl_internal.h"
 
+#include <netlink/cache.h>
+#include <netlink/socket.h>
+#include <netlink/attr.h>
+#include <netlink/route/link.h>
+#include <netlink/route/route.h>
+#include <netlink/route/qdisc.h>
+#include <netlink/route/qdisc/plug.h>
+
+typedef struct libxl__remus_netbuf_state {
+    struct rtnl_qdisc **netbuf_qdisc_list;
+    struct nl_sock *nlsock;
+    struct nl_cache *qdisc_cache;
+    const char **vif_list;
+    const char **ifb_list;
+    uint32_t num_netbufs;
+    uint32_t unused;
+} libxl__remus_netbuf_state;
+
 int libxl__netbuffer_enabled(libxl__gc *gc)
 {
     return 1;
 }
 
+/* If the device has a vifname, then use that instead of
+ * the vifX.Y format.
+ */
+static const char *get_vifname(libxl__gc *gc, uint32_t domid,
+                               libxl_device_nic *nic)
+{
+    const char *vifname = NULL;
+    const char *path;
+    int rc;
+
+    path = libxl__sprintf(gc, "%s/backend/vif/%d/%d/vifname",
+                          libxl__xs_get_dompath(gc, 0), domid, nic->devid);
+    rc = libxl__xs_read_checked(gc, XBT_NULL, path, &vifname);
+    if (!rc && !vifname) {
+        /* use the default name */
+        vifname = libxl__device_nic_devname(gc, domid,
+                                            nic->devid,
+                                            nic->nictype);
+    }
+
+    return vifname;
+}
+
+static const char **get_guest_vif_list(libxl__gc *gc, uint32_t domid,
+                                       int *num_vifs)
+{
+    libxl_device_nic *nics = NULL;
+    int nb, i = 0;
+    const char **vif_list = NULL;
+
+    *num_vifs = 0;
+    nics = libxl_device_nic_list(CTX, domid, &nb);
+    if (!nics)
+        goto out;
+
+    /* Ensure that none of the vifs are backed by driver domains */
+    for (i = 0; i < nb; i++) {
+        if (nics[i].backend_domid != LIBXL_TOOLSTACK_DOMID) {
+            const char *vifname = get_vifname(gc, domid, &nics[i]);
+
+            if (!vifname)
+                vifname = "(unknown)";
+            LOG(ERROR, "vif %s has driver domain (%u) as its backend. "
+                "Network buffering is not supported with driver domains",
+                vifname, nics[i].backend_domid);
+            *num_vifs = -1;
+            goto out;
+        }
+    }
+
+    GCNEW_ARRAY(vif_list, nb);
+    for (i = 0; i < nb; ++i) {
+        vif_list[i] = get_vifname(gc, domid, &nics[i]);
+        if (!vif_list[i]) {
+            vif_list = NULL;
+            goto out;
+        }
+    }
+    *num_vifs = nb;
+
+ out:
+    if (nics) {
+        for (i = 0; i < nb; i++)
+            libxl_device_nic_dispose(&nics[i]);
+
+        free(nics);
+    }
+    return vif_list;
+}
+
+static void free_qdiscs(libxl__remus_netbuf_state *netbuf_state)
+{
+    int i;
+    struct rtnl_qdisc **qdisc = NULL;
+
+    /* free qdiscs */
+    for (i = 0; i < netbuf_state->num_netbufs; i++) {
+        if (!netbuf_state->netbuf_qdisc_list)
+            break;
+
+        qdisc = &netbuf_state->netbuf_qdisc_list[i];
+        if (*qdisc == NULL)
+            break;
+
+        nl_object_put((struct nl_object *)(*qdisc));
+        *qdisc = NULL;
+    }
+
+    /* free qdisc cache */
+    if (netbuf_state->qdisc_cache) {
+        nl_cache_clear(netbuf_state->qdisc_cache);
+        nl_cache_free(netbuf_state->qdisc_cache);
+        netbuf_state->qdisc_cache = NULL;
+    }
+
+    /* close & free nlsock */
+    if (netbuf_state->nlsock) {
+        nl_close(netbuf_state->nlsock);
+        nl_socket_free(netbuf_state->nlsock);
+        netbuf_state->nlsock = NULL;
+    }
+}
+
+static int init_qdiscs(libxl__gc *gc,
+                       libxl__remus_state *remus_state)
+{
+    int i, ret, ifindex;
+    struct rtnl_link *ifb = NULL;
+    struct rtnl_qdisc *qdisc = NULL;
+
+    /* Convenience aliases */
+    libxl__remus_netbuf_state *const netbuf_state = remus_state->netbuf_state;
+    const int num_netbufs = netbuf_state->num_netbufs;
+    const char **const ifb_list = netbuf_state->ifb_list;
+
+    /* Now that we have brought up IFB devices with plug qdisc for
+     * each vif, let's get a netlink handle on the plug qdisc for use
+     * during checkpointing.
+     */
+    netbuf_state->nlsock = nl_socket_alloc();
+    if (!netbuf_state->nlsock) {
+        LOG(ERROR, "cannot allocate nl socket");
+        goto out;
+    }
+
+    ret = nl_connect(netbuf_state->nlsock, NETLINK_ROUTE);
+    if (ret) {
+        LOG(ERROR, "failed to open netlink socket: %s",
+            nl_geterror(ret));
+        goto out;
+    }
+
+    /* get list of all qdiscs installed on network devs. */
+    ret = rtnl_qdisc_alloc_cache(netbuf_state->nlsock,
+                                 &netbuf_state->qdisc_cache);
+    if (ret) {
+        LOG(ERROR, "failed to allocate qdisc cache: %s",
+            nl_geterror(ret));
+        goto out;
+    }
+
+    /* list of handles to plug qdiscs */
+    GCNEW_ARRAY(netbuf_state->netbuf_qdisc_list, num_netbufs);
+
+    for (i = 0; i < num_netbufs; ++i) {
+
+        /* get a handle to the IFB interface */
+        ifb = NULL;
+        ret = rtnl_link_get_kernel(netbuf_state->nlsock, 0,
+                                   ifb_list[i], &ifb);
+        if (ret) {
+            LOG(ERROR, "cannot obtain handle for %s: %s", ifb_list[i],
+                nl_geterror(ret));
+            goto out;
+        }
+
+        ifindex = rtnl_link_get_ifindex(ifb);
+        if (!ifindex) {
+            LOG(ERROR, "interface %s has no index", ifb_list[i]);
+            goto out;
+        }
+
+        /* Get a reference to the root qdisc installed on the IFB, by
+         * querying the qdisc list we obtained earlier. The netbufscript
+         * sets up the plug qdisc as the root qdisc, so we don't have to
+         * search the entire qdisc tree on the IFB dev.
+         *
+         * There is no need to explicitly free this qdisc as it's just a
+         * reference from the qdisc cache we allocated earlier.
+         */
+        qdisc = rtnl_qdisc_get_by_parent(netbuf_state->qdisc_cache, ifindex,
+                                         TC_H_ROOT);
+
+        if (qdisc) {
+            const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
+            /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
+            if (!tc_kind || strcmp(tc_kind, "plug")) {
+                nl_object_put((struct nl_object *)qdisc);
+                LOG(ERROR, "plug qdisc is not installed on %s", ifb_list[i]);
+                goto out;
+            }
+            netbuf_state->netbuf_qdisc_list[i] = qdisc;
+        } else {
+            LOG(ERROR, "Cannot get qdisc handle from ifb %s", ifb_list[i]);
+            goto out;
+        }
+        rtnl_link_put(ifb);
+    }
+
+    return 0;
+
+ out:
+    if (ifb)
+        rtnl_link_put(ifb);
+    free_qdiscs(netbuf_state);
+    return ERROR_FAIL;
+}
+
+static void netbuf_setup_timeout_cb(libxl__egc *egc,
+                                    libxl__ev_time *ev,
+                                    const struct timeval *requested_abs)
+{
+    libxl__remus_state *remus_state = CONTAINER_OF(ev, *remus_state, timeout);
+
+    /* Convenience aliases */
+    const int devid = remus_state->dev_id;
+    libxl__remus_netbuf_state *const netbuf_state = remus_state->netbuf_state;
+    const char *const vif = netbuf_state->vif_list[devid];
+
+    STATE_AO_GC(remus_state->dss->ao);
+
+    libxl__ev_time_deregister(gc, &remus_state->timeout);
+    assert(libxl__ev_child_inuse(&remus_state->child));
+
+    LOG(DEBUG, "killing hotplug script %s (on vif %s) because of timeout",
+        remus_state->netbufscript, vif);
+
+    if (kill(remus_state->child.pid, SIGKILL)) {
+        LOGEV(ERROR, errno, "unable to kill hotplug script %s [%lu]",
+              remus_state->netbufscript,
+              (unsigned long)remus_state->child.pid);
+    }
+
+    return;
+}
+
+/* the script needs the following env & args
+ * $vifname
+ * $XENBUS_PATH (/libxl/<domid>/remus/netbuf/<devid>/)
+ * $IFB (for teardown)
+ * setup/teardown as command line arg.
+ * In return, the script writes the name of IFB device (during setup) to be
+ * used for output buffering into XENBUS_PATH/ifb
+ */
+static int exec_netbuf_script(libxl__gc *gc, libxl__remus_state *remus_state,
+                              char *op, libxl__ev_child_callback *death)
+{
+    int arraysize, nr = 0;
+    char **env = NULL, **args = NULL;
+    pid_t pid;
+
+    /* Convenience aliases */
+    libxl__ev_child *const child = &remus_state->child;
+    libxl__ev_time *const timeout = &remus_state->timeout;
+    char *const script = libxl__strdup(gc, remus_state->netbufscript);
+    const uint32_t domid = remus_state->dss->domid;
+    const int devid = remus_state->dev_id;
+    libxl__remus_netbuf_state *const netbuf_state = remus_state->netbuf_state;
+    const char *const vif = netbuf_state->vif_list[devid];
+    const char *const ifb = netbuf_state->ifb_list[devid];
+
+    arraysize = 7;
+    GCNEW_ARRAY(env, arraysize);
+    env[nr++] = "vifname";
+    env[nr++] = libxl__strdup(gc, vif);
+    env[nr++] = "XENBUS_PATH";
+    env[nr++] = GCSPRINTF("%s/remus/netbuf/%d",
+                          libxl__xs_libxl_path(gc, domid), devid);
+    if (!strcmp(op, "teardown")) {
+        env[nr++] = "IFB";
+        env[nr++] = libxl__strdup(gc, ifb);
+    }
+    env[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    arraysize = 3; nr = 0;
+    GCNEW_ARRAY(args, arraysize);
+    args[nr++] = script;
+    args[nr++] = op;
+    args[nr++] = NULL;
+    assert(nr == arraysize);
+
+    /* Set hotplug timeout */
+    if (libxl__ev_time_register_rel(gc, timeout,
+                                    netbuf_setup_timeout_cb,
+                                    LIBXL_HOTPLUG_TIMEOUT * 1000)) {
+        LOG(ERROR, "unable to register timeout for "
+            "netbuf setup script %s on vif %s", script, vif);
+        return ERROR_FAIL;
+    }
+
+    LOG(DEBUG, "Calling netbuf script: %s %s on vif %s",
+        script, op, vif);
+
+    /* Fork and exec netbuf script */
+    pid = libxl__ev_child_fork(gc, child, death);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork netbuf script %s", script);
+        return ERROR_FAIL;
+    }
+
+    if (!pid) {
+        /* child: Launch netbuf script */
+        libxl__exec(gc, -1, -1, -1, args[0], args, env);
+        /* notreached */
+        abort();
+    }
+
+    return 0;
+}
+
+static void netbuf_setup_script_cb(libxl__egc *egc,
+                                   libxl__ev_child *child,
+                                   pid_t pid, int status)
+{
+    libxl__remus_state *remus_state = CONTAINER_OF(child, *remus_state, child);
+    const char *out_path_base, *hotplug_error = NULL;
+    int rc = ERROR_FAIL;
+
+    /* Convenience aliases */
+    const uint32_t domid = remus_state->dss->domid;
+    const int devid = remus_state->dev_id;
+    libxl__remus_netbuf_state *const netbuf_state = remus_state->netbuf_state;
+    const char *const vif = netbuf_state->vif_list[devid];
+    const char **const ifb = &netbuf_state->ifb_list[devid];
+
+    STATE_AO_GC(remus_state->dss->ao);
+
+    libxl__ev_time_deregister(gc, &remus_state->timeout);
+
+    out_path_base = GCSPRINTF("%s/remus/netbuf/%d",
+                              libxl__xs_libxl_path(gc, domid), devid);
+
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/hotplug-error", out_path_base),
+                                &hotplug_error);
+    if (rc)
+        goto out;
+
+    if (hotplug_error) {
+        LOG(ERROR, "netbuf script %s setup failed for vif %s: %s",
+            remus_state->netbufscript,
+            netbuf_state->vif_list[devid], hotplug_error);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (status) {
+        libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR,
+                                      remus_state->netbufscript,
+                                      pid, status);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/remus/netbuf/%d/ifb",
+                                          libxl__xs_libxl_path(gc, domid),
+                                          devid),
+                                ifb);
+    if (rc)
+        goto out;
+
+    if (!(*ifb)) {
+        LOG(ERROR, "Cannot get ifb dev name for domain %u dev %s",
+            domid, vif);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    LOG(DEBUG, "%s will buffer packets from vif %s", *ifb, vif);
+    remus_state->dev_id++;
+    if (remus_state->dev_id < netbuf_state->num_netbufs) {
+        rc = exec_netbuf_script(gc, remus_state,
+                                "setup", netbuf_setup_script_cb);
+        if (rc)
+            goto out;
+
+        return;
+    }
+
+    rc = init_qdiscs(gc, remus_state);
+ out:
+    libxl__remus_setup_done(egc, remus_state->dss, rc);
+}
+
+/* Scan through the list of vifs belonging to domid and
+ * invoke the netbufscript to set up the IFB device & plug qdisc
+ * for each vif. Then scan through the list of IFB devices to obtain
+ * a handle on the plug qdisc installed on these IFB devices.
+ * Network output buffering is controlled via these qdiscs.
+ */
+void libxl__remus_netbuf_setup(libxl__egc *egc,
+                               libxl__domain_suspend_state *dss)
+{
+    libxl__remus_netbuf_state *netbuf_state = NULL;
+    int num_netbufs = 0;
+    int rc = ERROR_FAIL;
+
+    /* Convenience aliases */
+    const uint32_t domid = dss->domid;
+    libxl__remus_state *const remus_state = dss->remus_state;
+
+    STATE_AO_GC(dss->ao);
+
+    GCNEW(netbuf_state);
+    netbuf_state->vif_list = get_guest_vif_list(gc, domid, &num_netbufs);
+    if (!num_netbufs) {
+        rc = 0;
+        goto out;
+    }
+
+    if (num_netbufs < 0) goto out;
+
+    GCNEW_ARRAY(netbuf_state->ifb_list, num_netbufs);
+    netbuf_state->num_netbufs = num_netbufs;
+    remus_state->netbuf_state = netbuf_state;
+    remus_state->dev_id = 0;
+    if (exec_netbuf_script(gc, remus_state, "setup",
+                           netbuf_setup_script_cb))
+        goto out;
+    return;
+
+ out:
+    libxl__remus_setup_done(egc, dss, rc);
+}
+
+static void netbuf_teardown_script_cb(libxl__egc *egc,
+                                      libxl__ev_child *child,
+                                      pid_t pid, int status)
+{
+    libxl__remus_state *remus_state = CONTAINER_OF(child, *remus_state, child);
+
+    /* Convenience aliases */
+    libxl__remus_netbuf_state *const netbuf_state = remus_state->netbuf_state;
+
+    STATE_AO_GC(remus_state->dss->ao);
+
+    libxl__ev_time_deregister(gc, &remus_state->timeout);
+
+    if (status) {
+        libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR,
+                                      remus_state->netbufscript,
+                                      pid, status);
+    }
+
+    remus_state->dev_id++;
+    if (remus_state->dev_id < netbuf_state->num_netbufs) {
+        if (exec_netbuf_script(gc, remus_state,
+                               "teardown", netbuf_teardown_script_cb))
+            goto out;
+        return;
+    }
+
+ out:
+    libxl__remus_teardown_done(egc, remus_state->dss);
+}
+
+/* Note: This function will be called in the same gc context as
+ * libxl__remus_netbuf_setup, created during the libxl_domain_remus_start
+ * API call.
+ */
+void libxl__remus_netbuf_teardown(libxl__egc *egc,
+                                  libxl__domain_suspend_state *dss)
+{
+    /* Convenience aliases */
+    libxl__remus_state *const remus_state = dss->remus_state;
+    libxl__remus_netbuf_state *const netbuf_state = remus_state->netbuf_state;
+
+    STATE_AO_GC(dss->ao);
+
+    free_qdiscs(netbuf_state);
+
+    remus_state->dev_id = 0;
+    if (exec_netbuf_script(gc, remus_state, "teardown",
+                           netbuf_teardown_script_cb))
+        libxl__remus_teardown_done(egc, dss);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
index 6aa4bf1..559d0a6 100644
--- a/tools/libxl/libxl_nonetbuffer.c
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -22,6 +22,17 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
     return 0;
 }
 
+/* Remus network buffer related stubs */
+void libxl__remus_netbuf_setup(libxl__egc *egc,
+                               libxl__domain_suspend_state *dss)
+{
+}
+
+void libxl__remus_netbuf_teardown(libxl__egc *egc,
+                                  libxl__domain_suspend_state *dss)
+{
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
new file mode 100644
index 0000000..4e40412
--- /dev/null
+++ b/tools/libxl/libxl_remus.c
@@ -0,0 +1,41 @@
+/*
+ * Copyright (C) 2014
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+/*----- remus setup/teardown code -----*/
+
+void libxl__remus_setup_done(libxl__egc *egc,
+                             libxl__domain_suspend_state *dss,
+                             int rc)
+{
+    STATE_AO_GC(dss->ao);
+    if (!rc) {
+        libxl__domain_suspend(egc, dss);
+        return;
+    }
+
+    LOG(ERROR, "Remus: failed to setup network buffering"
+        " for guest with domid %u", dss->domid);
+    domain_suspend_done(egc, dss, rc);
+}
+
+void libxl__remus_teardown_done(libxl__egc *egc,
+                                libxl__domain_suspend_state *dss)
+{
+    dss->callback(egc, dss, dss->remus_state->saved_rc);
+}
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 612645c..cb3d926 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -563,6 +563,8 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
     ("interval",     integer),
     ("blackhole",    bool),
     ("compression",  bool),
+    ("netbuf",       bool),
+    ("netbufscript", string),
     ])
 
 libxl_event_type = Enumeration("event_type", [
-- 
1.8.3.2


* [PATCH V8 4/8] remus: implement the API to buffer/release packages
  2014-04-02 11:04 [PATCH V8 0/8] " Yang Hongyang
                   ` (2 preceding siblings ...)
  2014-04-02 11:04 ` [PATCH V8 3/8] remus: Remus network buffering core and APIs to setup/teardown Yang Hongyang
@ 2014-04-02 11:04 ` Yang Hongyang
  2014-04-02 11:04 ` [PATCH V8 5/8] libxl: use the API to setup/teardown network buffering Yang Hongyang
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 89+ messages in thread
From: Yang Hongyang @ 2014-04-02 11:04 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

This patch implements two APIs:
1. libxl__remus_netbuf_start_new_epoch()
   It marks a new epoch. The packets queued before this epoch will
   be flushed, and the packets queued after this epoch will be
   buffered. It is called after the guest is suspended.
2. libxl__remus_netbuf_release_prev_epoch()
   It flushes the buffered packets to the client, and it is
   called when a checkpoint finishes.
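
For readers unfamiliar with the epoch semantics, here is a minimal
sketch in plain C of how buffering and release interact. It is a toy
user-space model, not the libnl3 or kernel code: struct toy_plug and
all toy_* functions are hypothetical names introduced for illustration,
and the counter scheme is only an assumption about how a plug-style
queue could track epochs.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CAP 64

/* Hypothetical model of one plug queue. Packets are plain ints. */
struct toy_plug {
    int queue[TOY_CAP];    /* buffered packets, oldest first */
    size_t head;           /* index of the next packet to send */
    size_t tail;           /* index of the next free slot */
    size_t current_epoch;  /* packets queued since the last epoch mark */
    size_t last_epoch;     /* packets belonging to the previous epoch */
    size_t to_release;     /* packets cleared for transmission */
};

/* The guest emits a packet: it is always buffered first. */
void toy_enqueue(struct toy_plug *q, int pkt)
{
    assert(q->tail < TOY_CAP);
    q->queue[q->tail++] = pkt;
    q->current_epoch++;
}

/* Models start_new_epoch: everything queued so far now belongs to
 * the previous epoch; later packets start a fresh one. */
void toy_start_new_epoch(struct toy_plug *q)
{
    q->last_epoch += q->current_epoch;
    q->current_epoch = 0;
}

/* Models release_prev_epoch: the previous epoch may now be sent. */
void toy_release_prev_epoch(struct toy_plug *q)
{
    q->to_release += q->last_epoch;
    q->last_epoch = 0;
}

/* Returns 1 and stores a packet in *pkt if one may be sent, else 0. */
int toy_dequeue(struct toy_plug *q, int *pkt)
{
    if (q->to_release == 0)
        return 0;
    assert(q->head < q->tail);
    *pkt = q->queue[q->head++];
    q->to_release--;
    return 1;
}
```

In this model a packet leaves the queue only after the epoch it was
queued in has been both closed and released, which is the property
the two APIs are meant to provide around each checkpoint.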

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_internal.h    |  7 ++++++
 tools/libxl/libxl_netbuffer.c   | 52 +++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c | 14 +++++++++++
 3 files changed, 73 insertions(+)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index a6e5cff..1357f2d 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2465,6 +2465,13 @@ _hidden void libxl__remus_teardown_done(libxl__egc *egc,
 _hidden void libxl__remus_netbuf_teardown(libxl__egc *egc,
                                           libxl__domain_suspend_state *dss);
 
+_hidden int
+libxl__remus_netbuf_start_new_epoch(libxl__gc *gc, uint32_t domid,
+                                    libxl__remus_state *remus_state);
+_hidden int
+libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t domid,
+                                       libxl__remus_state *remus_state);
+
 struct libxl__domain_suspend_state {
     /* set by caller of libxl__domain_suspend */
     libxl__ao *ao;
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 865cbb2..22fffec 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -509,6 +509,58 @@ void libxl__remus_netbuf_teardown(libxl__egc *egc,
         libxl__remus_teardown_done(egc, dss);
 }
 
+/* The buffer_op's value, not the value passed to kernel */
+enum {
+    tc_buffer_start,
+    tc_buffer_release
+};
+
+static int remus_netbuf_op(libxl__gc *gc, uint32_t domid,
+                           libxl__remus_state *remus_state,
+                           int buffer_op)
+{
+    int i, ret;
+
+    /* Convenience aliases */
+    libxl__remus_netbuf_state *const netbuf_state = remus_state->netbuf_state;
+
+    for (i = 0; i < netbuf_state->num_netbufs; ++i) {
+        if (buffer_op == tc_buffer_start)
+            ret = rtnl_qdisc_plug_buffer(netbuf_state->netbuf_qdisc_list[i]);
+        else
+            ret = rtnl_qdisc_plug_release_one(netbuf_state->netbuf_qdisc_list[i]);
+
+        if (!ret) {
+            ret = rtnl_qdisc_add(netbuf_state->nlsock,
+                                 netbuf_state->netbuf_qdisc_list[i],
+                                 NLM_F_REQUEST);
+            if (ret)
+                goto out;
+        }
+    }
+
+    return 0;
+
+out:
+    LOG(ERROR, "Remus: cannot do netbuf op %s on %s:%s",
+        ((buffer_op == tc_buffer_start) ?
+        "start_new_epoch" : "release_prev_epoch"),
+        netbuf_state->ifb_list[i], nl_geterror(ret));
+    return ERROR_FAIL;
+}
+
+int libxl__remus_netbuf_start_new_epoch(libxl__gc *gc, uint32_t domid,
+                                        libxl__remus_state *remus_state)
+{
+    return remus_netbuf_op(gc, domid, remus_state, tc_buffer_start);
+}
+
+int libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t domid,
+                                           libxl__remus_state *remus_state)
+{
+    return remus_netbuf_op(gc, domid, remus_state, tc_buffer_release);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
index 559d0a6..92f35bc 100644
--- a/tools/libxl/libxl_nonetbuffer.c
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -33,6 +33,20 @@ void libxl__remus_netbuf_teardown(libxl__egc *egc,
 {
 }
 
+int libxl__remus_netbuf_start_new_epoch(libxl__gc *gc, uint32_t domid,
+                                        libxl__remus_state *remus_state)
+{
+    LOG(ERROR, "Remus: No support for network buffering");
+    return ERROR_FAIL;
+}
+
+int libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t domid,
+                                           libxl__remus_state *remus_state)
+{
+    LOG(ERROR, "Remus: No support for network buffering");
+    return ERROR_FAIL;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
1.8.3.2


* [PATCH V8 5/8] libxl: use the API to setup/teardown network buffering
  2014-04-02 11:04 [PATCH V8 0/8] " Yang Hongyang
                   ` (3 preceding siblings ...)
  2014-04-02 11:04 ` [PATCH V8 4/8] remus: implement the API to buffer/release packages Yang Hongyang
@ 2014-04-02 11:04 ` Yang Hongyang
  2014-04-02 11:04 ` [PATCH V8 6/8] libxl: rename remus_failover_cb() to remus_replication_failure_cb() Yang Hongyang
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 89+ messages in thread
From: Yang Hongyang @ 2014-04-02 11:04 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

If a network buffering hotplug script is configured, call
libxl__remus_netbuf_setup() to set up network buffering
and libxl__remus_netbuf_teardown() to tear it down.
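
The dispatch described above can be sketched as a small stand-alone
C model. This is an illustration only: struct toy_remus, the toy_*
functions and the script name are hypothetical, and the real code is
asynchronous, whereas this sketch runs synchronously and just records
which path was taken.

```c
#include <assert.h>
#include <stddef.h>

enum toy_path { TOY_NONE, TOY_SKIPPED, TOY_NETBUF };

/* Hypothetical stand-in for the Remus state kept by the toolstack. */
struct toy_remus {
    const char *netbufscript;    /* NULL: no hotplug script configured */
    int netbuf_active;           /* nonzero once setup created buffers */
    int saved_rc;                /* rc stashed before teardown starts */
    enum toy_path setup_path;
    enum toy_path teardown_path;
};

/* Setup: without a script, go straight to "setup done". */
void toy_setup_initiate(struct toy_remus *rs)
{
    assert(rs != NULL);
    if (!rs->netbufscript) {
        rs->setup_path = TOY_SKIPPED;
    } else {
        rs->netbuf_active = 1;
        rs->setup_path = TOY_NETBUF;
    }
}

/* Teardown: stash rc first, then undo only what setup created.
 * Returns the rc that would be handed to the final callback. */
int toy_teardown_initiate(struct toy_remus *rs, int rc)
{
    assert(rs != NULL);
    rs->saved_rc = rc;
    if (!rs->netbuf_active) {
        rs->teardown_path = TOY_SKIPPED;
    } else {
        rs->netbuf_active = 0;
        rs->teardown_path = TOY_NETBUF;
    }
    return rs->saved_rc;
}
```

Keying teardown on the state that setup actually created, rather
than on the script name, means a setup that never ran leaves
nothing to undo.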

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl.c          |  6 +-----
 tools/libxl/libxl_dom.c      | 11 +++++++++++
 tools/libxl/libxl_internal.h |  7 +++++++
 tools/libxl/libxl_remus.c    | 22 ++++++++++++++++++++++
 4 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index f7696a3..88b34a4 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -768,7 +768,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     }
 
     /* Point of no return */
-    libxl__domain_suspend(egc, dss);
+    libxl__remus_setup_initiate(egc, dss);
     return AO_INPROGRESS;
 
  out:
@@ -784,10 +784,6 @@ static void remus_failover_cb(libxl__egc *egc,
      * backup died or some network error occurred preventing us
      * from sending checkpoints.
      */
-
-    /* TBD: Remus cleanup - i.e. detach qdisc, release other
-     * resources.
-     */
     libxl__ao_complete(egc, ao, rc);
 }
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 20aaec8..0695f3e 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1715,6 +1715,17 @@ void domain_suspend_done(libxl__egc *egc,
         xc_suspend_evtchn_release(CTX->xch, CTX->xce, domid,
                            dss->guest_evtchn.port, &dss->guest_evtchn_lockfd);
 
+    if (dss->remus_state) {
+        /*
+         * With Remus, if we reach this point, it means either
+         * backup died or some network error occurred preventing us
+         * from sending checkpoints. Tear down the network buffers and
+         * release netlink resources.  This is an async op.
+         */
+        libxl__remus_teardown_initiate(egc, dss, rc);
+        return;
+    }
+
     dss->callback(egc, dss, rc);
 }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 1357f2d..34953ae 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2472,6 +2472,13 @@ _hidden int
 libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t domid,
                                        libxl__remus_state *remus_state);
 
+_hidden void libxl__remus_setup_initiate(libxl__egc *egc,
+                                         libxl__domain_suspend_state *dss);
+
+_hidden void libxl__remus_teardown_initiate(libxl__egc *egc,
+                                            libxl__domain_suspend_state *dss,
+                                            int rc);
+
 struct libxl__domain_suspend_state {
     /* set by caller of libxl__domain_suspend */
     libxl__ao *ao;
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index 4e40412..da303e7 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -18,6 +18,15 @@
 #include "libxl_internal.h"
 
 /*----- remus setup/teardown code -----*/
+void libxl__remus_setup_initiate(libxl__egc *egc,
+                                 libxl__domain_suspend_state *dss)
+{
+    libxl__ev_time_init(&dss->remus_state->timeout);
+    if (!dss->remus_state->netbufscript)
+        libxl__remus_setup_done(egc, dss, 0);
+    else
+        libxl__remus_netbuf_setup(egc, dss);
+}
 
 void libxl__remus_setup_done(libxl__egc *egc,
                              libxl__domain_suspend_state *dss,
@@ -34,6 +43,19 @@ void libxl__remus_setup_done(libxl__egc *egc,
     domain_suspend_done(egc, dss, rc);
 }
 
+void libxl__remus_teardown_initiate(libxl__egc *egc,
+                                    libxl__domain_suspend_state *dss,
+                                    int rc)
+{
+    /* stash rc somewhere before invoking teardown ops. */
+    dss->remus_state->saved_rc = rc;
+
+    if (!dss->remus_state->netbuf_state)
+        libxl__remus_teardown_done(egc, dss);
+    else
+        libxl__remus_netbuf_teardown(egc, dss);
+}
+
 void libxl__remus_teardown_done(libxl__egc *egc,
                                 libxl__domain_suspend_state *dss)
 {
-- 
1.8.3.2


* [PATCH V8 6/8] libxl: rename remus_failover_cb() to remus_replication_failure_cb()
  2014-04-02 11:04 [PATCH V8 0/8] " Yang Hongyang
                   ` (4 preceding siblings ...)
  2014-04-02 11:04 ` [PATCH V8 5/8] libxl: use the API to setup/teardown network buffering Yang Hongyang
@ 2014-04-02 11:04 ` Yang Hongyang
  2014-04-02 11:04 ` [PATCH V8 7/8] libxl: control network buffering in remus callbacks Yang Hongyang
  2014-04-02 11:04 ` [PATCH V8 8/8] libxl: network buffering cmdline switch Yang Hongyang
  7 siblings, 0 replies; 89+ messages in thread
From: Yang Hongyang @ 2014-04-02 11:04 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

From: Shriram Rajagopalan <rshriram@cs.ubc.ca>

Failover means that the machine on which the primary VM is running
is down, and we need to start the secondary VM to take over from the
primary VM. remus_failover_cb() is called when Remus fails, not when
we need to do failover, so rename it to remus_replication_failure_cb().

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 88b34a4..c4a4751 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -710,8 +710,9 @@ out:
     return ptr;
 }
 
-static void remus_failover_cb(libxl__egc *egc,
-                              libxl__domain_suspend_state *dss, int rc);
+static void remus_replication_failure_cb(libxl__egc *egc,
+                                         libxl__domain_suspend_state *dss,
+                                         int rc);
 
 /* TODO: Explicit Checkpoint acknowledgements via recv_fd. */
 int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
@@ -730,7 +731,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 
     GCNEW(dss);
     dss->ao = ao;
-    dss->callback = remus_failover_cb;
+    dss->callback = remus_replication_failure_cb;
     dss->domid = domid;
     dss->fd = send_fd;
     /* TODO do something with recv_fd */
@@ -775,8 +776,9 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     return AO_ABORT(rc);
 }
 
-static void remus_failover_cb(libxl__egc *egc,
-                              libxl__domain_suspend_state *dss, int rc)
+static void remus_replication_failure_cb(libxl__egc *egc,
+                                         libxl__domain_suspend_state *dss,
+                                         int rc)
 {
     STATE_AO_GC(dss->ao);
     /*
-- 
1.8.3.2


* [PATCH V8 7/8] libxl: control network buffering in remus callbacks
  2014-04-02 11:04 [PATCH V8 0/8] " Yang Hongyang
                   ` (5 preceding siblings ...)
  2014-04-02 11:04 ` [PATCH V8 6/8] libxl: rename remus_failover_cb() to remus_replication_failure_cb() Yang Hongyang
@ 2014-04-02 11:04 ` Yang Hongyang
  2014-04-02 11:04 ` [PATCH V8 8/8] libxl: network buffering cmdline switch Yang Hongyang
  7 siblings, 0 replies; 89+ messages in thread
From: Yang Hongyang @ 2014-04-02 11:04 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

This patch constitutes the core network buffering logic
and does the following:
 a) create a new network buffer when the domain is suspended
    (remus_domain_suspend_callback)
 b) release the previous network buffer pertaining to the
    committed checkpoint (remus_checkpoint_dm_saved)
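
As a rough illustration of step (a), the sketch below models how a
failure to start a new network buffer can be folded into the suspend
result. The names struct toy_netbuf, toy_start_new_epoch and
toy_suspend_done are hypothetical, and the fail_start flag is a fault
injection knob that exists only in this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-domain network buffer state. */
struct toy_netbuf {
    int fail_start;      /* sketch-only knob: make the next start fail */
    int epochs_started;  /* number of epochs opened so far */
};

/* Returns 0 on success, nonzero on failure. */
int toy_start_new_epoch(struct toy_netbuf *nb)
{
    if (nb->fail_start)
        return -1;
    nb->epochs_started++;
    return 0;
}

/* Returns the "ok" flag that would be reported to the save loop.
 * nb may be NULL when network buffering is not in use. */
int toy_suspend_done(struct toy_netbuf *nb, int ok)
{
    assert(ok == 0 || ok == 1);

    /* Nothing to do if buffering is off or the suspend itself failed. */
    if (!nb || !ok)
        return ok;

    /* A failed buffer start is treated like a failed suspend. */
    if (toy_start_new_epoch(nb))
        ok = 0;

    return ok;
}
```

Treating the buffer failure as a suspend failure keeps a single error
path: the caller stops replicating instead of continuing with output
that is no longer being held back.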

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_dom.c | 77 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 71 insertions(+), 6 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 0695f3e..1272bf6 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1432,7 +1432,24 @@ static void libxl__remus_domain_suspend_callback(void *data)
 static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
                                 libxl__domain_suspend_state *dss, int ok)
 {
-    /* REMUS TODO: Issue disk and network checkpoint reqs. */
+    /* Convenience aliases */
+    libxl__remus_state *const remus_state = dss->remus_state;
+
+    STATE_AO_GC(dss->ao);
+
+    /* REMUS TODO: Issue disk checkpoint reqs. */
+    if (!remus_state->netbuf_state || !ok) goto out;
+
+    /* The domain was suspended successfully. Start a new network
+     * buffer for the next epoch. If this operation fails, then act
+     * as though domain suspend failed -- libxc exits its infinite
+     * loop and ultimately, the replication stops.
+     */
+    if (libxl__remus_netbuf_start_new_epoch(gc, dss->domid,
+                                            remus_state))
+        ok = 0;
+
+out:
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
 }
 
@@ -1446,7 +1463,7 @@ static int libxl__remus_domain_resume_callback(void *data)
     if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
         return 0;
 
-    /* REMUS TODO: Deal with disk. Start a new network output buffer */
+    /* REMUS TODO: Deal with disk. */
     return 1;
 }
 
@@ -1454,6 +1471,8 @@ static int libxl__remus_domain_resume_callback(void *data)
 
 static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc);
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs);
 
 static void libxl__remus_domain_checkpoint_callback(void *data)
 {
@@ -1473,10 +1492,56 @@ static void libxl__remus_domain_checkpoint_callback(void *data)
 static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc)
 {
-    /* REMUS TODO: Wait for disk and memory ack, release network buffer */
-    /* REMUS TODO: make this asynchronous */
-    assert(!rc); /* REMUS TODO handle this error properly */
-    usleep(dss->interval * 1000);
+    /* Convenience aliases */
+    /*
+     * REMUS TODO: Wait for disk and explicit memory ack (through restore
+     * callback from remote) before releasing network buffer.
+     */
+    libxl__remus_state *const remus_state = dss->remus_state;
+
+    STATE_AO_GC(dss->ao);
+
+    if (rc) {
+        LOG(ERROR, "Failed to save device model. Terminating Remus..");
+        goto out;
+    }
+
+    if (remus_state->netbuf_state) {
+        rc = libxl__remus_netbuf_release_prev_epoch(gc, dss->domid,
+                                                    remus_state);
+        if (rc) {
+            LOG(ERROR, "Failed to release network buffer."
+                " Terminating Remus..");
+            goto out;
+        }
+    }
+
+    /* Set checkpoint interval timeout */
+    rc = libxl__ev_time_register_rel(gc, &remus_state->timeout,
+                                     remus_next_checkpoint,
+                                     dss->interval);
+    if (rc) {
+        LOG(ERROR, "unable to register timeout for next epoch."
+            " Terminating Remus..");
+        goto out;
+    }
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs)
+{
+    libxl__remus_state *remus_state = CONTAINER_OF(ev, *remus_state, timeout);
+
+    /* Convenience aliases */
+    libxl__domain_suspend_state *const dss = remus_state->dss;
+
+    STATE_AO_GC(dss->ao);
+
+    libxl__ev_time_deregister(gc, &remus_state->timeout);
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);
 }
 
-- 
1.8.3.2

* [PATCH V8 8/8] libxl: network buffering cmdline switch
  2014-04-02 11:04 [PATCH V8 0/8] " Yang Hongyang
                   ` (6 preceding siblings ...)
  2014-04-02 11:04 ` [PATCH V8 7/8] libxl: control network buffering in remus callbacks Yang Hongyang
@ 2014-04-02 11:04 ` Yang Hongyang
  2014-04-03 12:22   ` [PATCH 1/7] introduce a new function libxl__remus_netbuf_setup_done() Lai Jiangshan
  7 siblings, 1 reply; 89+ messages in thread
From: Yang Hongyang @ 2014-04-02 11:04 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

From: Shriram Rajagopalan <rshriram@cs.ubc.ca>

Add command line switches to the 'xl remus' command to control network
buffering, and pass the settings on to libxl so that it can act
accordingly. Also update the man pages to document the new options.

Note: network buffering is enabled by default. To disable it, use the
-n option. The -N option selects a netbuf script other than the default.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
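As a rough illustration of the option handling described above, here is
a minimal sketch using plain POSIX getopt(). It is not the xl code: xl
uses its own SWITCH_FOREACH_OPT macro, and struct remus_opts and
parse_remus_opts() are hypothetical names. Only a subset of the options
(-i, -u, -n, -N) is modelled. It shows the default-on behaviour of
network buffering and the fallback to a configured default script.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical option record, loosely mirroring the fields set in
 * main_remus(). */
struct remus_opts {
    int interval;               /* checkpoint interval in ms */
    int compression;            /* 1 = compress memory checkpoints */
    int netbuf;                 /* 1 = buffer network output (default) */
    const char *netbufscript;   /* script used to set up buffering */
};

/* Parse a subset of the 'xl remus' options. Returns 0 on success and
 * -1 on an unknown option or a missing option argument. */
int parse_remus_opts(int argc, char **argv, const char *default_script,
                     struct remus_opts *o)
{
    int opt;

    assert(o != NULL);
    o->interval = 200;
    o->compression = 1;
    o->netbuf = 1;              /* buffering is on unless -n is given */
    o->netbufscript = NULL;

    optind = 1;                 /* restart scanning for repeated calls */
    opterr = 0;                 /* no diagnostics from getopt itself */
    while ((opt = getopt(argc, argv, "uni:N:")) != -1) {
        switch (opt) {
        case 'i': o->interval = atoi(optarg); break;
        case 'u': o->compression = 0; break;
        case 'n': o->netbuf = 0; break;
        case 'N': o->netbufscript = optarg; break;
        default:  return -1;
        }
    }

    /* No -N given: fall back to the configured default script. */
    if (!o->netbufscript)
        o->netbufscript = default_script;
    return 0;
}
```

Applying the fallback after parsing keeps an explicit -N on the command
line in precedence over the value from xl.conf, which is the ordering
the patch uses.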
 docs/man/xl.conf.pod.5    |  6 ++++++
 docs/man/xl.pod.1         | 11 ++++++++++-
 tools/libxl/xl.c          |  4 ++++
 tools/libxl/xl.h          |  1 +
 tools/libxl/xl_cmdimpl.c  | 28 ++++++++++++++++++++++------
 tools/libxl/xl_cmdtable.c |  3 +++
 6 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.conf.pod.5 b/docs/man/xl.conf.pod.5
index 7c43bde..8ae19bb 100644
--- a/docs/man/xl.conf.pod.5
+++ b/docs/man/xl.conf.pod.5
@@ -105,6 +105,12 @@ Configures the default gateway device to set for virtual network devices.
 
 Default: C<None>
 
+=item B<remus.default.netbufscript="PATH">
+
+Configures the default script used by Remus to set up network buffering.
+
+Default: C</etc/xen/scripts/remus-netbuf-setup>
+
 =item B<output_format="json|sxp">
 
 Configures the default output format used by xl when printing "machine
diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index f7ceaa8..6319f36 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -405,7 +405,7 @@ Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
 mechanism between the two hosts.
 
 N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
-     There is no support for network or disk buffering at the moment.
+     There is no support for disk buffering at the moment.
 
 B<OPTIONS>
 
@@ -424,6 +424,15 @@ Generally useful for debugging.
 
 Disable memory checkpoint compression.
 
+=item B<-n>
+
+Disable network output buffering.
+
+=item B<-N> I<netbufscript>
+
+Use <netbufscript> to set up network buffering instead of
+the default (/etc/xen/scripts/remus-netbuf-setup).
+
 =item B<-s> I<sshcommand>
 
 Use <sshcommand> instead of ssh.  String will be passed to sh.
diff --git a/tools/libxl/xl.c b/tools/libxl/xl.c
index 527b4c5..43ef42a 100644
--- a/tools/libxl/xl.c
+++ b/tools/libxl/xl.c
@@ -46,6 +46,7 @@ char *default_vifscript = NULL;
 char *default_bridge = NULL;
 char *default_gatewaydev = NULL;
 char *default_vifbackend = NULL;
+char *default_remus_netbufscript = NULL;
 enum output_format default_output_format = OUTPUT_FORMAT_JSON;
 int claim_mode = 1;
 bool progress_use_cr = 0;
@@ -178,6 +179,9 @@ static void parse_global_config(const char *configfile,
     if (!xlu_cfg_get_long (config, "claim_mode", &l, 0))
         claim_mode = l;
 
+    xlu_cfg_replace_string (config, "remus.default.netbufscript",
+        &default_remus_netbufscript, 0);
+
     xlu_cfg_destroy(config);
 }
 
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 10a2e66..087eb8c 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -170,6 +170,7 @@ extern char *default_vifscript;
 extern char *default_bridge;
 extern char *default_gatewaydev;
 extern char *default_vifbackend;
+extern char *default_remus_netbufscript;
 extern char *blkdev_start;
 
 enum output_format {
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 8389468..0632e8f 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7279,8 +7279,9 @@ int main_remus(int argc, char **argv)
     r_info.interval = 200;
     r_info.blackhole = 0;
     r_info.compression = 1;
+    r_info.netbuf = 1;
 
-    SWITCH_FOREACH_OPT(opt, "bui:s:e", NULL, "remus", 2) {
+    SWITCH_FOREACH_OPT(opt, "buni:s:N:e", NULL, "remus", 2) {
     case 'i':
         r_info.interval = atoi(optarg);
         break;
@@ -7290,6 +7291,12 @@ int main_remus(int argc, char **argv)
     case 'u':
         r_info.compression = 0;
         break;
+    case 'n':
+        r_info.netbuf = 0;
+        break;
+    case 'N':
+        r_info.netbufscript = optarg;
+        break;
     case 's':
         ssh_command = optarg;
         break;
@@ -7301,6 +7308,9 @@ int main_remus(int argc, char **argv)
     domid = find_domain(argv[optind]);
     host = argv[optind + 1];
 
+    if (!r_info.netbufscript)
+        r_info.netbufscript = default_remus_netbufscript;
+
     if (r_info.blackhole) {
         send_fd = open("/dev/null", O_RDWR, 0644);
         if (send_fd < 0) {
@@ -7338,13 +7348,19 @@ int main_remus(int argc, char **argv)
     /* Point of no return */
     rc = libxl_domain_remus_start(ctx, &r_info, domid, send_fd, recv_fd, 0);
 
-    /* If we are here, it means backup has failed/domain suspend failed.
-     * Try to resume the domain and exit gracefully.
-     * TODO: Split-Brain check.
+    /* Check if the domain exists. The user may have destroyed the
+     * domain with xl to force failover.
      */
-    fprintf(stderr, "remus sender: libxl_domain_suspend failed"
-            " (rc=%d)\n", rc);
+    if (libxl_domain_info(ctx, 0, domid)) {
+        fprintf(stderr, "Remus: Primary domain has been destroyed.\n");
+        close(send_fd);
+        return 0;
+    }
 
+    /* If we are here, it means remus setup/domain suspend/backup has
+     * failed. Try to resume the domain and exit gracefully.
+     * TODO: Split-Brain check.
+     */
     if (rc == ERROR_GUEST_TIMEDOUT)
         fprintf(stderr, "Failed to suspend domain at primary.\n");
     else {
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index e8ab93a..cfb5999 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -484,6 +484,9 @@ struct cmd_spec cmd_table[] = {
       "-i MS                   Checkpoint domain memory every MS milliseconds (def. 200ms).\n"
       "-b                      Replicate memory checkpoints to /dev/null (blackhole)\n"
       "-u                      Disable memory checkpoint compression.\n"
+      "-n                      Disable network output buffering.\n"
+      "-N <netbufscript>       Use netbufscript to setup network buffering instead of the\n"
+      "                        default (/etc/xen/scripts/remus-netbuf-setup).\n"
       "-s <sshcommand>         Use <sshcommand> instead of ssh.  String will be passed\n"
       "                        to sh. If empty, run <host> instead of \n"
       "                        ssh <host> xl migrate-receive -r [-e]\n"
-- 
1.8.3.2

* [PATCH 1/7] introduce a new function libxl__remus_netbuf_setup_done()
  2014-04-02 11:04 ` [PATCH V8 8/8] libxl: network buffering cmdline switch Yang Hongyang
@ 2014-04-03 12:22   ` Lai Jiangshan
  2014-04-03 12:22     ` [PATCH 2/7] introduce a new function libxl__remus_netbuf_teardown_done() Lai Jiangshan
                       ` (6 more replies)
  0 siblings, 7 replies; 89+ messages in thread
From: Lai Jiangshan @ 2014-04-03 12:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, Lai Jiangshan,
	Dong Eddie, Shriram Rajagopalan, FNST-Yang Hongyang,
	Roger Pau Monne

We will exec some scripts to set up netbuf and disk, so we need two
asynchronous functions that are called when the setup is done.

This patch introduces the asynchronous function for netbuf.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_internal.h  |    6 +++---
 tools/libxl/libxl_netbuffer.c |    4 ++--
 tools/libxl/libxl_remus.c     |   15 +++++++++++----
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 9ad0e27..3719bf4 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2454,9 +2454,9 @@ _hidden void domain_suspend_done(libxl__egc *egc,
                                  libxl__domain_suspend_state *dss,
                                  int rc);
 
-_hidden void libxl__remus_setup_done(libxl__egc *egc,
-                                     libxl__domain_suspend_state *dss,
-                                     int rc);
+_hidden void libxl__remus_netbuf_setup_done(libxl__egc *egc,
+                                            libxl__domain_suspend_state *dss,
+                                            int rc);
 
 _hidden void libxl__remus_netbuf_setup(libxl__egc *egc,
                                        libxl__domain_suspend_state *dss);
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 22fffec..a596d31 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -413,7 +413,7 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
 
     rc = init_qdiscs(gc, remus_state);
  out:
-    libxl__remus_setup_done(egc, remus_state->dss, rc);
+    libxl__remus_netbuf_setup_done(egc, remus_state->dss, rc);
 }
 
 /* Scan through the list of vifs belonging to domid and
@@ -454,7 +454,7 @@ void libxl__remus_netbuf_setup(libxl__egc *egc,
     return;
 
  out:
-    libxl__remus_setup_done(egc, dss, rc);
+    libxl__remus_netbuf_setup_done(egc, dss, rc);
 }
 
 static void netbuf_teardown_script_cb(libxl__egc *egc,
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index da303e7..ef90e6b 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -23,14 +23,14 @@ void libxl__remus_setup_initiate(libxl__egc *egc,
 {
     libxl__ev_time_init(&dss->remus_state->timeout);
     if (!dss->remus_state->netbufscript)
-        libxl__remus_setup_done(egc, dss, 0);
+        libxl__remus_netbuf_setup_done(egc, dss, 0);
     else
         libxl__remus_netbuf_setup(egc, dss);
 }
 
-void libxl__remus_setup_done(libxl__egc *egc,
-                             libxl__domain_suspend_state *dss,
-                             int rc)
+static void libxl__remus_setup_done(libxl__egc *egc,
+                                    libxl__domain_suspend_state *dss,
+                                    int rc)
 {
     STATE_AO_GC(dss->ao);
     if (!rc) {
@@ -43,6 +43,13 @@ void libxl__remus_setup_done(libxl__egc *egc,
     domain_suspend_done(egc, dss, rc);
 }
 
+void libxl__remus_netbuf_setup_done(libxl__egc *egc,
+                                    libxl__domain_suspend_state *dss,
+                                    int rc)
+{
+    libxl__remus_setup_done(egc, dss, rc);
+}
+
 void libxl__remus_teardown_initiate(libxl__egc *egc,
                                     libxl__domain_suspend_state *dss,
                                     int rc)
-- 
1.7.4.4

* [PATCH 2/7] introduce a new function libxl__remus_netbuf_teardown_done()
  2014-04-03 12:22   ` [PATCH 1/7] introduce a new function libxl__remus_netbuf_setup_done() Lai Jiangshan
@ 2014-04-03 12:22     ` Lai Jiangshan
  2014-04-03 12:22     ` [PATCH 3/7] introduce an API to async exec scripts Lai Jiangshan
                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 89+ messages in thread
From: Lai Jiangshan @ 2014-04-03 12:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, Lai Jiangshan,
	Dong Eddie, Shriram Rajagopalan, FNST-Yang Hongyang,
	Roger Pau Monne

We will exec some scripts to tear down netbuf and disk, so we need two
asynchronous functions that are called when the teardown is done.

This patch introduces the asynchronous function for netbuf.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_internal.h  |    4 ++--
 tools/libxl/libxl_netbuffer.c |    4 ++--
 tools/libxl/libxl_remus.c     |   12 +++++++++---
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 3719bf4..dc49f16 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2461,8 +2461,8 @@ _hidden void libxl__remus_netbuf_setup_done(libxl__egc *egc,
 _hidden void libxl__remus_netbuf_setup(libxl__egc *egc,
                                        libxl__domain_suspend_state *dss);
 
-_hidden void libxl__remus_teardown_done(libxl__egc *egc,
-                                        libxl__domain_suspend_state *dss);
+_hidden void libxl__remus_netbuf_teardown_done(libxl__egc *egc,
+                                               libxl__domain_suspend_state *dss);
 
 _hidden void libxl__remus_netbuf_teardown(libxl__egc *egc,
                                           libxl__domain_suspend_state *dss);
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index a596d31..c9c1ba7 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -485,7 +485,7 @@ static void netbuf_teardown_script_cb(libxl__egc *egc,
     }
 
  out:
-    libxl__remus_teardown_done(egc, remus_state->dss);
+    libxl__remus_netbuf_teardown_done(egc, remus_state->dss);
 }
 
 /* Note: This function will be called in the same gc context as
@@ -506,7 +506,7 @@ void libxl__remus_netbuf_teardown(libxl__egc *egc,
     remus_state->dev_id = 0;
     if (exec_netbuf_script(gc, remus_state, "teardown",
                            netbuf_teardown_script_cb))
-        libxl__remus_teardown_done(egc, dss);
+        libxl__remus_netbuf_teardown_done(egc, dss);
 }
 
 /* The buffer_op's value, not the value passed to kernel */
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index ef90e6b..78652d2 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -58,13 +58,19 @@ void libxl__remus_teardown_initiate(libxl__egc *egc,
     dss->remus_state->saved_rc = rc;
 
     if (!dss->remus_state->netbuf_state)
-        libxl__remus_teardown_done(egc, dss);
+        libxl__remus_netbuf_teardown_done(egc, dss);
     else
         libxl__remus_netbuf_teardown(egc, dss);
 }
 
-void libxl__remus_teardown_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss)
+static void libxl__remus_teardown_done(libxl__egc *egc,
+                                       libxl__domain_suspend_state *dss)
 {
     dss->callback(egc, dss, dss->remus_state->saved_rc);
 }
+
+void libxl__remus_netbuf_teardown_done(libxl__egc *egc,
+                                       libxl__domain_suspend_state *dss)
+{
+    libxl__remus_teardown_done(egc, dss);
+}
-- 
1.7.4.4

* [PATCH 3/7] introduce an API to async exec scripts
  2014-04-03 12:22   ` [PATCH 1/7] introduce a new function libxl__remus_netbuf_setup_done() Lai Jiangshan
  2014-04-03 12:22     ` [PATCH 2/7] introduce a new function libxl__remus_netbuf_teardown_done() Lai Jiangshan
@ 2014-04-03 12:22     ` Lai Jiangshan
  2014-04-03 12:22     ` [PATCH 4/7] netbuffer: use async exec API to exec the netbuffer script Lai Jiangshan
                       ` (4 subsequent siblings)
  6 siblings, 0 replies; 89+ messages in thread
From: Lai Jiangshan @ 2014-04-03 12:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, Lai Jiangshan,
	Dong Eddie, Shriram Rajagopalan, FNST-Yang Hongyang,
	Roger Pau Monne

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
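For readers unfamiliar with the pattern, the behaviour this API provides
(run a script, kill it with SIGKILL if it exceeds a timeout, report its
wait status) can be sketched with plain POSIX calls. This is a blocking
simplification written for illustration only: the patch itself is
event-driven and uses libxl__ev_child_fork() and
libxl__ev_time_register_rel(), and run_script_with_timeout() is a
hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Run args[0] with the given argument vector. If the child has not
 * exited after timeout_sec seconds, kill it with SIGKILL. Returns the
 * wait status of the child, or -1 if fork() or waitpid() failed. */
int run_script_with_timeout(char *const args[], int timeout_sec)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    long ticks_left = (long)timeout_sec * 100;
    int status;
    pid_t pid;

    assert(args != NULL && args[0] != NULL);

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* child: launch the script */
        execv(args[0], args);
        _exit(127);             /* only reached if exec failed */
    }

    /* parent: poll for child exit until the deadline passes */
    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);

        if (r == pid)
            return status;
        if (r == -1)
            return -1;
        if (ticks_left-- <= 0) {
            /* timeout: kill the child, then reap it */
            kill(pid, SIGKILL);
            if (waitpid(pid, &status, 0) != pid)
                return -1;
            return status;
        }
        nanosleep(&tick, NULL);
    }
}
```

A caller would inspect the result with WIFEXITED()/WEXITSTATUS() or
WIFSIGNALED()/WTERMSIG(), which roughly corresponds to the status value
handed to finish_cb in the patch.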
 tools/libxl/libxl_internal.c |   79 ++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h |   17 +++++++++
 2 files changed, 96 insertions(+), 0 deletions(-)

diff --git a/tools/libxl/libxl_internal.c b/tools/libxl/libxl_internal.c
index 6c94d3e..4c9c23b 100644
--- a/tools/libxl/libxl_internal.c
+++ b/tools/libxl/libxl_internal.c
@@ -375,6 +375,85 @@ out:
     return rc;
 }
 
+static void libxl_async_exec_timeout(libxl__egc *egc,
+                                     libxl__ev_time *ev,
+                                     const struct timeval *requested_abs)
+{
+    libxl_async_exec *async_exec = CONTAINER_OF(ev, *async_exec, time);
+
+    STATE_AO_GC(async_exec->ao);
+
+    libxl__ev_time_deregister(gc, &async_exec->time);
+    assert(libxl__ev_child_inuse(&async_exec->child));
+
+    LOG(DEBUG, "killing hotplug script %s because of timeout",
+        async_exec->args[0]);
+
+    if (kill(async_exec->child.pid, SIGKILL)) {
+        LOGEV(ERROR, errno, "unable to kill hotplug script %s [%ld]",
+              async_exec->args[0],
+              (long)async_exec->child.pid);
+    }
+
+    return;
+}
+
+static void libxl_async_exec_done(libxl__egc *egc,
+                                  libxl__ev_child *child,
+                                  pid_t pid, int status)
+{
+    libxl_async_exec *async_exec = CONTAINER_OF(child, *async_exec, child);
+
+    STATE_AO_GC(async_exec->ao);
+
+    libxl__ev_time_deregister(gc, &async_exec->time);
+
+    if (status && !async_exec->allow_fail) {
+        libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR,
+                                      async_exec->args[0],
+                                      pid, status);
+    }
+
+    async_exec->finish_cb(async_exec->opaque, status);
+}
+
+int libxl_async_exec_script(libxl__gc *gc, libxl_async_exec *async_exec)
+{
+    pid_t pid;
+
+    /* Convenience aliases */
+    libxl__ev_child *const child = &async_exec->child;
+    char * const *args = async_exec->args;
+    char * const *env = async_exec->env;
+
+    /* Set hotplug timeout */
+    if (libxl__ev_time_register_rel(gc, &async_exec->time,
+                                    libxl_async_exec_timeout,
+                                    async_exec->timeout * 1000)) {
+        LOG(ERROR, "unable to register timeout for "
+            "script %s", args[0]);
+        return ERROR_FAIL;
+    }
+
+    LOG(DEBUG, "Calling script: %s ", args[0]);
+
+    /* Fork and exec the script */
+    pid = libxl__ev_child_fork(gc, child, libxl_async_exec_done);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork for script %s", args[0]);
+        return ERROR_FAIL;
+    }
+
+    if (!pid) {
+        /* child: Launch the script */
+        libxl__exec(gc, -1, -1, -1, args[0], args, env);
+        /* notreached */
+        abort();
+    }
+
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index dc49f16..ab82334 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3133,6 +3133,23 @@ void libxl__numa_candidate_put_nodemap(libxl__gc *gc,
  */
 #define CTYPE(isfoo,c) (isfoo((unsigned char)(c)))
 
+typedef struct libxl_async_exec {
+    char * *env;
+    char * *args;
+    void *opaque;
+    void (*finish_cb)(void *opaque, int status);
+    /* unit: second */
+    int timeout;
+    bool allow_fail;
+
+    libxl__ev_time time;
+    libxl__ev_child child;
+    libxl__ao *ao;
+} libxl_async_exec;
+
+_hidden extern int libxl_async_exec_script(libxl__gc *gc,
+                                           libxl_async_exec *async_exec);
+
 
 #endif
 
-- 
1.7.4.4

* [PATCH 4/7] netbuffer: use async exec API to exec the netbuffer script
  2014-04-03 12:22   ` [PATCH 1/7] introduce a new function libxl__remus_netbuf_setup_done() Lai Jiangshan
  2014-04-03 12:22     ` [PATCH 2/7] introduce a new function libxl__remus_netbuf_teardown_done() Lai Jiangshan
  2014-04-03 12:22     ` [PATCH 3/7] introduce an API to async exec scripts Lai Jiangshan
@ 2014-04-03 12:22     ` Lai Jiangshan
  2014-04-03 12:22     ` [PATCH 5/7] netbuf: move dev_id from remus_state to netbuf_state Lai Jiangshan
                       ` (3 subsequent siblings)
  6 siblings, 0 replies; 89+ messages in thread
From: Lai Jiangshan @ 2014-04-03 12:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, Lai Jiangshan,
	Dong Eddie, Shriram Rajagopalan, FNST-Yang Hongyang,
	Roger Pau Monne

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
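This patch changes the netbuf script callbacks from the
(egc, child, pid, status) form, which recovers its state with
CONTAINER_OF, to a (void *opaque, int status) form driven by the shared
helper. The sketch below illustrates that layering with stand-in types.
Every name in it (container_of, child_ev, exec_helper, owner_state,
helper_child_done, owner_finish_cb) is hypothetical and is not taken
from libxl.

```c
#include <assert.h>
#include <stddef.h>

/* Same idea as libxl's CONTAINER_OF, written out with offsetof. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for an event-loop child handle. */
struct child_ev {
    int pid;
};

/* Stand-in for the generic helper state: the embedded child handle lets
 * the helper find itself, and opaque lets the owner find its state. */
struct exec_helper {
    void *opaque;
    void (*finish_cb)(void *opaque, int status);
    struct child_ev child;
};

/* Stand-in for the owner's state (remus_state in the patch). */
struct owner_state {
    int last_status;
    int calls;
};

/* Helper-level callback: invoked with the embedded child handle, it
 * recovers the helper and forwards to the owner's callback. */
void helper_child_done(struct child_ev *child, int status)
{
    struct exec_helper *h = container_of(child, struct exec_helper, child);

    assert(h->finish_cb != NULL);
    h->finish_cb(h->opaque, status);
}

/* Owner-level callback: no container_of needed, the state arrives
 * through the opaque pointer. */
void owner_finish_cb(void *opaque, int status)
{
    struct owner_state *s = opaque;

    s->last_status = status;
    s->calls++;
}
```

The owner no longer needs to embed the child handle in its own struct,
which is why the patch can drop the child member from
libxl__remus_state.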
 tools/libxl/libxl.c           |    2 +-
 tools/libxl/libxl_internal.h  |    3 +-
 tools/libxl/libxl_netbuffer.c |  139 ++++++++++++-----------------------------
 3 files changed, 43 insertions(+), 101 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index c4a4751..1596146 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -747,7 +747,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     /* convenience shorthand */
     libxl__remus_state *remus_state = dss->remus_state;
     remus_state->dss = dss;
-    libxl__ev_child_init(&remus_state->child);
+    remus_state->egc = egc;
 
     /* TBD: enable disk buffering */
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index ab82334..bf92975 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2438,14 +2438,15 @@ typedef struct libxl__remus_state {
     /* Script to setup/teardown network buffers */
     const char *netbufscript;
     libxl__domain_suspend_state *dss;
+    libxl__egc *egc;
 
     /* private */
     int saved_rc;
     int dev_id;
     /* Opaque context containing network buffer related stuff */
     void *netbuf_state;
+    /* used for checkpoint */
     libxl__ev_time timeout;
-    libxl__ev_child child;
 } libxl__remus_state;
 
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index c9c1ba7..d996832 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -32,7 +32,7 @@ typedef struct libxl__remus_netbuf_state {
     const char **vif_list;
     const char **ifb_list;
     uint32_t num_netbufs;
-    uint32_t unused;
+    libxl_async_exec async_exec;
 } libxl__remus_netbuf_state;
 
 int libxl__netbuffer_enabled(libxl__gc *gc)
@@ -238,52 +238,20 @@ static int init_qdiscs(libxl__gc *gc,
     return ERROR_FAIL;
 }
 
-static void netbuf_setup_timeout_cb(libxl__egc *egc,
-                                    libxl__ev_time *ev,
-                                    const struct timeval *requested_abs)
-{
-    libxl__remus_state *remus_state = CONTAINER_OF(ev, *remus_state, timeout);
-
-    /* Convenience aliases */
-    const int devid = remus_state->dev_id;
-    libxl__remus_netbuf_state *const netbuf_state = remus_state->netbuf_state;
-    const char *const vif = netbuf_state->vif_list[devid];
-
-    STATE_AO_GC(remus_state->dss->ao);
-
-    libxl__ev_time_deregister(gc, &remus_state->timeout);
-    assert(libxl__ev_child_inuse(&remus_state->child));
-
-    LOG(DEBUG, "killing hotplug script %s (on vif %s) because of timeout",
-        remus_state->netbufscript, vif);
-
-    if (kill(remus_state->child.pid, SIGKILL)) {
-        LOGEV(ERROR, errno, "unable to kill hotplug script %s [%ld]",
-              remus_state->netbufscript,
-              (unsigned long)remus_state->child.pid);
-    }
-
-    return;
-}
-
 /* the script needs the following env & args
  * $vifname
  * $XENBUS_PATH (/libxl/<domid>/remus/netbuf/<devid>/)
  * $IFB (for teardown)
  * setup/teardown as command line arg.
- * In return, the script writes the name of IFB device (during setup) to be
- * used for output buffering into XENBUS_PATH/ifb
  */
-static int exec_netbuf_script(libxl__gc *gc, libxl__remus_state *remus_state,
-                              char *op, libxl__ev_child_callback *death)
+static void setup_env(libxl_async_exec *async_exec, char *op,
+                      libxl__remus_state *remus_state)
 {
     int arraysize, nr = 0;
     char **env = NULL, **args = NULL;
-    pid_t pid;
+    STATE_AO_GC(remus_state->dss->ao);
 
     /* Convenience aliases */
-    libxl__ev_child *const child = &remus_state->child;
-    libxl__ev_time *const timeout = &remus_state->timeout;
     char *const script = libxl__strdup(gc, remus_state->netbufscript);
     const uint32_t domid = remus_state->dss->domid;
     const int devid = remus_state->dev_id;
@@ -312,40 +280,17 @@ static int exec_netbuf_script(libxl__gc *gc, libxl__remus_state *remus_state,
     args[nr++] = NULL;
     assert(nr == arraysize);
 
-    /* Set hotplug timeout */
-    if (libxl__ev_time_register_rel(gc, timeout,
-                                    netbuf_setup_timeout_cb,
-                                    LIBXL_HOTPLUG_TIMEOUT * 1000)) {
-        LOG(ERROR, "unable to register timeout for "
-            "netbuf setup script %s on vif %s", script, vif);
-        return ERROR_FAIL;
-    }
-
-    LOG(DEBUG, "Calling netbuf script: %s %s on vif %s",
-        script, op, vif);
-
-    /* Fork and exec netbuf script */
-    pid = libxl__ev_child_fork(gc, child, death);
-    if (pid == -1) {
-        LOG(ERROR, "unable to fork netbuf script %s", script);
-        return ERROR_FAIL;
-    }
-
-    if (!pid) {
-        /* child: Launch netbuf script */
-        libxl__exec(gc, -1, -1, -1, args[0], args, env);
-        /* notreached */
-        abort();
-    }
-
-    return 0;
+    async_exec->env = env;
+    async_exec->args = args;
 }
 
-static void netbuf_setup_script_cb(libxl__egc *egc,
-                                   libxl__ev_child *child,
-                                   pid_t pid, int status)
+/*
+ * In return, the script writes the name of IFB device (during setup) to be
+ * used for output buffering into XENBUS_PATH/ifb
+ */
+static void netbuf_setup_script_cb(void *opaque, int status)
 {
-    libxl__remus_state *remus_state = CONTAINER_OF(child, *remus_state, child);
+    libxl__remus_state *remus_state = opaque;
     const char *out_path_base, *hotplug_error = NULL;
     int rc = ERROR_FAIL;
 
@@ -358,7 +303,8 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
 
     STATE_AO_GC(remus_state->dss->ao);
 
-    libxl__ev_time_deregister(gc, &remus_state->timeout);
+    if (status)
+        goto out;
 
     out_path_base = GCSPRINTF("%s/remus/netbuf/%d",
                               libxl__xs_libxl_path(gc, domid), devid);
@@ -377,14 +323,6 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
         goto out;
     }
 
-    if (status) {
-        libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR,
-                                      remus_state->netbufscript,
-                                      pid, status);
-        rc = ERROR_FAIL;
-        goto out;
-    }
-
     rc = libxl__xs_read_checked(gc, XBT_NULL,
                                 GCSPRINTF("%s/remus/netbuf/%d/ifb",
                                           libxl__xs_libxl_path(gc, domid),
@@ -403,9 +341,8 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
     LOG(DEBUG, "%s will buffer packets from vif %s", *ifb, vif);
     remus_state->dev_id++;
     if (remus_state->dev_id < netbuf_state->num_netbufs) {
-        rc = exec_netbuf_script(gc, remus_state,
-                                "setup", netbuf_setup_script_cb);
-        if (rc)
+        setup_env(&netbuf_state->async_exec, "setup", remus_state);
+        if (libxl_async_exec_script(gc, &netbuf_state->async_exec))
             goto out;
 
         return;
@@ -413,7 +350,7 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
 
     rc = init_qdiscs(gc, remus_state);
  out:
-    libxl__remus_netbuf_setup_done(egc, remus_state->dss, rc);
+    libxl__remus_netbuf_setup_done(remus_state->egc, remus_state->dss, rc);
 }
 
 /* Scan through the list of vifs belonging to domid and
@@ -444,12 +381,19 @@ void libxl__remus_netbuf_setup(libxl__egc *egc,
 
     if (num_netbufs < 0) goto out;
 
+    libxl__ev_child_init(&netbuf_state->async_exec.child);
+
     GCNEW_ARRAY(netbuf_state->ifb_list, num_netbufs);
     netbuf_state->num_netbufs = num_netbufs;
     remus_state->netbuf_state = netbuf_state;
     remus_state->dev_id = 0;
-    if (exec_netbuf_script(gc, remus_state, "setup",
-                           netbuf_setup_script_cb))
+
+    netbuf_state->async_exec.timeout = LIBXL_HOTPLUG_TIMEOUT;
+    netbuf_state->async_exec.opaque = remus_state;
+    netbuf_state->async_exec.finish_cb = netbuf_setup_script_cb;
+    netbuf_state->async_exec.ao = ao;
+    setup_env(&netbuf_state->async_exec, "setup", remus_state);
+    if (libxl_async_exec_script(gc, &netbuf_state->async_exec))
         goto out;
     return;
 
@@ -457,35 +401,25 @@ void libxl__remus_netbuf_setup(libxl__egc *egc,
     libxl__remus_netbuf_setup_done(egc, dss, rc);
 }
 
-static void netbuf_teardown_script_cb(libxl__egc *egc,
-                                      libxl__ev_child *child,
-                                      pid_t pid, int status)
+static void netbuf_teardown_script_cb(void *opaque, int status)
 {
-    libxl__remus_state *remus_state = CONTAINER_OF(child, *remus_state, child);
+    libxl__remus_state *remus_state = opaque;
 
     /* Convenience aliases */
     libxl__remus_netbuf_state *const netbuf_state = remus_state->netbuf_state;
 
     STATE_AO_GC(remus_state->dss->ao);
 
-    libxl__ev_time_deregister(gc, &remus_state->timeout);
-
-    if (status) {
-        libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR,
-                                      remus_state->netbufscript,
-                                      pid, status);
-    }
-
     remus_state->dev_id++;
     if (remus_state->dev_id < netbuf_state->num_netbufs) {
-        if (exec_netbuf_script(gc, remus_state,
-                               "teardown", netbuf_teardown_script_cb))
+        setup_env(&netbuf_state->async_exec, "teardown", remus_state);
+        if (libxl_async_exec_script(gc, &netbuf_state->async_exec))
             goto out;
         return;
     }
 
  out:
-    libxl__remus_netbuf_teardown_done(egc, remus_state->dss);
+    libxl__remus_netbuf_teardown_done(remus_state->egc, remus_state->dss);
 }
 
 /* Note: This function will be called in the same gc context as
@@ -501,11 +435,18 @@ void libxl__remus_netbuf_teardown(libxl__egc *egc,
 
     STATE_AO_GC(dss->ao);
 
+    libxl__ev_child_init(&netbuf_state->async_exec.child);
+
     free_qdiscs(netbuf_state);
 
+    netbuf_state->async_exec.timeout = LIBXL_HOTPLUG_TIMEOUT;
+    netbuf_state->async_exec.opaque = remus_state;
+    netbuf_state->async_exec.finish_cb = netbuf_teardown_script_cb;
+    netbuf_state->async_exec.ao = ao;
     remus_state->dev_id = 0;
-    if (exec_netbuf_script(gc, remus_state, "teardown",
-                           netbuf_teardown_script_cb))
+    setup_env(&netbuf_state->async_exec, "teardown", remus_state);
+
+    if (libxl_async_exec_script(gc, &netbuf_state->async_exec))
         libxl__remus_netbuf_teardown_done(egc, dss);
 }
 
-- 
1.7.4.4


* [PATCH 5/7] netbuf: move dev_id from remus_state to netbuf_state
  2014-04-03 12:22   ` [PATCH 1/7] introduce a new function libxl__remus_netbuf_setup_done() Lai Jiangshan
                       ` (2 preceding siblings ...)
  2014-04-03 12:22     ` [PATCH 4/7] netbuffer: use async exec API to exec the netbuffer script Lai Jiangshan
@ 2014-04-03 12:22     ` Lai Jiangshan
  2014-04-03 12:22     ` [PATCH 6/7] remus: implement remus replicated checkpointing disk Lai Jiangshan
                       ` (2 subsequent siblings)
  6 siblings, 0 replies; 89+ messages in thread
From: Lai Jiangshan @ 2014-04-03 12:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, Lai Jiangshan,
	Dong Eddie, Shriram Rajagopalan, FNST-Yang Hongyang,
	Roger Pau Monne

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_internal.h  |    1 -
 tools/libxl/libxl_netbuffer.c |   17 +++++++++--------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index bf92975..4e07969 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2442,7 +2442,6 @@ typedef struct libxl__remus_state {
 
     /* private */
     int saved_rc;
-    int dev_id;
     /* Opaque context containing network buffer related stuff */
     void *netbuf_state;
     /* used for checkpoint */
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index d996832..cd822f8 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -32,6 +32,7 @@ typedef struct libxl__remus_netbuf_state {
     const char **vif_list;
     const char **ifb_list;
     uint32_t num_netbufs;
+    int dev_id;
     libxl_async_exec async_exec;
 } libxl__remus_netbuf_state;
 
@@ -254,8 +255,8 @@ static void setup_env(libxl_async_exec *async_exec, char *op,
     /* Convenience aliases */
     char *const script = libxl__strdup(gc, remus_state->netbufscript);
     const uint32_t domid = remus_state->dss->domid;
-    const int devid = remus_state->dev_id;
     libxl__remus_netbuf_state *const netbuf_state = remus_state->netbuf_state;
+    const int devid = netbuf_state->dev_id;
     const char *const vif = netbuf_state->vif_list[devid];
     const char *const ifb = netbuf_state->ifb_list[devid];
 
@@ -296,8 +297,8 @@ static void netbuf_setup_script_cb(void *opaque, int status)
 
     /* Convenience aliases */
     const uint32_t domid = remus_state->dss->domid;
-    const int devid = remus_state->dev_id;
     libxl__remus_netbuf_state *const netbuf_state = remus_state->netbuf_state;
+    const int devid = netbuf_state->dev_id;
     const char *const vif = netbuf_state->vif_list[devid];
     const char **const ifb = &netbuf_state->ifb_list[devid];
 
@@ -339,8 +340,8 @@ static void netbuf_setup_script_cb(void *opaque, int status)
     }
 
     LOG(DEBUG, "%s will buffer packets from vif %s", *ifb, vif);
-    remus_state->dev_id++;
-    if (remus_state->dev_id < netbuf_state->num_netbufs) {
+    netbuf_state->dev_id++;
+    if (netbuf_state->dev_id < netbuf_state->num_netbufs) {
         setup_env(&netbuf_state->async_exec, "setup", remus_state);
         if (libxl_async_exec_script(gc, &netbuf_state->async_exec))
             goto out;
@@ -386,7 +387,7 @@ void libxl__remus_netbuf_setup(libxl__egc *egc,
     GCNEW_ARRAY(netbuf_state->ifb_list, num_netbufs);
     netbuf_state->num_netbufs = num_netbufs;
     remus_state->netbuf_state = netbuf_state;
-    remus_state->dev_id = 0;
+    netbuf_state->dev_id = 0;
 
     netbuf_state->async_exec.timeout = LIBXL_HOTPLUG_TIMEOUT;
     netbuf_state->async_exec.opaque = remus_state;
@@ -410,8 +411,8 @@ static void netbuf_teardown_script_cb(void *opaque, int status)
 
     STATE_AO_GC(remus_state->dss->ao);
 
-    remus_state->dev_id++;
-    if (remus_state->dev_id < netbuf_state->num_netbufs) {
+    netbuf_state->dev_id++;
+    if (netbuf_state->dev_id < netbuf_state->num_netbufs) {
         setup_env(&netbuf_state->async_exec, "teardown", remus_state);
         if (libxl_async_exec_script(gc, &netbuf_state->async_exec))
             goto out;
@@ -443,7 +444,7 @@ void libxl__remus_netbuf_teardown(libxl__egc *egc,
     netbuf_state->async_exec.opaque = remus_state;
     netbuf_state->async_exec.finish_cb = netbuf_teardown_script_cb;
     netbuf_state->async_exec.ao = ao;
-    remus_state->dev_id = 0;
+    netbuf_state->dev_id = 0;
     setup_env(&netbuf_state->async_exec, "teardown", remus_state);
 
     if (libxl_async_exec_script(gc, &netbuf_state->async_exec))
-- 
1.7.4.4


* [PATCH 6/7] remus: implement remus replicated checkpointing disk
  2014-04-03 12:22   ` [PATCH 1/7] introduce a new function libxl__remus_netbuf_setup_done() Lai Jiangshan
                       ` (3 preceding siblings ...)
  2014-04-03 12:22     ` [PATCH 5/7] netbuf: move dev_id from remus_state to netbuf_state Lai Jiangshan
@ 2014-04-03 12:22     ` Lai Jiangshan
  2014-04-03 16:41       ` Shriram Rajagopalan
  2014-04-03 12:22     ` [PATCH 7/7] drbd: implement " Lai Jiangshan
  2014-04-03 14:08     ` [PATCH 1/7] introduce a new function libxl__remus_netbuf_setup_done() Ian Jackson
  6 siblings, 1 reply; 89+ messages in thread
From: Lai Jiangshan @ 2014-04-03 12:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, Lai Jiangshan,
	Dong Eddie, Shriram Rajagopalan, FNST-Yang Hongyang,
	Roger Pau Monne

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
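The disk-type abstraction added by this patch can be illustrated with a
minimal, self-contained sketch. It assumes a simplified model: every name
prefixed with demo_ is hypothetical and is not part of libxl. It shows two
ideas taken from the patch: a per-type table of callbacks, and a per-disk
object whose private data is allocated directly after the generic part
(as alloc_remus_disk() does with "opaque = &new_disk[1]").

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for libxl__remus_disk_type and
 * libxl__remus_disk. These are NOT the libxl definitions. */
struct demo_disk;

struct demo_disk_type {
    int (*postsuspend)(struct demo_disk *d);
    size_t size;                 /* size of the per-type private data */
};

struct demo_disk {
    const struct demo_disk_type *type;
    void *opaque;                /* points just past this struct */
};

/* Private data of one hypothetical disk type. */
struct demo_private {
    int checkpoints;             /* number of successful postsuspend calls */
    int fail;                    /* when non-zero, postsuspend reports an error */
};

static int demo_postsuspend(struct demo_disk *d)
{
    struct demo_private *p = d->opaque;

    if (p->fail)
        return -1;
    p->checkpoints++;
    return 0;
}

static const struct demo_disk_type demo_type = {
    .postsuspend = demo_postsuspend,
    .size = sizeof(struct demo_private),
};

/* One zeroed allocation holds the generic part followed by the private
 * part, so a single free() releases both. */
static struct demo_disk *demo_alloc(const struct demo_disk_type *type)
{
    struct demo_disk *d;

    assert(type != NULL);
    d = calloc(1, sizeof(*d) + type->size);
    if (!d)
        return NULL;
    d->type = type;
    d->opaque = &d[1];
    return d;
}

/* Run the hook on each disk in order and stop at the first failure,
 * like the loops in libxl__remus_disk_postsuspend() and friends. */
static int demo_postsuspend_all(struct demo_disk **disks, size_t n)
{
    int rc = 0;
    size_t i;

    for (i = 0; rc == 0 && i < n; i++)
        rc = disks[i]->type->postsuspend(disks[i]);
    return rc;
}
```

In this sketch a failure on one disk stops the loop, so later disks are
not checkpointed for that epoch; the caller is expected to treat the
non-zero return value as a failed suspend.
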
 tools/libxl/Makefile           |    1 +
 tools/libxl/libxl_dom.c        |   19 +++-
 tools/libxl/libxl_internal.h   |   16 +++
 tools/libxl/libxl_remus.c      |   18 +++
 tools/libxl/libxl_remus_disk.c |  285 ++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_remus_disk.h |   74 +++++++++++
 6 files changed, 411 insertions(+), 2 deletions(-)
 create mode 100644 tools/libxl/libxl_remus_disk.c
 create mode 100644 tools/libxl/libxl_remus_disk.h

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 670a2bc..b040a79 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -53,6 +53,7 @@ LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
 LIBXL_OBJS-y += libxl_remus.o
+LIBXL_OBJS-y += libxl_remus_disk.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 1272bf6..ebaece2 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1437,9 +1437,14 @@ static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
 
     STATE_AO_GC(dss->ao);
 
-    /* REMUS TODO: Issue disk checkpoint reqs. */
     if (!remus_state->netbuf_state || !ok) goto out;
 
+    /* Issue disk checkpoint reqs. */
+    if (libxl__remus_disk_postsuspend(remus_state)) {
+        ok = 0;
+        goto out;
+    }
+
     /* The domain was suspended successfully. Start a new network
      * buffer for the next epoch. If this operation fails, then act
      * as though domain suspend failed -- libxc exits its infinite
@@ -1463,7 +1468,10 @@ static int libxl__remus_domain_resume_callback(void *data)
     if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
         return 0;
 
-    /* REMUS TODO: Deal with disk. */
+    /* Deal with disk. */
+    if (libxl__remus_disk_preresume(dss->remus_state))
+        return 0;
+
     return 1;
 }
 
@@ -1506,6 +1514,13 @@ static void remus_checkpoint_dm_saved(libxl__egc *egc,
         goto out;
     }
 
+    rc = libxl__remus_disk_commit(remus_state);
+    if (rc) {
+        LOG(ERROR, "Failed to commit disk state."
+            " Terminating Remus..");
+        goto out;
+    }
+
     if (remus_state->netbuf_state) {
         rc = libxl__remus_netbuf_release_prev_epoch(gc, dss->domid,
                                                     remus_state);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 4e07969..7c30d9a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2446,6 +2446,9 @@ typedef struct libxl__remus_state {
     void *netbuf_state;
     /* used for checkpoint */
     libxl__ev_time timeout;
+
+    /* Opaque context containing disk related stuff */
+    void *disk_state;
 } libxl__remus_state;
 
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
@@ -2474,6 +2477,19 @@ _hidden int
 libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t domid,
                                        libxl__remus_state *remus_state);
 
+_hidden int libxl__remus_disk_postsuspend(libxl__remus_state *state);
+_hidden int libxl__remus_disk_preresume(libxl__remus_state *state);
+_hidden int libxl__remus_disk_commit(libxl__remus_state *state);
+_hidden void libxl__remus_disk_setup(libxl__egc *egc,
+                                     libxl__domain_suspend_state *dss);
+_hidden void libxl__remus_disk_setup_done(libxl__egc *egc,
+                                          libxl__domain_suspend_state *dss,
+                                          int rc);
+_hidden void libxl__remus_disk_teardown(libxl__egc *egc,
+                                        libxl__domain_suspend_state *dss);
+_hidden void libxl__remus_disk_teardown_done(libxl__egc *egc,
+                                             libxl__domain_suspend_state *dss);
+
 _hidden void libxl__remus_setup_initiate(libxl__egc *egc,
                                          libxl__domain_suspend_state *dss);
 
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index 78652d2..9eab98d 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -47,6 +47,18 @@ void libxl__remus_netbuf_setup_done(libxl__egc *egc,
                                     libxl__domain_suspend_state *dss,
                                     int rc)
 {
+    if (rc) {
+        libxl__remus_setup_done(egc, dss, rc);
+        return;
+    }
+
+    libxl__remus_disk_setup(egc, dss);
+}
+
+void libxl__remus_disk_setup_done(libxl__egc *egc,
+                                   libxl__domain_suspend_state *dss,
+                                   int rc)
+{
     libxl__remus_setup_done(egc, dss, rc);
 }
 
@@ -72,5 +84,11 @@ static void libxl__remus_teardown_done(libxl__egc *egc,
 void libxl__remus_netbuf_teardown_done(libxl__egc *egc,
                                        libxl__domain_suspend_state *dss)
 {
+    libxl__remus_disk_teardown(egc, dss);
+}
+
+void libxl__remus_disk_teardown_done(libxl__egc *egc,
+                                     libxl__domain_suspend_state *dss)
+{
     libxl__remus_teardown_done(egc, dss);
 }
diff --git a/tools/libxl/libxl_remus_disk.c b/tools/libxl/libxl_remus_disk.c
new file mode 100644
index 0000000..ca3e879
--- /dev/null
+++ b/tools/libxl/libxl_remus_disk.c
@@ -0,0 +1,285 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author Lai Jiangshan <laijs@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+#include "libxl_remus_disk.h"
+
+typedef struct libxl__remus_disk_state {
+    libxl__remus_state *remus_state;
+    libxl_device_disk *disks;
+    struct libxl__remus_disk **remus_disks;
+    uint32_t num_disks;
+    uint32_t curr_disk_id;
+    uint32_t curr_disktype_id;
+
+    libxl_async_exec async_exec;
+} libxl__remus_disk_state;
+
+/*** checkpoint disks states and callbacks ***/
+static const libxl__remus_disk_type *remus_disk_types[] =
+{
+};
+
+int libxl__remus_disk_postsuspend(libxl__remus_state *remus_state)
+{
+    int i;
+    int rc = 0;
+    libxl__remus_disk *remus_disk;
+
+    /* Convenience aliases */
+    libxl__remus_disk_state *disk_state = remus_state->disk_state;
+
+    for (i = 0; rc == 0 && i < disk_state->num_disks; i++) {
+        remus_disk = disk_state->remus_disks[i];
+        rc = remus_disk->type->postsuspend(remus_disk);
+    }
+
+    return rc;
+}
+
+int libxl__remus_disk_preresume(libxl__remus_state *remus_state)
+{
+    int i;
+    int rc = 0;
+    libxl__remus_disk *remus_disk;
+
+    /* Convenience aliases */
+    libxl__remus_disk_state *disk_state = remus_state->disk_state;
+
+    for (i = 0; rc == 0 && i < disk_state->num_disks; i++) {
+        remus_disk = disk_state->remus_disks[i];
+        rc = remus_disk->type->preresume(remus_disk);
+    }
+
+    return rc;
+}
+
+int libxl__remus_disk_commit(libxl__remus_state *remus_state)
+{
+    int i;
+    int rc = 0;
+    libxl__remus_disk *remus_disk;
+
+    /* Convenience aliases */
+    libxl__remus_disk_state *disk_state = remus_state->disk_state;
+
+    for (i = 0; rc == 0 && i < disk_state->num_disks; i++) {
+        remus_disk = disk_state->remus_disks[i];
+        rc = remus_disk->type->commit(remus_disk);
+    }
+
+    return rc;
+}
+
+static void alloc_remus_disk(libxl__domain_suspend_state *dss,
+                             libxl__remus_disk **remus_disk,
+                             const libxl_device_disk *disk,
+                             const libxl__remus_disk_type *disk_type)
+{
+    STATE_AO_GC(dss->ao);
+    libxl__remus_disk *new_disk;
+
+    new_disk = libxl__zalloc(gc, sizeof(libxl__remus_disk) + disk_type->size);
+    new_disk->type = disk_type;
+    new_disk->disk = disk;
+    new_disk->disk_state = dss->remus_state->disk_state;
+    new_disk->opaque = &new_disk[1];
+
+    *remus_disk = new_disk;
+}
+
+static void setup_all_disks(libxl__remus_disk_state *disk_state)
+{
+    int i, rc = 0;
+    libxl__remus_disk *remus_disk;
+
+    /* Convenience aliases */
+    libxl__egc *egc = disk_state->remus_state->egc;
+    libxl__domain_suspend_state *dss = disk_state->remus_state->dss;
+
+    for (i = 0; i < disk_state->num_disks; i++) {
+        remus_disk = disk_state->remus_disks[i];
+        rc = remus_disk->type->setup(remus_disk);
+        if (rc) {
+            libxl__remus_disk_setup_done(egc, dss, rc);
+            return;
+        }
+    }
+
+    libxl__remus_disk_setup_done(egc, dss, 0);
+}
+
+static int disk_match_once(libxl__remus_disk_state *disk_state)
+{
+    int i, j, rc = 0;
+    const libxl__remus_disk_type *disk_type;
+    const libxl_device_disk *disk;
+
+    /* Convenience aliases */
+    libxl__egc *egc = disk_state->remus_state->egc;
+    libxl__domain_suspend_state *dss = disk_state->remus_state->dss;
+
+    i = disk_state->curr_disk_id;
+    j = disk_state->curr_disktype_id;
+    for (; i < disk_state->num_disks; i++) {
+        disk = &disk_state->disks[i];
+        for (; j < ARRAY_SIZE(remus_disk_types); j++) {
+            disk_type = remus_disk_types[j];
+
+            rc = disk_type->match(dss, disk, &disk_state->async_exec,
+                                  disk_state);
+            if (rc)
+                goto out;
+
+            if (rc == 0) {
+                alloc_remus_disk(dss, &disk_state->remus_disks[i],
+                                 disk, disk_type);
+                break;
+            }
+        }
+        j = 0;
+    }
+
+out:
+    if (rc < 0)
+        libxl__remus_disk_setup_done(egc, dss, rc);
+    else if (i < disk_state->num_disks) {
+        disk_state->curr_disk_id = i;
+        disk_state->curr_disktype_id = j;
+    }
+
+    return rc;
+}
+
+void disk_match_script_cb(void *opaque, int status)
+{
+    libxl__remus_disk_state *disk_state = opaque;
+    int rc;
+
+    /* Convenience aliases */
+    int curr_disktype_id = disk_state->curr_disktype_id;
+    int curr_disk_id = disk_state->curr_disk_id;
+    libxl_device_disk *disk = &disk_state->disks[disk_state->curr_disk_id];
+    const libxl__remus_disk_type *disk_type = remus_disk_types[curr_disktype_id];
+    libxl__egc *egc = disk_state->remus_state->egc;
+    libxl__domain_suspend_state *dss = disk_state->remus_state->dss;
+
+    if (!status) {
+        alloc_remus_disk(dss, &disk_state->remus_disks[curr_disk_id],
+                         disk, disk_type);
+        disk_state->curr_disk_id++;
+        disk_state->curr_disktype_id = 0;
+    } else {
+        disk_state->curr_disktype_id++;
+        if (disk_state->curr_disktype_id >= ARRAY_SIZE(remus_disk_types)) {
+            /* no disktype matches with this disk */
+            libxl__remus_disk_setup_done(egc, dss, ERROR_FAIL);
+            return;
+        }
+    }
+
+    rc = disk_match_once(disk_state);
+    if (rc)
+        return;
+
+    setup_all_disks(disk_state);
+}
+
+void libxl__remus_disk_setup(libxl__egc *egc, libxl__domain_suspend_state *dss)
+{
+    int rc, num_disks;
+    libxl__remus_disk_state *disk_state = NULL;
+
+    STATE_AO_GC(dss->ao);
+
+    GCNEW(disk_state);
+
+    dss->remus_state->disk_state = disk_state;
+    libxl__ev_child_init(&disk_state->async_exec.child);
+    disk_state->async_exec.ao = dss->ao;
+    disk_state->remus_state = dss->remus_state;
+
+    disk_state->disks = libxl_device_disk_list(CTX, dss->domid, &num_disks);
+    disk_state->num_disks = num_disks;
+    GCNEW_ARRAY(disk_state->remus_disks, num_disks);
+
+    disk_state->curr_disk_id = 0;
+    disk_state->curr_disktype_id = 0;
+    rc = disk_match_once(disk_state);
+    if (rc)
+        return;
+
+    setup_all_disks(disk_state);
+}
+
+static void disk_teardown_once(libxl__remus_disk_state *disk_state)
+{
+    int i, rc = 0;
+    const libxl__remus_disk_type *disk_type;
+
+    /* Convenience aliases */
+    libxl__egc *egc = disk_state->remus_state->egc;
+    libxl__domain_suspend_state *dss = disk_state->remus_state->dss;
+
+    i = disk_state->curr_disk_id;
+    for (; i < disk_state->num_disks; i++) {
+        if (!disk_state->remus_disks[i])
+            continue;
+
+        disk_type = disk_state->remus_disks[i]->type;
+
+        rc = disk_type->teardown(disk_state->remus_disks[i],
+                                 &disk_state->async_exec);
+        if (rc)
+            break;
+
+        disk_state->remus_disks[i] = NULL;
+    }
+
+    if (rc < 0 || i == disk_state->num_disks)
+        libxl__remus_disk_teardown_done(egc, dss);
+
+    disk_state->curr_disk_id = i;
+}
+
+void disk_teardown_script_cb(void *opaque)
+{
+    libxl__remus_disk_state *disk_state = opaque;
+
+    disk_state->remus_disks[disk_state->curr_disk_id] = NULL;
+    disk_state->curr_disk_id++;
+    disk_teardown_once(disk_state);
+}
+
+void libxl__remus_disk_teardown(libxl__egc *egc,
+                                libxl__domain_suspend_state *dss)
+{
+    /* Convenience aliases */
+    libxl__remus_disk_state *disk_state = dss->remus_state->disk_state;
+
+    if (!disk_state) {
+        libxl__remus_disk_teardown_done(egc, dss);
+        return;
+    }
+
+    libxl__ev_child_init(&disk_state->async_exec.child);
+    disk_state->async_exec.ao = dss->ao;
+
+    disk_state->curr_disk_id = 0;
+    disk_teardown_once(disk_state);
+}
diff --git a/tools/libxl/libxl_remus_disk.h b/tools/libxl/libxl_remus_disk.h
new file mode 100644
index 0000000..33b0e59
--- /dev/null
+++ b/tools/libxl/libxl_remus_disk.h
@@ -0,0 +1,74 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author Lai Jiangshan <laijs@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#ifndef LIBXL_REMUS_DISK_H
+#define LIBXL_REMUS_DISK_H
+
+typedef struct libxl__remus_disk {
+    const struct libxl_device_disk *disk;
+    const struct libxl__remus_disk_type *type;
+    void *opaque;
+    void *disk_state;
+
+    /*
+     * Filled in by the asynchronous function so that the script
+     * name can be reported when the script fails.
+     */
+    const char *disk_script;
+} libxl__remus_disk;
+
+typedef struct libxl__remus_disk_type {
+    /* checkpointing */
+    int (*postsuspend)(libxl__remus_disk *remus_disk);
+    int (*preresume)(libxl__remus_disk *remus_disk);
+    int (*commit)(libxl__remus_disk *remus_disk);
+
+    /*
+     * Return value:
+     *   1: the disk is not this type or the script is still running
+     *   0: the disk is this type
+     *  -1: error
+     */
+    int (*match)(libxl__domain_suspend_state *dss,
+                 const libxl_device_disk *disk,
+                 libxl_async_exec *async_exec,
+                 void *disk_state);
+
+    /*
+     * This is a synchronous callback. Return value:
+     *  0: setup is done
+     * -1: error
+     *
+     */
+    int (*setup)(libxl__remus_disk *remus_disk);
+
+    /*
+     * Return value:
+     *   1: the script is still running
+     *   0: the script is done
+     *  -1: error
+     */
+    int (*teardown)(libxl__remus_disk *remus_disk,
+                    libxl_async_exec *async_exec);
+
+    /* the size of the private data */
+    int size;
+} libxl__remus_disk_type;
+
+/* used for asynchronous API */
+extern void disk_match_script_cb(void *disk_state, int status);
+extern void disk_teardown_script_cb(void *disk_state);
+
+#endif
-- 
1.7.4.4


* [PATCH 7/7] drbd: implement replicated checkpointing disk
  2014-04-03 12:22   ` [PATCH 1/7] introduce a new function libxl__remus_netbuf_setup_done() Lai Jiangshan
                       ` (4 preceding siblings ...)
  2014-04-03 12:22     ` [PATCH 6/7] remus: implement remus replicated checkpointing disk Lai Jiangshan
@ 2014-04-03 12:22     ` Lai Jiangshan
  2014-04-03 16:07       ` Shriram Rajagopalan
  2014-04-03 14:08     ` [PATCH 1/7] introduce a new function libxl__remus_netbuf_setup_done() Ian Jackson
  6 siblings, 1 reply; 89+ messages in thread
From: Lai Jiangshan @ 2014-04-03 12:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, Lai Jiangshan,
	Dong Eddie, Shriram Rajagopalan, FNST-Yang Hongyang,
	Roger Pau Monne

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
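With this patch, matching a disk against a type becomes asynchronous:
drbd_match() starts block-drbd-probe and returns 1, and the exit status
arrives later in disk_match_script_cb(). The sketch below is a simplified
model of how such a callback can advance a (disk, type) cursor. It is an
illustration under assumptions, not the libxl code: all demo_ names, the
fixed number of types and the fixed-size result array are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NUM_TYPES 2   /* hypothetical number of registered disk types */
#define DEMO_MAX_DISKS 8   /* hypothetical upper bound, for this sketch only */

/* Hypothetical cursor, standing in for curr_disk_id/curr_disktype_id. */
struct demo_match {
    size_t num_disks;                  /* must be <= DEMO_MAX_DISKS */
    size_t cur_disk;
    size_t cur_type;
    int matched_type[DEMO_MAX_DISKS];  /* type index chosen for each disk */
};

enum demo_result { DEMO_PENDING, DEMO_DONE, DEMO_FAILED };

/* Called when the probe for (cur_disk, cur_type) has exited with `status`.
 * As with block-drbd-probe, exit status 0 means "the disk is this type". */
static enum demo_result demo_probe_finished(struct demo_match *m, int status)
{
    assert(m->cur_disk < m->num_disks);

    if (status == 0) {
        /* Record the match and move on to the next disk, first type. */
        m->matched_type[m->cur_disk] = (int)m->cur_type;
        m->cur_disk++;
        m->cur_type = 0;
    } else {
        /* Same disk, next type; give up once every type has been tried. */
        m->cur_type++;
        if (m->cur_type >= DEMO_NUM_TYPES)
            return DEMO_FAILED;
    }

    /* DEMO_PENDING: the caller would now start the next probe. */
    return m->cur_disk < m->num_disks ? DEMO_PENDING : DEMO_DONE;
}
```

The point of keeping the cursor in a state structure is that no loop
variable survives across the asynchronous script run, so the callback
must be able to resume from the stored position alone.
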
 tools/hotplug/Linux/Makefile         |    1 +
 tools/hotplug/Linux/block-drbd-probe |   56 ++++++++++++++
 tools/libxl/Makefile                 |    2 +-
 tools/libxl/libxl_remus_disk.c       |    1 +
 tools/libxl/libxl_remus_disk.h       |    2 +
 tools/libxl/libxl_remus_disk_drbd.c  |  132 ++++++++++++++++++++++++++++++++++
 6 files changed, 193 insertions(+), 1 deletions(-)
 create mode 100755 tools/hotplug/Linux/block-drbd-probe
 create mode 100644 tools/libxl/libxl_remus_disk_drbd.c

diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index 6139c1f..b830a8e 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -24,6 +24,7 @@ XEN_SCRIPTS += xen-hotplug-cleanup
 XEN_SCRIPTS += external-device-migrate
 XEN_SCRIPTS += vscsi
 XEN_SCRIPTS += block-iscsi
+XEN_SCRIPTS += block-drbd-probe
 XEN_SCRIPTS += $(XEN_SCRIPTS-y)
 
 XEN_SCRIPT_DATA = xen-script-common.sh locking.sh logging.sh
diff --git a/tools/hotplug/Linux/block-drbd-probe b/tools/hotplug/Linux/block-drbd-probe
new file mode 100755
index 0000000..432f051
--- /dev/null
+++ b/tools/hotplug/Linux/block-drbd-probe
@@ -0,0 +1,56 @@
+#! /bin/bash
+#
+# Copyright (C) 2014 FUJITSU LIMITED
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of version 2.1 of the GNU Lesser General Public
+# License as published by the Free Software Foundation.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+#
+# Usage:
+#     block-drbd-probe devicename
+#
+# Return value:
+#     0: the device is a DRBD device
+#     1: the device is not a DRBD device
+
+function get_res_name()
+{
+    local drbd_dev=$1
+    local drbd_dev_list=($(drbdadm sh-dev all))
+    local drbd_res_list=($(drbdadm sh-resource all))
+    local temp_drbd_dev temp_drbd_res
+    local found=0
+
+    for temp_drbd_dev in ${drbd_dev_list[@]}; do
+        if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
+            found=1
+            break
+        fi
+    done
+
+    if [[ $found -eq 0 ]]; then
+        return 1
+    fi
+
+    for temp_drbd_res in ${drbd_res_list[@]}; do
+        temp_drbd_dev=$(drbdadm sh-dev $temp_drbd_res)
+        if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
+            return 0
+        fi
+    done
+
+    # OOPS
+    return 2
+}
+
+get_res_name $1
+exit $?
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index b040a79..658e1b1 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -53,7 +53,7 @@ LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
 LIBXL_OBJS-y += libxl_remus.o
-LIBXL_OBJS-y += libxl_remus_disk.o
+LIBXL_OBJS-y += libxl_remus_disk.o libxl_remus_disk_drbd.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl_remus_disk.c b/tools/libxl/libxl_remus_disk.c
index ca3e879..3af43e6 100644
--- a/tools/libxl/libxl_remus_disk.c
+++ b/tools/libxl/libxl_remus_disk.c
@@ -33,6 +33,7 @@ typedef struct libxl__remus_disk_state {
 /*** checkpoint disks states and callbacks ***/
 static const libxl__remus_disk_type *remus_disk_types[] =
 {
+    &drbd_disk_type,
 };
 
 int libxl__remus_disk_postsuspend(libxl__remus_state *remus_state)
diff --git a/tools/libxl/libxl_remus_disk.h b/tools/libxl/libxl_remus_disk.h
index 33b0e59..dfd2432 100644
--- a/tools/libxl/libxl_remus_disk.h
+++ b/tools/libxl/libxl_remus_disk.h
@@ -71,4 +71,6 @@ typedef struct libxl__remus_disk_type {
 extern void disk_match_script_cb(void *disk_state, int status);
 extern void disk_teardown_script_cb(void *disk_state);
 
+extern const libxl__remus_disk_type drbd_disk_type;
+
 #endif
diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
new file mode 100644
index 0000000..719e950
--- /dev/null
+++ b/tools/libxl/libxl_remus_disk_drbd.c
@@ -0,0 +1,132 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author Lai Jiangshan <laijs@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+#include "libxl_remus_disk.h"
+
+/*** drbd implementation ***/
+const int DRBD_SEND_CHECKPOINT = 20;
+const int DRBD_WAIT_CHECKPOINT_ACK = 30;
+
+char *drbd_probe_script;
+
+typedef struct libxl__remus_drbd_disk
+{
+    int ctl_fd;
+    int ackwait;
+    const char *path;
+} libxl__remus_drbd_disk;
+
+static int drbd_postsuspend(libxl__remus_disk *remus_disk)
+{
+    struct libxl__remus_drbd_disk *drbd = remus_disk->opaque;
+
+    if (!drbd->ackwait) {
+        if (ioctl(drbd->ctl_fd, DRBD_SEND_CHECKPOINT, 0) <= 0)
+            drbd->ackwait = 1;
+    }
+
+    return 0;
+}
+
+static int drbd_preresume(libxl__remus_disk *remus_disk)
+{
+    struct libxl__remus_drbd_disk *drbd = remus_disk->opaque;
+
+    if (drbd->ackwait) {
+        ioctl(drbd->ctl_fd, DRBD_WAIT_CHECKPOINT_ACK, 0);
+        drbd->ackwait = 0;
+    }
+
+    return 0;
+}
+
+static int drbd_commit(libxl__remus_disk *remus_disk)
+{
+    /* Nothing to do: all the work is done by DRBD's protocol D. */
+    return 0;
+}
+
+static int drbd_match(libxl__domain_suspend_state *dss,
+                      const libxl_device_disk *disk,
+                      libxl_async_exec *async_exec,
+                      void *disk_state)
+{
+    int arraysize, nr = 0;
+    STATE_AO_GC(dss->ao);
+
+    if (!drbd_probe_script)
+        drbd_probe_script = GCSPRINTF("%s/block-drbd-probe",
+                                      libxl__xen_script_dir_path());
+
+    /* setup env & args */
+    arraysize = 1;
+    GCNEW_ARRAY(async_exec->env, arraysize);
+    async_exec->env[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    arraysize = 3;
+    nr = 0;
+    GCNEW_ARRAY(async_exec->args, arraysize);
+    async_exec->args[nr++] = drbd_probe_script;
+    async_exec->args[nr++] = disk->pdev_path;
+    async_exec->args[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    async_exec->finish_cb = disk_match_script_cb;
+    async_exec->opaque = disk_state;
+    async_exec->allow_fail = true;
+    async_exec->timeout = LIBXL_HOTPLUG_TIMEOUT;
+
+    if (libxl_async_exec_script(gc, async_exec))
+        return ERROR_FAIL;
+
+    return 1;
+}
+
+static int drbd_setup(libxl__remus_disk *remus_disk)
+{
+    libxl__remus_drbd_disk *drbd = remus_disk->opaque;
+
+    drbd->path = remus_disk->disk->pdev_path;
+    drbd->ctl_fd = open(drbd->path, O_RDONLY);
+    drbd->ackwait = 0;
+
+    if (drbd->ctl_fd < 0)
+        return ERROR_INVAL;
+
+    return 0;
+}
+
+static int drbd_teardown(libxl__remus_disk *remus_disk, libxl_async_exec *async_exec)
+{
+    struct libxl__remus_drbd_disk *drbd = remus_disk->opaque;
+
+    close(drbd->ctl_fd);
+    return 0;
+}
+
+const libxl__remus_disk_type drbd_disk_type = {
+    .postsuspend = drbd_postsuspend,
+    .preresume = drbd_preresume,
+    .commit = drbd_commit,
+    .match = drbd_match,
+    .setup = drbd_setup,
+    .teardown = drbd_teardown,
+    .size = sizeof(libxl__remus_drbd_disk),
+};
-- 
1.7.4.4


* Re: [PATCH 05/10 V7] remus: Remus network buffering core and APIs to setup/teardown [and 1 more messages]
  2014-03-03 17:44       ` Ian Jackson
@ 2014-04-03 14:06         ` Ian Jackson
  0 siblings, 0 replies; 89+ messages in thread
From: Ian Jackson @ 2014-04-03 14:06 UTC (permalink / raw)
  To: Yang Hongyang, Lai Jiangshan, xen-devel, Shriram Rajagopalan,
	Andrew Cooper, Roger Pau Monne, Ian Campbell, Stefano Stabellini,
	Dong Eddie, Jiang Yunhong, FNST-Wen Congyang

Yang Hongyang writes ("[PATCH V8 3/8] remus: Remus network buffering core and APIs to setup/teardown"):
> 1.Add two members in libxl_domain_remus_info:
>     netbuf: whether netbuf is enabled
>     netbufscript: the path of the script which will be run to setup
>        and tear down the guest's interface.
> 2.introduce a new structure libxl__remus_state to save the remus state
> 3.introduces remus-netbuf-setup hotplug script responsible for
>   setting up and tearing down the necessary infrastructure required for
>   network output buffering in Remus.  This script is intended to be invoked
>   by libxl for each guest interface, when starting or stopping Remus.

Thanks for your submission.

However, the last time this was posted, I commented as follows:

  > This function [netbuf_setup_timeout_cb] bears a striking
  > resemblance to device_hotplug_timeout_cb.  Likewise parts of
  > exec_netbuf_script look very much like parts of device_hotplug,
  > etc.
  > 
  > You should arrange to reuse code rather than clone-and-hacking it,
  > refactoring if necessary.  If refactoring is necessary, that should be
  > brought out into a pre-patch with no functional change.

It looks like several of my other comments haven't been taken into
account, either.

Thanks,
Ian.


* Re: [PATCH 1/7] introduce a new function libxl__remus_netbuf_setup_done()
  2014-04-03 12:22   ` [PATCH 1/7] introduce a new function libxl__remus_netbuf_setup_done() Lai Jiangshan
                       ` (5 preceding siblings ...)
  2014-04-03 12:22     ` [PATCH 7/7] drbd: implement " Lai Jiangshan
@ 2014-04-03 14:08     ` Ian Jackson
  2014-04-04  8:53       ` Hongyang Yang
  6 siblings, 1 reply; 89+ messages in thread
From: Ian Jackson @ 2014-04-03 14:08 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel,
	Shriram Rajagopalan, FNST-Yang Hongyang, Roger Pau Monne

Lai Jiangshan writes ("[PATCH 1/7] introduce a new function libxl__remus_netbuf_setup_done()"):
> We will exec some scripts to setup netbuf and disk, so we need two
> asynchronous functions that are called when the setup is done.

Is there a 0/7 for this series somewhere ?  I don't seem to have a
copy of it.

Is this series complementary to Yang Hongyang's or does it want to go
on top of it, or what ?

Thanks,
Ian.


* Re: [PATCH 7/7] drbd: implement replicated checkpointing disk
  2014-04-03 12:22     ` [PATCH 7/7] drbd: implement " Lai Jiangshan
@ 2014-04-03 16:07       ` Shriram Rajagopalan
  0 siblings, 0 replies; 89+ messages in thread
From: Shriram Rajagopalan @ 2014-04-03 16:07 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson,
	<xen-devel@lists.xen.org>,
	Dong Eddie, FNST-Yang Hongyang, Roger Pau Monne


> On Apr 3, 2014, at 5:22 AM, Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> 
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
> tools/hotplug/Linux/Makefile         |    1 +
> tools/hotplug/Linux/block-drbd-probe |   56 ++++++++++++++
> tools/libxl/Makefile                 |    2 +-
> tools/libxl/libxl_remus_disk.c       |    1 +
> tools/libxl/libxl_remus_disk.h       |    2 +
> tools/libxl/libxl_remus_disk_drbd.c  |  132 ++++++++++++++++++++++++++++++++++
> 6 files changed, 193 insertions(+), 1 deletions(-)
> create mode 100755 tools/hotplug/Linux/block-drbd-probe
> create mode 100644 tools/libxl/libxl_remus_disk_drbd.c
> 
> diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
> index 6139c1f..b830a8e 100644
> --- a/tools/hotplug/Linux/Makefile
> +++ b/tools/hotplug/Linux/Makefile
> @@ -24,6 +24,7 @@ XEN_SCRIPTS += xen-hotplug-cleanup
> XEN_SCRIPTS += external-device-migrate
> XEN_SCRIPTS += vscsi
> XEN_SCRIPTS += block-iscsi
> +XEN_SCRIPTS += block-drbd-probe
> XEN_SCRIPTS += $(XEN_SCRIPTS-y)
> 
> XEN_SCRIPT_DATA = xen-script-common.sh locking.sh logging.sh
> diff --git a/tools/hotplug/Linux/block-drbd-probe b/tools/hotplug/Linux/block-drbd-probe
> new file mode 100755
> index 0000000..432f051
> --- /dev/null
> +++ b/tools/hotplug/Linux/block-drbd-probe
> @@ -0,0 +1,56 @@
> +#! /bin/bash
> +#
> +# Copyright (C) 2014 FUJITSU LIMITED
> +#
> +# This library is free software; you can redistribute it and/or
> +# modify it under the terms of version 2.1 of the GNU Lesser General Public
> +# License as published by the Free Software Foundation.
> +#
> +# This library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with this library; if not, write to the Free Software
> +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> +#
> +# Usage:
> +#     block-drbd-probe devicename
> +#
> +# Return value:
> +#     0: the device is drbd device
> +#     1: the device is not drbd device
> +

I was under the impression xl had a separate type for drbd disks.
Anyway, since the disk type checking is done in an external script, 
I suggest you also check for the protocol type (proto D) and drbd version (8.3.9 or 8.3.11 iirc).
Otherwise, there is no point initiating drbd replication with Remus.
The old python code has this check, in case you need a reference implementation. 
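
A minimal sketch of the version half of that check, written in C for illustration although the same test could live in the probe script. It assumes the first line of /proc/drbd has the form "version: 8.3.11 (api:88/proto:86-96)"; that format, both function names, and the accepted version list (taken from the "iirc" above, so unverified) are assumptions rather than part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse a line of the assumed form "version: 8.3.11 (api:88/proto:86-96)".
 * Returns 0 and fills major/minor/patch on success, -1 otherwise.
 */
static int parse_drbd_version(const char *line,
                              int *major, int *minor, int *patch)
{
    assert(line && major && minor && patch);

    if (sscanf(line, "version: %d.%d.%d", major, minor, patch) != 3)
        return -1;

    return 0;
}

/*
 * Hypothetical policy check: accept only the versions mentioned in the
 * review (8.3.9 or 8.3.11).  The exact list is unverified.
 */
static int drbd_version_is_candidate(int major, int minor, int patch)
{
    return major == 8 && minor == 3 && (patch == 9 || patch == 11);
}
```

The protocol check (proto D) is deliberately left out, since how to query it depends on the DRBD tooling in use.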

> +function get_res_name()
> +{
> +    local drbd_dev=$1
> +    local drbd_dev_list=($(drbdadm sh-dev all))
> +    local drbd_res_list=($(drbdadm sh-resource all))
> +    local temp_drbd_dev temp_drbd_res
> +    local found=0
> +
> +    for temp_drbd_dev in ${drbd_dev_list[@]}; do
> +        if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
> +            found=1
> +            break
> +        fi
> +    done
> +
> +    if [[ $found -eq 0 ]]; then
> +        return 1
> +    fi
> +
> +    for temp_drbd_res in ${drbd_res_list[@]}; do
> +        temp_drbd_dev=$(drbdadm sh-dev $temp_drbd_res)
> +        if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
> +            return 0
> +        fi
> +    done
> +
> +    # OOPS
> +    return 2
> +}
> +
> +get_res_name $1
> +exit $?
> diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
> index b040a79..658e1b1 100644
> --- a/tools/libxl/Makefile
> +++ b/tools/libxl/Makefile
> @@ -53,7 +53,7 @@ LIBXL_OBJS-y += libxl_nonetbuffer.o
> endif
> 
> LIBXL_OBJS-y += libxl_remus.o
> -LIBXL_OBJS-y += libxl_remus_disk.o
> +LIBXL_OBJS-y += libxl_remus_disk.o libxl_remus_disk_drbd.o
> 
> LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
> LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
> diff --git a/tools/libxl/libxl_remus_disk.c b/tools/libxl/libxl_remus_disk.c
> index ca3e879..3af43e6 100644
> --- a/tools/libxl/libxl_remus_disk.c
> +++ b/tools/libxl/libxl_remus_disk.c
> @@ -33,6 +33,7 @@ typedef struct libxl__remus_disk_state {
> /*** checkpoint disks states and callbacks ***/
> static const libxl__remus_disk_type *remus_disk_types[] =
> {
> +    &drbd_disk_type,
> };
> 
> int libxl__remus_disk_postsuspend(libxl__remus_state *remus_state)
> diff --git a/tools/libxl/libxl_remus_disk.h b/tools/libxl/libxl_remus_disk.h
> index 33b0e59..dfd2432 100644
> --- a/tools/libxl/libxl_remus_disk.h
> +++ b/tools/libxl/libxl_remus_disk.h
> @@ -71,4 +71,6 @@ typedef struct libxl__remus_disk_type {
> extern void disk_match_script_cb(void *disk_state, int status);
> extern void disk_teardown_script_cb(void *disk_state);
> 
> +extern const libxl__remus_disk_type drbd_disk_type;
> +
> #endif
> diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
> new file mode 100644
> index 0000000..719e950
> --- /dev/null
> +++ b/tools/libxl/libxl_remus_disk_drbd.c
> @@ -0,0 +1,132 @@
> +/*
> + * Copyright (C) 2014 FUJITSU LIMITED
> + * Author Lai Jiangshan <laijs@cn.fujitsu.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU Lesser General Public License as published
> + * by the Free Software Foundation; version 2.1 only. with the special
> + * exception on linking described in file LICENSE.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU Lesser General Public License for more details.
> + */
> +
> +#include "libxl_osdeps.h" /* must come before any other headers */
> +
> +#include "libxl_internal.h"
> +
> +#include "libxl_remus_disk.h"
> +
> +/*** drbd implementation ***/
> +const int DRBD_SEND_CHECKPOINT = 20;
> +const int DRBD_WAIT_CHECKPOINT_ACK = 30;
> +
> +char *drbd_probe_script;
> +
> +typedef struct libxl__remus_drbd_disk
> +{
> +    int ctl_fd;
> +    int ackwait;
> +    const char *path;
> +} libxl__remus_drbd_disk;
> +
> +static int drbd_postsuspend(libxl__remus_disk *remus_disk)
> +{
> +    struct libxl__remus_drbd_disk *drbd = remus_disk->opaque;
> +
> +    if (!drbd->ackwait) {
> +        if (ioctl(drbd->ctl_fd, DRBD_SEND_CHECKPOINT, 0) <= 0)
> +            drbd->ackwait = 1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int drbd_preresume(libxl__remus_disk *remus_disk)
> +{
> +    struct libxl__remus_drbd_disk *drbd = remus_disk->opaque;
> +
> +    if (drbd->ackwait) {
> +        ioctl(drbd->ctl_fd, DRBD_WAIT_CHECKPOINT_ACK, 0);
> +        drbd->ackwait = 0;
> +    }
> +
> +    return 0;
> +}
> +
> +static int drbd_commit(libxl__remus_disk *remus_disk)
> +{
> +    /* Nothing to do: all the work is done by DRBD's protocol D. */
> +    return 0;
> +}
> +
> +static int drbd_match(libxl__domain_suspend_state *dss,
> +                      const libxl_device_disk *disk,
> +                      libxl_async_exec *async_exec,
> +                      void *disk_state)
> +{
> +    int arraysize, nr = 0;
> +    STATE_AO_GC(dss->ao);
> +
> +    if (!drbd_probe_script)
> +        drbd_probe_script = GCSPRINTF("%s/block-drbd-probe",
> +                                      libxl__xen_script_dir_path());
> +
> +    /* setup env & args */
> +    arraysize = 1;
> +    GCNEW_ARRAY(async_exec->env, arraysize);
> +    async_exec->env[nr++] = NULL;
> +    assert(nr <= arraysize);
> +
> +    arraysize = 3;
> +    nr = 0;
> +    GCNEW_ARRAY(async_exec->args, arraysize);
> +    async_exec->args[nr++] = drbd_probe_script;
> +    async_exec->args[nr++] = disk->pdev_path;
> +    async_exec->args[nr++] = NULL;
> +    assert(nr <= arraysize);
> +
> +    async_exec->finish_cb = disk_match_script_cb;
> +    async_exec->opaque = disk_state;
> +    async_exec->allow_fail = true;
> +    async_exec->timeout = LIBXL_HOTPLUG_TIMEOUT;
> +
> +    if (libxl_async_exec_script(gc, async_exec))
> +        return ERROR_FAIL;
> +
> +    return 1;
> +}
> +
> +static int drbd_setup(libxl__remus_disk *remus_disk)
> +{
> +    libxl__remus_drbd_disk *drbd = remus_disk->opaque;
> +
> +    drbd->path = remus_disk->disk->pdev_path;
> +    drbd->ctl_fd = open(drbd->path, O_RDONLY);
> +    drbd->ackwait = 0;
> +
> +    if (drbd->ctl_fd < 0)
> +        return ERROR_INVAL;
> +
> +    return 0;
> +}
> +
> +static int drbd_teardown(libxl__remus_disk *remus_disk, libxl_async_exec *async_exec)
> +{
> +    struct libxl__remus_drbd_disk *drbd = remus_disk->opaque;
> +
> +    close(drbd->ctl_fd);
> +    return 0;
> +}
> +
> +const libxl__remus_disk_type drbd_disk_type = {
> +    .postsuspend = drbd_postsuspend,
> +    .preresume = drbd_preresume,
> +    .commit = drbd_commit,
> +    .match = drbd_match,
> +    .setup = drbd_setup,
> +    .teardown = drbd_teardown,
> +    .size = sizeof(libxl__remus_drbd_disk),
> +};
> -- 
> 1.7.4.4
> 


* Re: [PATCH 6/7] remus: implement remus replicated checkpointing disk
  2014-04-03 12:22     ` [PATCH 6/7] remus: implement remus replicated checkpointing disk Lai Jiangshan
@ 2014-04-03 16:41       ` Shriram Rajagopalan
  2014-04-04  3:04         ` Lai Jiangshan
  0 siblings, 1 reply; 89+ messages in thread
From: Shriram Rajagopalan @ 2014-04-03 16:41 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson,
	<xen-devel@lists.xen.org>,
	Dong Eddie, FNST-Yang Hongyang, Roger Pau Monne

> @@ -1463,7 +1468,10 @@ static int libxl__remus_domain_resume_callback(void *data)
>     if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
>         return 0;
>
> -    /* REMUS TODO: Deal with disk. */
> +    /* Deal with disk. */
> +    if (libxl__remus_disk_preresume(dss->remus_state))
> +        return 0;
> +
>     return 1;
> }
>

Bug. I think I mentioned this last time. Disk needs to be resumed before the
domain is resumed. Just move the domain resume call below the above
code snippet.
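
A minimal sketch of the suggested ordering, assuming stand-in stubs for the two libxl calls. The `stub_*` functions, the `record()` helper and `call_log` are hypothetical and exist only to make the call order observable; the convention of returning 1 on success and 0 on failure follows the quoted callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical recorder, used only to observe the call order. */
enum step { STEP_DISK_PRERESUME, STEP_DOMAIN_RESUME };
static enum step call_log[2];
static size_t call_count;

static void record(enum step s)
{
    assert(call_count < sizeof(call_log) / sizeof(call_log[0]));
    call_log[call_count++] = s;
}

/* Stand-in for libxl__remus_disk_preresume(); returns 0 on success. */
static int stub_disk_preresume(void)
{
    record(STEP_DISK_PRERESUME);
    return 0;
}

/* Stand-in for libxl__domain_resume(); returns 0 on success. */
static int stub_domain_resume(void)
{
    record(STEP_DOMAIN_RESUME);
    return 0;
}

/* Returns 1 on success and 0 on failure, like the quoted callback. */
static int resume_callback_sketch(void)
{
    /* First wait until the disk checkpoint has been acknowledged ... */
    if (stub_disk_preresume())
        return 0;

    /* ... and only then let the guest run again. */
    if (stub_domain_resume())
        return 0;

    return 1;
}
```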


> +typedef struct libxl__remus_disk_type {
> +    /* checkpointing */
> +    int (*postsuspend)(libxl__remus_disk *remus_disk);
> +    int (*preresume)(libxl__remus_disk *remus_disk);
> +    int (*commit)(libxl__remus_disk *remus_disk);
> +
> +    /*
> +     * Return value:
> +     *   1: the disk is not this type or the script is still running
> +     *   0: the disk is this type
> +     *  -1: error
> +     */
> +    int (*match)(libxl__domain_suspend_state *dss,
> +                 const libxl_device_disk *disk,
> +                 libxl_async_exec *async_exec,
> +                 void *disk_state);
> +
> +    /*
> +     * This is synchronous callback. Return value:
> +     *  0: setup is done
> +     * -1: error
> +     *
> +     */
> +    int (*setup)(libxl__remus_disk *remus_disk);
> +
> +    /*
> +     * Return value:
> +     *   1: the script is still running
> +     *   0: the script is done
> +     *  -1: error
> +     */
> +    int (*teardown)(libxl__remus_disk *remus_disk,
> +                    libxl_async_exec *async_exec);
> +
> +    /* the size of the private data */
> +    int size;
> +} libxl__remus_disk_type;
> +


This vtable approach is neat. I am fine with the current disk
checkpoint approach you have taken.

Something that might be worth thinking about:
The old remus code used this approach for both the disk and network buffering.
Given that this code is going in a similar direction, I suggest
hoisting this structure
up to an abstract buffer type, with setup, teardown, postsuspend, preresume and
commit callbacks.

For disks, semantically,
setup [..]
teardown [..]
postsuspend [start flushing buffered writes to backup host]
preresume [wait until all writes have been flushed to backup host]
commit  [no-op]

For network devices, semantically,
setup [..]
teardown [..]
postsuspend [no-op]
preresume [start_new_epoch - libnl call]
commit [release_prev_epoch - libnl call]

This way, in domain_suspend_done, the only thing we need to do is
foreach remus buffer
 buffer.postsuspend()

Similarly, in resume_callback()

foreach remus buffer
 buffer.preresume()
domain_resume()


in remus_checkpoint_dm_saved()
 foreach remus buffer
  buffer.commit()
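
A rough sketch of what such an abstract buffer type could look like. All names here (`remus_buffer`, `remus_buffer_ops`, `remus_buffers_run`, the counting callbacks) are hypothetical and not taken from the series; a NULL callback stands for the "no-op" entries listed above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct remus_buffer remus_buffer;

/* Hypothetical generic interface shared by disk and network buffers. */
typedef struct remus_buffer_ops {
    int (*setup)(remus_buffer *buf);
    int (*teardown)(remus_buffer *buf);
    int (*postsuspend)(remus_buffer *buf);
    int (*preresume)(remus_buffer *buf);
    int (*commit)(remus_buffer *buf);
} remus_buffer_ops;

struct remus_buffer {
    const remus_buffer_ops *ops;
    void *opaque;               /* per-device private data */
};

enum remus_phase { PHASE_POSTSUSPEND, PHASE_PRERESUME, PHASE_COMMIT };

/* Run one checkpoint phase on every buffer; stop at the first error. */
static int remus_buffers_run(remus_buffer *bufs, size_t n,
                             enum remus_phase phase)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int (*hook)(remus_buffer *) = NULL;

        assert(bufs[i].ops);
        switch (phase) {
        case PHASE_POSTSUSPEND: hook = bufs[i].ops->postsuspend; break;
        case PHASE_PRERESUME:   hook = bufs[i].ops->preresume;   break;
        case PHASE_COMMIT:      hook = bufs[i].ops->commit;      break;
        }

        if (hook) {             /* NULL means no-op for this device type */
            int rc = hook(&bufs[i]);
            if (rc)
                return rc;
        }
    }

    return 0;
}

/* Toy callbacks that only count invocations, for illustration. */
typedef struct call_counts { int postsuspend, preresume, commit; } call_counts;

static int count_postsuspend(remus_buffer *buf)
{
    ((call_counts *)buf->opaque)->postsuspend++;
    return 0;
}

static int count_preresume(remus_buffer *buf)
{
    ((call_counts *)buf->opaque)->preresume++;
    return 0;
}

static int count_commit(remus_buffer *buf)
{
    ((call_counts *)buf->opaque)->commit++;
    return 0;
}

/* Disk: commit is a no-op.  Network: postsuspend is a no-op. */
static const remus_buffer_ops disk_ops_sketch = {
    .postsuspend = count_postsuspend,
    .preresume = count_preresume,
};

static const remus_buffer_ops net_ops_sketch = {
    .preresume = count_preresume,
    .commit = count_commit,
};
```

With this shape, each of the three call sites reduces to a single `remus_buffers_run()` call with the matching phase.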

Lai, I can take a crack at it if you would like.

shriram


* Re: [PATCH 6/7] remus: implement remus replicated checkpointing disk
  2014-04-03 16:41       ` Shriram Rajagopalan
@ 2014-04-04  3:04         ` Lai Jiangshan
  0 siblings, 0 replies; 89+ messages in thread
From: Lai Jiangshan @ 2014-04-04  3:04 UTC (permalink / raw)
  To: rshriram
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson,
	<xen-devel@lists.xen.org>,
	Dong Eddie, FNST-Yang Hongyang, Roger Pau Monne

On 04/04/2014 12:41 AM, Shriram Rajagopalan wrote:
>> @@ -1463,7 +1468,10 @@ static int libxl__remus_domain_resume_callback(void *data)
>>     if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
>>         return 0;
>>
>> -    /* REMUS TODO: Deal with disk. */
>> +    /* Deal with disk. */
>> +    if (libxl__remus_disk_preresume(dss->remus_state))
>> +        return 0;
>> +
>>     return 1;
>> }
>>
> 
> Bug. I think I mentioned this last time. Disk needs to be resumed before the
> domain is resumed. Just move the domain resume call below the above
> code snippet.
> 
> 
>> +typedef struct libxl__remus_disk_type {
>> +    /* checkpointing */
>> +    int (*postsuspend)(libxl__remus_disk *remus_disk);
>> +    int (*preresume)(libxl__remus_disk *remus_disk);
>> +    int (*commit)(libxl__remus_disk *remus_disk);
>> +
>> +    /*
>> +     * Return value:
>> +     *   1: the disk is not this type or the script is still running
>> +     *   0: the disk is this type
>> +     *  -1: error
>> +     */
>> +    int (*match)(libxl__domain_suspend_state *dss,
>> +                 const libxl_device_disk *disk,
>> +                 libxl_async_exec *async_exec,
>> +                 void *disk_state);
>> +
>> +    /*
>> +     * This is synchronous callback. Return value:
>> +     *  0: setup is done
>> +     * -1: error
>> +     *
>> +     */
>> +    int (*setup)(libxl__remus_disk *remus_disk);
>> +
>> +    /*
>> +     * Return value:
>> +     *   1: the script is still running
>> +     *   0: the script is done
>> +     *  -1: error
>> +     */
>> +    int (*teardown)(libxl__remus_disk *remus_disk,
>> +                    libxl_async_exec *async_exec);
>> +
>> +    /* the size of the private data */
>> +    int size;
>> +} libxl__remus_disk_type;
>> +
> 
> 
> This vtable approach is neat. I am fine with the current disk
> checkpoint approach you have taken.
> 
> Something that might be worth thinking about:
> The old remus code used this approach for both the disk and network buffering.
> Given that this code is going in a similar direction, I suggest
> hoisting this structure
> up to an abstract buffer type, with setup, teardown, postsuspend, preresume and
> commit callbacks.
> 
> For disks, semantically,
> setup [..]
> teardown [..]
> postsuspend [start flushing buffered writes to backup host]
> preresume [wait until all writes have been flushed to backup host]
> commit  [no-op]
> 
> For network devices, semantically,
> setup [..]
> teardown [..]
> postsuspend [no-op]
> preresume [start_new_epoch - libnl call]
> commit [release_prev_epoch - libnl call]
> 
> This way, in domain_suspend_done, the only thing we need to do is
> foreach remus buffer
>  buffer.postsuspend()
> 
> Similarly, in resume_callback()
> 
> foreach remus buffer
>  buffer.preresume()
> domain_resume()
> 
> 
> in remus_checkpoint_dm_saved()
>  foreach remus buffer
>   buffer.commit()
> 
> Lai, I can take a crack at it if you would like.
> 

Your idea is great, I look forward to your work on it.



* Re: [PATCH 1/7] introduce a new function libxl__remus_netbuf_setup_done()
  2014-04-03 14:08     ` [PATCH 1/7] introduce a new function libxl__remus_netbuf_setup_done() Ian Jackson
@ 2014-04-04  8:53       ` Hongyang Yang
  0 siblings, 0 replies; 89+ messages in thread
From: Hongyang Yang @ 2014-04-04  8:53 UTC (permalink / raw)
  To: Ian Jackson, Lai Jiangshan
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel,
	Shriram Rajagopalan, Roger Pau Monne

On 2014-04-03 22:08, Ian Jackson wrote:
> Lai Jiangshan writes ("[PATCH 1/7] introduce a new function libxl__remus_netbuf_setup_done()"):
>> We will exec some scripts to setup netbuf and disk, so we need two
>> asynchronous functions that are called when the setup is done.
> Is there a 0/7 for this series somewhere ?  I don't seem to have a
> copy of it.
>
> Is this series complementary to Yang Hongyang's or does it want to go
> on top of it, or what ?

Hi Ian, it's based on the remus net patchset I've sent.

Thanks

>
> Thanks,
> Ian.


* [PATCH V9 00/12] Remus/Libxl: Network buffering support
@ 2014-04-15  5:38 Yang Hongyang
  2014-04-15  5:38 ` [PATCH V9 01/12] introduce an API to async exec scripts Yang Hongyang
                   ` (11 more replies)
  0 siblings, 12 replies; 89+ messages in thread
From: Yang Hongyang @ 2014-04-15  5:38 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

This patch series adds support for network buffering in the Remus
codebase in libxl. 

Changes in V9:
  Introduce an API to asynchronously exec scripts, for both device
  and netbuffer.
  Use the async exec script API to exec scripts.

Changes in V8:
  Applied some comments (by IanJ).
  Merged some struct definitions into their implementations.
  (2/3/5 in V7 => 3 in V8)

Changes in V7:
  Applied missing comments (by IanJ).
  Applied Shriram's comments.

  Merged the tangled netbuffering setup/teardown code into one patch.
  (2/6/8 in V6 => 5 in V7. 9/10 in V6 => 7 in V7)

Changes in V6:
  Applied Ian Jackson's comments on the V5 series.
  [PATCH 2/4 V5] is split into smaller pieces by functionality.

  [PATCH 4/4 V5] --> [PATCH 13/13] netbuffer is enabled by default.

Changes in V5:

Merge hotplug script patch (2/5) and hotplug script setup/teardown
patch (3/5) into a single patch.

Changes in V4:

[1/5] Remove check for libnl command line utils in autoconf checks

[2/5] minor nits

[3/5] define LIBXL_HAVE_REMUS_NETBUF in libxl.h

[4/5] clean ups. Make the usleep in checkpoint callback asynchronous

[5/5] minor nits

Changes in V3:
[1/5] Fix redundant checks in configure scripts
      (based on Ian Campbell's suggestions)

[2/5] Introduce locking in the script, during IFB setup.
      Add xenstore paths used by netbuf scripts
      to xenstore-paths.markdown

[3/5] Hotplug scripts setup/teardown invocations are now asynchronous
      following IanJ's feedback.  However, the invocations are still
      sequential. 

[5/5] Allow per-domain specification of netbuffer scripts in xl remus
      command.

And minor nits throughout the series based on feedback from
the last version

Changes in V2:
[1/5] Configure script will automatically enable/disable network
      buffer support depending on the availability of the appropriate
      libnl3 version. [If libnl3 is unavailable, a warning message will be
      printed to let the user know that the feature has been disabled.]

      use macros from pkg.m4 instead of pkg-config commands
      removed redundant checks for libnl3 libraries.

[3,4/5] - Minor nits.

Version 1:

[1/5] Changes to autoconf scripts to check for libnl3. Add linker flags
      to libxl Makefile.

[2/5] External script to setup/teardown network buffering using libnl3's
      CLI. This script will be invoked by libxl before starting Remus.
      The script's main job is to bring up an IFB device with plug qdisc
      attached to it.  It then re-routes egress traffic from the guest's
      vif to the IFB device.

[3/5] Libxl code to invoke the external setup script, followed by netlink
      related setup to obtain a handle on the output buffers attached
      to each vif.

[4/5] Libxl interaction with network buffer module in the kernel via
      libnl3 API.

[5/5] xl cmdline switch to explicitly enable network buffering when
      starting remus.


  A few things to note (by Shriram):

    a) Based on previous email discussions, the setup/teardown task has
    been moved to a hotplug style shell script which can be customized as
    desired, instead of implementing it as C code inside libxl.

    b) Libnl3 is not available on NetBSD. Nor is it available on CentOS
    (Linux).  So I have made network buffering support an optional feature
    so that it can be disabled if desired.

    c) NetBSD does not have libnl3. So I have put the setup script under
    the tools/hotplug/Linux folder.

thanks

Shriram Rajagopalan (7):
  remus: add libnl3 dependency to autoconf scripts
  remus: introduce a function to check whether network buffering is
    enabled
  remus: Remus network buffering core and APIs to setup/teardown
  remus: implement the API to buffer/release packages
  libxl: rename remus_failover_cb() to remus_replication_failure_cb()
  libxl: control network buffering in remus callbacks
  libxl: network buffering cmdline switch

Yang Hongyang (5):
  introduce an API to async exec scripts
  libxl_device: use async exec script api
  remus: remus device core and APIs to setup/teardown
  remus: implement the API for checkpoint
  libxl: use the API to setup/teardown network buffering

 README                                 |   4 +
 config/Tools.mk.in                     |   3 +
 docs/man/xl.conf.pod.5                 |   6 +
 docs/man/xl.pod.1                      |  11 +-
 docs/misc/xenstore-paths.markdown      |   4 +
 tools/configure.ac                     |  15 +
 tools/hotplug/Linux/Makefile           |   1 +
 tools/hotplug/Linux/remus-netbuf-setup | 183 ++++++++++++
 tools/libxl/Makefile                   |  11 +
 tools/libxl/libxl.c                    |  42 ++-
 tools/libxl/libxl.h                    |  13 +
 tools/libxl/libxl_device.c             |  80 ++----
 tools/libxl/libxl_dom.c                |  83 +++++-
 tools/libxl/libxl_internal.c           |  82 ++++++
 tools/libxl/libxl_internal.h           |  70 ++++-
 tools/libxl/libxl_netbuffer.c          | 491 +++++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c        |  54 ++++
 tools/libxl/libxl_remus.c              |  57 ++++
 tools/libxl/libxl_remus_device.c       | 383 +++++++++++++++++++++++++
 tools/libxl/libxl_remus_device.h       |  99 +++++++
 tools/libxl/libxl_types.idl            |   2 +
 tools/libxl/xl.c                       |   4 +
 tools/libxl/xl.h                       |   1 +
 tools/libxl/xl_cmdimpl.c               |  28 +-
 tools/libxl/xl_cmdtable.c              |   3 +
 tools/remus/README                     |   6 +
 26 files changed, 1650 insertions(+), 86 deletions(-)
 create mode 100644 tools/hotplug/Linux/remus-netbuf-setup
 create mode 100644 tools/libxl/libxl_netbuffer.c
 create mode 100644 tools/libxl/libxl_nonetbuffer.c
 create mode 100644 tools/libxl/libxl_remus.c
 create mode 100644 tools/libxl/libxl_remus_device.c
 create mode 100644 tools/libxl/libxl_remus_device.h

-- 
1.8.3.2


* [PATCH V9 01/12] introduce an API to async exec scripts
  2014-04-15  5:38 [PATCH V9 00/12] Remus/Libxl: Network buffering support Yang Hongyang
@ 2014-04-15  5:38 ` Yang Hongyang
  2014-04-23 15:44   ` Ian Jackson
  2014-04-15  5:38 ` [PATCH V9 02/12] libxl_device: use async exec script api Yang Hongyang
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 89+ messages in thread
From: Yang Hongyang @ 2014-04-15  5:38 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

Introduce an API to asynchronously exec scripts. It will be used
for both device and netbuffer.
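
For orientation only, a simplified sketch of the same idea in plain POSIX C: fork and exec a script, kill it with SIGKILL once a timeout expires, and hand the wait status to a completion callback. Unlike the patch, which hooks into the libxl event loop through libxl__ev_child and libxl__ev_time, this sketch blocks and polls; `run_script_with_timeout`, `finish_cb_fn` and `store_status` are hypothetical names that are not part of libxl.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Same shape as the finish_cb member added by the patch. */
typedef void (*finish_cb_fn)(void *opaque, int status);

/*
 * Blocking stand-in for the event-driven API: run argv[0], kill it with
 * SIGKILL after timeout_sec seconds, then report the raw wait status
 * through cb.  Returns 0 if the callback ran, -1 otherwise.
 */
static int run_script_with_timeout(char *const argv[], int timeout_sec,
                                   finish_cb_fn cb, void *opaque)
{
    const struct timespec poll_interval = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    time_t deadline;
    int status = 0;
    pid_t pid;

    assert(argv && argv[0] && cb);

    pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* child: replace ourselves with the script */
        execv(argv[0], argv);
        _exit(127);             /* only reached if execv() failed */
    }

    deadline = time(NULL) + timeout_sec;
    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);

        if (r == pid)
            break;              /* the script exited on its own */
        if (r == -1)
            return -1;
        if (time(NULL) >= deadline) {
            kill(pid, SIGKILL); /* timed out */
            if (waitpid(pid, &status, 0) != pid)
                return -1;
            break;
        }
        nanosleep(&poll_interval, NULL);
    }

    cb(opaque, status);
    return 0;
}

/* Example callback: store the wait status in an int owned by the caller. */
static void store_status(void *opaque, int status)
{
    *(int *)opaque = status;
}
```

The callback receives the raw wait status, so a caller can tell a normal exit (WIFEXITED) from a timeout kill (WIFSIGNALED with SIGKILL).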

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_internal.c | 82 ++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h | 21 ++++++++++++
 2 files changed, 103 insertions(+)

diff --git a/tools/libxl/libxl_internal.c b/tools/libxl/libxl_internal.c
index 6c94d3e..e47eb90 100644
--- a/tools/libxl/libxl_internal.c
+++ b/tools/libxl/libxl_internal.c
@@ -375,6 +375,88 @@ out:
     return rc;
 }
 
+static void libxl_async_exec_timeout(libxl__egc *egc,
+                                     libxl__ev_time *ev,
+                                     const struct timeval *requested_abs)
+{
+    libxl_async_exec *async_exec = CONTAINER_OF(ev, *async_exec, time);
+
+    STATE_AO_GC(async_exec->ao);
+
+    libxl__ev_time_deregister(gc, &async_exec->time);
+    assert(libxl__ev_child_inuse(&async_exec->child));
+
+    LOG(DEBUG, "killing hotplug script %s because of timeout",
+        async_exec->args[0]);
+
+    if (kill(async_exec->child.pid, SIGKILL)) {
+        LOGEV(ERROR, errno, "unable to kill hotplug script %s [%ld]",
+              async_exec->args[0],
+              (unsigned long)async_exec->child.pid);
+    }
+
+    return;
+}
+
+static void libxl_async_exec_done(libxl__egc *egc,
+                                  libxl__ev_child *child,
+                                  pid_t pid, int status)
+{
+    libxl_async_exec *async_exec = CONTAINER_OF(child, *async_exec, child);
+
+    STATE_AO_GC(async_exec->ao);
+
+    libxl__ev_time_deregister(gc, &async_exec->time);
+
+    if (status && !async_exec->allow_fail) {
+        libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR,
+                                      async_exec->args[0],
+                                      pid, status);
+    }
+
+    async_exec->finish_cb(async_exec->opaque, status);
+}
+
+int libxl_async_exec_script(libxl__gc *gc, libxl_async_exec *async_exec)
+{
+    pid_t pid;
+
+    /* Convenience aliases */
+    libxl__ev_child *const child = &async_exec->child;
+    char * const *args = async_exec->args;
+    char * const *env = async_exec->env;
+    const int stdinfd = async_exec->stdinfd;
+    const int stdoutfd = async_exec->stdoutfd;
+    const int stderrfd = async_exec->stderrfd;
+
+    /* Set script timeout (async_exec->timeout is in seconds) */
+    if (libxl__ev_time_register_rel(gc, &async_exec->time,
+                                    libxl_async_exec_timeout,
+                                    async_exec->timeout * 1000)) {
+        LOG(ERROR, "unable to register timeout for "
+            "script %s", args[0]);
+        return ERROR_FAIL;
+    }
+
+    LOG(DEBUG, "Calling script: %s ", args[0]);
+
+    /* Fork and exec the script */
+    pid = libxl__ev_child_fork(gc, child, libxl_async_exec_done);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork for script %s", args[0]);
+        return ERROR_FAIL;
+    }
+
+    if (!pid) {
+        /* child: launch the script */
+        libxl__exec(gc, stdinfd, stdoutfd, stderrfd, args[0], args, env);
+        /* notreached */
+        abort();
+    }
+
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index c2b73c4..eddafaf 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2030,6 +2030,27 @@ _hidden const char *libxl__xen_script_dir_path(void);
 _hidden const char *libxl__lock_dir_path(void);
 _hidden const char *libxl__run_dir_path(void);
 
+/*----- asynchronous function -----*/
+typedef struct libxl_async_exec {
+    char **env;
+    char **args;
+    void *opaque;
+    void (*finish_cb)(void *opaque, int status);
+    /* unit: seconds */
+    int timeout;
+    bool allow_fail;
+    int stdinfd;
+    int stdoutfd;
+    int stderrfd;
+
+    libxl__ev_time time;
+    libxl__ev_child child;
+    libxl__ao *ao;
+} libxl_async_exec;
+
+_hidden extern int libxl_async_exec_script(libxl__gc *gc,
+                                           libxl_async_exec *async_exec);
+
 /*----- device addition/removal -----*/
 
 typedef struct libxl__ao_device libxl__ao_device;
-- 
1.8.3.2

* [PATCH V9 02/12] libxl_device: use async exec script api
  2014-04-15  5:38 [PATCH V9 00/12] Remus/Libxl: Network buffering support Yang Hongyang
  2014-04-15  5:38 ` [PATCH V9 01/12] introduce an API to async exec scripts Yang Hongyang
@ 2014-04-15  5:38 ` Yang Hongyang
  2014-04-23 15:48   ` Ian Jackson
  2014-04-15  5:38 ` [PATCH V9 03/12] remus: add libnl3 dependency to autoconf scripts Yang Hongyang
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 89+ messages in thread
From: Yang Hongyang @ 2014-04-15  5:38 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

Use the async exec script API to execute device-related scripts.
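
The main change for callers is that the completion callback now receives
an opaque pointer instead of recovering its state with CONTAINER_OF. A
minimal sketch of that calling convention follows; struct exec_req,
struct device and the function names are hypothetical stand-ins, not the
libxl types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the request descriptor used by the API */
struct exec_req {
    void *opaque;                                 /* caller's context */
    void (*finish_cb)(void *opaque, int status);  /* completion hook  */
};

/* Hypothetical caller state that embeds a request */
struct device {
    int done;
    int last_status;
    struct exec_req req;
};

/* Completion callback: the context arrives as the opaque pointer */
static void device_script_done(void *opaque, int status)
{
    struct device *dev = opaque;

    dev->last_status = status;
    dev->done = 1;
}

/* Caller side: fill in the request before handing it to the executor */
void device_prepare(struct device *dev)
{
    dev->done = 0;
    dev->req.opaque = dev;
    dev->req.finish_cb = device_script_done;
}

/* Executor side: report the script's exit status to the caller */
void exec_req_complete(struct exec_req *req, int status)
{
    assert(req != NULL && req->finish_cb != NULL);
    req->finish_cb(req->opaque, status);
}
```

This decouples the executor from the layout of the caller's structure,
which is what lets the same API serve both device and netbuffer scripts.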

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_device.c   | 80 ++++++++++++--------------------------------
 tools/libxl/libxl_internal.h |  4 ++-
 2 files changed, 25 insertions(+), 59 deletions(-)

diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index fa99f77..2e26799 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -440,7 +440,7 @@ void libxl__prepare_ao_device(libxl__ao *ao, libxl__ao_device *aodev)
     aodev->active = 1;
     /* We init this here because we might call device_hotplug_done
      * without actually calling any hotplug script */
-    libxl__ev_child_init(&aodev->child);
+    libxl__ev_child_init(&aodev->async_exec.child);
 }
 
 /* multidev */
@@ -707,12 +707,7 @@ static void device_backend_cleanup(libxl__gc *gc,
 
 static void device_hotplug(libxl__egc *egc, libxl__ao_device *aodev);
 
-static void device_hotplug_timeout_cb(libxl__egc *egc, libxl__ev_time *ev,
-                                      const struct timeval *requested_abs);
-
-static void device_hotplug_child_death_cb(libxl__egc *egc,
-                                          libxl__ev_child *child,
-                                          pid_t pid, int status);
+static void device_hotplug_child_death_cb(void *opaque, int status);
 
 static void device_destroy_be_timeout_cb(libxl__egc *egc, libxl__ev_time *ev,
                                          const struct timeval *requested_abs);
@@ -957,7 +952,6 @@ static void device_hotplug(libxl__egc *egc, libxl__ao_device *aodev)
     char **args = NULL, **env = NULL;
     int rc = 0;
     int hotplug, nullfd = -1;
-    pid_t pid;
     uint32_t domid;
 
     /*
@@ -1009,15 +1003,6 @@ static void device_hotplug(libxl__egc *egc, libxl__ao_device *aodev)
         goto out;
     }
 
-    /* Set hotplug timeout */
-    rc = libxl__ev_time_register_rel(gc, &aodev->timeout,
-                                     device_hotplug_timeout_cb,
-                                     LIBXL_HOTPLUG_TIMEOUT * 1000);
-    if (rc) {
-        LOG(ERROR, "unable to register timeout for hotplug device %s", be_path);
-        goto out;
-    }
-
     aodev->what = GCSPRINTF("%s %s", args[0], args[1]);
     LOG(DEBUG, "calling hotplug script: %s %s", args[0], args[1]);
 
@@ -1028,23 +1013,24 @@ static void device_hotplug(libxl__egc *egc, libxl__ao_device *aodev)
         goto out;
     }
 
-    /* fork and execute hotplug script */
-    pid = libxl__ev_child_fork(gc, &aodev->child, device_hotplug_child_death_cb);
-    if (pid == -1) {
-        LOG(ERROR, "unable to fork");
-        rc = ERROR_FAIL;
+    aodev->egc = egc;
+    aodev->async_exec.env = env;
+    aodev->async_exec.args = args;
+    aodev->async_exec.opaque = aodev;
+    aodev->async_exec.finish_cb = device_hotplug_child_death_cb;
+    aodev->async_exec.timeout = LIBXL_HOTPLUG_TIMEOUT;
+    aodev->async_exec.allow_fail = false;
+    aodev->async_exec.stdinfd = nullfd;
+    aodev->async_exec.stdoutfd = 2;
+    aodev->async_exec.stderrfd = -1;
+    aodev->async_exec.ao = ao;
+
+    rc = libxl_async_exec_script(gc, &aodev->async_exec);
+    if (rc)
         goto out;
-    }
-
-    if (!pid) {
-        /* child */
-        libxl__exec(gc, nullfd, 2, -1, args[0], args, env);
-        /* notreached */
-        abort();
-    }
 
     close(nullfd);
-    assert(libxl__ev_child_inuse(&aodev->child));
+    assert(libxl__ev_child_inuse(&aodev->async_exec.child));
 
     return;
 
@@ -1055,29 +1041,9 @@ out:
     return;
 }
 
-static void device_hotplug_timeout_cb(libxl__egc *egc, libxl__ev_time *ev,
-                                      const struct timeval *requested_abs)
-{
-    libxl__ao_device *aodev = CONTAINER_OF(ev, *aodev, timeout);
-    STATE_AO_GC(aodev->ao);
-
-    libxl__ev_time_deregister(gc, &aodev->timeout);
-
-    assert(libxl__ev_child_inuse(&aodev->child));
-    LOG(DEBUG, "killing hotplug script %s because of timeout", aodev->what);
-    if (kill(aodev->child.pid, SIGKILL)) {
-        LOGEV(ERROR, errno, "unable to kill hotplug script %s [%ld]",
-                            aodev->what, (unsigned long)aodev->child.pid);
-    }
-
-    return;
-}
-
-static void device_hotplug_child_death_cb(libxl__egc *egc,
-                                          libxl__ev_child *child,
-                                          pid_t pid, int status)
+static void device_hotplug_child_death_cb(void *opaque, int status)
 {
-    libxl__ao_device *aodev = CONTAINER_OF(child, *aodev, child);
+    libxl__ao_device *aodev = opaque;
     STATE_AO_GC(aodev->ao);
     char *be_path = libxl__device_backend_path(gc, aodev->dev);
     char *hotplug_error;
@@ -1085,8 +1051,6 @@ static void device_hotplug_child_death_cb(libxl__egc *egc,
     device_hotplug_clean(gc, aodev);
 
     if (status) {
-        libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR,
-                                      aodev->what, pid, status);
         hotplug_error = libxl__xs_read(gc, XBT_NULL,
                                        GCSPRINTF("%s/hotplug-error", be_path));
         if (hotplug_error)
@@ -1105,13 +1069,13 @@ static void device_hotplug_child_death_cb(libxl__egc *egc,
      * device_hotplug_done breaking the loop.
      */
     aodev->num_exec++;
-    device_hotplug(egc, aodev);
+    device_hotplug(aodev->egc, aodev);
 
     return;
 
 error:
     assert(aodev->rc);
-    device_hotplug_done(egc, aodev);
+    device_hotplug_done(aodev->egc, aodev);
 }
 
 static void device_destroy_be_timeout_cb(libxl__egc *egc, libxl__ev_time *ev,
@@ -1178,7 +1142,7 @@ static void device_hotplug_clean(libxl__gc *gc, libxl__ao_device *aodev)
     /* Clean events and check reentrancy */
     libxl__ev_time_deregister(gc, &aodev->timeout);
     libxl__ev_xswatch_deregister(gc, &aodev->xs_watch);
-    assert(!libxl__ev_child_inuse(&aodev->child));
+    assert(!libxl__ev_child_inuse(&aodev->async_exec.child));
 }
 
 static void devices_remove_callback(libxl__egc *egc,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index eddafaf..cc8d558 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2094,7 +2094,9 @@ struct libxl__ao_device {
     /* device hotplug execution */
     const char *what;
     int num_exec;
-    libxl__ev_child child;
+
+    libxl__egc *egc;
+    libxl_async_exec async_exec;
 };
 
 /*
-- 
1.8.3.2

* [PATCH V9 03/12] remus: add libnl3 dependency to autoconf scripts
  2014-04-15  5:38 [PATCH V9 00/12] Remus/Libxl: Network buffering support Yang Hongyang
  2014-04-15  5:38 ` [PATCH V9 01/12] introduce an API to async exec scripts Yang Hongyang
  2014-04-15  5:38 ` [PATCH V9 02/12] libxl_device: use async exec script api Yang Hongyang
@ 2014-04-15  5:38 ` Yang Hongyang
  2014-04-15  5:38 ` [PATCH V9 04/12] remus: introduce a function to check whether network buffering is enabled Yang Hongyang
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 89+ messages in thread
From: Yang Hongyang @ 2014-04-15  5:38 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

From: Shriram Rajagopalan <rshriram@cs.ubc.ca>

Libnl3 is required for controlling Remus network buffering.
This patch adds a dependency on libnl3 (>= 3.2.8) to the autoconf scripts.
It also provides the ability to configure the tools without libnl3 support,
that is, without network buffering support.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 README               |  4 ++++
 config/Tools.mk.in   |  3 +++
 tools/configure.ac   | 15 +++++++++++++++
 tools/libxl/Makefile |  2 ++
 tools/remus/README   |  6 ++++++
 5 files changed, 30 insertions(+)

diff --git a/README b/README
index 9bbe734..e770932 100644
--- a/README
+++ b/README
@@ -72,6 +72,10 @@ disabled at compile time:
     * cmake (if building vtpm stub domains)
     * markdown
     * figlet (for generating the traditional Xen start of day banner)
+    * Development install of libnl3 (e.g., libnl-3-200,
+      libnl-3-dev, etc).  Required if network buffering is desired
+      when using Remus with libxl.  See tools/remus/README for detailed
+      information.
 
 Second, you need to acquire a suitable kernel for use in domain 0. If
 possible you should use a kernel provided by your OS distributor. If
diff --git a/config/Tools.mk.in b/config/Tools.mk.in
index 0bdf37a..e8957b6 100644
--- a/config/Tools.mk.in
+++ b/config/Tools.mk.in
@@ -38,6 +38,8 @@ PTHREAD_LIBS        := @PTHREAD_LIBS@
 
 PTYFUNCS_LIBS       := @PTYFUNCS_LIBS@
 
+LIBNL3_LIBS         := @LIBNL3_LIBS@
+LIBNL3_CFLAGS       := @LIBNL3_CFLAGS@
 # Download GIT repositories via HTTP or GIT's own protocol?
 # GIT's protocol is faster and more robust, when it works at all (firewalls
 # may block it). We make it the default, but if your GIT repository downloads
@@ -54,6 +56,7 @@ CONFIG_SEABIOS      := @seabios@
 CONFIG_QEMU_TRAD    := @qemu_traditional@
 CONFIG_QEMU_XEN     := @qemu_xen@
 CONFIG_BLKTAP1      := @blktap1@
+CONFIG_REMUS_NETBUF := @remus_netbuf@
 
 #System options
 ZLIB                := @zlib@
diff --git a/tools/configure.ac b/tools/configure.ac
index 00fb47b..7f4e377 100644
--- a/tools/configure.ac
+++ b/tools/configure.ac
@@ -237,5 +237,20 @@ esac
 # Checks for header files.
 AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h])
 
+# Check for libnl3 >=3.2.8. If present enable remus network buffering.
+PKG_CHECK_MODULES(LIBNL3, [libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8],
+		[libnl3_lib="y"], [libnl3_lib="n"])
+
+AS_IF([test "x$libnl3_lib" = "xn" ], [
+	    AC_MSG_WARN([Disabling support for Remus network buffering.
+	    Please install libnl3 libraries, command line tools and devel
+	    headers - version 3.2.8 or higher])
+	    AC_SUBST(remus_netbuf, [n])
+	    ],[
+	    AC_SUBST(LIBNL3_LIBS)
+	    AC_SUBST(LIBNL3_CFLAGS)
+	    AC_SUBST(remus_netbuf, [y])
+])
+
 AC_OUTPUT()
 
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 755b666..3647a2a 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -21,11 +21,13 @@ endif
 
 LIBXL_LIBS =
 LIBXL_LIBS = $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(LDLIBS_libxenstore) $(LDLIBS_libblktapctl) $(PTYFUNCS_LIBS) $(LIBUUID_LIBS)
+LIBXL_LIBS += $(LIBNL3_LIBS)
 
 CFLAGS_LIBXL += $(CFLAGS_libxenctrl)
 CFLAGS_LIBXL += $(CFLAGS_libxenguest)
 CFLAGS_LIBXL += $(CFLAGS_libxenstore)
 CFLAGS_LIBXL += $(CFLAGS_libblktapctl) 
+CFLAGS_LIBXL += $(LIBNL3_CFLAGS)
 CFLAGS_LIBXL += -Wshadow
 
 LIBXL_LIBS-$(CONFIG_ARM) += -lfdt
diff --git a/tools/remus/README b/tools/remus/README
index 9e8140b..4736252 100644
--- a/tools/remus/README
+++ b/tools/remus/README
@@ -2,3 +2,9 @@ Remus provides fault tolerance for virtual machines by sending continuous
 checkpoints to a backup, which will activate if the target VM fails.
 
 See the website at http://nss.cs.ubc.ca/remus/ for details.
+
+Using Remus with libxl on Xen 4.4 and higher:
+ To enable network buffering, you need libnl 3.2.8
+ or higher along with the development headers and command line utilities.
+ If your distro does not have the appropriate libnl3 version, you can find
+ the latest source tarball of libnl3 at http://www.carisma.slowglass.com/~tgr/libnl/
-- 
1.8.3.2

* [PATCH V9 04/12] remus: introduce a function to check whether network buffering is enabled
  2014-04-15  5:38 [PATCH V9 00/12] Remus/Libxl: Network buffering support Yang Hongyang
                   ` (2 preceding siblings ...)
  2014-04-15  5:38 ` [PATCH V9 03/12] remus: add libnl3 dependency to autoconf scripts Yang Hongyang
@ 2014-04-15  5:38 ` Yang Hongyang
  2014-04-23 15:50   ` Ian Jackson
  2014-04-15  5:38 ` [PATCH V9 05/12] remus: remus device core and APIs to setup/teardown Yang Hongyang
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 89+ messages in thread
From: Yang Hongyang @ 2014-04-15  5:38 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

From: Shriram Rajagopalan <rshriram@cs.ubc.ca>

libxl__netbuffer_enabled() returns 1 when network buffering support is
compiled in, and 0 when it is not.

If network buffering support is not compiled in and the user asks for it,
report an error and exit.
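
A caller might consult the function along these lines. This is a sketch
only: netbuffer_enabled() stands in for libxl__netbuffer_enabled(), and
HAVE_NETBUF and check_netbuf_request() are hypothetical names used for
illustration. In the patch the result is decided by which of
libxl_netbuffer.c or libxl_nonetbuffer.c is linked, not by a macro.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical build-time switch, used in this sketch only */
#ifndef HAVE_NETBUF
#define HAVE_NETBUF 0
#endif

/* Stand-in for libxl__netbuffer_enabled() */
static int netbuffer_enabled(void)
{
    return HAVE_NETBUF;
}

/*
 * Hypothetical caller: returns 0 if the request can proceed, or -1 if
 * network buffering was requested but support is not compiled in.
 */
int check_netbuf_request(int want_netbuf)
{
    assert(want_netbuf == 0 || want_netbuf == 1);

    if (want_netbuf && !netbuffer_enabled()) {
        fprintf(stderr,
                "Remus: no support for network buffering in this build\n");
        return -1;
    }
    return 0;
}
```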

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/Makefile            |  7 +++++++
 tools/libxl/libxl_internal.h    |  2 ++
 tools/libxl/libxl_netbuffer.c   | 31 +++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c | 31 +++++++++++++++++++++++++++++++
 4 files changed, 71 insertions(+)
 create mode 100644 tools/libxl/libxl_netbuffer.c
 create mode 100644 tools/libxl/libxl_nonetbuffer.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 3647a2a..a29c505 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -45,6 +45,13 @@ LIBXL_OBJS-y += libxl_blktap2.o
 else
 LIBXL_OBJS-y += libxl_noblktap2.o
 endif
+
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+LIBXL_OBJS-y += libxl_netbuffer.o
+else
+LIBXL_OBJS-y += libxl_nonetbuffer.o
+endif
+
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index cc8d558..33b62a2 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2457,6 +2457,8 @@ typedef struct libxl__logdirty_switch {
     libxl__ev_time timeout;
 } libxl__logdirty_switch;
 
+_hidden int libxl__netbuffer_enabled(libxl__gc *gc);
+
 struct libxl__domain_suspend_state {
     /* set by caller of libxl__domain_suspend */
     libxl__ao *ao;
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
new file mode 100644
index 0000000..8e23d75
--- /dev/null
+++ b/tools/libxl/libxl_netbuffer.c
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2013
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+int libxl__netbuffer_enabled(libxl__gc *gc)
+{
+    return 1;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
new file mode 100644
index 0000000..6aa4bf1
--- /dev/null
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2013
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+int libxl__netbuffer_enabled(libxl__gc *gc)
+{
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.8.3.2

* [PATCH V9 05/12] remus: remus device core and APIs to setup/teardown
  2014-04-15  5:38 [PATCH V9 00/12] Remus/Libxl: Network buffering support Yang Hongyang
                   ` (3 preceding siblings ...)
  2014-04-15  5:38 ` [PATCH V9 04/12] remus: introduce a function to check whether network buffering is enabled Yang Hongyang
@ 2014-04-15  5:38 ` Yang Hongyang
  2014-04-15  5:38 ` [PATCH V9 06/12] remus: implement the API for checkpoint Yang Hongyang
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 89+ messages in thread
From: Yang Hongyang @ 2014-04-15  5:38 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

---
 tools/libxl/Makefile             |   2 +
 tools/libxl/libxl.c              |   7 +-
 tools/libxl/libxl_dom.c          |   4 +-
 tools/libxl/libxl_internal.h     |  28 ++++
 tools/libxl/libxl_remus.c        |  41 ++++++
 tools/libxl/libxl_remus_device.c | 311 +++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_remus_device.h |  97 ++++++++++++
 7 files changed, 486 insertions(+), 4 deletions(-)
 create mode 100644 tools/libxl/libxl_remus.c
 create mode 100644 tools/libxl/libxl_remus_device.c
 create mode 100644 tools/libxl/libxl_remus_device.h

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index a29c505..8398386 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -52,6 +52,8 @@ else
 LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
+LIBXL_OBJS-y += libxl_remus.o libxl_remus_device.o
+
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
 
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 30b0b06..e3eca6e 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -741,7 +741,12 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 
     assert(info);
 
-    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
+    GCNEW(dss->remus_state);
+
+    /* convenience shorthand */
+    libxl__remus_state *remus_state = dss->remus_state;
+    remus_state->dss = dss;
+    remus_state->egc = egc;
 
     /* Point of no return */
     libxl__domain_suspend(egc, dss);
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 661999c..fc0c136 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -766,8 +766,6 @@ int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
 
 /*==================== Domain suspend (save) ====================*/
 
-static void domain_suspend_done(libxl__egc *egc,
-                        libxl__domain_suspend_state *dss, int rc);
 static void domain_suspend_callback_common_done(libxl__egc *egc,
                                 libxl__domain_suspend_state *dss, int ok);
 static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
@@ -1716,7 +1714,7 @@ static void save_device_model_datacopier_done(libxl__egc *egc,
     dss->save_dm_callback(egc, dss, our_rc);
 }
 
-static void domain_suspend_done(libxl__egc *egc,
+void domain_suspend_done(libxl__egc *egc,
                         libxl__domain_suspend_state *dss, int rc)
 {
     STATE_AO_GC(dss->ao);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 33b62a2..421ae24 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2457,8 +2457,35 @@ typedef struct libxl__logdirty_switch {
     libxl__ev_time timeout;
 } libxl__logdirty_switch;
 
+typedef struct libxl__remus_state {
+    libxl__domain_suspend_state *dss;
+    libxl__egc *egc;
+
+    /* private */
+    int saved_rc;
+    /* Opaque context containing device related stuff */
+    void *device_state;
+} libxl__remus_state;
+
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
+_hidden void domain_suspend_done(libxl__egc *egc,
+                                 libxl__domain_suspend_state *dss,
+                                 int rc);
+
+_hidden void libxl__remus_setup_done(libxl__egc *egc,
+                                     libxl__domain_suspend_state *dss,
+                                     int rc);
+
+_hidden void libxl__remus_device_setup(libxl__egc *egc,
+                                       libxl__domain_suspend_state *dss);
+
+_hidden void libxl__remus_teardown_done(libxl__egc *egc,
+                                        libxl__domain_suspend_state *dss);
+
+_hidden void libxl__remus_device_teardown(libxl__egc *egc,
+                                          libxl__domain_suspend_state *dss);
+
 struct libxl__domain_suspend_state {
     /* set by caller of libxl__domain_suspend */
     libxl__ao *ao;
@@ -2470,6 +2497,7 @@ struct libxl__domain_suspend_state {
     int live;
     int debug;
     const libxl_domain_remus_info *remus;
+    libxl__remus_state *remus_state;
     /* private */
     libxl__ev_evtchn guest_evtchn;
     int guest_evtchn_lockfd;
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
new file mode 100644
index 0000000..05af451
--- /dev/null
+++ b/tools/libxl/libxl_remus.c
@@ -0,0 +1,41 @@
+/*
+ * Copyright (C) 2014
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+/*----- remus setup/teardown code -----*/
+
+void libxl__remus_setup_done(libxl__egc *egc,
+                             libxl__domain_suspend_state *dss,
+                             int rc)
+{
+    STATE_AO_GC(dss->ao);
+    if (!rc) {
+        libxl__domain_suspend(egc, dss);
+        return;
+    }
+
+    LOG(ERROR, "Remus: failed to setup device for guest with domid %u",
+        dss->domid);
+    domain_suspend_done(egc, dss, rc);
+}
+
+void libxl__remus_teardown_done(libxl__egc *egc,
+                                libxl__domain_suspend_state *dss)
+{
+    dss->callback(egc, dss, dss->remus_state->saved_rc);
+}
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
new file mode 100644
index 0000000..6e7d0d5
--- /dev/null
+++ b/tools/libxl/libxl_remus_device.c
@@ -0,0 +1,311 @@
+/*
+ * Copyright (C) 2014
+ * Author: Lai Jiangshan <laijs@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+#include "libxl_remus_device.h"
+
+typedef struct libxl__remus_device_state {
+    /* nic */
+    libxl_device_nic *nics;
+    int num_nics;
+
+    /* disk */
+    libxl_device_disk *disks;
+    int num_disks;
+
+    int num_devices;
+    libxl__remus_device **dev;
+    libxl_async_exec async_exec;
+    libxl__remus_state *remus_state;
+    bool setup;
+    int curr_dev_id;
+    int curr_devtype_id;
+} libxl__remus_device_state;
+
+static libxl__remus_device_type *device_types[] = {
+};
+
+static void init_async_exec(libxl__remus_device_state *dev_state,
+                            void (*finish_cb)(void *opaque, int status))
+{
+    /*
+     * The callback may touch finish_cb/opaque to do something after
+     * the script is done.
+     */
+    dev_state->async_exec.ao = dev_state->remus_state->dss->ao;
+    dev_state->async_exec.finish_cb = finish_cb;
+    dev_state->async_exec.opaque = dev_state;
+}
+
+static void libxl__remus_teardown_cleanup(libxl__egc *egc,
+                                          libxl__domain_suspend_state *dss)
+{
+    int i;
+    libxl__remus_device_type *dev_type;
+
+    /* clean device_types */
+    for (i = 0; i < ARRAY_SIZE(device_types); i++) {
+        dev_type = device_types[i];
+        dev_type->destroy(dev_type);
+    }
+
+    libxl__remus_teardown_done(egc, dss);
+}
+
+static void dev_setup_teardown_script_cb(void *opaque, int status);
+static void dev_setup_teardown_once(libxl__remus_device_state *dev_state)
+{
+    int i, rc = REMUS_OK;
+    const libxl__remus_device_type *dev_type;
+
+    /* Convenience aliases */
+    libxl__egc *egc = dev_state->remus_state->egc;
+    libxl__domain_suspend_state *dss = dev_state->remus_state->dss;
+    const bool setup = dev_state->setup;
+
+    i = dev_state->curr_dev_id;
+    for (; i < dev_state->num_devices; i++) {
+        if (!setup && !dev_state->dev[i])
+            continue;
+
+        dev_type = dev_state->dev[i]->dev_type;
+        init_async_exec(dev_state, dev_setup_teardown_script_cb);
+
+        if (setup)
+            rc = dev_type->setup(dev_state->dev[i],
+                                 &dev_state->async_exec);
+        else
+            rc = dev_type->teardown(dev_state->dev[i],
+                                    &dev_state->async_exec);
+
+        /* ignore teardown error to teardown as many devices as possible */
+        if (rc == REMUS_INPROGRESS || (setup && rc == REMUS_FAIL))
+            break;
+
+        if (!setup)
+            dev_state->dev[i] = NULL;
+    }
+
+    if (rc == REMUS_FAIL || i == dev_state->num_devices) {
+        if (setup)
+            libxl__remus_setup_done(egc, dss, rc);
+        else
+            libxl__remus_teardown_cleanup(egc, dss);
+    }
+
+    dev_state->curr_dev_id = i;
+}
+
+static void dev_setup_teardown_script_cb(void *opaque, int status)
+{
+    libxl__remus_device_state *dev_state = opaque;
+
+    /* Convenience aliases */
+    libxl__egc *egc = dev_state->remus_state->egc;
+    libxl__domain_suspend_state *dss = dev_state->remus_state->dss;
+    const bool setup = dev_state->setup;
+
+    /* ignore teardown error to teardown as many devices as possible */
+    if (status == REMUS_FAIL && setup) {
+        libxl__remus_setup_done(egc, dss, ERROR_FAIL);
+        return;
+    }
+
+    if (!setup)
+        dev_state->dev[dev_state->curr_dev_id] = NULL;
+    dev_state->curr_dev_id++;
+    dev_setup_teardown_once(dev_state);
+}
+
+static void setup_all_devices(libxl__remus_device_state *dev_state)
+{
+    dev_state->curr_dev_id = 0;
+    dev_state->setup = true;
+    dev_setup_teardown_once(dev_state);
+}
+
+static void alloc_remus_dev(libxl__remus_device_state *dev_state,
+                            libxl__remus_device **remus_dev,
+                            const void *libxl_device,
+                            const libxl__remus_device_type *dev_type)
+{
+    STATE_AO_GC(dev_state->remus_state->dss->ao);
+    libxl__remus_device *new_dev;
+    int dev_id = dev_state->curr_dev_id;
+
+    if (dev_id >= dev_state->num_nics)
+        dev_id -= dev_state->num_nics;
+    new_dev = libxl__zalloc(gc, dev_type->size);
+    new_dev->dev_id = dev_id;
+    new_dev->dev_type = dev_type;
+    new_dev->libxl_device = libxl_device;
+
+    *remus_dev = new_dev;
+}
+
+static void dev_match_script_cb(void *opaque, int status);
+static int dev_match_once(libxl__remus_device_state *dev_state)
+{
+    int i, j, rc = REMUS_OK;
+    const libxl__remus_device_type *dev_type;
+    const void *libxl_device;
+    int device_type;
+
+    /* Convenience aliases */
+    libxl__egc *egc = dev_state->remus_state->egc;
+    libxl__domain_suspend_state *dss = dev_state->remus_state->dss;
+
+    i = dev_state->curr_dev_id;
+    j = dev_state->curr_devtype_id;
+    for (; i < dev_state->num_devices; i++) {
+        if (i >= dev_state->num_nics) {
+            libxl_device = &dev_state->disks[i - dev_state->num_nics];
+            device_type = REMUS_DISK;
+        } else {
+            libxl_device = &dev_state->nics[i];
+            device_type = REMUS_NIC;
+        }
+
+        for (; j < ARRAY_SIZE(device_types); j++) {
+            dev_type = device_types[j];
+            init_async_exec(dev_state, dev_match_script_cb);
+
+            rc = dev_type->match(dev_type, libxl_device, device_type,
+                                 &dev_state->async_exec);
+            if (rc == REMUS_INPROGRESS || rc == REMUS_FAIL)
+                goto out;
+
+            if (rc == REMUS_OK) {
+                alloc_remus_dev(dev_state, &dev_state->dev[i],
+                                libxl_device, dev_type);
+                break;
+            }
+        }
+
+        if (j == ARRAY_SIZE(device_types)) {
+            /* no devtype matches with this dev */
+            rc = ERROR_FAIL;
+            goto out;
+        }
+        j = 0;
+    }
+
+out:
+    if (rc == REMUS_FAIL)
+        libxl__remus_setup_done(egc, dss, rc);
+    else if (i < dev_state->num_devices) {
+        dev_state->curr_dev_id = i;
+        dev_state->curr_devtype_id = j;
+    }
+
+    return rc;
+}
+
+static void dev_match_script_cb(void *opaque, int status)
+{
+    libxl__remus_device_state *dev_state = opaque;
+    int rc;
+    const void *libxl_device;
+
+    /* Convenience aliases */
+    int curr_devtype_id = dev_state->curr_devtype_id;
+    int curr_dev_id = dev_state->curr_dev_id;
+    const libxl__remus_device_type *dev_type = device_types[curr_devtype_id];
+    libxl__egc *egc = dev_state->remus_state->egc;
+    libxl__domain_suspend_state *dss = dev_state->remus_state->dss;
+
+    if (curr_dev_id >= dev_state->num_nics)
+        libxl_device = &dev_state->disks[curr_dev_id - dev_state->num_nics];
+    else
+        libxl_device = &dev_state->nics[curr_dev_id];
+
+    if (status == REMUS_FAIL) {
+        libxl__remus_setup_done(egc, dss, ERROR_FAIL);
+        return;
+    } else if (status == REMUS_OK) {
+        alloc_remus_dev(dev_state, &dev_state->dev[curr_dev_id],
+                        libxl_device, dev_type);
+        dev_state->curr_dev_id++;
+        dev_state->curr_devtype_id = 0;
+    } else {
+        if (++dev_state->curr_devtype_id >= ARRAY_SIZE(device_types)) {
+            /* no devtype matches with this dev */
+            libxl__remus_setup_done(egc, dss, ERROR_FAIL);
+            return;
+        }
+    }
+
+    rc = dev_match_once(dev_state);
+    if (rc)
+        return;
+
+    setup_all_devices(dev_state);
+}
+
+void libxl__remus_device_setup(libxl__egc *egc,
+                               libxl__domain_suspend_state *dss)
+{
+    int i, rc;
+    libxl__remus_device_state *dev_state = NULL;
+    libxl__remus_device_type *dev_type;
+
+    STATE_AO_GC(dss->ao);
+
+    GCNEW(dev_state);
+
+    dss->remus_state->device_state = dev_state;
+    libxl__ev_child_init(&dev_state->async_exec.child);
+    dev_state->remus_state = dss->remus_state;
+
+    for (i = 0; i < ARRAY_SIZE(device_types); i++) {
+        dev_type = device_types[i];
+        if (dev_type->init(dev_type, dss->remus_state)) {
+            libxl__remus_setup_done(egc, dss, ERROR_FAIL);
+            return;
+        }
+    }
+
+    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
+
+    GCNEW_ARRAY(dev_state->dev, dev_state->num_devices);
+
+    dev_state->curr_dev_id = 0;
+    dev_state->curr_devtype_id = 0;
+    rc = dev_match_once(dev_state);
+    if (rc)
+        return;
+
+    setup_all_devices(dev_state);
+}
+
+void libxl__remus_device_teardown(libxl__egc *egc,
+                                  libxl__domain_suspend_state *dss)
+{
+    /* Convenience aliases */
+    libxl__remus_device_state *dev_state = dss->remus_state->device_state;
+
+    if (!dev_state) {
+        libxl__remus_teardown_done(egc, dss);
+        return;
+    }
+
+    libxl__ev_child_init(&dev_state->async_exec.child);
+
+    dev_state->curr_dev_id = 0;
+    dev_state->setup = false;
+    dev_setup_teardown_once(dev_state);
+}
diff --git a/tools/libxl/libxl_remus_device.h b/tools/libxl/libxl_remus_device.h
new file mode 100644
index 0000000..d8d16ff
--- /dev/null
+++ b/tools/libxl/libxl_remus_device.h
@@ -0,0 +1,97 @@
+/*
+ * Copyright (C) 2014
+ * Author: Lai Jiangshan <laijs@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#ifndef LIBXL_REMUS_DEVICE_H
+#define LIBXL_REMUS_DEVICE_H
+
+typedef struct libxl__remus_device libxl__remus_device;
+typedef struct libxl__remus_device_type libxl__remus_device_type;
+
+enum {
+    REMUS_NIC,
+    REMUS_DISK,
+};
+
+/* Return value of the callback */
+enum {
+    REMUS_FAIL = ERROR_FAIL,    /* -3 */
+    REMUS_OK = 0,
+    REMUS_INPROGRESS,
+    REMUS_NOT_SUPPORT,
+};
+
+struct libxl__remus_device_type {
+    int (*init)(libxl__remus_device_type *self,
+                libxl__remus_state *remus_state);
+    void (*destroy)(libxl__remus_device_type *self);
+    void *data;
+
+    /*
+     * Checkpointing callbacks; they must not execute any script.
+     */
+    int (*postsuspend)(libxl__remus_device *dev);
+    int (*preresume)(libxl__remus_device *dev);
+    int (*commit)(libxl__remus_device *dev);
+
+    /*
+     * libxl_device:
+     *   REMUS_NIC: libxl_device_nic
+     *  REMUS_DISK: libxl_device_disk
+     *
+     * Return value:
+     *   2: the device is not this type
+     *   1: the script is still running
+     *   0: the device is this type
+     *  -3: error
+     *
+     * If the callback executes a script, pass the return value via finish_cb.
+     */
+    int (*match)(const libxl__remus_device_type *self,
+                 const void *libxl_device, int device_type,
+                 libxl_async_exec *async_exec);
+
+    /*
+     * Return value:
+     *   1: the script is still running
+     *   0: no script is executed
+     *  -3: error
+     *
+     * If the callback executes a script, pass the return value via finish_cb.
+     */
+    int (*setup)(libxl__remus_device *remus_dev,
+                 libxl_async_exec *async_exec);
+
+    /*
+     * Return value:
+     *   1: the script is still running
+     *   0: no script is executed
+     *  -3: error
+     *
+     * If the callback executes a script, pass the return value via finish_cb.
+     */
+    int (*teardown)(libxl__remus_device *remus_dev,
+                    libxl_async_exec *async_exec);
+
+    /* the size of libxl__remus_device */
+    int size;
+};
+
+struct libxl__remus_device {
+    int dev_id;
+    const void *libxl_device;
+    const libxl__remus_device_type *dev_type;
+};
+
+#endif
-- 
1.8.3.2
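
A note on the match protocol documented in libxl_remus_device.h above:
the following is a minimal, synchronous sketch of how a caller might walk
the device-type table, not the patch's code. The names sketch_match_fn,
sketch_find_type and the two sample matchers are hypothetical, the
asynchronous REMUS_INPROGRESS path is left out, and the enum values only
mirror the header (with ERROR_FAIL assumed to be -3, as the header
comment states).

```c
#include <assert.h>
#include <stddef.h>

/* Values mirror the enum in libxl_remus_device.h; -3 is assumed from the
 * header comment on REMUS_FAIL. */
enum { SK_FAIL = -3, SK_OK = 0, SK_INPROGRESS = 1, SK_NOT_SUPPORT = 2 };
enum { SK_NIC, SK_DISK };

/* Hypothetical, simplified match callback: no async_exec argument. */
typedef int (*sketch_match_fn)(int device_kind);

int sketch_nic_match(int device_kind)
{
    return device_kind == SK_NIC ? SK_OK : SK_NOT_SUPPORT;
}

int sketch_broken_match(int device_kind)
{
    (void)device_kind;
    return SK_FAIL;
}

/*
 * Try each type in order. Returns the index of the first type that
 * claims the device. Returns SK_FAIL if a matcher reports an error or if
 * no type claims the device (the patch fails setup in both cases).
 */
int sketch_find_type(const sketch_match_fn *types, size_t ntypes,
                     int device_kind)
{
    size_t j;

    assert(types != NULL || ntypes == 0);

    for (j = 0; j < ntypes; j++) {
        int rc = types[j](device_kind);

        if (rc == SK_OK)
            return (int)j;
        if (rc == SK_FAIL)
            return SK_FAIL;
        /* SK_NOT_SUPPORT: fall through to the next type. */
    }
    return SK_FAIL;
}
```

In the patch itself the same walk is resumable: curr_dev_id and
curr_devtype_id record where to continue once a script started by a
match callback reports back through finish_cb.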


* [PATCH V9 06/12] remus: implement the API for checkpoint
  2014-04-15  5:38 [PATCH V9 00/12] Remus/Libxl: Network buffering support Yang Hongyang
                   ` (4 preceding siblings ...)
  2014-04-15  5:38 ` [PATCH V9 05/12] remus: remus device core and APIs to setup/teardown Yang Hongyang
@ 2014-04-15  5:38 ` Yang Hongyang
  2014-04-23 16:04   ` Ian Jackson
  2014-04-15  5:38 ` [PATCH V9 07/12] remus: Remus network buffering core and APIs to setup/teardown Yang Hongyang
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 89+ messages in thread
From: Yang Hongyang @ 2014-04-15  5:38 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

---
 tools/libxl/libxl_internal.h     |  3 +++
 tools/libxl/libxl_remus_device.c | 54 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 421ae24..14094aa 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2485,6 +2485,9 @@ _hidden void libxl__remus_teardown_done(libxl__egc *egc,
 
 _hidden void libxl__remus_device_teardown(libxl__egc *egc,
                                           libxl__domain_suspend_state *dss);
+_hidden int libxl__remus_device_postsuspend(libxl__remus_state *remus_state);
+_hidden int libxl__remus_device_preresume(libxl__remus_state *remus_state);
+_hidden int libxl__remus_device_commit(libxl__remus_state *remus_state);
 
 struct libxl__domain_suspend_state {
     /* set by caller of libxl__domain_suspend */
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
index 6e7d0d5..bfe6080 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_remus_device.c
@@ -39,6 +39,60 @@ typedef struct libxl__remus_device_state {
 static libxl__remus_device_type *device_types[] = {
 };
 
+int libxl__remus_device_postsuspend(libxl__remus_state *remus_state)
+{
+    int i;
+    int rc = 0;
+    libxl__remus_device *remus_dev;
+
+    /* Convenience aliases */
+    libxl__remus_device_state *dev_state = remus_state->device_state;
+
+    for (i = 0; rc == 0 && i < dev_state->num_devices; i++) {
+        remus_dev = dev_state->dev[i];
+        if (remus_dev->dev_type->postsuspend)
+            rc = remus_dev->dev_type->postsuspend(remus_dev);
+    }
+
+    return rc;
+}
+
+int libxl__remus_device_preresume(libxl__remus_state *remus_state)
+{
+    int i;
+    int rc = 0;
+    libxl__remus_device *remus_dev;
+
+    /* Convenience aliases */
+    libxl__remus_device_state *dev_state = remus_state->device_state;
+
+    for (i = 0; rc == 0 && i < dev_state->num_devices; i++) {
+        remus_dev = dev_state->dev[i];
+        if (remus_dev->dev_type->preresume)
+            rc = remus_dev->dev_type->preresume(remus_dev);
+    }
+
+    return rc;
+}
+
+int libxl__remus_device_commit(libxl__remus_state *remus_state)
+{
+    int i;
+    int rc = 0;
+    libxl__remus_device *remus_dev;
+
+    /* Convenience aliases */
+    libxl__remus_device_state *dev_state = remus_state->device_state;
+
+    for (i = 0; rc == 0 && i < dev_state->num_devices; i++) {
+        remus_dev = dev_state->dev[i];
+        if (remus_dev->dev_type->commit)
+            rc = remus_dev->dev_type->commit(remus_dev);
+    }
+
+    return rc;
+}
+
 static void init_async_exec(libxl__remus_device_state *dev_state,
                             void (*finish_cb)(void *opaque, int status))
 {
-- 
1.8.3.2
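
The three checkpoint entry points above share one loop shape: visit each
device in order, skip device types that leave the callback NULL, and stop
at the first nonzero return. A minimal stand-alone sketch of that pattern
follows. The names sketch_dev, sketch_run_phase and the sample callbacks
are hypothetical and not part of libxl; the callback pointer is stored on
the device here only to keep the example short, whereas the patch reads
it from dev_type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for libxl__remus_device plus one phase callback. */
struct sketch_dev {
    int id;
    int (*phase_cb)(struct sketch_dev *dev);  /* may be NULL */
    int calls;                                /* how often phase_cb ran */
};

int sketch_cb_ok(struct sketch_dev *dev)
{
    dev->calls++;
    return 0;
}

int sketch_cb_fail(struct sketch_dev *dev)
{
    dev->calls++;
    return -3;
}

/* Run one checkpoint phase over all devices; stop at the first error. */
int sketch_run_phase(struct sketch_dev **devs, size_t ndevs)
{
    int rc = 0;
    size_t i;

    assert(devs != NULL || ndevs == 0);

    for (i = 0; rc == 0 && i < ndevs; i++) {
        if (devs[i]->phase_cb)
            rc = devs[i]->phase_cb(devs[i]);
    }
    return rc;
}
```

One consequence of this shape is that devices after the failing one are
never called, and the loop itself does not undo work already done on the
earlier devices.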


* [PATCH V9 07/12] remus: Remus network buffering core and APIs to setup/teardown
  2014-04-15  5:38 [PATCH V9 00/12] Remus/Libxl: Network buffering support Yang Hongyang
                   ` (5 preceding siblings ...)
  2014-04-15  5:38 ` [PATCH V9 06/12] remus: implement the API for checkpoint Yang Hongyang
@ 2014-04-15  5:38 ` Yang Hongyang
  2014-04-15  5:38 ` [PATCH V9 08/12] remus: implement the API to buffer/release packages Yang Hongyang
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 89+ messages in thread
From: Yang Hongyang @ 2014-04-15  5:38 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

From: Shriram Rajagopalan <rshriram@cs.ubc.ca>

1. Add two members to libxl_domain_remus_info:
    netbuf: whether netbuf is enabled
    netbufscript: the path of the script which will be run to set up
       and tear down the guest's interface.
2. Introduce the remus-netbuf-setup hotplug script, responsible for
  setting up and tearing down the necessary infrastructure required for
  network output buffering in Remus.  This script is intended to be invoked
  by libxl for each guest interface, when starting or stopping Remus.

  Apart from returning success/failure indication via the usual hotplug
  entries in xenstore, this script also writes to xenstore, the name of
  the IFB device to be used to control the vif's network output.

  The script relies on libnl3 command line utilities to perform various
  setup/teardown functions. The script is confined to Linux platforms only
  since NetBSD does not seem to have libnl3.

  The following steps are taken during init:
    a) establish a dedicated remus context containing libnl related
       state (netlink sockets, qdisc caches, etc.)

  The following steps are taken for each vif during setup:
    a) Call the hotplug script to set up its network buffer

    b) Obtain handles to plug qdiscs installed on the IFB devices
       chosen by the hotplug scripts.

  And during teardown, the netlink resources are released, followed by
  invocation of hotplug scripts to remove the ifb devices.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
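
The commit message says the script reports the chosen IFB device through
xenstore. As a rough illustration of where that entry lives, the sketch
below composes the path documented in xenstore-paths.markdown,
/libxl/$DOMID/remus/netbuf/$DEVID/ifb. The function sketch_ifb_path is a
hypothetical helper using plain snprintf; the patch builds the same
string with GCSPRINTF and libxl__xs_libxl_path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>   /* strcmp, for callers checking the result */

/*
 * Write "/libxl/<domid>/remus/netbuf/<devid>/ifb" into buf.
 * Returns 0 on success, -1 if the buffer is too small or formatting fails.
 */
int sketch_ifb_path(char *buf, size_t len, unsigned int domid, int devid)
{
    int n;

    assert(buf != NULL && len > 0);

    n = snprintf(buf, len, "/libxl/%u/remus/netbuf/%d/ifb", domid, devid);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```

For domain 5, device 0, this yields /libxl/5/remus/netbuf/0/ifb, the key
the setup callback reads after the script exits.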
 docs/misc/xenstore-paths.markdown      |   4 +
 tools/hotplug/Linux/Makefile           |   1 +
 tools/hotplug/Linux/remus-netbuf-setup | 183 +++++++++++++++
 tools/libxl/libxl.c                    |  17 ++
 tools/libxl/libxl.h                    |  13 ++
 tools/libxl/libxl_internal.h           |   2 +
 tools/libxl/libxl_netbuffer.c          | 403 +++++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c        |  23 ++
 tools/libxl/libxl_remus_device.c       |  22 +-
 tools/libxl/libxl_remus_device.h       |   2 +
 tools/libxl/libxl_types.idl            |   2 +
 11 files changed, 670 insertions(+), 2 deletions(-)
 create mode 100644 tools/hotplug/Linux/remus-netbuf-setup

diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown
index 70ab7f4..039eaea 100644
--- a/docs/misc/xenstore-paths.markdown
+++ b/docs/misc/xenstore-paths.markdown
@@ -385,6 +385,10 @@ The guest's virtual time offset from UTC in seconds.
 
 The device model version for a domain.
 
+#### /libxl/$DOMID/remus/netbuf/$DEVID/ifb = STRING [n,INTERNAL]
+
+ifb device used by Remus to buffer network output from the associated vif.
+
 [BLKIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,blkif.h.html
 [FBIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,fbif.h.html
 [HVMPARAMS]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,hvm,params.h.html
diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index a14cb42..baaaa41 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -15,6 +15,7 @@ XEN_SCRIPTS += network-nat vif-nat
 XEN_SCRIPTS += vif-openvswitch
 XEN_SCRIPTS += vif2
 XEN_SCRIPTS += vif-setup
+XEN_SCRIPTS-$(CONFIG_REMUS_NETBUF) += remus-netbuf-setup
 XEN_SCRIPTS += block
 XEN_SCRIPTS += block-enbd block-nbd
 XEN_SCRIPTS-$(CONFIG_BLKTAP1) += blktap
diff --git a/tools/hotplug/Linux/remus-netbuf-setup b/tools/hotplug/Linux/remus-netbuf-setup
new file mode 100644
index 0000000..aed2583
--- /dev/null
+++ b/tools/hotplug/Linux/remus-netbuf-setup
@@ -0,0 +1,183 @@
+#!/bin/bash
+#============================================================================
+# ${XEN_SCRIPT_DIR}/remus-netbuf-setup
+#
+# Script for attaching a network buffer to the specified vif (in any mode).
+# The hotplugging system will call this script when starting remus via libxl
+# API, libxl_domain_remus_start.
+#
+# Usage:
+# remus-netbuf-setup (setup|teardown)
+#
+# Environment vars:
+# vifname     vif interface name (required).
+# XENBUS_PATH path in Xenstore, where the IFB device details will be stored
+#                      or read from (required).
+#             (libxl passes /libxl/<domid>/remus/netbuf/<devid>)
+# IFB         ifb interface to be cleaned up (required). [for teardown op only]
+
+# Written to the store: (setup operation)
+# XENBUS_PATH/ifb=<ifbdevName> the IFB device serving
+#  as the intermediate buffer through which the interface's network output
+#  can be controlled.
+#
+# To install a network buffer on a guest vif (vif1.0) using ifb (ifb0)
+# we need to do the following
+#
+#  ip link set dev ifb0 up
+#  tc qdisc add dev vif1.0 ingress
+#  tc filter add dev vif1.0 parent ffff: proto ip \
+#    prio 10 u32 match u32 0 0 action mirred egress redirect dev ifb0
+#  nl-qdisc-add --dev=ifb0 --parent root plug
+#  nl-qdisc-add --dev=ifb0 --parent root --update plug --limit=10000000
+#                                                (10MB limit on buffer)
+#
+# So order of operations when installing a network buffer on vif1.0
+# 1. find a free ifb and bring up the device
+# 2. redirect traffic from vif1.0 to ifb:
+#   2.1 add ingress qdisc to vif1.0 (to capture outgoing packets from guest)
+#   2.2 use tc filter command with actions mirred egress + redirect
+# 3. install plug_qdisc on ifb device, with which we can buffer/release
+#    guest's network output from vif1.0
+#
+#
+
+#============================================================================
+
+# Unlike other vif scripts, vif-common is not needed here as it executes vif
+# specific setup code such as renaming.
+dir=$(dirname "$0")
+. "$dir/xen-hotplug-common.sh"
+
+findCommand "$@"
+
+if [ "$command" != "setup" -a  "$command" != "teardown" ]
+then
+  echo "Invalid command: $command"
+  log err "Invalid command: $command"
+  exit 1
+fi
+
+evalVariables "$@"
+
+: ${vifname:?}
+: ${XENBUS_PATH:?}
+
+check_libnl_tools() {
+    if ! command -v nl-qdisc-list > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-list tool"
+    fi
+    if ! command -v nl-qdisc-add > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-add tool"
+    fi
+    if ! command -v nl-qdisc-delete > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-delete tool"
+    fi
+}
+
+# We only check for modules. We don't load them.
+# User/Admin is supposed to load ifb during boot time,
+# ensuring that there are enough free ifbs in the system.
+# Other modules will be loaded automatically by tc commands.
+check_modules() {
+    for m in ifb sch_plug sch_ingress act_mirred cls_u32
+    do
+        if ! modinfo $m > /dev/null 2>&1; then
+            fatal "Unable to find $m kernel module"
+        fi
+    done
+}
+
+setup_ifb() {
+
+    for ifb in `ifconfig -a -s|egrep ^ifb|cut -d ' ' -f1`
+    do
+        local installed=`nl-qdisc-list -d $ifb`
+        [ -n "$installed" ] && continue
+        IFB="$ifb"
+        break
+    done
+
+    if [ -z "$IFB" ]
+    then
+        fatal "Unable to find a free IFB device for $vifname"
+    fi
+
+    do_or_die ip link set dev "$IFB" up
+}
+
+redirect_vif_traffic() {
+    local vif=$1
+    local ifb=$2
+
+    do_or_die tc qdisc add dev "$vif" ingress
+
+    tc filter add dev "$vif" parent ffff: proto ip prio 10 \
+        u32 match u32 0 0 action mirred egress redirect dev "$ifb" >/dev/null 2>&1
+
+    if [ $? -ne 0 ]
+    then
+        do_without_error tc qdisc del dev "$vif" ingress
+        fatal "Failed to redirect traffic from $vif to $ifb"
+    fi
+}
+
+add_plug_qdisc() {
+    local vif=$1
+    local ifb=$2
+
+    nl-qdisc-add --dev="$ifb" --parent root plug >/dev/null 2>&1
+    if [ $? -ne 0 ]
+    then
+        do_without_error tc qdisc del dev "$vif" ingress
+        fatal "Failed to add plug qdisc to $ifb"
+    fi
+
+    # Set the ifb buffering limit in bytes. It's okay if this command fails.
+    nl-qdisc-add --dev="$ifb" --parent root \
+        --update plug --limit=10000000 >/dev/null 2>&1 || true
+}
+
+teardown_netbuf() {
+    local vif=$1
+    local ifb=$2
+
+    if [ "$ifb" ]; then
+        do_without_error ip link set dev "$ifb" down
+        do_without_error nl-qdisc-delete --dev="$ifb" --parent root plug >/dev/null 2>&1
+        xenstore-rm -t "$XENBUS_PATH/ifb" 2>/dev/null || true
+    fi
+    do_without_error tc qdisc del dev "$vif" ingress
+    xenstore-rm -t "$XENBUS_PATH/hotplug-status" 2>/dev/null || true
+    xenstore-rm -t "$XENBUS_PATH/hotplug-error" 2>/dev/null || true
+}
+
+xs_write_failed() {
+    local vif=$1
+    local ifb=$2
+    teardown_netbuf "$vifname" "$IFB"
+    fatal "failed to write ifb name to xenstore"
+}
+
+case "$command" in
+    setup)
+        check_libnl_tools
+        check_modules
+
+        claim_lock "pickifb"
+        setup_ifb
+        redirect_vif_traffic "$vifname" "$IFB"
+        add_plug_qdisc "$vifname" "$IFB"
+        release_lock "pickifb"
+
+        # Not using xenstore_write, which exits automatically on error,
+        # because we need to clean up.
+        _xenstore_write "$XENBUS_PATH/ifb" "$IFB" || xs_write_failed "$vifname" "$IFB"
+        success
+        ;;
+    teardown)
+        teardown_netbuf "$vifname" "$IFB"
+        ;;
+esac
+
+log debug "Successful remus-netbuf-setup $command for $vifname, ifb $IFB."
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index e3eca6e..687a2a9 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -748,6 +748,23 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     remus_state->dss = dss;
     remus_state->egc = egc;
 
+    /* Setup network buffering */
+    if (info->netbuf) {
+        if (!libxl__netbuffer_enabled(gc)) {
+            LOG(ERROR, "Remus: No support for network buffering");
+            goto out;
+        }
+
+        if (info->netbufscript) {
+            remus_state->netbufscript =
+                libxl__strdup(gc, info->netbufscript);
+        } else {
+            remus_state->netbufscript =
+                GCSPRINTF("%s/remus-netbuf-setup",
+                libxl__xen_script_dir_path());
+        }
+    }
+
     /* Point of no return */
     libxl__domain_suspend(egc, dss);
     return AO_INPROGRESS;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index b2c3015..62f7dd4 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -410,6 +410,19 @@
 #define LIBXL_HAVE_DRIVER_DOMAIN_CREATION 1
 
 /*
+ * LIBXL_HAVE_REMUS_NETBUF 1
+ *
+ * If this is defined, then the libxl_domain_remus_info structure will
+ * have a boolean field (netbuf) and a string field (netbufscript).
+ *
+ * netbuf, if true, indicates that network buffering should be enabled.
+ *
+ * netbufscript, if set, indicates the path to the hotplug script to
+ * set up or tear down network buffers.
+ */
+#define LIBXL_HAVE_REMUS_NETBUF 1
+
+/*
  * LIBXL_HAVE_SIGCHLD_SELECTIVE_REAP
  *
  * If this is defined:
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 14094aa..b72643b 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2458,6 +2458,8 @@ typedef struct libxl__logdirty_switch {
 } libxl__logdirty_switch;
 
 typedef struct libxl__remus_state {
+    /* Script to setup/teardown network buffers */
+    const char *netbufscript;
     libxl__domain_suspend_state *dss;
     libxl__egc *egc;
 
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 8e23d75..a5f2b9a 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -16,12 +16,415 @@
 #include "libxl_osdeps.h" /* must come before any other headers */
 
 #include "libxl_internal.h"
+#include "libxl_remus_device.h"
+
+#include <netlink/cache.h>
+#include <netlink/socket.h>
+#include <netlink/attr.h>
+#include <netlink/route/link.h>
+#include <netlink/route/route.h>
+#include <netlink/route/qdisc.h>
+#include <netlink/route/qdisc/plug.h>
+
+typedef struct libxl__remus_netbuf_state {
+    struct nl_sock *nlsock;
+    struct nl_cache *qdisc_cache;
+
+    const char *netbufscript;
+    uint32_t domid;
+    libxl__ao *ao;
+} libxl__remus_netbuf_state;
+
+typedef struct libxl__remus_device_nic {
+    libxl__remus_device remus_dev;
+    const char *vif;
+    const char *ifb;
+    struct rtnl_qdisc *qdisc;
+
+    void *saved_opaque;
+    void (*saved_finish_cb)(void *opaque, int status);
+} libxl__remus_device_nic;
 
 int libxl__netbuffer_enabled(libxl__gc *gc)
 {
     return 1;
 }
 
+/* If the device has a vifname, then use that instead of
+ * the vifX.Y format.
+ */
+static const char *get_vifname(libxl__remus_device_nic *remus_nic,
+                               const libxl_device_nic *nic)
+{
+    libxl__remus_netbuf_state *netbuf_state =
+        remus_nic->remus_dev.dev_type->data;
+    const char *vifname = NULL;
+    const char *path;
+    int rc;
+
+    STATE_AO_GC(netbuf_state->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = netbuf_state->domid;
+
+    path = libxl__sprintf(gc, "%s/backend/vif/%d/%d/vifname",
+                          libxl__xs_get_dompath(gc, 0), domid, nic->devid);
+    rc = libxl__xs_read_checked(gc, XBT_NULL, path, &vifname);
+    if (!rc && !vifname) {
+        /* use the default name */
+        vifname = libxl__device_nic_devname(gc, domid,
+                                            nic->devid,
+                                            nic->nictype);
+    }
+
+    return vifname;
+}
+
+static void free_qdisc(libxl__remus_device_nic *remus_nic)
+{
+    /* free qdiscs */
+    if (remus_nic->qdisc == NULL)
+        return;
+
+    nl_object_put((struct nl_object *)(remus_nic->qdisc));
+    remus_nic->qdisc = NULL;
+}
+
+static int init_qdisc(libxl__remus_netbuf_state *netbuf_state,
+                      libxl__remus_device_nic *remus_nic)
+{
+    int ret, ifindex;
+    struct rtnl_link *ifb = NULL;
+    struct rtnl_qdisc *qdisc = NULL;
+
+    STATE_AO_GC(netbuf_state->ao);
+
+    /* Now that we have brought up the IFB device with a plug qdisc for
+     * this vif, we need to refill the qdisc cache.
+     */
+    ret = nl_cache_refill(netbuf_state->nlsock, netbuf_state->qdisc_cache);
+    if (ret < 0) {
+        LOG(ERROR, "cannot refill qdisc cache");
+        goto out;
+    }
+
+    /* get a handle to the IFB interface */
+    ifb = NULL;
+    ret = rtnl_link_get_kernel(netbuf_state->nlsock, 0,
+                               remus_nic->ifb, &ifb);
+    if (ret) {
+        LOG(ERROR, "cannot obtain handle for %s: %s", remus_nic->ifb,
+            nl_geterror(ret));
+        ret = REMUS_FAIL;
+        goto out;
+    }
+
+    ret = REMUS_FAIL;
+    ifindex = rtnl_link_get_ifindex(ifb);
+    if (!ifindex) {
+        LOG(ERROR, "interface %s has no index", remus_nic->ifb);
+        goto out;
+    }
+
+    /* Get a reference to the root qdisc installed on the IFB, by
+     * querying the qdisc list we obtained earlier. The netbufscript
+     * sets up the plug qdisc as the root qdisc, so we don't have to
+     * search the entire qdisc tree on the IFB dev.
+     *
+     * There is no need to explicitly free this qdisc as it's just a
+     * reference from the qdisc cache we allocated earlier.
+     */
+    qdisc = rtnl_qdisc_get_by_parent(netbuf_state->qdisc_cache, ifindex,
+                                     TC_H_ROOT);
+
+    if (qdisc) {
+        const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
+        /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
+        if (!tc_kind || strcmp(tc_kind, "plug")) {
+            nl_object_put((struct nl_object *)qdisc);
+            LOG(ERROR, "plug qdisc is not installed on %s", remus_nic->ifb);
+            goto out;
+        }
+        remus_nic->qdisc = qdisc;
+        ret = REMUS_OK;
+    } else {
+        LOG(ERROR, "Cannot get qdisc handle from ifb %s", remus_nic->ifb);
+    }
+
+out:
+    if (ifb)
+        rtnl_link_put(ifb);
+
+    return ret;
+}
+
+/*
+ * In return, the script writes the name of IFB device (during setup) to be
+ * used for output buffering into XENBUS_PATH/ifb
+ */
+static void netbuf_setup_script_cb(void *opaque, int status)
+{
+    libxl__remus_device_nic *remus_nic = opaque;
+    libxl__remus_netbuf_state *netbuf_state =
+        remus_nic->remus_dev.dev_type->data;
+    const char *out_path_base, *hotplug_error = NULL;
+    int rc;
+
+    /* Convenience aliases */
+    const uint32_t domid = netbuf_state->domid;
+    const int devid = remus_nic->remus_dev.dev_id;
+    const char *const vif = remus_nic->vif;
+    const char **const ifb = &remus_nic->ifb;
+
+    STATE_AO_GC(netbuf_state->ao);
+
+    if (status) {
+        rc = REMUS_FAIL;
+        goto out;
+    }
+
+    out_path_base = GCSPRINTF("%s/remus/netbuf/%d",
+                              libxl__xs_libxl_path(gc, domid), devid);
+
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/hotplug-error", out_path_base),
+                                &hotplug_error);
+    if (rc) {
+        rc = REMUS_FAIL;
+        goto out;
+    }
+
+    if (hotplug_error) {
+        LOG(ERROR, "netbuf script %s setup failed for vif %s: %s",
+            netbuf_state->netbufscript, vif, hotplug_error);
+        rc = REMUS_FAIL;
+        goto out;
+    }
+
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/remus/netbuf/%d/ifb",
+                                          libxl__xs_libxl_path(gc, domid),
+                                          devid),
+                                ifb);
+    if (rc) {
+        rc = REMUS_FAIL;
+        goto out;
+    }
+
+    if (!(*ifb)) {
+        LOG(ERROR, "Cannot get ifb dev name for domain %u dev %s",
+            domid, vif);
+        rc = REMUS_FAIL;
+        goto out;
+    }
+
+    LOG(DEBUG, "%s will buffer packets from vif %s", *ifb, vif);
+    rc = init_qdisc(netbuf_state, remus_nic);
+
+out:
+    remus_nic->saved_finish_cb(remus_nic->saved_opaque, rc);
+}
+
+static void netbuf_teardown_script_cb(void *opaque, int status)
+{
+    int rc;
+    libxl__remus_device_nic *remus_nic = opaque;
+
+    if (status)
+        rc = REMUS_FAIL;
+    else
+        rc = REMUS_OK;
+
+    free_qdisc(remus_nic);
+
+    remus_nic->saved_finish_cb(remus_nic->saved_opaque, rc);
+}
+
+/* the script needs the following env & args
+ * $vifname
+ * $XENBUS_PATH (/libxl/<domid>/remus/netbuf/<devid>/)
+ * $IFB (for teardown)
+ * setup/teardown as command line arg.
+ */
+static void setup_async_exec(libxl_async_exec *async_exec, char *op,
+                             libxl__remus_device_nic *remus_nic)
+{
+    int arraysize, nr = 0;
+    char **env = NULL, **args = NULL;
+    libxl__remus_netbuf_state *netbuf_state =
+        remus_nic->remus_dev.dev_type->data;
+    STATE_AO_GC(netbuf_state->ao);
+
+    /* Convenience aliases */
+    char *const script = libxl__strdup(gc, netbuf_state->netbufscript);
+    const uint32_t domid = netbuf_state->domid;
+    const int dev_id = remus_nic->remus_dev.dev_id;
+    const char *const vif = remus_nic->vif;
+    const char *const ifb = remus_nic->ifb;
+
+    arraysize = 7;
+    GCNEW_ARRAY(env, arraysize);
+    env[nr++] = "vifname";
+    env[nr++] = libxl__strdup(gc, vif);
+    env[nr++] = "XENBUS_PATH";
+    env[nr++] = GCSPRINTF("%s/remus/netbuf/%d",
+                          libxl__xs_libxl_path(gc, domid), dev_id);
+    if (!strcmp(op, "teardown") && ifb) {
+        env[nr++] = "IFB";
+        env[nr++] = libxl__strdup(gc, ifb);
+    }
+    env[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    arraysize = 3; nr = 0;
+    GCNEW_ARRAY(args, arraysize);
+    args[nr++] = script;
+    args[nr++] = op;
+    args[nr++] = NULL;
+    assert(nr == arraysize);
+
+    async_exec->env = env;
+    async_exec->args = args;
+    async_exec->timeout = LIBXL_HOTPLUG_TIMEOUT;
+
+    async_exec->stdinfd = -1;
+    async_exec->stdoutfd = -1;
+    async_exec->stderrfd = -1;
+    async_exec->allow_fail = false;
+
+    remus_nic->saved_opaque = async_exec->opaque;
+    remus_nic->saved_finish_cb = async_exec->finish_cb;
+    async_exec->opaque = remus_nic;
+    if (!strcmp(op, "teardown"))
+        async_exec->finish_cb = netbuf_teardown_script_cb;
+    else
+        async_exec->finish_cb = netbuf_setup_script_cb;
+}
+
+static int nic_init(libxl__remus_device_type *self,
+                    libxl__remus_state *remus_state)
+{
+    int ret;
+    libxl__remus_netbuf_state *netbuf_state;
+
+    STATE_AO_GC(remus_state->dss->ao);
+
+    if (!remus_state->netbufscript)
+        return REMUS_OK;
+
+    GCNEW(netbuf_state);
+    self->data = netbuf_state;
+
+    netbuf_state->nlsock = nl_socket_alloc();
+    if (!netbuf_state->nlsock) {
+        LOG(ERROR, "cannot allocate nl socket");
+        return REMUS_FAIL;
+    }
+
+    ret = nl_connect(netbuf_state->nlsock, NETLINK_ROUTE);
+    if (ret) {
+        LOG(ERROR, "failed to open netlink socket: %s",
+            nl_geterror(ret));
+        return REMUS_FAIL;
+    }
+
+    /* get list of all qdiscs installed on network devs. */
+    ret = rtnl_qdisc_alloc_cache(netbuf_state->nlsock,
+                                 &netbuf_state->qdisc_cache);
+    if (ret) {
+        LOG(ERROR, "failed to allocate qdisc cache: %s",
+            nl_geterror(ret));
+        return REMUS_FAIL;
+    }
+
+    netbuf_state->domid = remus_state->dss->domid;
+    netbuf_state->netbufscript = remus_state->netbufscript;
+    netbuf_state->ao = remus_state->dss->ao;
+
+    return REMUS_OK;
+}
+
+static void nic_destroy(libxl__remus_device_type *self)
+{
+    libxl__remus_netbuf_state *netbuf_state = self->data;
+
+    if (!self->data)
+        return;
+
+    /* free qdisc cache */
+    if (netbuf_state->qdisc_cache) {
+        nl_cache_clear(netbuf_state->qdisc_cache);
+        nl_cache_free(netbuf_state->qdisc_cache);
+        netbuf_state->qdisc_cache = NULL;
+    }
+
+    /* close & free nlsock */
+    if (netbuf_state->nlsock) {
+        nl_close(netbuf_state->nlsock);
+        nl_socket_free(netbuf_state->nlsock);
+        netbuf_state->nlsock = NULL;
+    }
+}
+
+static int nic_match(const libxl__remus_device_type *self,
+                     const void *libxl_device, int device_type,
+                     libxl_async_exec *async_exec)
+{
+    if (device_type == REMUS_NIC)
+        return REMUS_OK;
+
+    return REMUS_NOT_SUPPORT;
+}
+
+static int nic_setup(libxl__remus_device *remus_dev,
+                     libxl_async_exec *async_exec)
+{
+    libxl__remus_device_nic *remus_nic =
+        CONTAINER_OF(remus_dev, *remus_nic, remus_dev);
+    libxl__remus_netbuf_state *netbuf_state = remus_dev->dev_type->data;
+    const libxl_device_nic *nic = remus_nic->remus_dev.libxl_device;
+
+    STATE_AO_GC(netbuf_state->ao);
+
+    remus_nic->vif = get_vifname(remus_nic, nic);
+
+    setup_async_exec(async_exec, "setup", remus_nic);
+    if (libxl_async_exec_script(gc, async_exec))
+        return REMUS_FAIL;
+
+    return REMUS_INPROGRESS;
+}
+
+/* Note: This function will be called in the same gc context as
+ * libxl__remus_netbuf_setup, created during the libxl_domain_remus_start
+ * API call.
+ */
+static int nic_teardown(libxl__remus_device *remus_dev,
+                        libxl_async_exec *async_exec)
+{
+    libxl__remus_device_nic *remus_nic =
+        CONTAINER_OF(remus_dev, *remus_nic, remus_dev);
+    libxl__remus_netbuf_state *netbuf_state = remus_dev->dev_type->data;
+
+    STATE_AO_GC(netbuf_state->ao);
+
+    setup_async_exec(async_exec, "teardown", remus_nic);
+
+    if (libxl_async_exec_script(gc, async_exec))
+        return REMUS_FAIL;
+
+    return REMUS_INPROGRESS;
+}
+
+libxl__remus_device_type remus_device_nic = {
+    .init = nic_init,
+    .destroy = nic_destroy,
+    .match = nic_match,
+    .setup = nic_setup,
+    .teardown = nic_teardown,
+    .size = sizeof(libxl__remus_device_nic),
+};
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
index 6aa4bf1..4ee24e1 100644
--- a/tools/libxl/libxl_nonetbuffer.c
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -22,6 +22,29 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
     return 0;
 }
 
+static int nic_match(const void *libxl_device, int device_type,
+                     libxl_async_exec *async_exec)
+{
+    return REMUS_NOT_SUPPORT;
+}
+
+static int nic_init(libxl__remus_device_type *self,
+                    libxl__remus_state *remus_state)
+{
+    return REMUS_OK;
+}
+
+static void nic_destroy(libxl__remus_device_type *self)
+{
+    return;
+}
+
+libxl__remus_device_type remus_device_nic = {
+    .init = nic_init,
+    .destroy = nic_destroy,
+    .match = nic_match,
+};
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
index bfe6080..1564bd1 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_remus_device.c
@@ -37,6 +37,7 @@ typedef struct libxl__remus_device_state {
 } libxl__remus_device_state;
 
 static libxl__remus_device_type *device_types[] = {
+    &remus_device_nic,
 };
 
 int libxl__remus_device_postsuspend(libxl__remus_state *remus_state)
@@ -111,6 +112,16 @@ static void libxl__remus_teardown_cleanup(libxl__egc *egc,
     int i;
     libxl__remus_device_type *dev_type;
 
+    /* Convenience aliases */
+    libxl__remus_device_state *dev_state = dss->remus_state->device_state;
+
+    /* clean nic */
+    for (i = 0; i < dev_state->num_nics; i++)
+        libxl_device_nic_dispose(&dev_state->nics[i]);
+    free(dev_state->nics);
+    dev_state->nics = NULL;
+    dev_state->num_nics = 0;
+
     /* clean device_types */
     for (i = 0; i < ARRAY_SIZE(device_types); i++) {
         dev_type = device_types[i];
@@ -313,7 +324,7 @@ static void dev_match_script_cb(void *opaque, int status)
 void libxl__remus_device_setup(libxl__egc *egc,
                                libxl__domain_suspend_state *dss)
 {
-    int i, rc;
+    int i, rc, num_devices;
     libxl__remus_device_state *dev_state = NULL;
     libxl__remus_device_type *dev_type;
 
@@ -333,7 +344,14 @@ void libxl__remus_device_setup(libxl__egc *egc,
         }
     }
 
-    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
+    /* nic */
+    if (dss->remus_state->netbufscript) {
+        dev_state->nics = libxl_device_nic_list(CTX, dss->domid, &num_devices);
+        dev_state->num_nics = num_devices;
+        dev_state->num_devices += num_devices;
+    }
+
+    /* TBD: enable disk buffering */
 
     GCNEW_ARRAY(dev_state->dev, dev_state->num_devices);
 
diff --git a/tools/libxl/libxl_remus_device.h b/tools/libxl/libxl_remus_device.h
index d8d16ff..30b368d 100644
--- a/tools/libxl/libxl_remus_device.h
+++ b/tools/libxl/libxl_remus_device.h
@@ -94,4 +94,6 @@ struct libxl__remus_device {
     const libxl__remus_device_type *dev_type;
 };
 
+extern libxl__remus_device_type remus_device_nic;
+
 #endif
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 612645c..cb3d926 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -563,6 +563,8 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
     ("interval",     integer),
     ("blackhole",    bool),
     ("compression",  bool),
+    ("netbuf",       bool),
+    ("netbufscript", string),
     ])
 
 libxl_event_type = Enumeration("event_type", [
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH V9 08/12] remus: implement the API to buffer/release packages
  2014-04-15  5:38 [PATCH V9 00/12] Remus/Libxl: Network buffering support Yang Hongyang
                   ` (6 preceding siblings ...)
  2014-04-15  5:38 ` [PATCH V9 07/12] remus: Remus network buffering core and APIs to setup/teardown Yang Hongyang
@ 2014-04-15  5:38 ` Yang Hongyang
  2014-04-23 16:10   ` Ian Jackson
  2014-04-15  5:38 ` [PATCH V9 09/12] libxl: use the API to setup/teardown network buffering Yang Hongyang
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 89+ messages in thread
From: Yang Hongyang @ 2014-04-15  5:38 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

From: Shriram Rajagopalan <rshriram@cs.ubc.ca>

This patch implements two APIs:
1. netbuf_start_new_epoch()
   It marks the start of a new epoch. Packets queued before this
   point will be flushed, and packets queued after it will be buffered.
   It is called after the guest is suspended.
2. netbuf_release_prev_epoch()
   It flushes the buffered packets to the client. It is
   called when a checkpoint finishes.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_netbuffer.c | 57 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)
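
Note for reviewers: the epoch semantics described in the commit message
can be pictured with a small stand-alone model. This is only a sketch
under stated assumptions, not the kernel plug qdisc and not the libnl3
API. The names plug_model, plug_enqueue(), plug_start_new_epoch() and
plug_release_prev_epoch() are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of a plug-style buffer (not the real qdisc). */
#define PLUG_CAP 64

typedef struct {
    int pkts[PLUG_CAP];  /* queued packet ids, oldest first */
    size_t len;          /* number of queued packets */
    size_t epoch_end;    /* pkts[0 .. epoch_end) belong to the previous epoch */
} plug_model;

static void plug_init(plug_model *q)
{
    q->len = 0;
    q->epoch_end = 0;
}

/* Guest output is always queued first; returns -1 when the model is full. */
static int plug_enqueue(plug_model *q, int pkt)
{
    if (q->len == PLUG_CAP)
        return -1;
    q->pkts[q->len++] = pkt;
    return 0;
}

/* Models netbuf_start_new_epoch(): everything queued so far becomes
 * the previous epoch; later packets stay buffered. */
static void plug_start_new_epoch(plug_model *q)
{
    q->epoch_end = q->len;
}

/* Models netbuf_release_prev_epoch(): copy up to out_cap previous-epoch
 * packets into out, keep the rest queued, and return how many were
 * released. */
static size_t plug_release_prev_epoch(plug_model *q, int *out, size_t out_cap)
{
    size_t n = q->epoch_end, i;

    assert(n <= q->len);
    if (n > out_cap)
        n = out_cap;
    for (i = 0; i < n; i++)
        out[i] = q->pkts[i];
    for (i = n; i < q->len; i++)
        q->pkts[i - n] = q->pkts[i];
    q->len -= n;
    q->epoch_end -= n;
    return n;
}
```

In this model, packets enqueued after plug_start_new_epoch() survive a
following plug_release_prev_epoch() call, which is the behaviour the
postsuspend/commit pairing in this patch relies on.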

diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index a5f2b9a..4b4bc9d 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -416,9 +416,66 @@ static int nic_teardown(libxl__remus_device *remus_dev,
     return REMUS_INPROGRESS;
 }
 
+/* The buffer_op's value, not the value passed to kernel */
+enum {
+    tc_buffer_start,
+    tc_buffer_release
+};
+
+static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
+                           libxl__remus_netbuf_state *netbuf_state,
+                           int buffer_op)
+{
+    int ret;
+
+    STATE_AO_GC(netbuf_state->ao);
+
+    if (buffer_op == tc_buffer_start)
+        ret = rtnl_qdisc_plug_buffer(remus_nic->qdisc);
+    else
+        ret = rtnl_qdisc_plug_release_one(remus_nic->qdisc);
+
+    if (ret)
+        goto out;
+
+    ret = rtnl_qdisc_add(netbuf_state->nlsock, remus_nic->qdisc,
+                         NLM_F_REQUEST);
+    if (ret)
+        goto out;
+
+    return REMUS_OK;
+
+out:
+    LOG(ERROR, "Remus: cannot do netbuf op %s on %s:%s",
+        ((buffer_op == tc_buffer_start) ?
+        "start_new_epoch" : "release_prev_epoch"),
+        remus_nic->ifb, nl_geterror(ret));
+    return REMUS_FAIL;
+}
+
+static int netbuf_start_new_epoch(libxl__remus_device *remus_dev)
+{
+    libxl__remus_device_nic *remus_nic =
+        CONTAINER_OF(remus_dev, *remus_nic, remus_dev);
+    libxl__remus_netbuf_state *netbuf_state = remus_dev->dev_type->data;
+
+    return remus_netbuf_op(remus_nic, netbuf_state, tc_buffer_start);
+}
+
+static int netbuf_release_prev_epoch(libxl__remus_device *remus_dev)
+{
+    libxl__remus_device_nic *remus_nic =
+        CONTAINER_OF(remus_dev, *remus_nic, remus_dev);
+    libxl__remus_netbuf_state *netbuf_state = remus_dev->dev_type->data;
+
+    return remus_netbuf_op(remus_nic, netbuf_state, tc_buffer_release);
+}
+
 libxl__remus_device_type remus_device_nic = {
     .init = nic_init,
     .destroy = nic_destroy,
+    .postsuspend = netbuf_start_new_epoch,
+    .commit = netbuf_release_prev_epoch,
     .match = nic_match,
     .setup = nic_setup,
     .teardown = nic_teardown,
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH V9 09/12] libxl: use the API to setup/teardown network buffering
  2014-04-15  5:38 [PATCH V9 00/12] Remus/Libxl: Network buffering support Yang Hongyang
                   ` (7 preceding siblings ...)
  2014-04-15  5:38 ` [PATCH V9 08/12] remus: implement the API to buffer/release packages Yang Hongyang
@ 2014-04-15  5:38 ` Yang Hongyang
  2014-04-23 16:12   ` Ian Jackson
  2014-04-16  2:55 ` [PATCH 1/2] drbd: implement replicated checkpointing disk Lai Jiangshan
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 89+ messages in thread
From: Yang Hongyang @ 2014-04-15  5:38 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

---
 tools/libxl/libxl.c          |  6 +-----
 tools/libxl/libxl_dom.c      | 11 +++++++++++
 tools/libxl/libxl_internal.h | 10 ++++++++++
 tools/libxl/libxl_remus.c    | 16 ++++++++++++++++
 4 files changed, 38 insertions(+), 5 deletions(-)
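
Note for reviewers: the teardown path in this patch stashes rc before
starting the asynchronous teardown, so the original failure reason is
still available when the teardown completes. The stand-alone sketch
below models that ordering. It is an assumption-laden illustration:
the body of libxl__remus_teardown_done() is not part of this hunk, so
the final "report saved_rc to the callback" step is assumed, and all
model_* names are hypothetical.

```c
#include <assert.h>

typedef struct model_dss model_dss;
typedef void (*model_done_cb)(model_dss *dss, int rc);

/* Hypothetical stand-in for the suspend state plus its Remus state. */
struct model_dss {
    int saved_rc;          /* rc stashed before teardown starts */
    int teardown_pending;  /* 1 while the async teardown is in flight */
    model_done_cb callback;
    int reported;          /* set once the callback has run */
    int reported_rc;       /* rc the callback received */
};

/* Models libxl__remus_teardown_initiate(): remember rc, start teardown. */
static void model_teardown_initiate(model_dss *dss, int rc)
{
    dss->saved_rc = rc;
    dss->teardown_pending = 1;
}

/* Assumed completion step: hand the stashed rc to the caller's callback. */
static void model_teardown_done(model_dss *dss)
{
    assert(dss->teardown_pending);
    dss->teardown_pending = 0;
    dss->callback(dss, dss->saved_rc);
}

/* Example callback that only records what it was given. */
static void model_record(model_dss *dss, int rc)
{
    dss->reported = 1;
    dss->reported_rc = rc;
}
```

The point of the pattern is that the caller's callback runs exactly
once, after teardown, and sees the rc from before teardown began.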

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 687a2a9..32b836f 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -766,7 +766,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     }
 
     /* Point of no return */
-    libxl__domain_suspend(egc, dss);
+    libxl__remus_setup_initiate(egc, dss);
     return AO_INPROGRESS;
 
  out:
@@ -782,10 +782,6 @@ static void remus_failover_cb(libxl__egc *egc,
      * backup died or some network error occurred preventing us
      * from sending checkpoints.
      */
-
-    /* TBD: Remus cleanup - i.e. detach qdisc, release other
-     * resources.
-     */
     libxl__ao_complete(egc, ao, rc);
 }
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index fc0c136..16ea7d8 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1728,6 +1728,17 @@ void domain_suspend_done(libxl__egc *egc,
         xc_suspend_evtchn_release(CTX->xch, CTX->xce, domid,
                            dss->guest_evtchn.port, &dss->guest_evtchn_lockfd);
 
+    if (dss->remus_state) {
+        /*
+         * With Remus, if we reach this point, it means either the
+         * backup died or some network error occurred preventing us
+         * from sending checkpoints. Tear down the network buffers and
+         * release netlink resources.  This is an async op.
+         */
+        libxl__remus_teardown_initiate(egc, dss, rc);
+        return;
+    }
+
     dss->callback(egc, dss, rc);
 }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index b72643b..3c0f94d 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2467,6 +2467,9 @@ typedef struct libxl__remus_state {
     int saved_rc;
     /* Opaque context containing device related stuff */
     void *device_state;
+
+    /* used for checkpoint */
+    libxl__ev_time timeout;
 } libxl__remus_state;
 
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
@@ -2491,6 +2494,13 @@ _hidden int libxl__remus_device_postsuspend(libxl__remus_state *remus_state);
 _hidden int libxl__remus_device_preresume(libxl__remus_state *remus_state);
 _hidden int libxl__remus_device_commit(libxl__remus_state *remus_state);
 
+_hidden void libxl__remus_setup_initiate(libxl__egc *egc,
+                                         libxl__domain_suspend_state *dss);
+
+_hidden void libxl__remus_teardown_initiate(libxl__egc *egc,
+                                            libxl__domain_suspend_state *dss,
+                                            int rc);
+
 struct libxl__domain_suspend_state {
     /* set by caller of libxl__domain_suspend */
     libxl__ao *ao;
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index 05af451..1e17259 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -18,6 +18,12 @@
 #include "libxl_internal.h"
 
 /*----- remus setup/teardown code -----*/
+void libxl__remus_setup_initiate(libxl__egc *egc,
+                                 libxl__domain_suspend_state *dss)
+{
+    libxl__ev_time_init(&dss->remus_state->timeout);
+    libxl__remus_device_setup(egc, dss);
+}
 
 void libxl__remus_setup_done(libxl__egc *egc,
                              libxl__domain_suspend_state *dss,
@@ -34,6 +40,16 @@ void libxl__remus_setup_done(libxl__egc *egc,
     domain_suspend_done(egc, dss, rc);
 }
 
+void libxl__remus_teardown_initiate(libxl__egc *egc,
+                                    libxl__domain_suspend_state *dss,
+                                    int rc)
+{
+    /* stash rc somewhere before invoking teardown ops. */
+    dss->remus_state->saved_rc = rc;
+
+    libxl__remus_device_teardown(egc, dss);
+}
+
 void libxl__remus_teardown_done(libxl__egc *egc,
                                 libxl__domain_suspend_state *dss)
 {
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 1/2] drbd: implement replicated checkpointing disk
  2014-04-15  5:38 [PATCH V9 00/12] Remus/Libxl: Network buffering support Yang Hongyang
                   ` (8 preceding siblings ...)
  2014-04-15  5:38 ` [PATCH V9 09/12] libxl: use the API to setup/teardown network buffering Yang Hongyang
@ 2014-04-16  2:55 ` Lai Jiangshan
  2014-04-16  2:56   ` [PATCH 2/2] remus: support disk replicated checkpointing Lai Jiangshan
  2014-04-23  9:53 ` [PATCH V9 00/12] Remus/Libxl: Network buffering support Hongyang Yang
  2014-04-23 15:51 ` Ian Jackson
  11 siblings, 1 reply; 89+ messages in thread
From: Lai Jiangshan @ 2014-04-16  2:55 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, Lai Jiangshan,
	Dong Eddie, Shriram Rajagopalan, FNST-Yang Hongyang,
	Roger Pau Monne

Implement a DRBD-backed replicated checkpointing disk for Remus,
based on the generic Remus device framework.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/hotplug/Linux/Makefile         |    1 +
 tools/hotplug/Linux/block-drbd-probe |   84 +++++++++++++++++
 tools/libxl/Makefile                 |    2 +-
 tools/libxl/libxl_remus_device.c     |    1 +
 tools/libxl/libxl_remus_device.h     |    1 +
 tools/libxl/libxl_remus_disk_drbd.c  |  164 ++++++++++++++++++++++++++++++++++
 6 files changed, 252 insertions(+), 1 deletions(-)
 create mode 100755 tools/hotplug/Linux/block-drbd-probe
 create mode 100644 tools/libxl/libxl_remus_disk_drbd.c
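
Note for reviewers: the postsuspend/preresume handshake in
libxl_remus_disk_drbd.c is driven by a single ackwait flag. The sketch
below models that flag without touching a real device. It is a minimal
illustration under assumptions: the fake_* helpers replace the two
ioctl() calls, the disk_model type and the counters are hypothetical,
and the "<= 0" test is copied from the patch as-is (the patch does not
say what a positive return value means).

```c
#include <assert.h>

/* Hypothetical stand-in for libxl__remus_drbd_disk. */
typedef struct {
    int ackwait;      /* 1 while a checkpoint ack is outstanding */
    int send_result;  /* value the fake "send checkpoint" call returns */
    int sent;         /* how many times a checkpoint was sent */
    int waited;       /* how many times we waited for an ack */
} disk_model;

/* Replaces ioctl(ctl_fd, DRBD_SEND_CHECKPOINT, 0) in this sketch. */
static int fake_send_checkpoint(disk_model *d)
{
    d->sent++;
    return d->send_result;
}

/* Replaces ioctl(ctl_fd, DRBD_WAIT_CHECKPOINT_ACK, 0) in this sketch. */
static void fake_wait_ack(disk_model *d)
{
    d->waited++;
}

/* Mirrors drbd_postsuspend(): send at most one checkpoint per cycle. */
static void model_postsuspend(disk_model *d)
{
    if (!d->ackwait) {
        if (fake_send_checkpoint(d) <= 0)
            d->ackwait = 1;
    }
}

/* Mirrors drbd_preresume(): wait only if an ack is outstanding. */
static void model_preresume(disk_model *d)
{
    if (d->ackwait) {
        fake_wait_ack(d);
        d->ackwait = 0;
    }
}
```

Calling model_postsuspend() twice in a row sends only one checkpoint,
and model_preresume() is a no-op when nothing is outstanding.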

diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index baaaa41..31d9e00 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -23,6 +23,7 @@ XEN_SCRIPTS += xen-hotplug-cleanup
 XEN_SCRIPTS += external-device-migrate
 XEN_SCRIPTS += vscsi
 XEN_SCRIPTS += block-iscsi
+XEN_SCRIPTS += block-drbd-probe
 XEN_SCRIPTS += $(XEN_SCRIPTS-y)
 
 XEN_SCRIPT_DATA = xen-script-common.sh locking.sh logging.sh
diff --git a/tools/hotplug/Linux/block-drbd-probe b/tools/hotplug/Linux/block-drbd-probe
new file mode 100755
index 0000000..163ad04
--- /dev/null
+++ b/tools/hotplug/Linux/block-drbd-probe
@@ -0,0 +1,84 @@
+#! /bin/bash
+#
+# Copyright (C) 2014 FUJITSU LIMITED
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of version 2.1 of the GNU Lesser General Public
+# License as published by the Free Software Foundation.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+#
+# Usage:
+#     block-drbd-probe devicename
+#
+# Return value:
+#     0: the device is a DRBD device
+#     1: the device is not a DRBD device
+#     2: unknown error
+#     3: the DRBD device does not use protocol D
+#     4: the DRBD device is not ready
+
+drbd_res=
+
+function get_res_name()
+{
+    local drbd_dev=$1
+    local drbd_dev_list=($(drbdadm sh-dev all))
+    local drbd_res_list=($(drbdadm sh-resource all))
+    local temp_drbd_dev temp_drbd_res
+    local found=0
+
+    for temp_drbd_dev in ${drbd_dev_list[@]}; do
+        if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
+            found=1
+            break
+        fi
+    done
+
+    if [[ $found -eq 0 ]]; then
+        return 1
+    fi
+
+    for temp_drbd_res in ${drbd_res_list[@]}; do
+        temp_drbd_dev=$(drbdadm sh-dev $temp_drbd_res)
+        if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
+            drbd_res="$temp_drbd_res"
+            return 0
+        fi
+    done
+
+    # OOPS
+    return 2
+}
+
+get_res_name "$1" || {
+    # exit with get_res_name's status ($? is still intact here)
+    exit $?
+}
+
+# check protocol
+drbdsetup $1 show | grep -q "protocol D;"
+if [[ $? -ne 0 ]]; then
+    exit 3
+fi
+
+# check connect status
+state=$(drbdadm cstate "$drbd_res")
+if [[ "$state" != "Connected" ]]; then
+    exit 4
+fi
+
+# check role
+role=$(drbdadm role "$drbd_res")
+if [[ "$role" != "Primary/Secondary" ]]; then
+    exit 4
+fi
+
+exit 0
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 8398386..53461ab 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -52,7 +52,7 @@ else
 LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
-LIBXL_OBJS-y += libxl_remus.o libxl_remus_device.o
+LIBXL_OBJS-y += libxl_remus.o libxl_remus_device.o libxl_remus_disk_drbd.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
index 1564bd1..c323773 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_remus_device.c
@@ -38,6 +38,7 @@ typedef struct libxl__remus_device_state {
 
 static libxl__remus_device_type *device_types[] = {
     &remus_device_nic,
+    &remus_device_drbd_disk,
 };
 
 int libxl__remus_device_postsuspend(libxl__remus_state *remus_state)
diff --git a/tools/libxl/libxl_remus_device.h b/tools/libxl/libxl_remus_device.h
index 30b368d..4b41021 100644
--- a/tools/libxl/libxl_remus_device.h
+++ b/tools/libxl/libxl_remus_device.h
@@ -95,5 +95,6 @@ struct libxl__remus_device {
 };
 
 extern libxl__remus_device_type remus_device_nic;
+extern libxl__remus_device_type remus_device_drbd_disk;
 
 #endif
diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
new file mode 100644
index 0000000..583b8a3
--- /dev/null
+++ b/tools/libxl/libxl_remus_disk_drbd.c
@@ -0,0 +1,164 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author Lai Jiangshan <laijs@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+#include "libxl_remus_device.h"
+
+/*** drbd implementation ***/
+const int DRBD_SEND_CHECKPOINT = 20;
+const int DRBD_WAIT_CHECKPOINT_ACK = 30;
+
+typedef struct libxl__remus_drbd_disk {
+    libxl__remus_device remus_dev;
+    int ctl_fd;
+    int ackwait;
+    const char *path;
+} libxl__remus_drbd_disk;
+
+typedef struct libxl__remus_drbd_state {
+    char *drbd_probe_script;
+    libxl__ao *ao;
+} libxl__remus_drbd_state;
+
+static int drbd_postsuspend(libxl__remus_device *remus_dev)
+{
+    libxl__remus_drbd_disk *drbd_disk =
+        CONTAINER_OF(remus_dev, *drbd_disk, remus_dev);
+
+    if (!drbd_disk->ackwait) {
+        if (ioctl(drbd_disk->ctl_fd, DRBD_SEND_CHECKPOINT, 0) <= 0)
+            drbd_disk->ackwait = 1;
+    }
+
+    return 0;
+}
+
+static int drbd_preresume(libxl__remus_device *remus_dev)
+{
+    libxl__remus_drbd_disk *drbd_disk =
+        CONTAINER_OF(remus_dev, *drbd_disk, remus_dev);
+
+    if (drbd_disk->ackwait) {
+        ioctl(drbd_disk->ctl_fd, DRBD_WAIT_CHECKPOINT_ACK, 0);
+        drbd_disk->ackwait = 0;
+    }
+
+    return 0;
+}
+
+static int drbd_commit(libxl__remus_device *remus_dev)
+{
+    /* nothing to do, all work is done by DRBD's protocol D. */
+    return 0;
+}
+
+static int drbd_init(libxl__remus_device_type *self,
+                     libxl__remus_state *remus_state)
+{
+    libxl__remus_drbd_state *drbd_state;
+
+    STATE_AO_GC(remus_state->dss->ao);
+
+    GCNEW(drbd_state);
+    self->data = drbd_state;
+    drbd_state->ao = ao;
+    drbd_state->drbd_probe_script = GCSPRINTF("%s/block-drbd-probe",
+                                              libxl__xen_script_dir_path());
+
+
+    return REMUS_OK;
+}
+
+static void drbd_destroy(libxl__remus_device_type *self)
+{
+    return;
+}
+
+static int drbd_match(const libxl__remus_device_type *self,
+                      const void *libxl_device, int device_type,
+                      libxl_async_exec *async_exec)
+{
+    int arraysize, nr = 0;
+    const libxl_device_disk *disk = libxl_device;
+    libxl__remus_drbd_state *drbd_state = self->data;
+    STATE_AO_GC(drbd_state->ao);
+
+    if (device_type != REMUS_DISK)
+        return REMUS_NOT_SUPPORT;
+
+    /* setup env & args */
+    arraysize = 1;
+    GCNEW_ARRAY(async_exec->env, arraysize);
+    async_exec->env[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    arraysize = 3;
+    nr = 0;
+    GCNEW_ARRAY(async_exec->args, arraysize);
+    async_exec->args[nr++] = drbd_state->drbd_probe_script;
+    async_exec->args[nr++] = disk->pdev_path;
+    async_exec->args[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    async_exec->allow_fail = true;
+    async_exec->timeout = LIBXL_HOTPLUG_TIMEOUT;
+
+    if (libxl_async_exec_script(gc, async_exec))
+        return REMUS_FAIL;
+
+    return REMUS_INPROGRESS;
+}
+
+static int drbd_setup(libxl__remus_device *remus_dev,
+                      libxl_async_exec *async_exec)
+{
+    libxl__remus_drbd_disk *drbd_disk =
+        CONTAINER_OF(remus_dev, *drbd_disk, remus_dev);
+    const libxl_device_disk *disk = remus_dev->libxl_device;
+
+    drbd_disk->path = disk->pdev_path;
+    drbd_disk->ctl_fd = open(drbd_disk->path, O_RDONLY);
+    drbd_disk->ackwait = 0;
+
+    if (drbd_disk->ctl_fd < 0)
+        return REMUS_FAIL;
+
+    return REMUS_OK;
+}
+
+static int drbd_teardown(libxl__remus_device *remus_dev,
+                         libxl_async_exec *async_exec)
+{
+    libxl__remus_drbd_disk *drbd_disk =
+        CONTAINER_OF(remus_dev, *drbd_disk, remus_dev);
+
+    close(drbd_disk->ctl_fd);
+    return REMUS_OK;
+}
+
+libxl__remus_device_type remus_device_drbd_disk = {
+    .init = drbd_init,
+    .destroy = drbd_destroy,
+    .postsuspend = drbd_postsuspend,
+    .preresume = drbd_preresume,
+    .commit = drbd_commit,
+    .match = drbd_match,
+    .setup = drbd_setup,
+    .teardown = drbd_teardown,
+    .size = sizeof(libxl__remus_drbd_disk),
+};
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 2/2] remus: support disk replicated checkpointing
  2014-04-16  2:55 ` [PATCH 1/2] drbd: implement replicated checkpointing disk Lai Jiangshan
@ 2014-04-16  2:56   ` Lai Jiangshan
  0 siblings, 0 replies; 89+ messages in thread
From: Lai Jiangshan @ 2014-04-16  2:56 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, Lai Jiangshan,
	Dong Eddie, Shriram Rajagopalan, FNST-Yang Hongyang,
	Roger Pau Monne

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_remus_device.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
index c323773..316f832 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_remus_device.c
@@ -123,6 +123,13 @@ static void libxl__remus_teardown_cleanup(libxl__egc *egc,
     dev_state->nics = NULL;
     dev_state->num_nics = 0;
 
+    /* clean disk */
+    for (i = 0; i < dev_state->num_disks; i++)
+        libxl_device_disk_dispose(&dev_state->disks[i]);
+    free(dev_state->disks);
+    dev_state->disks = NULL;
+    dev_state->num_disks = 0;
+
     /* clean device_types */
     for (i = 0; i < ARRAY_SIZE(device_types); i++) {
         dev_type = device_types[i];
@@ -352,7 +359,9 @@ void libxl__remus_device_setup(libxl__egc *egc,
         dev_state->num_devices += num_devices;
     }
 
-    /* TBD: enable disk buffering */
+    dev_state->disks = libxl_device_disk_list(CTX, dss->domid, &num_devices);
+    dev_state->num_disks = num_devices;
+    dev_state->num_devices += num_devices;
 
     GCNEW_ARRAY(dev_state->dev, dev_state->num_devices);
 
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* Re: [PATCH V9 00/12] Remus/Libxl: Network buffering support
  2014-04-15  5:38 [PATCH V9 00/12] Remus/Libxl: Network buffering support Yang Hongyang
                   ` (9 preceding siblings ...)
  2014-04-16  2:55 ` [PATCH 1/2] drbd: implement replicated checkpointing disk Lai Jiangshan
@ 2014-04-23  9:53 ` Hongyang Yang
  2014-04-23 15:51 ` Ian Jackson
  11 siblings, 0 replies; 89+ messages in thread
From: Hongyang Yang @ 2014-04-23  9:53 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, laijs, eddie.dong, rshriram,
	roger.pau

Ping

On 2014-04-15 13:38, Yang Hongyang wrote:
> This patch series adds support for network buffering in the Remus
> codebase in libxl.
>
> Changes in V9:
>    introduce an API to async exec scripts for both device
>    and netbuffer.
>    Use async exec script api to exec scripts.
>
> Changes in V8:
>    Applied some comments (by IanJ).
>    Merged some struct definitions into their implementations.
>    (2/3/5 in V7 => 3 in V8)
>
> Changes in V7:
>    Applied missing comments(by IanJ).
>    Applied Shriram comments.
>
>    Merged the tangled netbuffering setup/teardown code into one patch.
>    (2/6/8 in V6 => 5 in V7. 9/10 in V6 => 7 in V7)
>
> Changes in V6:
>    Applied Ian Jackson's comments of V5 series.
>    the [PATCH 2/4 V5] is split by small functionalities.
>
>    [PATCH 4/4 V5] --> [PATCH 13/13] netbuffer is default enabled.
>
> Changes in V5:
>
> Merge hotplug script patch (2/5) and hotplug script setup/teardown
> patch (3/5) into a single patch.
>
> Changes in V4:
>
> [1/5] Remove check for libnl command line utils in autoconf checks
>
> [2/5] minor nits
>
> [3/5] define LIBXL_HAVE_REMUS_NETBUF in libxl.h
>
> [4/5] clean ups. Make the usleep in checkpoint callback asynchronous
>
> [5/5] minor nits
>
> Changes in V3:
> [1/5] Fix redundant checks in configure scripts
>        (based on Ian Campbell's suggestions)
>
> [2/5] Introduce locking in the script, during IFB setup.
>        Add xenstore paths used by netbuf scripts
>        to xenstore-paths.markdown
>
> [3/5] Hotplug scripts setup/teardown invocations are now asynchronous
>        following IanJ's feedback.  However, the invocations are still
>        sequential.
>
> [5/5] Allow per-domain specification of netbuffer scripts in xl remus
>        command.
>
> And minor nits throughout the series based on feedback from
> the last version
>
> Changes in V2:
> [1/5] Configure script will automatically enable/disable network
>        buffer support depending on the availability of the appropriate
>        libnl3 version. [If libnl3 is unavailable, a warning message will be
>        printed to let the user know that the feature has been disabled.]
>
>        use macros from pkg.m4 instead of pkg-config commands
>        removed redundant checks for libnl3 libraries.
>
> [3,4/5] - Minor nits.
>
> Version 1:
>
> [1/5] Changes to autoconf scripts to check for libnl3. Add linker flags
>        to libxl Makefile.
>
> [2/5] External script to setup/teardown network buffering using libnl3's
>        CLI. This script will be invoked by libxl before starting Remus.
>        The script's main job is to bring up an IFB device with plug qdisc
>        attached to it.  It then re-routes egress traffic from the guest's
>        vif to the IFB device.
>
> [3/5] Libxl code to invoke the external setup script, followed by netlink
>        related setup to obtain a handle on the output buffers attached
>        to each vif.
>
> [4/5] Libxl interaction with network buffer module in the kernel via
>        libnl3 API.
>
> [5/5] xl cmdline switch to explicitly enable network buffering when
>        starting remus.
>
>
>    A few things to note (by Shriram):
>
>      a) Based on previous email discussions, the setup/teardown task has
>      been moved to a hotplug style shell script which can be customized as
>      desired, instead of implementing it as C code inside libxl.
>
>      b) Libnl3 is not available on NetBSD. Nor is it available on CentOS
>      (Linux).  So I have made network buffering support an optional feature
>      so that it can be disabled if desired.
>
>      c) NetBSD does not have libnl3. So I have put the setup script under
>      tools/hotplug/Linux folder.
>
> thanks
>
> Shriram Rajagopalan (7):
>    remus: add libnl3 dependency to autoconf scripts
>    remus: introduce a function to check whether network buffering is
>      enabled
>    remus: Remus network buffering core and APIs to setup/teardown
>    remus: implement the API to buffer/release packages
>    libxl: rename remus_failover_cb() to remus_replication_failure_cb()
>    libxl: control network buffering in remus callbacks
>    libxl: network buffering cmdline switch
>
> Yang Hongyang (5):
>    introduce an API to async exec scripts
>    libxl_device: use async exec script api
>    remus: remus device core and APIs to setup/teardown
>    remus: implement the API for checkpoint
>    libxl: use the API to setup/teardown network buffering
>
>   README                                 |   4 +
>   config/Tools.mk.in                     |   3 +
>   docs/man/xl.conf.pod.5                 |   6 +
>   docs/man/xl.pod.1                      |  11 +-
>   docs/misc/xenstore-paths.markdown      |   4 +
>   tools/configure.ac                     |  15 +
>   tools/hotplug/Linux/Makefile           |   1 +
>   tools/hotplug/Linux/remus-netbuf-setup | 183 ++++++++++++
>   tools/libxl/Makefile                   |  11 +
>   tools/libxl/libxl.c                    |  42 ++-
>   tools/libxl/libxl.h                    |  13 +
>   tools/libxl/libxl_device.c             |  80 ++----
>   tools/libxl/libxl_dom.c                |  83 +++++-
>   tools/libxl/libxl_internal.c           |  82 ++++++
>   tools/libxl/libxl_internal.h           |  70 ++++-
>   tools/libxl/libxl_netbuffer.c          | 491 +++++++++++++++++++++++++++++++++
>   tools/libxl/libxl_nonetbuffer.c        |  54 ++++
>   tools/libxl/libxl_remus.c              |  57 ++++
>   tools/libxl/libxl_remus_device.c       | 383 +++++++++++++++++++++++++
>   tools/libxl/libxl_remus_device.h       |  99 +++++++
>   tools/libxl/libxl_types.idl            |   2 +
>   tools/libxl/xl.c                       |   4 +
>   tools/libxl/xl.h                       |   1 +
>   tools/libxl/xl_cmdimpl.c               |  28 +-
>   tools/libxl/xl_cmdtable.c              |   3 +
>   tools/remus/README                     |   6 +
>   26 files changed, 1650 insertions(+), 86 deletions(-)
>   create mode 100644 tools/hotplug/Linux/remus-netbuf-setup
>   create mode 100644 tools/libxl/libxl_netbuffer.c
>   create mode 100644 tools/libxl/libxl_nonetbuffer.c
>   create mode 100644 tools/libxl/libxl_remus.c
>   create mode 100644 tools/libxl/libxl_remus_device.c
>   create mode 100644 tools/libxl/libxl_remus_device.h
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH V9 01/12] introduce an API to async exec scripts
  2014-04-15  5:38 ` [PATCH V9 01/12] introduce an API to async exec scripts Yang Hongyang
@ 2014-04-23 15:44   ` Ian Jackson
  0 siblings, 0 replies; 89+ messages in thread
From: Ian Jackson @ 2014-04-23 15:44 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

Yang Hongyang writes ("[PATCH V9 01/12] introduce an API to async exec scripts"):
> Introduce an API to async exec scripts. It will be used
> for both device and netbuffer.

Thanks.  This mostly looks plausible.

I think it would be better to combine it with the next patch.  That
way we just move/reorganise the code, rather than making a copy and
deleting the old approach.

I have some comments about details.

> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index c2b73c4..eddafaf 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -2030,6 +2030,27 @@ _hidden const char *libxl__xen_script_dir_path(void);
>  _hidden const char *libxl__lock_dir_path(void);
>  _hidden const char *libxl__run_dir_path(void);
>  
> +/*----- asynchronous function -----*/
> +typedef struct libxl_async_exec {

We would normally call structs of this kind "libxl__something_state".
So this should be called "libxl__async_exec_state".

Your struct needs to have comments explaining which parts are to be
filled in by the caller, and which parts are private.  See
libxl__xswait_state for an example.

...
> +    void *opaque;

You shouldn't need this field.  Callers should be able to use
CONTAINER_OF to find their own state structure.
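
As an illustration only, the pattern can be sketched as below.  All
names are hypothetical, and the macro is a simplified stand-in for
libxl's CONTAINER_OF rather than its real definition.  The caller
embeds the sub-operation state in its own struct and recovers the
outer struct in the callback, so no opaque pointer is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for libxl's CONTAINER_OF (assumption, not the real macro). */
#define CONTAINER_OF_SKETCH(inner_ptr, outer_type, member) \
    ((outer_type *)((char *)(inner_ptr) - offsetof(outer_type, member)))

/* Hypothetical generic sub-operation state; note there is no "opaque" field. */
typedef struct async_exec_state async_exec_state;
struct async_exec_state {
    /* filled in by the caller */
    void (*callback)(async_exec_state *aes, int status);
    int timeout_ms;
};

/* Hypothetical caller state that embeds the sub-operation state. */
typedef struct {
    int last_status;
    async_exec_state aes;
} device_state;

/* Caller's callback: recovers its own state from the embedded member. */
static void device_exec_done(async_exec_state *aes, int status)
{
    device_state *dev = CONTAINER_OF_SKETCH(aes, device_state, aes);
    dev->last_status = status;
}

/* Stand-in for the completion path of the sub-operation. */
static void async_exec_finish(async_exec_state *aes, int status)
{
    assert(aes->callback != NULL);
    aes->callback(aes, status);
}
```

A caller would set .callback to device_exec_done before starting the
operation; when the completion path runs, the callback finds the
enclosing device_state without any extra pointer being passed around.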

> +    void (*finish_cb)(void *opaque, int status);

Most of these kinds of setups simply name the field "callback".

> +    /* unit: second */
> +    int timeout;

The comment would be better on the same line.  However, in fact it is
better to express the units in the variable name.

And the timeout should be in milliseconds - all the timeouts in our
internal APIs are in ms.

So, "timeout_ms".

> +    bool allow_fail;

None of your callers set this to "true".  And in fact its function is
only to inhibit a log message - it makes no difference to the control
flow.  Can it be abolished ?

> +    int stdinfd;
> +    int stdoutfd;
> +    int stderrfd;

It would probably be better to make this an array, rather than 3 named
fields.  That will make it easier in the future to deal with them all
at once.

> +    libxl__ao *ao;

This field should be at the top, the way that libxl__xswait_state has
it.

> +} libxl_async_exec;
> +
> +_hidden extern int libxl_async_exec_script(libxl__gc *gc,
> +                                           libxl_async_exec *async_exec);

This function needs to be called "libxl__async_exec_start" or maybe
just "libxl__async_exec".  It doesn't seem to be limited to scripts.

It should be in libxl_exec.c, probably, or libxl_aoutils.c.

> +static void libxl_async_exec_timeout(libxl__egc *egc,
> +                                     libxl__ev_time *ev,
> +                                     const struct timeval *requested_abs)
> +{
> +    libxl_async_exec *async_exec = CONTAINER_OF(ev, *async_exec, time);

We normally give these local state variables very short names,
because we need to refer to them a lot.  If the type is changed to
libxl__async_exec_state, the local variable should be "aes".

> +    STATE_AO_GC(async_exec->ao);
> +
> +    libxl__ev_time_deregister(gc, &async_exec->time);
> +    assert(libxl__ev_child_inuse(&async_exec->child));
> +
> +    LOG(DEBUG, "killing hotplug script %s because of timeout",

You have changed the formatting here, as you move the code.  I think
the previous location of the blank line is better.

> +        async_exec->args[0]);

This still says "hotplug script" (in several places).  You probably
need to add a new field to the libxl_async_exec.  We generally seem to
use the name "what" for this - cf libxl__xswait_state.

> +static void libxl_async_exec_done(libxl__egc *egc,
> +                                  libxl__ev_child *child,
> +                                  pid_t pid, int status)

This function can be called "async_exec_done" (because it has only
internal linkage).  If you prefix it with libxl, it needs to be
libxl__.

> +    if (status && !async_exec->allow_fail) {
> +        libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR,
> +                                      async_exec->args[0],

It's probably better to use the new "what" value rather than args[0].

> +int libxl_async_exec_script(libxl__gc *gc, libxl_async_exec *async_exec)
> +{
> +    pid_t pid;
> +
> +    /* Convenience aliases */
> +    libxl__ev_child *const child = &async_exec->child;
> +    char * const *args = async_exec->args;
> +    char * const *env = async_exec->env;

Your constness doesn't seem correct here.  In the struct these
parameters are char**.  And your use of whitespace is anomalous.
I think you want
       char **const args = aes->args;

> +    const int stdinfd = async_exec->stdinfd;
> +    const int stdoutfd = async_exec->stdoutfd;
> +    const int stderrfd = async_exec->stderrfd;

These don't buy you anything - you only use each of these once.

> +    /* Set hotplug timeout */

The word "hotplug" again.

> +    if (libxl__ev_time_register_rel(gc, &async_exec->time,
> +                                    libxl_async_exec_timeout,
> +                                    async_exec->timeout * 1000)) {
> +        LOG(ERROR, "unable to register timeout for "
> +            "script %s", args[0]);
> +        return ERROR_FAIL;
> +    }
> +
> +    LOG(DEBUG, "Calling script: %s ", args[0]);
> +    /* Fork and exec netbuf script */

The assumption of "script" again.  And a specialised comment.  Can you
make sure all of these messages use "what" as appropriate and that
messages and comments are fully general ?

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH V9 02/12] libxl_device: use async exec script api
  2014-04-15  5:38 ` [PATCH V9 02/12] libxl_device: use async exec script api Yang Hongyang
@ 2014-04-23 15:48   ` Ian Jackson
  0 siblings, 0 replies; 89+ messages in thread
From: Ian Jackson @ 2014-04-23 15:48 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

Yang Hongyang writes ("[PATCH V9 02/12] libxl_device: use async exec script api"):
> use async exec script api to exec device related scripts.

Thanks.  Most of this is the other half of the code motion from the
previous patch.

> -    libxl__ev_child_init(&aodev->child);
> +    libxl__ev_child_init(&aodev->async_exec.child);

You need an init function, to avoid a layering violation.  The child
field should be accessed only from the async exec implementation.

> -    assert(libxl__ev_child_inuse(&aodev->child));
> +    assert(libxl__ev_child_inuse(&aodev->async_exec.child));

Likewise, you need an inuse function.

> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index eddafaf..cc8d558 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -2094,7 +2094,9 @@ struct libxl__ao_device {
>      /* device hotplug execution */
>      const char *what;
>      int num_exec;
> -    libxl__ev_child child;
> +
> +    libxl__egc *egc;
> +    libxl_async_exec async_exec;

I think this struct field name could profitably be shortened.  Perhaps
"exec" is too likely to clash but "aexec" would be OK.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH V9 04/12] remus: introduce a function to check whether network buffering is enabled
  2014-04-15  5:38 ` [PATCH V9 04/12] remus: introduce a function to check whether network buffering is enabled Yang Hongyang
@ 2014-04-23 15:50   ` Ian Jackson
  2014-04-23 15:51     ` Shriram Rajagopalan
  0 siblings, 1 reply; 89+ messages in thread
From: Ian Jackson @ 2014-04-23 15:50 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

Yang Hongyang writes ("[PATCH V9 04/12] remus: introduce a function to check whether network buffering is enabled"):
> libxl__netbuffer_enabled() returns 1 when network buffering is compiled,
> or returns 0 when network buffering is not compiled.
> 
> If network buffering is not compiled, and the user wants to use it, report
> a error and exit.
...
> +ifeq ($(CONFIG_REMUS_NETBUF),y)
> +LIBXL_OBJS-y += libxl_netbuffer.o
> +else
> +LIBXL_OBJS-y += libxl_nonetbuffer.o
> +endif

Thanks.  But I think this needs to be set somewhere ?

Ian.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH V9 00/12] Remus/Libxl: Network buffering support
  2014-04-15  5:38 [PATCH V9 00/12] Remus/Libxl: Network buffering support Yang Hongyang
                   ` (10 preceding siblings ...)
  2014-04-23  9:53 ` [PATCH V9 00/12] Remus/Libxl: Network buffering support Hongyang Yang
@ 2014-04-23 15:51 ` Ian Jackson
  11 siblings, 0 replies; 89+ messages in thread
From: Ian Jackson @ 2014-04-23 15:51 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

Yang Hongyang writes ("[PATCH V9 00/12] Remus/Libxl: Network buffering support"):
> This patch series adds support for network buffering in the Remus
> codebase in libxl. 

Thanks.  I'm reviewing this, but I am lacking patch 10.  So my review
will only include up to patch 9.

If you were to provide a public git branch I could fetch, that would
be slightly more convenient and also help work around any lost
messages.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH V9 04/12] remus: introduce a function to check whether network buffering is enabled
  2014-04-23 15:50   ` Ian Jackson
@ 2014-04-23 15:51     ` Shriram Rajagopalan
  2014-04-30 14:36       ` Ian Jackson
  0 siblings, 1 reply; 89+ messages in thread
From: Shriram Rajagopalan @ 2014-04-23 15:51 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Roger Pau Monne, Ian Campbell, Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel,
	Yang Hongyang, Lai Jiangshan


On Wed, Apr 23, 2014 at 10:50 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Yang Hongyang writes ("[PATCH V9 04/12] remus: introduce a function to
> check whether network buffering is enabled"):
> > libxl__netbuffer_enabled() returns 1 when network buffering is compiled,
> > or returns 0 when network buffering is not compiled.
> >
> > If network buffering is not compiled, and the user wants to use it, report
> > a error and exit.
> ...
> > +ifeq ($(CONFIG_REMUS_NETBUF),y)
> > +LIBXL_OBJS-y += libxl_netbuffer.o
> > +else
> > +LIBXL_OBJS-y += libxl_nonetbuffer.o
> > +endif
>
> Thanks.  But I think this needs to be set somewhere ?
>
> Ian.
>
>

It's already done at the autoconf level.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering [and 1 more messages]
  2014-03-03 17:51       ` Ian Jackson
@ 2014-04-23 16:02         ` Ian Jackson
  2014-04-23 16:55           ` Shriram Rajagopalan
  0 siblings, 1 reply; 89+ messages in thread
From: Ian Jackson @ 2014-04-23 16:02 UTC (permalink / raw)
  To: Yang Hongyang, Ian Jackson
  Cc: Ian Campbell, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, Lai Jiangshan,
	Shriram Rajagopalan, xen-devel, ian.jackson, Roger Pau Monne

Yang Hongyang writes ("[PATCH V9 05/12] remus: remus device core and APIs to setup/teardown"):
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 33b62a2..421ae24 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -2457,8 +2457,35 @@ typedef struct libxl__logdirty_switch {
>      libxl__ev_time timeout;
>  } libxl__logdirty_switch;
>  
> +typedef struct libxl__remus_state {
> +    libxl__domain_suspend_state *dss;
> +    libxl__egc *egc;
> +
> +    /* private */
> +    int saved_rc;
> +    /* Opaque context containing device related stuff */
> +    void *device_state;
> +} libxl__remus_state;

I'm afraid that the interface between the remus code and the rest of
the code is still not very clear.

Earlier, I wrote:

  > [...]  I wonder if it might not be better to provide a firmer
  > interface between the remus code and the rest of the save/restore
  > machinery.  That is, have an explicit callback function recorded
  > by the save/restore code which is called back by the remus
  > machinery when it has done its work.  What do you think ?
  > 
  > I think having the flow of control spring off into libxl_remus.c and
  > magically come back by libxl_remus.c knowing to call
  > domain_suspend_done is rather opaque.

I think you have basically two options:

1. Make the remus part of this be a fully self-contained standard
   asynchronous callback-based suboperation, like libxl__xswait,
   libxl__bootloader, et al.

   In this case you should rigorously follow the existing patterns,
   defining a clear interface between the two parts, providing a
   callback function set by the caller, etc.

2. Integrate the remus part into the suspend/resume code in an
   ad hoc fashion, with extremely clear comments everywhere about the
   expected interface, and no extraneous moving parts.

At the moment you seem to have mixed these two approaches.

> @@ -2470,6 +2497,7 @@ struct libxl__domain_suspend_state {
>      int live;
>      int debug;
>      const libxl_domain_remus_info *remus;
> +    libxl__remus_state *remus_state;

I'm not sure why this variable is called "remus_state" rather than
just "remus".

> +typedef struct libxl__remus_device_state {
> +    /* nic */
> +    libxl_device_nic *nics;
> +    int num_nics;
> +
> +    /* disk */
> +    libxl_device_disk *disks;
> +    int num_disks;

Much of this doesn't seem to be used in this patch.  I think you may
need to restructure your patch series.

In general, when patching existing code, you should introduce an
internal structure or variable in the same patch as you introduce the
code which uses it.

This is in contrast to new functions (or other facilities) with a
well-defined API, where it is usually best to introduce the function
fully-formed and then the callers.

Since I normally look at the header file first, to try to see what all
the pieces are and what's going on, it's difficult for me to review
this patch in more detail as it stands.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH V9 06/12] remus: implement the API for checkpoint
  2014-04-15  5:38 ` [PATCH V9 06/12] remus: implement the API for checkpoint Yang Hongyang
@ 2014-04-23 16:04   ` Ian Jackson
  2014-05-14  1:46     ` Hongyang Yang
  0 siblings, 1 reply; 89+ messages in thread
From: Ian Jackson @ 2014-04-23 16:04 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

Yang Hongyang writes ("[PATCH V9 06/12] remus: implement the API for checkpoint"):
> +_hidden int libxl__remus_device_postsuspend(libxl__remus_state *remus_state);
> +_hidden int libxl__remus_device_preresume(libxl__remus_state *remus_state);
> +_hidden int libxl__remus_device_commit(libxl__remus_state *remus_state);

This patch seems to introduce these functions with no callers.  The
functions are then patched in a subsequent patch, again without any
callers.  See my comments on the previous patch.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH V9 08/12] remus: implement the API to buffer/release packages
  2014-04-15  5:38 ` [PATCH V9 08/12] remus: implement the API to buffer/release packages Yang Hongyang
@ 2014-04-23 16:10   ` Ian Jackson
  2014-04-23 17:04     ` Shriram Rajagopalan
  0 siblings, 1 reply; 89+ messages in thread
From: Ian Jackson @ 2014-04-23 16:10 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

Yang Hongyang writes ("[PATCH V9 08/12] remus: implement the API to buffer/release packages"):
> This patch implements two APIs:
> 1. netbuf_start_new_epoch()
>    It marks a new epoch. The packages before this epoch will
>    be flushed, and the packages after this epoch will be buffered.
>    It will be called after the guest is suspended.
> 2. netbuf_release_prev_epoch()
>    It flushes the buffered packages to client, and it will be
>    called when a checkpoint finishes.

Thanks.

(BTW "packages" should be "packets" throughout.)

> diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
> index a5f2b9a..4b4bc9d 100644
> --- a/tools/libxl/libxl_netbuffer.c
> +++ b/tools/libxl/libxl_netbuffer.c
> @@ -416,9 +416,66 @@ static int nic_teardown(libxl__remus_device *remus_dev,
>      return REMUS_INPROGRESS;
>  }
>  
> +/* The buffer_op's value, not the value passed to kernel */
> +enum {
> +    tc_buffer_start,
> +    tc_buffer_release
> +};

The comment refers to a "buffer_op" variable which is not nearby.  I
suggest replacing the comment with
    /* Internal value for libxl, not the value passed to kernel */

Are these names "tc_" not possibly clashing with some future kernel
things ?

> +static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
> +                           libxl__remus_netbuf_state *netbuf_state,
> +                           int buffer_op)
> +{
> +    int ret;
> +
> +    STATE_AO_GC(netbuf_state->ao);
> +
> +    if (buffer_op == tc_buffer_start)
> +        ret = rtnl_qdisc_plug_buffer(remus_nic->qdisc);
> +    else
> +        ret = rtnl_qdisc_plug_release_one(remus_nic->qdisc);

If you're going to have an enum here, I think you should use
"switch", and explicitly abort() on unknown values.  Otherwise when a
new op is invented this code will silently do the wrong thing.
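
A minimal sketch of that shape, under stated assumptions: the enum
names are hypothetical, and the two stub functions merely stand in
for the libnl3 calls quoted above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical internal op values for libxl, not values passed to the kernel. */
typedef enum {
    NETBUF_OP_START_NEW_EPOCH,
    NETBUF_OP_RELEASE_PREV_EPOCH
} netbuf_op;

/* Stubs standing in for the plug-qdisc buffer/release calls. */
static int plug_buffer_stub(int *epochs) { ++*epochs; return 0; }
static int plug_release_stub(int *epochs) { --*epochs; return 0; }

static int netbuf_do_op(netbuf_op op, int *epochs_buffered)
{
    assert(epochs_buffered != NULL);

    switch (op) {
    case NETBUF_OP_START_NEW_EPOCH:
        return plug_buffer_stub(epochs_buffered);
    case NETBUF_OP_RELEASE_PREV_EPOCH:
        return plug_release_stub(epochs_buffered);
    }

    /* Unknown op: fail loudly rather than silently doing the wrong thing. */
    abort();
}
```

With every known value handled in its own case and no default label,
a compiler that warns on unhandled enumerators can also flag a newly
added op at build time.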

> +    if (!ret) {

This is an opaque error handling style.  Please use libxl's standard
"goto out" style.

> +        ret = rtnl_qdisc_add(netbuf_state->nlsock,
> +                             remus_nic->qdisc,
> +                             NLM_F_REQUEST);
> +        if (ret)
> +            goto out;
> +    }
> +
> +    return REMUS_OK;

Please don't invent your own error numbers.  Please use the existing
libxl error code type.  Feel free to add new values if you need them.
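
For illustration, the "goto out" shape with a single error-code
variable might look roughly like this.  It is a sketch only:
SKETCH_ERROR_FAIL is a placeholder for a real libxl error code, the
two integer arguments stand in for the return values of the plug and
qdisc-add calls, and the counter stands in for the LOG(ERROR, ...)
call.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder for an existing libxl error code (assumption). */
enum { SKETCH_ERROR_FAIL = -3 };

static int netbuf_op_sketch(int plug_ret, int add_ret, int *errors_logged)
{
    int rc;

    assert(errors_logged != NULL);

    if (plug_ret) {          /* first step failed */
        rc = SKETCH_ERROR_FAIL;
        goto out;
    }

    if (add_ret) {           /* second step failed */
        rc = SKETCH_ERROR_FAIL;
        goto out;
    }

    rc = 0;

 out:
    if (rc)
        ++*errors_logged;    /* stands in for LOG(ERROR, ...) */
    return rc;
}
```

Each failure sets rc and jumps to the single exit path, so the
success path reads straight down and the logging lives in one place.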

> +out:
> +    LOG(ERROR, "Remus: cannot do netbuf op %s on %s:%s",
> +        ((buffer_op == tc_buffer_start) ?
> +        "start_new_epoch" : "release_prev_epoch"),

Perhaps there should be a static const array of these strings.  At the
very least you should make sure this code handles the _release op
explicitly and crashes if the value is not recognised.
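
One possible form of such a table, as a sketch with hypothetical
names (the strings are the ones used in the quoted LOG call):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical internal op values; NETBUF_OP_COUNT must stay last. */
enum {
    NETBUF_OP_START_NEW_EPOCH,
    NETBUF_OP_RELEASE_PREV_EPOCH,
    NETBUF_OP_COUNT
};

/* One name per op, indexed by the op value. */
static const char *const netbuf_op_names[NETBUF_OP_COUNT] = {
    [NETBUF_OP_START_NEW_EPOCH]    = "start_new_epoch",
    [NETBUF_OP_RELEASE_PREV_EPOCH] = "release_prev_epoch",
};

static const char *netbuf_op_name(int op)
{
    /* Crash on an unrecognised value instead of printing a wrong name. */
    if (op < 0 || op >= NETBUF_OP_COUNT)
        abort();
    assert(netbuf_op_names[op] != NULL);
    return netbuf_op_names[op];
}
```

The error path could then log netbuf_op_name(buffer_op) instead of
choosing between two string literals with a conditional expression.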

>  libxl__remus_device_type remus_device_nic = {
>      .init = nic_init,
>      .destroy = nic_destroy,
> +    .postsuspend = netbuf_start_new_epoch,
> +    .commit = netbuf_release_prev_epoch,

Presumably, this is the patch where network buffering actually starts
to happen.  So should this come with a documentation change removing
any warnings about remus and networking ?

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH V9 09/12] libxl: use the API to setup/teardown network buffering
  2014-04-15  5:38 ` [PATCH V9 09/12] libxl: use the API to setup/teardown network buffering Yang Hongyang
@ 2014-04-23 16:12   ` Ian Jackson
  0 siblings, 0 replies; 89+ messages in thread
From: Ian Jackson @ 2014-04-23 16:12 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

Yang Hongyang writes ("[PATCH V9 09/12] libxl: use the API to setup/teardown network buffering"):
> ---
...
>      /* Point of no return */
> -    libxl__domain_suspend(egc, dss);
> +    libxl__remus_setup_initiate(egc, dss);

I think this patch may need to become part of some other patch; see my
other comments.

> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index b72643b..3c0f94d 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -2467,6 +2467,9 @@ typedef struct libxl__remus_state {
>      int saved_rc;
>      /* Opaque context containing device related stuff */
>      void *device_state;
> +
> +    /* used for checkpoint */
> +    libxl__ev_time timeout;

There are no calls which actually set this timeout.  Nor any which
clean it up.  Obviously some important moving parts are missing here.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering [and 1 more messages]
  2014-04-23 16:02         ` [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering [and 1 more messages] Ian Jackson
@ 2014-04-23 16:55           ` Shriram Rajagopalan
  2014-05-02 16:08             ` Ian Jackson
  0 siblings, 1 reply; 89+ messages in thread
From: Shriram Rajagopalan @ 2014-04-23 16:55 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Lai Jiangshan, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel, Ian Jackson,
	Ian Campbell, Yang Hongyang, Roger Pau Monne


On Wed, Apr 23, 2014 at 11:02 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Yang Hongyang writes ("[PATCH V9 05/12] remus: remus device core and APIs
> to setup/teardown"):
> > diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> > index 33b62a2..421ae24 100644
> > --- a/tools/libxl/libxl_internal.h
> > +++ b/tools/libxl/libxl_internal.h
> > @@ -2457,8 +2457,35 @@ typedef struct libxl__logdirty_switch {
> >      libxl__ev_time timeout;
> >  } libxl__logdirty_switch;
> >
> > +typedef struct libxl__remus_state {
> > +    libxl__domain_suspend_state *dss;
> > +    libxl__egc *egc;
> > +
> > +    /* private */
> > +    int saved_rc;
> > +    /* Opaque context containing device related stuff */
> > +    void *device_state;
> > +} libxl__remus_state;
>
> I'm afraid that the interface between the remus code and the rest of
> the code is still not very clear.
>
> Earlier, I wrote:
>
>   > [...]  I wonder if it might not be better to provide a firmer
>   > interface between the remus code and the rest of the save/restore
>   > machinery.  That is, have an explicit callback function recorded
>   > by the save/restore code which is called back by the remus
>   > machinery when it has done its work.  What do you think ?
>   >
>   > I think having the flow of control spring off into libxl_remus.c and
>   > magically come back by libxl_remus.c knowing to call
>   > domain_suspend_done is rather opaque.
>
> I think you have basically two options:
>
> 1. Make the remus part of this be a fully self-contained standard
>    asynchronous callback-based suboperation, like libxl__xswait,
>    libxl__bootloader, et al.
>
>    In this case you should rigorously follow the existing patterns,
>    defining a clear interface between the two parts, providing a
>    callback function set by the caller, etc.
>
> 2. Integrate the remus part into the suspend/resume code in an
>    ad hoc fashion, with extremely clear comments everywhere about the
>    expected interface, and no extraneous moving parts.
>
> At the moment you seem to have mixed these two approaches.
>

Sorry, I missed your previous comment. The two options you describe
are a bit clearer than that earlier comment, and I agree that the
current approach mixes options 1 and 2.

The entire Remus code (executed from start to end) is one giant async op.
Internally, per checkpoint, the code executes for no more than tens of
milliseconds, apart from sleeping until the next checkpoint needs to be
taken.

Doing checkpoint-related work (i.e., syscalls to control disk/network
buffers) in an async op is overkill. So these steps are integrated into
the suspend/resume infrastructure (option 2).

The async op is useful (please correct me if I am wrong) if the op runs
for a long time, such that you don't want users of libxl to block. That is
exactly why the setup/teardown and sleep_until_next_checkpoint operations
are ao suboperations (option 1).


All that said, it may be clearer to add a level of indirection: make
domain_suspend_done a callback field in libxl__domain_state.

Change all direct callers of domain_suspend_done to invoke this callback.

In this way,
* domain_suspend -> domain_suspend_done (for normal suspend/save operations)
* domain_remus_start->domain_remus_terminated
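
Roughly, the indirection could look like the sketch below.  The
struct and the finish function are hypothetical simplifications, not
the real libxl types; only the two callback names come from the
proposal above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced suspend state. */
typedef struct suspend_state suspend_state;
struct suspend_state {
    /* filled in by the caller before the operation starts */
    void (*callback)(suspend_state *dss, int rc);
    /* private: recorded here only so the sketch can be checked */
    const char *finished_by;
};

/* Completion for a normal suspend/save operation. */
static void domain_suspend_done(suspend_state *dss, int rc)
{
    (void)rc;
    dss->finished_by = "suspend";
}

/* Completion for a Remus run. */
static void domain_remus_terminated(suspend_state *dss, int rc)
{
    (void)rc;
    dss->finished_by = "remus";
}

/* Shared machinery: invokes whatever completion the caller registered. */
static void suspend_machinery_finish(suspend_state *dss, int rc)
{
    assert(dss->callback != NULL);
    dss->callback(dss, rc);
}
```

The shared code then never names either completion function
directly; which one runs depends only on what the caller stored in
the callback field.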

What do you think?

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH V9 08/12] remus: implement the API to buffer/release packages
  2014-04-23 16:10   ` Ian Jackson
@ 2014-04-23 17:04     ` Shriram Rajagopalan
  2014-05-02 16:10       ` Ian Jackson
  0 siblings, 1 reply; 89+ messages in thread
From: Shriram Rajagopalan @ 2014-04-23 17:04 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Roger Pau Monne, Ian Campbell, Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel,
	Yang Hongyang, Lai Jiangshan


On Wed, Apr 23, 2014 at 11:10 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Yang Hongyang writes ("[PATCH V9 08/12] remus: implement the API to
> buffer/release packages"):
> > This patch implements two APIs:
> > 1. netbuf_start_new_epoch()
> >    It marks a new epoch. The packages before this epoch will
> >    be flushed, and the packages after this epoch will be buffered.
> >    It will be called after the guest is suspended.
> > 2. netbuf_release_prev_epoch()
> >    It flushes the buffered packages to client, and it will be
> >    called when a checkpoint finishes.
>
> Thanks.
>
> (BTW "packages" should be "packets" throughout.)
>
> > diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
> > index a5f2b9a..4b4bc9d 100644
> > --- a/tools/libxl/libxl_netbuffer.c
> > +++ b/tools/libxl/libxl_netbuffer.c
> > @@ -416,9 +416,66 @@ static int nic_teardown(libxl__remus_device *remus_dev,
> >      return REMUS_INPROGRESS;
> >  }
> >
> > +/* The buffer_op's value, not the value passed to kernel */
> > +enum {
> > +    tc_buffer_start,
> > +    tc_buffer_release
> > +};
>
> The comment refers to a "buffer_op" variable which is not nearby.  I
> suggest replacing the comment with
>     /* Internal value for libxl, not the value passed to kernel */
>
> Are these names "tc_" not possibly clashing with some future kernel
> things ?
>
>
These are local enums. They have nothing to do with the sch_plug
qdisc in the kernel. That said, I agree that they could be renamed to
something more local:
 netbuf_begin_next_epoch   /* start buffering pkts for next epoch */
 netbuf_commit_prev_epoch  /* release buffered pkts from prev epoch,
                            * after receiving commit ack from backup
                            */

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH V9 04/12] remus: introduce a function to check whether network buffering is enabled
  2014-04-23 15:51     ` Shriram Rajagopalan
@ 2014-04-30 14:36       ` Ian Jackson
  0 siblings, 0 replies; 89+ messages in thread
From: Ian Jackson @ 2014-04-30 14:36 UTC (permalink / raw)
  To: rshriram
  Cc: Roger Pau Monne, Ian Campbell, Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel,
	Yang Hongyang, Lai Jiangshan

Shriram Rajagopalan writes ("Re: [PATCH V9 04/12] remus: introduce a function to check whether network buffering is enabled"):
> On Wed, Apr 23, 2014 at 10:50 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>
> wrote:
> 
>     Yang Hongyang writes ("[PATCH V9 04/12] remus: introduce a function to
>     check whether network buffering is enabled"):
>     > libxl__netbuffer_enabled() returns 1 when network buffering is compiled,
>     > or returns 0 when network buffering is not compiled.
>     >
>     > If network buffering is not compiled, and the user wants to use it,
>     > report an error and exit.
>     ...
>     > +ifeq ($(CONFIG_REMUS_NETBUF),y)
>     > +LIBXL_OBJS-y += libxl_netbuffer.o
>     > +else
>     > +LIBXL_OBJS-y += libxl_nonetbuffer.o
>     > +endif
> 
>     Thanks.  But I think this needs to be set somewhere ?
...
> It's already done at the autoconf level.

In another patch ?  I think this is another example of a case where
the series needs restructuring.  Semantically related pieces should be
together so that reviewing an individual patch makes sense.

Ian.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering [and 1 more messages]
  2014-04-23 16:55           ` Shriram Rajagopalan
@ 2014-05-02 16:08             ` Ian Jackson
  2014-05-02 21:59               ` Shriram Rajagopalan
  2014-05-07  5:42               ` Hongyang Yang
  0 siblings, 2 replies; 89+ messages in thread
From: Ian Jackson @ 2014-05-02 16:08 UTC (permalink / raw)
  To: rshriram
  Cc: Lai Jiangshan, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel,
	Ian Campbell, Yang Hongyang, Roger Pau Monne

Shriram Rajagopalan writes ("Re: [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering [and 1 more messages]"):
> On Wed, Apr 23, 2014 at 11:02 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>
> wrote:
>     I think you have basically two options:
> 
>     1. Make the remus part of this be a fully self-contained standard
>        asynchronous callback-based suboperation, like libxl__xswait,
>        libxl__bootloader, et al.
> 
>        In this case you should rigorously follow the existing patterns,
>        defining a clear interface between the two parts, providing a
>        callback function set by the caller, etc.
> 
>     2. Integrate the remus part into the suspend/resume code in an
>        ad hoc fashion, with extremely clear comments everywhere about the
>        expected interface, and no extraneous moving parts.
> 
>     At the moment you seem to have mixed these two approaches.
> 
> Sorry, I missed the previous comment of yours. The two options you
> note are a bit clearer than the previous comment. And I also
> agree that the current approach is mixing options 1 & 2.

Right.

> The entire Remus code (executed from start to end) is one giant
> async op.  Internally, per checkpoint, the code executes for no more
> than tens of milliseconds at max, with the exception of sleeping
> until the next checkpoint needs to be taken.

Yes.

> Doing checkpoint related work (i.e., syscalls to control
> disk/network buffers) in an async op is an overkill. So, they are
> integrated into the suspend/resume infrastructure (option 2)
> 
> The async op is useful (please correct me if I am wrong) if the op
> runs for a long time, such that you don't want users of libxl to
> block. Which is exactly why the setup/teardown and the
> sleep_until_next_checkpoint operations are ao suboperations. (option
> 1).

This isn't quite the right distinction.  You should use an
asynchronous style for anything which might block.  So it is OK to
synchronously make fast syscalls.  (I assume that the disk/network
control syscalls are fast.)

And in this context, blocking includes child processes, and calls to
(u)sleep.
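
To illustrate the distinction, here is a minimal sketch in which the fast checkpoint work runs inline while the wait for the next checkpoint is a timer callback rather than a call to (u)sleep. The toy event loop and every name in it (toy_loop, toy_timer, checkpoint_state and so on) are assumptions made for this example; they are not libxl interfaces, and the clock is simulated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy event loop with a simulated clock; not libxl API. */
typedef void timer_cb(void *ctx);

struct toy_timer {
    long deadline_ms;   /* absolute time at which cb should fire */
    timer_cb *cb;       /* callback supplied by the caller */
    void *ctx;
    int armed;
};

struct toy_loop {
    long now_ms;              /* simulated current time */
    struct toy_timer *timer;  /* at most one pending timer in this sketch */
};

/* Register a timeout and return immediately; nothing blocks here. */
static void toy_timer_arm(struct toy_loop *loop, struct toy_timer *t,
                          long delay_ms, timer_cb *cb, void *ctx)
{
    assert(!t->armed);
    t->deadline_ms = loop->now_ms + delay_ms;
    t->cb = cb;
    t->ctx = ctx;
    t->armed = 1;
    loop->timer = t;
}

/* Advance simulated time and fire the timer once its deadline has passed. */
static void toy_loop_advance(struct toy_loop *loop, long elapsed_ms)
{
    struct toy_timer *t;

    loop->now_ms += elapsed_ms;
    t = loop->timer;
    if (t != NULL && t->armed && loop->now_ms >= t->deadline_ms) {
        t->armed = 0;
        loop->timer = NULL;
        t->cb(t->ctx);   /* the callback may re-arm the timer */
    }
}

struct checkpoint_state {
    struct toy_loop *loop;
    struct toy_timer timer;
    long interval_ms;
    int checkpoints_taken;
};

static void next_checkpoint(void *ctx);

/* Fast, non-blocking work is done synchronously; the wait is a timer. */
static void take_checkpoint(struct checkpoint_state *cs)
{
    cs->checkpoints_taken++;
    toy_timer_arm(cs->loop, &cs->timer, cs->interval_ms, next_checkpoint, cs);
}

static void next_checkpoint(void *ctx)
{
    take_checkpoint(ctx);
}
```

The caller of take_checkpoint() gets control back at once, so other events can be processed while the interval elapses.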

> All said, perhaps, it may be more clear to add a level of indirection:
>  make domain_suspend_done a callback field in libxl__domain_state.
> 
> Change all direct callers of domain_suspend_done to invoke this callback.
> 
> In this way,
> * domain_suspend -> domain_suspend_done (for normal suspend/save operations)
> * domain_remus_start->domain_remus_terminated
> 
> What do you think?

I'm not sure I quite follow but I think you are still mixing the two
approaches I discuss above.  I would like you to pick one.

If you're not sure, you should probably pick option 1.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH V9 08/12] remus: implement the API to buffer/release packages
  2014-04-23 17:04     ` Shriram Rajagopalan
@ 2014-05-02 16:10       ` Ian Jackson
  0 siblings, 0 replies; 89+ messages in thread
From: Ian Jackson @ 2014-05-02 16:10 UTC (permalink / raw)
  To: rshriram
  Cc: Roger Pau Monne, Ian Campbell, Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel,
	Yang Hongyang, Lai Jiangshan

Shriram Rajagopalan writes ("Re: [PATCH V9 08/12] remus: implement the API to buffer/release packages"):
> On Wed, Apr 23, 2014 at 11:10 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>
> wrote:

>     Are these names "tc_" not possibly clashing with some future kernel
>     things ?
> 
> These are local enums. They have nothing to do with the sch_plug
> qdisc in the kernel.

Right.  All I want to know is whether the kernel is likely to
introduce its own meanings for these names.  (I haven't looked at the
kernel headers in any detail.)  The kernel has historically been a bit
cavalier about namespacing.

If the kernel might think it owns the tc_* bit of the namespace then
we should avoid it.  If the kernel isn't likely to think that then
it's good as it is.
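
As a minimal sketch of the point under discussion (internal operation codes that are separate from whatever is handed to the kernel), the fragment below uses prefixed, file-local names and translates them in one place. Every identifier is hypothetical, including the NETBUF_OP_ prefix and the stand-in fake_plug_action values; none of them come from the patch or from kernel headers.

```c
#include <assert.h>

/* Internal operation codes. The NETBUF_OP_ prefix is an assumption made
 * for this sketch; it keeps the names out of any tc_* namespace. */
enum netbuf_op {
    NETBUF_OP_START,    /* begin buffering packets */
    NETBUF_OP_RELEASE   /* release previously buffered packets */
};

/* Hypothetical stand-in for the value actually passed to the kernel.
 * The numbers are invented to show that the two enums are independent. */
enum fake_plug_action {
    FAKE_PLUG_BUFFER = 10,
    FAKE_PLUG_RELEASE = 11
};

/* Translate the internal code at the single point where it leaves the
 * library, so the rest of the code never sees kernel-facing values. */
static enum fake_plug_action netbuf_op_to_action(enum netbuf_op op)
{
    switch (op) {
    case NETBUF_OP_START:
        return FAKE_PLUG_BUFFER;
    case NETBUF_OP_RELEASE:
        return FAKE_PLUG_RELEASE;
    }
    assert(!"unknown netbuf_op");
    return FAKE_PLUG_BUFFER; /* not reached */
}
```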

> That said, I agree that they could be renamed to
> something more local:
>  netbuf_begin_next_epoch /* start buffering pkts for next epoch */
>  netbuf_commit_prev_epoch  /* release buffered pkts from prev epoch, after

But probably we don't need to go so far !  Short names are generally a
good thing too.

Ian.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering [and 1 more messages]
  2014-05-02 16:08             ` Ian Jackson
@ 2014-05-02 21:59               ` Shriram Rajagopalan
  2014-05-07  5:42               ` Hongyang Yang
  1 sibling, 0 replies; 89+ messages in thread
From: Shriram Rajagopalan @ 2014-05-02 21:59 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel


On May 2, 2014 11:08 AM, "Ian Jackson" <Ian.Jackson@eu.citrix.com> wrote:
>
> Shriram Rajagopalan writes ("Re: [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering [and 1 more messages]"):
> > On Wed, Apr 23, 2014 at 11:02 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>
> > wrote:
> >     I think you have basically two options:
> >
> >     1. Make the remus part of this be a fully self-contained standard
> >        asynchronous callback-based suboperation, like libxl__xswait,
> >        libxl__bootloader, et al.
> >
> >        In this case you should rigorously follow the existing patterns,
> >        defining a clear interface between the two parts, providing a
> >        callback function set by the caller, etc.
> >
> >     2. Integrate the remus part into the suspend/resume code in an
> >        ad hoc fashion, with extremely clear comments everywhere about the
> >        expected interface, and no extraneous moving parts.
> >
> >     At the moment you seem to have mixed these two approaches.
> >
> > Sorry, I missed the previous comment of yours. The two options you
> > note are a bit clearer than the previous comment. And I also
> > agree that the current approach is mixing options 1 & 2.
>
> Right.
>
> > The entire Remus code (executed from start to end) is one giant
> > async op.  Internally, per checkpoint, the code executes for no more
> > than tens of milliseconds at max, with the exception of sleeping
> > until the next checkpoint needs to be taken.
>
> Yes.
>
> > Doing checkpoint related work (i.e., syscalls to control
> > disk/network buffers) in an async op is an overkill. So, they are
> > integrated into the suspend/resume infrastructure (option 2)
> >
> > The async op is useful (please correct me if I am wrong) if the op
> > runs for a long time, such that you don't want users of libxl to
> > block. Which is exactly why the setup/teardown and the
> > sleep_until_next_checkpoint operations are ao suboperations. (option
> > 1).
>
> This isn't quite the right distinction.  You should use an
> asynchronous style for anything which might block.  So it is OK to
> synchronously make fast syscalls.  (I assume that the disk/network
> control syscalls are fast.)
>
> And in this context, blocking includes child processes, and calls to
> (u)sleep.
>

Isn't that what the patches are doing currently?
Setup, teardown and (u)sleep are AO subops (following the option 1 style
you suggested), while the rest of the Remus code runs every checkpoint in
non-AO fashion (syscalls for disk, and libnl calls, which in turn issue
syscalls, for network).

> > All said, perhaps, it may be more clear to add a level of indirection:
> >  make domain_suspend_done a callback field in libxl__domain_state.
> >
> > Change all direct callers of domain_suspend_done to invoke this callback.
> >
> > In this way,
> > * domain_suspend -> domain_suspend_done (for normal suspend/save operations)
> > * domain_remus_start->domain_remus_terminated
> >
> > What do you think?
>
> I'm not sure I quite follow

The indirection helps separate the control flow between Remus and non-Remus
executions. Currently both end up in domain_suspend_done, which, as you
stated, seems ad hoc: it bolts the Remus error path onto a function that is
used for indicating completion of AO libxl calls like domain suspend, save,
etc.
This suggestion was more in line with the option (1) you suggested.
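
A minimal sketch of that indirection, assuming a much simplified state struct: the entry point chooses the completion callback, and the shared code invokes whatever was chosen instead of naming a completion function directly. The struct, the enum and all function names below are hypothetical and stand in for the real libxl state and callbacks.

```c
#include <assert.h>
#include <stddef.h>

struct suspend_state;
typedef void suspend_done_fn(struct suspend_state *ss, int rc);

/* Records which completion path ran, purely for demonstration. */
enum finish_path { FINISH_NONE, FINISH_SUSPEND, FINISH_REMUS };

/* Hypothetical stand-in for the per-operation state. */
struct suspend_state {
    suspend_done_fn *done;      /* completion callback, set by the entry point */
    enum finish_path finished;
    int rc;
};

/* Completion for an ordinary suspend/save operation. */
static void plain_suspend_done(struct suspend_state *ss, int rc)
{
    ss->finished = FINISH_SUSPEND;
    ss->rc = rc;
}

/* Completion for a Remus run; Remus-only cleanup would live here. */
static void remus_terminated(struct suspend_state *ss, int rc)
{
    ss->finished = FINISH_REMUS;
    ss->rc = rc;
}

/* Each entry point installs its own completion callback. */
static void start_op(struct suspend_state *ss, suspend_done_fn *done)
{
    ss->done = done;
    ss->finished = FINISH_NONE;
    ss->rc = 0;
}

/* Shared code path: it only knows that a callback exists. */
static void shared_path_finished(struct suspend_state *ss, int rc)
{
    assert(ss->done != NULL);
    ss->done(ss, rc);
}
```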

> but I think you are still mixing the two
> approaches I discuss above.

But which parts of the code? As I said earlier, one part of the code
(setup, teardown, etc) follows option 1, while the other path (per
checkpoint execution) follows option 2, because it does not block in
any way.

> I would like you to pick one.
>
> If you're not sure, you should probably pick option 1.
>
> Thanks,
> Ian.
>

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering [and 1 more messages]
  2014-05-02 16:08             ` Ian Jackson
  2014-05-02 21:59               ` Shriram Rajagopalan
@ 2014-05-07  5:42               ` Hongyang Yang
  2014-05-07 13:12                 ` Shriram Rajagopalan
  1 sibling, 1 reply; 89+ messages in thread
From: Hongyang Yang @ 2014-05-07  5:42 UTC (permalink / raw)
  To: Ian Jackson, rshriram
  Cc: Lai Jiangshan, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel,
	Roger Pau Monne, Ian Campbell


Hi Ian:

On 05/03/2014 12:08 AM, Ian Jackson wrote:
> Shriram Rajagopalan writes ("Re: [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering [and 1 more messages]"):
>> On Wed, Apr 23, 2014 at 11:02 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>
>> wrote:
>>      I think you have basically two options:
>>
>>      1. Make the remus part of this be a fully self-contained standard
>>         asynchronous callback-based suboperation, like libxl__xswait,
>>         libxl__bootloader, et al.
>>
>>         In this case you should rigorously follow the existing patterns,
>>         defining a clear interface between the two parts, providing a
>>         callback function set by the caller, etc.
>>
>>      2. Integrate the remus part into the suspend/resume code in an
>>         ad hoc fashion, with extremely clear comments everywhere about the
>>         expected interface, and no extraneous moving parts.
>>
>>      At the moment you seem to have mixed these two approaches.
>>
>> Sorry, I missed the previous comment of yours. The two options you
>> note are a bit clearer than the previous comment. And I also
>> agree that the current approach is mixing options 1 & 2.
> Right.
>
>> The entire Remus code (executed from start to end) is one giant
>> async op.  Internally, per checkpoint, the code executes for no more
>> than tens of milliseconds at max, with the exception of sleeping
>> until the next checkpoint needs to be taken.
> Yes.
>
>> Doing checkpoint related work (i.e., syscalls to control
>> disk/network buffers) in an async op is an overkill. So, they are
>> integrated into the suspend/resume infrastructure (option 2)
>>
>> The async op is useful (please correct me if I am wrong) if the op
>> runs for a long time, such that you don't want users of libxl to
>> block. Which is exactly why the setup/teardown and the
>> sleep_until_next_checkpoint operations are ao suboperations. (option
>> 1).
> This isn't quite the right distinction.  You should use an
> asynchronous style for anything which might block.  So it is OK to
> synchronously make fast syscalls.  (I assume that the disk/network
> control syscalls are fast.)
>
> And in this context, blocking includes child processes, and calls to
> (u)sleep.
>
>> All said, perhaps, it may be more clear to add a level of indirection:
>>   make domain_suspend_done a callback field in libxl__domain_state.
>>
>> Change all direct callers of domain_suspend_done to invoke this callback.
>>
>> In this way,
>> * domain_suspend -> domain_suspend_done (for normal suspend/save operations)
>> * domain_remus_start->domain_remus_terminated
>>
>> What do you think?
> I'm not sure I quite follow but I think you are still mixing the two
> approaches I discuss above.  I would like you to pick one.
>
> If you're not sure, you should probably pick option 1.

This patchset is based on the current remus implementation (without netbuffer),
which is integrated into the suspend/resume code. If we pick option 1 as you
suggested, the whole remus structure needs refactoring.
We're working on it, but it may take quite a while.

>
> Thanks,
> Ian.
>

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering [and 1 more messages]
  2014-05-07  5:42               ` Hongyang Yang
@ 2014-05-07 13:12                 ` Shriram Rajagopalan
  2014-05-12 13:18                   ` Ian Jackson
  0 siblings, 1 reply; 89+ messages in thread
From: Shriram Rajagopalan @ 2014-05-07 13:12 UTC (permalink / raw)
  To: Hongyang Yang
  Cc: Lai Jiangshan, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Ian Jackson, xen-devel, Dong Eddie,
	Roger Pau Monne, Ian Campbell


On Wed, May 7, 2014 at 12:42 AM, Hongyang Yang <yanghy@cn.fujitsu.com> wrote:

>  Hi Ian:
>
>
> On 05/03/2014 12:08 AM, Ian Jackson wrote:
>
> Shriram Rajagopalan writes ("Re: [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering [and 1 more messages]"):
>
>  On Wed, Apr 23, 2014 at 11:02 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>
> wrote:
>     I think you have basically two options:
>
>     1. Make the remus part of this be a fully self-contained standard
>        asynchronous callback-based suboperation, like libxl__xswait,
>        libxl__bootloader, et al.
>
>        In this case you should rigorously follow the existing patterns,
>        defining a clear interface between the two parts, providing a
>        callback function set by the caller, etc.
>
>     2. Integrate the remus part into the suspend/resume code in an
>        ad hoc fashion, with extremely clear comments everywhere about the
>        expected interface, and no extraneous moving parts.
>
>     At the moment you seem to have mixed these two approaches.
>
> Sorry, I missed the previous comment of yours. The two options you
> note are a bit clearer than the previous comment. And I also
> agree that the current approach is mixing options 1 & 2.
>
>  Right.
>
>
>  The entire Remus code (executed from start to end) is one giant
> async op.  Internally, per checkpoint, the code executes for no more
> than tens of milliseconds at max, with the exception of sleeping
> until the next checkpoint needs to be taken.
>
>  Yes.
>
>
>  Doing checkpoint related work (i.e., syscalls to control
> disk/network buffers) in an async op is an overkill. So, they are
> integrated into the suspend/resume infrastructure (option 2)
>
> The async op is useful (please correct me if I am wrong) if the op
> runs for a long time, such that you don't want users of libxl to
> block. Which is exactly why the setup/teardown and the
> sleep_until_next_checkpoint operations are ao suboperations. (option
> 1).
>
>  This isn't quite the right distinction.  You should use an
> asynchronous style for anything which might block.  So it is OK to
> synchronously make fast syscalls.  (I assume that the disk/network
> control syscalls are fast.)
>
> And in this context, blocking includes child processes, and calls to
> (u)sleep.
>
>
>  All said, perhaps, it may be more clear to add a level of indirection:
>  make domain_suspend_done a callback field in libxl__domain_state.
>
> Change all direct callers of domain_suspend_done to invoke this callback.
>
> In this way,
> * domain_suspend -> domain_suspend_done (for normal suspend/save operations)
> * domain_remus_start->domain_remus_terminated
>
> What do you think?
>
>  I'm not sure I quite follow but I think you are still mixing the two
> approaches I discuss above.  I would like you to pick one.
>
> If you're not sure, you should probably pick option 1.
>
>
> This patchset is based on the current remus implementation (without
> netbuffer), which is integrated into the suspend/resume code. If we pick
> option 1 as you suggested, the whole remus structure needs refactoring.
> We're working on it, but it may take quite a while.
>
>



Just to make sure I am on the same page with you folks,
Ian, when you talked about the two options (1) & (2), did you mean the
entire remus implementation inside libxl or just "this" setup/teardown code
base?

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering [and 1 more messages]
  2014-05-07 13:12                 ` Shriram Rajagopalan
@ 2014-05-12 13:18                   ` Ian Jackson
  2014-05-13  1:41                     ` Hongyang Yang
  0 siblings, 1 reply; 89+ messages in thread
From: Ian Jackson @ 2014-05-12 13:18 UTC (permalink / raw)
  To: rshriram
  Cc: Roger Pau Monne, Lai Jiangshan, FNST-Wen Congyang,
	Stefano Stabellini, Andrew Cooper, Jiang Yunhong, Dong Eddie,
	xen-devel, Hongyang Yang, Ian Campbell

Shriram Rajagopalan writes ("Re: [Xen-devel] [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering [and 1 more messages]"):
> On Wed, May 7, 2014 at 12:42 AM, Hongyang Yang <yanghy@cn.fujitsu.com> wrote:
...
>     This patchset is based on the current remus implementation
>     (without netbuffer), which is integrated into the suspend/resume
>     code. If we pick option 1 as you suggested, the whole remus
>     structure needs refactoring.  We're working on it, but it may
>     take quite a while.
...
> Just to make sure I am on the same page with you folks,
> Ian, when you talked about the two options (1) & (2), did you mean the
> entire remus implementation inside libxl or just "this" setup/teardown code
> base?

I think it is OK for some of the Remus code to fit into the rest of
libxl code via method (1) and some via method (2).

But any particular subfunction should do strictly one or the other.  I
think, overall, (1) is better.  It would be nice to do (1) everywhere.
But a series can be acceptable even if it contains some (2).

Thanks,
Ian.

For reference, the (1) and (2) we are referring to are these:

1. Make the remus part of this be a fully self-contained standard
   asynchronous callback-based suboperation, like libxl__xswait,
   libxl__bootloader, et al.

   In this case you should rigorously follow the existing patterns,
   defining a clear interface between the two parts, providing a
   callback function set by the caller, etc.

2. Integrate the remus part into the suspend/resume code in an
   ad hoc fashion, with extremely clear comments everywhere about the
   expected interface, and no extraneous moving parts.

I'm sure you know that but I wanted to save future readers, and anyone
not following this in detail, the effort of chasing it down.
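
For readers unfamiliar with the shape of option 1, here is a minimal sketch of a self-contained callback-based suboperation: the caller embeds the suboperation in its own state, sets the callback, starts it, and is told about completion only through that callback. The names (subop, netbuf_setup, PARENT_OF and the rest) are hypothetical and only mimic the general pattern; they are not the libxl__xswait or libxl__bootloader interfaces.

```c
#include <assert.h>
#include <stddef.h>

struct subop;
typedef void subop_cb(struct subop *op, int rc);

/* The suboperation: a clear interface with a caller-provided callback. */
struct subop {
    subop_cb *callback;  /* filled in by the caller before starting */
    int running;
};

static void subop_init(struct subop *op)
{
    op->callback = NULL;
    op->running = 0;
}

static void subop_start(struct subop *op)
{
    assert(op->callback != NULL);
    assert(!op->running);
    op->running = 1;     /* real code would register events here */
}

/* Invoked by the event machinery when the underlying work finishes. */
static void subop_complete(struct subop *op, int rc)
{
    assert(op->running);
    op->running = 0;
    op->callback(op, rc);
}

/* Recover the enclosing struct from a pointer to its embedded member. */
#define PARENT_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* The caller's state embeds the suboperation. */
struct netbuf_setup {
    struct subop script;
    int setup_rc;
    int finished;
};

static void netbuf_script_done(struct subop *op, int rc)
{
    struct netbuf_setup *ns = PARENT_OF(op, struct netbuf_setup, script);

    ns->setup_rc = rc;
    ns->finished = 1;
}

static void netbuf_setup_start(struct netbuf_setup *ns)
{
    ns->setup_rc = 0;
    ns->finished = 0;
    subop_init(&ns->script);
    ns->script.callback = netbuf_script_done;
    subop_start(&ns->script);
}
```

The two halves touch each other only through subop_start() and the callback, which is what makes the suboperation reviewable on its own.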

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering [and 1 more messages]
  2014-05-12 13:18                   ` Ian Jackson
@ 2014-05-13  1:41                     ` Hongyang Yang
  0 siblings, 0 replies; 89+ messages in thread
From: Hongyang Yang @ 2014-05-13  1:41 UTC (permalink / raw)
  To: Ian Jackson, rshriram
  Cc: Lai Jiangshan, FNST-Wen Congyang, Stefano Stabellini,
	Andrew Cooper, Jiang Yunhong, Dong Eddie, xen-devel,
	Roger Pau Monne, Ian Campbell

On 05/12/2014 09:18 PM, Ian Jackson wrote:
> Shriram Rajagopalan writes ("Re: [Xen-devel] [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering [and 1 more messages]"):
>> On Wed, May 7, 2014 at 12:42 AM, Hongyang Yang <yanghy@cn.fujitsu.com> wrote:
> ...
>>      This patchset is based on the current remus implementation
>>      (without netbuffer), which is integrated into the suspend/resume
>>      code. If we pick option 1 as you suggested, the whole remus
>>      structure needs refactoring.  We're working on it, but it may
>>      take quite a while.
> ...
>> Just to make sure I am on the same page with you folks,
>> Ian, when you talked about the two options (1) & (2), did you mean the
>> entire remus implementation inside libxl or just "this" setup/teardown code
>> base?
>
> I think it is OK for some of the Remus code to fit into the rest of
> libxl code via method (1) and some via method (2).
>
> But any particular subfunction should do strictly one or the other.  I
> think, overall, (1) is better.  It would be nice to do (1) everywhere.
> But a series can be acceptable even if it contains some (2).
>

Thanks for the reply. That is clearer, and it resolves my doubts too.

Thanks,
Yang.

> Thanks,
> Ian.
>
> For reference, the (1) and (2) we are referring to are these:
>
> 1. Make the remus part of this be a fully self-contained standard
>     asynchronous callback-based suboperation, like libxl__xswait,
>     libxl__bootloader, et al.
>
>     In this case you should rigorously follow the existing patterns,
>     defining a clear interface between the two parts, providing a
>     callback function set by the caller, etc.
>
> 2. Integrate the remus part into the suspend/resume code in an
>     ad hoc fashion, with extremely clear comments everywhere about the
>     expected interface, and no extraneous moving parts.
>
> I'm sure you know that but I wanted to save future readers, and anyone
> not following this in detail, the effort of chasing it down.
> .
>

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH V9 06/12] remus: implement the API for checkpoint
  2014-04-23 16:04   ` Ian Jackson
@ 2014-05-14  1:46     ` Hongyang Yang
  0 siblings, 0 replies; 89+ messages in thread
From: Hongyang Yang @ 2014-05-14  1:46 UTC (permalink / raw)
  To: Ian Jackson
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs



On 04/24/2014 12:04 AM, Ian Jackson wrote:
> Yang Hongyang writes ("[PATCH V9 06/12] remus: implement the API for checkpoint"):
>> +_hidden int libxl__remus_device_postsuspend(libxl__remus_state *remus_state);
>> +_hidden int libxl__remus_device_preresume(libxl__remus_state *remus_state);
>> +_hidden int libxl__remus_device_commit(libxl__remus_state *remus_state);
>
> This patch seems to introduce these functions with no callers.  The
> functions are then patched in a subsequent patch, again without any
> callers.  See my comments on the previous patch.

Hi Ian, these functions are called in patch 11. I will restructure the patches to
make them easier to review (merging these function definitions with their callers).

>
> Thanks,
> Ian.
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 89+ messages in thread

end of thread, other threads:[~2014-05-14  1:46 UTC | newest]

Thread overview: 89+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-15  5:38 [PATCH V9 00/12] Remus/Libxl: Network buffering support Yang Hongyang
2014-04-15  5:38 ` [PATCH V9 01/12] introduce an API to async exec scripts Yang Hongyang
2014-04-23 15:44   ` Ian Jackson
2014-04-15  5:38 ` [PATCH V9 02/12] libxl_device: use async exec script api Yang Hongyang
2014-04-23 15:48   ` Ian Jackson
2014-04-15  5:38 ` [PATCH V9 03/12] remus: add libnl3 dependency to autoconf scripts Yang Hongyang
2014-04-15  5:38 ` [PATCH V9 04/12] remus: introduce a function to check whether network buffering is enabled Yang Hongyang
2014-04-23 15:50   ` Ian Jackson
2014-04-23 15:51     ` Shriram Rajagopalan
2014-04-30 14:36       ` Ian Jackson
2014-04-15  5:38 ` [PATCH V9 05/12] remus: remus device core and APIs to setup/teardown Yang Hongyang
2014-04-15  5:38 ` [PATCH V9 06/12] remus: implement the API for checkpoint Yang Hongyang
2014-04-23 16:04   ` Ian Jackson
2014-05-14  1:46     ` Hongyang Yang
2014-04-15  5:38 ` [PATCH V9 07/12] remus: Remus network buffering core and APIs to setup/teardown Yang Hongyang
2014-04-15  5:38 ` [PATCH V9 08/12] remus: implement the API to buffer/release packages Yang Hongyang
2014-04-23 16:10   ` Ian Jackson
2014-04-23 17:04     ` Shriram Rajagopalan
2014-05-02 16:10       ` Ian Jackson
2014-04-15  5:38 ` [PATCH V9 09/12] libxl: use the API to setup/teardown network buffering Yang Hongyang
2014-04-23 16:12   ` Ian Jackson
2014-04-16  2:55 ` [PATCH 1/2] drbd: implement replicated checkpointing disk Lai Jiangshan
2014-04-16  2:56   ` [PATCH 2/2] remus: support disk replicated checkpointing Lai Jiangshan
2014-04-23  9:53 ` [PATCH V9 00/12] Remus/Libxl: Network buffering support Hongyang Yang
2014-04-23 15:51 ` Ian Jackson
  -- strict thread matches above, loose matches on Subject: below --
2014-04-02 11:04 [PATCH V8 0/8] " Yang Hongyang
2014-04-02 11:04 ` [PATCH V8 1/8] remus: add libnl3 dependency to autoconf scripts Yang Hongyang
2014-04-02 11:04 ` [PATCH V8 2/8] remus: introduce a function to check whether network buffering is enabled Yang Hongyang
2014-04-02 11:04 ` [PATCH V8 3/8] remus: Remus network buffering core and APIs to setup/teardown Yang Hongyang
2014-02-10  9:19   ` [PATCH 00/10 V7] Remus/Libxl: Network buffering support Lai Jiangshan
2014-02-10  9:19     ` [PATCH 01/10 V7] remus: add libnl3 dependency to autoconf scripts Lai Jiangshan
2014-02-10  9:19     ` [PATCH 02/10 V7] tools/libxl: update libxl_domain_remus_info Lai Jiangshan
2014-03-03 16:33       ` Ian Jackson
2014-02-10  9:19     ` [PATCH 03/10 V7] tools/libxl: introduce a new structure libxl__remus_state Lai Jiangshan
2014-03-03 16:38       ` Ian Jackson
2014-02-10  9:19     ` [PATCH 04/10 V7] remus: introduce a function to check whether network buffering is enabled Lai Jiangshan
2014-03-03 16:39       ` Ian Jackson
2014-02-10  9:19     ` [PATCH 05/10 V7] remus: Remus network buffering core and APIs to setup/teardown Lai Jiangshan
2014-03-03 17:44       ` Ian Jackson
2014-04-03 14:06         ` [PATCH 05/10 V7] remus: Remus network buffering core and APIs to setup/teardown [and 1 more messages] Ian Jackson
2014-02-10  9:19     ` [PATCH 06/10 V7] remus: implement the API to buffer/release packages Lai Jiangshan
2014-03-03 17:48       ` Ian Jackson
2014-02-10  9:19     ` [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering Lai Jiangshan
2014-03-03 17:51       ` Ian Jackson
2014-04-23 16:02         ` [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering [and 1 more messages] Ian Jackson
2014-04-23 16:55           ` Shriram Rajagopalan
2014-05-02 16:08             ` Ian Jackson
2014-05-02 21:59               ` Shriram Rajagopalan
2014-05-07  5:42               ` Hongyang Yang
2014-05-07 13:12                 ` Shriram Rajagopalan
2014-05-12 13:18                   ` Ian Jackson
2014-05-13  1:41                     ` Hongyang Yang
2014-02-10  9:19     ` [PATCH 08/10 V7] libxl: rename remus_failover_cb() to remus_replication_failure_cb() Lai Jiangshan
2014-03-03 17:52       ` Ian Jackson
2014-02-10  9:19     ` [PATCH 09/10 V7] libxl: control network buffering in remus callbacks Lai Jiangshan
2014-03-03 17:54       ` Ian Jackson
2014-02-10  9:19     ` [PATCH 10/10 V7] libxl: network buffering cmdline switch Lai Jiangshan
2014-03-03 17:58       ` Ian Jackson
2014-02-26  2:31     ` [PATCH 00/10 V7] Remus/Libxl: Network buffering support Lai Jiangshan
2014-02-26  2:53     ` [PATCH RFC] remus: implement remus replicated checkpointing disk Lai Jiangshan
2014-03-10 11:28       ` Ian Jackson
2014-03-10 12:34         ` Lai Jiangshan
2014-03-10 16:19           ` Ian Jackson
2014-03-11 18:10       ` Shriram Rajagopalan
2014-03-12  2:35         ` Lai Jiangshan
2014-03-12  6:23           ` Shriram Rajagopalan
2014-03-12 10:07           ` Ian Campbell
2014-03-12 11:57             ` Lai Jiangshan
2014-03-12 12:17               ` Ian Campbell
2014-03-12 12:28                 ` Lai Jiangshan
2014-03-12 10:06         ` Ian Campbell
2014-03-12 12:21           ` Lai Jiangshan
2014-04-02 11:04 ` [PATCH V8 4/8] remus: implement the API to buffer/release packages Yang Hongyang
2014-04-02 11:04 ` [PATCH V8 5/8] libxl: use the API to setup/teardown network buffering Yang Hongyang
2014-04-02 11:04 ` [PATCH V8 6/8] libxl: rename remus_failover_cb() to remus_replication_failure_cb() Yang Hongyang
2014-04-02 11:04 ` [PATCH V8 7/8] libxl: control network buffering in remus callbacks Yang Hongyang
2014-04-02 11:04 ` [PATCH V8 8/8] libxl: network buffering cmdline switch Yang Hongyang
2014-04-03 12:22   ` [PATCH 1/7] introduce a new function libxl__remus_netbuf_setup_done() Lai Jiangshan
2014-04-03 12:22     ` [PATCH 2/7] introduce a new function libxl__remus_netbuf_teardown_done() Lai Jiangshan
2014-04-03 12:22     ` [PATCH 3/7] introduce an API to async exec scripts Lai Jiangshan
2014-04-03 12:22     ` [PATCH 4/7] netbuffer: use async exec API to exec the netbuffer script Lai Jiangshan
2014-04-03 12:22     ` [PATCH 5/7] netbuf: move dev_id from remus_state to netbuf_state Lai Jiangshan
2014-04-03 12:22     ` [PATCH 6/7] remus: implement remus replicated checkpointing disk Lai Jiangshan
2014-04-03 16:41       ` Shriram Rajagopalan
2014-04-04  3:04         ` Lai Jiangshan
2014-04-03 12:22     ` [PATCH 7/7] drbd: implement " Lai Jiangshan
2014-04-03 16:07       ` Shriram Rajagopalan
2014-04-03 14:08     ` [PATCH 1/7] introduce a new function libxl__remus_netbuf_setup_done() Ian Jackson
2014-04-04  8:53       ` Hongyang Yang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.