* [PATCH v10 0/5] Remus netbuffer: Network buffering support
From: Yang Hongyang @ 2014-06-05  1:34 UTC
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

This patch series adds support for network buffering in the Remus
codebase in libxl.

This is a rebased version of the v10 series. The first patch was applied,
but I cannot find it on current master; if it is not necessary, please
let me know.

The code is also hosted on GitHub:

url: https://github.com/laijs/xen
branch: remus-0605

Changes in V10:
  Restructured the whole patch series.
  Introduced the remus device abstraction layer.
  Made the remus checkpoint asynchronous.

Changes in V9:
  Use the async exec API to run scripts.

Changes in V8:
  Addressed some review comments (by IanJ).
  Merged some struct definitions into their implementation.
  (2/3/5 in V7 => 3 in V8)

Changes in V7:
  Addressed previously missed review comments (by IanJ).
  Addressed Shriram's comments.

  Merged the tangled netbuffering setup/teardown code into one patch.
  (2/6/8 in V6 => 5 in V7. 9/10 in V6 => 7 in V7)

Changes in V6:
  Addressed Ian Jackson's comments on the V5 series.
  [PATCH 2/4 V5] is split into smaller functional pieces.

  [PATCH 4/4 V5] --> [PATCH 13/13] netbuffer is enabled by default.

Changes in V5:

Merge hotplug script patch (2/5) and hotplug script setup/teardown
patch (3/5) into a single patch.

Changes in V4:

[1/5] Remove check for libnl command line utils in autoconf checks

[2/5] minor nits

[3/5] define LIBXL_HAVE_REMUS_NETBUF in libxl.h

[4/5] Cleanups. Make the usleep in the checkpoint callback asynchronous

[5/5] minor nits

Changes in V3:
[1/5] Fix redundant checks in configure scripts
      (based on Ian Campbell's suggestions)

[2/5] Introduce locking in the script, during IFB setup.
      Add xenstore paths used by netbuf scripts
      to xenstore-paths.markdown

[3/5] Hotplug scripts setup/teardown invocations are now asynchronous
      following IanJ's feedback.  However, the invocations are still
      sequential. 

[5/5] Allow per-domain specification of netbuffer scripts in the xl remus
      command.

Also minor nits throughout the series, based on feedback on
the last version.

Changes in V2:
[1/5] Configure script will automatically enable/disable network
      buffer support depending on the availability of the appropriate
      libnl3 version. [If libnl3 is unavailable, a warning message will be
      printed to let the user know that the feature has been disabled.]

      use macros from pkg.m4 instead of pkg-config commands
      removed redundant checks for libnl3 libraries.

[3,4/5] - Minor nits.

Version 1:

[1/5] Changes to autoconf scripts to check for libnl3. Add linker flags
      to libxl Makefile.

[2/5] External script to setup/teardown network buffering using libnl3's
      CLI. This script will be invoked by libxl before starting Remus.
      The script's main job is to bring up an IFB device with plug qdisc
      attached to it.  It then re-routes egress traffic from the guest's
      vif to the IFB device.

[3/5] Libxl code to invoke the external setup script, followed by netlink
      related setup to obtain a handle on the output buffers attached
      to each vif.

[4/5] Libxl interaction with network buffer module in the kernel via
      libnl3 API.

[5/5] xl cmdline switch to explicitly enable network buffering when
      starting remus.
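
For readers unfamiliar with the plug qdisc mentioned in [2/5] to [4/5],
the following is a toy model in plain C of how output buffering is
expected to line up with checkpoints. It is only a sketch under
assumptions: the names plug_buffer, plug_enqueue, plug_start_new_epoch
and plug_release_prev_epoch are hypothetical, packets are merely counted,
and in the series the real buffering is done by the kernel's plug qdisc
driven through libnl3, not by code like this.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a plug-style output buffer; packets are only counted. */
typedef struct {
    size_t held_prev;  /* queued before the last plug, awaiting release */
    size_t held_cur;   /* queued since the last plug */
    size_t released;   /* already let out to the wire */
} plug_buffer;

/* The guest sends a packet: it is held, never sent directly. */
void plug_enqueue(plug_buffer *pb)
{
    assert(pb != NULL);
    pb->held_cur++;
}

/* Insert a plug: everything queued so far belongs to the old epoch. */
void plug_start_new_epoch(plug_buffer *pb)
{
    assert(pb != NULL);
    pb->held_prev += pb->held_cur;
    pb->held_cur = 0;
}

/* The checkpoint is safe on the backup: release the old epoch only. */
void plug_release_prev_epoch(plug_buffer *pb)
{
    assert(pb != NULL);
    pb->released += pb->held_prev;
    pb->held_prev = 0;
}
```

The point of the model is that packets produced after a plug stay held
until the next checkpoint has been acknowledged, so the outside world
never sees output from a state the backup does not have.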


  A few things to note (by Shriram):

    a) Based on previous email discussions, the setup/teardown task has
    been moved to a hotplug-style shell script which can be customized as
    desired, instead of implementing it as C code inside libxl.

    b) Libnl3 is not available on NetBSD, nor is it available on CentOS
    (Linux), so I have made network buffering support an optional feature
    that can be disabled if desired.

    c) NetBSD does not have libnl3, so I have put the setup script under
    the tools/hotplug/Linux folder.

thanks

Shriram Rajagopalan (1):
  libxl: network buffering cmdline switch

Yang Hongyang (4):
  libxl: introduce asynchronous execution API
  remus: add libnl3 dependency for network buffering support
  remus: introduce remus device
  remus: implement remus network buffering for nic devices

 README                                 |   4 +
 config/Tools.mk.in                     |   4 +
 docs/man/xl.conf.pod.5                 |   6 +
 docs/man/xl.pod.1                      |  11 +-
 docs/misc/xenstore-paths.markdown      |   4 +
 tools/configure.ac                     |  15 +
 tools/hotplug/Linux/Makefile           |   1 +
 tools/hotplug/Linux/remus-netbuf-setup | 183 +++++++++++
 tools/libxl/Makefile                   |  15 +
 tools/libxl/libxl.c                    |  52 +++-
 tools/libxl/libxl.h                    |  13 +
 tools/libxl/libxl_aoutils.c            |  89 ++++++
 tools/libxl/libxl_device.c             |  78 ++---
 tools/libxl/libxl_dom.c                | 132 +++++++-
 tools/libxl/libxl_internal.h           | 151 ++++++++-
 tools/libxl/libxl_netbuffer.c          | 550 +++++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c        |  98 ++++++
 tools/libxl/libxl_remus_device.c       | 323 +++++++++++++++++++
 tools/libxl/libxl_save_msgs_gen.pl     |   2 +-
 tools/libxl/libxl_types.idl            |   3 +
 tools/libxl/xl.c                       |   4 +
 tools/libxl/xl.h                       |   1 +
 tools/libxl/xl_cmdimpl.c               |  28 +-
 tools/libxl/xl_cmdtable.c              |   3 +
 tools/remus/README                     |   6 +
 25 files changed, 1694 insertions(+), 82 deletions(-)
 create mode 100644 tools/hotplug/Linux/remus-netbuf-setup
 create mode 100644 tools/libxl/libxl_netbuffer.c
 create mode 100644 tools/libxl/libxl_nonetbuffer.c
 create mode 100644 tools/libxl/libxl_remus_device.c

-- 
1.9.1


* [PATCH v10 1/5] libxl: introduce asynchronous execution API
From: Yang Hongyang @ 2014-06-05  1:34 UTC
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

1. Introduce an asynchronous execution API:
  libxl__async_exec_init
  libxl__async_exec_start
  libxl__async_exec_inuse
2. Use the async exec API to execute device hotplug scripts.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
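The idea behind the new API (run a helper, kill it with SIGKILL if it
exceeds a timeout, then hand the wait status to the caller) can be
sketched outside libxl with plain POSIX calls. This is a simplified
illustration, not the libxl implementation: run_with_timeout is a
hypothetical name, and it blocks and polls, whereas the patch registers
a libxl__ev_time and a libxl__ev_child and returns to the event loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with a time limit.
 * Returns 0 and stores the wait status in *status,
 * or -1 if fork() or waitpid() failed. */
int run_with_timeout(char *const argv[], int timeout_ms, int *status)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    int waited_ms = 0, killed = 0;
    pid_t pid;

    assert(argv && argv[0] && status);

    pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* child */
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    for (;;) {
        /* Once the child has been killed, block until it is reaped. */
        pid_t r = waitpid(pid, status, killed ? 0 : WNOHANG);

        if (r == pid)
            return 0;
        if (r == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* r == 0: child still running */
        if (waited_ms >= timeout_ms) {
            kill(pid, SIGKILL); /* same signal the patch uses on timeout */
            killed = 1;
            continue;
        }
        nanosleep(&tick, NULL);
        waited_ms += 10;
    }
}
```

A caller can tell a timeout from a normal exit by checking
WIFSIGNALED(status) and WTERMSIG(status) == SIGKILL, which is roughly the
information the status argument of libxl__async_exec_callback carries.
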
 tools/libxl/libxl_aoutils.c  | 89 ++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_device.c   | 78 +++++++++++---------------------------
 tools/libxl/libxl_internal.h | 34 +++++++++++++++--
 3 files changed, 141 insertions(+), 60 deletions(-)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index 1c9eb9e..b10d2e1 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -451,3 +451,92 @@ int libxl__openptys(libxl__openpty_state *op,
     return rc;
 }
 
+static void async_exec_timeout(libxl__egc *egc,
+                               libxl__ev_time *ev,
+                               const struct timeval *requested_abs)
+{
+    libxl__async_exec_state *aes = CONTAINER_OF(ev, *aes, time);
+    STATE_AO_GC(aes->ao);
+
+    libxl__ev_time_deregister(gc, &aes->time);
+
+    assert(libxl__ev_child_inuse(&aes->child));
+    LOG(ERROR, "killing execution of %s because of timeout", aes->what);
+
+    if (kill(aes->child.pid, SIGKILL)) {
+        LOGEV(ERROR, errno, "unable to kill %s [%ld]",
+              aes->what, (long)aes->child.pid);
+    }
+
+    return;
+}
+
+static void async_exec_done(libxl__egc *egc,
+                            libxl__ev_child *child,
+                            pid_t pid, int status)
+{
+    libxl__async_exec_state *aes = CONTAINER_OF(child, *aes, child);
+    STATE_AO_GC(aes->ao);
+
+    libxl__ev_time_deregister(gc, &aes->time);
+
+    if (status) {
+        libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR,
+                                      aes->what, pid, status);
+    }
+
+    aes->callback(egc, aes, status);
+}
+
+void libxl__async_exec_init(libxl__async_exec_state *aes)
+{
+    libxl__ev_time_init(&aes->time);
+    libxl__ev_child_init(&aes->child);
+}
+
+int libxl__async_exec_start(libxl__gc *gc, libxl__async_exec_state *aes)
+{
+    pid_t pid;
+
+    /* Convenience aliases */
+    libxl__ev_child *const child = &aes->child;
+    char ** const args = aes->args;
+
+    /* Set execution timeout */
+    if (libxl__ev_time_register_rel(gc, &aes->time,
+                                    async_exec_timeout,
+                                    aes->timeout_ms)) {
+        LOG(ERROR, "unable to register timeout for executing: %s", aes->what);
+        goto out;
+    }
+
+    LOG(DEBUG, "forking to execute: %s ", aes->what);
+
+    /* Fork and exec */
+    pid = libxl__ev_child_fork(gc, child, async_exec_done);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork");
+        goto out;
+    }
+
+    if (!pid) {
+        /* child */
+        libxl__exec(gc, aes->stdfds[0], aes->stdfds[1],
+                    aes->stdfds[2], args[0], args, aes->env);
+        /* notreached */
+        abort();
+    }
+
+    return 0;
+
+out:
+    return ERROR_FAIL;
+}
+
+bool libxl__async_exec_inuse(const libxl__async_exec_state *aes)
+{
+    bool time_inuse = libxl__ev_time_isregistered(&aes->time);
+    bool child_inuse = libxl__ev_child_inuse(&aes->child);
+    assert(time_inuse == child_inuse);
+    return child_inuse;
+}
diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index fa99f77..90ae564 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -430,7 +430,7 @@ void libxl__prepare_ao_device(libxl__ao *ao, libxl__ao_device *aodev)
     aodev->rc = 0;
     aodev->dev = NULL;
     aodev->num_exec = 0;
-    /* Initialize timer for QEMU Bodge and hotplug execution */
+    /* Initialize timer for QEMU Bodge */
     libxl__ev_time_init(&aodev->timeout);
     /*
      * Initialize xs_watch, because it's not used on all possible
@@ -440,7 +440,7 @@ void libxl__prepare_ao_device(libxl__ao *ao, libxl__ao_device *aodev)
     aodev->active = 1;
     /* We init this here because we might call device_hotplug_done
      * without actually calling any hotplug script */
-    libxl__ev_child_init(&aodev->child);
+    libxl__async_exec_init(&aodev->aes);
 }
 
 /* multidev */
@@ -707,12 +707,9 @@ static void device_backend_cleanup(libxl__gc *gc,
 
 static void device_hotplug(libxl__egc *egc, libxl__ao_device *aodev);
 
-static void device_hotplug_timeout_cb(libxl__egc *egc, libxl__ev_time *ev,
-                                      const struct timeval *requested_abs);
-
 static void device_hotplug_child_death_cb(libxl__egc *egc,
-                                          libxl__ev_child *child,
-                                          pid_t pid, int status);
+                                          libxl__async_exec_state *aes,
+                                          int status);
 
 static void device_destroy_be_timeout_cb(libxl__egc *egc, libxl__ev_time *ev,
                                          const struct timeval *requested_abs);
@@ -953,11 +950,11 @@ static void device_backend_cleanup(libxl__gc *gc, libxl__ao_device *aodev)
 static void device_hotplug(libxl__egc *egc, libxl__ao_device *aodev)
 {
     STATE_AO_GC(aodev->ao);
+    libxl__async_exec_state *aes = &aodev->aes;
     char *be_path = libxl__device_backend_path(gc, aodev->dev);
     char **args = NULL, **env = NULL;
     int rc = 0;
     int hotplug, nullfd = -1;
-    pid_t pid;
     uint32_t domid;
 
     /*
@@ -1009,16 +1006,6 @@ static void device_hotplug(libxl__egc *egc, libxl__ao_device *aodev)
         goto out;
     }
 
-    /* Set hotplug timeout */
-    rc = libxl__ev_time_register_rel(gc, &aodev->timeout,
-                                     device_hotplug_timeout_cb,
-                                     LIBXL_HOTPLUG_TIMEOUT * 1000);
-    if (rc) {
-        LOG(ERROR, "unable to register timeout for hotplug device %s", be_path);
-        goto out;
-    }
-
-    aodev->what = GCSPRINTF("%s %s", args[0], args[1]);
     LOG(DEBUG, "calling hotplug script: %s %s", args[0], args[1]);
 
     nullfd = open("/dev/null", O_RDONLY);
@@ -1028,23 +1015,22 @@ static void device_hotplug(libxl__egc *egc, libxl__ao_device *aodev)
         goto out;
     }
 
-    /* fork and execute hotplug script */
-    pid = libxl__ev_child_fork(gc, &aodev->child, device_hotplug_child_death_cb);
-    if (pid == -1) {
-        LOG(ERROR, "unable to fork");
-        rc = ERROR_FAIL;
+    aes->ao = ao;
+    aes->what = GCSPRINTF("%s %s", args[0], args[1]);
+    aes->env = env;
+    aes->args = args;
+    aes->callback = device_hotplug_child_death_cb;
+    aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
+    aes->stdfds[0] = nullfd;
+    aes->stdfds[1] = 2;
+    aes->stdfds[2] = -1;
+
+    rc = libxl__async_exec_start(gc, aes);
+    if (rc)
         goto out;
-    }
-
-    if (!pid) {
-        /* child */
-        libxl__exec(gc, nullfd, 2, -1, args[0], args, env);
-        /* notreached */
-        abort();
-    }
 
     close(nullfd);
-    assert(libxl__ev_child_inuse(&aodev->child));
+    assert(libxl__async_exec_inuse(&aodev->aes));
 
     return;
 
@@ -1055,29 +1041,11 @@ out:
     return;
 }
 
-static void device_hotplug_timeout_cb(libxl__egc *egc, libxl__ev_time *ev,
-                                      const struct timeval *requested_abs)
-{
-    libxl__ao_device *aodev = CONTAINER_OF(ev, *aodev, timeout);
-    STATE_AO_GC(aodev->ao);
-
-    libxl__ev_time_deregister(gc, &aodev->timeout);
-
-    assert(libxl__ev_child_inuse(&aodev->child));
-    LOG(DEBUG, "killing hotplug script %s because of timeout", aodev->what);
-    if (kill(aodev->child.pid, SIGKILL)) {
-        LOGEV(ERROR, errno, "unable to kill hotplug script %s [%ld]",
-                            aodev->what, (unsigned long)aodev->child.pid);
-    }
-
-    return;
-}
-
 static void device_hotplug_child_death_cb(libxl__egc *egc,
-                                          libxl__ev_child *child,
-                                          pid_t pid, int status)
+                                          libxl__async_exec_state *aes,
+                                          int status)
 {
-    libxl__ao_device *aodev = CONTAINER_OF(child, *aodev, child);
+    libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
     STATE_AO_GC(aodev->ao);
     char *be_path = libxl__device_backend_path(gc, aodev->dev);
     char *hotplug_error;
@@ -1085,8 +1053,6 @@ static void device_hotplug_child_death_cb(libxl__egc *egc,
     device_hotplug_clean(gc, aodev);
 
     if (status) {
-        libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR,
-                                      aodev->what, pid, status);
         hotplug_error = libxl__xs_read(gc, XBT_NULL,
                                        GCSPRINTF("%s/hotplug-error", be_path));
         if (hotplug_error)
@@ -1178,7 +1144,7 @@ static void device_hotplug_clean(libxl__gc *gc, libxl__ao_device *aodev)
     /* Clean events and check reentrancy */
     libxl__ev_time_deregister(gc, &aodev->timeout);
     libxl__ev_xswatch_deregister(gc, &aodev->xs_watch);
-    assert(!libxl__ev_child_inuse(&aodev->child));
+    assert(!libxl__async_exec_inuse(&aodev->aes));
 }
 
 static void devices_remove_callback(libxl__egc *egc,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 082749e..5968485 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2050,6 +2050,33 @@ _hidden const char *libxl__xen_script_dir_path(void);
 _hidden const char *libxl__lock_dir_path(void);
 _hidden const char *libxl__run_dir_path(void);
 
+/*----- subprocess execution with timeout -----*/
+
+typedef struct libxl__async_exec_state libxl__async_exec_state;
+
+typedef void libxl__async_exec_callback(libxl__egc *egc,
+                        libxl__async_exec_state *aes, int status);
+
+struct libxl__async_exec_state {
+    /* caller must fill these in */
+    libxl__ao *ao;
+    const char *what; /* for error msgs, what we're executing */
+    int timeout_ms;
+    libxl__async_exec_callback *callback;
+    /* caller must fill in; as for libxl__exec */
+    int stdfds[3];
+    char **args; /* execution arguments */
+    char **env; /* execution environment */
+
+    /* private */
+    libxl__ev_time time;
+    libxl__ev_child child;
+};
+
+void libxl__async_exec_init(libxl__async_exec_state *aes);
+int libxl__async_exec_start(libxl__gc *gc, libxl__async_exec_state *aes);
+bool libxl__async_exec_inuse(const libxl__async_exec_state *aes);
+
 /*----- device addition/removal -----*/
 
 typedef struct libxl__ao_device libxl__ao_device;
@@ -2086,14 +2113,13 @@ struct libxl__ao_device {
     libxl__multidev *multidev; /* reference to the containing multidev */
     /* private for add/remove implementation */
     libxl__ev_devstate backend_ds;
-    /* Bodge for Qemu devices, also used for timeout of hotplug execution */
+    /* Bodge for Qemu devices */
     libxl__ev_time timeout;
     /* xenstore watch for backend path of driver domains */
     libxl__ev_xswatch xs_watch;
-    /* device hotplug execution */
-    const char *what;
     int num_exec;
-    libxl__ev_child child;
+    /* for calling hotplug scripts */
+    libxl__async_exec_state aes;
 };
 
 /*
-- 
1.9.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v10 2/5] remus: add libnl3 dependency for network buffering support
From: Yang Hongyang @ 2014-06-05  1:34 UTC
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

Libnl3 is required for controlling Remus network buffering.
This patch adds a dependency on libnl3 (>= 3.2.8) to the autoconf scripts.
It also provides the ability to configure the tools without libnl3
support, that is, without network buffering support.

When there is no network buffering support, libxl__netbuffer_enabled()
returns 0; otherwise it returns 1.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
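How a caller might use the 0/1 result can be sketched as below. This is
an illustration under assumptions, not code from the series:
netbuffer_enabled stands in for libxl__netbuffer_enabled() (without the
gc argument), remus_start_sketch and SKETCH_HAVE_NETBUF are hypothetical
names, and a macro replaces the Makefile's choice between
libxl_netbuffer.o and libxl_nonetbuffer.o only to keep the sketch in one
file.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for libxl__netbuffer_enabled(). In the patch the build links
 * one of two source files; the hypothetical macro below mimics that. */
#ifdef SKETCH_HAVE_NETBUF
int netbuffer_enabled(void) { return 1; }
#else
int netbuffer_enabled(void) { return 0; }
#endif

/* Hypothetical caller: fail early if buffering was requested but the
 * tools were built without it. Returns 0 on success, -1 on error. */
int remus_start_sketch(int want_netbuf)
{
    assert(want_netbuf == 0 || want_netbuf == 1);

    if (want_netbuf && !netbuffer_enabled()) {
        fprintf(stderr, "network buffering requested but not compiled in\n");
        return -1;
    }
    return 0;
}
```

Keeping the check in one function means callers need no #ifdef of their
own, which is the same benefit the two-file approach gives libxl.
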
 README                          |  4 ++++
 config/Tools.mk.in              |  4 ++++
 tools/configure.ac              | 15 +++++++++++++++
 tools/libxl/Makefile            | 13 +++++++++++++
 tools/libxl/libxl_internal.h    |  1 +
 tools/libxl/libxl_netbuffer.c   | 31 +++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c | 31 +++++++++++++++++++++++++++++++
 tools/remus/README              |  6 ++++++
 8 files changed, 105 insertions(+)
 create mode 100644 tools/libxl/libxl_netbuffer.c
 create mode 100644 tools/libxl/libxl_nonetbuffer.c

diff --git a/README b/README
index 9bbe734..e770932 100644
--- a/README
+++ b/README
@@ -72,6 +72,10 @@ disabled at compile time:
     * cmake (if building vtpm stub domains)
     * markdown
     * figlet (for generating the traditional Xen start of day banner)
+    * Development install of libnl3 (e.g., libnl-3-200,
+      libnl-3-dev, etc).  Required if network buffering is desired
+      when using Remus with libxl.  See tools/remus/README for detailed
+      information.
 
 Second, you need to acquire a suitable kernel for use in domain 0. If
 possible you should use a kernel provided by your OS distributor. If
diff --git a/config/Tools.mk.in b/config/Tools.mk.in
index 84b2612..06c9d25 100644
--- a/config/Tools.mk.in
+++ b/config/Tools.mk.in
@@ -38,6 +38,9 @@ PTHREAD_LIBS        := @PTHREAD_LIBS@
 
 PTYFUNCS_LIBS       := @PTYFUNCS_LIBS@
 
+LIBNL3_LIBS         := @LIBNL3_LIBS@
+LIBNL3_CFLAGS       := @LIBNL3_CFLAGS@
+
 # Download GIT repositories via HTTP or GIT's own protocol?
 # GIT's protocol is faster and more robust, when it works at all (firewalls
 # may block it). We make it the default, but if your GIT repository downloads
@@ -56,6 +59,7 @@ CONFIG_QEMU_XEN     := @qemu_xen@
 CONFIG_BLKTAP1      := @blktap1@
 CONFIG_VTPM         := @vtpm@
 CONFIG_QEMUU_EXTRA_ARGS:= @EXTRA_QEMUU_CONFIGURE_ARGS@
+CONFIG_REMUS_NETBUF := @remus_netbuf@
 
 #System options
 ZLIB                := @zlib@
diff --git a/tools/configure.ac b/tools/configure.ac
index 25d7ca3..38d2d05 100644
--- a/tools/configure.ac
+++ b/tools/configure.ac
@@ -247,5 +247,20 @@ esac
 # Checks for header files.
 AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h])
 
+# Check for libnl3 >=3.2.8. If present enable remus network buffering.
+PKG_CHECK_MODULES(LIBNL3, [libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8],
+    [libnl3_lib="y"], [libnl3_lib="n"])
+
+AS_IF([test "x$libnl3_lib" = "xn" ], [
+    AC_MSG_WARN([Disabling support for Remus network buffering.
+    Please install libnl3 libraries, command line tools and devel
+    headers - version 3.2.8 or higher])
+    AC_SUBST(remus_netbuf, [n])
+    ],[
+    AC_SUBST(LIBNL3_LIBS)
+    AC_SUBST(LIBNL3_CFLAGS)
+    AC_SUBST(remus_netbuf, [y])
+])
+
 AC_OUTPUT()
 
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 4cfa275..a572dca 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -21,11 +21,17 @@ endif
 
 LIBXL_LIBS =
 LIBXL_LIBS = $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(LDLIBS_libxenstore) $(LDLIBS_libblktapctl) $(PTYFUNCS_LIBS) $(LIBUUID_LIBS)
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+LIBXL_LIBS += $(LIBNL3_LIBS)
+endif
 
 CFLAGS_LIBXL += $(CFLAGS_libxenctrl)
 CFLAGS_LIBXL += $(CFLAGS_libxenguest)
 CFLAGS_LIBXL += $(CFLAGS_libxenstore)
 CFLAGS_LIBXL += $(CFLAGS_libblktapctl) 
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+CFLAGS_LIBXL += $(LIBNL3_CFLAGS)
+endif
 CFLAGS_LIBXL += -Wshadow
 
 LIBXL_LIBS-$(CONFIG_ARM) += -lfdt
@@ -43,6 +49,13 @@ LIBXL_OBJS-y += libxl_blktap2.o
 else
 LIBXL_OBJS-y += libxl_noblktap2.o
 endif
+
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+LIBXL_OBJS-y += libxl_netbuffer.o
+else
+LIBXL_OBJS-y += libxl_nonetbuffer.o
+endif
+
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 5968485..2b46121 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2462,6 +2462,7 @@ typedef struct libxl__save_helper_state {
                       * marshalling and xc callback functions */
 } libxl__save_helper_state;
 
+_hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
 /*----- Domain suspend (save) state structure -----*/
 
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
new file mode 100644
index 0000000..8e23d75
--- /dev/null
+++ b/tools/libxl/libxl_netbuffer.c
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2013
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+int libxl__netbuffer_enabled(libxl__gc *gc)
+{
+    return 1;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
new file mode 100644
index 0000000..6aa4bf1
--- /dev/null
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2013
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+int libxl__netbuffer_enabled(libxl__gc *gc)
+{
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/remus/README b/tools/remus/README
index 9e8140b..4736252 100644
--- a/tools/remus/README
+++ b/tools/remus/README
@@ -2,3 +2,9 @@ Remus provides fault tolerance for virtual machines by sending continuous
 checkpoints to a backup, which will activate if the target VM fails.
 
 See the website at http://nss.cs.ubc.ca/remus/ for details.
+
+Using Remus with libxl on Xen 4.4 and higher:
+ To enable network buffering, you need libnl 3.2.8
+ or higher along with the development headers and command line utilities.
+ If your distro does not have the appropriate libnl3 version, you can find
+ the latest source tarball of libnl3 at http://www.carisma.slowglass.com/~tgr/libnl/
-- 
1.9.1


* [PATCH v10 3/5] remus: introduce remus device
From: Yang Hongyang @ 2014-06-05  1:34 UTC
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

Introduce the remus device, an abstraction layer over remus devices (nic,
disk, etc.). It provides the following APIs for libxl:
  >libxl__remus_device_setup
    set up remus devices, e.g. attach qdisc, enable disk buffering, etc.
  >libxl__remus_device_teardown
    tear down devices
  >libxl__remus_device_postsuspend
  >libxl__remus_device_preresume
  >libxl__remus_device_commit
    the above three are for checkpointing.
With the remus device layer, the remus execution flow looks like
this:
  xl remus -> remus device setup
                |-> remus checkpoint(postsuspend, commit, preresume)
                      ...
                       |-> remus device teardown, failover or abort
The remus device layer provides an interface,
  libxl__remus_device_ops
which a remus device must implement. The whole remus structure:
                            |remus|
                               |
                        |remus device|
                               |
                |nic| |drbd disks| |qemu disks| ...
A device (nic, drbd disks, qemu disks, etc.) must implement
libxl__remus_device_ops to support remus.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
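The layering described above can be illustrated with a small,
self-contained sketch. Everything in it is hypothetical and simplified:
the struct layouts and the names remus_device_ops, remus_devices_setup,
remus_devices_teardown, remus_devices_checkpoint and the toy_* device
are stand-ins, and the calls are synchronous, whereas the real
libxl__remus_device_* functions complete through callbacks.

```c
#include <assert.h>
#include <stddef.h>

typedef struct remus_device remus_device;

/* Hypothetical, synchronous stand-in for libxl__remus_device_ops. */
typedef struct {
    const char *kind;                    /* "nic", "drbd disk", ... */
    int  (*setup)(remus_device *dev);
    void (*teardown)(remus_device *dev);
    int  (*postsuspend)(remus_device *dev);
    int  (*preresume)(remus_device *dev);
    int  (*commit)(remus_device *dev);
} remus_device_ops;

struct remus_device {
    const remus_device_ops *ops;
    int is_set_up;
    void *priv;                          /* device-specific state */
};

typedef enum { REMUS_POSTSUSPEND, REMUS_PRERESUME, REMUS_COMMIT } remus_phase;

/* Tear down, in reverse order, every device that was set up. */
void remus_devices_teardown(remus_device *devs, size_t n)
{
    for (size_t i = n; i-- > 0; ) {
        if (!devs[i].is_set_up)
            continue;
        devs[i].ops->teardown(&devs[i]);
        devs[i].is_set_up = 0;
    }
}

/* Set up all devices; on the first failure undo the ones already done. */
int remus_devices_setup(remus_device *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(devs[i].ops && devs[i].ops->setup);
        int rc = devs[i].ops->setup(&devs[i]);
        if (rc) {
            remus_devices_teardown(devs, n);
            return rc;
        }
        devs[i].is_set_up = 1;
    }
    return 0;
}

/* Run one checkpoint phase on every device, stopping at the first error. */
int remus_devices_checkpoint(remus_device *devs, size_t n, remus_phase phase)
{
    for (size_t i = 0; i < n; i++) {
        const remus_device_ops *ops = devs[i].ops;
        int rc = phase == REMUS_POSTSUSPEND ? ops->postsuspend(&devs[i])
               : phase == REMUS_PRERESUME   ? ops->preresume(&devs[i])
               :                              ops->commit(&devs[i]);
        if (rc)
            return rc;
    }
    return 0;
}

/* Toy device, present only so the layer above can be exercised. */
typedef struct { int fail_setup, active, commits; } toy_state;

static int toy_setup(remus_device *d)
{
    toy_state *s = d->priv;
    if (s->fail_setup)
        return -1;
    s->active = 1;
    return 0;
}
static void toy_teardown(remus_device *d) { ((toy_state *)d->priv)->active = 0; }
static int toy_noop(remus_device *d) { (void)d; return 0; }
static int toy_commit(remus_device *d) { ((toy_state *)d->priv)->commits++; return 0; }

const remus_device_ops toy_ops = {
    "toy", toy_setup, toy_teardown, toy_noop, toy_noop, toy_commit
};
```

The rollback in remus_devices_setup mirrors what libxl__remus_setup_done
does in this patch when setup fails: it calls the teardown path before
reporting the error.
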
 tools/libxl/Makefile               |   2 +
 tools/libxl/libxl.c                |  34 ++++-
 tools/libxl/libxl_dom.c            | 132 ++++++++++++++--
 tools/libxl/libxl_internal.h       | 113 ++++++++++++++
 tools/libxl/libxl_remus_device.c   | 303 +++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_save_msgs_gen.pl |   2 +-
 tools/libxl/libxl_types.idl        |   1 +
 7 files changed, 572 insertions(+), 15 deletions(-)
 create mode 100644 tools/libxl/libxl_remus_device.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index a572dca..7a722a8 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -56,6 +56,8 @@ else
 LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
+LIBXL_OBJS-y += libxl_remus_device.o
+
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
 
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 900b8d4..0cdf348 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -709,6 +709,31 @@ out:
 static void remus_failover_cb(libxl__egc *egc,
                               libxl__domain_suspend_state *dss, int rc);
 
+static void libxl__remus_setup_failed(libxl__egc *egc,
+                                      libxl__remus_state *rs, int rc)
+{
+    STATE_AO_GC(rs->ao);
+    libxl__ao_complete(egc,ao,rc);
+}
+
+static void libxl__remus_setup_done(libxl__egc *egc,
+                                    libxl__remus_state *rs, int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
+    STATE_AO_GC(rs->ao);
+
+    if (!rc) {
+        libxl__domain_suspend(egc, dss);
+        return;
+    }
+
+    LOG(ERROR, "Remus: failed to setup device for guest with domid %u",
+        dss->domid);
+    rs->saved_rc = rc;
+    rs->callback = libxl__remus_setup_failed;
+    libxl__remus_device_teardown(egc, rs);
+}
+
 /* TODO: Explicit Checkpoint acknowledgements via recv_fd. */
 int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
                              uint32_t domid, int send_fd, int recv_fd,
@@ -737,10 +762,15 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 
     assert(info);
 
-    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
+    /* Convenience aliases */
+    libxl__remus_state *const rs = &dss->rs;
+    rs->ao = ao;
+    rs->domid = domid;
+    rs->saved_rc = 0;
+    rs->callback = libxl__remus_setup_done;
 
     /* Point of no return */
-    libxl__domain_suspend(egc, dss);
+    libxl__remus_device_setup(egc, rs);
     return AO_INPROGRESS;
 
  out:
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 661999c..70765a3 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1444,31 +1444,65 @@ static void libxl__remus_domain_suspend_callback(void *data)
     domain_suspend_callback_common(egc, dss);
 }
 
+static void remus_device_postsuspend_cb(libxl__egc *egc,
+                                        libxl__remus_state *rs, int rc)
+{
+    int ok = 0;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
+
+    if (!rc)
+        ok = 1;
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
+
 static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
                                 libxl__domain_suspend_state *dss, int ok)
 {
-    /* REMUS TODO: Issue disk and network checkpoint reqs. */
+    if (!ok)
+        goto out;
+
+    libxl__remus_state *const rs = &dss->rs;
+    rs->callback = remus_device_postsuspend_cb;
+    libxl__remus_device_postsuspend(egc, rs);
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
+
+static void remus_device_preresume_cb(libxl__egc *egc,
+                                      libxl__remus_state *rs, int rc)
+{
+    int ok = 0;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
+    STATE_AO_GC(dss->ao);
+
+    if (!rc) {
+        /* Resumes the domain and the device model */
+        if (!libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
+            ok = 1;
+    }
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
 }
 
-static int libxl__remus_domain_resume_callback(void *data)
+static void libxl__remus_domain_resume_callback(void *data)
 {
     libxl__save_helper_state *shs = data;
     libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
+    libxl__egc *egc = shs->egc;
     STATE_AO_GC(dss->ao);
 
-    /* Resumes the domain and the device model */
-    if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
-        return 0;
-
-    /* REMUS TODO: Deal with disk. Start a new network output buffer */
-    return 1;
+    libxl__remus_state *const rs = &dss->rs;
+    rs->callback = remus_device_preresume_cb;
+    libxl__remus_device_preresume(egc, rs);
 }
 
 /*----- remus asynchronous checkpoint callback -----*/
 
 static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc);
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs);
 
 static void libxl__remus_domain_checkpoint_callback(void *data)
 {
@@ -1485,13 +1519,67 @@ static void libxl__remus_domain_checkpoint_callback(void *data)
     }
 }
 
+static void remus_device_commit_cb(libxl__egc *egc,
+                                   libxl__remus_state *rs, int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
+
+    STATE_AO_GC(dss->ao);
+
+    if (rc) {
+        LOG(ERROR, "Failed to do device commit op."
+            " Terminating Remus..");
+        goto out;
+    } else {
+        /* Set checkpoint interval timeout */
+        rc = libxl__ev_time_register_rel(gc, &rs->timeout,
+                                         remus_next_checkpoint,
+                                         dss->interval);
+        if (rc) {
+            LOG(ERROR, "unable to register timeout for next epoch."
+                " Terminating Remus..");
+            goto out;
+        }
+    }
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
 static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc)
 {
-    /* REMUS TODO: Wait for disk and memory ack, release network buffer */
-    /* REMUS TODO: make this asynchronous */
-    assert(!rc); /* REMUS TODO handle this error properly */
-    usleep(dss->interval * 1000);
+    /* Convenience aliases */
+    libxl__remus_state *const rs = &dss->rs;
+
+    STATE_AO_GC(dss->ao);
+
+    if (rc) {
+        LOG(ERROR, "Failed to save device model. Terminating Remus..");
+        goto out;
+    }
+
+    rs->callback = remus_device_commit_cb;
+    libxl__remus_device_commit(egc, rs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs)
+{
+    libxl__remus_state *rs = CONTAINER_OF(ev, *rs, timeout);
+
+    /* Convenience aliases */
+    libxl__domain_suspend_state *const dss = CONTAINER_OF(rs, *dss, rs);
+
+    STATE_AO_GC(dss->ao);
+
+    libxl__ev_time_deregister(gc, &rs->timeout);
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);
 }
 
@@ -1716,6 +1804,13 @@ static void save_device_model_datacopier_done(libxl__egc *egc,
     dss->save_dm_callback(egc, dss, our_rc);
 }
 
+static void libxl__remus_teardown_done(libxl__egc *egc,
+                                       libxl__remus_state *rs, int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
+    dss->callback(egc, dss, rc);
+}
+
 static void domain_suspend_done(libxl__egc *egc,
                         libxl__domain_suspend_state *dss, int rc)
 {
@@ -1730,6 +1825,19 @@ static void domain_suspend_done(libxl__egc *egc,
         xc_suspend_evtchn_release(CTX->xch, CTX->xce, domid,
                            dss->guest_evtchn.port, &dss->guest_evtchn_lockfd);
 
+    if (dss->remus) {
+        /*
+         * With Remus, if we reach this point, it means either
+         * backup died or some network error occurred preventing us
+         * from sending checkpoints. Tear down the network buffers and
+         * release netlink resources.  This is an async op.
+         */
+        dss->rs.saved_rc = rc;
+        dss->rs.callback = libxl__remus_teardown_done;
+        libxl__remus_device_teardown(egc, &dss->rs);
+        return;
+    }
+
     dss->callback(egc, dss, rc);
 }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 2b46121..20601b2 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2462,6 +2462,118 @@ typedef struct libxl__save_helper_state {
                       * marshalling and xc callback functions */
 } libxl__save_helper_state;
 
+/*----- remus device related state structure -----*/
+
+typedef enum libxl__remus_device_kind {
+    LIBXL__REMUS_DEVICE_NIC,
+    LIBXL__REMUS_DEVICE_DISK,
+} libxl__remus_device_kind;
+
+typedef struct libxl__remus_state libxl__remus_state;
+typedef struct libxl__remus_device libxl__remus_device;
+typedef struct libxl__remus_device_state libxl__remus_device_state;
+typedef struct libxl__remus_device_ops libxl__remus_device_ops;
+
+struct libxl__remus_device_ops {
+    /*
+     * init device ops private data, etc. Must be implemented.
+     */
+    int (*init)(libxl__remus_device_ops *self,
+                libxl__remus_state *rs);
+    /*
+     * free device ops private data, etc. Must be implemented.
+     */
+    void (*destroy)(libxl__remus_device_ops *self);
+    /* device ops's private data */
+    void *data;
+
+    /*
+     * checkpoint callbacks, async ops. Optional: may be left NULL.
+     */
+    void (*postsuspend)(libxl__remus_device *dev);
+    void (*preresume)(libxl__remus_device *dev);
+    void (*commit)(libxl__remus_device *dev);
+
+    /*
+     * check whether device ops match the device, async op. must implement
+     */
+    void (*match)(libxl__remus_device_ops *self,
+                  libxl__remus_device *dev);
+    /*
+     * setup the remus device, async op. must implement
+     */
+    void (*setup)(libxl__remus_device *dev);
+
+    /*
+     * teardown the remus device, async op. must implement
+     */
+    void (*teardown)(libxl__remus_device *dev);
+};
+
+struct libxl__remus_device_state {
+    libxl__ao *ao;
+    libxl__egc *egc;
+
+    /* devices that have been setuped */
+    libxl__remus_device **dev;
+
+    int num_nics;
+    int num_disks;
+
+    /* for counting devices that have been handled */
+    int num_devices;
+    /* for counting devices that matched and have been set up */
+    int num_setuped;
+};
+
+typedef void libxl__remus_device_callback(libxl__egc *,
+                                          libxl__remus_device *,
+                                          int rc);
+
+struct libxl__remus_device {
+    int devid;
+    /* libxl__device_* which this remus device is related to */
+    const void *backend_dev;
+    libxl__remus_device_kind kind;
+    int ops_index;
+    libxl__remus_device_ops *ops;
+    libxl__remus_device_callback *callback;
+
+    /* *kind* of device's private data */
+    void *data;
+    libxl__remus_device_state *rds;
+    /* for calling scripts */
+    libxl__async_exec_state aes;
+    /* for async func calls */
+    libxl__ev_child child;
+};
+
+typedef void libxl__remus_callback(libxl__egc *,
+                                   libxl__remus_state *, int rc);
+
+struct libxl__remus_state {
+    libxl__ao *ao;
+    uint32_t domid;
+    libxl__remus_callback *callback;
+
+    /* private */
+    int saved_rc;
+    /* context containing device related stuff */
+    libxl__remus_device_state dev_state;
+
+    libxl__ev_time timeout; /* used for checkpoint */
+};
+
+_hidden void libxl__remus_device_setup(libxl__egc *egc,
+                                       libxl__remus_state *rs);
+_hidden void libxl__remus_device_teardown(libxl__egc *egc,
+                                          libxl__remus_state *rs);
+_hidden void libxl__remus_device_postsuspend(libxl__egc *egc,
+                                             libxl__remus_state *rs);
+_hidden void libxl__remus_device_preresume(libxl__egc *egc,
+                                           libxl__remus_state *rs);
+_hidden void libxl__remus_device_commit(libxl__egc *egc,
+                                        libxl__remus_state *rs);
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
 /*----- Domain suspend (save) state structure -----*/
@@ -2492,6 +2604,7 @@ struct libxl__domain_suspend_state {
     int live;
     int debug;
     const libxl_domain_remus_info *remus;
+    libxl__remus_state rs;
     /* private */
     libxl__ev_evtchn guest_evtchn;
     int guest_evtchn_lockfd;
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
new file mode 100644
index 0000000..62c0614
--- /dev/null
+++ b/tools/libxl/libxl_remus_device.c
@@ -0,0 +1,303 @@
+/*
+ * Copyright (C) 2014
+ * Author: Lai Jiangshan <laijs@cn.fujitsu.com>
+ *         Yang Hongyang <yanghy@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+static libxl__remus_device_ops *dev_ops[] = {
+};
+
+static void device_common_cb(libxl__egc *egc,
+                             libxl__remus_device *dev,
+                             int rc)
+{
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = dev->rds;
+    libxl__remus_state *const rs = CONTAINER_OF(rds, *rs, dev_state);
+
+    STATE_AO_GC(rs->ao);
+
+    rds->num_devices++;
+
+    if (rc)
+        rs->saved_rc = ERROR_FAIL;
+
+    if (rds->num_devices == rds->num_setuped)
+        rs->callback(egc, rs, rs->saved_rc);
+}
+
+void libxl__remus_device_postsuspend(libxl__egc *egc, libxl__remus_state *rs)
+{
+    int i;
+    libxl__remus_device *dev;
+    STATE_AO_GC(rs->ao);
+
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = &rs->dev_state;
+
+    rds->num_devices = 0;
+    for (i = 0; i < rds->num_setuped; i++) {
+        dev = rds->dev[i];
+        dev->callback = device_common_cb;
+        if (dev->ops->postsuspend) {
+            dev->ops->postsuspend(dev);
+        } else {
+            rds->num_devices++;
+        }
+    }
+
+    if (rds->num_devices == rds->num_setuped)
+        rs->callback(egc, rs, rs->saved_rc);
+}
+
+void libxl__remus_device_preresume(libxl__egc *egc, libxl__remus_state *rs)
+{
+    int i;
+    libxl__remus_device *dev;
+    STATE_AO_GC(rs->ao);
+
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = &rs->dev_state;
+
+    rds->num_devices = 0;
+    for (i = 0; i < rds->num_setuped; i++) {
+        dev = rds->dev[i];
+        dev->callback = device_common_cb;
+        if (dev->ops->preresume) {
+            dev->ops->preresume(dev);
+        } else {
+            rds->num_devices++;
+        }
+    }
+
+    if (rds->num_devices == rds->num_setuped)
+        rs->callback(egc, rs, rs->saved_rc);
+}
+
+void libxl__remus_device_commit(libxl__egc *egc, libxl__remus_state *rs)
+{
+    int i;
+    libxl__remus_device *dev;
+    STATE_AO_GC(rs->ao);
+
+    /*
+     * REMUS TODO: Wait for disk and explicit memory ack (through restore
+     * callback from remote) before releasing network buffer.
+     */
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = &rs->dev_state;
+
+    rds->num_devices = 0;
+    for (i = 0; i < rds->num_setuped; i++) {
+        dev = rds->dev[i];
+        dev->callback = device_common_cb;
+        if (dev->ops->commit) {
+            dev->ops->commit(dev);
+        } else {
+            rds->num_devices++;
+        }
+    }
+
+    if (rds->num_devices == rds->num_setuped)
+        rs->callback(egc, rs, rs->saved_rc);
+}
+
+static void device_setup_cb(libxl__egc *egc,
+                            libxl__remus_device *dev,
+                            int rc)
+{
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = dev->rds;
+    libxl__remus_state *const rs = CONTAINER_OF(rds, *rs, dev_state);
+
+    STATE_AO_GC(rs->ao);
+
+    rds->num_devices++;
+    if (!rc) {
+        /* remus device has been set up */
+        rds->dev[rds->num_setuped++] = dev;
+    } else {
+        /* setup failed */
+        rs->saved_rc = ERROR_FAIL;
+    }
+
+    if (rds->num_devices == (rds->num_nics + rds->num_disks))
+        rs->callback(egc, rs, rs->saved_rc);
+}
+
+static void device_match_cb(libxl__egc *egc,
+                            libxl__remus_device *dev,
+                            int rc)
+{
+    libxl__remus_device_state *const rds = dev->rds;
+    libxl__remus_state *rs = CONTAINER_OF(rds, *rs, dev_state);
+
+    STATE_AO_GC(rs->ao);
+
+    if (rc) {
+        if (++dev->ops_index >= ARRAY_SIZE(dev_ops) ||
+            rc != ERROR_NOT_MATCH) {
+            /* the device can not be matched */
+            rds->num_devices++;
+            rs->saved_rc = ERROR_FAIL;
+            goto out;
+        }
+        /* the ops does not match, try next ops */
+        dev->ops = dev_ops[dev->ops_index];
+        dev->ops->match(dev->ops, dev);
+    } else {
+        /* the ops matched, setup the device */
+        dev->callback = device_setup_cb;
+        dev->ops->setup(dev);
+    }
+
+out:
+    if (rds->num_devices == (rds->num_nics + rds->num_disks))
+        rs->callback(egc, rs, rs->saved_rc);
+}
+
+static void device_teardown_cb(libxl__egc *egc,
+                               libxl__remus_device *dev,
+                               int rc)
+{
+    int i;
+    libxl__remus_device_ops *ops;
+    libxl__remus_device_state *const rds = dev->rds;
+    libxl__remus_state *rs = CONTAINER_OF(rds, *rs, dev_state);
+
+    STATE_AO_GC(rs->ao);
+
+    /* ignore teardown errors so as to tear down as many devs as possible */
+    rds->num_setuped--;
+
+    if (rds->num_setuped == 0) {
+        /* clean device ops */
+        for (i = 0; i < ARRAY_SIZE(dev_ops); i++) {
+            ops = dev_ops[i];
+            ops->destroy(ops);
+        }
+        rs->callback(egc, rs, rs->saved_rc);
+    }
+}
+
+static __attribute__((unused)) void libxl__remus_device_init(libxl__egc *egc,
+                                     libxl__remus_device_state *rds,
+                                     libxl__remus_device_kind kind,
+                                     void *libxl_dev)
+{
+    libxl__remus_device *dev = NULL;
+    libxl_device_nic *nic = NULL;
+    libxl_device_disk *disk = NULL;
+
+    STATE_AO_GC(rds->ao);
+    GCNEW(dev);
+    dev->ops_index = 0; /* we will match the ops later */
+    dev->backend_dev = libxl_dev;
+    dev->kind = kind;
+    dev->rds = rds;
+
+    switch (kind) {
+        case LIBXL__REMUS_DEVICE_NIC:
+            nic = libxl_dev;
+            dev->devid = nic->devid;
+            break;
+        case LIBXL__REMUS_DEVICE_DISK:
+            disk = libxl_dev;
+            /* there is no devid for disk devices */
+            dev->devid = -1;
+            break;
+        default:
+            return;
+    }
+
+    libxl__async_exec_init(&dev->aes);
+    libxl__ev_child_init(&dev->child);
+
+    /* begin matching the ops */
+    dev->callback = device_match_cb;
+    dev->ops = dev_ops[dev->ops_index];
+    dev->ops->match(dev->ops, dev);
+}
+
+void libxl__remus_device_setup(libxl__egc *egc, libxl__remus_state *rs)
+{
+    int i;
+    libxl__remus_device_ops *ops;
+
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = &rs->dev_state;
+
+    STATE_AO_GC(rs->ao);
+
+    if (ARRAY_SIZE(dev_ops) == 0)
+        goto out;
+
+    for (i = 0; i < ARRAY_SIZE(dev_ops); i++) {
+        ops = dev_ops[i];
+        if (ops->init(ops, rs)) {
+            rs->saved_rc = ERROR_FAIL;
+            goto out;
+        }
+    }
+
+    rds->ao = rs->ao;
+    rds->egc = egc;
+    rds->num_devices = 0;
+    rds->num_nics = 0;
+    rds->num_disks = 0;
+
+    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
+
+    GCNEW_ARRAY(rds->dev, rds->num_nics + rds->num_disks);
+
+    /* TBD: CALL libxl__remus_device_init to init remus devices */
+
+    if (rds->num_nics == 0 && rds->num_disks == 0)
+        goto out;
+
+    return;
+
+out:
+    rs->callback(egc, rs, rs->saved_rc);
+    return;
+}
+
+void libxl__remus_device_teardown(libxl__egc *egc, libxl__remus_state *rs)
+{
+    int i;
+    libxl__remus_device *dev;
+
+    STATE_AO_GC(rs->ao);
+
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = &rs->dev_state;
+
+    if (rds->num_setuped == 0)
+        goto out;
+
+    for (i = 0; i < rds->num_setuped; i++) {
+        dev = rds->dev[i];
+        dev->callback = device_teardown_cb;
+        dev->ops->teardown(dev);
+    }
+
+    return;
+
+out:
+    rs->callback(egc, rs, rs->saved_rc);
+    return;
+}
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index 745e2ac..36bae04 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -24,7 +24,7 @@ our @msgs = (
                                                 'unsigned long', 'done',
                                                 'unsigned long', 'total'] ],
     [  3, 'scxA',   "suspend", [] ],
-    [  4, 'scxW',   "postcopy", [] ],
+    [  4, 'scxA',   "postcopy", [] ],
     [  5, 'scxA',   "checkpoint", [] ],
     [  6, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
                                               unsigned enable)] ],
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 52f1aa9..4278a6b 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -43,6 +43,7 @@ libxl_error = Enumeration("error", [
     (-12, "OSEVENT_REG_FAIL"),
     (-13, "BUFFERFULL"),
     (-14, "UNKNOWN_CHILD"),
+    (-15, "NOT_MATCH"),
     ], value_namespace = "")
 
 libxl_domain_type = Enumeration("domain_type", [
-- 
1.9.1
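
For readers following the callback flow in libxl_remus_device.c: each
checkpoint step (postsuspend, preresume, commit) fans out to every device
that was set up and invokes rs->callback once all of them have reported.
Below is a minimal standalone sketch of that counting pattern. It is not
the libxl code; struct step_state, struct device, start_step,
deliver_completions and demo_step are hypothetical names, and the
asynchronous completions are simulated by an explicit delivery pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the libxl types in the patch. */
struct step_state;

struct device {
    struct step_state *st;
    int has_op;   /* 0 models an optional op left as NULL */
    int op_rc;    /* result the simulated async op will report */
};

struct step_state {
    struct device *dev;
    int num_setuped;  /* devices taking part in this step */
    int num_devices;  /* devices that have reported in so far */
    int saved_rc;     /* sticky error, 0 if every device succeeded */
    int done_calls;   /* how often the step-level callback fired */
};

/* Stand-in for rs->callback(egc, rs, rs->saved_rc). */
static void step_done(struct step_state *st)
{
    st->done_calls++;
}

/* Per-device completion: count it, remember failures, finish on the last. */
static void device_common_cb(struct device *d, int rc)
{
    struct step_state *st = d->st;

    st->num_devices++;
    if (rc)
        st->saved_rc = -1;
    if (st->num_devices == st->num_setuped)
        step_done(st);
}

/* Start the step: devices without the op are counted as done right away. */
static void start_step(struct step_state *st)
{
    st->num_devices = 0;
    for (int i = 0; i < st->num_setuped; i++)
        if (!st->dev[i].has_op)
            st->num_devices++;
    if (st->num_devices == st->num_setuped)
        step_done(st);   /* no device had asynchronous work */
}

/* Simulated event loop: deliver the pending completions. */
static void deliver_completions(struct step_state *st)
{
    for (int i = 0; i < st->num_setuped; i++)
        if (st->dev[i].has_op)
            device_common_cb(&st->dev[i], st->dev[i].op_rc);
}

/* Runs one step over three devices.
 * Returns done_calls * 10 + (1 if any device failed). */
static int demo_step(int last_rc)
{
    struct step_state st = { 0 };
    struct device devs[3] = {
        { &st, 1, 0 }, { &st, 0, 0 }, { &st, 1, last_rc },
    };

    st.dev = devs;
    st.num_setuped = 3;
    start_step(&st);
    if (st.done_calls != 0)
        return -1;       /* must not complete before the ops report */
    deliver_completions(&st);
    return st.done_calls * 10 + (st.saved_rc != 0);
}
```

In this sketch the step-level callback fires exactly once per step, and a
failure on any single device is carried through saved_rc to that callback.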


* [PATCH v10 4/5] remus: implement remus network buffering for nic devices
  2014-06-05  1:34 [PATCH v10 0/5] Remus netbuffer: Network buffering support Yang Hongyang
                   ` (2 preceding siblings ...)
  2014-06-05  1:34 ` [PATCH v10 3/5] remus: introduce remus device Yang Hongyang
@ 2014-06-05  1:34 ` Yang Hongyang
  2014-06-05 16:50   ` Shriram Rajagopalan
  2014-06-05 17:24   ` Ian Jackson
  2014-06-05  1:34 ` [PATCH v10 5/5] libxl: network buffering cmdline switch Yang Hongyang
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 44+ messages in thread
From: Yang Hongyang @ 2014-06-05  1:34 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

1. Add two members to libxl_domain_remus_info:
    netbuf: whether netbuf is enabled
    netbufscript: the path of the script which will be run to set up
       and tear down the guest's interface.
2. Introduce the remus-netbuf-setup hotplug script, responsible for
  setting up and tearing down the necessary infrastructure required for
  network output buffering in Remus.  This script is intended to be invoked
  by libxl for each guest interface, when starting or stopping Remus.

  Apart from returning a success/failure indication via the usual hotplug
  entries in xenstore, this script also writes to xenstore the name of
  the IFB device to be used to control the vif's network output.

  The script relies on libnl3 command line utilities to perform various
  setup/teardown functions. The script is confined to Linux platforms only
  since NetBSD does not seem to have libnl3.

  The following steps are taken during init:
    a) establish a dedicated remus context containing libnl related
       state (netlink sockets, qdisc caches, etc.)

  The following steps are taken for each vif during setup:
    a) call the hotplug script to set up its network buffer

    b) obtain handles to plug qdiscs installed on the IFB devices
       chosen by the hotplug scripts.

  And during teardown, the netlink resources are released, followed by
  invocation of hotplug scripts to remove the ifb devices.
3. Implement the remus device interface (setup, teardown, etc.).
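
To illustrate what output buffering means for a checkpoint cycle, here is
a toy, self-contained model. It assumes the behaviour described by the old
TODO comments (start a new output buffer once the guest is suspended,
release the previous one after the checkpoint is committed). It does not
use libnl or the kernel plug qdisc; struct plug_buffer and the plug_*
functions are hypothetical names chosen for this sketch only.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PKTS 64

/* Toy output buffer: packets are held until their epoch is committed. */
struct plug_buffer {
    int pkts[MAX_PKTS];
    size_t queued;    /* total packets enqueued so far */
    size_t barrier;   /* packets belonging to already-taken checkpoints */
    size_t released;  /* packets already let out onto the wire */
};

/* Guest sends a packet: hold it back instead of transmitting. */
static int plug_enqueue(struct plug_buffer *b, int pkt)
{
    if (b->queued == MAX_PKTS)
        return -1;                /* buffer limit reached */
    b->pkts[b->queued++] = pkt;
    return 0;
}

/* Guest suspended for a checkpoint: everything queued so far belongs to
 * the epoch being checkpointed; later packets go into a new buffer. */
static void plug_start_new_epoch(struct plug_buffer *b)
{
    b->barrier = b->queued;
}

/* Checkpoint committed: release the packets of the checkpointed epoch.
 * Returns the number of packets released. */
static size_t plug_release_epoch(struct plug_buffer *b)
{
    size_t n = b->barrier - b->released;

    b->released = b->barrier;
    return n;
}
```

Packets sent after plug_start_new_epoch() stay queued until the following
checkpoint is committed, so in this model the outside world never sees
output from guest state that the backup has not yet received.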

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 docs/misc/xenstore-paths.markdown      |   4 +
 tools/hotplug/Linux/Makefile           |   1 +
 tools/hotplug/Linux/remus-netbuf-setup | 183 ++++++++++++
 tools/libxl/libxl.c                    |  18 ++
 tools/libxl/libxl.h                    |  13 +
 tools/libxl/libxl_internal.h           |   3 +
 tools/libxl/libxl_netbuffer.c          | 519 +++++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c        |  67 +++++
 tools/libxl/libxl_remus_device.c       |  22 +-
 tools/libxl/libxl_types.idl            |   2 +
 10 files changed, 831 insertions(+), 1 deletion(-)
 create mode 100644 tools/hotplug/Linux/remus-netbuf-setup

diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown
index 70ab7f4..039eaea 100644
--- a/docs/misc/xenstore-paths.markdown
+++ b/docs/misc/xenstore-paths.markdown
@@ -385,6 +385,10 @@ The guest's virtual time offset from UTC in seconds.
 
 The device model version for a domain.
 
+#### /libxl/$DOMID/remus/netbuf/$DEVID/ifb = STRING [n,INTERNAL]
+
+ifb device used by Remus to buffer network output from the associated vif.
+
 [BLKIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,blkif.h.html
 [FBIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,fbif.h.html
 [HVMPARAMS]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,hvm,params.h.html
diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index 4874ec5..13e1f5f 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -15,6 +15,7 @@ XEN_SCRIPTS += vif-nat
 XEN_SCRIPTS += vif-openvswitch
 XEN_SCRIPTS += vif2
 XEN_SCRIPTS += vif-setup
+XEN_SCRIPTS-$(CONFIG_REMUS_NETBUF) += remus-netbuf-setup
 XEN_SCRIPTS += block
 XEN_SCRIPTS += block-enbd block-nbd
 XEN_SCRIPTS-$(CONFIG_BLKTAP1) += blktap
diff --git a/tools/hotplug/Linux/remus-netbuf-setup b/tools/hotplug/Linux/remus-netbuf-setup
new file mode 100644
index 0000000..aed2583
--- /dev/null
+++ b/tools/hotplug/Linux/remus-netbuf-setup
@@ -0,0 +1,183 @@
+#!/bin/bash
+#============================================================================
+# ${XEN_SCRIPT_DIR}/remus-netbuf-setup
+#
+# Script for attaching a network buffer to the specified vif (in any mode).
+# The hotplugging system will call this script when starting remus via libxl
+# API, libxl_domain_remus_start.
+#
+# Usage:
+# remus-netbuf-setup (setup|teardown)
+#
+# Environment vars:
+# vifname     vif interface name (required).
+# XENBUS_PATH path in Xenstore, where the IFB device details will be stored
+#                      or read from (required).
+#             (libxl passes /libxl/<domid>/remus/netbuf/<devid>)
+# IFB         ifb interface to be cleaned up (required). [for teardown op only]
+
+# Written to the store: (setup operation)
+# XENBUS_PATH/ifb=<ifbdevName> the IFB device serving
+#  as the intermediate buffer through which the interface's network output
+#  can be controlled.
+#
+# To install a network buffer on a guest vif (vif1.0) using ifb (ifb0)
+# we need to do the following
+#
+#  ip link set dev ifb0 up
+#  tc qdisc add dev vif1.0 ingress
+#  tc filter add dev vif1.0 parent ffff: proto ip \
+#    prio 10 u32 match u32 0 0 action mirred egress redirect dev ifb0
+#  nl-qdisc-add --dev=ifb0 --parent root plug
+#  nl-qdisc-add --dev=ifb0 --parent root --update plug --limit=10000000
+#                                                (10MB limit on buffer)
+#
+# So order of operations when installing a network buffer on vif1.0
+# 1. find a free ifb and bring up the device
+# 2. redirect traffic from vif1.0 to ifb:
+#   2.1 add ingress qdisc to vif1.0 (to capture outgoing packets from guest)
+#   2.2 use tc filter command with actions mirred egress + redirect
+# 3. install plug_qdisc on ifb device, with which we can buffer/release
+#    guest's network output from vif1.0
+#
+#
+
+#============================================================================
+
+# Unlike other vif scripts, vif-common is not needed here as it executes vif
+# specific setup code such as renaming.
+dir=$(dirname "$0")
+. "$dir/xen-hotplug-common.sh"
+
+findCommand "$@"
+
+if [ "$command" != "setup" -a  "$command" != "teardown" ]
+then
+  echo "Invalid command: $command"
+  log err "Invalid command: $command"
+  exit 1
+fi
+
+evalVariables "$@"
+
+: ${vifname:?}
+: ${XENBUS_PATH:?}
+
+check_libnl_tools() {
+    if ! command -v nl-qdisc-list > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-list tool"
+    fi
+    if ! command -v nl-qdisc-add > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-add tool"
+    fi
+    if ! command -v nl-qdisc-delete > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-delete tool"
+    fi
+}
+
+# We only check for modules. We don't load them.
+# User/Admin is supposed to load ifb during boot time,
+# ensuring that there are enough free ifbs in the system.
+# Other modules will be loaded automatically by tc commands.
+check_modules() {
+    for m in ifb sch_plug sch_ingress act_mirred cls_u32
+    do
+        if ! modinfo $m > /dev/null 2>&1; then
+            fatal "Unable to find $m kernel module"
+        fi
+    done
+}
+
+setup_ifb() {
+
+    for ifb in `ifconfig -a -s|egrep ^ifb|cut -d ' ' -f1`
+    do
+        local installed=`nl-qdisc-list -d $ifb`
+        [ -n "$installed" ] && continue
+        IFB="$ifb"
+        break
+    done
+
+    if [ -z "$IFB" ]
+    then
+        fatal "Unable to find a free IFB device for $vifname"
+    fi
+
+    do_or_die ip link set dev "$IFB" up
+}
+
+redirect_vif_traffic() {
+    local vif=$1
+    local ifb=$2
+
+    do_or_die tc qdisc add dev "$vif" ingress
+
+    tc filter add dev "$vif" parent ffff: proto ip prio 10 \
+        u32 match u32 0 0 action mirred egress redirect dev "$ifb" >/dev/null 2>&1
+
+    if [ $? -ne 0 ]
+    then
+        do_without_error tc qdisc del dev "$vif" ingress
+        fatal "Failed to redirect traffic from $vif to $ifb"
+    fi
+}
+
+add_plug_qdisc() {
+    local vif=$1
+    local ifb=$2
+
+    nl-qdisc-add --dev="$ifb" --parent root plug >/dev/null 2>&1
+    if [ $? -ne 0 ]
+    then
+        do_without_error tc qdisc del dev "$vif" ingress
+        fatal "Failed to add plug qdisc to $ifb"
+    fi
+
+    # Set ifb buffering limit in bytes. It's okay if this command fails.
+    nl-qdisc-add --dev="$ifb" --parent root \
+        --update plug --limit=10000000 >/dev/null 2>&1 || true
+}
+
+teardown_netbuf() {
+    local vif=$1
+    local ifb=$2
+
+    if [ "$ifb" ]; then
+        do_without_error ip link set dev "$ifb" down
+        do_without_error nl-qdisc-delete --dev="$ifb" --parent root plug >/dev/null 2>&1
+        xenstore-rm -t "$XENBUS_PATH/ifb" 2>/dev/null || true
+    fi
+    do_without_error tc qdisc del dev "$vif" ingress
+    xenstore-rm -t "$XENBUS_PATH/hotplug-status" 2>/dev/null || true
+    xenstore-rm -t "$XENBUS_PATH/hotplug-error" 2>/dev/null || true
+}
+
+xs_write_failed() {
+    local vif=$1
+    local ifb=$2
+    teardown_netbuf "$vif" "$ifb"
+    fatal "failed to write ifb name to xenstore"
+}
+
+case "$command" in
+    setup)
+        check_libnl_tools
+        check_modules
+
+        claim_lock "pickifb"
+        setup_ifb
+        redirect_vif_traffic "$vifname" "$IFB"
+        add_plug_qdisc "$vifname" "$IFB"
+        release_lock "pickifb"
+
+        #not using xenstore_write that automatically exits on error
+        #because we need to cleanup
+        _xenstore_write "$XENBUS_PATH/ifb" "$IFB" || xs_write_failed "$vifname" "$IFB"
+        success
+        ;;
+    teardown)
+        teardown_netbuf "$vifname" "$IFB"
+        ;;
+esac
+
+log debug "Successful remus-netbuf-setup $command for $vifname, ifb $IFB."
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 0cdf348..2701ebe 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -764,6 +764,24 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 
     /* Convenience aliases */
     libxl__remus_state *const rs = &dss->rs;
+
+    /* Setup network buffering */
+    if (info->netbuf) {
+        if (!libxl__netbuffer_enabled(gc)) {
+            LOG(ERROR, "Remus: No support for network buffering");
+            goto out;
+        }
+
+        if (info->netbufscript) {
+            rs->netbufscript =
+                libxl__strdup(gc, info->netbufscript);
+        } else {
+            rs->netbufscript =
+                GCSPRINTF("%s/remus-netbuf-setup",
+                libxl__xen_script_dir_path());
+        }
+    }
+
     rs->ao = ao;
     rs->domid = domid;
     rs->saved_rc = 0;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 80947c3..db30a97 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -437,6 +437,19 @@
 #define LIBXL_HAVE_DRIVER_DOMAIN_CREATION 1
 
 /*
+ * LIBXL_HAVE_REMUS_NETBUF 1
+ *
+ * If this is defined, then the libxl_domain_remus_info structure will
+ * have a boolean field (netbuf) and a string field (netbufscript).
+ *
+ * netbuf, if true, indicates that network buffering should be enabled.
+ *
+ * netbufscript, if set, indicates the path to the hotplug script to
+ * set up or tear down network buffers.
+ */
+#define LIBXL_HAVE_REMUS_NETBUF 1
+
+/*
  * LIBXL_HAVE_SIGCHLD_SELECTIVE_REAP
  *
  * If this is defined:
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 20601b2..f221f97 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2517,6 +2517,7 @@ struct libxl__remus_device_state {
     /* devices that have been setuped */
     libxl__remus_device **dev;
 
+    libxl_device_nic *nics;
     int num_nics;
     int num_disks;
 
@@ -2555,6 +2556,8 @@ struct libxl__remus_state {
     libxl__ao *ao;
     uint32_t domid;
     libxl__remus_callback *callback;
+    /* Script to setup/teardown network buffers */
+    const char *netbufscript;
 
     /* private */
     int saved_rc;
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 8e23d75..8729a3f 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -17,11 +17,530 @@
 
 #include "libxl_internal.h"
 
+#include <netlink/cache.h>
+#include <netlink/socket.h>
+#include <netlink/attr.h>
+#include <netlink/route/link.h>
+#include <netlink/route/route.h>
+#include <netlink/route/qdisc.h>
+#include <netlink/route/qdisc/plug.h>
+
+typedef struct libxl__remus_netbuf_state {
+    libxl__ao *ao;
+    uint32_t domid;
+    const char *netbufscript;
+
+    struct nl_sock *nlsock;
+    struct nl_cache *qdisc_cache;
+} libxl__remus_netbuf_state;
+
+typedef struct libxl__remus_device_nic {
+    const char *vif;
+    const char *ifb;
+    struct rtnl_qdisc *qdisc;
+} libxl__remus_device_nic;
+
 int libxl__netbuffer_enabled(libxl__gc *gc)
 {
     return 1;
 }
 
+/* If the device has a vifname, then use that instead of
+ * the vifX.Y format.
+ */
+static const char *get_vifname(libxl__remus_device *dev,
+                               const libxl_device_nic *nic)
+{
+    libxl__remus_netbuf_state *netbuf_state = dev->ops->data;
+    const char *vifname = NULL;
+    const char *path;
+    int rc;
+
+    STATE_AO_GC(netbuf_state->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = netbuf_state->domid;
+
+    path = libxl__sprintf(gc, "%s/backend/vif/%d/%d/vifname",
+                          libxl__xs_get_dompath(gc, 0), domid, nic->devid);
+    rc = libxl__xs_read_checked(gc, XBT_NULL, path, &vifname);
+    if (!rc && !vifname) {
+        /* use the default name */
+        vifname = libxl__device_nic_devname(gc, domid,
+                                            nic->devid,
+                                            nic->nictype);
+    }
+
+    return vifname;
+}
+
+static void free_qdisc(libxl__remus_device_nic *remus_nic)
+{
+    /* free qdiscs */
+    if (remus_nic->qdisc == NULL)
+        return;
+
+    nl_object_put((struct nl_object *)(remus_nic->qdisc));
+    remus_nic->qdisc = NULL;
+}
+
+static int init_qdisc(libxl__remus_netbuf_state *netbuf_state,
+                      libxl__remus_device_nic *remus_nic)
+{
+    int ret, ifindex;
+    struct rtnl_link *ifb = NULL;
+    struct rtnl_qdisc *qdisc = NULL;
+
+    STATE_AO_GC(netbuf_state->ao);
+
+    /* The IFB device with its plug qdisc has just been brought up for
+     * this vif, so we need to refill the qdisc cache.
+     */
+    ret = nl_cache_refill(netbuf_state->nlsock, netbuf_state->qdisc_cache);
+    if (ret < 0) {
+        LOG(ERROR, "cannot refill qdisc cache");
+        goto out;
+    }
+
+    /* get a handle to the IFB interface */
+    ifb = NULL;
+    ret = rtnl_link_get_kernel(netbuf_state->nlsock, 0,
+                               remus_nic->ifb, &ifb);
+    if (ret) {
+        LOG(ERROR, "cannot obtain handle for %s: %s", remus_nic->ifb,
+            nl_geterror(ret));
+        ret = ERROR_FAIL;
+        goto out;
+    }
+
+    ret = ERROR_FAIL;
+    ifindex = rtnl_link_get_ifindex(ifb);
+    if (!ifindex) {
+        LOG(ERROR, "interface %s has no index", remus_nic->ifb);
+        goto out;
+    }
+
+    /* Get a reference to the root qdisc installed on the IFB, by
+     * querying the qdisc list we obtained earlier. The netbufscript
+     * sets up the plug qdisc as the root qdisc, so we don't have to
+     * search the entire qdisc tree on the IFB dev.
+     *
+     * There is no need to explicitly free this qdisc as it's just a
+     * reference from the qdisc cache we allocated earlier.
+     */
+    qdisc = rtnl_qdisc_get_by_parent(netbuf_state->qdisc_cache, ifindex,
+                                     TC_H_ROOT);
+
+    if (qdisc) {
+        const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
+        /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
+        if (!tc_kind || strcmp(tc_kind, "plug")) {
+            nl_object_put((struct nl_object *)qdisc);
+            LOG(ERROR, "plug qdisc is not installed on %s", remus_nic->ifb);
+            goto out;
+        }
+        remus_nic->qdisc = qdisc;
+        ret = 0;
+    } else {
+        LOG(ERROR, "Cannot get qdisc handle from ifb %s", remus_nic->ifb);
+    }
+
+out:
+    if (ifb)
+        rtnl_link_put(ifb);
+
+    return ret;
+}
+
+/*
+ * During setup, the script writes the name of the IFB device to be used
+ * for output buffering into XENBUS_PATH/ifb.
+ */
+static void netbuf_setup_script_cb(libxl__egc *egc,
+                                   libxl__async_exec_state *aes,
+                                   int status)
+{
+    libxl__remus_device *dev = CONTAINER_OF(aes, *dev, aes);
+    libxl__remus_device_nic *remus_nic = dev->data;
+    libxl__remus_netbuf_state *netbuf_state = dev->ops->data;
+    const char *out_path_base, *hotplug_error = NULL;
+    int rc;
+
+    /* Convenience aliases */
+    const uint32_t domid = netbuf_state->domid;
+    const int devid = dev->devid;
+    const char *const vif = remus_nic->vif;
+    const char **const ifb = &remus_nic->ifb;
+
+    STATE_AO_GC(netbuf_state->ao);
+
+    if (status) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    out_path_base = GCSPRINTF("%s/remus/netbuf/%d",
+                              libxl__xs_libxl_path(gc, domid), devid);
+
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/hotplug-error", out_path_base),
+                                &hotplug_error);
+    if (rc) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (hotplug_error) {
+        LOG(ERROR, "netbuf script %s setup failed for vif %s: %s",
+            netbuf_state->netbufscript, vif, hotplug_error);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/remus/netbuf/%d/ifb",
+                                          libxl__xs_libxl_path(gc, domid),
+                                          devid),
+                                ifb);
+    if (rc) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (!(*ifb)) {
+        LOG(ERROR, "Cannot get ifb dev name for domain %u dev %s",
+            domid, vif);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    LOG(DEBUG, "%s will buffer packets from vif %s", *ifb, vif);
+    rc = init_qdisc(netbuf_state, remus_nic);
+
+out:
+    dev->callback(egc, dev, rc);
+}
+
+static void netbuf_teardown_script_cb(libxl__egc *egc,
+                                      libxl__async_exec_state *aes,
+                                      int status)
+{
+    int rc;
+    libxl__remus_device *dev = CONTAINER_OF(aes, *dev, aes);
+    libxl__remus_device_nic *remus_nic = dev->data;
+
+    if (status)
+        rc = ERROR_FAIL;
+    else
+        rc = 0;
+
+    free_qdisc(remus_nic);
+
+    dev->callback(egc, dev, rc);
+}
+
+/* the script needs the following env & args
+ * $vifname
+ * $XENBUS_PATH (/libxl/<domid>/remus/netbuf/<devid>/)
+ * $IFB (for teardown)
+ * setup/teardown as command line arg.
+ */
+static void setup_async_exec(libxl__async_exec_state *aes,
+                             char *op, libxl__remus_device *dev)
+{
+    int arraysize, nr = 0;
+    char **env = NULL, **args = NULL;
+    libxl__remus_device_nic *remus_nic = dev->data;
+    libxl__remus_netbuf_state *ns = dev->ops->data;
+    STATE_AO_GC(ns->ao);
+
+    /* Convenience aliases */
+    char *const script = libxl__strdup(gc, ns->netbufscript);
+    const uint32_t domid = ns->domid;
+    const int dev_id = dev->devid;
+    const char *const vif = remus_nic->vif;
+    const char *const ifb = remus_nic->ifb;
+
+    arraysize = 7;
+    GCNEW_ARRAY(env, arraysize);
+    env[nr++] = "vifname";
+    env[nr++] = libxl__strdup(gc, vif);
+    env[nr++] = "XENBUS_PATH";
+    env[nr++] = GCSPRINTF("%s/remus/netbuf/%d",
+                          libxl__xs_libxl_path(gc, domid), dev_id);
+    if (!strcmp(op, "teardown") && ifb) {
+        env[nr++] = "IFB";
+        env[nr++] = libxl__strdup(gc, ifb);
+    }
+    env[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    arraysize = 3; nr = 0;
+    GCNEW_ARRAY(args, arraysize);
+    args[nr++] = script;
+    args[nr++] = op;
+    args[nr++] = NULL;
+    assert(nr == arraysize);
+
+    aes->ao = ns->ao;
+    aes->what = GCSPRINTF("%s %s", args[0], args[1]);
+    aes->env = env;
+    aes->args = args;
+    aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
+    aes->stdfds[0] = -1;
+    aes->stdfds[1] = -1;
+    aes->stdfds[2] = -1;
+
+    if (!strcmp(op, "teardown"))
+        aes->callback = netbuf_teardown_script_cb;
+    else
+        aes->callback = netbuf_setup_script_cb;
+}
+
+static int nic_init(libxl__remus_device_ops *self,
+                    libxl__remus_state *rs)
+{
+    int rc;
+    libxl__remus_netbuf_state *ns;
+
+    STATE_AO_GC(rs->ao);
+
+    GCNEW(ns);
+    self->data = ns;
+
+    ns->nlsock = nl_socket_alloc();
+    if (!ns->nlsock) {
+        LOG(ERROR, "cannot allocate nl socket");
+        goto out;
+    }
+
+    rc = nl_connect(ns->nlsock, NETLINK_ROUTE);
+    if (rc) {
+        LOG(ERROR, "failed to open netlink socket: %s",
+            nl_geterror(rc));
+        goto out;
+    }
+
+    /* get list of all qdiscs installed on network devs. */
+    rc = rtnl_qdisc_alloc_cache(ns->nlsock, &ns->qdisc_cache);
+    if (rc) {
+        LOG(ERROR, "failed to allocate qdisc cache: %s",
+            nl_geterror(rc));
+        goto out;
+    }
+
+    ns->ao = rs->ao;
+    ns->domid = rs->domid;
+    ns->netbufscript = rs->netbufscript;
+
+    return 0;
+
+out:
+    return ERROR_FAIL;
+}
+
+static void nic_destroy(libxl__remus_device_ops *self)
+{
+    libxl__remus_netbuf_state *ns = self->data;
+
+    if (!self->data)
+        return;
+
+    /* free qdisc cache */
+    if (ns->qdisc_cache) {
+        nl_cache_clear(ns->qdisc_cache);
+        nl_cache_free(ns->qdisc_cache);
+        ns->qdisc_cache = NULL;
+    }
+
+    /* close & free nlsock */
+    if (ns->nlsock) {
+        nl_close(ns->nlsock);
+        nl_socket_free(ns->nlsock);
+        ns->nlsock = NULL;
+    }
+}
+
+static void async_call_done(libxl__egc *egc,
+                            libxl__ev_child *child,
+                            pid_t pid, int status)
+{
+    libxl__remus_device *dev = CONTAINER_OF(child, *dev, child);
+    libxl__remus_device_state *rds = dev->rds;
+    STATE_AO_GC(rds->ao);
+
+    if (WIFEXITED(status)) {
+        dev->callback(egc, dev, -WEXITSTATUS(status));
+    } else {
+        dev->callback(egc, dev, ERROR_FAIL);
+    }
+}
+
+static void nic_match_async(const libxl__remus_device_ops *self,
+                            libxl__remus_device *dev)
+{
+    if (dev->kind == LIBXL__REMUS_DEVICE_NIC)
+        _exit(0);
+
+    _exit(-ERROR_NOT_MATCH);
+}
+
+static void nic_match(libxl__remus_device_ops *self,
+                      libxl__remus_device *dev)
+{
+    int pid = -1;
+    STATE_AO_GC(dev->rds->ao);
+
+    /* Fork and call */
+    pid = libxl__ev_child_fork(gc, &dev->child, async_call_done);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork");
+        goto out;
+    }
+
+    if (!pid) {
+        /* child */
+        nic_match_async(self, dev);
+        /* notreached */
+        abort();
+    }
+
+    return;
+
+out:
+    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+}
+
+static void nic_setup(libxl__remus_device *dev)
+{
+    libxl__remus_device_nic *remus_nic;
+    libxl__remus_netbuf_state *ns = dev->ops->data;
+    const libxl_device_nic *nic = dev->backend_dev;
+
+    STATE_AO_GC(ns->ao);
+
+    GCNEW(remus_nic);
+    dev->data = remus_nic;
+    remus_nic->vif = get_vifname(dev, nic);
+
+    setup_async_exec(&dev->aes, "setup", dev);
+    if (libxl__async_exec_start(gc, &dev->aes)) {
+        goto out;
+    }
+
+    return;
+
+out:
+    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+}
+
+/*
+ * Note: This function will be called in the same gc context as
+ * libxl__remus_netbuf_setup, created during the libxl_domain_remus_start
+ * API call.
+ */
+static void nic_teardown(libxl__remus_device *dev)
+{
+    libxl__remus_netbuf_state *ns = dev->ops->data;
+
+    STATE_AO_GC(ns->ao);
+
+    setup_async_exec(&dev->aes, "teardown", dev);
+
+    if (libxl__async_exec_start(gc, &dev->aes)) {
+        goto out;
+    }
+
+    return;
+
+out:
+    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+}
+
+/* Values for buffer_op; these are not the values passed to the kernel */
+enum {
+    tc_buffer_start,
+    tc_buffer_release
+};
+
+static void remus_netbuf_op_async(libxl__remus_device_nic *remus_nic,
+                                  libxl__remus_netbuf_state *netbuf_state,
+                                  int buffer_op)
+{
+    int ret;
+
+    STATE_AO_GC(netbuf_state->ao);
+
+    if (buffer_op == tc_buffer_start)
+        ret = rtnl_qdisc_plug_buffer(remus_nic->qdisc);
+    else
+        ret = rtnl_qdisc_plug_release_one(remus_nic->qdisc);
+
+    if (!ret) {
+        ret = rtnl_qdisc_add(netbuf_state->nlsock,
+                             remus_nic->qdisc,
+                             NLM_F_REQUEST);
+        if (ret)
+            goto out;
+    }
+
+    _exit(0);
+
+out:
+    LOG(ERROR, "Remus: cannot do netbuf op %s on %s:%s",
+        ((buffer_op == tc_buffer_start) ?
+        "start_new_epoch" : "release_prev_epoch"),
+        remus_nic->ifb, nl_geterror(ret));
+    _exit(-ERROR_FAIL);
+}
+
+static void netbuf_epoch_op(libxl__remus_device *dev, int buffer_op)
+{
+    int pid = -1;
+    libxl__remus_device_nic *remus_nic = dev->data;
+    libxl__remus_netbuf_state *ns = dev->ops->data;
+    STATE_AO_GC(dev->rds->ao);
+
+    /* Fork and call */
+    pid = libxl__ev_child_fork(gc, &dev->child, async_call_done);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork");
+        goto out;
+    }
+
+    if (!pid) {
+        /* child */
+        remus_netbuf_op_async(remus_nic, ns, buffer_op);
+        /* notreached */
+        abort();
+    }
+
+    return;
+
+out:
+    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+}
+
+static void nic_postsuspend(libxl__remus_device *dev)
+{
+    netbuf_epoch_op(dev, tc_buffer_start);
+}
+
+static void nic_commit(libxl__remus_device *dev)
+{
+    netbuf_epoch_op(dev, tc_buffer_release);
+}
+
+libxl__remus_device_ops remus_device_nic = {
+    .init = nic_init,
+    .destroy = nic_destroy,
+    .postsuspend = nic_postsuspend,
+    .commit = nic_commit,
+    .match = nic_match,
+    .setup = nic_setup,
+    .teardown = nic_teardown,
+};
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
index 6aa4bf1..7fe288a 100644
--- a/tools/libxl/libxl_nonetbuffer.c
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -22,6 +22,73 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
     return 0;
 }
 
+static void async_call_done(libxl__egc *egc,
+                            libxl__ev_child *child,
+                            pid_t pid, int status)
+{
+    libxl__remus_device *dev = CONTAINER_OF(child, *dev, child);
+    libxl__remus_device_state *rds = dev->rds;
+    STATE_AO_GC(rds->ao);
+
+    if (WIFEXITED(status)) {
+        dev->callback(egc, dev, -WEXITSTATUS(status));
+    } else {
+        dev->callback(egc, dev, ERROR_FAIL);
+    }
+}
+
+static void nic_match_async(const libxl__remus_device_ops *self,
+                            libxl__remus_device *dev)
+{
+    if (dev->kind == LIBXL__REMUS_DEVICE_NIC)
+        _exit(-ERROR_FAIL);
+
+    _exit(-ERROR_NOT_MATCH);
+}
+
+static void nic_match(libxl__remus_device_ops *self,
+                     libxl__remus_device *dev)
+{
+    int pid = -1;
+    STATE_AO_GC(dev->rds->ao);
+
+    /* Fork and call */
+    pid = libxl__ev_child_fork(gc, &dev->child, async_call_done);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork");
+        goto out;
+    }
+
+    if (!pid) {
+        /* child */
+        nic_match_async(self, dev);
+        /* notreached */
+        abort();
+    }
+
+    return;
+
+out:
+    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+}
+
+static int nic_init(libxl__remus_device_ops *self,
+                    libxl__remus_state *rs)
+{
+    return 0;
+}
+
+static void nic_destroy(libxl__remus_device_ops *self)
+{
+    return;
+}
+
+libxl__remus_device_ops remus_device_nic = {
+    .init = nic_init,
+    .destroy = nic_destroy,
+    .match = nic_match,
+};
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
index 62c0614..5f07266 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_remus_device.c
@@ -18,7 +18,9 @@
 
 #include "libxl_internal.h"
 
+extern libxl__remus_device_ops remus_device_nic;
 static libxl__remus_device_ops *dev_ops[] = {
+    &remus_device_nic,
 };
 
 static void device_common_cb(libxl__egc *egc,
@@ -185,6 +187,13 @@ static void device_teardown_cb(libxl__egc *egc,
     rds->num_setuped--;
 
     if (rds->num_setuped == 0) {
+        /* clean nic */
+        for (i = 0; i < rds->num_nics; i++)
+            libxl_device_nic_dispose(&rds->nics[i]);
+        free(rds->nics);
+        rds->nics = NULL;
+        rds->num_nics = 0;
+
         /* clean device ops */
         for (i = 0; i < ARRAY_SIZE(dev_ops); i++) {
             ops = dev_ops[i];
@@ -194,7 +203,7 @@ static void device_teardown_cb(libxl__egc *egc,
     }
 }
 
-static __attribute__((unused)) void libxl__remus_device_init(libxl__egc *egc,
+static void libxl__remus_device_init(libxl__egc *egc,
                                      libxl__remus_device_state *rds,
                                      libxl__remus_device_kind kind,
                                      void *libxl_dev)
@@ -262,10 +271,21 @@ void libxl__remus_device_setup(libxl__egc *egc, libxl__remus_state *rs)
 
     /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
 
+    if (rs->netbufscript) {
+        rds->nics = libxl_device_nic_list(CTX, rs->domid, &rds->num_nics);
+    }
+
     GCNEW_ARRAY(rds->dev, rds->num_nics + rds->num_disks);
 
     /* TBD: CALL libxl__remus_device_init to init remus devices */
 
+    if (rs->netbufscript && rds->nics) {
+        for (i = 0; i < rds->num_nics; i++) {
+            libxl__remus_device_init(egc, rds,
+                                     LIBXL__REMUS_DEVICE_NIC, &rds->nics[i]);
+        }
+    }
+
     if (rds->num_nics == 0 && rds->num_disks == 0)
         goto out;
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 4278a6b..50bf1ef 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -566,6 +566,8 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
     ("interval",     integer),
     ("blackhole",    bool),
     ("compression",  bool),
+    ("netbuf",       bool),
+    ("netbufscript", string),
     ])
 
 libxl_event_type = Enumeration("event_type", [
-- 
1.9.1
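
Note on the patch above: nic_match() and netbuf_epoch_op() fork a child
that calls _exit(-rc), and async_call_done() recovers the code in the parent
with -WEXITSTATUS(status). The sketch below reproduces that round trip in
isolation, as a rough illustration rather than the series' code. The function
name child_rc_roundtrip and the value chosen for ERROR_NOT_MATCH are
assumptions made for the example; ERROR_FAIL uses libxl's value of -3.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define ERROR_FAIL      (-3)   /* libxl's generic failure code */
#define ERROR_NOT_MATCH (-42)  /* hypothetical value, for illustration only */

/*
 * Fork a child that exits with -rc, then decode the wait status in the
 * parent the same way async_call_done() does. Returns ERROR_FAIL if the
 * fork or wait fails, or if the child did not exit normally.
 */
int child_rc_roundtrip(int rc)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return ERROR_FAIL;

    if (pid == 0)
        _exit(-rc);            /* child: only the low 8 bits are reported */

    if (waitpid(pid, &status, 0) != pid)
        return ERROR_FAIL;

    if (WIFEXITED(status))
        return -WEXITSTATUS(status);

    return ERROR_FAIL;         /* killed by a signal, etc. */
}
```

Because an exit status carries only 8 bits, this encoding is only faithful
for return codes in the range -255..0, which small negative libxl error
codes satisfy.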


* [PATCH v10 5/5] libxl: network buffering cmdline switch
  2014-06-05  1:34 [PATCH v10 0/5] Remus netbuffer: Network buffering support Yang Hongyang
                   ` (3 preceding siblings ...)
  2014-06-05  1:34 ` [PATCH v10 4/5] remus: implement remus network buffering for nic devices Yang Hongyang
@ 2014-06-05  1:34 ` Yang Hongyang
  2014-06-05  1:39   ` [PATCH v10] remus drbd: Implement remus drbd replicated disk Yang Hongyang
  2014-06-05 17:30   ` [PATCH v10 5/5] libxl: network buffering cmdline switch Ian Jackson
  2014-06-05 10:47 ` [PATCH v10 0/5] Remus netbuffer: Network buffering support George Dunlap
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 44+ messages in thread
From: Yang Hongyang @ 2014-06-05  1:34 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

From: Shriram Rajagopalan <rshriram@cs.ubc.ca>

Add command line switches to the 'xl remus' command to control network
buffering, and pass the resulting settings on to libxl so that it can
act accordingly. Also update the man pages to document the new options.

Note: network buffering is enabled by default. To disable it, use the
-n option.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 docs/man/xl.conf.pod.5    |  6 ++++++
 docs/man/xl.pod.1         | 11 ++++++++++-
 tools/libxl/xl.c          |  4 ++++
 tools/libxl/xl.h          |  1 +
 tools/libxl/xl_cmdimpl.c  | 28 ++++++++++++++++++++++------
 tools/libxl/xl_cmdtable.c |  3 +++
 6 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.conf.pod.5 b/docs/man/xl.conf.pod.5
index 7c43bde..8ae19bb 100644
--- a/docs/man/xl.conf.pod.5
+++ b/docs/man/xl.conf.pod.5
@@ -105,6 +105,12 @@ Configures the default gateway device to set for virtual network devices.
 
 Default: C<None>
 
+=item B<remus.default.netbufscript="PATH">
+
+Configures the default script used by Remus to set up network buffering.
+
+Default: C</etc/xen/scripts/remus-netbuf-setup>
+
 =item B<output_format="json|sxp">
 
 Configures the default output format used by xl when printing "machine
diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 30bd4bf..8b0c012 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -431,7 +431,7 @@ Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
 mechanism between the two hosts.
 
 N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
-     There is no support for network or disk buffering at the moment.
+     There is no support for disk buffering at the moment.
 
 B<OPTIONS>
 
@@ -450,6 +450,15 @@ Generally useful for debugging.
 
 Disable memory checkpoint compression.
 
+=item B<-n>
+
+Disable network output buffering.
+
+=item B<-N> I<netbufscript>
+
+Use <netbufscript> to set up network buffering instead of the default
+(/etc/xen/scripts/remus-netbuf-setup).
+
 =item B<-s> I<sshcommand>
 
 Use <sshcommand> instead of ssh.  String will be passed to sh.
diff --git a/tools/libxl/xl.c b/tools/libxl/xl.c
index 4c5a5ee..f014306 100644
--- a/tools/libxl/xl.c
+++ b/tools/libxl/xl.c
@@ -44,6 +44,7 @@ char *default_vifscript = NULL;
 char *default_bridge = NULL;
 char *default_gatewaydev = NULL;
 char *default_vifbackend = NULL;
+char *default_remus_netbufscript = NULL;
 enum output_format default_output_format = OUTPUT_FORMAT_JSON;
 int claim_mode = 1;
 bool progress_use_cr = 0;
@@ -176,6 +177,9 @@ static void parse_global_config(const char *configfile,
     if (!xlu_cfg_get_long (config, "claim_mode", &l, 0))
         claim_mode = l;
 
+    xlu_cfg_replace_string (config, "remus.default.netbufscript",
+        &default_remus_netbufscript, 0);
+
     xlu_cfg_destroy(config);
 }
 
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 10a2e66..087eb8c 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -170,6 +170,7 @@ extern char *default_vifscript;
 extern char *default_bridge;
 extern char *default_gatewaydev;
 extern char *default_vifbackend;
+extern char *default_remus_netbufscript;
 extern char *blkdev_start;
 
 enum output_format {
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 5195914..ce06e82 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7284,8 +7284,9 @@ int main_remus(int argc, char **argv)
     r_info.interval = 200;
     r_info.blackhole = 0;
     r_info.compression = 1;
+    r_info.netbuf = 1;
 
-    SWITCH_FOREACH_OPT(opt, "bui:s:e", NULL, "remus", 2) {
+    SWITCH_FOREACH_OPT(opt, "buni:s:N:e", NULL, "remus", 2) {
     case 'i':
         r_info.interval = atoi(optarg);
         break;
@@ -7295,6 +7296,12 @@ int main_remus(int argc, char **argv)
     case 'u':
         r_info.compression = 0;
         break;
+    case 'n':
+        r_info.netbuf = 0;
+        break;
+    case 'N':
+        r_info.netbufscript = optarg;
+        break;
     case 's':
         ssh_command = optarg;
         break;
@@ -7306,6 +7313,9 @@ int main_remus(int argc, char **argv)
     domid = find_domain(argv[optind]);
     host = argv[optind + 1];
 
+    if (!r_info.netbufscript)
+        r_info.netbufscript = default_remus_netbufscript;
+
     if (r_info.blackhole) {
         send_fd = open("/dev/null", O_RDWR, 0644);
         if (send_fd < 0) {
@@ -7343,13 +7353,19 @@ int main_remus(int argc, char **argv)
     /* Point of no return */
     rc = libxl_domain_remus_start(ctx, &r_info, domid, send_fd, recv_fd, 0);
 
-    /* If we are here, it means backup has failed/domain suspend failed.
-     * Try to resume the domain and exit gracefully.
-     * TODO: Split-Brain check.
+    /* Check whether the domain still exists. The user may have run
+     * "xl destroy" on the domain to force failover.
      */
-    fprintf(stderr, "remus sender: libxl_domain_suspend failed"
-            " (rc=%d)\n", rc);
+    if (libxl_domain_info(ctx, 0, domid)) {
+        fprintf(stderr, "Remus: Primary domain has been destroyed.\n");
+        close(send_fd);
+        return 0;
+    }
 
+    /* If we are here, it means remus setup/domain suspend/backup has
+     * failed. Try to resume the domain and exit gracefully.
+     * TODO: Split-Brain check.
+     */
     if (rc == ERROR_GUEST_TIMEDOUT)
         fprintf(stderr, "Failed to suspend domain at primary.\n");
     else {
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 4279b9f..3f7520d 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -487,6 +487,9 @@ struct cmd_spec cmd_table[] = {
       "-i MS                   Checkpoint domain memory every MS milliseconds (def. 200ms).\n"
       "-b                      Replicate memory checkpoints to /dev/null (blackhole)\n"
       "-u                      Disable memory checkpoint compression.\n"
+      "-n                      Disable network output buffering.\n"
+      "-N <netbufscript>       Use netbufscript to set up network buffering instead of the\n"
+      "                        default (/etc/xen/scripts/remus-netbuf-setup).\n"
       "-s <sshcommand>         Use <sshcommand> instead of ssh.  String will be passed\n"
       "                        to sh. If empty, run <host> instead of \n"
       "                        ssh <host> xl migrate-receive -r [-e]\n"
-- 
1.9.1


* [PATCH v10] remus drbd: Implement remus drbd replicated disk
  2014-06-05  1:34 ` [PATCH v10 5/5] libxl: network buffering cmdline switch Yang Hongyang
@ 2014-06-05  1:39   ` Yang Hongyang
  2014-06-05 16:25     ` Shriram Rajagopalan
  2014-06-05 17:30   ` [PATCH v10 5/5] libxl: network buffering cmdline switch Ian Jackson
  1 sibling, 1 reply; 44+ messages in thread
From: Yang Hongyang @ 2014-06-05  1:39 UTC (permalink / raw)
  To: rshriram, ian.jackson, xen-devel, wency, ian.campbell,
	eddie.dong, laijs, andrew.cooper3, yunhong.jiang, roger.pau
  Cc: Yang Hongyang

Implement the Remus DRBD replicated checkpointing disk on top of the
generic Remus device framework.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
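As a reading aid for the exit codes documented in block-drbd-probe below,
here is one way a caller might name and interpret them. It is only a
sketch: the enum, drbd_probe_describe and drbd_probe_usable are hypothetical
names and are not taken from libxl_remus_disk_drbd.c.

```c
#include <assert.h>
#include <string.h>

/* Exit codes as documented in the block-drbd-probe header comment. */
enum drbd_probe_result {
    DRBD_PROBE_OK             = 0,  /* the device is a DRBD device */
    DRBD_PROBE_NOT_DRBD       = 1,  /* the device is not a DRBD device */
    DRBD_PROBE_UNKNOWN_ERROR  = 2,
    DRBD_PROBE_NOT_PROTOCOL_D = 3,  /* DRBD device without protocol D */
    DRBD_PROBE_NOT_READY      = 4   /* not connected or wrong role */
};

/* Map the script's exit code to a human-readable description. */
const char *drbd_probe_describe(int exit_code)
{
    switch (exit_code) {
    case DRBD_PROBE_OK:             return "device is a DRBD device";
    case DRBD_PROBE_NOT_DRBD:       return "device is not a DRBD device";
    case DRBD_PROBE_UNKNOWN_ERROR:  return "unknown error";
    case DRBD_PROBE_NOT_PROTOCOL_D: return "DRBD device does not use protocol D";
    case DRBD_PROBE_NOT_READY:      return "DRBD device is not ready";
    default:                        return "unexpected exit code";
    }
}

/* Only a device that passed every check should be claimed by the DRBD ops. */
int drbd_probe_usable(int exit_code)
{
    return exit_code == DRBD_PROBE_OK;
}
```
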
 tools/hotplug/Linux/Makefile         |   1 +
 tools/hotplug/Linux/block-drbd-probe |  84 ++++++++++
 tools/libxl/Makefile                 |   2 +-
 tools/libxl/libxl_internal.h         |   1 +
 tools/libxl/libxl_remus_device.c     |  23 ++-
 tools/libxl/libxl_remus_disk_drbd.c  | 290 +++++++++++++++++++++++++++++++++++
 6 files changed, 394 insertions(+), 7 deletions(-)
 create mode 100755 tools/hotplug/Linux/block-drbd-probe
 create mode 100644 tools/libxl/libxl_remus_disk_drbd.c

diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index 13e1f5f..5dd8599 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -23,6 +23,7 @@ XEN_SCRIPTS += xen-hotplug-cleanup
 XEN_SCRIPTS += external-device-migrate
 XEN_SCRIPTS += vscsi
 XEN_SCRIPTS += block-iscsi
+XEN_SCRIPTS += block-drbd-probe
 XEN_SCRIPTS += $(XEN_SCRIPTS-y)
 
 XEN_SCRIPT_DATA = xen-script-common.sh locking.sh logging.sh
diff --git a/tools/hotplug/Linux/block-drbd-probe b/tools/hotplug/Linux/block-drbd-probe
new file mode 100755
index 0000000..163ad04
--- /dev/null
+++ b/tools/hotplug/Linux/block-drbd-probe
@@ -0,0 +1,84 @@
+#! /bin/bash
+#
+# Copyright (C) 2014 FUJITSU LIMITED
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of version 2.1 of the GNU Lesser General Public
+# License as published by the Free Software Foundation.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+#
+# Usage:
+#     block-drbd-probe devicename
+#
+# Return value:
+#     0: the device is a DRBD device
+#     1: the device is not a DRBD device
+#     2: unknown error
+#     3: the DRBD device does not use protocol D
+#     4: the DRBD device is not ready
+
+drbd_res=
+
+function get_res_name()
+{
+    local drbd_dev=$1
+    local drbd_dev_list=($(drbdadm sh-dev all))
+    local drbd_res_list=($(drbdadm sh-resource all))
+    local temp_drbd_dev temp_drbd_res
+    local found=0
+
+    for temp_drbd_dev in ${drbd_dev_list[@]}; do
+        if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
+            found=1
+            break
+        fi
+    done
+
+    if [[ $found -eq 0 ]]; then
+        return 1
+    fi
+
+    for temp_drbd_res in ${drbd_res_list[@]}; do
+        temp_drbd_dev=$(drbdadm sh-dev $temp_drbd_res)
+        if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
+            drbd_res="$temp_drbd_res"
+            return 0
+        fi
+    done
+
+    # OOPS
+    return 2
+}
+
+# Save the status at once; after a [[ ]] test, $? would be the test's own.
+get_res_name "$1"
+rc=$?
+[[ $rc -ne 0 ]] && exit $rc
+
+# check protocol
+drbdsetup $1 show | grep -q "protocol D;"
+if [[ $? -ne 0 ]]; then
+    exit 3
+fi
+
+# check connect status
+state=$(drbdadm cstate "$drbd_res")
+if [[ "$state" != "Connected" ]]; then
+    exit 4
+fi
+
+# check role
+role=$(drbdadm role "$drbd_res")
+if [[ "$role" != "Primary/Secondary" ]]; then
+    exit 4
+fi
+
+exit 0
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 7a722a8..6f4d9b4 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -56,7 +56,7 @@ else
 LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
-LIBXL_OBJS-y += libxl_remus_device.o
+LIBXL_OBJS-y += libxl_remus_device.o libxl_remus_disk_drbd.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index f221f97..47a4ab9 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2519,6 +2519,7 @@ struct libxl__remus_device_state {
 
     libxl_device_nic *nics;
     int num_nics;
+    libxl_device_disk *disks;
     int num_disks;
 
     /* for counting devices that have been handled */
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
index 5f07266..040441a 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_remus_device.c
@@ -19,8 +19,10 @@
 #include "libxl_internal.h"
 
 extern libxl__remus_device_ops remus_device_nic;
+extern libxl__remus_device_ops remus_device_drbd_disk;
 static libxl__remus_device_ops *dev_ops[] = {
     &remus_device_nic,
+    &remus_device_drbd_disk,
 };
 
 static void device_common_cb(libxl__egc *egc,
@@ -194,6 +196,13 @@ static void device_teardown_cb(libxl__egc *egc,
         rds->nics = NULL;
         rds->num_nics = 0;
 
+        /* clean disk */
+        for (i = 0; i < rds->num_disks; i++)
+            libxl_device_disk_dispose(&rds->disks[i]);
+        free(rds->disks);
+        rds->disks = NULL;
+        rds->num_disks = 0;
+
         /* clean device ops */
         for (i = 0; i < ARRAY_SIZE(dev_ops); i++) {
             ops = dev_ops[i];
@@ -269,15 +278,15 @@ void libxl__remus_device_setup(libxl__egc *egc, libxl__remus_state *rs)
     rds->num_nics = 0;
     rds->num_disks = 0;
 
-    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
-
     if (rs->netbufscript) {
         rds->nics = libxl_device_nic_list(CTX, rs->domid, &rds->num_nics);
     }
+    rds->disks = libxl_device_disk_list(CTX, rs->domid, &rds->num_disks);
 
-    GCNEW_ARRAY(rds->dev, rds->num_nics + rds->num_disks);
+    if (rds->num_nics == 0 && rds->num_disks == 0)
+        goto out;
 
-    /* TBD: CALL libxl__remus_device_init to init remus devices */
+    GCNEW_ARRAY(rds->dev, rds->num_nics + rds->num_disks);
 
     if (rs->netbufscript && rds->nics) {
         for (i = 0; i < rds->num_nics; i++) {
@@ -286,8 +295,10 @@ void libxl__remus_device_setup(libxl__egc *egc, libxl__remus_state *rs)
         }
     }
 
-    if (rds->num_nics == 0 && rds->num_disks == 0)
-        goto out;
+    for (i = 0; i < rds->num_disks; i++) {
+        libxl__remus_device_init(egc, rds,
+                                 LIBXL__REMUS_DEVICE_DISK, &rds->disks[i]);
+    }
 
     return;
 
diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
new file mode 100644
index 0000000..f35a406
--- /dev/null
+++ b/tools/libxl/libxl_remus_disk_drbd.c
@@ -0,0 +1,290 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author Lai Jiangshan <laijs@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+/*** drbd implementation ***/
+const int DRBD_SEND_CHECKPOINT = 20;
+const int DRBD_WAIT_CHECKPOINT_ACK = 30;
+
+typedef struct libxl__remus_drbd_disk {
+    libxl__remus_device remus_dev;
+    int ctl_fd;
+    int ackwait;
+    const char *path;
+} libxl__remus_drbd_disk;
+
+typedef struct libxl__remus_drbd_state {
+    libxl__ao *ao;
+    char *drbd_probe_script;
+} libxl__remus_drbd_state;
+
+static void drbd_async_call(libxl__remus_device *dev,
+                            void func(libxl__remus_device *),
+                            libxl__ev_child_callback callback)
+{
+    int pid = -1;
+    STATE_AO_GC(dev->rds->ao);
+
+    /* Fork and call */
+    pid = libxl__ev_child_fork(gc, &dev->child, callback);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork");
+        goto out;
+    }
+
+    if (!pid) {
+        /* child */
+        func(dev);
+        /* notreached */
+        abort();
+    }
+
+    return;
+
+out:
+    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+}
+
+static void checkpoint_async_call_done(libxl__egc *egc,
+                                       libxl__ev_child *child,
+                                       pid_t pid, int status)
+{
+    libxl__remus_device *dev = CONTAINER_OF(child, *dev, child);
+    libxl__remus_drbd_disk *rdd = dev->data;
+    STATE_AO_GC(dev->rds->ao);
+
+    if (WIFEXITED(status)) {
+        rdd->ackwait = WEXITSTATUS(status);
+        dev->callback(egc, dev, 0);
+    } else {
+        dev->callback(egc, dev, ERROR_FAIL);
+    }
+}
+
+static void drbd_postsuspend_async(libxl__remus_device *dev)
+{
+    libxl__remus_drbd_disk *rdd = dev->data;
+    int ackwait = rdd->ackwait;
+
+    if (!ackwait) {
+        if (ioctl(rdd->ctl_fd, DRBD_SEND_CHECKPOINT, 0) <= 0)
+            ackwait = 1;
+    }
+
+    _exit(ackwait);
+}
+
+static void drbd_postsuspend(libxl__remus_device *dev)
+{
+    drbd_async_call(dev, drbd_postsuspend_async, checkpoint_async_call_done);
+}
+
+static void drbd_preresume_async(libxl__remus_device *dev)
+{
+    libxl__remus_drbd_disk *rdd = dev->data;
+    int ackwait = rdd->ackwait;
+
+    if (ackwait) {
+        ioctl(rdd->ctl_fd, DRBD_WAIT_CHECKPOINT_ACK, 0);
+        ackwait = 0;
+    }
+
+    _exit(ackwait);
+}
+
+static void drbd_preresume(libxl__remus_device *dev)
+{
+    drbd_async_call(dev, drbd_preresume_async, checkpoint_async_call_done);
+}
+
+static int drbd_init(libxl__remus_device_ops *self,
+                     libxl__remus_state *rs)
+{
+    libxl__remus_drbd_state *drbd_state;
+
+    STATE_AO_GC(rs->ao);
+
+    GCNEW(drbd_state);
+    self->data = drbd_state;
+    drbd_state->ao = ao;
+    drbd_state->drbd_probe_script = GCSPRINTF("%s/block-drbd-probe",
+                                              libxl__xen_script_dir_path());
+
+
+    return 0;
+}
+
+static void drbd_destroy(libxl__remus_device_ops *self)
+{
+    return;
+}
+
+static void match_async_exec_cb(libxl__egc *egc,
+                                libxl__async_exec_state *aes,
+                                int status)
+{
+    libxl__remus_device *dev = CONTAINER_OF(aes, *dev, aes);
+
+    if (status) {
+        dev->callback(egc, dev, ERROR_NOT_MATCH);
+    } else {
+        dev->callback(egc, dev, 0);
+    }
+}
+
+static void match_async_exec(libxl__egc *egc, libxl__remus_device *dev)
+{
+    int arraysize, nr = 0;
+    const libxl_device_disk *disk = dev->backend_dev;
+    libxl__remus_drbd_state *drbd_state = dev->ops->data;
+    libxl__async_exec_state *aes = &dev->aes;
+    STATE_AO_GC(drbd_state->ao);
+
+    /* setup env & args */
+    arraysize = 1;
+    GCNEW_ARRAY(aes->env, arraysize);
+    aes->env[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    arraysize = 3;
+    nr = 0;
+    GCNEW_ARRAY(aes->args, arraysize);
+    aes->args[nr++] = drbd_state->drbd_probe_script;
+    aes->args[nr++] = disk->pdev_path;
+    aes->args[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    aes->ao = drbd_state->ao;
+    aes->what = GCSPRINTF("%s %s", aes->args[0], aes->args[1]);
+    aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
+    aes->callback = match_async_exec_cb;
+    aes->stdfds[0] = -1;
+    aes->stdfds[1] = -1;
+    aes->stdfds[2] = -1;
+
+    if (libxl__async_exec_start(gc, aes))
+        goto out;
+
+    return;
+
+out:
+    dev->callback(egc, dev, ERROR_FAIL);
+}
+
+static void match_async_call_done(libxl__egc *egc,
+                                  libxl__ev_child *child,
+                                  pid_t pid, int status)
+{
+    libxl__remus_device *dev = CONTAINER_OF(child, *dev, child);
+    STATE_AO_GC(dev->rds->ao);
+
+    if (WIFEXITED(status)) {
+        if (-WEXITSTATUS(status) == ERROR_NOT_MATCH) {
+            dev->callback(egc, dev, ERROR_NOT_MATCH);
+        } else {
+            match_async_exec(egc, dev);
+        }
+    } else {
+        dev->callback(egc, dev, ERROR_FAIL);
+    }
+}
+
+static void drbd_match_async(libxl__remus_device *dev)
+{
+    if (dev->kind != LIBXL__REMUS_DEVICE_DISK)
+        _exit(-ERROR_NOT_MATCH);
+
+    _exit(0);
+}
+
+static void drbd_match(libxl__remus_device_ops *self,
+                      libxl__remus_device *dev)
+{
+    drbd_async_call(dev, drbd_match_async, match_async_call_done);
+}
+
+static void setup_async_call_done(libxl__egc *egc,
+                                 libxl__ev_child *child,
+                                 pid_t pid, int status)
+{
+    libxl__remus_device *dev = CONTAINER_OF(child, *dev, child);
+    STATE_AO_GC(dev->rds->ao);
+
+    if (WIFEXITED(status)) {
+        dev->callback(egc, dev, 0);
+    } else {
+        dev->callback(egc, dev, ERROR_FAIL);
+    }
+}
+
+static void drbd_setup_async(libxl__remus_device *dev)
+{
+    libxl__remus_drbd_disk *drbd_disk = dev->data;
+
+    if (drbd_disk->ctl_fd < 0)
+        abort();
+
+    _exit(0);
+}
+
+static void drbd_setup(libxl__remus_device *dev)
+{
+    libxl__remus_drbd_disk *drbd_disk;
+    const libxl_device_disk *disk = dev->backend_dev;
+    STATE_AO_GC(dev->rds->ao);
+
+    GCNEW(drbd_disk);
+    dev->data = drbd_disk;
+    drbd_disk->path = disk->pdev_path;
+    drbd_disk->ackwait = 0;
+    drbd_disk->ctl_fd = open(drbd_disk->path, O_RDONLY);
+    drbd_async_call(dev, drbd_setup_async, setup_async_call_done);
+}
+
+static void teardown_async_call_done(libxl__egc *egc,
+                                     libxl__ev_child *child,
+                                     pid_t pid, int status)
+{
+    libxl__remus_device *dev = CONTAINER_OF(child, *dev, child);
+    STATE_AO_GC(dev->rds->ao);
+
+    dev->callback(egc, dev, 0);
+}
+
+static void drbd_teardown_async(libxl__remus_device *dev)
+{
+    _exit(0);
+}
+
+static void drbd_teardown(libxl__remus_device *dev)
+{
+    libxl__remus_drbd_disk *drbd_disk = dev->data;
+
+    close(drbd_disk->ctl_fd);
+    drbd_async_call(dev, drbd_teardown_async, teardown_async_call_done);
+}
+
+libxl__remus_device_ops remus_device_drbd_disk = {
+    .init = drbd_init,
+    .destroy = drbd_destroy,
+    .postsuspend = drbd_postsuspend,
+    .preresume = drbd_preresume,
+    .match = drbd_match,
+    .setup = drbd_setup,
+    .teardown = drbd_teardown,
+};
-- 
1.9.1


* Re: [PATCH v10 0/5] Remus netbuffer: Network buffering support
  2014-06-05  1:34 [PATCH v10 0/5] Remus netbuffer: Network buffering support Yang Hongyang
                   ` (4 preceding siblings ...)
  2014-06-05  1:34 ` [PATCH v10 5/5] libxl: network buffering cmdline switch Yang Hongyang
@ 2014-06-05 10:47 ` George Dunlap
  2014-06-06  2:17   ` Hongyang Yang
  2014-06-05 16:01 ` Ian Jackson
  2014-06-05 16:12 ` Ian Jackson
  7 siblings, 1 reply; 44+ messages in thread
From: George Dunlap @ 2014-06-05 10:47 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: Ian Campbell, Wen Congyang, Stefano Stabellini, Andrew Cooper,
	Jiang, Yunhong, Ian Jackson, xen-devel, Dong, Eddie,
	Shriram Rajagopalan, Lai Jiangshan, Roger Pau Monné

On Thu, Jun 5, 2014 at 2:34 AM, Yang Hongyang <yanghy@cn.fujitsu.com> wrote:
> This patch series adds support for network buffering in the Remus
> codebase in libxl.

Hongyang,

This is a resend of the series you sent May 21, right?

When you resend a patch series, you can add RESEND to the title (e.g.,
[PATCH v10 RESEND 0/5]) to highlight the fact that it hasn't received
any attention.
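
A rough sketch of how such a subject prefix could be produced, assuming the
series is generated with git format-patch from the five most recent commits.
The helper name resend_prefix is made up for illustration; the command is
printed rather than run.

```shell
# Illustrative helper (not part of any tool): build "PATCH v<N> RESEND".
resend_prefix() {
    printf 'PATCH v%s RESEND' "$1"
}

prefix=$(resend_prefix 10)

# --cover-letter and --subject-prefix are standard git format-patch options;
# -5 selects the five most recent commits (an assumption about the branch).
echo "git format-patch --cover-letter -5 --subject-prefix=\"$prefix\""
```

With that prefix, git numbers the mails itself, so the cover letter subject
would start with [PATCH v10 RESEND 0/5].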

 -George

>
> This is a rebased version of v10 series.The first patch was applied,
> but can not found on current master, if it's not necessary, please
> let me know.
>
> the code is also hosted on github:
>
> url: https://github.com/laijs/xen
> branch: remus-0605
>
> Changes V10:
>   Restructured the whole patch series.
>   Introduce the remus device abstract layer.
>   Make remus checkpoint asynchronous.
>
> Changes in V9:
>   Use async exec script api to exec scripts.
>
> Changes in V8:
>   Applied some comments(by IanJ).
>   Merge some struct definitions to it's implementation.
>   (2/3/5 in V7 => 3 in V8)
>
> Changes in V7:
>   Applied missing comments(by IanJ).
>   Applied Shriram comments.
>
>   merge netbufering tangled setup/teardown code into one patch.
>   (2/6/8 in V6 => 5 in V7. 9/10 in V6 => 7 in V7)
>
> Changes in V6:
>   Applied Ian Jackson's comments of V5 series.
>   the [PATCH 2/4 V5] is split by small functionalities.
>
>   [PATCH 4/4 V5] --> [PATCH 13/13] netbuffer is default enabled.
>
> Changes in V5:
>
> Merge hotplug script patch (2/5) and hotplug script setup/teardown
> patch (3/5) into a single patch.
>
> Changes in V4:
>
> [1/5] Remove check for libnl command line utils in autoconf checks
>
> [2/5] minor nits
>
> [3/5] define LIBXL_HAVE_REMUS_NETBUF in libxl.h
>
> [4/5] clean ups. Make the usleep in checkpoint callback asynchronous
>
> [5/5] minor nits
>
> Changes in V3:
> [1/5] Fix redundant checks in configure scripts
>       (based on Ian Campbell's suggestions)
>
> [2/5] Introduce locking in the script, during IFB setup.
>       Add xenstore paths used by netbuf scripts
>       to xenstore-paths.markdown
>
> [3/5] Hotplug scripts setup/teardown invocations are now asynchronous
>       following IanJ's feedback.  However, the invocations are still
>       sequential.
>
> [5/5] Allow per-domain specification of netbuffer scripts in xl remus
>       commmand.
>
> And minor nits throughout the series based on feedback from
> the last version
>
> Changes in V2:
> [1/5] Configure script will automatically enable/disable network
>       buffer support depending on the availability of the appropriate
>       libnl3 version. [If libnl3 is unavailable, a warning message will be
>       printed to let the user know that the feature has been disabled.]
>
>       use macros from pkg.m4 instead of pkg-config commands
>       removed redundant checks for libnl3 libraries.
>
> [3,4/5] - Minor nits.
>
> Version 1:
>
> [1/5] Changes to autoconf scripts to check for libnl3. Add linker flags
>       to libxl Makefile.
>
> [2/5] External script to setup/teardown network buffering using libnl3's
>       CLI. This script will be invoked by libxl before starting Remus.
>       The script's main job is to bring up an IFB device with plug qdisc
>       attached to it.  It then re-routes egress traffic from the guest's
>       vif to the IFB device.
>
> [3/5] Libxl code to invoke the external setup script, followed by netlink
>       related setup to obtain a handle on the output buffers attached
>       to each vif.
>
> [4/5] Libxl interaction with network buffer module in the kernel via
>       libnl3 API.
>
> [5/5] xl cmdline switch to explicitly enable network buffering when
>       starting remus.
>
>
>   Few things to note(by shriram):
>
>     a) Based on previous email discussions, the setup/teardown task has
>     been moved to a hotplug style shell script which can be customized as
>     desired, instead of implementing it as C code inside libxl.
>
>     b) Libnl3 is not available on NetBSD. Nor is it available on CentOS
>    (Linux).  So I have made network buffering support an optional feature
>    so that it can be disabled if desired.
>
>    c) NetBSD does not have libnl3. So I have put the setup script under
>    tools/hotplug/Linux folder.
>
> thanks
>
> Shriram Rajagopalan (1):
>   libxl: network buffering cmdline switch
>
> Yang Hongyang (4):
>   libxl: introduce asynchronous execution API
>   remus: add libnl3 dependency for network buffering support
>   remus: introduce remus device
>   remus: implement remus network buffering for nic devices
>
>  README                                 |   4 +
>  config/Tools.mk.in                     |   4 +
>  docs/man/xl.conf.pod.5                 |   6 +
>  docs/man/xl.pod.1                      |  11 +-
>  docs/misc/xenstore-paths.markdown      |   4 +
>  tools/configure.ac                     |  15 +
>  tools/hotplug/Linux/Makefile           |   1 +
>  tools/hotplug/Linux/remus-netbuf-setup | 183 +++++++++++
>  tools/libxl/Makefile                   |  15 +
>  tools/libxl/libxl.c                    |  52 +++-
>  tools/libxl/libxl.h                    |  13 +
>  tools/libxl/libxl_aoutils.c            |  89 ++++++
>  tools/libxl/libxl_device.c             |  78 ++---
>  tools/libxl/libxl_dom.c                | 132 +++++++-
>  tools/libxl/libxl_internal.h           | 151 ++++++++-
>  tools/libxl/libxl_netbuffer.c          | 550 +++++++++++++++++++++++++++++++++
>  tools/libxl/libxl_nonetbuffer.c        |  98 ++++++
>  tools/libxl/libxl_remus_device.c       | 323 +++++++++++++++++++
>  tools/libxl/libxl_save_msgs_gen.pl     |   2 +-
>  tools/libxl/libxl_types.idl            |   3 +
>  tools/libxl/xl.c                       |   4 +
>  tools/libxl/xl.h                       |   1 +
>  tools/libxl/xl_cmdimpl.c               |  28 +-
>  tools/libxl/xl_cmdtable.c              |   3 +
>  tools/remus/README                     |   6 +
>  25 files changed, 1694 insertions(+), 82 deletions(-)
>  create mode 100644 tools/hotplug/Linux/remus-netbuf-setup
>  create mode 100644 tools/libxl/libxl_netbuffer.c
>  create mode 100644 tools/libxl/libxl_nonetbuffer.c
>  create mode 100644 tools/libxl/libxl_remus_device.c
>
> --
> 1.9.1
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


* Re: [PATCH v10 1/5] libxl: introduce asynchronous execution API
  2014-06-05  1:34 ` [PATCH v10 1/5] libxl: introduce asynchronous execution API Yang Hongyang
@ 2014-06-05 16:01   ` Ian Jackson
  0 siblings, 0 replies; 44+ messages in thread
From: Ian Jackson @ 2014-06-05 16:01 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

Yang Hongyang writes ("[PATCH v10 1/5] libxl: introduce asynchronous execution API"):
> 1.introduce asynchronous execution API:
>   libxl__async_exec_init
>   libxl__async_exec_start
>   libxl__async_exec_inuse
> 2.use the async exec API to execute device hotplug scripts
> 
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Acked-by: Roger Pau Monné <roger.pau@citrix.com
> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

This patch was accepted into xen-unstable on the 2nd of June.

Ian.


* Re: [PATCH v10 0/5] Remus netbuffer: Network buffering support
  2014-06-05  1:34 [PATCH v10 0/5] Remus netbuffer: Network buffering support Yang Hongyang
                   ` (5 preceding siblings ...)
  2014-06-05 10:47 ` [PATCH v10 0/5] Remus netbuffer: Network buffering support George Dunlap
@ 2014-06-05 16:01 ` Ian Jackson
  2014-06-05 16:12 ` Ian Jackson
  7 siblings, 0 replies; 44+ messages in thread
From: Ian Jackson @ 2014-06-05 16:01 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

Yang Hongyang writes ("[PATCH v10 0/5] Remus netbuffer: Network buffering support"):
> This patch series adds support for network buffering in the Remus
> codebase in libxl.

Thanks.  Sorry for not giving this attention sooner.

> the code is also hosted on github:
> 
> url: https://github.com/laijs/xen
> branch: remus-0605

Thanks, that is also very convenient.

I'll go through the individual patches now.

Ian.


* Re: [PATCH v10 0/5] Remus netbuffer: Network buffering support
  2014-06-05  1:34 [PATCH v10 0/5] Remus netbuffer: Network buffering support Yang Hongyang
                   ` (6 preceding siblings ...)
  2014-06-05 16:01 ` Ian Jackson
@ 2014-06-05 16:12 ` Ian Jackson
  2014-06-06  2:26   ` Hongyang Yang
  7 siblings, 1 reply; 44+ messages in thread
From: Ian Jackson @ 2014-06-05 16:12 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

Yang Hongyang writes ("[PATCH v10 0/5] Remus netbuffer: Network buffering support"):
> the code is also hosted on github:
> 
> url: https://github.com/laijs/xen
> branch: remus-0605

I see that this branch contains a commit "remus drbd: Implement remus
drbd replicated disk" which is not in the 5-patch series you posted.
I guess that that commit was added later - I can see it in my inbox.

If you find the need to add a new patch to a series and update the
already-published git branch, you should send a heads-up to the
recipients of your 0/N email.  When you do this it would be better not
to reuse the existing published git ref name, but to make a new one,
unless you do the followup very quickly after the initial email.

This is useful because the committers and reviewers rely on the two
versions of the series (available via git, and via email) being
identical.
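
As a minimal sketch of publishing under a new ref name instead of reusing the
old one: the remote name "origin", the "-r2" suffix scheme and the helper
new_ref_name are all assumptions for illustration, and the command is only
printed.

```shell
# Illustrative helper: derive a fresh branch name from the published one.
new_ref_name() {
    printf '%s-r%s' "$1" "$2"
}

branch=$(new_ref_name remus-0605 2)

# Pushing HEAD to a new branch leaves the already-published ref untouched.
echo "git push origin HEAD:refs/heads/$branch"
```

The old branch then still matches what was posted by email, and the new one
can be announced in the heads-up mail.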

Also, if you send the new patch as N+1/N, eg, in this case
  Subject: [PATCH v10 6/5] remus drbd: Implement remus drbd replicated disk
then it becomes a bit clearer that it's supposed to tie into the series.
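
One possible way to get such a subject, sketched under the assumption that
the extra patch is the most recent commit and is generated with git
format-patch. The helper late_patch_prefix is hypothetical; the command is
printed, not executed.

```shell
# Illustrative helper: build "PATCH v<version> <index>/<total>".
late_patch_prefix() {
    printf 'PATCH v%s %s/%s' "$1" "$2" "$3"
}

prefix=$(late_patch_prefix 10 6 5)

# -1 formats only the latest commit; --no-numbered stops git from adding its
# own N/M counter, so the hand-written 6/5 in the prefix is used as-is.
echo "git format-patch -1 --no-numbered --subject-prefix=\"$prefix\""
```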

I will review your drbd patch as part of the series.

Thanks,
Ian.


* Re: [PATCH v10 2/5] remus: add libnl3 dependency for network buffering support
  2014-06-05  1:34 ` [PATCH v10 2/5] remus: add libnl3 dependency for network buffering support Yang Hongyang
@ 2014-06-05 16:18   ` Ian Jackson
  2014-06-06  1:48     ` Hongyang Yang
  0 siblings, 1 reply; 44+ messages in thread
From: Ian Jackson @ 2014-06-05 16:18 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

Yang Hongyang writes ("[PATCH v10 2/5] remus: add libnl3 dependency for network buffering support"):
> Libnl3 is required for controlling Remus network buffering.
> This patch adds dependency on libnl3 (>= 3.2.8) to autoconf scripts.
> Also provide ability to configure tools without libnl3 support, that
> is without network buffering support.

This patch looks broadly good to me.  I have some very minor comments
about the details.

> when there's no network buffering support,libxl__netbuffer_enabled()
> returns 0, otherwise returns 1.

The commit message should explicitly state that callers will be
introduced in the rest of the series.

> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>

For a patch which changes configure.ac, it would be helpful to add a
reminder (for the committer) to rerun autogen.sh.  This should ideally
appear just before the first Signed-off-by.  The committer should
delete the note, and rerun autogen.sh, as they apply the patch.

> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
...
> +# Check for libnl3 >=3.2.8. If present enable remus network buffering.
> +PKG_CHECK_MODULES(LIBNL3, [libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8],
> +    [libnl3_lib="y"], [libnl3_lib="n"])
> +
> +AS_IF([test "x$libnl3_lib" = "xn" ], [
> +    AC_MSG_WARN([Disabling support for Remus network buffering.
> +    Please install libnl3 libraries, command line tools and devel
> +    headers - version 3.2.8 or higher])
> +    AC_SUBST(remus_netbuf, [n])
> +    ],[
> +    AC_SUBST(LIBNL3_LIBS)
> +    AC_SUBST(LIBNL3_CFLAGS)

It might be better to put these AC_SUBSTs into the main body of
configure.ac?  Like this:

   diff --git a/tools/configure.ac b/tools/configure.ac
   index 38d2d05..ee36707 100644
   --- a/tools/configure.ac
   +++ b/tools/configure.ac
   @@ -257,10 +257,11 @@ AS_IF([test "x$libnl3_lib" = "xn" ], [
        headers - version 3.2.8 or higher])
        AC_SUBST(remus_netbuf, [n])
        ],[
   -    AC_SUBST(LIBNL3_LIBS)
   -    AC_SUBST(LIBNL3_CFLAGS)
        AC_SUBST(remus_netbuf, [y])
    ])

   +AC_SUBST(LIBNL3_LIBS)
   +AC_SUBST(LIBNL3_CFLAGS)
   +
    AC_OUTPUT()

Thanks,
Ian.


* Re: [PATCH v10] remus drbd: Implement remus drbd replicated disk
  2014-06-05  1:39   ` [PATCH v10] remus drbd: Implement remus drbd replicated disk Yang Hongyang
@ 2014-06-05 16:25     ` Shriram Rajagopalan
  2014-06-05 17:41       ` Ian Jackson
  2014-06-06  2:21       ` Hongyang Yang
  0 siblings, 2 replies; 44+ messages in thread
From: Shriram Rajagopalan @ 2014-06-05 16:25 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: Lai Jiangshan, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, xen-devel, Andrew Cooper, Roger Pau Monné,
	Ian Campbell


[-- Attachment #1.1: Type: text/plain, Size: 10848 bytes --]

On Wed, Jun 4, 2014 at 8:39 PM, Yang Hongyang <yanghy@cn.fujitsu.com> wrote:

> Implement remus-drbd-replicated-checkpointing-disk based on
> generic remus devices framework.
>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  tools/hotplug/Linux/Makefile         |   1 +
>  tools/hotplug/Linux/block-drbd-probe |  84 ++++++++++
>  tools/libxl/Makefile                 |   2 +-
>  tools/libxl/libxl_internal.h         |   1 +
>  tools/libxl/libxl_remus_device.c     |  23 ++-
>  tools/libxl/libxl_remus_disk_drbd.c  | 290
> +++++++++++++++++++++++++++++++++++
>  6 files changed, 394 insertions(+), 7 deletions(-)
>  create mode 100755 tools/hotplug/Linux/block-drbd-probe
>  create mode 100644 tools/libxl/libxl_remus_disk_drbd.c
>
> diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
> index 13e1f5f..5dd8599 100644
> --- a/tools/hotplug/Linux/Makefile
> +++ b/tools/hotplug/Linux/Makefile
> @@ -23,6 +23,7 @@ XEN_SCRIPTS += xen-hotplug-cleanup
>  XEN_SCRIPTS += external-device-migrate
>  XEN_SCRIPTS += vscsi
>  XEN_SCRIPTS += block-iscsi
> +XEN_SCRIPTS += block-drbd-probe
>  XEN_SCRIPTS += $(XEN_SCRIPTS-y)
>
>  XEN_SCRIPT_DATA = xen-script-common.sh locking.sh logging.sh
> diff --git a/tools/hotplug/Linux/block-drbd-probe
> b/tools/hotplug/Linux/block-drbd-probe
> new file mode 100755
> index 0000000..163ad04
> --- /dev/null
> +++ b/tools/hotplug/Linux/block-drbd-probe
> @@ -0,0 +1,84 @@
> +#! /bin/bash
> +#
> +# Copyright (C) 2014 FUJITSU LIMITED
> +#
> +# This library is free software; you can redistribute it and/or
> +# modify it under the terms of version 2.1 of the GNU Lesser General
> Public
> +# License as published by the Free Software Foundation.
> +#
> +# This library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with this library; if not, write to the Free Software
> +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307
>  USA
> +#
> +# Usage:
> +#     block-drbd-probe devicename
> +#
> +# Return value:
> +#     0: the device is drbd device
> +#     1: the device is not drbd device
> +#     2: unkown error
> +#     3: the drbd device does not use protocol D
> +#     4: the drbd device is not ready
> +
> +drbd_res=
> +
> +function get_res_name()
> +{
> +    local drbd_dev=$1
> +    local drbd_dev_list=($(drbdadm sh-dev all))
> +    local drbd_res_list=($(drbdadm sh-resource all))
> +    local temp_drbd_dev temp_drbd_res
> +    local found=0
> +
> +    for temp_drbd_dev in ${drbd_dev_list[@]}; do
> +        if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
> +            found=1
> +            break
> +        fi
> +    done
> +
> +    if [[ $found -eq 0 ]]; then
> +        return 1
> +    fi
> +
> +    for temp_drbd_res in ${drbd_res_list[@]}; do
> +        temp_drbd_dev=$(drbdadm sh-dev $temp_drbd_res)
> +        if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
> +            drbd_res="$temp_drbd_res"
> +            return 0
> +        fi
> +    done
> +
> +    # OOPS
> +    return 2
> +}
> +
> +get_res_name $1
> +if [[ $? -ne 0 ]]; then
> +    exit $?
> +fi
> +
> +# check protocol
> +drbdsetup $1 show | grep -q "protocol D;"
> +if [[ $? -ne 0 ]]; then
> +    exit 3
> +fi
> +
> +# check connect status
> +state=$(drbdadm cstate "$drbd_res")
> +if [[ "$state" != "Connected" ]]; then
> +    exit 4
> +fi
> +
> +# check role
> +role=$(drbdadm role "$drbd_res")
> +if [[ "$role" != "Primary/Secondary" ]]; then
> +    exit 4
> +fi
> +
> +exit 0
> diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
> index 7a722a8..6f4d9b4 100644
> --- a/tools/libxl/Makefile
> +++ b/tools/libxl/Makefile
> @@ -56,7 +56,7 @@ else
>  LIBXL_OBJS-y += libxl_nonetbuffer.o
>  endif
>
> -LIBXL_OBJS-y += libxl_remus_device.o
> +LIBXL_OBJS-y += libxl_remus_device.o libxl_remus_disk_drbd.o
>
>  LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
>  LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index f221f97..47a4ab9 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -2519,6 +2519,7 @@ struct libxl__remus_device_state {
>
>      libxl_device_nic *nics;
>      int num_nics;
> +    libxl_device_disk *disks;
>      int num_disks;
>
>      /* for counting devices that have been handled */
> diff --git a/tools/libxl/libxl_remus_device.c
> b/tools/libxl/libxl_remus_device.c
> index 5f07266..040441a 100644
> --- a/tools/libxl/libxl_remus_device.c
> +++ b/tools/libxl/libxl_remus_device.c
> @@ -19,8 +19,10 @@
>  #include "libxl_internal.h"
>
>  extern libxl__remus_device_ops remus_device_nic;
> +extern libxl__remus_device_ops remus_device_drbd_disk;
>  static libxl__remus_device_ops *dev_ops[] = {
>      &remus_device_nic,
> +    &remus_device_drbd_disk,
>  };
>
>  static void device_common_cb(libxl__egc *egc,
> @@ -194,6 +196,13 @@ static void device_teardown_cb(libxl__egc *egc,
>          rds->nics = NULL;
>          rds->num_nics = 0;
>
> +        /* clean disk */
> +        for (i = 0; i < rds->num_disks; i++)
> +            libxl_device_disk_dispose(&rds->disks[i]);
> +        free(rds->disks);
> +        rds->disks = NULL;
> +        rds->num_disks = 0;
> +
>          /* clean device ops */
>          for (i = 0; i < ARRAY_SIZE(dev_ops); i++) {
>              ops = dev_ops[i];
> @@ -269,15 +278,15 @@ void libxl__remus_device_setup(libxl__egc *egc,
> libxl__remus_state *rs)
>      rds->num_nics = 0;
>      rds->num_disks = 0;
>
> -    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
> -
>      if (rs->netbufscript) {
>          rds->nics = libxl_device_nic_list(CTX, rs->domid, &rds->num_nics);
>      }
> +    rds->disks = libxl_device_disk_list(CTX, rs->domid, &rds->num_disks);
>
> -    GCNEW_ARRAY(rds->dev, rds->num_nics + rds->num_disks);
> +    if (rds->num_nics == 0 && rds->num_disks == 0)
> +        goto out;
>
> -    /* TBD: CALL libxl__remus_device_init to init remus devices */
> +    GCNEW_ARRAY(rds->dev, rds->num_nics + rds->num_disks);
>
>      if (rs->netbufscript && rds->nics) {
>          for (i = 0; i < rds->num_nics; i++) {
> @@ -286,8 +295,10 @@ void libxl__remus_device_setup(libxl__egc *egc,
> libxl__remus_state *rs)
>          }
>      }
>
> -    if (rds->num_nics == 0 && rds->num_disks == 0)
> -        goto out;
> +    for (i = 0; i < rds->num_disks; i++) {
> +        libxl__remus_device_init(egc, rds,
> +                                 LIBXL__REMUS_DEVICE_DISK,
> &rds->disks[i]);
> +    }
>
>      return;
>
> diff --git a/tools/libxl/libxl_remus_disk_drbd.c
> b/tools/libxl/libxl_remus_disk_drbd.c
> new file mode 100644
> index 0000000..f35a406
> --- /dev/null
> +++ b/tools/libxl/libxl_remus_disk_drbd.c
> @@ -0,0 +1,290 @@
> +/*
> + * Copyright (C) 2014 FUJITSU LIMITED
> + * Author Lai Jiangshan <laijs@cn.fujitsu.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU Lesser General Public License as
> published
> + * by the Free Software Foundation; version 2.1 only. with the special
> + * exception on linking described in file LICENSE.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU Lesser General Public License for more details.
> + */
> +
> +#include "libxl_osdeps.h" /* must come before any other headers */
> +
> +#include "libxl_internal.h"
> +
> +/*** drbd implementation ***/
> +const int DRBD_SEND_CHECKPOINT = 20;
> +const int DRBD_WAIT_CHECKPOINT_ACK = 30;
> +
> +typedef struct libxl__remus_drbd_disk {
> +    libxl__remus_device remus_dev;
> +    int ctl_fd;
> +    int ackwait;
> +    const char *path;
> +} libxl__remus_drbd_disk;
> +
> +typedef struct libxl__remus_drbd_state {
> +    libxl__ao *ao;
> +    char *drbd_probe_script;
> +} libxl__remus_drbd_state;
> +
> +static void drbd_async_call(libxl__remus_device *dev,
> +                            void func(libxl__remus_device *),
> +                            libxl__ev_child_callback callback)
> +{
> +    int pid = -1;
> +    STATE_AO_GC(dev->rds->ao);
> +
> +    /* Fork and call */
> +    pid = libxl__ev_child_fork(gc, &dev->child, callback);
> +    if (pid == -1) {
> +        LOG(ERROR, "unable to fork");
> +        goto out;
> +    }
> +
> +    if (!pid) {
> +        /* child */
> +        func(dev);
> +        /* notreached */
> +        abort();
> +    }
> +
> +    return;
> +
> +out:
> +    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
> +}
> +
> +static void chekpoint_async_call_done(libxl__egc *egc,
> +                                      libxl__ev_child *child,
> +                                      pid_t pid, int status)
> +{
> +    libxl__remus_device *dev = CONTAINER_OF(child, *dev, child);
> +    libxl__remus_drbd_disk *rdd = dev->data;
> +    STATE_AO_GC(dev->rds->ao);
> +
> +    if (WIFEXITED(status)) {
> +        rdd->ackwait = WEXITSTATUS(status);
> +        dev->callback(egc, dev, 0);
> +    } else {
> +        dev->callback(egc, dev, ERROR_FAIL);
> +    }
> +}
> +
> +static void drbd_postsuspend_async(libxl__remus_device *dev)
> +{
> +    libxl__remus_drbd_disk *rdd = dev->data;
> +    int ackwait = rdd->ackwait;
> +
> +    if (!ackwait) {
> +        if (ioctl(rdd->ctl_fd, DRBD_SEND_CHECKPOINT, 0) <= 0)
> +            ackwait = 1;
> +    }
> +
> +    _exit(ackwait);
> +}
> +
> +static void drbd_postsuspend(libxl__remus_device *dev)
> +{
> +    drbd_async_call(dev, drbd_postsuspend_async, chekpoint_async_call_done);
> +}
> +
> +static void drbd_preresume_async(libxl__remus_device *dev)
> +{
> +    libxl__remus_drbd_disk *rdd = dev->data;
> +    int ackwait = rdd->ackwait;
> +
> +    if (ackwait) {
> +        ioctl(rdd->ctl_fd, DRBD_WAIT_CHECKPOINT_ACK, 0);
> +        ackwait = 0;
> +    }
> +
> +    _exit(ackwait);
> +}
> +
> +static void drbd_preresume(libxl__remus_device *dev)
> +{
> +    drbd_async_call(dev, drbd_preresume_async, chekpoint_async_call_done);
> +}
> +
>


Please drop the async execution here: it exists only to issue a single
system call. A fork & exec per system call, per checkpoint, adds more
overhead than the async execution could ever save.
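
To make the suggestion concrete, here is a minimal sketch of what the
two checkpoint hooks could look like when run in-process. It is not
taken from the patch: drbd_disk_sketch, ctl_fn and fake_ctl are
hypothetical names, and the ctl function pointer stands in for
ioctl(fd, request, 0) so the sketch runs without a DRBD device. The
request codes and the ackwait logic mirror the quoted
drbd_postsuspend_async() and drbd_preresume_async().

```c
/* Request codes copied from the quoted patch. */
enum { DRBD_SEND_CHECKPOINT = 20, DRBD_WAIT_CHECKPOINT_ACK = 30 };

/* Hypothetical stand-in for ioctl(fd, request, 0). */
typedef int (*ctl_fn)(int fd, unsigned long request);

/* Hypothetical, trimmed-down version of libxl__remus_drbd_disk. */
typedef struct {
    int ctl_fd;
    int ackwait;
    ctl_fn ctl;
} drbd_disk_sketch;

/* Synchronous postsuspend: issue the checkpoint request in-process. */
static int drbd_postsuspend_sync(drbd_disk_sketch *d)
{
    if (!d->ackwait) {
        /* Same condition as the quoted drbd_postsuspend_async(). */
        if (d->ctl(d->ctl_fd, DRBD_SEND_CHECKPOINT) <= 0)
            d->ackwait = 1;
    }
    return 0; /* a real hook would hand this result to dev->callback() */
}

/* Synchronous preresume: wait for the ack only if one is pending. */
static int drbd_preresume_sync(drbd_disk_sketch *d)
{
    if (d->ackwait) {
        d->ctl(d->ctl_fd, DRBD_WAIT_CHECKPOINT_ACK);
        d->ackwait = 0;
    }
    return 0;
}

/* Fake control call, used only to exercise the sketch. */
static unsigned long last_request;

static int fake_ctl(int fd, unsigned long request)
{
    (void)fd;
    last_request = request;
    return 0;
}
```

With this shape the ackwait state is updated directly in the parent, so
it no longer has to travel back through the child's exit status.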

Setup and teardown, however, can keep using the async execution
(drbd_async_call), as they involve invoking scripts.

Apart from that, the rest of the code looks fine structurally.


shriram

[-- Attachment #1.2: Type: text/html, Size: 13331 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10 4/5] remus: implement remus network buffering for nic devices
  2014-06-05  1:34 ` [PATCH v10 4/5] remus: implement remus network buffering for nic devices Yang Hongyang
@ 2014-06-05 16:50   ` Shriram Rajagopalan
  2014-06-05 17:37     ` Ian Jackson
  2014-06-06  1:59     ` Hongyang Yang
  2014-06-05 17:24   ` Ian Jackson
  1 sibling, 2 replies; 44+ messages in thread
From: Shriram Rajagopalan @ 2014-06-05 16:50 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: Ian Campbell, Wen Congyang, Stefano Stabellini, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen-devel, Dong Eddie,
	Roger Pau Monné,
	Lai Jiangshan


[-- Attachment #1.1: Type: text/plain, Size: 28311 bytes --]

On Wed, Jun 4, 2014 at 8:34 PM, Yang Hongyang <yanghy@cn.fujitsu.com> wrote:

> 1.Add two members in libxl_domain_remus_info:
>     netbuf: whether netbuf is enabled
>     netbufscript: the path of the script which will be run to setup
>        and tear down the guest's interface.
> 2.introduces remus-netbuf-setup hotplug script responsible for
>   setting up and tearing down the necessary infrastructure required for
>   network output buffering in Remus.  This script is intended to be invoked
>   by libxl for each guest interface, when starting or stopping Remus.
>
>   Apart from returning success/failure indication via the usual hotplug
>   entries in xenstore, this script also writes to xenstore, the name of
>   the IFB device to be used to control the vif's network output.
>
>   The script relies on libnl3 command line utilities to perform various
>   setup/teardown functions. The script is confined to Linux platforms only
>   since NetBSD does not seem to have libnl3.
>
>   The following steps are taken during init:
>     a) establish a dedicated remus context containing libnl related
>        state (netlink sockets, qdisc caches, etc.,)
>
>   The following steps are taken for each vif during setup:
>     a) call the hotplug script to setup its network buffer
>
>     b) Obtain handles to plug qdiscs installed on the IFB devices
>        chosen by the hotplug scripts.
>
>   And during teardown, the netlink resources are released, followed by
>   invocation of hotplug scripts to remove the ifb devices.
> 3.implement the remus device interface. setup, teardown, etc.
>
> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
>  docs/misc/xenstore-paths.markdown      |   4 +
>  tools/hotplug/Linux/Makefile           |   1 +
>  tools/hotplug/Linux/remus-netbuf-setup | 183 ++++++++++++
>  tools/libxl/libxl.c                    |  18 ++
>  tools/libxl/libxl.h                    |  13 +
>  tools/libxl/libxl_internal.h           |   3 +
>  tools/libxl/libxl_netbuffer.c          | 519 +++++++++++++++++++++++++++++++++
>  tools/libxl/libxl_nonetbuffer.c        |  67 +++++
>  tools/libxl/libxl_remus_device.c       |  22 +-
>  tools/libxl/libxl_types.idl            |   2 +
>  10 files changed, 831 insertions(+), 1 deletion(-)
>  create mode 100644 tools/hotplug/Linux/remus-netbuf-setup
>
> diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown
> index 70ab7f4..039eaea 100644
> --- a/docs/misc/xenstore-paths.markdown
> +++ b/docs/misc/xenstore-paths.markdown
> @@ -385,6 +385,10 @@ The guest's virtual time offset from UTC in seconds.
>
>  The device model version for a domain.
>
> +#### /libxl/$DOMID/remus/netbuf/$DEVID/ifb = STRING [n,INTERNAL]
> +
> +ifb device used by Remus to buffer network output from the associated vif.
> +
>  [BLKIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,blkif.h.html
>  [FBIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,fbif.h.html
>  [HVMPARAMS]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,hvm,params.h.html
> diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
> index 4874ec5..13e1f5f 100644
> --- a/tools/hotplug/Linux/Makefile
> +++ b/tools/hotplug/Linux/Makefile
> @@ -15,6 +15,7 @@ XEN_SCRIPTS += vif-nat
>  XEN_SCRIPTS += vif-openvswitch
>  XEN_SCRIPTS += vif2
>  XEN_SCRIPTS += vif-setup
> +XEN_SCRIPTS-$(CONFIG_REMUS_NETBUF) += remus-netbuf-setup
>  XEN_SCRIPTS += block
>  XEN_SCRIPTS += block-enbd block-nbd
>  XEN_SCRIPTS-$(CONFIG_BLKTAP1) += blktap
> diff --git a/tools/hotplug/Linux/remus-netbuf-setup b/tools/hotplug/Linux/remus-netbuf-setup
> new file mode 100644
> index 0000000..aed2583
> --- /dev/null
> +++ b/tools/hotplug/Linux/remus-netbuf-setup
> @@ -0,0 +1,183 @@
> +#!/bin/bash
>
> +#============================================================================
> +# ${XEN_SCRIPT_DIR}/remus-netbuf-setup
> +#
> +# Script for attaching a network buffer to the specified vif (in any mode).
> +# The hotplugging system will call this script when starting remus via libxl
> +# API, libxl_domain_remus_start.
> +#
> +# Usage:
> +# remus-netbuf-setup (setup|teardown)
> +#
> +# Environment vars:
> +# vifname     vif interface name (required).
> +# XENBUS_PATH path in Xenstore, where the IFB device details will be stored
> +#                      or read from (required).
> +#             (libxl passes /libxl/<domid>/remus/netbuf/<devid>)
> +# IFB         ifb interface to be cleaned up (required). [for teardown op only]
> +
> +# Written to the store: (setup operation)
> +# XENBUS_PATH/ifb=<ifbdevName> the IFB device serving
> +#  as the intermediate buffer through which the interface's network output
> +#  can be controlled.
> +#
> +# To install a network buffer on a guest vif (vif1.0) using ifb (ifb0)
> +# we need to do the following
> +#
> +#  ip link set dev ifb0 up
> +#  tc qdisc add dev vif1.0 ingress
> +#  tc filter add dev vif1.0 parent ffff: proto ip \
> +#    prio 10 u32 match u32 0 0 action mirred egress redirect dev ifb0
> +#  nl-qdisc-add --dev=ifb0 --parent root plug
> +#  nl-qdisc-add --dev=ifb0 --parent root --update plug --limit=10000000
> +#                                                (10MB limit on buffer)
> +#
> +# So order of operations when installing a network buffer on vif1.0
> +# 1. find a free ifb and bring up the device
> +# 2. redirect traffic from vif1.0 to ifb:
> +#   2.1 add ingress qdisc to vif1.0 (to capture outgoing packets from guest)
> +#   2.2 use tc filter command with actions mirred egress + redirect
> +# 3. install plug_qdisc on ifb device, with which we can buffer/release
> +#    guest's network output from vif1.0
> +#
> +#
> +
>
> +#============================================================================
> +
> +# Unlike other vif scripts, vif-common is not needed here as it executes vif
> +#specific setup code such as renaming.
> +dir=$(dirname "$0")
> +. "$dir/xen-hotplug-common.sh"
> +
> +findCommand "$@"
> +
> +if [ "$command" != "setup" -a  "$command" != "teardown" ]
> +then
> +  echo "Invalid command: $command"
> +  log err "Invalid command: $command"
> +  exit 1
> +fi
> +
> +evalVariables "$@"
> +
> +: ${vifname:?}
> +: ${XENBUS_PATH:?}
> +
> +check_libnl_tools() {
> +    if ! command -v nl-qdisc-list > /dev/null 2>&1; then
> +        fatal "Unable to find nl-qdisc-list tool"
> +    fi
> +    if ! command -v nl-qdisc-add > /dev/null 2>&1; then
> +        fatal "Unable to find nl-qdisc-add tool"
> +    fi
> +    if ! command -v nl-qdisc-delete > /dev/null 2>&1; then
> +        fatal "Unable to find nl-qdisc-delete tool"
> +    fi
> +}
> +
> +# We only check for modules. We don't load them.
> +# User/Admin is supposed to load ifb during boot time,
> +# ensuring that there are enough free ifbs in the system.
> +# Other modules will be loaded automatically by tc commands.
> +check_modules() {
> +    for m in ifb sch_plug sch_ingress act_mirred cls_u32
> +    do
> +        if ! modinfo $m > /dev/null 2>&1; then
> +            fatal "Unable to find $m kernel module"
> +        fi
> +    done
> +}
> +
> +setup_ifb() {
> +
> +    for ifb in `ifconfig -a -s|egrep ^ifb|cut -d ' ' -f1`
> +    do
> +        local installed=`nl-qdisc-list -d $ifb`
> +        [ -n "$installed" ] && continue
> +        IFB="$ifb"
> +        break
> +    done
> +
> +    if [ -z "$IFB" ]
> +    then
> +        fatal "Unable to find a free IFB device for $vifname"
> +    fi
> +
> +    do_or_die ip link set dev "$IFB" up
> +}
> +
> +redirect_vif_traffic() {
> +    local vif=$1
> +    local ifb=$2
> +
> +    do_or_die tc qdisc add dev "$vif" ingress
> +
> +    tc filter add dev "$vif" parent ffff: proto ip prio 10 \
> +        u32 match u32 0 0 action mirred egress redirect dev "$ifb" >/dev/null 2>&1
> +
> +    if [ $? -ne 0 ]
> +    then
> +        do_without_error tc qdisc del dev "$vif" ingress
> +        fatal "Failed to redirect traffic from $vif to $ifb"
> +    fi
> +}
> +
> +add_plug_qdisc() {
> +    local vif=$1
> +    local ifb=$2
> +
> +    nl-qdisc-add --dev="$ifb" --parent root plug >/dev/null 2>&1
> +    if [ $? -ne 0 ]
> +    then
> +        do_without_error tc qdisc del dev "$vif" ingress
> +        fatal "Failed to add plug qdisc to $ifb"
> +    fi
> +
> +    #set ifb buffering limit in bytes. Its okay if this command fails
> +    nl-qdisc-add --dev="$ifb" --parent root \
> +        --update plug --limit=10000000 >/dev/null 2>&1 || true
> +}
> +
> +teardown_netbuf() {
> +    local vif=$1
> +    local ifb=$2
> +
> +    if [ "$ifb" ]; then
> +        do_without_error ip link set dev "$ifb" down
> +        do_without_error nl-qdisc-delete --dev="$ifb" --parent root plug >/dev/null 2>&1
> +        xenstore-rm -t "$XENBUS_PATH/ifb" 2>/dev/null || true
> +    fi
> +    do_without_error tc qdisc del dev "$vif" ingress
> +    xenstore-rm -t "$XENBUS_PATH/hotplug-status" 2>/dev/null || true
> +    xenstore-rm -t "$XENBUS_PATH/hotplug-error" 2>/dev/null || true
> +}
> +
> +xs_write_failed() {
> +    local vif=$1
> +    local ifb=$2
> +    teardown_netbuf "$vifname" "$IFB"
> +    fatal "failed to write ifb name to xenstore"
> +}
> +
> +case "$command" in
> +    setup)
> +        check_libnl_tools
> +        check_modules
> +
> +        claim_lock "pickifb"
> +        setup_ifb
> +        redirect_vif_traffic "$vifname" "$IFB"
> +        add_plug_qdisc "$vifname" "$IFB"
> +        release_lock "pickifb"
> +
> +        #not using xenstore_write that automatically exits on error
> +        #because we need to cleanup
> +        _xenstore_write "$XENBUS_PATH/ifb" "$IFB" || xs_write_failed "$vifname" "$IFB"
> +        success
> +        ;;
> +    teardown)
> +        teardown_netbuf "$vifname" "$IFB"
> +        ;;
> +esac
> +
> +log debug "Successful remus-netbuf-setup $command for $vifname, ifb $IFB."
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 0cdf348..2701ebe 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -764,6 +764,24 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
>
>      /* Convenience aliases */
>      libxl__remus_state *const rs = &dss->rs;
> +
> +    /* Setup network buffering */
> +    if (info->netbuf) {
> +        if (!libxl__netbuffer_enabled(gc)) {
> +            LOG(ERROR, "Remus: No support for network buffering");
> +            goto out;
> +        }
> +
> +        if (info->netbufscript) {
> +            rs->netbufscript =
> +                libxl__strdup(gc, info->netbufscript);
> +        } else {
> +            rs->netbufscript =
> +                GCSPRINTF("%s/remus-netbuf-setup",
> +                libxl__xen_script_dir_path());
> +        }
> +    }
> +
>      rs->ao = ao;
>      rs->domid = domid;
>      rs->saved_rc = 0;
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 80947c3..db30a97 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -437,6 +437,19 @@
>  #define LIBXL_HAVE_DRIVER_DOMAIN_CREATION 1
>
>  /*
> + * LIBXL_HAVE_REMUS_NETBUF 1
> + *
> + * If this is defined, then the libxl_domain_remus_info structure will
> + * have a boolean field (netbuf) and a string field (netbufscript).
> + *
> + * netbuf, if true, indicates that network buffering should be enabled.
> + *
> + * netbufscript, if set, indicates the path to the hotplug script to
> + * setup or teardown network buffers.
> + */
> +#define LIBXL_HAVE_REMUS_NETBUF 1
> +
> +/*
>   * LIBXL_HAVE_SIGCHLD_SELECTIVE_REAP
>   *
>   * If this is defined:
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 20601b2..f221f97 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -2517,6 +2517,7 @@ struct libxl__remus_device_state {
>      /* devices that have been setuped */
>      libxl__remus_device **dev;
>
> +    libxl_device_nic *nics;
>      int num_nics;
>      int num_disks;
>
> @@ -2555,6 +2556,8 @@ struct libxl__remus_state {
>      libxl__ao *ao;
>      uint32_t domid;
>      libxl__remus_callback *callback;
> +    /* Script to setup/teardown network buffers */
> +    const char *netbufscript;
>
>      /* private */
>      int saved_rc;
> diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
> index 8e23d75..8729a3f 100644
> --- a/tools/libxl/libxl_netbuffer.c
> +++ b/tools/libxl/libxl_netbuffer.c
> @@ -17,11 +17,530 @@
>
>  #include "libxl_internal.h"
>
> +#include <netlink/cache.h>
> +#include <netlink/socket.h>
> +#include <netlink/attr.h>
> +#include <netlink/route/link.h>
> +#include <netlink/route/route.h>
> +#include <netlink/route/qdisc.h>
> +#include <netlink/route/qdisc/plug.h>
> +
> +typedef struct libxl__remus_netbuf_state {
> +    libxl__ao *ao;
> +    uint32_t domid;
> +    const char *netbufscript;
> +
> +    struct nl_sock *nlsock;
> +    struct nl_cache *qdisc_cache;
> +} libxl__remus_netbuf_state;
> +
> +typedef struct libxl__remus_device_nic {
> +    const char *vif;
> +    const char *ifb;
> +    struct rtnl_qdisc *qdisc;
> +} libxl__remus_device_nic;
> +
>  int libxl__netbuffer_enabled(libxl__gc *gc)
>  {
>      return 1;
>  }
>
> +/* If the device has a vifname, then use that instead of
> + * the vifX.Y format.
> + */
> +static const char *get_vifname(libxl__remus_device *dev,
> +                               const libxl_device_nic *nic)
> +{
> +    libxl__remus_netbuf_state *netbuf_state = dev->ops->data;
> +    const char *vifname = NULL;
> +    const char *path;
> +    int rc;
> +
> +    STATE_AO_GC(netbuf_state->ao);
> +
> +    /* Convenience aliases */
> +    const uint32_t domid = netbuf_state->domid;
> +
> +    path = libxl__sprintf(gc, "%s/backend/vif/%d/%d/vifname",
> +                          libxl__xs_get_dompath(gc, 0), domid, nic->devid);
> +    rc = libxl__xs_read_checked(gc, XBT_NULL, path, &vifname);
> +    if (!rc && !vifname) {
> +        /* use the default name */
> +        vifname = libxl__device_nic_devname(gc, domid,
> +                                            nic->devid,
> +                                            nic->nictype);
> +    }
> +
> +    return vifname;
> +}
> +
> +static void free_qdisc(libxl__remus_device_nic *remus_nic)
> +{
> +    /* free qdiscs */
> +    if (remus_nic->qdisc == NULL)
> +        return;
> +
> +    nl_object_put((struct nl_object *)(remus_nic->qdisc));
> +    remus_nic->qdisc = NULL;
> +}
> +
> +static int init_qdisc(libxl__remus_netbuf_state *netbuf_state,
> +                      libxl__remus_device_nic *remus_nic)
> +{
> +    int ret, ifindex;
> +    struct rtnl_link *ifb = NULL;
> +    struct rtnl_qdisc *qdisc = NULL;
> +
> +    STATE_AO_GC(netbuf_state->ao);
> +
> +    /* Now that we have brought up IFB device with plug qdisc for
> +     * this vif, so we need to refill the qdisc cache.
> +     */
> +    ret = nl_cache_refill(netbuf_state->nlsock, netbuf_state->qdisc_cache);
> +    if (ret < 0) {
> +        LOG(ERROR, "cannot refill qdisc cache");
> +        goto out;
> +    }
> +
> +    /* get a handle to the IFB interface */
> +    ifb = NULL;
> +    ret = rtnl_link_get_kernel(netbuf_state->nlsock, 0,
> +                               remus_nic->ifb, &ifb);
> +    if (ret) {
> +        LOG(ERROR, "cannot obtain handle for %s: %s", remus_nic->ifb,
> +            nl_geterror(ret));
> +        ret = ERROR_FAIL;
> +        goto out;
> +    }
> +
> +    ret = ERROR_FAIL;
> +    ifindex = rtnl_link_get_ifindex(ifb);
> +    if (!ifindex) {
> +        LOG(ERROR, "interface %s has no index", remus_nic->ifb);
> +        goto out;
> +    }
> +
> +    /* Get a reference to the root qdisc installed on the IFB, by
> +     * querying the qdisc list we obtained earlier. The netbufscript
> +     * sets up the plug qdisc as the root qdisc, so we don't have to
> +     * search the entire qdisc tree on the IFB dev.
> +
> +     * There is no need to explicitly free this qdisc as its just a
> +     * reference from the qdisc cache we allocated earlier.
> +     */
> +    qdisc = rtnl_qdisc_get_by_parent(netbuf_state->qdisc_cache, ifindex,
> +                                     TC_H_ROOT);
> +
> +    if (qdisc) {
> +        const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
> +        /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
> +        if (!tc_kind || strcmp(tc_kind, "plug")) {
> +            nl_object_put((struct nl_object *)qdisc);
> +            LOG(ERROR, "plug qdisc is not installed on %s", remus_nic->ifb);
> +            goto out;
> +        }
> +        remus_nic->qdisc = qdisc;
> +        ret = 0;
> +    } else {
> +        LOG(ERROR, "Cannot get qdisc handle from ifb %s", remus_nic->ifb);
> +    }
> +
> +out:
> +    if (ifb)
> +        rtnl_link_put(ifb);
> +
> +    return ret;
> +}
> +
> +/*
> + * In return, the script writes the name of IFB device (during setup) to be
> + * used for output buffering into XENBUS_PATH/ifb
> + */
> +static void netbuf_setup_script_cb(libxl__egc *egc,
> +                                   libxl__async_exec_state *aes,
> +                                   int status)
> +{
> +    libxl__remus_device *dev = CONTAINER_OF(aes, *dev, aes);
> +    libxl__remus_device_nic *remus_nic = dev->data;
> +    libxl__remus_netbuf_state *netbuf_state = dev->ops->data;
> +    const char *out_path_base, *hotplug_error = NULL;
> +    int rc;
> +
> +    /* Convenience aliases */
> +    const uint32_t domid = netbuf_state->domid;
> +    const int devid = dev->devid;
> +    const char *const vif = remus_nic->vif;
> +    const char **const ifb = &remus_nic->ifb;
> +
> +    STATE_AO_GC(netbuf_state->ao);
> +
> +    if (status) {
> +        rc = ERROR_FAIL;
> +        goto out;
> +    }
> +
> +    out_path_base = GCSPRINTF("%s/remus/netbuf/%d",
> +                              libxl__xs_libxl_path(gc, domid), devid);
> +
> +    rc = libxl__xs_read_checked(gc, XBT_NULL,
> +                                GCSPRINTF("%s/hotplug-error", out_path_base),
> +                                &hotplug_error);
> +    if (rc) {
> +        rc = ERROR_FAIL;
> +        goto out;
> +    }
> +
> +    if (hotplug_error) {
> +        LOG(ERROR, "netbuf script %s setup failed for vif %s: %s",
> +            netbuf_state->netbufscript, vif, hotplug_error);
> +        rc = ERROR_FAIL;
> +        goto out;
> +    }
> +
> +    rc = libxl__xs_read_checked(gc, XBT_NULL,
> +                                GCSPRINTF("%s/remus/netbuf/%d/ifb",
> +                                          libxl__xs_libxl_path(gc, domid),
> +                                          devid),
> +                                ifb);
> +    if (rc) {
> +        rc = ERROR_FAIL;
> +        goto out;
> +    }
> +
> +    if (!(*ifb)) {
> +        LOG(ERROR, "Cannot get ifb dev name for domain %u dev %s",
> +            domid, vif);
> +        rc = ERROR_FAIL;
> +        goto out;
> +    }
> +
> +    LOG(DEBUG, "%s will buffer packets from vif %s", *ifb, vif);
> +    rc = init_qdisc(netbuf_state, remus_nic);
> +
> +out:
> +    dev->callback(egc, dev, rc);
> +}
> +
> +static void netbuf_teardown_script_cb(libxl__egc *egc,
> +                                      libxl__async_exec_state *aes,
> +                                      int status)
> +{
> +    int rc;
> +    libxl__remus_device *dev = CONTAINER_OF(aes, *dev, aes);
> +    libxl__remus_device_nic *remus_nic = dev->data;
> +
> +    if (status)
> +        rc = ERROR_FAIL;
> +    else
> +        rc = 0;
> +
> +    free_qdisc(remus_nic);
> +
> +    dev->callback(egc, dev, rc);
> +}
> +
> +/* the script needs the following env & args
> + * $vifname
> + * $XENBUS_PATH (/libxl/<domid>/remus/netbuf/<devid>/)
> + * $IFB (for teardown)
> + * setup/teardown as command line arg.
> + */
> +static void setup_async_exec(libxl__async_exec_state *aes,
> +                             char *op, libxl__remus_device *dev)
> +{
> +    int arraysize, nr = 0;
> +    char **env = NULL, **args = NULL;
> +    libxl__remus_device_nic *remus_nic = dev->data;
> +    libxl__remus_netbuf_state *ns = dev->ops->data;
> +    STATE_AO_GC(ns->ao);
> +
> +    /* Convenience aliases */
> +    char *const script = libxl__strdup(gc, ns->netbufscript);
> +    const uint32_t domid = ns->domid;
> +    const int dev_id = dev->devid;
> +    const char *const vif = remus_nic->vif;
> +    const char *const ifb = remus_nic->ifb;
> +
> +    arraysize = 7;
> +    GCNEW_ARRAY(env, arraysize);
> +    env[nr++] = "vifname";
> +    env[nr++] = libxl__strdup(gc, vif);
> +    env[nr++] = "XENBUS_PATH";
> +    env[nr++] = GCSPRINTF("%s/remus/netbuf/%d",
> +                          libxl__xs_libxl_path(gc, domid), dev_id);
> +    if (!strcmp(op, "teardown") && ifb) {
> +        env[nr++] = "IFB";
> +        env[nr++] = libxl__strdup(gc, ifb);
> +    }
> +    env[nr++] = NULL;
> +    assert(nr <= arraysize);
> +
> +    arraysize = 3; nr = 0;
> +    GCNEW_ARRAY(args, arraysize);
> +    args[nr++] = script;
> +    args[nr++] = op;
> +    args[nr++] = NULL;
> +    assert(nr == arraysize);
> +
> +    aes->ao = ns->ao;
> +    aes->what = GCSPRINTF("%s %s", args[0], args[1]);
> +    aes->env = env;
> +    aes->args = args;
> +    aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
> +    aes->stdfds[0] = -1;
> +    aes->stdfds[1] = -1;
> +    aes->stdfds[2] = -1;
> +
> +    if (!strcmp(op, "teardown"))
> +        aes->callback = netbuf_teardown_script_cb;
> +    else
> +        aes->callback = netbuf_setup_script_cb;
> +}
> +
> +static int nic_init(libxl__remus_device_ops *self,
> +                    libxl__remus_state *rs)
> +{
> +    int rc;
> +    libxl__remus_netbuf_state *ns;
> +
> +    STATE_AO_GC(rs->ao);
> +
> +    GCNEW(ns);
> +    self->data = ns;
> +
> +    ns->nlsock = nl_socket_alloc();
> +    if (!ns->nlsock) {
> +        LOG(ERROR, "cannot allocate nl socket");
> +        goto out;
> +    }
> +
> +    rc = nl_connect(ns->nlsock, NETLINK_ROUTE);
> +    if (rc) {
> +        LOG(ERROR, "failed to open netlink socket: %s",
> +            nl_geterror(rc));
> +        goto out;
> +    }
> +
> +    /* get list of all qdiscs installed on network devs. */
> +    rc = rtnl_qdisc_alloc_cache(ns->nlsock, &ns->qdisc_cache);
> +    if (rc) {
> +        LOG(ERROR, "failed to allocate qdisc cache: %s",
> +            nl_geterror(rc));
> +        goto out;
> +    }
> +
> +    ns->ao = rs->ao;
> +    ns->domid = rs->domid;
> +    ns->netbufscript = rs->netbufscript;
> +
> +    return 0;
> +
> +out:
> +    return ERROR_FAIL;
> +}
> +
> +static void nic_destroy(libxl__remus_device_ops *self)
> +{
> +    libxl__remus_netbuf_state *ns = self->data;
> +
> +    if (!self->data)
> +        return;
> +
> +    /* free qdisc cache */
> +    if (ns->qdisc_cache) {
> +        nl_cache_clear(ns->qdisc_cache);
> +        nl_cache_free(ns->qdisc_cache);
> +        ns->qdisc_cache = NULL;
> +    }
> +
> +    /* close & free nlsock */
> +    if (ns->nlsock) {
> +        nl_close(ns->nlsock);
> +        nl_socket_free(ns->nlsock);
> +        ns->nlsock = NULL;
> +    }
> +}
> +
> +static void async_call_done(libxl__egc *egc,
> +                            libxl__ev_child *child,
> +                            pid_t pid, int status)
> +{
> +    libxl__remus_device *dev = CONTAINER_OF(child, *dev, child);
> +    libxl__remus_device_state *rds = dev->rds;
> +    STATE_AO_GC(rds->ao);
> +
> +    if (WIFEXITED(status)) {
> +        dev->callback(egc, dev, -WEXITSTATUS(status));
> +    } else {
> +        dev->callback(egc, dev, ERROR_FAIL);
> +    }
> +}
> +
> +static void nic_match_async(const libxl__remus_device_ops *self,
> +                            libxl__remus_device *dev)
> +{
> +    if (dev->kind == LIBXL__REMUS_DEVICE_NIC)
> +        _exit(0);
> +
> +    _exit(-ERROR_NOT_MATCH);
> +}
> +
> +static void nic_match(libxl__remus_device_ops *self,
> +                      libxl__remus_device *dev)
> +{
> +    int pid = -1;
> +    STATE_AO_GC(dev->rds->ao);
> +
> +    /* Fork and call */
> +    pid = libxl__ev_child_fork(gc, &dev->child, async_call_done);
> +    if (pid == -1) {
> +        LOG(ERROR, "unable to fork");
> +        goto out;
> +    }
> +
> +    if (!pid) {
> +        /* child */
> +        nic_match_async(self, dev);
> +        /* notreached */
> +        abort();
> +    }
> +
> +    return;
> +
> +out:
> +    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
> +}
> +
> +static void nic_setup(libxl__remus_device *dev)
> +{
> +    libxl__remus_device_nic *remus_nic;
> +    libxl__remus_netbuf_state *ns = dev->ops->data;
> +    const libxl_device_nic *nic = dev->backend_dev;
> +
> +    STATE_AO_GC(ns->ao);
> +
> +    GCNEW(remus_nic);
> +    dev->data = remus_nic;
> +    remus_nic->vif = get_vifname(dev, nic);
> +
> +    setup_async_exec(&dev->aes, "setup", dev);
> +    if (libxl__async_exec_start(gc, &dev->aes)) {
> +        goto out;
> +    }
> +
> +    return;
> +
> +out:
> +    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
> +}
> +
> +/*
> + * Note: This function will be called in the same gc context as
> + * libxl__remus_netbuf_setup, created during the libxl_domain_remus_start
> + * API call.
> + */
> +static void nic_teardown(libxl__remus_device *dev)
> +{
> +    libxl__remus_netbuf_state *ns = dev->ops->data;
> +
> +    STATE_AO_GC(ns->ao);
> +
> +    setup_async_exec(&dev->aes, "teardown", dev);
> +
> +    if (libxl__async_exec_start(gc, &dev->aes)) {
> +        goto out;
> +    }
> +
> +    return;
> +
> +out:
> +    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
> +}
> +
> +/* The buffer_op's value, not the value passed to kernel */
> +enum {
> +    tc_buffer_start,
> +    tc_buffer_release
> +};
> +
> +static void remus_netbuf_op_async(libxl__remus_device_nic *remus_nic,
> +                                  libxl__remus_netbuf_state *netbuf_state,
> +                                  int buffer_op)
> +{
> +    int ret;
> +
> +    STATE_AO_GC(netbuf_state->ao);
> +
> +    if (buffer_op == tc_buffer_start)
> +        ret = rtnl_qdisc_plug_buffer(remus_nic->qdisc);
> +    else
> +        ret = rtnl_qdisc_plug_release_one(remus_nic->qdisc);
> +
> +    if (!ret) {
> +        ret = rtnl_qdisc_add(netbuf_state->nlsock,
> +                             remus_nic->qdisc,
> +                             NLM_F_REQUEST);
> +        if (ret)
> +            goto out;
> +    }
> +
> +    _exit(0);
> +
> +out:
> +    LOG(ERROR, "Remus: cannot do netbuf op %s on %s:%s",
> +        ((buffer_op == tc_buffer_start) ?
> +        "start_new_epoch" : "release_prev_epoch"),
> +        remus_nic->ifb, nl_geterror(ret));
> +    _exit(-ERROR_FAIL);
> +}
> +
> +static void netbuf_epoch_op(libxl__remus_device *dev, int buffer_op)
> +{
> +    int pid = -1;
> +    libxl__remus_device_nic *remus_nic = dev->data;
> +    libxl__remus_netbuf_state *ns = dev->ops->data;
> +    STATE_AO_GC(dev->rds->ao);
> +
> +    /* Fork and call */
> +    pid = libxl__ev_child_fork(gc, &dev->child, async_call_done);
> +    if (pid == -1) {
> +        LOG(ERROR, "unable to fork");
> +        goto out;
> +    }
> +
> +    if (!pid) {
> +        /* child */
> +        remus_netbuf_op_async(remus_nic, ns, buffer_op);
> +        /* notreached */
> +        abort();
> +    }
> +
> +    return;
> +
> +out:
> +    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
> +}
> +
> +static void nic_postsuspend(libxl__remus_device *dev)
> +{
> +    netbuf_epoch_op(dev, tc_buffer_start);
> +}
> +
> +static void nic_commit(libxl__remus_device *dev)
> +{
> +    netbuf_epoch_op(dev, tc_buffer_release);
> +}
> +


Async execution for each netlink call is overkill. These rtnl calls
complete in a few microseconds at most. This code structure, on the
other hand, forks/execs a new process for every checkpoint just to
execute a single library call (netbuf_epoch_op), which in turn issues
just one syscall.

Correct me if I am wrong: I am assuming that libxl__ev_child_fork
eventually leads to a fork() and exec() call.

Per remus checkpoint
 2 ops for netbuf, 2 for disk.
 1 fork & exec per op for a total of 4 forks per checkpoint. (based on this
patch and the drbd patch)

 At 25 checkpoints per second, you are looking at roughly a 100 fork/execs
per second.
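
As a rough sketch of the arithmetic above (the function and parameter
names are illustrative and do not come from the patch):

```c
#include <assert.h>

/* Forks per second implied by one fork per device op.
 * Names are hypothetical; only the arithmetic comes from the review. */
static unsigned forks_per_second(unsigned netbuf_ops, unsigned disk_ops,
                                 unsigned checkpoints_per_second)
{
    assert(checkpoints_per_second > 0);
    /* every op at every checkpoint costs one fork */
    return (netbuf_ops + disk_ops) * checkpoints_per_second;
}
```

With 2 netbuf ops, 2 disk ops and 25 checkpoints per second this gives
the figure of roughly 100 quoted above.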

[-- Attachment #1.2: Type: text/html, Size: 35273 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10 3/5] remus: introduce remus device
  2014-06-05  1:34 ` [PATCH v10 3/5] remus: introduce remus device Yang Hongyang
@ 2014-06-05 17:06   ` Ian Jackson
  2014-06-06  1:54     ` Hongyang Yang
  2014-06-09  2:08     ` Hongyang Yang
  0 siblings, 2 replies; 44+ messages in thread
From: Ian Jackson @ 2014-06-05 17:06 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

Yang Hongyang writes ("[PATCH v10 3/5] remus: introduce remus device"):
> introduce remus device, an abstract layer of remus devices(nic, disk,
> etc).It provide the following APIs for libxl:

Thanks.

>   >libxl__remus_device_setup
>     setup remus devices, like attach qdisc, enable disk buffering, etc
>   >libxl__remus_device_teardown
>     teardown devices
>   >libxl__remus_device_postsuspend
>   >libxl__remus_device_preresume
>   >libxl__remus_device_commit
>     above three are for checkpoint.

I started reviewing this patch by reading the commit message and the
changes to libxl_internal.h.

As far as I can tell what's going on, it mostly looks plausible.  But
the new parts of libxl_internal.h are missing important information
about the new interfaces.

I'd like the documentation in libxl_internal.h to be sufficient to
read, understand, and check the code on one side of those interfaces,
without having to go and read the code on the other side.

I'll go into more detail about this below.

Because of this, I haven't read the actual implementation code yet.

> through remus device layer, the remus execution flow will be like
> this:
>   xl remus -> remus device setup
>                 |-> remus checkpoint(postsuspend, commit, preresume)
>                       ...
>                        |-> remus device teardown,failover or abort

This diagram could usefully be transferred into a comment in the code,
probably in libxl_internal.h.

> the remus device layer provide an interface
>   libxl__remus_device_ops
> which a remus device must implement.the whole remus structure:
>                             |remus|
>                                |
>                         |remus device|
>                                |
>                 |nic| |drbd disks| |qemu disks| ...
> a device(nic, drbd disks, qemu disks, etc) must implement
> libxl__remus_device_ops to support remus.

Again, this diagram too.

> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 2b46121..20601b2 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
...
> +/*----- remus device related state structure -----*/
> 
> +typedef enum libxl__remus_device_kind {
> +    LIBXL__REMUS_DEVICE_NIC,
> +    LIBXL__REMUS_DEVICE_DISK,
> +} libxl__remus_device_kind;
> +
> +typedef struct libxl__remus_state libxl__remus_state;
> +typedef struct libxl__remus_device libxl__remus_device;
> +typedef struct libxl__remus_device_state libxl__remus_device_state;
> +typedef struct libxl__remus_device_ops libxl__remus_device_ops;

All fine.

> +struct libxl__remus_device_ops {

Who produces and consumes this ?  I _think_ from your diagram above
that this is produced by a device type and consumed by the main remus
code.  Is that right ?  I think this should be documented.

> +    /*
> +     * init device ops private data, etc. must implenment
> +     */

What does "must implement" mean ?  Do you mean that some of these
function pointers can be 0 ?  If so, you should say this explicitly
somewhere and say what 0 means.  (No-op?)
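
One way such a convention could be written down, as a minimal sketch only.
The demo_* names are hypothetical stand-ins and not the structures from
the patch; the assumption here is that a NULL checkpoint hook means
"successful no-op":

```c
#include <assert.h>
#include <stddef.h>

struct demo_device;

struct demo_device_ops {
    /* Mandatory: must never be NULL. */
    int (*setup)(struct demo_device *dev);
    /* Optional: NULL means "nothing to do at this checkpoint stage". */
    int (*postsuspend)(struct demo_device *dev);
};

struct demo_device {
    const struct demo_device_ops *ops;  /* vtable, hence const */
    int postsuspend_calls;
};

/* Invoke the optional hook, treating NULL as a successful no-op. */
static int demo_postsuspend(struct demo_device *dev)
{
    assert(dev && dev->ops && dev->ops->setup);
    if (!dev->ops->postsuspend)
        return 0;
    return dev->ops->postsuspend(dev);
}

static int demo_setup_ok(struct demo_device *dev)
{
    (void)dev;
    return 0;
}

static int demo_nic_postsuspend(struct demo_device *dev)
{
    dev->postsuspend_calls++;
    return 0;
}

/* A device type with the hook, and one without it. */
static const struct demo_device_ops demo_nic_ops = {
    demo_setup_ok, demo_nic_postsuspend
};
static const struct demo_device_ops demo_disk_ops = {
    demo_setup_ok, NULL
};
```

The point is only that the meaning of a null pointer is stated next to
the field and enforced in one place by the caller.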

> +    int (*init)(libxl__remus_device_ops *self,
> +                libxl__remus_state *rs);
> +    /*
> +     * free device ops private data, etc. must implenment
> +     */
> +    void (*destroy)(libxl__remus_device_ops *self);
> +    /* device ops's private data */
> +    void *data;

In libxl we often embed structs inside other structs, rather than
chaining them with void*s.  Is there some reason why that's not a good
idea here ?
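
To illustrate the embedding approach, here is a minimal sketch. All
demo_* names are hypothetical, and the macro is a local copy of the
usual container_of idiom rather than anything taken from the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Recover the enclosing struct from a pointer to one of its members. */
#define DEMO_CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_ops {
    int (*init)(struct demo_ops *self);
};

struct demo_netbuf_ops {
    struct demo_ops ops;   /* generic part, embedded by value */
    int nlsock_open;       /* type-specific state, no void* needed */
};

static int demo_netbuf_init(struct demo_ops *self)
{
    struct demo_netbuf_ops *nb;

    assert(self);
    /* step from the generic part back out to the specific type */
    nb = DEMO_CONTAINER_OF(self, struct demo_netbuf_ops, ops);
    nb->nlsock_open = 1;
    return 0;
}
```

The generic code only ever sees struct demo_ops, while the device type
gets at its own state without a cast from void*.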

Also, I assume this should be "const void*", since I think this can
only refer to static data ?

> +    /*
> +     * checkpoint callbacks, async ops. may not implemented
> +     */

If these are async ops, what happens when they complete ?

I see a libxl__remus_device_callback typedef below, and a callback
field in libxl__remus_device.  Is that it ?

If so this should be written down.

> +    /*
> +     * check whether device ops match the device, async op. must implement
> +     */
> +    void (*match)(libxl__remus_device_ops *self,
> +                  libxl__remus_device *dev);

I don't understand the purpose of this, but perhaps this is because I
don't understand the lifecycle of a libxl__remus_device and what
fields in it are for which bits of code.

> +    /*
> +     * setup the remus device, async op. must implement
> +     */
> +    void (*setup)(libxl__remus_device *dev);
> +
> +    /*
> +     * teardown the remus device, async op. must implement
> +     */
> +    void (*teardown)(libxl__remus_device *dev);

When we say "setup" and "teardown" we refer to the actual device, not
merely the libxl data structures, which are setup and torn down with
"init" and "destroy" ?  This could be clearer.

> +struct libxl__remus_device_state {
> +    libxl__ao *ao;
> +    libxl__egc *egc;
> +
> +    /* devices that have been setuped */
> +    libxl__remus_device **dev;
> +
> +    int num_nics;
> +    int num_disks;
> +
> +    /* for counting devices that have been handled */
> +    int num_devices;
> +    /* for counting devices that matched and setuped */
> +    int num_setuped;
> +};

We need to know who owns which fields in this structure.  Or who is
supposed to set them.  Also, I'm not sure what this structure is for.
It appears only to be in libxl__remus_state.  What is the reason for
it being separated out ?

> +struct libxl__remus_device {

Again, we need to know who owns/sets which fields in this structure.
(If the struct is shared between different layers of code, it is
normally easiest to do this by reordering the fields into groups
according to their ownership.)

If the whole struct is owned by the same set of code, then we need to
know which set of code that is.  Perhaps by specifying a pattern on
the function name (libxl__remus_device_*?)

> +    int devid;
> +    /* libxl__device_* which this remus device related to */
> +    const void *backend_dev;
> +    libxl__remus_device_kind kind;
> +    int ops_index;

What is ops_index ?

> +    libxl__remus_device_ops *ops;

I think these ops structs are vtables so should be const.

> +    /* for calling scripts */
> +    libxl__async_exec_state aes;

Conversely, I'm not sure that particular comments add anything.
libxl__async_exec_state is always for executing scripts.

> +    /* for async func calls */
> +    libxl__ev_child child;

Is this an ownership comment ?  If so are you sure that the ownership
isn't "this is owned by device-specific ops methods" ?  (It is obvious
that only an /asynchronous/ device-specific ops method would be able
to make use of it.)

> +typedef void libxl__remus_callback(libxl__egc *,
> +                                   libxl__remus_state *, int rc);
> +
> +struct libxl__remus_state {
> +    libxl__ao *ao;
> +    uint32_t domid;
> +    libxl__remus_callback *callback;

You should say that these must be set by the caller.  (Looking at the
API I assume that's the case.)

> +    /* private */
> +    int saved_rc;
> +    /* context containing device related stuff */
> +    libxl__remus_device_state dev_state;
> +
> +    libxl__ev_time timeout; /* used for checkpoint */
> +};
> +
> +_hidden void libxl__remus_device_setup(libxl__egc *egc,
> +                                       libxl__remus_state *rs);
> +_hidden void libxl__remus_device_teardown(libxl__egc *egc,
> +                                          libxl__remus_state *rs);
> +_hidden void libxl__remus_device_postsuspend(libxl__egc *egc,
> +                                             libxl__remus_state *rs);
> +_hidden void libxl__remus_device_preresume(libxl__egc *egc,
> +                                           libxl__remus_state *rs);
> +_hidden void libxl__remus_device_commit(libxl__egc *egc,
> +                                        libxl__remus_state *rs);
>  _hidden int libxl__netbuffer_enabled(libxl__gc *gc);

These functions all call rs->callback() when done ?  If so there
should be a comment to say so.

> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 52f1aa9..4278a6b 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -43,6 +43,7 @@ libxl_error = Enumeration("error", [
>      (-12, "OSEVENT_REG_FAIL"),
>      (-13, "BUFFERFULL"),
>      (-14, "UNKNOWN_CHILD"),
> +    (-15, "NOT_MATCH"),
>      ], value_namespace = "")

It is good that you introduce a new error code for your new error
case.  But I think it needs to have a better name.  What does it
mean ?

In fact, I grepped the whole of this patch for NOT_MATCH and this
new error status is checked for somewhere but never generated!

If it forms a part of the new internal API then that should be
documented.

> diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
> index 745e2ac..36bae04 100755
> --- a/tools/libxl/libxl_save_msgs_gen.pl
> +++ b/tools/libxl/libxl_save_msgs_gen.pl
> @@ -24,7 +24,7 @@ our @msgs = (
>                                                  'unsigned long', 'done',
>                                                  'unsigned long', 'total'] ],
>      [  3, 'scxA',   "suspend", [] ],
> -    [  4, 'scxW',   "postcopy", [] ],
> +    [  4, 'scxA',   "postcopy", [] ],
>      [  5, 'scxA',   "checkpoint", [] ],
>      [  6, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
>                                                unsigned enable)] ],

I think this change (and its consequential changes to the handwritten
parts) should be split out into a "no functional change" pre-patch.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10 4/5] remus: implement remus network buffering for nic devices
  2014-06-05  1:34 ` [PATCH v10 4/5] remus: implement remus network buffering for nic devices Yang Hongyang
  2014-06-05 16:50   ` Shriram Rajagopalan
@ 2014-06-05 17:24   ` Ian Jackson
  2014-06-10  7:33     ` Hongyang Yang
  1 sibling, 1 reply; 44+ messages in thread
From: Ian Jackson @ 2014-06-05 17:24 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

Yang Hongyang writes ("[PATCH v10 4/5] remus: implement remus network buffering for nic devices"):
> 1.Add two members in libxl_domain_remus_info:

Thanks for this patch.

I'm deferring reviewing the parts of this inside libxl which use the
new libxl device interface, until we have the API documentation
comments which I discussed in my last email.  I hope that's OK.

But there i

>     netbuf: whether netbuf is enabled
>     netbufscript: the path of the script which will be run to setup
>        and tear down the guest's interface.
> 2.introduces remus-netbuf-setup hotplug script responsible for
>   setting up and tearing down the necessary infrastructure required for
>   network output buffering in Remus.  This script is intended to be invoked
>   by libxl for each guest interface, when starting or stopping Remus.
> 
>   Apart from returning success/failure indication via the usual hotplug
>   entries in xenstore, this script also writes to xenstore, the name of
>   the IFB device to be used to control the vif's network output.
> 
>   The script relies on libnl3 command line utilities to perform various
>   setup/teardown functions. The script is confined to Linux platforms only
>   since NetBSD does not seem to have libnl3.
> 
>   The following steps are taken during init:
>     a) establish a dedicated remus context containing libnl related
>        state (netlink sockets, qdisc caches, etc.,)
> 
>   The following steps are taken for each vif during setup:
>     a) call the hotplug script to setup its network buffer
> 
>     b) Obtain handles to plug qdiscs installed on the IFB devices
>        chosen by the hotplug scripts.
> 
>   And during teardown, the netlink resources are released, followed by
>   invocation of hotplug scripts to remove the ifb devices.
> 3.implement the remus device interface. setup, teardown, etc.
> 
> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
>  docs/misc/xenstore-paths.markdown      |   4 +
>  tools/hotplug/Linux/Makefile           |   1 +
>  tools/hotplug/Linux/remus-netbuf-setup | 183 ++++++++++++
>  tools/libxl/libxl.c                    |  18 ++
>  tools/libxl/libxl.h                    |  13 +
>  tools/libxl/libxl_internal.h           |   3 +
>  tools/libxl/libxl_netbuffer.c          | 519 +++++++++++++++++++++++++++++++++
>  tools/libxl/libxl_nonetbuffer.c        |  67 +++++
>  tools/libxl/libxl_remus_device.c       |  22 +-
>  tools/libxl/libxl_types.idl            |   2 +
>  10 files changed, 831 insertions(+), 1 deletion(-)
>  create mode 100644 tools/hotplug/Linux/remus-netbuf-setup
> 



> diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown
> index 70ab7f4..039eaea 100644
> --- a/docs/misc/xenstore-paths.markdown
> +++ b/docs/misc/xenstore-paths.markdown
> @@ -385,6 +385,10 @@ The guest's virtual time offset from UTC in seconds.
>  
>  The device model version for a domain.
>  
> +#### /libxl/$DOMID/remus/netbuf/$DEVID/ifb = STRING [n,INTERNAL]
> +
> +ifb device used by Remus to buffer network output from the associated vif.
> +

Thanks for updating the doc.  Your changes to the hotplug Makefile
look good too.

> diff --git a/tools/hotplug/Linux/remus-netbuf-setup b/tools/hotplug/Linux/remus-netbuf-setup
> new file mode 100644
> index 0000000..aed2583
> --- /dev/null
> +++ b/tools/hotplug/Linux/remus-netbuf-setup
> @@ -0,0 +1,183 @@
> +#!/bin/bash
> +#============================================================================
> +# ${XEN_SCRIPT_DIR}/remus-netbuf-setup
> +#
> +# Script for attaching a network buffer to the specified vif (in any mode).
> +# The hotplugging system will call this script when starting remus via libxl
> +# API, libxl_domain_remus_start.

Right.  Thanks for the comprehensive head comment.

> +#============================================================================
...
> +setup_ifb() {
> +
> +    for ifb in `ifconfig -a -s|egrep ^ifb|cut -d ' ' -f1`
> +    do
> +        local installed=`nl-qdisc-list -d $ifb`
> +        [ -n "$installed" ] && continue
> +        IFB="$ifb"
> +        break
> +    done

As far as I can see this attempts to search for an ifb which is not in
use.

I see you claim a lock to ensure that you don't fail due to races with
other copies of this script.

But are there potentially other things (not Xen related, perhaps) in
the system which might try to allocate an ifb using a similar
approach ?  How do we deal with the potential race with them ?

Also: I think you should:
 - write the IFB name to xenstore _before_ starting to configure it
 - in the loop I quote above, check in xenstore that the ifb is not
   in use by another domain

Otherwise there seems to be the following risk:
 1. You pick ifbX using the loop above
 2. You start to configure ifbX, eventually resulting in a
    configuration which makes it not show up as free
 3. Something bad happens and you fail, before writing the
    ifb name to xenstore

In this case, the ifb would be leaked.  (I see you do try to avoid
this with xs_write_failed, but scripts can fail for other reasons.)
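
The ordering suggested above can be modelled as follows. This is a
sketch in C rather than in the script's own shell, and it uses an
in-memory table as a stand-in for the xenstore records; every name in
it is hypothetical:

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX_IFB 4

/* In-memory stand-in for the xenstore records of claimed IFB devices. */
struct demo_store {
    char claimed[DEMO_MAX_IFB][16];
    int nclaimed;
};

static int demo_is_claimed(const struct demo_store *s, const char *ifb)
{
    int i;

    for (i = 0; i < s->nclaimed; i++)
        if (!strcmp(s->claimed[i], ifb))
            return 1;
    return 0;
}

/* Pick the first candidate that has no qdisc installed and no recorded
 * claim, and record the claim before any configuration happens.
 * Returns the index chosen, or -1 if nothing is free. */
static int demo_claim_ifb(struct demo_store *s, const char *const *names,
                          const int *has_qdisc, int n)
{
    int i;

    assert(s && names && has_qdisc);
    for (i = 0; i < n; i++) {
        if (has_qdisc[i] || demo_is_claimed(s, names[i]))
            continue;
        if (s->nclaimed == DEMO_MAX_IFB)
            return -1;
        strncpy(s->claimed[s->nclaimed], names[i], 15);
        s->claimed[s->nclaimed][15] = '\0';
        s->nclaimed++;
        return i;   /* the caller configures the device only after this */
    }
    return -1;
}
```

Because the claim is recorded first, a device whose configuration
failed half way is still listed and can be found and cleaned up later.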

> +    do_or_die tc qdisc add dev "$vif" ingress

I'm not qualified to review these tc manipulations.  I guess I'm going
to trust that they're correct.

> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 80947c3..db30a97 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -437,6 +437,19 @@
>  #define LIBXL_HAVE_DRIVER_DOMAIN_CREATION 1
>  
>  /*
> + * LIBXL_HAVE_REMUS_NETBUF 1
> + *
> + * If this is defined, then the libxl_domain_remus_info structure will
> + * have a boolean field (netbuf) and a string field (netbufscript).
> + *
> + * netbuf, if true, indicates that network buffering should be enabled.
> + *
> + * netbufscript, if set, indicates the path to the hotplug script to
> + * setup or teardown network buffers.
> + */
> +#define LIBXL_HAVE_REMUS_NETBUF 1

Good.

> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 4278a6b..50bf1ef 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -566,6 +566,8 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
>      ("interval",     integer),
>      ("blackhole",    bool),
>      ("compression",  bool),
> +    ("netbuf",       bool),
> +    ("netbufscript", string),

I think netbuf should be a defbool, not a bool.  Indeed, perhaps this
is true of the other options too.  Is there some reason it shouldn't
default to enabled ?
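
For readers unfamiliar with the idea, a defaultable boolean can be
sketched like this. The demo_defbool type below is a hypothetical
stand-in written for illustration; it is not libxl's own type or API:

```c
#include <assert.h>
#include <stdbool.h>

/* Tri-state boolean: "unset" lets the library choose a default later. */
typedef enum {
    DEMO_FALSE = -1,
    DEMO_DEFAULT = 0,
    DEMO_TRUE = 1
} demo_defbool;

/* Apply a default only if the caller left the value unset. */
static void demo_defbool_setdefault(demo_defbool *db, bool dflt)
{
    assert(db);
    if (*db == DEMO_DEFAULT)
        *db = dflt ? DEMO_TRUE : DEMO_FALSE;
}

/* Read the value; only valid once a default has been applied. */
static bool demo_defbool_val(demo_defbool db)
{
    assert(db != DEMO_DEFAULT);
    return db == DEMO_TRUE;
}
```

A caller that never touches the field gets the library default
(enabled, in the suggestion above), while an explicit "off" survives.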

You should mention in your commit message that this is going to be
plumbed into xl and the documentation in the next patch.

Regarding the other remus options here (and perhaps changing their
types), I think it would be OK to break API compatibility, since the
previous versions of remus exposed via xl have not been suitable for
deployment.  Do you agree ?

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10 5/5] libxl: network buffering cmdline switch
  2014-06-05  1:34 ` [PATCH v10 5/5] libxl: network buffering cmdline switch Yang Hongyang
  2014-06-05  1:39   ` [PATCH v10] remus drbd: Implement remus drbd replicated disk Yang Hongyang
@ 2014-06-05 17:30   ` Ian Jackson
  2014-06-06  6:34     ` Hongyang Yang
  1 sibling, 1 reply; 44+ messages in thread
From: Ian Jackson @ 2014-06-05 17:30 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

Yang Hongyang writes ("[PATCH v10 5/5] libxl: network buffering cmdline switch"):
> Command line switch to 'xl remus' command, to enable network buffering.
> Pass on this flag to libxl so that it can act accordingly.

You provide a global option to control the script, but no per-domain
config option.  Why ?

A similar question arises about the network buffering boolean.

Wouldn't it be better if these were options on the devices, in the
domain configuration ?

Feel free to tell me I'm wrong and it is better this way, if that's
true - just explain it.

> +     * TODO: Split-Brain check.

What are your plans for the split brain check ?

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10 4/5] remus: implement remus network buffering for nic devices
  2014-06-05 16:50   ` Shriram Rajagopalan
@ 2014-06-05 17:37     ` Ian Jackson
  2014-06-05 17:44       ` Ian Jackson
  2014-06-06  1:59     ` Hongyang Yang
  1 sibling, 1 reply; 44+ messages in thread
From: Ian Jackson @ 2014-06-05 17:37 UTC (permalink / raw)
  To: rshriram
  Cc: Roger Pau Monné,
	Ian Campbell, Wen Congyang, Stefano Stabellini, Andrew Cooper,
	Jiang Yunhong, Dong Eddie, xen-devel, Yang Hongyang,
	Lai Jiangshan

Shriram Rajagopalan writes ("Re: [PATCH v10 4/5] remus: implement remus network buffering for nic devices"):
...
> The async execution for each netlink call is an overkill.  These
> rtnl calls complete in a matter of few microseconds utmost. On the
> other hand, this code structure, fork/execs a new process for every
> checkpoint just to execute a single library call (netbuf_epoch_op),
> which in turn issues just a syscall.

I haven't read the code to check whether this criticism is accurate,
but if it is I think it would be justified.

There is no need to use the async machinery for fast system calls.

> Correct me if I am wrong. I am assuming that the
> libxl__ev_child_fork eventually leads to a fork() and exec() call.

libxl__ev_child_fork leads to a fork() but not necessarily an exec().
The libxl code which uses it is supposed to make the child either exec
soon, or exit soon.  ("Soon" is defined in more detail in the doc
comment.)

If in fact the proposed patch forks/execs a netlink tool for each
checkpoint, it would probably be better for it to make the relevant
netlink calls directly.
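
To make the fork-without-exec pattern concrete, here is a sketch using
plain POSIX calls. It is not libxl's API: run_in_child and the demo_op_*
functions are hypothetical, and the parent blocks in waitpid() here,
whereas libxl would be notified through its child-event callback:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run op() in a forked child that never execs. Returns the child's
 * exit status (0 or 1), or -1 if fork or wait fails. */
static int run_in_child(int (*op)(void))
{
    pid_t pid;
    int status;

    assert(op);
    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* child: do the work, then exit promptly without exec() */
        _exit(op() ? 1 : 0);
    }
    /* parent: this sketch simply waits for the child */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

static int demo_op_ok(void) { return 0; }
static int demo_op_fail(void) { return -1; }
```

For a call that always returns immediately, the parent could just call
op() itself and skip the fork altogether.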

Ian.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10] remus drbd: Implement remus drbd replicated disk
  2014-06-05 16:25     ` Shriram Rajagopalan
@ 2014-06-05 17:41       ` Ian Jackson
  2014-06-05 18:14         ` Shriram Rajagopalan
  2014-06-06  2:21       ` Hongyang Yang
  1 sibling, 1 reply; 44+ messages in thread
From: Ian Jackson @ 2014-06-05 17:41 UTC (permalink / raw)
  To: rshriram
  Cc: Roger Pau Monné,
	Lai Jiangshan, Wen Congyang, Andrew Cooper, Jiang Yunhong,
	Dong Eddie, xen-devel, Yang Hongyang, Ian Campbell

Shriram Rajagopalan writes ("Re: [PATCH v10] remus drbd: Implement remus drbd replicated disk"):
> On Wed, Jun 4, 2014 at 8:39 PM, Yang Hongyang <yanghy@cn.fujitsu.com> wrote:
>     +    if (ackwait) {
>     +        ioctl(rdd->ctl_fd, DRBD_WAIT_CHECKPOINT_ACK, 0);
>     +        ackwait = 0;
>     +    }
...
> Please get rid of the async execution just to execute a sys
> call.

Are you sure ?  Does this syscall not await network traffic ?

What if the network is broken ?  Might it not then delay indefinitely ?

> Not to mention a fork & exec per sys call,

In fact there is no exec, only a fork.

Ian.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10 4/5] remus: implement remus network buffering for nic devices
  2014-06-05 17:37     ` Ian Jackson
@ 2014-06-05 17:44       ` Ian Jackson
  2014-06-05 17:56         ` Shriram Rajagopalan
  0 siblings, 1 reply; 44+ messages in thread
From: Ian Jackson @ 2014-06-05 17:44 UTC (permalink / raw)
  To: rshriram, Yang Hongyang, xen-devel, Ian Campbell, Wen Congyang,
	Stefano Stabellini, Andrew Cooper, Jiang Yunhong, Lai Jiangshan,
	Dong Eddie, Roger Pau Monné

Ian Jackson writes ("Re: [PATCH v10 4/5] remus: implement remus network buffering for nic devices"):
> Shriram Rajagopalan writes ("Re: [PATCH v10 4/5] remus: implement remus network buffering for nic devices"):
> ...
> > The async execution for each netlink call is an overkill.  These
> > rtnl calls complete in a matter of few microseconds utmost. On the
> > other hand, this code structure, fork/execs a new process for every
> > checkpoint just to execute a single library call (netbuf_epoch_op),
> > which in turn issues just a syscall.
> 
> I haven't read the code to check whether this criticism is accurate,
> but if it is I think it would be justified.
> 
> There is no need to use the async machinery for fast system calls.

Having read Shriram's other mail, I feel the need to emphasise the
qualification "fast".

"Fast" means "cannot ever, even in error conditions, take a
significant amount of time".  In particular anything that waits for
incoming network traffic is not "fast".

But AFAICT by looking at the code we are talking only about these
calls:
  rtnl_qdisc_plug_buffer
  rtnl_qdisc_plug_release_one
  rtnl_qdisc_add
Surely these always complete immediately.

Ian.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10 4/5] remus: implement remus network buffering for nic devices
  2014-06-05 17:44       ` Ian Jackson
@ 2014-06-05 17:56         ` Shriram Rajagopalan
  2014-06-06  2:08           ` Hongyang Yang
  0 siblings, 1 reply; 44+ messages in thread
From: Shriram Rajagopalan @ 2014-06-05 17:56 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Lai Jiangshan, Wen Congyang, Stefano Stabellini, Andrew Cooper,
	Jiang Yunhong, Dong Eddie, xen-devel, Ian Campbell,
	Yang Hongyang, Roger Pau Monné


On Jun 5, 2014 11:14 PM, "Ian Jackson" <Ian.Jackson@eu.citrix.com> wrote:
>
> Ian Jackson writes ("Re: [PATCH v10 4/5] remus: implement remus network buffering for nic devices"):
> > Shriram Rajagopalan writes ("Re: [PATCH v10 4/5] remus: implement remus network buffering for nic devices"):
> > ...
> > > The async execution for each netlink call is an overkill.  These
> > > rtnl calls complete in a matter of few microseconds utmost. On the
> > > other hand, this code structure, fork/execs a new process for every
> > > checkpoint just to execute a single library call (netbuf_epoch_op),
> > > which in turn issues just a syscall.
> >
> > I haven't read the code to check whether this criticism is accurate,
> > but if it is I think it would be justified.
> >
> > There is no need to use the async machinery for fast system calls.
>
> Having read Shriram's other mail, I feel the need to emphasise the
> qualification "fast".
>
> "Fast" means "cannot ever, even in error conditions, take a
> significant amount of time".  In particular anything that waits for
> incoming network traffic is not "fast".
>
> But AFAICT by looking at the code we are talking only about these
> calls:
>   rtnl_qdisc_plug_buffer
>   rtnl_qdisc_plug_release_one
>   rtnl_qdisc_add
> Surely these always complete immediately.
>

Yes. They boil down to a netlink syscall that simply communicates with
qdisc manipulation routines in the kernel. They have nothing to do with
waiting for network traffic.

Shriram


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10] remus drbd: Implement remus drbd replicated disk
  2014-06-05 17:41       ` Ian Jackson
@ 2014-06-05 18:14         ` Shriram Rajagopalan
  2014-06-05 18:26           ` Ian Jackson
  2014-06-06  5:38           ` Hongyang Yang
  0 siblings, 2 replies; 44+ messages in thread
From: Shriram Rajagopalan @ 2014-06-05 18:14 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Lai Jiangshan, Wen Congyang, Andrew Cooper, Jiang Yunhong,
	Dong Eddie, xen-devel, Ian Campbell, Yang Hongyang,
	Roger Pau Monné


On Jun 5, 2014 11:11 PM, "Ian Jackson" <Ian.Jackson@eu.citrix.com> wrote:
>
> Shriram Rajagopalan writes ("Re: [PATCH v10] remus drbd: Implement remus drbd replicated disk"):
> > On Wed, Jun 4, 2014 at 8:39 PM, Yang Hongyang <yanghy@cn.fujitsu.com> wrote:
> >     +    if (ackwait) {
> >     +        ioctl(rdd->ctl_fd, DRBD_WAIT_CHECKPOINT_ACK, 0);
> >     +        ackwait = 0;
> >     +    }
> ...
> > Please get rid of the async execution just to execute a sys
> > call.
>
> Are you sure ?  Does this syscall not await network traffic ?
>

It does. But the design is such that the disk and memory checkpoints are
transmitted simultaneously, so by the time this call is made the ack is
already in the system. This is the common case and covers about 90% of
the calls (disk traffic is low compared to the memory checkpoint).

> What if the network is broken ?  Might it not then delay indefinitely ?

Nope.  I designed the relevant drbd code so that, in the worst case, the
ioctl wait times out (the timeout is configurable) and returns an error.
The timeout is generally about 300ms. This code path is exercised only
during failures.

So a one-time error condition and a few slow checkpoints out of an
indefinite number of checkpoints don't warrant a fork per ioctl call (which
usually returns immediately).

>
> > Not to mention a fork & exec per sys call,
>
> In fact there is no exec, only a fork.
>
> Ian.
>


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10] remus drbd: Implement remus drbd replicated disk
  2014-06-05 18:14         ` Shriram Rajagopalan
@ 2014-06-05 18:26           ` Ian Jackson
  2014-06-06 11:23             ` Ian Jackson
  2014-06-06  5:38           ` Hongyang Yang
  1 sibling, 1 reply; 44+ messages in thread
From: Ian Jackson @ 2014-06-05 18:26 UTC (permalink / raw)
  To: rshriram
  Cc: Lai Jiangshan, Wen Congyang, Andrew Cooper, Jiang Yunhong,
	Dong Eddie, xen-devel, Ian Campbell, Yang Hongyang,
	Roger Pau Monné

Shriram Rajagopalan writes ("Re: [PATCH v10] remus drbd: Implement remus drbd replicated disk"):
> It does. But the design is such that the disk and memory checkpoints are
> simultaneously transmitted. So by the time this call is made, the ack is
> already in the system.

One packet might get lost while the other gets through.

Risking locking up the whole of the process is unfortunately not
acceptable.

> -- this is the common case. Covers about 90% of the calls (since disk traffic
> is pretty low compared to memory checkpoint).
> 
> > What if the network is broken ?  Might it not then delay indefinitely ?
> 
> Nope.  I designed the relevant drbd code such that the ioctl wait times out
> (configurable) in worst case, returning an error. The time out is generally
> about 300ms. This code path is exercised only during failures.

If you think this ioctl will, when there is no error, complete
immediately, can we have a non-blocking version, and fall back to the
fork trick ?

Or better still, is there something we could poll() on to find out
when the ioctl will definitely complete ?
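
A sketch of what waiting on a pollable descriptor with a bound could
look like. It assumes, purely for illustration, that the ack shows up
as readability on some file descriptor, which the drbd interface
discussed here may not provide; wait_readable and demo_signal_ready are
hypothetical names:

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Return 1 if fd becomes readable within timeout_ms, 0 on timeout,
 * and -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;               /* timed out, nothing to read yet */
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Stand-in for "the ack has arrived": write one byte to the far end. */
static int demo_signal_ready(int wfd)
{
    return write(wfd, "x", 1) == 1 ? 0 : -1;
}
```

With a timeout of 0 this is a pure non-blocking check. Inside libxl the
descriptor would presumably be handed to the event loop rather than
polled inline like this.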

> So, a one-time error condition and few slow checkpoints out of an indefinite
> number of checkpoints don't warrant a fork per ioctl call (which usually
> returns immediately).

libxl might be handling a large number of domains.  If libxl blocks, a
lot of everything else might stall.  (Including checkpoints of other
domains, if you care about that.)

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10 2/5] remus: add libnl3 dependency for network buffering support
  2014-06-05 16:18   ` Ian Jackson
@ 2014-06-06  1:48     ` Hongyang Yang
  2014-06-06  6:45       ` Shriram Rajagopalan
  2014-06-06 11:04       ` Ian Jackson
  0 siblings, 2 replies; 44+ messages in thread
From: Hongyang Yang @ 2014-06-06  1:48 UTC (permalink / raw)
  To: Ian Jackson
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

On 06/06/2014 12:18 AM, Ian Jackson wrote:
> Yang Hongyang writes ("[PATCH v10 2/5] remus: add libnl3 dependency for network buffering support"):
>> Libnl3 is required for controlling Remus network buffering.
>> This patch adds dependency on libnl3 (>= 3.2.8) to autoconf scripts.
>> Also provide ability to configure tools without libnl3 support, that
>> is without network buffering support.
>
> This patch looks broadly good to me.  I have some very minor comments
> about the details.
>
>> when there's no network buffering support,libxl__netbuffer_enabled()
>> returns 0, otherwise returns 1.
>
> The commit message should explicitly state that callers will be
> introduced in the rest of the series.
>
>> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
>
> For a patch which changes configure.ac, it would be helpful to add a
> reminder (for the committer) to rerun autogen.sh.  This should ideally
> appear just before the first Signed-off-by.  The committer should
> delete the note, and rerun autogen.sh, as they apply the patch.

Thanks, will add the comment.

>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
> ...
>> +# Check for libnl3 >=3.2.8. If present enable remus network buffering.
>> +PKG_CHECK_MODULES(LIBNL3, [libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8],
>> +    [libnl3_lib="y"], [libnl3_lib="n"])
>> +
>> +AS_IF([test "x$libnl3_lib" = "xn" ], [
>> +    AC_MSG_WARN([Disabling support for Remus network buffering.
>> +    Please install libnl3 libraries, command line tools and devel
>> +    headers - version 3.2.8 or higher])
>> +    AC_SUBST(remus_netbuf, [n])
>> +    ],[
>> +    AC_SUBST(LIBNL3_LIBS)
>> +    AC_SUBST(LIBNL3_CFLAGS)
>
> It might be better to put these AC_SUBSTs into the main body of
> configure.ac ?  Like this:
>
>     diff --git a/tools/configure.ac b/tools/configure.ac
>     index 38d2d05..ee36707 100644
>     --- a/tools/configure.ac
>     +++ b/tools/configure.ac
>     @@ -257,10 +257,11 @@ AS_IF([test "x$libnl3_lib" = "xn" ], [
>          headers - version 3.2.8 or higher])
>          AC_SUBST(remus_netbuf, [n])
>          ],[
>     -    AC_SUBST(LIBNL3_LIBS)
>     -    AC_SUBST(LIBNL3_CFLAGS)
>          AC_SUBST(remus_netbuf, [y])
>      ])
>
>     +AC_SUBST(LIBNL3_LIBS)
>     +AC_SUBST(LIBNL3_CFLAGS)
>     +
>      AC_OUTPUT()

Yes, I'll try that. If we do this, the following check of
CONFIG_REMUS_NETBUF in the libxl Makefile will no longer be needed:


  LIBXL_LIBS =
  LIBXL_LIBS = $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(LDLIBS_libxenstore) $(LDLIBS_libblktapctl) $(PTYFUNCS_LIBS) $(LIBUUID_LIBS)
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+LIBXL_LIBS += $(LIBNL3_LIBS)
+endif

  CFLAGS_LIBXL += $(CFLAGS_libxenctrl)
  CFLAGS_LIBXL += $(CFLAGS_libxenguest)
  CFLAGS_LIBXL += $(CFLAGS_libxenstore)
  CFLAGS_LIBXL += $(CFLAGS_libblktapctl)
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+CFLAGS_LIBXL += $(LIBNL3_CFLAGS)
+endif
  CFLAGS_LIBXL += -Wshadow


>
> Thanks,
> Ian.
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH v10 3/5] remus: introduce remus device
  2014-06-05 17:06   ` Ian Jackson
@ 2014-06-06  1:54     ` Hongyang Yang
  2014-06-09  2:08     ` Hongyang Yang
  1 sibling, 0 replies; 44+ messages in thread
From: Hongyang Yang @ 2014-06-06  1:54 UTC (permalink / raw)
  To: Ian Jackson
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

On 06/06/2014 01:06 AM, Ian Jackson wrote:
> Yang Hongyang writes ("[PATCH v10 3/5] remus: introduce remus device"):
>> introduce remus device, an abstract layer of remus devices(nic, disk,
>> etc).It provide the following APIs for libxl:
>
> Thanks.
>
>>    >libxl__remus_device_setup
>>      setup remus devices, like attach qdisc, enable disk buffering, etc
>>    >libxl__remus_device_teardown
>>      teardown devices
>>    >libxl__remus_device_postsuspend
>>    >libxl__remus_device_preresume
>>    >libxl__remus_device_commit
>>      above three are for checkpoint.
>
> I started reviewing this patch by reading the commit message and the
> changes to libxl_internal.h.
>
> As far as I can tell what's going on, it mostly looks plausible.  But
> the new parts of libxl_internal.h are missing important information
> about the new interfaces.
>
> I'd like the documentation in libxl_internal.h to be sufficient to
> read, understand, and check the code on one side of those interfaces,
> without having to go and read the code on the other side.

Thanks, I will improve the doc in the next version.
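
As an illustration of the kind of header documentation being asked for,
here is a minimal sketch. It is not the libxl code: every demo_* name is
hypothetical, and the rule "a NULL hook is a no-op that completes at
once" is an assumption chosen for the example. It shows the three points
raised in the review: who produces and consumes the ops table, what a
missing hook means, and which code owns which fields.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_device demo_device;

/*
 * Produced by: a device type (nic, drbd disk, ...), as static const data.
 * Consumed by: the generic remus device layer.
 * A hook may be NULL, meaning "no-op, completes at once" (assumption).
 * Every non-NULL hook must call dev->callback exactly once when done.
 */
typedef struct demo_device_ops {
    void (*postsuspend)(demo_device *dev);  /* may be NULL */
    void (*commit)(demo_device *dev);       /* may be NULL */
} demo_device_ops;

struct demo_device {
    /* Set by the generic layer before any hook is called: */
    const demo_device_ops *ops;
    void (*callback)(demo_device *dev, int rc);

    /* Owned by the device-specific ops methods: */
    int buffered_epochs;

    /* Owned by the generic layer: */
    int completions;
    int last_rc;
};

/* Generic layer: completion callback, records the result. */
static void demo_done(demo_device *dev, int rc)
{
    dev->completions++;
    dev->last_rc = rc;
}

/* Generic layer: run one hook, treating NULL as an immediate success. */
static void demo_run_hook(demo_device *dev, void (*hook)(demo_device *))
{
    assert(dev != NULL && dev->callback != NULL);
    if (hook)
        hook(dev);
    else
        dev->callback(dev, 0);
}

/* Device type: a nic-like implementation that only provides postsuspend. */
static void demo_nic_postsuspend(demo_device *dev)
{
    dev->buffered_epochs++;
    dev->callback(dev, 0);
}

static const demo_device_ops demo_nic_ops = {
    .postsuspend = demo_nic_postsuspend,
    .commit = NULL,
};
```

Grouping the fields by owner, as suggested in the review, lets a reader
check one side of the interface without reading the other.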

>
> I'll go into more detail about this below.
>
> Because of this, I haven't read the actual implementation code yet.
>
>> through remus device layer, the remus execution flow will be like
>> this:
>>    xl remus -> remus device setup
>>                  |-> remus checkpoint(postsuspend, commit, preresume)
>>                        ...
>>                         |-> remus device teardown,failover or abort
>
> This diagram could usefully be transferred into a comment in the code,
> probably in libxl_internal.h.
>
>> the remus device layer provide an interface
>>    libxl__remus_device_ops
>> which a remus device must implement.the whole remus structure:
>>                              |remus|
>>                                 |
>>                          |remus device|
>>                                 |
>>                  |nic| |drbd disks| |qemu disks| ...
>> a device(nic, drbd disks, qemu disks, etc) must implement
>> libxl__remus_device_ops to support remus.
>
> Again, this diagram too.
>
>> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
>> index 2b46121..20601b2 100644
>> --- a/tools/libxl/libxl_internal.h
>> +++ b/tools/libxl/libxl_internal.h
> ...
>> +/*----- remus device related state structure -----*/
>>
>> +typedef enum libxl__remus_device_kind {
>> +    LIBXL__REMUS_DEVICE_NIC,
>> +    LIBXL__REMUS_DEVICE_DISK,
>> +} libxl__remus_device_kind;
>> +
>> +typedef struct libxl__remus_state libxl__remus_state;
>> +typedef struct libxl__remus_device libxl__remus_device;
>> +typedef struct libxl__remus_device_state libxl__remus_device_state;
>> +typedef struct libxl__remus_device_ops libxl__remus_device_ops;
>
> All fine.
>
>> +struct libxl__remus_device_ops {
>
> Who produces and consumes this ?  I _think_ from your diagram above
> that this is produced by a device type and consumed by the main remus
> code.  Is that right ?  I think this should be documented.
>
>> +    /*
>> +     * init device ops private data, etc. must implenment
>> +     */
>
> What does "must implement" mean ?  Do you mean that some of these
> function pointers can be 0 ?  If so, you should say this explicitly
> somewhere and say what 0 means.  (No-op?)
>
>> +    int (*init)(libxl__remus_device_ops *self,
>> +                libxl__remus_state *rs);
>> +    /*
>> +     * free device ops private data, etc. must implenment
>> +     */
>> +    void (*destroy)(libxl__remus_device_ops *self);
>> +    /* device ops's private data */
>> +    void *data;
>
> In libxl we often embed structs inside other structs, rather than
> chaining them with void*s.  Is there some reason why that's not a good
> idea here ?
>
> Also, I assume this should be "const void*", since I think this can
> only refer to static data ?
>
>> +    /*
>> +     * checkpoint callbacks, async ops. may not implemented
>> +     */
>
> If these are async ops, what happens when they complete ?
>
> I see a libxl__remus_device_callback typedef below, and a callback
> field in libxl__remus_device.  Is that it ?
>
> If so this should be written down.
>
>> +    /*
>> +     * check whether device ops match the device, async op. must implement
>> +     */
>> +    void (*match)(libxl__remus_device_ops *self,
>> +                  libxl__remus_device *dev);
>
> I don't understand the purpose of this, but perhaps this is because I
> don't understand the lifecycle of a libxl__remus_device and what
> fields in it are for which bits of code.
>
>> +    /*
>> +     * setup the remus device, async op. must implement
>> +     */
>> +    void (*setup)(libxl__remus_device *dev);
>> +
>> +    /*
>> +     * teardown the remus device, async op. must implement
>> +     */
>> +    void (*teardown)(libxl__remus_device *dev);
>
> When we say "setup" and "teardown" we refer to the actual device, not
> merely the libxl data structures, which are setup and torn down with
> "init" and "destroy" ?  This could be clearer.
>
>> +struct libxl__remus_device_state {
>> +    libxl__ao *ao;
>> +    libxl__egc *egc;
>> +
>> +    /* devices that have been setuped */
>> +    libxl__remus_device **dev;
>> +
>> +    int num_nics;
>> +    int num_disks;
>> +
>> +    /* for counting devices that have been handled */
>> +    int num_devices;
>> +    /* for counting devices that matched and setuped */
>> +    int num_setuped;
>> +};
>
> We need to know who owns which fields in this structure.  Or who is
> supposed to set them.  Also, I'm not sure what this structure is for.
> It appears only to be in libxl__remus_state.  What is the reason for
> it being separated out ?
>
>> +struct libxl__remus_device {
>
> Again, we need to know who owns/sets which fields in this structure.
> (If the struct is shared between different layers of code, it is
> normally easiest to do this by reordering the fields into groups
> according to their ownership.)
>
> If the whole struct is owned by the same set of code, then we need to
> know which set of code that is.  Perhaps by specifying a pattern on
> the function name (libxl__remus_device_*?)
>
>> +    int devid;
>> +    /* libxl__device_* which this remus device related to */
>> +    const void *backend_dev;
>> +    libxl__remus_device_kind kind;
>> +    int ops_index;
>
> What is ops_index ?
>
>> +    libxl__remus_device_ops *ops;
>
> I think these ops structs are vtables so should be const.
>
>> +    /* for calling scripts */
>> +    libxl__async_exec_state aes;
>
> Conversely, I'm not sure that particular comments add anything.
> libxl__async_exec_state is always for executing scripts.
>
>> +    /* for async func calls */
>> +    libxl__ev_child child;
>
> Is this an ownership comment ?  If so are you sure that the ownership
> isn't "this is owned by device-specific ops methods" ?  (It is obvious
> that only an /asynchronous/ device-specific ops method would be able
> to make use of it.)
>
>> +typedef void libxl__remus_callback(libxl__egc *,
>> +                                   libxl__remus_state *, int rc);
>> +
>> +struct libxl__remus_state {
>> +    libxl__ao *ao;
>> +    uint32_t domid;
>> +    libxl__remus_callback *callback;
>
> You should say that these must be set by the caller.  (Looking at the
> API I assume that's the case.)
>
>> +    /* private */
>> +    int saved_rc;
>> +    /* context containing device related stuff */
>> +    libxl__remus_device_state dev_state;
>> +
>> +    libxl__ev_time timeout; /* used for checkpoint */
>> +};
>> +
>> +_hidden void libxl__remus_device_setup(libxl__egc *egc,
>> +                                       libxl__remus_state *rs);
>> +_hidden void libxl__remus_device_teardown(libxl__egc *egc,
>> +                                          libxl__remus_state *rs);
>> +_hidden void libxl__remus_device_postsuspend(libxl__egc *egc,
>> +                                             libxl__remus_state *rs);
>> +_hidden void libxl__remus_device_preresume(libxl__egc *egc,
>> +                                           libxl__remus_state *rs);
>> +_hidden void libxl__remus_device_commit(libxl__egc *egc,
>> +                                        libxl__remus_state *rs);
>>   _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
>
> These functions all call rs->callback() when done ?  If so there
> should be a comment to say so.
>
>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
>> index 52f1aa9..4278a6b 100644
>> --- a/tools/libxl/libxl_types.idl
>> +++ b/tools/libxl/libxl_types.idl
>> @@ -43,6 +43,7 @@ libxl_error = Enumeration("error", [
>>       (-12, "OSEVENT_REG_FAIL"),
>>       (-13, "BUFFERFULL"),
>>       (-14, "UNKNOWN_CHILD"),
>> +    (-15, "NOT_MATCH"),
>>       ], value_namespace = "")
>
> It is good that you introduce a new error code for your new error
> case.  But I think it needs to have a better name.  What does it
> mean ?
>
> In fact, I grepped the whole of this patch for NOT_MATCH and this
> new error status is checked for somewhere but never generated!
>
> If it forms a part of the new internal API then that should be
> documented.
>
>> diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
>> index 745e2ac..36bae04 100755
>> --- a/tools/libxl/libxl_save_msgs_gen.pl
>> +++ b/tools/libxl/libxl_save_msgs_gen.pl
>> @@ -24,7 +24,7 @@ our @msgs = (
>>                                                   'unsigned long', 'done',
>>                                                   'unsigned long', 'total'] ],
>>       [  3, 'scxA',   "suspend", [] ],
>> -    [  4, 'scxW',   "postcopy", [] ],
>> +    [  4, 'scxA',   "postcopy", [] ],
>>       [  5, 'scxA',   "checkpoint", [] ],
>>       [  6, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
>>                                                 unsigned enable)] ],
>
> I think this change (and its consequential changes to the handwritten
> parts) should be split out into a "no functional change" pre-patch.
>
> Thanks,
> Ian.
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH v10 4/5] remus: implement remus network buffering for nic devices
  2014-06-05 16:50   ` Shriram Rajagopalan
  2014-06-05 17:37     ` Ian Jackson
@ 2014-06-06  1:59     ` Hongyang Yang
  1 sibling, 0 replies; 44+ messages in thread
From: Hongyang Yang @ 2014-06-06  1:59 UTC (permalink / raw)
  To: rshriram
  Cc: Ian Campbell, Wen Congyang, Stefano Stabellini, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen-devel, Dong Eddie,
	Roger Pau Monné,
	Lai Jiangshan

On 06/06/2014 12:50 AM, Shriram Rajagopalan wrote:
> On Wed, Jun 4, 2014 at 8:34 PM, Yang Hongyang <yanghy@cn.fujitsu.com> wrote:
>
>     1.Add two members in libxl_domain_remus_info:
>          netbuf: whether netbuf is enabled
>          netbufscript: the path of the script which will be run to setup
>             and tear down the guest's interface.
>     2.introduces remus-netbuf-setup hotplug script responsible for
>        setting up and tearing down the necessary infrastructure required for
>        network output buffering in Remus.  This script is intended to be invoked
>        by libxl for each guest interface, when starting or stopping Remus.
>
>        Apart from returning success/failure indication via the usual hotplug
>        entries in xenstore, this script also writes to xenstore, the name of
>        the IFB device to be used to control the vif's network output.
>
>        The script relies on libnl3 command line utilities to perform various

...snip...

>     +
>     +static void nic_postsuspend(libxl__remus_device *dev)
>     +{
>     +    netbuf_epoch_op(dev, tc_buffer_start);
>     +}
>     +
>     +static void nic_commit(libxl__remus_device *dev)
>     +{
>     +    netbuf_epoch_op(dev, tc_buffer_release);
>     +}
>     +
>
>
> The async execution for each netlink call is an overkill.  These rtnl calls complete
> in a matter of few microseconds utmost. On the other hand, this code structure,
> fork/execs a new process for every checkpoint just to execute a single library call
> (netbuf_epoch_op), which in turn issues just a syscall.
>
> Correct me if I am wrong. I am assuming that the libxl__ev_child_fork eventually
> leads to a fork() and exec() call.
>
> Per remus checkpoint
>   2 ops for netbuf, 2 for disk.
>   1 fork & exec per op for a total of 4 forks per checkpoint. (based on this
> patch and the drbd patch)
>
>   At 25 checkpoints per second, you are looking at roughly a 100 fork/execs per
> second.
>
>

Hi Shriram, thanks for the review. Actually, the remus checkpoint ops do
fork, and the child exits soon afterwards, assuming the syscalls are quick.

>

-- 
Thanks,
Yang.


* Re: [PATCH v10 4/5] remus: implement remus network buffering for nic devices
  2014-06-05 17:56         ` Shriram Rajagopalan
@ 2014-06-06  2:08           ` Hongyang Yang
  0 siblings, 0 replies; 44+ messages in thread
From: Hongyang Yang @ 2014-06-06  2:08 UTC (permalink / raw)
  To: rshriram, Ian Jackson
  Cc: Lai Jiangshan, Wen Congyang, Stefano Stabellini, Andrew Cooper,
	Jiang Yunhong, Dong Eddie, xen-devel, Ian Campbell,
	Roger Pau Monné

On 06/06/2014 01:56 AM, Shriram Rajagopalan wrote:
>
> On Jun 5, 2014 11:14 PM, "Ian Jackson" <Ian.Jackson@eu.citrix.com> wrote:
>  >
>  > Ian Jackson writes ("Re: [PATCH v10 4/5] remus: implement remus network buffering for nic devices"):
>  > > Shriram Rajagopalan writes ("Re: [PATCH v10 4/5] remus: implement remus network buffering for nic devices"):
>  > > ...
>  > > > The async execution for each netlink call is an overkill.  These
>  > > > rtnl calls complete in a matter of few microseconds utmost. On the
>  > > > other hand, this code structure, fork/execs a new process for every
>  > > > checkpoint just to execute a single library call (netbuf_epoch_op),
>  > > > which in turn issues just a syscall.
>  > >
>  > > I haven't read the code to check whether this criticism is accurate,
>  > > but if it is I think it would be justified.
>  > >
>  > > There is no need to use the async machinery for fast system calls.
>  >
>  > Having read Shriram's other mail, I feel the need to emphasise the
>  > qualification "fast".
>  >
>  > "Fast" means "cannot ever, even in error conditions, take a
>  > significant amount of time".  In particular anything that waits for
>  > incoming network traffic is not "fast".
>  >
>  > But AFAICT by looking at the code we are talking only about these
>  > calls:
>  >   rtnl_qdisc_plug_buffer
>  >   rtnl_qdisc_plug_release_one
>  >   rtnl_qdisc_add
>  > Surely these always complete immediately.
>  >
>
> Yes. They boil down to a netlink syscall that simply communicates with qdisc
> manipulation routines in the kernel. They have nothing to do with waiting for
> network traffic.

The question is whether these checkpoint ops need to be async. For the
netbuffer and drbd calls it may not be necessary, since they are quick.
But we cannot be sure that the ops of other remus device types (which
are not implemented yet) will be quick enough. So for the interface, is
making them async ops a good choice for now?
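
One way to reconcile the two positions is an interface that is
asynchronous in shape but lets a fast device complete inline. The
following is a minimal sketch of that idea, not the code from this
series: all sketch_* names are hypothetical, and sketch_event_fired()
merely stands in for whatever the event loop would deliver (a child
exiting, an fd becoming readable).

```c
#include <assert.h>
#include <stddef.h>

typedef struct sketch_dev sketch_dev;
typedef void sketch_done_fn(sketch_dev *dev, int rc);

struct sketch_dev {
    sketch_done_fn *done;   /* completion callback, set by the caller */
    int pending;            /* nonzero while a deferred op is outstanding */
    int finished;           /* number of successful completions seen */
};

/* Fast op (e.g. a single netlink request): do the work, complete inline. */
static void sketch_fast_commit(sketch_dev *dev)
{
    dev->done(dev, 0);
}

/* Slow op: only start the work here; completion is reported later. */
static void sketch_slow_commit(sketch_dev *dev)
{
    dev->pending = 1;
}

/* Stand-in for the event loop delivering the deferred completion. */
static void sketch_event_fired(sketch_dev *dev, int rc)
{
    assert(dev->pending);
    dev->pending = 0;
    dev->done(dev, rc);
}

/* Caller-side completion callback: count successful completions. */
static void sketch_count_done(sketch_dev *dev, int rc)
{
    if (rc == 0)
        dev->finished++;
}
```

The caller sees the same contract in both cases: the op has finished
when done() has been called. Only device types that may really block
pay for a fork or an event registration.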

>
> Shriram
>

-- 
Thanks,
Yang.


* Re: [PATCH v10 0/5] Remus netbuffer: Network buffering support
  2014-06-05 10:47 ` [PATCH v10 0/5] Remus netbuffer: Network buffering support George Dunlap
@ 2014-06-06  2:17   ` Hongyang Yang
  0 siblings, 0 replies; 44+ messages in thread
From: Hongyang Yang @ 2014-06-06  2:17 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Wen Congyang, Stefano Stabellini, Andrew Cooper,
	Jiang, Yunhong, Ian Jackson, xen-devel, Dong, Eddie,
	Shriram Rajagopalan, Lai Jiangshan, Roger Pau Monné

On 06/05/2014 06:47 PM, George Dunlap wrote:
> On Thu, Jun 5, 2014 at 2:34 AM, Yang Hongyang <yanghy@cn.fujitsu.com> wrote:
>> This patch series adds support for network buffering in the Remus
>> codebase in libxl.
>
> Hongyang,
>
> This is a resend of the series you sent May 21, right?
>
> When you resend a patch series, you can add RESEND to the title (e.g.,
> [PATCH v10 RESEND 0/5]) to highlight the fact that it hasn't received
> any attention.

Hi George, thanks for the comment. It's a rebased series, and yes, it's
the same patch series as the one sent on May 21. I will pay attention to
the subject line in future postings.

>
>   -George
>
>>
>> This is a rebased version of v10 series.The first patch was applied,
>> but can not found on current master, if it's not necessary, please
>> let me know.
>>
>> the code is also hosted on github:
>>
>> url: https://github.com/laijs/xen
>> branch: remus-0605
>>
>> Changes V10:
>>    Restructured the whole patch series.
>>    Introduce the remus device abstract layer.
>>    Make remus checkpoint asynchronous.
>>
>> Changes in V9:
>>    Use async exec script api to exec scripts.
>>
>> Changes in V8:
>>    Applied some comments(by IanJ).
>>    Merge some struct definitions to it's implementation.
>>    (2/3/5 in V7 => 3 in V8)
>>
>> Changes in V7:
>>    Applied missing comments(by IanJ).
>>    Applied Shriram comments.
>>
>>    merge netbufering tangled setup/teardown code into one patch.
>>    (2/6/8 in V6 => 5 in V7. 9/10 in V6 => 7 in V7)
>>
>> Changes in V6:
>>    Applied Ian Jackson's comments of V5 series.
>>    the [PATCH 2/4 V5] is split by small functionalities.
>>
>>    [PATCH 4/4 V5] --> [PATCH 13/13] netbuffer is default enabled.
>>
>> Changes in V5:
>>
>> Merge hotplug script patch (2/5) and hotplug script setup/teardown
>> patch (3/5) into a single patch.
>>
>> Changes in V4:
>>
>> [1/5] Remove check for libnl command line utils in autoconf checks
>>
>> [2/5] minor nits
>>
>> [3/5] define LIBXL_HAVE_REMUS_NETBUF in libxl.h
>>
>> [4/5] clean ups. Make the usleep in checkpoint callback asynchronous
>>
>> [5/5] minor nits
>>
>> Changes in V3:
>> [1/5] Fix redundant checks in configure scripts
>>        (based on Ian Campbell's suggestions)
>>
>> [2/5] Introduce locking in the script, during IFB setup.
>>        Add xenstore paths used by netbuf scripts
>>        to xenstore-paths.markdown
>>
>> [3/5] Hotplug scripts setup/teardown invocations are now asynchronous
>>        following IanJ's feedback.  However, the invocations are still
>>        sequential.
>>
>> [5/5] Allow per-domain specification of netbuffer scripts in xl remus
>>        commmand.
>>
>> And minor nits throughout the series based on feedback from
>> the last version
>>
>> Changes in V2:
>> [1/5] Configure script will automatically enable/disable network
>>        buffer support depending on the availability of the appropriate
>>        libnl3 version. [If libnl3 is unavailable, a warning message will be
>>        printed to let the user know that the feature has been disabled.]
>>
>>        use macros from pkg.m4 instead of pkg-config commands
>>        removed redundant checks for libnl3 libraries.
>>
>> [3,4/5] - Minor nits.
>>
>> Version 1:
>>
>> [1/5] Changes to autoconf scripts to check for libnl3. Add linker flags
>>        to libxl Makefile.
>>
>> [2/5] External script to setup/teardown network buffering using libnl3's
>>        CLI. This script will be invoked by libxl before starting Remus.
>>        The script's main job is to bring up an IFB device with plug qdisc
>>        attached to it.  It then re-routes egress traffic from the guest's
>>        vif to the IFB device.
>>
>> [3/5] Libxl code to invoke the external setup script, followed by netlink
>>        related setup to obtain a handle on the output buffers attached
>>        to each vif.
>>
>> [4/5] Libxl interaction with network buffer module in the kernel via
>>        libnl3 API.
>>
>> [5/5] xl cmdline switch to explicitly enable network buffering when
>>        starting remus.
>>
>>
>>    Few things to note(by shriram):
>>
>>      a) Based on previous email discussions, the setup/teardown task has
>>      been moved to a hotplug style shell script which can be customized as
>>      desired, instead of implementing it as C code inside libxl.
>>
>>      b) Libnl3 is not available on NetBSD. Nor is it available on CentOS
>>     (Linux).  So I have made network buffering support an optional feature
>>     so that it can be disabled if desired.
>>
>>     c) NetBSD does not have libnl3. So I have put the setup script under
>>     tools/hotplug/Linux folder.
>>
>> thanks
>>
>> Shriram Rajagopalan (1):
>>    libxl: network buffering cmdline switch
>>
>> Yang Hongyang (4):
>>    libxl: introduce asynchronous execution API
>>    remus: add libnl3 dependency for network buffering support
>>    remus: introduce remus device
>>    remus: implement remus network buffering for nic devices
>>
>>   README                                 |   4 +
>>   config/Tools.mk.in                     |   4 +
>>   docs/man/xl.conf.pod.5                 |   6 +
>>   docs/man/xl.pod.1                      |  11 +-
>>   docs/misc/xenstore-paths.markdown      |   4 +
>>   tools/configure.ac                     |  15 +
>>   tools/hotplug/Linux/Makefile           |   1 +
>>   tools/hotplug/Linux/remus-netbuf-setup | 183 +++++++++++
>>   tools/libxl/Makefile                   |  15 +
>>   tools/libxl/libxl.c                    |  52 +++-
>>   tools/libxl/libxl.h                    |  13 +
>>   tools/libxl/libxl_aoutils.c            |  89 ++++++
>>   tools/libxl/libxl_device.c             |  78 ++---
>>   tools/libxl/libxl_dom.c                | 132 +++++++-
>>   tools/libxl/libxl_internal.h           | 151 ++++++++-
>>   tools/libxl/libxl_netbuffer.c          | 550 +++++++++++++++++++++++++++++++++
>>   tools/libxl/libxl_nonetbuffer.c        |  98 ++++++
>>   tools/libxl/libxl_remus_device.c       | 323 +++++++++++++++++++
>>   tools/libxl/libxl_save_msgs_gen.pl     |   2 +-
>>   tools/libxl/libxl_types.idl            |   3 +
>>   tools/libxl/xl.c                       |   4 +
>>   tools/libxl/xl.h                       |   1 +
>>   tools/libxl/xl_cmdimpl.c               |  28 +-
>>   tools/libxl/xl_cmdtable.c              |   3 +
>>   tools/remus/README                     |   6 +
>>   25 files changed, 1694 insertions(+), 82 deletions(-)
>>   create mode 100644 tools/hotplug/Linux/remus-netbuf-setup
>>   create mode 100644 tools/libxl/libxl_netbuffer.c
>>   create mode 100644 tools/libxl/libxl_nonetbuffer.c
>>   create mode 100644 tools/libxl/libxl_remus_device.c
>>
>> --
>> 1.9.1
>>
>>
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH v10] remus drbd: Implement remus drbd replicated disk
  2014-06-05 16:25     ` Shriram Rajagopalan
  2014-06-05 17:41       ` Ian Jackson
@ 2014-06-06  2:21       ` Hongyang Yang
  1 sibling, 0 replies; 44+ messages in thread
From: Hongyang Yang @ 2014-06-06  2:21 UTC (permalink / raw)
  To: rshriram
  Cc: Lai Jiangshan, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, xen-devel, Andrew Cooper, Roger Pau Monné,
	Ian Campbell

On 06/06/2014 12:25 AM, Shriram Rajagopalan wrote:
> On Wed, Jun 4, 2014 at 8:39 PM, Yang Hongyang <yanghy@cn.fujitsu.com> wrote:
>
>     Implement remus-drbd-replicated-checkpointing-disk based on
>     generic remus devices framework.
>
>     Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>     Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>     Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>     ---
>       tools/hotplug/Linux/Makefile         |   1 +
>       tools/hotplug/Linux/block-drbd-probe |  84 ++++++++++
>       tools/libxl/Makefile                 |   2 +-
>       tools/libxl/libxl_internal.h         |   1 +
>       tools/libxl/libxl_remus_device.c     |  23 ++-
>       tools/libxl/libxl_remus_disk_drbd.c  | 290 +++++++++++++++++++++++++++++++++++
>       6 files changed, 394 insertions(+), 7 deletions(-)
>       create mode 100755 tools/hotplug/Linux/block-drbd-probe
>       create mode 100644 tools/libxl/libxl_remus_disk_drbd.c
>
>     diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
>     index 13e1f5f..5dd8599 100644
>     --- a/tools/hotplug/Linux/Makefile
>     +++ b/tools/hotplug/Linux/Makefile
>     @@ -23,6 +23,7 @@ XEN_SCRIPTS += xen-hotplug-cleanup
>       XEN_SCRIPTS += external-device-migrate
>       XEN_SCRIPTS += vscsi
>       XEN_SCRIPTS += block-iscsi
>     +XEN_SCRIPTS += block-drbd-probe
>       XEN_SCRIPTS += $(XEN_SCRIPTS-y)
>
>       XEN_SCRIPT_DATA = xen-script-common.sh locking.sh logging.sh
>     diff --git a/tools/hotplug/Linux/block-drbd-probe b/tools/hotplug/Linux/block-drbd-probe
>     new file mode 100755
>     index 0000000..163ad04
>     --- /dev/null
>     +++ b/tools/hotplug/Linux/block-drbd-probe
>     @@ -0,0 +1,84 @@
>     +#! /bin/bash
>     +#
>     +# Copyright (C) 2014 FUJITSU LIMITED
>     +#
>     +# This library is free software; you can redistribute it and/or
>     +# modify it under the terms of version 2.1 of the GNU Lesser General Public
>     +# License as published by the Free Software Foundation.
>     +#
>     +# This library is distributed in the hope that it will be useful,
>     +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>     +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>     +# Lesser General Public License for more details.
>     +#
>     +# You should have received a copy of the GNU Lesser General Public
>     +# License along with this library; if not, write to the Free Software
>     +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
>     +#
>     +# Usage:
>     +#     block-drbd-probe devicename
>     +#
>     +# Return value:
>     +#     0: the device is drbd device
>     +#     1: the device is not drbd device
>     +#     2: unkown error
>     +#     3: the drbd device does not use protocol D
>     +#     4: the drbd device is not ready
>     +
>     +drbd_res=
>     +
>     +function get_res_name()
>     +{
>     +    local drbd_dev=$1
>     +    local drbd_dev_list=($(drbdadm sh-dev all))
>     +    local drbd_res_list=($(drbdadm sh-resource all))
>     +    local temp_drbd_dev temp_drbd_res
>     +    local found=0
>     +
>     +    for temp_drbd_dev in ${drbd_dev_list[@]}; do
>     +        if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
>     +            found=1
>     +            break
>     +        fi
>     +    done
>     +
>     +    if [[ $found -eq 0 ]]; then
>     +        return 1
>     +    fi
>     +
>     +    for temp_drbd_res in ${drbd_res_list[@]}; do
>     +        temp_drbd_dev=$(drbdadm sh-dev $temp_drbd_res)
>     +        if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
>     +            drbd_res="$temp_drbd_res"
>     +            return 0
>     +        fi
>     +    done
>     +
>     +    # OOPS
>     +    return 2
>     +}
>     +
>     +get_res_name $1
>     +if [[ $? -ne 0 ]]; then
>     +    exit $?
>     +fi
>     +
>     +# check protocol
>     +drbdsetup $1 show | grep -q "protocol D;"
>     +if [[ $? -ne 0 ]]; then
>     +    exit 3
>     +fi
>     +
>     +# check connect status
>     +state=$(drbdadm cstate "$drbd_res")
>     +if [[ "$state" != "Connected" ]]; then
>     +    exit 4
>     +fi
>     +
>     +# check role
>     +role=$(drbdadm role "$drbd_res")
>     +if [[ "$role" != "Primary/Secondary" ]]; then
>     +    exit 4
>     +fi
>     +
>     +exit 0
>     diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
>     index 7a722a8..6f4d9b4 100644
>     --- a/tools/libxl/Makefile
>     +++ b/tools/libxl/Makefile
>     @@ -56,7 +56,7 @@ else
>       LIBXL_OBJS-y += libxl_nonetbuffer.o
>       endif
>
>     -LIBXL_OBJS-y += libxl_remus_device.o
>     +LIBXL_OBJS-y += libxl_remus_device.o libxl_remus_disk_drbd.o
>
>       LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
>       LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
>     diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
>     index f221f97..47a4ab9 100644
>     --- a/tools/libxl/libxl_internal.h
>     +++ b/tools/libxl/libxl_internal.h
>     @@ -2519,6 +2519,7 @@ struct libxl__remus_device_state {
>
>           libxl_device_nic *nics;
>           int num_nics;
>     +    libxl_device_disk *disks;
>           int num_disks;
>
>           /* for counting devices that have been handled */
>     diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
>     index 5f07266..040441a 100644
>     --- a/tools/libxl/libxl_remus_device.c
>     +++ b/tools/libxl/libxl_remus_device.c
>     @@ -19,8 +19,10 @@
>       #include "libxl_internal.h"
>
>       extern libxl__remus_device_ops remus_device_nic;
>     +extern libxl__remus_device_ops remus_device_drbd_disk;
>       static libxl__remus_device_ops *dev_ops[] = {
>           &remus_device_nic,
>     +    &remus_device_drbd_disk,
>       };
>
>       static void device_common_cb(libxl__egc *egc,
>     @@ -194,6 +196,13 @@ static void device_teardown_cb(libxl__egc *egc,
>               rds->nics = NULL;
>               rds->num_nics = 0;
>
>     +        /* clean disk */
>     +        for (i = 0; i < rds->num_disks; i++)
>     +            libxl_device_disk_dispose(&rds->disks[i]);
>     +        free(rds->disks);
>     +        rds->disks = NULL;
>     +        rds->num_disks = 0;
>     +
>               /* clean device ops */
>               for (i = 0; i < ARRAY_SIZE(dev_ops); i++) {
>                   ops = dev_ops[i];
>     @@ -269,15 +278,15 @@ void libxl__remus_device_setup(libxl__egc *egc,
>     libxl__remus_state *rs)
>           rds->num_nics = 0;
>           rds->num_disks = 0;
>
>     -    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
>     -
>           if (rs->netbufscript) {
>               rds->nics = libxl_device_nic_list(CTX, rs->domid, &rds->num_nics);
>           }
>     +    rds->disks = libxl_device_disk_list(CTX, rs->domid, &rds->num_disks);
>
>     -    GCNEW_ARRAY(rds->dev, rds->num_nics + rds->num_disks);
>     +    if (rds->num_nics == 0 && rds->num_disks == 0)
>     +        goto out;
>
>     -    /* TBD: CALL libxl__remus_device_init to init remus devices */
>     +    GCNEW_ARRAY(rds->dev, rds->num_nics + rds->num_disks);
>
>           if (rs->netbufscript && rds->nics) {
>               for (i = 0; i < rds->num_nics; i++) {
>     @@ -286,8 +295,10 @@ void libxl__remus_device_setup(libxl__egc *egc,
>     libxl__remus_state *rs)
>               }
>           }
>
>     -    if (rds->num_nics == 0 && rds->num_disks == 0)
>     -        goto out;
>     +    for (i = 0; i < rds->num_disks; i++) {
>     +        libxl__remus_device_init(egc, rds,
>     +                                 LIBXL__REMUS_DEVICE_DISK, &rds->disks[i]);
>     +    }
>
>           return;
>
>     diff --git a/tools/libxl/libxl_remus_disk_drbd.c
>     b/tools/libxl/libxl_remus_disk_drbd.c
>     new file mode 100644
>     index 0000000..f35a406
>     --- /dev/null
>     +++ b/tools/libxl/libxl_remus_disk_drbd.c
>     @@ -0,0 +1,290 @@
>     +/*
>     + * Copyright (C) 2014 FUJITSU LIMITED
>     + * Author Lai Jiangshan <laijs@cn.fujitsu.com <mailto:laijs@cn.fujitsu.com>>
>     + *
>     + * This program is free software; you can redistribute it and/or modify
>     + * it under the terms of the GNU Lesser General Public License as published
>     + * by the Free Software Foundation; version 2.1 only. with the special
>     + * exception on linking described in file LICENSE.
>     + *
>     + * This program is distributed in the hope that it will be useful,
>     + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>     + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>     + * GNU Lesser General Public License for more details.
>     + */
>     +
>     +#include "libxl_osdeps.h" /* must come before any other headers */
>     +
>     +#include "libxl_internal.h"
>     +
>     +/*** drbd implementation ***/
>     +const int DRBD_SEND_CHECKPOINT = 20;
>     +const int DRBD_WAIT_CHECKPOINT_ACK = 30;
>     +
>     +typedef struct libxl__remus_drbd_disk {
>     +    libxl__remus_device remus_dev;
>     +    int ctl_fd;
>     +    int ackwait;
>     +    const char *path;
>     +} libxl__remus_drbd_disk;
>     +
>     +typedef struct libxl__remus_drbd_state {
>     +    libxl__ao *ao;
>     +    char *drbd_probe_script;
>     +} libxl__remus_drbd_state;
>     +
>     +static void drbd_async_call(libxl__remus_device *dev,
>     +                            void func(libxl__remus_device *),
>     +                            libxl__ev_child_callback callback)
>     +{
>     +    int pid = -1;
>     +    STATE_AO_GC(dev->rds->ao);
>     +
>     +    /* Fork and call */
>     +    pid = libxl__ev_child_fork(gc, &dev->child, callback);
>     +    if (pid == -1) {
>     +        LOG(ERROR, "unable to fork");
>     +        goto out;
>     +    }
>     +
>     +    if (!pid) {
>     +        /* child */
>     +        func(dev);
>     +        /* notreached */
>     +        abort();
>     +    }
>     +
>     +    return;
>     +
>     +out:
>     +    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
>     +}
>     +
>     +static void chekpoint_async_call_done(libxl__egc *egc,
>     +                                      libxl__ev_child *child,
>     +                                      pid_t pid, int status)
>     +{
>     +    libxl__remus_device *dev = CONTAINER_OF(child, *dev, child);
>     +    libxl__remus_drbd_disk *rdd = dev->data;
>     +    STATE_AO_GC(dev->rds->ao);
>     +
>     +    if (WIFEXITED(status)) {
>     +        rdd->ackwait = WEXITSTATUS(status);
>     +        dev->callback(egc, dev, 0);
>     +    } else {
>     +        dev->callback(egc, dev, ERROR_FAIL);
>     +    }
>     +}
>     +
>     +static void drbd_postsuspend_async(libxl__remus_device *dev)
>     +{
>     +    libxl__remus_drbd_disk *rdd = dev->data;
>     +    int ackwait = rdd->ackwait;
>     +
>     +    if (!ackwait) {
>     +        if (ioctl(rdd->ctl_fd, DRBD_SEND_CHECKPOINT, 0) <= 0)
>     +            ackwait = 1;
>     +    }
>     +
>     +    _exit(ackwait);
>     +}
>     +
>     +static void drbd_postsuspend(libxl__remus_device *dev)
>     +{
>     +    drbd_async_call(dev, drbd_postsuspend_async, chekpoint_async_call_done);
>     +}
>     +
>     +static void drbd_preresume_async(libxl__remus_device *dev)
>     +{
>     +    libxl__remus_drbd_disk *rdd = dev->data;
>     +    int ackwait = rdd->ackwait;
>     +
>     +    if (ackwait) {
>     +        ioctl(rdd->ctl_fd, DRBD_WAIT_CHECKPOINT_ACK, 0);
>     +        ackwait = 0;
>     +    }
>     +
>     +    _exit(ackwait);
>     +}
>     +
>     +static void drbd_preresume(libxl__remus_device *dev)
>     +{
>     +    drbd_async_call(dev, drbd_preresume_async, chekpoint_async_call_done);
>     +}
>     +
>
>
>
> Please get rid of the async execution just to execute a sys call. Not to mention
> a fork & exec per sys call, per checkpoint would just add more overhead than what
> can be gleaned through async execution.
>
> But the setup and teardown can use the async execution drbd_async_call as they
> involve
> invoking the scripts.
>
> Apart from that, the rest of the code looks fine structurally.

Hi Shriram, thanks again for the review. Per checkpoint we only fork and
exit shortly afterwards; no script is executed.

>
>
> shriram

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10 0/5] Remus netbuffer: Network buffering support
  2014-06-05 16:12 ` Ian Jackson
@ 2014-06-06  2:26   ` Hongyang Yang
  0 siblings, 0 replies; 44+ messages in thread
From: Hongyang Yang @ 2014-06-06  2:26 UTC (permalink / raw)
  To: Ian Jackson
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

On 06/06/2014 12:12 AM, Ian Jackson wrote:
> Yang Hongyang writes ("[PATCH v10 0/5] Remus netbuffer: Network buffering support"):
>> the code is also hosted on github:
>>
>> url: https://github.com/laijs/xen
>> branch: remus-0605
>
> I see that this branch contains a commit "remus drbd: Implement remus
> drbd replicated disk" which is not in the 5-patch series you posted.
> I guess that that commit was added later - I can see it in my inbox.
>
> If you find the need to add a new patch to a series and update the
> already-published git branch, you should send a heads-up to the
> recipients of your 0/N email.  When you do this it would be better not
> to reuse the existing published git ref name, but to make a new one,
> unless you do the followup very quickly after the initial email.
>
> This is useful because the committers and reviewers rely on the two
> versions of the series (available via git, and via email) being
> identical.
>
> Also, if you send the new patch as N+1/N, eg, in this case
>    Subject: [PATCH v10 6/5] remus drbd: Implement remus drbd replicated disk
> then it becomes a bit clear that it's supposed to tie into the series.
>
> I will review your drbd patch as part of the series.

Hi Ian, thanks for the review. I will combine the drbd and netbuffer
patches into one patchset in the next version.

>
> Thanks,
> Ian.
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10] remus drbd: Implement remus drbd replicated disk
  2014-06-05 18:14         ` Shriram Rajagopalan
  2014-06-05 18:26           ` Ian Jackson
@ 2014-06-06  5:38           ` Hongyang Yang
  2014-06-06  7:12             ` Shriram Rajagopalan
  2014-06-06 11:18             ` Ian Jackson
  1 sibling, 2 replies; 44+ messages in thread
From: Hongyang Yang @ 2014-06-06  5:38 UTC (permalink / raw)
  To: rshriram, Ian Jackson
  Cc: Lai Jiangshan, Wen Congyang, Andrew Cooper, Jiang Yunhong,
	Dong Eddie, xen-devel, Ian Campbell, Roger Pau Monné

On 06/06/2014 02:14 AM, Shriram Rajagopalan wrote:
>
> On Jun 5, 2014 11:11 PM, "Ian Jackson" <Ian.Jackson@eu.citrix.com
> <mailto:Ian.Jackson@eu.citrix.com>> wrote:
>  >
>  > Shriram Rajagopalan writes ("Re: [PATCH v10] remus drbd: Implement remus drbd
> replicated disk"):
>  > > On Wed, Jun 4, 2014 at 8:39 PM, Yang Hongyang <yanghy@cn.fujitsu.com
> <mailto:yanghy@cn.fujitsu.com>> wrote:
>  > >     +    if (ackwait) {
>  > >     +        ioctl(rdd->ctl_fd, DRBD_WAIT_CHECKPOINT_ACK, 0);
>  > >     +        ackwait = 0;
>  > >     +    }
>  > ...
>  > > Please get rid of the async execution just to execute a sys
>  > > call.
>  >
>  > Are you sure ?  Does this syscall not await network traffic ?
>  >
>
> It does. But the design is such that the disk and memory checkpoints are
> simultaneously transmitted. So by the time this call is made, the ack is already
> in the system.
> -- this is the common case. Covers about 90% of the calls (since disk traffic is
> pretty low compared to memory checkpoint).
>
>  > What if the network is broken ?  Might it not then delay indefinitely ?
>
> Nope.  I designed the relevant drbd code such that the ioctl wait times out
> (configurable) in worst case, returning an error. The time out is generally
> about 300ms. This code path is exercised only during failures.
>
> So, a one-time error condition and few slow checkpoints out of an indefinite
> number of checkpoints don't warrant a fork per ioctl call (which usually returns
> immediately).

Can we use the following approach:
   The per-checkpoint interface remains async, but in the implementation,
because the syscalls are fast enough, we simply make a synchronous call and
invoke the async callback when it is done?

>
>  >
>  > > Not to mention a fork & exec per sys call,
>  >
>  > In fact there is no exec, only a fork.
>  >
>  > Ian.
>  >
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10 5/5] libxl: network buffering cmdline switch
  2014-06-05 17:30   ` [PATCH v10 5/5] libxl: network buffering cmdline switch Ian Jackson
@ 2014-06-06  6:34     ` Hongyang Yang
  2014-06-06  7:26       ` Shriram Rajagopalan
  2014-06-06 11:13       ` Ian Jackson
  0 siblings, 2 replies; 44+ messages in thread
From: Hongyang Yang @ 2014-06-06  6:34 UTC (permalink / raw)
  To: Ian Jackson
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

On 06/06/2014 01:30 AM, Ian Jackson wrote:
> Yang Hongyang writes ("[PATCH v10 5/5] libxl: network buffering cmdline switch"):
>> Command line switch to 'xl remus' command, to enable network buffering.
>> Pass on this flag to libxl so that it can act accordingly.
>
> You provide a global option to control the script, but no per-domain
> config option.  Why ?
>
> A similar question arises about the network buffering boolean.
>
> Wouldn't it be better if these were options on the devices, in the
> domain configuration ?

Do you mean we should move the "-n -N" options into the domain configuration?
I think these options are only related to remus and may not be used that
often, because we provide a default network script which should be suitable
for most cases. These options are a secondary choice for users and may not
be worth setting in the domain configuration.

>
> Feel free to tell me I'm wrong and it is better this way, if that's
> true - just explain it.
>
>> +     * TODO: Split-Brain check.
>
> What are your plans for the split brain check ?

It's hard to do the split brain check under the current implementation because
there's only one remus connection between the two domains. We may need to add
a heartbeat module to do this.

>
> Thanks,
> Ian.
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10 2/5] remus: add libnl3 dependency for network buffering support
  2014-06-06  1:48     ` Hongyang Yang
@ 2014-06-06  6:45       ` Shriram Rajagopalan
  2014-06-06 10:07         ` Ian Campbell
  2014-06-06 11:04       ` Ian Jackson
  1 sibling, 1 reply; 44+ messages in thread
From: Shriram Rajagopalan @ 2014-06-06  6:45 UTC (permalink / raw)
  To: Hongyang Yang
  Cc: Ian Campbell, Wen Congyang, Stefano Stabellini, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen-devel, Dong Eddie,
	Roger Pau Monné,
	Lai Jiangshan


[-- Attachment #1.1: Type: text/plain, Size: 3602 bytes --]

On Thu, Jun 5, 2014 at 8:48 PM, Hongyang Yang <yanghy@cn.fujitsu.com> wrote:

> On 06/06/2014 12:18 AM, Ian Jackson wrote:
>
>> Yang Hongyang writes ("[PATCH v10 2/5] remus: add libnl3 dependency for
>> network buffering support"):
>>
>>> Libnl3 is required for controlling Remus network buffering.
>>> This patch adds dependency on libnl3 (>= 3.2.8) to autoconf scripts.
>>> Also provide ability to configure tools without libnl3 support, that
>>> is without network buffering support.
>>>
>>
>> This patch looks broadly good to me.  I have some very minor comments
>> about the details.
>>
>>  when there's no network buffering support,libxl__netbuffer_enabled()
>>> returns 0, otherwise returns 1.
>>>
>>
>> The commit message should explicitly state that callers will be
>> introduced in the rest of the series.
>>
>>  Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
>>>
>>
>> For a patch which changes configure.ac, it would be helpful to add a
>> reminder (for the commiter) to rerun autogen.sh.  This should ideally
>> appear just before the first Signed-off-by.  The committer should
>> delete the note, and rerun autogen.sh, as they apply the patch.
>>
>
> Thanks, will add the comment.
>
>
>
>>  Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
>>>
>> ...
>>
>>> +# Check for libnl3 >=3.2.8. If present enable remus network buffering.
>>> +PKG_CHECK_MODULES(LIBNL3, [libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8],
>>> +    [libnl3_lib="y"], [libnl3_lib="n"])
>>> +
>>> +AS_IF([test "x$libnl3_lib" = "xn" ], [
>>> +    AC_MSG_WARN([Disabling support for Remus network buffering.
>>> +    Please install libnl3 libraries, command line tools and devel
>>> +    headers - version 3.2.8 or higher])
>>> +    AC_SUBST(remus_netbuf, [n])
>>> +    ],[
>>> +    AC_SUBST(LIBNL3_LIBS)
>>> +    AC_SUBST(LIBNL3_CFLAGS)
>>>
>>
>> It might be better to put these AC_SUBSTs into the main body of
>> configure.ac ?  Like this:
>>
>>     diff --git a/tools/configure.ac b/tools/configure.ac
>>     index 38d2d05..ee36707 100644
>>     --- a/tools/configure.ac
>>     +++ b/tools/configure.ac
>>     @@ -257,10 +257,11 @@ AS_IF([test "x$libnl3_lib" = "xn" ], [
>>          headers - version 3.2.8 or higher])
>>          AC_SUBST(remus_netbuf, [n])
>>          ],[
>>     -    AC_SUBST(LIBNL3_LIBS)
>>     -    AC_SUBST(LIBNL3_CFLAGS)
>>          AC_SUBST(remus_netbuf, [y])
>>      ])
>>
>>     +AC_SUBST(LIBNL3_LIBS)
>>     +AC_SUBST(LIBNL3_CFLAGS)
>>     +
>>      AC_OUTPUT()
>>
>
> yes, i'll try that, if we do these, then the following check of
> CONFIG_REMUS_NETBUF in libxl Makefile will no longer need:
>
>
>
>  LIBXL_LIBS =
>  LIBXL_LIBS = $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest)
> $(LDLIBS_libxenstore) $(LDLIBS_libblktapctl) $(PTYFUNCS_LIBS)
> $(LIBUUID_LIBS)
> +ifeq ($(CONFIG_REMUS_NETBUF),y)
> +LIBXL_LIBS += $(LIBNL3_LIBS)
> +endif
>
>  CFLAGS_LIBXL += $(CFLAGS_libxenctrl)
>  CFLAGS_LIBXL += $(CFLAGS_libxenguest)
>  CFLAGS_LIBXL += $(CFLAGS_libxenstore)
>  CFLAGS_LIBXL += $(CFLAGS_libblktapctl)
> +ifeq ($(CONFIG_REMUS_NETBUF),y)
> +CFLAGS_LIBXL += $(LIBNL3_CFLAGS)
> +endif
>  CFLAGS_LIBXL += -Wshadow
>
>
>
IIRC, starting with GCC 4.6, that check is not needed anyway, because
gcc will prevent unused shared libraries from being linked into libxl.
However, with earlier GCC versions, these libraries will get linked unless
the -as-needed flag is supplied, which was the reason I had that check in
place.


>
>> Thanks,
>> Ian.
>> .
>>
>>
> --
> Thanks,
> Yang.
>
>

[-- Attachment #1.2: Type: text/html, Size: 6138 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10] remus drbd: Implement remus drbd replicated disk
  2014-06-06  5:38           ` Hongyang Yang
@ 2014-06-06  7:12             ` Shriram Rajagopalan
  2014-06-06 11:18             ` Ian Jackson
  1 sibling, 0 replies; 44+ messages in thread
From: Shriram Rajagopalan @ 2014-06-06  7:12 UTC (permalink / raw)
  To: Hongyang Yang
  Cc: Lai Jiangshan, Wen Congyang, Andrew Cooper, Jiang Yunhong,
	Ian Jackson, xen-devel, Dong Eddie, Ian Campbell,
	Roger Pau Monné


[-- Attachment #1.1: Type: text/plain, Size: 2255 bytes --]

On Fri, Jun 6, 2014 at 12:38 AM, Hongyang Yang <yanghy@cn.fujitsu.com>
wrote:

> On 06/06/2014 02:14 AM, Shriram Rajagopalan wrote:
>
>>
>> On Jun 5, 2014 11:11 PM, "Ian Jackson" <Ian.Jackson@eu.citrix.com
>> <mailto:Ian.Jackson@eu.citrix.com>> wrote:
>>  >
>>  > Shriram Rajagopalan writes ("Re: [PATCH v10] remus drbd: Implement
>> remus drbd
>> replicated disk"):
>>  > > On Wed, Jun 4, 2014 at 8:39 PM, Yang Hongyang <yanghy@cn.fujitsu.com
>> <mailto:yanghy@cn.fujitsu.com>> wrote:
>>  > >     +    if (ackwait) {
>>  > >     +        ioctl(rdd->ctl_fd, DRBD_WAIT_CHECKPOINT_ACK, 0);
>>  > >     +        ackwait = 0;
>>  > >     +    }
>>  > ...
>>  > > Please get rid of the async execution just to execute a sys
>>  > > call.
>>  >
>>  > Are you sure ?  Does this syscall not await network traffic ?
>>  >
>>
>> It does. But the design is such that the disk and memory checkpoints are
>> simultaneously transmitted. So by the time this call is made, the ack is
>> already
>> in the system.
>> -- this is the common case. Covers about 90% of the calls (since disk
>> traffic is
>> pretty low compared to memory checkpoint).
>>
>>  > What if the network is broken ?  Might it not then delay indefinitely ?
>>
>> Nope.  I designed the relevant drbd code such that the ioctl wait times
>> out
>> (configurable) in worst case, returning an error. The time out is
>> generally
>> about 300ms. This code path is exercised only during failures.
>>
>> So, a one-time error condition and few slow checkpoints out of an
>> indefinite
>> number of checkpoints don't warrant a fork per ioctl call (which usually
>> returns
>> immediately).
>>
>
> Can we use the following approach:
>   The interface for per checkpoint remains async. but in the
> implementation,
> because the syscalls are fast enough, we can simply make it a sync call and
> call the async callback when done?
>
>

Yes, that should work for the fast path - which is the common case.

I know that there is no script execution.
My problem is with forking itself. Quick is a very relative term: fork may
appear quick under simple test cases, but not under moderate to heavy loads
(say there are multiple remus instances running on a host). The CPU overhead
in Dom0 will no longer be sustainable.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10 5/5] libxl: network buffering cmdline switch
  2014-06-06  6:34     ` Hongyang Yang
@ 2014-06-06  7:26       ` Shriram Rajagopalan
  2014-06-06 11:13       ` Ian Jackson
  1 sibling, 0 replies; 44+ messages in thread
From: Shriram Rajagopalan @ 2014-06-06  7:26 UTC (permalink / raw)
  To: Hongyang Yang
  Cc: Ian Campbell, Wen Congyang, Stefano Stabellini, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen-devel, Dong Eddie,
	Roger Pau Monné,
	Lai Jiangshan


[-- Attachment #1.1: Type: text/plain, Size: 2392 bytes --]

On Fri, Jun 6, 2014 at 1:34 AM, Hongyang Yang <yanghy@cn.fujitsu.com> wrote:

> On 06/06/2014 01:30 AM, Ian Jackson wrote:
>
>> Yang Hongyang writes ("[PATCH v10 5/5] libxl: network buffering cmdline
>> switch"):
>>
>>> Command line switch to 'xl remus' command, to enable network buffering.
>>> Pass on this flag to libxl so that it can act accordingly.
>>>
>>
>> You provide a global option to control the script, but no per-domain
>> config option.  Why ?
>>
>>
There is a per-domain option, "-N", to provide custom netbuffer scripts.


>> A similar question arises about the network buffering boolean.
>>
>> Wouldn't it be better if these were options on the devices, in the
>> domain configuration ?
>>
>
> Do you mean we make "-n -N" options into domain configuration?
> I think these options are only related to remus and may not be used that
> often because we provided a default network script which would be suitable
> for most cases. these options are sort of second choices for users, may not
> worth to be set in the domain configuration.
>
>
>
>> Feel free to tell me I'm wrong and it is better this way, if that's
>> true - just explain it.
>>
>>  +     * TODO: Split-Brain check.
>>>
>>
>> What are your plans for the split brain check ?
>>
>
>
For the moment, a DRBD backed VM will have much less chance of having a
split-brain than non-drbd cases. DRBD has built-in split-brain resolution
and is capable of interfacing with a wide variety of external subsystems
like corosync etc.

Split-brain is a complicated issue. Heartbeats will work on a LAN and avoid
failover based on spurious timeouts. However, on a larger network where
there are several elements in between the primary and backup, you can't
really know whether the primary died or the link died. You will then need
quorum, reachability based failover, etc. All of this is policy driven, so
it's best left to the user.

My plan was to add another option that would allow the user to provide her
own scripts for checking the liveness of the remote host. Depending on the
script's return value, remus checkpoints may continue/terminate & promote
backup->primary, etc.
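
A minimal sketch of how such a user-supplied check could be wired up,
assuming a purely hypothetical exit-code convention (0 = remote host alive,
1 = remote host dead, anything else = cannot tell). None of the names below
exist in libxl, and the blocking waitpid() is only for illustration; real
code would go through libxl's asynchronous child handling.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical actions a remus policy layer could take. */
enum liveness_action { KEEP_CHECKPOINTING, PROMOTE_BACKUP, ABORT_REMUS };

/* Assumed convention: 0 = alive, 1 = dead, anything else = unknown. */
enum liveness_action action_for_exit_status(int code)
{
    switch (code) {
    case 0:  return KEEP_CHECKPOINTING;
    case 1:  return PROMOTE_BACKUP;
    default: return ABORT_REMUS;
    }
}

/* Run the user's check through /bin/sh and map its exit status. */
enum liveness_action run_liveness_check(const char *shell_cmd)
{
    int status;
    pid_t pid, w;

    assert(shell_cmd != NULL);

    pid = fork();
    if (pid == -1)
        return ABORT_REMUS;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", shell_cmd, (char *)NULL);
        _exit(127); /* only reached if exec failed */
    }

    /* Blocking wait, retried if interrupted by a signal. */
    do {
        w = waitpid(pid, &status, 0);
    } while (w == -1 && errno == EINTR);

    if (w != pid || !WIFEXITED(status))
        return ABORT_REMUS;
    return action_for_exit_status(WEXITSTATUS(status));
}
```

Keeping the mapping in one small function leaves the policy (what each
exit code means) with the user's script rather than with the toolstack.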

> It's hard to do the split brain check under current implementation because
> there's only one remus connection between the two domain. We may need to
> add
> a heardbeat module to do this.
>
>
>> Thanks,
>> Ian.
>> .
>>
>>
> --
> Thanks,
> Yang.
>
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10 2/5] remus: add libnl3 dependency for network buffering support
  2014-06-06  6:45       ` Shriram Rajagopalan
@ 2014-06-06 10:07         ` Ian Campbell
  0 siblings, 0 replies; 44+ messages in thread
From: Ian Campbell @ 2014-06-06 10:07 UTC (permalink / raw)
  To: rshriram
  Cc: Lai Jiangshan, Wen Congyang, Stefano Stabellini, Andrew Cooper,
	Jiang Yunhong, Ian Jackson, xen-devel, Dong Eddie, Hongyang Yang,
	Roger Pau Monné

On Fri, 2014-06-06 at 01:45 -0500, Shriram Rajagopalan wrote:

> IIRC, starting with GCC 4.6, that check is not needed anyway, because
> gcc will prevent unused shared libraries from being linked into libxl.
> However,
> with earlier GCC versions, these libraries will get linked unless the
> -as-needed flag
> is supplied. Which was the reason I had that check in place.

Passing -lfoo is still an error if libfoo.so isn't even present though,
right? That's one of the situations which may arise since this is an
optional feature.

Anyway, I think we still support gcc < 4.6 so we need to keep this
conditional stuff around.

Ian.

> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10 2/5] remus: add libnl3 dependency for network buffering support
  2014-06-06  1:48     ` Hongyang Yang
  2014-06-06  6:45       ` Shriram Rajagopalan
@ 2014-06-06 11:04       ` Ian Jackson
  1 sibling, 0 replies; 44+ messages in thread
From: Ian Jackson @ 2014-06-06 11:04 UTC (permalink / raw)
  To: Hongyang Yang
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

Hongyang Yang writes ("Re: [PATCH v10 2/5] remus: add libnl3 dependency for network buffering support"):
> yes, i'll try that, if we do these, then the following check of 
> CONFIG_REMUS_NETBUF in libxl Makefile will no longer need:
> 
> 
>   LIBXL_LIBS =
>   LIBXL_LIBS = $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(LDLIBS_libxenstore) 
> $(LDLIBS_libblktapctl) $(PTYFUNCS_LIBS) $(LIBUUID_LIBS)
> +ifeq ($(CONFIG_REMUS_NETBUF),y)
> +LIBXL_LIBS += $(LIBNL3_LIBS)
> +endif

I thought about this but I wasn't sure that that was right.  Might
there turn out to be other reasons why remus netbuffering would
want to be disabled ?  If not then you are right.

Ian.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10 5/5] libxl: network buffering cmdline switch
  2014-06-06  6:34     ` Hongyang Yang
  2014-06-06  7:26       ` Shriram Rajagopalan
@ 2014-06-06 11:13       ` Ian Jackson
  1 sibling, 0 replies; 44+ messages in thread
From: Ian Jackson @ 2014-06-06 11:13 UTC (permalink / raw)
  To: Hongyang Yang
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

Hongyang Yang writes ("Re: [PATCH v10 5/5] libxl: network buffering cmdline switch"):
> On 06/06/2014 01:30 AM, Ian Jackson wrote:
> > Wouldn't it be better if these were options on the devices, in the
> > domain configuration ?
> 
> Do you mean we make "-n -N" options into domain configuration?

Well, the equivalent information, yes.  Not exactly as those options.

> I think these options are only related to remus and may not be used that
> often because we provided a default network script which would be suitable
> for most cases. these options are sort of second choices for users, may not
> worth to be set in the domain configuration.

There is no problem with having extra options in the domain
configuration which most users do not set.  Users who don't want to
use remus can just ignore them.

The real question here is this: in a single system, is the network
buffering configuration more likely to, for a single domain (a) vary
between different devices or (b) vary between subsequent invocations
of remus ?

If we put the netbuffer config in the arguments to the remus
invocation, (a) becomes impossible.  If we put it in the domain device
configuration, (b) becomes difficult (although perhaps not quite
impossible).

> > Feel free to tell me I'm wrong and it is better this way, if that's
> > true - just explain it.
> >
> >> +     * TODO: Split-Brain check.
> >
> > What are your plans for the split brain check ?
> 
> It's hard to do the split brain check under current implementation because
> there's only one remus connection between the two domain. We may need to add
> a heardbeat module to do this.

Yes.  I guess my question is: do you plan to add hooks/calls/something
to libxl to allow the construction of a coherent system ?  That is, do
you intend that the libxl remus API will acquire the necessary
interfaces to connect to an external heartbeat module ?

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10] remus drbd: Implement remus drbd replicated disk
  2014-06-06  5:38           ` Hongyang Yang
  2014-06-06  7:12             ` Shriram Rajagopalan
@ 2014-06-06 11:18             ` Ian Jackson
  1 sibling, 0 replies; 44+ messages in thread
From: Ian Jackson @ 2014-06-06 11:18 UTC (permalink / raw)
  To: Hongyang Yang
  Cc: Lai Jiangshan, Wen Congyang, Andrew Cooper, Jiang Yunhong,
	Dong Eddie, xen-devel, rshriram, Ian Campbell,
	Roger Pau Monné

Hongyang Yang writes ("Re: [PATCH v10] remus drbd: Implement remus drbd replicated disk"):
> Can we use the following approach:
>    The interface for per checkpoint remains async. but in the implementation,
> because the syscalls are fast enough, we can simply make it a sync call and
> call the async callback when done?

Yes, that is a perfectly legitimate thing to do.  Many of libxl's
internal asynchronous interfaces are sometimes synchronous in that
way.

Callers of such an interface in libxl must typically be aware that the
callback may happen reentrantly.  This should be mentioned as part of
the API specification.  For an example see the comment just above
libxl__bootloader_run.

I think this is probably the right thing to do for the network
buffering syscalls.
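
As an illustration only, here is a minimal sketch of an async-shaped
interface whose implementation is synchronous, so that the completion
callback runs before the entry point returns.  All names are hypothetical
stand-ins and are not taken from libxl; fast_op stands for a quick
syscall wrapper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from libxl. */
typedef struct sketch_dev sketch_dev;
typedef void sketch_done_cb(sketch_dev *dev, int rc);

struct sketch_dev {
    int (*fast_op)(sketch_dev *dev); /* e.g. wrapper around a quick ioctl */
    sketch_done_cb *callback;        /* completion callback set by caller */
    int in_call;                     /* nonzero while the entry point runs */
    int cb_was_reentrant;            /* recorded by the demo callback */
    int last_rc;                     /* result handed to the callback */
};

/* Async-shaped entry point with a synchronous implementation:
 * the callback fires before this function returns. */
void sketch_postsuspend(sketch_dev *dev)
{
    int rc;

    assert(dev != NULL && dev->fast_op != NULL && dev->callback != NULL);
    dev->in_call = 1;
    rc = dev->fast_op(dev);
    dev->callback(dev, rc); /* runs reentrantly, inside the caller's frame */
    dev->in_call = 0;
}

static int sketch_op_ok(sketch_dev *dev)
{
    (void)dev;
    return 0;
}

static void sketch_record_cb(sketch_dev *dev, int rc)
{
    dev->cb_was_reentrant = dev->in_call;
    dev->last_rc = rc;
}

/* Returns 1 if the callback ran reentrantly and reported success. */
int sketch_demo(void)
{
    sketch_dev dev = { sketch_op_ok, sketch_record_cb, 0, 0, -1 };

    sketch_postsuspend(&dev);
    return dev.cb_was_reentrant == 1 && dev.last_rc == 0;
}
```

A caller of sketch_postsuspend() therefore has to finish updating its own
state before making the call, since the callback may already have run by
the time the call returns.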


For the drbd checkpoint wait system call, we have sadly still to come
up with a good solution.  Shriram tells us that this call might, in an
error case, block for a significant fraction of a second.

Ian.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10] remus drbd: Implement remus drbd replicated disk
  2014-06-05 18:26           ` Ian Jackson
@ 2014-06-06 11:23             ` Ian Jackson
  0 siblings, 0 replies; 44+ messages in thread
From: Ian Jackson @ 2014-06-06 11:23 UTC (permalink / raw)
  To: rshriram, Andrew Cooper, Lai Jiangshan, Roger Pau Monné,
	Jiang Yunhong, Wen Congyang, Dong Eddie, Yang Hongyang,
	xen-devel, Ian Campbell

Ian Jackson writes ("Re: [PATCH v10] remus drbd: Implement remus drbd replicated disk"):
> Shriram Rajagopalan writes ("Re: [PATCH v10] remus drbd: Implement remus drbd replicated disk"):
> > Nope.  I designed the relevant drbd code such that the ioctl wait times out
> > (configurable) in worst case, returning an error. The time out is generally
> > about 300ms. This code path is exercised only during failures.
> 
> If you think this ioctl will, when there is no error, complete
> immediately, can we have a non-blocking [version], and fall back to the
> fork trick ?
> 
> Or better still, is there something we could poll() on to find out
> when the ioctl will definitely complete ?

Or, you say the timeout is configurable ?

If it's configurable per control fd, could we perhaps do this:

  1. set the timeout to zero
  2. make the ioctl DRBD_WAIT_CHECKPOINT_ACK
     if the ioctl says "timeout", fork and:
       3. set the timeout to something sane
       4. make the ioctl again
       5. in the parent, use the asynchronous child wait machinery

?

This depends on DRBD_WAIT_CHECKPOINT_ACK doing something sensible if
called again after having timed out.
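
A rough sketch of steps 1-5, under the assumptions stated above (a
per-fd timeout, and an ioctl that can safely be retried after timing
out).  The wait_ack function pointer is a hypothetical stand-in for "set
the timeout, then issue DRBD_WAIT_CHECKPOINT_ACK"; the blocking waitpid()
merely stands in for the asynchronous child wait machinery.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { ACK_OK = 0, ACK_TIMEOUT = 1, ACK_ERROR = 2 };

/* Hypothetical: sets the timeout on the control fd, issues the wait
 * ioctl, and reports one of the ACK_* values. */
typedef int wait_ack_fn(int timeout_ms);

/* Try the wait in-process with a zero timeout; fork only if it would
 * block.  *forked is set to 1 when the slow path was taken. */
int wait_ack_fast_then_fork(wait_ack_fn *wait_ack, int sane_timeout_ms,
                            int *forked)
{
    int rc, status;
    pid_t pid, w;

    assert(wait_ack != NULL && forked != NULL);
    *forked = 0;

    /* Steps 1-2: zero timeout, ioctl in the calling process. */
    rc = wait_ack(0);
    if (rc != ACK_TIMEOUT)
        return rc;

    /* Steps 3-4: in a child, retry with a sane timeout. */
    *forked = 1;
    pid = fork();
    if (pid == -1)
        return ACK_ERROR;
    if (pid == 0)
        _exit(wait_ack(sane_timeout_ms));

    /* Step 5: the parent collects the result (blocking here only to
     * keep the sketch short). */
    do {
        w = waitpid(pid, &status, 0);
    } while (w == -1 && errno == EINTR);

    if (w != pid || !WIFEXITED(status))
        return ACK_ERROR;
    return WEXITSTATUS(status);
}

/* Demo stand-ins: the ack is either already present or needs a wait. */
static int ack_already_here(int timeout_ms)
{
    (void)timeout_ms;
    return ACK_OK;
}

static int ack_needs_wait(int timeout_ms)
{
    return timeout_ms > 0 ? ACK_OK : ACK_TIMEOUT;
}

int demo_fast_path(void)
{
    int forked;
    int rc = wait_ack_fast_then_fork(ack_already_here, 300, &forked);

    return rc == ACK_OK && !forked;
}

int demo_slow_path(void)
{
    int forked;
    int rc = wait_ack_fast_then_fork(ack_needs_wait, 300, &forked);

    return rc == ACK_OK && forked;
}
```

With this shape the common case costs one extra ioctl and no fork, and
only the checkpoints that would really block pay for a child process.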


I hesitate to suggest it, but if there is no better way, perhaps we
will have to make a special-purpose thread just for issuing this
ioctl.  That would be a pain.

So, Shriram, suggestions/opinions/etc. welcome.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v10 3/5] remus: introduce remus device
  2014-06-05 17:06   ` Ian Jackson
  2014-06-06  1:54     ` Hongyang Yang
@ 2014-06-09  2:08     ` Hongyang Yang
  1 sibling, 0 replies; 44+ messages in thread
From: Hongyang Yang @ 2014-06-09  2:08 UTC (permalink / raw)
  To: Ian Jackson
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

On 06/06/2014 01:06 AM, Ian Jackson wrote:
> Yang Hongyang writes ("[PATCH v10 3/5] remus: introduce remus device"):
>> Introduce the remus device, an abstraction layer for remus devices (nic,
>> disk, etc.). It provides the following APIs for libxl:
>
> Thanks.
>
>>    >libxl__remus_device_setup
>>      setup remus devices, like attach qdisc, enable disk buffering, etc
>>    >libxl__remus_device_teardown
>>      teardown devices
>>    >libxl__remus_device_postsuspend
>>    >libxl__remus_device_preresume
>>    >libxl__remus_device_commit
>>      above three are for checkpoint.
>
> I started reviewing this patch by reading the commit message and the
> changes to libxl_internal.h.
>
> As far as I can tell what's going on, it mostly looks plausible.  But
> the new parts of libxl_internal.h are missing important information
> about the new interfaces.
>
> I'd like the documentation in libxl_internal.h to be sufficient to
> read, understand, and check the code on one side of those interfaces,
> without having to go and read the code on the other side.
>
> I'll go into more detail about this below.
>
> Because of this, I haven't read the actual implementation code yet.
>
>> through remus device layer, the remus execution flow will be like
>> this:
>>    xl remus -> remus device setup
>>                  |-> remus checkpoint(postsuspend, commit, preresume)
>>                        ...
>>                         |-> remus device teardown,failover or abort
>
> This diagram could usefully be transferred into a comment in the code,
> probably in libxl_internal.h.
>
>> The remus device layer provides an interface
>>    libxl__remus_device_ops
>> which a remus device must implement. The whole remus structure:
>>                              |remus|
>>                                 |
>>                          |remus device|
>>                                 |
>>                  |nic| |drbd disks| |qemu disks| ...
>> a device(nic, drbd disks, qemu disks, etc) must implement
>> libxl__remus_device_ops to support remus.
>
> Again, this diagram too.
>
>> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
>> index 2b46121..20601b2 100644
>> --- a/tools/libxl/libxl_internal.h
>> +++ b/tools/libxl/libxl_internal.h
> ...
>> +/*----- remus device related state structure -----*/
>>
>> +typedef enum libxl__remus_device_kind {
>> +    LIBXL__REMUS_DEVICE_NIC,
>> +    LIBXL__REMUS_DEVICE_DISK,
>> +} libxl__remus_device_kind;
>> +
>> +typedef struct libxl__remus_state libxl__remus_state;
>> +typedef struct libxl__remus_device libxl__remus_device;
>> +typedef struct libxl__remus_device_state libxl__remus_device_state;
>> +typedef struct libxl__remus_device_ops libxl__remus_device_ops;
>
> All fine.
>
>> +struct libxl__remus_device_ops {
>
> Who produces and consumes this ?  I _think_ from your diagram above
> that this is produced by a device type and consumed by the main remus
> code.  Is that right ?  I think this should be documented.
>
>> +    /*
>> +     * init device ops private data, etc. must implement
>> +     */
>
> What does "must implement" mean ?  Do you mean that some of these
> function pointers can be 0 ?  If so, you should say this explicitly
> somewhere and say what 0 means.  (No-op?)

Only the checkpoint APIs may be 0, meaning that the op is not
implemented. I will document those.

>
>> +    int (*init)(libxl__remus_device_ops *self,
>> +                libxl__remus_state *rs);
>> +    /*
>> +     * free device ops private data, etc. must implement
>> +     */
>> +    void (*destroy)(libxl__remus_device_ops *self);
>> +    /* device ops's private data */
>> +    void *data;
>
> In libxl we often embed structs inside other structs, rather than
> chaining them with void*s.  Is there some reason why that's not a good
> idea here ?

This is the device ops's private data. The data structs differ between
device types, so we cannot specify one particular data structure
here.

>
> Also, I assume this should be "const void*", since I think this can
> only refer to static data ?

Yes, the data is initialized by the init() API above.

>
>> +    /*
>> +     * checkpoint callbacks, async ops. may not be implemented
>> +     */
>
> If these are async ops, what happens when they complete ?
>
> I see a libxl__remus_device_callback typedef below, and a callback
> field in libxl__remus_device.  Is that it ?

yes

>
> If so this should be written down.
>
>> +    /*
>> +     * check whether device ops match the device, async op. must implement
>> +     */
>> +    void (*match)(libxl__remus_device_ops *self,
>> +                  libxl__remus_device *dev);
>
> I don't understand the purpose of this, but perhaps this is because I
> don't understand the lifecycle of a libxl__remus_device and what
> fields in it are for which bits of code.

This API determines whether the ops match a specific device. In the
implementation, we first init all device ops, for example NIC ops,
DRBD ops, and so on. Then we find the libxl devices and match each
device against the ops: if the device is a DRBD disk, it is matched
with the DRBD ops, and all further ops for this device (such as the
checkpoint ops) use the DRBD ops. This API is mainly for disks,
because we must use an external script to determine whether a
libxl_disk is a DRBD disk.
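
A simplified sketch of the matching walk described above. It is
synchronous, whereas the real match op is asynchronous, and every
sketch_* name is hypothetical rather than taken from the patch. The
value -15 is NOT_MATCH from the patch; -3 is assumed here for
ERROR_FAIL. The is_drbd field stands in for the result of the external
script.

```c
#include <assert.h>
#include <stddef.h>

/* -15 is NOT_MATCH from the patch; -3 is assumed for ERROR_FAIL. */
enum { SKETCH_OK = 0, SKETCH_ERROR_FAIL = -3, SKETCH_ERROR_NOT_MATCH = -15 };

enum sketch_kind { SKETCH_NIC, SKETCH_DISK };

struct sketch_ops;

struct sketch_device {
    enum sketch_kind kind;
    int is_drbd;                  /* 1 yes, 0 no, -1 the script itself failed */
    int ops_index;                /* which entry of the ops table is being tried */
    const struct sketch_ops *ops; /* set once an entry has matched */
};

struct sketch_ops {
    const char *name;
    int (*match)(const struct sketch_device *dev);
};

static int sketch_nic_match(const struct sketch_device *dev)
{
    return dev->kind == SKETCH_NIC ? SKETCH_OK : SKETCH_ERROR_NOT_MATCH;
}

static int sketch_drbd_match(const struct sketch_device *dev)
{
    if (dev->kind != SKETCH_DISK)
        return SKETCH_ERROR_NOT_MATCH;
    if (dev->is_drbd < 0)
        return SKETCH_ERROR_FAIL; /* the match op itself hit an error */
    return dev->is_drbd ? SKETCH_OK : SKETCH_ERROR_NOT_MATCH;
}

static const struct sketch_ops sketch_nic_ops = { "nic", sketch_nic_match };
static const struct sketch_ops sketch_drbd_ops = { "drbd", sketch_drbd_match };

static const struct sketch_ops *const sketch_table[] = {
    &sketch_nic_ops,
    &sketch_drbd_ops,
};

/*
 * Try each ops entry in turn.  NOT_MATCH means "try the next entry";
 * any other nonzero value is a real error and stops the walk.
 */
static int sketch_find_ops(struct sketch_device *dev,
                           const struct sketch_ops *const *table, size_t n)
{
    assert(table != NULL);

    dev->ops = NULL;
    for (dev->ops_index = 0; (size_t)dev->ops_index < n; dev->ops_index++) {
        int rc = table[dev->ops_index]->match(dev);

        if (rc == SKETCH_OK) {
            dev->ops = table[dev->ops_index];
            return SKETCH_OK;
        }
        if (rc != SKETCH_ERROR_NOT_MATCH)
            return rc;
    }
    return SKETCH_ERROR_NOT_MATCH; /* no ops support this device */
}
```

Keeping NOT_MATCH distinct from ERROR_FAIL is what lets the walk tell
"try the next ops" apart from "stop, something went wrong".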

>
>> +    /*
>> +     * setup the remus device, async op. must implement
>> +     */
>> +    void (*setup)(libxl__remus_device *dev);
>> +
>> +    /*
>> +     * teardown the remus device, async op. must implement
>> +     */
>> +    void (*teardown)(libxl__remus_device *dev);
>
> When we say "setup" and "teardown" we refer to the actual device, not
> merely the libxl data structures, which are setup and torn down with
> "init" and "destroy" ?  This could be clearer.
>
>> +struct libxl__remus_device_state {
>> +    libxl__ao *ao;
>> +    libxl__egc *egc;
>> +
>> +    /* devices that have been setuped */
>> +    libxl__remus_device **dev;
>> +
>> +    int num_nics;
>> +    int num_disks;
>> +
>> +    /* for counting devices that have been handled */
>> +    int num_devices;
>> +    /* for counting devices that matched and setuped */
>> +    int num_setuped;
>> +};
>
> We need to know who owns which fields in this structure.  Or who is
> supposed to set them.  Also, I'm not sure what this structure is for.
> It appears only to be in libxl__remus_state.  What is the reason for
> it being separated out ?
>
>> +struct libxl__remus_device {
>
> Again, we need to know who owns/sets which fields in this structure.
> (If the struct is shared between different layers of code, it is
> normally easiest to do this by reordering the fields into groups
> according to their ownership.)
>
> If the whole struct is owned by the same set of code, then we need to
> know which set of code that is.  Perhaps by specifying a pattern on
> the function name (libxl__remus_device_*?)
>
>> +    int devid;
>> +    /* libxl__device_* which this remus device related to */
>> +    const void *backend_dev;
>> +    libxl__remus_device_kind kind;
>> +    int ops_index;
>
> What is ops_index ?

This is for matching: we must go through all device ops until we find
one that matches the device. ops_index records which ops we are
currently trying to match.

>
>> +    libxl__remus_device_ops *ops;
>
> I think these ops structs are vtables so should be const.
>
>> +    /* for calling scripts */
>> +    libxl__async_exec_state aes;
>
> Conversely, I'm not sure that particular comments add anything.
> libxl__async_exec_state is always for executing scripts.
>
>> +    /* for async func calls */
>> +    libxl__ev_child child;
>
> Is this an ownership comment ?  If so are you sure that the ownership
> isn't "this is owned by device-specific ops methods" ?  (It is obvious
> that only an /asynchronous/ device-specific ops method would be able
> to make use of it.)
>
>> +typedef void libxl__remus_callback(libxl__egc *,
>> +                                   libxl__remus_state *, int rc);
>> +
>> +struct libxl__remus_state {
>> +    libxl__ao *ao;
>> +    uint32_t domid;
>> +    libxl__remus_callback *callback;
>
> You should say that these must be set by the caller.  (Looking at the
> API I assume that's the case.)
>
>> +    /* private */
>> +    int saved_rc;
>> +    /* context containing device related stuff */
>> +    libxl__remus_device_state dev_state;
>> +
>> +    libxl__ev_time timeout; /* used for checkpoint */
>> +};
>> +
>> +_hidden void libxl__remus_device_setup(libxl__egc *egc,
>> +                                       libxl__remus_state *rs);
>> +_hidden void libxl__remus_device_teardown(libxl__egc *egc,
>> +                                          libxl__remus_state *rs);
>> +_hidden void libxl__remus_device_postsuspend(libxl__egc *egc,
>> +                                             libxl__remus_state *rs);
>> +_hidden void libxl__remus_device_preresume(libxl__egc *egc,
>> +                                           libxl__remus_state *rs);
>> +_hidden void libxl__remus_device_commit(libxl__egc *egc,
>> +                                        libxl__remus_state *rs);
>>   _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
>
> These functions all call rs->callback() when done ?  If so there
> should be a comment to say so.
>
>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
>> index 52f1aa9..4278a6b 100644
>> --- a/tools/libxl/libxl_types.idl
>> +++ b/tools/libxl/libxl_types.idl
>> @@ -43,6 +43,7 @@ libxl_error = Enumeration("error", [
>>       (-12, "OSEVENT_REG_FAIL"),
>>       (-13, "BUFFERFULL"),
>>       (-14, "UNKNOWN_CHILD"),
>> +    (-15, "NOT_MATCH"),
>>       ], value_namespace = "")
>
> It is good that you introduce a new error code for your new error
> case.  But I think it needs to have a better name.  What does it
> mean ?

It's for the match API: if the device ops do not match the device,
match() returns this error code. If the match op itself encounters an
error, we return ERROR_FAIL.

>
> In fact, I grepped the whole of this patch for NOT_MATCH and this
> new error status is checked for somewhere but never generated!

We used the error code in device_match_cb().

>
> If it forms a part of the new internal API then that should be
> documented.
>
>> diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
>> index 745e2ac..36bae04 100755
>> --- a/tools/libxl/libxl_save_msgs_gen.pl
>> +++ b/tools/libxl/libxl_save_msgs_gen.pl
>> @@ -24,7 +24,7 @@ our @msgs = (
>>                                                   'unsigned long', 'done',
>>                                                   'unsigned long', 'total'] ],
>>       [  3, 'scxA',   "suspend", [] ],
>> -    [  4, 'scxW',   "postcopy", [] ],
>> +    [  4, 'scxA',   "postcopy", [] ],
>>       [  5, 'scxA',   "checkpoint", [] ],
>>       [  6, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
>>                                                 unsigned enable)] ],
>
> I think this change (and its consequential changes to the handwritten
> parts) should be split out into a "no functional change" pre-patch.
>
> Thanks,
> Ian.
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH v10 4/5] remus: implement remus network buffering for nic devices
  2014-06-05 17:24   ` Ian Jackson
@ 2014-06-10  7:33     ` Hongyang Yang
  2014-07-09 23:15       ` Ian Jackson
  0 siblings, 1 reply; 44+ messages in thread
From: Hongyang Yang @ 2014-06-10  7:33 UTC (permalink / raw)
  To: Ian Jackson
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

Hi Ian,

   Sorry for the late reply, I just noticed this comment...

On 06/06/2014 01:24 AM, Ian Jackson wrote:
> Yang Hongyang writes ("[PATCH v10 4/5] remus: implement remus network buffering for nic devices"):
>> 1.Add two members in libxl_domain_remus_info:
>
> Thanks for this patch.
>
> I'm deferring reviewing the parts of this inside libxl which use the
> new libxl device interface, until we have the API documentation
> comments which I discussed in my last email.  I hope that's OK.
>
> But there i
>
>>      netbuf: whether netbuf is enabled
>>      netbufscript: the path of the script which will be run to setup
>>         and tear down the guest's interface.
>> 2.introduces remus-netbuf-setup hotplug script responsible for
>>    setting up and tearing down the necessary infrastructure required for
>>    network output buffering in Remus.  This script is intended to be invoked
>>    by libxl for each guest interface, when starting or stopping Remus.
>>
>>    Apart from returning success/failure indication via the usual hotplug
>>    entries in xenstore, this script also writes to xenstore, the name of
>>    the IFB device to be used to control the vif's network output.
>>
>>    The script relies on libnl3 command line utilities to perform various
>>    setup/teardown functions. The script is confined to Linux platforms only
>>    since NetBSD does not seem to have libnl3.
>>
>>    The following steps are taken during init:
>>      a) establish a dedicated remus context containing libnl related
>>         state (netlink sockets, qdisc caches, etc.,)
>>
>>    The following steps are taken for each vif during setup:
>>      a) call the hotplug script to setup its network buffer
>>
>>      b) Obtain handles to plug qdiscs installed on the IFB devices
>>         chosen by the hotplug scripts.
>>
>>    And during teardown, the netlink resources are released, followed by
>>    invocation of hotplug scripts to remove the ifb devices.
>> 3.implement the remus device interface. setup, teardown, etc.
>>
>> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
>> ---
>>   docs/misc/xenstore-paths.markdown      |   4 +
>>   tools/hotplug/Linux/Makefile           |   1 +
>>   tools/hotplug/Linux/remus-netbuf-setup | 183 ++++++++++++
>>   tools/libxl/libxl.c                    |  18 ++
>>   tools/libxl/libxl.h                    |  13 +
>>   tools/libxl/libxl_internal.h           |   3 +
>>   tools/libxl/libxl_netbuffer.c          | 519 +++++++++++++++++++++++++++++++++
>>   tools/libxl/libxl_nonetbuffer.c        |  67 +++++
>>   tools/libxl/libxl_remus_device.c       |  22 +-
>>   tools/libxl/libxl_types.idl            |   2 +
>>   10 files changed, 831 insertions(+), 1 deletion(-)
>>   create mode 100644 tools/hotplug/Linux/remus-netbuf-setup
>>
>
>
>
>> diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown
>> index 70ab7f4..039eaea 100644
>> --- a/docs/misc/xenstore-paths.markdown
>> +++ b/docs/misc/xenstore-paths.markdown
>> @@ -385,6 +385,10 @@ The guest's virtual time offset from UTC in seconds.
>>
>>   The device model version for a domain.
>>
>> +#### /libxl/$DOMID/remus/netbuf/$DEVID/ifb = STRING [n,INTERNAL]
>> +
>> +ifb device used by Remus to buffer network output from the associated vif.
>> +
>
> Thanks for updating the doc.  Your changes to the hotplug Makefile
> look good too.
>
>> diff --git a/tools/hotplug/Linux/remus-netbuf-setup b/tools/hotplug/Linux/remus-netbuf-setup
>> new file mode 100644
>> index 0000000..aed2583
>> --- /dev/null
>> +++ b/tools/hotplug/Linux/remus-netbuf-setup
>> @@ -0,0 +1,183 @@
>> +#!/bin/bash
>> +#============================================================================
>> +# ${XEN_SCRIPT_DIR}/remus-netbuf-setup
>> +#
>> +# Script for attaching a network buffer to the specified vif (in any mode).
>> +# The hotplugging system will call this script when starting remus via libxl
>> +# API, libxl_domain_remus_start.
>
> Right.  Thanks for the comprehensive head comment.
>
>> +#============================================================================
> ...
>> +setup_ifb() {
>> +
>> +    for ifb in `ifconfig -a -s|egrep ^ifb|cut -d ' ' -f1`
>> +    do
>> +        local installed=`nl-qdisc-list -d $ifb`
>> +        [ -n "$installed" ] && continue
>> +        IFB="$ifb"
>> +        break
>> +    done
>
> As far as I can see this attempts to search for an ifb which is not in
> use.
>
> I see you claim a lock to ensure that you don't fail due to races with
> other copies of this script.
>
> But are there potentially other things (not Xen related, perhaps) in
> the system which might try to allocate an ifb using a similar
> approach ?  How do we deal with the potential race with them ?
>
> Also: I think you should:
>   - write the IFB name to xenstore _before_ starting to configure it
>   - in the loop I quote above, check in xenstore that the ifb is not
>     in use by another domain
>
> Otherwise there seems to be the following risk:
>   1. You pick ifbX using the loop above
>   2. You start to configure ifbX, eventually resulting in a
>      configuration which makes it not show up as free
>   3. Something bad happens and you fail, before writing the
>      ifb name to xenstore
>
> In this case, the ifb would be leaked.  (I see you do try to avoid
> this with xs_write_failed, but scripts can fail for other reasons.)

If the setup failed for any reason, we will call teardown in the remus
device framework, so the ifb won't be leaked.

>
>> +    do_or_die tc qdisc add dev "$vif" ingress
>
> I'm not qualified to review these tc manipulations.  I guess I'm going
> to trust that they're correct.
>
>> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
>> index 80947c3..db30a97 100644
>> --- a/tools/libxl/libxl.h
>> +++ b/tools/libxl/libxl.h
>> @@ -437,6 +437,19 @@
>>   #define LIBXL_HAVE_DRIVER_DOMAIN_CREATION 1
>>
>>   /*
>> + * LIBXL_HAVE_REMUS_NETBUF 1
>> + *
>> + * If this is defined, then the libxl_domain_remus_info structure will
>> + * have a boolean field (netbuf) and a string field (netbufscript).
>> + *
>> + * netbuf, if true, indicates that network buffering should be enabled.
>> + *
>> + * netbufscript, if set, indicates the path to the hotplug script to
>> + * setup or teardown network buffers.
>> + */
>> +#define LIBXL_HAVE_REMUS_NETBUF 1
>
> Good.
>
>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
>> index 4278a6b..50bf1ef 100644
>> --- a/tools/libxl/libxl_types.idl
>> +++ b/tools/libxl/libxl_types.idl
>> @@ -566,6 +566,8 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
>>       ("interval",     integer),
>>       ("blackhole",    bool),
>>       ("compression",  bool),
>> +    ("netbuf",       bool),
>> +    ("netbufscript", string),
>
> I think netbuf should be a defbool, not a bool.  Indeed, perhaps this
> is true of the other options too.  Is there some reason it shouldn't
> default to enabled ?

The netbuffering is enabled by default, this option is used to disable
the netbuffering support.

>
> You should mention in your commit message that this is going to be
> plumbed into xl and the documentation in the next patch.
>
> Regarding the other remus options here (and perhaps changing their
> types), I think it would be OK to break API compatibility, since the
> previous versions of remus exposed via xl have not been suitable for
> deployment.  Do you agree ?

Agreed. Since I have already finished the v11 patch series, I will look
into this in the series after that. To keep the iteration fast, I will
send the v11 patch series for review shortly.

>
> Thanks,
> Ian.
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH v10 4/5] remus: implement remus network buffering for nic devices
  2014-06-10  7:33     ` Hongyang Yang
@ 2014-07-09 23:15       ` Ian Jackson
  2014-07-10  1:38         ` Hongyang Yang
  0 siblings, 1 reply; 44+ messages in thread
From: Ian Jackson @ 2014-07-09 23:15 UTC (permalink / raw)
  To: Hongyang Yang
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

Hongyang Yang writes ("Re: [PATCH v10 4/5] remus: implement remus network buffering for nic devices"):
>    Sorry for the late reply, I just noticed this comment...

And here I am being even later replying...

> On 06/06/2014 01:24 AM, Ian Jackson wrote:
> > As far as I can see this attempts to search for an ifb which is not in
> > use.
> >
> > I see you claim a lock to ensure that you don't fail due to races with
> > other copies of this script.
> >
> > But are there potentially other things (not Xen related, perhaps) in
> > the system which might try to allocate an ifb using a similar
> > approach ?  How do we deal with the potential race with them ?

Have I missed an answer to this question ?

> > Also: I think you should:
> >   - write the IFB name to xenstore _before_ starting to configure it
> >   - in the loop I quote above, check in xenstore that the ifb is not
> >     in use by another domain
> >
> > Otherwise there seems to be the following risk:
> >   1. You pick ifbX using the loop above
> >   2. You start to configure ifbX, eventually resulting in a
> >      configuration which makes it not show up as free
> >   3. Something bad happens and you fail, before writing the
> >      ifb name to xenstore
> >
> > In this case, the ifb would be leaked.  (I see you do try to avoid
> > this with xs_write_failed, but scripts can fail for other reasons.)
> 
> If the setup failed for any reason, we will call teardown in the remus
> device framework, so the ifb won't be leaked.

I'm afraid that you can't assume that your script will necessarily
execute the teardown.  Perhaps the script got killed by the OOM
killer, or something else horrible went wrong.

So you need to make sure that all the states the system goes through,
however briefly, can be recovered from, no matter what cleanup will be
attempted on failure.
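
As a toy illustration of why the ordering matters (all names here are
hypothetical, and the owner field merely stands in for the xenstore
entry): if the script dies between the two steps, only the
record-first ordering leaves behind something that a later cleanup
pass can find.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model of one ifb; the owner string stands in for the xenstore entry. */
struct sketch_ifb {
    bool configured; /* the device no longer shows up as free */
    char owner[16];  /* empty string: nobody has recorded a claim */
};

enum sketch_order { CONFIGURE_FIRST, RECORD_FIRST };

/*
 * Run the two setup steps in the given order.  dies_midway simulates the
 * script being killed between the steps, before any teardown can run.
 */
static void sketch_setup(struct sketch_ifb *ifb, const char *owner,
                         enum sketch_order order, bool dies_midway)
{
    assert(strlen(owner) < sizeof(ifb->owner));

    if (order == RECORD_FIRST) {
        strcpy(ifb->owner, owner);
        if (dies_midway)
            return;
        ifb->configured = true;
    } else {
        ifb->configured = true;
        if (dies_midway)
            return;
        strcpy(ifb->owner, owner);
    }
}

/* In use, but with no record saying by whom: nothing can reclaim it. */
static bool sketch_is_leaked(const struct sketch_ifb *ifb)
{
    return ifb->configured && ifb->owner[0] == '\0';
}
```

With RECORD_FIRST the worst intermediate state is "claimed but not yet
configured", which is harmless to clean up.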

> >> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> >> index 4278a6b..50bf1ef 100644
> >> --- a/tools/libxl/libxl_types.idl
> >> +++ b/tools/libxl/libxl_types.idl
> >> @@ -566,6 +566,8 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
> >>       ("interval",     integer),
> >>       ("blackhole",    bool),
> >>       ("compression",  bool),
> >> +    ("netbuf",       bool),
> >> +    ("netbufscript", string),
> >
> > I think netbuf should be a defbool, not a bool.  Indeed, perhaps this
> > is true of the other options too.  Is there some reason it shouldn't
> > default to enabled ?
> 
> The netbuffering is enabled by default, this option is used to disable
> the netbuffering support.

Maybe I haven't followed the code here correctly, but I think that has
to be done with a defbool.  Where is the default set ?  I don't see it
in this patch.
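
To illustrate the distinction: with a plain bool, a zeroed struct is
indistinguishable from "explicitly disabled", so every caller has to
set the default itself.  With a tri-state value the library can apply
the default in one place.  The sketch below uses hypothetical
sketch_* names modelled on that tri-state idea; it is not the libxl
API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state flag: 0 means "the caller expressed no preference". */
typedef struct {
    int val; /* 0 default, > 0 true, < 0 false */
} sketch_defbool;

static void sketch_defbool_set(sketch_defbool *db, bool b)
{
    db->val = b ? 1 : -1;
}

static bool sketch_defbool_is_default(sketch_defbool db)
{
    return db.val == 0;
}

/* Only takes effect if the caller has not chosen a value already. */
static void sketch_defbool_setdefault(sketch_defbool *db, bool b)
{
    if (sketch_defbool_is_default(*db))
        sketch_defbool_set(db, b);
}

static bool sketch_defbool_val(sketch_defbool db)
{
    assert(!sketch_defbool_is_default(db));
    return db.val > 0;
}

struct sketch_remus_info {
    sketch_defbool netbuf;
};

/* Library side: the default is filled in once, for every caller. */
static bool sketch_netbuf_enabled(struct sketch_remus_info *info)
{
    sketch_defbool_setdefault(&info->netbuf, true);
    return sketch_defbool_val(info->netbuf);
}
```

A caller that zero-initialises the struct and never touches netbuf
then gets buffering enabled, while one that explicitly sets it to
false still gets it disabled.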

Thanks,
Ian.


* Re: [PATCH v10 4/5] remus: implement remus network buffering for nic devices
  2014-07-09 23:15       ` Ian Jackson
@ 2014-07-10  1:38         ` Hongyang Yang
  0 siblings, 0 replies; 44+ messages in thread
From: Hongyang Yang @ 2014-07-10  1:38 UTC (permalink / raw)
  To: Ian Jackson
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

Hi Ian,

   Thank you for the reply, I have some comments below...

On 07/10/2014 07:15 AM, Ian Jackson wrote:
> Hongyang Yang writes ("Re: [PATCH v10 4/5] remus: implement remus network buffering for nic devices"):
>>     Sorry for the late reply, I just noticed this comment...
>
> And here I am being even later replying...
>
>> On 06/06/2014 01:24 AM, Ian Jackson wrote:
>>> As far as I can see this attempts to search for an ifb which is not in
>>> use.
>>>
>>> I see you claim a lock to ensure that you don't fail due to races with
>>> other copies of this script.
>>>
>>> But are there potentially other things (not Xen related, perhaps) in
>>> the system which might try to allocate an ifb using a similar
>>> approach ?  How do we deal with the potential race with them ?
>
> Have I missed an answer to this question ?
>
>>> Also: I think you should:
>>>    - write the IFB name to xenstore _before_ starting to configure it
>>>    - in the loop I quote above, check in xenstore that the ifb is not
>>>      in use by another domain

Sorry for not mentioning this explicitly; your comment here has already been addressed:
+#return 0 if the ifb is free
+check_ifb() {
+    local installed=`nl-qdisc-list -d $1`
+    [ -n "$installed" ] && return 1
+
+    for domid in `xenstore-list "/local/domain" 2>/dev/null || true`
+    do
+        [ $domid -eq 0 ] && continue
+        xenstore-exists "/libxl/$domid/remus/netbuf" || continue
+        for devid in `xenstore-list "/libxl/$domid/remus/netbuf" 2>/dev/null || true`
+        do
+            local path="/libxl/$domid/remus/netbuf/$devid/ifb"
+            xenstore-exists $path || continue
+            local ifb=`xenstore-read "$path" 2>/dev/null || true`
+            [ "$ifb" = "$1" ] && return 1
+        done
+    done
+
+    return 0
+}
+
+setup_ifb() {
+
+    for ifb in `ifconfig -a -s|egrep ^ifb|cut -d ' ' -f1`
+    do
+        check_ifb "$ifb" || continue
+        IFB="$ifb"
+        break
+    done
+
+    if [ -z "$IFB" ]
+    then
+        fatal "Unable to find a free IFB device for $vifname"
+    fi
+
+    #not using xenstore_write that automatically exits on error
+    #because we need to cleanup
+    _xenstore_write "$XENBUS_PATH/ifb" "$IFB" || xs_write_failed "$vifname" "$IFB"
+    do_or_die ip link set dev "$IFB" up
+}

>>>
>>> Otherwise there seems to be the following risk:
>>>    1. You pick ifbX using the loop above
>>>    2. You start to configure ifbX, eventually resulting in a
>>>       configuration which makes it not show up as free
>>>    3. Something bad happens and you fail, before writing the
>>>       ifb name to xenstore
>>>
>>> In this case, the ifb would be leaked.  (I see you do try to avoid
>>> this with xs_write_failed, but scripts can fail for other reasons.)
>>
>> If the setup failed for any reason, we will call teardown in the remus
>> device framework, so the ifb won't be leaked.
>
> I'm afraid that you can't assume that your script will necessarily
> execute the teardown.  Perhaps the script got killed by the OOM
> killer, or something else horrible went wrong.
>
> So you need to make sure that all the states the system goes through,
> however briefly, can be recovered from, no matter what cleanup will be
> attempted on failure.
>
>>>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
>>>> index 4278a6b..50bf1ef 100644
>>>> --- a/tools/libxl/libxl_types.idl
>>>> +++ b/tools/libxl/libxl_types.idl
>>>> @@ -566,6 +566,8 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
>>>>        ("interval",     integer),
>>>>        ("blackhole",    bool),
>>>>        ("compression",  bool),
>>>> +    ("netbuf",       bool),
>>>> +    ("netbufscript", string),
>>>
>>> I think netbuf should be a defbool, not a bool.  Indeed, perhaps this
>>> is true of the other options too.  Is there some reason it shouldn't
>>> default to enabled ?
>>
>> The netbuffering is enabled by default, this option is used to disable
>> the netbuffering support.
>
> Maybe I haven't followed the code here correctly, but I think that has
> to be done with a defbool.  Where is the default set ?  I don't see it
> in this patch.

It is set in the main_remus function, in the netbuf command-line switch
patch. The default is on, and it is only switched off when the user
passes the -n option, so I think this does not need to be done with a
defbool:
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 68df548..b324f34 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7175,8 +7175,9 @@ int main_remus(int argc, char **argv)
      r_info.interval = 200;
      r_info.blackhole = 0;
      r_info.compression = 1;
+    r_info.netbuf = 1;

-    SWITCH_FOREACH_OPT(opt, "bui:s:e", NULL, "remus", 2) {
+    SWITCH_FOREACH_OPT(opt, "buni:s:N:e", NULL, "remus", 2) {
@@ -7186,6 +7187,12 @@ int main_remus(int argc, char **argv)
      case 'u':
          r_info.compression = 0;
          break;
+    case 'n':
+        r_info.netbuf = 0;
+        break;
+    case 'N':
>
> Thanks,
> Ian.
> .
>

-- 
Thanks,
Yang.


end of thread, other threads:[~2014-07-10  1:38 UTC | newest]

Thread overview: 44+ messages
2014-06-05  1:34 [PATCH v10 0/5] Remus netbuffer: Network buffering support Yang Hongyang
2014-06-05  1:34 ` [PATCH v10 1/5] libxl: introduce asynchronous execution API Yang Hongyang
2014-06-05 16:01   ` Ian Jackson
2014-06-05  1:34 ` [PATCH v10 2/5] remus: add libnl3 dependency for network buffering support Yang Hongyang
2014-06-05 16:18   ` Ian Jackson
2014-06-06  1:48     ` Hongyang Yang
2014-06-06  6:45       ` Shriram Rajagopalan
2014-06-06 10:07         ` Ian Campbell
2014-06-06 11:04       ` Ian Jackson
2014-06-05  1:34 ` [PATCH v10 3/5] remus: introduce remus device Yang Hongyang
2014-06-05 17:06   ` Ian Jackson
2014-06-06  1:54     ` Hongyang Yang
2014-06-09  2:08     ` Hongyang Yang
2014-06-05  1:34 ` [PATCH v10 4/5] remus: implement remus network buffering for nic devices Yang Hongyang
2014-06-05 16:50   ` Shriram Rajagopalan
2014-06-05 17:37     ` Ian Jackson
2014-06-05 17:44       ` Ian Jackson
2014-06-05 17:56         ` Shriram Rajagopalan
2014-06-06  2:08           ` Hongyang Yang
2014-06-06  1:59     ` Hongyang Yang
2014-06-05 17:24   ` Ian Jackson
2014-06-10  7:33     ` Hongyang Yang
2014-07-09 23:15       ` Ian Jackson
2014-07-10  1:38         ` Hongyang Yang
2014-06-05  1:34 ` [PATCH v10 5/5] libxl: network buffering cmdline switch Yang Hongyang
2014-06-05  1:39   ` [PATCH v10] remus drbd: Implement remus drbd replicated disk Yang Hongyang
2014-06-05 16:25     ` Shriram Rajagopalan
2014-06-05 17:41       ` Ian Jackson
2014-06-05 18:14         ` Shriram Rajagopalan
2014-06-05 18:26           ` Ian Jackson
2014-06-06 11:23             ` Ian Jackson
2014-06-06  5:38           ` Hongyang Yang
2014-06-06  7:12             ` Shriram Rajagopalan
2014-06-06 11:18             ` Ian Jackson
2014-06-06  2:21       ` Hongyang Yang
2014-06-05 17:30   ` [PATCH v10 5/5] libxl: network buffering cmdline switch Ian Jackson
2014-06-06  6:34     ` Hongyang Yang
2014-06-06  7:26       ` Shriram Rajagopalan
2014-06-06 11:13       ` Ian Jackson
2014-06-05 10:47 ` [PATCH v10 0/5] Remus netbuffer: Network buffering support George Dunlap
2014-06-06  2:17   ` Hongyang Yang
2014-06-05 16:01 ` Ian Jackson
2014-06-05 16:12 ` Ian Jackson
2014-06-06  2:26   ` Hongyang Yang
