* [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk
@ 2014-09-26  6:13 Yang Hongyang
  2014-09-26  6:13 ` [PATCH for-4.5 v21 01/14] libxl: multidev: Clarify comments about which callbacks are meant Yang Hongyang
                   ` (14 more replies)
  0 siblings, 15 replies; 24+ messages in thread
From: Yang Hongyang @ 2014-09-26  6:13 UTC
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

This patch series adds support for network buffering and DRBD disk
replication to the Remus code in libxl.

The code is also hosted on GitHub:
url: https://github.com/macrosheep/xen/tree/remus-v21

Changes in v21:
  Correct use of multidev (by IanJ)
  Use existing libxl__device_kind (by IanJ)
  unsafe->allow_unsafe (by IanJ)

Changes in v20:
  Rebased.

Changes in v19:
  Use defbool for cmdline switch.
  Restructured the subkind init and cleanup operations.
  Use libxl__device_kind instead of libxl__remus_device_kind
  Fix a layer violation issue pointed out by IanJ.
  Other minor fixes.
  Rebased to the latest staging tree.

Changes in v18:
  Merge match() and setup() api.
  Reuse libxl__multidev and libxl__ao_device.
  Commit messages and code comments improved. Thanks to Shriram.
  Rebased.

Changes in v17:
  Make remus device abstract layer more generic.
  Addressed Ian J's comments.

Changes in v16:
  Merge libxl__remus_state and libxl__remus_device_state.
  Pass the ops to the device abstract layer instead of defining them in the layer.
  Optimized subkind ops APIs.
  Addressed Ian J's comments.
  Rebased.

Changes in v15:
  The first patch in v14 has been taken, so remove it from the patchset.
  Add a patch to update the maintained files of Remus.
  Rebased.

Changes in v14:
  Addressed IanJ's comments.
  Rebased.

Changes in v13:
  Addressed Konrad's comments.
  Rebased.

Changes in v12:
  Add disk buffering cmdline switch.

Changes in v11:
  Addressed comments from Ian J and Shriram.
  Add the DRBD disk implementation to this patch series.

Changes in V10:
  Restructured the whole patch series.
  Introduce the remus device abstract layer.
  Make remus checkpoint asynchronous.

Changes in V9:
  Use async exec script api to exec scripts.

Changes in V8:
  Applied some comments (by IanJ).
  Merge some struct definitions into their implementation.
  (2/3/5 in V7 => 3 in V8)

Changes in V7:
  Applied missing comments (by IanJ).
  Applied Shriram's comments.

  Merge the tangled netbuffering setup/teardown code into one patch.
  (2/6/8 in V6 => 5 in V7. 9/10 in V6 => 7 in V7)

Changes in V6:
  Applied Ian Jackson's comments on the V5 series.
  [PATCH 2/4 V5] is split into smaller patches, one per piece of functionality.

  [PATCH 4/4 V5] --> [PATCH 13/13] netbuffer is default enabled.

Changes in V5:

Merge hotplug script patch (2/5) and hotplug script setup/teardown
patch (3/5) into a single patch.

Changes in V4:

[1/5] Remove check for libnl command line utils in autoconf checks

[2/5] minor nits

[3/5] define LIBXL_HAVE_REMUS_NETBUF in libxl.h

[4/5] clean ups. Make the usleep in checkpoint callback asynchronous

[5/5] minor nits

Changes in V3:
[1/5] Fix redundant checks in configure scripts
      (based on Ian Campbell's suggestions)

[2/5] Introduce locking in the script, during IFB setup.
      Add xenstore paths used by netbuf scripts
      to xenstore-paths.markdown

[3/5] Hotplug scripts setup/teardown invocations are now asynchronous
      following IanJ's feedback.  However, the invocations are still
      sequential. 

[5/5] Allow per-domain specification of netbuffer scripts in the xl remus
      command.

Plus minor nits throughout the series, based on feedback on
the last version.

Changes in V2:
[1/5] Configure script will automatically enable/disable network
      buffer support depending on the availability of the appropriate
      libnl3 version. [If libnl3 is unavailable, a warning message will be
      printed to let the user know that the feature has been disabled.]

      use macros from pkg.m4 instead of pkg-config commands
      removed redundant checks for libnl3 libraries.

[3,4/5] - Minor nits.

Version 1:

[1/5] Changes to autoconf scripts to check for libnl3. Add linker flags
      to libxl Makefile.

[2/5] External script to setup/teardown network buffering using libnl3's
      CLI. This script will be invoked by libxl before starting Remus.
      The script's main job is to bring up an IFB device with plug qdisc
      attached to it.  It then re-routes egress traffic from the guest's
      vif to the IFB device.

[3/5] Libxl code to invoke the external setup script, followed by netlink
      related setup to obtain a handle on the output buffers attached
      to each vif.

[4/5] Libxl interaction with network buffer module in the kernel via
      libnl3 API.

[5/5] xl cmdline switch to explicitly enable network buffering when
      starting remus.


  A few things to note (by Shriram):

    a) Based on previous email discussions, the setup/teardown task has
    been moved to a hotplug style shell script which can be customized as
    desired, instead of implementing it as C code inside libxl.

    b) Libnl3 is not available on NetBSD. Nor is it available on CentOS
    (Linux).  So I have made network buffering support an optional feature
    so that it can be disabled if desired.

    c) NetBSD does not have libnl3. So I have put the setup script under
    tools/hotplug/Linux folder.

thanks,
Yang.

Ian Jackson (2):
  libxl: multidev: Clarify comments about which callbacks are meant
  libxl: multidev: Expose libxl__multidev_one_callback

Yang Hongyang (12):
  libxl: introduce libxl__multidev_prepare_with_aodev
  libxl: Extend libxl__ao_device with a libxl__ev_child member
  autoconf: add libnl3 dependency for Remus network buffering support
  libxl/remus: introduce an abstract Remus device layer
  libxl/remus: setup and control network output buffering
  libxl/remus: setup and control disk replication for DRBD backends
  xl/remus: change bool to defbool
  xl/remus: cmdline switch to explicitly enable unsafe configurations
  xl/remus: cmdline switches and config vars to control network
    buffering
  xl/remus: add a cmdline switch to disable disk replication
  libxl/remus: add LIBXL_HAVE_REMUS to indicate Remus support in libxl
  MAINTAINERS: update maintained files of Remus

 MAINTAINERS                            |   7 +
 README                                 |   4 +
 config/Tools.mk.in                     |   4 +
 docs/README.remus                      |  16 +
 docs/man/xl.conf.pod.5                 |   6 +
 docs/man/xl.pod.1                      |  30 +-
 docs/misc/xenstore-paths.markdown      |   4 +
 tools/configure.ac                     |  16 +
 tools/hotplug/Linux/Makefile           |   2 +
 tools/hotplug/Linux/block-drbd-probe   |  87 ++++++
 tools/hotplug/Linux/remus-netbuf-setup | 230 +++++++++++++++
 tools/libxl/Makefile                   |  15 +
 tools/libxl/libxl.c                    |  74 ++++-
 tools/libxl/libxl.h                    |   6 +
 tools/libxl/libxl_device.c             |  22 +-
 tools/libxl/libxl_dom.c                | 170 ++++++++++-
 tools/libxl/libxl_internal.h           | 211 +++++++++++++-
 tools/libxl/libxl_netbuffer.c          | 517 +++++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c        |  54 ++++
 tools/libxl/libxl_remus_device.c       | 327 +++++++++++++++++++++
 tools/libxl/libxl_remus_disk_drbd.c    | 258 ++++++++++++++++
 tools/libxl/libxl_types.idl            |  10 +-
 tools/libxl/xl.c                       |   4 +
 tools/libxl/xl.h                       |   1 +
 tools/libxl/xl_cmdimpl.c               |  42 ++-
 tools/libxl/xl_cmdtable.c              |  11 +-
 26 files changed, 2076 insertions(+), 52 deletions(-)
 create mode 100755 tools/hotplug/Linux/block-drbd-probe
 create mode 100644 tools/hotplug/Linux/remus-netbuf-setup
 create mode 100644 tools/libxl/libxl_netbuffer.c
 create mode 100644 tools/libxl/libxl_nonetbuffer.c
 create mode 100644 tools/libxl/libxl_remus_device.c
 create mode 100644 tools/libxl/libxl_remus_disk_drbd.c

-- 
1.9.1


* [PATCH for-4.5 v21 01/14] libxl: multidev: Clarify comments about which callbacks are meant
  2014-09-26  6:13 [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
@ 2014-09-26  6:13 ` Yang Hongyang
  2014-09-26 13:56   ` Wei Liu
  2014-09-26  6:13 ` [PATCH for-4.5 v21 02/14] libxl: multidev: Expose libxl__multidev_one_callback Yang Hongyang
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 24+ messages in thread
From: Yang Hongyang @ 2014-09-26  6:13 UTC
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

From: Ian Jackson <ian.jackson@eu.citrix.com>

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 tools/libxl/libxl_internal.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index f61673c..20aca4b 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2164,7 +2164,8 @@ struct libxl__ao_device {
  *       (or some other thing which will eventually call aodev->callback)
  * Finally, once
  *    libxl__multidev_prepared
- * which will result (perhaps reentrantly) in one call to callback().
+ * which will result (perhaps reentrantly) in one call to
+ * multidev->callback().
  */
 
 /* Starts preparing to add/remove a bunch of devices. */
-- 
1.9.1


* [PATCH for-4.5 v21 02/14] libxl: multidev: Expose libxl__multidev_one_callback
  2014-09-26  6:13 [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
  2014-09-26  6:13 ` [PATCH for-4.5 v21 01/14] libxl: multidev: Clarify comments about which callbacks are meant Yang Hongyang
@ 2014-09-26  6:13 ` Yang Hongyang
  2014-09-26 13:58   ` Wei Liu
  2014-09-26  6:13 ` [PATCH for-4.5 v21 03/14] libxl: introduce libxl__multidev_prepare_with_aodev Yang Hongyang
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 24+ messages in thread
From: Yang Hongyang @ 2014-09-26  6:13 UTC
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

From: Ian Jackson <ian.jackson@eu.citrix.com>

Now a caller who wants to be able to do other work when the aodev
completes can put their own callback into the aodev, and make the
multidev machinery aware that the particular aodev is complete (from
the point of view that multidev should have) whenever it likes.

No functional change in this patch.
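
As an illustration only, the pattern described above can be modelled in a
few lines of standalone C. This is a toy sketch, not libxl code: the toy_*
types and functions are hypothetical stand-ins for libxl__multidev,
libxl__ao_device and libxl__multidev_one_callback, and the egc/ao/gc
machinery is left out entirely. One device keeps the default callback; the
other installs its own callback, does extra work, and then reports
completion to the multidev itself.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DEVS 8

struct toy_multidev;

/* Hypothetical stand-in for libxl__ao_device. */
struct toy_aodev {
    struct toy_multidev *multidev;
    void (*callback)(struct toy_aodev *aodev);
    int active;      /* 1 while the operation is outstanding */
    int rc;          /* result of this device's operation */
    int extra_done;  /* set by the caller's own callback */
};

/* Hypothetical stand-in for libxl__multidev. */
struct toy_multidev {
    struct toy_aodev *array[TOY_MAX_DEVS];
    size_t used;
    void (*callback)(struct toy_multidev *multidev, int rc);
    int final_calls; /* how many times the final callback ran */
    int final_rc;
};

/* Plays the role of libxl__multidev_one_callback: mark this device as
 * complete and, once nothing is outstanding, call the final callback
 * with the first nonzero rc seen (or 0). */
void toy_multidev_one_callback(struct toy_aodev *aodev)
{
    struct toy_multidev *multidev = aodev->multidev;
    int rc = 0;
    size_t i;

    aodev->active = 0;
    for (i = 0; i < multidev->used; i++) {
        if (multidev->array[i]->active)
            return; /* still waiting for another device */
        if (multidev->array[i]->rc && !rc)
            rc = multidev->array[i]->rc;
    }
    multidev->callback(multidev, rc);
}

/* Registers a device and installs the default per-device callback. */
void toy_multidev_prepare(struct toy_multidev *multidev,
                          struct toy_aodev *aodev)
{
    assert(multidev->used < TOY_MAX_DEVS);
    aodev->multidev = multidev;
    aodev->callback = toy_multidev_one_callback;
    aodev->active = 1;
    aodev->rc = 0;
    aodev->extra_done = 0;
    multidev->array[multidev->used++] = aodev;
}

static void toy_all_done(struct toy_multidev *multidev, int rc)
{
    multidev->final_calls++;
    multidev->final_rc = rc;
}

/* The caller's own callback: do other work, then notify multidev. */
static void toy_device_done(struct toy_aodev *aodev)
{
    aodev->extra_done = 1;
    toy_multidev_one_callback(aodev);
}

/* Returns 0 if the final callback fired exactly once, after both devices. */
int toy_demo(void)
{
    struct toy_multidev md = { .callback = toy_all_done };
    struct toy_aodev a, b;

    toy_multidev_prepare(&md, &a);
    toy_multidev_prepare(&md, &b);
    b.callback = toy_device_done;   /* override the default callback */

    a.callback(&a);                 /* first device completes */
    if (md.final_calls != 0)
        return -1;                  /* must still be waiting for b */

    b.rc = -3;                      /* second device fails */
    b.callback(&b);

    return (md.final_calls == 1 && md.final_rc == -3 &&
            b.extra_done == 1) ? 0 : -1;
}
```

Calling toy_demo() from a test program exercises both paths. The real
multidev also tracks a separate "preparation" aodev so that the final
callback cannot fire before libxl__multidev_prepared is called; the toy
leaves that out.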

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 tools/libxl/libxl_device.c   |  8 +++-----
 tools/libxl/libxl_internal.h | 13 ++++++++++++-
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index 3425446..1f6514c 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -479,15 +479,13 @@ void libxl__multidev_begin(libxl__ao *ao, libxl__multidev *multidev)
     multidev->preparation = libxl__multidev_prepare(multidev);
 }
 
-static void multidev_one_callback(libxl__egc *egc, libxl__ao_device *aodev);
-
 libxl__ao_device *libxl__multidev_prepare(libxl__multidev *multidev) {
     STATE_AO_GC(multidev->ao);
     libxl__ao_device *aodev;
 
     GCNEW(aodev);
     aodev->multidev = multidev;
-    aodev->callback = multidev_one_callback;
+    aodev->callback = libxl__multidev_one_callback;
     libxl__prepare_ao_device(ao, aodev);
 
     if (multidev->used >= multidev->allocd) {
@@ -499,7 +497,7 @@ libxl__ao_device *libxl__multidev_prepare(libxl__multidev *multidev) {
     return aodev;
 }
 
-static void multidev_one_callback(libxl__egc *egc, libxl__ao_device *aodev)
+void libxl__multidev_one_callback(libxl__egc *egc, libxl__ao_device *aodev)
 {
     STATE_AO_GC(aodev->ao);
     libxl__multidev *multidev = aodev->multidev;
@@ -523,7 +521,7 @@ void libxl__multidev_prepared(libxl__egc *egc,
                               libxl__multidev *multidev, int rc)
 {
     multidev->preparation->rc = rc;
-    multidev_one_callback(egc, multidev->preparation);
+    libxl__multidev_one_callback(egc, multidev->preparation);
 }
 
 /******************************************************************************/
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 20aca4b..b19374e 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2161,7 +2161,8 @@ struct libxl__ao_device {
  * Then zero or more times
  *    libxl__multidev_prepare
  *    libxl__initiate_device_{remove/addition}
- *       (or some other thing which will eventually call aodev->callback)
+ *       (or some other thing which will eventually call
+ *        aodev->callback or libxl__multidev_one_callback)
  * Finally, once
  *    libxl__multidev_prepared
  * which will result (perhaps reentrantly) in one call to
@@ -2176,6 +2177,16 @@ _hidden void libxl__multidev_begin(libxl__ao *ao, libxl__multidev*);
  * had ->callback set.  The user should not mess with aodev->callback. */
 _hidden libxl__ao_device *libxl__multidev_prepare(libxl__multidev*);
 
+/* Indicates to multidev that this one device has been processed.
+ * Normally the multidev user does not need to touch this function, as
+ * multidev_prepare will name it in aodev->callback.  However, if you
+ * want to do something more complicated you can set aodev->callback
+ * yourself to something else, so long as you eventually call
+ * libxl__multidev_one_callback.
+ */
+_hidden void libxl__multidev_one_callback(libxl__egc *egc,
+                                          libxl__ao_device *aodev);
+
 /* Notifies the multidev machinery that we have now finished preparing
  * and initiating devices.  multidev->callback may then be called as
  * soon as there are no prepared but not completed operations
-- 
1.9.1


* [PATCH for-4.5 v21 03/14] libxl: introduce libxl__multidev_prepare_with_aodev
  2014-09-26  6:13 [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
  2014-09-26  6:13 ` [PATCH for-4.5 v21 01/14] libxl: multidev: Clarify comments about which callbacks are meant Yang Hongyang
  2014-09-26  6:13 ` [PATCH for-4.5 v21 02/14] libxl: multidev: Expose libxl__multidev_one_callback Yang Hongyang
@ 2014-09-26  6:13 ` Yang Hongyang
  2014-09-26  6:13 ` [PATCH for-4.5 v21 04/14] libxl: Extend libxl__ao_device with a libxl__ev_child member Yang Hongyang
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Yang Hongyang @ 2014-09-26  6:13 UTC
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

libxl__multidev_prepare_with_aodev is similar to libxl__multidev_prepare,
but takes a libxl__ao_device as an extra argument.
libxl__multidev_prepare is now a wrapper around
libxl__multidev_prepare_with_aodev.

This new internal API will be used by the Remus device abstract layer
for handling various Remus devices.
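
A rough standalone sketch of the split, for illustration: the sketch_*
names are hypothetical, calloc() stands in for libxl's GC allocation
(GCNEW), and embedding the aodev inside a larger per-device structure is
an assumption about how a caller might use the new function, not
something this patch itself does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAX_DEVS 8

/* Hypothetical stand-ins for libxl__ao_device and libxl__multidev. */
struct sketch_aodev {
    int prepared;
};

struct sketch_multidev {
    struct sketch_aodev *array[SKETCH_MAX_DEVS];
    size_t used;
};

/* Like libxl__multidev_prepare_with_aodev: the caller owns the storage. */
void sketch_prepare_with_aodev(struct sketch_multidev *md,
                               struct sketch_aodev *aodev)
{
    assert(md->used < SKETCH_MAX_DEVS);
    aodev->prepared = 1;
    md->array[md->used++] = aodev;
}

/* Like libxl__multidev_prepare: allocate, then defer to the function above.
 * Here the caller must free() the result; libxl uses GC allocation instead. */
struct sketch_aodev *sketch_prepare(struct sketch_multidev *md)
{
    struct sketch_aodev *aodev = calloc(1, sizeof(*aodev));

    if (!aodev)
        return NULL;
    sketch_prepare_with_aodev(md, aodev);
    return aodev;
}

/* Assumed use case: a device structure that embeds its own aodev. */
struct sketch_device {
    int devid;
    struct sketch_aodev aodev;
};

/* Returns 0 if both the embedded and the allocated aodev were registered. */
int sketch_demo(void)
{
    struct sketch_multidev md = { .used = 0 };
    struct sketch_device dev = { .devid = 7 };
    struct sketch_aodev *heap;
    int ok;

    sketch_prepare_with_aodev(&md, &dev.aodev);
    heap = sketch_prepare(&md);
    if (!heap)
        return -1;

    ok = md.used == 2 &&
         md.array[0] == &dev.aodev && md.array[1] == heap &&
         dev.aodev.prepared && heap->prepared;
    free(heap);
    return ok ? 0 : -1;
}
```

The point of the split is that the existing entry point keeps its
behaviour, while a caller that already has storage for the aodev can skip
the extra allocation.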

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/libxl/libxl_device.c   | 13 ++++++++++---
 tools/libxl/libxl_internal.h | 15 ++++++++++++---
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index 1f6514c..4c49c4c 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -479,11 +479,10 @@ void libxl__multidev_begin(libxl__ao *ao, libxl__multidev *multidev)
     multidev->preparation = libxl__multidev_prepare(multidev);
 }
 
-libxl__ao_device *libxl__multidev_prepare(libxl__multidev *multidev) {
+void libxl__multidev_prepare_with_aodev(libxl__multidev *multidev,
+                                        libxl__ao_device *aodev) {
     STATE_AO_GC(multidev->ao);
-    libxl__ao_device *aodev;
 
-    GCNEW(aodev);
     aodev->multidev = multidev;
     aodev->callback = libxl__multidev_one_callback;
     libxl__prepare_ao_device(ao, aodev);
@@ -493,6 +492,14 @@ libxl__ao_device *libxl__multidev_prepare(libxl__multidev *multidev) {
         GCREALLOC_ARRAY(multidev->array, multidev->allocd);
     }
     multidev->array[multidev->used++] = aodev;
+}
+
+libxl__ao_device *libxl__multidev_prepare(libxl__multidev *multidev) {
+    STATE_AO_GC(multidev->ao);
+    libxl__ao_device *aodev;
+
+    GCNEW(aodev);
+    libxl__multidev_prepare_with_aodev(multidev, aodev);
 
     return aodev;
 }
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index b19374e..a2dd7ca 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2172,9 +2172,18 @@ struct libxl__ao_device {
 /* Starts preparing to add/remove a bunch of devices. */
 _hidden void libxl__multidev_begin(libxl__ao *ao, libxl__multidev*);
 
-/* Prepares to add/remove one of many devices.  Returns a libxl__ao_device
- * which has had libxl__prepare_ao_device called, and which has also
- * had ->callback set.  The user should not mess with aodev->callback. */
+/* Prepares to add/remove one of many devices.
+ * Calls libxl__prepare_ao_device on libxl__ao_device argument provided and
+ * also sets the aodev->callback (to libxl__multidev_one_callback)
+ * The user should not mess with aodev->callback.
+ */
+_hidden void libxl__multidev_prepare_with_aodev(libxl__multidev*,
+                                                libxl__ao_device*);
+
+/* A wrapper function around libxl__multidev_prepare_with_aodev.
+ * Allocates a libxl__ao_device and prepares it for addition/removal.
+ * Returns the newly allocated libxl__ao_device.
+ */
 _hidden libxl__ao_device *libxl__multidev_prepare(libxl__multidev*);
 
 /* Indicates to multidev that this one device has been processed.
-- 
1.9.1


* [PATCH for-4.5 v21 04/14] libxl: Extend libxl__ao_device with a libxl__ev_child member
  2014-09-26  6:13 [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (2 preceding siblings ...)
  2014-09-26  6:13 ` [PATCH for-4.5 v21 03/14] libxl: introduce libxl__multidev_prepare_with_aodev Yang Hongyang
@ 2014-09-26  6:13 ` Yang Hongyang
  2014-09-26  6:13 ` [PATCH for-4.5 v21 05/14] autoconf: add libnl3 dependency for Remus network buffering support Yang Hongyang
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Yang Hongyang @ 2014-09-26  6:13 UTC
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

This can be used to fork children to allow the asynchronous execution
of system calls which only come in a synchronous variant. This will
be useful for Remus, in the following patches.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/libxl/libxl_device.c   | 1 +
 tools/libxl/libxl_internal.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index 4c49c4c..4b51ded 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -453,6 +453,7 @@ void libxl__prepare_ao_device(libxl__ao *ao, libxl__ao_device *aodev)
     /* We init this here because we might call device_hotplug_done
      * without actually calling any hotplug script */
     libxl__async_exec_init(&aodev->aes);
+    libxl__ev_child_init(&aodev->child);
 }
 
 /* multidev */
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index a2dd7ca..4a44482 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2150,6 +2150,8 @@ struct libxl__ao_device {
     libxl__async_exec_state aes;
     /* If we need to update JSON config */
     bool update_json;
+    /* for asynchronous execution of synchronous-only syscalls etc. */
+    libxl__ev_child child;
 };
 
 /*
-- 
1.9.1


* [PATCH for-4.5 v21 05/14] autoconf: add libnl3 dependency for Remus network buffering support
  2014-09-26  6:13 [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (3 preceding siblings ...)
  2014-09-26  6:13 ` [PATCH for-4.5 v21 04/14] libxl: Extend libxl__ao_device with a libxl__ev_child member Yang Hongyang
@ 2014-09-26  6:13 ` Yang Hongyang
  2014-10-06 14:48   ` Ian Campbell
  2014-09-26  6:13 ` [PATCH for-4.5 v21 06/14] libxl/remus: introduce an abstract Remus device layer Yang Hongyang
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 24+ messages in thread
From: Yang Hongyang @ 2014-09-26  6:13 UTC
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

Libnl3 is required for controlling Remus network buffering.
This patch adds dependency on libnl3 (>= 3.2.8) to autoconf scripts.
It also provides the ability to configure tools without libnl3 support
i.e., without network buffering support.

When there is no network buffering support, libxl__netbuffer_enabled()
returns 0, otherwise returns 1. The callers of this api will be
introduced in the rest of the series.
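
To illustrate how such a predicate might be consumed, here is a minimal
standalone sketch. It is not code from this series: the patch selects
between two source files in the Makefile, whereas the sketch uses a
hypothetical SKETCH_HAVE_NETBUF preprocessor switch so that it fits in one
file, and sketch_check_netbuf() is an invented caller.

```c
#include <stdio.h>

/* SKETCH_HAVE_NETBUF is a hypothetical switch standing in for
 * CONFIG_REMUS_NETBUF=y; the patch itself builds libxl_netbuffer.c or
 * libxl_nonetbuffer.c instead of using an #ifdef. */
#ifdef SKETCH_HAVE_NETBUF
static int sketch_netbuffer_enabled(void) { return 1; }
#else
static int sketch_netbuffer_enabled(void) { return 0; }
#endif

/* Invented caller: refuse a request for network buffering when the
 * tools were built without it. Returns 0 on success, -1 on error. */
int sketch_check_netbuf(int want_netbuf)
{
    if (want_netbuf && !sketch_netbuffer_enabled()) {
        fprintf(stderr,
                "network buffering requested but not compiled in\n");
        return -1;
    }
    return 0;
}
```

Keeping the check behind a function means callers need no conditional
compilation of their own; they compile the same way in both builds.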

NOTE: This patch changes tools/configure.ac, please rerun
      autogen.sh while applying the patch.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 README                          |  4 ++++
 config/Tools.mk.in              |  4 ++++
 docs/README.remus               |  6 ++++++
 tools/configure.ac              | 16 ++++++++++++++++
 tools/libxl/Makefile            | 13 +++++++++++++
 tools/libxl/libxl_internal.h    |  1 +
 tools/libxl/libxl_netbuffer.c   | 31 +++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c | 31 +++++++++++++++++++++++++++++++
 8 files changed, 106 insertions(+)
 create mode 100644 tools/libxl/libxl_netbuffer.c
 create mode 100644 tools/libxl/libxl_nonetbuffer.c

diff --git a/README b/README
index 81bf938..78c5db2 100644
--- a/README
+++ b/README
@@ -73,6 +73,10 @@ disabled at compile time:
     * markdown
     * figlet (for generating the traditional Xen start of day banner)
     * systemd daemon development files
+    * Development install of libnl3 (e.g., libnl-3-200,
+      libnl-3-dev, etc).  Required if network buffering is desired
+      when using Remus with libxl.  See tools/remus/README for detailed
+      information.
 
 Second, you need to acquire a suitable kernel for use in domain 0. If
 possible you should use a kernel provided by your OS distributor. If
diff --git a/config/Tools.mk.in b/config/Tools.mk.in
index 974e28e..a69b846 100644
--- a/config/Tools.mk.in
+++ b/config/Tools.mk.in
@@ -43,6 +43,9 @@ PTHREAD_LIBS        := @PTHREAD_LIBS@
 
 PTYFUNCS_LIBS       := @PTYFUNCS_LIBS@
 
+LIBNL3_LIBS         := @LIBNL3_LIBS@
+LIBNL3_CFLAGS       := @LIBNL3_CFLAGS@
+
 # Download GIT repositories via HTTP or GIT's own protocol?
 # GIT's protocol is faster and more robust, when it works at all (firewalls
 # may block it). We make it the default, but if your GIT repository downloads
@@ -62,6 +65,7 @@ CONFIG_BLKTAP1      := @blktap1@
 CONFIG_BLKTAP2      := @blktap2@
 CONFIG_VTPM         := @vtpm@
 CONFIG_QEMUU_EXTRA_ARGS:= @EXTRA_QEMUU_CONFIGURE_ARGS@
+CONFIG_REMUS_NETBUF := @remus_netbuf@
 
 CONFIG_SYSTEMD      := @systemd@
 SYSTEMD_CFLAGS      := @SYSTEMD_CFLAGS@
diff --git a/docs/README.remus b/docs/README.remus
index 9fa00fe..ddf5b55 100644
--- a/docs/README.remus
+++ b/docs/README.remus
@@ -2,3 +2,9 @@ Remus provides fault tolerance for virtual machines by sending continuous
 checkpoints to a backup, which will activate if the target VM fails.
 
 See the website at http://wiki.xen.org/wiki/Remus for details.
+
+Using Remus with libxl on Xen 4.5 and higher:
+ To enable network buffering, you need libnl 3.2.8
+ or higher along with the development headers and command line utilities.
+ If your distro does not have the appropriate libnl3 version, you can find
+ the latest source tarball of libnl3 at http://www.carisma.slowglass.com/~tgr/libnl/
diff --git a/tools/configure.ac b/tools/configure.ac
index 4f45418..cfa4dd6 100644
--- a/tools/configure.ac
+++ b/tools/configure.ac
@@ -320,6 +320,22 @@ esac
 # Checks for header files.
 AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h valgrind/memcheck.h utmp.h])
 
+# Check for libnl3 >=3.2.8. If present enable remus network buffering.
+PKG_CHECK_MODULES(LIBNL3, [libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8],
+    [libnl3_lib="y"], [libnl3_lib="n"])
+
+AS_IF([test "x$libnl3_lib" = "xn" ], [
+    AC_MSG_WARN([Disabling support for Remus network buffering.
+    Please install libnl3 libraries, command line tools and devel
+    headers - version 3.2.8 or higher])
+    AC_SUBST(remus_netbuf, [n])
+    ],[
+    AC_SUBST(remus_netbuf, [y])
+])
+
+AC_SUBST(LIBNL3_LIBS)
+AC_SUBST(LIBNL3_CFLAGS)
+
 fi # ! $rump
 
 AX_AVAILABLE_SYSTEMD()
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 496a269..58f9975 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -21,11 +21,17 @@ endif
 
 LIBXL_LIBS =
 LIBXL_LIBS = $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(LDLIBS_libxenstore) $(LDLIBS_libblktapctl) $(PTYFUNCS_LIBS) $(LIBUUID_LIBS)
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+LIBXL_LIBS += $(LIBNL3_LIBS)
+endif
 
 CFLAGS_LIBXL += $(CFLAGS_libxenctrl)
 CFLAGS_LIBXL += $(CFLAGS_libxenguest)
 CFLAGS_LIBXL += $(CFLAGS_libxenstore)
 CFLAGS_LIBXL += $(CFLAGS_libblktapctl) 
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+CFLAGS_LIBXL += $(LIBNL3_CFLAGS)
+endif
 CFLAGS_LIBXL += -Wshadow
 
 LIBXL_LIBS-$(CONFIG_ARM) += -lfdt
@@ -43,6 +49,13 @@ LIBXL_OBJS-y += libxl_blktap2.o
 else
 LIBXL_OBJS-y += libxl_noblktap2.o
 endif
+
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+LIBXL_OBJS-y += libxl_netbuffer.o
+else
+LIBXL_OBJS-y += libxl_nonetbuffer.o
+endif
+
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 4a44482..09742d9 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2586,6 +2586,7 @@ typedef struct libxl__save_helper_state {
                       * marshalling and xc callback functions */
 } libxl__save_helper_state;
 
+_hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
 /*----- Domain suspend (save) state structure -----*/
 
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
new file mode 100644
index 0000000..52d593c
--- /dev/null
+++ b/tools/libxl/libxl_netbuffer.c
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2014
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+int libxl__netbuffer_enabled(libxl__gc *gc)
+{
+    return 1;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
new file mode 100644
index 0000000..1c72a7f
--- /dev/null
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2014
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+int libxl__netbuffer_enabled(libxl__gc *gc)
+{
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.9.1


* [PATCH for-4.5 v21 06/14] libxl/remus: introduce an abstract Remus device layer
  2014-09-26  6:13 [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (4 preceding siblings ...)
  2014-09-26  6:13 ` [PATCH for-4.5 v21 05/14] autoconf: add libnl3 dependency for Remus network buffering support Yang Hongyang
@ 2014-09-26  6:13 ` Yang Hongyang
  2014-09-26 12:59   ` Ian Jackson
  2014-09-26  6:13 ` [PATCH for-4.5 v21 07/14] libxl/remus: setup and control network output buffering Yang Hongyang
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 24+ messages in thread
From: Yang Hongyang @ 2014-09-26  6:13 UTC
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

Introduce an abstract device layer that allows the Remus
logic in libxl to control a guest's devices in a device-agnostic
manner. The device layer also exposes a set of internal interfaces
that a device type must implement, if it wishes to support Remus.

The following APIs are exposed to libxl:

One-time configuration operations:
  *libxl__remus_devices_setup
    > Enable output buffering for NICs, setup disk replication, etc.
  *libxl__remus_devices_teardown
    > Disable network output buffering and disk replication;
      teardown any associated external setups like qdiscs for NICs.

Operations executed every checkpoint (in order of invocation):
  *libxl__remus_devices_postsuspend
  *libxl__remus_devices_preresume
  *libxl__remus_devices_commit

Each device type needs to implement the interfaces specified in
the libxl__remus_device_instance_ops if it wishes to support Remus.

The high-level control flow through the Remus device layer is shown below:

xl remus
  |->  libxl_domain_remus_start
    |-> libxl__remus_devices_setup
      |-> Per-checkpoint libxl__remus_devices_[postsuspend,preresume,commit]
        ...
        |-> On backup failure/network error/other errors
            libxl__remus_devices_teardown

Callback processing:
* Only call the per-device libxl__multidev_one_callback
  when the iteration has succeeded or failed.
* The final callback (called by multidev) is a trivial
  shim to shuffle the pointers and notify our own caller.
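
The ordering of the calls described above can be sketched as a small
standalone program. This is only a synchronous toy model: the real layer
is asynchronous and callback-driven, and the sketch_* names, the layout of
the ops table and the one-letter trace are all assumptions made for
illustration rather than the contents of libxl__remus_device_instance_ops.
Unlike the flow above, the toy also tears down when it simply runs out of
checkpoints.

```c
#include <string.h>

struct sketch_device;

/* Hypothetical per-device-type operations table. */
struct sketch_device_ops {
    int  (*setup)(struct sketch_device *dev);
    int  (*postsuspend)(struct sketch_device *dev);
    int  (*preresume)(struct sketch_device *dev);
    int  (*commit)(struct sketch_device *dev);
    void (*teardown)(struct sketch_device *dev);
};

struct sketch_device {
    const struct sketch_device_ops *ops;
    char trace[64];  /* one letter per operation, in call order */
    int checkpoint;  /* index of the checkpoint in progress */
    int fail_at;     /* checkpoint whose commit fails, or -1 for never */
};

static void trace_step(struct sketch_device *dev, char step)
{
    size_t len = strlen(dev->trace);

    if (len + 1 < sizeof(dev->trace)) {
        dev->trace[len] = step;
        dev->trace[len + 1] = '\0';
    }
}

static int dev_setup(struct sketch_device *dev)
{
    trace_step(dev, 'S');
    return 0;
}

static int dev_postsuspend(struct sketch_device *dev)
{
    trace_step(dev, 'p');
    return 0;
}

static int dev_preresume(struct sketch_device *dev)
{
    trace_step(dev, 'r');
    return 0;
}

static int dev_commit(struct sketch_device *dev)
{
    trace_step(dev, 'c');
    return dev->checkpoint == dev->fail_at ? -1 : 0;
}

static void dev_teardown(struct sketch_device *dev)
{
    trace_step(dev, 'T');
}

static const struct sketch_device_ops sketch_ops = {
    dev_setup, dev_postsuspend, dev_preresume, dev_commit, dev_teardown
};

void sketch_device_init(struct sketch_device *dev, int fail_at)
{
    memset(dev, 0, sizeof(*dev));
    dev->ops = &sketch_ops;
    dev->fail_at = fail_at;
}

/* One-time setup, then postsuspend/preresume/commit per checkpoint
 * until an error occurs or max_checkpoints is reached, then teardown. */
int sketch_remus_run(struct sketch_device *dev, int max_checkpoints)
{
    int rc = dev->ops->setup(dev);

    for (dev->checkpoint = 0;
         !rc && dev->checkpoint < max_checkpoints;
         dev->checkpoint++) {
        rc = dev->ops->postsuspend(dev);
        if (!rc) rc = dev->ops->preresume(dev);
        if (!rc) rc = dev->ops->commit(dev);
    }
    dev->ops->teardown(dev);
    return rc;
}
```

With sketch_device_init(&dev, 1) followed by sketch_remus_run(&dev, 5),
the commit of the second checkpoint fails, so the run returns -1 and the
trace reads setup, two postsuspend/preresume/commit rounds, then teardown.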

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
For comments:
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
---
 tools/libxl/Makefile             |   2 +
 tools/libxl/libxl.c              |  46 +++++-
 tools/libxl/libxl_dom.c          | 168 ++++++++++++++++++++--
 tools/libxl/libxl_internal.h     | 162 +++++++++++++++++++++
 tools/libxl/libxl_remus_device.c | 304 +++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_types.idl      |   2 +
 6 files changed, 667 insertions(+), 17 deletions(-)
 create mode 100644 tools/libxl/libxl_remus_device.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 58f9975..da3cddb 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -56,6 +56,8 @@ else
 LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
+LIBXL_OBJS-y += libxl_remus_device.o
+
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
 
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 3735f55..e108e40 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -782,6 +782,10 @@ out:
     return ptr;
 }
 
+static void libxl__remus_setup_done(libxl__egc *egc,
+                                    libxl__remus_devices_state *rds, int rc);
+static void libxl__remus_setup_failed(libxl__egc *egc,
+                                      libxl__remus_devices_state *rds, int rc);
 static void remus_failover_cb(libxl__egc *egc,
                               libxl__domain_suspend_state *dss, int rc);
 
@@ -813,16 +817,50 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 
     assert(info);
 
-    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
+    /* Convenience aliases */
+    libxl__remus_devices_state *const rds = &dss->rds;
+    rds->ao = ao;
+    rds->domid = domid;
+    rds->callback = libxl__remus_setup_done;
 
     /* Point of no return */
-    libxl__domain_suspend(egc, dss);
+    libxl__remus_devices_setup(egc, rds);
     return AO_INPROGRESS;
 
  out:
     return AO_ABORT(rc);
 }
 
+static void libxl__remus_setup_done(libxl__egc *egc,
+                                    libxl__remus_devices_state *rds, int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    STATE_AO_GC(dss->ao);
+
+    if (!rc) {
+        libxl__domain_suspend(egc, dss);
+        return;
+    }
+
+    LOG(ERROR, "Remus: failed to setup device for guest with domid %u, rc %d",
+        dss->domid, rc);
+    rds->callback = libxl__remus_setup_failed;
+    libxl__remus_devices_teardown(egc, rds);
+}
+
+static void libxl__remus_setup_failed(libxl__egc *egc,
+                                      libxl__remus_devices_state *rds, int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    STATE_AO_GC(dss->ao);
+
+    if (rc)
+        LOG(ERROR, "Remus: failed to teardown device after setup failed"
+            " for guest with domid %u, rc %d", dss->domid, rc);
+
+    dss->callback(egc, dss, rc);
+}
+
 static void remus_failover_cb(libxl__egc *egc,
                               libxl__domain_suspend_state *dss, int rc)
 {
@@ -832,10 +870,6 @@ static void remus_failover_cb(libxl__egc *egc,
      * backup died or some network error occurred preventing us
      * from sending checkpoints.
      */
-
-    /* TBD: Remus cleanup - i.e. detach qdisc, release other
-     * resources.
-     */
     libxl__ao_complete(egc, ao, rc);
 }
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index bd21841..e9d29b5 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -920,8 +920,6 @@ static void domain_suspend_done(libxl__egc *egc,
                         libxl__domain_suspend_state *dss, int rc);
 static void domain_suspend_callback_common_done(libxl__egc *egc,
                                 libxl__domain_suspend_state *dss, int ok);
-static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss, int ok);
 
 /*----- complicated callback, called by xc_domain_save -----*/
 
@@ -1583,6 +1581,14 @@ static void domain_suspend_callback_common_done(libxl__egc *egc,
 }
 
 /*----- remus callbacks -----*/
+static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
+                                libxl__domain_suspend_state *dss, int ok);
+static void remus_devices_postsuspend_cb(libxl__egc *egc,
+                                         libxl__remus_devices_state *rds,
+                                         int rc);
+static void remus_devices_preresume_cb(libxl__egc *egc,
+                                       libxl__remus_devices_state *rds,
+                                       int rc);
 
 static void libxl__remus_domain_suspend_callback(void *data)
 {
@@ -1597,32 +1603,77 @@ static void libxl__remus_domain_suspend_callback(void *data)
 static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
                                 libxl__domain_suspend_state *dss, int ok)
 {
-    /* REMUS TODO: Issue disk and network checkpoint reqs. */
+    if (!ok)
+        goto out;
+
+    libxl__remus_devices_state *const rds = &dss->rds;
+    rds->callback = remus_devices_postsuspend_cb;
+    libxl__remus_devices_postsuspend(egc, rds);
+    return;
+
+out:
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
 }
 
-static void libxl__remus_domain_resume_callback(void *data)
+static void remus_devices_postsuspend_cb(libxl__egc *egc,
+                                         libxl__remus_devices_state *rds,
+                                         int rc)
 {
     int ok = 0;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+
+    if (rc)
+        goto out;
+
+    ok = 1;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
+
+static void libxl__remus_domain_resume_callback(void *data)
+{
     libxl__save_helper_state *shs = data;
     libxl__egc *egc = shs->egc;
     libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
     STATE_AO_GC(dss->ao);
 
+    libxl__remus_devices_state *const rds = &dss->rds;
+    rds->callback = remus_devices_preresume_cb;
+    libxl__remus_devices_preresume(egc, rds);
+}
+
+static void remus_devices_preresume_cb(libxl__egc *egc,
+                                       libxl__remus_devices_state *rds,
+                                       int rc)
+{
+    int ok = 0;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    STATE_AO_GC(dss->ao);
+
+    if (rc)
+        goto out;
+
     /* Resumes the domain and the device model */
-    if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
+    rc = libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1);
+    if (rc)
         goto out;
 
-    /* REMUS TODO: Deal with disk. Start a new network output buffer */
     ok = 1;
+
 out:
-    libxl__xc_domain_saverestore_async_callback_done(egc, shs, ok);
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
 }
 
 /*----- remus asynchronous checkpoint callback -----*/
 
 static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc);
+static void remus_devices_commit_cb(libxl__egc *egc,
+                                    libxl__remus_devices_state *rds,
+                                    int rc);
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs);
 
 static void libxl__remus_domain_checkpoint_callback(void *data)
 {
@@ -1642,10 +1693,73 @@ static void libxl__remus_domain_checkpoint_callback(void *data)
 static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc)
 {
-    /* REMUS TODO: Wait for disk and memory ack, release network buffer */
-    /* REMUS TODO: make this asynchronous */
-    assert(!rc); /* REMUS TODO handle this error properly */
-    usleep(dss->interval * 1000);
+    /* Convenience aliases */
+    libxl__remus_devices_state *const rds = &dss->rds;
+
+    STATE_AO_GC(dss->ao);
+
+    if (rc) {
+        LOG(ERROR, "Failed to save device model. Terminating Remus..");
+        goto out;
+    }
+
+    rds->callback = remus_devices_commit_cb;
+    libxl__remus_devices_commit(egc, rds);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void remus_devices_commit_cb(libxl__egc *egc,
+                                    libxl__remus_devices_state *rds,
+                                    int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+
+    STATE_AO_GC(dss->ao);
+
+    if (rc) {
+        LOG(ERROR, "Failed to do device commit op."
+            " Terminating Remus..");
+        goto out;
+    }
+
+    /*
+     * At this point, we have successfully checkpointed the guest and
+     * committed it at the backup. We'll come back after the checkpoint
+     * interval to checkpoint the guest again. Until then, let the guest
+     * continue execution.
+     */
+
+    /* Set checkpoint interval timeout */
+    rc = libxl__ev_time_register_rel(gc, &dss->checkpoint_timeout,
+                                     remus_next_checkpoint,
+                                     dss->interval);
+
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs)
+{
+    libxl__domain_suspend_state *dss =
+                            CONTAINER_OF(ev, *dss, checkpoint_timeout);
+
+    STATE_AO_GC(dss->ao);
+
+    /*
+     * Time to checkpoint the guest again. We return 1 to libxc
+     * (xc_domain_save.c) in order to continue executing the infinite loop
+     * (suspend, checkpoint, resume) in xc_domain_save().
+     */
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);
 }
 
@@ -1860,6 +1974,10 @@ static void save_device_model_datacopier_done(libxl__egc *egc,
     dss->save_dm_callback(egc, dss, our_rc);
 }
 
+static void remus_teardown_done(libxl__egc *egc,
+                                libxl__remus_devices_state *rds,
+                                int rc);
+
 static void domain_suspend_done(libxl__egc *egc,
                         libxl__domain_suspend_state *dss, int rc)
 {
@@ -1874,6 +1992,34 @@ static void domain_suspend_done(libxl__egc *egc,
         xc_suspend_evtchn_release(CTX->xch, CTX->xce, domid,
                            dss->guest_evtchn.port, &dss->guest_evtchn_lockfd);
 
+    if (!dss->remus) {
+        remus_teardown_done(egc, &dss->rds, rc);
+        return;
+    }
+
+    /*
+     * With Remus, if we reach this point, it means either
+     * backup died or some network error occurred preventing us
+     * from sending checkpoints. Teardown the network buffers and
+     * release netlink resources.  This is an async op.
+     */
+    LOG(WARN, "Remus: Domain suspend terminated with rc %d,"
+        " teardown Remus devices...", rc);
+    dss->rds.callback = remus_teardown_done;
+    libxl__remus_devices_teardown(egc, &dss->rds);
+}
+
+static void remus_teardown_done(libxl__egc *egc,
+                                libxl__remus_devices_state *rds,
+                                int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    STATE_AO_GC(dss->ao);
+
+    if (rc)
+        LOG(ERROR, "Remus: failed to teardown device for guest with domid %u,"
+            " rc %d", dss->domid, rc);
+
     dss->callback(egc, dss, rc);
 }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 09742d9..35fbdcd 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2586,6 +2586,166 @@ typedef struct libxl__save_helper_state {
                       * marshalling and xc callback functions */
 } libxl__save_helper_state;
 
+/*----- remus device related state structure -----*/
+/*
+ * The abstract Remus device layer exposes a common
+ * set of API to [external] libxl for manipulating devices attached to
+ * a guest protected by Remus. The device layer also exposes a set of
+ * [internal] interfaces that every device type must implement.
+ *
+ * The following API are exposed to libxl:
+ *
+ * One-time configuration operations:
+ *  +libxl__remus_devices_setup
+ *    > Enable output buffering for NICs, setup disk replication, etc.
+ *  +libxl__remus_devices_teardown
+ *    > Disable output buffering and disk replication; teardown any
+ *       associated external setups like qdiscs for NICs.
+ *
+ * Operations executed every checkpoint (in order of invocation):
+ *  +libxl__remus_devices_postsuspend
+ *  +libxl__remus_devices_preresume
+ *  +libxl__remus_devices_commit
+ *
+ * Each device type needs to implement the interfaces specified in
+ * the libxl__remus_device_instance_ops if it wishes to support Remus.
+ *
+ * The high-level control flow through the Remus device layer is shown below:
+ *
+ * xl remus
+ *  |->  libxl_domain_remus_start
+ *    |-> libxl__remus_devices_setup
+ *      |-> Per-checkpoint libxl__remus_devices_[postsuspend,preresume,commit]
+ *        ...
+ *        |-> On backup failure, network error or other internal errors:
+ *            libxl__remus_devices_teardown
+ */
+
+typedef struct libxl__remus_device libxl__remus_device;
+typedef struct libxl__remus_devices_state libxl__remus_devices_state;
+typedef struct libxl__remus_device_instance_ops libxl__remus_device_instance_ops;
+
+/*
+ * Interfaces to be implemented by every device subkind that wishes to
+ * support Remus. Functions must be implemented unless otherwise
+ * stated. Many of these functions are asynchronous. They call
+ * dev->aodev.callback when done.  The actual implementations may be
+ * synchronous and call dev->aodev.callback directly (as the last
+ * thing they do).
+ */
+struct libxl__remus_device_instance_ops {
+    /* the device kind this ops belongs to... */
+    libxl__device_kind kind;
+
+    /*
+     * Checkpoint operations. May be NULL, meaning the op is not
+     * implemented and the caller should treat them as a no-op (and do
+     * nothing when checkpointing).
+     * Asynchronous.
+     */
+
+    void (*postsuspend)(libxl__egc *egc, libxl__remus_device *dev);
+    void (*preresume)(libxl__egc *egc, libxl__remus_device *dev);
+    void (*commit)(libxl__egc *egc, libxl__remus_device *dev);
+
+    /*
+     * setup() and teardown() refer to the actual remus device.
+     * Asynchronous.
+     * teardown is called even if setup fails.
+     */
+    /*
+     * setup() should first determine whether the subkind matches the specific
+     * device. If matched, the device will then be managed with this set of
+     * subkind operations.
+     * Yields 0 if the device is successfully set up,
+     * ERROR_REMUS_DEVOPS_DOES_NOT_MATCH if the ops do not match the device;
+     * any other rc indicates failure.
+     */
+    void (*setup)(libxl__egc *egc, libxl__remus_device *dev);
+    void (*teardown)(libxl__egc *egc, libxl__remus_device *dev);
+};
+
+typedef void libxl__remus_callback(libxl__egc *,
+                                   libxl__remus_devices_state *, int rc);
+
+/*
+ * State associated with a remus invocation, including parameters
+ * passed to the remus abstract device layer by the remus
+ * save/restore machinery.
+ */
+struct libxl__remus_devices_state {
+    /*---- must be set by caller of libxl__remus_device_(setup|teardown) ----*/
+
+    libxl__ao *ao;
+    uint32_t domid;
+    libxl__remus_callback *callback;
+    int device_kind_flags;
+
+    /*----- private for abstract layer only -----*/
+
+    int num_devices;
+    /*
+     * This array is allocated by the remus abstract layer before the
+     * remus devices are set up.
+     * devs may be NULL, meaning that no remus devices have been set up.
+     * The size of this array is 'num_devices', which is the total number
+     * of libxl nic devices and disk devices (num_nics + num_disks).
+     */
+    libxl__remus_device **devs;
+
+    libxl_device_nic *nics;
+    int num_nics;
+    libxl_device_disk *disks;
+    int num_disks;
+
+    libxl__multidev multidev;
+};
+
+/*
+ * Information about a single device being handled by remus.
+ * Allocated by the remus abstract layer.
+ */
+struct libxl__remus_device {
+    /*----- shared between abstract and concrete layers -----*/
+    /*
+     * if this is true, that means the subkind ops match the device
+     */
+    bool matched;
+
+    /*----- set by remus device abstract layer -----*/
+    /* libxl__device_* which this remus device is related to */
+    const void *backend_dev;
+    libxl__device_kind kind;
+    libxl__remus_devices_state *rds;
+    libxl__ao_device aodev;
+
+    /*----- private for abstract layer only -----*/
+
+    /*
+     * Control and state variables for the asynchronous callback
+     * based loops which iterate over device subkinds, and over
+     * individual devices.
+     */
+    int ops_index;
+    const libxl__remus_device_instance_ops *ops;
+
+    /*----- private for concrete (device-specific) layer -----*/
+
+    /* concrete device's private data */
+    void *concrete_data;
+};
+
+/* the following 5 APIs are async ops, call rds->callback when done */
+_hidden void libxl__remus_devices_setup(libxl__egc *egc,
+                                        libxl__remus_devices_state *rds);
+_hidden void libxl__remus_devices_teardown(libxl__egc *egc,
+                                           libxl__remus_devices_state *rds);
+_hidden void libxl__remus_devices_postsuspend(libxl__egc *egc,
+                                              libxl__remus_devices_state *rds);
+_hidden void libxl__remus_devices_preresume(libxl__egc *egc,
+                                            libxl__remus_devices_state *rds);
+_hidden void libxl__remus_devices_commit(libxl__egc *egc,
+                                         libxl__remus_devices_state *rds);
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
 /*----- Domain suspend (save) state structure -----*/
@@ -2626,6 +2786,8 @@ struct libxl__domain_suspend_state {
     libxl__ev_xswatch guest_watch;
     libxl__ev_time guest_timeout;
     const char *dm_savefile;
+    libxl__remus_devices_state rds;
+    libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
     int interval; /* checkpoint interval (for Remus) */
     libxl__save_helper_state shs;
     libxl__logdirty_switch logdirty;
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
new file mode 100644
index 0000000..4e77587
--- /dev/null
+++ b/tools/libxl/libxl_remus_device.c
@@ -0,0 +1,304 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Yang Hongyang <yanghy@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+static const libxl__remus_device_instance_ops *remus_ops[] = {
+    NULL,
+};
+
+/*----- helper functions -----*/
+
+static int init_device_subkind(libxl__remus_devices_state *rds)
+{
+    /* init device subkind-specific state in the libxl ctx */
+    return 0;
+}
+
+static void cleanup_device_subkind(libxl__remus_devices_state *rds)
+{
+    /* cleanup device subkind-specific state in the libxl ctx */
+}
+
+/*----- setup() and teardown() -----*/
+
+/* callbacks */
+
+static void all_devices_setup_cb(libxl__egc *egc,
+                                 libxl__multidev *multidev,
+                                 int rc);
+static void device_setup_iterate(libxl__egc *egc,
+                                 libxl__ao_device *aodev);
+static void devices_teardown_cb(libxl__egc *egc,
+                                libxl__multidev *multidev,
+                                int rc);
+
+/* remus device setup and teardown */
+
+static libxl__remus_device* remus_device_init(libxl__egc *egc,
+                                              libxl__remus_devices_state *rds,
+                                              libxl__device_kind kind,
+                                              void *libxl_dev)
+{
+    libxl__remus_device *dev = NULL;
+
+    STATE_AO_GC(rds->ao);
+    GCNEW(dev);
+    dev->backend_dev = libxl_dev;
+    dev->kind = kind;
+    dev->rds = rds;
+
+    return dev;
+}
+
+static void remus_devices_setup(libxl__egc *egc,
+                                libxl__remus_devices_state *rds);
+
+void libxl__remus_devices_setup(libxl__egc *egc, libxl__remus_devices_state *rds)
+{
+    int i, rc;
+
+    STATE_AO_GC(rds->ao);
+
+    rc = init_device_subkind(rds);
+    if (rc)
+        goto out;
+
+    rds->num_devices = 0;
+    rds->num_nics = 0;
+    rds->num_disks = 0;
+
+    if (rds->device_kind_flags & (1 << LIBXL__DEVICE_KIND_VIF))
+        rds->nics = libxl_device_nic_list(CTX, rds->domid, &rds->num_nics);
+
+    if (rds->device_kind_flags & (1 << LIBXL__DEVICE_KIND_VBD))
+        rds->disks = libxl_device_disk_list(CTX, rds->domid, &rds->num_disks);
+
+    if (rds->num_nics == 0 && rds->num_disks == 0)
+        goto out;
+
+    GCNEW_ARRAY(rds->devs, rds->num_nics + rds->num_disks);
+
+    for (i = 0; i < rds->num_nics; i++) {
+        rds->devs[rds->num_devices++] = remus_device_init(egc, rds,
+                                                LIBXL__DEVICE_KIND_VIF,
+                                                &rds->nics[i]);
+    }
+
+    for (i = 0; i < rds->num_disks; i++) {
+        rds->devs[rds->num_devices++] = remus_device_init(egc, rds,
+                                                LIBXL__DEVICE_KIND_VBD,
+                                                &rds->disks[i]);
+    }
+
+    remus_devices_setup(egc, rds);
+
+    return;
+
+out:
+    rds->callback(egc, rds, rc);
+}
+
+static void remus_devices_setup(libxl__egc *egc,
+                                libxl__remus_devices_state *rds)
+{
+    int i, rc;
+
+    STATE_AO_GC(rds->ao);
+
+    libxl__multidev_begin(ao, &rds->multidev);
+    rds->multidev.callback = all_devices_setup_cb;
+    for (i = 0; i < rds->num_devices; i++) {
+        libxl__remus_device *dev = rds->devs[i];
+        dev->ops_index = -1;
+        libxl__multidev_prepare_with_aodev(&rds->multidev, &dev->aodev);
+
+        dev->aodev.rc = ERROR_REMUS_DEVICE_NOT_SUPPORTED;
+        dev->aodev.callback = device_setup_iterate;
+        device_setup_iterate(egc,&dev->aodev);
+    }
+
+    rc = 0;
+    libxl__multidev_prepared(egc, &rds->multidev, rc);
+}
+
+
+static void device_setup_iterate(libxl__egc *egc, libxl__ao_device *aodev)
+{
+    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    EGC_GC;
+
+    if (aodev->rc != ERROR_REMUS_DEVICE_NOT_SUPPORTED &&
+        aodev->rc != ERROR_REMUS_DEVOPS_DOES_NOT_MATCH)
+        /* might be success or disaster */
+        goto out;
+
+    do {
+        dev->ops = remus_ops[++dev->ops_index];
+        if (!dev->ops) {
+            libxl_device_nic * nic = NULL;
+            libxl_device_disk * disk = NULL;
+            uint32_t domid;
+            int devid;
+            if (dev->kind == LIBXL__DEVICE_KIND_VIF) {
+                nic = (libxl_device_nic *)dev->backend_dev;
+                domid = nic->backend_domid;
+                devid = nic->devid;
+            } else if (dev->kind == LIBXL__DEVICE_KIND_VBD) {
+                disk = (libxl_device_disk *)dev->backend_dev;
+                domid = disk->backend_domid;
+                devid = libxl__device_disk_dev_number(disk->vdev, NULL, NULL);
+            } else {
+                LOG(ERROR,"device kind not handled by remus: %s",
+                    libxl__device_kind_to_string(dev->kind));
+                aodev->rc = ERROR_FAIL;
+                goto out;
+            }
+            LOG(ERROR,"device not handled by remus"
+                " (device=%s:%"PRId32"/%"PRId32")",
+                libxl__device_kind_to_string(dev->kind),
+                domid, devid);
+            aodev->rc = ERROR_REMUS_DEVICE_NOT_SUPPORTED;
+            goto out;
+        }
+    } while (dev->ops->kind != dev->kind);
+
+    /* found the next ops_index to try */
+    assert(dev->aodev.callback == device_setup_iterate);
+    dev->ops->setup(egc,dev);
+    return;
+
+ out:
+    libxl__multidev_one_callback(egc,aodev);
+}
+
+static void all_devices_setup_cb(libxl__egc *egc,
+                                 libxl__multidev *multidev,
+                                 int rc)
+{
+    STATE_AO_GC(multidev->ao);
+
+    /* Convenience aliases */
+    libxl__remus_devices_state *const rds =
+                            CONTAINER_OF(multidev, *rds, multidev);
+
+    rds->callback(egc, rds, rc);
+}
+
+void libxl__remus_devices_teardown(libxl__egc *egc,
+                                   libxl__remus_devices_state *rds)
+{
+    int i;
+    libxl__remus_device *dev;
+
+    STATE_AO_GC(rds->ao);
+
+    libxl__multidev_begin(ao, &rds->multidev);
+    rds->multidev.callback = devices_teardown_cb;
+    for (i = 0; i < rds->num_devices; i++) {
+        dev = rds->devs[i];
+        if (!dev->ops || !dev->matched)
+            continue;
+
+        libxl__multidev_prepare_with_aodev(&rds->multidev, &dev->aodev);
+        dev->ops->teardown(egc,dev);
+    }
+
+    libxl__multidev_prepared(egc, &rds->multidev, 0);
+}
+
+static void devices_teardown_cb(libxl__egc *egc,
+                                libxl__multidev *multidev,
+                                int rc)
+{
+    int i;
+
+    STATE_AO_GC(multidev->ao);
+
+    /* Convenience aliases */
+    libxl__remus_devices_state *const rds =
+                            CONTAINER_OF(multidev, *rds, multidev);
+
+    /* clean nic */
+    for (i = 0; i < rds->num_nics; i++)
+        libxl_device_nic_dispose(&rds->nics[i]);
+    free(rds->nics);
+    rds->nics = NULL;
+    rds->num_nics = 0;
+
+    /* clean disk */
+    for (i = 0; i < rds->num_disks; i++)
+        libxl_device_disk_dispose(&rds->disks[i]);
+    free(rds->disks);
+    rds->disks = NULL;
+    rds->num_disks = 0;
+
+    cleanup_device_subkind(rds);
+
+    rds->callback(egc, rds, rc);
+}
+
+/*----- checkpointing APIs -----*/
+
+/* callbacks */
+
+static void devices_checkpoint_cb(libxl__egc *egc,
+                                  libxl__multidev *multidev,
+                                  int rc);
+
+/* API implementations */
+
+#define define_remus_checkpoint_api(api)                                \
+void libxl__remus_devices_##api(libxl__egc *egc,                        \
+                                libxl__remus_devices_state *rds)        \
+{                                                                       \
+    int i;                                                              \
+    libxl__remus_device *dev;                                           \
+                                                                        \
+    STATE_AO_GC(rds->ao);                                               \
+                                                                        \
+    libxl__multidev_begin(ao, &rds->multidev);                          \
+    rds->multidev.callback = devices_checkpoint_cb;                     \
+    for (i = 0; i < rds->num_devices; i++) {                            \
+        dev = rds->devs[i];                                             \
+        if (!dev->matched || !dev->ops->api)                            \
+            continue;                                                   \
+        libxl__multidev_prepare_with_aodev(&rds->multidev, &dev->aodev);\
+        dev->ops->api(egc,dev);                                         \
+    }                                                                   \
+                                                                        \
+    libxl__multidev_prepared(egc, &rds->multidev, 0);                   \
+}
+
+define_remus_checkpoint_api(postsuspend);
+
+define_remus_checkpoint_api(preresume);
+
+define_remus_checkpoint_api(commit);
+
+static void devices_checkpoint_cb(libxl__egc *egc,
+                                  libxl__multidev *multidev,
+                                  int rc)
+{
+    STATE_AO_GC(multidev->ao);
+
+    /* Convenience aliases */
+    libxl__remus_devices_state *const rds =
+                            CONTAINER_OF(multidev, *rds, multidev);
+
+    rds->callback(egc, rds, rc);
+}
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 7f9e7c7..da4c52d 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -61,6 +61,8 @@ libxl_error = Enumeration("error", [
     (-15, "LOCK_FAIL"),
     (-16, "JSON_CONFIG_EMPTY"),
     (-17, "DEVICE_EXISTS"),
+    (-18, "REMUS_DEVOPS_DOES_NOT_MATCH"),
+    (-19, "REMUS_DEVICE_NOT_SUPPORTED"),
     ], value_namespace = "")
 
 libxl_domain_type = Enumeration("domain_type", [
-- 
1.9.1


* [PATCH for-4.5 v21 07/14] libxl/remus: setup and control network output buffering
  2014-09-26  6:13 [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (5 preceding siblings ...)
  2014-09-26  6:13 ` [PATCH for-4.5 v21 06/14] libxl/remus: introduce an abstract Remus device layer Yang Hongyang
@ 2014-09-26  6:13 ` Yang Hongyang
  2014-09-26  6:13 ` [PATCH for-4.5 v21 08/14] libxl/remus: setup and control disk replication for DRBD backends Yang Hongyang
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Yang Hongyang @ 2014-09-26  6:13 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

This patch adds the machinery required for protecting a guest's
network device state. This patch consists of two parts:

1. Hotplug scripts: The remus-netbuf-setup script is responsible for
  setting up and tearing down the necessary infrastructure required for
  network output buffering.  This script should be invoked by libxl for
  each of the guest's network interfaces, when starting or stopping Remus.

  Apart from returning a success/failure indication via the usual hotplug
  entries in xenstore, this script also writes to xenstore the name of
  the REMUS_IFB device to be used to control the vif's network output.

  The script relies on libnl3 command line utilities to perform various
  setup/teardown functions. The script is confined to Linux platforms only
  since NetBSD does not seem to have libnl3.

2. Remus network device: Implements the interfaces required by the
   remus abstract device layer. A note about the implementation:

   a) init_subkind_nic() & cleanup_subkind_nic() are called once per Remus
      invocation. They establish and free netlink related state respectively.

   b) setup() and teardown() are called for each vif attached to the
      guest.
      During setup():
      i) The hotplug script is called to set up a network buffer on a
         given vif. The script chooses an available IFB device from
         the system, redirects vif egress traffic to the IFB device
         and sets up the plug qdisc (output buffer) on the IFB device.
         The name of the IFB device is communicated via xenstore to
         libxl.

      ii) Libxl obtains a handle to the plug qdisc using the libnl3 API
          and subsequently controls output buffering using this handle
          in the checkpoint callbacks.

      During teardown(), the hotplug scripts are called again to remove
      the vif->ifb traffic redirection, release the ifb and the plug
      qdisc associated with it.

   c) The checkpoint callbacks [postsuspend(), preresume() and commit()]
      are implemented as synchronous ops as the netlink calls associated
      with the qdisc subsystem are very fast.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
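Note on the buffering semantics described in (c) above. The following
is a minimal user-space sketch of the epoch accounting that the
checkpoint callbacks rely on: postsuspend() closes the current epoch,
commit() releases the epoch that was closed. It is a simplified model
written for illustration only, not the sch_plug or libnl3 code; the
struct and function names (plug_model, plug_enqueue, plug_buffer,
plug_release_one, plug_dequeue) are hypothetical, and packets are
merely counted rather than stored.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of a plug-style output buffer (hypothetical names). */
struct plug_model {
    unsigned queued;        /* packets currently held in the buffer */
    unsigned current_epoch; /* packets queued since the last buffer op */
    unsigned last_epoch;    /* packets in epochs closed but not yet released */
    unsigned to_release;    /* packets cleared for transmission */
};

/* The guest sends a packet: it joins the current (uncommitted) epoch. */
static void plug_enqueue(struct plug_model *p)
{
    p->queued++;
    p->current_epoch++;
}

/* postsuspend(): close the current epoch and start a new one. */
static void plug_buffer(struct plug_model *p)
{
    p->last_epoch += p->current_epoch;
    p->current_epoch = 0;
}

/* commit(): the backup acknowledged the checkpoint, so the closed
 * epoch may now leave the host. */
static void plug_release_one(struct plug_model *p)
{
    p->to_release += p->last_epoch;
    p->last_epoch = 0;
}

/* The device asks for the next packet; true means one was sent. */
static bool plug_dequeue(struct plug_model *p)
{
    if (p->to_release == 0)
        return false;   /* remaining packets belong to uncommitted epochs */
    assert(p->queued > 0);
    p->to_release--;
    p->queued--;
    return true;
}
```

In this model, packets queued after plug_buffer() stay held until a
later plug_buffer()/plug_release_one() pair covers them, which is the
property Remus needs: no output escapes before the checkpoint that
produced it has been committed at the backup.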
 docs/misc/xenstore-paths.markdown      |   4 +
 tools/hotplug/Linux/Makefile           |   1 +
 tools/hotplug/Linux/remus-netbuf-setup | 230 ++++++++++++++++
 tools/libxl/libxl.c                    |   7 +
 tools/libxl/libxl_internal.h           |  10 +
 tools/libxl/libxl_netbuffer.c          | 481 +++++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c        |  23 ++
 tools/libxl/libxl_remus_device.c       |  18 +-
 8 files changed, 773 insertions(+), 1 deletion(-)
 create mode 100644 tools/hotplug/Linux/remus-netbuf-setup

diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown
index ea67536..d94ea9d 100644
--- a/docs/misc/xenstore-paths.markdown
+++ b/docs/misc/xenstore-paths.markdown
@@ -393,6 +393,10 @@ The guest's virtual time offset from UTC in seconds.
 
 The device model version for a domain.
 
+#### /libxl/$DOMID/remus/netbuf/$DEVID/ifb = STRING [n,INTERNAL]
+
+ifb device used by Remus to buffer network output from the associated vif.
+
 [BLKIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,blkif.h.html
 [FBIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,fbif.h.html
 [HVMPARAMS]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,hvm,params.h.html
diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index d5a9ed2..31e57f7 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -16,6 +16,7 @@ XEN_SCRIPTS += vif-nat
 XEN_SCRIPTS += vif-openvswitch
 XEN_SCRIPTS += vif2
 XEN_SCRIPTS += vif-setup
+XEN_SCRIPTS-$(CONFIG_REMUS_NETBUF) += remus-netbuf-setup
 XEN_SCRIPTS += block
 XEN_SCRIPTS += block-enbd block-nbd
 XEN_SCRIPTS-$(CONFIG_BLKTAP1) += blktap
diff --git a/tools/hotplug/Linux/remus-netbuf-setup b/tools/hotplug/Linux/remus-netbuf-setup
new file mode 100644
index 0000000..87dfa69
--- /dev/null
+++ b/tools/hotplug/Linux/remus-netbuf-setup
@@ -0,0 +1,230 @@
+#!/bin/bash
+#============================================================================
+# ${XEN_SCRIPT_DIR}/remus-netbuf-setup
+#
+# Script for attaching a network buffer to the specified vif (in any mode).
+# The hotplugging system will call this script when starting remus via libxl
+# API, libxl_domain_remus_start.
+#
+# Usage:
+# remus-netbuf-setup (setup|teardown)
+#
+# Environment vars:
+# vifname     vif interface name (required).
+# XENBUS_PATH path in Xenstore, where the REMUS_IFB device details will be
+#             stored or read from (required).
+#             (libxl passes /libxl/<domid>/remus/netbuf/<devid>)
+# REMUS_IFB   ifb interface to be cleaned up (required). [for teardown op only]
+
+# Written to the store: (setup operation)
+# XENBUS_PATH/ifb=<ifbdevName> the REMUS_IFB device serving
+#  as the intermediate buffer through which the interface's network output
+#  can be controlled.
+#
+
+# Remus network buffering requirements:
+
+# We need to buffer (queue) egress traffic from every vif attached to
+# the guest and release the buffers when the checkpoint associated
+# with them has been committed at the backup host. We achieve this
+# with the help of the plug queuing discipline (sch_plug module).
+# Simply put, Remus' network buffering imposes traffic
+# shaping on the guest's vif(s).
+
+# Limitations and Workarounds:
+
+# Egress traffic from a vif appears as ingress traffic to dom0. Linux
+# supports policing (dropping packets) but not traffic shaping
+# (queuing packets) on ingress traffic. The standard workaround to
+# this limitation is to attach an ingress qdisc to the guest vif,
+# redirect all egress traffic from the guest to an intermediate
+# queuing interface, and apply egress rules to it. The IFB
+# (Intermediate Functional Block) device serves the purpose of an
+# intermediate queuing interface.
+#
+
+# The following commands install a network buffer on a
+# guest's vif (vif1.0) using an IFB device (ifb0):
+#
+#  ip link set dev ifb0 up
+#  tc qdisc add dev vif1.0 ingress
+#  tc filter add dev vif1.0 parent ffff: proto ip \
+#    prio 10 u32 match u32 0 0 action mirred egress redirect dev ifb0
+#  nl-qdisc-add --dev=ifb0 --parent root plug
+#  nl-qdisc-add --dev=ifb0 --parent root --update plug --limit=10000000
+#                                                (10MB limit on buffer)
+#
+# So order of operations when installing a network buffer on vif1.0
+# 1. find a free ifb and bring up the device
+# 2. redirect traffic from vif1.0 to ifb:
+#   2.1 add ingress qdisc to vif1.0 (to capture outgoing packets from guest)
+#   2.2 use tc filter command with actions mirred egress + redirect
+# 3. install plug_qdisc on ifb device, with which we can buffer/release
+#    guest's network output from vif1.0
+#
+# Note:
+# 1. If the setup process fails, the script's cleanup is limited to removing the
+#    ingress qdisc on the guest vif, so that its traffic can flow normally.
+#    The chosen ifb device is not torn down. Libxl has to execute the
+#    teardown op to remove other qdiscs and subsequently free the IFB device.
+#
+# 2. The teardown op may be invoked multiple times by libxl.
+
+#============================================================================
+
+# Unlike other vif scripts, vif-common is not needed here as it executes vif
+#specific setup code such as renaming.
+dir=$(dirname "$0")
+. "$dir/xen-hotplug-common.sh"
+
+findCommand "$@"
+
+if [ "$command" != "setup" -a  "$command" != "teardown" ]
+then
+  echo "Invalid command: $command"
+  log err "Invalid command: $command"
+  exit 1
+fi
+
+evalVariables "$@"
+
+: ${vifname:?}
+: ${XENBUS_PATH:?}
+
+check_libnl_tools() {
+    if ! command -v nl-qdisc-list > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-list tool"
+    fi
+    if ! command -v nl-qdisc-add > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-add tool"
+    fi
+    if ! command -v nl-qdisc-delete > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-delete tool"
+    fi
+}
+
+# We only check for modules. We don't load them.
+# User/Admin is supposed to load ifb during boot time,
+# ensuring that there are enough free ifbs in the system.
+# Other modules will be loaded automatically by tc commands.
+check_modules() {
+    for m in ifb sch_plug sch_ingress act_mirred cls_u32
+    do
+        if ! modinfo $m > /dev/null 2>&1; then
+            fatal "Unable to find $m kernel module"
+        fi
+    done
+}
+
+#return 0 if the ifb is free
+check_ifb() {
+    local installed=`nl-qdisc-list -d $1`
+    [ -n "$installed" ] && return 1
+
+    for domid in `xenstore-list "/local/domain" 2>/dev/null || true`
+    do
+        [ $domid -eq 0 ] && continue
+        xenstore-exists "/libxl/$domid/remus/netbuf" || continue
+        for devid in `xenstore-list "/libxl/$domid/remus/netbuf" 2>/dev/null || true`
+        do
+            local path="/libxl/$domid/remus/netbuf/$devid/ifb"
+            xenstore-exists $path || continue
+            local ifb=`xenstore-read "$path" 2>/dev/null || true`
+            [ "$ifb" = "$1" ] && return 1
+        done
+    done
+
+    return 0
+}
+
+setup_ifb() {
+
+    for ifb in `ifconfig -a -s|egrep ^ifb|cut -d ' ' -f1`
+    do
+        check_ifb "$ifb" || continue
+        REMUS_IFB="$ifb"
+        break
+    done
+
+    if [ -z "$REMUS_IFB" ]
+    then
+        fatal "Unable to find a free ifb device for $vifname"
+    fi
+
+    #not using xenstore_write that automatically exits on error
+    #because we need to cleanup
+    xenstore_write "$XENBUS_PATH/ifb" "$REMUS_IFB"
+    do_or_die ip link set dev "$REMUS_IFB" up
+}
+
+redirect_vif_traffic() {
+    local vif=$1
+    local ifb=$2
+
+    do_or_die tc qdisc add dev "$vif" ingress
+
+    tc filter add dev "$vif" parent ffff: proto ip prio 10 \
+        u32 match u32 0 0 action mirred egress redirect dev "$ifb" >/dev/null 2>&1
+
+    if [ $? -ne 0 ]
+    then
+        do_without_error tc qdisc del dev "$vif" ingress
+        fatal "Failed to redirect traffic from $vif to $ifb"
+    fi
+}
+
+add_plug_qdisc() {
+    local vif=$1
+    local ifb=$2
+
+    nl-qdisc-add --dev="$ifb" --parent root plug >/dev/null 2>&1
+    if [ $? -ne 0 ]
+    then
+        do_without_error tc qdisc del dev "$vif" ingress
+        fatal "Failed to add plug qdisc to $ifb"
+    fi
+
+    #set ifb buffering limit in bytes. It's okay if this command fails
+    nl-qdisc-add --dev="$ifb" --parent root \
+        --update plug --limit=10000000 >/dev/null 2>&1 || true
+}
+
+teardown_netbuf() {
+    local vif=$1
+    local ifb=$2
+
+    #Check if the XENBUS_PATH/ifb exists and has IFB name same as REMUS_IFB.
+    #Otherwise, if the teardown op is called multiple times, then we may end
+    #up freeing another domain's allocated IFB inside the if loop.
+    xenstore-exists "$XENBUS_PATH/ifb" && \
+        local ifb2=`xenstore-read "$XENBUS_PATH/ifb" 2>/dev/null || true`
+
+    if [[ "$ifb2" && "$ifb2" == "$ifb" ]]; then
+        do_without_error ip link set dev "$ifb" down
+        do_without_error nl-qdisc-delete --dev="$ifb" --parent root plug >/dev/null 2>&1
+        xenstore-rm -t "$XENBUS_PATH/ifb" 2>/dev/null || true
+    fi
+    do_without_error tc qdisc del dev "$vif" ingress
+    xenstore-rm -t "$XENBUS_PATH/hotplug-status" 2>/dev/null || true
+    xenstore-rm -t "$XENBUS_PATH/hotplug-error" 2>/dev/null || true
+}
+
+case "$command" in
+    setup)
+        check_libnl_tools
+        check_modules
+
+        claim_lock "pickifb"
+        setup_ifb
+        redirect_vif_traffic "$vifname" "$REMUS_IFB"
+        add_plug_qdisc "$vifname" "$REMUS_IFB"
+        release_lock "pickifb"
+
+        success
+        ;;
+    teardown)
+        teardown_netbuf "$vifname" "$REMUS_IFB"
+        ;;
+esac
+
+log debug "Successful remus-netbuf-setup $command for $vifname, ifb $REMUS_IFB."
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index e108e40..27fdfc2 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -819,6 +819,13 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 
     /* Convenience aliases */
     libxl__remus_devices_state *const rds = &dss->rds;
+
+    if (!libxl__netbuffer_enabled(gc)) {
+        LOG(ERROR, "Remus: No support for network buffering");
+        goto out;
+    }
+    rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
+
     rds->ao = ao;
     rds->domid = domid;
     rds->callback = libxl__remus_setup_done;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 35fbdcd..2776d19 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2665,6 +2665,9 @@ struct libxl__remus_device_instance_ops {
     void (*teardown)(libxl__egc *egc, libxl__remus_device *dev);
 };
 
+int init_subkind_nic(libxl__remus_devices_state *rds);
+void cleanup_subkind_nic(libxl__remus_devices_state *rds);
+
 typedef void libxl__remus_callback(libxl__egc *,
                                    libxl__remus_devices_state *, int rc);
 
@@ -2699,6 +2702,13 @@ struct libxl__remus_devices_state {
     int num_disks;
 
     libxl__multidev multidev;
+
+    /*----- private for concrete (device-specific) layer only -----*/
+
+    /* private for nic device subkind ops */
+    char *netbufscript;
+    struct nl_sock *nlsock;
+    struct nl_cache *qdisc_cache;
 };
 
 /*
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 52d593c..72e0ad0 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -17,11 +17,492 @@
 
 #include "libxl_internal.h"
 
+#include <netlink/cache.h>
+#include <netlink/socket.h>
+#include <netlink/attr.h>
+#include <netlink/route/link.h>
+#include <netlink/route/route.h>
+#include <netlink/route/qdisc.h>
+#include <netlink/route/qdisc/plug.h>
+
+typedef struct libxl__remus_device_nic {
+    int devid;
+
+    const char *vif;
+    const char *ifb;
+    struct rtnl_qdisc *qdisc;
+} libxl__remus_device_nic;
+
 int libxl__netbuffer_enabled(libxl__gc *gc)
 {
     return 1;
 }
 
+int init_subkind_nic(libxl__remus_devices_state *rds)
+{
+    int rc, ret;
+
+    STATE_AO_GC(rds->ao);
+
+    rds->nlsock = nl_socket_alloc();
+    if (!rds->nlsock) {
+        LOG(ERROR, "cannot allocate nl socket");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    ret = nl_connect(rds->nlsock, NETLINK_ROUTE);
+    if (ret) {
+        LOG(ERROR, "failed to open netlink socket: %s",
+            nl_geterror(ret));
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    /* get list of all qdiscs installed on network devs. */
+    ret = rtnl_qdisc_alloc_cache(rds->nlsock, &rds->qdisc_cache);
+    if (ret) {
+        LOG(ERROR, "failed to allocate qdisc cache: %s",
+            nl_geterror(ret));
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rds->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
+                                  libxl__xen_script_dir_path());
+
+    rc = 0;
+
+out:
+    return rc;
+}
+
+void cleanup_subkind_nic(libxl__remus_devices_state *rds)
+{
+    STATE_AO_GC(rds->ao);
+
+    /* free qdisc cache */
+    if (rds->qdisc_cache) {
+        nl_cache_clear(rds->qdisc_cache);
+        nl_cache_free(rds->qdisc_cache);
+        rds->qdisc_cache = NULL;
+    }
+
+    /* close & free nlsock */
+    if (rds->nlsock) {
+        nl_close(rds->nlsock);
+        nl_socket_free(rds->nlsock);
+        rds->nlsock = NULL;
+    }
+}
+
+/*----- setup() and teardown() -----*/
+
+/* helper functions */
+
+/*
+ * If the device has a vifname, then use that instead of
+ * the vifX.Y format.
+ * it must ONLY be used for remus because if driver domains
+ * were in use it would constitute a security vulnerability.
+ */
+static const char *get_vifname(libxl__remus_device *dev,
+                               const libxl_device_nic *nic)
+{
+    const char *vifname = NULL;
+    const char *path;
+    int rc;
+
+    STATE_AO_GC(dev->rds->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = dev->rds->domid;
+
+    path = GCSPRINTF("%s/backend/vif/%d/%d/vifname",
+                     libxl__xs_get_dompath(gc, 0), domid, nic->devid);
+    rc = libxl__xs_read_checked(gc, XBT_NULL, path, &vifname);
+    if (!rc && !vifname) {
+        vifname = libxl__device_nic_devname(gc, domid,
+                                            nic->devid,
+                                            nic->nictype);
+    }
+
+    return vifname;
+}
+
+static void free_qdisc(libxl__remus_device_nic *remus_nic)
+{
+    if (remus_nic->qdisc == NULL)
+        return;
+
+    nl_object_put((struct nl_object *)(remus_nic->qdisc));
+    remus_nic->qdisc = NULL;
+}
+
+static int init_qdisc(libxl__remus_devices_state *rds,
+                      libxl__remus_device_nic *remus_nic)
+{
+    int rc, ret, ifindex;
+    struct rtnl_link *ifb = NULL;
+    struct rtnl_qdisc *qdisc = NULL;
+
+    STATE_AO_GC(rds->ao);
+
+    /* Now that we have brought up the REMUS_IFB device with a plug qdisc
+     * for this vif, we need to refill the qdisc cache.
+     */
+    ret = nl_cache_refill(rds->nlsock, rds->qdisc_cache);
+    if (ret) {
+        LOG(ERROR, "cannot refill qdisc cache: %s", nl_geterror(ret));
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    /* get a handle to the REMUS_IFB interface */
+    ret = rtnl_link_get_kernel(rds->nlsock, 0, remus_nic->ifb, &ifb);
+    if (ret) {
+        LOG(ERROR, "cannot obtain handle for %s: %s", remus_nic->ifb,
+            nl_geterror(ret));
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    ifindex = rtnl_link_get_ifindex(ifb);
+    if (!ifindex) {
+        LOG(ERROR, "interface %s has no index", remus_nic->ifb);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    /* Get a reference to the root qdisc installed on the REMUS_IFB, by
+     * querying the qdisc list we obtained earlier. The netbufscript
+     * sets up the plug qdisc as the root qdisc, so we don't have to
+     * search the entire qdisc tree on the REMUS_IFB dev.
+
+     * There is no need to explicitly free this qdisc as it's just a
+     * reference from the qdisc cache we allocated earlier.
+     */
+    qdisc = rtnl_qdisc_get_by_parent(rds->qdisc_cache, ifindex, TC_H_ROOT);
+    if (qdisc) {
+        const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
+        /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
+        if (!tc_kind || strcmp(tc_kind, "plug")) {
+            LOG(ERROR, "plug qdisc is not installed on %s", remus_nic->ifb);
+            rc = ERROR_FAIL;
+            goto out;
+        }
+        remus_nic->qdisc = qdisc;
+    } else {
+        LOG(ERROR, "Cannot get qdisc handle from ifb %s", remus_nic->ifb);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = 0;
+
+out:
+    if (ifb)
+        rtnl_link_put(ifb);
+
+    if (rc && qdisc)
+        nl_object_put((struct nl_object *)qdisc);
+
+    return rc;
+}
+
+/* callbacks */
+
+static void netbuf_setup_script_cb(libxl__egc *egc,
+                                   libxl__async_exec_state *aes,
+                                   int status);
+static void netbuf_teardown_script_cb(libxl__egc *egc,
+                                      libxl__async_exec_state *aes,
+                                      int status);
+
+/*
+ * the script needs the following env & args
+ * $vifname
+ * $XENBUS_PATH (/libxl/<domid>/remus/netbuf/<devid>/)
+ * $REMUS_IFB (for teardown)
+ * setup/teardown as command line arg.
+ */
+static void setup_async_exec(libxl__remus_device *dev, char *op)
+{
+    int arraysize, nr = 0;
+    char **env = NULL, **args = NULL;
+    libxl__remus_device_nic *remus_nic = dev->concrete_data;
+    libxl__remus_devices_state *rds = dev->rds;
+    libxl__async_exec_state *aes = &dev->aodev.aes;
+
+    STATE_AO_GC(rds->ao);
+
+    /* Convenience aliases */
+    char *const script = libxl__strdup(gc, rds->netbufscript);
+    const uint32_t domid = rds->domid;
+    const int dev_id = remus_nic->devid;
+    const char *const vif = remus_nic->vif;
+    const char *const ifb = remus_nic->ifb;
+
+    arraysize = 7;
+    GCNEW_ARRAY(env, arraysize);
+    env[nr++] = "vifname";
+    env[nr++] = libxl__strdup(gc, vif);
+    env[nr++] = "XENBUS_PATH";
+    env[nr++] = GCSPRINTF("%s/remus/netbuf/%d",
+                          libxl__xs_libxl_path(gc, domid), dev_id);
+    if (!strcmp(op, "teardown") && ifb) {
+        env[nr++] = "REMUS_IFB";
+        env[nr++] = libxl__strdup(gc, ifb);
+    }
+    env[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    arraysize = 3; nr = 0;
+    GCNEW_ARRAY(args, arraysize);
+    args[nr++] = script;
+    args[nr++] = op;
+    args[nr++] = NULL;
+    assert(nr == arraysize);
+
+    aes->ao = dev->rds->ao;
+    aes->what = GCSPRINTF("%s %s", args[0], args[1]);
+    aes->env = env;
+    aes->args = args;
+    aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
+    aes->stdfds[0] = -1;
+    aes->stdfds[1] = -1;
+    aes->stdfds[2] = -1;
+
+    if (!strcmp(op, "teardown"))
+        aes->callback = netbuf_teardown_script_cb;
+    else
+        aes->callback = netbuf_setup_script_cb;
+}
+
+/* setup() and teardown() */
+
+static void nic_setup(libxl__egc *egc, libxl__remus_device *dev)
+{
+    int rc;
+    libxl__remus_device_nic *remus_nic;
+    const libxl_device_nic *nic = dev->backend_dev;
+
+    STATE_AO_GC(dev->rds->ao);
+
+    /*
+     * there's no subkind of nic devices, so nic ops is always matched
+     * with nic devices
+     */
+    dev->matched = true;
+
+    GCNEW(remus_nic);
+    dev->concrete_data = remus_nic;
+    remus_nic->devid = nic->devid;
+    remus_nic->vif = get_vifname(dev, nic);
+    if (!remus_nic->vif) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    setup_async_exec(dev, "setup");
+    rc = libxl__async_exec_start(gc, &dev->aodev.aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+/*
+ * In return, the script writes the name of REMUS_IFB device (during setup)
+ * to be used for output buffering into XENBUS_PATH/ifb
+ */
+static void netbuf_setup_script_cb(libxl__egc *egc,
+                                   libxl__async_exec_state *aes,
+                                   int status)
+{
+    libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__remus_device_nic *remus_nic = dev->concrete_data;
+    libxl__remus_devices_state *rds = dev->rds;
+    const char *out_path_base, *hotplug_error = NULL;
+    int rc;
+
+    STATE_AO_GC(rds->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = rds->domid;
+    const int devid = remus_nic->devid;
+    const char *const vif = remus_nic->vif;
+    const char **const ifb = &remus_nic->ifb;
+
+    /*
+     * we need to get ifb first because it's needed for teardown
+     */
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/remus/netbuf/%d/ifb",
+                                          libxl__xs_libxl_path(gc, domid),
+                                          devid),
+                                ifb);
+    if (rc)
+        goto out;
+
+    if (!(*ifb)) {
+        LOG(ERROR, "Cannot get ifb dev name for domain %u dev %s",
+            domid, vif);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    out_path_base = GCSPRINTF("%s/remus/netbuf/%d",
+                              libxl__xs_libxl_path(gc, domid), devid);
+
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/hotplug-error", out_path_base),
+                                &hotplug_error);
+    if (rc)
+        goto out;
+
+    if (hotplug_error) {
+        LOG(ERROR, "netbuf script %s setup failed for vif %s: %s",
+            rds->netbufscript, vif, hotplug_error);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (status) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    LOG(DEBUG, "%s will buffer packets from vif %s", *ifb, vif);
+    rc = init_qdisc(rds, remus_nic);
+
+out:
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+static void nic_teardown(libxl__egc *egc, libxl__remus_device *dev)
+{
+    int rc;
+    STATE_AO_GC(dev->rds->ao);
+
+    setup_async_exec(dev, "teardown");
+
+    rc = libxl__async_exec_start(gc, &dev->aodev.aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void netbuf_teardown_script_cb(libxl__egc *egc,
+                                      libxl__async_exec_state *aes,
+                                      int status)
+{
+    int rc;
+    libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__remus_device_nic *remus_nic = dev->concrete_data;
+
+    if (status)
+        rc = ERROR_FAIL;
+    else
+        rc = 0;
+
+    free_qdisc(remus_nic);
+
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+/*----- checkpointing APIs -----*/
+
+/* The value of buffer_op, not the value passed to kernel */
+enum {
+    tc_buffer_start,
+    tc_buffer_release
+};
+
+/* API implementations */
+
+static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
+                           libxl__remus_devices_state *rds,
+                           int buffer_op)
+{
+    int rc, ret;
+
+    STATE_AO_GC(rds->ao);
+
+    if (buffer_op == tc_buffer_start)
+        ret = rtnl_qdisc_plug_buffer(remus_nic->qdisc);
+    else
+        ret = rtnl_qdisc_plug_release_one(remus_nic->qdisc);
+
+    if (ret) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    ret = rtnl_qdisc_add(rds->nlsock, remus_nic->qdisc, NLM_F_REQUEST);
+    if (ret) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = 0;
+
+out:
+    if (rc)
+        LOG(ERROR, "Remus: cannot do netbuf op %s on %s:%s",
+            ((buffer_op == tc_buffer_start) ?
+            "start_new_epoch" : "release_prev_epoch"),
+            remus_nic->ifb, nl_geterror(ret));
+    return rc;
+}
+
+static void nic_postsuspend(libxl__egc *egc, libxl__remus_device *dev)
+{
+    int rc;
+    libxl__remus_device_nic *remus_nic = dev->concrete_data;
+
+    STATE_AO_GC(dev->rds->ao);
+
+    rc = remus_netbuf_op(remus_nic, dev->rds, tc_buffer_start);
+
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void nic_commit(libxl__egc *egc, libxl__remus_device *dev)
+{
+    int rc;
+    libxl__remus_device_nic *remus_nic = dev->concrete_data;
+
+    STATE_AO_GC(dev->rds->ao);
+
+    rc = remus_netbuf_op(remus_nic, dev->rds, tc_buffer_release);
+
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+const libxl__remus_device_instance_ops remus_device_nic = {
+    .kind = LIBXL__DEVICE_KIND_VIF,
+    .setup = nic_setup,
+    .teardown = nic_teardown,
+    .postsuspend = nic_postsuspend,
+    .commit = nic_commit,
+};
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
index 1c72a7f..3c659c2 100644
--- a/tools/libxl/libxl_nonetbuffer.c
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -22,6 +22,29 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
     return 0;
 }
 
+int init_subkind_nic(libxl__remus_devices_state *rds)
+{
+    return 0;
+}
+
+void cleanup_subkind_nic(libxl__remus_devices_state *rds)
+{
+    return;
+}
+
+static void nic_setup(libxl__egc *egc, libxl__remus_device *dev)
+{
+    STATE_AO_GC(dev->rds->ao);
+
+    dev->aodev.rc = ERROR_FAIL;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+const libxl__remus_device_instance_ops remus_device_nic = {
+    .kind = LIBXL__DEVICE_KIND_VIF,
+    .setup = nic_setup,
+};
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
index 4e77587..b20168f 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_remus_device.c
@@ -17,7 +17,9 @@
 
 #include "libxl_internal.h"
 
+extern const libxl__remus_device_instance_ops remus_device_nic;
 static const libxl__remus_device_instance_ops *remus_ops[] = {
+    &remus_device_nic,
     NULL,
 };
 
@@ -26,12 +28,26 @@ static const libxl__remus_device_instance_ops *remus_ops[] = {
 static int init_device_subkind(libxl__remus_devices_state *rds)
 {
     /* init device subkind-specific state in the libxl ctx */
-    return 0;
+    int rc;
+    STATE_AO_GC(rds->ao);
+
+    if (libxl__netbuffer_enabled(gc)) {
+        rc = init_subkind_nic(rds);
+        if (rc) goto out;
+    }
+
+    rc = 0;
+out:
+    return rc;
 }
 
 static void cleanup_device_subkind(libxl__remus_devices_state *rds)
 {
     /* cleanup device subkind-specific state in the libxl ctx */
+    STATE_AO_GC(rds->ao);
+
+    if (libxl__netbuffer_enabled(gc))
+        cleanup_subkind_nic(rds);
 }
 
 /*----- setup() and teardown() -----*/
-- 
1.9.1


* [PATCH for-4.5 v21 08/14] libxl/remus: setup and control disk replication for DRBD backends
  2014-09-26  6:13 [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (6 preceding siblings ...)
  2014-09-26  6:13 ` [PATCH for-4.5 v21 07/14] libxl/remus: setup and control network output buffering Yang Hongyang
@ 2014-09-26  6:13 ` Yang Hongyang
  2014-09-26  6:13 ` [PATCH for-4.5 v21 09/14] xl/remus: change bool to defbool Yang Hongyang
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Yang Hongyang @ 2014-09-26  6:13 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

This patch adds the machinery required for protecting a guest's
disk state, when the guest disk uses a DRBD disk backend.
This patch consists of two parts:

1. Hotplug scripts: The block-drbd-probe script is responsible for
  performing sanity checks on the state of the DRBD disk before the
  checkpointing process begins. This script should be invoked by
  libxl for each of the guest's disk devices, when starting Remus.

2. Remus drbd disk device: Implements the interfaces required by the
   remus abstract device layer. A note about the implementation:

   a) setup() is called for each disk attached to the guest.
      During setup():
      i) The hotplug script is called to perform the sanity check.

      ii) Libxl obtains a handle to the DRBD device (/dev/drbd*) and
          subsequently controls disk checkpoint replication using
          this handle in the checkpoint callbacks.

   b) The preresume() checkpoint callback is executed asynchronously
      using libxl__ev_child_fork(), as it may potentially block for more
      than a few seconds in case of backup failure.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Edits to commit message:
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
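Note on the asynchronous preresume() described above. The general idea
of pushing a potentially blocking call into a child process can be
sketched with plain POSIX fork()/waitpid(), as below. This is only an
illustration under stated assumptions: it does not use
libxl__ev_child_fork() or any libxl event machinery, and the names
start_blocking_op, finish_blocking_op and demo_op are hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for a call that may block; returns the int that arg points to
 * (0 means success). Hypothetical, for illustration only. */
static int demo_op(void *arg)
{
    return *(int *)arg;
}

/* Run op(arg) in a child process so that a slow call cannot stall the
 * caller. Returns the child's pid, or -1 if fork() failed. */
static pid_t start_blocking_op(int (*op)(void *), void *arg)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: do the blocking work and report through the exit status. */
        _exit(op(arg) ? 1 : 0);
    }
    return pid;
}

/* Collect the child's result: 0 on success, -1 on any failure.
 * An event loop would reap the child from its SIGCHLD handling rather
 * than blocking in waitpid() as this sketch does. */
static int finish_blocking_op(pid_t pid)
{
    int status;
    pid_t r;

    if (pid < 0)
        return -1;
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);
    if (r != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The parent only learns success or failure from the exit status, which
is enough for a checkpoint callback that has to set a return code and
invoke its completion callback.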
 docs/README.remus                    |  10 ++
 tools/hotplug/Linux/Makefile         |   1 +
 tools/hotplug/Linux/block-drbd-probe |  87 ++++++++++++
 tools/libxl/Makefile                 |   2 +-
 tools/libxl/libxl.c                  |   1 +
 tools/libxl/libxl_internal.h         |   5 +
 tools/libxl/libxl_remus_device.c     |   7 +
 tools/libxl/libxl_remus_disk_drbd.c  | 258 +++++++++++++++++++++++++++++++++++
 8 files changed, 370 insertions(+), 1 deletion(-)
 create mode 100755 tools/hotplug/Linux/block-drbd-probe
 create mode 100644 tools/libxl/libxl_remus_disk_drbd.c

diff --git a/docs/README.remus b/docs/README.remus
index ddf5b55..20783c9 100644
--- a/docs/README.remus
+++ b/docs/README.remus
@@ -8,3 +8,13 @@ Using Remus with libxl on Xen 4.5 and higher:
  or higher along with the development headers and command line utilities.
  If your distro does not have the appropriate libnl3 version, you can find
  the latest source tarball of libnl3 at http://www.carisma.slowglass.com/~tgr/libnl/
+
+Disk replication:
+ VMs protected by Remus need to use DRBD based disk backends. Specifically, you
+ need to compile and install a custom version of DRBD that is available publicly
+ at https://github.com/rshriram/remus-drbd
+ This code is based on DRBD 8.3.11 and uses a new replication protocol (named
+ protocol D) for asynchronous disk checkpoint replication. A protected VM's DRBD
+ disks on the primary and backup hosts need to be configured to use protocol D
+ as the replication protocol. An example resource configuration file can be found
+ in the aforementioned github repository.
diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index 31e57f7..5317fef 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -24,6 +24,7 @@ XEN_SCRIPTS += xen-hotplug-cleanup
 XEN_SCRIPTS += external-device-migrate
 XEN_SCRIPTS += vscsi
 XEN_SCRIPTS += block-iscsi
+XEN_SCRIPTS += block-drbd-probe
 XEN_SCRIPTS += $(XEN_SCRIPTS-y)
 
 SUBDIRS-$(CONFIG_SYSTEMD) += systemd
diff --git a/tools/hotplug/Linux/block-drbd-probe b/tools/hotplug/Linux/block-drbd-probe
new file mode 100755
index 0000000..247a9d0
--- /dev/null
+++ b/tools/hotplug/Linux/block-drbd-probe
@@ -0,0 +1,87 @@
+#! /bin/bash
+#
+# Copyright (C) 2014 FUJITSU LIMITED
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of version 2.1 of the GNU Lesser General Public
+# License as published by the Free Software Foundation.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+#
+# Usage:
+#     block-drbd-probe devicename
+#
+# Return value:
+#     0: the device is a drbd device
+#     1: the device is not a drbd device
+#     2: unknown error
+#     3: the drbd device does not use protocol D
+#     4: the drbd device is not ready
+
+set -e
+
+drbd_res=
+
+function get_res_name()
+{
+    local drbd_dev=$1
+    local drbd_dev_list=($(drbdadm sh-dev all))
+    local drbd_res_list=($(drbdadm sh-resource all))
+    local temp_drbd_dev temp_drbd_res
+    local found=0
+
+    for temp_drbd_dev in ${drbd_dev_list[@]}; do
+        if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
+            found=1
+            break
+        fi
+    done
+
+    if [[ $found -eq 0 ]]; then
+        return 1
+    fi
+
+    for temp_drbd_res in ${drbd_res_list[@]}; do
+        temp_drbd_dev=$(drbdadm sh-dev $temp_drbd_res)
+        if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
+            drbd_res="$temp_drbd_res"
+            return 0
+        fi
+    done
+
+    # OOPS
+    return 2
+}
+
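+rc=0
+get_res_name "$1" || rc=$?
+if [[ $rc -ne 0 ]]; then
+    exit $rc
+fi
+
+# check protocol; test the pipeline inside "if" so that "set -e" does not
+# abort the script before the documented exit status 3 can be returned
+if ! drbdsetup "$1" show | grep -q "protocol D;"; then
+    exit 3
+fi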
+
+# check connect status
+state=$(drbdadm cstate "$drbd_res")
+if [[ "$state" != "Connected" ]]; then
+    exit 4
+fi
+
+# check role
+role=$(drbdadm role "$drbd_res")
+if [[ "$role" != "Primary/Secondary" ]]; then
+    exit 4
+fi
+
+exit 0
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index da3cddb..a6c3b0e 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -56,7 +56,7 @@ else
 LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
-LIBXL_OBJS-y += libxl_remus_device.o
+LIBXL_OBJS-y += libxl_remus_device.o libxl_remus_disk_drbd.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 27fdfc2..1856ae5 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -825,6 +825,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
         goto out;
     }
     rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
+    rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VBD);
 
     rds->ao = ao;
     rds->domid = domid;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 2776d19..b87c5e2 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2667,6 +2667,8 @@ struct libxl__remus_device_instance_ops {
 
 int init_subkind_nic(libxl__remus_devices_state *rds);
 void cleanup_subkind_nic(libxl__remus_devices_state *rds);
+int init_subkind_drbd_disk(libxl__remus_devices_state *rds);
+void cleanup_subkind_drbd_disk(libxl__remus_devices_state *rds);
 
 typedef void libxl__remus_callback(libxl__egc *,
                                    libxl__remus_devices_state *, int rc);
@@ -2709,6 +2711,9 @@ struct libxl__remus_devices_state {
     char *netbufscript;
     struct nl_sock *nlsock;
     struct nl_cache *qdisc_cache;
+
+    /* private for drbd disk subkind ops */
+    char *drbd_probe_script;
 };
 
 /*
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
index b20168f..a6cb7f6 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_remus_device.c
@@ -18,8 +18,10 @@
 #include "libxl_internal.h"
 
 extern const libxl__remus_device_instance_ops remus_device_nic;
+extern const libxl__remus_device_instance_ops remus_device_drbd_disk;
 static const libxl__remus_device_instance_ops *remus_ops[] = {
     &remus_device_nic,
+    &remus_device_drbd_disk,
     NULL,
 };
 
@@ -36,6 +38,9 @@ static int init_device_subkind(libxl__remus_devices_state *rds)
         if (rc) goto out;
     }
 
+    rc = init_subkind_drbd_disk(rds);
+    if (rc) goto out;
+
     rc = 0;
 out:
     return rc;
@@ -48,6 +53,8 @@ static void cleanup_device_subkind(libxl__remus_devices_state *rds)
 
     if (libxl__netbuffer_enabled(gc))
         cleanup_subkind_nic(rds);
+
+    cleanup_subkind_drbd_disk(rds);
 }
 
 /*----- setup() and teardown() -----*/
diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
new file mode 100644
index 0000000..3215f93
--- /dev/null
+++ b/tools/libxl/libxl_remus_disk_drbd.c
@@ -0,0 +1,258 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author Lai Jiangshan <laijs@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+/*** drbd implementation ***/
+const int DRBD_SEND_CHECKPOINT = 20;
+const int DRBD_WAIT_CHECKPOINT_ACK = 30;
+
+typedef struct libxl__remus_drbd_disk {
+    int ctl_fd;
+    int ackwait;
+} libxl__remus_drbd_disk;
+
+int init_subkind_drbd_disk(libxl__remus_devices_state *rds)
+{
+    STATE_AO_GC(rds->ao);
+
+    rds->drbd_probe_script = GCSPRINTF("%s/block-drbd-probe",
+                                       libxl__xen_script_dir_path());
+
+    return 0;
+}
+
+void cleanup_subkind_drbd_disk(libxl__remus_devices_state *rds)
+{
+    return;
+}
+
+/*----- helper functions, for async calls -----*/
+static void drbd_async_call(libxl__egc *egc,
+                            libxl__remus_device *dev,
+                            void func(libxl__remus_device *),
+                            libxl__ev_child_callback callback)
+{
+    int pid = -1, rc;
+    libxl__ao_device *aodev = &dev->aodev;
+    STATE_AO_GC(dev->rds->ao);
+
+    /* Fork and call */
+    pid = libxl__ev_child_fork(gc, &aodev->child, callback);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (!pid) {
+        /* child */
+        func(dev);
+        /* notreached */
+        abort();
+    }
+
+    return;
+
+out:
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+/*----- match(), setup() and teardown() -----*/
+
+/* callbacks */
+static void match_async_exec_cb(libxl__egc *egc,
+                                libxl__async_exec_state *aes,
+                                int status);
+
+/* implementations */
+
+static void match_async_exec(libxl__egc *egc, libxl__remus_device *dev);
+
+static void drbd_setup(libxl__egc *egc, libxl__remus_device *dev)
+{
+    STATE_AO_GC(dev->rds->ao);
+
+    match_async_exec(egc, dev);
+}
+
+static void match_async_exec(libxl__egc *egc, libxl__remus_device *dev)
+{
+    int arraysize, nr = 0, rc;
+    const libxl_device_disk *disk = dev->backend_dev;
+    libxl__async_exec_state *aes = &dev->aodev.aes;
+    STATE_AO_GC(dev->rds->ao);
+
+    /* setup env & args */
+    arraysize = 1;
+    GCNEW_ARRAY(aes->env, arraysize);
+    aes->env[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    arraysize = 3;
+    nr = 0;
+    GCNEW_ARRAY(aes->args, arraysize);
+    aes->args[nr++] = dev->rds->drbd_probe_script;
+    aes->args[nr++] = disk->pdev_path;
+    aes->args[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    aes->ao = dev->rds->ao;
+    aes->what = GCSPRINTF("%s %s", aes->args[0], aes->args[1]);
+    aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
+    aes->callback = match_async_exec_cb;
+    aes->stdfds[0] = -1;
+    aes->stdfds[1] = -1;
+    aes->stdfds[2] = -1;
+
+    rc = libxl__async_exec_start(gc, aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void match_async_exec_cb(libxl__egc *egc,
+                                libxl__async_exec_state *aes,
+                                int status)
+{
+    int rc;
+    libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__remus_drbd_disk *drbd_disk;
+    const libxl_device_disk *disk = dev->backend_dev;
+
+    STATE_AO_GC(aodev->ao);
+
+    if (status) {
+        rc = ERROR_REMUS_DEVOPS_DOES_NOT_MATCH;
+        goto out;
+    }
+
+    /* ops matched */
+    dev->matched = true;
+
+    GCNEW(drbd_disk);
+    dev->concrete_data = drbd_disk;
+    drbd_disk->ackwait = 0;
+    drbd_disk->ctl_fd = open(disk->pdev_path, O_RDONLY);
+    if (drbd_disk->ctl_fd < 0) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = 0;
+
+out:
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+static void drbd_teardown(libxl__egc *egc, libxl__remus_device *dev)
+{
+    libxl__remus_drbd_disk *drbd_disk = dev->concrete_data;
+    STATE_AO_GC(dev->rds->ao);
+
+    close(drbd_disk->ctl_fd);
+    dev->aodev.rc = 0;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+/*----- checkpointing APIs -----*/
+
+/* callbacks */
+static void checkpoint_async_call_done(libxl__egc *egc,
+                                       libxl__ev_child *child,
+                                       pid_t pid, int status);
+
+/* API implementations */
+
+/* this op will not wait and block, so implement as sync op */
+static void drbd_postsuspend(libxl__egc *egc, libxl__remus_device *dev)
+{
+    STATE_AO_GC(dev->rds->ao);
+
+    libxl__remus_drbd_disk *rdd = dev->concrete_data;
+
+    if (!rdd->ackwait) {
+        if (ioctl(rdd->ctl_fd, DRBD_SEND_CHECKPOINT, 0) <= 0)
+            rdd->ackwait = 1;
+    }
+
+    dev->aodev.rc = 0;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+
+static void drbd_preresume_async(libxl__remus_device *dev);
+
+static void drbd_preresume(libxl__egc *egc, libxl__remus_device *dev)
+{
+    STATE_AO_GC(dev->rds->ao);
+
+    drbd_async_call(egc, dev, drbd_preresume_async, checkpoint_async_call_done);
+}
+
+static void drbd_preresume_async(libxl__remus_device *dev)
+{
+    libxl__remus_drbd_disk *rdd = dev->concrete_data;
+    int ackwait = rdd->ackwait;
+
+    if (ackwait) {
+        ioctl(rdd->ctl_fd, DRBD_WAIT_CHECKPOINT_ACK, 0);
+        ackwait = 0;
+    }
+
+    _exit(ackwait);
+}
+
+static void checkpoint_async_call_done(libxl__egc *egc,
+                                       libxl__ev_child *child,
+                                       pid_t pid, int status)
+{
+    int rc;
+    libxl__ao_device *aodev = CONTAINER_OF(child, *aodev, child);
+    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__remus_drbd_disk *rdd = dev->concrete_data;
+
+    STATE_AO_GC(aodev->ao);
+
+    if (!WIFEXITED(status)) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rdd->ackwait = WEXITSTATUS(status);
+    rc = 0;
+
+out:
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+const libxl__remus_device_instance_ops remus_device_drbd_disk = {
+    .kind = LIBXL__DEVICE_KIND_VBD,
+    .setup = drbd_setup,
+    .teardown = drbd_teardown,
+    .postsuspend = drbd_postsuspend,
+    .preresume = drbd_preresume,
+};
-- 
1.9.1

* [PATCH for-4.5 v21 09/14] xl/remus: change bool to defbool
  2014-09-26  6:13 [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (7 preceding siblings ...)
  2014-09-26  6:13 ` [PATCH for-4.5 v21 08/14] libxl/remus: setup and control disk replication for DRBD backends Yang Hongyang
@ 2014-09-26  6:13 ` Yang Hongyang
  2014-09-26 12:57   ` Ian Jackson
  2014-09-26  6:13 ` [PATCH for-4.5 v21 10/14] xl/remus: cmdline switch to explicitly enable unsafe configurations Yang Hongyang
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 24+ messages in thread
From: Yang Hongyang @ 2014-09-26  6:13 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

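Use libxl_defbool instead of bool for the boolean flags (blackhole,
compression) in the libxl_domain_remus_info struct, so that libxl can
tell an unset flag from an explicit choice and apply its own defaults.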

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
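For readers unfamiliar with the tri-state idea behind this change, the
following is a minimal standalone sketch, not libxl's implementation. The
sketch_defbool type and its helpers are hypothetical names invented for
illustration. They only mirror the pattern used in the diff below: the
caller may leave a flag unset, and the library fills in its own default
without overriding an explicit choice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state boolean: 0 = unset, > 0 = true, < 0 = false. */
typedef struct { int val; } sketch_defbool;

static bool sketch_defbool_is_default(sketch_defbool db)
{
    return db.val == 0;
}

/* Record an explicit choice made by the caller. */
static void sketch_defbool_set(sketch_defbool *db, bool b)
{
    db->val = b ? 1 : -1;
}

/* Apply a default only if the caller has not already chosen a value. */
static void sketch_defbool_setdefault(sketch_defbool *db, bool b)
{
    if (sketch_defbool_is_default(*db))
        sketch_defbool_set(db, b);
}

/* Reading an unset value is treated as a caller bug in this sketch. */
static bool sketch_defbool_val(sketch_defbool db)
{
    assert(!sketch_defbool_is_default(db));
    return db.val > 0;
}
```

With this shape, a zero-initialised struct leaves every flag unset, which
is why the caller no longer needs to assign a compression default itself.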
 tools/libxl/libxl.c         | 3 +++
 tools/libxl/libxl_dom.c     | 2 +-
 tools/libxl/libxl_types.idl | 4 ++--
 tools/libxl/xl_cmdimpl.c    | 9 ++++-----
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 1856ae5..02a1638 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -804,6 +804,9 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
         goto out;
     }
 
+    libxl_defbool_setdefault(&info->blackhole, false);
+    libxl_defbool_setdefault(&info->compression, true);
+
     GCNEW(dss);
     dss->ao = ao;
     dss->callback = remus_failover_cb;
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index e9d29b5..d63ae1b 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1809,7 +1809,7 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
 
     if (r_info != NULL) {
         dss->interval = r_info->interval;
-        if (r_info->compression)
+        if (libxl_defbool_val(r_info->compression))
             dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
     }
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index da4c52d..16e374f 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -611,8 +611,8 @@ libxl_sched_credit_params = Struct("sched_credit_params", [
 
 libxl_domain_remus_info = Struct("domain_remus_info",[
     ("interval",     integer),
-    ("blackhole",    bool),
-    ("compression",  bool),
+    ("blackhole",    libxl_defbool),
+    ("compression",  libxl_defbool),
     ])
 
 libxl_event_type = Enumeration("event_type", [
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index d205f96..e9e8900 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7495,18 +7495,17 @@ int main_remus(int argc, char **argv)
     memset(&r_info, 0, sizeof(libxl_domain_remus_info));
     /* Defaults */
     r_info.interval = 200;
-    r_info.blackhole = 0;
-    r_info.compression = 1;
+    libxl_defbool_setdefault(&r_info.blackhole, false);
 
     SWITCH_FOREACH_OPT(opt, "bui:s:e", NULL, "remus", 2) {
     case 'i':
         r_info.interval = atoi(optarg);
         break;
     case 'b':
-        r_info.blackhole = 1;
+        libxl_defbool_set(&r_info.blackhole, true);
         break;
     case 'u':
-        r_info.compression = 0;
+        libxl_defbool_set(&r_info.compression, false);
         break;
     case 's':
         ssh_command = optarg;
@@ -7519,7 +7518,7 @@ int main_remus(int argc, char **argv)
     domid = find_domain(argv[optind]);
     host = argv[optind + 1];
 
-    if (r_info.blackhole) {
+    if (libxl_defbool_val(r_info.blackhole)) {
         send_fd = open("/dev/null", O_RDWR, 0644);
         if (send_fd < 0) {
             perror("failed to open /dev/null");
-- 
1.9.1

* [PATCH for-4.5 v21 10/14] xl/remus: cmdline switch to explicitly enable unsafe configurations
  2014-09-26  6:13 [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (8 preceding siblings ...)
  2014-09-26  6:13 ` [PATCH for-4.5 v21 09/14] xl/remus: change bool to defbool Yang Hongyang
@ 2014-09-26  6:13 ` Yang Hongyang
  2014-09-26 12:57   ` Ian Jackson
  2014-09-26  6:13 ` [PATCH for-4.5 v21 11/14] xl/remus: cmdline switches and config vars to control network buffering Yang Hongyang
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 24+ messages in thread
From: Yang Hongyang @ 2014-09-26  6:13 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

By default, network buffering and disk replication are enabled;
checkpoints are replicated to another standby VM.

This patch allows the user to disable any of these features by
explicitly specifying a 'run in unsafe mode' switch when invoking
the 'xl remus' command.  While running Remus in an unsafe mode
makes little sense under normal circumstances, it is useful to be
able to disable one or more features mentioned above for
testing/debugging/profiling purposes.

Unless this option is enabled, it will not be possible to
replicate memory checkpoints to /dev/null (blackhole replication)
or to disable network buffering or disk replication.

As a start, the use of blackhole replication now requires that
the unsafe mode be enabled. Subsequent patches will add support
for disabling network buffering and disk replication in a similar
manner.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
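The gating rule described above can be sketched in isolation as follows.
This is an illustrative sketch only: the SKETCH_UNSAFE_* flags and
sketch_remus_config_permitted() are hypothetical names, and the real check
in the diff below operates on libxl_defbool fields rather than a bitmask.
The bitmask is used here merely to show how further unsafe features could
share one check.

```c
#include <stdbool.h>

/* Hypothetical flags, one per feature that makes a Remus setup unsafe. */
enum {
    SKETCH_UNSAFE_BLACKHOLE  = 1 << 0, /* replicate checkpoints to /dev/null */
    SKETCH_UNSAFE_NO_NETBUF  = 1 << 1, /* network buffering disabled */
    SKETCH_UNSAFE_NO_DISKBUF = 1 << 2  /* disk replication disabled */
};

/*
 * A configuration is permitted when no unsafe feature was requested,
 * or when the user explicitly opted in to unsafe mode.
 */
static bool sketch_remus_config_permitted(bool allow_unsafe,
                                          unsigned requested_unsafe)
{
    return allow_unsafe || requested_unsafe == 0;
}
```

In this sketch, requesting blackhole replication without the opt-in is
rejected, while the same request with the opt-in is accepted.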
 docs/man/xl.pod.1           | 15 ++++++++++-----
 tools/libxl/libxl.c         |  7 +++++++
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c    |  5 ++++-
 tools/libxl/xl_cmdtable.c   |  7 +++++--
 5 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index f9bc812..2ae3007 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -446,11 +446,6 @@ B<OPTIONS>
 
 Checkpoint domain memory every MS milliseconds (default 200ms).
 
-=item B<-b>
-
-Replicate memory checkpoints to /dev/null (blackhole).
-Generally useful for debugging.
-
 =item B<-u>
 
 Disable memory checkpoint compression.
@@ -465,6 +460,16 @@ If empty, run <host> instead of ssh <host> xl migrate-receive -r [-e].
 On the new host, do not wait in the background (on <host>) for the death
 of the domain. See the corresponding option of the I<create> subcommand.
 
+=item B<-F>
+
+Run Remus in unsafe mode. Use this option with caution as failover may
+not work as intended.
+
+=item B<-b>
+
+Replicate memory checkpoints to /dev/null (blackhole).
+Generally useful for debugging. Requires enabling unsafe mode.
+
 =back
 
 =item B<pause> I<domain-id>
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 02a1638..332b7df 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -804,9 +804,16 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
         goto out;
     }
 
+    libxl_defbool_setdefault(&info->allow_unsafe, false);
     libxl_defbool_setdefault(&info->blackhole, false);
     libxl_defbool_setdefault(&info->compression, true);
 
+    if (!libxl_defbool_val(info->allow_unsafe) &&
+        libxl_defbool_val(info->blackhole)) {
+        LOG(ERROR, "Unsafe mode must be enabled to replicate to /dev/null");
+        goto out;
+    }
+
     GCNEW(dss);
     dss->ao = ao;
     dss->callback = remus_failover_cb;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 16e374f..0fea5b6 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -611,6 +611,7 @@ libxl_sched_credit_params = Struct("sched_credit_params", [
 
 libxl_domain_remus_info = Struct("domain_remus_info",[
     ("interval",     integer),
+    ("allow_unsafe", libxl_defbool),
     ("blackhole",    libxl_defbool),
     ("compression",  libxl_defbool),
     ])
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index e9e8900..edcfa64 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7497,10 +7497,13 @@ int main_remus(int argc, char **argv)
     r_info.interval = 200;
     libxl_defbool_setdefault(&r_info.blackhole, false);
 
-    SWITCH_FOREACH_OPT(opt, "bui:s:e", NULL, "remus", 2) {
+    SWITCH_FOREACH_OPT(opt, "Fbui:s:e", NULL, "remus", 2) {
     case 'i':
         r_info.interval = atoi(optarg);
         break;
+    case 'F':
+        libxl_defbool_set(&r_info.allow_unsafe, true);
+        break;
     case 'b':
         libxl_defbool_set(&r_info.blackhole, true);
         break;
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index dd15947..08f3c90 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -495,13 +495,16 @@ struct cmd_spec cmd_table[] = {
       "Enable Remus HA for domain",
       "[options] <Domain> [<host>]",
       "-i MS                   Checkpoint domain memory every MS milliseconds (def. 200ms).\n"
-      "-b                      Replicate memory checkpoints to /dev/null (blackhole)\n"
       "-u                      Disable memory checkpoint compression.\n"
       "-s <sshcommand>         Use <sshcommand> instead of ssh.  String will be passed\n"
       "                        to sh. If empty, run <host> instead of \n"
       "                        ssh <host> xl migrate-receive -r [-e]\n"
       "-e                      Do not wait in the background (on <host>) for the death\n"
-      "                        of the domain."
+      "                        of the domain.\n"
+      "-F                      Enable unsafe configurations [-b flags]. Use this option\n"
+      "                        with caution as failover may not work as intended.\n"
+      "-b                      Replicate memory checkpoints to /dev/null (blackhole).\n"
+      "                        Works only in unsafe mode."
     },
 #endif
     { "devd",
-- 
1.9.1

* [PATCH for-4.5 v21 11/14] xl/remus: cmdline switches and config vars to control network buffering
  2014-09-26  6:13 [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (9 preceding siblings ...)
  2014-09-26  6:13 ` [PATCH for-4.5 v21 10/14] xl/remus: cmdline switch to explicitly enable unsafe configurations Yang Hongyang
@ 2014-09-26  6:13 ` Yang Hongyang
  2014-09-26  6:13 ` [PATCH for-4.5 v21 12/14] xl/remus: add a cmdline switch to disable disk replication Yang Hongyang
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Yang Hongyang @ 2014-09-26  6:13 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

Add two members to libxl_domain_remus_info:
    netbuf: whether netbuf is enabled
    netbufscript: the path of the script which will be run to set up
                  and tear down the guest's interface.

Add cmdline switches to the 'xl remus' command to enable or disable
network buffering and to specify a domain-specific hotplug script
that sets up network buffering.

Add a new config var 'remus.default.netbufscript' to xl.conf that
allows the user to override the default global script used to
set up network buffering.

Note: Network buffering is enabled by default. Disabling network
buffering requires enabling unsafe mode.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
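The resulting order of precedence for the netbuf script can be summarised
with the small sketch below. It is an illustration, not code from this
patch: sketch_pick_netbufscript() and its parameter names are hypothetical.
In the diff the same effect is split between xl, which falls back from -N
to the xl.conf value, and libxl, which falls back to the script in the Xen
script directory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: choose the netbuf script path.
 * Precedence: command line (-N), then xl.conf
 * (remus.default.netbufscript), then the built-in default.
 * NULL means "not specified" for the first two arguments.
 */
static const char *sketch_pick_netbufscript(const char *cmdline_script,
                                            const char *conf_script,
                                            const char *builtin_script)
{
    if (cmdline_script)
        return cmdline_script;
    if (conf_script)
        return conf_script;
    return builtin_script;
}
```

A per-invocation -N argument therefore always wins, and the built-in path
is used only when neither the command line nor xl.conf names a script.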
 docs/man/xl.conf.pod.5        |  6 ++++++
 docs/man/xl.pod.1             | 11 ++++++++++-
 tools/libxl/libxl.c           | 18 ++++++++++++------
 tools/libxl/libxl_netbuffer.c |  9 +++++++--
 tools/libxl/libxl_types.idl   |  2 ++
 tools/libxl/xl.c              |  4 ++++
 tools/libxl/xl.h              |  1 +
 tools/libxl/xl_cmdimpl.c      | 27 +++++++++++++++++++++------
 tools/libxl/xl_cmdtable.c     |  7 +++++--
 9 files changed, 68 insertions(+), 17 deletions(-)

diff --git a/docs/man/xl.conf.pod.5 b/docs/man/xl.conf.pod.5
index 7c43bde..8ae19bb 100644
--- a/docs/man/xl.conf.pod.5
+++ b/docs/man/xl.conf.pod.5
@@ -105,6 +105,12 @@ Configures the default gateway device to set for virtual network devices.
 
 Default: C<None>
 
+=item B<remus.default.netbufscript="PATH">
+
+Configures the default script used by Remus to setup network buffering.
+
+Default: C</etc/xen/scripts/remus-netbuf-setup>
+
 =item B<output_format="json|sxp">
 
 Configures the default output format used by xl when printing "machine
diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 2ae3007..1f165ad 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -436,7 +436,7 @@ Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
 mechanism between the two hosts.
 
 N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
-     There is no support for network or disk buffering at the moment.
+     There is no support for disk buffering at the moment.
 
 B<OPTIONS>
 
@@ -460,6 +460,11 @@ If empty, run <host> instead of ssh <host> xl migrate-receive -r [-e].
 On the new host, do not wait in the background (on <host>) for the death
 of the domain. See the corresponding option of the I<create> subcommand.
 
+=item B<-N> I<netbufscript>
+
+Use <netbufscript> to setup network buffering instead of the
+default script (/etc/xen/scripts/remus-netbuf-setup).
+
 =item B<-F>
 
 Run Remus in unsafe mode. Use this option with caution as failover may
@@ -470,6 +475,10 @@ not work as intended.
 Replicate memory checkpoints to /dev/null (blackhole).
 Generally useful for debugging. Requires enabling unsafe mode.
 
+=item B<-n>
+
+Disable network output buffering. Requires enabling unsafe mode.
+
 =back
 
 =item B<pause> I<domain-id>
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 332b7df..e0e1b44 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -807,13 +807,17 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     libxl_defbool_setdefault(&info->allow_unsafe, false);
     libxl_defbool_setdefault(&info->blackhole, false);
     libxl_defbool_setdefault(&info->compression, true);
+    libxl_defbool_setdefault(&info->netbuf, true);
 
     if (!libxl_defbool_val(info->allow_unsafe) &&
-        libxl_defbool_val(info->blackhole)) {
-        LOG(ERROR, "Unsafe mode must be enabled to replicate to /dev/null");
+        (libxl_defbool_val(info->blackhole) ||
+         !libxl_defbool_val(info->netbuf))) {
+        LOG(ERROR, "Unsafe mode must be enabled to replicate to /dev/null and "
+                   "disable network buffering");
         goto out;
     }
 
+
     GCNEW(dss);
     dss->ao = ao;
     dss->callback = remus_failover_cb;
@@ -830,11 +834,13 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     /* Convenience aliases */
     libxl__remus_devices_state *const rds = &dss->rds;
 
-    if (!libxl__netbuffer_enabled(gc)) {
-        LOG(ERROR, "Remus: No support for network buffering");
-        goto out;
+    if (libxl_defbool_val(info->netbuf)) {
+        if (!libxl__netbuffer_enabled(gc)) {
+            LOG(ERROR, "Remus: No support for network buffering");
+            goto out;
+        }
+        rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
     }
-    rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
     rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VBD);
 
     rds->ao = ao;
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 72e0ad0..edc6843 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -41,6 +41,7 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
 int init_subkind_nic(libxl__remus_devices_state *rds)
 {
     int rc, ret;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
 
     STATE_AO_GC(rds->ao);
 
@@ -68,8 +69,12 @@ int init_subkind_nic(libxl__remus_devices_state *rds)
         goto out;
     }
 
-    rds->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
-                                  libxl__xen_script_dir_path());
+    if (dss->remus->netbufscript) {
+        rds->netbufscript = libxl__strdup(gc, dss->remus->netbufscript);
+    } else {
+        rds->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
+                                      libxl__xen_script_dir_path());
+    }
 
     rc = 0;
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 0fea5b6..494d37e 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -614,6 +614,8 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
     ("allow_unsafe", libxl_defbool),
     ("blackhole",    libxl_defbool),
     ("compression",  libxl_defbool),
+    ("netbuf",       libxl_defbool),
+    ("netbufscript", string),
     ])
 
 libxl_event_type = Enumeration("event_type", [
diff --git a/tools/libxl/xl.c b/tools/libxl/xl.c
index 4c5a5ee..f014306 100644
--- a/tools/libxl/xl.c
+++ b/tools/libxl/xl.c
@@ -44,6 +44,7 @@ char *default_vifscript = NULL;
 char *default_bridge = NULL;
 char *default_gatewaydev = NULL;
 char *default_vifbackend = NULL;
+char *default_remus_netbufscript = NULL;
 enum output_format default_output_format = OUTPUT_FORMAT_JSON;
 int claim_mode = 1;
 bool progress_use_cr = 0;
@@ -176,6 +177,9 @@ static void parse_global_config(const char *configfile,
     if (!xlu_cfg_get_long (config, "claim_mode", &l, 0))
         claim_mode = l;
 
+    xlu_cfg_replace_string (config, "remus.default.netbufscript",
+        &default_remus_netbufscript, 0);
+
     xlu_cfg_destroy(config);
 }
 
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 6a6a0f9..6c7aa8e 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -171,6 +171,7 @@ extern char *default_vifscript;
 extern char *default_bridge;
 extern char *default_gatewaydev;
 extern char *default_vifbackend;
+extern char *default_remus_netbufscript;
 extern char *blkdev_start;
 
 enum output_format {
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index edcfa64..48a3a41 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7497,7 +7497,7 @@ int main_remus(int argc, char **argv)
     r_info.interval = 200;
     libxl_defbool_setdefault(&r_info.blackhole, false);
 
-    SWITCH_FOREACH_OPT(opt, "Fbui:s:e", NULL, "remus", 2) {
+    SWITCH_FOREACH_OPT(opt, "Fbuni:s:N:e", NULL, "remus", 2) {
     case 'i':
         r_info.interval = atoi(optarg);
         break;
@@ -7510,6 +7510,12 @@ int main_remus(int argc, char **argv)
     case 'u':
         libxl_defbool_set(&r_info.compression, false);
         break;
+    case 'n':
+        libxl_defbool_set(&r_info.netbuf, false);
+        break;
+    case 'N':
+        r_info.netbufscript = optarg;
+        break;
     case 's':
         ssh_command = optarg;
         break;
@@ -7521,6 +7527,9 @@ int main_remus(int argc, char **argv)
     domid = find_domain(argv[optind]);
     host = argv[optind + 1];
 
+    if (!r_info.netbufscript)
+        r_info.netbufscript = default_remus_netbufscript;
+
     if (libxl_defbool_val(r_info.blackhole)) {
         send_fd = open("/dev/null", O_RDWR, 0644);
         if (send_fd < 0) {
@@ -7558,13 +7567,19 @@ int main_remus(int argc, char **argv)
     /* Point of no return */
     rc = libxl_domain_remus_start(ctx, &r_info, domid, send_fd, recv_fd, 0);
 
-    /* If we are here, it means backup has failed/domain suspend failed.
-     * Try to resume the domain and exit gracefully.
-     * TODO: Split-Brain check.
+    /* check if the domain exists. User may have xl destroyed the
+     * domain to force failover
      */
-    fprintf(stderr, "remus sender: libxl_domain_suspend failed"
-            " (rc=%d)\n", rc);
+    if (libxl_domain_info(ctx, 0, domid)) {
+        fprintf(stderr, "Remus: Primary domain has been destroyed.\n");
+        close(send_fd);
+        return 0;
+    }
 
+    /* If we are here, it means remus setup/domain suspend/backup has
+     * failed. Try to resume the domain and exit gracefully.
+     * TODO: Split-Brain check.
+     */
     if (rc == ERROR_GUEST_TIMEDOUT)
         fprintf(stderr, "Failed to suspend domain at primary.\n");
     else {
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 08f3c90..cd1b612 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -501,10 +501,13 @@ struct cmd_spec cmd_table[] = {
       "                        ssh <host> xl migrate-receive -r [-e]\n"
       "-e                      Do not wait in the background (on <host>) for the death\n"
       "                        of the domain.\n"
-      "-F                      Enable unsafe configurations [-b flags]. Use this option\n"
+      "-N <netbufscript>       Use netbufscript to setup network buffering instead of the\n"
+      "                        default script (/etc/xen/scripts/remus-netbuf-setup).\n"
+      "-F                      Enable unsafe configurations [-b|-n flags]. Use this option\n"
       "                        with caution as failover may not work as intended.\n"
       "-b                      Replicate memory checkpoints to /dev/null (blackhole).\n"
-      "                        Works only in unsafe mode."
+      "                        Works only in unsafe mode.\n"
+      "-n                      Disable network output buffering. Works only in unsafe mode."
     },
 #endif
     { "devd",
-- 
1.9.1

* [PATCH for-4.5 v21 12/14] xl/remus: add a cmdline switch to disable disk replication
  2014-09-26  6:13 [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (10 preceding siblings ...)
  2014-09-26  6:13 ` [PATCH for-4.5 v21 11/14] xl/remus: cmdline switches and config vars to control network buffering Yang Hongyang
@ 2014-09-26  6:13 ` Yang Hongyang
  2014-09-26  6:13 ` [PATCH for-4.5 v21 13/14] libxl/remus: add LIBXL_HAVE_REMUS to indicate Remus support in libxl Yang Hongyang
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Yang Hongyang @ 2014-09-26  6:13 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

Disk replication is enabled by default. This patch adds a cmdline
switch to 'xl remus' command to explicitly disable disk replication.
A new boolean field 'diskbuf' is added to the libxl_domain_remus_info
structure to represent this configuration option inside libxl.

Note: Disabling disk replication requires enabling unsafe mode.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 docs/man/xl.pod.1           |  6 +++++-
 tools/libxl/libxl.c         | 12 ++++++++----
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c    |  5 ++++-
 tools/libxl/xl_cmdtable.c   |  5 +++--
 5 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 1f165ad..362e92f 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -436,7 +436,7 @@ Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
 mechanism between the two hosts.
 
 N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
-     There is no support for disk buffering at the moment.
+     Disk replication support is limited to DRBD disks.
 
 B<OPTIONS>
 
@@ -479,6 +479,10 @@ Generally useful for debugging. Requires enabling unsafe mode.
 
 Disable network output buffering. Requires enabling unsafe mode.
 
+=item B<-d>
+
+Disable disk replication. Requires enabling unsafe mode.
+
 =back
 
 =item B<pause> I<domain-id>
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index e0e1b44..9f629c4 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -808,12 +808,14 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     libxl_defbool_setdefault(&info->blackhole, false);
     libxl_defbool_setdefault(&info->compression, true);
     libxl_defbool_setdefault(&info->netbuf, true);
+    libxl_defbool_setdefault(&info->diskbuf, true);
 
     if (!libxl_defbool_val(info->allow_unsafe) &&
         (libxl_defbool_val(info->blackhole) ||
-         !libxl_defbool_val(info->netbuf))) {
-        LOG(ERROR, "Unsafe mode must be enabled to replicate to /dev/null and "
-                   "disable network buffering");
+         !libxl_defbool_val(info->netbuf) ||
+         !libxl_defbool_val(info->diskbuf))) {
+        LOG(ERROR, "Unsafe mode must be enabled to replicate to /dev/null, "
+                   "disable network buffering and disk replication");
         goto out;
     }
 
@@ -841,7 +843,9 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
         }
         rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
     }
-    rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VBD);
+
+    if (libxl_defbool_val(info->diskbuf))
+        rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VBD);
 
     rds->ao = ao;
     rds->domid = domid;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 494d37e..bbb03e2 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -616,6 +616,7 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
     ("compression",  libxl_defbool),
     ("netbuf",       libxl_defbool),
     ("netbufscript", string),
+    ("diskbuf",      libxl_defbool),
     ])
 
 libxl_event_type = Enumeration("event_type", [
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 48a3a41..7912e06 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7497,7 +7497,7 @@ int main_remus(int argc, char **argv)
     r_info.interval = 200;
     libxl_defbool_setdefault(&r_info.blackhole, false);
 
-    SWITCH_FOREACH_OPT(opt, "Fbuni:s:N:e", NULL, "remus", 2) {
+    SWITCH_FOREACH_OPT(opt, "Fbundi:s:N:e", NULL, "remus", 2) {
     case 'i':
         r_info.interval = atoi(optarg);
         break;
@@ -7516,6 +7516,9 @@ int main_remus(int argc, char **argv)
     case 'N':
         r_info.netbufscript = optarg;
         break;
+    case 'd':
+        libxl_defbool_set(&r_info.diskbuf, false);
+        break;
     case 's':
         ssh_command = optarg;
         break;
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index cd1b612..f93ee4f 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -503,11 +503,12 @@ struct cmd_spec cmd_table[] = {
       "                        of the domain.\n"
       "-N <netbufscript>       Use netbufscript to setup network buffering instead of the\n"
       "                        default script (/etc/xen/scripts/remus-netbuf-setup).\n"
-      "-F                      Enable unsafe configurations [-b|-n flags]. Use this option\n"
+      "-F                      Enable unsafe configurations [-b|-n|-d flags]. Use this option\n"
       "                        with caution as failover may not work as intended.\n"
       "-b                      Replicate memory checkpoints to /dev/null (blackhole).\n"
       "                        Works only in unsafe mode.\n"
-      "-n                      Disable network output buffering. Works only in unsafe mode."
+      "-n                      Disable network output buffering. Works only in unsafe mode.\n"
+      "-d                      Disable disk replication. Works only in unsafe mode."
     },
 #endif
     { "devd",
-- 
1.9.1
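
The unsafe-mode check and the device-kind selection in patch 12 can be sketched in isolation like this. It is only an illustration under stated assumptions: the defbool type and its helpers are simplified stand-ins for libxl_defbool, remus_device_kinds() is a hypothetical function, the KIND_* bit positions are chosen for illustration rather than taken from libxl, and allow_unsafe is assumed to default to false.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for libxl_defbool: unset until given a value. */
typedef enum { DB_UNSET, DB_FALSE, DB_TRUE } defbool;

/* Apply a default only if the caller has not chosen a value. */
static void defbool_setdefault(defbool *d, bool v)
{
    if (*d == DB_UNSET)
        *d = v ? DB_TRUE : DB_FALSE;
}

/* Reading an unset defbool is a programming error. */
static bool defbool_val(defbool d)
{
    assert(d != DB_UNSET);
    return d == DB_TRUE;
}

struct remus_info { defbool allow_unsafe, blackhole, netbuf, diskbuf; };

/* Illustrative bit positions, not the real LIBXL__DEVICE_KIND_* values. */
enum { KIND_VIF = 1, KIND_VBD = 2 };

/*
 * Returns -1 if an unsafe setting is requested without allow_unsafe,
 * otherwise a bitmask of the device kinds Remus should manage.
 */
static int remus_device_kinds(struct remus_info *info)
{
    int flags = 0;

    defbool_setdefault(&info->allow_unsafe, false); /* assumed default */
    defbool_setdefault(&info->blackhole, false);
    defbool_setdefault(&info->netbuf, true);
    defbool_setdefault(&info->diskbuf, true);

    if (!defbool_val(info->allow_unsafe) &&
        (defbool_val(info->blackhole) ||
         !defbool_val(info->netbuf) ||
         !defbool_val(info->diskbuf)))
        return -1;

    if (defbool_val(info->netbuf))
        flags |= 1 << KIND_VIF;
    if (defbool_val(info->diskbuf))
        flags |= 1 << KIND_VBD;
    return flags;
}
```

The tri-state type is what lets the library tell "the caller passed -d" apart from "the caller said nothing", so the default of enabled disk replication applies only in the second case.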

* [PATCH for-4.5 v21 13/14] libxl/remus: add LIBXL_HAVE_REMUS to indicate Remus support in libxl
  2014-09-26  6:13 [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (11 preceding siblings ...)
  2014-09-26  6:13 ` [PATCH for-4.5 v21 12/14] xl/remus: add a cmdline switch to disable disk replication Yang Hongyang
@ 2014-09-26  6:13 ` Yang Hongyang
  2014-09-26  6:13 ` [PATCH for-4.5 v21 14/14] MAINTAINERS: update maintained files of Remus Yang Hongyang
  2014-09-26 13:10 ` [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk Ian Jackson
  14 siblings, 0 replies; 24+ messages in thread
From: Yang Hongyang @ 2014-09-26  6:13 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

Add LIBXL_HAVE_REMUS to indicate Remus support in libxl

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/libxl/libxl.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 9ae0fcc..2700cc1 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -647,6 +647,12 @@ typedef struct libxl__ctx libxl_ctx;
  */
 #define LIBXL_HAVE_BUILDINFO_SERIAL_LIST 1
 
+/*
+ * LIBXL_HAVE_REMUS
+ * If this is defined, then libxl supports remus.
+ */
+#define LIBXL_HAVE_REMUS 1
+
 typedef uint8_t libxl_mac[6];
 #define LIBXL_MAC_FMT "%02hhx:%02hhx:%02hhx:%02hhx:%02hhx:%02hhx"
 #define LIBXL_MAC_FMTLEN ((2*6)+5) /* 6 hex bytes plus 5 colons */
-- 
1.9.1
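
An application built against libxl could use the LIBXL_HAVE_REMUS macro from patch 13 roughly as follows. This is a sketch only: built_with_remus() is a hypothetical helper, and the libxl.h include is left out so the fragment compiles without the Xen headers. A real caller would include libxl.h before the test.

```c
/*
 * A real program would have "#include <libxl.h>" here. It is omitted so
 * that this sketch builds standalone, in which case the macro is undefined.
 */

/* Returns 1 when compiled against a libxl that advertises Remus, else 0. */
static int built_with_remus(void)
{
#ifdef LIBXL_HAVE_REMUS
    return 1;
#else
    return 0;
#endif
}
```

Because the check happens at compile time, code that calls libxl_domain_remus_start() can be placed inside the same #ifdef so that it still builds against older libxl versions.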

* [PATCH for-4.5 v21 14/14] MAINTAINERS: update maintained files of Remus
  2014-09-26  6:13 [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (12 preceding siblings ...)
  2014-09-26  6:13 ` [PATCH for-4.5 v21 13/14] libxl/remus: add LIBXL_HAVE_REMUS to indicate Remus support in libxl Yang Hongyang
@ 2014-09-26  6:13 ` Yang Hongyang
  2014-09-26 13:10 ` [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk Ian Jackson
  14 siblings, 0 replies; 24+ messages in thread
From: Yang Hongyang @ 2014-09-26  6:13 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

Add Remus specific hotplug scripts and libxl files
to the list of maintained files.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index bf6b099..935e6cf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -260,8 +260,15 @@ M:	Shriram Rajagopalan <rshriram@cs.ubc.ca>
 M:	Yang Hongyang <yanghy@cn.fujitsu.com>
 S:	Maintained
 F:	docs/README.remus
+F:	tools/libxc/xc_domain_save.c
+F:	tools/libxc/xc_domain_restore.c
 F:	tools/blktap2/drivers/block-remus.c
 F:	tools/blktap2/drivers/hashtable*
+F:	tools/libxl/libxl_remus_*
+F:	tools/libxl/libxl_netbuffer.c
+F:	tools/libxl/libxl_nonetbuffer.c
+F:	tools/hotplug/Linux/remus-netbuf-setup
+F:	tools/hotplug/Linux/block-drbd-probe
 
 SCHEDULING
 M:	George Dunlap <george.dunlap@eu.citrix.com>
-- 
1.9.1

* Re: [PATCH for-4.5 v21 10/14] xl/remus: cmdline switch to explicitly enable unsafe configurations
  2014-09-26  6:13 ` [PATCH for-4.5 v21 10/14] xl/remus: cmdline switch to explicitly enable unsafe configurations Yang Hongyang
@ 2014-09-26 12:57   ` Ian Jackson
  0 siblings, 0 replies; 24+ messages in thread
From: Ian Jackson @ 2014-09-26 12:57 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: laijs, wency, yunhong.jiang, eddie.dong, xen-devel, rshriram,
	ian.campbell

Yang Hongyang writes ("[PATCH for-4.5 v21 10/14] xl/remus: cmdline switch to explicitly enable unsafe configurations"):
> By default, network buffering and disk replication are enabled;
> checkpoints are replicated to another standby VM.
> 
> This patch allows the user to disable any of these features by
> explicitly specifying a 'run in unsafe mode' switch when invoking
> the 'xl remus' command.  While running Remus in an unsafe mode
> makes little sense under normal circumstances, it is useful to be
> able to disable one or more features mentioned above for
> testing/debugging/profiling purposes.
...
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

* Re: [PATCH for-4.5 v21 09/14] xl/remus: change bool to defbool
  2014-09-26  6:13 ` [PATCH for-4.5 v21 09/14] xl/remus: change bool to defbool Yang Hongyang
@ 2014-09-26 12:57   ` Ian Jackson
  0 siblings, 0 replies; 24+ messages in thread
From: Ian Jackson @ 2014-09-26 12:57 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: laijs, wency, yunhong.jiang, eddie.dong, xen-devel, rshriram,
	ian.campbell

Yang Hongyang writes ("[PATCH for-4.5 v21 09/14] xl/remus: change bool to defbool"):
> Use defbool instead of bool for boolean flags in remus_info struct.

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

* Re: [PATCH for-4.5 v21 06/14] libxl/remus: introduce an abstract Remus device layer
  2014-09-26  6:13 ` [PATCH for-4.5 v21 06/14] libxl/remus: introduce an abstract Remus device layer Yang Hongyang
@ 2014-09-26 12:59   ` Ian Jackson
  0 siblings, 0 replies; 24+ messages in thread
From: Ian Jackson @ 2014-09-26 12:59 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: laijs, wency, yunhong.jiang, eddie.dong, xen-devel, rshriram,
	ian.campbell

Yang Hongyang writes ("[PATCH for-4.5 v21 06/14] libxl/remus: introduce an abstract Remus device layer"):
> Introduce an abstract device layer that allows the Remus
> logic in libxl to control a guest's devices in a device-agnostic
> manner. The device layer also exposes a set of internal interfaces
> that a device type must implement, if it wishes to support Remus.

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

* Re: [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk
  2014-09-26  6:13 [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (13 preceding siblings ...)
  2014-09-26  6:13 ` [PATCH for-4.5 v21 14/14] MAINTAINERS: update maintained files of Remus Yang Hongyang
@ 2014-09-26 13:10 ` Ian Jackson
  2014-09-26 14:14   ` Ian Jackson
  14 siblings, 1 reply; 24+ messages in thread
From: Ian Jackson @ 2014-09-26 13:10 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: laijs, wency, yunhong.jiang, eddie.dong, xen-devel, rshriram,
	ian.campbell

Yang Hongyang writes ("[PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk"):
> This patch series adds support for network buffering and drbd disk
> in the Remus codebase in libxl.

Thanks.

Of these, all have sufficient acks etc. now apart from:


   6 libxl/remus: introduce an abstract Remus device layer
   9 xl/remus: change bool to defbool

These are lacking Konrad's review/ack in the commit itself and in the
v21 resend but I think Konrad intended to ack them and OK them for the
release in his mail:
    Date: Thu, 25 Sep 2014 15:28:31 -0400
    Message-ID: <20140925192831.GJ29663@laptop.dumpdata.com>


   1 libxl: multidev: Clarify comments about which callbacks are meant
   2 libxl: multidev: Expose libxl__multidev_one_callback

These are new patches which I wrote yesterday.  There is no functional
change in them, so the review should be a formality, but they still
should have 1. an ack from a libxl maintainer 2. a release ack from
Konrad.


Konrad, Ian/Wei: can you confirm as applicable ?  If so I will
transfer the acks into the git commits, rebase onto staging's tip, and
push the series.


Thanks,
Ian.

* Re: [PATCH for-4.5 v21 01/14] libxl: multidev: Clarify comments about which callbacks are meant
  2014-09-26  6:13 ` [PATCH for-4.5 v21 01/14] libxl: multidev: Clarify comments about which callbacks are meant Yang Hongyang
@ 2014-09-26 13:56   ` Wei Liu
  0 siblings, 0 replies; 24+ messages in thread
From: Wei Liu @ 2014-09-26 13:56 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, xen-devel, rshriram, laijs

On Fri, Sep 26, 2014 at 02:13:06PM +0800, Yang Hongyang wrote:
> From: Ian Jackson <ian.jackson@eu.citrix.com>
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

> ---
>  tools/libxl/libxl_internal.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index f61673c..20aca4b 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -2164,7 +2164,8 @@ struct libxl__ao_device {
>   *       (or some other thing which will eventually call aodev->callback)
>   * Finally, once
>   *    libxl__multidev_prepared
> - * which will result (perhaps reentrantly) in one call to callback().
> + * which will result (perhaps reentrantly) in one call to
> + * multidev->callback().
>   */
>  
>  /* Starts preparing to add/remove a bunch of devices. */
> -- 
> 1.9.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

* Re: [PATCH for-4.5 v21 02/14] libxl: multidev: Expose libxl__multidev_one_callback
  2014-09-26  6:13 ` [PATCH for-4.5 v21 02/14] libxl: multidev: Expose libxl__multidev_one_callback Yang Hongyang
@ 2014-09-26 13:58   ` Wei Liu
  0 siblings, 0 replies; 24+ messages in thread
From: Wei Liu @ 2014-09-26 13:58 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, xen-devel, rshriram, laijs

On Fri, Sep 26, 2014 at 02:13:07PM +0800, Yang Hongyang wrote:
> From: Ian Jackson <ian.jackson@eu.citrix.com>
> 
> Now a caller who wants to be able to do other work when the aodev
> completes can put their own callback into the aodev, and make the
> multidev machinery aware that the particular aodev is complete (from
> the point of view that multidev should have) whenever it likes.
> 
> No functional change in this patch.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Wei Liu <wei.liu2@citrix.com> 

* Re: [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk
  2014-09-26 13:10 ` [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk Ian Jackson
@ 2014-09-26 14:14   ` Ian Jackson
  2014-09-26 14:20     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 24+ messages in thread
From: Ian Jackson @ 2014-09-26 14:14 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel, ian.campbell, wency, yunhong.jiang,
	laijs, eddie.dong, rshriram, Wei Liu, Konrad Rzeszutek Wilk

Ian Jackson writes ("Re: [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk"):
> These are lacking Konrad's review/ack in the commit itself and in the
> v21 resend but I think Konrad intended to ack them and OK them for the
> release in his mail:
>     Date: Thu, 25 Sep 2014 15:28:31 -0400
>     Message-ID: <20140925192831.GJ29663@laptop.dumpdata.com>

Konrad confirmed his acks on IRC and Wei acked my two extra patches.

So I have sorted out the acks in the commit messages and pushed this
series to staging.

Thanks everyone!

Regards,
Ian.

* Re: [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk
  2014-09-26 14:14   ` Ian Jackson
@ 2014-09-26 14:20     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 24+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-26 14:20 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Wei Liu, laijs, wency, yunhong.jiang, eddie.dong, xen-devel,
	rshriram, Yang Hongyang, ian.campbell

On Fri, Sep 26, 2014 at 03:14:07PM +0100, Ian Jackson wrote:
> Ian Jackson writes ("Re: [PATCH for-4.5 v21 00/14] Remus/Libxl: Remus network buffering and drbd disk"):
> > These are lacking Konrad's review/ack in the commit itself and in the
> > v21 resend but I think Konrad intended to ack them and OK them for the
> > release in his mail:
> >     Date: Thu, 25 Sep 2014 15:28:31 -0400
> >     Message-ID: <20140925192831.GJ29663@laptop.dumpdata.com>
> 
> Konrad confirmed his acks on IRC and Wei acked my two extra patches.
> 
> So I have sorted out the acks in the commit messages and pushed this
> series to staging.

Woot!
> 
> Thanks everyone!

:-)
> 
> Regards,
> Ian.

* Re: [PATCH for-4.5 v21 05/14] autoconf: add libnl3 dependency for Remus network buffering support
  2014-09-26  6:13 ` [PATCH for-4.5 v21 05/14] autoconf: add libnl3 dependency for Remus network buffering support Yang Hongyang
@ 2014-10-06 14:48   ` Ian Campbell
  0 siblings, 0 replies; 24+ messages in thread
From: Ian Campbell @ 2014-10-06 14:48 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: laijs, wency, ian.jackson, yunhong.jiang, eddie.dong, xen-devel,
	rshriram

On Fri, 2014-09-26 at 14:13 +0800, Yang Hongyang wrote:
> NOTE: This patch changes tools/configure.ac, please rerun
>       autogen.sh while applying the patch.

I'm going to commit the following as part of my next batch of
committery.


commit c2203e410b22985b925514b80d0e85a76c0d2ac5
Author: Ian Campbell <ian.campbell@citrix.com>
Date:   Mon Oct 6 15:47:17 2014 +0100

    autoconf: autogen.sh after 622e837570f4 "autoconf: add libnl3 dependency ..."
    
    It appears this was forgotten.
    
    Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

diff --git a/tools/configure b/tools/configure
index 78bcb6b..fe44b4e 100755
--- a/tools/configure
+++ b/tools/configure
@@ -629,6 +629,9 @@ SYSTEMD_CFLAGS
 SYSTEMD_MODULES_LOAD
 SYSTEMD_DIR
 systemd
+remus_netbuf
+LIBNL3_LIBS
+LIBNL3_CFLAGS
 libiconv
 PTYFUNCS_LIBS
 PTHREAD_LIBS
@@ -829,6 +832,8 @@ PKG_CONFIG_PATH
 PKG_CONFIG_LIBDIR
 glib_CFLAGS
 glib_LIBS
+LIBNL3_CFLAGS
+LIBNL3_LIBS
 SYSTEMD_CFLAGS
 SYSTEMD_LIBS'
 
@@ -1528,6 +1533,9 @@ Some influential environment variables:
               path overriding pkg-config's built-in search path
   glib_CFLAGS C compiler flags for glib, overriding pkg-config
   glib_LIBS   linker flags for glib, overriding pkg-config
+  LIBNL3_CFLAGS
+              C compiler flags for LIBNL3, overriding pkg-config
+  LIBNL3_LIBS linker flags for LIBNL3, overriding pkg-config
   SYSTEMD_CFLAGS
               C compiler flags for SYSTEMD, overriding pkg-config
   SYSTEMD_LIBS
@@ -8403,6 +8411,100 @@ fi
 done
 
 
+# Check for libnl3 >=3.2.8. If present enable remus network buffering.
+
+pkg_failed=no
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for LIBNL3" >&5
+$as_echo_n "checking for LIBNL3... " >&6; }
+
+if test -n "$LIBNL3_CFLAGS"; then
+    pkg_cv_LIBNL3_CFLAGS="$LIBNL3_CFLAGS"
+ elif test -n "$PKG_CONFIG"; then
+    if test -n "$PKG_CONFIG" && \
+    { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8\""; } >&5
+  ($PKG_CONFIG --exists --print-errors "libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8") 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; then
+  pkg_cv_LIBNL3_CFLAGS=`$PKG_CONFIG --cflags "libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8" 2>/dev/null`
+		      test "x$?" != "x0" && pkg_failed=yes
+else
+  pkg_failed=yes
+fi
+ else
+    pkg_failed=untried
+fi
+if test -n "$LIBNL3_LIBS"; then
+    pkg_cv_LIBNL3_LIBS="$LIBNL3_LIBS"
+ elif test -n "$PKG_CONFIG"; then
+    if test -n "$PKG_CONFIG" && \
+    { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8\""; } >&5
+  ($PKG_CONFIG --exists --print-errors "libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8") 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; then
+  pkg_cv_LIBNL3_LIBS=`$PKG_CONFIG --libs "libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8" 2>/dev/null`
+		      test "x$?" != "x0" && pkg_failed=yes
+else
+  pkg_failed=yes
+fi
+ else
+    pkg_failed=untried
+fi
+
+
+
+if test $pkg_failed = yes; then
+   	{ $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+
+if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
+        _pkg_short_errors_supported=yes
+else
+        _pkg_short_errors_supported=no
+fi
+        if test $_pkg_short_errors_supported = yes; then
+	        LIBNL3_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8" 2>&1`
+        else
+	        LIBNL3_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8" 2>&1`
+        fi
+	# Put the nasty error message in config.log where it belongs
+	echo "$LIBNL3_PKG_ERRORS" >&5
+
+	libnl3_lib="n"
+elif test $pkg_failed = untried; then
+     	{ $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+	libnl3_lib="n"
+else
+	LIBNL3_CFLAGS=$pkg_cv_LIBNL3_CFLAGS
+	LIBNL3_LIBS=$pkg_cv_LIBNL3_LIBS
+        { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+$as_echo "yes" >&6; }
+	libnl3_lib="y"
+fi
+
+if test "x$libnl3_lib" = "xn" ; then :
+
+    { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: Disabling support for Remus network buffering.
+    Please install libnl3 libraries, command line tools and devel
+    headers - version 3.2.8 or higher" >&5
+$as_echo "$as_me: WARNING: Disabling support for Remus network buffering.
+    Please install libnl3 libraries, command line tools and devel
+    headers - version 3.2.8 or higher" >&2;}
+    remus_netbuf=n
+
+
+else
+
+    remus_netbuf=y
+
+
+fi
+
+
+
+
 fi # ! $rump
