* [PATCH for-4.5 v20 00/12] Remus/Libxl: Remus network buffering and drbd disk
@ 2014-09-25  6:16 Yang Hongyang
  2014-09-25  6:16 ` [PATCH for-4.5 v20 01/12] libxl: introduce libxl__multidev_prepare_with_aodev Yang Hongyang
                   ` (12 more replies)
  0 siblings, 13 replies; 21+ messages in thread
From: Yang Hongyang @ 2014-09-25  6:16 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

This patch series adds support for network buffering and DRBD disk
replication to the Remus code in libxl.

The code is also hosted on GitHub:
url: https://github.com/macrosheep/xen/tree/remus-v20

Changes in v20:
  Rebased.

Changes in v19:
  Use defbool for cmdline switch.
  Restructured the subkind init and cleanup operations.
  Use libxl__device_kind instead of libxl__remus_device_kind
  Fix a layer violation issue pointed out by IanJ.
  Other minor fixes.
  Rebased to the latest staging tree.

Changes in v18:
  Merge match() and setup() api.
  Reuse libxl__multidev and libxl__ao_device.
  Commit messages and code comments improved. Thanks to Shriram.
  Rebased.

Changes in v17:
  Make remus device abstract layer more generic.
  Addressed Ian J's comments.

Changes in v16:
  Merge libxl__remus_state and libxl__remus_device_state.
  Pass the ops to the device abstract layer instead of defining them in the layer.
  Optimized subkind ops APIs.
  Addressed Ian J's comments.
  Rebased.

Changes in v15:
  The first patch in v14 has been taken, so remove it from the patchset.
  Add a patch to update the maintained files of Remus.
  Rebased.

Changes in v14:
  Addressed IanJ's comments.
  Rebased.

Changes in v13:
  Addressed Konrad's comments.
  Rebased.

Changes in v12:
  Add disk buffering cmdline switch.

Changes in v11:
  Addressed comments from Ian J and Shriram.
  Add the DRBD disk implementation to this patch series.

Changes in V10:
  Restructured the whole patch series.
  Introduce the remus device abstract layer.
  Make remus checkpoint asynchronous.

Changes in V9:
  Use async exec script api to exec scripts.

Changes in V8:
  Applied some comments (by IanJ).
  Merge some struct definitions into their implementation.
  (2/3/5 in V7 => 3 in V8)

Changes in V7:
  Applied missing comments (by IanJ).
  Applied Shriram's comments.

  Merge the tangled netbuffering setup/teardown code into one patch.
  (2/6/8 in V6 => 5 in V7. 9/10 in V6 => 7 in V7)

Changes in V6:
  Applied Ian Jackson's comments on the V5 series.
  [PATCH 2/4 V5] is split into smaller patches by functionality.

  [PATCH 4/4 V5] --> [PATCH 13/13]: netbuffer is enabled by default.

Changes in V5:

Merge hotplug script patch (2/5) and hotplug script setup/teardown
patch (3/5) into a single patch.

Changes in V4:

[1/5] Remove check for libnl command line utils in autoconf checks

[2/5] minor nits

[3/5] define LIBXL_HAVE_REMUS_NETBUF in libxl.h

[4/5] clean ups. Make the usleep in checkpoint callback asynchronous

[5/5] minor nits

Changes in V3:
[1/5] Fix redundant checks in configure scripts
      (based on Ian Campbell's suggestions)

[2/5] Introduce locking in the script, during IFB setup.
      Add xenstore paths used by netbuf scripts
      to xenstore-paths.markdown

[3/5] Hotplug scripts setup/teardown invocations are now asynchronous
      following IanJ's feedback.  However, the invocations are still
      sequential. 

[5/5] Allow per-domain specification of netbuffer scripts in the xl remus
      command.

And minor nits throughout the series based on feedback from
the last version

Changes in V2:
[1/5] Configure script will automatically enable/disable network
      buffer support depending on the availability of the appropriate
      libnl3 version. [If libnl3 is unavailable, a warning message will be
      printed to let the user know that the feature has been disabled.]

      use macros from pkg.m4 instead of pkg-config commands
      removed redundant checks for libnl3 libraries.

[3,4/5] - Minor nits.

Version 1:

[1/5] Changes to autoconf scripts to check for libnl3. Add linker flags
      to libxl Makefile.

[2/5] External script to setup/teardown network buffering using libnl3's
      CLI. This script will be invoked by libxl before starting Remus.
      The script's main job is to bring up an IFB device with plug qdisc
      attached to it.  It then re-routes egress traffic from the guest's
      vif to the IFB device.

[3/5] Libxl code to invoke the external setup script, followed by netlink
      related setup to obtain a handle on the output buffers attached
      to each vif.

[4/5] Libxl interaction with network buffer module in the kernel via
      libnl3 API.

[5/5] xl cmdline switch to explicitly enable network buffering when
      starting remus.


  A few things to note (by Shriram):

    a) Based on previous email discussions, the setup/teardown task has
    been moved to a hotplug-style shell script which can be customized as
    desired, instead of implementing it as C code inside libxl.

    b) Libnl3 is not available on NetBSD, nor is it available on CentOS
    (Linux).  So I have made network buffering support an optional feature
    so that it can be disabled if desired.

    c) NetBSD does not have libnl3. So I have put the setup script under
    the tools/hotplug/Linux folder.

thanks,
Yang.

Legend:
  A - acked
  D - previously acked, but a new change was introduced, so the Acked-by was dropped
  M - Modified
  S - the same version as last round
  No marker - new patch

Yang Hongyang (12):
  A libxl: introduce libxl__multidev_prepare_with_aodev
  A libxl: Extend libxl__ao_device with a libxl__ev_child member
  A autoconf: add libnl3 dependency for Remus network buffering support
  S libxl/remus: introduce an abstract Remus device layer
  A libxl/remus: setup and control network output buffering
  A libxl/remus: setup and control disk replication for DRBD backends
  S xl/remus: change bool to defbool
  S xl/remus: cmdline switch to explicitly enable unsafe configurations
  A xl/remus: cmdline switches and config vars to control network
      buffering
  A xl/remus: add a cmdline switch to disable disk replication
  A libxl/remus: add LIBXL_HAVE_REMUS to indicate Remus support in libxl
  S MAINTAINERS: update maintained files of Remus

 MAINTAINERS                            |   7 +
 README                                 |   4 +
 config/Tools.mk.in                     |   4 +
 docs/README.remus                      |  16 +
 docs/man/xl.conf.pod.5                 |   6 +
 docs/man/xl.pod.1                      |  30 +-
 docs/misc/xenstore-paths.markdown      |   4 +
 tools/configure.ac                     |  16 +
 tools/hotplug/Linux/Makefile           |   2 +
 tools/hotplug/Linux/block-drbd-probe   |  87 ++++++
 tools/hotplug/Linux/remus-netbuf-setup | 230 +++++++++++++++
 tools/libxl/Makefile                   |  15 +
 tools/libxl/libxl.c                    |  75 ++++-
 tools/libxl/libxl.h                    |   6 +
 tools/libxl/libxl_device.c             |  14 +-
 tools/libxl/libxl_dom.c                | 170 ++++++++++-
 tools/libxl/libxl_internal.h           | 195 ++++++++++++-
 tools/libxl/libxl_netbuffer.c          | 517 +++++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c        |  54 ++++
 tools/libxl/libxl_remus_device.c       | 296 +++++++++++++++++++
 tools/libxl/libxl_remus_disk_drbd.c    | 257 ++++++++++++++++
 tools/libxl/libxl_types.idl            |  10 +-
 tools/libxl/libxl_types_internal.idl   |   2 +
 tools/libxl/xl.c                       |   4 +
 tools/libxl/xl.h                       |   1 +
 tools/libxl/xl_cmdimpl.c               |  42 ++-
 tools/libxl/xl_cmdtable.c              |  11 +-
 27 files changed, 2030 insertions(+), 45 deletions(-)
 create mode 100755 tools/hotplug/Linux/block-drbd-probe
 create mode 100644 tools/hotplug/Linux/remus-netbuf-setup
 create mode 100644 tools/libxl/libxl_netbuffer.c
 create mode 100644 tools/libxl/libxl_nonetbuffer.c
 create mode 100644 tools/libxl/libxl_remus_device.c
 create mode 100644 tools/libxl/libxl_remus_disk_drbd.c

-- 
1.9.1


* [PATCH for-4.5 v20 01/12] libxl: introduce libxl__multidev_prepare_with_aodev
  2014-09-25  6:16 [PATCH for-4.5 v20 00/12] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
@ 2014-09-25  6:16 ` Yang Hongyang
  2014-09-25  6:16 ` [PATCH for-4.5 v20 02/12] libxl: Extend libxl__ao_device with a libxl__ev_child member Yang Hongyang
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Yang Hongyang @ 2014-09-25  6:16 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

libxl__multidev_prepare_with_aodev is similar to libxl__multidev_prepare,
but takes a libxl__ao_device as an extra argument.
libxl__multidev_prepare is now a wrapper around
libxl__multidev_prepare_with_aodev.

This new internal API will be used by the Remus device abstract layer
for handling various Remus devices.
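
The split between the two functions can be pictured with a minimal,
self-contained sketch. The types and names below (struct multidev,
struct ao_device, multidev_prepare_with_aodev) are simplified stand-ins
invented for illustration, not the libxl definitions, and plain
calloc/realloc stands in for libxl's gc-based allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for libxl__multidev and
 * libxl__ao_device; only the fields needed for the illustration. */
struct multidev;

struct ao_device {
    struct multidev *multidev;
    void (*callback)(struct ao_device *);
};

struct multidev {
    struct ao_device **array;
    int used, allocd;
};

static void one_callback(struct ao_device *aodev) { (void)aodev; }

/* The caller supplies the storage, for example an ao_device that is
 * embedded in a larger per-device structure. */
void multidev_prepare_with_aodev(struct multidev *md, struct ao_device *aodev)
{
    aodev->multidev = md;
    aodev->callback = one_callback;

    if (md->used >= md->allocd) {
        int n = md->allocd ? md->allocd * 2 : 4;
        struct ao_device **p = realloc(md->array, (size_t)n * sizeof(*p));
        assert(p);
        md->array = p;
        md->allocd = n;
    }
    md->array[md->used++] = aodev;
}

/* The original entry point becomes a thin wrapper: allocate, delegate. */
struct ao_device *multidev_prepare(struct multidev *md)
{
    struct ao_device *aodev = calloc(1, sizeof(*aodev));
    assert(aodev);
    multidev_prepare_with_aodev(md, aodev);
    return aodev;
}
```

With this shape, existing callers keep using the allocating wrapper
unchanged, while a new caller can register a device object it already owns.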

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxl/libxl_device.c   | 13 ++++++++++---
 tools/libxl/libxl_internal.h | 14 +++++++++++---
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index 3425446..903cd41 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -481,11 +481,10 @@ void libxl__multidev_begin(libxl__ao *ao, libxl__multidev *multidev)
 
 static void multidev_one_callback(libxl__egc *egc, libxl__ao_device *aodev);
 
-libxl__ao_device *libxl__multidev_prepare(libxl__multidev *multidev) {
+void libxl__multidev_prepare_with_aodev(libxl__multidev *multidev,
+                                        libxl__ao_device *aodev) {
     STATE_AO_GC(multidev->ao);
-    libxl__ao_device *aodev;
 
-    GCNEW(aodev);
     aodev->multidev = multidev;
     aodev->callback = multidev_one_callback;
     libxl__prepare_ao_device(ao, aodev);
@@ -495,6 +494,14 @@ libxl__ao_device *libxl__multidev_prepare(libxl__multidev *multidev) {
         GCREALLOC_ARRAY(multidev->array, multidev->allocd);
     }
     multidev->array[multidev->used++] = aodev;
+}
+
+libxl__ao_device *libxl__multidev_prepare(libxl__multidev *multidev) {
+    STATE_AO_GC(multidev->ao);
+    libxl__ao_device *aodev;
+
+    GCNEW(aodev);
+    libxl__multidev_prepare_with_aodev(multidev, aodev);
 
     return aodev;
 }
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index f61673c..e931334 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2170,9 +2170,17 @@ struct libxl__ao_device {
 /* Starts preparing to add/remove a bunch of devices. */
 _hidden void libxl__multidev_begin(libxl__ao *ao, libxl__multidev*);
 
-/* Prepares to add/remove one of many devices.  Returns a libxl__ao_device
- * which has had libxl__prepare_ao_device called, and which has also
- * had ->callback set.  The user should not mess with aodev->callback. */
+/* Prepares to add/remove one of many devices.
+ * Calls libxl__prepare_ao_device on libxl__ao_device argument provided and
+ * also sets the ->callback. The user should not mess with aodev->callback.
+ */
+_hidden void libxl__multidev_prepare_with_aodev(libxl__multidev*,
+                                                libxl__ao_device*);
+
+/* A wrapper function around libxl__multidev_prepare_with_aodev.
+ * Allocates a libxl__ao_device and prepares it for addition/removal.
+ * Returns the newly allocated libxl__ao_dev.
+ */
 _hidden libxl__ao_device *libxl__multidev_prepare(libxl__multidev*);
 
 /* Notifies the multidev machinery that we have now finished preparing
-- 
1.9.1


* [PATCH for-4.5 v20 02/12] libxl: Extend libxl__ao_device with a libxl__ev_child member
  2014-09-25  6:16 [PATCH for-4.5 v20 00/12] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
  2014-09-25  6:16 ` [PATCH for-4.5 v20 01/12] libxl: introduce libxl__multidev_prepare_with_aodev Yang Hongyang
@ 2014-09-25  6:16 ` Yang Hongyang
  2014-09-25  6:16 ` [PATCH for-4.5 v20 03/12] autoconf: add libnl3 dependency for Remus network buffering support Yang Hongyang
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Yang Hongyang @ 2014-09-25  6:16 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

This can be used to fork children to allow the asynchronous execution
of system calls which only come in a synchronous variant. This will
be useful for Remus, in the following patches.
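
As a rough illustration of the idea (not of the libxl API), the sketch
below runs a blocking operation in a forked child and collects its result
later. The helper names run_in_child and reap_child are hypothetical, and
the blocking waitpid() is only for brevity; an event loop would instead be
notified of the child's exit and invoke a callback.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs blocking_op() in a forked child so that the parent can carry on.
 * Returns the child's pid, or -1 if fork() failed. */
pid_t run_in_child(int (*blocking_op)(void))
{
    assert(blocking_op != NULL);

    pid_t pid = fork();
    if (pid == 0)
        _exit(blocking_op() ? 1 : 0);  /* child: report via exit status */
    return pid;
}

/* Collects the child's result: 0 if the operation succeeded, -1 otherwise.
 * Blocks in waitpid(); an event-driven caller would do this on SIGCHLD. */
int reap_child(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/* Example operations standing in for synchronous-only system calls. */
int slow_ok(void)   { usleep(1000); return 0; }
int slow_fail(void) { return 1; }
```

The parent never executes the blocking call itself, so its event loop
stays responsive while the child is busy.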

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxl/libxl_device.c   | 1 +
 tools/libxl/libxl_internal.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index 903cd41..50647d2 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -453,6 +453,7 @@ void libxl__prepare_ao_device(libxl__ao *ao, libxl__ao_device *aodev)
     /* We init this here because we might call device_hotplug_done
      * without actually calling any hotplug script */
     libxl__async_exec_init(&aodev->aes);
+    libxl__ev_child_init(&aodev->child);
 }
 
 /* multidev */
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index e931334..add411b 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2150,6 +2150,8 @@ struct libxl__ao_device {
     libxl__async_exec_state aes;
     /* If we need to update JSON config */
     bool update_json;
+    /* for asynchronous execution of synchronous-only syscalls etc. */
+    libxl__ev_child child;
 };
 
 /*
-- 
1.9.1


* [PATCH for-4.5 v20 03/12] autoconf: add libnl3 dependency for Remus network buffering support
  2014-09-25  6:16 [PATCH for-4.5 v20 00/12] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
  2014-09-25  6:16 ` [PATCH for-4.5 v20 01/12] libxl: introduce libxl__multidev_prepare_with_aodev Yang Hongyang
  2014-09-25  6:16 ` [PATCH for-4.5 v20 02/12] libxl: Extend libxl__ao_device with a libxl__ev_child member Yang Hongyang
@ 2014-09-25  6:16 ` Yang Hongyang
  2014-09-25  6:16 ` [PATCH for-4.5 v20 04/12] libxl/remus: introduce an abstract Remus device layer Yang Hongyang
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Yang Hongyang @ 2014-09-25  6:16 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

Libnl3 is required for controlling Remus network buffering.
This patch adds a dependency on libnl3 (>= 3.2.8) to the autoconf scripts.
It also makes it possible to configure the tools without libnl3 support,
i.e., without network buffering support.

When there is no network buffering support, libxl__netbuffer_enabled()
returns 0; otherwise it returns 1. The callers of this API will be
introduced in the rest of the series.
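
A caller might use such a build-time switch roughly as sketched here. This
is an illustration only: the patch selects between two source files in the
Makefile, whereas the sketch uses a hypothetical HAVE_NETBUF macro to keep
everything in one file, and check_netbuf_request is an invented helper.

```c
#include <assert.h>

/* HAVE_NETBUF is a hypothetical macro standing in for the build-time
 * choice between a real implementation and a stub. */
#ifdef HAVE_NETBUF
int netbuffer_enabled(void) { return 1; }
#else
int netbuffer_enabled(void) { return 0; }
#endif

/* Rejects a request for network buffering when support was compiled out.
 * Returns 0 if the request can be honoured, -1 otherwise. */
int check_netbuf_request(int want_netbuf)
{
    assert(want_netbuf == 0 || want_netbuf == 1);

    if (want_netbuf && !netbuffer_enabled())
        return -1;
    return 0;
}
```

Callers only ever ask the query function, so they need no conditional
compilation of their own.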

NOTE: This patch changes tools/configure.ac, so please rerun
      autogen.sh when applying the patch.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 README                          |  4 ++++
 config/Tools.mk.in              |  4 ++++
 docs/README.remus               |  6 ++++++
 tools/configure.ac              | 16 ++++++++++++++++
 tools/libxl/Makefile            | 13 +++++++++++++
 tools/libxl/libxl_internal.h    |  1 +
 tools/libxl/libxl_netbuffer.c   | 31 +++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c | 31 +++++++++++++++++++++++++++++++
 8 files changed, 106 insertions(+)
 create mode 100644 tools/libxl/libxl_netbuffer.c
 create mode 100644 tools/libxl/libxl_nonetbuffer.c

diff --git a/README b/README
index 81bf938..78c5db2 100644
--- a/README
+++ b/README
@@ -73,6 +73,10 @@ disabled at compile time:
     * markdown
     * figlet (for generating the traditional Xen start of day banner)
     * systemd daemon development files
+    * Development install of libnl3 (e.g., libnl-3-200,
+      libnl-3-dev, etc).  Required if network buffering is desired
+      when using Remus with libxl.  See tools/remus/README for detailed
+      information.
 
 Second, you need to acquire a suitable kernel for use in domain 0. If
 possible you should use a kernel provided by your OS distributor. If
diff --git a/config/Tools.mk.in b/config/Tools.mk.in
index 974e28e..a69b846 100644
--- a/config/Tools.mk.in
+++ b/config/Tools.mk.in
@@ -43,6 +43,9 @@ PTHREAD_LIBS        := @PTHREAD_LIBS@
 
 PTYFUNCS_LIBS       := @PTYFUNCS_LIBS@
 
+LIBNL3_LIBS         := @LIBNL3_LIBS@
+LIBNL3_CFLAGS       := @LIBNL3_CFLAGS@
+
 # Download GIT repositories via HTTP or GIT's own protocol?
 # GIT's protocol is faster and more robust, when it works at all (firewalls
 # may block it). We make it the default, but if your GIT repository downloads
@@ -62,6 +65,7 @@ CONFIG_BLKTAP1      := @blktap1@
 CONFIG_BLKTAP2      := @blktap2@
 CONFIG_VTPM         := @vtpm@
 CONFIG_QEMUU_EXTRA_ARGS:= @EXTRA_QEMUU_CONFIGURE_ARGS@
+CONFIG_REMUS_NETBUF := @remus_netbuf@
 
 CONFIG_SYSTEMD      := @systemd@
 SYSTEMD_CFLAGS      := @SYSTEMD_CFLAGS@
diff --git a/docs/README.remus b/docs/README.remus
index 9fa00fe..ddf5b55 100644
--- a/docs/README.remus
+++ b/docs/README.remus
@@ -2,3 +2,9 @@ Remus provides fault tolerance for virtual machines by sending continuous
 checkpoints to a backup, which will activate if the target VM fails.
 
 See the website at http://wiki.xen.org/wiki/Remus for details.
+
+Using Remus with libxl on Xen 4.5 and higher:
+ To enable network buffering, you need libnl 3.2.8
+ or higher along with the development headers and command line utilities.
+ If your distro does not have the appropriate libnl3 version, you can find
+ the latest source tarball of libnl3 at http://www.carisma.slowglass.com/~tgr/libnl/
diff --git a/tools/configure.ac b/tools/configure.ac
index 4f45418..cfa4dd6 100644
--- a/tools/configure.ac
+++ b/tools/configure.ac
@@ -320,6 +320,22 @@ esac
 # Checks for header files.
 AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h valgrind/memcheck.h utmp.h])
 
+# Check for libnl3 >=3.2.8. If present enable remus network buffering.
+PKG_CHECK_MODULES(LIBNL3, [libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8],
+    [libnl3_lib="y"], [libnl3_lib="n"])
+
+AS_IF([test "x$libnl3_lib" = "xn" ], [
+    AC_MSG_WARN([Disabling support for Remus network buffering.
+    Please install libnl3 libraries, command line tools and devel
+    headers - version 3.2.8 or higher])
+    AC_SUBST(remus_netbuf, [n])
+    ],[
+    AC_SUBST(remus_netbuf, [y])
+])
+
+AC_SUBST(LIBNL3_LIBS)
+AC_SUBST(LIBNL3_CFLAGS)
+
 fi # ! $rump
 
 AX_AVAILABLE_SYSTEMD()
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 9d67d0b..4605124 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -21,11 +21,17 @@ endif
 
 LIBXL_LIBS =
 LIBXL_LIBS = $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(LDLIBS_libxenstore) $(LDLIBS_libblktapctl) $(PTYFUNCS_LIBS) $(LIBUUID_LIBS)
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+LIBXL_LIBS += $(LIBNL3_LIBS)
+endif
 
 CFLAGS_LIBXL += $(CFLAGS_libxenctrl)
 CFLAGS_LIBXL += $(CFLAGS_libxenguest)
 CFLAGS_LIBXL += $(CFLAGS_libxenstore)
 CFLAGS_LIBXL += $(CFLAGS_libblktapctl) 
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+CFLAGS_LIBXL += $(LIBNL3_CFLAGS)
+endif
 CFLAGS_LIBXL += -Wshadow
 
 LIBXL_LIBS-$(CONFIG_ARM) += -lfdt
@@ -43,6 +49,13 @@ LIBXL_OBJS-y += libxl_blktap2.o
 else
 LIBXL_OBJS-y += libxl_noblktap2.o
 endif
+
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+LIBXL_OBJS-y += libxl_netbuffer.o
+else
+LIBXL_OBJS-y += libxl_nonetbuffer.o
+endif
+
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index add411b..229f63b 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2573,6 +2573,7 @@ typedef struct libxl__save_helper_state {
                       * marshalling and xc callback functions */
 } libxl__save_helper_state;
 
+_hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
 /*----- Domain suspend (save) state structure -----*/
 
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
new file mode 100644
index 0000000..52d593c
--- /dev/null
+++ b/tools/libxl/libxl_netbuffer.c
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2014
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+int libxl__netbuffer_enabled(libxl__gc *gc)
+{
+    return 1;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
new file mode 100644
index 0000000..1c72a7f
--- /dev/null
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2014
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+int libxl__netbuffer_enabled(libxl__gc *gc)
+{
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.9.1


* [PATCH for-4.5 v20 04/12] libxl/remus: introduce an abstract Remus device layer
  2014-09-25  6:16 [PATCH for-4.5 v20 00/12] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (2 preceding siblings ...)
  2014-09-25  6:16 ` [PATCH for-4.5 v20 03/12] autoconf: add libnl3 dependency for Remus network buffering support Yang Hongyang
@ 2014-09-25  6:16 ` Yang Hongyang
  2014-09-25  6:16 ` [PATCH for-4.5 v20 05/12] libxl/remus: setup and control network output buffering Yang Hongyang
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Yang Hongyang @ 2014-09-25  6:16 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

Introduce an abstract device layer that allows the Remus
logic in libxl to control a guest's devices in a device-agnostic
manner. The device layer also exposes a set of internal interfaces
that a device type must implement, if it wishes to support Remus.

The following APIs are exposed to libxl:

One-time configuration operations:
  *libxl__remus_devices_setup
    > Enable output buffering for NICs, setup disk replication, etc.
  *libxl__remus_devices_teardown
    > Disable network output buffering and disk replication;
      teardown any associated external setups like qdiscs for NICs.

Operations executed every checkpoint (in order of invocation):
  *libxl__remus_devices_postsuspend
  *libxl__remus_devices_preresume
  *libxl__remus_devices_commit

Each device type needs to implement the interfaces specified in
the libxl__remus_device_instance_ops if it wishes to support Remus.

The high-level control flow through the Remus device layer is shown below:

xl remus
  |->  libxl_domain_remus_start
    |-> libxl__remus_devices_setup
      |-> Per-checkpoint libxl__remus_devices_[postsuspend,preresume,commit]
        ...
        |-> On backup failure/network error/other errors
            libxl__remus_devices_teardown
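
The flow above can be modelled as a chain of completion callbacks. The
sketch below is a toy model under stated assumptions: struct rds and every
function in it are simplified stand-ins rather than the libxl code, each
operation completes immediately, and a commit failure is injected after
fail_after checkpoints to trigger teardown. The trace string records the
order of operations (S = setup, p = postsuspend, r = preresume,
c = commit, T = teardown).

```c
#include <assert.h>
#include <string.h>

struct rds;
typedef void rds_cb(struct rds *rds, int rc);

/* Hypothetical stand-in for the per-domain Remus device state. */
struct rds {
    rds_cb *callback;   /* invoked when the current operation completes */
    int checkpoints;    /* checkpoints committed so far */
    int fail_after;     /* commit fails once this many have succeeded */
    int final_rc;       /* error that ended the checkpoint loop */
    char trace[64];     /* order in which operations ran */
};

static void note(struct rds *rds, char c)
{
    size_t n = strlen(rds->trace);
    assert(n + 1 < sizeof(rds->trace));
    rds->trace[n] = c;
    rds->trace[n + 1] = '\0';
}

/* Device-layer operations: do the work, then report via rds->callback. */
static void devices_setup(struct rds *r)       { note(r, 'S'); r->callback(r, 0); }
static void devices_postsuspend(struct rds *r) { note(r, 'p'); r->callback(r, 0); }
static void devices_preresume(struct rds *r)   { note(r, 'r'); r->callback(r, 0); }
static void devices_teardown(struct rds *r)    { note(r, 'T'); r->callback(r, 0); }
static void devices_commit(struct rds *r)
{
    note(r, 'c');
    r->callback(r, r->checkpoints >= r->fail_after ? -1 : 0);
}

static void start_checkpoint(struct rds *r);

static void finish(struct rds *r, int rc) { (void)r; (void)rc; }

/* Any error ends the loop: remember it, then tear the devices down. */
static void fail(struct rds *r, int rc)
{
    r->final_rc = rc;
    r->callback = finish;
    devices_teardown(r);
}

static void commit_done(struct rds *r, int rc)
{
    if (rc) { fail(r, rc); return; }
    r->checkpoints++;
    start_checkpoint(r);
}

static void preresume_done(struct rds *r, int rc)
{
    if (rc) { fail(r, rc); return; }
    r->callback = commit_done;
    devices_commit(r);
}

static void postsuspend_done(struct rds *r, int rc)
{
    if (rc) { fail(r, rc); return; }
    r->callback = preresume_done;
    devices_preresume(r);
}

static void start_checkpoint(struct rds *r)
{
    r->callback = postsuspend_done;
    devices_postsuspend(r);
}

static void setup_done(struct rds *r, int rc)
{
    if (rc) { fail(r, rc); return; }
    start_checkpoint(r);
}

/* Entry point: one-time setup, then checkpoints until something fails. */
void remus_start(struct rds *r)
{
    r->callback = setup_done;
    devices_setup(r);
}
```

Because every step hands control back through a single callback slot, the
caller decides what happens next, which mirrors how the patch reassigns
rds->callback before each device-layer call.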

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>

For comments:
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
---
 tools/libxl/Makefile                 |   2 +
 tools/libxl/libxl.c                  |  47 +++++-
 tools/libxl/libxl_dom.c              | 168 +++++++++++++++++++--
 tools/libxl/libxl_internal.h         | 163 +++++++++++++++++++++
 tools/libxl/libxl_remus_device.c     | 273 +++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_types.idl          |   2 +
 tools/libxl/libxl_types_internal.idl |   2 +
 7 files changed, 640 insertions(+), 17 deletions(-)
 create mode 100644 tools/libxl/libxl_remus_device.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 4605124..28b8616 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -56,6 +56,8 @@ else
 LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
+LIBXL_OBJS-y += libxl_remus_device.o
+
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
 
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 3735f55..728fdec 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -782,6 +782,10 @@ out:
     return ptr;
 }
 
+static void libxl__remus_setup_done(libxl__egc *egc,
+                                    libxl__remus_devices_state *rds, int rc);
+static void libxl__remus_setup_failed(libxl__egc *egc,
+                                      libxl__remus_devices_state *rds, int rc);
 static void remus_failover_cb(libxl__egc *egc,
                               libxl__domain_suspend_state *dss, int rc);
 
@@ -813,16 +817,51 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 
     assert(info);
 
-    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
+    /* Convenience aliases */
+    libxl__remus_devices_state *const rds = &dss->rds;
+    rds->ao = ao;
+    rds->egc = egc;
+    rds->domid = domid;
+    rds->callback = libxl__remus_setup_done;
 
     /* Point of no return */
-    libxl__domain_suspend(egc, dss);
+    libxl__remus_devices_setup(egc, rds);
     return AO_INPROGRESS;
 
  out:
     return AO_ABORT(rc);
 }
 
+static void libxl__remus_setup_done(libxl__egc *egc,
+                                    libxl__remus_devices_state *rds, int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    STATE_AO_GC(dss->ao);
+
+    if (!rc) {
+        libxl__domain_suspend(egc, dss);
+        return;
+    }
+
+    LOG(ERROR, "Remus: failed to setup device for guest with domid %u, rc %d",
+        dss->domid, rc);
+    rds->callback = libxl__remus_setup_failed;
+    libxl__remus_devices_teardown(egc, rds);
+}
+
+static void libxl__remus_setup_failed(libxl__egc *egc,
+                                      libxl__remus_devices_state *rds, int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    STATE_AO_GC(dss->ao);
+
+    if (rc)
+        LOG(ERROR, "Remus: failed to teardown device after setup failed"
+            " for guest with domid %u, rc %d", dss->domid, rc);
+
+    dss->callback(egc, dss, rc);
+}
+
 static void remus_failover_cb(libxl__egc *egc,
                               libxl__domain_suspend_state *dss, int rc)
 {
@@ -832,10 +871,6 @@ static void remus_failover_cb(libxl__egc *egc,
      * backup died or some network error occurred preventing us
      * from sending checkpoints.
      */
-
-    /* TBD: Remus cleanup - i.e. detach qdisc, release other
-     * resources.
-     */
     libxl__ao_complete(egc, ao, rc);
 }
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index bd21841..e9d29b5 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -920,8 +920,6 @@ static void domain_suspend_done(libxl__egc *egc,
                         libxl__domain_suspend_state *dss, int rc);
 static void domain_suspend_callback_common_done(libxl__egc *egc,
                                 libxl__domain_suspend_state *dss, int ok);
-static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss, int ok);
 
 /*----- complicated callback, called by xc_domain_save -----*/
 
@@ -1583,6 +1581,14 @@ static void domain_suspend_callback_common_done(libxl__egc *egc,
 }
 
 /*----- remus callbacks -----*/
+static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
+                                libxl__domain_suspend_state *dss, int ok);
+static void remus_devices_postsuspend_cb(libxl__egc *egc,
+                                         libxl__remus_devices_state *rds,
+                                         int rc);
+static void remus_devices_preresume_cb(libxl__egc *egc,
+                                       libxl__remus_devices_state *rds,
+                                       int rc);
 
 static void libxl__remus_domain_suspend_callback(void *data)
 {
@@ -1597,32 +1603,77 @@ static void libxl__remus_domain_suspend_callback(void *data)
 static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
                                 libxl__domain_suspend_state *dss, int ok)
 {
-    /* REMUS TODO: Issue disk and network checkpoint reqs. */
+    if (!ok)
+        goto out;
+
+    libxl__remus_devices_state *const rds = &dss->rds;
+    rds->callback = remus_devices_postsuspend_cb;
+    libxl__remus_devices_postsuspend(egc, rds);
+    return;
+
+out:
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
 }
 
-static void libxl__remus_domain_resume_callback(void *data)
+static void remus_devices_postsuspend_cb(libxl__egc *egc,
+                                         libxl__remus_devices_state *rds,
+                                         int rc)
 {
     int ok = 0;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+
+    if (rc)
+        goto out;
+
+    ok = 1;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
+
+static void libxl__remus_domain_resume_callback(void *data)
+{
     libxl__save_helper_state *shs = data;
     libxl__egc *egc = shs->egc;
     libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
     STATE_AO_GC(dss->ao);
 
+    libxl__remus_devices_state *const rds = &dss->rds;
+    rds->callback = remus_devices_preresume_cb;
+    libxl__remus_devices_preresume(egc, rds);
+}
+
+static void remus_devices_preresume_cb(libxl__egc *egc,
+                                       libxl__remus_devices_state *rds,
+                                       int rc)
+{
+    int ok = 0;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    STATE_AO_GC(dss->ao);
+
+    if (rc)
+        goto out;
+
     /* Resumes the domain and the device model */
-    if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
+    rc = libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1);
+    if (rc)
         goto out;
 
-    /* REMUS TODO: Deal with disk. Start a new network output buffer */
     ok = 1;
+
 out:
-    libxl__xc_domain_saverestore_async_callback_done(egc, shs, ok);
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
 }
 
 /*----- remus asynchronous checkpoint callback -----*/
 
 static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc);
+static void remus_devices_commit_cb(libxl__egc *egc,
+                                    libxl__remus_devices_state *rds,
+                                    int rc);
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs);
 
 static void libxl__remus_domain_checkpoint_callback(void *data)
 {
@@ -1642,10 +1693,73 @@ static void libxl__remus_domain_checkpoint_callback(void *data)
 static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc)
 {
-    /* REMUS TODO: Wait for disk and memory ack, release network buffer */
-    /* REMUS TODO: make this asynchronous */
-    assert(!rc); /* REMUS TODO handle this error properly */
-    usleep(dss->interval * 1000);
+    /* Convenience aliases */
+    libxl__remus_devices_state *const rds = &dss->rds;
+
+    STATE_AO_GC(dss->ao);
+
+    if (rc) {
+        LOG(ERROR, "Failed to save device model. Terminating Remus..");
+        goto out;
+    }
+
+    rds->callback = remus_devices_commit_cb;
+    libxl__remus_devices_commit(egc, rds);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void remus_devices_commit_cb(libxl__egc *egc,
+                                    libxl__remus_devices_state *rds,
+                                    int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+
+    STATE_AO_GC(dss->ao);
+
+    if (rc) {
+        LOG(ERROR, "Failed to do device commit op."
+            " Terminating Remus..");
+        goto out;
+    }
+
+    /*
+     * At this point, we have successfully checkpointed the guest and
+     * committed it at the backup. We'll come back after the checkpoint
+     * interval to checkpoint the guest again. Until then, let the guest
+     * continue execution.
+     */
+
+    /* Set checkpoint interval timeout */
+    rc = libxl__ev_time_register_rel(gc, &dss->checkpoint_timeout,
+                                     remus_next_checkpoint,
+                                     dss->interval);
+
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs)
+{
+    libxl__domain_suspend_state *dss =
+                            CONTAINER_OF(ev, *dss, checkpoint_timeout);
+
+    STATE_AO_GC(dss->ao);
+
+    /*
+     * Time to checkpoint the guest again. We return 1 to libxc
+     * (xc_domain_save.c) in order to continue executing the infinite loop
+     * (suspend, checkpoint, resume) in xc_domain_save().
+     */
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);
 }
 
@@ -1860,6 +1974,10 @@ static void save_device_model_datacopier_done(libxl__egc *egc,
     dss->save_dm_callback(egc, dss, our_rc);
 }
 
+static void remus_teardown_done(libxl__egc *egc,
+                                       libxl__remus_devices_state *rds,
+                                       int rc);
+
 static void domain_suspend_done(libxl__egc *egc,
                         libxl__domain_suspend_state *dss, int rc)
 {
@@ -1874,6 +1992,34 @@ static void domain_suspend_done(libxl__egc *egc,
         xc_suspend_evtchn_release(CTX->xch, CTX->xce, domid,
                            dss->guest_evtchn.port, &dss->guest_evtchn_lockfd);
 
+    if (!dss->remus) {
+        remus_teardown_done(egc, &dss->rds, rc);
+        return;
+    }
+
+    /*
+     * With Remus, if we reach this point, it means that either the
+     * backup died or a network error occurred, preventing us from
+     * sending checkpoints. Tear down the network buffers and
+     * release netlink resources.  This is an async op.
+     */
+    LOG(WARN, "Remus: Domain suspend terminated with rc %d,"
+        " teardown Remus devices...", rc);
+    dss->rds.callback = remus_teardown_done;
+    libxl__remus_devices_teardown(egc, &dss->rds);
+}
+
+static void remus_teardown_done(libxl__egc *egc,
+                                       libxl__remus_devices_state *rds,
+                                       int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+    STATE_AO_GC(dss->ao);
+
+    if (rc)
+        LOG(ERROR, "Remus: failed to teardown device for guest with domid %u,"
+            " rc %d", dss->domid, rc);
+
     dss->callback(egc, dss, rc);
 }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 229f63b..7be373f 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2573,6 +2573,167 @@ typedef struct libxl__save_helper_state {
                       * marshalling and xc callback functions */
 } libxl__save_helper_state;
 
+/*----- remus device related state structure -----*/
+/*
+ * The abstract Remus device layer exposes a common
+ * set of API to [external] libxl for manipulating devices attached to
+ * a guest protected by Remus. The device layer also exposes a set of
+ * [internal] interfaces that every device type must implement.
+ *
+ * The following API are exposed to libxl:
+ *
+ * One-time configuration operations:
+ *  +libxl__remus_devices_setup
+ *    > Enable output buffering for NICs, setup disk replication, etc.
+ *  +libxl__remus_devices_teardown
+ *    > Disable output buffering and disk replication; teardown any
+ *       associated external setups like qdiscs for NICs.
+ *
+ * Operations executed every checkpoint (in order of invocation):
+ *  +libxl__remus_devices_postsuspend
+ *  +libxl__remus_devices_preresume
+ *  +libxl__remus_devices_commit
+ *
+ * Each device type needs to implement the interfaces specified in
+ * the libxl__remus_device_instance_ops if it wishes to support Remus.
+ *
+ * The high-level control flow through the Remus device layer is shown below:
+ *
+ * xl remus
+ *  |->  libxl_domain_remus_start
+ *    |-> libxl__remus_devices_setup
+ *      |-> Per-checkpoint libxl__remus_devices_[postsuspend,preresume,commit]
+ *        ...
+ *        |-> On backup failure, network error or other internal errors:
+ *            libxl__remus_devices_teardown
+ */
+
+typedef struct libxl__remus_device libxl__remus_device;
+typedef struct libxl__remus_devices_state libxl__remus_devices_state;
+typedef struct libxl__remus_device_instance_ops libxl__remus_device_instance_ops;
+
+/*
+ * Interfaces to be implemented by every device subkind that wishes to
+ * support Remus. Functions must be implemented unless otherwise
+ * stated. Many of these functions are asynchronous. They call
+ * dev->aodev.callback when done.  The actual implementations may be
+ * synchronous and call dev->aodev.callback directly (as the last
+ * thing they do).
+ */
+struct libxl__remus_device_instance_ops {
+    /* the device kind this ops belongs to... */
+    libxl__device_kind kind;
+
+    /*
+     * Checkpoint operations. Each may be NULL, meaning the op is not
+     * implemented and the caller should treat it as a no-op (and do
+     * nothing when checkpointing).
+     * Asynchronous.
+     */
+
+    void (*postsuspend)(libxl__remus_device *dev);
+    void (*preresume)(libxl__remus_device *dev);
+    void (*commit)(libxl__remus_device *dev);
+
+    /*
+     * setup() and teardown() refer to the actual remus device.
+     * Asynchronous.
+     * teardown is called even if setup fails.
+     */
+    /*
+     * setup() should first determine whether the subkind matches the
+     * specific device. If it matches, the device will then be managed with
+     * this set of subkind operations.
+     * Yields 0 if the device is successfully set up,
+     * ERROR_REMUS_DEVOPS_DOES_NOT_MATCH if the ops do not match the device;
+     * any other rc indicates failure.
+     */
+    void (*setup)(libxl__remus_device *dev);
+    void (*teardown)(libxl__remus_device *dev);
+};
+
+typedef void libxl__remus_callback(libxl__egc *,
+                                   libxl__remus_devices_state *, int rc);
+
+/*
+ * State associated with a remus invocation, including parameters
+ * passed to the remus abstract device layer by the remus
+ * save/restore machinery.
+ */
+struct libxl__remus_devices_state {
+    /*--- must be set by caller of libxl__remus_devices_(setup|teardown) ---*/
+
+    libxl__ao *ao;
+    libxl__egc *egc;
+    uint32_t domid;
+    libxl__remus_callback *callback;
+    int device_kind_flags;
+
+    /*----- private for abstract layer only -----*/
+
+    int num_devices;
+    /*
+     * This array is allocated by the remus abstract layer before the
+     * remus devices are set up.
+     * devs may be NULL, meaning no remus devices have been set up.
+     * The size of this array is 'num_devices', which is the total number
+     * of libxl nic devices and disk devices (num_nics + num_disks).
+     */
+    libxl__remus_device **devs;
+
+    libxl_device_nic *nics;
+    int num_nics;
+    libxl_device_disk *disks;
+    int num_disks;
+
+    libxl__multidev multidev;
+};
+
+/*
+ * Information about a single device being handled by remus.
+ * Allocated by the remus abstract layer.
+ */
+struct libxl__remus_device {
+    /*----- shared between abstract and concrete layers -----*/
+    /*
+     * if this is true, that means the subkind ops match the device
+     */
+    bool matched;
+
+    /*----- set by remus device abstract layer -----*/
+    /* libxl__device_* which this remus device is related to */
+    const void *backend_dev;
+    libxl__device_kind kind;
+    libxl__remus_devices_state *rds;
+    libxl__ao_device aodev;
+
+    /*----- private for abstract layer only -----*/
+
+    /*
+     * Control and state variables for the asynchronous callback
+     * based loops which iterate over device subkinds, and over
+     * individual devices.
+     */
+    int ops_index;
+    const libxl__remus_device_instance_ops *ops;
+
+    /*----- private for concrete (device-specific) layer -----*/
+
+    /* concrete device's private data */
+    void *concrete_data;
+};
+
+/* the following 5 APIs are async ops, call rds->callback when done */
+_hidden void libxl__remus_devices_setup(libxl__egc *egc,
+                                        libxl__remus_devices_state *rds);
+_hidden void libxl__remus_devices_teardown(libxl__egc *egc,
+                                           libxl__remus_devices_state *rds);
+_hidden void libxl__remus_devices_postsuspend(libxl__egc *egc,
+                                              libxl__remus_devices_state *rds);
+_hidden void libxl__remus_devices_preresume(libxl__egc *egc,
+                                            libxl__remus_devices_state *rds);
+_hidden void libxl__remus_devices_commit(libxl__egc *egc,
+                                         libxl__remus_devices_state *rds);
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
 /*----- Domain suspend (save) state structure -----*/
@@ -2613,6 +2774,8 @@ struct libxl__domain_suspend_state {
     libxl__ev_xswatch guest_watch;
     libxl__ev_time guest_timeout;
     const char *dm_savefile;
+    libxl__remus_devices_state rds;
+    libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
     int interval; /* checkpoint interval (for Remus) */
     libxl__save_helper_state shs;
     libxl__logdirty_switch logdirty;
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
new file mode 100644
index 0000000..be87f1e
--- /dev/null
+++ b/tools/libxl/libxl_remus_device.c
@@ -0,0 +1,273 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Yang Hongyang <yanghy@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+static const libxl__remus_device_instance_ops *remus_ops[] = {
+    NULL,
+};
+
+/*----- helper functions -----*/
+
+static int init_device_subkind(libxl__remus_devices_state *rds)
+{
+    /* init device subkind-specific state in the libxl ctx */
+    return 0;
+}
+
+static void cleanup_device_subkind(libxl__remus_devices_state *rds)
+{
+    /* cleanup device subkind-specific state in the libxl ctx */
+}
+
+/*----- setup() and teardown() -----*/
+
+/* callbacks */
+
+static void devices_setup_cb(libxl__egc *egc,
+                             libxl__multidev *multidev,
+                             int rc);
+static void devices_teardown_cb(libxl__egc *egc,
+                                libxl__multidev *multidev,
+                                int rc);
+
+/* remus device setup and teardown */
+
+static libxl__remus_device* remus_device_init(libxl__egc *egc,
+                                              libxl__remus_devices_state *rds,
+                                              libxl__device_kind kind,
+                                              void *libxl_dev)
+{
+    libxl__remus_device *dev = NULL;
+
+    STATE_AO_GC(rds->ao);
+    GCNEW(dev);
+    dev->backend_dev = libxl_dev;
+    dev->kind = kind;
+    dev->rds = rds;
+    dev->ops_index = -1;
+
+    return dev;
+}
+
+static void remus_devices_setup(libxl__egc *egc,
+                                libxl__remus_devices_state *rds);
+
+void libxl__remus_devices_setup(libxl__egc *egc, libxl__remus_devices_state *rds)
+{
+    int i, rc;
+
+    STATE_AO_GC(rds->ao);
+
+    rc = init_device_subkind(rds);
+    if (rc)
+        goto out;
+
+    rds->num_devices = 0;
+    rds->num_nics = 0;
+    rds->num_disks = 0;
+
+    if (rds->device_kind_flags & (1 << LIBXL__DEVICE_KIND_REMUS_NIC))
+        rds->nics = libxl_device_nic_list(CTX, rds->domid, &rds->num_nics);
+
+    if (rds->device_kind_flags & (1 << LIBXL__DEVICE_KIND_REMUS_DISK))
+        rds->disks = libxl_device_disk_list(CTX, rds->domid, &rds->num_disks);
+
+    if (rds->num_nics == 0 && rds->num_disks == 0)
+        goto out;
+
+    GCNEW_ARRAY(rds->devs, rds->num_nics + rds->num_disks);
+
+    for (i = 0; i < rds->num_nics; i++) {
+        rds->devs[rds->num_devices++] = remus_device_init(egc, rds,
+                                                LIBXL__DEVICE_KIND_REMUS_NIC,
+                                                &rds->nics[i]);
+    }
+
+    for (i = 0; i < rds->num_disks; i++) {
+        rds->devs[rds->num_devices++] = remus_device_init(egc, rds,
+                                                LIBXL__DEVICE_KIND_REMUS_DISK,
+                                                &rds->disks[i]);
+    }
+
+    remus_devices_setup(egc, rds);
+
+    return;
+
+out:
+    rds->callback(egc, rds, rc);
+}
+
+static void remus_devices_setup(libxl__egc *egc,
+                                libxl__remus_devices_state *rds)
+{
+    int i, rc;
+    libxl__remus_device *dev;
+
+    STATE_AO_GC(rds->ao);
+
+    libxl__multidev_begin(ao, &rds->multidev);
+    rds->multidev.callback = devices_setup_cb;
+    for (i = 0; i < rds->num_devices; i++) {
+        dev = rds->devs[i];
+        if (dev->matched)
+            continue;
+
+        /* find available ops */
+        do {
+            dev->ops = remus_ops[++dev->ops_index];
+            if (!dev->ops) {
+                rc = ERROR_REMUS_DEVICE_NOT_SUPPORTED;
+                goto out;
+            }
+        } while (dev->ops->kind != dev->kind);
+
+        libxl__multidev_prepare_with_aodev(&rds->multidev, &dev->aodev);
+        dev->ops->setup(dev);
+    }
+
+    rc = 0;
+out:
+    libxl__multidev_prepared(egc, &rds->multidev, rc);
+}
+
+static void devices_setup_cb(libxl__egc *egc,
+                             libxl__multidev *multidev,
+                             int rc)
+{
+    STATE_AO_GC(multidev->ao);
+
+    /* Convenience aliases */
+    libxl__remus_devices_state *const rds =
+                            CONTAINER_OF(multidev, *rds, multidev);
+
+    /*
+     * If the error is ERROR_REMUS_DEVOPS_DOES_NOT_MATCH, begin the next
+     * iteration. If some devices cannot be set up, rc will eventually
+     * become ERROR_FAIL or ERROR_REMUS_DEVICE_NOT_SUPPORTED anyway.
+     */
+    if (rc == ERROR_REMUS_DEVOPS_DOES_NOT_MATCH) {
+        remus_devices_setup(egc, rds);
+        return;
+    }
+
+    rds->callback(egc, rds, rc);
+}
+
+void libxl__remus_devices_teardown(libxl__egc *egc,
+                                   libxl__remus_devices_state *rds)
+{
+    int i;
+    libxl__remus_device *dev;
+
+    STATE_AO_GC(rds->ao);
+
+    libxl__multidev_begin(ao, &rds->multidev);
+    rds->multidev.callback = devices_teardown_cb;
+    for (i = 0; i < rds->num_devices; i++) {
+        dev = rds->devs[i];
+        if (!dev->ops || !dev->matched)
+            continue;
+
+        libxl__multidev_prepare_with_aodev(&rds->multidev, &dev->aodev);
+        dev->ops->teardown(dev);
+    }
+
+    libxl__multidev_prepared(egc, &rds->multidev, 0);
+}
+
+static void devices_teardown_cb(libxl__egc *egc,
+                                libxl__multidev *multidev,
+                                int rc)
+{
+    int i;
+
+    STATE_AO_GC(multidev->ao);
+
+    /* Convenience aliases */
+    libxl__remus_devices_state *const rds =
+                            CONTAINER_OF(multidev, *rds, multidev);
+
+    /* clean nic */
+    for (i = 0; i < rds->num_nics; i++)
+        libxl_device_nic_dispose(&rds->nics[i]);
+    free(rds->nics);
+    rds->nics = NULL;
+    rds->num_nics = 0;
+
+    /* clean disk */
+    for (i = 0; i < rds->num_disks; i++)
+        libxl_device_disk_dispose(&rds->disks[i]);
+    free(rds->disks);
+    rds->disks = NULL;
+    rds->num_disks = 0;
+
+    cleanup_device_subkind(rds);
+
+    rds->callback(egc, rds, rc);
+}
+
+/*----- checkpointing APIs -----*/
+
+/* callbacks */
+
+static void devices_checkpoint_cb(libxl__egc *egc,
+                                  libxl__multidev *multidev,
+                                  int rc);
+
+/* API implementations */
+
+#define define_remus_checkpoint_api(api)                                \
+void libxl__remus_devices_##api(libxl__egc *egc,                        \
+                                libxl__remus_devices_state *rds)        \
+{                                                                       \
+    int i;                                                              \
+    libxl__remus_device *dev;                                           \
+                                                                        \
+    STATE_AO_GC(rds->ao);                                               \
+                                                                        \
+    libxl__multidev_begin(ao, &rds->multidev);                          \
+    rds->multidev.callback = devices_checkpoint_cb;                     \
+    for (i = 0; i < rds->num_devices; i++) {                            \
+        dev = rds->devs[i];                                             \
+        if (!dev->matched || !dev->ops->api)                            \
+            continue;                                                   \
+        libxl__multidev_prepare_with_aodev(&rds->multidev, &dev->aodev);\
+        dev->ops->api(dev);                                             \
+    }                                                                   \
+                                                                        \
+    libxl__multidev_prepared(egc, &rds->multidev, 0);                   \
+}
+
+define_remus_checkpoint_api(postsuspend);
+
+define_remus_checkpoint_api(preresume);
+
+define_remus_checkpoint_api(commit);
+
+static void devices_checkpoint_cb(libxl__egc *egc,
+                                  libxl__multidev *multidev,
+                                  int rc)
+{
+    STATE_AO_GC(multidev->ao);
+
+    /* Convenience aliases */
+    libxl__remus_devices_state *const rds =
+                            CONTAINER_OF(multidev, *rds, multidev);
+
+    rds->callback(egc, rds, rc);
+}
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 7f9e7c7..da4c52d 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -61,6 +61,8 @@ libxl_error = Enumeration("error", [
     (-15, "LOCK_FAIL"),
     (-16, "JSON_CONFIG_EMPTY"),
     (-17, "DEVICE_EXISTS"),
+    (-18, "REMUS_DEVOPS_DOES_NOT_MATCH"),
+    (-19, "REMUS_DEVICE_NOT_SUPPORTED"),
     ], value_namespace = "")
 
 libxl_domain_type = Enumeration("domain_type", [
diff --git a/tools/libxl/libxl_types_internal.idl b/tools/libxl/libxl_types_internal.idl
index 800361b..720232e 100644
--- a/tools/libxl/libxl_types_internal.idl
+++ b/tools/libxl/libxl_types_internal.idl
@@ -22,6 +22,8 @@ libxl__device_kind = Enumeration("device_kind", [
     (6, "VKBD"),
     (7, "CONSOLE"),
     (8, "VTPM"),
+    (9, "REMUS_NIC"),
+    (10, "REMUS_DISK"),
     ])
 
 libxl__console_backend = Enumeration("console_backend", [
-- 
1.9.1
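
For readers following the ops-selection loop in remus_devices_setup()
above, here is a minimal synchronous sketch of the same idea. It is an
illustration only, not the patch's code: the real layer is asynchronous
and driven through libxl__multidev, and every name below (struct dev,
setup_one, the is_drbd field, the two example subkinds) is hypothetical.
Only the numeric values -18/-19 and 9/10 are taken from the idl changes
in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Values mirror the enum entries added by the patch; names are made up. */
enum { KIND_NIC = 9, KIND_DISK = 10 };
enum { RC_OK = 0, RC_DOES_NOT_MATCH = -18, RC_NOT_SUPPORTED = -19 };

struct dev;

struct dev_ops {
    int kind;                      /* device kind these ops belong to */
    int (*setup)(struct dev *d);   /* synchronous stand-in for setup() */
};

struct dev {
    int kind;
    int is_drbd;                   /* hypothetical property a subkind checks */
    int ops_index;                 /* starts at -1, like dev->ops_index */
    int matched;
    const struct dev_ops *ops;
};

static int nic_setup(struct dev *d) { (void)d; return RC_OK; }

static int drbd_setup(struct dev *d)
{
    /* A subkind first decides whether it can handle this device at all. */
    return d->is_drbd ? RC_OK : RC_DOES_NOT_MATCH;
}

static const struct dev_ops nic_ops  = { KIND_NIC,  nic_setup  };
static const struct dev_ops drbd_ops = { KIND_DISK, drbd_setup };

/* NULL-terminated, like remus_ops[] in the patch. */
static const struct dev_ops *const ops_table[] = { &nic_ops, &drbd_ops, NULL };

static int setup_one(struct dev *d)
{
    for (;;) {
        /* Advance to the next ops entry of the right kind. */
        do {
            d->ops = ops_table[++d->ops_index];
            if (!d->ops)
                return RC_NOT_SUPPORTED;   /* ran out of candidates */
        } while (d->ops->kind != d->kind);

        int rc = d->ops->setup(d);
        if (rc != RC_DOES_NOT_MATCH) {
            d->matched = (rc == RC_OK);
            return rc;                     /* success or a hard failure */
        }
        /* Subkind declined the device: try the next candidate. */
    }
}
```

In this sketch a NIC is claimed by the first entry, a DRBD-backed disk
by the second, and a disk that no subkind recognises ends with
RC_NOT_SUPPORTED, which corresponds to the "at last anyway" case
described in devices_setup_cb().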

* [PATCH for-4.5 v20 05/12] libxl/remus: setup and control network output buffering
  2014-09-25  6:16 [PATCH for-4.5 v20 00/12] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (3 preceding siblings ...)
  2014-09-25  6:16 ` [PATCH for-4.5 v20 04/12] libxl/remus: introduce an abstract Remus device layer Yang Hongyang
@ 2014-09-25  6:16 ` Yang Hongyang
  2014-09-25  6:16 ` [PATCH for-4.5 v20 06/12] libxl/remus: setup and control disk replication for DRBD backends Yang Hongyang
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Yang Hongyang @ 2014-09-25  6:16 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

This patch adds the machinery required for protecting a guest's
network device state. This patch comprises two parts:

1. Hotplug scripts: The remus-netbuf-setup script is responsible for
  setting up and tearing down the necessary infrastructure required for
  network output buffering.  This script should be invoked by libxl for
  each of the guest's network interfaces, when starting or stopping Remus.

  Apart from returning success/failure indication via the usual hotplug
  entries in xenstore, this script also writes to xenstore, the name of
  the REMUS_IFB device to be used to control the vif's network output.

  The script relies on libnl3 command line utilities to perform various
  setup/teardown functions. The script is confined to Linux platforms only
  since NetBSD does not seem to have libnl3.

2. Remus network device: Implements the interfaces required by the
   remus abstract device layer. A note about the implementation:

   a) init_subkind_nic() & cleanup_subkind_nic() are called once per Remus
      invocation. They establish and free netlink related state respectively.

   b) setup() and teardown() are called for each vif attached to the
      guest.
      During setup():
      i) The hotplug script is called to setup a network buffer on a
         given vif. The script chooses an available IFB device from
         the system, redirects vif egress traffic to the IFB device
         and sets up the plug qdisc (output buffer) on the IFB device.
         The name of the IFB device is communicated via xenstore to
         libxl.

      ii) Libxl obtains a handle to the plug qdisc using the libnl3 API
          and subsequently controls output buffering using this handle
          in the checkpoint callbacks.

      During teardown(), the hotplug scripts are called again to remove
      the vif->ifb traffic redirection, release the ifb and the plug
      qdisc associated with it.

   c) The checkpoint callbacks [postsuspend(), preresume() and commit()]
      are implemented as synchronous ops as the netlink calls associated
      with the qdisc subsystem are very fast.
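
To make the buffering behaviour in (b) and (c) concrete, here is a toy
model of a plug-style output buffer. It is a sketch under stated
assumptions, not the code in libxl_netbuffer.c: it does not use libnl3,
the names (struct plugq, guest_send, buffer_start, buffer_release) are
invented for illustration, and the mapping "postsuspend inserts a plug,
commit releases up to the first plug" is an assumption based on the
description that buffers are released once their checkpoint has been
committed at the backup.

```c
#include <assert.h>
#include <stddef.h>

#define QLEN 64

enum { PKT = 0, PLUG = 1 };        /* queue entry types */

struct plugq {
    int type[QLEN];
    int id[QLEN];
    size_t head, tail;             /* linear queue, no wraparound (toy) */
};

static int enqueue(struct plugq *q, int type, int id)
{
    if (q->tail == QLEN)
        return -1;                 /* buffer full */
    q->type[q->tail] = type;
    q->id[q->tail] = id;
    q->tail++;
    return 0;
}

/* Guest output is always queued, never sent directly. */
static int guest_send(struct plugq *q, int id)
{
    return enqueue(q, PKT, id);
}

/* Assumed postsuspend step: mark the end of the current epoch. */
static int buffer_start(struct plugq *q)
{
    return enqueue(q, PLUG, 0);
}

/*
 * Assumed commit step: release packets up to (and consuming) the first
 * plug. Packets queued after that plug stay buffered until the next
 * commit. Returns the number of packet ids written to out[].
 */
static size_t buffer_release(struct plugq *q, int *out, size_t max)
{
    size_t n = 0;

    while (q->head < q->tail) {
        if (q->type[q->head] == PLUG) {
            q->head++;
            break;
        }
        if (n == max)
            break;
        out[n++] = q->id[q->head++];
    }
    return n;
}
```

With this model, packets 1 and 2 sent before a checkpoint are released
by the commit for that checkpoint, while packet 3, sent after the guest
resumed, stays queued until the following checkpoint is committed.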

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 docs/misc/xenstore-paths.markdown      |   4 +
 tools/hotplug/Linux/Makefile           |   1 +
 tools/hotplug/Linux/remus-netbuf-setup | 230 ++++++++++++++++
 tools/libxl/libxl.c                    |   7 +
 tools/libxl/libxl_internal.h           |  10 +
 tools/libxl/libxl_netbuffer.c          | 481 +++++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c        |  23 ++
 tools/libxl/libxl_remus_device.c       |  18 +-
 8 files changed, 773 insertions(+), 1 deletion(-)
 create mode 100644 tools/hotplug/Linux/remus-netbuf-setup

diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown
index ea67536..d94ea9d 100644
--- a/docs/misc/xenstore-paths.markdown
+++ b/docs/misc/xenstore-paths.markdown
@@ -393,6 +393,10 @@ The guest's virtual time offset from UTC in seconds.
 
 The device model version for a domain.
 
+#### /libxl/$DOMID/remus/netbuf/$DEVID/ifb = STRING [n,INTERNAL]
+
+ifb device used by Remus to buffer network output from the associated vif.
+
 [BLKIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,blkif.h.html
 [FBIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,fbif.h.html
 [HVMPARAMS]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,hvm,params.h.html
diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index d5a9ed2..31e57f7 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -16,6 +16,7 @@ XEN_SCRIPTS += vif-nat
 XEN_SCRIPTS += vif-openvswitch
 XEN_SCRIPTS += vif2
 XEN_SCRIPTS += vif-setup
+XEN_SCRIPTS-$(CONFIG_REMUS_NETBUF) += remus-netbuf-setup
 XEN_SCRIPTS += block
 XEN_SCRIPTS += block-enbd block-nbd
 XEN_SCRIPTS-$(CONFIG_BLKTAP1) += blktap
diff --git a/tools/hotplug/Linux/remus-netbuf-setup b/tools/hotplug/Linux/remus-netbuf-setup
new file mode 100644
index 0000000..87dfa69
--- /dev/null
+++ b/tools/hotplug/Linux/remus-netbuf-setup
@@ -0,0 +1,230 @@
+#!/bin/bash
+#============================================================================
+# ${XEN_SCRIPT_DIR}/remus-netbuf-setup
+#
+# Script for attaching a network buffer to the specified vif (in any mode).
+# The hotplugging system will call this script when starting remus via libxl
+# API, libxl_domain_remus_start.
+#
+# Usage:
+# remus-netbuf-setup (setup|teardown)
+#
+# Environment vars:
+# vifname     vif interface name (required).
+# XENBUS_PATH path in Xenstore, where the REMUS_IFB device details will be
+#             stored or read from (required).
+#             (libxl passes /libxl/<domid>/remus/netbuf/<devid>)
+# REMUS_IFB   ifb interface to be cleaned up (required). [for teardown op only]
+
+# Written to the store: (setup operation)
+# XENBUS_PATH/ifb=<ifbdevName> the REMUS_IFB device serving
+#  as the intermediate buffer through which the interface's network output
+#  can be controlled.
+#
+
+# Remus network buffering requirements:
+
+# We need to buffer (queue) egress traffic from every vif attached to
+# the guest and release the buffers when the checkpoint associated
+# with them has been committed at the backup host. We achieve this
+# with the help of the plug queuing discipline (sch_plug module).
+# Simply put, Remus' network buffering imposes traffic
+# shaping on the guest's vif(s).
+
+# Limitations and Workarounds:
+
+# Egress traffic from a vif appears as ingress traffic to dom0. Linux
+# supports policing (dropping packets) but not traffic shaping
+# (queuing packets) on ingress traffic. The standard workaround to
+# this limitation is to attach an ingress qdisc to the guest vif,
+# redirect all egress traffic from the guest to an intermediate
+# queuing interface, and apply egress rules to it. The IFB
+# (Intermediate Functional Block) device serves the purpose of an
+# intermediate queuing interface.
+#
+
+# The following commands install a network buffer on a
+# guest's vif (vif1.0) using an IFB device (ifb0):
+#
+#  ip link set dev ifb0 up
+#  tc qdisc add dev vif1.0 ingress
+#  tc filter add dev vif1.0 parent ffff: proto ip \
+#    prio 10 u32 match u32 0 0 action mirred egress redirect dev ifb0
+#  nl-qdisc-add --dev=ifb0 --parent root plug
+#  nl-qdisc-add --dev=ifb0 --parent root --update plug --limit=10000000
+#                                                (10MB limit on buffer)
+#
+# So the order of operations when installing a network buffer on vif1.0 is:
+# 1. find a free ifb and bring up the device
+# 2. redirect traffic from vif1.0 to ifb:
+#   2.1 add ingress qdisc to vif1.0 (to capture outgoing packets from guest)
+#   2.2 use tc filter command with actions mirred egress + redirect
+# 3. install plug_qdisc on ifb device, with which we can buffer/release
+#    guest's network output from vif1.0
+#
+# Note:
+# 1. If the setup process fails, the script's cleanup is limited to removing the
+#    ingress qdisc on the guest vif, so that its traffic can flow normally.
+#    The chosen ifb device is not torn down. Libxl has to execute the
+#    teardown op to remove other qdiscs and subsequently free the IFB device.
+#
+# 2. The teardown op may be invoked multiple times by libxl.
+
+#============================================================================
+
+# Unlike other vif scripts, vif-common is not needed here as it executes vif
+# specific setup code such as renaming.
+dir=$(dirname "$0")
+. "$dir/xen-hotplug-common.sh"
+
+findCommand "$@"
+
+if [ "$command" != "setup" -a  "$command" != "teardown" ]
+then
+  echo "Invalid command: $command"
+  log err "Invalid command: $command"
+  exit 1
+fi
+
+evalVariables "$@"
+
+: ${vifname:?}
+: ${XENBUS_PATH:?}
+
+check_libnl_tools() {
+    if ! command -v nl-qdisc-list > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-list tool"
+    fi
+    if ! command -v nl-qdisc-add > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-add tool"
+    fi
+    if ! command -v nl-qdisc-delete > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-delete tool"
+    fi
+}
+
+# We only check for modules. We don't load them.
+# User/Admin is supposed to load ifb during boot time,
+# ensuring that there are enough free ifbs in the system.
+# Other modules will be loaded automatically by tc commands.
+check_modules() {
+    for m in ifb sch_plug sch_ingress act_mirred cls_u32
+    do
+        if ! modinfo $m > /dev/null 2>&1; then
+            fatal "Unable to find $m kernel module"
+        fi
+    done
+}
+
+#return 0 if the ifb is free
+check_ifb() {
+    local installed=`nl-qdisc-list -d $1`
+    [ -n "$installed" ] && return 1
+
+    for domid in `xenstore-list "/local/domain" 2>/dev/null || true`
+    do
+        [ $domid -eq 0 ] && continue
+        xenstore-exists "/libxl/$domid/remus/netbuf" || continue
+        for devid in `xenstore-list "/libxl/$domid/remus/netbuf" 2>/dev/null || true`
+        do
+            local path="/libxl/$domid/remus/netbuf/$devid/ifb"
+            xenstore-exists $path || continue
+            local ifb=`xenstore-read "$path" 2>/dev/null || true`
+            [ "$ifb" = "$1" ] && return 1
+        done
+    done
+
+    return 0
+}
+
+setup_ifb() {
+
+    for ifb in `ifconfig -a -s|egrep ^ifb|cut -d ' ' -f1`
+    do
+        check_ifb "$ifb" || continue
+        REMUS_IFB="$ifb"
+        break
+    done
+
+    if [ -z "$REMUS_IFB" ]
+    then
+        fatal "Unable to find a free ifb device for $vifname"
+    fi
+
+    #not using xenstore_write that automatically exits on error
+    #because we need to cleanup
+    xenstore_write "$XENBUS_PATH/ifb" "$REMUS_IFB"
+    do_or_die ip link set dev "$REMUS_IFB" up
+}
+
+redirect_vif_traffic() {
+    local vif=$1
+    local ifb=$2
+
+    do_or_die tc qdisc add dev "$vif" ingress
+
+    tc filter add dev "$vif" parent ffff: proto ip prio 10 \
+        u32 match u32 0 0 action mirred egress redirect dev "$ifb" >/dev/null 2>&1
+
+    if [ $? -ne 0 ]
+    then
+        do_without_error tc qdisc del dev "$vif" ingress
+        fatal "Failed to redirect traffic from $vif to $ifb"
+    fi
+}
+
+add_plug_qdisc() {
+    local vif=$1
+    local ifb=$2
+
+    nl-qdisc-add --dev="$ifb" --parent root plug >/dev/null 2>&1
+    if [ $? -ne 0 ]
+    then
+        do_without_error tc qdisc del dev "$vif" ingress
+        fatal "Failed to add plug qdisc to $ifb"
+    fi
+
+    #set ifb buffering limit in bytes. It's okay if this command fails
+    nl-qdisc-add --dev="$ifb" --parent root \
+        --update plug --limit=10000000 >/dev/null 2>&1 || true
+}
+
+teardown_netbuf() {
+    local vif=$1
+    local ifb=$2
+
+    #Check if the XENBUS_PATH/ifb exists and has the same IFB name as REMUS_IFB.
+    #Otherwise, if the teardown op is called multiple times, we may end
+    #up freeing another domain's allocated IFB inside the if block.
+    xenstore-exists "$XENBUS_PATH/ifb" && \
+        local ifb2=`xenstore-read "$XENBUS_PATH/ifb" 2>/dev/null || true`
+
+    if [[ "$ifb2" && "$ifb2" == "$ifb" ]]; then
+        do_without_error ip link set dev "$ifb" down
+        do_without_error nl-qdisc-delete --dev="$ifb" --parent root plug >/dev/null 2>&1
+        xenstore-rm -t "$XENBUS_PATH/ifb" 2>/dev/null || true
+    fi
+    do_without_error tc qdisc del dev "$vif" ingress
+    xenstore-rm -t "$XENBUS_PATH/hotplug-status" 2>/dev/null || true
+    xenstore-rm -t "$XENBUS_PATH/hotplug-error" 2>/dev/null || true
+}
+
+case "$command" in
+    setup)
+        check_libnl_tools
+        check_modules
+
+        claim_lock "pickifb"
+        setup_ifb
+        redirect_vif_traffic "$vifname" "$REMUS_IFB"
+        add_plug_qdisc "$vifname" "$REMUS_IFB"
+        release_lock "pickifb"
+
+        success
+        ;;
+    teardown)
+        teardown_netbuf "$vifname" "$REMUS_IFB"
+        ;;
+esac
+
+log debug "Successful remus-netbuf-setup $command for $vifname, ifb $REMUS_IFB."
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 728fdec..d4d39f7 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -819,6 +819,13 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 
     /* Convenience aliases */
     libxl__remus_devices_state *const rds = &dss->rds;
+
+    if (!libxl__netbuffer_enabled(gc)) {
+        LOG(ERROR, "Remus: No support for network buffering");
+        goto out;
+    }
+    rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_REMUS_NIC);
+
     rds->ao = ao;
     rds->egc = egc;
     rds->domid = domid;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 7be373f..035792c 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2652,6 +2652,9 @@ struct libxl__remus_device_instance_ops {
     void (*teardown)(libxl__remus_device *dev);
 };
 
+int init_subkind_nic(libxl__remus_devices_state *rds);
+void cleanup_subkind_nic(libxl__remus_devices_state *rds);
+
 typedef void libxl__remus_callback(libxl__egc *,
                                    libxl__remus_devices_state *, int rc);
 
@@ -2687,6 +2690,13 @@ struct libxl__remus_devices_state {
     int num_disks;
 
     libxl__multidev multidev;
+
+    /*----- private for concrete (device-specific) layer only -----*/
+
+    /* private for nic device subkind ops */
+    char *netbufscript;
+    struct nl_sock *nlsock;
+    struct nl_cache *qdisc_cache;
 };
 
 /*
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 52d593c..faaa9a3 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -17,11 +17,492 @@
 
 #include "libxl_internal.h"
 
+#include <netlink/cache.h>
+#include <netlink/socket.h>
+#include <netlink/attr.h>
+#include <netlink/route/link.h>
+#include <netlink/route/route.h>
+#include <netlink/route/qdisc.h>
+#include <netlink/route/qdisc/plug.h>
+
+typedef struct libxl__remus_device_nic {
+    int devid;
+
+    const char *vif;
+    const char *ifb;
+    struct rtnl_qdisc *qdisc;
+} libxl__remus_device_nic;
+
 int libxl__netbuffer_enabled(libxl__gc *gc)
 {
     return 1;
 }
 
+int init_subkind_nic(libxl__remus_devices_state *rds)
+{
+    int rc, ret;
+
+    STATE_AO_GC(rds->ao);
+
+    rds->nlsock = nl_socket_alloc();
+    if (!rds->nlsock) {
+        LOG(ERROR, "cannot allocate nl socket");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    ret = nl_connect(rds->nlsock, NETLINK_ROUTE);
+    if (ret) {
+        LOG(ERROR, "failed to open netlink socket: %s",
+            nl_geterror(ret));
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    /* get list of all qdiscs installed on network devs. */
+    ret = rtnl_qdisc_alloc_cache(rds->nlsock, &rds->qdisc_cache);
+    if (ret) {
+        LOG(ERROR, "failed to allocate qdisc cache: %s",
+            nl_geterror(ret));
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rds->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
+                                  libxl__xen_script_dir_path());
+
+    rc = 0;
+
+out:
+    return rc;
+}
+
+void cleanup_subkind_nic(libxl__remus_devices_state *rds)
+{
+    STATE_AO_GC(rds->ao);
+
+    /* free qdisc cache */
+    if (rds->qdisc_cache) {
+        nl_cache_clear(rds->qdisc_cache);
+        nl_cache_free(rds->qdisc_cache);
+        rds->qdisc_cache = NULL;
+    }
+
+    /* close & free nlsock */
+    if (rds->nlsock) {
+        nl_close(rds->nlsock);
+        nl_socket_free(rds->nlsock);
+        rds->nlsock = NULL;
+    }
+}
+
+/*----- setup() and teardown() -----*/
+
+/* helper functions */
+
+/*
+ * If the device has a vifname, then use that instead of
+ * the vifX.Y format.
+ * it must ONLY be used for remus because if driver domains
+ * were in use it would constitute a security vulnerability.
+ */
+static const char *get_vifname(libxl__remus_device *dev,
+                               const libxl_device_nic *nic)
+{
+    const char *vifname = NULL;
+    const char *path;
+    int rc;
+
+    STATE_AO_GC(dev->rds->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = dev->rds->domid;
+
+    path = GCSPRINTF("%s/backend/vif/%d/%d/vifname",
+                     libxl__xs_get_dompath(gc, 0), domid, nic->devid);
+    rc = libxl__xs_read_checked(gc, XBT_NULL, path, &vifname);
+    if (!rc && !vifname) {
+        vifname = libxl__device_nic_devname(gc, domid,
+                                            nic->devid,
+                                            nic->nictype);
+    }
+
+    return vifname;
+}
+
+static void free_qdisc(libxl__remus_device_nic *remus_nic)
+{
+    if (remus_nic->qdisc == NULL)
+        return;
+
+    nl_object_put((struct nl_object *)(remus_nic->qdisc));
+    remus_nic->qdisc = NULL;
+}
+
+static int init_qdisc(libxl__remus_devices_state *rds,
+                      libxl__remus_device_nic *remus_nic)
+{
+    int rc, ret, ifindex;
+    struct rtnl_link *ifb = NULL;
+    struct rtnl_qdisc *qdisc = NULL;
+
+    STATE_AO_GC(rds->ao);
+
+    /* Now that we have brought up the REMUS_IFB device with a plug qdisc
+     * for this vif, we need to refill the qdisc cache.
+     */
+    ret = nl_cache_refill(rds->nlsock, rds->qdisc_cache);
+    if (ret) {
+        LOG(ERROR, "cannot refill qdisc cache: %s", nl_geterror(ret));
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    /* get a handle to the REMUS_IFB interface */
+    ret = rtnl_link_get_kernel(rds->nlsock, 0, remus_nic->ifb, &ifb);
+    if (ret) {
+        LOG(ERROR, "cannot obtain handle for %s: %s", remus_nic->ifb,
+            nl_geterror(ret));
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    ifindex = rtnl_link_get_ifindex(ifb);
+    if (!ifindex) {
+        LOG(ERROR, "interface %s has no index", remus_nic->ifb);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    /* Get a reference to the root qdisc installed on the REMUS_IFB, by
+     * querying the qdisc list we obtained earlier. The netbufscript
+     * sets up the plug qdisc as the root qdisc, so we don't have to
+     * search the entire qdisc tree on the REMUS_IFB dev.
+     *
+     * There is no need to explicitly free this qdisc as it's just a
+     * reference from the qdisc cache we allocated earlier.
+     */
+    qdisc = rtnl_qdisc_get_by_parent(rds->qdisc_cache, ifindex, TC_H_ROOT);
+    if (qdisc) {
+        const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
+        /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
+        if (!tc_kind || strcmp(tc_kind, "plug")) {
+            LOG(ERROR, "plug qdisc is not installed on %s", remus_nic->ifb);
+            rc = ERROR_FAIL;
+            goto out;
+        }
+        remus_nic->qdisc = qdisc;
+    } else {
+        LOG(ERROR, "Cannot get qdisc handle from ifb %s", remus_nic->ifb);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = 0;
+
+out:
+    if (ifb)
+        rtnl_link_put(ifb);
+
+    if (rc && qdisc)
+        nl_object_put((struct nl_object *)qdisc);
+
+    return rc;
+}
+
+/* callbacks */
+
+static void netbuf_setup_script_cb(libxl__egc *egc,
+                                   libxl__async_exec_state *aes,
+                                   int status);
+static void netbuf_teardown_script_cb(libxl__egc *egc,
+                                      libxl__async_exec_state *aes,
+                                      int status);
+
+/*
+ * the script needs the following env & args
+ * $vifname
+ * $XENBUS_PATH (/libxl/<domid>/remus/netbuf/<devid>/)
+ * $REMUS_IFB (for teardown)
+ * setup/teardown as command line arg.
+ */
+static void setup_async_exec(libxl__remus_device *dev, char *op)
+{
+    int arraysize, nr = 0;
+    char **env = NULL, **args = NULL;
+    libxl__remus_device_nic *remus_nic = dev->concrete_data;
+    libxl__remus_devices_state *rds = dev->rds;
+    libxl__async_exec_state *aes = &dev->aodev.aes;
+
+    STATE_AO_GC(rds->ao);
+
+    /* Convenience aliases */
+    char *const script = libxl__strdup(gc, rds->netbufscript);
+    const uint32_t domid = rds->domid;
+    const int dev_id = remus_nic->devid;
+    const char *const vif = remus_nic->vif;
+    const char *const ifb = remus_nic->ifb;
+
+    arraysize = 7;
+    GCNEW_ARRAY(env, arraysize);
+    env[nr++] = "vifname";
+    env[nr++] = libxl__strdup(gc, vif);
+    env[nr++] = "XENBUS_PATH";
+    env[nr++] = GCSPRINTF("%s/remus/netbuf/%d",
+                          libxl__xs_libxl_path(gc, domid), dev_id);
+    if (!strcmp(op, "teardown") && ifb) {
+        env[nr++] = "REMUS_IFB";
+        env[nr++] = libxl__strdup(gc, ifb);
+    }
+    env[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    arraysize = 3; nr = 0;
+    GCNEW_ARRAY(args, arraysize);
+    args[nr++] = script;
+    args[nr++] = op;
+    args[nr++] = NULL;
+    assert(nr == arraysize);
+
+    aes->ao = dev->rds->ao;
+    aes->what = GCSPRINTF("%s %s", args[0], args[1]);
+    aes->env = env;
+    aes->args = args;
+    aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
+    aes->stdfds[0] = -1;
+    aes->stdfds[1] = -1;
+    aes->stdfds[2] = -1;
+
+    if (!strcmp(op, "teardown"))
+        aes->callback = netbuf_teardown_script_cb;
+    else
+        aes->callback = netbuf_setup_script_cb;
+}
+
+/* setup() and teardown() */
+
+static void nic_setup(libxl__remus_device *dev)
+{
+    int rc;
+    libxl__remus_device_nic *remus_nic;
+    const libxl_device_nic *nic = dev->backend_dev;
+
+    STATE_AO_GC(dev->rds->ao);
+
+    /*
+     * there's no subkind of nic devices, so nic ops are always matched
+     * with nic devices
+     */
+    dev->matched = true;
+
+    GCNEW(remus_nic);
+    dev->concrete_data = remus_nic;
+    remus_nic->devid = nic->devid;
+    remus_nic->vif = get_vifname(dev, nic);
+    if (!remus_nic->vif) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    setup_async_exec(dev, "setup");
+    rc = libxl__async_exec_start(gc, &dev->aodev.aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(dev->rds->egc, &dev->aodev);
+}
+
+/*
+ * In return, the script writes the name of REMUS_IFB device (during setup)
+ * to be used for output buffering into XENBUS_PATH/ifb
+ */
+static void netbuf_setup_script_cb(libxl__egc *egc,
+                                   libxl__async_exec_state *aes,
+                                   int status)
+{
+    libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__remus_device_nic *remus_nic = dev->concrete_data;
+    libxl__remus_devices_state *rds = dev->rds;
+    const char *out_path_base, *hotplug_error = NULL;
+    int rc;
+
+    STATE_AO_GC(rds->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = rds->domid;
+    const int devid = remus_nic->devid;
+    const char *const vif = remus_nic->vif;
+    const char **const ifb = &remus_nic->ifb;
+
+    /*
+     * we need to get ifb first because it's needed for teardown
+     */
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/remus/netbuf/%d/ifb",
+                                          libxl__xs_libxl_path(gc, domid),
+                                          devid),
+                                ifb);
+    if (rc)
+        goto out;
+
+    if (!(*ifb)) {
+        LOG(ERROR, "Cannot get ifb dev name for domain %u dev %s",
+            domid, vif);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    out_path_base = GCSPRINTF("%s/remus/netbuf/%d",
+                              libxl__xs_libxl_path(gc, domid), devid);
+
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/hotplug-error", out_path_base),
+                                &hotplug_error);
+    if (rc)
+        goto out;
+
+    if (hotplug_error) {
+        LOG(ERROR, "netbuf script %s setup failed for vif %s: %s",
+            rds->netbufscript, vif, hotplug_error);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (status) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    LOG(DEBUG, "%s will buffer packets from vif %s", *ifb, vif);
+    rc = init_qdisc(rds, remus_nic);
+
+out:
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+static void nic_teardown(libxl__remus_device *dev)
+{
+    int rc;
+    STATE_AO_GC(dev->rds->ao);
+
+    setup_async_exec(dev, "teardown");
+
+    rc = libxl__async_exec_start(gc, &dev->aodev.aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(dev->rds->egc, &dev->aodev);
+}
+
+static void netbuf_teardown_script_cb(libxl__egc *egc,
+                                      libxl__async_exec_state *aes,
+                                      int status)
+{
+    int rc;
+    libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__remus_device_nic *remus_nic = dev->concrete_data;
+
+    if (status)
+        rc = ERROR_FAIL;
+    else
+        rc = 0;
+
+    free_qdisc(remus_nic);
+
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+/*----- checkpointing APIs -----*/
+
+/* The value of buffer_op, not the value passed to kernel */
+enum {
+    tc_buffer_start,
+    tc_buffer_release
+};
+
+/* API implementations */
+
+static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
+                           libxl__remus_devices_state *rds,
+                           int buffer_op)
+{
+    int rc, ret;
+
+    STATE_AO_GC(rds->ao);
+
+    if (buffer_op == tc_buffer_start)
+        ret = rtnl_qdisc_plug_buffer(remus_nic->qdisc);
+    else
+        ret = rtnl_qdisc_plug_release_one(remus_nic->qdisc);
+
+    if (ret) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    ret = rtnl_qdisc_add(rds->nlsock, remus_nic->qdisc, NLM_F_REQUEST);
+    if (ret) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = 0;
+
+out:
+    if (rc)
+        LOG(ERROR, "Remus: cannot do netbuf op %s on %s:%s",
+            ((buffer_op == tc_buffer_start) ?
+            "start_new_epoch" : "release_prev_epoch"),
+            remus_nic->ifb, nl_geterror(ret));
+    return rc;
+}
+
+static void nic_postsuspend(libxl__remus_device *dev)
+{
+    int rc;
+    libxl__remus_device_nic *remus_nic = dev->concrete_data;
+
+    STATE_AO_GC(dev->rds->ao);
+
+    rc = remus_netbuf_op(remus_nic, dev->rds, tc_buffer_start);
+
+    dev->aodev.rc = rc;
+    dev->aodev.callback(dev->rds->egc, &dev->aodev);
+}
+
+static void nic_commit(libxl__remus_device *dev)
+{
+    int rc;
+    libxl__remus_device_nic *remus_nic = dev->concrete_data;
+
+    STATE_AO_GC(dev->rds->ao);
+
+    rc = remus_netbuf_op(remus_nic, dev->rds, tc_buffer_release);
+
+    dev->aodev.rc = rc;
+    dev->aodev.callback(dev->rds->egc, &dev->aodev);
+}
+
+const libxl__remus_device_instance_ops remus_device_nic = {
+    .kind = LIBXL__DEVICE_KIND_REMUS_NIC,
+    .setup = nic_setup,
+    .teardown = nic_teardown,
+    .postsuspend = nic_postsuspend,
+    .commit = nic_commit,
+};
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
index 1c72a7f..5ac32a4 100644
--- a/tools/libxl/libxl_nonetbuffer.c
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -22,6 +22,29 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
     return 0;
 }
 
+int init_subkind_nic(libxl__remus_devices_state *rds)
+{
+    return 0;
+}
+
+void cleanup_subkind_nic(libxl__remus_devices_state *rds)
+{
+    return;
+}
+
+static void nic_setup(libxl__remus_device *dev)
+{
+    STATE_AO_GC(dev->rds->ao);
+
+    dev->aodev.rc = ERROR_FAIL;
+    dev->aodev.callback(dev->rds->egc, &dev->aodev);
+}
+
+const libxl__remus_device_instance_ops remus_device_nic = {
+    .kind = LIBXL__DEVICE_KIND_REMUS_NIC,
+    .setup = nic_setup,
+};
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
index be87f1e..9edcc94 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_remus_device.c
@@ -17,7 +17,9 @@
 
 #include "libxl_internal.h"
 
+extern const libxl__remus_device_instance_ops remus_device_nic;
 static const libxl__remus_device_instance_ops *remus_ops[] = {
+    &remus_device_nic,
     NULL,
 };
 
@@ -26,12 +28,26 @@ static const libxl__remus_device_instance_ops *remus_ops[] = {
 static int init_device_subkind(libxl__remus_devices_state *rds)
 {
     /* init device subkind-specific state in the libxl ctx */
-    return 0;
+    int rc;
+    STATE_AO_GC(rds->ao);
+
+    if (libxl__netbuffer_enabled(gc)) {
+        rc = init_subkind_nic(rds);
+        if (rc) goto out;
+    }
+
+    rc = 0;
+out:
+    return rc;
 }
 
 static void cleanup_device_subkind(libxl__remus_devices_state *rds)
 {
     /* cleanup device subkind-specific state in the libxl ctx */
+    STATE_AO_GC(rds->ao);
+
+    if (libxl__netbuffer_enabled(gc))
+        cleanup_subkind_nic(rds);
 }
 
 /*----- setup() and teardown() -----*/
-- 
1.9.1


* [PATCH for-4.5 v20 06/12] libxl/remus: setup and control disk replication for DRBD backends
  2014-09-25  6:16 [PATCH for-4.5 v20 00/12] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (4 preceding siblings ...)
  2014-09-25  6:16 ` [PATCH for-4.5 v20 05/12] libxl/remus: setup and control network output buffering Yang Hongyang
@ 2014-09-25  6:16 ` Yang Hongyang
  2014-09-25  6:16 ` [PATCH for-4.5 v20 07/12] xl/remus: change bool to defbool Yang Hongyang
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Yang Hongyang @ 2014-09-25  6:16 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

This patch adds the machinery required for protecting a guest's
disk state, when the guest disk uses a DRBD disk backend.
This patch comprises two parts:

1. Hotplug scripts: The block-drbd-probe script is responsible for
  performing sanity checks on the state of the DRBD disk before the
  checkpointing process begins. This script should be invoked by
  libxl for each of the guest's disk devices, when starting Remus.

2. Remus drbd disk device: Implements the interfaces required by the
   remus abstract device layer. A note about the implementation:

   a) setup() is called for each disk attached to the guest.
      During setup():
      i) The hotplug script is called to perform the sanity check.

      ii) Libxl obtains a handle to the DRBD device (/dev/drbd*) and
          subsequently controls disk checkpoint replication using
          this handle in the checkpoint callbacks.

   b) The preresume() checkpoint callback is executed asynchronously
      using libxl__ev_child_fork(), as it may potentially block for more
      than a few seconds in case of backup failure.
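
For illustration only: the probe script reports why a disk was rejected
through its exit status (the values are listed in the header of
block-drbd-probe). A caller wanting a readable message could map them
roughly as in the sketch below. describe_probe_status is a hypothetical
helper and is not part of this patch; libxl itself only distinguishes
zero from non-zero status in match_async_exec_cb().

```shell
#!/bin/bash
# Hypothetical helper (NOT part of this patch): translate the exit status
# of block-drbd-probe, as documented in that script's header, into text.
describe_probe_status() {
    case "$1" in
        0) echo "drbd device, ready for Remus" ;;
        1) echo "not a drbd device" ;;
        3) echo "drbd device does not use protocol D" ;;
        4) echo "drbd device is not ready" ;;
        # 2 is documented as "unknown error"; treat anything else the same
        *) echo "unknown error" ;;
    esac
}
```

Status 4 covers both a connection state other than Connected and a role
other than Primary/Secondary, so the message cannot tell those two cases
apart. A manual check might run the probe against a device and then call
describe_probe_status "$?"; the installed path of the script depends on
the local Xen script directory.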

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>

Edits to commit message:
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 docs/README.remus                    |  10 ++
 tools/hotplug/Linux/Makefile         |   1 +
 tools/hotplug/Linux/block-drbd-probe |  87 ++++++++++++
 tools/libxl/Makefile                 |   2 +-
 tools/libxl/libxl.c                  |   1 +
 tools/libxl/libxl_internal.h         |   5 +
 tools/libxl/libxl_remus_device.c     |   7 +
 tools/libxl/libxl_remus_disk_drbd.c  | 257 +++++++++++++++++++++++++++++++++++
 8 files changed, 369 insertions(+), 1 deletion(-)
 create mode 100755 tools/hotplug/Linux/block-drbd-probe
 create mode 100644 tools/libxl/libxl_remus_disk_drbd.c

diff --git a/docs/README.remus b/docs/README.remus
index ddf5b55..20783c9 100644
--- a/docs/README.remus
+++ b/docs/README.remus
@@ -8,3 +8,13 @@ Using Remus with libxl on Xen 4.5 and higher:
  or higher along with the development headers and command line utilities.
  If your distro does not have the appropriate libnl3 version, you can find
  the latest source tarball of libnl3 at http://www.carisma.slowglass.com/~tgr/libnl/
+
+Disk replication:
+ VMs protected by Remus need to use DRBD based disk backends. Specifically, you
+ need to compile and install a custom version of DRBD, which is available publicly
+ at https://github.com/rshriram/remus-drbd
+ This code is based on DRBD 8.3.11 and uses a new replication protocol (named
+ protocol D) for asynchronous disk checkpoint replication. A protected VM's DRBD
+ disks on the primary and backup hosts need to be configured to use protocol D
+ as the replication protocol. An example resource configuration file can be found
+ in the aforementioned github repository.
diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index 31e57f7..5317fef 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -24,6 +24,7 @@ XEN_SCRIPTS += xen-hotplug-cleanup
 XEN_SCRIPTS += external-device-migrate
 XEN_SCRIPTS += vscsi
 XEN_SCRIPTS += block-iscsi
+XEN_SCRIPTS += block-drbd-probe
 XEN_SCRIPTS += $(XEN_SCRIPTS-y)
 
 SUBDIRS-$(CONFIG_SYSTEMD) += systemd
diff --git a/tools/hotplug/Linux/block-drbd-probe b/tools/hotplug/Linux/block-drbd-probe
new file mode 100755
index 0000000..247a9d0
--- /dev/null
+++ b/tools/hotplug/Linux/block-drbd-probe
@@ -0,0 +1,87 @@
+#! /bin/bash
+#
+# Copyright (C) 2014 FUJITSU LIMITED
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of version 2.1 of the GNU Lesser General Public
+# License as published by the Free Software Foundation.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+#
+# Usage:
+#     block-drbd-probe devicename
+#
+# Return value:
+#     0: the device is a drbd device
+#     1: the device is not a drbd device
+#     2: unknown error
+#     3: the drbd device does not use protocol D
+#     4: the drbd device is not ready
+
+set -e
+
+drbd_res=
+
+function get_res_name()
+{
+    local drbd_dev=$1
+    local drbd_dev_list=($(drbdadm sh-dev all))
+    local drbd_res_list=($(drbdadm sh-resource all))
+    local temp_drbd_dev temp_drbd_res
+    local found=0
+
+    for temp_drbd_dev in ${drbd_dev_list[@]}; do
+        if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
+            found=1
+            break
+        fi
+    done
+
+    if [[ $found -eq 0 ]]; then
+        return 1
+    fi
+
+    for temp_drbd_res in ${drbd_res_list[@]}; do
+        temp_drbd_dev=$(drbdadm sh-dev $temp_drbd_res)
+        if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
+            drbd_res="$temp_drbd_res"
+            return 0
+        fi
+    done
+
+    # OOPS
+    return 2
+}
+
+get_res_name $1
+rc=$?
+if [[ $rc -ne 0 ]]; then
+    exit $rc
+fi
+
+# check protocol; test the pipeline inside "if" so that "set -e"
+# does not abort the script before "exit 3" is reached
+if ! drbdsetup "$1" show | grep -q "protocol D;"; then
+    exit 3
+fi
+
+# check connect status
+state=$(drbdadm cstate "$drbd_res")
+if [[ "$state" != "Connected" ]]; then
+    exit 4
+fi
+
+# check role
+role=$(drbdadm role "$drbd_res")
+if [[ "$role" != "Primary/Secondary" ]]; then
+    exit 4
+fi
+
+exit 0
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 28b8616..02fc793 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -56,7 +56,7 @@ else
 LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
-LIBXL_OBJS-y += libxl_remus_device.o
+LIBXL_OBJS-y += libxl_remus_device.o libxl_remus_disk_drbd.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index d4d39f7..79b508f 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -825,6 +825,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
         goto out;
     }
     rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_REMUS_NIC);
+    rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_REMUS_DISK);
 
     rds->ao = ao;
     rds->egc = egc;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 035792c..1d3cac7 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2654,6 +2654,8 @@ struct libxl__remus_device_instance_ops {
 
 int init_subkind_nic(libxl__remus_devices_state *rds);
 void cleanup_subkind_nic(libxl__remus_devices_state *rds);
+int init_subkind_drbd_disk(libxl__remus_devices_state *rds);
+void cleanup_subkind_drbd_disk(libxl__remus_devices_state *rds);
 
 typedef void libxl__remus_callback(libxl__egc *,
                                    libxl__remus_devices_state *, int rc);
@@ -2697,6 +2699,9 @@ struct libxl__remus_devices_state {
     char *netbufscript;
     struct nl_sock *nlsock;
     struct nl_cache *qdisc_cache;
+
+    /* private for drbd disk subkind ops */
+    char *drbd_probe_script;
 };
 
 /*
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
index 9edcc94..23ad36b 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_remus_device.c
@@ -18,8 +18,10 @@
 #include "libxl_internal.h"
 
 extern const libxl__remus_device_instance_ops remus_device_nic;
+extern const libxl__remus_device_instance_ops remus_device_drbd_disk;
 static const libxl__remus_device_instance_ops *remus_ops[] = {
     &remus_device_nic,
+    &remus_device_drbd_disk,
     NULL,
 };
 
@@ -36,6 +38,9 @@ static int init_device_subkind(libxl__remus_devices_state *rds)
         if (rc) goto out;
     }
 
+    rc = init_subkind_drbd_disk(rds);
+    if (rc) goto out;
+
     rc = 0;
 out:
     return rc;
@@ -48,6 +53,8 @@ static void cleanup_device_subkind(libxl__remus_devices_state *rds)
 
     if (libxl__netbuffer_enabled(gc))
         cleanup_subkind_nic(rds);
+
+    cleanup_subkind_drbd_disk(rds);
 }
 
 /*----- setup() and teardown() -----*/
diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
new file mode 100644
index 0000000..1be08bb
--- /dev/null
+++ b/tools/libxl/libxl_remus_disk_drbd.c
@@ -0,0 +1,257 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author Lai Jiangshan <laijs@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+/*** drbd implementation ***/
+const int DRBD_SEND_CHECKPOINT = 20;
+const int DRBD_WAIT_CHECKPOINT_ACK = 30;
+
+typedef struct libxl__remus_drbd_disk {
+    int ctl_fd;
+    int ackwait;
+} libxl__remus_drbd_disk;
+
+int init_subkind_drbd_disk(libxl__remus_devices_state *rds)
+{
+    STATE_AO_GC(rds->ao);
+
+    rds->drbd_probe_script = GCSPRINTF("%s/block-drbd-probe",
+                                       libxl__xen_script_dir_path());
+
+    return 0;
+}
+
+void cleanup_subkind_drbd_disk(libxl__remus_devices_state *rds)
+{
+    return;
+}
+
+/*----- helper functions, for async calls -----*/
+static void drbd_async_call(libxl__remus_device *dev,
+                            void func(libxl__remus_device *),
+                            libxl__ev_child_callback callback)
+{
+    int pid = -1, rc;
+    libxl__ao_device *aodev = &dev->aodev;
+    STATE_AO_GC(dev->rds->ao);
+
+    /* Fork and call */
+    pid = libxl__ev_child_fork(gc, &aodev->child, callback);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (!pid) {
+        /* child */
+        func(dev);
+        /* notreached */
+        abort();
+    }
+
+    return;
+
+out:
+    aodev->rc = rc;
+    aodev->callback(dev->rds->egc, aodev);
+}
+
+/*----- match(), setup() and teardown() -----*/
+
+/* callbacks */
+static void match_async_exec_cb(libxl__egc *egc,
+                                libxl__async_exec_state *aes,
+                                int status);
+
+/* implementations */
+
+static void match_async_exec(libxl__egc *egc, libxl__remus_device *dev);
+
+static void drbd_setup(libxl__remus_device *dev)
+{
+    STATE_AO_GC(dev->rds->ao);
+
+    match_async_exec(dev->rds->egc, dev);
+}
+
+static void match_async_exec(libxl__egc *egc, libxl__remus_device *dev)
+{
+    int arraysize, nr = 0, rc;
+    const libxl_device_disk *disk = dev->backend_dev;
+    libxl__async_exec_state *aes = &dev->aodev.aes;
+    STATE_AO_GC(dev->rds->ao);
+
+    /* setup env & args */
+    arraysize = 1;
+    GCNEW_ARRAY(aes->env, arraysize);
+    aes->env[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    arraysize = 3;
+    nr = 0;
+    GCNEW_ARRAY(aes->args, arraysize);
+    aes->args[nr++] = dev->rds->drbd_probe_script;
+    aes->args[nr++] = disk->pdev_path;
+    aes->args[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    aes->ao = dev->rds->ao;
+    aes->what = GCSPRINTF("%s %s", aes->args[0], aes->args[1]);
+    aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
+    aes->callback = match_async_exec_cb;
+    aes->stdfds[0] = -1;
+    aes->stdfds[1] = -1;
+    aes->stdfds[2] = -1;
+
+    rc = libxl__async_exec_start(gc, aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void match_async_exec_cb(libxl__egc *egc,
+                                libxl__async_exec_state *aes,
+                                int status)
+{
+    int rc;
+    libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__remus_drbd_disk *drbd_disk;
+    const libxl_device_disk *disk = dev->backend_dev;
+
+    STATE_AO_GC(aodev->ao);
+
+    if (status) {
+        rc = ERROR_REMUS_DEVOPS_DOES_NOT_MATCH;
+        goto out;
+    }
+
+    /* ops matched */
+    dev->matched = true;
+
+    GCNEW(drbd_disk);
+    dev->concrete_data = drbd_disk;
+    drbd_disk->ackwait = 0;
+    drbd_disk->ctl_fd = open(disk->pdev_path, O_RDONLY);
+    if (drbd_disk->ctl_fd < 0) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = 0;
+
+out:
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+static void drbd_teardown(libxl__remus_device *dev)
+{
+    libxl__remus_drbd_disk *drbd_disk = dev->concrete_data;
+    STATE_AO_GC(dev->rds->ao);
+
+    close(drbd_disk->ctl_fd);
+    dev->aodev.rc = 0;
+    dev->aodev.callback(dev->rds->egc, &dev->aodev);
+}
+
+/*----- checkpointing APIs -----*/
+
+/* callbacks */
+static void checkpoint_async_call_done(libxl__egc *egc,
+                                       libxl__ev_child *child,
+                                       pid_t pid, int status);
+
+/* API implementations */
+
+/* this op will not wait and block, so implement as sync op */
+static void drbd_postsuspend(libxl__remus_device *dev)
+{
+    STATE_AO_GC(dev->rds->ao);
+
+    libxl__remus_drbd_disk *rdd = dev->concrete_data;
+
+    if (!rdd->ackwait) {
+        if (ioctl(rdd->ctl_fd, DRBD_SEND_CHECKPOINT, 0) <= 0)
+            rdd->ackwait = 1;
+    }
+
+    dev->aodev.rc = 0;
+    dev->aodev.callback(dev->rds->egc, &dev->aodev);
+}
+
+
+static void drbd_preresume_async(libxl__remus_device *dev);
+
+static void drbd_preresume(libxl__remus_device *dev)
+{
+    STATE_AO_GC(dev->rds->ao);
+
+    drbd_async_call(dev, drbd_preresume_async, checkpoint_async_call_done);
+}
+
+static void drbd_preresume_async(libxl__remus_device *dev)
+{
+    libxl__remus_drbd_disk *rdd = dev->concrete_data;
+    int ackwait = rdd->ackwait;
+
+    if (ackwait) {
+        ioctl(rdd->ctl_fd, DRBD_WAIT_CHECKPOINT_ACK, 0);
+        ackwait = 0;
+    }
+
+    _exit(ackwait);
+}
+
+static void checkpoint_async_call_done(libxl__egc *egc,
+                                       libxl__ev_child *child,
+                                       pid_t pid, int status)
+{
+    int rc;
+    libxl__ao_device *aodev = CONTAINER_OF(child, *aodev, child);
+    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__remus_drbd_disk *rdd = dev->concrete_data;
+
+    STATE_AO_GC(aodev->ao);
+
+    if (!WIFEXITED(status)) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rdd->ackwait = WEXITSTATUS(status);
+    rc = 0;
+
+out:
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+const libxl__remus_device_instance_ops remus_device_drbd_disk = {
+    .kind = LIBXL__DEVICE_KIND_REMUS_DISK,
+    .setup = drbd_setup,
+    .teardown = drbd_teardown,
+    .postsuspend = drbd_postsuspend,
+    .preresume = drbd_preresume,
+};
-- 
1.9.1
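
Editor's note on the preresume path in the patch above: drbd_preresume()
forks a child to issue the blocking DRBD_WAIT_CHECKPOINT_ACK ioctl, and the
child reports the result through its exit status, which
checkpoint_async_call_done() decodes with WIFEXITED/WEXITSTATUS. The sketch
below shows that pattern with plain POSIX calls only. It is not the libxl
code: it blocks in waitpid() instead of returning to an event loop, and
run_in_child(), blocking_op_ok() and blocking_op_pending() are hypothetical
names used for illustration.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for the blocking ioctl; each returns the
 * status the child should report to the parent. */
static int blocking_op_ok(void) { return 0; }
static int blocking_op_pending(void) { return 1; }

/*
 * Run op() in a forked child and return its exit status (0..255),
 * or -1 if fork/wait failed or the child did not exit normally.
 */
static int run_in_child(int (*op)(void))
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* child: _exit() skips atexit handlers and stdio flushing
         * inherited from the parent */
        _exit(op());
    }

    /* parent: libxl would go back to its event loop here; this
     * sketch simply blocks until the child is reaped */
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    /* killed by a signal or otherwise abnormal: treat as failure */
    if (!WIFEXITED(status))
        return -1;

    return WEXITSTATUS(status);
}
```

Only the low 8 bits of the exit status reach the parent, so this works for
a small flag such as ackwait but not for arbitrary return values.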

* [PATCH for-4.5 v20 07/12] xl/remus: change bool to defbool
  2014-09-25  6:16 [PATCH for-4.5 v20 00/12] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (5 preceding siblings ...)
  2014-09-25  6:16 ` [PATCH for-4.5 v20 06/12] libxl/remus: setup and control disk replication for DRBD backends Yang Hongyang
@ 2014-09-25  6:16 ` Yang Hongyang
  2014-09-25 19:21   ` Konrad Rzeszutek Wilk
  2014-09-25  6:16 ` [PATCH for-4.5 v20 08/12] xl/remus: cmdline switch to explicitly enable unsafe configurations Yang Hongyang
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 21+ messages in thread
From: Yang Hongyang @ 2014-09-25  6:16 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

Use defbool instead of bool for boolean flags in remus_info struct.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
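Editor's note: a defbool differs from a plain bool in that it has a third
"not set" state, so the library can tell "caller chose false" apart from
"caller chose nothing" and apply a default only in the latter case. The
sketch below is a standalone model of that behaviour, not the libxl
implementation; tri_bool, the tri_* helpers and struct remus_flags are
hypothetical names, and the 0/-1/1 encoding is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state flag modelled on libxl_defbool.
 * 0 means "not set", so a zeroed struct starts out unset. */
typedef struct { int val; } tri_bool;

enum { TRI_DEFAULT = 0, TRI_FALSE = -1, TRI_TRUE = 1 };

/* Explicit choice by the caller. */
static void tri_set(tri_bool *b, bool v)
{
    b->val = v ? TRI_TRUE : TRI_FALSE;
}

/* Apply a default only if the caller has not chosen a value. */
static void tri_setdefault(tri_bool *b, bool v)
{
    if (b->val == TRI_DEFAULT)
        tri_set(b, v);
}

/* Reading an unset flag is a programming error, so it asserts. */
static bool tri_val(tri_bool b)
{
    assert(b.val != TRI_DEFAULT);
    return b.val > 0;
}

/* Hypothetical stand-in for the two flags in libxl_domain_remus_info. */
struct remus_flags {
    tri_bool blackhole;
    tri_bool compression;
};

/* Mirrors the defaults chosen in this patch:
 * blackhole off, compression on. */
static void remus_flags_apply_defaults(struct remus_flags *f)
{
    tri_setdefault(&f->blackhole, false);
    tri_setdefault(&f->compression, true);
}
```

With this model, a caller that passes -u (compression explicitly false)
keeps that choice after the defaults are applied, while an untouched
blackhole flag becomes false.
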
 tools/libxl/libxl.c         | 3 +++
 tools/libxl/libxl_dom.c     | 2 +-
 tools/libxl/libxl_types.idl | 4 ++--
 tools/libxl/xl_cmdimpl.c    | 9 ++++-----
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 79b508f..9e0a800 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -804,6 +804,9 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
         goto out;
     }
 
+    libxl_defbool_setdefault(&info->blackhole, false);
+    libxl_defbool_setdefault(&info->compression, true);
+
     GCNEW(dss);
     dss->ao = ao;
     dss->callback = remus_failover_cb;
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index e9d29b5..d63ae1b 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1809,7 +1809,7 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
 
     if (r_info != NULL) {
         dss->interval = r_info->interval;
-        if (r_info->compression)
+        if (libxl_defbool_val(r_info->compression))
             dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
     }
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index da4c52d..16e374f 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -611,8 +611,8 @@ libxl_sched_credit_params = Struct("sched_credit_params", [
 
 libxl_domain_remus_info = Struct("domain_remus_info",[
     ("interval",     integer),
-    ("blackhole",    bool),
-    ("compression",  bool),
+    ("blackhole",    libxl_defbool),
+    ("compression",  libxl_defbool),
     ])
 
 libxl_event_type = Enumeration("event_type", [
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index d205f96..e9e8900 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7495,18 +7495,17 @@ int main_remus(int argc, char **argv)
     memset(&r_info, 0, sizeof(libxl_domain_remus_info));
     /* Defaults */
     r_info.interval = 200;
-    r_info.blackhole = 0;
-    r_info.compression = 1;
+    libxl_defbool_setdefault(&r_info.blackhole, false);
 
     SWITCH_FOREACH_OPT(opt, "bui:s:e", NULL, "remus", 2) {
     case 'i':
         r_info.interval = atoi(optarg);
         break;
     case 'b':
-        r_info.blackhole = 1;
+        libxl_defbool_set(&r_info.blackhole, true);
         break;
     case 'u':
-        r_info.compression = 0;
+        libxl_defbool_set(&r_info.compression, false);
         break;
     case 's':
         ssh_command = optarg;
@@ -7519,7 +7518,7 @@ int main_remus(int argc, char **argv)
     domid = find_domain(argv[optind]);
     host = argv[optind + 1];
 
-    if (r_info.blackhole) {
+    if (libxl_defbool_val(r_info.blackhole)) {
         send_fd = open("/dev/null", O_RDWR, 0644);
         if (send_fd < 0) {
             perror("failed to open /dev/null");
-- 
1.9.1

* [PATCH for-4.5 v20 08/12] xl/remus: cmdline switch to explicitly enable unsafe configurations
  2014-09-25  6:16 [PATCH for-4.5 v20 00/12] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (6 preceding siblings ...)
  2014-09-25  6:16 ` [PATCH for-4.5 v20 07/12] xl/remus: change bool to defbool Yang Hongyang
@ 2014-09-25  6:16 ` Yang Hongyang
  2014-09-25 19:23   ` Konrad Rzeszutek Wilk
  2014-09-25  6:16 ` [PATCH for-4.5 v20 09/12] xl/remus: cmdline switches and config vars to control network buffering Yang Hongyang
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 21+ messages in thread
From: Yang Hongyang @ 2014-09-25  6:16 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

By default, network buffering and disk replication are enabled;
checkpoints are replicated to another standby VM.

This patch allows the user to disable any of these features by
explicitly specifying a 'run in unsafe mode' switch when invoking
the 'xl remus' command.  While running Remus in an unsafe mode
makes little sense under normal circumstances, it is useful to be
able to disable one or more features mentioned above for
testing/debugging/profiling purposes.

Unless this option is enabled, it will not be possible to
replicate memory checkpoints to /dev/null (blackhole replication)
or to disable network buffering or disk replication.

As a first step, the use of blackhole replication now requires that
unsafe mode be enabled. Subsequent patches will add support
for disabling network buffering and disk replication in a similar
manner.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
---
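Editor's note: the gating rule described above reduces to one predicate,
evaluated after defaults have been applied. The sketch below models it
with plain bools; struct remus_opts and remus_check_unsafe() are
hypothetical names, and the -1 return value merely stands in for
whatever error code the caller uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, already-defaulted view of the Remus options. */
struct remus_opts {
    bool unsafe;     /* -F: user accepts unsafe configurations */
    bool blackhole;  /* -b: replicate checkpoints to /dev/null */
};

/* Return 0 if the combination is allowed, -1 if it requires
 * unsafe mode that was not requested. */
static int remus_check_unsafe(const struct remus_opts *o)
{
    if (!o->unsafe && o->blackhole)
        return -1;
    return 0;
}
```

So 'xl remus -b' on its own is rejected, while 'xl remus -F -b' and a
plain 'xl remus' both pass the check.
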
 docs/man/xl.pod.1           | 15 ++++++++++-----
 tools/libxl/libxl.c         |  7 +++++++
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c    |  5 ++++-
 tools/libxl/xl_cmdtable.c   |  7 +++++--
 5 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index f9bc812..2ae3007 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -446,11 +446,6 @@ B<OPTIONS>
 
 Checkpoint domain memory every MS milliseconds (default 200ms).
 
-=item B<-b>
-
-Replicate memory checkpoints to /dev/null (blackhole).
-Generally useful for debugging.
-
 =item B<-u>
 
 Disable memory checkpoint compression.
@@ -465,6 +460,16 @@ If empty, run <host> instead of ssh <host> xl migrate-receive -r [-e].
 On the new host, do not wait in the background (on <host>) for the death
 of the domain. See the corresponding option of the I<create> subcommand.
 
+=item B<-F>
+
+Run Remus in unsafe mode. Use this option with caution as failover may
+not work as intended.
+
+=item B<-b>
+
+Replicate memory checkpoints to /dev/null (blackhole).
+Generally useful for debugging. Requires enabling unsafe mode.
+
 =back
 
 =item B<pause> I<domain-id>
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 9e0a800..cc5c3ac 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -804,9 +804,16 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
         goto out;
     }
 
+    libxl_defbool_setdefault(&info->unsafe, false);
     libxl_defbool_setdefault(&info->blackhole, false);
     libxl_defbool_setdefault(&info->compression, true);
 
+    if (!libxl_defbool_val(info->unsafe) &&
+        libxl_defbool_val(info->blackhole)) {
+        LOG(ERROR, "Unsafe mode must be enabled to replicate to /dev/null");
+        goto out;
+    }
+
     GCNEW(dss);
     dss->ao = ao;
     dss->callback = remus_failover_cb;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 16e374f..348f794 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -611,6 +611,7 @@ libxl_sched_credit_params = Struct("sched_credit_params", [
 
 libxl_domain_remus_info = Struct("domain_remus_info",[
     ("interval",     integer),
+    ("unsafe",       libxl_defbool),
     ("blackhole",    libxl_defbool),
     ("compression",  libxl_defbool),
     ])
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index e9e8900..3463d45 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7497,10 +7497,13 @@ int main_remus(int argc, char **argv)
     r_info.interval = 200;
     libxl_defbool_setdefault(&r_info.blackhole, false);
 
-    SWITCH_FOREACH_OPT(opt, "bui:s:e", NULL, "remus", 2) {
+    SWITCH_FOREACH_OPT(opt, "Fbui:s:e", NULL, "remus", 2) {
     case 'i':
         r_info.interval = atoi(optarg);
         break;
+    case 'F':
+        libxl_defbool_set(&r_info.unsafe, true);
+        break;
     case 'b':
         libxl_defbool_set(&r_info.blackhole, true);
         break;
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index dd15947..08f3c90 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -495,13 +495,16 @@ struct cmd_spec cmd_table[] = {
       "Enable Remus HA for domain",
       "[options] <Domain> [<host>]",
       "-i MS                   Checkpoint domain memory every MS milliseconds (def. 200ms).\n"
-      "-b                      Replicate memory checkpoints to /dev/null (blackhole)\n"
       "-u                      Disable memory checkpoint compression.\n"
       "-s <sshcommand>         Use <sshcommand> instead of ssh.  String will be passed\n"
       "                        to sh. If empty, run <host> instead of \n"
       "                        ssh <host> xl migrate-receive -r [-e]\n"
       "-e                      Do not wait in the background (on <host>) for the death\n"
-      "                        of the domain."
+      "                        of the domain.\n"
+      "-F                      Enable unsafe configurations [-b flags]. Use this option\n"
+      "                        with caution as failover may not work as intended.\n"
+      "-b                      Replicate memory checkpoints to /dev/null (blackhole).\n"
+      "                        Works only in unsafe mode."
     },
 #endif
     { "devd",
-- 
1.9.1

* [PATCH for-4.5 v20 09/12] xl/remus: cmdline switches and config vars to control network buffering
  2014-09-25  6:16 [PATCH for-4.5 v20 00/12] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (7 preceding siblings ...)
  2014-09-25  6:16 ` [PATCH for-4.5 v20 08/12] xl/remus: cmdline switch to explicitly enable unsafe configurations Yang Hongyang
@ 2014-09-25  6:16 ` Yang Hongyang
  2014-09-25  6:16 ` [PATCH for-4.5 v20 10/12] xl/remus: add a cmdline switch to disable disk replication Yang Hongyang
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Yang Hongyang @ 2014-09-25  6:16 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

Add two members to libxl_domain_remus_info:
    netbuf: whether network buffering is enabled
    netbufscript: the path of the script which will be run to set up
                  and tear down the guest's interface.

Add cmdline switches to the 'xl remus' command to disable network
buffering (-n) and to specify a domain-specific hotplug script that
sets up network buffering (-N).

Add a new config var 'remus.default.netbufscript' to xl.conf, that
allows the user to override the default global script used to
set up network buffering.

Note: Network buffering is enabled by default. Disabling network
buffering requires enabling unsafe mode.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
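Editor's note: with this patch the netbuf script can come from three
places. The precedence implied by the diff is: the -N cmdline argument,
then 'remus.default.netbufscript' from xl.conf, then the built-in
default. The sketch below models that lookup in one function; in the
patch itself the first two steps happen in xl and the last in libxl.
pick_netbufscript() and BUILTIN_NETBUFSCRIPT are hypothetical names, and
the built-in path is the documented default rather than the value
computed at runtime from the script directory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Documented default; the real code derives it from the Xen
 * script directory at runtime. */
#define BUILTIN_NETBUFSCRIPT "/etc/xen/scripts/remus-netbuf-setup"

/*
 * cmdline: argument of -N, or NULL if the switch was not given.
 * xl_conf: value of remus.default.netbufscript, or NULL if unset.
 */
static const char *pick_netbufscript(const char *cmdline,
                                     const char *xl_conf)
{
    if (cmdline)
        return cmdline;      /* per-invocation override wins */
    if (xl_conf)
        return xl_conf;      /* then the host-wide xl.conf setting */
    return BUILTIN_NETBUFSCRIPT;
}
```
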
 docs/man/xl.conf.pod.5        |  6 ++++++
 docs/man/xl.pod.1             | 11 ++++++++++-
 tools/libxl/libxl.c           | 18 ++++++++++++------
 tools/libxl/libxl_netbuffer.c |  9 +++++++--
 tools/libxl/libxl_types.idl   |  2 ++
 tools/libxl/xl.c              |  4 ++++
 tools/libxl/xl.h              |  1 +
 tools/libxl/xl_cmdimpl.c      | 27 +++++++++++++++++++++------
 tools/libxl/xl_cmdtable.c     |  7 +++++--
 9 files changed, 68 insertions(+), 17 deletions(-)

diff --git a/docs/man/xl.conf.pod.5 b/docs/man/xl.conf.pod.5
index 7c43bde..8ae19bb 100644
--- a/docs/man/xl.conf.pod.5
+++ b/docs/man/xl.conf.pod.5
@@ -105,6 +105,12 @@ Configures the default gateway device to set for virtual network devices.
 
 Default: C<None>
 
+=item B<remus.default.netbufscript="PATH">
+
+Configures the default script used by Remus to setup network buffering.
+
+Default: C</etc/xen/scripts/remus-netbuf-setup>
+
 =item B<output_format="json|sxp">
 
 Configures the default output format used by xl when printing "machine
diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 2ae3007..1f165ad 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -436,7 +436,7 @@ Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
 mechanism between the two hosts.
 
 N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
-     There is no support for network or disk buffering at the moment.
+     There is no support for disk buffering at the moment.
 
 B<OPTIONS>
 
@@ -460,6 +460,11 @@ If empty, run <host> instead of ssh <host> xl migrate-receive -r [-e].
 On the new host, do not wait in the background (on <host>) for the death
 of the domain. See the corresponding option of the I<create> subcommand.
 
+=item B<-N> I<netbufscript>
+
+Use <netbufscript> to setup network buffering instead of the
+default script (/etc/xen/scripts/remus-netbuf-setup).
+
 =item B<-F>
 
 Run Remus in unsafe mode. Use this option with caution as failover may
@@ -470,6 +475,10 @@ not work as intended.
 Replicate memory checkpoints to /dev/null (blackhole).
 Generally useful for debugging. Requires enabling unsafe mode.
 
+=item B<-n>
+
+Disable network output buffering. Requires enabling unsafe mode.
+
 =back
 
 =item B<pause> I<domain-id>
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index cc5c3ac..fa757c4 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -807,13 +807,17 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     libxl_defbool_setdefault(&info->unsafe, false);
     libxl_defbool_setdefault(&info->blackhole, false);
     libxl_defbool_setdefault(&info->compression, true);
+    libxl_defbool_setdefault(&info->netbuf, true);
 
     if (!libxl_defbool_val(info->unsafe) &&
-        libxl_defbool_val(info->blackhole)) {
-        LOG(ERROR, "Unsafe mode must be enabled to replicate to /dev/null");
+        (libxl_defbool_val(info->blackhole) ||
+         !libxl_defbool_val(info->netbuf))) {
+        LOG(ERROR, "Unsafe mode must be enabled to replicate to /dev/null and "
+                   "disable network buffering");
         goto out;
     }
 
+
     GCNEW(dss);
     dss->ao = ao;
     dss->callback = remus_failover_cb;
@@ -830,11 +834,13 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     /* Convenience aliases */
     libxl__remus_devices_state *const rds = &dss->rds;
 
-    if (!libxl__netbuffer_enabled(gc)) {
-        LOG(ERROR, "Remus: No support for network buffering");
-        goto out;
+    if (libxl_defbool_val(info->netbuf)) {
+        if (!libxl__netbuffer_enabled(gc)) {
+            LOG(ERROR, "Remus: No support for network buffering");
+            goto out;
+        }
+        rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_REMUS_NIC);
     }
-    rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_REMUS_NIC);
     rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_REMUS_DISK);
 
     rds->ao = ao;
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index faaa9a3..0415bf4 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -41,6 +41,7 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
 int init_subkind_nic(libxl__remus_devices_state *rds)
 {
     int rc, ret;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
 
     STATE_AO_GC(rds->ao);
 
@@ -68,8 +69,12 @@ int init_subkind_nic(libxl__remus_devices_state *rds)
         goto out;
     }
 
-    rds->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
-                                  libxl__xen_script_dir_path());
+    if (dss->remus->netbufscript) {
+        rds->netbufscript = libxl__strdup(gc, dss->remus->netbufscript);
+    } else {
+        rds->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
+                                      libxl__xen_script_dir_path());
+    }
 
     rc = 0;
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 348f794..53f7daa 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -614,6 +614,8 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
     ("unsafe",       libxl_defbool),
     ("blackhole",    libxl_defbool),
     ("compression",  libxl_defbool),
+    ("netbuf",       libxl_defbool),
+    ("netbufscript", string),
     ])
 
 libxl_event_type = Enumeration("event_type", [
diff --git a/tools/libxl/xl.c b/tools/libxl/xl.c
index 4c5a5ee..f014306 100644
--- a/tools/libxl/xl.c
+++ b/tools/libxl/xl.c
@@ -44,6 +44,7 @@ char *default_vifscript = NULL;
 char *default_bridge = NULL;
 char *default_gatewaydev = NULL;
 char *default_vifbackend = NULL;
+char *default_remus_netbufscript = NULL;
 enum output_format default_output_format = OUTPUT_FORMAT_JSON;
 int claim_mode = 1;
 bool progress_use_cr = 0;
@@ -176,6 +177,9 @@ static void parse_global_config(const char *configfile,
     if (!xlu_cfg_get_long (config, "claim_mode", &l, 0))
         claim_mode = l;
 
+    xlu_cfg_replace_string (config, "remus.default.netbufscript",
+        &default_remus_netbufscript, 0);
+
     xlu_cfg_destroy(config);
 }
 
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 6a6a0f9..6c7aa8e 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -171,6 +171,7 @@ extern char *default_vifscript;
 extern char *default_bridge;
 extern char *default_gatewaydev;
 extern char *default_vifbackend;
+extern char *default_remus_netbufscript;
 extern char *blkdev_start;
 
 enum output_format {
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 3463d45..2b61f16 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7497,7 +7497,7 @@ int main_remus(int argc, char **argv)
     r_info.interval = 200;
     libxl_defbool_setdefault(&r_info.blackhole, false);
 
-    SWITCH_FOREACH_OPT(opt, "Fbui:s:e", NULL, "remus", 2) {
+    SWITCH_FOREACH_OPT(opt, "Fbuni:s:N:e", NULL, "remus", 2) {
     case 'i':
         r_info.interval = atoi(optarg);
         break;
@@ -7510,6 +7510,12 @@ int main_remus(int argc, char **argv)
     case 'u':
         libxl_defbool_set(&r_info.compression, false);
         break;
+    case 'n':
+        libxl_defbool_set(&r_info.netbuf, false);
+        break;
+    case 'N':
+        r_info.netbufscript = optarg;
+        break;
     case 's':
         ssh_command = optarg;
         break;
@@ -7521,6 +7527,9 @@ int main_remus(int argc, char **argv)
     domid = find_domain(argv[optind]);
     host = argv[optind + 1];
 
+    if (!r_info.netbufscript)
+        r_info.netbufscript = default_remus_netbufscript;
+
     if (libxl_defbool_val(r_info.blackhole)) {
         send_fd = open("/dev/null", O_RDWR, 0644);
         if (send_fd < 0) {
@@ -7558,13 +7567,19 @@ int main_remus(int argc, char **argv)
     /* Point of no return */
     rc = libxl_domain_remus_start(ctx, &r_info, domid, send_fd, recv_fd, 0);
 
-    /* If we are here, it means backup has failed/domain suspend failed.
-     * Try to resume the domain and exit gracefully.
-     * TODO: Split-Brain check.
+    /* check if the domain exists. User may have xl destroyed the
+     * domain to force failover
      */
-    fprintf(stderr, "remus sender: libxl_domain_suspend failed"
-            " (rc=%d)\n", rc);
+    if (libxl_domain_info(ctx, 0, domid)) {
+        fprintf(stderr, "Remus: Primary domain has been destroyed.\n");
+        close(send_fd);
+        return 0;
+    }
 
+    /* If we are here, it means remus setup/domain suspend/backup has
+     * failed. Try to resume the domain and exit gracefully.
+     * TODO: Split-Brain check.
+     */
     if (rc == ERROR_GUEST_TIMEDOUT)
         fprintf(stderr, "Failed to suspend domain at primary.\n");
     else {
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 08f3c90..cd1b612 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -501,10 +501,13 @@ struct cmd_spec cmd_table[] = {
       "                        ssh <host> xl migrate-receive -r [-e]\n"
       "-e                      Do not wait in the background (on <host>) for the death\n"
       "                        of the domain.\n"
-      "-F                      Enable unsafe configurations [-b flags]. Use this option\n"
+      "-N <netbufscript>       Use netbufscript to setup network buffering instead of the\n"
+      "                        default script (/etc/xen/scripts/remus-netbuf-setup).\n"
+      "-F                      Enable unsafe configurations [-b|-n flags]. Use this option\n"
       "                        with caution as failover may not work as intended.\n"
       "-b                      Replicate memory checkpoints to /dev/null (blackhole).\n"
-      "                        Works only in unsafe mode."
+      "                        Works only in unsafe mode.\n"
+      "-n                      Disable network output buffering. Works only in unsafe mode."
     },
 #endif
     { "devd",
-- 
1.9.1

* [PATCH for-4.5 v20 10/12] xl/remus: add a cmdline switch to disable disk replication
  2014-09-25  6:16 [PATCH for-4.5 v20 00/12] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (8 preceding siblings ...)
  2014-09-25  6:16 ` [PATCH for-4.5 v20 09/12] xl/remus: cmdline switches and config vars to control network buffering Yang Hongyang
@ 2014-09-25  6:16 ` Yang Hongyang
  2014-09-25  6:16 ` [PATCH for-4.5 v20 11/12] libxl/remus: add LIBXL_HAVE_REMUS to indicate Remus support in libxl Yang Hongyang
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Yang Hongyang @ 2014-09-25  6:16 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

Disk replication is enabled by default. This patch adds a cmdline
switch to the 'xl remus' command to explicitly disable disk replication.
A new boolean field 'diskbuf' is added to the libxl_domain_remus_info
structure to represent this configuration option inside libxl.

Note: Disabling disk replication requires enabling unsafe mode.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
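Editor's note: after this patch both device kinds are optional, and the
set of kinds Remus should manage is carried as a bitmask with one bit
per kind. The sketch below shows that encoding on its own.
KIND_NIC, KIND_DISK, remus_device_kinds() and kind_enabled() are
hypothetical names; the real bit positions come from the
libxl__device_kind enumeration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device kinds; only their distinctness matters here. */
enum { KIND_NIC = 1, KIND_DISK = 2 };

/* Build the mask of device kinds to manage from the two options. */
static unsigned remus_device_kinds(bool netbuf, bool diskbuf)
{
    unsigned flags = 0;

    if (netbuf)
        flags |= 1u << KIND_NIC;
    if (diskbuf)
        flags |= 1u << KIND_DISK;

    return flags;
}

/* Test whether a given kind is present in the mask. */
static bool kind_enabled(unsigned flags, int kind)
{
    return (flags >> kind) & 1u;
}
```

An empty mask is a legal result (both -n and -d given in unsafe mode); in
that case only memory checkpoints are replicated.
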
 docs/man/xl.pod.1           |  6 +++++-
 tools/libxl/libxl.c         | 12 ++++++++----
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c    |  5 ++++-
 tools/libxl/xl_cmdtable.c   |  5 +++--
 5 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 1f165ad..362e92f 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -436,7 +436,7 @@ Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
 mechanism between the two hosts.
 
 N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
-     There is no support for disk buffering at the moment.
+     Disk replication support is limited to DRBD disks.
 
 B<OPTIONS>
 
@@ -479,6 +479,10 @@ Generally useful for debugging. Requires enabling unsafe mode.
 
 Disable network output buffering. Requires enabling unsafe mode.
 
+=item B<-d>
+
+Disable disk replication. Requires enabling unsafe mode.
+
 =back
 
 =item B<pause> I<domain-id>
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index fa757c4..f72c79b 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -808,12 +808,14 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     libxl_defbool_setdefault(&info->blackhole, false);
     libxl_defbool_setdefault(&info->compression, true);
     libxl_defbool_setdefault(&info->netbuf, true);
+    libxl_defbool_setdefault(&info->diskbuf, true);
 
     if (!libxl_defbool_val(info->unsafe) &&
         (libxl_defbool_val(info->blackhole) ||
-         !libxl_defbool_val(info->netbuf))) {
-        LOG(ERROR, "Unsafe mode must be enabled to replicate to /dev/null and "
-                   "disable network buffering");
+         !libxl_defbool_val(info->netbuf) ||
+         !libxl_defbool_val(info->diskbuf))) {
+        LOG(ERROR, "Unsafe mode must be enabled to replicate to /dev/null, "
+                   "disable network buffering and disk replication");
         goto out;
     }
 
@@ -841,7 +843,9 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
         }
         rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_REMUS_NIC);
     }
-    rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_REMUS_DISK);
+
+    if (libxl_defbool_val(info->diskbuf))
+        rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_REMUS_DISK);
 
     rds->ao = ao;
     rds->egc = egc;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 53f7daa..36ebfa5 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -616,6 +616,7 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
     ("compression",  libxl_defbool),
     ("netbuf",       libxl_defbool),
     ("netbufscript", string),
+    ("diskbuf",      libxl_defbool),
     ])
 
 libxl_event_type = Enumeration("event_type", [
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 2b61f16..abc8887 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7497,7 +7497,7 @@ int main_remus(int argc, char **argv)
     r_info.interval = 200;
     libxl_defbool_setdefault(&r_info.blackhole, false);
 
-    SWITCH_FOREACH_OPT(opt, "Fbuni:s:N:e", NULL, "remus", 2) {
+    SWITCH_FOREACH_OPT(opt, "Fbundi:s:N:e", NULL, "remus", 2) {
     case 'i':
         r_info.interval = atoi(optarg);
         break;
@@ -7516,6 +7516,9 @@ int main_remus(int argc, char **argv)
     case 'N':
         r_info.netbufscript = optarg;
         break;
+    case 'd':
+        libxl_defbool_set(&r_info.diskbuf, false);
+        break;
     case 's':
         ssh_command = optarg;
         break;
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index cd1b612..f93ee4f 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -503,11 +503,12 @@ struct cmd_spec cmd_table[] = {
       "                        of the domain.\n"
       "-N <netbufscript>       Use netbufscript to setup network buffering instead of the\n"
       "                        default script (/etc/xen/scripts/remus-netbuf-setup).\n"
-      "-F                      Enable unsafe configurations [-b|-n flags]. Use this option\n"
+      "-F                      Enable unsafe configurations [-b|-n|-d flags]. Use this option\n"
       "                        with caution as failover may not work as intended.\n"
       "-b                      Replicate memory checkpoints to /dev/null (blackhole).\n"
       "                        Works only in unsafe mode.\n"
-      "-n                      Disable network output buffering. Works only in unsafe mode."
+      "-n                      Disable network output buffering. Works only in unsafe mode.\n"
+      "-d                      Disable disk replication. Works only in unsafe mode."
     },
 #endif
     { "devd",
-- 
1.9.1

* [PATCH for-4.5 v20 11/12] libxl/remus: add LIBXL_HAVE_REMUS to indicate Remus support in libxl
  2014-09-25  6:16 [PATCH for-4.5 v20 00/12] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (9 preceding siblings ...)
  2014-09-25  6:16 ` [PATCH for-4.5 v20 10/12] xl/remus: add a cmdline switch to disable disk replication Yang Hongyang
@ 2014-09-25  6:16 ` Yang Hongyang
  2014-09-25  6:16 ` [PATCH for-4.5 v20 12/12] MAINTAINERS: update maintained files of Remus Yang Hongyang
  2014-09-25 19:28 ` [PATCH for-4.5 v20 00/12] Remus/Libxl: Remus network buffering and drbd disk Konrad Rzeszutek Wilk
  12 siblings, 0 replies; 21+ messages in thread
From: Yang Hongyang @ 2014-09-25  6:16 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

Add LIBXL_HAVE_REMUS to indicate Remus support in libxl

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxl/libxl.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 9ae0fcc..2700cc1 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -647,6 +647,12 @@ typedef struct libxl__ctx libxl_ctx;
  */
 #define LIBXL_HAVE_BUILDINFO_SERIAL_LIST 1
 
+/*
+ * LIBXL_HAVE_REMUS
+ * If this is defined, then libxl supports remus.
+ */
+#define LIBXL_HAVE_REMUS 1
+
 typedef uint8_t libxl_mac[6];
 #define LIBXL_MAC_FMT "%02hhx:%02hhx:%02hhx:%02hhx:%02hhx:%02hhx"
 #define LIBXL_MAC_FMTLEN ((2*6)+5) /* 6 hex bytes plus 5 colons */
-- 
1.9.1
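
As an illustration of how an application might consume this macro, here is a minimal sketch of a compile-time feature test. The helper name remus_supported() is hypothetical, and the fallback #define exists only so the sketch compiles without the Xen headers; real code would include <libxl.h> and must not define the macro itself.

```c
#include <stdio.h>

/* Stand-in so the sketch builds standalone. In real code this macro
 * comes from <libxl.h> and is never defined by the application. */
#ifndef LIBXL_HAVE_REMUS
#define LIBXL_HAVE_REMUS 1
#endif

/* Hypothetical helper: reports at compile time whether the libxl
 * headers this program was built against advertise Remus support. */
static int remus_supported(void)
{
#ifdef LIBXL_HAVE_REMUS
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: prints a one-line summary for the user. */
static void report_remus_support(void)
{
    printf("Remus support in libxl: %s\n",
           remus_supported() ? "yes" : "no");
}
```

A caller would typically wrap any use of libxl_domain_remus_start() in the same #ifdef so that the program still builds against older libxl headers.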


* [PATCH for-4.5 v20 12/12] MAINTAINERS: update maintained files of Remus
  2014-09-25  6:16 [PATCH for-4.5 v20 00/12] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (10 preceding siblings ...)
  2014-09-25  6:16 ` [PATCH for-4.5 v20 11/12] libxl/remus: add LIBXL_HAVE_REMUS to indicate Remus support in libxl Yang Hongyang
@ 2014-09-25  6:16 ` Yang Hongyang
  2014-09-25 19:24   ` Konrad Rzeszutek Wilk
  2014-09-25 19:28 ` [PATCH for-4.5 v20 00/12] Remus/Libxl: Remus network buffering and drbd disk Konrad Rzeszutek Wilk
  12 siblings, 1 reply; 21+ messages in thread
From: Yang Hongyang @ 2014-09-25  6:16 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	rshriram, laijs

Add Remus specific hotplug scripts and libxl files
to the list of maintained files.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index bf6b099..935e6cf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -260,8 +260,15 @@ M:	Shriram Rajagopalan <rshriram@cs.ubc.ca>
 M:	Yang Hongyang <yanghy@cn.fujitsu.com>
 S:	Maintained
 F:	docs/README.remus
+F:	tools/libxc/xc_domain_save.c
+F:	tools/libxc/xc_domain_restore.c
 F:	tools/blktap2/drivers/block-remus.c
 F:	tools/blktap2/drivers/hashtable*
+F:	tools/libxl/libxl_remus_*
+F:	tools/libxl/libxl_netbuffer.c
+F:	tools/libxl/libxl_nonetbuffer.c
+F:	tools/hotplug/Linux/remus-netbuf-setup
+F:	tools/hotplug/Linux/block-drbd-probe
 
 SCHEDULING
 M:	George Dunlap <george.dunlap@eu.citrix.com>
-- 
1.9.1


* Re: [PATCH for-4.5 v20 07/12] xl/remus: change bool to defbool
  2014-09-25  6:16 ` [PATCH for-4.5 v20 07/12] xl/remus: change bool to defbool Yang Hongyang
@ 2014-09-25 19:21   ` Konrad Rzeszutek Wilk
  2014-09-25 20:03     ` Shriram Rajagopalan
  0 siblings, 1 reply; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-25 19:21 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	xen-devel, rshriram, laijs

On Thu, Sep 25, 2014 at 02:16:19PM +0800, Yang Hongyang wrote:
> Use defbool instead of bool for boolean flags in remus_info struct.

While that change by itself looks OK, the change in 'libxl_types.idl'
breaks the ABI.

Could you say a bit about why that is OK? As in, would there have
been in the past any users of this ABI that now would have issues with this?

Also, how important is this patch? Does it have to go in or
can it be dropped from the patchset?

> 
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  tools/libxl/libxl.c         | 3 +++
>  tools/libxl/libxl_dom.c     | 2 +-
>  tools/libxl/libxl_types.idl | 4 ++--
>  tools/libxl/xl_cmdimpl.c    | 9 ++++-----
>  4 files changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 79b508f..9e0a800 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -804,6 +804,9 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
>          goto out;
>      }
>  
> +    libxl_defbool_setdefault(&info->blackhole, false);
> +    libxl_defbool_setdefault(&info->compression, true);
> +
>      GCNEW(dss);
>      dss->ao = ao;
>      dss->callback = remus_failover_cb;
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index e9d29b5..d63ae1b 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -1809,7 +1809,7 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
>  
>      if (r_info != NULL) {
>          dss->interval = r_info->interval;
> -        if (r_info->compression)
> +        if (libxl_defbool_val(r_info->compression))
>              dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
>      }
>  
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index da4c52d..16e374f 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -611,8 +611,8 @@ libxl_sched_credit_params = Struct("sched_credit_params", [
>  
>  libxl_domain_remus_info = Struct("domain_remus_info",[
>      ("interval",     integer),
> -    ("blackhole",    bool),
> -    ("compression",  bool),
> +    ("blackhole",    libxl_defbool),
> +    ("compression",  libxl_defbool),
>      ])
>  
>  libxl_event_type = Enumeration("event_type", [
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index d205f96..e9e8900 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -7495,18 +7495,17 @@ int main_remus(int argc, char **argv)
>      memset(&r_info, 0, sizeof(libxl_domain_remus_info));
>      /* Defaults */
>      r_info.interval = 200;
> -    r_info.blackhole = 0;
> -    r_info.compression = 1;
> +    libxl_defbool_setdefault(&r_info.blackhole, false);
>  
>      SWITCH_FOREACH_OPT(opt, "bui:s:e", NULL, "remus", 2) {
>      case 'i':
>          r_info.interval = atoi(optarg);
>          break;
>      case 'b':
> -        r_info.blackhole = 1;
> +        libxl_defbool_set(&r_info.blackhole, true);
>          break;
>      case 'u':
> -        r_info.compression = 0;
> +        libxl_defbool_set(&r_info.compression, false);
>          break;
>      case 's':
>          ssh_command = optarg;
> @@ -7519,7 +7518,7 @@ int main_remus(int argc, char **argv)
>      domid = find_domain(argv[optind]);
>      host = argv[optind + 1];
>  
> -    if (r_info.blackhole) {
> +    if (libxl_defbool_val(r_info.blackhole)) {
>          send_fd = open("/dev/null", O_RDWR, 0644);
>          if (send_fd < 0) {
>              perror("failed to open /dev/null");
> -- 
> 1.9.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


* Re: [PATCH for-4.5 v20 08/12] xl/remus: cmdline switch to explicitly enable unsafe configurations
  2014-09-25  6:16 ` [PATCH for-4.5 v20 08/12] xl/remus: cmdline switch to explicitly enable unsafe configurations Yang Hongyang
@ 2014-09-25 19:23   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-25 19:23 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	xen-devel, rshriram, laijs

On Thu, Sep 25, 2014 at 02:16:20PM +0800, Yang Hongyang wrote:
> By default, network buffering and disk replication are enabled;
> checkpoints are replicated to another standby VM.
> 
> This patch allows the user to disable any of these features by
> explicitly specifying a 'run in unsafe mode' switch when invoking
> the 'xl remus' command.  While running Remus in an unsafe mode
> makes little sense under normal circumstances, it is useful to be
> able to disable one or more features mentioned above for
> testing/debugging/profiling purposes.
> 
> Unless this option is enabled, it will not be possible to
> replicate memory checkpoints to /dev/null (blackhole replication),
> disable network buffering or disk replication.
> 
> As a starter, the use of blackhole replication now requires that
> the unsafe mode be enabled. Subsequent patches will add support
> for disabling network buffering and disk replication in a similar
> manner.
> 
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  docs/man/xl.pod.1           | 15 ++++++++++-----
>  tools/libxl/libxl.c         |  7 +++++++
>  tools/libxl/libxl_types.idl |  1 +
>  tools/libxl/xl_cmdimpl.c    |  5 ++++-
>  tools/libxl/xl_cmdtable.c   |  7 +++++--
>  5 files changed, 27 insertions(+), 8 deletions(-)
> 
> diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
> index f9bc812..2ae3007 100644
> --- a/docs/man/xl.pod.1
> +++ b/docs/man/xl.pod.1
> @@ -446,11 +446,6 @@ B<OPTIONS>
>  
>  Checkpoint domain memory every MS milliseconds (default 200ms).
>  
> -=item B<-b>
> -
> -Replicate memory checkpoints to /dev/null (blackhole).
> -Generally useful for debugging.
> -
>  =item B<-u>
>  
>  Disable memory checkpoint compression.
> @@ -465,6 +460,16 @@ If empty, run <host> instead of ssh <host> xl migrate-receive -r [-e].
>  On the new host, do not wait in the background (on <host>) for the death
>  of the domain. See the corresponding option of the I<create> subcommand.
>  
> +=item B<-F>
> +
> +Run Remus in unsafe mode. Use this option with caution as failover may
> +not work as intended.
> +
> +=item B<-b>
> +
> +Replicate memory checkpoints to /dev/null (blackhole).
> +Generally useful for debugging. Requires enabling unsafe mode.
> +
>  =back
>  
>  =item B<pause> I<domain-id>
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 9e0a800..cc5c3ac 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -804,9 +804,16 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
>          goto out;
>      }
>  
> +    libxl_defbool_setdefault(&info->unsafe, false);
>      libxl_defbool_setdefault(&info->blackhole, false);
>      libxl_defbool_setdefault(&info->compression, true);
>  
> +    if (!libxl_defbool_val(info->unsafe) &&
> +        libxl_defbool_val(info->blackhole)) {
> +        LOG(ERROR, "Unsafe mode must be enabled to replicate to /dev/null");
> +        goto out;
> +    }
> +
>      GCNEW(dss);
>      dss->ao = ao;
>      dss->callback = remus_failover_cb;
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 16e374f..348f794 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -611,6 +611,7 @@ libxl_sched_credit_params = Struct("sched_credit_params", [
>  
>  libxl_domain_remus_info = Struct("domain_remus_info",[
>      ("interval",     integer),
> +    ("unsafe",       libxl_defbool),
>      ("blackhole",    libxl_defbool),
>      ("compression",  libxl_defbool),
>      ])
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index e9e8900..3463d45 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -7497,10 +7497,13 @@ int main_remus(int argc, char **argv)
>      r_info.interval = 200;
>      libxl_defbool_setdefault(&r_info.blackhole, false);
>  
> -    SWITCH_FOREACH_OPT(opt, "bui:s:e", NULL, "remus", 2) {
> +    SWITCH_FOREACH_OPT(opt, "Fbui:s:e", NULL, "remus", 2) {
>      case 'i':
>          r_info.interval = atoi(optarg);
>          break;
> +    case 'F':
> +        libxl_defbool_set(&r_info.unsafe, true);
> +        break;
>      case 'b':
>          libxl_defbool_set(&r_info.blackhole, true);
>          break;
> diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
> index dd15947..08f3c90 100644
> --- a/tools/libxl/xl_cmdtable.c
> +++ b/tools/libxl/xl_cmdtable.c
> @@ -495,13 +495,16 @@ struct cmd_spec cmd_table[] = {
>        "Enable Remus HA for domain",
>        "[options] <Domain> [<host>]",
>        "-i MS                   Checkpoint domain memory every MS milliseconds (def. 200ms).\n"
> -      "-b                      Replicate memory checkpoints to /dev/null (blackhole)\n"
>        "-u                      Disable memory checkpoint compression.\n"
>        "-s <sshcommand>         Use <sshcommand> instead of ssh.  String will be passed\n"
>        "                        to sh. If empty, run <host> instead of \n"
>        "                        ssh <host> xl migrate-receive -r [-e]\n"
>        "-e                      Do not wait in the background (on <host>) for the death\n"
> -      "                        of the domain."
> +      "                        of the domain.\n"
> +      "-F                      Enable unsafe configurations [-b flags]. Use this option\n"
> +      "                        with caution as failover may not work as intended.\n"
> +      "-b                      Replicate memory checkpoints to /dev/null (blackhole).\n"
> +      "                        Works only in unsafe mode."
>      },
>  #endif
>      { "devd",
> -- 
> 1.9.1
> 
> 
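
The check added to libxl_domain_remus_start() in this patch covers only blackhole replication. The help text from patch 10 states that -n and -d also work only in unsafe mode, so the same gate presumably extends to them. The sketch below models that rule with plain bools; struct remus_flags and check_remus_flags() are hypothetical names, and the combined check is an assumption drawn from the help text, not the code in the series.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified view of the Remus feature flags. */
struct remus_flags {
    bool unsafe;     /* -F given */
    bool blackhole;  /* -b given: checkpoints go to /dev/null */
    bool netbuf;     /* network buffering enabled (cleared by -n) */
    bool diskbuf;    /* disk replication enabled (cleared by -d) */
};

/* Returns 0 if the combination is allowed, -1 if a feature was
 * weakened without unsafe mode having been requested. */
static int check_remus_flags(const struct remus_flags *f)
{
    if (f->unsafe)
        return 0;  /* the user explicitly accepted the risk */

    if (f->blackhole) {
        fprintf(stderr, "Unsafe mode must be enabled to replicate to /dev/null\n");
        return -1;
    }
    if (!f->netbuf) {
        fprintf(stderr, "Unsafe mode must be enabled to disable network buffering\n");
        return -1;
    }
    if (!f->diskbuf) {
        fprintf(stderr, "Unsafe mode must be enabled to disable disk replication\n");
        return -1;
    }
    return 0;
}
```

Performing the check in the library rather than only in xl means that any other libxl caller is held to the same rule.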

* Re: [PATCH for-4.5 v20 12/12] MAINTAINERS: update maintained files of Remus
  2014-09-25  6:16 ` [PATCH for-4.5 v20 12/12] MAINTAINERS: update maintained files of Remus Yang Hongyang
@ 2014-09-25 19:24   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-25 19:24 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	xen-devel, rshriram, laijs

On Thu, Sep 25, 2014 at 02:16:24PM +0800, Yang Hongyang wrote:
> Add Remus specific hotplug scripts and libxl files
> to the list of maintained files.
> 
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  MAINTAINERS | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index bf6b099..935e6cf 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -260,8 +260,15 @@ M:	Shriram Rajagopalan <rshriram@cs.ubc.ca>
>  M:	Yang Hongyang <yanghy@cn.fujitsu.com>
>  S:	Maintained
>  F:	docs/README.remus
> +F:	tools/libxc/xc_domain_save.c
> +F:	tools/libxc/xc_domain_restore.c
>  F:	tools/blktap2/drivers/block-remus.c
>  F:	tools/blktap2/drivers/hashtable*
> +F:	tools/libxl/libxl_remus_*
> +F:	tools/libxl/libxl_netbuffer.c
> +F:	tools/libxl/libxl_nonetbuffer.c
> +F:	tools/hotplug/Linux/remus-netbuf-setup
> +F:	tools/hotplug/Linux/block-drbd-probe
>  
>  SCHEDULING
>  M:	George Dunlap <george.dunlap@eu.citrix.com>
> -- 
> 1.9.1
> 
> 

* Re: [PATCH for-4.5 v20 00/12] Remus/Libxl: Remus network buffering and drbd disk
  2014-09-25  6:16 [PATCH for-4.5 v20 00/12] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (11 preceding siblings ...)
  2014-09-25  6:16 ` [PATCH for-4.5 v20 12/12] MAINTAINERS: update maintained files of Remus Yang Hongyang
@ 2014-09-25 19:28 ` Konrad Rzeszutek Wilk
  2014-09-26  5:40   ` Hongyang Yang
  12 siblings, 1 reply; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-25 19:28 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	xen-devel, rshriram, laijs

On Thu, Sep 25, 2014 at 02:16:12PM +0800, Yang Hongyang wrote:
> This patch series adds support for network buffering and drbd disk
> in the Remus codebase in libxl.
> 
> the code is also hosted on github:
> url: https://github.com/macrosheep/xen/tree/remus-v20

I only had one question in regards to patch:
 [PATCH for-4.5 v20 07/12] xl/remus: change bool to defbool

Otherwise all the other patches that did not have a Review
from me look good (and as such I have replied with 'Reviewed-by'
on them).

All of those that had Ian's Ack on them, looked OK to me.
I didn't respond 'Acked-by' on them as I figured I would do 
it here.

Regardless of the #7 question I believe the patches can
go in 4.5 and can have 'Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>'
on them.

Thank you!
> 
> Changes in v20:
>   Rebased.
> 
> Changes in v19:
>   Use defbool for cmdline switch.
>   Restruct of subkind init and cleanup operation.
>   Use libxl__device_kind instead of libxl__remus_device_kind
>   Fix a layer violation issue pointed out by IanJ.
>   Other minor fixes.
>   Rebased to the latest staging tree.
> 
> Changes in v18:
>   Merge match() and setup() api.
>   Reuse libxl__multidev and libxl__ao_device.
>   Commit messages and code comments improved. Thanks to Shriram.
>   Rebased.
> 
> Changes in v17:
>   Make remus device abstract layer more generic.
>   Addressed Ian J's comments.
> 
> Changes in v16:
>   Merge libxl__remus_state and libxl__remus_device_state.
>   Pass the ops to device abstract layer instead of defining them in the layer.
>   Optimized subkind ops APIs.
>   Addressed Ian J's comments.
>   Rebased.
> 
> Changes in v15:
>   The first patch in v14 has been taken, so remove it from the patchset.
>   Add a patch to Update maintained files of REMUS.
>   Rebased.
> 
> Changes in v14:
>   Addressed IanJ's comments.
>   Rebased.
> 
> Changes in v13:
>   Addressed Konrad's comments.
>   Rebased.
> 
> Changes in v12:
>   Add disk buffering cmdline switch.
> 
> Changes in v11:
>   Addressed comments from Ian J and Shriram.
>   Add drbd disk implement into this patch series.
> 
> Changes in V10:
>   Restructured the whole patch series.
>   Introduce the remus device abstract layer.
>   Make remus checkpoint asynchronous.
> 
> Changes in V9:
>   Use async exec script api to exec scripts.
> 
> Changes in V8:
>   Applied some comments(by IanJ).
>   Merge some struct definitions into their implementations.
>   (2/3/5 in V7 => 3 in V8)
> 
> Changes in V7:
>   Applied missing comments(by IanJ).
>   Applied Shriram comments.
> 
>   Merge tangled netbuffering setup/teardown code into one patch.
>   (2/6/8 in V6 => 5 in V7. 9/10 in V6 => 7 in V7)
> 
> Changes in V6:
>   Applied Ian Jackson's comments of V5 series.
>   the [PATCH 2/4 V5] is split by small functionalities.
> 
>   [PATCH 4/4 V5] --> [PATCH 13/13] netbuffer is default enabled.
> 
> Changes in V5:
> 
> Merge hotplug script patch (2/5) and hotplug script setup/teardown
> patch (3/5) into a single patch.
> 
> Changes in V4:
> 
> [1/5] Remove check for libnl command line utils in autoconf checks
> 
> [2/5] minor nits
> 
> [3/5] define LIBXL_HAVE_REMUS_NETBUF in libxl.h
> 
> [4/5] clean ups. Make the usleep in checkpoint callback asynchronous
> 
> [5/5] minor nits
> 
> Changes in V3:
> [1/5] Fix redundant checks in configure scripts
>       (based on Ian Campbell's suggestions)
> 
> [2/5] Introduce locking in the script, during IFB setup.
>       Add xenstore paths used by netbuf scripts
>       to xenstore-paths.markdown
> 
> [3/5] Hotplug scripts setup/teardown invocations are now asynchronous
>       following IanJ's feedback.  However, the invocations are still
>       sequential. 
> 
> [5/5] Allow per-domain specification of netbuffer scripts in xl remus
>       command.
> 
> And minor nits throughout the series based on feedback from
> the last version
> 
> Changes in V2:
> [1/5] Configure script will automatically enable/disable network
>       buffer support depending on the availability of the appropriate
>       libnl3 version. [If libnl3 is unavailable, a warning message will be
>       printed to let the user know that the feature has been disabled.]
> 
>       use macros from pkg.m4 instead of pkg-config commands
>       removed redundant checks for libnl3 libraries.
> 
> [3,4/5] - Minor nits.
> 
> Version 1:
> 
> [1/5] Changes to autoconf scripts to check for libnl3. Add linker flags
>       to libxl Makefile.
> 
> [2/5] External script to setup/teardown network buffering using libnl3's
>       CLI. This script will be invoked by libxl before starting Remus.
>       The script's main job is to bring up an IFB device with plug qdisc
>       attached to it.  It then re-routes egress traffic from the guest's
>       vif to the IFB device.
> 
> [3/5] Libxl code to invoke the external setup script, followed by netlink
>       related setup to obtain a handle on the output buffers attached
>       to each vif.
> 
> [4/5] Libxl interaction with network buffer module in the kernel via
>       libnl3 API.
> 
> [5/5] xl cmdline switch to explicitly enable network buffering when
>       starting remus.
> 
> 
>   Few things to note(by shriram): 
> 
>     a) Based on previous email discussions, the setup/teardown task has
>     been moved to a hotplug style shell script which can be customized as
>     desired, instead of implementing it as C code inside libxl.
> 
>     b) Libnl3 is not available on NetBSD. Nor is it available on CentOS
>    (Linux).  So I have made network buffering support an optional feature
>    so that it can be disabled if desired.
> 
>    c) NetBSD does not have libnl3. So I have put the setup script under
>    tools/hotplug/Linux folder.
> 
> thanks,
> Yang.
> 
> Legend:
>   A - acked
>   D - previous acked, but new change introduced so acked-by dropped
>   M - Modified
>   S - the same version as last round
>   No marker - new patch
> 
> Yang Hongyang (12):
>   A libxl: introduce libxl__multidev_prepare_with_aodev
>   A libxl: Extend libxl__ao_device with a libxl__ev_child member
>   A autoconf: add libnl3 dependency for Remus network buffering support
>   S libxl/remus: introduce an abstract Remus device layer
>   A libxl/remus: setup and control network output buffering
>   A libxl/remus: setup and control disk replication for DRBD backends
>   S xl/remus: change bool to defbool
>   S xl/remus: cmdline switch to explicitly enable unsafe configurations
>   A xl/remus: cmdline switches and config vars to control network
>       buffering
>   A xl/remus: add a cmdline switch to disable disk replication
>   A libxl/remus: add LIBXL_HAVE_REMUS to indicate Remus support in libxl
>   S MAINTAINERS: update maintained files of Remus
> 
>  MAINTAINERS                            |   7 +
>  README                                 |   4 +
>  config/Tools.mk.in                     |   4 +
>  docs/README.remus                      |  16 +
>  docs/man/xl.conf.pod.5                 |   6 +
>  docs/man/xl.pod.1                      |  30 +-
>  docs/misc/xenstore-paths.markdown      |   4 +
>  tools/configure.ac                     |  16 +
>  tools/hotplug/Linux/Makefile           |   2 +
>  tools/hotplug/Linux/block-drbd-probe   |  87 ++++++
>  tools/hotplug/Linux/remus-netbuf-setup | 230 +++++++++++++++
>  tools/libxl/Makefile                   |  15 +
>  tools/libxl/libxl.c                    |  75 ++++-
>  tools/libxl/libxl.h                    |   6 +
>  tools/libxl/libxl_device.c             |  14 +-
>  tools/libxl/libxl_dom.c                | 170 ++++++++++-
>  tools/libxl/libxl_internal.h           | 195 ++++++++++++-
>  tools/libxl/libxl_netbuffer.c          | 517 +++++++++++++++++++++++++++++++++
>  tools/libxl/libxl_nonetbuffer.c        |  54 ++++
>  tools/libxl/libxl_remus_device.c       | 296 +++++++++++++++++++
>  tools/libxl/libxl_remus_disk_drbd.c    | 257 ++++++++++++++++
>  tools/libxl/libxl_types.idl            |  10 +-
>  tools/libxl/libxl_types_internal.idl   |   2 +
>  tools/libxl/xl.c                       |   4 +
>  tools/libxl/xl.h                       |   1 +
>  tools/libxl/xl_cmdimpl.c               |  42 ++-
>  tools/libxl/xl_cmdtable.c              |  11 +-
>  27 files changed, 2030 insertions(+), 45 deletions(-)
>  create mode 100755 tools/hotplug/Linux/block-drbd-probe
>  create mode 100644 tools/hotplug/Linux/remus-netbuf-setup
>  create mode 100644 tools/libxl/libxl_netbuffer.c
>  create mode 100644 tools/libxl/libxl_nonetbuffer.c
>  create mode 100644 tools/libxl/libxl_remus_device.c
>  create mode 100644 tools/libxl/libxl_remus_disk_drbd.c
> 
> -- 
> 1.9.1
> 
> 

* Re: [PATCH for-4.5 v20 07/12] xl/remus: change bool to defbool
  2014-09-25 19:21   ` Konrad Rzeszutek Wilk
@ 2014-09-25 20:03     ` Shriram Rajagopalan
  2014-09-25 23:38       ` Ian Jackson
  0 siblings, 1 reply; 21+ messages in thread
From: Shriram Rajagopalan @ 2014-09-25 20:03 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: ian.campbell, wency, ian.jackson, Jiang Yunhong, eddie.dong,
	xen-devel, Yang Hongyang, laijs



On Sep 25, 2014 3:21 PM, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com>
wrote:
>
> On Thu, Sep 25, 2014 at 02:16:19PM +0800, Yang Hongyang wrote:
> > Use defbool instead of bool for boolean flags in remus_info struct.
>
> While that change by itself looks OK, the change in 'libxl_types.idl'
> breaks the ABI.
>
> Could you say a bit about why that is OK? As in, would there have
> been in the past any users of this ABI that now would have issues with
> this?
>

There were no users of the libxl Remus api in the past. Certainly not an
API level user like libvirt.

> Also, how important is this patch? Does it have to go in or
> can it be dropped from the patchset?
>

Well, this defbool change came up as part of the review feedback. Dropping
it may be of little consequence, but there are new additions to the libxl
Remus struct (to enable or disable net/disk buffering) that will break
compatibility anyway, and we can't drop those patches.

> >


* Re: [PATCH for-4.5 v20 07/12] xl/remus: change bool to defbool
  2014-09-25 20:03     ` Shriram Rajagopalan
@ 2014-09-25 23:38       ` Ian Jackson
  2014-09-26 14:02         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 21+ messages in thread
From: Ian Jackson @ 2014-09-25 23:38 UTC (permalink / raw)
  To: rshriram
  Cc: ian.campbell, wency, Jiang Yunhong, eddie.dong, xen-devel,
	Yang Hongyang, laijs

Shriram Rajagopalan writes ("Re: [Xen-devel] [PATCH for-4.5 v20 07/12] xl/remus: change bool to defbool"):
> On Sep 25, 2014 3:21 PM, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com>
> wrote:
> > On Thu, Sep 25, 2014 at 02:16:19PM +0800, Yang Hongyang wrote:
> > > Use defbool instead of bool for boolean flags in remus_info struct.
> >
> > > While that change by itself looks OK, the change in 'libxl_types.idl'
> > > breaks the ABI.

(We don't provide ABI stability in libxl (across Xen releases), only
API stability.  However, this change does break the API too:)

> > > Could you say a bit about why that is OK? As in, would there have
> > > been in the past any users of this ABI that now would have issues with this?
>
> There were no users of the libxl Remus api in the past. Certainly not an API
> level user like libvirt.

This is IMO a good answer.  As I said earlier, the libxl Remus API in
previous versions does not really work properly so it seems unlikely
that this change will break any uses.

> > Also, how important is this patch? Does it have to go in or
> > can it be dropped from the patchset?
> 
> Well, this defbool thing came up as part of the feedback.

Ian C and I think that this variable should be a defbool.  That it
wasn't, beforehand, was a mistake.  We have an opportunity to fix that
now, but we won't after 4.5 if we release working libxl Remus support
in 4.5 (as we intend.)

Thanks,
Ian.
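
To make the bool versus defbool distinction concrete, here is a small standalone model of a tri-state boolean with "unset" as a third state. It is a sketch only: the sketch_defbool type, its functions and the integer encoding are hypothetical stand-ins that mirror the libxl_defbool_* calls visible in the patch, not libxl's own definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encoding: 0 = unset (default), >0 = true, <0 = false. */
typedef struct { int val; } sketch_defbool;

/* Explicitly set the value, overriding any default. */
static void sketch_defbool_set(sketch_defbool *db, bool b)
{
    db->val = b ? 1 : -1;
}

/* True while the caller has not chosen a value. */
static bool sketch_defbool_is_default(sketch_defbool db)
{
    return db.val == 0;
}

/* Apply a default only if the caller left the field unset. */
static void sketch_defbool_setdefault(sketch_defbool *db, bool b)
{
    if (sketch_defbool_is_default(*db))
        sketch_defbool_set(db, b);
}

/* Read the value; reading an unset field is a programming error. */
static bool sketch_defbool_val(sketch_defbool db)
{
    assert(!sketch_defbool_is_default(db));
    return db.val > 0;
}
```

With a plain bool, a zero-initialised struct cannot distinguish "caller asked for false" from "caller did not say", which is why compression (default true) needed an explicit assignment in xl before this patch. With the tri-state, the library can apply the default itself via setdefault, as the hunk in libxl_domain_remus_start() does.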


* Re: [PATCH for-4.5 v20 00/12] Remus/Libxl: Remus network buffering and drbd disk
  2014-09-25 19:28 ` [PATCH for-4.5 v20 00/12] Remus/Libxl: Remus network buffering and drbd disk Konrad Rzeszutek Wilk
@ 2014-09-26  5:40   ` Hongyang Yang
  0 siblings, 0 replies; 21+ messages in thread
From: Hongyang Yang @ 2014-09-26  5:40 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong,
	xen-devel, rshriram, laijs



On 09/26/2014 03:28 AM, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 25, 2014 at 02:16:12PM +0800, Yang Hongyang wrote:
>> This patch series adds support for network buffering and drbd disk
>> in the Remus codebase in libxl.
>>
>> the code is also hosted on github:
>> url: https://github.com/macrosheep/xen/tree/remus-v20
>
> I only had one question in regards to patch:
>   [PATCH for-4.5 v20 07/12] xl/remus: change bool to defbool
>
> Otherwise all the other patches that did not have a Review
> from me look good (and as such I have replied with 'Reviewed-by'
> on them).
>
> All of those that had Ian's Ack on them, looked OK to me.
> I didn't respond 'Acked-by' on them as I figured I would do
> it here.
>
> Regardless of the #7 question I believe the patches can
> go in 4.5 and can have 'Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>'
> on them.

Thank you! I have added all your Acks together with Ian's Acks.

>
> Thank you!
>>
>> Changes in v20:
>>    Rebased.
>>
>> Changes in v19:
>>    Use defbool for cmdline switch.
>>    Restruct of subkind init and cleanup operation.
>>    Use libxl__device_kind instead of libxl__remus_device_kind
>>    Fix a layer violation issue pointed out by IanJ.
>>    Other minor fixes.
>>    Rebased to the latest staging tree.
>>
>> Changes in v18:
>>    Merge match() and setup() api.
>>    Reuse libxl__multidev and libxl__ao_device.
>>    Commit messages and code comments improved. Thanks to Shriram.
>>    Rebased.
>>
>> Changes in v17:
>>    Make remus device abstract layer more generic.
>>    Addressed Ian J's comments.
>>
>> Changes in v16:
>>    Merge libxl__remus_state and libxl__remus_device_state.
>>    Pass the ops to device abstract layer instead of defining them in the layer.
>>    Optimized subkind ops APIs.
>>    Addressed Ian J's comments.
>>    Rebased.
>>
>> Changes in v15:
>>    The first patch in v14 has been taken, so remove it from the patchset.
>>    Add a patch to Update maintained files of REMUS.
>>    Rebased.
>>
>> Changes in v14:
>>    Addressed IanJ's comments.
>>    Rebased.
>>
>> Changes in v13:
>>    Addressed Konrad's comments.
>>    Rebased.
>>
>> Changes in v12:
>>    Add disk buffering cmdline switch.
>>
>> Changes in v11:
>>    Addressed comments from Ian J and Shriram.
>>    Add drbd disk implement into this patch series.
>>
>> Changes in V10:
>>    Restructured the whole patch series.
>>    Introduce the remus device abstract layer.
>>    Make remus checkpoint asynchronous.
>>
>> Changes in V9:
>>    Use async exec script api to exec scripts.
>>
>> Changes in V8:
>>    Applied some comments(by IanJ).
>>    Merge some struct definitions into their implementations.
>>    (2/3/5 in V7 => 3 in V8)
>>
>> Changes in V7:
>>    Applied missing comments(by IanJ).
>>    Applied Shriram comments.
>>
>>    Merge tangled netbuffering setup/teardown code into one patch.
>>    (2/6/8 in V6 => 5 in V7. 9/10 in V6 => 7 in V7)
>>
>> Changes in V6:
>>    Applied Ian Jackson's comments of V5 series.
>>    the [PATCH 2/4 V5] is split by small functionalities.
>>
>>    [PATCH 4/4 V5] --> [PATCH 13/13] netbuffer is default enabled.
>>
>> Changes in V5:
>>
>> Merge hotplug script patch (2/5) and hotplug script setup/teardown
>> patch (3/5) into a single patch.
>>
>> Changes in V4:
>>
>> [1/5] Remove check for libnl command line utils in autoconf checks
>>
>> [2/5] minor nits
>>
>> [3/5] define LIBXL_HAVE_REMUS_NETBUF in libxl.h
>>
>> [4/5] clean ups. Make the usleep in checkpoint callback asynchronous
>>
>> [5/5] minor nits
>>
>> Changes in V3:
>> [1/5] Fix redundant checks in configure scripts
>>        (based on Ian Campbell's suggestions)
>>
>> [2/5] Introduce locking in the script, during IFB setup.
>>        Add xenstore paths used by netbuf scripts
>>        to xenstore-paths.markdown
>>
>> [3/5] Hotplug scripts setup/teardown invocations are now asynchronous
>>        following IanJ's feedback.  However, the invocations are still
>>        sequential.
>>
>> [5/5] Allow per-domain specification of netbuffer scripts in xl remus
>>        command.
>>
>> And minor nits throughout the series based on feedback from
>> the last version
>>
>> Changes in V2:
>> [1/5] Configure script will automatically enable/disable network
>>        buffer support depending on the availability of the appropriate
>>        libnl3 version. [If libnl3 is unavailable, a warning message will be
>>        printed to let the user know that the feature has been disabled.]
>>
>>        use macros from pkg.m4 instead of pkg-config commands
>>        removed redundant checks for libnl3 libraries.
>>
>> [3,4/5] - Minor nits.
>>
>> Version 1:
>>
>> [1/5] Changes to autoconf scripts to check for libnl3. Add linker flags
>>        to libxl Makefile.
>>
>> [2/5] External script to setup/teardown network buffering using libnl3's
>>        CLI. This script will be invoked by libxl before starting Remus.
>>        The script's main job is to bring up an IFB device with plug qdisc
>>        attached to it.  It then re-routes egress traffic from the guest's
>>        vif to the IFB device.
>>
>> [3/5] Libxl code to invoke the external setup script, followed by netlink
>>        related setup to obtain a handle on the output buffers attached
>>        to each vif.
>>
>> [4/5] Libxl interaction with network buffer module in the kernel via
>>        libnl3 API.
>>
>> [5/5] xl cmdline switch to explicitly enable network buffering when
>>        starting remus.
>>
>>
>>    Few things to note(by shriram):
>>
>>      a) Based on previous email discussions, the setup/teardown task has
>>      been moved to a hotplug style shell script which can be customized as
>>      desired, instead of implementing it as C code inside libxl.
>>
>>      b) Libnl3 is not available on NetBSD. Nor is it available on CentOS
>>     (Linux).  So I have made network buffering support an optional feature
>>     so that it can be disabled if desired.
>>
>>     c) NetBSD does not have libnl3. So I have put the setup script under
>>     tools/hotplug/Linux folder.
>>
>> thanks,
>> Yang.
>>
>> Legend:
>>    A - acked
>>    D - previously acked, but new change introduced so acked-by dropped
>>    M - Modified
>>    S - the same version as last round
>>    No marker - new patch
>>
>> Yang Hongyang (12):
>>    A libxl: introduce libxl__multidev_prepare_with_aodev
>>    A libxl: Extend libxl__ao_device with a libxl__ev_child member
>>    A autoconf: add libnl3 dependency for Remus network buffering support
>>    S libxl/remus: introduce an abstract Remus device layer
>>    A libxl/remus: setup and control network output buffering
>>    A libxl/remus: setup and control disk replication for DRBD backends
>>    S xl/remus: change bool to defbool
>>    S xl/remus: cmdline switch to explicitly enable unsafe configurations
>>    A xl/remus: cmdline switches and config vars to control network
>>        buffering
>>    A xl/remus: add a cmdline switch to disable disk replication
>>    A libxl/remus: add LIBXL_HAVE_REMUS to indicate Remus support in libxl
>>    S MAINTAINERS: update maintained files of Remus
>>
>>   MAINTAINERS                            |   7 +
>>   README                                 |   4 +
>>   config/Tools.mk.in                     |   4 +
>>   docs/README.remus                      |  16 +
>>   docs/man/xl.conf.pod.5                 |   6 +
>>   docs/man/xl.pod.1                      |  30 +-
>>   docs/misc/xenstore-paths.markdown      |   4 +
>>   tools/configure.ac                     |  16 +
>>   tools/hotplug/Linux/Makefile           |   2 +
>>   tools/hotplug/Linux/block-drbd-probe   |  87 ++++++
>>   tools/hotplug/Linux/remus-netbuf-setup | 230 +++++++++++++++
>>   tools/libxl/Makefile                   |  15 +
>>   tools/libxl/libxl.c                    |  75 ++++-
>>   tools/libxl/libxl.h                    |   6 +
>>   tools/libxl/libxl_device.c             |  14 +-
>>   tools/libxl/libxl_dom.c                | 170 ++++++++++-
>>   tools/libxl/libxl_internal.h           | 195 ++++++++++++-
>>   tools/libxl/libxl_netbuffer.c          | 517 +++++++++++++++++++++++++++++++++
>>   tools/libxl/libxl_nonetbuffer.c        |  54 ++++
>>   tools/libxl/libxl_remus_device.c       | 296 +++++++++++++++++++
>>   tools/libxl/libxl_remus_disk_drbd.c    | 257 ++++++++++++++++
>>   tools/libxl/libxl_types.idl            |  10 +-
>>   tools/libxl/libxl_types_internal.idl   |   2 +
>>   tools/libxl/xl.c                       |   4 +
>>   tools/libxl/xl.h                       |   1 +
>>   tools/libxl/xl_cmdimpl.c               |  42 ++-
>>   tools/libxl/xl_cmdtable.c              |  11 +-
>>   27 files changed, 2030 insertions(+), 45 deletions(-)
>>   create mode 100755 tools/hotplug/Linux/block-drbd-probe
>>   create mode 100644 tools/hotplug/Linux/remus-netbuf-setup
>>   create mode 100644 tools/libxl/libxl_netbuffer.c
>>   create mode 100644 tools/libxl/libxl_nonetbuffer.c
>>   create mode 100644 tools/libxl/libxl_remus_device.c
>>   create mode 100644 tools/libxl/libxl_remus_disk_drbd.c
>>
>> --
>> 1.9.1
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
> .
>

-- 
Thanks,
Yang.



* Re: [PATCH for-4.5 v20 07/12] xl/remus: change bool to defbool
  2014-09-25 23:38       ` Ian Jackson
@ 2014-09-26 14:02         ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-26 14:02 UTC (permalink / raw)
  To: Ian Jackson
  Cc: ian.campbell, wency, Jiang Yunhong, eddie.dong, xen-devel,
	rshriram, Yang Hongyang, laijs

On Fri, Sep 26, 2014 at 12:38:49AM +0100, Ian Jackson wrote:
> Shriram Rajagopalan writes ("Re: [Xen-devel] [PATCH for-4.5 v20 07/12] xl/remus: change bool to defbool"):
> > On Sep 25, 2014 3:21 PM, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com>
> > wrote:
> > > On Thu, Sep 25, 2014 at 02:16:19PM +0800, Yang Hongyang wrote:
> > > > Use defbool instead of bool for boolean flags in remus_info struct.
> > >
> > > While that change by itself looks OK, the change in 'libxl_types.idl'
> > > breaks the ABI.
> 
> (We don't provide ABI stability in libxl (across Xen releases), only
> API stability.  However, this change does break the API too:)
> 
> > > Could you say a bit about why that is OK? As in, would there have
> > > been in the past any users of this ABI that now would have issues with this?
> >
> > There were no users of the libxl Remus api in the past. Certainly not an API
> > level user like libvirt.
> 
> This is IMO a good answer.  As I said earlier, the libxl Remus API in
> previous versions does not really work properly so it seems unlikely
> that this change will break any uses.
> 
> > > Also, how important is this patch? Does it have to go in or
> > > can it be dropped from the patchset?
> > 
> > Well, this defbool thing came up as part of the feedback.
> 
> Ian C and I think that this variable should be a defbool.  That it
> wasn't, beforehand, was a mistake.  We have an opportunity to fix that
> now, but we won't after 4.5 if we release working libxl Remus support
> in 4.5 (as we intend.)

OK, defbool it is. Thank you for clearing that up.
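
For readers following the API discussion above, here is a minimal,
self-contained sketch of the idea behind a defbool: a boolean with a third
"not set, use the default" state. It is an illustration only, not the libxl
implementation. The sketch_defbool type, the sketch_* helpers and
resolve_flag() are hypothetical names invented for this example, and the
encoding (0 = default, positive = true, negative = false) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state boolean: 0 = default (unset), >0 = true, <0 = false. */
typedef struct {
    int val;
} sketch_defbool;

/* Explicitly set the flag to true or false. */
static void sketch_defbool_set(sketch_defbool *db, bool b)
{
    db->val = b ? 1 : -1;
}

/* True if the caller never chose a value. */
static bool sketch_defbool_is_default(sketch_defbool db)
{
    return db.val == 0;
}

/* Apply a default only if the caller left the flag unset. */
static void sketch_defbool_setdefault(sketch_defbool *db, bool b)
{
    if (sketch_defbool_is_default(*db))
        sketch_defbool_set(db, b);
}

/* Read the value; only valid once the flag is no longer in the default state. */
static bool sketch_defbool_val(sketch_defbool db)
{
    assert(!sketch_defbool_is_default(db));
    return db.val > 0;
}

/* Hypothetical library-side helper: fill in the default, then read the flag. */
static bool resolve_flag(sketch_defbool user_choice, bool library_default)
{
    sketch_defbool_setdefault(&user_choice, library_default);
    return sketch_defbool_val(user_choice);
}
```

With a plain bool the library cannot tell "the caller asked for false" from
"the caller did not say", which is what the third state provides. Changing a
struct field from bool to such a type is also a source-level (API) change:
a caller that assigned `true` or `false` directly to the field would have to
switch to a setter like the one above.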


Thread overview: 21+ messages
2014-09-25  6:16 [PATCH for-4.5 v20 00/12] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
2014-09-25  6:16 ` [PATCH for-4.5 v20 01/12] libxl: introduce libxl__multidev_prepare_with_aodev Yang Hongyang
2014-09-25  6:16 ` [PATCH for-4.5 v20 02/12] libxl: Extend libxl__ao_device with a libxl__ev_child member Yang Hongyang
2014-09-25  6:16 ` [PATCH for-4.5 v20 03/12] autoconf: add libnl3 dependency for Remus network buffering support Yang Hongyang
2014-09-25  6:16 ` [PATCH for-4.5 v20 04/12] libxl/remus: introduce an abstract Remus device layer Yang Hongyang
2014-09-25  6:16 ` [PATCH for-4.5 v20 05/12] libxl/remus: setup and control network output buffering Yang Hongyang
2014-09-25  6:16 ` [PATCH for-4.5 v20 06/12] libxl/remus: setup and control disk replication for DRBD backends Yang Hongyang
2014-09-25  6:16 ` [PATCH for-4.5 v20 07/12] xl/remus: change bool to defbool Yang Hongyang
2014-09-25 19:21   ` Konrad Rzeszutek Wilk
2014-09-25 20:03     ` Shriram Rajagopalan
2014-09-25 23:38       ` Ian Jackson
2014-09-26 14:02         ` Konrad Rzeszutek Wilk
2014-09-25  6:16 ` [PATCH for-4.5 v20 08/12] xl/remus: cmdline switch to explicitly enable unsafe configurations Yang Hongyang
2014-09-25 19:23   ` Konrad Rzeszutek Wilk
2014-09-25  6:16 ` [PATCH for-4.5 v20 09/12] xl/remus: cmdline switches and config vars to control network buffering Yang Hongyang
2014-09-25  6:16 ` [PATCH for-4.5 v20 10/12] xl/remus: add a cmdline switch to disable disk replication Yang Hongyang
2014-09-25  6:16 ` [PATCH for-4.5 v20 11/12] libxl/remus: add LIBXL_HAVE_REMUS to indicate Remus support in libxl Yang Hongyang
2014-09-25  6:16 ` [PATCH for-4.5 v20 12/12] MAINTAINERS: update maintained files of Remus Yang Hongyang
2014-09-25 19:24   ` Konrad Rzeszutek Wilk
2014-09-25 19:28 ` [PATCH for-4.5 v20 00/12] Remus/Libxl: Remus network buffering and drbd disk Konrad Rzeszutek Wilk
2014-09-26  5:40   ` Hongyang Yang
