All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v12 0/7] Remus/Libxl: Remus network buffering and drbd disk
@ 2014-06-20  7:04 Yang Hongyang
  2014-06-20  7:04 ` [PATCH v12 1/7] remus: make postcopy callback asynchronous Yang Hongyang
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Yang Hongyang @ 2014-06-20  7:04 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

This patch series adds support for network buffering and drbd disk
in the Remus codebase in libxl.

the code is also hosted on github:

url: https://github.com/laijs/xen
branch: remus-v12

Changes in v12:
  Add disk buffering cmdline switch

Changes in v11:
  Addressed comments from Ian J and Shriram
  Add drbd disk implement into this patch series

Changes in V10:
  Restructured the whole patch series.
  Introduce the remus device abstract layer.
  Make remus checkpoint asynchronous.

Changes in V9:
  Use async exec script api to exec scripts.

Changes in V8:
  Applied some comments(by IanJ).
  Merge some struct definitions to it's implementation.
  (2/3/5 in V7 => 3 in V8)

Changes in V7:
  Applied missing comments(by IanJ).
  Applied Shriram comments.

  merge netbufering tangled setup/teardown code into one patch.
  (2/6/8 in V6 => 5 in V7. 9/10 in V6 => 7 in V7)

Changes in V6:
  Applied Ian Jackson's comments of V5 series.
  the [PATCH 2/4 V5] is split by small functionalities.

  [PATCH 4/4 V5] --> [PATCH 13/13] netbuffer is default enabled.

Changes in V5:

Merge hotplug script patch (2/5) and hotplug script setup/teardown
patch (3/5) into a single patch.

Changes in V4:

[1/5] Remove check for libnl command line utils in autoconf checks

[2/5] minor nits

[3/5] define LIBXL_HAVE_REMUS_NETBUF in libxl.h

[4/5] clean ups. Make the usleep in checkpoint callback asynchronous

[5/5] minor nits

Changes in V3:
[1/5] Fix redundant checks in configure scripts
      (based on Ian Campbell's suggestions)

[2/5] Introduce locking in the script, during IFB setup.
      Add xenstore paths used by netbuf scripts
      to xenstore-paths.markdown

[3/5] Hotplug scripts setup/teardown invocations are now asynchronous
      following IanJ's feedback.  However, the invocations are still
      sequential. 

[5/5] Allow per-domain specification of netbuffer scripts in xl remus
      commmand.

And minor nits throughout the series based on feedback from
the last version

Changes in V2:
[1/5] Configure script will automatically enable/disable network
      buffer support depending on the availability of the appropriate
      libnl3 version. [If libnl3 is unavailable, a warning message will be
      printed to let the user know that the feature has been disabled.]

      use macros from pkg.m4 instead of pkg-config commands
      removed redundant checks for libnl3 libraries.

[3,4/5] - Minor nits.

Version 1:

[1/5] Changes to autoconf scripts to check for libnl3. Add linker flags
      to libxl Makefile.

[2/5] External script to setup/teardown network buffering using libnl3's
      CLI. This script will be invoked by libxl before starting Remus.
      The script's main job is to bring up an IFB device with plug qdisc
      attached to it.  It then re-routes egress traffic from the guest's
      vif to the IFB device.

[3/5] Libxl code to invoke the external setup script, followed by netlink
      related setup to obtain a handle on the output buffers attached
      to each vif.

[4/5] Libxl interaction with network buffer module in the kernel via
      libnl3 API.

[5/5] xl cmdline switch to explicitly enable network buffering when
      starting remus.


  Few things to note(by shriram): 

    a) Based on previous email discussions, the setup/teardown task has
    been moved to a hotplug style shell script which can be customized as
    desired, instead of implementing it as C code inside libxl.

    b) Libnl3 is not available on NetBSD. Nor is it available on CentOS
   (Linux).  So I have made network buffering support an optional feature
   so that it can be disabled if desired.

   c) NetBSD does not have libnl3. So I have put the setup script under
   tools/hotplug/Linux folder.

thanks

Legend:
  A - acked
  D - previous acked, but new change introduced so acked-by dropped
  S - the same version as last round
  No marker - new patch

Yang Hongyang (7):
  A remus: make postcopy callback asynchronous
  S remus: add libnl3 dependency for network buffering support
    remus: introduce remus device
    remus netbuffer: implement remus network buffering for nic devices
  A remus drbd: Implement remus drbd replicated disk
    libxl: network buffering cmdline switch
    libxl: disk buffering cmdline switch

 README                                 |   4 +
 config/Tools.mk.in                     |   4 +
 docs/man/xl.conf.pod.5                 |   6 +
 docs/man/xl.pod.1                      |  15 +-
 docs/misc/xenstore-paths.markdown      |   4 +
 tools/configure.ac                     |  16 +
 tools/hotplug/Linux/Makefile           |   2 +
 tools/hotplug/Linux/block-drbd-probe   |  84 +++++
 tools/hotplug/Linux/remus-netbuf-setup | 203 +++++++++++++
 tools/libxl/Makefile                   |  15 +
 tools/libxl/libxl.c                    |  55 +++-
 tools/libxl/libxl.h                    |  13 +
 tools/libxl/libxl_dom.c                | 132 +++++++-
 tools/libxl/libxl_internal.h           | 188 ++++++++++++
 tools/libxl/libxl_netbuffer.c          | 540 +++++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c        |  98 ++++++
 tools/libxl/libxl_remus_device.c       | 370 ++++++++++++++++++++++
 tools/libxl/libxl_remus_disk_drbd.c    | 249 +++++++++++++++
 tools/libxl/libxl_save_msgs_gen.pl     |   2 +-
 tools/libxl/libxl_types.idl            |   4 +
 tools/libxl/xl.c                       |   4 +
 tools/libxl/xl.h                       |   1 +
 tools/libxl/xl_cmdimpl.c               |  32 +-
 tools/libxl/xl_cmdtable.c              |   4 +
 tools/remus/README                     |   6 +
 25 files changed, 2029 insertions(+), 22 deletions(-)
 create mode 100755 tools/hotplug/Linux/block-drbd-probe
 create mode 100644 tools/hotplug/Linux/remus-netbuf-setup
 create mode 100644 tools/libxl/libxl_netbuffer.c
 create mode 100644 tools/libxl/libxl_nonetbuffer.c
 create mode 100644 tools/libxl/libxl_remus_device.c
 create mode 100644 tools/libxl/libxl_remus_disk_drbd.c

-- 
1.9.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v12 1/7] remus: make postcopy callback asynchronous
  2014-06-20  7:04 [PATCH v12 0/7] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
@ 2014-06-20  7:04 ` Yang Hongyang
  2014-06-20  7:04 ` [PATCH v12 2/7] remus: add libnl3 dependency for network buffering support Yang Hongyang
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Yang Hongyang @ 2014-06-20  7:04 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

Make postcopy callback asynchronous.
This is a prepare patch of the following patch series.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxl/libxl_dom.c            | 10 +++++++---
 tools/libxl/libxl_save_msgs_gen.pl |  2 +-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 661999c..c11993d 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1451,18 +1451,22 @@ static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
 }
 
-static int libxl__remus_domain_resume_callback(void *data)
+static void libxl__remus_domain_resume_callback(void *data)
 {
+    int ok = 0;
     libxl__save_helper_state *shs = data;
+    libxl__egc *egc = shs->egc;
     libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
     STATE_AO_GC(dss->ao);
 
     /* Resumes the domain and the device model */
     if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
-        return 0;
+        goto out;
 
     /* REMUS TODO: Deal with disk. Start a new network output buffer */
-    return 1;
+    ok = 1;
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, ok);
 }
 
 /*----- remus asynchronous checkpoint callback -----*/
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index 745e2ac..36bae04 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -24,7 +24,7 @@ our @msgs = (
                                                 'unsigned long', 'done',
                                                 'unsigned long', 'total'] ],
     [  3, 'scxA',   "suspend", [] ],
-    [  4, 'scxW',   "postcopy", [] ],
+    [  4, 'scxA',   "postcopy", [] ],
     [  5, 'scxA',   "checkpoint", [] ],
     [  6, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
                                               unsigned enable)] ],
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v12 2/7] remus: add libnl3 dependency for network buffering support
  2014-06-20  7:04 [PATCH v12 0/7] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
  2014-06-20  7:04 ` [PATCH v12 1/7] remus: make postcopy callback asynchronous Yang Hongyang
@ 2014-06-20  7:04 ` Yang Hongyang
  2014-06-20 14:43   ` Konrad Rzeszutek Wilk
  2014-06-20  7:04 ` [PATCH v12 3/7] remus: introduce remus device Yang Hongyang
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Yang Hongyang @ 2014-06-20  7:04 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

Libnl3 is required for controlling Remus network buffering.
This patch adds dependency on libnl3 (>= 3.2.8) to autoconf scripts.
Also provide ability to configure tools without libnl3 support, that
is without network buffering support.

when there's no network buffering support,libxl__netbuffer_enabled()
returns 0, otherwise returns 1. The callers of this api will be
introduced in the rest of the series.

NOTE: This patch changes tools/configure.ac, please rerun
      autogen.sh while apply the patch.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 README                          |  4 ++++
 config/Tools.mk.in              |  4 ++++
 tools/configure.ac              | 16 ++++++++++++++++
 tools/libxl/Makefile            | 13 +++++++++++++
 tools/libxl/libxl_internal.h    |  1 +
 tools/libxl/libxl_netbuffer.c   | 31 +++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c | 31 +++++++++++++++++++++++++++++++
 tools/remus/README              |  6 ++++++
 8 files changed, 106 insertions(+)
 create mode 100644 tools/libxl/libxl_netbuffer.c
 create mode 100644 tools/libxl/libxl_nonetbuffer.c

diff --git a/README b/README
index 9bbe734..e770932 100644
--- a/README
+++ b/README
@@ -72,6 +72,10 @@ disabled at compile time:
     * cmake (if building vtpm stub domains)
     * markdown
     * figlet (for generating the traditional Xen start of day banner)
+    * Development install of libnl3 (e.g., libnl-3-200,
+      libnl-3-dev, etc).  Required if network buffering is desired
+      when using Remus with libxl.  See tools/remus/README for detailed
+      information.
 
 Second, you need to acquire a suitable kernel for use in domain 0. If
 possible you should use a kernel provided by your OS distributor. If
diff --git a/config/Tools.mk.in b/config/Tools.mk.in
index 84b2612..06c9d25 100644
--- a/config/Tools.mk.in
+++ b/config/Tools.mk.in
@@ -38,6 +38,9 @@ PTHREAD_LIBS        := @PTHREAD_LIBS@
 
 PTYFUNCS_LIBS       := @PTYFUNCS_LIBS@
 
+LIBNL3_LIBS         := @LIBNL3_LIBS@
+LIBNL3_CFLAGS       := @LIBNL3_CFLAGS@
+
 # Download GIT repositories via HTTP or GIT's own protocol?
 # GIT's protocol is faster and more robust, when it works at all (firewalls
 # may block it). We make it the default, but if your GIT repository downloads
@@ -56,6 +59,7 @@ CONFIG_QEMU_XEN     := @qemu_xen@
 CONFIG_BLKTAP1      := @blktap1@
 CONFIG_VTPM         := @vtpm@
 CONFIG_QEMUU_EXTRA_ARGS:= @EXTRA_QEMUU_CONFIGURE_ARGS@
+CONFIG_REMUS_NETBUF := @remus_netbuf@
 
 #System options
 ZLIB                := @zlib@
diff --git a/tools/configure.ac b/tools/configure.ac
index 9db798b..6deed8f 100644
--- a/tools/configure.ac
+++ b/tools/configure.ac
@@ -267,5 +267,21 @@ esac
 # Checks for header files.
 AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h valgrind/memcheck.h utmp.h])
 
+# Check for libnl3 >=3.2.8. If present enable remus network buffering.
+PKG_CHECK_MODULES(LIBNL3, [libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8],
+    [libnl3_lib="y"], [libnl3_lib="n"])
+
+AS_IF([test "x$libnl3_lib" = "xn" ], [
+    AC_MSG_WARN([Disabling support for Remus network buffering.
+    Please install libnl3 libraries, command line tools and devel
+    headers - version 3.2.8 or higher])
+    AC_SUBST(remus_netbuf, [n])
+    ],[
+    AC_SUBST(remus_netbuf, [y])
+])
+
+AC_SUBST(LIBNL3_LIBS)
+AC_SUBST(LIBNL3_CFLAGS)
+
 AC_OUTPUT()
 
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 7fc42c8..fdffff3 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -21,11 +21,17 @@ endif
 
 LIBXL_LIBS =
 LIBXL_LIBS = $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(LDLIBS_libxenstore) $(LDLIBS_libblktapctl) $(PTYFUNCS_LIBS) $(LIBUUID_LIBS)
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+LIBXL_LIBS += $(LIBNL3_LIBS)
+endif
 
 CFLAGS_LIBXL += $(CFLAGS_libxenctrl)
 CFLAGS_LIBXL += $(CFLAGS_libxenguest)
 CFLAGS_LIBXL += $(CFLAGS_libxenstore)
 CFLAGS_LIBXL += $(CFLAGS_libblktapctl) 
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+CFLAGS_LIBXL += $(LIBNL3_CFLAGS)
+endif
 CFLAGS_LIBXL += -Wshadow
 
 LIBXL_LIBS-$(CONFIG_ARM) += -lfdt
@@ -43,6 +49,13 @@ LIBXL_OBJS-y += libxl_blktap2.o
 else
 LIBXL_OBJS-y += libxl_noblktap2.o
 endif
+
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+LIBXL_OBJS-y += libxl_netbuffer.o
+else
+LIBXL_OBJS-y += libxl_nonetbuffer.o
+endif
+
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index a0d4f24..3fc90e2 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2470,6 +2470,7 @@ typedef struct libxl__save_helper_state {
                       * marshalling and xc callback functions */
 } libxl__save_helper_state;
 
+_hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
 /*----- Domain suspend (save) state structure -----*/
 
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
new file mode 100644
index 0000000..8e23d75
--- /dev/null
+++ b/tools/libxl/libxl_netbuffer.c
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2013
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+int libxl__netbuffer_enabled(libxl__gc *gc)
+{
+    return 1;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
new file mode 100644
index 0000000..6aa4bf1
--- /dev/null
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2013
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+int libxl__netbuffer_enabled(libxl__gc *gc)
+{
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/remus/README b/tools/remus/README
index 9e8140b..4736252 100644
--- a/tools/remus/README
+++ b/tools/remus/README
@@ -2,3 +2,9 @@ Remus provides fault tolerance for virtual machines by sending continuous
 checkpoints to a backup, which will activate if the target VM fails.
 
 See the website at http://nss.cs.ubc.ca/remus/ for details.
+
+Using Remus with libxl on Xen 4.4 and higher:
+ To enable network buffering, you need libnl 3.2.8
+ or higher along with the development headers and command line utilities.
+ If your distro does not have the appropriate libnl3 version, you can find
+ the latest source tarball of libnl3 at http://www.carisma.slowglass.com/~tgr/libnl/
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v12 3/7] remus: introduce remus device
  2014-06-20  7:04 [PATCH v12 0/7] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
  2014-06-20  7:04 ` [PATCH v12 1/7] remus: make postcopy callback asynchronous Yang Hongyang
  2014-06-20  7:04 ` [PATCH v12 2/7] remus: add libnl3 dependency for network buffering support Yang Hongyang
@ 2014-06-20  7:04 ` Yang Hongyang
  2014-06-27 17:38   ` Ian Jackson
  2014-06-20  7:04 ` [PATCH v12 4/7] remus netbuffer: implement remus network buffering for nic devices Yang Hongyang
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Yang Hongyang @ 2014-06-20  7:04 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

introduce remus device, an abstract layer of remus devices(nic, disk,
etc).It provides the following APIs for libxl:
  >libxl__remus_device_setup
    setup remus devices, like attach qdisc, enable disk buffering, etc
  >libxl__remus_device_teardown
    teardown devices
  >libxl__remus_device_postsuspend
  >libxl__remus_device_preresume
  >libxl__remus_device_commit
    above three are for checkpoint.
through remus device layer, the remus execution flow will be like
this:
  xl remus -> remus device setup
                |-> remus checkpoint(postsuspend, preresume, commit)
                      ...
                       |-> remus device teardown, failover or abort
the remus device layer provides an interface
  libxl__remus_device_ops
which a remus device must implement. the whole remus structure:
                            |remus|
                               |
                        |remus device|
                               |
                |nic| |drbd disks| |qemu disks| ...
a device(nic, drbd disks, qemu disks, etc) must implement
libxl__remus_device_ops to support remus.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 tools/libxl/Makefile             |   2 +
 tools/libxl/libxl.c              |  34 +++-
 tools/libxl/libxl_dom.c          | 132 +++++++++++++--
 tools/libxl/libxl_internal.h     | 182 +++++++++++++++++++++
 tools/libxl/libxl_remus_device.c | 340 +++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_types.idl      |   1 +
 6 files changed, 675 insertions(+), 16 deletions(-)
 create mode 100644 tools/libxl/libxl_remus_device.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index fdffff3..cb2efdf 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -56,6 +56,8 @@ else
 LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
+LIBXL_OBJS-y += libxl_remus_device.o
+
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
 
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 62e251a..f99477d 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -733,6 +733,31 @@ out:
 static void remus_failover_cb(libxl__egc *egc,
                               libxl__domain_suspend_state *dss, int rc);
 
+static void libxl__remus_setup_failed(libxl__egc *egc,
+                                      libxl__remus_state *rs, int rc)
+{
+    STATE_AO_GC(rs->ao);
+    libxl__ao_complete(egc, ao, rc);
+}
+
+static void libxl__remus_setup_done(libxl__egc *egc,
+                                    libxl__remus_state *rs, int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
+    STATE_AO_GC(rs->ao);
+
+    if (!rc) {
+        libxl__domain_suspend(egc, dss);
+        return;
+    }
+
+    LOG(ERROR, "Remus: failed to setup device for guest with domid %u",
+        dss->domid);
+    rs->saved_rc = rc;
+    rs->callback = libxl__remus_setup_failed;
+    libxl__remus_device_teardown(egc, rs);
+}
+
 /* TODO: Explicit Checkpoint acknowledgements via recv_fd. */
 int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
                              uint32_t domid, int send_fd, int recv_fd,
@@ -761,10 +786,15 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 
     assert(info);
 
-    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
+    /* Convenience aliases */
+    libxl__remus_state *const rs = &dss->rs;
+    rs->ao = ao;
+    rs->domid = domid;
+    rs->saved_rc = 0;
+    rs->callback = libxl__remus_setup_done;
 
     /* Point of no return */
-    libxl__domain_suspend(egc, dss);
+    libxl__remus_device_setup(egc, rs);
     return AO_INPROGRESS;
 
  out:
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index c11993d..dde8bf6 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1426,6 +1426,17 @@ static void libxl__domain_suspend_callback(void *data)
     domain_suspend_callback_common(egc, dss);
 }
 
+static void remus_device_postsuspend_cb(libxl__egc *egc,
+                                        libxl__remus_state *rs, int rc)
+{
+    int ok = 0;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
+
+    if (!rc)
+        ok = 1;
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
+
 static void domain_suspend_callback_common_done(libxl__egc *egc,
                                 libxl__domain_suspend_state *dss, int ok)
 {
@@ -1447,32 +1458,51 @@ static void libxl__remus_domain_suspend_callback(void *data)
 static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
                                 libxl__domain_suspend_state *dss, int ok)
 {
-    /* REMUS TODO: Issue disk and network checkpoint reqs. */
+    if (!ok)
+        goto out;
+
+    libxl__remus_state *const rs = &dss->rs;
+    rs->callback = remus_device_postsuspend_cb;
+    libxl__remus_device_postsuspend(egc, rs);
+    return;
+
+out:
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
 }
 
-static void libxl__remus_domain_resume_callback(void *data)
+static void remus_device_preresume_cb(libxl__egc *egc,
+                                      libxl__remus_state *rs, int rc)
 {
     int ok = 0;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
+    STATE_AO_GC(dss->ao);
+
+    if (!rc) {
+        /* Resumes the domain and the device model */
+        if (!libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
+            ok = 1;
+    }
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
+
+static void libxl__remus_domain_resume_callback(void *data)
+{
     libxl__save_helper_state *shs = data;
     libxl__egc *egc = shs->egc;
     libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
     STATE_AO_GC(dss->ao);
 
-    /* Resumes the domain and the device model */
-    if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
-        goto out;
-
-    /* REMUS TODO: Deal with disk. Start a new network output buffer */
-    ok = 1;
-out:
-    libxl__xc_domain_saverestore_async_callback_done(egc, shs, ok);
+    libxl__remus_state *const rs = &dss->rs;
+    rs->callback = remus_device_preresume_cb;
+    libxl__remus_device_preresume(egc, rs);
 }
 
 /*----- remus asynchronous checkpoint callback -----*/
 
 static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc);
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs);
 
 static void libxl__remus_domain_checkpoint_callback(void *data)
 {
@@ -1489,13 +1519,67 @@ static void libxl__remus_domain_checkpoint_callback(void *data)
     }
 }
 
+static void remus_device_commit_cb(libxl__egc *egc,
+                                   libxl__remus_state *rs, int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
+
+    STATE_AO_GC(dss->ao);
+
+    if (rc) {
+        LOG(ERROR, "Failed to do device commit op."
+            " Terminating Remus..");
+        goto out;
+    } else {
+        /* Set checkpoint interval timeout */
+        rc = libxl__ev_time_register_rel(gc, &rs->timeout,
+                                         remus_next_checkpoint,
+                                         dss->interval);
+        if (rc) {
+            LOG(ERROR, "unable to register timeout for next epoch."
+                " Terminating Remus..");
+            goto out;
+        }
+    }
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
 static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc)
 {
-    /* REMUS TODO: Wait for disk and memory ack, release network buffer */
-    /* REMUS TODO: make this asynchronous */
-    assert(!rc); /* REMUS TODO handle this error properly */
-    usleep(dss->interval * 1000);
+    /* Convenience aliases */
+    libxl__remus_state *const rs = &dss->rs;
+
+    STATE_AO_GC(dss->ao);
+
+    if (rc) {
+        LOG(ERROR, "Failed to save device model. Terminating Remus..");
+        goto out;
+    }
+
+    rs->callback = remus_device_commit_cb;
+    libxl__remus_device_commit(egc, rs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs)
+{
+    libxl__remus_state *rs = CONTAINER_OF(ev, *rs, timeout);
+
+    /* Convenience aliases */
+    libxl__domain_suspend_state *const dss = CONTAINER_OF(rs, *dss, rs);
+
+    STATE_AO_GC(dss->ao);
+
+    libxl__ev_time_deregister(gc, &rs->timeout);
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);
 }
 
@@ -1720,6 +1804,13 @@ static void save_device_model_datacopier_done(libxl__egc *egc,
     dss->save_dm_callback(egc, dss, our_rc);
 }
 
+static void libxl__remus_teardown_done(libxl__egc *egc,
+                                       libxl__remus_state *rs, int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
+    dss->callback(egc, dss, rc);
+}
+
 static void domain_suspend_done(libxl__egc *egc,
                         libxl__domain_suspend_state *dss, int rc)
 {
@@ -1734,6 +1825,19 @@ static void domain_suspend_done(libxl__egc *egc,
         xc_suspend_evtchn_release(CTX->xch, CTX->xce, domid,
                            dss->guest_evtchn.port, &dss->guest_evtchn_lockfd);
 
+    if (dss->remus) {
+        /*
+         * With Remus, if we reach this point, it means either
+         * backup died or some network error occurred preventing us
+         * from sending checkpoints. Teardown the network buffers and
+         * release netlink resources.  This is an async op.
+         */
+        dss->rs.saved_rc = rc;
+        dss->rs.callback = libxl__remus_teardown_done;
+        libxl__remus_device_teardown(egc, &dss->rs);
+        return;
+    }
+
     dss->callback(egc, dss, rc);
 }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 3fc90e2..5521a42 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2470,6 +2470,187 @@ typedef struct libxl__save_helper_state {
                       * marshalling and xc callback functions */
 } libxl__save_helper_state;
 
+/*----- remus device related state structure -----*/
+/* remus device is an abstract layer of remus devices(nic, disk,
+ * etc).It provides the following APIs for libxl:
+ *   >libxl__remus_device_setup
+ *     setup remus devices, like attach qdisc, enable disk buffering, etc
+ *   >libxl__remus_device_teardown
+ *     teardown devices
+ *   >libxl__remus_device_postsuspend
+ *   >libxl__remus_device_preresume
+ *   >libxl__remus_device_commit
+ *     above three are for checkpoint.
+ * through remus device layer, the remus execution flow will be like
+ * this:
+ * xl remus -> remus device setup
+ *               |-> remus checkpoint(postsuspend, preresume, commit)
+ *                     ...
+ *                      |-> remus device teardown, failover or abort
+ * the remus device layer provides an interface
+ *   libxl__remus_device_ops
+ * which a remus device must implement. the whole remus structure:
+ *                           |remus|
+ *                              |
+ *                       |remus device|
+ *                              |
+ *               |nic| |drbd disks| |qemu disks| ...
+ * a device(nic, drbd disks, qemu disks, etc) must implement
+ * libxl__remus_device_ops to support remus.
+ */
+
+typedef enum libxl__remus_device_kind {
+    LIBXL__REMUS_DEVICE_NIC,
+    LIBXL__REMUS_DEVICE_DISK,
+} libxl__remus_device_kind;
+
+typedef struct libxl__remus_state libxl__remus_state;
+typedef struct libxl__remus_device libxl__remus_device;
+typedef struct libxl__remus_device_state libxl__remus_device_state;
+typedef struct libxl__remus_device_ops libxl__remus_device_ops;
+
+struct libxl__remus_device_ops {
+    /*
+     * init() and destroy() APIs are produced by a device type and
+     * consumed by the main remus code, a device type must implement
+     * these two APIs.
+     */
+    /* init device ops private data, etc. must implement */
+    int (*init)(libxl__remus_device_ops *self,
+                libxl__remus_state *rs);
+    /* free device ops private data, etc. must implement */
+    void (*destroy)(libxl__remus_device_ops *self);
+    /*
+     * This is device ops's private data, for different device types,
+     * the data structs are different
+     */
+    void *data;
+
+    /*
+     * checkpoint callbacks, these are async ops, call dev->callback
+     * when done. These function pointers may be NULL, means the op is
+     * not implemented, and it will do nothing when checkpoint.
+     * The callers of these APIs must check the function pointer first.
+     * These callbacks can be implemented synchronously, call
+     * dev->callback at last directly.
+     */
+    void (*postsuspend)(libxl__remus_device *dev);
+    void (*preresume)(libxl__remus_device *dev);
+    void (*commit)(libxl__remus_device *dev);
+
+    /*
+     * This API determines whether the ops matchs the specific device. In the
+     * implementation, we first init all device ops, for example, NIC ops,
+     * DRBD ops ... Then we will find out the libxl devices, and match the
+     * device with the ops, if the device is a drbd disk, then it will be
+     * matched with DRBD ops, and the further ops(such as checkpoint ops etc.)
+     * of this device will using DRBD ops. This API is mainly for disks,
+     * because we must use an external script to determine whether a
+     * libxl_disk is a DRBD disk. a device type must implement this API.
+     * It's an async op and must be implemented asynchronously,
+     * call dev->callback when done.
+     */
+    void (*match)(libxl__remus_device_ops *self,
+                  libxl__remus_device *dev);
+
+    /*
+     * setup() and teardown() are refer to the actual remus device,
+     * a device type must implement these two APIs. They are async
+     * ops, and call dev->callback when done.
+     * These callbacks can be implemented synchronously, call
+     * dev->callback at last directly.
+     */
+    /* setup the remus device */
+    void (*setup)(libxl__remus_device *dev);
+
+    /* teardown the remus device */
+    void (*teardown)(libxl__remus_device *dev);
+};
+
+/*
+ * This structure is for remus device layer, it records remus devices
+ * that have been setuped.
+ */
+struct libxl__remus_device_state {
+    libxl__ao *ao;
+    libxl__egc *egc;
+
+    /* devices that have been setuped */
+    libxl__remus_device **dev;
+
+    int num_nics;
+    int num_disks;
+
+    /* for counting devices that have been handled */
+    int num_devices;
+    /* for counting devices that matched and setuped */
+    int num_setuped;
+};
+
+typedef void libxl__remus_device_callback(libxl__egc *,
+                                          libxl__remus_device *,
+                                          int rc);
+/*
+ * This structure is init and setup by remus device abstruct layer,
+ * and pass to remus device ops
+ */
+struct libxl__remus_device {
+    /* set by remus device abstruct layer */
+    int devid;
+    /* libxl__device_* which this remus device related to */
+    const void *backend_dev;
+    libxl__remus_device_kind kind;
+    /*
+     * This is for matching, we must go through all device ops until we
+     * find a matched op for the device. The ops_index record which ops
+     * we are matching.
+     */
+    int ops_index;
+    libxl__remus_device_ops *ops;
+    libxl__remus_device_callback *callback;
+    libxl__remus_device_state *rds;
+
+    /* used by remus device implementation */
+    /* *kind* of device's private data */
+    void *data;
+    /* for calling scripts, eg. setup|teardown|match scripts */
+    libxl__async_exec_state aes;
+    /*
+     * for async func calls, in the implenmentation of device ops, we
+     * may use fork to do async ops. this is owned by device-specific
+     * ops methods
+     */
+    libxl__ev_child child;
+};
+
+typedef void libxl__remus_callback(libxl__egc *,
+                                   libxl__remus_state *, int rc);
+
+struct libxl__remus_state {
+    /* must set by caller of libxl__remus_device_(setup|teardown) */
+    libxl__ao *ao;
+    uint32_t domid;
+    libxl__remus_callback *callback;
+
+    /* private */
+    int saved_rc;
+    /* context containing device related stuff */
+    libxl__remus_device_state dev_state;
+
+    libxl__ev_time timeout; /* used for checkpoint */
+};
+
+/* the following 5 APIs are async ops, call rs->callback when done */
+_hidden void libxl__remus_device_setup(libxl__egc *egc,
+                                       libxl__remus_state *rs);
+_hidden void libxl__remus_device_teardown(libxl__egc *egc,
+                                          libxl__remus_state *rs);
+_hidden void libxl__remus_device_postsuspend(libxl__egc *egc,
+                                             libxl__remus_state *rs);
+_hidden void libxl__remus_device_preresume(libxl__egc *egc,
+                                           libxl__remus_state *rs);
+_hidden void libxl__remus_device_commit(libxl__egc *egc,
+                                        libxl__remus_state *rs);
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
 /*----- Domain suspend (save) state structure -----*/
@@ -2500,6 +2681,7 @@ struct libxl__domain_suspend_state {
     int live;
     int debug;
     const libxl_domain_remus_info *remus;
+    libxl__remus_state rs;
     /* private */
     libxl__ev_evtchn guest_evtchn;
     int guest_evtchn_lockfd;
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
new file mode 100644
index 0000000..07e298b
--- /dev/null
+++ b/tools/libxl/libxl_remus_device.c
@@ -0,0 +1,340 @@
+/*
+ * Copyright (C) 2014
+ * Author: Lai Jiangshan <laijs@cn.fujitsu.com>
+ *         Yang Hongyang <yanghy@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+static libxl__remus_device_ops *dev_ops[] = {
+};
+
+static void device_common_cb(libxl__egc *egc,
+                             libxl__remus_device *dev,
+                             int rc)
+{
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = dev->rds;
+    libxl__remus_state *const rs = CONTAINER_OF(rds, *rs, dev_state);
+
+    STATE_AO_GC(rs->ao);
+
+    rds->num_devices++;
+
+    if (rc)
+        rs->saved_rc = ERROR_FAIL;
+
+    if (rds->num_devices == rds->num_setuped)
+        rs->callback(egc, rs, rs->saved_rc);
+}
+
+void libxl__remus_device_postsuspend(libxl__egc *egc, libxl__remus_state *rs)
+{
+    int i;
+    libxl__remus_device *dev;
+    STATE_AO_GC(rs->ao);
+
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = &rs->dev_state;
+
+    rds->num_devices = 0;
+    rs->saved_rc = 0;
+
+    if(rds->num_setuped == 0)
+        goto out;
+
+    for (i = 0; i < rds->num_setuped; i++) {
+        dev = rds->dev[i];
+        dev->callback = device_common_cb;
+        if (dev->ops->postsuspend) {
+            dev->ops->postsuspend(dev);
+        } else {
+            rds->num_devices++;
+            if (rds->num_devices == rds->num_setuped)
+                rs->callback(egc, rs, rs->saved_rc);
+        }
+    }
+
+    return;
+
+out:
+    rs->callback(egc, rs, rs->saved_rc);
+}
+
+void libxl__remus_device_preresume(libxl__egc *egc, libxl__remus_state *rs)
+{
+    int i;
+    libxl__remus_device *dev;
+    STATE_AO_GC(rs->ao);
+
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = &rs->dev_state;
+
+    rds->num_devices = 0;
+    rs->saved_rc = 0;
+
+    if(rds->num_setuped == 0)
+        goto out;
+
+    for (i = 0; i < rds->num_setuped; i++) {
+        dev = rds->dev[i];
+        dev->callback = device_common_cb;
+        if (dev->ops->preresume) {
+            dev->ops->preresume(dev);
+        } else {
+            rds->num_devices++;
+            if (rds->num_devices == rds->num_setuped)
+                rs->callback(egc, rs, rs->saved_rc);
+        }
+    }
+
+    return;
+
+out:
+    rs->callback(egc, rs, rs->saved_rc);
+}
+
+void libxl__remus_device_commit(libxl__egc *egc, libxl__remus_state *rs)
+{
+    int i;
+    libxl__remus_device *dev;
+    STATE_AO_GC(rs->ao);
+
+    /*
+     * REMUS TODO: Wait for disk and explicit memory ack (through restore
+     * callback from remote) before releasing network buffer.
+     */
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = &rs->dev_state;
+
+    rds->num_devices = 0;
+    rs->saved_rc = 0;
+
+    if(rds->num_setuped == 0)
+        goto out;
+
+    for (i = 0; i < rds->num_setuped; i++) {
+        dev = rds->dev[i];
+        dev->callback = device_common_cb;
+        if (dev->ops->commit) {
+            dev->ops->commit(dev);
+        } else {
+            rds->num_devices++;
+            if (rds->num_devices == rds->num_setuped)
+                rs->callback(egc, rs, rs->saved_rc);
+        }
+    }
+
+    return;
+
+out:
+    rs->callback(egc, rs, rs->saved_rc);
+}
+
+static void device_setup_cb(libxl__egc *egc,
+                            libxl__remus_device *dev,
+                            int rc)
+{
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = dev->rds;
+    libxl__remus_state *const rs = CONTAINER_OF(rds, *rs, dev_state);
+
+    STATE_AO_GC(rs->ao);
+
+    rds->num_devices++;
+    /*
+     * we add devices that have been setuped to the array no matter
+     * the setup process succeed or failed because we need to ensure
+     * the device been teardown while setup failed. If any of the
+     * device setup failed, we will quit remus, but before we exit,
+     * we will teardown the devices that have been added to **dev
+     */
+    rds->dev[rds->num_setuped++] = dev;
+    if (rc) {
+        /* setup failed */
+        rs->saved_rc = ERROR_FAIL;
+    }
+
+    if (rds->num_devices == (rds->num_nics + rds->num_disks))
+        rs->callback(egc, rs, rs->saved_rc);
+}
+
+static void device_match_cb(libxl__egc *egc,
+                            libxl__remus_device *dev,
+                            int rc)
+{
+    libxl__remus_device_state *const rds = dev->rds;
+    libxl__remus_state *rs = CONTAINER_OF(rds, *rs, dev_state);
+
+    STATE_AO_GC(rs->ao);
+
+    if (rc) {
+        if (++dev->ops_index >= ARRAY_SIZE(dev_ops) ||
+            rc != ERROR_NOT_MATCH) {
+            /* the device can not be matched */
+            rds->num_devices++;
+            rs->saved_rc = ERROR_FAIL;
+            if (rds->num_devices == (rds->num_nics + rds->num_disks))
+                rs->callback(egc, rs, rs->saved_rc);
+            return;
+        }
+        /* the ops does not match, try next ops */
+        dev->ops = dev_ops[dev->ops_index];
+        dev->ops->match(dev->ops, dev);
+    } else {
+        /* the ops matched, setup the device */
+        dev->callback = device_setup_cb;
+        dev->ops->setup(dev);
+    }
+}
+
+static void device_teardown_cb(libxl__egc *egc,
+                               libxl__remus_device *dev,
+                               int rc)
+{
+    int i;
+    libxl__remus_device_ops *ops;
+    libxl__remus_device_state *const rds = dev->rds;
+    libxl__remus_state *rs = CONTAINER_OF(rds, *rs, dev_state);
+
+    STATE_AO_GC(rs->ao);
+
+    /* ignore teardown errors to teardown as many devs as possible*/
+    rds->num_setuped--;
+
+    if (rds->num_setuped == 0) {
+        /* clean device ops */
+        for (i = 0; i < ARRAY_SIZE(dev_ops); i++) {
+            ops = dev_ops[i];
+            ops->destroy(ops);
+        }
+        rs->callback(egc, rs, rs->saved_rc);
+    }
+}
+
+static __attribute__((unused)) void libxl__remus_device_init(libxl__egc *egc,
+                                     libxl__remus_device_state *rds,
+                                     libxl__remus_device_kind kind,
+                                     void *libxl_dev)
+{
+    libxl__remus_device *dev = NULL;
+    libxl_device_nic *nic = NULL;
+    libxl_device_disk *disk = NULL;
+
+    STATE_AO_GC(rds->ao);
+    GCNEW(dev);
+    dev->ops_index = 0; /* we will match the ops later */
+    dev->backend_dev = libxl_dev;
+    dev->kind = kind;
+    dev->rds = rds;
+
+    switch (kind) {
+        case LIBXL__REMUS_DEVICE_NIC:
+            nic = libxl_dev;
+            dev->devid = nic->devid;
+            break;
+        case LIBXL__REMUS_DEVICE_DISK:
+            disk = libxl_dev;
+            /* there are no dev id for disk devices */
+            dev->devid = -1;
+            break;
+        default:
+            return;
+    }
+
+    libxl__async_exec_init(&dev->aes);
+    libxl__ev_child_init(&dev->child);
+
+    /* match the ops begin */
+    dev->callback = device_match_cb;
+    dev->ops = dev_ops[dev->ops_index];
+    dev->ops->match(dev->ops, dev);
+}
+
+void libxl__remus_device_setup(libxl__egc *egc, libxl__remus_state *rs)
+{
+    int i;
+    libxl__remus_device_ops *ops;
+
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = &rs->dev_state;
+
+    STATE_AO_GC(rs->ao);
+
+    if (ARRAY_SIZE(dev_ops) == 0)
+        goto out;
+
+    for (i = 0; i < ARRAY_SIZE(dev_ops); i++) {
+        ops = dev_ops[i];
+        if (ops->init(ops, rs)) {
+            rs->saved_rc = ERROR_FAIL;
+            goto out;
+        }
+    }
+
+    rds->ao = rs->ao;
+    rds->egc = egc;
+    rds->num_devices = 0;
+    rds->num_nics = 0;
+    rds->num_disks = 0;
+
+    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
+
+    if (rds->num_nics == 0 && rds->num_disks == 0)
+        goto out;
+
+    GCNEW_ARRAY(rds->dev, rds->num_nics + rds->num_disks);
+
+    /* TBD: CALL libxl__remus_device_init to init remus devices */
+
+    return;
+
+out:
+    rs->callback(egc, rs, rs->saved_rc);
+    return;
+}
+
+void libxl__remus_device_teardown(libxl__egc *egc, libxl__remus_state *rs)
+{
+    int i;
+    libxl__remus_device *dev;
+    libxl__remus_device_ops *ops;
+
+    STATE_AO_GC(rs->ao);
+
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = &rs->dev_state;
+
+    if (rds->num_setuped == 0) {
+        /* clean device ops */
+        for (i = 0; i < ARRAY_SIZE(dev_ops); i++) {
+            ops = dev_ops[i];
+            ops->destroy(ops);
+        }
+        goto out;
+    }
+
+    for (i = 0; i < rds->num_setuped; i++) {
+        dev = rds->dev[i];
+        dev->callback = device_teardown_cb;
+        dev->ops->teardown(dev);
+    }
+
+    return;
+
+out:
+    rs->callback(egc, rs, rs->saved_rc);
+    return;
+}
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 1018142..cc5d390 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -43,6 +43,7 @@ libxl_error = Enumeration("error", [
     (-12, "OSEVENT_REG_FAIL"),
     (-13, "BUFFERFULL"),
     (-14, "UNKNOWN_CHILD"),
+    (-15, "NOT_MATCH"),
     ], value_namespace = "")
 
 libxl_domain_type = Enumeration("domain_type", [
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v12 4/7] remus netbuffer: implement remus network buffering for nic devices
  2014-06-20  7:04 [PATCH v12 0/7] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (2 preceding siblings ...)
  2014-06-20  7:04 ` [PATCH v12 3/7] remus: introduce remus device Yang Hongyang
@ 2014-06-20  7:04 ` Yang Hongyang
  2014-06-20  7:04 ` [PATCH v12 5/7] remus drbd: Implement remus drbd replicated disk Yang Hongyang
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Yang Hongyang @ 2014-06-20  7:04 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

1.Add two members in libxl_domain_remus_info:
    netbuf: whether netbuf is enabled
    netbufscript: the path of the script which will be run to setup
       and tear down the guest's interface.
2.Introduce remus-netbuf-setup hotplug script responsible for
  setting up and tearing down the necessary infrastructure required for
  network output buffering in Remus.  This script is intended to be invoked
  by libxl for each guest interface, when starting or stopping Remus.

  Apart from returning success/failure indication via the usual hotplug
  entries in xenstore, this script also writes to xenstore, the name of
  the IFB device to be used to control the vif's network output.

  The script relies on libnl3 command line utilities to perform various
  setup/teardown functions. The script is confined to Linux platforms only
  since NetBSD does not seem to have libnl3.

  The following steps are taken during init:
    a) establish a dedicated remus context containing libnl related
       state (netlink sockets)

  The following steps are taken for each vif during setup:
    a) call the hotplug script to setup its network buffer and
       init qdisc caches

    b) Obtain handles to plug qdiscs installed on the IFB devices
       chosen by the hotplug scripts.

  And during teardown, the netlink resources are released, followed by
  invocation of hotplug scripts to remove the ifb devices.
3.Implement the remus device interface. setup, teardown, etc. The
  checkpoint callbacks for netbuffer are implemented as sync op because
  net ops are quick enough and will not block.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 docs/misc/xenstore-paths.markdown      |   4 +
 tools/hotplug/Linux/Makefile           |   1 +
 tools/hotplug/Linux/remus-netbuf-setup | 203 +++++++++++++
 tools/libxl/libxl.c                    |  18 ++
 tools/libxl/libxl.h                    |  13 +
 tools/libxl/libxl_internal.h           |   3 +
 tools/libxl/libxl_netbuffer.c          | 509 +++++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c        |  67 +++++
 tools/libxl/libxl_remus_device.c       |  19 +-
 tools/libxl/libxl_types.idl            |   2 +
 10 files changed, 837 insertions(+), 2 deletions(-)
 create mode 100644 tools/hotplug/Linux/remus-netbuf-setup

diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown
index 70ab7f4..039eaea 100644
--- a/docs/misc/xenstore-paths.markdown
+++ b/docs/misc/xenstore-paths.markdown
@@ -385,6 +385,10 @@ The guest's virtual time offset from UTC in seconds.
 
 The device model version for a domain.
 
+#### /libxl/$DOMID/remus/netbuf/$DEVID/ifb = STRING [n,INTERNAL]
+
+ifb device used by Remus to buffer network output from the associated vif.
+
 [BLKIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,blkif.h.html
 [FBIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,fbif.h.html
 [HVMPARAMS]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,hvm,params.h.html
diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index 4874ec5..13e1f5f 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -15,6 +15,7 @@ XEN_SCRIPTS += vif-nat
 XEN_SCRIPTS += vif-openvswitch
 XEN_SCRIPTS += vif2
 XEN_SCRIPTS += vif-setup
+XEN_SCRIPTS-$(CONFIG_REMUS_NETBUF) += remus-netbuf-setup
 XEN_SCRIPTS += block
 XEN_SCRIPTS += block-enbd block-nbd
 XEN_SCRIPTS-$(CONFIG_BLKTAP1) += blktap
diff --git a/tools/hotplug/Linux/remus-netbuf-setup b/tools/hotplug/Linux/remus-netbuf-setup
new file mode 100644
index 0000000..58c46f3
--- /dev/null
+++ b/tools/hotplug/Linux/remus-netbuf-setup
@@ -0,0 +1,203 @@
+#!/bin/bash
+#============================================================================
+# ${XEN_SCRIPT_DIR}/remus-netbuf-setup
+#
+# Script for attaching a network buffer to the specified vif (in any mode).
+# The hotplugging system will call this script when starting remus via libxl
+# API, libxl_domain_remus_start.
+#
+# Usage:
+# remus-netbuf-setup (setup|teardown)
+#
+# Environment vars:
+# vifname     vif interface name (required).
+# XENBUS_PATH path in Xenstore, where the IFB device details will be stored
+#                      or read from (required).
+#             (libxl passes /libxl/<domid>/remus/netbuf/<devid>)
+# IFB         ifb interface to be cleaned up (required). [for teardown op only]
+
+# Written to the store: (setup operation)
+# XENBUS_PATH/ifb=<ifbdevName> the IFB device serving
+#  as the intermediate buffer through which the interface's network output
+#  can be controlled.
+#
+# To install a network buffer on a guest vif (vif1.0) using ifb (ifb0)
+# we need to do the following
+#
+#  ip link set dev ifb0 up
+#  tc qdisc add dev vif1.0 ingress
+#  tc filter add dev vif1.0 parent ffff: proto ip \
+#    prio 10 u32 match u32 0 0 action mirred egress redirect dev ifb0
+#  nl-qdisc-add --dev=ifb0 --parent root plug
+#  nl-qdisc-add --dev=ifb0 --parent root --update plug --limit=10000000
+#                                                (10MB limit on buffer)
+#
+# So order of operations when installing a network buffer on vif1.0
+# 1. find a free ifb and bring up the device
+# 2. redirect traffic from vif1.0 to ifb:
+#   2.1 add ingress qdisc to vif1.0 (to capture outgoing packets from guest)
+#   2.2 use tc filter command with actions mirred egress + redirect
+# 3. install plug_qdisc on ifb device, with which we can buffer/release
+#    guest's network output from vif1.0
+#
+#
+
+#============================================================================
+
+# Unlike other vif scripts, vif-common is not needed here as it executes vif
+#specific setup code such as renaming.
+dir=$(dirname "$0")
+. "$dir/xen-hotplug-common.sh"
+
+findCommand "$@"
+
+if [ "$command" != "setup" -a  "$command" != "teardown" ]
+then
+  echo "Invalid command: $command"
+  log err "Invalid command: $command"
+  exit 1
+fi
+
+evalVariables "$@"
+
+: ${vifname:?}
+: ${XENBUS_PATH:?}
+
+check_libnl_tools() {
+    if ! command -v nl-qdisc-list > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-list tool"
+    fi
+    if ! command -v nl-qdisc-add > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-add tool"
+    fi
+    if ! command -v nl-qdisc-delete > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-delete tool"
+    fi
+}
+
+# We only check for modules. We don't load them.
+# User/Admin is supposed to load ifb during boot time,
+# ensuring that there are enough free ifbs in the system.
+# Other modules will be loaded automatically by tc commands.
+check_modules() {
+    for m in ifb sch_plug sch_ingress act_mirred cls_u32
+    do
+        if ! modinfo $m > /dev/null 2>&1; then
+            fatal "Unable to find $m kernel module"
+        fi
+    done
+}
+
+xs_write_failed() {
+    local vif=$1
+    local ifb=$2
+    teardown_netbuf "$vifname" "$IFB"
+    fatal "failed to write ifb name to xenstore"
+}
+
+#return 0 if the ifb is free
+check_ifb() {
+    local installed=`nl-qdisc-list -d $1`
+    [ -n "$installed" ] && return 1
+
+    for domid in `xenstore-list "/local/domain" 2>/dev/null || true`
+    do
+        [ $domid -eq 0 ] && continue
+        xenstore-exists "/libxl/$domid/remus/netbuf" || continue
+        for devid in `xenstore-list "/libxl/$domid/remus/netbuf" 2>/dev/null || true`
+        do
+            local path="/libxl/$domid/remus/netbuf/$devid/ifb"
+            xenstore-exists $path || continue
+            local ifb=`xenstore-read "$path" 2>/dev/null || true`
+            [ "$ifb" = "$1" ] && return 1
+        done
+    done
+
+    return 0
+}
+
+setup_ifb() {
+
+    for ifb in `ifconfig -a -s|egrep ^ifb|cut -d ' ' -f1`
+    do
+        check_ifb "$ifb" || continue
+        IFB="$ifb"
+        break
+    done
+
+    if [ -z "$IFB" ]
+    then
+        fatal "Unable to find a free IFB device for $vifname"
+    fi
+
+    #not using xenstore_write that automatically exits on error
+    #because we need to cleanup
+    _xenstore_write "$XENBUS_PATH/ifb" "$IFB" || xs_write_failed "$vifname" "$IFB"
+    do_or_die ip link set dev "$IFB" up
+}
+
+redirect_vif_traffic() {
+    local vif=$1
+    local ifb=$2
+
+    do_or_die tc qdisc add dev "$vif" ingress
+
+    tc filter add dev "$vif" parent ffff: proto ip prio 10 \
+        u32 match u32 0 0 action mirred egress redirect dev "$ifb" >/dev/null 2>&1
+
+    if [ $? -ne 0 ]
+    then
+        do_without_error tc qdisc del dev "$vif" ingress
+        fatal "Failed to redirect traffic from $vif to $ifb"
+    fi
+}
+
+add_plug_qdisc() {
+    local vif=$1
+    local ifb=$2
+
+    nl-qdisc-add --dev="$ifb" --parent root plug >/dev/null 2>&1
+    if [ $? -ne 0 ]
+    then
+        do_without_error tc qdisc del dev "$vif" ingress
+        fatal "Failed to add plug qdisc to $ifb"
+    fi
+
+    #set ifb buffering limit in bytes. Its okay if this command fails
+    nl-qdisc-add --dev="$ifb" --parent root \
+        --update plug --limit=10000000 >/dev/null 2>&1 || true
+}
+
+teardown_netbuf() {
+    local vif=$1
+    local ifb=$2
+
+    if [ "$ifb" ]; then
+        do_without_error ip link set dev "$ifb" down
+        do_without_error nl-qdisc-delete --dev="$ifb" --parent root plug >/dev/null 2>&1
+        xenstore-rm -t "$XENBUS_PATH/ifb" 2>/dev/null || true
+    fi
+    do_without_error tc qdisc del dev "$vif" ingress
+    xenstore-rm -t "$XENBUS_PATH/hotplug-status" 2>/dev/null || true
+    xenstore-rm -t "$XENBUS_PATH/hotplug-error" 2>/dev/null || true
+}
+
+case "$command" in
+    setup)
+        check_libnl_tools
+        check_modules
+
+        claim_lock "pickifb"
+        setup_ifb
+        redirect_vif_traffic "$vifname" "$IFB"
+        add_plug_qdisc "$vifname" "$IFB"
+        release_lock "pickifb"
+
+        success
+        ;;
+    teardown)
+        teardown_netbuf "$vifname" "$IFB"
+        ;;
+esac
+
+log debug "Successful remus-netbuf-setup $command for $vifname, ifb $IFB."
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index f99477d..c0d2bd2 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -788,6 +788,24 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 
     /* Convenience aliases */
     libxl__remus_state *const rs = &dss->rs;
+
+    /* Setup network buffering */
+    if (info->netbuf) {
+        if (!libxl__netbuffer_enabled(gc)) {
+            LOG(ERROR, "Remus: No support for network buffering");
+            goto out;
+        }
+
+        if (info->netbufscript) {
+            rs->netbufscript =
+                libxl__strdup(gc, info->netbufscript);
+        } else {
+            rs->netbufscript =
+                GCSPRINTF("%s/remus-netbuf-setup",
+                libxl__xen_script_dir_path());
+        }
+    }
+
     rs->ao = ao;
     rs->domid = domid;
     rs->saved_rc = 0;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 69ceac8..4f36f5a 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -448,6 +448,19 @@ typedef struct libxl__ctx libxl_ctx;
 #define LIBXL_HAVE_DRIVER_DOMAIN_CREATION 1
 
 /*
+ * LIBXL_HAVE_REMUS_NETBUF 1
+ *
+ * If this is defined, then the libxl_domain_remus_info structure will
+ * have a boolean field (netbuf) and a string field (netbufscript).
+ *
+ * netbuf, if true, indicates that network buffering should be enabled.
+ *
+ * netbufscript, if set, indicates the path to the hotplug script to
+ * setup or teardown network buffers.
+ */
+#define LIBXL_HAVE_REMUS_NETBUF 1
+
+/*
  * LIBXL_HAVE_SIGCHLD_SELECTIVE_REAP
  *
  * If this is defined:
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 5521a42..47648cd 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2578,6 +2578,7 @@ struct libxl__remus_device_state {
     /* devices that have been setuped */
     libxl__remus_device **dev;
 
+    libxl_device_nic *nics;
     int num_nics;
     int num_disks;
 
@@ -2631,6 +2632,8 @@ struct libxl__remus_state {
     libxl__ao *ao;
     uint32_t domid;
     libxl__remus_callback *callback;
+    /* Script to setup/teardown network buffers */
+    const char *netbufscript;
 
     /* private */
     int saved_rc;
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 8e23d75..6487ce6 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -17,11 +17,520 @@
 
 #include "libxl_internal.h"
 
+#include <netlink/cache.h>
+#include <netlink/socket.h>
+#include <netlink/attr.h>
+#include <netlink/route/link.h>
+#include <netlink/route/route.h>
+#include <netlink/route/qdisc.h>
+#include <netlink/route/qdisc/plug.h>
+
+typedef struct libxl__remus_netbuf_state {
+    libxl__ao *ao;
+    uint32_t domid;
+    const char *netbufscript;
+
+    struct nl_sock *nlsock;
+    struct nl_cache *qdisc_cache;
+} libxl__remus_netbuf_state;
+
+typedef struct libxl__remus_device_nic {
+    const char *vif;
+    const char *ifb;
+    struct rtnl_qdisc *qdisc;
+} libxl__remus_device_nic;
+
 int libxl__netbuffer_enabled(libxl__gc *gc)
 {
     return 1;
 }
 
+/* If the device has a vifname, then use that instead of
+ * the vifX.Y format.
+ */
+static const char *get_vifname(libxl__remus_device *dev,
+                               const libxl_device_nic *nic)
+{
+    libxl__remus_netbuf_state *netbuf_state = dev->ops->data;
+    const char *vifname = NULL;
+    const char *path;
+    int rc;
+
+    STATE_AO_GC(netbuf_state->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = netbuf_state->domid;
+
+    path = libxl__sprintf(gc, "%s/backend/vif/%d/%d/vifname",
+                          libxl__xs_get_dompath(gc, 0), domid, nic->devid);
+    rc = libxl__xs_read_checked(gc, XBT_NULL, path, &vifname);
+    if (!rc && !vifname) {
+        /* use the default name */
+        vifname = libxl__device_nic_devname(gc, domid,
+                                            nic->devid,
+                                            nic->nictype);
+    }
+
+    return vifname;
+}
+
+static void free_qdisc(libxl__remus_device_nic *remus_nic)
+{
+    /* free qdiscs */
+    if (remus_nic->qdisc == NULL)
+        return;
+
+    nl_object_put((struct nl_object *)(remus_nic->qdisc));
+    remus_nic->qdisc = NULL;
+}
+
+static int init_qdisc(libxl__remus_netbuf_state *netbuf_state,
+                      libxl__remus_device_nic *remus_nic)
+{
+    int ret, ifindex;
+    struct rtnl_link *ifb = NULL;
+    struct rtnl_qdisc *qdisc = NULL;
+
+    STATE_AO_GC(netbuf_state->ao);
+
+    /* Now that we have brought up IFB device with plug qdisc for
+     * this vif, so we need to refill the qdisc cache.
+     */
+    ret = nl_cache_refill(netbuf_state->nlsock, netbuf_state->qdisc_cache);
+    if (ret < 0) {
+        LOG(ERROR, "cannot refill qdisc cache");
+        goto out;
+    }
+
+    /* get a handle to the IFB interface */
+    ifb = NULL;
+    ret = rtnl_link_get_kernel(netbuf_state->nlsock, 0,
+                               remus_nic->ifb, &ifb);
+    if (ret) {
+        LOG(ERROR, "cannot obtain handle for %s: %s", remus_nic->ifb,
+            nl_geterror(ret));
+        ret = ERROR_FAIL;
+        goto out;
+    }
+
+    ret = ERROR_FAIL;
+    ifindex = rtnl_link_get_ifindex(ifb);
+    if (!ifindex) {
+        LOG(ERROR, "interface %s has no index", remus_nic->ifb);
+        goto out;
+    }
+
+    /* Get a reference to the root qdisc installed on the IFB, by
+     * querying the qdisc list we obtained earlier. The netbufscript
+     * sets up the plug qdisc as the root qdisc, so we don't have to
+     * search the entire qdisc tree on the IFB dev.
+
+     * There is no need to explicitly free this qdisc as its just a
+     * reference from the qdisc cache we allocated earlier.
+     */
+    qdisc = rtnl_qdisc_get_by_parent(netbuf_state->qdisc_cache, ifindex,
+                                     TC_H_ROOT);
+
+    if (qdisc) {
+        const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
+        /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
+        if (!tc_kind || strcmp(tc_kind, "plug")) {
+            nl_object_put((struct nl_object *)qdisc);
+            LOG(ERROR, "plug qdisc is not installed on %s", remus_nic->ifb);
+            goto out;
+        }
+        remus_nic->qdisc = qdisc;
+        ret = 0;
+    } else {
+        LOG(ERROR, "Cannot get qdisc handle from ifb %s", remus_nic->ifb);
+    }
+
+out:
+    if (ifb)
+        rtnl_link_put(ifb);
+
+    return ret;
+}
+
+/*
+ * In return, the script writes the name of IFB device (during setup) to be
+ * used for output buffering into XENBUS_PATH/ifb
+ */
+static void netbuf_setup_script_cb(libxl__egc *egc,
+                                   libxl__async_exec_state *aes,
+                                   int status)
+{
+    libxl__remus_device *dev = CONTAINER_OF(aes, *dev, aes);
+    libxl__remus_device_nic *remus_nic = dev->data;
+    libxl__remus_netbuf_state *netbuf_state = dev->ops->data;
+    const char *out_path_base, *hotplug_error = NULL;
+    int rc;
+
+    /* Convenience aliases */
+    const uint32_t domid = netbuf_state->domid;
+    const int devid = dev->devid;
+    const char *const vif = remus_nic->vif;
+    const char **const ifb = &remus_nic->ifb;
+
+    STATE_AO_GC(netbuf_state->ao);
+
+    /*
+     * we need to get ifb first because it's needed for teardown
+     */
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/remus/netbuf/%d/ifb",
+                                          libxl__xs_libxl_path(gc, domid),
+                                          devid),
+                                ifb);
+    if (rc) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (!(*ifb)) {
+        LOG(ERROR, "Cannot get ifb dev name for domain %u dev %s",
+            domid, vif);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    out_path_base = GCSPRINTF("%s/remus/netbuf/%d",
+                              libxl__xs_libxl_path(gc, domid), devid);
+
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/hotplug-error", out_path_base),
+                                &hotplug_error);
+    if (rc) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (hotplug_error) {
+        LOG(ERROR, "netbuf script %s setup failed for vif %s: %s",
+            netbuf_state->netbufscript, vif, hotplug_error);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (status) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    LOG(DEBUG, "%s will buffer packets from vif %s", *ifb, vif);
+    rc = init_qdisc(netbuf_state, remus_nic);
+
+out:
+    dev->callback(egc, dev, rc);
+}
+
+static void netbuf_teardown_script_cb(libxl__egc *egc,
+                                      libxl__async_exec_state *aes,
+                                      int status)
+{
+    int rc;
+    libxl__remus_device *dev = CONTAINER_OF(aes, *dev, aes);
+    libxl__remus_device_nic *remus_nic = dev->data;
+
+    if (status)
+        rc = ERROR_FAIL;
+    else
+        rc = 0;
+
+    free_qdisc(remus_nic);
+
+    dev->callback(egc, dev, rc);
+}
+
+/*
+ * the script needs the following env & args
+ * $vifname
+ * $XENBUS_PATH (/libxl/<domid>/remus/netbuf/<devid>/)
+ * $IFB (for teardown)
+ * setup/teardown as command line arg.
+ */
+static void setup_async_exec(libxl__async_exec_state *aes,
+                             char *op, libxl__remus_device *dev)
+{
+    int arraysize, nr = 0;
+    char **env = NULL, **args = NULL;
+    libxl__remus_device_nic *remus_nic = dev->data;
+    libxl__remus_netbuf_state *ns = dev->ops->data;
+    STATE_AO_GC(ns->ao);
+
+    /* Convenience aliases */
+    char *const script = libxl__strdup(gc, ns->netbufscript);
+    const uint32_t domid = ns->domid;
+    const int dev_id = dev->devid;
+    const char *const vif = remus_nic->vif;
+    const char *const ifb = remus_nic->ifb;
+
+    arraysize = 7;
+    GCNEW_ARRAY(env, arraysize);
+    env[nr++] = "vifname";
+    env[nr++] = libxl__strdup(gc, vif);
+    env[nr++] = "XENBUS_PATH";
+    env[nr++] = GCSPRINTF("%s/remus/netbuf/%d",
+                          libxl__xs_libxl_path(gc, domid), dev_id);
+    if (!strcmp(op, "teardown") && ifb) {
+        env[nr++] = "IFB";
+        env[nr++] = libxl__strdup(gc, ifb);
+    }
+    env[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    arraysize = 3; nr = 0;
+    GCNEW_ARRAY(args, arraysize);
+    args[nr++] = script;
+    args[nr++] = op;
+    args[nr++] = NULL;
+    assert(nr == arraysize);
+
+    aes->ao = ns->ao;
+    aes->what = GCSPRINTF("%s %s", args[0], args[1]);
+    aes->env = env;
+    aes->args = args;
+    aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
+    aes->stdfds[0] = -1;
+    aes->stdfds[1] = -1;
+    aes->stdfds[2] = -1;
+
+    if (!strcmp(op, "teardown"))
+        aes->callback = netbuf_teardown_script_cb;
+    else
+        aes->callback = netbuf_setup_script_cb;
+}
+
+static int nic_init(libxl__remus_device_ops *self,
+                    libxl__remus_state *rs)
+{
+    int rc;
+    libxl__remus_netbuf_state *ns;
+
+    STATE_AO_GC(rs->ao);
+
+    GCNEW(ns);
+    self->data = ns;
+
+    ns->nlsock = nl_socket_alloc();
+    if (!ns->nlsock) {
+        LOG(ERROR, "cannot allocate nl socket");
+        goto out;
+    }
+
+    rc = nl_connect(ns->nlsock, NETLINK_ROUTE);
+    if (rc) {
+        LOG(ERROR, "failed to open netlink socket: %s",
+            nl_geterror(rc));
+        goto out;
+    }
+
+    /* get list of all qdiscs installed on network devs. */
+    rc = rtnl_qdisc_alloc_cache(ns->nlsock, &ns->qdisc_cache);
+    if (rc) {
+        LOG(ERROR, "failed to allocate qdisc cache: %s",
+            nl_geterror(rc));
+        goto out;
+    }
+
+    ns->ao = rs->ao;
+    ns->domid = rs->domid;
+    ns->netbufscript = rs->netbufscript;
+
+    return 0;
+
+out:
+    return ERROR_FAIL;
+}
+
+static void nic_destroy(libxl__remus_device_ops *self)
+{
+    libxl__remus_netbuf_state *ns = self->data;
+
+    if (!self->data)
+        return;
+
+    /* free qdisc cache */
+    if (ns->qdisc_cache) {
+        nl_cache_clear(ns->qdisc_cache);
+        nl_cache_free(ns->qdisc_cache);
+        ns->qdisc_cache = NULL;
+    }
+
+    /* close & free nlsock */
+    if (ns->nlsock) {
+        nl_close(ns->nlsock);
+        nl_socket_free(ns->nlsock);
+        ns->nlsock = NULL;
+    }
+}
+
+static void async_call_done(libxl__egc *egc,
+                            libxl__ev_child *child,
+                            pid_t pid, int status)
+{
+    libxl__remus_device *dev = CONTAINER_OF(child, *dev, child);
+    libxl__remus_device_state *rds = dev->rds;
+    STATE_AO_GC(rds->ao);
+
+    if (WIFEXITED(status)) {
+        dev->callback(egc, dev, -WEXITSTATUS(status));
+    } else {
+        dev->callback(egc, dev, ERROR_FAIL);
+    }
+}
+
+static void nic_match_async(const libxl__remus_device_ops *self,
+                            libxl__remus_device *dev)
+{
+    if (dev->kind == LIBXL__REMUS_DEVICE_NIC)
+        _exit(0);
+
+    _exit(-ERROR_NOT_MATCH);
+}
+
+static void nic_match(libxl__remus_device_ops *self,
+                      libxl__remus_device *dev)
+{
+    int pid = -1;
+    STATE_AO_GC(dev->rds->ao);
+
+    /* Fork and call */
+    pid = libxl__ev_child_fork(gc, &dev->child, async_call_done);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork");
+        goto out;
+    }
+
+    if (!pid) {
+        /* child */
+        nic_match_async(self, dev);
+        /* notreached */
+        abort();
+    }
+
+    return;
+
+out:
+    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+}
+
+static void nic_setup(libxl__remus_device *dev)
+{
+    libxl__remus_device_nic *remus_nic;
+    libxl__remus_netbuf_state *ns = dev->ops->data;
+    const libxl_device_nic *nic = dev->backend_dev;
+
+    STATE_AO_GC(ns->ao);
+
+    GCNEW(remus_nic);
+    dev->data = remus_nic;
+    remus_nic->vif = get_vifname(dev, nic);
+    if (!remus_nic->vif)
+        goto out;
+
+    setup_async_exec(&dev->aes, "setup", dev);
+    if (libxl__async_exec_start(gc, &dev->aes)) {
+        goto out;
+    }
+
+    return;
+
+out:
+    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+}
+
+/*
+ * Note: This function will be called in the same gc context as
+ * libxl__remus_netbuf_setup, created during the libxl_domain_remus_start
+ * API call.
+ */
+static void nic_teardown(libxl__remus_device *dev)
+{
+    libxl__remus_netbuf_state *ns = dev->ops->data;
+
+    STATE_AO_GC(ns->ao);
+
+    setup_async_exec(&dev->aes, "teardown", dev);
+
+    if (libxl__async_exec_start(gc, &dev->aes)) {
+        goto out;
+    }
+
+    return;
+
+out:
+    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+}
+
+/* The buffer_op's value, not the value passed to kernel */
+enum {
+    tc_buffer_start,
+    tc_buffer_release
+};
+
+static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
+                           libxl__remus_netbuf_state *netbuf_state,
+                           int buffer_op)
+{
+    int ret;
+
+    STATE_AO_GC(netbuf_state->ao);
+
+    if (buffer_op == tc_buffer_start)
+        ret = rtnl_qdisc_plug_buffer(remus_nic->qdisc);
+    else
+        ret = rtnl_qdisc_plug_release_one(remus_nic->qdisc);
+
+    if (!ret) {
+        if (rtnl_qdisc_add(netbuf_state->nlsock,
+                           remus_nic->qdisc,
+                           NLM_F_REQUEST))
+            goto out;
+    }
+
+    return 0;
+
+out:
+    LOG(ERROR, "Remus: cannot do netbuf op %s on %s:%s",
+        ((buffer_op == tc_buffer_start) ?
+        "start_new_epoch" : "release_prev_epoch"),
+        remus_nic->ifb, nl_geterror(ret));
+    return ERROR_FAIL;
+}
+
+static void nic_postsuspend(libxl__remus_device *dev)
+{
+    int rc;
+    libxl__remus_device_nic *remus_nic = dev->data;
+    libxl__remus_netbuf_state *ns = dev->ops->data;
+    STATE_AO_GC(dev->rds->ao);
+
+    rc = remus_netbuf_op(remus_nic, ns, tc_buffer_start);
+    dev->callback(dev->rds->egc, dev, rc);
+}
+
+static void nic_commit(libxl__remus_device *dev)
+{
+    int rc;
+    libxl__remus_device_nic *remus_nic = dev->data;
+    libxl__remus_netbuf_state *ns = dev->ops->data;
+    STATE_AO_GC(dev->rds->ao);
+
+    rc = remus_netbuf_op(remus_nic, ns, tc_buffer_release);
+    dev->callback(dev->rds->egc, dev, rc);
+}
+
+libxl__remus_device_ops remus_device_nic = {
+    .init = nic_init,
+    .destroy = nic_destroy,
+    .postsuspend = nic_postsuspend,
+    .commit = nic_commit,
+    .match = nic_match,
+    .setup = nic_setup,
+    .teardown = nic_teardown,
+};
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
index 6aa4bf1..2ce7422 100644
--- a/tools/libxl/libxl_nonetbuffer.c
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -22,6 +22,73 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
     return 0;
 }
 
+static void async_call_done(libxl__egc *egc,
+                            libxl__ev_child *child,
+                            pid_t pid, int status)
+{
+    libxl__remus_device *dev = CONTAINER_OF(child, *dev, child);
+    libxl__remus_device_state *rds = dev->rds;
+    STATE_AO_GC(rds->ao);
+
+    if (WIFEXITED(status)) {
+        dev->callback(egc, dev, -WEXITSTATUS(status));
+    } else {
+        dev->callback(egc, dev, ERROR_FAIL);
+    }
+}
+
+static void nic_match_async(const libxl__remus_device_ops *self,
+                            libxl__remus_device *dev)
+{
+    if (dev->kind == LIBXL__REMUS_DEVICE_NIC)
+        _exit(-ERROR_FAIL);
+
+    _exit(-ERROR_NOT_MATCH);
+}
+
+static void nic_match(libxl__remus_device_ops *self,
+                      libxl__remus_device *dev)
+{
+    int pid = -1;
+    STATE_AO_GC(dev->rds->ao);
+
+    /* Fork and call */
+    pid = libxl__ev_child_fork(gc, &dev->child, async_call_done);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork");
+        goto out;
+    }
+
+    if (!pid) {
+        /* child */
+        nic_match_async(self, dev);
+        /* notreached */
+        abort();
+    }
+
+    return;
+
+out:
+    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+}
+
+static int nic_init(libxl__remus_device_ops *self,
+                    libxl__remus_state *rs)
+{
+    return 0;
+}
+
+static void nic_destroy(libxl__remus_device_ops *self)
+{
+    return;
+}
+
+libxl__remus_device_ops remus_device_nic = {
+    .init = nic_init,
+    .destroy = nic_destroy,
+    .match = nic_match,
+};
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
index 07e298b..f7b0651 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_remus_device.c
@@ -18,7 +18,9 @@
 
 #include "libxl_internal.h"
 
+extern libxl__remus_device_ops remus_device_nic;
 static libxl__remus_device_ops *dev_ops[] = {
+    &remus_device_nic,
 };
 
 static void device_common_cb(libxl__egc *egc,
@@ -215,6 +217,13 @@ static void device_teardown_cb(libxl__egc *egc,
     rds->num_setuped--;
 
     if (rds->num_setuped == 0) {
+        /* clean nic */
+        for (i = 0; i < rds->num_nics; i++)
+            libxl_device_nic_dispose(&rds->nics[i]);
+        free(rds->nics);
+        rds->nics = NULL;
+        rds->num_nics = 0;
+
         /* clean device ops */
         for (i = 0; i < ARRAY_SIZE(dev_ops); i++) {
             ops = dev_ops[i];
@@ -224,7 +233,7 @@ static void device_teardown_cb(libxl__egc *egc,
     }
 }
 
-static __attribute__((unused)) void libxl__remus_device_init(libxl__egc *egc,
+static void libxl__remus_device_init(libxl__egc *egc,
                                      libxl__remus_device_state *rds,
                                      libxl__remus_device_kind kind,
                                      void *libxl_dev)
@@ -290,7 +299,9 @@ void libxl__remus_device_setup(libxl__egc *egc, libxl__remus_state *rs)
     rds->num_nics = 0;
     rds->num_disks = 0;
 
-    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
+    /* TBD: Remus setup - i.e. enable disk buffering, etc */
+    if (rs->netbufscript)
+        rds->nics = libxl_device_nic_list(CTX, rs->domid, &rds->num_nics);
 
     if (rds->num_nics == 0 && rds->num_disks == 0)
         goto out;
@@ -298,6 +309,10 @@ void libxl__remus_device_setup(libxl__egc *egc, libxl__remus_state *rs)
     GCNEW_ARRAY(rds->dev, rds->num_nics + rds->num_disks);
 
     /* TBD: CALL libxl__remus_device_init to init remus devices */
+    for (i = 0; i < rds->num_nics; i++) {
+        libxl__remus_device_init(egc, rds,
+                                 LIBXL__REMUS_DEVICE_NIC, &rds->nics[i]);
+    }
 
     return;
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index cc5d390..4299298 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -570,6 +570,8 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
     ("interval",     integer),
     ("blackhole",    bool),
     ("compression",  bool),
+    ("netbuf",       bool),
+    ("netbufscript", string),
     ])
 
 libxl_event_type = Enumeration("event_type", [
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v12 5/7] remus drbd: Implement remus drbd replicated disk
  2014-06-20  7:04 [PATCH v12 0/7] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (3 preceding siblings ...)
  2014-06-20  7:04 ` [PATCH v12 4/7] remus netbuffer: implement remus network buffering for nic devices Yang Hongyang
@ 2014-06-20  7:04 ` Yang Hongyang
  2014-06-20  7:04 ` [PATCH v12 6/7] libxl: network buffering cmdline switch Yang Hongyang
  2014-06-20  7:04 ` [PATCH v12 7/7] libxl: disk " Yang Hongyang
  6 siblings, 0 replies; 15+ messages in thread
From: Yang Hongyang @ 2014-06-20  7:04 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

Implement remus-drbd-replicated-checkpointing-disk based on
generic remus devices framework.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Acked-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
---
 tools/hotplug/Linux/Makefile         |   1 +
 tools/hotplug/Linux/block-drbd-probe |  84 ++++++++++++
 tools/libxl/Makefile                 |   2 +-
 tools/libxl/libxl_internal.h         |   1 +
 tools/libxl/libxl_remus_device.c     |  18 ++-
 tools/libxl/libxl_remus_disk_drbd.c  | 249 +++++++++++++++++++++++++++++++++++
 6 files changed, 352 insertions(+), 3 deletions(-)
 create mode 100755 tools/hotplug/Linux/block-drbd-probe
 create mode 100644 tools/libxl/libxl_remus_disk_drbd.c

diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index 13e1f5f..5dd8599 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -23,6 +23,7 @@ XEN_SCRIPTS += xen-hotplug-cleanup
 XEN_SCRIPTS += external-device-migrate
 XEN_SCRIPTS += vscsi
 XEN_SCRIPTS += block-iscsi
+XEN_SCRIPTS += block-drbd-probe
 XEN_SCRIPTS += $(XEN_SCRIPTS-y)
 
 XEN_SCRIPT_DATA = xen-script-common.sh locking.sh logging.sh
diff --git a/tools/hotplug/Linux/block-drbd-probe b/tools/hotplug/Linux/block-drbd-probe
new file mode 100755
index 0000000..163ad04
--- /dev/null
+++ b/tools/hotplug/Linux/block-drbd-probe
@@ -0,0 +1,84 @@
+#! /bin/bash
+#
+# Copyright (C) 2014 FUJITSU LIMITED
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of version 2.1 of the GNU Lesser General Public
+# License as published by the Free Software Foundation.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+#
+# Usage:
+#     block-drbd-probe devicename
+#
+# Return value:
+#     0: the device is drbd device
+#     1: the device is not drbd device
+#     2: unkown error
+#     3: the drbd device does not use protocol D
+#     4: the drbd device is not ready
+
+drbd_res=
+
+function get_res_name()
+{
+    local drbd_dev=$1
+    local drbd_dev_list=($(drbdadm sh-dev all))
+    local drbd_res_list=($(drbdadm sh-resource all))
+    local temp_drbd_dev temp_drbd_res
+    local found=0
+
+    for temp_drbd_dev in ${drbd_dev_list[@]}; do
+        if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
+            found=1
+            break
+        fi
+    done
+
+    if [[ $found -eq 0 ]]; then
+        return 1
+    fi
+
+    for temp_drbd_res in ${drbd_res_list[@]}; do
+        temp_drbd_dev=$(drbdadm sh-dev $temp_drbd_res)
+        if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
+            drbd_res="$temp_drbd_res"
+            return 0
+        fi
+    done
+
+    # OOPS
+    return 2
+}
+
+get_res_name $1
+if [[ $? -ne 0 ]]; then
+    exit $?
+fi
+
+# check protocol
+drbdsetup $1 show | grep -q "protocol D;"
+if [[ $? -ne 0 ]]; then
+    exit 3
+fi
+
+# check connect status
+state=$(drbdadm cstate "$drbd_res")
+if [[ "$state" != "Connected" ]]; then
+    exit 4
+fi
+
+# check role
+role=$(drbdadm role "$drbd_res")
+if [[ "$role" != "Primary/Secondary" ]]; then
+    exit 4
+fi
+
+exit 0
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index cb2efdf..f8a3fff 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -56,7 +56,7 @@ else
 LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
-LIBXL_OBJS-y += libxl_remus_device.o
+LIBXL_OBJS-y += libxl_remus_device.o libxl_remus_disk_drbd.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 47648cd..2424be0 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2580,6 +2580,7 @@ struct libxl__remus_device_state {
 
     libxl_device_nic *nics;
     int num_nics;
+    libxl_device_disk *disks;
     int num_disks;
 
     /* for counting devices that have been handled */
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
index f7b0651..d2344b0 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_remus_device.c
@@ -19,8 +19,10 @@
 #include "libxl_internal.h"
 
 extern libxl__remus_device_ops remus_device_nic;
+extern libxl__remus_device_ops remus_device_drbd_disk;
 static libxl__remus_device_ops *dev_ops[] = {
     &remus_device_nic,
+    &remus_device_drbd_disk,
 };
 
 static void device_common_cb(libxl__egc *egc,
@@ -224,6 +226,13 @@ static void device_teardown_cb(libxl__egc *egc,
         rds->nics = NULL;
         rds->num_nics = 0;
 
+        /* clean disk */
+        for (i = 0; i < rds->num_disks; i++)
+            libxl_device_disk_dispose(&rds->disks[i]);
+        free(rds->disks);
+        rds->disks = NULL;
+        rds->num_disks = 0;
+
         /* clean device ops */
         for (i = 0; i < ARRAY_SIZE(dev_ops); i++) {
             ops = dev_ops[i];
@@ -299,21 +308,26 @@ void libxl__remus_device_setup(libxl__egc *egc, libxl__remus_state *rs)
     rds->num_nics = 0;
     rds->num_disks = 0;
 
-    /* TBD: Remus setup - i.e. enable disk buffering, etc */
     if (rs->netbufscript)
         rds->nics = libxl_device_nic_list(CTX, rs->domid, &rds->num_nics);
 
+    rds->disks = libxl_device_disk_list(CTX, rs->domid, &rds->num_disks);
+
     if (rds->num_nics == 0 && rds->num_disks == 0)
         goto out;
 
     GCNEW_ARRAY(rds->dev, rds->num_nics + rds->num_disks);
 
-    /* TBD: CALL libxl__remus_device_init to init remus devices */
     for (i = 0; i < rds->num_nics; i++) {
         libxl__remus_device_init(egc, rds,
                                  LIBXL__REMUS_DEVICE_NIC, &rds->nics[i]);
     }
 
+    for (i = 0; i < rds->num_disks; i++) {
+        libxl__remus_device_init(egc, rds,
+                                 LIBXL__REMUS_DEVICE_DISK, &rds->disks[i]);
+    }
+
     return;
 
 out:
diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
new file mode 100644
index 0000000..69fdd34
--- /dev/null
+++ b/tools/libxl/libxl_remus_disk_drbd.c
@@ -0,0 +1,249 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author Lai Jiangshan <laijs@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+/*** drbd implementation ***/
+const int DRBD_SEND_CHECKPOINT = 20;
+const int DRBD_WAIT_CHECKPOINT_ACK = 30;
+
+typedef struct libxl__remus_drbd_disk {
+    libxl__remus_device remus_dev;
+    int ctl_fd;
+    int ackwait;
+    const char *path;
+} libxl__remus_drbd_disk;
+
+typedef struct libxl__remus_drbd_state {
+    libxl__ao *ao;
+    char *drbd_probe_script;
+} libxl__remus_drbd_state;
+
+static void drbd_async_call(libxl__remus_device *dev,
+                            void func(libxl__remus_device *),
+                            libxl__ev_child_callback callback)
+{
+    int pid = -1;
+    STATE_AO_GC(dev->rds->ao);
+
+    /* Fork and call */
+    pid = libxl__ev_child_fork(gc, &dev->child, callback);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork");
+        goto out;
+    }
+
+    if (!pid) {
+        /* child */
+        func(dev);
+        /* notreached */
+        abort();
+    }
+
+    return;
+
+out:
+    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+}
+
+static void chekpoint_async_call_done(libxl__egc *egc,
+                                      libxl__ev_child *child,
+                                      pid_t pid, int status)
+{
+    libxl__remus_device *dev = CONTAINER_OF(child, *dev, child);
+    libxl__remus_drbd_disk *rdd = dev->data;
+    STATE_AO_GC(dev->rds->ao);
+
+    if (WIFEXITED(status)) {
+        rdd->ackwait = WEXITSTATUS(status);
+        dev->callback(egc, dev, 0);
+    } else {
+        dev->callback(egc, dev, ERROR_FAIL);
+    }
+}
+
+/* this op will not wait and block, so implement as sync op */
+static void drbd_postsuspend(libxl__remus_device *dev)
+{
+    libxl__remus_drbd_disk *rdd = dev->data;
+
+    if (!rdd->ackwait) {
+        if (ioctl(rdd->ctl_fd, DRBD_SEND_CHECKPOINT, 0) <= 0)
+            rdd->ackwait = 1;
+    }
+
+    dev->callback(dev->rds->egc, dev, 0);
+}
+
+static void drbd_preresume_async(libxl__remus_device *dev)
+{
+    libxl__remus_drbd_disk *rdd = dev->data;
+    int ackwait = rdd->ackwait;
+
+    if (ackwait) {
+        ioctl(rdd->ctl_fd, DRBD_WAIT_CHECKPOINT_ACK, 0);
+        ackwait = 0;
+    }
+
+    _exit(ackwait);
+}
+
+static void drbd_preresume(libxl__remus_device *dev)
+{
+    drbd_async_call(dev, drbd_preresume_async, chekpoint_async_call_done);
+}
+
+static int drbd_init(libxl__remus_device_ops *self,
+                     libxl__remus_state *rs)
+{
+    libxl__remus_drbd_state *drbd_state;
+
+    STATE_AO_GC(rs->ao);
+
+    GCNEW(drbd_state);
+    self->data = drbd_state;
+    drbd_state->ao = ao;
+    drbd_state->drbd_probe_script = GCSPRINTF("%s/block-drbd-probe",
+                                              libxl__xen_script_dir_path());
+
+
+    return 0;
+}
+
+static void drbd_destroy(libxl__remus_device_ops *self)
+{
+    return;
+}
+
+static void match_async_exec_cb(libxl__egc *egc,
+                                libxl__async_exec_state *aes,
+                                int status)
+{
+    libxl__remus_device *dev = CONTAINER_OF(aes, *dev, aes);
+
+    if (status) {
+        dev->callback(egc, dev, ERROR_NOT_MATCH);
+    } else {
+        dev->callback(egc, dev, 0);
+    }
+}
+
+static void match_async_exec(libxl__egc *egc, libxl__remus_device *dev)
+{
+    int arraysize, nr = 0;
+    const libxl_device_disk *disk = dev->backend_dev;
+    libxl__remus_drbd_state *drbd_state = dev->ops->data;
+    libxl__async_exec_state *aes = &dev->aes;
+    STATE_AO_GC(drbd_state->ao);
+
+    /* setup env & args */
+    arraysize = 1;
+    GCNEW_ARRAY(aes->env, arraysize);
+    aes->env[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    arraysize = 3;
+    nr = 0;
+    GCNEW_ARRAY(aes->args, arraysize);
+    aes->args[nr++] = drbd_state->drbd_probe_script;
+    aes->args[nr++] = disk->pdev_path;
+    aes->args[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    aes->ao = drbd_state->ao;
+    aes->what = GCSPRINTF("%s %s", aes->args[0], aes->args[1]);
+    aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
+    aes->callback = match_async_exec_cb;
+    aes->stdfds[0] = -1;
+    aes->stdfds[1] = -1;
+    aes->stdfds[2] = -1;
+
+    if (libxl__async_exec_start(gc, aes))
+        goto out;
+
+    return;
+
+out:
+    dev->callback(egc, dev, ERROR_FAIL);
+}
+
+static void match_async_call_done(libxl__egc *egc,
+                                  libxl__ev_child *child,
+                                  pid_t pid, int status)
+{
+    libxl__remus_device *dev = CONTAINER_OF(child, *dev, child);
+    STATE_AO_GC(dev->rds->ao);
+
+    if (WIFEXITED(status)) {
+        if (-WEXITSTATUS(status) == ERROR_NOT_MATCH) {
+            dev->callback(egc, dev, ERROR_NOT_MATCH);
+        } else {
+            match_async_exec(egc, dev);
+        }
+    } else {
+        dev->callback(egc, dev, ERROR_FAIL);
+    }
+}
+
+static void drbd_match_async(libxl__remus_device *dev)
+{
+    if (dev->kind != LIBXL__REMUS_DEVICE_DISK)
+        _exit(-ERROR_NOT_MATCH);
+
+    _exit(0);
+}
+
+static void drbd_match(libxl__remus_device_ops *self,
+                      libxl__remus_device *dev)
+{
+    drbd_async_call(dev, drbd_match_async, match_async_call_done);
+}
+
+static void drbd_setup(libxl__remus_device *dev)
+{
+    libxl__remus_drbd_disk *drbd_disk;
+    const libxl_device_disk *disk = dev->backend_dev;
+    STATE_AO_GC(dev->rds->ao);
+
+    GCNEW(drbd_disk);
+    dev->data = drbd_disk;
+    drbd_disk->path = disk->pdev_path;
+    drbd_disk->ackwait = 0;
+    drbd_disk->ctl_fd = open(drbd_disk->path, O_RDONLY);
+    if (drbd_disk->ctl_fd < 0)
+        dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+    else
+        dev->callback(dev->rds->egc, dev, 0);
+}
+
+static void drbd_teardown(libxl__remus_device *dev)
+{
+    libxl__remus_drbd_disk *drbd_disk = dev->data;
+
+    close(drbd_disk->ctl_fd);
+    dev->callback(dev->rds->egc, dev, 0);
+}
+
+libxl__remus_device_ops remus_device_drbd_disk = {
+    .init = drbd_init,
+    .destroy = drbd_destroy,
+    .postsuspend = drbd_postsuspend,
+    .preresume = drbd_preresume,
+    .match = drbd_match,
+    .setup = drbd_setup,
+    .teardown = drbd_teardown,
+};
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v12 6/7] libxl: network buffering cmdline switch
  2014-06-20  7:04 [PATCH v12 0/7] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (4 preceding siblings ...)
  2014-06-20  7:04 ` [PATCH v12 5/7] remus drbd: Implement remus drbd replicated disk Yang Hongyang
@ 2014-06-20  7:04 ` Yang Hongyang
  2014-06-20  7:04 ` [PATCH v12 7/7] libxl: disk " Yang Hongyang
  6 siblings, 0 replies; 15+ messages in thread
From: Yang Hongyang @ 2014-06-20  7:04 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

Command line switch to 'xl remus' command.
Also update man pages to reflect the addition of a new option to
'xl remus' command.

Note: the network buffering is enabled as default. If you want to
disable it, please use -n option.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 docs/man/xl.conf.pod.5    |  6 ++++++
 docs/man/xl.pod.1         | 11 ++++++++++-
 tools/libxl/xl.c          |  4 ++++
 tools/libxl/xl.h          |  1 +
 tools/libxl/xl_cmdimpl.c  | 28 ++++++++++++++++++++++------
 tools/libxl/xl_cmdtable.c |  3 +++
 6 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.conf.pod.5 b/docs/man/xl.conf.pod.5
index 7c43bde..8ae19bb 100644
--- a/docs/man/xl.conf.pod.5
+++ b/docs/man/xl.conf.pod.5
@@ -105,6 +105,12 @@ Configures the default gateway device to set for virtual network devices.
 
 Default: C<None>
 
+=item B<remus.default.netbufscript="PATH">
+
+Configures the default script used by Remus to setup network buffering.
+
+Default: C</etc/xen/scripts/remus-netbuf-setup>
+
 =item B<output_format="json|sxp">
 
 Configures the default output format used by xl when printing "machine
diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 30bd4bf..e9e4a83 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -431,7 +431,7 @@ Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
 mechanism between the two hosts.
 
 N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
-     There is no support for network or disk buffering at the moment.
+     Disk buffering support is limited to drbd disks.
 
 B<OPTIONS>
 
@@ -450,6 +450,15 @@ Generally useful for debugging.
 
 Disable memory checkpoint compression.
 
+=item B<-n>
+
+Disable network output buffering.
+
+=item B<-N> I<netbufscript>
+
+Use <netbufscript> to setup network buffering instead of the instead of
+the default (/etc/xen/scripts/remus-netbuf-setup).
+
 =item B<-s> I<sshcommand>
 
 Use <sshcommand> instead of ssh.  String will be passed to sh.
diff --git a/tools/libxl/xl.c b/tools/libxl/xl.c
index 4c5a5ee..f014306 100644
--- a/tools/libxl/xl.c
+++ b/tools/libxl/xl.c
@@ -44,6 +44,7 @@ char *default_vifscript = NULL;
 char *default_bridge = NULL;
 char *default_gatewaydev = NULL;
 char *default_vifbackend = NULL;
+char *default_remus_netbufscript = NULL;
 enum output_format default_output_format = OUTPUT_FORMAT_JSON;
 int claim_mode = 1;
 bool progress_use_cr = 0;
@@ -176,6 +177,9 @@ static void parse_global_config(const char *configfile,
     if (!xlu_cfg_get_long (config, "claim_mode", &l, 0))
         claim_mode = l;
 
+    xlu_cfg_replace_string (config, "remus.default.netbufscript",
+        &default_remus_netbufscript, 0);
+
     xlu_cfg_destroy(config);
 }
 
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 10a2e66..087eb8c 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -170,6 +170,7 @@ extern char *default_vifscript;
 extern char *default_bridge;
 extern char *default_gatewaydev;
 extern char *default_vifbackend;
+extern char *default_remus_netbufscript;
 extern char *blkdev_start;
 
 enum output_format {
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index be041f2..1a6119a 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7201,8 +7201,9 @@ int main_remus(int argc, char **argv)
     r_info.interval = 200;
     r_info.blackhole = 0;
     r_info.compression = 1;
+    r_info.netbuf = 1;
 
-    SWITCH_FOREACH_OPT(opt, "bui:s:e", NULL, "remus", 2) {
+    SWITCH_FOREACH_OPT(opt, "buni:s:N:e", NULL, "remus", 2) {
     case 'i':
         r_info.interval = atoi(optarg);
         break;
@@ -7212,6 +7213,12 @@ int main_remus(int argc, char **argv)
     case 'u':
         r_info.compression = 0;
         break;
+    case 'n':
+        r_info.netbuf = 0;
+        break;
+    case 'N':
+        r_info.netbufscript = optarg;
+        break;
     case 's':
         ssh_command = optarg;
         break;
@@ -7223,6 +7230,9 @@ int main_remus(int argc, char **argv)
     domid = find_domain(argv[optind]);
     host = argv[optind + 1];
 
+    if (!r_info.netbufscript)
+        r_info.netbufscript = default_remus_netbufscript;
+
     if (r_info.blackhole) {
         send_fd = open("/dev/null", O_RDWR, 0644);
         if (send_fd < 0) {
@@ -7260,13 +7270,19 @@ int main_remus(int argc, char **argv)
     /* Point of no return */
     rc = libxl_domain_remus_start(ctx, &r_info, domid, send_fd, recv_fd, 0);
 
-    /* If we are here, it means backup has failed/domain suspend failed.
-     * Try to resume the domain and exit gracefully.
-     * TODO: Split-Brain check.
+    /* check if the domain exists. User may have xl destroyed the
+     * domain to force failover
      */
-    fprintf(stderr, "remus sender: libxl_domain_suspend failed"
-            " (rc=%d)\n", rc);
+    if (libxl_domain_info(ctx, 0, domid)) {
+        fprintf(stderr, "Remus: Primary domain has been destroyed.\n");
+        close(send_fd);
+        return 0;
+    }
 
+    /* If we are here, it means remus setup/domain suspend/backup has
+     * failed. Try to resume the domain and exit gracefully.
+     * TODO: Split-Brain check.
+     */
     if (rc == ERROR_GUEST_TIMEDOUT)
         fprintf(stderr, "Failed to suspend domain at primary.\n");
     else {
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 4279b9f..3f7520d 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -487,6 +487,9 @@ struct cmd_spec cmd_table[] = {
       "-i MS                   Checkpoint domain memory every MS milliseconds (def. 200ms).\n"
       "-b                      Replicate memory checkpoints to /dev/null (blackhole)\n"
       "-u                      Disable memory checkpoint compression.\n"
+      "-n                      Disable network output buffering.\n"
+      "-N <netbufscript>       Use netbufscript to setup network buffering instead of the\n"
+      "                        instead of the default (/etc/xen/scripts/remus-netbuf-setup).\n"
       "-s <sshcommand>         Use <sshcommand> instead of ssh.  String will be passed\n"
       "                        to sh. If empty, run <host> instead of \n"
       "                        ssh <host> xl migrate-receive -r [-e]\n"
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v12 7/7] libxl: disk buffering cmdline switch
  2014-06-20  7:04 [PATCH v12 0/7] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (5 preceding siblings ...)
  2014-06-20  7:04 ` [PATCH v12 6/7] libxl: network buffering cmdline switch Yang Hongyang
@ 2014-06-20  7:04 ` Yang Hongyang
  6 siblings, 0 replies; 15+ messages in thread
From: Yang Hongyang @ 2014-06-20  7:04 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, eddie.dong, rshriram, roger.pau,
	laijs

disk buffering is enabled as default.
add an option "-d" to disable it.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 docs/man/xl.pod.1                | 4 ++++
 tools/libxl/libxl.c              | 3 +++
 tools/libxl/libxl_internal.h     | 1 +
 tools/libxl/libxl_remus_device.c | 3 ++-
 tools/libxl/libxl_types.idl      | 1 +
 tools/libxl/xl_cmdimpl.c         | 4 ++++
 tools/libxl/xl_cmdtable.c        | 1 +
 7 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index e9e4a83..547e8a2 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -459,6 +459,10 @@ Disable network output buffering.
 Use <netbufscript> to setup network buffering instead of the instead of
 the default (/etc/xen/scripts/remus-netbuf-setup).
 
+=item B<-d>
+
+Disable disk buffering.
+
 =item B<-s> I<sshcommand>
 
 Use <sshcommand> instead of ssh.  String will be passed to sh.
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index c0d2bd2..85b6e9a 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -806,6 +806,9 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
         }
     }
 
+    /* Setup disk buffering */
+    rs->diskbuf = info->diskbuf;
+
     rs->ao = ao;
     rs->domid = domid;
     rs->saved_rc = 0;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 2424be0..a9f0b01 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2635,6 +2635,7 @@ struct libxl__remus_state {
     libxl__remus_callback *callback;
     /* Script to setup/teardown network buffers */
     const char *netbufscript;
+    bool diskbuf;
 
     /* private */
     int saved_rc;
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
index d2344b0..b1cae84 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_remus_device.c
@@ -311,7 +311,8 @@ void libxl__remus_device_setup(libxl__egc *egc, libxl__remus_state *rs)
     if (rs->netbufscript)
         rds->nics = libxl_device_nic_list(CTX, rs->domid, &rds->num_nics);
 
-    rds->disks = libxl_device_disk_list(CTX, rs->domid, &rds->num_disks);
+    if (rs->diskbuf)
+        rds->disks = libxl_device_disk_list(CTX, rs->domid, &rds->num_disks);
 
     if (rds->num_nics == 0 && rds->num_disks == 0)
         goto out;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 4299298..dfaa137 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -572,6 +572,7 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
     ("compression",  bool),
     ("netbuf",       bool),
     ("netbufscript", string),
+    ("diskbuf",      bool),
     ])
 
 libxl_event_type = Enumeration("event_type", [
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 1a6119a..ce9921c 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7202,6 +7202,7 @@ int main_remus(int argc, char **argv)
     r_info.blackhole = 0;
     r_info.compression = 1;
     r_info.netbuf = 1;
+    r_info.diskbuf = 1;
 
     SWITCH_FOREACH_OPT(opt, "buni:s:N:e", NULL, "remus", 2) {
     case 'i':
@@ -7219,6 +7220,9 @@ int main_remus(int argc, char **argv)
     case 'N':
         r_info.netbufscript = optarg;
         break;
+    case 'd':
+        r_info.diskbuf = 0;
+        break;
     case 's':
         ssh_command = optarg;
         break;
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 3f7520d..246aa11 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -490,6 +490,7 @@ struct cmd_spec cmd_table[] = {
       "-n                      Disable network output buffering.\n"
       "-N <netbufscript>       Use netbufscript to setup network buffering instead of the\n"
       "                        instead of the default (/etc/xen/scripts/remus-netbuf-setup).\n"
+      "-d                      Disable disk buffering.\n"
       "-s <sshcommand>         Use <sshcommand> instead of ssh.  String will be passed\n"
       "                        to sh. If empty, run <host> instead of \n"
       "                        ssh <host> xl migrate-receive -r [-e]\n"
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v12 2/7] remus: add libnl3 dependency for network buffering support
  2014-06-20  7:04 ` [PATCH v12 2/7] remus: add libnl3 dependency for network buffering support Yang Hongyang
@ 2014-06-20 14:43   ` Konrad Rzeszutek Wilk
  2014-06-21  6:17     ` Hongyang Yang
  0 siblings, 1 reply; 15+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-06-20 14:43 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, xen-devel, eddie.dong, rshriram,
	laijs, roger.pau

On Fri, Jun 20, 2014 at 03:04:27PM +0800, Yang Hongyang wrote:
> Libnl3 is required for controlling Remus network buffering.
> This patch adds dependency on libnl3 (>= 3.2.8) to autoconf scripts.
> Also provide ability to configure tools without libnl3 support, that
> is without network buffering support.
> 
> when there's no network buffering support,libxl__netbuffer_enabled()
> returns 0, otherwise returns 1. The callers of this api will be
> introduced in the rest of the series.
> 
> NOTE: This patch changes tools/configure.ac, please rerun
>       autogen.sh while apply the patch.
> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
>  README                          |  4 ++++
>  config/Tools.mk.in              |  4 ++++
>  tools/configure.ac              | 16 ++++++++++++++++
>  tools/libxl/Makefile            | 13 +++++++++++++
>  tools/libxl/libxl_internal.h    |  1 +
>  tools/libxl/libxl_netbuffer.c   | 31 +++++++++++++++++++++++++++++++
>  tools/libxl/libxl_nonetbuffer.c | 31 +++++++++++++++++++++++++++++++
>  tools/remus/README              |  6 ++++++
>  8 files changed, 106 insertions(+)
>  create mode 100644 tools/libxl/libxl_netbuffer.c
>  create mode 100644 tools/libxl/libxl_nonetbuffer.c
> 
> diff --git a/README b/README
> index 9bbe734..e770932 100644
> --- a/README
> +++ b/README
> @@ -72,6 +72,10 @@ disabled at compile time:
>      * cmake (if building vtpm stub domains)
>      * markdown
>      * figlet (for generating the traditional Xen start of day banner)
> +    * Development install of libnl3 (e.g., libnl-3-200,
> +      libnl-3-dev, etc).  Required if network buffering is desired
> +      when using Remus with libxl.  See tools/remus/README for detailed
> +      information.
>  
>  Second, you need to acquire a suitable kernel for use in domain 0. If
>  possible you should use a kernel provided by your OS distributor. If
> diff --git a/config/Tools.mk.in b/config/Tools.mk.in
> index 84b2612..06c9d25 100644
> --- a/config/Tools.mk.in
> +++ b/config/Tools.mk.in
> @@ -38,6 +38,9 @@ PTHREAD_LIBS        := @PTHREAD_LIBS@
>  
>  PTYFUNCS_LIBS       := @PTYFUNCS_LIBS@
>  
> +LIBNL3_LIBS         := @LIBNL3_LIBS@
> +LIBNL3_CFLAGS       := @LIBNL3_CFLAGS@
> +
>  # Download GIT repositories via HTTP or GIT's own protocol?
>  # GIT's protocol is faster and more robust, when it works at all (firewalls
>  # may block it). We make it the default, but if your GIT repository downloads
> @@ -56,6 +59,7 @@ CONFIG_QEMU_XEN     := @qemu_xen@
>  CONFIG_BLKTAP1      := @blktap1@
>  CONFIG_VTPM         := @vtpm@
>  CONFIG_QEMUU_EXTRA_ARGS:= @EXTRA_QEMUU_CONFIGURE_ARGS@
> +CONFIG_REMUS_NETBUF := @remus_netbuf@
>  
>  #System options
>  ZLIB                := @zlib@
> diff --git a/tools/configure.ac b/tools/configure.ac
> index 9db798b..6deed8f 100644
> --- a/tools/configure.ac
> +++ b/tools/configure.ac
> @@ -267,5 +267,21 @@ esac
>  # Checks for header files.
>  AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h valgrind/memcheck.h utmp.h])
>  
> +# Check for libnl3 >=3.2.8. If present enable remus network buffering.
> +PKG_CHECK_MODULES(LIBNL3, [libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8],
> +    [libnl3_lib="y"], [libnl3_lib="n"])
> +
> +AS_IF([test "x$libnl3_lib" = "xn" ], [
> +    AC_MSG_WARN([Disabling support for Remus network buffering.
> +    Please install libnl3 libraries, command line tools and devel
> +    headers - version 3.2.8 or higher])
> +    AC_SUBST(remus_netbuf, [n])
> +    ],[
> +    AC_SUBST(remus_netbuf, [y])
> +])
> +
> +AC_SUBST(LIBNL3_LIBS)
> +AC_SUBST(LIBNL3_CFLAGS)
> +
>  AC_OUTPUT()
>  
> diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
> index 7fc42c8..fdffff3 100644
> --- a/tools/libxl/Makefile
> +++ b/tools/libxl/Makefile
> @@ -21,11 +21,17 @@ endif
>  
>  LIBXL_LIBS =
>  LIBXL_LIBS = $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(LDLIBS_libxenstore) $(LDLIBS_libblktapctl) $(PTYFUNCS_LIBS) $(LIBUUID_LIBS)
> +ifeq ($(CONFIG_REMUS_NETBUF),y)
> +LIBXL_LIBS += $(LIBNL3_LIBS)
> +endif
>  
>  CFLAGS_LIBXL += $(CFLAGS_libxenctrl)
>  CFLAGS_LIBXL += $(CFLAGS_libxenguest)
>  CFLAGS_LIBXL += $(CFLAGS_libxenstore)
>  CFLAGS_LIBXL += $(CFLAGS_libblktapctl) 
> +ifeq ($(CONFIG_REMUS_NETBUF),y)
> +CFLAGS_LIBXL += $(LIBNL3_CFLAGS)
> +endif
>  CFLAGS_LIBXL += -Wshadow
>  
>  LIBXL_LIBS-$(CONFIG_ARM) += -lfdt
> @@ -43,6 +49,13 @@ LIBXL_OBJS-y += libxl_blktap2.o
>  else
>  LIBXL_OBJS-y += libxl_noblktap2.o
>  endif
> +
> +ifeq ($(CONFIG_REMUS_NETBUF),y)
> +LIBXL_OBJS-y += libxl_netbuffer.o
> +else
> +LIBXL_OBJS-y += libxl_nonetbuffer.o
> +endif
> +
>  LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
>  LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
>  
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index a0d4f24..3fc90e2 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -2470,6 +2470,7 @@ typedef struct libxl__save_helper_state {
>                        * marshalling and xc callback functions */
>  } libxl__save_helper_state;
>  
> +_hidden int libxl__netbuffer_enabled(libxl__gc *gc);
>  
>  /*----- Domain suspend (save) state structure -----*/
>  
> diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
> new file mode 100644
> index 0000000..8e23d75
> --- /dev/null
> +++ b/tools/libxl/libxl_netbuffer.c
> @@ -0,0 +1,31 @@
> +/*
> + * Copyright (C) 2013

.. There should be something past 2013. And you can just make
it 2014

> + * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU Lesser General Public License as published
> + * by the Free Software Foundation; version 2.1 only. with the special
> + * exception on linking described in file LICENSE.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU Lesser General Public License for more details.
> + */
> +
> +#include "libxl_osdeps.h" /* must come before any other headers */
> +
> +#include "libxl_internal.h"
> +
> +int libxl__netbuffer_enabled(libxl__gc *gc)
> +{
> +    return 1;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
> new file mode 100644
> index 0000000..6aa4bf1
> --- /dev/null
> +++ b/tools/libxl/libxl_nonetbuffer.c
> @@ -0,0 +1,31 @@
> +/*
> + * Copyright (C) 2013

Ditto.
> + * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU Lesser General Public License as published
> + * by the Free Software Foundation; version 2.1 only. with the special
> + * exception on linking described in file LICENSE.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU Lesser General Public License for more details.
> + */
> +
> +#include "libxl_osdeps.h" /* must come before any other headers */
> +
> +#include "libxl_internal.h"
> +
> +int libxl__netbuffer_enabled(libxl__gc *gc)
> +{
> +    return 0;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/tools/remus/README b/tools/remus/README
> index 9e8140b..4736252 100644
> --- a/tools/remus/README
> +++ b/tools/remus/README
> @@ -2,3 +2,9 @@ Remus provides fault tolerance for virtual machines by sending continuous
>  checkpoints to a backup, which will activate if the target VM fails.
>  
>  See the website at http://nss.cs.ubc.ca/remus/ for details.
> +
> +Using Remus with libxl on Xen 4.4 and higher:

Might want to say Xen 4.5 and higher since this is for Xen 4.5
> + To enable network buffering, you need libnl 3.2.8
> + or higher along with the development headers and command line utilities.
> + If your distro does not have the appropriate libnl3 version, you can find
> + the latest source tarball of libnl3 at http://www.carisma.slowglass.com/~tgr/libnl/
> -- 
> 1.9.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v12 2/7] remus: add libnl3 dependency for network buffering support
  2014-06-20 14:43   ` Konrad Rzeszutek Wilk
@ 2014-06-21  6:17     ` Hongyang Yang
  0 siblings, 0 replies; 15+ messages in thread
From: Hongyang Yang @ 2014-06-21  6:17 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, ian.jackson, xen-devel, eddie.dong, rshriram,
	laijs, roger.pau

Thank you Konrad, will fix those in next version.

On 06/20/2014 10:43 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Jun 20, 2014 at 03:04:27PM +0800, Yang Hongyang wrote:
>> Libnl3 is required for controlling Remus network buffering.
>> This patch adds dependency on libnl3 (>= 3.2.8) to autoconf scripts.
>> Also provide ability to configure tools without libnl3 support, that
>> is without network buffering support.
>>
>> when there's no network buffering support,libxl__netbuffer_enabled()
>> returns 0, otherwise returns 1. The callers of this api will be
>> introduced in the rest of the series.
>>
>> NOTE: This patch changes tools/configure.ac, please rerun
>>        autogen.sh while apply the patch.
>> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
>> ---
>>   README                          |  4 ++++
>>   config/Tools.mk.in              |  4 ++++
>>   tools/configure.ac              | 16 ++++++++++++++++
>>   tools/libxl/Makefile            | 13 +++++++++++++
>>   tools/libxl/libxl_internal.h    |  1 +
>>   tools/libxl/libxl_netbuffer.c   | 31 +++++++++++++++++++++++++++++++
>>   tools/libxl/libxl_nonetbuffer.c | 31 +++++++++++++++++++++++++++++++
>>   tools/remus/README              |  6 ++++++
>>   8 files changed, 106 insertions(+)
>>   create mode 100644 tools/libxl/libxl_netbuffer.c
>>   create mode 100644 tools/libxl/libxl_nonetbuffer.c
>>
>> diff --git a/README b/README
>> index 9bbe734..e770932 100644
>> --- a/README
>> +++ b/README
>> @@ -72,6 +72,10 @@ disabled at compile time:
>>       * cmake (if building vtpm stub domains)
>>       * markdown
>>       * figlet (for generating the traditional Xen start of day banner)
>> +    * Development install of libnl3 (e.g., libnl-3-200,
>> +      libnl-3-dev, etc).  Required if network buffering is desired
>> +      when using Remus with libxl.  See tools/remus/README for detailed
>> +      information.
>>
>>   Second, you need to acquire a suitable kernel for use in domain 0. If
>>   possible you should use a kernel provided by your OS distributor. If
>> diff --git a/config/Tools.mk.in b/config/Tools.mk.in
>> index 84b2612..06c9d25 100644
>> --- a/config/Tools.mk.in
>> +++ b/config/Tools.mk.in
>> @@ -38,6 +38,9 @@ PTHREAD_LIBS        := @PTHREAD_LIBS@
>>
>>   PTYFUNCS_LIBS       := @PTYFUNCS_LIBS@
>>
>> +LIBNL3_LIBS         := @LIBNL3_LIBS@
>> +LIBNL3_CFLAGS       := @LIBNL3_CFLAGS@
>> +
>>   # Download GIT repositories via HTTP or GIT's own protocol?
>>   # GIT's protocol is faster and more robust, when it works at all (firewalls
>>   # may block it). We make it the default, but if your GIT repository downloads
>> @@ -56,6 +59,7 @@ CONFIG_QEMU_XEN     := @qemu_xen@
>>   CONFIG_BLKTAP1      := @blktap1@
>>   CONFIG_VTPM         := @vtpm@
>>   CONFIG_QEMUU_EXTRA_ARGS:= @EXTRA_QEMUU_CONFIGURE_ARGS@
>> +CONFIG_REMUS_NETBUF := @remus_netbuf@
>>
>>   #System options
>>   ZLIB                := @zlib@
>> diff --git a/tools/configure.ac b/tools/configure.ac
>> index 9db798b..6deed8f 100644
>> --- a/tools/configure.ac
>> +++ b/tools/configure.ac
>> @@ -267,5 +267,21 @@ esac
>>   # Checks for header files.
>>   AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h valgrind/memcheck.h utmp.h])
>>
>> +# Check for libnl3 >=3.2.8. If present enable remus network buffering.
>> +PKG_CHECK_MODULES(LIBNL3, [libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8],
>> +    [libnl3_lib="y"], [libnl3_lib="n"])
>> +
>> +AS_IF([test "x$libnl3_lib" = "xn" ], [
>> +    AC_MSG_WARN([Disabling support for Remus network buffering.
>> +    Please install libnl3 libraries, command line tools and devel
>> +    headers - version 3.2.8 or higher])
>> +    AC_SUBST(remus_netbuf, [n])
>> +    ],[
>> +    AC_SUBST(remus_netbuf, [y])
>> +])
>> +
>> +AC_SUBST(LIBNL3_LIBS)
>> +AC_SUBST(LIBNL3_CFLAGS)
>> +
>>   AC_OUTPUT()
>>
>> diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
>> index 7fc42c8..fdffff3 100644
>> --- a/tools/libxl/Makefile
>> +++ b/tools/libxl/Makefile
>> @@ -21,11 +21,17 @@ endif
>>
>>   LIBXL_LIBS =
>>   LIBXL_LIBS = $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(LDLIBS_libxenstore) $(LDLIBS_libblktapctl) $(PTYFUNCS_LIBS) $(LIBUUID_LIBS)
>> +ifeq ($(CONFIG_REMUS_NETBUF),y)
>> +LIBXL_LIBS += $(LIBNL3_LIBS)
>> +endif
>>
>>   CFLAGS_LIBXL += $(CFLAGS_libxenctrl)
>>   CFLAGS_LIBXL += $(CFLAGS_libxenguest)
>>   CFLAGS_LIBXL += $(CFLAGS_libxenstore)
>>   CFLAGS_LIBXL += $(CFLAGS_libblktapctl)
>> +ifeq ($(CONFIG_REMUS_NETBUF),y)
>> +CFLAGS_LIBXL += $(LIBNL3_CFLAGS)
>> +endif
>>   CFLAGS_LIBXL += -Wshadow
>>
>>   LIBXL_LIBS-$(CONFIG_ARM) += -lfdt
>> @@ -43,6 +49,13 @@ LIBXL_OBJS-y += libxl_blktap2.o
>>   else
>>   LIBXL_OBJS-y += libxl_noblktap2.o
>>   endif
>> +
>> +ifeq ($(CONFIG_REMUS_NETBUF),y)
>> +LIBXL_OBJS-y += libxl_netbuffer.o
>> +else
>> +LIBXL_OBJS-y += libxl_nonetbuffer.o
>> +endif
>> +
>>   LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
>>   LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
>>
>> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
>> index a0d4f24..3fc90e2 100644
>> --- a/tools/libxl/libxl_internal.h
>> +++ b/tools/libxl/libxl_internal.h
>> @@ -2470,6 +2470,7 @@ typedef struct libxl__save_helper_state {
>>                         * marshalling and xc callback functions */
>>   } libxl__save_helper_state;
>>
>> +_hidden int libxl__netbuffer_enabled(libxl__gc *gc);
>>
>>   /*----- Domain suspend (save) state structure -----*/
>>
>> diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
>> new file mode 100644
>> index 0000000..8e23d75
>> --- /dev/null
>> +++ b/tools/libxl/libxl_netbuffer.c
>> @@ -0,0 +1,31 @@
>> +/*
>> + * Copyright (C) 2013
>
> .. There should be something past 2013. And you can just make
> it 2014
>
>> + * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU Lesser General Public License as published
>> + * by the Free Software Foundation; version 2.1 only. with the special
>> + * exception on linking described in file LICENSE.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU Lesser General Public License for more details.
>> + */
>> +
>> +#include "libxl_osdeps.h" /* must come before any other headers */
>> +
>> +#include "libxl_internal.h"
>> +
>> +int libxl__netbuffer_enabled(libxl__gc *gc)
>> +{
>> +    return 1;
>> +}
>> +
>> +/*
>> + * Local variables:
>> + * mode: C
>> + * c-basic-offset: 4
>> + * indent-tabs-mode: nil
>> + * End:
>> + */
>> diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
>> new file mode 100644
>> index 0000000..6aa4bf1
>> --- /dev/null
>> +++ b/tools/libxl/libxl_nonetbuffer.c
>> @@ -0,0 +1,31 @@
>> +/*
>> + * Copyright (C) 2013
>
> Ditto.
>> + * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU Lesser General Public License as published
>> + * by the Free Software Foundation; version 2.1 only. with the special
>> + * exception on linking described in file LICENSE.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU Lesser General Public License for more details.
>> + */
>> +
>> +#include "libxl_osdeps.h" /* must come before any other headers */
>> +
>> +#include "libxl_internal.h"
>> +
>> +int libxl__netbuffer_enabled(libxl__gc *gc)
>> +{
>> +    return 0;
>> +}
>> +
>> +/*
>> + * Local variables:
>> + * mode: C
>> + * c-basic-offset: 4
>> + * indent-tabs-mode: nil
>> + * End:
>> + */
>> diff --git a/tools/remus/README b/tools/remus/README
>> index 9e8140b..4736252 100644
>> --- a/tools/remus/README
>> +++ b/tools/remus/README
>> @@ -2,3 +2,9 @@ Remus provides fault tolerance for virtual machines by sending continuous
>>   checkpoints to a backup, which will activate if the target VM fails.
>>
>>   See the website at http://nss.cs.ubc.ca/remus/ for details.
>> +
>> +Using Remus with libxl on Xen 4.4 and higher:
>
> Might want to say Xen 4.5 and higher since this is for Xen 4.5
>> + To enable network buffering, you need libnl 3.2.8
>> + or higher along with the development headers and command line utilities.
>> + If your distro does not have the appropriate libnl3 version, you can find
>> + the latest source tarball of libnl3 at http://www.carisma.slowglass.com/~tgr/libnl/
>> --
>> 1.9.1
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v12 3/7] remus: introduce remus device
  2014-06-20  7:04 ` [PATCH v12 3/7] remus: introduce remus device Yang Hongyang
@ 2014-06-27 17:38   ` Ian Jackson
  2014-06-30  5:02     ` Hongyang Yang
                       ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Ian Jackson @ 2014-06-27 17:38 UTC (permalink / raw)
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, roger.pau, laijs

Yang Hongyang writes ("[PATCH v12 3/7] remus: introduce remus device"):
> introduce remus device, an abstract layer of remus devices(nic, disk,
> etc).It provides the following APIs for libxl:

Thanks for this.  Things are much clearer for me now.  I have some
more detailed comments.

> +struct libxl__remus_device_ops {
> +    /*
> +     * init() and destroy() APIs are produced by a device type and
> +     * consumed by the main remus code, a device type must implement
> +     * these two APIs.
> +     */
> +    /* init device ops private data, etc. must implement */
> +    int (*init)(libxl__remus_device_ops *self,
> +                libxl__remus_state *rs);
> +    /* free device ops private data, etc. must implement */
> +    void (*destroy)(libxl__remus_device_ops *self);
> +    /*
> +     * This is device ops's private data, for different device types,
> +     * the data structs are different
> +     */
> +    void *data;

I see that in 4/7 you use this to store global state; it's not const.

libxl is not supposed to to have any global state that's not hung off
the libxl_ctx (or some similar) structure.  So I think things like the
NIC data need to be in libxl_ctx.

You have two reasonable options, I think: either, include the
variables you need directly in the libxl_ctx.  Or, make libxl_ctx
contain a pointer to a struct which is declared in libxl_internal.h
but only defined in (say) libxl_netbuffer.c.  The latter is probably
better because it more easily allows different
platorms'/implementations' versions of the same device kind to have
different variables.

It is IMO fine for the different devices to each have their own
member in libxl_ctx.


All the libxl__remus_device_ops structs need to be const.


Taking as an example:

> +    if (rds->num_devices == rds->num_setuped)

Can you please write "num_set_up" ?  I'm afraid that "setuped" isn't a
word in English and it reads very oddly.  Thanks.


> +    /*
> +     * This is for matching, we must go through all device ops until we
> +     * find a matched op for the device. The ops_index record which ops
> +     * we are matching.
> +     */
> +    int ops_index;
> +    libxl__remus_device_ops *ops;
> +    libxl__remus_device_callback *callback;
> +    libxl__remus_device_state *rds;

I think this is confusing.  Are all of these private for the abstract
layer ?  I think the remus concrete device ought to be allowed to use
libxl__remus_device_ops (which needs to be const).

It is more important to say who owns a struct field (who is
allowed to read it; who is responsible for setting it, etc.) than to
say precisely what it is for.

If you're going to have both kinds of comments in the same struct
(that is, both sections which have particular ownership and individual
usage comments) it would be helpful to make ownership section comments
more obvious.

Perhaps something like:

  struct libxl__remus_device {
      /*----- shared between abstract and concrete layers -----*/
      /* set by remus device abstruct layer */
      int devid;
      /* libxl__device_* which this remus device related to */
      const void *backend_dev;
      libxl__remus_device_kind kind;

      /*----- private for abstract layer only -----*/
      /*
       * This is for matching, we must go through all device ops until we
       * find a matched op for the device. The ops_index record which ops
       * we are matching.
       */
      int ops_index;
      libxl__remus_device_ops *ops;
      libxl__remus_device_callback *callback;
      libxl__remus_device_state *rds;

      /*----- private for concrete (device-specific) layer -----*/
      /* *kind* of device's private data */
      void *data;
      /* for calling scripts, eg. setup|teardown|match scripts */
      libxl__async_exec_state aes;
      /*
       * for async func calls, in the implenmentation of device ops, we
       * may use fork to do async ops. this is owned by device-specific
       * ops methods
       */
      libxl__ev_child child;
  };

Writing it like that shows another anomaly: why do we have fields in
the libxl__remus_device struct that are private for the device
specific layer ?

After all the device-specific layer has "data", which could contain
anything it likes.



Why is the libxl__remus_device_state a separate structure ?
Can it be combined with libxl__remus_device_state ?


> +static libxl__remus_device_ops *dev_ops[] = {
> +};

As I say, this needs to be const.


> +static void device_common_cb(libxl__egc *egc,
> +                             libxl__remus_device *dev,
> +                             int rc)

When we write this kind of callback threaded code, we write it in
chronological order.  That is:
 - banner comment at the top
 - necessary forward declarations (and data types)
 - entrypoint
 - callbacks, in order in which they are called
 - banner comment for next section

See for example the utilities in libxl_aoutils.c.

Or for a bigger example see libxl_bootloader.c, in particular the
section after the comment "main flow of control" near line 304.

> +static __attribute__((unused)) void libxl__remus_device_init(libxl__egc *egc,
> +                                     libxl__remus_device_state *rds,
> +                                     libxl__remus_device_kind kind,
> +                                     void *libxl_dev)
...
> +    switch (kind) {
> +        case LIBXL__REMUS_DEVICE_NIC:
> +            nic = libxl_dev;
> +            dev->devid = nic->devid;
> +            break;
> +        case LIBXL__REMUS_DEVICE_DISK:
> +            disk = libxl_dev;
> +            /* there are no dev id for disk devices */
> +            dev->devid = -1;
> +            break;

Wouldn't it be better to move this into the concrete (device type
specific) code ?  After all you're switching on the device type
already, and the device type specific code has all the same
information ?

Or am I missing something ?

> +void libxl__remus_device_teardown(libxl__egc *egc, libxl__remus_state *rs)
> +{
> +    int i;
> +    libxl__remus_device *dev;
> +    libxl__remus_device_ops *ops;
> +
> +    STATE_AO_GC(rs->ao);
> +
> +    /* Convenience aliases */
> +    libxl__remus_device_state *const rds = &rs->dev_state;
> +
> +    if (rds->num_setuped == 0) {
> +        /* clean device ops */
> +        for (i = 0; i < ARRAY_SIZE(dev_ops); i++) {
> +            ops = dev_ops[i];
> +            ops->destroy(ops);
> +        }
> +        goto out;
> +    }

I was a bit confused by the nonlinear flow of control in this file,
but it is definitely clear to me that you have two copies of this
loop.

Why don't you just make the callback to your own callback function
yourself (under appropriate conditions) ?

> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 1018142..cc5d390 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -43,6 +43,7 @@ libxl_error = Enumeration("error", [
>      (-12, "OSEVENT_REG_FAIL"),
>      (-13, "BUFFERFULL"),
>      (-14, "UNKNOWN_CHILD"),
> +    (-15, "NOT_MATCH"),

I still think this probably wants to be a better error message.

REMUS_UNSUPPORTED_DEVICE ?

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v12 3/7] remus: introduce remus device
  2014-06-27 17:38   ` Ian Jackson
@ 2014-06-30  5:02     ` Hongyang Yang
  2014-07-01  2:04       ` Wen Congyang
  2014-07-01  3:48     ` Hongyang Yang
  2014-07-01  8:31     ` Hongyang Yang
  2 siblings, 1 reply; 15+ messages in thread
From: Hongyang Yang @ 2014-06-30  5:02 UTC (permalink / raw)
  To: Ian Jackson
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, laijs, roger.pau

Hi Ian,
   Thanks for the review!

On 06/28/2014 01:38 AM, Ian Jackson wrote:
> Yang Hongyang writes ("[PATCH v12 3/7] remus: introduce remus device"):
>> introduce remus device, an abstract layer of remus devices(nic, disk,
>> etc).It provides the following APIs for libxl:
>
> Thanks for this.  Things are much clearer for me now.  I have some
> more detailed comments.
>
>> +struct libxl__remus_device_ops {
>> +    /*
>> +     * init() and destroy() APIs are produced by a device type and
>> +     * consumed by the main remus code, a device type must implement
>> +     * these two APIs.
>> +     */
>> +    /* init device ops private data, etc. must implement */
>> +    int (*init)(libxl__remus_device_ops *self,
>> +                libxl__remus_state *rs);
>> +    /* free device ops private data, etc. must implement */
>> +    void (*destroy)(libxl__remus_device_ops *self);
>> +    /*
>> +     * This is device ops's private data, for different device types,
>> +     * the data structs are different
>> +     */
>> +    void *data;
>
> I see that in 4/7 you use this to store global state; it's not const.
>
> libxl is not supposed to to have any global state that's not hung off
> the libxl_ctx (or some similar) structure.  So I think things like the
> NIC data need to be in libxl_ctx.
>
> You have two reasonable options, I think: either, include the
> variables you need directly in the libxl_ctx.  Or, make libxl_ctx
> contain a pointer to a struct which is declared in libxl_internal.h
> but only defined in (say) libxl_netbuffer.c.  The latter is probably
> better because it more easily allows different
> platorms'/implementations' versions of the same device kind to have
> different variables.

I will take option 2.

>
> It is IMO fine for the different devices to each have their own
> member in libxl_ctx.
>
>
> All the libxl__remus_device_ops structs need to be const.
>
>
> Taking as an example:
>
>> +    if (rds->num_devices == rds->num_setuped)
>
> Can you please write "num_set_up" ?  I'm afraid that "setuped" isn't a
> word in English and it reads very oddly.  Thanks.

Sorry for the bad English...will fix that.

>
>
>> +    /*
>> +     * This is for matching, we must go through all device ops until we
>> +     * find a matched op for the device. The ops_index record which ops
>> +     * we are matching.
>> +     */
>> +    int ops_index;
>> +    libxl__remus_device_ops *ops;
>> +    libxl__remus_device_callback *callback;
>> +    libxl__remus_device_state *rds;
>
> I think this is confusing.  Are all of these private for the abstract
> layer ?  I think the remus concrete device ought to be allowed to use
> libxl__remus_device_ops (which needs to be const).
>
> It is more important to say who owns a struct field (who is
> allowed to read it; who is responsible for setting it, etc.) than to
> say precisely what it is for.
>
> If you're going to have both kinds of comments in the same struct
> (that is, both sections which have particular ownership and individual
> usage comments) it would be helpful to make ownership section comments
> more obvious.
>
> Perhaps something like:
>
>    struct libxl__remus_device {
>        /*----- shared between abstract and concrete layers -----*/
>        /* set by remus device abstruct layer */
>        int devid;
>        /* libxl__device_* which this remus device related to */
>        const void *backend_dev;
>        libxl__remus_device_kind kind;
>
>        /*----- private for abstract layer only -----*/
>        /*
>         * This is for matching, we must go through all device ops until we
>         * find a matched op for the device. The ops_index record which ops
>         * we are matching.
>         */
>        int ops_index;
>        libxl__remus_device_ops *ops;
>        libxl__remus_device_callback *callback;
>        libxl__remus_device_state *rds;
>
>        /*----- private for concrete (device-specific) layer -----*/
>        /* *kind* of device's private data */
>        void *data;
>        /* for calling scripts, eg. setup|teardown|match scripts */
>        libxl__async_exec_state aes;
>        /*
>         * for async func calls, in the implenmentation of device ops, we
>         * may use fork to do async ops. this is owned by device-specific
>         * ops methods
>         */
>        libxl__ev_child child;
>    };
>
> Writing it like that shows another anomaly: why do we have fields in
> the libxl__remus_device struct that are private for the device
> specific layer ?
>
> After all the device-specific layer has "data", which could contain
> anything it likes.
>
>
>
> Why is the libxl__remus_device_state a separate structure ?
> Can it be combined with libxl__remus_device_state ?

I think you mean combined with libxl__remus_device here, yes, it can be
combined because it is a member of libxl__remus_device, I separate it
because I want to make the structure more clear, but seems it is
confusing here, I will combine it in the next version.

>
>
>> +static libxl__remus_device_ops *dev_ops[] = {
>> +};
>
> As I say, this needs to be const.

Sure, thanks.

>
>
>> +static void device_common_cb(libxl__egc *egc,
>> +                             libxl__remus_device *dev,
>> +                             int rc)
>
> When we write this kind of callback threaded code, we write it in
> chronological order.  That is:
>   - banner comment at the top
>   - necessary forward declarations (and data types)
>   - entrypoint
>   - callbacks, in order in which they are called
>   - banner comment for next section
>
> See for example the utilities in libxl_aoutils.c.
>
> Or for a bigger example see libxl_bootloader.c, in particular the
> section after the comment "main flow of control" near line 304.

OK, thank you!

>
>> +static __attribute__((unused)) void libxl__remus_device_init(libxl__egc *egc,
>> +                                     libxl__remus_device_state *rds,
>> +                                     libxl__remus_device_kind kind,
>> +                                     void *libxl_dev)
> ...
>> +    switch (kind) {
>> +        case LIBXL__REMUS_DEVICE_NIC:
>> +            nic = libxl_dev;
>> +            dev->devid = nic->devid;
>> +            break;
>> +        case LIBXL__REMUS_DEVICE_DISK:
>> +            disk = libxl_dev;
>> +            /* there are no dev id for disk devices */
>> +            dev->devid = -1;
>> +            break;
>
> Wouldn't it be better to move this into the concrete (device type
> specific) code ?  After all you're switching on the device type
> already, and the device type specific code has all the same
> information ?

Do you mean to have two functions like:
libxl__remus_nic_init
libxl__remus_disk_init
There are some common code in this init function, if we separate
out, there should be two copy of those common codes, or I
could separate those common codes into a common function.

>
> Or am I missing something ?
>
>> +void libxl__remus_device_teardown(libxl__egc *egc, libxl__remus_state *rs)
>> +{
>> +    int i;
>> +    libxl__remus_device *dev;
>> +    libxl__remus_device_ops *ops;
>> +
>> +    STATE_AO_GC(rs->ao);
>> +
>> +    /* Convenience aliases */
>> +    libxl__remus_device_state *const rds = &rs->dev_state;
>> +
>> +    if (rds->num_setuped == 0) {
>> +        /* clean device ops */
>> +        for (i = 0; i < ARRAY_SIZE(dev_ops); i++) {
>> +            ops = dev_ops[i];
>> +            ops->destroy(ops);
>> +        }
>> +        goto out;
>> +    }
>
> I was a bit confused by the nonlinear flow of control in this file,
> but it is definitely clear to me that you have two copies of this
> loop.
>
> Why don't you just make the callback to your own callback function
> yourself (under appropriate conditions) ?

Good suggestion, thank you.

>
>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
>> index 1018142..cc5d390 100644
>> --- a/tools/libxl/libxl_types.idl
>> +++ b/tools/libxl/libxl_types.idl
>> @@ -43,6 +43,7 @@ libxl_error = Enumeration("error", [
>>       (-12, "OSEVENT_REG_FAIL"),
>>       (-13, "BUFFERFULL"),
>>       (-14, "UNKNOWN_CHILD"),
>> +    (-15, "NOT_MATCH"),
>
> I still think this probably wants to be a better error message.
>
> REMUS_UNSUPPORTED_DEVICE ?

I think it does not mean unsupported device here. when we return not match, it
means this devices ops does not match the given device, we then should
try to match next devices ops with the given device, so I think
REMUS_DEVOPS_NOT_MATCH should be better? What do you think?

>
> Thanks,
> Ian.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v12 3/7] remus: introduce remus device
  2014-06-30  5:02     ` Hongyang Yang
@ 2014-07-01  2:04       ` Wen Congyang
  0 siblings, 0 replies; 15+ messages in thread
From: Wen Congyang @ 2014-07-01  2:04 UTC (permalink / raw)
  To: Hongyang Yang, Ian Jackson
  Cc: ian.campbell, stefano.stabellini, andrew.cooper3, yunhong.jiang,
	eddie.dong, xen-devel, rshriram, laijs, roger.pau

At 06/30/2014 01:02 PM, Hongyang Yang Wrote:
> Hi Ian,
>   Thanks for the review!
> 
> On 06/28/2014 01:38 AM, Ian Jackson wrote:
>> Yang Hongyang writes ("[PATCH v12 3/7] remus: introduce remus device"):
>>> introduce remus device, an abstract layer of remus devices(nic, disk,
>>> etc).It provides the following APIs for libxl:
>>
>> Thanks for this.  Things are much clearer for me now.  I have some
>> more detailed comments.
>>
>>> +struct libxl__remus_device_ops {
>>> +    /*
>>> +     * init() and destroy() APIs are produced by a device type and
>>> +     * consumed by the main remus code, a device type must implement
>>> +     * these two APIs.
>>> +     */
>>> +    /* init device ops private data, etc. must implement */
>>> +    int (*init)(libxl__remus_device_ops *self,
>>> +                libxl__remus_state *rs);
>>> +    /* free device ops private data, etc. must implement */
>>> +    void (*destroy)(libxl__remus_device_ops *self);
>>> +    /*
>>> +     * This is device ops's private data, for different device types,
>>> +     * the data structs are different
>>> +     */
>>> +    void *data;
>>
>> I see that in 4/7 you use this to store global state; it's not const.
>>
>> libxl is not supposed to to have any global state that's not hung off
>> the libxl_ctx (or some similar) structure.  So I think things like the
>> NIC data need to be in libxl_ctx.
>>
>> You have two reasonable options, I think: either, include the
>> variables you need directly in the libxl_ctx.  Or, make libxl_ctx
>> contain a pointer to a struct which is declared in libxl_internal.h
>> but only defined in (say) libxl_netbuffer.c.  The latter is probably
>> better because it more easily allows different
>> platorms'/implementations' versions of the same device kind to have
>> different variables.
> 
> I will take option 2.
> 
>>
>> It is IMO fine for the different devices to each have their own
>> member in libxl_ctx.
>>
>>
>> All the libxl__remus_device_ops structs need to be const.
>>
>>
>> Taking as an example:
>>
>>> +    if (rds->num_devices == rds->num_setuped)
>>
>> Can you please write "num_set_up" ?  I'm afraid that "setuped" isn't a
>> word in English and it reads very oddly.  Thanks.
> 
> Sorry for the bad English...will fix that.
> 
>>
>>
>>> +    /*
>>> +     * This is for matching, we must go through all device ops until we
>>> +     * find a matched op for the device. The ops_index record which ops
>>> +     * we are matching.
>>> +     */
>>> +    int ops_index;
>>> +    libxl__remus_device_ops *ops;
>>> +    libxl__remus_device_callback *callback;
>>> +    libxl__remus_device_state *rds;
>>
>> I think this is confusing.  Are all of these private for the abstract
>> layer ?  I think the remus concrete device ought to be allowed to use
>> libxl__remus_device_ops (which needs to be const).
>>
>> It is more important to say who owns a struct field (who is
>> allowed to read it; who is responsible for setting it, etc.) than to
>> say precisely what it is for.
>>
>> If you're going to have both kinds of comments in the same struct
>> (that is, both sections which have particular ownership and individual
>> usage comments) it would be helpful to make ownership section comments
>> more obvious.
>>
>> Perhaps something like:
>>
>>    struct libxl__remus_device {
>>        /*----- shared between abstract and concrete layers -----*/
>>        /* set by remus device abstruct layer */
>>        int devid;
>>        /* libxl__device_* which this remus device related to */
>>        const void *backend_dev;
>>        libxl__remus_device_kind kind;
>>
>>        /*----- private for abstract layer only -----*/
>>        /*
>>         * This is for matching, we must go through all device ops until we
>>         * find a matched op for the device. The ops_index record which ops
>>         * we are matching.
>>         */
>>        int ops_index;
>>        libxl__remus_device_ops *ops;
>>        libxl__remus_device_callback *callback;
>>        libxl__remus_device_state *rds;
>>
>>        /*----- private for concrete (device-specific) layer -----*/
>>        /* *kind* of device's private data */
>>        void *data;
>>        /* for calling scripts, eg. setup|teardown|match scripts */
>>        libxl__async_exec_state aes;
>>        /*
>>         * for async func calls, in the implenmentation of device ops, we
>>         * may use fork to do async ops. this is owned by device-specific
>>         * ops methods
>>         */
>>        libxl__ev_child child;
>>    };
>>
>> Writing it like that shows another anomaly: why do we have fields in
>> the libxl__remus_device struct that are private for the device
>> specific layer ?
>>
>> After all the device-specific layer has "data", which could contain
>> anything it likes.
>>
>>
>>
>> Why is the libxl__remus_device_state a separate structure ?
>> Can it be combined with libxl__remus_device_state ?
> 
> I think you mean combined with libxl__remus_device here, yes, it can be
> combined because it is a member of libxl__remus_device, I separate it
> because I want to make the structure more clear, but seems it is
> confusing here, I will combine it in the next version.

Hmm, I think this abstract layer can be reused by colo, and the ops can
not be reused.

So I think we should not combine it with libxl__remus_state.

Thanks
Wen Congyang

> 
>>
>>
>>> +static libxl__remus_device_ops *dev_ops[] = {
>>> +};
>>
>> As I say, this needs to be const.
> 
> Sure, thanks.
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v12 3/7] remus: introduce remus device
  2014-06-27 17:38   ` Ian Jackson
  2014-06-30  5:02     ` Hongyang Yang
@ 2014-07-01  3:48     ` Hongyang Yang
  2014-07-01  8:31     ` Hongyang Yang
  2 siblings, 0 replies; 15+ messages in thread
From: Hongyang Yang @ 2014-07-01  3:48 UTC (permalink / raw)
  To: Ian Jackson
  Cc: ian.campbell, wency, stefano.stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, laijs, roger.pau

Hi Ian,

On 06/28/2014 01:38 AM, Ian Jackson wrote:
> Yang Hongyang writes ("[PATCH v12 3/7] remus: introduce remus device"):
>> introduce remus device, an abstract layer of remus devices(nic, disk,
>> etc).It provides the following APIs for libxl:
>
> Thanks for this.  Things are much clearer for me now.  I have some
> more detailed comments.
>
>> +struct libxl__remus_device_ops {
>> +    /*
>> +     * init() and destroy() APIs are produced by a device type and
>> +     * consumed by the main remus code, a device type must implement
>> +     * these two APIs.
>> +     */
>> +    /* init device ops private data, etc. must implement */
>> +    int (*init)(libxl__remus_device_ops *self,
>> +                libxl__remus_state *rs);
>> +    /* free device ops private data, etc. must implement */
>> +    void (*destroy)(libxl__remus_device_ops *self);
>> +    /*
>> +     * This is device ops's private data, for different device types,
>> +     * the data structs are different
>> +     */
>> +    void *data;
>
> I see that in 4/7 you use this to store global state; it's not const.
>
> libxl is not supposed to to have any global state that's not hung off
> the libxl_ctx (or some similar) structure.  So I think things like the
> NIC data need to be in libxl_ctx.
>
> You have two reasonable options, I think: either, include the
> variables you need directly in the libxl_ctx.  Or, make libxl_ctx
> contain a pointer to a struct which is declared in libxl_internal.h
> but only defined in (say) libxl_netbuffer.c.  The latter is probably
> better because it more easily allows different
> platorms'/implementations' versions of the same device kind to have
> different variables.
>
> It is IMO fine for the different devices to each have their own
> member in libxl_ctx.
>
>
> All the libxl__remus_device_ops structs need to be const.
>
>
> Taking as an example:
>
>> +    if (rds->num_devices == rds->num_setuped)
>
> Can you please write "num_set_up" ?  I'm afraid that "setuped" isn't a
> word in English and it reads very oddly.  Thanks.
>
>
>> +    /*
>> +     * This is for matching, we must go through all device ops until we
>> +     * find a matched op for the device. The ops_index record which ops
>> +     * we are matching.
>> +     */
>> +    int ops_index;
>> +    libxl__remus_device_ops *ops;
>> +    libxl__remus_device_callback *callback;
>> +    libxl__remus_device_state *rds;
>
> I think this is confusing.  Are all of these private for the abstract
> layer ?  I think the remus concrete device ought to be allowed to use
> libxl__remus_device_ops (which needs to be const).
>
> It is more important to say who owns a struct field (who is
> allowed to read it; who is responsible for setting it, etc.) than to
> say precisely what it is for.
>
> If you're going to have both kinds of comments in the same struct
> (that is, both sections which have particular ownership and individual
> usage comments) it would be helpful to make ownership section comments
> more obvious.
>
> Perhaps something like:
>
>    struct libxl__remus_device {
>        /*----- shared between abstract and concrete layers -----*/
>        /* set by remus device abstruct layer */
>        int devid;
>        /* libxl__device_* which this remus device related to */
>        const void *backend_dev;
>        libxl__remus_device_kind kind;
>
>        /*----- private for abstract layer only -----*/
>        /*
>         * This is for matching, we must go through all device ops until we
>         * find a matched op for the device. The ops_index record which ops
>         * we are matching.
>         */
>        int ops_index;
>        libxl__remus_device_ops *ops;
>        libxl__remus_device_callback *callback;
>        libxl__remus_device_state *rds;
>
>        /*----- private for concrete (device-specific) layer -----*/
>        /* *kind* of device's private data */
>        void *data;
>        /* for calling scripts, eg. setup|teardown|match scripts */
>        libxl__async_exec_state aes;
>        /*
>         * for async func calls, in the implementation of device ops, we
>         * may use fork to do async ops. this is owned by device-specific
>         * ops methods
>         */
>        libxl__ev_child child;
>    };
>
> Writing it like that shows another anomaly: why do we have fields in
> the libxl__remus_device struct that are private for the device
> specific layer ?

"aes" and "child" are common fields for device specific layer,
it's common for an async device. We define them here
in order to not define it in every device specific implementation.

>
> After all the device-specific layer has "data", which could contain
> anything it likes.
>
>
>
> Why is the libxl__remus_device_state a separate structure ?
> Can it be combined with libxl__remus_device_state ?
>
>
>> +static libxl__remus_device_ops *dev_ops[] = {
>> +};
>
> As I say, this needs to be const.
>
>
>> +static void device_common_cb(libxl__egc *egc,
>> +                             libxl__remus_device *dev,
>> +                             int rc)
>
> When we write this kind of callback threaded code, we write it in
> chronological order.  That is:
>   - banner comment at the top
>   - necessary forward declarations (and data types)
>   - entrypoint
>   - callbacks, in order in which they are called
>   - banner comment for next section
>
> See for example the utilities in libxl_aoutils.c.
>
> Or for a bigger example see libxl_bootloader.c, in particular the
> section after the comment "main flow of control" near line 304.
>
>> +static __attribute__((unused)) void libxl__remus_device_init(libxl__egc *egc,
>> +                                     libxl__remus_device_state *rds,
>> +                                     libxl__remus_device_kind kind,
>> +                                     void *libxl_dev)
> ...
>> +    switch (kind) {
>> +        case LIBXL__REMUS_DEVICE_NIC:
>> +            nic = libxl_dev;
>> +            dev->devid = nic->devid;
>> +            break;
>> +        case LIBXL__REMUS_DEVICE_DISK:
>> +            disk = libxl_dev;
>> +            /* there are no dev id for disk devices */
>> +            dev->devid = -1;
>> +            break;
>
> Wouldn't it be better to move this into the concrete (device type
> specific) code ?  After all you're switching on the device type
> already, and the device type specific code has all the same
> information ?
>
> Or am I missing something ?
>
>> +void libxl__remus_device_teardown(libxl__egc *egc, libxl__remus_state *rs)
>> +{
>> +    int i;
>> +    libxl__remus_device *dev;
>> +    libxl__remus_device_ops *ops;
>> +
>> +    STATE_AO_GC(rs->ao);
>> +
>> +    /* Convenience aliases */
>> +    libxl__remus_device_state *const rds = &rs->dev_state;
>> +
>> +    if (rds->num_setuped == 0) {
>> +        /* clean device ops */
>> +        for (i = 0; i < ARRAY_SIZE(dev_ops); i++) {
>> +            ops = dev_ops[i];
>> +            ops->destroy(ops);
>> +        }
>> +        goto out;
>> +    }
>
> I was a bit confused by the nonlinear flow of control in this file,
> but it is definitely clear to me that you have two copies of this
> loop.
>
> Why don't you just make the callback to your own callback function
> yourself (under appropriate conditions) ?
>
>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
>> index 1018142..cc5d390 100644
>> --- a/tools/libxl/libxl_types.idl
>> +++ b/tools/libxl/libxl_types.idl
>> @@ -43,6 +43,7 @@ libxl_error = Enumeration("error", [
>>       (-12, "OSEVENT_REG_FAIL"),
>>       (-13, "BUFFERFULL"),
>>       (-14, "UNKNOWN_CHILD"),
>> +    (-15, "NOT_MATCH"),
>
> I still think this probably wants to be a better error message.
>
> REMUS_UNSUPPORTED_DEVICE ?
>
> Thanks,
> Ian.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v12 3/7] remus: introduce remus device
  2014-06-27 17:38   ` Ian Jackson
  2014-06-30  5:02     ` Hongyang Yang
  2014-07-01  3:48     ` Hongyang Yang
@ 2014-07-01  8:31     ` Hongyang Yang
  2 siblings, 0 replies; 15+ messages in thread
From: Hongyang Yang @ 2014-07-01  8:31 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson



On 06/28/2014 01:38 AM, Ian Jackson wrote:
> Yang Hongyang writes ("[PATCH v12 3/7] remus: introduce remus device"):
>> introduce remus device, an abstract layer of remus devices(nic, disk,
>> etc).It provides the following APIs for libxl:
>
> Thanks for this.  Things are much clearer for me now.  I have some
> more detailed comments.
>
>> +struct libxl__remus_device_ops {
>> +    /*
>> +     * init() and destroy() APIs are produced by a device type and
>> +     * consumed by the main remus code, a device type must implement
>> +     * these two APIs.
>> +     */
>> +    /* init device ops private data, etc. must implement */
>> +    int (*init)(libxl__remus_device_ops *self,
>> +                libxl__remus_state *rs);
>> +    /* free device ops private data, etc. must implement */
>> +    void (*destroy)(libxl__remus_device_ops *self);
>> +    /*
>> +     * This is device ops's private data, for different device types,
>> +     * the data structs are different
>> +     */
>> +    void *data;
>
> I see that in 4/7 you use this to store global state; it's not const.
>
> libxl is not supposed to to have any global state that's not hung off
> the libxl_ctx (or some similar) structure.  So I think things like the
> NIC data need to be in libxl_ctx.
>
> You have two reasonable options, I think: either, include the
> variables you need directly in the libxl_ctx.  Or, make libxl_ctx
> contain a pointer to a struct which is declared in libxl_internal.h
> but only defined in (say) libxl_netbuffer.c.  The latter is probably
> better because it more easily allows different
> platorms'/implementations' versions of the same device kind to have
> different variables.
>
> It is IMO fine for the different devices to each have their own
> member in libxl_ctx.
>
>
> All the libxl__remus_device_ops structs need to be const.
>
>
> Taking as an example:
>
>> +    if (rds->num_devices == rds->num_setuped)
>
> Can you please write "num_set_up" ?  I'm afraid that "setuped" isn't a
> word in English and it reads very oddly.  Thanks.
>
>
>> +    /*
>> +     * This is for matching, we must go through all device ops until we
>> +     * find a matched op for the device. The ops_index record which ops
>> +     * we are matching.
>> +     */
>> +    int ops_index;
>> +    libxl__remus_device_ops *ops;
>> +    libxl__remus_device_callback *callback;
>> +    libxl__remus_device_state *rds;
>
> I think this is confusing.  Are all of these private for the abstract
> layer ?  I think the remus concrete device ought to be allowed to use
> libxl__remus_device_ops (which needs to be const).
>
> It is more important to say who owns a struct field (who is
> allowed to read it; who is responsible for setting it, etc.) than to
> say precisely what it is for.
>
> If you're going to have both kinds of comments in the same struct
> (that is, both sections which have particular ownership and individual
> usage comments) it would be helpful to make ownership section comments
> more obvious.
>
> Perhaps something like:
>
>    struct libxl__remus_device {
>        /*----- shared between abstract and concrete layers -----*/
>        /* set by remus device abstruct layer */
>        int devid;
>        /* libxl__device_* which this remus device related to */
>        const void *backend_dev;
>        libxl__remus_device_kind kind;
>
>        /*----- private for abstract layer only -----*/
>        /*
>         * This is for matching, we must go through all device ops until we
>         * find a matched op for the device. The ops_index record which ops
>         * we are matching.
>         */
>        int ops_index;
>        libxl__remus_device_ops *ops;
>        libxl__remus_device_callback *callback;
>        libxl__remus_device_state *rds;
>
>        /*----- private for concrete (device-specific) layer -----*/
>        /* *kind* of device's private data */
>        void *data;
>        /* for calling scripts, eg. setup|teardown|match scripts */
>        libxl__async_exec_state aes;
>        /*
>         * for async func calls, in the implenmentation of device ops, we
>         * may use fork to do async ops. this is owned by device-specific
>         * ops methods
>         */
>        libxl__ev_child child;
>    };
>
> Writing it like that shows another anomaly: why do we have fields in
> the libxl__remus_device struct that are private for the device
> specific layer ?
>
> After all the device-specific layer has "data", which could contain
> anything it likes.
>
>
>
> Why is the libxl__remus_device_state a separate structure ?
> Can it be combined with libxl__remus_device_state ?
>
>
>> +static libxl__remus_device_ops *dev_ops[] = {
>> +};
>
> As I say, this needs to be const.
>
>
>> +static void device_common_cb(libxl__egc *egc,
>> +                             libxl__remus_device *dev,
>> +                             int rc)
>
> When we write this kind of callback threaded code, we write it in
> chronological order.  That is:
>   - banner comment at the top
>   - necessary forward declarations (and data types)
>   - entrypoint
>   - callbacks, in order in which they are called
>   - banner comment for next section
>
> See for example the utilities in libxl_aoutils.c.
>
> Or for a bigger example see libxl_bootloader.c, in particular the
> section after the comment "main flow of control" near line 304.
>
>> +static __attribute__((unused)) void libxl__remus_device_init(libxl__egc *egc,
>> +                                     libxl__remus_device_state *rds,
>> +                                     libxl__remus_device_kind kind,
>> +                                     void *libxl_dev)
> ...
>> +    switch (kind) {
>> +        case LIBXL__REMUS_DEVICE_NIC:
>> +            nic = libxl_dev;
>> +            dev->devid = nic->devid;
>> +            break;
>> +        case LIBXL__REMUS_DEVICE_DISK:
>> +            disk = libxl_dev;
>> +            /* there are no dev id for disk devices */
>> +            dev->devid = -1;
>> +            break;
>
> Wouldn't it be better to move this into the concrete (device type
> specific) code ?  After all you're switching on the device type
> already, and the device type specific code has all the same
> information ?
>
> Or am I missing something ?
>
>> +void libxl__remus_device_teardown(libxl__egc *egc, libxl__remus_state *rs)
>> +{
>> +    int i;
>> +    libxl__remus_device *dev;
>> +    libxl__remus_device_ops *ops;
>> +
>> +    STATE_AO_GC(rs->ao);
>> +
>> +    /* Convenience aliases */
>> +    libxl__remus_device_state *const rds = &rs->dev_state;
>> +
>> +    if (rds->num_setuped == 0) {
>> +        /* clean device ops */
>> +        for (i = 0; i < ARRAY_SIZE(dev_ops); i++) {
>> +            ops = dev_ops[i];
>> +            ops->destroy(ops);
>> +        }
>> +        goto out;
>> +    }
>
> I was a bit confused by the nonlinear flow of control in this file,
> but it is definitely clear to me that you have two copies of this
> loop.
>
> Why don't you just make the callback to your own callback function
> yourself (under appropriate conditions) ?

Here we just need to clean device ops, in the callback function, it does
more than this, we can make this loop a function to avoid the duplication.

>
>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
>> index 1018142..cc5d390 100644
>> --- a/tools/libxl/libxl_types.idl
>> +++ b/tools/libxl/libxl_types.idl
>> @@ -43,6 +43,7 @@ libxl_error = Enumeration("error", [
>>       (-12, "OSEVENT_REG_FAIL"),
>>       (-13, "BUFFERFULL"),
>>       (-14, "UNKNOWN_CHILD"),
>> +    (-15, "NOT_MATCH"),
>
> I still think this probably wants to be a better error message.
>
> REMUS_UNSUPPORTED_DEVICE ?
>
> Thanks,
> Ian.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-07-01  8:31 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-20  7:04 [PATCH v12 0/7] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
2014-06-20  7:04 ` [PATCH v12 1/7] remus: make postcopy callback asynchronous Yang Hongyang
2014-06-20  7:04 ` [PATCH v12 2/7] remus: add libnl3 dependency for network buffering support Yang Hongyang
2014-06-20 14:43   ` Konrad Rzeszutek Wilk
2014-06-21  6:17     ` Hongyang Yang
2014-06-20  7:04 ` [PATCH v12 3/7] remus: introduce remus device Yang Hongyang
2014-06-27 17:38   ` Ian Jackson
2014-06-30  5:02     ` Hongyang Yang
2014-07-01  2:04       ` Wen Congyang
2014-07-01  3:48     ` Hongyang Yang
2014-07-01  8:31     ` Hongyang Yang
2014-06-20  7:04 ` [PATCH v12 4/7] remus netbuffer: implement remus network buffering for nic devices Yang Hongyang
2014-06-20  7:04 ` [PATCH v12 5/7] remus drbd: Implement remus drbd replicated disk Yang Hongyang
2014-06-20  7:04 ` [PATCH v12 6/7] libxl: network buffering cmdline switch Yang Hongyang
2014-06-20  7:04 ` [PATCH v12 7/7] libxl: disk " Yang Hongyang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.