From: Yang Hongyang <yanghy@cn.fujitsu.com>
To: xen-devel@lists.xen.org
Cc: ian.campbell@citrix.com, wency@cn.fujitsu.com,
	ian.jackson@eu.citrix.com, yunhong.jiang@intel.com,
	eddie.dong@intel.com, rshriram@cs.ubc.ca, laijs@cn.fujitsu.com
Subject: [PATCH for-4.5 v21 07/14] libxl/remus: setup and control network output buffering
Date: Fri, 26 Sep 2014 14:13:12 +0800
Message-ID: <1411711999-3183-8-git-send-email-yanghy@cn.fujitsu.com>
In-Reply-To: <1411711999-3183-1-git-send-email-yanghy@cn.fujitsu.com>

This patch adds the machinery required for protecting a guest's
network device state. It comprises two parts:

1. Hotplug scripts: The remus-netbuf-setup script is responsible for
  setting up and tearing down the necessary infrastructure required for
  network output buffering.  This script should be invoked by libxl for
  each of the guest's network interfaces, when starting or stopping Remus.

  Apart from returning success/failure indication via the usual hotplug
  entries in xenstore, this script also writes the name of the REMUS_IFB
  device used to control the vif's network output to xenstore.

  The script relies on libnl3 command line utilities to perform various
  setup/teardown functions. The script is confined to Linux platforms only
  since NetBSD does not seem to have libnl3.

2. Remus network device: Implements the interfaces required by the
   remus abstract device layer. A note about the implementation:

   a) init_subkind_nic() & cleanup_subkind_nic() are called once per Remus
      invocation. They establish and free netlink related state respectively.

   b) setup() and teardown() are called for each vif attached to the
      guest.
      During setup():
      i) The hotplug script is called to set up a network buffer on a
         given vif. The script chooses an available IFB device from
         the system, redirects vif egress traffic to the IFB device
         and sets up the plug qdisc (output buffer) on the IFB device.
         The name of the IFB device is communicated via xenstore to
         libxl.

      ii) Libxl obtains a handle to the plug qdisc using the libnl3 API
          and subsequently controls output buffering using this handle
          in the checkpoint callbacks.

      During teardown(), the hotplug scripts are called again to remove
      the vif->ifb traffic redirection, release the ifb and the plug
      qdisc associated with it.

   c) The checkpoint callbacks [postsuspend(), preresume() and commit()]
      are implemented as synchronous ops as the netlink calls associated
      with the qdisc subsystem are very fast.
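
      To illustrate the buffering semantics behind (c), here is a minimal
      shell model. It is not libxl or kernel code, and it only approximates
      how the plug qdisc is expected to behave when driven by the callbacks
      in this patch: postsuspend() requests a "buffer" operation
      (rtnl_qdisc_plug_buffer), which starts a new epoch, and commit()
      requests a "release one" operation (rtnl_qdisc_plug_release_one),
      which lets out the packets queued before the most recent "buffer"
      call. All function and variable names below are hypothetical, and
      packets are plain strings.

```shell
#!/bin/bash
# Simplified model of plug-qdisc epochs (hypothetical names; not kernel code).
queue=()        # packets held back, oldest first
cur_epoch=0     # packets queued since the last plug_buffer
last_epoch=0    # packets queued before the last plug_buffer, awaiting release
released=()     # packets that have left the buffer

# The guest transmits a packet: in this model it is always held back.
plug_enqueue() {
    queue+=("$1")
    cur_epoch=$((cur_epoch + 1))
}

# Start a new epoch (the operation postsuspend() asks for).
plug_buffer() {
    last_epoch=$((last_epoch + cur_epoch))
    cur_epoch=0
}

# Release the packets of the previous epoch (the operation commit() asks for).
plug_release_one() {
    released+=("${queue[@]:0:last_epoch}")
    queue=("${queue[@]:last_epoch}")
    last_epoch=0
}

plug_enqueue p1; plug_enqueue p2   # guest output during epoch N
plug_buffer                        # checkpoint N taken (postsuspend)
plug_enqueue p3                    # output of epoch N+1 keeps queuing
plug_release_one                   # checkpoint N committed at the backup
echo "released: ${released[*]}"
echo "still buffered: ${queue[*]}"
```

      Running the model prints "released: p1 p2" and "still buffered: p3":
      output produced after a checkpoint stays queued until that later
      checkpoint is itself committed.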

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
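For reviewers who want the setup sequence at a glance, the sketch below is a
dry run that only prints the commands listed in the header comment of
remus-netbuf-setup for a given vif/ifb pair. It does not execute them and
needs no privileges. The helper name print_netbuf_setup is hypothetical and
is not part of this patch; the commands themselves are copied from the
script's comment.

```shell
#!/bin/bash
# Dry run: print (do not execute) the commands that install a network buffer
# on a vif, in the order given in the remus-netbuf-setup header comment.
# print_netbuf_setup is a hypothetical helper, not part of the patch.
print_netbuf_setup() {
    local vif=$1 ifb=$2
    cat <<EOF
ip link set dev $ifb up
tc qdisc add dev $vif ingress
tc filter add dev $vif parent ffff: proto ip prio 10 u32 match u32 0 0 action mirred egress redirect dev $ifb
nl-qdisc-add --dev=$ifb --parent root plug
nl-qdisc-add --dev=$ifb --parent root --update plug --limit=10000000
EOF
}

print_netbuf_setup vif1.0 ifb0
```

The real script additionally picks a free ifb under the "pickifb" lock,
records it in xenstore, and removes the ingress qdisc again if a later step
fails.
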
 docs/misc/xenstore-paths.markdown      |   4 +
 tools/hotplug/Linux/Makefile           |   1 +
 tools/hotplug/Linux/remus-netbuf-setup | 230 ++++++++++++++++
 tools/libxl/libxl.c                    |   7 +
 tools/libxl/libxl_internal.h           |  10 +
 tools/libxl/libxl_netbuffer.c          | 481 +++++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c        |  23 ++
 tools/libxl/libxl_remus_device.c       |  18 +-
 8 files changed, 773 insertions(+), 1 deletion(-)
 create mode 100644 tools/hotplug/Linux/remus-netbuf-setup

diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown
index ea67536..d94ea9d 100644
--- a/docs/misc/xenstore-paths.markdown
+++ b/docs/misc/xenstore-paths.markdown
@@ -393,6 +393,10 @@ The guest's virtual time offset from UTC in seconds.
 
 The device model version for a domain.
 
+#### /libxl/$DOMID/remus/netbuf/$DEVID/ifb = STRING [n,INTERNAL]
+
+ifb device used by Remus to buffer network output from the associated vif.
+
 [BLKIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,blkif.h.html
 [FBIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,fbif.h.html
 [HVMPARAMS]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,hvm,params.h.html
diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index d5a9ed2..31e57f7 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -16,6 +16,7 @@ XEN_SCRIPTS += vif-nat
 XEN_SCRIPTS += vif-openvswitch
 XEN_SCRIPTS += vif2
 XEN_SCRIPTS += vif-setup
+XEN_SCRIPTS-$(CONFIG_REMUS_NETBUF) += remus-netbuf-setup
 XEN_SCRIPTS += block
 XEN_SCRIPTS += block-enbd block-nbd
 XEN_SCRIPTS-$(CONFIG_BLKTAP1) += blktap
diff --git a/tools/hotplug/Linux/remus-netbuf-setup b/tools/hotplug/Linux/remus-netbuf-setup
new file mode 100644
index 0000000..87dfa69
--- /dev/null
+++ b/tools/hotplug/Linux/remus-netbuf-setup
@@ -0,0 +1,230 @@
+#!/bin/bash
+#============================================================================
+# ${XEN_SCRIPT_DIR}/remus-netbuf-setup
+#
+# Script for attaching a network buffer to the specified vif (in any mode).
+# The hotplugging system will call this script when starting remus via libxl
+# API, libxl_domain_remus_start.
+#
+# Usage:
+# remus-netbuf-setup (setup|teardown)
+#
+# Environment vars:
+# vifname     vif interface name (required).
+# XENBUS_PATH path in Xenstore, where the REMUS_IFB device details will be
+#             stored or read from (required).
+#             (libxl passes /libxl/<domid>/remus/netbuf/<devid>)
+# REMUS_IFB   ifb interface to be cleaned up (required). [for teardown op only]
+
+# Written to the store: (setup operation)
+# XENBUS_PATH/ifb=<ifbdevName> the REMUS_IFB device serving
+#  as the intermediate buffer through which the interface's network output
+#  can be controlled.
+#
+
+# Remus network buffering requirements:
+
+# We need to buffer (queue) egress traffic from every vif attached to
+# the guest and release the buffers when the checkpoint associated
+# with them has been committed at the backup host. We achieve this
+# with the help of the plug queuing discipline (sch_plug module).
+# Simply put, Remus' network buffering imposes traffic
+# shaping on the guest's vif(s).
+
+# Limitations and Workarounds:
+
+# Egress traffic from a vif appears as ingress traffic to dom0. Linux
+# supports policing (dropping packets) but not traffic shaping
+# (queuing packets) on ingress traffic. The standard workaround to
+# this limitation is to attach an ingress qdisc to the guest vif,
+# redirect all egress traffic from the guest to an intermediate
+# queuing interface, and apply egress rules to it. The IFB
+# (Intermediate Functional Block) device serves the purpose of an
+# intermediate queuing interface.
+#
+
+# The following commands install a network buffer on a
+# guest's vif (vif1.0) using an IFB device (ifb0):
+#
+#  ip link set dev ifb0 up
+#  tc qdisc add dev vif1.0 ingress
+#  tc filter add dev vif1.0 parent ffff: proto ip \
+#    prio 10 u32 match u32 0 0 action mirred egress redirect dev ifb0
+#  nl-qdisc-add --dev=ifb0 --parent root plug
+#  nl-qdisc-add --dev=ifb0 --parent root --update plug --limit=10000000
+#                                                (10MB limit on buffer)
+#
+# So order of operations when installing a network buffer on vif1.0
+# 1. find a free ifb and bring up the device
+# 2. redirect traffic from vif1.0 to ifb:
+#   2.1 add ingress qdisc to vif1.0 (to capture outgoing packets from guest)
+#   2.2 use tc filter command with actions mirred egress + redirect
+# 3. install plug_qdisc on ifb device, with which we can buffer/release
+#    guest's network output from vif1.0
+#
+# Note:
+# 1. If the setup process fails, the script's cleanup is limited to removing the
+#    ingress qdisc on the guest vif, so that its traffic can flow normally.
+#    The chosen ifb device is not torn down. Libxl has to execute the
+#    teardown op to remove other qdiscs and subsequently free the IFB device.
+#
+# 2. The teardown op may be invoked multiple times by libxl.
+
+#============================================================================
+
+# Unlike other vif scripts, vif-common is not needed here as it executes
+# vif-specific setup code such as renaming.
+dir=$(dirname "$0")
+. "$dir/xen-hotplug-common.sh"
+
+findCommand "$@"
+
+if [ "$command" != "setup" -a  "$command" != "teardown" ]
+then
+  echo "Invalid command: $command"
+  log err "Invalid command: $command"
+  exit 1
+fi
+
+evalVariables "$@"
+
+: ${vifname:?}
+: ${XENBUS_PATH:?}
+
+check_libnl_tools() {
+    if ! command -v nl-qdisc-list > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-list tool"
+    fi
+    if ! command -v nl-qdisc-add > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-add tool"
+    fi
+    if ! command -v nl-qdisc-delete > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-delete tool"
+    fi
+}
+
+# We only check for modules. We don't load them.
+# User/Admin is supposed to load ifb during boot time,
+# ensuring that there are enough free ifbs in the system.
+# Other modules will be loaded automatically by tc commands.
+check_modules() {
+    for m in ifb sch_plug sch_ingress act_mirred cls_u32
+    do
+        if ! modinfo $m > /dev/null 2>&1; then
+            fatal "Unable to find $m kernel module"
+        fi
+    done
+}
+
+#return 0 if the ifb is free
+check_ifb() {
+    local installed=`nl-qdisc-list -d $1`
+    [ -n "$installed" ] && return 1
+
+    for domid in `xenstore-list "/local/domain" 2>/dev/null || true`
+    do
+        [ $domid -eq 0 ] && continue
+        xenstore-exists "/libxl/$domid/remus/netbuf" || continue
+        for devid in `xenstore-list "/libxl/$domid/remus/netbuf" 2>/dev/null || true`
+        do
+            local path="/libxl/$domid/remus/netbuf/$devid/ifb"
+            xenstore-exists $path || continue
+            local ifb=`xenstore-read "$path" 2>/dev/null || true`
+            [ "$ifb" = "$1" ] && return 1
+        done
+    done
+
+    return 0
+}
+
+setup_ifb() {
+
+    for ifb in `ifconfig -a -s|egrep ^ifb|cut -d ' ' -f1`
+    do
+        check_ifb "$ifb" || continue
+        REMUS_IFB="$ifb"
+        break
+    done
+
+    if [ -z "$REMUS_IFB" ]
+    then
+        fatal "Unable to find a free ifb device for $vifname"
+    fi
+
+    #not using xenstore_write that automatically exits on error
+    #because we need to cleanup
+    xenstore_write "$XENBUS_PATH/ifb" "$REMUS_IFB"
+    do_or_die ip link set dev "$REMUS_IFB" up
+}
+
+redirect_vif_traffic() {
+    local vif=$1
+    local ifb=$2
+
+    do_or_die tc qdisc add dev "$vif" ingress
+
+    tc filter add dev "$vif" parent ffff: proto ip prio 10 \
+        u32 match u32 0 0 action mirred egress redirect dev "$ifb" >/dev/null 2>&1
+
+    if [ $? -ne 0 ]
+    then
+        do_without_error tc qdisc del dev "$vif" ingress
+        fatal "Failed to redirect traffic from $vif to $ifb"
+    fi
+}
+
+add_plug_qdisc() {
+    local vif=$1
+    local ifb=$2
+
+    nl-qdisc-add --dev="$ifb" --parent root plug >/dev/null 2>&1
+    if [ $? -ne 0 ]
+    then
+        do_without_error tc qdisc del dev "$vif" ingress
+        fatal "Failed to add plug qdisc to $ifb"
+    fi
+
+    #set ifb buffering limit in bytes. It's okay if this command fails
+    nl-qdisc-add --dev="$ifb" --parent root \
+        --update plug --limit=10000000 >/dev/null 2>&1 || true
+}
+
+teardown_netbuf() {
+    local vif=$1
+    local ifb=$2
+
+    #Check if the XENBUS_PATH/ifb exists and has IFB name same as REMUS_IFB.
+    #Otherwise, if the teardown op is called multiple times, then we may end
+    #up freeing another domain's allocated IFB inside the if loop.
+    xenstore-exists "$XENBUS_PATH/ifb" && \
+        local ifb2=`xenstore-read "$XENBUS_PATH/ifb" 2>/dev/null || true`
+
+    if [[ "$ifb2" && "$ifb2" == "$ifb" ]]; then
+        do_without_error ip link set dev "$ifb" down
+        do_without_error nl-qdisc-delete --dev="$ifb" --parent root plug >/dev/null 2>&1
+        xenstore-rm -t "$XENBUS_PATH/ifb" 2>/dev/null || true
+    fi
+    do_without_error tc qdisc del dev "$vif" ingress
+    xenstore-rm -t "$XENBUS_PATH/hotplug-status" 2>/dev/null || true
+    xenstore-rm -t "$XENBUS_PATH/hotplug-error" 2>/dev/null || true
+}
+
+case "$command" in
+    setup)
+        check_libnl_tools
+        check_modules
+
+        claim_lock "pickifb"
+        setup_ifb
+        redirect_vif_traffic "$vifname" "$REMUS_IFB"
+        add_plug_qdisc "$vifname" "$REMUS_IFB"
+        release_lock "pickifb"
+
+        success
+        ;;
+    teardown)
+        teardown_netbuf "$vifname" "$REMUS_IFB"
+        ;;
+esac
+
+log debug "Successful remus-netbuf-setup $command for $vifname, ifb $REMUS_IFB."
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index e108e40..27fdfc2 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -819,6 +819,13 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 
     /* Convenience aliases */
     libxl__remus_devices_state *const rds = &dss->rds;
+
+    if (!libxl__netbuffer_enabled(gc)) {
+        LOG(ERROR, "Remus: No support for network buffering");
+        goto out;
+    }
+    rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
+
     rds->ao = ao;
     rds->domid = domid;
     rds->callback = libxl__remus_setup_done;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 35fbdcd..2776d19 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2665,6 +2665,9 @@ struct libxl__remus_device_instance_ops {
     void (*teardown)(libxl__egc *egc, libxl__remus_device *dev);
 };
 
+int init_subkind_nic(libxl__remus_devices_state *rds);
+void cleanup_subkind_nic(libxl__remus_devices_state *rds);
+
 typedef void libxl__remus_callback(libxl__egc *,
                                    libxl__remus_devices_state *, int rc);
 
@@ -2699,6 +2702,13 @@ struct libxl__remus_devices_state {
     int num_disks;
 
     libxl__multidev multidev;
+
+    /*----- private for concrete (device-specific) layer only -----*/
+
+    /* private for nic device subkind ops */
+    char *netbufscript;
+    struct nl_sock *nlsock;
+    struct nl_cache *qdisc_cache;
 };
 
 /*
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 52d593c..72e0ad0 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -17,11 +17,492 @@
 
 #include "libxl_internal.h"
 
+#include <netlink/cache.h>
+#include <netlink/socket.h>
+#include <netlink/attr.h>
+#include <netlink/route/link.h>
+#include <netlink/route/route.h>
+#include <netlink/route/qdisc.h>
+#include <netlink/route/qdisc/plug.h>
+
+typedef struct libxl__remus_device_nic {
+    int devid;
+
+    const char *vif;
+    const char *ifb;
+    struct rtnl_qdisc *qdisc;
+} libxl__remus_device_nic;
+
 int libxl__netbuffer_enabled(libxl__gc *gc)
 {
     return 1;
 }
 
+int init_subkind_nic(libxl__remus_devices_state *rds)
+{
+    int rc, ret;
+
+    STATE_AO_GC(rds->ao);
+
+    rds->nlsock = nl_socket_alloc();
+    if (!rds->nlsock) {
+        LOG(ERROR, "cannot allocate nl socket");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    ret = nl_connect(rds->nlsock, NETLINK_ROUTE);
+    if (ret) {
+        LOG(ERROR, "failed to open netlink socket: %s",
+            nl_geterror(ret));
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    /* get list of all qdiscs installed on network devs. */
+    ret = rtnl_qdisc_alloc_cache(rds->nlsock, &rds->qdisc_cache);
+    if (ret) {
+        LOG(ERROR, "failed to allocate qdisc cache: %s",
+            nl_geterror(ret));
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rds->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
+                                  libxl__xen_script_dir_path());
+
+    rc = 0;
+
+out:
+    return rc;
+}
+
+void cleanup_subkind_nic(libxl__remus_devices_state *rds)
+{
+    STATE_AO_GC(rds->ao);
+
+    /* free qdisc cache */
+    if (rds->qdisc_cache) {
+        nl_cache_clear(rds->qdisc_cache);
+        nl_cache_free(rds->qdisc_cache);
+        rds->qdisc_cache = NULL;
+    }
+
+    /* close & free nlsock */
+    if (rds->nlsock) {
+        nl_close(rds->nlsock);
+        nl_socket_free(rds->nlsock);
+        rds->nlsock = NULL;
+    }
+}
+
+/*----- setup() and teardown() -----*/
+
+/* helper functions */
+
+/*
+ * If the device has a vifname, then use that instead of
+ * the vifX.Y format.
+ * it must ONLY be used for remus because if driver domains
+ * were in use it would constitute a security vulnerability.
+ */
+static const char *get_vifname(libxl__remus_device *dev,
+                               const libxl_device_nic *nic)
+{
+    const char *vifname = NULL;
+    const char *path;
+    int rc;
+
+    STATE_AO_GC(dev->rds->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = dev->rds->domid;
+
+    path = GCSPRINTF("%s/backend/vif/%d/%d/vifname",
+                     libxl__xs_get_dompath(gc, 0), domid, nic->devid);
+    rc = libxl__xs_read_checked(gc, XBT_NULL, path, &vifname);
+    if (!rc && !vifname) {
+        vifname = libxl__device_nic_devname(gc, domid,
+                                            nic->devid,
+                                            nic->nictype);
+    }
+
+    return vifname;
+}
+
+static void free_qdisc(libxl__remus_device_nic *remus_nic)
+{
+    if (remus_nic->qdisc == NULL)
+        return;
+
+    nl_object_put((struct nl_object *)(remus_nic->qdisc));
+    remus_nic->qdisc = NULL;
+}
+
+static int init_qdisc(libxl__remus_devices_state *rds,
+                      libxl__remus_device_nic *remus_nic)
+{
+    int rc, ret, ifindex;
+    struct rtnl_link *ifb = NULL;
+    struct rtnl_qdisc *qdisc = NULL;
+
+    STATE_AO_GC(rds->ao);
+
+    /* Now that we have brought up the REMUS_IFB device with a plug qdisc
+     * for this vif, we need to refill the qdisc cache.
+     */
+    ret = nl_cache_refill(rds->nlsock, rds->qdisc_cache);
+    if (ret) {
+        LOG(ERROR, "cannot refill qdisc cache: %s", nl_geterror(ret));
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    /* get a handle to the REMUS_IFB interface */
+    ret = rtnl_link_get_kernel(rds->nlsock, 0, remus_nic->ifb, &ifb);
+    if (ret) {
+        LOG(ERROR, "cannot obtain handle for %s: %s", remus_nic->ifb,
+            nl_geterror(ret));
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    ifindex = rtnl_link_get_ifindex(ifb);
+    if (!ifindex) {
+        LOG(ERROR, "interface %s has no index", remus_nic->ifb);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    /* Get a reference to the root qdisc installed on the REMUS_IFB, by
+     * querying the qdisc list we obtained earlier. The netbufscript
+     * sets up the plug qdisc as the root qdisc, so we don't have to
+     * search the entire qdisc tree on the REMUS_IFB dev.
+
+     * There is no need to explicitly free this qdisc as it's just a
+     * reference from the qdisc cache we allocated earlier.
+     */
+    qdisc = rtnl_qdisc_get_by_parent(rds->qdisc_cache, ifindex, TC_H_ROOT);
+    if (qdisc) {
+        const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
+        /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
+        if (!tc_kind || strcmp(tc_kind, "plug")) {
+            LOG(ERROR, "plug qdisc is not installed on %s", remus_nic->ifb);
+            rc = ERROR_FAIL;
+            goto out;
+        }
+        remus_nic->qdisc = qdisc;
+    } else {
+        LOG(ERROR, "Cannot get qdisc handle from ifb %s", remus_nic->ifb);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = 0;
+
+out:
+    if (ifb)
+        rtnl_link_put(ifb);
+
+    if (rc && qdisc)
+        nl_object_put((struct nl_object *)qdisc);
+
+    return rc;
+}
+
+/* callbacks */
+
+static void netbuf_setup_script_cb(libxl__egc *egc,
+                                   libxl__async_exec_state *aes,
+                                   int status);
+static void netbuf_teardown_script_cb(libxl__egc *egc,
+                                      libxl__async_exec_state *aes,
+                                      int status);
+
+/*
+ * the script needs the following env & args
+ * $vifname
+ * $XENBUS_PATH (/libxl/<domid>/remus/netbuf/<devid>/)
+ * $REMUS_IFB (for teardown)
+ * setup/teardown as command line arg.
+ */
+static void setup_async_exec(libxl__remus_device *dev, char *op)
+{
+    int arraysize, nr = 0;
+    char **env = NULL, **args = NULL;
+    libxl__remus_device_nic *remus_nic = dev->concrete_data;
+    libxl__remus_devices_state *rds = dev->rds;
+    libxl__async_exec_state *aes = &dev->aodev.aes;
+
+    STATE_AO_GC(rds->ao);
+
+    /* Convenience aliases */
+    char *const script = libxl__strdup(gc, rds->netbufscript);
+    const uint32_t domid = rds->domid;
+    const int dev_id = remus_nic->devid;
+    const char *const vif = remus_nic->vif;
+    const char *const ifb = remus_nic->ifb;
+
+    arraysize = 7;
+    GCNEW_ARRAY(env, arraysize);
+    env[nr++] = "vifname";
+    env[nr++] = libxl__strdup(gc, vif);
+    env[nr++] = "XENBUS_PATH";
+    env[nr++] = GCSPRINTF("%s/remus/netbuf/%d",
+                          libxl__xs_libxl_path(gc, domid), dev_id);
+    if (!strcmp(op, "teardown") && ifb) {
+        env[nr++] = "REMUS_IFB";
+        env[nr++] = libxl__strdup(gc, ifb);
+    }
+    env[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    arraysize = 3; nr = 0;
+    GCNEW_ARRAY(args, arraysize);
+    args[nr++] = script;
+    args[nr++] = op;
+    args[nr++] = NULL;
+    assert(nr == arraysize);
+
+    aes->ao = dev->rds->ao;
+    aes->what = GCSPRINTF("%s %s", args[0], args[1]);
+    aes->env = env;
+    aes->args = args;
+    aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
+    aes->stdfds[0] = -1;
+    aes->stdfds[1] = -1;
+    aes->stdfds[2] = -1;
+
+    if (!strcmp(op, "teardown"))
+        aes->callback = netbuf_teardown_script_cb;
+    else
+        aes->callback = netbuf_setup_script_cb;
+}
+
+/* setup() and teardown() */
+
+static void nic_setup(libxl__egc *egc, libxl__remus_device *dev)
+{
+    int rc;
+    libxl__remus_device_nic *remus_nic;
+    const libxl_device_nic *nic = dev->backend_dev;
+
+    STATE_AO_GC(dev->rds->ao);
+
+    /*
+     * there's no subkind of nic devices, so nic ops are always matched
+     * with nic devices
+     */
+    dev->matched = true;
+
+    GCNEW(remus_nic);
+    dev->concrete_data = remus_nic;
+    remus_nic->devid = nic->devid;
+    remus_nic->vif = get_vifname(dev, nic);
+    if (!remus_nic->vif) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    setup_async_exec(dev, "setup");
+    rc = libxl__async_exec_start(gc, &dev->aodev.aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+/*
+ * In return, the script writes the name of REMUS_IFB device (during setup)
+ * to be used for output buffering into XENBUS_PATH/ifb
+ */
+static void netbuf_setup_script_cb(libxl__egc *egc,
+                                   libxl__async_exec_state *aes,
+                                   int status)
+{
+    libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__remus_device_nic *remus_nic = dev->concrete_data;
+    libxl__remus_devices_state *rds = dev->rds;
+    const char *out_path_base, *hotplug_error = NULL;
+    int rc;
+
+    STATE_AO_GC(rds->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = rds->domid;
+    const int devid = remus_nic->devid;
+    const char *const vif = remus_nic->vif;
+    const char **const ifb = &remus_nic->ifb;
+
+    /*
+     * we need to get ifb first because it's needed for teardown
+     */
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/remus/netbuf/%d/ifb",
+                                          libxl__xs_libxl_path(gc, domid),
+                                          devid),
+                                ifb);
+    if (rc)
+        goto out;
+
+    if (!(*ifb)) {
+        LOG(ERROR, "Cannot get ifb dev name for domain %u dev %s",
+            domid, vif);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    out_path_base = GCSPRINTF("%s/remus/netbuf/%d",
+                              libxl__xs_libxl_path(gc, domid), devid);
+
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/hotplug-error", out_path_base),
+                                &hotplug_error);
+    if (rc)
+        goto out;
+
+    if (hotplug_error) {
+        LOG(ERROR, "netbuf script %s setup failed for vif %s: %s",
+            rds->netbufscript, vif, hotplug_error);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (status) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    LOG(DEBUG, "%s will buffer packets from vif %s", *ifb, vif);
+    rc = init_qdisc(rds, remus_nic);
+
+out:
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+static void nic_teardown(libxl__egc *egc, libxl__remus_device *dev)
+{
+    int rc;
+    STATE_AO_GC(dev->rds->ao);
+
+    setup_async_exec(dev, "teardown");
+
+    rc = libxl__async_exec_start(gc, &dev->aodev.aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void netbuf_teardown_script_cb(libxl__egc *egc,
+                                      libxl__async_exec_state *aes,
+                                      int status)
+{
+    int rc;
+    libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__remus_device_nic *remus_nic = dev->concrete_data;
+
+    if (status)
+        rc = ERROR_FAIL;
+    else
+        rc = 0;
+
+    free_qdisc(remus_nic);
+
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+/*----- checkpointing APIs -----*/
+
+/* The value of buffer_op, not the value passed to kernel */
+enum {
+    tc_buffer_start,
+    tc_buffer_release
+};
+
+/* API implementations */
+
+static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
+                           libxl__remus_devices_state *rds,
+                           int buffer_op)
+{
+    int rc, ret;
+
+    STATE_AO_GC(rds->ao);
+
+    if (buffer_op == tc_buffer_start)
+        ret = rtnl_qdisc_plug_buffer(remus_nic->qdisc);
+    else
+        ret = rtnl_qdisc_plug_release_one(remus_nic->qdisc);
+
+    if (ret) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    ret = rtnl_qdisc_add(rds->nlsock, remus_nic->qdisc, NLM_F_REQUEST);
+    if (ret) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = 0;
+
+out:
+    if (rc)
+        LOG(ERROR, "Remus: cannot do netbuf op %s on %s:%s",
+            ((buffer_op == tc_buffer_start) ?
+            "start_new_epoch" : "release_prev_epoch"),
+            remus_nic->ifb, nl_geterror(ret));
+    return rc;
+}
+
+static void nic_postsuspend(libxl__egc *egc, libxl__remus_device *dev)
+{
+    int rc;
+    libxl__remus_device_nic *remus_nic = dev->concrete_data;
+
+    STATE_AO_GC(dev->rds->ao);
+
+    rc = remus_netbuf_op(remus_nic, dev->rds, tc_buffer_start);
+
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void nic_commit(libxl__egc *egc, libxl__remus_device *dev)
+{
+    int rc;
+    libxl__remus_device_nic *remus_nic = dev->concrete_data;
+
+    STATE_AO_GC(dev->rds->ao);
+
+    rc = remus_netbuf_op(remus_nic, dev->rds, tc_buffer_release);
+
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+const libxl__remus_device_instance_ops remus_device_nic = {
+    .kind = LIBXL__DEVICE_KIND_VIF,
+    .setup = nic_setup,
+    .teardown = nic_teardown,
+    .postsuspend = nic_postsuspend,
+    .commit = nic_commit,
+};
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
index 1c72a7f..3c659c2 100644
--- a/tools/libxl/libxl_nonetbuffer.c
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -22,6 +22,29 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
     return 0;
 }
 
+int init_subkind_nic(libxl__remus_devices_state *rds)
+{
+    return 0;
+}
+
+void cleanup_subkind_nic(libxl__remus_devices_state *rds)
+{
+    return;
+}
+
+static void nic_setup(libxl__egc *egc, libxl__remus_device *dev)
+{
+    STATE_AO_GC(dev->rds->ao);
+
+    dev->aodev.rc = ERROR_FAIL;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+const libxl__remus_device_instance_ops remus_device_nic = {
+    .kind = LIBXL__DEVICE_KIND_VIF,
+    .setup = nic_setup,
+};
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
index 4e77587..b20168f 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_remus_device.c
@@ -17,7 +17,9 @@
 
 #include "libxl_internal.h"
 
+extern const libxl__remus_device_instance_ops remus_device_nic;
 static const libxl__remus_device_instance_ops *remus_ops[] = {
+    &remus_device_nic,
     NULL,
 };
 
@@ -26,12 +28,26 @@ static const libxl__remus_device_instance_ops *remus_ops[] = {
 static int init_device_subkind(libxl__remus_devices_state *rds)
 {
     /* init device subkind-specific state in the libxl ctx */
-    return 0;
+    int rc;
+    STATE_AO_GC(rds->ao);
+
+    if (libxl__netbuffer_enabled(gc)) {
+        rc = init_subkind_nic(rds);
+        if (rc) goto out;
+    }
+
+    rc = 0;
+out:
+    return rc;
 }
 
 static void cleanup_device_subkind(libxl__remus_devices_state *rds)
 {
     /* cleanup device subkind-specific state in the libxl ctx */
+    STATE_AO_GC(rds->ao);
+
+    if (libxl__netbuffer_enabled(gc))
+        cleanup_subkind_nic(rds);
 }
 
 /*----- setup() and teardown() -----*/
-- 
1.9.1
