All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0 of 4 RFC] xl - Remus network buffering support
@ 2013-07-25  7:09 Shriram Rajagopalan
  2013-07-25  7:09 ` [PATCH 1 of 4 RFC] xl/remus : Network buffering setup helper functions Shriram Rajagopalan
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Shriram Rajagopalan @ 2013-07-25  7:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Campbell, Stefano Stabellini

This patch series adds support for network buffering in the Remus
codebase in xl/libxl. In previous emails, I had proposed for a script
invocation to setup network buffering. After digging through libnl API,
I managed to find most of what I needed (except for one command, which
right now is executed through system() call).

The patch series in its current state would allow xl to dynamically setup
and teardown buffering devices, qdiscs, etc associated with the guest,
instead of resorting to clunky one time configurations.

The series is organized as follows:
 1/4 - Network buffering setup functions - abstractions built on top of libnl3 API
       to implement functionality such as add/delete qdisc, interface up/down,
       search for free ifb devices, etc.           

 2/4 - xl cmdline utility uses these abstractions to setup network buffers and
       provides libxl with a list of ifb devices where packets would be buffered

 3/4 - Libxl interaction with network buffer module in the kernel via libnl3.

 4/4 - adds libnl3 (>= v3.2.17) dependency to autoconf scripts and linker flags
       in tools/libxl/Makefile. 

Functionality tested on debian squeeze (kernel 3.4) + openvswitch + 64-bit PV domU (kernel 3.4).

Couple of things to note:
 1. I am not well versed with the autoconf stuff. I fixed the configure.ac
    as best as I could. However, the libxl/Makefile patching is still clunky
    [hard coded -I/usr/local/include/libnl3..]

 2. I have kept most of the setup related C code in a separate file, that
    I am also planning to submit to libvirt mailing list.  Other toolstacks can choose to 
    setup network buffers in their own way (scripts or a different implementation).
    As long as libxl gets as input a list of IFB devices to act on, things are good.

    The code to control the network buffer, when Remus is operational, 
    is incorporated into libxl in the remus callbacks. This is something that only libxl 
    can do and [should do].


thanks
shriram

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1 of 4 RFC] xl/remus : Network buffering setup helper functions
  2013-07-25  7:09 [PATCH 0 of 4 RFC] xl - Remus network buffering support Shriram Rajagopalan
@ 2013-07-25  7:09 ` Shriram Rajagopalan
  2013-07-29 15:42   ` Ian Campbell
  2013-07-25  7:09 ` [PATCH 2 of 4 RFC] xl/remus: Network buffering cmdline switch, setup/teardown Shriram Rajagopalan
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Shriram Rajagopalan @ 2013-07-25  7:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Campbell, Stefano Stabellini

The functions contained in xl_netbuf.c are built on top of libnl3 API.
They setup the necessary infrastructure required to for network output
buffering in Remus.  In xend, the Remus python code performed this setup
by invoking shell commands.  This code is built purely on top of APIs
supplied by libnl3.0.

There are two public helper functions:
1) remus_install_netbuf_on_dev to install a network buffer on a vif.
Its basically equivalent to:
a)find free ifb (say ifb0)
b)ip link set ifb0 up
c)tc qdisc add dev vif1.0 ingress
d)tc filter add dev vif1.0 parent ffff: proto ip pref 10 u32 match u32 0 0 \
	action mirred egress redirect dev ifb0
e)tc qdisc add dev ifb0 root plug
f)get handle to plug qdisc and control it programmatically after suspending
  a VM/receiving checkpoint ack from remote


2) remus_uninstall_netbufs to remove all
network buffers and ingress qdiscs from the supplied interface list.

I have managed to code up 5 of the 6 steps mentioned above, purely in C, by building
on top of libnl3. There is currently no support in libnl3 to implement step (d) (redirection)
and it is pretty complicated to implement. So I am currently resorting to system() lib call
to get the job done.

N.B. This implementation is just xl (cmdline utility) specific. Other toolstacks may choose
to reuse this code or do this setup in their own way (scripts/shell commands, etc.).

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>

diff -r 711fe8ed54e4 -r 3ae38cbe535c tools/libxl/xl.h
--- a/tools/libxl/xl.h	Wed Jul 24 22:48:09 2013 -0700
+++ b/tools/libxl/xl.h	Wed Jul 24 22:55:00 2013 -0700
@@ -161,6 +161,9 @@ enum output_format {
 extern enum output_format default_output_format;
 
 extern void printf_info_sexp(int domid, libxl_domain_config *d_config);
+extern char *remus_install_netbuf_on_dev(char *vifname);
+extern void remus_uninstall_netbufs(char **vifs, int num_vifs,
+                                    char **ifbs, int num_ifbs);
 
 #define XL_GLOBAL_CONFIG XEN_CONFIG_DIR "/xl.conf"
 #define XL_LOCK_FILE XEN_LOCK_DIR "/xl"
diff -r 711fe8ed54e4 -r 3ae38cbe535c tools/libxl/xl_netbuf.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/libxl/xl_netbuf.c	Wed Jul 24 22:55:00 2013 -0700
@@ -0,0 +1,441 @@
+/*
+ * xl command line utility - helper functions for setting up 
+ *  remus network buffering.
+ */
+
+#include "libxl_osdeps.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <ctype.h>
+#include <inttypes.h>
+#include <limits.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/select.h>
+#include <regex.h>
+#include <xentoollog.h>
+
+#include <netlink/cache.h>
+#include <netlink/socket.h>
+#include <netlink/attr.h>
+#include <netlink/route/link.h>
+#include <netlink/route/route.h>
+#include <netlink/route/qdisc.h>
+#include <netlink/route/qdisc/plug.h>
+
+#include "libxl.h"
+#include "libxl_utils.h"
+#include "xl.h"
+
+#define RET_BAD_ARG(x, y)                                                 \
+    do {                                                                \
+        fprintf(stderr, "%s:%d bad argument "#x"\n",__FUNCTION__, __LINE__); \
+        return y;                                                       \
+    } while (0)
+
+static struct rtnl_link *get_free_ifbdev(struct nl_cache *link_cache,
+                                         struct nl_cache *qdisc_cache)
+{
+  struct nl_object *o = NULL;
+  struct rtnl_link *temp = NULL, *ifb = NULL;
+  struct rtnl_qdisc *qdisc = NULL;
+  char *ifbname = NULL;
+  regex_t ifbregex;
+  int ifindex;
+
+  if (!link_cache) RET_BAD_ARG(link_cache, NULL);
+  if (!qdisc_cache) RET_BAD_ARG(qdisc_cache, NULL);
+
+  if (regcomp(&ifbregex, "^ifb[0-9]+$", REG_EXTENDED|REG_NOSUB)) {
+      fprintf(stderr, "Failed to alloc regex while "
+              "searching for free ifbs\n");
+      return NULL;
+  }
+
+  for (o = nl_cache_get_first(link_cache); o; o = nl_cache_get_next(o)) {
+
+      temp = (struct rtnl_link *)o;
+      ifbname = rtnl_link_get_name(temp);
+
+      if (!ifbname || regexec(&ifbregex, ifbname, 0, NULL, 0))
+	continue;
+
+      ifindex = rtnl_link_get_ifindex(temp);
+      if (!ifindex) continue;
+
+      /* found an IFB. check if it has a qdisc on root */
+      qdisc = rtnl_qdisc_get_by_parent(qdisc_cache, ifindex, TC_H_ROOT);
+      if (qdisc) {
+	char *kind = rtnl_tc_get_kind(TC_CAST(qdisc));
+	rtnl_qdisc_put(qdisc);
+
+	if (strcmp(kind, "pfifo_fast") && strcmp(kind, "mq")
+            && strcmp(kind, "ingress"))
+	  continue;
+      }
+
+      ifb = rtnl_link_get(link_cache, ifindex);
+      printf("Acquired IFB dev %s for network buffering\n", ifbname);
+      break;
+  }
+
+  if (!ifb)
+      fprintf(stderr, "No free ifb device available\n");
+
+  regfree(&ifbregex);
+  return ifb;
+}
+
+static int do_net_ifupdown(struct nl_sock *sock, char *ifname, int ifup)
+{
+  struct nl_msg *msg;
+  struct nlmsghdr *nlh;
+  struct ifinfomsg ifm;
+  int ret;
+
+  if (!sock) RET_BAD_ARG(sock, -1);
+  if (!ifname || !strlen(ifname)) RET_BAD_ARG(ifname, -1);
+
+  msg = nlmsg_alloc(); //allocates 4K msg by default
+  nlh = nlmsg_hdr(msg);
+  nlh->nlmsg_type = RTM_NEWLINK;
+  nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+
+  ifm.ifi_family = AF_UNSPEC;
+  ifm.ifi_change = IFF_UP;
+  ifm.ifi_flags = ifup ? IFF_UP : ~IFF_UP;
+
+  if (nlmsg_append(msg, &ifm, sizeof(ifm), NLMSG_ALIGNTO) < 0)
+    goto nla_put_failure;
+  NLA_PUT_STRING(msg, IFLA_IFNAME, ifname);
+
+  ret = nl_send_sync(sock, msg); //this call also frees msg
+
+  if (!ret) {
+      printf("IFB dev %s link %s\n", ifname, ifup ? "up" : "down");
+      return 0;
+  } else {
+      fprintf(stderr, "Unable to bring %s interface %s : %s\n",
+              ifup ? "up" : "down",
+              ifname, nl_geterror(ret));
+      return -1;
+  }
+
+ nla_put_failure:
+  fprintf(stderr, "failed to create nl_msg for bringing %s interface %s\n",
+          ifup ? "up" : "down", ifname);
+  nlmsg_free(msg);
+  return -1;
+}
+
+static int tc_redirect_traffic(char *fromdev, char *todev)
+{
+    /* command:
+     * tc filter add dev vif1.0 parent ffff: proto ip pref 10
+     *    u32 match u32 0 0 action mirred egress redirect dev ifb0
+     */
+    char *tc_cmd = NULL;
+    int ret = 0;
+
+    if (!fromdev) RET_BAD_ARG(fromdev, -1);
+    if (!todev) RET_BAD_ARG(todev, -1);
+
+    if (asprintf(&tc_cmd,
+                 "exec tc filter add dev %s parent ffff: "
+                 "proto ip pref 10 u32 match u32 0 0 "
+                 "act mirred egress redirect dev %s",
+                 fromdev, todev) < 0) {
+        fprintf(stderr, "Failed to compose tc redirection command (%s->%s)\n",
+                fromdev, todev);
+        return -1;
+    }
+
+    ret = system(tc_cmd);
+    if (ret) ret = errno;
+    free(tc_cmd);
+    return -ret;
+}
+
+static int do_qdisc_add_or_del(struct nl_sock *sock,
+                               struct rtnl_link *interface,
+                               char *kind, uint32_t parent, int add)
+{
+    struct rtnl_qdisc *qdisc = NULL;
+    int ret = -1;
+
+    if (!sock) RET_BAD_ARG(sock, -1);
+    if (!interface) RET_BAD_ARG(interface, -1);
+    if (!kind) RET_BAD_ARG(kind, -1);
+
+    qdisc = rtnl_qdisc_alloc();
+    if (!qdisc) {
+        fprintf(stderr, "Failed to allocate %s qdisc\n", kind);
+        return -1;
+    }
+
+    rtnl_tc_set_link(TC_CAST(qdisc), interface);
+    rtnl_tc_set_parent(TC_CAST(qdisc), parent);
+    rtnl_tc_set_kind(TC_CAST(qdisc), kind);
+
+    /* Submit request to kernel and wait for response */
+    if (add)
+        ret = rtnl_qdisc_add(sock, qdisc, NLM_F_CREATE|NLM_F_EXCL);
+    else
+        ret = rtnl_qdisc_delete(sock, qdisc);
+    rtnl_qdisc_put(qdisc);
+
+    return ret;
+}
+
+static int netdev_ifup(struct nl_sock *sock, char *ifname)
+{
+    return do_net_ifupdown(sock, ifname, 1);
+}
+
+static int netdev_ifdown(struct nl_sock *sock, char *ifname)
+{
+    return do_net_ifupdown(sock, ifname, 0);
+}
+
+/* Need to reimplement code from do_qdisc_add_or_del because for plug
+ * qdisc, we need to set an initial buffer size of 4MB atleast. The default
+ * buffer size of 100 packets (set by sch_plug kernel module) is too
+ * small to hold all packets from a VM during an epoch, resulting in
+ * packet drops.
+ */
+#define PLUG_QDISC_LIMIT (4 * 1024 * 1024)
+static int add_plug_qdisc(struct nl_sock *sock, struct rtnl_link *ifb)
+{
+    struct rtnl_qdisc *qdisc = NULL;
+    int ret = -1;
+    char *kind = "plug";
+
+    if (!sock) RET_BAD_ARG(sock, -1);
+    if (!ifb) RET_BAD_ARG(ifb, -1);
+
+    qdisc = rtnl_qdisc_alloc();
+    if (!qdisc) {
+        fprintf(stderr, "Failed to allocate %s qdisc\n", kind);
+        return -1;
+    }
+
+    rtnl_tc_set_link(TC_CAST(qdisc), ifb);
+    rtnl_tc_set_parent(TC_CAST(qdisc), TC_H_ROOT);
+    rtnl_tc_set_kind(TC_CAST(qdisc), kind);
+
+    if ((ret = rtnl_qdisc_plug_set_limit(qdisc, PLUG_QDISC_LIMIT)) < 0) {
+        fprintf(stderr, "Unable to change plug buffer size: %s\n",
+                nl_geterror(ret));
+        rtnl_qdisc_put(qdisc);
+        return ret;
+    } 
+
+    /* Submit request to kernel and wait for response */
+    ret = rtnl_qdisc_add(sock, qdisc, NLM_F_CREATE|NLM_F_EXCL);
+    rtnl_qdisc_put(qdisc);
+
+    return ret;
+}
+
+static int del_plug_qdisc(struct nl_sock *sock, struct rtnl_link *ifb)
+{
+    return do_qdisc_add_or_del(sock, ifb, "plug", TC_H_ROOT, 0);
+}
+
+static int add_ingress_qdisc(struct nl_sock *sock, struct rtnl_link *vif)
+{
+    int ret;
+    ret = do_qdisc_add_or_del(sock, vif, "ingress", TC_H_INGRESS, 1);
+    if (ret == -NLE_EXIST)
+        ret = 0; /* ignore error, if ingress qdisc is already installed
+                    on vif. */
+    return ret;
+}
+
+static int del_ingress_qdisc(struct nl_sock *sock, struct rtnl_link *vif)
+{
+    return do_qdisc_add_or_del(sock, vif, "ingress", TC_H_INGRESS, 0);
+}
+
+/*
+ * When Remus (xend version) installs a network buffer on a guest vif,
+ * it does the following (using a mix of shell commands and netlink messages)
+ *
+ *  ip link set dev ifb0 up
+ *  tc qdisc add dev vif1.0 ingress
+ *  tc filter add dev vif1.0 parent ffff: proto ip \
+ *    prio 10 u32 match u32 0 0 action mirred egress redirect dev ifb0
+ *  send netlink message to kernel to add plug_qdisc to ifb0 and get
+ *     handle
+ *  use handle to control operations of plug qdisc.
+ *
+ * So order of operations when installing a network buffer on vif1.0
+ * 1. find a free ifb and bring up the device
+ * 2. add ingress qdisc to vif1.0 (to capture outgoing packets from guest)
+ * 3. redirect traffic from vif1.0 to ifb device
+ * 4. install plug_qdisc on ifb device, with which we can buffer/release
+ *    guest's network output from vif1.0
+ *
+ * Return value: ifb dev name on success. NULL on failure.
+ */
+char *remus_install_netbuf_on_dev(char *vifname)
+{
+  struct nl_cache *link_cache = NULL, *qdisc_cache = NULL;
+  struct rtnl_link *vif = NULL;
+  struct rtnl_link *ifb = NULL;
+  struct nl_sock *sock = NULL;
+  char *ifbname = NULL;
+  int ret = -1;
+
+  if (!vifname) return NULL;
+
+  sock = nl_socket_alloc();
+  if (!sock) {
+    fprintf(stderr, "failed to allocate libnl socket\n");
+    return NULL;
+  }
+  nl_connect(sock, NETLINK_ROUTE);
+
+  /* get the link link_cache (equivalent to ifconfig -a -s) */
+  if ((ret = rtnl_link_alloc_cache(sock, AF_UNSPEC, &link_cache)) < 0) {
+      fprintf(stderr, "failed to allocate link cache : %s\n", nl_geterror(ret));
+      goto end;
+  }
+
+  /* similarly, get the qdisc cache, i.e. list of all qdiscs
+   * installed currently on various network interfaces.
+   */
+  if ((ret = rtnl_qdisc_alloc_cache(sock, &qdisc_cache)) < 0) {
+      fprintf(stderr, "failed to allocate qdisc cache : %s\n", nl_geterror(ret));
+      goto end;
+  }
+
+  vif = rtnl_link_get_by_name(link_cache, vifname);
+  if (!vif) {
+      /* link does not exist */
+      fprintf(stderr, "interface %s does not exist\n", vifname);
+      ret = -1;
+      goto end;
+  }
+
+  ret = -1;
+  /* 1. find a free ifb and bring up the device */
+  ifb = get_free_ifbdev(link_cache, qdisc_cache);
+  if (!ifb) goto end;
+
+  ret = netdev_ifup(sock, rtnl_link_get_name(ifb));
+  if (ret) goto end;
+
+  /* 2. add ingress qdisc to vif1.0 (to catch outgoing packets from guest) */
+  ret = add_ingress_qdisc(sock, vif);
+  if (ret) goto end;
+
+  /* 3. redirect traffic from vif1.0 to ifb device */
+  ret = tc_redirect_traffic(vifname, rtnl_link_get_name(ifb));
+  if (ret) goto end;
+
+  /* 4. install plug_qdisc on ifb device, with which we can buffer/release
+   *    guest's network output from vif1.0
+   */
+  ret = add_plug_qdisc(sock, ifb);
+  if (!ret) {
+      ifbname = strdup(rtnl_link_get_name(ifb));
+      if (!ifbname) {
+          perror("Failed to allocate mem for ifbname!");
+          exit(-1);
+      }
+      fprintf(stderr, "Added plug qdisc to %s\n", ifbname);
+  }
+  else
+      fprintf(stderr, "Unable to add plug qdisc: %s\n", nl_geterror(ret));
+
+end:
+  if (ret) {
+      if (vif) {
+          del_ingress_qdisc(sock, vif);
+          if (ifb) {
+              /*
+               * To release ifb dev, first bring it down and then remove
+               * the qdisc. Otherwise, the kernel will replace plug with
+               * pfifo_fast and it will remain attached to the ifb even
+               * if we bring down the device.
+               */
+              netdev_ifdown(sock, rtnl_link_get_name(ifb));
+              del_plug_qdisc(sock, ifb);
+          }
+      }
+  }
+
+  if (vif) rtnl_link_put(vif);
+  if (ifb) rtnl_link_put(ifb);
+  if (link_cache) nl_cache_free(link_cache);
+  if (qdisc_cache) nl_cache_free(qdisc_cache);
+  if (sock) nl_close(sock);
+  return ifbname;
+}
+
+/*
+ * Cleanup code. Remove ingress qdisc from the supplied list of vif
+ * interfaces. Bring down the ifb devices and remove the plug qdiscs
+ * from each of them, thereby releasing the ifbs back into the ifb pool.
+ */
+void remus_uninstall_netbufs(char **vifs, int num_vifs,
+                             char **ifbs, int num_ifbs)
+{
+  struct rtnl_link *netdev = NULL;
+  struct nl_sock *sock = NULL;
+  int i;
+
+  if (!vifs && !ifbs) return;
+
+  sock = nl_socket_alloc();
+  if (!sock) {
+    fprintf(stderr, "failed to allocate libnl socket "
+            "while uninstalling netbufs\n");
+    return;
+  }
+  nl_connect(sock, NETLINK_ROUTE);
+
+  /* We cant use link caches when tearing down network buffering because
+   * someone may have destroyed the domU, thus tearing down the
+   * interfaces itself. As a safer alternative, we will resort to
+   * un-cached direct queries to the kernel for each vif interface.
+   */
+  netdev = rtnl_link_alloc();
+  if (!netdev) {
+      fprintf(stderr, "cannot allocate netdev\n");
+      nl_close(sock);
+      return;
+  }
+
+  for (i = 0; i < num_ifbs; i++) {
+      if (ifbs[i]) {
+          if (rtnl_link_get_kernel(sock, 0, ifbs[i], &netdev) < 0)
+              continue;
+          netdev_ifdown(sock, ifbs[i]);
+          del_plug_qdisc(sock, netdev);
+      }
+  }
+
+  for (i = 0; i < num_vifs; i++) {
+      if (vifs[i]) {
+          if (rtnl_link_get_kernel(sock, 0, vifs[i], &netdev) < 0)
+              continue;
+          del_ingress_qdisc(sock, netdev);
+      }
+  }
+
+  rtnl_link_put(netdev);
+  nl_close(sock);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 2 of 4 RFC] xl/remus: Network buffering cmdline switch, setup/teardown
  2013-07-25  7:09 [PATCH 0 of 4 RFC] xl - Remus network buffering support Shriram Rajagopalan
  2013-07-25  7:09 ` [PATCH 1 of 4 RFC] xl/remus : Network buffering setup helper functions Shriram Rajagopalan
@ 2013-07-25  7:09 ` Shriram Rajagopalan
  2013-07-29 15:49   ` Ian Campbell
  2013-07-25  7:09 ` [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks Shriram Rajagopalan
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Shriram Rajagopalan @ 2013-07-25  7:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Campbell, Stefano Stabellini

Add appropriate code to xl_cmdline.c to setup network buffers for
each vif belonging to the guest.  Also provide a command line switch
to explicitly "enable" network buffering.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>

diff -r 3ae38cbe535c -r 3cd67f6ff63a tools/libxl/libxl_types.idl
--- a/tools/libxl/libxl_types.idl	Wed Jul 24 22:55:00 2013 -0700
+++ b/tools/libxl/libxl_types.idl	Thu Jul 25 00:02:19 2013 -0700
@@ -521,6 +521,7 @@ libxl_domain_remus_info = Struct("domain
     ("interval",     integer),
     ("blackhole",    bool),
     ("compression",  bool),
+    ("netbuf_iflist", libxl_string_list),
     ])
 
 libxl_event_type = Enumeration("event_type", [
diff -r 3ae38cbe535c -r 3cd67f6ff63a tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c	Wed Jul 24 22:55:00 2013 -0700
+++ b/tools/libxl/xl_cmdimpl.c	Thu Jul 25 00:02:19 2013 -0700
@@ -7039,10 +7039,109 @@ done:
     return ret;
 }
 
+static char **get_guest_vifnames(uint32_t domid, int *num_vifs)
+{
+    char **viflist;
+    libxl_device_nic *nics;
+    libxl_nicinfo nicinfo;
+    int nb, i;
+
+    nics = libxl_device_nic_list(ctx, domid, &nb);
+    if (!nics) { *num_vifs = 0; return NULL;}
+
+    viflist = calloc((nb + 1), sizeof(char *));
+    if (!viflist) {
+        perror("failed to allocate memory to hold vif names!");
+        exit(-1);
+    }
+
+    for (i = 0; i < nb; ++i) {
+        if (!libxl_device_nic_getinfo(ctx, domid, &nics[i], &nicinfo))  {
+            if (asprintf(&viflist[i], "vif%u.%d", domid, nicinfo.devid) < 0) {
+                perror("Cannot alloc memory while getting guest vif names");
+                exit(-1);
+            }
+            libxl_nicinfo_dispose(&nicinfo);
+        }
+        libxl_device_nic_dispose(&nics[i]);
+    }
+    free(nics);
+
+    *num_vifs = nb;
+    return viflist;
+}
+
+static int remus_setup_network_buffers(uint32_t domid,
+                                          char ***pifblist, int *num_ifbs)
+{
+
+    char **viflist, **ifblist;
+    int nb, i, j;
+
+    viflist = get_guest_vifnames(domid, &nb);
+    fprintf(stderr, "Domain %u has %d vifs\n", domid, nb);
+
+    if (!viflist) { *num_ifbs = 0; *ifblist = NULL; return 0;}
+
+    ifblist = calloc((nb + 1), sizeof(char *));
+    if (!ifblist) {
+        perror("failed to allocate memory for ifb list!");
+        exit(-1);
+    }
+
+    /* For each vif, install the network buffer */
+    for (i = 0; i < nb; ++i) {
+        ifblist[i] = remus_install_netbuf_on_dev(viflist[i]);
+        if (ifblist[i] == NULL) {
+            fprintf(stderr, "Failed to setup output buffer for interface %s\n",
+                    viflist[i]);
+            break;
+        }
+    }
+
+    if (i < nb) {
+        j = i;
+        remus_uninstall_netbufs(viflist, nb, ifblist, j);
+        for (i = 0; i < nb; i++)
+            free(viflist[i]);
+        free(viflist);
+
+        for (i = 0; i < j; i++)
+            free(ifblist[i]);
+        free(ifblist);
+        *num_ifbs = 0;
+        *pifblist = NULL;
+        return -1;
+    }
+
+    for (i = 0; i < nb; i++)
+        free(viflist[i]);
+    free(viflist);
+
+    *num_ifbs = nb;
+    *pifblist = ifblist;
+    return 0;
+}
+
+static void remus_teardown_network_buffers(uint32_t domid, char **ifblist,
+                                           int num_ifbs)
+{
+    int nb, i;
+    char **viflist;
+ 
+    viflist = get_guest_vifnames(domid, &nb);
+    remus_uninstall_netbufs(viflist, nb, ifblist, num_ifbs);
+
+    for (i = 0; i < nb; i++)
+        free(viflist[i]);
+    free(viflist);
+
+}
+
 int main_remus(int argc, char **argv)
 {
     uint32_t domid;
-    int opt, rc, daemonize = 1;
+    int opt, rc, daemonize = 1, netbuf = 0, num_ifbs = 0;
     const char *ssh_command = "ssh";
     char *host = NULL, *rune = NULL;
     libxl_domain_remus_info r_info;
@@ -7057,7 +7156,7 @@ int main_remus(int argc, char **argv)
     r_info.blackhole = 0;
     r_info.compression = 1;
 
-    SWITCH_FOREACH_OPT(opt, "bui:s:e", NULL, "remus", 2) {
+    SWITCH_FOREACH_OPT(opt, "bui:s:en", NULL, "remus", 2) {
     case 'i':
         r_info.interval = atoi(optarg);
         break;
@@ -7073,6 +7172,9 @@ int main_remus(int argc, char **argv)
     case 'e':
         daemonize = 0;
         break;
+    case 'n':
+        netbuf = 1;
+        break;
     }
 
     domid = find_domain(argv[optind]);
@@ -7109,6 +7211,19 @@ int main_remus(int argc, char **argv)
                             rune);
     }
 
+    if (netbuf) {
+        rc = remus_setup_network_buffers(domid,
+                                         (char ***)(&r_info.netbuf_iflist),
+                                         &num_ifbs);
+        if (rc) {
+            fprintf(stderr, "Failed to properly setup network "
+                    "buffering. Exiting..\n");
+            close(send_fd);
+            return -ERROR_FAIL;
+        } else
+            fprintf(stderr, "Network buffers setup on %d interfaces\n", num_ifbs);
+    }
+
     /* Point of no return */
     rc = libxl_domain_remus_start(ctx, &r_info, domid, send_fd, recv_fd, 0);
 
@@ -7126,6 +7241,9 @@ int main_remus(int argc, char **argv)
         libxl_domain_resume(ctx, domid, 1, 0);
     }
 
+    if (netbuf)
+        remus_teardown_network_buffers(domid, (char **)r_info.netbuf_iflist, num_ifbs);
+
     close(send_fd);
     return -ERROR_FAIL;
 }
diff -r 3ae38cbe535c -r 3cd67f6ff63a tools/libxl/xl_cmdtable.c
--- a/tools/libxl/xl_cmdtable.c	Wed Jul 24 22:55:00 2013 -0700
+++ b/tools/libxl/xl_cmdtable.c	Thu Jul 25 00:02:19 2013 -0700
@@ -485,8 +485,8 @@ struct cmd_spec cmd_table[] = {
       "                        to sh. If empty, run <host> instead of \n"
       "                        ssh <host> xl migrate-receive -r [-e]\n"
       "-e                      Do not wait in the background (on <host>) for the death\n"
-      "                        of the domain."
-
+      "                        of the domain.\n"
+      "-n                      Enable Remus network buffering\n"
     },
 };

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks
  2013-07-25  7:09 [PATCH 0 of 4 RFC] xl - Remus network buffering support Shriram Rajagopalan
  2013-07-25  7:09 ` [PATCH 1 of 4 RFC] xl/remus : Network buffering setup helper functions Shriram Rajagopalan
  2013-07-25  7:09 ` [PATCH 2 of 4 RFC] xl/remus: Network buffering cmdline switch, setup/teardown Shriram Rajagopalan
@ 2013-07-25  7:09 ` Shriram Rajagopalan
  2013-07-29 16:06   ` Ian Campbell
  2013-08-07 15:38   ` Ian Jackson
  2013-07-25  7:09 ` [PATCH 4 of 4 RFC] xl/remus: Add libnl3 dependency to autoconf scripts and libxl/Makefile Shriram Rajagopalan
  2013-07-30 16:11 ` [PATCH 0 of 4 RFC] xl - Remus network buffering support Roger Pau Monné
  4 siblings, 2 replies; 27+ messages in thread
From: Shriram Rajagopalan @ 2013-07-25  7:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Campbell, Stefano Stabellini

This patch constitutes the core network buffering logic.
Libxl would receive a list of IFB devices that collectively act as
network buffering devices, for a given guest. Also, libxl expects that
every ifb device in the list supplied by a toolstack (netbuf_iflist)
should have a plug qdisc installed.

This patch does the following:
 a) establish a dedicated remus context containing libnl related
    state (netlink sockets, qdisc caches, etc.,)
 b) Obtain handles to plug qdiscs installed on the supplied list of
    IFB devices
 c) add a new network buffer (i.e., create a new one) when the domain is
    suspended (remus_domain_suspend_callback)
 d) release the network buffer pertaining to the acknowledged checkpoint
    in remus_domain_checkpoint_dm_saved, which is invoked for both PV & HVM.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>

diff -r 3cd67f6ff63a -r bef729fc4336 tools/libxl/libxl_dom.c
--- a/tools/libxl/libxl_dom.c	Thu Jul 25 00:02:19 2013 -0700
+++ b/tools/libxl/libxl_dom.c	Thu Jul 25 00:02:22 2013 -0700
@@ -20,6 +20,7 @@
 #include "libxl_internal.h"
 #include "libxl_arch.h"
 
+#include <netlink/route/qdisc/plug.h>
 #include <xc_dom.h>
 #include <xen/hvm/hvm_info_table.h>
 #include <xen/hvm/hvm_xs_strings.h>
@@ -1212,12 +1213,36 @@ int libxl__toolstack_save(uint32_t domid
 
 /*----- remus callbacks -----*/
 
+/* REMUS TODO: Issue disk checkpoint reqs. */
 static int libxl__remus_domain_suspend_callback(void *data)
 {
-    /* REMUS TODO: Issue disk and network checkpoint reqs. */
-    return libxl__domain_suspend_common_callback(data);
+    libxl__save_helper_state *shs = data;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
+    STATE_AO_GC(dss->ao);
+
+    int is_suspended = 0, i, ret;
+    libxl__remus_ctx *remus_ctx = dss->remus_ctx;
+    is_suspended = libxl__domain_suspend_common_callback(data);
+    if (!remus_ctx->num_netbufs) return is_suspended;
+
+    if (is_suspended) {
+        for (i = 0; i < remus_ctx->num_netbufs; ++i) {
+            ret = rtnl_qdisc_plug_buffer(remus_ctx->netbuf_qdisc_list[i]);
+            if (!ret)
+                ret = rtnl_qdisc_add(remus_ctx->nlsock, remus_ctx->netbuf_qdisc_list[i],
+                                     NLM_F_REQUEST);
+            if (ret) {
+                LOG(ERROR, "Cannot create new buffer on %s:%s",
+                    dss->remus->netbuf_iflist[i], nl_geterror(ret));
+                return 0;
+            }
+        }
+    }
+
+    return is_suspended;
 }
 
+/* REMUS TODO: Deal with disk. */
 static int libxl__remus_domain_resume_callback(void *data)
 {
     libxl__save_helper_state *shs = data;
@@ -1228,7 +1253,6 @@ static int libxl__remus_domain_resume_ca
     if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
         return 0;
 
-    /* REMUS TODO: Deal with disk. Start a new network output buffer */
     return 1;
 }
 
@@ -1255,13 +1279,104 @@ static void libxl__remus_domain_checkpoi
 static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc)
 {
-    /* REMUS TODO: Wait for disk and memory ack, release network buffer */
-    /* REMUS TODO: make this asynchronous */
-    assert(!rc); /* REMUS TODO handle this error properly */
-    usleep(dss->interval * 1000);
+    /* REMUS TODO: Wait for disk and explicit memory ack (through restore
+       callback from remote) before release network buffer. */
+    int i, ret;
+    libxl__remus_ctx *remus_ctx = dss->remus_ctx;
+    STATE_AO_GC(dss->ao);
+
+    assert(!rc);
+
+    if (remus_ctx->num_netbufs  > 0) {
+        for (i = 0; i < remus_ctx->num_netbufs; ++i) {
+            ret = rtnl_qdisc_plug_release_one(remus_ctx->netbuf_qdisc_list[i]);
+            if (!ret)
+                ret = rtnl_qdisc_add(remus_ctx->nlsock, remus_ctx->netbuf_qdisc_list[i],
+                                     NLM_F_REQUEST);
+            if (ret) {
+                LOG(ERROR, "Cannot release buffer from %s:%s",
+                    dss->remus->netbuf_iflist[i], nl_geterror(ret));
+                ret= 0;
+                break;
+            }
+        }
+    }
+
+    usleep(dss->remus_ctx->interval * 1000);
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);
 }
 
+static libxl__remus_ctx *setup_remus_ctx(libxl__gc *gc,
+                                         const libxl_domain_remus_info *r_info)
+{
+    libxl__remus_ctx *remus_ctx = NULL;
+    int num_netbufs = 0, i;
+    struct nl_cache *qdisc_cache = NULL;
+    struct rtnl_link *ifb = NULL;
+    struct nl_sock *nlsock = NULL;
+    struct rtnl_qdisc *qdisc = NULL;
+    libxl_string_list l;
+    int ifindex;
+
+    remus_ctx = libxl__zalloc(gc, sizeof(libxl__remus_ctx));
+    remus_ctx->interval = r_info->interval;
+
+    l = r_info->netbuf_iflist;
+    num_netbufs = libxl_string_list_length(&l);
+    if (!num_netbufs) return remus_ctx;
+
+    nlsock = nl_socket_alloc();
+    if (!nlsock) {
+        LOG(ERROR, "setup_remus_ctx: cannot allocate nl socket");
+        return NULL;
+    }
+
+    nl_connect(nlsock, NETLINK_ROUTE);
+
+    if (rtnl_qdisc_alloc_cache(nlsock, &qdisc_cache) < 0) {
+        LOG(ERROR, "setup_remus_ctx: failed to allocate qdisc cache");
+        goto end;
+    }
+
+    remus_ctx->netbuf_qdisc_list = libxl__calloc(gc, num_netbufs + 1, 
+                                                 sizeof(struct rtnl_qdisc *));
+    remus_ctx->num_netbufs = num_netbufs;
+    remus_ctx->nlsock = nlsock;
+    remus_ctx->qdisc_cache = qdisc_cache;
+    ifb = rtnl_link_alloc();
+
+    for (i = 0; i < num_netbufs; ++i) {
+
+        if (rtnl_link_get_kernel(nlsock, 0, l[i], &ifb) < 0) {
+            LOG(ERROR, "setup_remus_ctx: cannot obtain handle for %s", l[i]);
+            goto end;
+        }
+
+        ifindex = rtnl_link_get_ifindex(ifb);
+        if (!ifindex) {
+            LOG(ERROR, "setup_remus_ctx: invalid interface %s", l[i]);
+            goto end;
+        }
+
+        qdisc = rtnl_qdisc_get_by_parent(qdisc_cache, ifindex, TC_H_ROOT);
+        if (!qdisc || strcmp(rtnl_tc_get_kind(TC_CAST(qdisc)), "plug")) {
+            LOG(ERROR, "setup_remus_ctx: plug qdisc is not installed on %s", l[i]);
+            goto end;
+        }
+
+        remus_ctx->netbuf_qdisc_list[i] = qdisc;
+    }
+
+    rtnl_link_put(ifb);
+    return remus_ctx;
+
+ end:
+    if (ifb) rtnl_link_put(ifb);
+    if (qdisc_cache) nl_cache_free(qdisc_cache);
+    if (nlsock) nl_close(nlsock);
+    return NULL;
+}
+
 /*----- main code for suspending, in order of execution -----*/
 
 void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
@@ -1312,7 +1427,12 @@ void libxl__domain_suspend(libxl__egc *e
     dss->dm_savefile = libxl__device_model_savefile(gc, domid);
 
     if (r_info != NULL) {
-        dss->interval = r_info->interval;
+        /* This suspend is for Remus. We need to get a handle on
+         * the network output buffers and setup the remus_ctx;
+         */
+        dss->remus_ctx = setup_remus_ctx(gc, r_info);
+        if (!dss->remus_ctx)
+            goto out;
         if (r_info->compression)
             dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
     }
diff -r 3cd67f6ff63a -r bef729fc4336 tools/libxl/libxl_internal.h
--- a/tools/libxl/libxl_internal.h	Thu Jul 25 00:02:19 2013 -0700
+++ b/tools/libxl/libxl_internal.h	Thu Jul 25 00:02:22 2013 -0700
@@ -44,6 +44,13 @@
 #include <sys/wait.h>
 #include <sys/socket.h>
 
+#include <netlink/cache.h>
+#include <netlink/socket.h>
+#include <netlink/attr.h>
+#include <netlink/route/link.h>
+#include <netlink/route/route.h>
+#include <netlink/route/qdisc.h>
+
 #include <xenstore.h>
 #include <xenctrl.h>
 #include <xenguest.h>
@@ -2242,6 +2249,18 @@ typedef struct libxl__logdirty_switch {
     libxl__ev_time timeout;
 } libxl__logdirty_switch;
 
+typedef struct libxl__remus_ctx {
+    /* checkpoint interval */
+    int interval;
+    /* array of plug qdisc pointers, that hold
+     * network output from the guest's vifs.
+     */
+    int num_netbufs;
+    struct rtnl_qdisc **netbuf_qdisc_list;
+    struct nl_sock *nlsock;
+    struct nl_cache *qdisc_cache;
+} libxl__remus_ctx;
+
 struct libxl__domain_suspend_state {
     /* set by caller of libxl__domain_suspend */
     libxl__ao *ao;
@@ -2260,7 +2279,7 @@ struct libxl__domain_suspend_state {
     int xcflags;
     int guest_responded;
     const char *dm_savefile;
-    int interval; /* checkpoint interval (for Remus) */
+    libxl__remus_ctx *remus_ctx;
     libxl__save_helper_state shs;
     libxl__logdirty_switch logdirty;
     /* private for libxl__domain_save_device_model */

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 4 of 4 RFC] xl/remus: Add libnl3 dependency to autoconf scripts and libxl/Makefile
  2013-07-25  7:09 [PATCH 0 of 4 RFC] xl - Remus network buffering support Shriram Rajagopalan
                   ` (2 preceding siblings ...)
  2013-07-25  7:09 ` [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks Shriram Rajagopalan
@ 2013-07-25  7:09 ` Shriram Rajagopalan
  2013-07-26  9:44   ` Wen Congyang
  2013-07-26  9:56   ` David Vrabel
  2013-07-30 16:11 ` [PATCH 0 of 4 RFC] xl - Remus network buffering support Roger Pau Monné
  4 siblings, 2 replies; 27+ messages in thread
From: Shriram Rajagopalan @ 2013-07-25  7:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Campbell, Stefano Stabellini

Add dependency on libnl3 version 3.2.17 or higher to autoconf.
Add include flags and link to relevant libraries in tools/libxl/Makefile.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>

diff -r bef729fc4336 -r c4c05e4e4e02 tools/configure.ac
--- a/tools/configure.ac	Thu Jul 25 00:02:22 2013 -0700
+++ b/tools/configure.ac	Thu Jul 25 00:02:33 2013 -0700
@@ -171,4 +171,12 @@ AC_SUBST(libiconv)
 # Checks for header files.
 AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h])
 
+# Checks for libnl3 libraries and headers.
+PKG_CHECK_MODULES(LIBNL3, libnl-3.0 >= 3.2.17 libnl-route-3.0 >= 3.2.17, [have_libnl3=yes], [have_libnl3=no])
+if (test "${have_libnl3}" = "yes"); then
+        CFLAGS+="-I$LIBNL3_CFLAGS"
+else
+	AC_MSG_ERROR([Need libnl version 3.2.17 or higher])
+fi
+
 AC_OUTPUT()
diff -r bef729fc4336 -r c4c05e4e4e02 tools/libxl/Makefile
--- a/tools/libxl/Makefile	Thu Jul 25 00:02:22 2013 -0700
+++ b/tools/libxl/Makefile	Thu Jul 25 00:02:33 2013 -0700
@@ -13,7 +13,7 @@ XLUMINOR = 0
 
 CFLAGS += -Werror -Wno-format-zero-length -Wmissing-declarations \
 	-Wno-declaration-after-statement -Wformat-nonliteral
-CFLAGS += -I. -fPIC
+CFLAGS += -I. -fPIC -I /usr/local/include/libnl3/
 
 ifeq ($(CONFIG_Linux),y)
 LIBUUID_LIBS += -luuid
@@ -29,7 +29,7 @@ CFLAGS_LIBXL += $(CFLAGS_libblktapctl)
 CFLAGS_LIBXL += -Wshadow
 
 CFLAGS += $(PTHREAD_CFLAGS)
-LDFLAGS += $(PTHREAD_LDFLAGS)
+LDFLAGS += $(PTHREAD_LDFLAGS) -L/usr/local/lib/libnl3/
 LIBXL_LIBS += $(PTHREAD_LIBS)
 
 LIBXLU_LIBS =
@@ -68,6 +68,7 @@ ifeq ($(BISON),)
 endif
 
 LIBXL_LIBS += -lyajl
+LIBXL_LIBS += -lnl-3 -lnl-route-3
 
 LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
 			libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \
@@ -92,7 +93,7 @@ CLIENTS = xl testidl libxl-save-helper
 CFLAGS_XL += $(CFLAGS_libxenlight)
 CFLAGS_XL += -Wshadow
 
-XL_OBJS = xl.o xl_cmdimpl.o xl_cmdtable.o xl_sxp.o
+XL_OBJS = xl.o xl_cmdimpl.o xl_cmdtable.o xl_sxp.o xl_netbuf.o
 $(XL_OBJS) _libxl.api-for-check: \
             CFLAGS += $(CFLAGS_libxenctrl) # For xentoollog.h
 $(XL_OBJS): CFLAGS += $(CFLAGS_XL)
@@ -189,7 +190,7 @@ libxlutil.a: $(LIBXLU_OBJS)
 	$(AR) rcs libxlutil.a $^
 
 xl: $(XL_OBJS) libxlutil.so libxenlight.so
-	$(CC) $(LDFLAGS) -o $@ $(XL_OBJS) libxlutil.so $(LDLIBS_libxenlight) $(LDLIBS_libxenctrl) -lyajl $(APPEND_LDFLAGS)
+	$(CC) $(LDFLAGS) -o $@ $(XL_OBJS) libxlutil.so $(LDLIBS_libxenlight) $(LDLIBS_libxenctrl) -lyajl -lnl-3 -lnl-route-3 $(APPEND_LDFLAGS)
 
 libxl-save-helper: $(SAVE_HELPER_OBJS) libxenlight.so
 	$(CC) $(LDFLAGS) -o $@ $(SAVE_HELPER_OBJS) $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(APPEND_LDFLAGS)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4 of 4 RFC] xl/remus: Add libnl3 dependency to autoconf scripts and libxl/Makefile
  2013-07-25  7:09 ` [PATCH 4 of 4 RFC] xl/remus: Add libnl3 dependency to autoconf scripts and libxl/Makefile Shriram Rajagopalan
@ 2013-07-26  9:44   ` Wen Congyang
  2013-07-26 13:51     ` Shriram Rajagopalan
  2013-07-26  9:56   ` David Vrabel
  1 sibling, 1 reply; 27+ messages in thread
From: Wen Congyang @ 2013-07-26  9:44 UTC (permalink / raw)
  To: Shriram Rajagopalan; +Cc: Stefano Stabellini, Ian Campbell, xen-devel

At 07/25/2013 03:09 PM, Shriram Rajagopalan Wrote:
> Add dependency on libnl3 version 3.2.17 or higher to autoconf.
> Add include flags and link to relevant libraries in tools/libxl/Makefile.

Is it OK to use libnl, not libnl3?
Some systems(For example: RHEL6) don't have libnl3.

> 
> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> 
> diff -r bef729fc4336 -r c4c05e4e4e02 tools/configure.ac
> --- a/tools/configure.ac	Thu Jul 25 00:02:22 2013 -0700
> +++ b/tools/configure.ac	Thu Jul 25 00:02:33 2013 -0700
> @@ -171,4 +171,12 @@ AC_SUBST(libiconv)
>  # Checks for header files.
>  AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h])
>  
> +# Checks for libnl3 libraries and headers.
> +PKG_CHECK_MODULES(LIBNL3, libnl-3.0 >= 3.2.17 libnl-route-3.0 >= 3.2.17, [have_libnl3=yes], [have_libnl3=no])
> +if (test "${have_libnl3}" = "yes"); then
> +        CFLAGS+="-I$LIBNL3_CFLAGS"
> +else
> +	AC_MSG_ERROR([Need libnl version 3.2.17 or higher])
> +fi
> +
>  AC_OUTPUT()
> diff -r bef729fc4336 -r c4c05e4e4e02 tools/libxl/Makefile
> --- a/tools/libxl/Makefile	Thu Jul 25 00:02:22 2013 -0700
> +++ b/tools/libxl/Makefile	Thu Jul 25 00:02:33 2013 -0700
> @@ -13,7 +13,7 @@ XLUMINOR = 0
>  
>  CFLAGS += -Werror -Wno-format-zero-length -Wmissing-declarations \
>  	-Wno-declaration-after-statement -Wformat-nonliteral
> -CFLAGS += -I. -fPIC
> +CFLAGS += -I. -fPIC -I /usr/local/include/libnl3/

Hmm, why use /usr/local?

Thanks
Wen Congyang

>  
>  ifeq ($(CONFIG_Linux),y)
>  LIBUUID_LIBS += -luuid
> @@ -29,7 +29,7 @@ CFLAGS_LIBXL += $(CFLAGS_libblktapctl)
>  CFLAGS_LIBXL += -Wshadow
>  
>  CFLAGS += $(PTHREAD_CFLAGS)
> -LDFLAGS += $(PTHREAD_LDFLAGS)
> +LDFLAGS += $(PTHREAD_LDFLAGS) -L/usr/local/lib/libnl3/
>  LIBXL_LIBS += $(PTHREAD_LIBS)
>  
>  LIBXLU_LIBS =
> @@ -68,6 +68,7 @@ ifeq ($(BISON),)
>  endif
>  
>  LIBXL_LIBS += -lyajl
> +LIBXL_LIBS += -lnl-3 -lnl-route-3
>  
>  LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
>  			libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \
> @@ -92,7 +93,7 @@ CLIENTS = xl testidl libxl-save-helper
>  CFLAGS_XL += $(CFLAGS_libxenlight)
>  CFLAGS_XL += -Wshadow
>  
> -XL_OBJS = xl.o xl_cmdimpl.o xl_cmdtable.o xl_sxp.o
> +XL_OBJS = xl.o xl_cmdimpl.o xl_cmdtable.o xl_sxp.o xl_netbuf.o
>  $(XL_OBJS) _libxl.api-for-check: \
>              CFLAGS += $(CFLAGS_libxenctrl) # For xentoollog.h
>  $(XL_OBJS): CFLAGS += $(CFLAGS_XL)
> @@ -189,7 +190,7 @@ libxlutil.a: $(LIBXLU_OBJS)
>  	$(AR) rcs libxlutil.a $^
>  
>  xl: $(XL_OBJS) libxlutil.so libxenlight.so
> -	$(CC) $(LDFLAGS) -o $@ $(XL_OBJS) libxlutil.so $(LDLIBS_libxenlight) $(LDLIBS_libxenctrl) -lyajl $(APPEND_LDFLAGS)
> +	$(CC) $(LDFLAGS) -o $@ $(XL_OBJS) libxlutil.so $(LDLIBS_libxenlight) $(LDLIBS_libxenctrl) -lyajl -lnl-3 -lnl-route-3 $(APPEND_LDFLAGS)
>  
>  libxl-save-helper: $(SAVE_HELPER_OBJS) libxenlight.so
>  	$(CC) $(LDFLAGS) -o $@ $(SAVE_HELPER_OBJS) $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(APPEND_LDFLAGS)
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4 of 4 RFC] xl/remus: Add libnl3 dependency to autoconf scripts and libxl/Makefile
  2013-07-25  7:09 ` [PATCH 4 of 4 RFC] xl/remus: Add libnl3 dependency to autoconf scripts and libxl/Makefile Shriram Rajagopalan
  2013-07-26  9:44   ` Wen Congyang
@ 2013-07-26  9:56   ` David Vrabel
  2013-07-26 13:56     ` Shriram Rajagopalan
  1 sibling, 1 reply; 27+ messages in thread
From: David Vrabel @ 2013-07-26  9:56 UTC (permalink / raw)
  To: Shriram Rajagopalan; +Cc: Stefano Stabellini, Ian Campbell, xen-devel

On 25/07/13 08:09, Shriram Rajagopalan wrote:
> Add dependency on libnl3 version 3.2.17 or higher to autoconf.
> Add include flags and link to relevant libraries in tools/libxl/Makefile.
> 
> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> 
> diff -r bef729fc4336 -r c4c05e4e4e02 tools/configure.ac
> --- a/tools/configure.ac	Thu Jul 25 00:02:22 2013 -0700
> +++ b/tools/configure.ac	Thu Jul 25 00:02:33 2013 -0700
> @@ -171,4 +171,12 @@ AC_SUBST(libiconv)
>  # Checks for header files.
>  AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h])
>  
> +# Checks for libnl3 libraries and headers.
> +PKG_CHECK_MODULES(LIBNL3, libnl-3.0 >= 3.2.17 libnl-route-3.0 >= 3.2.17, [have_libnl3=yes], [have_libnl3=no])
> +if (test "${have_libnl3}" = "yes"); then
> +        CFLAGS+="-I$LIBNL3_CFLAGS"
> +else
> +	AC_MSG_ERROR([Need libnl version 3.2.17 or higher])
> +fi

Using pkg-config here...

>  AC_OUTPUT()
> diff -r bef729fc4336 -r c4c05e4e4e02 tools/libxl/Makefile
> --- a/tools/libxl/Makefile	Thu Jul 25 00:02:22 2013 -0700
> +++ b/tools/libxl/Makefile	Thu Jul 25 00:02:33 2013 -0700
> @@ -13,7 +13,7 @@ XLUMINOR = 0
>  
>  CFLAGS += -Werror -Wno-format-zero-length -Wmissing-declarations \
>  	-Wno-declaration-after-statement -Wformat-nonliteral
> -CFLAGS += -I. -fPIC
> +CFLAGS += -I. -fPIC -I /usr/local/include/libnl3/

Means you should need to include the -I option here.

David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4 of 4 RFC] xl/remus: Add libnl3 dependency to autoconf scripts and libxl/Makefile
  2013-07-26  9:44   ` Wen Congyang
@ 2013-07-26 13:51     ` Shriram Rajagopalan
  0 siblings, 0 replies; 27+ messages in thread
From: Shriram Rajagopalan @ 2013-07-26 13:51 UTC (permalink / raw)
  To: Wen Congyang; +Cc: Stefano Stabellini, Ian Campbell, xen-devel

On Fri, Jul 26, 2013 at 5:44 AM, Wen Congyang <wency@cn.fujitsu.com> wrote:
> At 07/25/2013 03:09 PM, Shriram Rajagopalan Wrote:
>> Add dependency on libnl3 version 3.2.17 or higher to autoconf.
>> Add include flags and link to relevant libraries in tools/libxl/Makefile.
>
> Is it OK to use libnl, not libnl3?
> Some systems(For example: RHEL6) don't have libnl3.
>

Unfortunately no. libnl3 is where the network buffering support
exists. And most of the API
used to setup network buffering were introduced in libnl3.

>>
>> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
>>
>> diff -r bef729fc4336 -r c4c05e4e4e02 tools/configure.ac
>> --- a/tools/configure.ac      Thu Jul 25 00:02:22 2013 -0700
>> +++ b/tools/configure.ac      Thu Jul 25 00:02:33 2013 -0700
>> @@ -171,4 +171,12 @@ AC_SUBST(libiconv)
>>  # Checks for header files.
>>  AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h])
>>
>> +# Checks for libnl3 libraries and headers.
>> +PKG_CHECK_MODULES(LIBNL3, libnl-3.0 >= 3.2.17 libnl-route-3.0 >= 3.2.17, [have_libnl3=yes], [have_libnl3=no])
>> +if (test "${have_libnl3}" = "yes"); then
>> +        CFLAGS+="-I$LIBNL3_CFLAGS"
>> +else
>> +     AC_MSG_ERROR([Need libnl version 3.2.17 or higher])
>> +fi
>> +
>>  AC_OUTPUT()
>> diff -r bef729fc4336 -r c4c05e4e4e02 tools/libxl/Makefile
>> --- a/tools/libxl/Makefile    Thu Jul 25 00:02:22 2013 -0700
>> +++ b/tools/libxl/Makefile    Thu Jul 25 00:02:33 2013 -0700
>> @@ -13,7 +13,7 @@ XLUMINOR = 0
>>
>>  CFLAGS += -Werror -Wno-format-zero-length -Wmissing-declarations \
>>       -Wno-declaration-after-statement -Wformat-nonliteral
>> -CFLAGS += -I. -fPIC
>> +CFLAGS += -I. -fPIC -I /usr/local/include/libnl3/
>
> Hmm, why use /usr/local?
>

That was a hack, as stated in the introductory email. I am not well versed with
autoconf. Using pkg-config and adding the libraries to APPEND_INCLUDES
didnt do the job.

shriram

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4 of 4 RFC] xl/remus: Add libnl3 dependency to autoconf scripts and libxl/Makefile
  2013-07-26  9:56   ` David Vrabel
@ 2013-07-26 13:56     ` Shriram Rajagopalan
  2013-07-29  5:58       ` Wen Congyang
  0 siblings, 1 reply; 27+ messages in thread
From: Shriram Rajagopalan @ 2013-07-26 13:56 UTC (permalink / raw)
  To: David Vrabel; +Cc: Stefano Stabellini, Ian Campbell, xen-devel

On Fri, Jul 26, 2013 at 5:56 AM, David Vrabel <david.vrabel@citrix.com> wrote:
> On 25/07/13 08:09, Shriram Rajagopalan wrote:
>> Add dependency on libnl3 version 3.2.17 or higher to autoconf.
>> Add include flags and link to relevant libraries in tools/libxl/Makefile.
>>
>> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
>>
>> diff -r bef729fc4336 -r c4c05e4e4e02 tools/configure.ac
>> --- a/tools/configure.ac      Thu Jul 25 00:02:22 2013 -0700
>> +++ b/tools/configure.ac      Thu Jul 25 00:02:33 2013 -0700
>> @@ -171,4 +171,12 @@ AC_SUBST(libiconv)
>>  # Checks for header files.
>>  AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h])
>>
>> +# Checks for libnl3 libraries and headers.
>> +PKG_CHECK_MODULES(LIBNL3, libnl-3.0 >= 3.2.17 libnl-route-3.0 >= 3.2.17, [have_libnl3=yes], [have_libnl3=no])
>> +if (test "${have_libnl3}" = "yes"); then
>> +        CFLAGS+="-I$LIBNL3_CFLAGS"
>> +else
>> +     AC_MSG_ERROR([Need libnl version 3.2.17 or higher])
>> +fi
>
> Using pkg-config here...
>

The pkg-config idea was based on the suggestion here
http://www.carisma.slowglass.com/~tgr/libnl/doc/core.html#_linking_to_this_library

If I use AC_CHECK_LIB and AC_CHECK_HEADERS, would it automatically add the
include path to CFLAGS, such that I wouldnt have to manually specify
-I flags in libxl/Makefile ?

>>  AC_OUTPUT()
>> diff -r bef729fc4336 -r c4c05e4e4e02 tools/libxl/Makefile
>> --- a/tools/libxl/Makefile    Thu Jul 25 00:02:22 2013 -0700
>> +++ b/tools/libxl/Makefile    Thu Jul 25 00:02:33 2013 -0700
>> @@ -13,7 +13,7 @@ XLUMINOR = 0
>>
>>  CFLAGS += -Werror -Wno-format-zero-length -Wmissing-declarations \
>>       -Wno-declaration-after-statement -Wformat-nonliteral
>> -CFLAGS += -I. -fPIC
>> +CFLAGS += -I. -fPIC -I /usr/local/include/libnl3/
>
> Means you should need to include the -I option here.
>
> David
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4 of 4 RFC] xl/remus: Add libnl3 dependency to autoconf scripts and libxl/Makefile
  2013-07-26 13:56     ` Shriram Rajagopalan
@ 2013-07-29  5:58       ` Wen Congyang
  2013-07-29 13:07         ` Shriram Rajagopalan
  0 siblings, 1 reply; 27+ messages in thread
From: Wen Congyang @ 2013-07-29  5:58 UTC (permalink / raw)
  To: rshriram; +Cc: Ian Campbell, xen-devel, David Vrabel, Stefano Stabellini

At 07/26/2013 09:56 PM, Shriram Rajagopalan Wrote:
> On Fri, Jul 26, 2013 at 5:56 AM, David Vrabel <david.vrabel@citrix.com> wrote:
>> On 25/07/13 08:09, Shriram Rajagopalan wrote:
>>> Add dependency on libnl3 version 3.2.17 or higher to autoconf.
>>> Add include flags and link to relevant libraries in tools/libxl/Makefile.
>>>
>>> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
>>>
>>> diff -r bef729fc4336 -r c4c05e4e4e02 tools/configure.ac
>>> --- a/tools/configure.ac      Thu Jul 25 00:02:22 2013 -0700
>>> +++ b/tools/configure.ac      Thu Jul 25 00:02:33 2013 -0700
>>> @@ -171,4 +171,12 @@ AC_SUBST(libiconv)
>>>  # Checks for header files.
>>>  AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h])
>>>
>>> +# Checks for libnl3 libraries and headers.
>>> +PKG_CHECK_MODULES(LIBNL3, libnl-3.0 >= 3.2.17 libnl-route-3.0 >= 3.2.17, [have_libnl3=yes], [have_libnl3=no])
>>> +if (test "${have_libnl3}" = "yes"); then
>>> +        CFLAGS+="-I$LIBNL3_CFLAGS"
>>> +else
>>> +     AC_MSG_ERROR([Need libnl version 3.2.17 or higher])
>>> +fi
>>
>> Using pkg-config here...
>>
> 
> The pkg-config idea was based on the suggestion here
> http://www.carisma.slowglass.com/~tgr/libnl/doc/core.html#_linking_to_this_library
> 
> If I use AC_CHECK_LIB and AC_CHECK_HEADERS, would it automatically add the
> include path to CFLAGS, such that I wouldnt have to manually specify
> -I flags in libxl/Makefile ?

No. AC_CHECK_HEADERS can check if the header file is under the default directory.
If the header file is under the directory /usr/include, you can use it.
But why your header file is under the directory /usr/local/include?

Thanks
Wen Congyang
> 
>>>  AC_OUTPUT()
>>> diff -r bef729fc4336 -r c4c05e4e4e02 tools/libxl/Makefile
>>> --- a/tools/libxl/Makefile    Thu Jul 25 00:02:22 2013 -0700
>>> +++ b/tools/libxl/Makefile    Thu Jul 25 00:02:33 2013 -0700
>>> @@ -13,7 +13,7 @@ XLUMINOR = 0
>>>
>>>  CFLAGS += -Werror -Wno-format-zero-length -Wmissing-declarations \
>>>       -Wno-declaration-after-statement -Wformat-nonliteral
>>> -CFLAGS += -I. -fPIC
>>> +CFLAGS += -I. -fPIC -I /usr/local/include/libnl3/
>>
>> Means you should need to include the -I option here.
>>
>> David
>>
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4 of 4 RFC] xl/remus: Add libnl3 dependency to autoconf scripts and libxl/Makefile
  2013-07-29  5:58       ` Wen Congyang
@ 2013-07-29 13:07         ` Shriram Rajagopalan
  2013-07-29 15:41           ` David Vrabel
  0 siblings, 1 reply; 27+ messages in thread
From: Shriram Rajagopalan @ 2013-07-29 13:07 UTC (permalink / raw)
  To: Wen Congyang; +Cc: Ian Campbell, xen-devel, David Vrabel, Stefano Stabellini

On Mon, Jul 29, 2013 at 1:58 AM, Wen Congyang <wency@cn.fujitsu.com> wrote:
>> If I use AC_CHECK_LIB and AC_CHECK_HEADERS, would it automatically add the
>> include path to CFLAGS, such that I wouldnt have to manually specify
>> -I flags in libxl/Makefile ?
>
> No. AC_CHECK_HEADERS can check if the header file is under the default directory.
> If the header file is under the directory /usr/include, you can use it.
> But why your header file is under the directory /usr/local/include?
>

I think its because I installed libnl3 in /usr/local, from the sources.
If there is a proper way to specify custom install directories for
certain libraries,
then I dont think we need that hard coded -I /usr/local/include/libnl3

shriram

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4 of 4 RFC] xl/remus: Add libnl3 dependency to autoconf scripts and libxl/Makefile
  2013-07-29 13:07         ` Shriram Rajagopalan
@ 2013-07-29 15:41           ` David Vrabel
  0 siblings, 0 replies; 27+ messages in thread
From: David Vrabel @ 2013-07-29 15:41 UTC (permalink / raw)
  To: rshriram; +Cc: xen-devel, Ian Campbell, Wen Congyang, Stefano Stabellini

On 29/07/13 14:07, Shriram Rajagopalan wrote:
> On Mon, Jul 29, 2013 at 1:58 AM, Wen Congyang <wency@cn.fujitsu.com> wrote:
>>> If I use AC_CHECK_LIB and AC_CHECK_HEADERS, would it automatically add the
>>> include path to CFLAGS, such that I wouldnt have to manually specify
>>> -I flags in libxl/Makefile ?
>>
>> No. AC_CHECK_HEADERS can check if the header file is under the default directory.
>> If the header file is under the directory /usr/include, you can use it.
>> But why your header file is under the directory /usr/local/include?
>>
> 
> I think its because I installed libnl3 in /usr/local, from the sources.
> If there is a proper way to specify custom install directories for
> certain libraries,
> then I dont think we need that hard coded -I /usr/local/include/libnl3

You need to tell pkg-config where to find the .pc file.  I don't recall
the details for how to do this. PKG_CONFIG_PATH perhaps?

David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1 of 4 RFC] xl/remus : Network buffering setup helper functions
  2013-07-25  7:09 ` [PATCH 1 of 4 RFC] xl/remus : Network buffering setup helper functions Shriram Rajagopalan
@ 2013-07-29 15:42   ` Ian Campbell
  2013-07-29 18:00     ` Shriram Rajagopalan
  0 siblings, 1 reply; 27+ messages in thread
From: Ian Campbell @ 2013-07-29 15:42 UTC (permalink / raw)
  To: Shriram Rajagopalan; +Cc: Stefano Stabellini, xen-devel

On Thu, 2013-07-25 at 00:09 -0700, Shriram Rajagopalan wrote:
> The functions contained in xl_netbuf.c are built on top of libnl3 API.
> They setup the necessary infrastructure required to for network output
> buffering in Remus.  In xend, the Remus python code performed this setup
> by invoking shell commands.  This code is built purely on top of APIs
> supplied by libnl3.0.
> 
> There are two public helper functions:
> 1) remus_install_netbuf_on_dev to install a network buffer on a vif.
> Its basically equivalent to:
> a)find free ifb (say ifb0)
> b)ip link set ifb0 up
> c)tc qdisc add dev vif1.0 ingress
> d)tc filter add dev vif1.0 parent ffff: proto ip pref 10 u32 match u32 0 0 \
> 	action mirred egress redirect dev ifb0
> e)tc qdisc add dev ifb0 root plug
> f)get handle to plug qdisc and control it programmatically after suspending
>   a VM/receiving checkpoint ack from remote
> 
> 
> 2) remus_uninstall_netbufs to remove all
> network buffers and ingress qdiscs from the supplied interface list.
> 
> I have managed to code up 5 of the 6 steps mentioned above, purely in C, by building
> on top of libnl3. There is currently no support in libnl3 to implement step (d) (redirection)
> and it is pretty complicated to implement. So I am currently resorting to system() lib call
> to get the job done.
> 
> N.B. This implementation is just xl (cmdline utility) specific.

You haven't included a Makefile change so I can't easily tell if this is
really part of the xl binary, or a separate helper or indeed part of
libxl.

> Other toolstacks may choose
> to reuse this code or do this setup in their own way (scripts/shell commands, etc.).

It's not clear to me why this code wouldn't be part of libxl to be
called by xl and other toolstacks as needed. Why would a toolstack want
to do this differently?

The point of libxl is precisely to relieve toolstacks of the burden of
doing these sorts of complex domain setup activities themselves by
putting them in a common library, I can't see why Remus setup would be
an exception to this.

I think we need to answer this question before reviewing this code,
since doing it libxl would change the requirements of the code (from the
API exposed, to logging, to memory managment, to not using system
directly).

> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>

Ian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2 of 4 RFC] xl/remus: Network buffering cmdline switch, setup/teardown
  2013-07-25  7:09 ` [PATCH 2 of 4 RFC] xl/remus: Network buffering cmdline switch, setup/teardown Shriram Rajagopalan
@ 2013-07-29 15:49   ` Ian Campbell
  2013-07-29 19:00     ` Shriram Rajagopalan
  0 siblings, 1 reply; 27+ messages in thread
From: Ian Campbell @ 2013-07-29 15:49 UTC (permalink / raw)
  To: Shriram Rajagopalan; +Cc: Stefano Stabellini, xen-devel

On Thu, 2013-07-25 at 00:09 -0700, Shriram Rajagopalan wrote:
> Add appropriate code to xl_cmdline.c to setup network buffers for
> each vif belonging to the guest.  Also provide a command line switch
> to explicitly "enable" network buffering.
> 
> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> 
> diff -r 3ae38cbe535c -r 3cd67f6ff63a tools/libxl/libxl_types.idl
> --- a/tools/libxl/libxl_types.idl	Wed Jul 24 22:55:00 2013 -0700
> +++ b/tools/libxl/libxl_types.idl	Thu Jul 25 00:02:19 2013 -0700
> @@ -521,6 +521,7 @@ libxl_domain_remus_info = Struct("domain
>      ("interval",     integer),
>      ("blackhole",    bool),
>      ("compression",  bool),
> +    ("netbuf_iflist", libxl_string_list),
>      ])
>  
>  libxl_event_type = Enumeration("event_type", [
> diff -r 3ae38cbe535c -r 3cd67f6ff63a tools/libxl/xl_cmdimpl.c
> --- a/tools/libxl/xl_cmdimpl.c	Wed Jul 24 22:55:00 2013 -0700
> +++ b/tools/libxl/xl_cmdimpl.c	Thu Jul 25 00:02:19 2013 -0700
> @@ -7039,10 +7039,109 @@ done:
>      return ret;
>  }
>  
> +static char **get_guest_vifnames(uint32_t domid, int *num_vifs)
> +{
> +    char **viflist;
> +    libxl_device_nic *nics;
> +    libxl_nicinfo nicinfo;
> +    int nb, i;
> +
> +    nics = libxl_device_nic_list(ctx, domid, &nb);
> +    if (!nics) { *num_vifs = 0; return NULL;}
> +
> +    viflist = calloc((nb + 1), sizeof(char *));
> +    if (!viflist) {
> +        perror("failed to allocate memory to hold vif names!");
> +        exit(-1);
> +    }
> +
> +    for (i = 0; i < nb; ++i) {
> +        if (!libxl_device_nic_getinfo(ctx, domid, &nics[i], &nicinfo))  {
> +            if (asprintf(&viflist[i], "vif%u.%d", domid, nicinfo.devid) < 0) {

This doesn't account for the ifname field of the vif.

Also, I'm not sure how this is supposed to work when driver domains are
in use either, if this were done via libxl then it would naturally get
incorporated into Roger's work to make hotplug scripts etc work properly
with stub domains.

Most of the other comments I had would become invalid/irrelevant when
this was moved to libxl, since you'd naturally end up doing things
differently anyway.

Ian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks
  2013-07-25  7:09 ` [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks Shriram Rajagopalan
@ 2013-07-29 16:06   ` Ian Campbell
  2013-08-07 15:41     ` Ian Jackson
  2013-08-07 15:38   ` Ian Jackson
  1 sibling, 1 reply; 27+ messages in thread
From: Ian Campbell @ 2013-07-29 16:06 UTC (permalink / raw)
  To: Shriram Rajagopalan; +Cc: Stefano Stabellini, Ian Jackson, xen-devel

On Thu, 2013-07-25 at 00:09 -0700, Shriram Rajagopalan wrote:
> This patch constitutes the core network buffering logic.
> Libxl would receive a list of IFB devices that collectively act as
> network buffering devices, for a given guest. Also, libxl expects that
> every ifb device in the list supplied by a toolstack (netbuf_iflist)
> should have a plug qdisc installed.
> 
> This patch does the following:
>  a) establish a dedicated remus context containing libnl related
>     state (netlink sockets, qdisc caches, etc.,)
>  b) Obtain handles to plug qdiscs installed on the supplied list of
>     IFB devices
>  c) add a new network buffer (i.e., create a new one) when the domain is
>     suspended (remus_domain_suspend_callback)
>  d) release the network buffer pertaining to the acknowledged checkpoint
>     in remus_domain_checkpoint_dm_saved, which is invoked for both PV & HVM.
> 
> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>

Please can you also CC Ian.Jackson@eu.citrix.com on toolstack patches
such as this.

This really needs Ian's review WRT the lifetime of the remus context
object vs. the ao machinery. IOW I'm not sure if this is the appropriate
gc to be using or if one associated with the ao context isn't required.

On the other hand by simply embedding the struct (rather than a pointer
to the struct) a bunch of those worries might go away.

> 
> diff -r 3cd67f6ff63a -r bef729fc4336 tools/libxl/libxl_dom.c
> --- a/tools/libxl/libxl_dom.c	Thu Jul 25 00:02:19 2013 -0700
> +++ b/tools/libxl/libxl_dom.c	Thu Jul 25 00:02:22 2013 -0700
> @@ -20,6 +20,7 @@
>  #include "libxl_internal.h"
>  #include "libxl_arch.h"
>  
> +#include <netlink/route/qdisc/plug.h>
>  #include <xc_dom.h>
>  #include <xen/hvm/hvm_info_table.h>
>  #include <xen/hvm/hvm_xs_strings.h>
> @@ -1212,12 +1213,36 @@ int libxl__toolstack_save(uint32_t domid
>  
>  /*----- remus callbacks -----*/
>  
> +/* REMUS TODO: Issue disk checkpoint reqs. */
>  static int libxl__remus_domain_suspend_callback(void *data)
>  {
> -    /* REMUS TODO: Issue disk and network checkpoint reqs. */
> -    return libxl__domain_suspend_common_callback(data);
> +    libxl__save_helper_state *shs = data;
> +    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
> +    STATE_AO_GC(dss->ao);
> +
> +    int is_suspended = 0, i, ret;
> +    libxl__remus_ctx *remus_ctx = dss->remus_ctx;
> +    is_suspended = libxl__domain_suspend_common_callback(data);
> +    if (!remus_ctx->num_netbufs) return is_suspended;
> +
> +    if (is_suspended) {
> +        for (i = 0; i < remus_ctx->num_netbufs; ++i) {
> +            ret = rtnl_qdisc_plug_buffer(remus_ctx->netbuf_qdisc_list[i]);
> +            if (!ret)
> +                ret = rtnl_qdisc_add(remus_ctx->nlsock, remus_ctx->netbuf_qdisc_list[i],
> +                                     NLM_F_REQUEST);
> +            if (ret) {
> +                LOG(ERROR, "Cannot create new buffer on %s:%s",
> +                    dss->remus->netbuf_iflist[i], nl_geterror(ret));
> +                return 0;
> +            }
> +        }
> +    }
> +
> +    return is_suspended;
>  }
>  
> +/* REMUS TODO: Deal with disk. */
>  static int libxl__remus_domain_resume_callback(void *data)
>  {
>      libxl__save_helper_state *shs = data;
> @@ -1228,7 +1253,6 @@ static int libxl__remus_domain_resume_ca
>      if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
>          return 0;
>  
> -    /* REMUS TODO: Deal with disk. Start a new network output buffer */
>      return 1;
>  }
>  
> @@ -1255,13 +1279,104 @@ static void libxl__remus_domain_checkpoi
>  static void remus_checkpoint_dm_saved(libxl__egc *egc,
>                                        libxl__domain_suspend_state *dss, int rc)
>  {
> -    /* REMUS TODO: Wait for disk and memory ack, release network buffer */
> -    /* REMUS TODO: make this asynchronous */
> -    assert(!rc); /* REMUS TODO handle this error properly */
> -    usleep(dss->interval * 1000);
> +    /* REMUS TODO: Wait for disk and explicit memory ack (through restore
> +       callback from remote) before release network buffer. */
> +    int i, ret;
> +    libxl__remus_ctx *remus_ctx = dss->remus_ctx;
> +    STATE_AO_GC(dss->ao);
> +
> +    assert(!rc);
> +
> +    if (remus_ctx->num_netbufs  > 0) {
> +        for (i = 0; i < remus_ctx->num_netbufs; ++i) {
> +            ret = rtnl_qdisc_plug_release_one(remus_ctx->netbuf_qdisc_list[i]);
> +            if (!ret)
> +                ret = rtnl_qdisc_add(remus_ctx->nlsock, remus_ctx->netbuf_qdisc_list[i],
> +                                     NLM_F_REQUEST);
> +            if (ret) {
> +                LOG(ERROR, "Cannot release buffer from %s:%s",
> +                    dss->remus->netbuf_iflist[i], nl_geterror(ret));
> +                ret= 0;
> +                break;
> +            }
> +        }
> +    }
> +
> +    usleep(dss->remus_ctx->interval * 1000);
>      libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);
>  }
>  
> +static libxl__remus_ctx *setup_remus_ctx(libxl__gc *gc,
> +                                         const libxl_domain_remus_info *r_info)
> +{
> +    libxl__remus_ctx *remus_ctx = NULL;
> +    int num_netbufs = 0, i;
> +    struct nl_cache *qdisc_cache = NULL;
> +    struct rtnl_link *ifb = NULL;
> +    struct nl_sock *nlsock = NULL;
> +    struct rtnl_qdisc *qdisc = NULL;
> +    libxl_string_list l;
> +    int ifindex;
> +
> +    remus_ctx = libxl__zalloc(gc, sizeof(libxl__remus_ctx));
> +    remus_ctx->interval = r_info->interval;
> +
> +    l = r_info->netbuf_iflist;
> +    num_netbufs = libxl_string_list_length(&l);
> +    if (!num_netbufs) return remus_ctx;
> +
> +    nlsock = nl_socket_alloc();
> +    if (!nlsock) {
> +        LOG(ERROR, "setup_remus_ctx: cannot allocate nl socket");
> +        return NULL;
> +    }
> +
> +    nl_connect(nlsock, NETLINK_ROUTE);
> +
> +    if (rtnl_qdisc_alloc_cache(nlsock, &qdisc_cache) < 0) {
> +        LOG(ERROR, "setup_remus_ctx: failed to allocate qdisc cache");
> +        goto end;
> +    }
> +
> +    remus_ctx->netbuf_qdisc_list = libxl__calloc(gc, num_netbufs + 1, 
> +                                                 sizeof(struct rtnl_qdisc *));
> +    remus_ctx->num_netbufs = num_netbufs;
> +    remus_ctx->nlsock = nlsock;
> +    remus_ctx->qdisc_cache = qdisc_cache;
> +    ifb = rtnl_link_alloc();
> +
> +    for (i = 0; i < num_netbufs; ++i) {
> +
> +        if (rtnl_link_get_kernel(nlsock, 0, l[i], &ifb) < 0) {
> +            LOG(ERROR, "setup_remus_ctx: cannot obtain handle for %s", l[i]);
> +            goto end;
> +        }
> +
> +        ifindex = rtnl_link_get_ifindex(ifb);
> +        if (!ifindex) {
> +            LOG(ERROR, "setup_remus_ctx: invalid interface %s", l[i]);
> +            goto end;
> +        }
> +
> +        qdisc = rtnl_qdisc_get_by_parent(qdisc_cache, ifindex, TC_H_ROOT);
> +        if (!qdisc || strcmp(rtnl_tc_get_kind(TC_CAST(qdisc)), "plug")) {
> +            LOG(ERROR, "setup_remus_ctx: plug qdisc is not installed on %s", l[i]);
> +            goto end;
> +        }
> +
> +        remus_ctx->netbuf_qdisc_list[i] = qdisc;
> +    }
> +
> +    rtnl_link_put(ifb);
> +    return remus_ctx;
> +
> + end:
> +    if (ifb) rtnl_link_put(ifb);
> +    if (qdisc_cache) nl_cache_free(qdisc_cache);
> +    if (nlsock) nl_close(nlsock);
> +    return NULL;
> +}
> +
>  /*----- main code for suspending, in order of execution -----*/
>  
>  void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
> @@ -1312,7 +1427,12 @@ void libxl__domain_suspend(libxl__egc *e
>      dss->dm_savefile = libxl__device_model_savefile(gc, domid);
>  
>      if (r_info != NULL) {
> -        dss->interval = r_info->interval;
> +        /* This suspend is for Remus. We need to get a handle on
> +         * the network output buffers and setup the remus_ctx;
> +         */
> +        dss->remus_ctx = setup_remus_ctx(gc, r_info);
> +        if (!dss->remus_ctx)
> +            goto out;
>          if (r_info->compression)
>              dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
>      }
> diff -r 3cd67f6ff63a -r bef729fc4336 tools/libxl/libxl_internal.h
> --- a/tools/libxl/libxl_internal.h	Thu Jul 25 00:02:19 2013 -0700
> +++ b/tools/libxl/libxl_internal.h	Thu Jul 25 00:02:22 2013 -0700
> @@ -44,6 +44,13 @@
>  #include <sys/wait.h>
>  #include <sys/socket.h>
>  
> +#include <netlink/cache.h>
> +#include <netlink/socket.h>
> +#include <netlink/attr.h>
> +#include <netlink/route/link.h>
> +#include <netlink/route/route.h>
> +#include <netlink/route/qdisc.h>
> +
>  #include <xenstore.h>
>  #include <xenctrl.h>
>  #include <xenguest.h>
> @@ -2242,6 +2249,18 @@ typedef struct libxl__logdirty_switch {
>      libxl__ev_time timeout;
>  } libxl__logdirty_switch;
>  
> +typedef struct libxl__remus_ctx {
> +    /* checkpoint interval */
> +    int interval;
> +    /* array of plug qdisc pointers, that hold
> +     * network output from the guest's vifs.
> +     */
> +    int num_netbufs;
> +    struct rtnl_qdisc **netbuf_qdisc_list;
> +    struct nl_sock *nlsock;
> +    struct nl_cache *qdisc_cache;
> +} libxl__remus_ctx;
> +
>  struct libxl__domain_suspend_state {
>      /* set by caller of libxl__domain_suspend */
>      libxl__ao *ao;
> @@ -2260,7 +2279,7 @@ struct libxl__domain_suspend_state {
>      int xcflags;
>      int guest_responded;
>      const char *dm_savefile;
> -    int interval; /* checkpoint interval (for Remus) */
> +    libxl__remus_ctx *remus_ctx;
>      libxl__save_helper_state shs;
>      libxl__logdirty_switch logdirty;
>      /* private for libxl__domain_save_device_model */

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1 of 4 RFC] xl/remus : Network buffering setup helper functions
  2013-07-29 15:42   ` Ian Campbell
@ 2013-07-29 18:00     ` Shriram Rajagopalan
  2013-07-30 10:44       ` Ian Campbell
  0 siblings, 1 reply; 27+ messages in thread
From: Shriram Rajagopalan @ 2013-07-29 18:00 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Stefano Stabellini, xen-devel

On Mon, Jul 29, 2013 at 11:42 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Thu, 2013-07-25 at 00:09 -0700, Shriram Rajagopalan wrote:
>> The functions contained in xl_netbuf.c are built on top of libnl3 API.
>> They setup the necessary infrastructure required to for network output
>> buffering in Remus.  In xend, the Remus python code performed this setup
>> by invoking shell commands.  This code is built purely on top of APIs
>> supplied by libnl3.0.
>>
>> There are two public helper functions:
>> 1) remus_install_netbuf_on_dev to install a network buffer on a vif.
>> Its basically equivalent to:
>> a)find free ifb (say ifb0)
>> b)ip link set ifb0 up
>> c)tc qdisc add dev vif1.0 ingress
>> d)tc filter add dev vif1.0 parent ffff: proto ip pref 10 u32 match u32 0 0 \
>>       action mirred egress redirect dev ifb0
>> e)tc qdisc add dev ifb0 root plug
>> f)get handle to plug qdisc and control it programmatically after suspending
>>   a VM/receiving checkpoint ack from remote
>>
>>
>> 2) remus_uninstall_netbufs to remove all
>> network buffers and ingress qdiscs from the supplied interface list.
>>
>> I have managed to code up 5 of the 6 steps mentioned above, purely in C, by building
>> on top of libnl3. There is currently no support in libnl3 to implement step (d) (redirection)
>> and it is pretty complicated to implement. So I am currently resorting to system() lib call
>> to get the job done.
>>
>> N.B. This implementation is just xl (cmdline utility) specific.
>
> You haven't included a Makefile change so I can't easily tell if this is
> really part of the xl binary, or a separate helper or indeed part of
> libxl.
>

The Makefile changes were clumped together in patch 4, along with changes to
configure.ac.

>> Other toolstacks may choose
>> to reuse this code or do this setup in their own way (scripts/shell commands, etc.).
>
> It's not clear to me why this code wouldn't be part of libxl to be
> called by xl and other toolstacks as needed. Why would a toolstack want
> to do this differently?
>
> The point of libxl is precisely to relieve toolstacks of the burden of
> doing these sorts of complex domain setup activities themselves by
> putting them in a common library, I can't see why Remus setup would be
> an exception to this.
>
> I think we need to answer this question before reviewing this code,
> since doing it libxl would change the requirements of the code (from the
> API exposed, to logging, to memory managment, to not using system
> directly).
>

 If every one had to do this in the same way, then I agree that this
code should be
moved into libxl.

The setup can be done using a a few lines of shell script or in a 100-200
line C code (like this patch).  Given that there is a choice, I didnt
see any point
in forcing all toolstacks to go the libxl way (resorting to its
internal c code for setting
up the buffers).  But if you think that we might as well make a sane
choice and abstract
this complexity from other toolstacks, I would be happy to push this
into libxl itself.

shriram

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2 of 4 RFC] xl/remus: Network buffering cmdline switch, setup/teardown
  2013-07-29 15:49   ` Ian Campbell
@ 2013-07-29 19:00     ` Shriram Rajagopalan
  2013-07-30 10:50       ` Ian Campbell
  0 siblings, 1 reply; 27+ messages in thread
From: Shriram Rajagopalan @ 2013-07-29 19:00 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Stefano Stabellini, xen-devel

On Mon, Jul 29, 2013 at 11:49 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Thu, 2013-07-25 at 00:09 -0700, Shriram Rajagopalan wrote:
>> Add appropriate code to xl_cmdline.c to setup network buffers for
>> each vif belonging to the guest.  Also provide a command line switch
>> to explicitly "enable" network buffering.
>>
>> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
>>
>> diff -r 3ae38cbe535c -r 3cd67f6ff63a tools/libxl/libxl_types.idl
>> --- a/tools/libxl/libxl_types.idl     Wed Jul 24 22:55:00 2013 -0700
>> +++ b/tools/libxl/libxl_types.idl     Thu Jul 25 00:02:19 2013 -0700
>> @@ -521,6 +521,7 @@ libxl_domain_remus_info = Struct("domain
>>      ("interval",     integer),
>>      ("blackhole",    bool),
>>      ("compression",  bool),
>> +    ("netbuf_iflist", libxl_string_list),
>>      ])
>>
>>  libxl_event_type = Enumeration("event_type", [
>> diff -r 3ae38cbe535c -r 3cd67f6ff63a tools/libxl/xl_cmdimpl.c
>> --- a/tools/libxl/xl_cmdimpl.c        Wed Jul 24 22:55:00 2013 -0700
>> +++ b/tools/libxl/xl_cmdimpl.c        Thu Jul 25 00:02:19 2013 -0700
>> @@ -7039,10 +7039,109 @@ done:
>>      return ret;
>>  }
>>
>> +static char **get_guest_vifnames(uint32_t domid, int *num_vifs)
>> +{
>> +    char **viflist;
>> +    libxl_device_nic *nics;
>> +    libxl_nicinfo nicinfo;
>> +    int nb, i;
>> +
>> +    nics = libxl_device_nic_list(ctx, domid, &nb);
>> +    if (!nics) { *num_vifs = 0; return NULL;}
>> +
>> +    viflist = calloc((nb + 1), sizeof(char *));
>> +    if (!viflist) {
>> +        perror("failed to allocate memory to hold vif names!");
>> +        exit(-1);
>> +    }
>> +
>> +    for (i = 0; i < nb; ++i) {
>> +        if (!libxl_device_nic_getinfo(ctx, domid, &nics[i], &nicinfo))  {
>> +            if (asprintf(&viflist[i], "vif%u.%d", domid, nicinfo.devid) < 0) {
>
> This doesn't account for the ifname field of the vif.
>

Ah thanks. good catch.. Will add a check for that.

> Also, I'm not sure how this is supposed to work when driver domains are
> in use either,

Remus wont work with driver domains, unless there are agents in each of the
driver domains, that are co-ordinated by the memory checkpoint code in dom0.

Irrespective of the hotplug scripts, Remus needs to control the IFB
network interface
attached to the Guest's vifs/tap devices. Given that all these
interfaces will be inside
a driver domain, which does not have a fast (not xenstore)
communication channel to dom0,
there is no way the memory checkpointing code can co-ordinate with the
driver domain in order
to buffer/release its network packets after each checkpoint.

One alternative would be to have a network agent running inside each
of these driver domains,
assuming that the driver domains would have network access (practical ? ).
The memory checkpoint code would have to control the IFB devices via
the network agents.

All this is in the long run.. The immediate goal is to get
network/disk buffering to work with xl.


> if this were done via libxl then it would naturally get
> incorporated into Roger's work to make hotplug scripts etc work properly
> with stub domains.
>
> Most of the other comments I had would become invalid/irrelevant when
> this was moved to libxl, since you'd naturally end up doing things
> differently anyway.
>

Even if the code moved to libxl, most of the setup would remain the
same, i.e. setup on demand
using C code or something. It does not make sense to setup an IFB
device for every domain
right from the boot. You will eventually run out of ifb devices in the system.

If you are suggesting that we invoke a hotplug script when "xl remus"
command is issued,
I dont mind doing so either. The code in libxl (to control the plug
qdisc) is not going to go away
in either case.

shriram

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1 of 4 RFC] xl/remus : Network buffering setup helper functions
  2013-07-29 18:00     ` Shriram Rajagopalan
@ 2013-07-30 10:44       ` Ian Campbell
  0 siblings, 0 replies; 27+ messages in thread
From: Ian Campbell @ 2013-07-30 10:44 UTC (permalink / raw)
  To: rshriram; +Cc: Stefano Stabellini, xen-devel

On Mon, 2013-07-29 at 14:00 -0400, Shriram Rajagopalan wrote:
> On Mon, Jul 29, 2013 at 11:42 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Thu, 2013-07-25 at 00:09 -0700, Shriram Rajagopalan wrote:
> >> The functions contained in xl_netbuf.c are built on top of libnl3 API.
> >> They setup the necessary infrastructure required to for network output
> >> buffering in Remus.  In xend, the Remus python code performed this setup
> >> by invoking shell commands.  This code is built purely on top of APIs
> >> supplied by libnl3.0.
> >>
> >> There are two public helper functions:
> >> 1) remus_install_netbuf_on_dev to install a network buffer on a vif.
> >> Its basically equivalent to:
> >> a)find free ifb (say ifb0)
> >> b)ip link set ifb0 up
> >> c)tc qdisc add dev vif1.0 ingress
> >> d)tc filter add dev vif1.0 parent ffff: proto ip pref 10 u32 match u32 0 0 \
> >>       action mirred egress redirect dev ifb0
> >> e)tc qdisc add dev ifb0 root plug
> >> f)get handle to plug qdisc and control it programmatically after suspending
> >>   a VM/receiving checkpoint ack from remote
> >>
> >>
> >> 2) remus_uninstall_netbufs to remove all
> >> network buffers and ingress qdiscs from the supplied interface list.
> >>
> >> I have managed to code up 5 of the 6 steps mentioned above, purely in C, by building
> >> on top of libnl3. There is currently no support in libnl3 to implement step (d) (redirection)
> >> and it is pretty complicated to implement. So I am currently resorting to system() lib call
> >> to get the job done.
> >>
> >> N.B. This implementation is just xl (cmdline utility) specific.
> >
> > You haven't included a Makefile change so I can't easily tell if this is
> > really part of the xl binary, or a separate helper or indeed part of
> > libxl.
> >
> 
> The Makefile changes were clumped together in patch 4, along with changes to
> configure.ac.

Please put these sorts of changes alongside the code changes, that might
mean reordering your series etc.

> >> Other toolstacks may choose
> >> to reuse this code or do this setup in their own way (scripts/shell commands, etc.).
> >
> > It's not clear to me why this code wouldn't be part of libxl to be
> > called by xl and other toolstacks as needed. Why would a toolstack want
> > to do this differently?
> >
> > The point of libxl is precisely to relieve toolstacks of the burden of
> > doing these sorts of complex domain setup activities themselves by
> > putting them in a common library, I can't see why Remus setup would be
> > an exception to this.
> >
> > I think we need to answer this question before reviewing this code,
> > since doing it libxl would change the requirements of the code (from the
> > API exposed, to logging, to memory managment, to not using system
> > directly).
> >
> 
> If every one had to do this in the same way, then I agree that this
> code should be moved into libxl.
> 
> The setup can be done using a a few lines of shell script or in a 100-200
> line C code (like this patch).  Given that there is a choice, I didnt
> see any point
> in forcing all toolstacks to go the libxl way (resorting to its
> internal c code for setting
> up the buffers).  But if you think that we might as well make a sane
> choice and abstract
> this complexity from other toolstacks, I would be happy to push this
> into libxl itself.

If you think the sane choice is for everyone to do this the same way
then please do it in libxl.

Realistically I don't imagine other toolstacks will implement Remus
unless you either make it a no-brainer (by making it a trivial to use
libxl interface) or you do it for them.

Ian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2 of 4 RFC] xl/remus: Network buffering cmdline switch, setup/teardown
  2013-07-29 19:00     ` Shriram Rajagopalan
@ 2013-07-30 10:50       ` Ian Campbell
  2013-07-30 15:25         ` Shriram Rajagopalan
  0 siblings, 1 reply; 27+ messages in thread
From: Ian Campbell @ 2013-07-30 10:50 UTC (permalink / raw)
  To: rshriram; +Cc: Stefano Stabellini, Roger Pau Monne, xen-devel

On Mon, 2013-07-29 at 15:00 -0400, Shriram Rajagopalan wrote:
> On Mon, Jul 29, 2013 at 11:49 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Thu, 2013-07-25 at 00:09 -0700, Shriram Rajagopalan wrote:
> >> Add appropriate code to xl_cmdline.c to setup network buffers for
> >> each vif belonging to the guest.  Also provide a command line switch
> >> to explicitly "enable" network buffering.
> >>
> >> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> >>
> >> diff -r 3ae38cbe535c -r 3cd67f6ff63a tools/libxl/libxl_types.idl
> >> --- a/tools/libxl/libxl_types.idl     Wed Jul 24 22:55:00 2013 -0700
> >> +++ b/tools/libxl/libxl_types.idl     Thu Jul 25 00:02:19 2013 -0700
> >> @@ -521,6 +521,7 @@ libxl_domain_remus_info = Struct("domain
> >>      ("interval",     integer),
> >>      ("blackhole",    bool),
> >>      ("compression",  bool),
> >> +    ("netbuf_iflist", libxl_string_list),
> >>      ])
> >>
> >>  libxl_event_type = Enumeration("event_type", [
> >> diff -r 3ae38cbe535c -r 3cd67f6ff63a tools/libxl/xl_cmdimpl.c
> >> --- a/tools/libxl/xl_cmdimpl.c        Wed Jul 24 22:55:00 2013 -0700
> >> +++ b/tools/libxl/xl_cmdimpl.c        Thu Jul 25 00:02:19 2013 -0700
> >> @@ -7039,10 +7039,109 @@ done:
> >>      return ret;
> >>  }
> >>
> >> +static char **get_guest_vifnames(uint32_t domid, int *num_vifs)
> >> +{
> >> +    char **viflist;
> >> +    libxl_device_nic *nics;
> >> +    libxl_nicinfo nicinfo;
> >> +    int nb, i;
> >> +
> >> +    nics = libxl_device_nic_list(ctx, domid, &nb);
> >> +    if (!nics) { *num_vifs = 0; return NULL;}
> >> +
> >> +    viflist = calloc((nb + 1), sizeof(char *));
> >> +    if (!viflist) {
> >> +        perror("failed to allocate memory to hold vif names!");
> >> +        exit(-1);
> >> +    }
> >> +
> >> +    for (i = 0; i < nb; ++i) {
> >> +        if (!libxl_device_nic_getinfo(ctx, domid, &nics[i], &nicinfo))  {
> >> +            if (asprintf(&viflist[i], "vif%u.%d", domid, nicinfo.devid) < 0) {
> >
> > This doesn't account for the ifname field of the vif.
> >
> 
> Ah thanks. good catch.. Will add a check for that.

This will probably be easier once you move it into libxl since you can
do it at a place where this sort of information is already in hand, I'd
think.

> > Also, I'm not sure how this is supposed to work when driver domains are
> > in use either,
> 
> Remus wont work with driver domains, unless there are agents in each of the
> driver domains, that are co-ordinated by the memory checkpoint code in dom0.

That doesn't seem to be to hard to arrange. We are already thinking of a
"xenbackendd" daemon which runs hotplug scripts in both dom0 and domU in
order to reduce our dependency on udev (in dom0 the daemon may actually
be libxl directly).

> Irrespective of the hotplug scripts, Remus needs to control the IFB
> network interface
> attached to the Guest's vifs/tap devices. Given that all these
> interfaces will be inside
> a driver domain, which does not have a fast (not xenstore)
> communication channel to dom0,
> there is no way the memory checkpointing code can co-ordinate with the
> driver domain in order
> to buffer/release its network packets after each checkpoint.

It doesn't seem implausible that the toolstack and that daemon might not
also negotiate a memory-checkpoint evtchn or something like that for a
given domain.

> One alternative would be to have a network agent running inside each
> of these driver domains,
> assuming that the driver domains would have network access (practical ? ).
> The memory checkpoint code would have to control the IFB devices via
> the network agents.
> 
> All this is in the long run.. The immediate goal is to get
> network/disk buffering to work with xl.

We still need to consider the future when designing the libxl API since
that is stable and we avoid changing it once committed (there are
mechanisms to evolve the interface, see libxl.h, but they are best
avoided).

> 
> 
> > if this were done via libxl then it would naturally get
> > incorporated into Roger's work to make hotplug scripts etc work properly
> > with stub domains.
> >
> > Most of the other comments I had would become invalid/irrelevant when
> > this was moved to libxl, since you'd naturally end up doing things
> > differently anyway.
> >
> 
> Even if the code moved to libxl, most of the setup would remain the
> same, i.e. setup on demand
> using C code or something. It does not make sense to setup an IFB
> device for every domain
> right from the boot. You will eventually run out of ifb devices in the system.

Uhm, I don't think I suggested setting up an IFB at boot here.

> If you are suggesting that we invoke a hotplug script when "xl remus"
> command is issued,
> I dont mind doing so either. The code in libxl (to control the plug
> qdisc) is not going to go away
> in either case.

Would doing the setup in a shell script be easier/cleaner etc?

I'm still not sure I understand why finding an ifb and binding it to a
vif cannot be done by the vif hotplug script and the ifb communicated
back to the toolstack via xenstore.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2 of 4 RFC] xl/remus: Network buffering cmdline switch, setup/teardown
  2013-07-30 10:50       ` Ian Campbell
@ 2013-07-30 15:25         ` Shriram Rajagopalan
  2013-07-30 15:39           ` Ian Campbell
  0 siblings, 1 reply; 27+ messages in thread
From: Shriram Rajagopalan @ 2013-07-30 15:25 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Stefano Stabellini, Roger Pau Monne, xen-devel

>
>> > Also, I'm not sure how this is supposed to work when driver domains are
>> > in use either,
>>
>> Remus wont work with driver domains, unless there are agents in each of the
>> driver domains, that are co-ordinated by the memory checkpoint code in dom0.
>
> That doesn't seem to be to hard to arrange. We are already thinking of a
> "xenbackendd" daemon which runs hotplug scripts in both dom0 and domU in
> order to reduce our dependency on udev (in dom0 the daemon may actually
> be libxl directly).
>
>> Irrespective of the hotplug scripts, Remus needs to control the IFB
>> network interface
>> attached to the Guest's vifs/tap devices. Given that all these
>> interfaces will be inside
>> a driver domain, which does not have a fast (not xenstore)
>> communication channel to dom0,
>> there is no way the memory checkpointing code can co-ordinate with the
>> driver domain in order
>> to buffer/release its network packets after each checkpoint.
>
> It doesn't seem implausible that the toolstack and that daemon might not
> also negotiate a memory-checkpoint evtchn or something like that for a
> given domain.
>
>> One alternative would be to have a network agent running inside each
>> of these driver domains,
>> assuming that the driver domains would have network access (practical ? ).
>> The memory checkpoint code would have to control the IFB devices via
>> the network agents.
>>
>> All this is in the long run.. The immediate goal is to get
>> network/disk buffering to work with xl.
>
> We still need to consider the future when designing the libxl API since
> that is stable and we avoid changing it once committed (there are
> mechanisms to evolve the interface, see libxl.h, but they are best
> avoided).
>

Assuming that we push all of the remus setup into libxl, to ease the pain off
other toolstacks, then the external API to activate/deactivate remus
will not change.
(Also assuming that libxl.h is the libxl API you are talking about).

The API is already there. libxl_domain_remus_start with the remus_info
structure. So, even if we add driver domain support later, external
toolstacks would
not be impacted. I will certainly add stub code to handle driver
domains. However,
adding any piece of mechanism in there would be premature unless we know how
xenbackendd is going to work and the nature of communication channels.

>> If you are suggesting that we invoke a hotplug script when "xl remus"
>> command is issued,
>> I dont mind doing so either. The code in libxl (to control the plug
>> qdisc) is not going to go away
>> in either case.
>
> Would doing the setup in a shell script be easier/cleaner etc?
>

Neither. The current C-code (xl_netbuf) can do everything that a shell
script does.
However, if we were to invoke an external "hotplug" style script, it
would give us
more flexibility.
For e.g., people could tweak the script to add other traffic shaping
stuff onto the VM's traffic.  Or, with openvswitch/SDN, other fun
stuff could be done.

> I'm still not sure I understand why finding an ifb and binding it to a
> vif cannot be done by the vif hotplug script and the ifb communicated
> back to the toolstack via xenstore.
>

To make sure we are on the same page:
 I am under the assumption that the vif hotplug script is invoked
during the domain boot.
 I am also assuming that you are talking about existing vif hotplug
scripts (vif-setup, vif-bridge, etc).

 So, if we lock onto an IFB and bind it to the vif (i.e. redirect all
egress traffic via the IFB),
 then what we are basically doing is to acquire an IFB device for
every vif in the VM, for every VM,
 that may be run under Remus sometime in future.
 This basically means we would be setting up an IFB device for every
domain right from boot
 and not when Remus is started for the domain.

In case you mean something like a "vif-remus" hotplug script that is
invoked when one invokes
libxl_domain_remus_start API, then, you are right. all the setup can
be done in this hotplug script
and the chosen ifbs can be communicated via xenstore back to libxl.

Hope that clears things up.


shriram

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2 of 4 RFC] xl/remus: Network buffering cmdline switch, setup/teardown
  2013-07-30 15:25         ` Shriram Rajagopalan
@ 2013-07-30 15:39           ` Ian Campbell
  0 siblings, 0 replies; 27+ messages in thread
From: Ian Campbell @ 2013-07-30 15:39 UTC (permalink / raw)
  To: rshriram; +Cc: Stefano Stabellini, Roger Pau Monne, xen-devel

On Tue, 2013-07-30 at 11:25 -0400, Shriram Rajagopalan wrote:
> >
> >> > Also, I'm not sure how this is supposed to work when driver domains are
> >> > in use either,
> >>
> >> Remus wont work with driver domains, unless there are agents in each of the
> >> driver domains, that are co-ordinated by the memory checkpoint code in dom0.
> >
> > That doesn't seem to be to hard to arrange. We are already thinking of a
> > "xenbackendd" daemon which runs hotplug scripts in both dom0 and domU in
> > order to reduce our dependency on udev (in dom0 the daemon may actually
> > be libxl directly).
> >
> >> Irrespective of the hotplug scripts, Remus needs to control the IFB
> >> network interface
> >> attached to the Guest's vifs/tap devices. Given that all these
> >> interfaces will be inside
> >> a driver domain, which does not have a fast (not xenstore)
> >> communication channel to dom0,
> >> there is no way the memory checkpointing code can co-ordinate with the
> >> driver domain in order
> >> to buffer/release its network packets after each checkpoint.
> >
> > It doesn't seem implausible that the toolstack and that daemon might not
> > also negotiate a memory-checkpoint evtchn or something like that for a
> > given domain.
> >
> >> One alternative would be to have a network agent running inside each
> >> of these driver domains,
> >> assuming that the driver domains would have network access (practical ? ).
> >> The memory checkpoint code would have to control the IFB devices via
> >> the network agents.
> >>
> >> All this is in the long run.. The immediate goal is to get
> >> network/disk buffering to work with xl.
> >
> > We still need to consider the future when designing the libxl API since
> > that is stable and we avoid changing it once committed (there are
> > mechanisms to evolve the interface, see libxl.h, but they are best
> > avoided).
> >
> 
> Assuming that we push all of the remus setup into libxl, to ease the pain off
> other toolstacks, then the external API to activate/deactivate remus
> will not change.
> (Also assuming that libxl.h is the libxl API you are talking about).
> 
> The API is already there. libxl_domain_remus_start with the remus_info
> structure. So, even if we add driver domain support later, external
> toolstacks would
> not be impacted. I will certainly add stub code to handle driver
> domains. However,
> adding any piece of mechanism in there would be premature unless we know how
> xenbackendd is going to work and the nature of communication channels.
> 
> >> If you are suggesting that we invoke a hotplug script when "xl remus"
> >> command is issued,
> >> I dont mind doing so either. The code in libxl (to control the plug
> >> qdisc) is not going to go away
> >> in either case.
> >
> > Would doing the setup in a shell script be easier/cleaner etc?
> >
> 
> Neither. The current C-code (xl_netbuf) can do everything that a shell
> script does.
> However, if we were to invoke an external "hotplug" style script, it
> would give us
> more flexibility.
> For e.g., people could tweak the script to add other traffic shaping
> stuff onto the VM's traffic.  Or, with openvswitch/SDN, other fun
> stuff could be done.
> 
> > I'm still not sure I understand why finding an ifb and binding it to a
> > vif cannot be done by the vif hotplug script and the ifb communicated
> > back to the toolstack via xenstore.
> >
> 
> To make sure we are on the same page:
>  I am under the assumption that the vif hotplug script is invoked
> during the domain boot.
>  I am also assuming that you are talking about existing vif hotplug
> scripts (vif-setup, vif-bridge, etc).

This is what I meant, but...
> 
>  So, if we lock onto an IFB and bind it to the vif (i.e. redirect all
> egress traffic via the IFB),
>  then what we are basically doing is to acquire an IFB device for
> every vif in the VM, for every VM,
>  that may be run under Remus sometime in future.
>  This basically means we would be setting up an IFB device for every
> domain right from boot
>  and not when Remus is started for the domain.

... this.

For some reason I thought that remus domains were started as such and
had the inherent property, which is obviously not the case, and I knew
that really.

> In case you mean something like a "vif-remus" hotplug script that is
> invoked when one invokes
> libxl_domain_remus_start API,

Or a new "action" for the existing vif-foo, but yes. Depends on whether
routed remus is different to bridged remus I suppose.

>  then, you are right. all the setup can
> be done in this hotplug script
> and the chosen ifbs can be communicated via xenstore back to libxl.

This might be worth it for the flexibility etc you mention above. Anyone
got any other thoughts?

Ian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0 of 4 RFC] xl - Remus network buffering support
  2013-07-25  7:09 [PATCH 0 of 4 RFC] xl - Remus network buffering support Shriram Rajagopalan
                   ` (3 preceding siblings ...)
  2013-07-25  7:09 ` [PATCH 4 of 4 RFC] xl/remus: Add libnl3 dependency to autoconf scripts and libxl/Makefile Shriram Rajagopalan
@ 2013-07-30 16:11 ` Roger Pau Monné
  2013-07-31  8:33   ` Ian Campbell
  4 siblings, 1 reply; 27+ messages in thread
From: Roger Pau Monné @ 2013-07-30 16:11 UTC (permalink / raw)
  To: Shriram Rajagopalan; +Cc: Stefano Stabellini, Ian Campbell, xen-devel

On 25/07/13 09:09, Shriram Rajagopalan wrote:
> This patch series adds support for network buffering in the Remus
> codebase in xl/libxl. In previous emails, I had proposed for a script
> invocation to setup network buffering. After digging through libnl API,
> I managed to find most of what I needed (except for one command, which
> right now is executed through system() call).
> 
> The patch series in its current state would allow xl to dynamically setup
> and teardown buffering devices, qdiscs, etc associated with the guest,
> instead of resorting to clunky one time configurations.
> 
> The series is organized as follows:
>  1/4 - Network buffering setup functions - abstractions built on top of libnl3 API
>        to implement functionality such as add/delete qdisc, interface up/down,
>        search for free ifb devices, etc.           
> 
>  2/4 - xl cmdline utility uses these abstractions to setup network buffers and
>        provides libxl with a list of ifb devices where packets would be buffered
> 
>  3/4 - Libxl interaction with network buffer module in the kernel via libnl3.
> 
>  4/4 - adds libnl3 (>= v3.2.17) dependency to autoconf scripts and linker flags
>        in tools/libxl/Makefile. 
> 
> Functionality tested on debian squeeze (kernel 3.4) + openvswitch + 64-bit PV domU (kernel 3.4).

I've just skimmed quickly through the patches, but it seems like they
add a bunch of Linux specific code to general parts of libxl, which
would prevent building or using libxl in any OS different than Linux. If
you have to add Linux specific code there's libxl_linux.c, but please
remember to also add the necessary stubs to libxl_netbsd.c in order to
not break libxl for OSes different than Linux.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0 of 4 RFC] xl - Remus network buffering support
  2013-07-30 16:11 ` [PATCH 0 of 4 RFC] xl - Remus network buffering support Roger Pau Monné
@ 2013-07-31  8:33   ` Ian Campbell
  0 siblings, 0 replies; 27+ messages in thread
From: Ian Campbell @ 2013-07-31  8:33 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Shriram Rajagopalan, Stefano Stabellini, xen-devel

On Tue, 2013-07-30 at 18:11 +0200, Roger Pau Monné wrote:
> I've just skimmed quickly through the patches, but it seems like they
> add a bunch of Linux specific code to general parts of libxl, which
> would prevent building or using libxl in any OS different than Linux. If
> you have to add Linux specific code there's libxl_linux.c, but please
> remember to also add the necessary stubs to libxl_netbsd.c in order to
> not break libxl for OSes different than Linux.

Good point.

Since this looks like it might be a reasonable amount of code I think it
might be better to put it into a more semantic rather than OS based
location, e..g libxl_netlink.c or libxl_remus.c and the put the stubs in
libxl_no<xxx>.c and conditionally compile the right one (based on OS).

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks
  2013-07-25  7:09 ` [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks Shriram Rajagopalan
  2013-07-29 16:06   ` Ian Campbell
@ 2013-08-07 15:38   ` Ian Jackson
  2013-08-07 21:51     ` Shriram Rajagopalan
  1 sibling, 1 reply; 27+ messages in thread
From: Ian Jackson @ 2013-08-07 15:38 UTC (permalink / raw)
  To: Shriram Rajagopalan; +Cc: Stefano Stabellini, xen-devel

Shriram Rajagopalan writes ("[Xen-devel] [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus	callbacks"):
> This patch constitutes the core network buffering logic.
> Libxl would receive a list of IFB devices that collectively act as
> network buffering devices, for a given guest. Also, libxl expects that
> every ifb device in the list supplied by a toolstack (netbuf_iflist)
> should have a plug qdisc installed.

This seems to have many of the required pieces and is going in the
right direction.

> diff -r 3cd67f6ff63a -r bef729fc4336 tools/libxl/libxl_dom.c
> --- a/tools/libxl/libxl_dom.c	Thu Jul 25 00:02:19 2013 -0700
> +++ b/tools/libxl/libxl_dom.c	Thu Jul 25 00:02:22 2013 -0700
> @@ -20,6 +20,7 @@
>  #include "libxl_internal.h"
>  #include "libxl_arch.h"
>  
> +#include <netlink/route/qdisc/plug.h>

We need a configure option to disable this, in case the host doesn't
have it.

>  #include <xc_dom.h>
>  #include <xen/hvm/hvm_info_table.h>
>  #include <xen/hvm/hvm_xs_strings.h>
> @@ -1212,12 +1213,36 @@ int libxl__toolstack_save(uint32_t domid
>  
>  /*----- remus callbacks -----*/
>  
> +/* REMUS TODO: Issue disk checkpoint reqs. */
>  static int libxl__remus_domain_suspend_callback(void *data)
...
> +    if (!remus_ctx->num_netbufs) return is_suspended;
> +
> +    if (is_suspended) {
> +        for (i = 0; i < remus_ctx->num_netbufs; ++i) {
> +            ret = rtnl_qdisc_plug_buffer(remus_ctx->netbuf_qdisc_list[i]);
> +            if (!ret)
> +                ret = rtnl_qdisc_add(remus_ctx->nlsock, remus_ctx->netbuf_qdisc_list[i],

Needs the line length reducing to 70-75 characters, maximum.
(Multiple occcurrences of this problem in the patch.)

> @@ -1255,13 +1279,104 @@ static void libxl__remus_domain_checkpoi
>  static void remus_checkpoint_dm_saved(libxl__egc *egc,
>                                        libxl__domain_suspend_state *dss, int rc)
>  {
> -    /* REMUS TODO: Wait for disk and memory ack, release network buffer */
> -    /* REMUS TODO: make this asynchronous */
> -    assert(!rc); /* REMUS TODO handle this error properly */
...
> +    assert(!rc);

Does this TODO not need to remain ?

> +            if (!ret)
> +                ret = rtnl_qdisc_add(remus_ctx->nlsock, remus_ctx->netbuf_qdisc_list[i],
> +                                     NLM_F_REQUEST);
> +            if (ret) {
> +                LOG(ERROR, "Cannot release buffer from %s:%s",
> +                    dss->remus->netbuf_iflist[i], nl_geterror(ret));
> +                ret= 0;

Missing space after "=".

...
> +    usleep(dss->remus_ctx->interval * 1000);

This is still pretty bad, surely ?

> +static libxl__remus_ctx *setup_remus_ctx(libxl__gc *gc,
> +                                         const libxl_domain_remus_info *r_info)
> +{
...
> +    libxl_string_list l;
...
> +    l = r_info->netbuf_iflist;

Is this a convenience shorthand ?  If so I think it might be better to
say
   const libxl_string_list *l = &r_info->netbuf_iflist;

...
> +    nl_connect(nlsock, NETLINK_ROUTE);

Does this not return an error value ?

> +    if (rtnl_qdisc_alloc_cache(nlsock, &qdisc_cache) < 0) {
> +        LOG(ERROR, "setup_remus_ctx: failed to allocate qdisc cache");
> +        goto end;
> +    }
> +
> +    remus_ctx->netbuf_qdisc_list = libxl__calloc(gc, num_netbufs + 1, 
> +                                                 sizeof(struct rtnl_qdisc *));

Why num_netbufs+1 ?

> +    remus_ctx->num_netbufs = num_netbufs;
> +    remus_ctx->nlsock = nlsock;
> +    remus_ctx->qdisc_cache = qdisc_cache;

It would probably be better to use the fields in the remus_ctx
directly, rather than having a separate set of local variables.

> +    ifb = rtnl_link_alloc();
> +
> +    for (i = 0; i < num_netbufs; ++i) {
> +
> +        if (rtnl_link_get_kernel(nlsock, 0, l[i], &ifb) < 0) {
> +            LOG(ERROR, "setup_remus_ctx: cannot obtain handle for %s", l[i]);
> +            goto end;

Do these functions not provide errno values, or the moral equivalent ?

> +        }
> +
> +        ifindex = rtnl_link_get_ifindex(ifb);
> +        if (!ifindex) {
> +            LOG(ERROR, "setup_remus_ctx: invalid interface %s", l[i]);
> +            goto end;
> +        }
> +
> +        qdisc = rtnl_qdisc_get_by_parent(qdisc_cache, ifindex, TC_H_ROOT);
> +        if (!qdisc || strcmp(rtnl_tc_get_kind(TC_CAST(qdisc)), "plug")) {

Can rtnl_tc_get_kind fail ?

> +            LOG(ERROR, "setup_remus_ctx: plug qdisc is not installed on %s", l[i]);
> +            goto end;
> +        }
> +
> +        remus_ctx->netbuf_qdisc_list[i] = qdisc;
> +    }
> +
> +    rtnl_link_put(ifb);
> +    return remus_ctx;
> +
> + end:
> +    if (ifb) rtnl_link_put(ifb);
> +    if (qdisc_cache) nl_cache_free(qdisc_cache);
> +    if (nlsock) nl_close(nlsock);

I think this can perhaps leak remus_ctx and qdisc.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks
  2013-07-29 16:06   ` Ian Campbell
@ 2013-08-07 15:41     ` Ian Jackson
  0 siblings, 0 replies; 27+ messages in thread
From: Ian Jackson @ 2013-08-07 15:41 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Shriram Rajagopalan, Stefano Stabellini, xen-devel

Ian Campbell writes ("Re: [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks"):
> This really needs Ian's review WRT the lifetime of the remus context
> object vs. the ao machinery. IOW I'm not sure if this is the appropriate
> gc to be using or if one associated with the ao context isn't required.

AFAICT the remus_ctx is allocated from the ao's gc because of this:

> >  void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
...
         STATE_AO_GC(dss->ao);
...
> > +        dss->remus_ctx = setup_remus_ctx(gc, r_info);

So "gc" in libxl__domain_suspend is the ao gc, and that ends up being
used for everything.

Ian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks
  2013-08-07 15:38   ` Ian Jackson
@ 2013-08-07 21:51     ` Shriram Rajagopalan
  2013-08-08 11:07       ` Ian Jackson
  0 siblings, 1 reply; 27+ messages in thread
From: Shriram Rajagopalan @ 2013-08-07 21:51 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2967 bytes --]

On Wed, Aug 7, 2013 at 11:38 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>wrote:

> >
> > +#include <netlink/route/qdisc/plug.h>
>
> We need a configure option to disable this, in case the host doesn't
> have it.
>
>
>  #include <xc_dom.h>
> >  #include <xen/hvm/hvm_info_table.h>
> >  #include <xen/hvm/hvm_xs_strings.h>
> > @@ -1212,12 +1213,36 @@ int libxl__toolstack_save(uint32_t domid
> >
> >  /*----- remus callbacks -----*/
> >
> > +/* REMUS TODO: Issue disk checkpoint reqs. */
> >  static int libxl__remus_domain_suspend_callback(void *data)
> ...
> > +    if (!remus_ctx->num_netbufs) return is_suspended;
> > +
> > +    if (is_suspended) {
> > +        for (i = 0; i < remus_ctx->num_netbufs; ++i) {
> > +            ret =
> rtnl_qdisc_plug_buffer(remus_ctx->netbuf_qdisc_list[i]);
> > +            if (!ret)
> > +                ret = rtnl_qdisc_add(remus_ctx->nlsock,
> remus_ctx->netbuf_qdisc_list[i],
>
> Needs the line length reducing to 70-75 characters, maximum.
> (Multiple occcurrences of this problem in the patch.)
>
> > @@ -1255,13 +1279,104 @@ static void libxl__remus_domain_checkpoi
> >  static void remus_checkpoint_dm_saved(libxl__egc *egc,
> >                                        libxl__domain_suspend_state *dss,
> int rc)
> >  {
> > -    /* REMUS TODO: Wait for disk and memory ack, release network buffer
> */
> > -    /* REMUS TODO: make this asynchronous */
> > -    assert(!rc); /* REMUS TODO handle this error properly */
> ...
> > +    assert(!rc);
>
> Does this TODO not need to remain ?
>
> > +            if (!ret)
> > +                ret = rtnl_qdisc_add(remus_ctx->nlsock,
> remus_ctx->netbuf_qdisc_list[i],
> > +                                     NLM_F_REQUEST);
> > +            if (ret) {
> > +                LOG(ERROR, "Cannot release buffer from %s:%s",
> > +                    dss->remus->netbuf_iflist[i], nl_geterror(ret));
> > +                ret= 0;
>
> Missing space after "=".
>
> ...
> > +    usleep(dss->remus_ctx->interval * 1000);
>
> This is still pretty bad, surely ?
>
>
Was thinking of updating this in a separate patch.. But I guess since this
line
is changing, might as well do it in this patch.


> > +    rtnl_link_put(ifb);
> > +    return remus_ctx;
> > +
> > + end:
> > +    if (ifb) rtnl_link_put(ifb);
> > +    if (qdisc_cache) nl_cache_free(qdisc_cache);
> > +    if (nlsock) nl_close(nlsock);
>
> I think this can perhaps leak remus_ctx and qdisc.
>
>
There is no issue of leaking qdisc. Because it simply obtains references
from
qdisc_cache, which is freed if things fail.

WRT remus_ctx, I had intentionally skipped the free call.
 Its allocated in the GC context. [the same one used by domain_suspend,
etc].
And if this call fails, then the parent call libxl_domain_remus_start will
fail.

I assumed that when libxl_free_all was called at this point, remus_ctx
would also be
freed. Am I missing something in the overall control flow, that would cause
this leak ?

Thanks,
> Ian.
>
>

[-- Attachment #1.2: Type: text/html, Size: 4468 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks
  2013-08-07 21:51     ` Shriram Rajagopalan
@ 2013-08-08 11:07       ` Ian Jackson
  0 siblings, 0 replies; 27+ messages in thread
From: Ian Jackson @ 2013-08-08 11:07 UTC (permalink / raw)
  To: rshriram; +Cc: Stefano Stabellini, xen-devel

Shriram Rajagopalan writes ("Re: [Xen-devel] [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks"):
> On Wed, Aug 7, 2013 at 11:38 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

>     > +    usleep(dss->remus_ctx->interval * 1000);
> 
>     This is still pretty bad, surely ?
...
> Was thinking of updating this in a separate patch.. But I guess since this line
> is changing, might as well do it in this patch.

I'm happy with it in a separate patch.  Doing it in a separate patch
probably makes things clearer.  This patch does have a TODO about it.

>     > +    rtnl_link_put(ifb);
>     > +    return remus_ctx;
>     > +
>     > + end:
>     > +    if (ifb) rtnl_link_put(ifb);
>     > +    if (qdisc_cache) nl_cache_free(qdisc_cache);
>     > +    if (nlsock) nl_close(nlsock);
> 
>     I think this can perhaps leak remus_ctx and qdisc.
...
> There is no issue of leaking qdisc. Because it simply obtains references from
> qdisc_cache, which is freed if things fail. 

Perhaps this is worth a comment ?

> WRT remus_ctx, I had intentionally skipped the free call. 
>  Its allocated in the GC context. [the same one used by domain_suspend, etc].
> And if this call fails, then the parent call libxl_domain_remus_start will
> fail.

Sorry, yes.

> I assumed that when libxl_free_all was called at this point, remus_ctx would
> also be
> freed. Am I missing something in the overall control flow, that would cause
> this leak ?

No, I was confused.

Ian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2013-08-08 11:07 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-25  7:09 [PATCH 0 of 4 RFC] xl - Remus network buffering support Shriram Rajagopalan
2013-07-25  7:09 ` [PATCH 1 of 4 RFC] xl/remus : Network buffering setup helper functions Shriram Rajagopalan
2013-07-29 15:42   ` Ian Campbell
2013-07-29 18:00     ` Shriram Rajagopalan
2013-07-30 10:44       ` Ian Campbell
2013-07-25  7:09 ` [PATCH 2 of 4 RFC] xl/remus: Network buffering cmdline switch, setup/teardown Shriram Rajagopalan
2013-07-29 15:49   ` Ian Campbell
2013-07-29 19:00     ` Shriram Rajagopalan
2013-07-30 10:50       ` Ian Campbell
2013-07-30 15:25         ` Shriram Rajagopalan
2013-07-30 15:39           ` Ian Campbell
2013-07-25  7:09 ` [PATCH 3 of 4 RFC] xl/remus: Control network buffering in remus callbacks Shriram Rajagopalan
2013-07-29 16:06   ` Ian Campbell
2013-08-07 15:41     ` Ian Jackson
2013-08-07 15:38   ` Ian Jackson
2013-08-07 21:51     ` Shriram Rajagopalan
2013-08-08 11:07       ` Ian Jackson
2013-07-25  7:09 ` [PATCH 4 of 4 RFC] xl/remus: Add libnl3 dependency to autoconf scripts and libxl/Makefile Shriram Rajagopalan
2013-07-26  9:44   ` Wen Congyang
2013-07-26 13:51     ` Shriram Rajagopalan
2013-07-26  9:56   ` David Vrabel
2013-07-26 13:56     ` Shriram Rajagopalan
2013-07-29  5:58       ` Wen Congyang
2013-07-29 13:07         ` Shriram Rajagopalan
2013-07-29 15:41           ` David Vrabel
2013-07-30 16:11 ` [PATCH 0 of 4 RFC] xl - Remus network buffering support Roger Pau Monné
2013-07-31  8:33   ` Ian Campbell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.