* [PATCH RFC 0/3] xdp: Generalize XDP
@ 2016-09-20 22:00 Tom Herbert
  2016-09-20 22:00 ` [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP Tom Herbert
                   ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Tom Herbert @ 2016-09-20 22:00 UTC (permalink / raw)
  To: davem, netdev
  Cc: kernel-team, tariqt, bblanco, alexei.starovoitov, eric.dumazet, brouer

This patch set generalizes XDP by making the hooks in drivers generic
in the same manner as nfhooks. This has a number of advantages:

  - Allows alternative users of the XDP hooks other than the original
    BPF
  - Allows a means to pipeline XDP programs together
  - Reduces the amount of code and complexity needed in drivers to
    manage XDP
  - Provides a more structured environment that is extensible to new
    features while being mostly transparent to the drivers 

The generic XDP infrastructure is based on how nfhooks work. The new
xdp_hook_ops structure contains callback functions and a private data
pointer that can be populated by the user of XDP. The hook ops are
registered either on a netdev or on a napi instance (both maintain a
list of XDP hook ops). Allowing per-netdev ops makes management of XDP
a lot simpler when the intent is for the hook to apply to the whole
driver (as is the case with XDP_BPF so far). The downside is that we
may still need per-napi data (such as counters of returned actions).

The xdp_hook_ops structure contains three fields of interest. The
"hook" field is the function that is run for the hook; it takes the
private data and the xdp_buff as arguments. "priv" is the private data,
and "put_priv" is a function called when XDP is done with the private
data. In XDP_BPF terminology the hook field is bpf_prog_run_xdp, "priv"
is the xdp_prog, and "put_priv" is bpf_prog_put.

The meaning of ndo_xdp is also changed. There are two commands for this
ndo: XDP_DEV_INIT and XDP_DEV_FINISH. XDP_DEV_INIT is called the first
time an XDP hook is set on a device; this is primarily intended to
allow the device to initialize XDP (allocating the XDP TX queues, for
instance). XDP_DEV_FINISH is called when the last XDP hook is
removed from a device so that the driver can clean up when XDP is done.
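
For example, the mlx4 ndo_xdp handler in patch 2 reduces to roughly:

	static int mlx4_xdp(struct net_device *dev, struct netdev_xdp *xdp)
	{
		struct mlx4_en_priv *priv = netdev_priv(dev);

		switch (xdp->command) {
		case XDP_DEV_INIT:
			/* First hook set: allocate the XDP TX rings */
			return mlx4_xdp_make_tx_rings(dev,
			    ALIGN(priv->rx_ring_num, MLX4_EN_NUM_UP));
		case XDP_DEV_FINISH:
			/* Last hook removed: tear the rings back down */
			return mlx4_xdp_make_tx_rings(dev, 0);
		default:
			return 0;
		}
	}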

A new netdev feature, NETIF_F_XDP, is added so that a driver can
indicate that it supports XDP.

The primary modification needed in a driver to support XDP is that it
calls xdp_hook_run in the receive path (the equivalent of
bpf_prog_run_xdp in the previous XDP-BPF code). The driver must deal
with the four XDP return actions: XDP_PASS, XDP_DROP, XDP_TX, and
XDP_ABORTED.
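
In a driver's receive loop this looks approximately like the sketch
below (frame_data, length, driver_xmit_frame, and the next/consumed
labels are placeholders for driver specifics; see the mlx4 changes in
patch 2 for the real version):

	struct xdp_buff xdp;
	u32 act;

	xdp.data = frame_data;
	xdp.data_end = xdp.data + length;

	act = xdp_hook_run(napi, &xdp);
	switch (act) {
	case XDP_PASS:
		break;			/* continue to normal skb path */
	case XDP_TX:
		if (driver_xmit_frame(dev, &xdp))
			goto next;	/* drop on xmit failure */
		goto consumed;
	case XDP_ABORTED:
	case XDP_DROP:
		goto next;		/* drop (and recycle) the buffer */
	default:
		xdp_warn_invalid_action(act);
		goto next;
	}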

xdp.h contains the interface to register and manage XDP hooks.

Tested:

Created a simple hook that does XDP_PASS and verified that it works. A
lot more testing is needed for this.

Tom Herbert (3):
  xdp: Infrastructure to generalize XDP
  mlx4: Change XDP/BPF to use generic XDP infrastructure
  netdevice: Remove obsolete xdp_netdev_command

 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |  64 ++------
 drivers/net/ethernet/mellanox/mlx4/en_rx.c     |  25 ++-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |   1 -
 include/linux/filter.h                         |  19 +--
 include/linux/netdev_features.h                |   3 +-
 include/linux/netdevice.h                      |  24 ++-
 include/net/xdp.h                              | 218 +++++++++++++++++++++++++
 include/uapi/linux/bpf.h                       |  20 ---
 include/uapi/linux/xdp.h                       |  24 +++
 net/core/Makefile                              |   2 +-
 net/core/dev.c                                 |  44 ++++-
 net/core/filter.c                              |   7 +-
 net/core/rtnetlink.c                           |  16 +-
 net/core/xdp.c                                 | 211 ++++++++++++++++++++++++
 14 files changed, 534 insertions(+), 144 deletions(-)
 create mode 100644 include/net/xdp.h
 create mode 100644 include/uapi/linux/xdp.h
 create mode 100644 net/core/xdp.c

-- 
2.8.0.rc2


* [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-20 22:00 [PATCH RFC 0/3] xdp: Generalize XDP Tom Herbert
@ 2016-09-20 22:00 ` Tom Herbert
  2016-09-20 22:37   ` Eric Dumazet
                     ` (3 more replies)
  2016-09-20 22:00 ` [PATCH RFC 2/3] mlx4: Change XDP/BPF to use generic XDP infrastructure Tom Herbert
  2016-09-20 22:00 ` [PATCH RFC 3/3] netdevice: Remove obsolete xdp_netdev_command Tom Herbert
  2 siblings, 4 replies; 39+ messages in thread
From: Tom Herbert @ 2016-09-20 22:00 UTC (permalink / raw)
  To: davem, netdev
  Cc: kernel-team, tariqt, bblanco, alexei.starovoitov, eric.dumazet, brouer

This patch creates an infrastructure for registering and running code at
XDP hooks in drivers. This is based on the original XDP-BPF and borrows
heavily from the techniques used by netfilter to make generic nfhooks.

An XDP hook is defined by the xdp_hook_ops structure. This structure
contains the ops of an XDP hook. A pointer to this structure is passed
into the XDP register function to set up a hook. The XDP register
function allocates its own xdp_hook_ops structure and copies the values
from the xdp_hook_ops passed in. The register function also stores the
pointer value of the xdp_hook_ops argument; this pointer is used in
subsequent calls to XDP to identify the registered hook.

The interface is defined in net/xdp.h. This includes the definition of
xdp_hook_ops, functions to register and unregister hook ops on a device
or on individual napi instances, and xdp_hook_run, which is called by
drivers to run the hooks.
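
A minimal (hypothetical) user of the interface would look something
like:

	static unsigned int my_hook(const void *priv, struct xdp_buff *xdp)
	{
		/* inspect bytes in [xdp->data, xdp->data_end) */
		return XDP_PASS;
	}

	static struct xdp_hook_ops my_ops = {
		.hook     = my_hook,
		.priority = 0,
	};

	int err = xdp_register_dev_hook(dev, &my_ops);
	/* ... later, the same pointer identifies the hook ... */
	xdp_unregister_dev_hook(dev, &my_ops);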

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/linux/filter.h          |   6 +-
 include/linux/netdev_features.h |   3 +-
 include/linux/netdevice.h       |  11 ++
 include/net/xdp.h               | 218 ++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/bpf.h        |  20 ----
 include/uapi/linux/xdp.h        |  24 +++++
 net/core/Makefile               |   2 +-
 net/core/dev.c                  |   4 +
 net/core/xdp.c                  | 211 ++++++++++++++++++++++++++++++++++++++
 9 files changed, 472 insertions(+), 27 deletions(-)
 create mode 100644 include/net/xdp.h
 create mode 100644 include/uapi/linux/xdp.h
 create mode 100644 net/core/xdp.c

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1f09c52..2a26133 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -16,6 +16,7 @@
 #include <linux/capability.h>
 
 #include <net/sch_generic.h>
+#include <net/xdp.h>
 
 #include <asm/cacheflush.h>
 
@@ -432,11 +433,6 @@ struct bpf_skb_data_end {
 	void *data_end;
 };
 
-struct xdp_buff {
-	void *data;
-	void *data_end;
-};
-
 /* compute the linear packet data range [data, data_end) which
  * will be accessed by cls_bpf and act_bpf programs
  */
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 9c6c8ef..697fdea 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -72,8 +72,8 @@ enum {
 	NETIF_F_HW_VLAN_STAG_FILTER_BIT,/* Receive filtering on VLAN STAGs */
 	NETIF_F_HW_L2FW_DOFFLOAD_BIT,	/* Allow L2 Forwarding in Hardware */
 	NETIF_F_BUSY_POLL_BIT,		/* Busy poll */
-
 	NETIF_F_HW_TC_BIT,		/* Offload TC infrastructure */
+	NETIF_F_XDP_BIT,		/* Support XDP interface */
 
 	/*
 	 * Add your fresh new feature above and remember to update
@@ -136,6 +136,7 @@ enum {
 #define NETIF_F_HW_L2FW_DOFFLOAD	__NETIF_F(HW_L2FW_DOFFLOAD)
 #define NETIF_F_BUSY_POLL	__NETIF_F(BUSY_POLL)
 #define NETIF_F_HW_TC		__NETIF_F(HW_TC)
+#define NETIF_F_XDP		__NETIF_F(XDP)
 
 #define for_each_netdev_feature(mask_addr, bit)	\
 	for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a10d8d1..f2b7d1b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -324,6 +324,7 @@ struct napi_struct {
 	struct sk_buff		*skb;
 	struct hrtimer		timer;
 	struct list_head	dev_list;
+	struct list_head	xdp_hook_list;
 	struct hlist_node	napi_hash_node;
 	unsigned int		napi_id;
 };
@@ -819,6 +820,14 @@ enum xdp_netdev_command {
 	 * return true if a program is currently attached and running.
 	 */
 	XDP_QUERY_PROG,
+	/* Initialize XDP in the device. Called the first time an XDP
+	 * hook is being set on the device.
+	 */
+	XDP_DEV_INIT,
+	/* XDP is finished on the device. Called after the last XDP hook
+	 * has been removed from a device.
+	 */
+	XDP_DEV_FINISH,
 };
 
 struct netdev_xdp {
@@ -1663,6 +1672,8 @@ struct net_device {
 	struct list_head	close_list;
 	struct list_head	ptype_all;
 	struct list_head	ptype_specific;
+	struct list_head        xdp_hook_list;
+	unsigned int		xdp_hook_cnt;
 
 	struct {
 		struct list_head upper;
diff --git a/include/net/xdp.h b/include/net/xdp.h
new file mode 100644
index 0000000..c01a44e
--- /dev/null
+++ b/include/net/xdp.h
@@ -0,0 +1,218 @@
+/*
+ * eXpress Data Path (XDP)
+ *
+ * Copyright (c) 2016 Tom Herbert <tom@herbertland.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation.
+ */
+
+#ifndef __NET_XDP_H_
+#define __NET_XDP_H_
+
+#include <linux/netdevice.h>
+#include <linux/static_key.h>
+#include <uapi/linux/xdp.h>
+
+/* XDP data structure.
+ *
+ * Fields:
+ *   data - pointer to first byte of data
+ *   data_end - pointer just past the last byte of data
+ *
+ * Length is deduced by xdp->data_end - xdp->data.
+ */
+struct xdp_buff {
+	void *data;
+	void *data_end;
+};
+
+typedef unsigned int xdp_hookfn(const void *priv, struct xdp_buff *xdp);
+typedef void xdp_put_privfn(const void *priv);
+
+/* xdp_hook_ops struct
+ *
+ * This structure contains the ops of an XDP hook. A pointer to this structure
+ * is passed into the XDP register function to set up a hook. The XDP
+ * register function mallocs its own xdp_hook_ops structure and copies the
+ * values from the xdp_hook_ops passed in. The register function also stores
+ * the pointer value of the xdp_hook_ops argument; this pointer is used
+ * in subsequent calls to XDP to identify the registered hook.
+ *
+ * Fields:
+ *
+ *   list - list glue
+ *   priority - priority for insertion into the list. The list is ordered
+ *	from lowest to highest priority.
+ *   priv - private data associated with hook. This is passed as an argument
+ *	to the hook function
+ *   hook - function to call when hooks are run
+ *   put_priv - function called when XDP is done with private data
+ */
+struct xdp_hook_ops {
+	struct list_head list;
+	int priority;
+	void __rcu *priv;
+	xdp_hookfn *hook;
+	xdp_put_privfn *put_priv;
+};
+
+struct xdp_hook_entry {
+	const struct xdp_hook_ops *orig_ops;
+	struct xdp_hook_ops ops;
+};
+
+extern struct xdp_hook_ops xdp_bpf_hook_ops;
+
+extern struct static_key_false xdp_hooks_needed;
+
+/* Check if XDP hooks are set for a napi or its device */
+static inline bool xdp_hook_run_needed_check(struct napi_struct *napi)
+{
+	return (static_branch_unlikely(&xdp_hooks_needed) &&
+		(!(list_empty(&napi->dev->xdp_hook_list) &&
+		   list_empty(&napi->xdp_hook_list))));
+}
+
+static inline int __xdp_hook_run(struct list_head *list_head,
+				 struct xdp_buff *xdp)
+{
+	struct xdp_hook_ops *elem;
+	int ret = XDP_PASS;
+
+	list_for_each_entry(elem, list_head, list) {
+		ret = elem->hook(elem->priv, xdp);
+		if (ret != XDP_PASS)
+			break;
+	}
+
+	return ret;
+}
+
+/* Run the XDP hooks for a napi device. Called from a driver's receive
+ * routine
+ */
+static inline int xdp_hook_run(struct napi_struct *napi, struct xdp_buff *xdp)
+{
+	struct net_device *dev = napi->dev;
+	int ret = XDP_PASS;
+
+	if (static_branch_unlikely(&xdp_hooks_needed)) {
+		/* Run hooks in napi first */
+		ret = __xdp_hook_run(&napi->xdp_hook_list, xdp);
+		if (ret != XDP_PASS)
+			return ret;
+
+		/* Now run device hooks */
+		ret = __xdp_hook_run(&dev->xdp_hook_list, xdp);
+		if (ret != XDP_PASS)
+			return ret;
+	}
+
+	return ret;
+}
+
+int __xdp_register_hook(struct net_device *dev,
+			struct list_head *list,
+			const struct xdp_hook_ops *reg,
+			bool change);
+
+/* Register an XDP hook on a device */
+static inline int xdp_register_dev_hook(struct net_device *dev,
+					const struct xdp_hook_ops *reg)
+{
+	return __xdp_register_hook(dev, &dev->xdp_hook_list, reg, false);
+}
+
+/* Register an XDP hook on a napi instance */
+static inline int xdp_register_napi_hook(struct napi_struct *napi,
+					 const struct xdp_hook_ops *reg)
+{
+	return __xdp_register_hook(napi->dev, &napi->xdp_hook_list, reg, false);
+}
+
+/* Change an XDP hook.
+ *
+ *    - If the hook does not exist (xdp_hook_ops does not match a hook set on
+ *      the device), then attempt to register the hook.
+ *    - Else, change the private data (priv field in xdp_hook_ops) in the
+ *      existing hook to be the new one (in reg). All the other fields in
+ *      xdp_hook_ops are ignored in that case.
+ */
+
+/* Change a device XDP hook */
+static inline int xdp_change_dev_hook(struct net_device *dev,
+				      const struct xdp_hook_ops *reg)
+{
+	return __xdp_register_hook(dev, &dev->xdp_hook_list, reg, true);
+}
+
+/* Change a napi XDP hook */
+static inline int xdp_change_napi_hook(struct napi_struct *napi,
+				       const struct xdp_hook_ops *reg)
+{
+	return __xdp_register_hook(napi->dev, &napi->xdp_hook_list, reg, true);
+}
+
+void __xdp_unregister_hook(struct net_device *dev,
+			   struct list_head *list,
+			   const struct xdp_hook_ops *reg);
+
+/* Unregister device XDP hook */
+static inline void xdp_unregister_dev_hook(struct net_device *dev,
+					   const struct xdp_hook_ops *reg)
+{
+	return __xdp_unregister_hook(dev, &dev->xdp_hook_list, reg);
+}
+
+/* Unregister a napi XDP hook */
+static inline void xdp_unregister_napi_hook(struct napi_struct *napi,
+					    const struct xdp_hook_ops *reg)
+{
+	return __xdp_unregister_hook(napi->dev, &napi->xdp_hook_list, reg);
+}
+
+/* Unregister all XDP hooks associated with a device (both the device hooks
+ * and hooks on all napi instances). This function is called when the netdev
+ * is being freed.
+ */
+void xdp_unregister_all_hooks(struct net_device *dev);
+
+/* Unregister all XDP hooks for a given xdp_hook_ops in a net. This walks
+ * all devices in net and napis for each device to unregister matching hooks.
+ * This can be called when a module that had registered some number of hooks
+ * is being unloaded.
+ */
+void xdp_unregister_net_hooks(struct net *net, struct xdp_hook_ops *reg);
+
+/* Find a registered device hook.
+ *   - If hook is found *ret is set to the values in the registered hook and
+ *     true is returned.
+ *   - Else false is returned.
+ */
+bool __xdp_find_hook(struct list_head *list, const struct xdp_hook_ops *reg,
+		     struct xdp_hook_ops *ret);
+
+/* Find a device XDP hook */
+static inline bool xdp_find_dev_hook(struct net_device *dev,
+				     const struct xdp_hook_ops *reg,
+				     struct xdp_hook_ops *ret)
+{
+	return __xdp_find_hook(&dev->xdp_hook_list, reg, ret);
+}
+
+/* Find a napi XDP hook */
+static inline bool xdp_find_napi_hook(struct napi_struct *napi,
+				      const struct xdp_hook_ops *reg,
+				      struct xdp_hook_ops *ret)
+{
+	return __xdp_find_hook(&napi->xdp_hook_list, reg, ret);
+}
+
+static inline void xdp_warn_invalid_action(u32 act)
+{
+	WARN_ONCE(1, "Illegal XDP return value %u, expect packet loss\n", act);
+}
+
+#endif /* __NET_XDP_H_ */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f896dfa..1a143d3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -473,24 +473,4 @@ struct bpf_tunnel_key {
 	__u32 tunnel_label;
 };
 
-/* User return codes for XDP prog type.
- * A valid XDP program must return one of these defined values. All other
- * return codes are reserved for future use. Unknown return codes will result
- * in packet drop.
- */
-enum xdp_action {
-	XDP_ABORTED = 0,
-	XDP_DROP,
-	XDP_PASS,
-	XDP_TX,
-};
-
-/* user accessible metadata for XDP packet hook
- * new fields must be added to the end of this structure
- */
-struct xdp_md {
-	__u32 data;
-	__u32 data_end;
-};
-
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/include/uapi/linux/xdp.h b/include/uapi/linux/xdp.h
new file mode 100644
index 0000000..f3002ca
--- /dev/null
+++ b/include/uapi/linux/xdp.h
@@ -0,0 +1,24 @@
+#ifndef _UAPI__LINUX_XDP_H__
+#define _UAPI__LINUX_XDP_H__
+
+/* User return codes for XDP prog type.
+ * A valid XDP program must return one of these defined values. All other
+ * return codes are reserved for future use. Unknown return codes will result
+ * in packet drop.
+ */
+enum xdp_action {
+	XDP_ABORTED = 0,
+	XDP_DROP,
+	XDP_PASS,
+	XDP_TX,
+};
+
+/* user accessible metadata for XDP packet hook
+ * new fields must be added to the end of this structure
+ */
+struct xdp_md {
+	__u32 data;
+	__u32 data_end;
+};
+
+#endif /* _UAPI__LINUX_XDP_H__ */
diff --git a/net/core/Makefile b/net/core/Makefile
index c0a0208..0d2d8ca 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -9,7 +9,7 @@ obj-$(CONFIG_SYSCTL) += sysctl_net_core.o
 
 obj-y		     += dev.o ethtool.o dev_addr_lists.o dst.o netevent.o \
 			neighbour.o rtnetlink.o utils.o link_watch.o filter.o \
-			sock_diag.o dev_ioctl.o tso.o sock_reuseport.o
+			sock_diag.o dev_ioctl.o tso.o sock_reuseport.o xdp.o
 
 obj-$(CONFIG_XFRM) += flow.o
 obj-y += net-sysfs.o
diff --git a/net/core/dev.c b/net/core/dev.c
index 9dbece2..0d2c826 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -141,6 +141,7 @@
 #include <linux/netfilter_ingress.h>
 #include <linux/sctp.h>
 #include <linux/crash_dump.h>
+#include <net/xdp.h>
 
 #include "net-sysfs.h"
 
@@ -5079,6 +5080,7 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 		    int (*poll)(struct napi_struct *, int), int weight)
 {
 	INIT_LIST_HEAD(&napi->poll_list);
+	INIT_LIST_HEAD(&napi->xdp_hook_list);
 	hrtimer_init(&napi->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
 	napi->timer.function = napi_watchdog;
 	napi->gro_count = 0;
@@ -7647,6 +7649,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	INIT_LIST_HEAD(&dev->all_adj_list.lower);
 	INIT_LIST_HEAD(&dev->ptype_all);
 	INIT_LIST_HEAD(&dev->ptype_specific);
+	INIT_LIST_HEAD(&dev->xdp_hook_list);
 #ifdef CONFIG_NET_SCHED
 	hash_init(dev->qdisc_hash);
 #endif
@@ -7706,6 +7709,7 @@ void free_netdev(struct net_device *dev)
 	struct napi_struct *p, *n;
 
 	might_sleep();
+	xdp_unregister_all_hooks(dev);
 	netif_free_tx_queues(dev);
 #ifdef CONFIG_SYSFS
 	kvfree(dev->_rx);
diff --git a/net/core/xdp.c b/net/core/xdp.c
new file mode 100644
index 0000000..815ead8
--- /dev/null
+++ b/net/core/xdp.c
@@ -0,0 +1,211 @@
+/*
+ * Kernel Connection Multiplexor
+ *
+ * Copyright (c) 2016 Tom Herbert <tom@herbertland.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation.
+ */
+#include <net/xdp.h>
+
+DEFINE_STATIC_KEY_FALSE(xdp_hooks_needed);
+EXPORT_SYMBOL(xdp_hooks_needed);
+
+static DEFINE_MUTEX(xdp_hook_mutex);
+
+/* Mutex xdp_hook_mutex must be held */
+static int __xdp_register_one_hook(struct net_device *dev,
+				   struct list_head *hook_list,
+				   struct xdp_hook_entry *entry,
+				   struct xdp_hook_ops *prev_elem)
+{
+	int err;
+
+	/* Check if driver XDP needs initialization */
+	if (!dev->xdp_hook_cnt && dev->netdev_ops->ndo_xdp) {
+		struct netdev_xdp xdp_op = {};
+
+		xdp_op.command = XDP_DEV_INIT;
+		err = dev->netdev_ops->ndo_xdp(dev, &xdp_op);
+		if (err)
+			return err;
+	}
+
+	list_add_rcu(&entry->ops.list, prev_elem->list.prev);
+	static_branch_inc(&xdp_hooks_needed);
+	dev->xdp_hook_cnt++;
+
+	return 0;
+}
+
+int __xdp_register_hook(struct net_device *dev,
+			struct list_head *hook_list,
+			const struct xdp_hook_ops *reg,
+			bool change)
+{
+	struct xdp_hook_entry *entry;
+	struct xdp_hook_ops *elem, *prevelem = NULL;
+	int err;
+
+	mutex_lock(&xdp_hook_mutex);
+
+	/* Walk list, see if hook is already registered and determine insertion
+	 * point.
+	 */
+	list_for_each_entry(elem, hook_list, list) {
+		struct xdp_hook_entry *tent;
+
+		tent = container_of(elem, struct xdp_hook_entry, ops);
+		if (tent->orig_ops == reg) {
+			if (change) {
+				void *old_priv;
+
+				/* Only allow changing priv field in an existing
+				 * hook.
+				 */
+				old_priv = rcu_dereference_protected(elem->priv,
+					lockdep_is_held(&xdp_hook_mutex));
+				rcu_assign_pointer(elem->priv, reg->priv);
+				if (old_priv && elem->put_priv)
+					elem->put_priv(old_priv);
+				err = 0;
+				goto out;
+			} else {
+				/* Already registered */
+				err = -EALREADY;
+				goto out;
+			}
+		}
+		if (reg->priority < elem->priority)
+			prevelem = elem;
+	}
+
+	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry) {
+		err = -ENOMEM;
+		goto err;
+	}
+
+	entry->orig_ops = reg;
+	entry->ops = *reg;
+
+	if (prevelem)
+		elem = prevelem;
+
+	err = __xdp_register_one_hook(dev, hook_list, entry, elem);
+	if (err)
+		goto err;
+
+out:
+	mutex_unlock(&xdp_hook_mutex);
+
+	return err;
+
+err:
+	mutex_unlock(&xdp_hook_mutex);
+	kfree(entry);
+	return err;
+}
+EXPORT_SYMBOL_GPL(__xdp_register_hook);
+
+/* Mutex xdp_hook_mutex must be held */
+static void __xdp_unregister_one_hook(struct net_device *dev,
+				      struct list_head *hook_list,
+				      struct xdp_hook_ops *elem)
+{
+	struct xdp_hook_entry *entry =
+		container_of(elem, struct xdp_hook_entry, ops);
+
+	list_del_rcu(&entry->ops.list);
+	static_branch_dec(&xdp_hooks_needed);
+	dev->xdp_hook_cnt--;
+
+	if (elem->priv && elem->put_priv)
+		elem->put_priv(elem->priv);
+
+	if (!dev->xdp_hook_cnt && dev->netdev_ops->ndo_xdp) {
+		struct netdev_xdp xdp_op = {};
+
+		xdp_op.command = XDP_DEV_FINISH;
+		dev->netdev_ops->ndo_xdp(dev, &xdp_op);
+	}
+}
+
+void __xdp_unregister_hook(struct net_device *dev,
+			   struct list_head *hook_list,
+			   const struct xdp_hook_ops *reg)
+{
+	struct xdp_hook_entry *entry;
+	struct xdp_hook_ops *elem;
+
+	mutex_lock(&xdp_hook_mutex);
+	list_for_each_entry(elem, hook_list, list) {
+		entry = container_of(elem, struct xdp_hook_entry, ops);
+		if (entry->orig_ops == reg) {
+			__xdp_unregister_one_hook(dev, hook_list, elem);
+			break;
+		}
+	}
+	mutex_unlock(&xdp_hook_mutex);
+	if (&elem->list == hook_list) {
+		WARN(1, "__xdp_unregister_hook: hook not found!\n");
+		return;
+	}
+	synchronize_net();
+
+	kfree(entry);
+}
+EXPORT_SYMBOL_GPL(__xdp_unregister_hook);
+
+static void __xdp_unregister_hooks(struct net_device *dev,
+				   struct list_head *hook_list)
+{
+	struct xdp_hook_ops *elem, *telem;
+
+	list_for_each_entry_safe(elem, telem, hook_list, list) {
+		__xdp_unregister_one_hook(dev, hook_list, elem);
+		/* Device is being freed, no concurrent hook runners */
+		kfree(container_of(elem, struct xdp_hook_entry, ops));
+	}
+}
+
+void xdp_unregister_all_hooks(struct net_device *dev)
+{
+	struct napi_struct *napi;
+
+	mutex_lock(&xdp_hook_mutex);
+
+	/* Unregister NAPI hooks for device */
+	list_for_each_entry(napi, &dev->napi_list, dev_list)
+		__xdp_unregister_hooks(dev, &napi->xdp_hook_list);
+
+	/* Unregister device hooks */
+	__xdp_unregister_hooks(dev, &dev->xdp_hook_list);
+
+	mutex_unlock(&xdp_hook_mutex);
+}
+EXPORT_SYMBOL_GPL(xdp_unregister_all_hooks);
+
+void xdp_unregister_net_hooks(struct net *net, struct xdp_hook_ops *reg)
+{
+	struct net_device *dev;
+	struct napi_struct *napi;
+
+	list_for_each_entry_rcu(dev, &net->dev_base_head, dev_list) {
+		list_for_each_entry(napi, &dev->napi_list, dev_list)
+			xdp_unregister_napi_hook(napi, reg);
+
+		xdp_unregister_dev_hook(dev, reg);
+	}
+}
+EXPORT_SYMBOL_GPL(xdp_unregister_net_hooks);
+
+bool __xdp_find_hook(struct list_head *hook_list,
+		  const struct xdp_hook_ops *reg,
+		  struct xdp_hook_ops *ret)
+{
+	struct xdp_hook_entry *entry;
+	struct xdp_hook_ops *elem;
+
+	list_for_each_entry_rcu(elem, hook_list, list) {
+		entry = container_of(elem, struct xdp_hook_entry, ops);
+		if (entry->orig_ops == reg) {
+			*ret = *elem;
+			return true;
+		}
+	}
+	return false;
+}
+EXPORT_SYMBOL_GPL(__xdp_find_hook);
-- 
2.8.0.rc2


* [PATCH RFC 2/3] mlx4: Change XDP/BPF to use generic XDP infrastructure
  2016-09-20 22:00 [PATCH RFC 0/3] xdp: Generalize XDP Tom Herbert
  2016-09-20 22:00 ` [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP Tom Herbert
@ 2016-09-20 22:00 ` Tom Herbert
  2016-09-20 22:00 ` [PATCH RFC 3/3] netdevice: Remove obsolete xdp_netdev_command Tom Herbert
  2 siblings, 0 replies; 39+ messages in thread
From: Tom Herbert @ 2016-09-20 22:00 UTC (permalink / raw)
  To: davem, netdev
  Cc: kernel-team, tariqt, bblanco, alexei.starovoitov, eric.dumazet, brouer

This patch changes the XDP-BPF implementation to use the generic
XDP infrastructure. This includes corresponding changes to the
Mellanox XDP code.

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 64 ++++----------------------
 drivers/net/ethernet/mellanox/mlx4/en_rx.c     | 25 ++++------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  1 -
 include/linux/filter.h                         | 13 ------
 net/core/dev.c                                 | 40 +++++++++++++---
 net/core/filter.c                              |  7 +--
 net/core/rtnetlink.c                           | 16 +++----
 7 files changed, 63 insertions(+), 103 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 62516f8..47990b7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2622,39 +2622,15 @@ static int mlx4_en_set_tx_maxrate(struct net_device *dev, int queue_index, u32 m
 	return err;
 }
 
-static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
+static int mlx4_xdp_make_tx_rings(struct net_device *dev, int xdp_ring_num)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	struct mlx4_en_dev *mdev = priv->mdev;
-	struct bpf_prog *old_prog;
-	int xdp_ring_num;
 	int port_up = 0;
 	int err;
-	int i;
-
-	xdp_ring_num = prog ? ALIGN(priv->rx_ring_num, MLX4_EN_NUM_UP) : 0;
 
-	/* No need to reconfigure buffers when simply swapping the
-	 * program for a new one.
-	 */
-	if (priv->xdp_ring_num == xdp_ring_num) {
-		if (prog) {
-			prog = bpf_prog_add(prog, priv->rx_ring_num - 1);
-			if (IS_ERR(prog))
-				return PTR_ERR(prog);
-		}
-		mutex_lock(&mdev->state_lock);
-		for (i = 0; i < priv->rx_ring_num; i++) {
-			old_prog = rcu_dereference_protected(
-					priv->rx_ring[i]->xdp_prog,
-					lockdep_is_held(&mdev->state_lock));
-			rcu_assign_pointer(priv->rx_ring[i]->xdp_prog, prog);
-			if (old_prog)
-				bpf_prog_put(old_prog);
-		}
-		mutex_unlock(&mdev->state_lock);
+	if (priv->xdp_ring_num == xdp_ring_num)
 		return 0;
-	}
 
 	if (priv->num_frags > 1) {
 		en_err(priv, "Cannot set XDP if MTU requires multiple frags\n");
@@ -2668,12 +2644,6 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 		return -EINVAL;
 	}
 
-	if (prog) {
-		prog = bpf_prog_add(prog, priv->rx_ring_num - 1);
-		if (IS_ERR(prog))
-			return PTR_ERR(prog);
-	}
-
 	mutex_lock(&mdev->state_lock);
 	if (priv->port_up) {
 		port_up = 1;
@@ -2684,15 +2654,6 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 	netif_set_real_num_tx_queues(dev, priv->tx_ring_num -
 							priv->xdp_ring_num);
 
-	for (i = 0; i < priv->rx_ring_num; i++) {
-		old_prog = rcu_dereference_protected(
-					priv->rx_ring[i]->xdp_prog,
-					lockdep_is_held(&mdev->state_lock));
-		rcu_assign_pointer(priv->rx_ring[i]->xdp_prog, prog);
-		if (old_prog)
-			bpf_prog_put(old_prog);
-	}
-
 	if (port_up) {
 		err = mlx4_en_start_port(dev);
 		if (err) {
@@ -2706,23 +2667,18 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 	return 0;
 }
 
-static bool mlx4_xdp_attached(struct net_device *dev)
+static int mlx4_xdp(struct net_device *dev, struct netdev_xdp *xdp)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 
-	return !!priv->xdp_ring_num;
-}
-
-static int mlx4_xdp(struct net_device *dev, struct netdev_xdp *xdp)
-{
 	switch (xdp->command) {
-	case XDP_SETUP_PROG:
-		return mlx4_xdp_set(dev, xdp->prog);
-	case XDP_QUERY_PROG:
-		xdp->prog_attached = mlx4_xdp_attached(dev);
-		return 0;
+	case XDP_DEV_INIT:
+		return mlx4_xdp_make_tx_rings(dev,
+		    ALIGN(priv->rx_ring_num, MLX4_EN_NUM_UP));
+	case XDP_DEV_FINISH:
+		return mlx4_xdp_make_tx_rings(dev, 0);
 	default:
-		return -EINVAL;
+		return 0;
 	}
 }
 
@@ -3210,7 +3166,7 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 
 	dev->vlan_features = dev->hw_features;
 
-	dev->hw_features |= NETIF_F_RXCSUM | NETIF_F_RXHASH;
+	dev->hw_features |= NETIF_F_RXCSUM | NETIF_F_RXHASH | NETIF_F_XDP;
 	dev->features = dev->hw_features | NETIF_F_HIGHDMA |
 			NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX |
 			NETIF_F_HW_VLAN_CTAG_FILTER;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index c80073e..e06ac63 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -42,6 +42,7 @@
 #include <linux/if_vlan.h>
 #include <linux/vmalloc.h>
 #include <linux/irq.h>
+#include <net/xdp.h>
 
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ip6_checksum.h>
@@ -535,13 +536,7 @@ void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
 {
 	struct mlx4_en_dev *mdev = priv->mdev;
 	struct mlx4_en_rx_ring *ring = *pring;
-	struct bpf_prog *old_prog;
 
-	old_prog = rcu_dereference_protected(
-					ring->xdp_prog,
-					lockdep_is_held(&mdev->state_lock));
-	if (old_prog)
-		bpf_prog_put(old_prog);
 	mlx4_free_hwq_res(mdev->dev, &ring->wqres, size * stride + TXBB_SIZE);
 	vfree(ring->rx_info);
 	ring->rx_info = NULL;
@@ -783,7 +778,6 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 	struct mlx4_en_rx_ring *ring = priv->rx_ring[cq->ring];
 	struct mlx4_en_rx_alloc *frags;
 	struct mlx4_en_rx_desc *rx_desc;
-	struct bpf_prog *xdp_prog;
 	int doorbell_pending;
 	struct sk_buff *skb;
 	int tx_index;
@@ -795,6 +789,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 	int factor = priv->cqe_factor;
 	u64 timestamp;
 	bool l2_tunnel;
+	bool run_xdp;
 
 	if (!priv->port_up)
 		return 0;
@@ -802,9 +797,9 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 	if (budget <= 0)
 		return polled;
 
-	/* Protect accesses to: ring->xdp_prog, priv->mac_hash list */
+	/* Protect accesses to: XDP hooks, priv->mac_hash list */
 	rcu_read_lock();
-	xdp_prog = rcu_dereference(ring->xdp_prog);
+	run_xdp = xdp_hook_run_needed_check(&cq->napi);
 	doorbell_pending = 0;
 	tx_index = (priv->tx_ring_num - priv->xdp_ring_num) + cq->ring;
 
@@ -880,10 +875,10 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 		l2_tunnel = (dev->hw_enc_features & NETIF_F_RXCSUM) &&
 			(cqe->vlan_my_qpn & cpu_to_be32(MLX4_CQE_L2_TUNNEL));
 
-		/* A bpf program gets first chance to drop the packet. It may
+		/* An xdp program gets first chance to drop the packet. It may
 		 * read bytes but not past the end of the frag.
 		 */
-		if (xdp_prog) {
+		if (run_xdp) {
 			struct xdp_buff xdp;
 			dma_addr_t dma;
 			u32 act;
@@ -897,7 +892,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 							frags[0].page_offset;
 			xdp.data_end = xdp.data + length;
 
-			act = bpf_prog_run_xdp(xdp_prog, &xdp);
+			act = xdp_hook_run(&cq->napi, &xdp);
 			switch (act) {
 			case XDP_PASS:
 				break;
@@ -906,14 +901,14 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 							length, tx_index,
 							&doorbell_pending))
 					goto consumed;
 				goto next; /* Drop on xmit failure */
-			default:
-				bpf_warn_invalid_xdp_action(act);
 			case XDP_ABORTED:
 			case XDP_DROP:
 				if (mlx4_en_rx_recycle(ring, frags))
 					goto consumed;
 				goto next;
+			default:
+				xdp_warn_invalid_action(act);
+				goto next; /* Drop on invalid action */
 			}
 		}
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index a3528dd..56d5950 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -340,7 +340,6 @@ struct mlx4_en_rx_ring {
 	u8  fcs_del;
 	void *buf;
 	void *rx_info;
-	struct bpf_prog __rcu *xdp_prog;
 	struct mlx4_en_page_cache page_cache;
 	unsigned long bytes;
 	unsigned long packets;
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 2a26133..f9863ee 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -494,18 +494,6 @@ static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
 	return BPF_PROG_RUN(prog, skb);
 }
 
-static inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
-				   struct xdp_buff *xdp)
-{
-	u32 ret;
-
-	rcu_read_lock();
-	ret = BPF_PROG_RUN(prog, (void *)xdp);
-	rcu_read_unlock();
-
-	return ret;
-}
-
 static inline unsigned int bpf_prog_size(unsigned int proglen)
 {
 	return max(sizeof(struct bpf_prog),
@@ -590,7 +578,6 @@ bool bpf_helper_changes_skb_data(void *func);
 
 struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
 				       const struct bpf_insn *patch, u32 len);
-void bpf_warn_invalid_xdp_action(u32 act);
 
 #ifdef CONFIG_BPF_JIT
 extern int bpf_jit_enable;
diff --git a/net/core/dev.c b/net/core/dev.c
index 0d2c826..d35ee4d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -142,6 +142,7 @@
 #include <linux/sctp.h>
 #include <linux/crash_dump.h>
 #include <net/xdp.h>
+#include <linux/filter.h>
 
 #include "net-sysfs.h"
 
@@ -6635,6 +6636,32 @@ int dev_change_proto_down(struct net_device *dev, bool proto_down)
 }
 EXPORT_SYMBOL(dev_change_proto_down);
 
+static u32 dev_bpf_prog_run_xdp(const void *priv,
+				struct xdp_buff *xdp)
+{
+	const struct bpf_prog *prog = (const struct bpf_prog *)priv;
+	u32 ret;
+
+	rcu_read_lock();
+	ret = BPF_PROG_RUN(prog, (void *)xdp);
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static void dev_bpf_prog_put_xdp(const void *priv)
+{
+	bpf_prog_put((struct bpf_prog *)priv);
+}
+
+struct xdp_hook_ops xdp_bpf_hook_ops = {
+	.hook = dev_bpf_prog_run_xdp,
+	.put_priv = dev_bpf_prog_put_xdp,
+	.priority = 0,
+};
+
+static DEFINE_MUTEX(xdp_bpf_lock);
+
 /**
  *	dev_change_xdp_fd - set or clear a bpf program for a device rx path
  *	@dev: device
@@ -6644,22 +6671,23 @@ EXPORT_SYMBOL(dev_change_proto_down);
  */
 int dev_change_xdp_fd(struct net_device *dev, int fd)
 {
-	const struct net_device_ops *ops = dev->netdev_ops;
 	struct bpf_prog *prog = NULL;
-	struct netdev_xdp xdp = {};
 	int err;
 
-	if (!ops->ndo_xdp)
+	if (!(dev->features & NETIF_F_XDP))
 		return -EOPNOTSUPP;
+
 	if (fd >= 0) {
 		prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_XDP);
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);
 	}
 
-	xdp.command = XDP_SETUP_PROG;
-	xdp.prog = prog;
-	err = ops->ndo_xdp(dev, &xdp);
+	mutex_lock(&xdp_bpf_lock); /* Since xdp_bpf_hook_ops is modified */
+	xdp_bpf_hook_ops.priv = prog;
+	err = xdp_change_dev_hook(dev, &xdp_bpf_hook_ops);
+	mutex_unlock(&xdp_bpf_lock);
+
 	if (err < 0 && prog)
 		bpf_prog_put(prog);
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 298b146..f4a1ea8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -51,6 +51,7 @@
 #include <net/dst_metadata.h>
 #include <net/dst.h>
 #include <net/sock_reuseport.h>
+#include <net/xdp.h>
 
 /**
  *	sk_filter_trim_cap - run a packet through a socket filter
@@ -2595,12 +2596,6 @@ static bool xdp_is_valid_access(int off, int size,
 	return __is_valid_xdp_access(off, size, type);
 }
 
-void bpf_warn_invalid_xdp_action(u32 act)
-{
-	WARN_ONCE(1, "Illegal XDP return value %u, expect packet loss\n", act);
-}
-EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
-
 static u32 sk_filter_convert_ctx_access(enum bpf_access_type type, int dst_reg,
 					int src_reg, int ctx_off,
 					struct bpf_insn *insn_buf,
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 0dbae42..c1aeb71 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -56,6 +56,7 @@
 #include <net/fib_rules.h>
 #include <net/rtnetlink.h>
 #include <net/net_namespace.h>
+#include <net/xdp.h>
 
 struct rtnl_link {
 	rtnl_doit_func		doit;
@@ -897,7 +898,7 @@ static size_t rtnl_xdp_size(const struct net_device *dev)
 {
 	size_t xdp_size = nla_total_size(1);	/* XDP_ATTACHED */
 
-	if (!dev->netdev_ops->ndo_xdp)
+	if (!(dev->features & NETIF_F_XDP))
 		return 0;
 	else
 		return xdp_size;
@@ -1226,20 +1227,19 @@ static int rtnl_fill_link_ifmap(struct sk_buff *skb, struct net_device *dev)
 
 static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev)
 {
-	struct netdev_xdp xdp_op = {};
 	struct nlattr *xdp;
+	struct xdp_hook_ops ret;
+	bool attached;
 	int err;
 
-	if (!dev->netdev_ops->ndo_xdp)
-		return 0;
 	xdp = nla_nest_start(skb, IFLA_XDP);
 	if (!xdp)
 		return -EMSGSIZE;
-	xdp_op.command = XDP_QUERY_PROG;
-	err = dev->netdev_ops->ndo_xdp(dev, &xdp_op);
-	if (err)
+
-		goto err_cancel;
-	err = nla_put_u8(skb, IFLA_XDP_ATTACHED, xdp_op.prog_attached);
+	attached = xdp_find_dev_hook(dev, &xdp_bpf_hook_ops, &ret);
+
+	err = nla_put_u8(skb, IFLA_XDP_ATTACHED, attached);
 	if (err)
 		goto err_cancel;
 
-- 
2.8.0.rc2


* [PATCH RFC 3/3] netdevice: Remove obsolete xdp_netdev_command
  2016-09-20 22:00 [PATCH RFC 0/3] xdp: Generalize XDP Tom Herbert
  2016-09-20 22:00 ` [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP Tom Herbert
  2016-09-20 22:00 ` [PATCH RFC 2/3] mlx4: Change XDP/BPF to use generic XDP infrastructure Tom Herbert
@ 2016-09-20 22:00 ` Tom Herbert
  2 siblings, 0 replies; 39+ messages in thread
From: Tom Herbert @ 2016-09-20 22:00 UTC (permalink / raw)
  To: davem, netdev
  Cc: kernel-team, tariqt, bblanco, alexei.starovoitov, eric.dumazet, brouer

Remove XDP_SETUP_PROG and XDP_QUERY_PROG as they should no longer be
needed.

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/linux/netdevice.h | 17 +----------------
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f2b7d1b..9a545ab 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -808,18 +808,6 @@ struct tc_to_netdev {
  * to the netdevice through the xdp op.
  */
 enum xdp_netdev_command {
-	/* Set or clear a bpf program used in the earliest stages of packet
-	 * rx. The prog will have been loaded as BPF_PROG_TYPE_XDP. The callee
-	 * is responsible for calling bpf_prog_put on any old progs that are
-	 * stored. In case of error, the callee need not release the new prog
-	 * reference, but on success it takes ownership and must bpf_prog_put
-	 * when it is no longer used.
-	 */
-	XDP_SETUP_PROG,
-	/* Check if a bpf program is set on the device.  The callee should
-	 * return true if a program is currently attached and running.
-	 */
-	XDP_QUERY_PROG,
 	/* Initialize XDP in the device. Called the first time an XDP
 	 * hook is being set on the device.
 	 */
@@ -833,10 +821,7 @@ enum xdp_netdev_command {
 struct netdev_xdp {
 	enum xdp_netdev_command command;
 	union {
-		/* XDP_SETUP_PROG */
-		struct bpf_prog *prog;
-		/* XDP_QUERY_PROG */
-		bool prog_attached;
+		/* Command parameters */
 	};
 };
 
-- 
2.8.0.rc2


* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-20 22:00 ` [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP Tom Herbert
@ 2016-09-20 22:37   ` Eric Dumazet
  2016-09-20 22:40     ` Tom Herbert
  2016-09-20 22:44   ` Thomas Graf
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 39+ messages in thread
From: Eric Dumazet @ 2016-09-20 22:37 UTC (permalink / raw)
  To: Tom Herbert
  Cc: davem, netdev, kernel-team, tariqt, bblanco, alexei.starovoitov, brouer

On Tue, 2016-09-20 at 15:00 -0700, Tom Herbert wrote:

> diff --git a/net/core/xdp.c b/net/core/xdp.c
> new file mode 100644
> index 0000000..815ead8
> --- /dev/null
> +++ b/net/core/xdp.c
> @@ -0,0 +1,211 @@
> +/*
> + * Kernel Connection Multiplexor
> + *
> + * Copyright (c) 2016 Tom Herbert <tom@herbertland.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2
> + * as published by the Free Software Foundation.
> + */


Too much copy/paste Tom :)


* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-20 22:37   ` Eric Dumazet
@ 2016-09-20 22:40     ` Tom Herbert
  0 siblings, 0 replies; 39+ messages in thread
From: Tom Herbert @ 2016-09-20 22:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team,
	Tariq Toukan, Brenden Blanco, Alexei Starovoitov,
	Jesper Dangaard Brouer

On Tue, Sep 20, 2016 at 3:37 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2016-09-20 at 15:00 -0700, Tom Herbert wrote:
>
>> diff --git a/net/core/xdp.c b/net/core/xdp.c
>> new file mode 100644
>> index 0000000..815ead8
>> --- /dev/null
>> +++ b/net/core/xdp.c
>> @@ -0,0 +1,211 @@
>> +/*
>> + * Kernel Connection Multiplexor
>> + *
>> + * Copyright (c) 2016 Tom Herbert <tom@herbertland.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2
>> + * as published by the Free Software Foundation.
>> + */
>
>
> Too much copy/paste Tom :)
>
If this is your only complaint, Eric, I'll be quite impressed ;-)

>


* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-20 22:00 ` [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP Tom Herbert
  2016-09-20 22:37   ` Eric Dumazet
@ 2016-09-20 22:44   ` Thomas Graf
  2016-09-20 22:49     ` Tom Herbert
  2016-09-21  0:01   ` Alexei Starovoitov
  2016-09-23 11:13   ` Jamal Hadi Salim
  3 siblings, 1 reply; 39+ messages in thread
From: Thomas Graf @ 2016-09-20 22:44 UTC (permalink / raw)
  To: Tom Herbert
  Cc: davem, netdev, kernel-team, tariqt, bblanco, alexei.starovoitov,
	eric.dumazet, brouer

On 09/20/16 at 03:00pm, Tom Herbert wrote:
> +static inline int __xdp_hook_run(struct list_head *list_head,
> +				 struct xdp_buff *xdp)
> +{
> +	struct xdp_hook_ops *elem;
> +	int ret = XDP_PASS;
> +
> +	list_for_each_entry(elem, list_head, list) {
> +		ret = elem->hook(elem->priv, xdp);
> +		if (ret != XDP_PASS)
> +			break;
> +	}

Walking over a linear list? Really? :-) I thought this was supposed
to be fast, no compromises made.


* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-20 22:44   ` Thomas Graf
@ 2016-09-20 22:49     ` Tom Herbert
  2016-09-20 23:09       ` Thomas Graf
  0 siblings, 1 reply; 39+ messages in thread
From: Tom Herbert @ 2016-09-20 22:49 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team,
	Tariq Toukan, Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer

On Tue, Sep 20, 2016 at 3:44 PM, Thomas Graf <tgraf@suug.ch> wrote:
> On 09/20/16 at 03:00pm, Tom Herbert wrote:
>> +static inline int __xdp_hook_run(struct list_head *list_head,
>> +                              struct xdp_buff *xdp)
>> +{
>> +     struct xdp_hook_ops *elem;
>> +     int ret = XDP_PASS;
>> +
>> +     list_for_each_entry(elem, list_head, list) {
>> +             ret = elem->hook(elem->priv, xdp);
>> +             if (ret != XDP_PASS)
>> +                     break;
>> +     }
>
> Walking over a linear list? Really? :-) I thought this was supposed
> to be fast, no compromises made.

Can you suggest an alternative?


* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-20 22:49     ` Tom Herbert
@ 2016-09-20 23:09       ` Thomas Graf
  2016-09-20 23:18         ` Tom Herbert
  2016-09-20 23:22         ` Daniel Borkmann
  0 siblings, 2 replies; 39+ messages in thread
From: Thomas Graf @ 2016-09-20 23:09 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team,
	Tariq Toukan, Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer

On 09/20/16 at 03:49pm, Tom Herbert wrote:
> On Tue, Sep 20, 2016 at 3:44 PM, Thomas Graf <tgraf@suug.ch> wrote:
> > On 09/20/16 at 03:00pm, Tom Herbert wrote:
> >> +static inline int __xdp_hook_run(struct list_head *list_head,
> >> +                              struct xdp_buff *xdp)
> >> +{
> >> +     struct xdp_hook_ops *elem;
> >> +     int ret = XDP_PASS;
> >> +
> >> +     list_for_each_entry(elem, list_head, list) {
> >> +             ret = elem->hook(elem->priv, xdp);
> >> +             if (ret != XDP_PASS)
> >> +                     break;
> >> +     }
> >
> > Walking over a linear list? Really? :-) I thought this was supposed
> > to be fast, no compromises made.
> 
> Can you suggest an alternative?

Single BPF program that encodes whatever logic is required. This is
what BPF is for. If it absolutely has to run two programs in sequence
then it can still do that even though I really don't see much of a
point of doing that in a high performance environment.

I'm not even sure I understand the full purpose of this yet ;-)


* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-20 23:09       ` Thomas Graf
@ 2016-09-20 23:18         ` Tom Herbert
  2016-09-20 23:43           ` Thomas Graf
  2016-09-20 23:22         ` Daniel Borkmann
  1 sibling, 1 reply; 39+ messages in thread
From: Tom Herbert @ 2016-09-20 23:18 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team,
	Tariq Toukan, Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer

On Tue, Sep 20, 2016 at 4:09 PM, Thomas Graf <tgraf@suug.ch> wrote:
> On 09/20/16 at 03:49pm, Tom Herbert wrote:
>> On Tue, Sep 20, 2016 at 3:44 PM, Thomas Graf <tgraf@suug.ch> wrote:
>> > On 09/20/16 at 03:00pm, Tom Herbert wrote:
>> >> +static inline int __xdp_hook_run(struct list_head *list_head,
>> >> +                              struct xdp_buff *xdp)
>> >> +{
>> >> +     struct xdp_hook_ops *elem;
>> >> +     int ret = XDP_PASS;
>> >> +
>> >> +     list_for_each_entry(elem, list_head, list) {
>> >> +             ret = elem->hook(elem->priv, xdp);
>> >> +             if (ret != XDP_PASS)
>> >> +                     break;
>> >> +     }
>> >
>> > Walking over a linear list? Really? :-) I thought this was supposed
>> > to be fast, no compromises made.
>>
>> Can you suggest an alternative?
>
> Single BPF program that encodes whatever logic is required. This is
> what BPF is for. If it absolutely has to run two programs in sequence
> then it can still do that even though I really don't see much of a
> point of doing that in a high performance environment.
>
> I'm not even sure I understand the full purpose of this yet ;-)

This allows use cases other than BPF inserting code into the data
path. This gives XDP potentially more utility and more users so that we
can motivate more driver implementations. For instance, I think it's
totally reasonable if the nftables guys want to insert some of their
rules to perform early DDoS drop to get the same performance that we
see in XDP.

Tom


* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-20 23:09       ` Thomas Graf
  2016-09-20 23:18         ` Tom Herbert
@ 2016-09-20 23:22         ` Daniel Borkmann
  1 sibling, 0 replies; 39+ messages in thread
From: Daniel Borkmann @ 2016-09-20 23:22 UTC (permalink / raw)
  To: Thomas Graf, Tom Herbert
  Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team,
	Tariq Toukan, Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer

On 09/21/2016 01:09 AM, Thomas Graf wrote:
> On 09/20/16 at 03:49pm, Tom Herbert wrote:
>> On Tue, Sep 20, 2016 at 3:44 PM, Thomas Graf <tgraf@suug.ch> wrote:
>>> On 09/20/16 at 03:00pm, Tom Herbert wrote:
>>>> +static inline int __xdp_hook_run(struct list_head *list_head,
>>>> +                              struct xdp_buff *xdp)
>>>> +{
>>>> +     struct xdp_hook_ops *elem;
>>>> +     int ret = XDP_PASS;
>>>> +
>>>> +     list_for_each_entry(elem, list_head, list) {
>>>> +             ret = elem->hook(elem->priv, xdp);
>>>> +             if (ret != XDP_PASS)
>>>> +                     break;
>>>> +     }
>>>
>>> Walking over a linear list? Really? :-) I thought this was supposed
>>> to be fast, no compromises made.
>>
>> Can you suggest an alternative?
>
> Single BPF program that encodes whatever logic is required. This is
> what BPF is for. If it absolutely has to run two programs in sequence
> then it can still do that even though I really don't see much of a
> point of doing that in a high performance environment.

Agreed, if there's one thing I would change in cls_bpf, then it's getting
rid of the (uapi, unfortunately) list of classifiers and just making it a
single one, because that's all that is needed, and chaining/pipelining
can be done via tail calls for example. This whole list + callback likely
also makes things slower. Why not let more drivers come first to support
the current xdp model we have, and then we can move common parts to the
driver-independent core code?

> I'm not even sure I understand the full purpose of this yet ;-)


* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-20 23:18         ` Tom Herbert
@ 2016-09-20 23:43           ` Thomas Graf
  2016-09-20 23:59             ` Tom Herbert
  0 siblings, 1 reply; 39+ messages in thread
From: Thomas Graf @ 2016-09-20 23:43 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team,
	Tariq Toukan, Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer

On 09/20/16 at 04:18pm, Tom Herbert wrote:
> This allows use cases other than BPF inserting code into the data
> path. This gives XDP potentially more utility and more users so that we
> can motivate more driver implementations. For instance, I think it's
> totally reasonable if the nftables guys want to insert some of their
> rules to perform early DDoS drop to get the same performance that we
> see in XDP.

Reasonable point with nftables but are any of these users on the table
already and ready to consume non-skbs? It would be a pity to add this
complexity and cost if it is never used.

I don't see how we can ensure performance if we have multiple
subsystems register for the hook each adding their own parsers which
need to be passed through sequentially. Maybe I'm missing something.


* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-20 23:43           ` Thomas Graf
@ 2016-09-20 23:59             ` Tom Herbert
  2016-09-21  0:13               ` Alexei Starovoitov
  2016-09-21 11:55               ` Thomas Graf
  0 siblings, 2 replies; 39+ messages in thread
From: Tom Herbert @ 2016-09-20 23:59 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team,
	Tariq Toukan, Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer

On Tue, Sep 20, 2016 at 4:43 PM, Thomas Graf <tgraf@suug.ch> wrote:
> On 09/20/16 at 04:18pm, Tom Herbert wrote:
>> This allows other use cases than BPF inserting code into the data
>> path. This gives XDP potential more utility and more users so that we
>> can motivate more driver implementations. For instance, I thinks it's
>> totally reasonable if the nftables guys want to insert some of their
>> rules to perform early DDOS drop to get the same performance that we
>> see in XDP.
>
> Reasonable point with nftables but are any of these users on the table
> already and ready to consume non-skbs? It would be a pity to add this
> complexity and cost if it is never used.
>
Well, we need to measure to ascertain the cost. As for complexity, this
actually reduces the complexity needed for XDP in the drivers, which is
a good thing because that's where most of the support and development
pain will be.

I am looking at using this for an ILA router. The problem I am hitting
is that not all packets that we need to translate go through the XDP
path. Some would go through the kernel path and some through the XDP
path, but that would mean I need parallel lookup tables to be
maintained for the two paths, which won't scale. ILA translation is so
trivial and not really something that we need to be user programmable;
the fast path is really for accelerating an existing kernel capability.
If I can reuse the kernel code already written and the existing kernel
data structures to make a fast path in XDP, there is a lot of value in
that for me.

> I don't see how we can ensure performance if we have multiple
> subsystems register for the hook each adding their own parsers which
> need to be passed through sequentially. Maybe I'm missing something.

We can optimize for the case of a single hook, or maybe limit the
interface to only allowing one hook to be set. In any case this
obviously requires a lot of performance evaluation; I am hoping to get
feedback on the design first. My question about using a linear list for
this was real: do you know a better method offhand to implement a call
list?

Thanks,
Tom


* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-20 22:00 ` [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP Tom Herbert
  2016-09-20 22:37   ` Eric Dumazet
  2016-09-20 22:44   ` Thomas Graf
@ 2016-09-21  0:01   ` Alexei Starovoitov
  2016-09-21  6:39     ` Jesper Dangaard Brouer
  2016-09-21 17:26     ` Jakub Kicinski
  2016-09-23 11:13   ` Jamal Hadi Salim
  3 siblings, 2 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2016-09-21  0:01 UTC (permalink / raw)
  To: Tom Herbert, davem, netdev
  Cc: kernel-team, tariqt, bblanco, alexei.starovoitov, eric.dumazet, brouer

On 9/20/16 3:00 PM, Tom Herbert wrote:
> +static inline int __xdp_hook_run(struct list_head *list_head,
> +				 struct xdp_buff *xdp)
> +{
> +	struct xdp_hook_ops *elem;
> +	int ret = XDP_PASS;
> +
> +	list_for_each_entry(elem, list_head, list) {
> +		ret = elem->hook(elem->priv, xdp);
> +		if (ret != XDP_PASS)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
> +/* Run the XDP hooks for a napi device. Called from a driver's receive
> + * routine
> + */
> +static inline int xdp_hook_run(struct napi_struct *napi, struct xdp_buff *xdp)
> +{
> +	struct net_device *dev = napi->dev;
> +	int ret = XDP_PASS;
> +
> +	if (static_branch_unlikely(&xdp_hooks_needed)) {
> +		/* Run hooks in napi first */
> +		ret = __xdp_hook_run(&napi->xdp_hook_list, xdp);
> +		if (ret != XDP_PASS)
> +			return ret;
> +
> +		/* Now run device hooks */
> +		ret = __xdp_hook_run(&dev->xdp_hook_list, xdp);
> +		if (ret != XDP_PASS)
> +			return ret;
> +	}
> +
> +	return ret;
> +}

it's an interesting idea to move prog pointer into napi struct,
but certainly not at such huge cost.
Right now it's 1 load + 1 cmp + 1 indirect jump per packet
to invoke the program; with the above approach it becomes
6 loads + 3 cmp (just to get through the run_needed_check() check)
+ 6 loads + 3 cmp + 2 indirect jumps.
(I may be a little bit off, +/- a few loads.)
That is a non-starter.
When we were optimizing the receive path of the tc clsact ingress hook
we saw 1Mpps saved for every load+cmp+indirect jump removed.

We're working on inlining of bpf_map_lookup to save one
indirect call per lookup; we cannot just waste them here.

We need to save cycles instead, especially when it doesn't
really solve your goals. It seems the goals are:

 >- Allows alternative users of the XDP hooks other than the original
 >    BPF

this should be achieved by their own hooks while reusing
return codes XDP_TX, XDP_PASS to keep driver side the same.
I'm not against other packet processing engines, but not
at the cost of lower performance.

 >  - Allows a means to pipeline XDP programs together

this can be done already via bpf_tail_call. No changes needed.
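
(For reference, a minimal sketch of such chaining in samples/bpf style;
the "progs" map and the slot index are illustrative and would be
populated from user space:)

	struct bpf_map_def SEC("maps") progs = {
		.type        = BPF_MAP_TYPE_PROG_ARRAY,
		.key_size    = sizeof(u32),
		.value_size  = sizeof(u32),
		.max_entries = 8,
	};

	SEC("xdp") int xdp_stage0(struct xdp_md *ctx)
	{
		/* Jumps to the program in slot 1 if one is installed;
		 * does not return here on a successful tail call.
		 */
		bpf_tail_call(ctx, &progs, 1);
		return XDP_PASS;
	}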

 >  - Reduces the amount of code and complexity needed in drivers to
 >    manage XDP

hmm:
534 insertions(+), 144 deletions(-)
looks like an increase in complexity instead.

 >  - Provides a more structured environment that is extensible to new
 >    features while being mostly transparent to the drivers

don't see that in these patches either.
Things like packet size change (that we're working on) still
have to be implemented for every driver.
Existing XDP_TX, XDP_DROP have to be implemented per driver as well.

Also introduction of xdp.h breaks existing UAPI.
That's not acceptable either.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-20 23:59             ` Tom Herbert
@ 2016-09-21  0:13               ` Alexei Starovoitov
  2016-09-21 11:55               ` Thomas Graf
  1 sibling, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2016-09-21  0:13 UTC (permalink / raw)
  To: Tom Herbert, Thomas Graf
  Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team,
	Tariq Toukan, Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer

On 9/20/16 4:59 PM, Tom Herbert wrote:
> I am looking at using this for an ILA router. The problem I am hitting is
> that not all packets that we need to translate go through the XDP
> path. Some would go through the kernel path, some through the XDP path, but
> that would mean I need parallel lookup tables to be maintained for the
> two paths, which won't scale. ILA translation is so trivial and not
> really something that needs to be user programmable; the fast path
> is really for accelerating an existing kernel capability. If I can
> reuse the kernel code already written and the existing kernel data
> structures to make a fast path in XDP, there is a lot of value in that
> for me.

sounds like you want to add a hard-coded ILA rewriter to the driver
instead of doing it as a BPF program?!
That is a 180-degree turn versus the whole protocol ossification tune
that I thought you strongly believed in.

What kernel data structures do you want to reuse?
An ILA rewriter needs a single hash lookup. Several different
types of hash maps already exist on the bpf side, and
even more are coming that will be usable by both the tc and xdp sides.
csum adjustment? We have it for tc. Not for xdp yet,
but it's trivial to allow it on the xdp side too.
Maybe we should talk about the real motivation for the patches
and see what the best solution is.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-21  0:01   ` Alexei Starovoitov
@ 2016-09-21  6:39     ` Jesper Dangaard Brouer
  2016-09-21  8:42       ` Daniel Borkmann
  2016-09-21 15:44       ` Alexei Starovoitov
  2016-09-21 17:26     ` Jakub Kicinski
  1 sibling, 2 replies; 39+ messages in thread
From: Jesper Dangaard Brouer @ 2016-09-21  6:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Tom Herbert, davem, netdev, kernel-team, tariqt, bblanco,
	alexei.starovoitov, eric.dumazet, brouer

On Tue, 20 Sep 2016 17:01:39 -0700 Alexei Starovoitov <ast@fb.com> wrote:

>  >  - Provides a more structured environment that is extensible to new
>  >    features while being mostly transparent to the drivers  
> 
> don't see that in these patches either.
> Things like packet size change (that we're working on) still
> have to be implemented for every driver.
> Existing XDP_TX, XDP_DROP have to be implemented per driver as well.
> 
> Also introduction of xdp.h breaks existing UAPI.
> That's not acceptable either.

This response pisses me off!@@#$

We are in the early stages of XDP development. Users cannot consider XDP a
stable UAPI yet.  I added a big fat warning to the docs here[1].

If you already consider this a stable API, then I will suggest that we
disable XDP or rip the whole thing out again!!! Create a separate tree
where we can cooperate on getting this right, before forcing this
upstream.  I have raised concerns about this several times upstream, but
the patches got applied anyway, because you, Alexei, promised this was
super extendable and we could still change the APIs later. Maybe you
tricked me?! I've started to look at the details, and I'm not happy with
the extensibility.  And I'm also not happy with Brenden's response to
my API concerns that this is just a "fire-and-forget" API.

Most importantly, the XDP interface for feature or capabilities
negotiation is missing.  Documented here[2].

I strongly believe the two actions XDP_DROP and XDP_TX should have been
expressed as two different capabilities, because XDP_TX requires more
changes to the device driver than a simple drop like XDP_DROP.

One can easily imagine (after the e1000 discussion) that an older
driver only wants to implement the XDP_DROP facility. The reason is that
XDP_TX would require changing too much driver code, which is a concern
for an old, stable, time-proven driver.  Thus, the need for
negotiating features is already a practical problem!

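To make the shape of the idea concrete, a hypothetical sketch of the
missing negotiation -- none of these names exist in the kernel or in
this patch set:

#define XDP_CAP_DROP	(1 << 0)	/* driver honors XDP_DROP */
#define XDP_CAP_TX	(1 << 1)	/* driver implements XDP_TX */

/* hypothetical: the driver advertises dev->xdp_caps at init time and
 * the core rejects an attach whose program needs more than that
 */
static int xdp_check_caps(struct net_device *dev, u32 wanted)
{
	if ((dev->xdp_caps & wanted) != wanted)
		return -EOPNOTSUPP;
	return 0;
}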

[1] https://prototype-kernel.readthedocs.io/en/latest/networking/XDP/implementation/userspace_api.html

Quote:
  "Warning: The userspace API specification should have to be defined
   properly before code was accepted upstream. Concerns have been raise
   about the current API upstream. Users should expect this first API
   attempt will need adjustments. This cannot be considered a stable API yet."

[2] https://prototype-kernel.readthedocs.io/en/latest/networking/XDP/design/design.html#capabilities-negotiation

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-21  6:39     ` Jesper Dangaard Brouer
@ 2016-09-21  8:42       ` Daniel Borkmann
  2016-09-21 15:44       ` Alexei Starovoitov
  1 sibling, 0 replies; 39+ messages in thread
From: Daniel Borkmann @ 2016-09-21  8:42 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Alexei Starovoitov
  Cc: Tom Herbert, davem, netdev, kernel-team, tariqt, bblanco,
	alexei.starovoitov, eric.dumazet

On 09/21/2016 08:39 AM, Jesper Dangaard Brouer wrote:
> On Tue, 20 Sep 2016 17:01:39 -0700 Alexei Starovoitov <ast@fb.com> wrote:
>
>>   >  - Provides a more structured environment that is extensible to new
>>   >    features while being mostly transparent to the drivers
>>
>> don't see that in these patches either.
>> Things like packet size change (that we're working on) still
>> have to be implemented for every driver.
>> Existing XDP_TX, XDP_DROP have to be implemented per driver as well.
>>
>> Also introduction of xdp.h breaks existing UAPI.
>> That's not acceptable either.
>
> This response pisses me off!@@#$
>
> We are in the early stages of XDP development. Users cannot consider XDP a
> stable UAPI yet.  I added a big fat warning to the docs here[1].
>
> If you already consider this a stable API, then I will suggest that we
> disable XDP or rip the whole thing out again!!! Create a separate tree
> where we can cooperate on getting this right, before forcing this
> upstream.  I have raised concerns about this several times upstream, but
> the patches got applied anyway, because you, Alexei, promised this was
> super extendable and we could still change the APIs later. Maybe you
> tricked me?! I've started to look at the details, and I'm not happy with
> the extensibility.

I don't quite follow you here, Jesper. If you're talking about uapi-related
extensibility wrt struct xdp_md, then it's done the same way as for tc
with the 'shadow' struct __sk_buff. The concept of having this internal BPF
insn rewriter is working quite well for this, and it is extendable with new
metadata. Wrt return codes, we're flexible to add new ones once agreed upon.
The whole XDP config is done via netlink, nested in the IFLA_XDP container,
so it can be extended in the future with other attrs, flags, etc., for setup
and dumping. The 'breakage' that was mentioned here was related to moving
things into xdp.h. It can be done, and we did something similar in the past
with bpf_common.h, but then bpf.h really also needs to include the new
xdp.h header.

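For reference, the uapi surface in question is tiny. At the time of
this discussion it is roughly the following (from
include/uapi/linux/bpf.h; these are the pieces this series moves into
the new uapi xdp.h, hence the breakage concern):

enum xdp_action {
	XDP_ABORTED = 0,
	XDP_DROP,
	XDP_PASS,
	XDP_TX,
};

/* user accessible metadata for the XDP packet hook: the verifier's
 * instruction rewriter converts accesses to these fields into loads
 * from the in-kernel struct xdp_buff
 */
struct xdp_md {
	__u32 data;
	__u32 data_end;
};
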
^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-20 23:59             ` Tom Herbert
  2016-09-21  0:13               ` Alexei Starovoitov
@ 2016-09-21 11:55               ` Thomas Graf
  2016-09-21 14:19                 ` Tom Herbert
  2016-09-21 17:26                 ` Jakub Kicinski
  1 sibling, 2 replies; 39+ messages in thread
From: Thomas Graf @ 2016-09-21 11:55 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team,
	Tariq Toukan, Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer

On 09/20/16 at 04:59pm, Tom Herbert wrote:
> Well, need to measure to ascertain the cost. As for complexity, this
> actually reduces complexity needed for XDP in the drivers which is a
> good thing because that's where most of the support and development
> pain will be.

I'm not objecting to anything that simplifies the process of adding
XDP capability to drivers. You have my full support here.

> I am looking at using this for an ILA router. The problem I am hitting is
> that not all packets that we need to translate go through the XDP
> path. Some would go through the kernel path, some through the XDP path, but

When you say kernel path, what do you mean specifically? One aspect of
XDP I love is that XDP can act as an acceleration option for existing
BPF programs attached to cls_bpf. Support for direct packet read and
write at the clsact level has made it straightforward to write programs
which are compatible or at minimum share a lot of common code. They
can share data structures, lookup functionality, etc.

> We can optimize for the common case of a single hook, or maybe limit
> the interface to allowing only one hook to be set. In any case this
> obviously requires a lot of performance evaluation; I am hoping to get
> feedback on the design first. My question about using a linear list
> for this was real: do you know a better method offhand to implement a
> call list?

My main concern is that we overload the XDP hook. Instead of making use
of the programmable glue, we put a linked list in front to which
everybody can attach a program.

A possible alternative:
 1. The XDP hook always has a single consumer, controlled by the user
    through Netlink; BPF is one of them. If a user wants to hardcode
    the ILA router to that hook, he can do that.

 2. BPF for XDP is extended to allow returning a verdict which results
    in something else to be invoked. If user wants to invoke the ILA
    router for just some packets, he can do that.

That said, I see so much value in a BPF implementation of ILA at the
XDP level, where all of the parsing logic and exact semantics remain
flexible without the cost of translating some configuration into a set
of actions.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-21 11:55               ` Thomas Graf
@ 2016-09-21 14:19                 ` Tom Herbert
  2016-09-21 14:48                   ` Thomas Graf
  2016-09-21 15:39                   ` Alexei Starovoitov
  2016-09-21 17:26                 ` Jakub Kicinski
  1 sibling, 2 replies; 39+ messages in thread
From: Tom Herbert @ 2016-09-21 14:19 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team,
	Tariq Toukan, Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer

On Wed, Sep 21, 2016 at 4:55 AM, Thomas Graf <tgraf@suug.ch> wrote:
> On 09/20/16 at 04:59pm, Tom Herbert wrote:
>> Well, need to measure to ascertain the cost. As for complexity, this
>> actually reduces complexity needed for XDP in the drivers which is a
>> good thing because that's where most of the support and development
>> pain will be.
>
> I'm not objecting to anything that simplifies the process of adding
> XDP capability to drivers. You have my full support here.
>
>> I am looking at using this for an ILA router. The problem I am hitting is
>> that not all packets that we need to translate go through the XDP
>> path. Some would go through the kernel path, some through the XDP path, but
>
> When you say kernel path, what do you mean specifically? One aspect of
> XDP I love is that XDP can act as an acceleration option for existing
> BPF programs attached to cls_bpf. Support for direct packet read and
> write at the clsact level has made it straightforward to write programs
> which are compatible or at minimum share a lot of common code. They
> can share data structures, lookup functionality, etc.
>
>> We can optimize for the common case of a single hook, or maybe limit
>> the interface to allowing only one hook to be set. In any case this
>> obviously requires a lot of performance evaluation; I am hoping to get
>> feedback on the design first. My question about using a linear list
>> for this was real: do you know a better method offhand to implement a
>> call list?
>
> My main concern is that we overload the XDP hook. Instead of making use
> of the programmable glue, we put a linked list in front to which
> everybody can attach a program.
>
> A possible alternative:
>  1. The XDP hook always has a single consumer, controlled by the user
>     through Netlink; BPF is one of them. If a user wants to hardcode
>     the ILA router to that hook, he can do that.
>
>  2. BPF for XDP is extended to allow returning a verdict which results
>     in something else to be invoked. If user wants to invoke the ILA
>     router for just some packets, he can do that.
>
> That said, I see so much value in a BPF implementation of ILA at the
> XDP level, where all of the parsing logic and exact semantics remain
> flexible without the cost of translating some configuration into a set
> of actions.

Thomas,

As I mentioned though, not all packets we need to translate are going
to go through XDP. ILA already integrates with the routing table via
LWT, and that solution works incredibly well, especially when we are
sourcing connections (ILA is by far the fastest, most efficient
technique of network virtualization). We already have a lookup table
and the translation process coded in the kernel, and configuration is
already implemented via netlink and iproute. So I'm not yet sure I want
to redesign all of this around BPF; maybe that is the right answer,
maybe it isn't. But my point is I don't want to be _forced_ into a
certain design because of constraints on one kernel interface. As
a kernel developer I want flexibility in how we design and implement
things!

I think there are two questions that this patch set poses for the
community wrt XDP:

#1: Should we allow alternate code to run in XDP other than BPF?
#2: If #1 is true what is the best way to implement that?

If the answer to #1 is "no" then the answer to #2 is irrelevant. So
with this RFC I'm hoping we can come to agreement on question #1.

IMO, there are at least three factors that would motivate a "yes"
answer to #1:

1) There is precedent in nfhooks for creating generic hooks in the
critical data path that can have uses other than those originally
intended by the authors. I would take it as a general principle that
hooks in the data path should be as generic as possible to get the
most utility.
2) The changes to make XDP work in drivers are quite invasive, and as
we've seen in at least one attempt to modify a driver to support XDP,
it is easy to break other mechanisms. If we're going to be modifying a
lot of drivers we really want to get the most value for that work.
3) The XDP interface is well designed and really has no dependencies
on BPF. Writing code to directly parse packets is not rocket science.
So it seems reasonable to me that a subsystem may want to use the XDP
interface to accelerate existing functionality without introducing
additional dependencies on BPF or other non-kernel mechanisms.
In any case, I ask everyone to be open-minded. If there is agreement
that BPF is sufficient for all XDP use cases then I won't pursue these
patches. But given the ramifications of XDP, especially considering
the changes required across drivers to support it, I really would
like to get input from the community on this.

Thanks,
Tom

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-21 14:19                 ` Tom Herbert
@ 2016-09-21 14:48                   ` Thomas Graf
  2016-09-21 15:08                     ` Tom Herbert
  2016-09-21 15:39                   ` Alexei Starovoitov
  1 sibling, 1 reply; 39+ messages in thread
From: Thomas Graf @ 2016-09-21 14:48 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team,
	Tariq Toukan, Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer

On 09/21/16 at 07:19am, Tom Herbert wrote:
> certain design because of constraints on one kernel interface. As
> a kernel developer I want flexibility in how we design and implement
> things!

Perfectly valid argument. I reviewed your ILA changes and did not
object to them.


> I think there are two questions that this patch set poses for the
> community wrt XDP:
> 
> #1: Should we allow alternate code to run in XDP other than BPF?
> #2: If #1 is true what is the best way to implement that?
> 
> If the answer to #1 is "no" then the answer to #2 is irrelevant. So
> with this RFC I'm hoping we can come to agreement on question #1.

I'm not opposed to running non-BPF code at XDP. I'm against adding
a linked list of hook consumers.

Would anyone need to run XDP-BPF in combination with ILA? Or XDP-BPF
in combination with a potential XDP-nftables? We don't know yet, I
guess.

Maybe exclusive access to the hook for one consumer as selected by
the user is good enough.

If that is not good enough: BPF (and potentially nftables in the
future) could provide means to perform a selection process where a
helper call can run another XDP prog or return a verdict to trigger
another XDP prog. Definitely more flexible and faster than a linear
list doing  if, else if, else if, else if, ...

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-21 14:48                   ` Thomas Graf
@ 2016-09-21 15:08                     ` Tom Herbert
  2016-09-21 19:56                       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 39+ messages in thread
From: Tom Herbert @ 2016-09-21 15:08 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team,
	Tariq Toukan, Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer

On Wed, Sep 21, 2016 at 7:48 AM, Thomas Graf <tgraf@suug.ch> wrote:
> On 09/21/16 at 07:19am, Tom Herbert wrote:
>> certain design because of constraints on one kernel interface. As
>> a kernel developer I want flexibility in how we design and implement
>> things!
>
> Perfectly valid argument. I reviewed your ILA changes and did not
> object to them.
>
>
>> I think there are two questions that this patch set poses for the
>> community wrt XDP:
>>
>> #1: Should we allow alternate code to run in XDP other than BPF?
>> #2: If #1 is true what is the best way to implement that?
>>
>> If the answer to #1 is "no" then the answer to #2 is irrelevant. So
>> with this RFC I'm hoping we can come to agreement on question #1.
>
> I'm not opposed to running non-BPF code at XDP. I'm against adding
> a linked list of hook consumers.
>
> Would anyone need to run XDP-BPF in combination with ILA? Or XDP-BPF
> in combination with a potential XDP-nftables? We don't know yet, I
> guess.
>
Right. Admittedly, I feel like we owe a bit of reciprocity to
nftables. For ILA we are using the NF_INET_PRE_ROUTING hook with our
own code (it looks like ipvlan sets nfhooks as well). This works really
well and preserves the benefit of early demux in ILA. Had we not had the
ability to use nfhooks in this fashion, it's likely we would have had
to create another hook (we did try putting the translation in nftables
rules, but that was too inefficient for ILA).

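(For context, the registration referenced above is an ordinary nfhook
registration, along these lines -- a simplified sketch of its shape,
not a verbatim excerpt from the ILA code:)

static unsigned int ila_nf_input(void *priv, struct sk_buff *skb,
				 const struct nf_hook_state *state)
{
	/* translate the ILA locator in the IPv6 destination address */
	return NF_ACCEPT;
}

static const struct nf_hook_ops ila_nf_hook_ops[] = {
	{
		.hook		= ila_nf_input,
		.pf		= NFPROTO_IPV6,
		.hooknum	= NF_INET_PRE_ROUTING,
		.priority	= -1,	/* before early demux */
	},
};
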
> Maybe exclusive access to the hook for one consumer as selected by
> the user is good enough.
>
> If that is not good enough: BPF (and potentially nftables in the
> future) could provide means to perform a selection process where a
> helper call can run another XDP prog or return a verdict to trigger
> another XDP prog. Definitely more flexible and faster than a linear
> list doing  if, else if, else if, else if, ...

It seems reasonable that the output of one program may be an
indication to run another program. We've already talked about something
like that in regard to splitting BPF programs into a device-independent
program and a device-dependent program.

Tom

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-21 14:19                 ` Tom Herbert
  2016-09-21 14:48                   ` Thomas Graf
@ 2016-09-21 15:39                   ` Alexei Starovoitov
  1 sibling, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2016-09-21 15:39 UTC (permalink / raw)
  To: Tom Herbert, Thomas Graf
  Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team,
	Tariq Toukan, Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer

On 9/21/16 7:19 AM, Tom Herbert wrote:
> #1: Should we allow alternate code to run in XDP other than BPF?

separate nft hook - yes
generic hook - no
since it's one step away from kernel modules abusing this hook.
pass/drop/tx of raw buffer at the driver level is a perfect
interface to bypass everything in the stack.
The tighter we make it the better.

If nft and bpf are both not flexible enough to express
dataplane functionality we should extend them instead of
writing C code or kernel modules.

On the bpf side we're trying very hard to kill any dream of
interoperability with kernel modules.
The map and prog type registration is done in a way that makes
it impossible for kernel modules to register their own
map and program types or provide their own helper functions.

The nfhooks approach is very lax about that, and imo it was a mistake,
since there are plenty of out-of-tree modules using nf hooks
and plenty of in-tree modules that are barely maintained.

> #2: If #1 is true what is the best way to implement that?

Add separate nft hook that doesn't interfere in any way
with bpf hook at xdp level.
The order nft-first or bpf-first or exclusive attach
doesn't matter to me. These are details to be discussed.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-21  6:39     ` Jesper Dangaard Brouer
  2016-09-21  8:42       ` Daniel Borkmann
@ 2016-09-21 15:44       ` Alexei Starovoitov
  1 sibling, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2016-09-21 15:44 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Tom Herbert, davem, netdev, kernel-team, tariqt, bblanco,
	alexei.starovoitov, eric.dumazet

On 9/20/16 11:39 PM, Jesper Dangaard Brouer wrote:
> We are in the early stages of XDP development. Users cannot consider XDP a
> stable UAPI yet.  I added a big fat warning to the docs here[1].
>
> If you already consider this a stable API, then I will suggest that we
> disable XDP or rip the whole thing out again!!!

the doc that you wrote is a great description of your understanding
of what XDP is about. It's not an official spec or design document.
Until it is reviewed and lands on kernel.org, please do not
make any assumptions about the present or future XDP API based on it.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-21 11:55               ` Thomas Graf
  2016-09-21 14:19                 ` Tom Herbert
@ 2016-09-21 17:26                 ` Jakub Kicinski
  1 sibling, 0 replies; 39+ messages in thread
From: Jakub Kicinski @ 2016-09-21 17:26 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Tom Herbert, David S. Miller, Linux Kernel Network Developers,
	Kernel Team, Tariq Toukan, Brenden Blanco, Alexei Starovoitov,
	Eric Dumazet, Jesper Dangaard Brouer

On Wed, 21 Sep 2016 13:55:45 +0200, Thomas Graf wrote:
> > I am looking at using this for an ILA router. The problem I am hitting is
> > that not all packets that we need to translate go through the XDP
> > path. Some would go through the kernel path, some through the XDP path, but
> 
> When you say kernel path, what do you mean specifically? One aspect of
> XDP I love is that XDP can act as an acceleration option for existing
> BPF programs attached to cls_bpf. Support for direct packet read and
> write at the clsact level has made it straightforward to write programs
> which are compatible or at minimum share a lot of common code. They
> can share data structures, lookup functionality, etc.

My very humble dream was that XDP would be transparently offloaded from
cls_bpf if the program was simple enough.  That ship has most likely
sailed because XDP has different abort behaviour.  When possible though,
trying to offload higher-level hooks when the rules don't require access
to the full skb would be really cool.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-21  0:01   ` Alexei Starovoitov
  2016-09-21  6:39     ` Jesper Dangaard Brouer
@ 2016-09-21 17:26     ` Jakub Kicinski
  2016-09-21 17:39       ` Tom Herbert
  1 sibling, 1 reply; 39+ messages in thread
From: Jakub Kicinski @ 2016-09-21 17:26 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Tom Herbert, davem, netdev, kernel-team, tariqt, bblanco,
	alexei.starovoitov, eric.dumazet, brouer

On Tue, 20 Sep 2016 17:01:39 -0700, Alexei Starovoitov wrote:
>  >  - Reduces the amount of code and complexity needed in drivers to
>  >    manage XDP  
> 
> hmm:
> 534 insertions(+), 144 deletions(-)
> looks like increase in complexity instead.

and more to come to tie this with HW offloads.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-21 17:26     ` Jakub Kicinski
@ 2016-09-21 17:39       ` Tom Herbert
  2016-09-21 18:45         ` Jakub Kicinski
  0 siblings, 1 reply; 39+ messages in thread
From: Tom Herbert @ 2016-09-21 17:39 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, David S. Miller,
	Linux Kernel Network Developers, Kernel Team, Tariq Toukan,
	Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer

On Wed, Sep 21, 2016 at 10:26 AM, Jakub Kicinski <kubakici@wp.pl> wrote:
> On Tue, 20 Sep 2016 17:01:39 -0700, Alexei Starovoitov wrote:
>>  >  - Reduces the amount of code and complexity needed in drivers to
>>  >    manage XDP
>>
>> hmm:
>> 534 insertions(+), 144 deletions(-)
>> looks like increase in complexity instead.
>
> and more to come to tie this with HW offloads.

The amount of driver code did decrease with these patches:

drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 64 ++++----------------------
drivers/net/ethernet/mellanox/mlx4/en_rx.c     | 25 ++++------
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  1 -

Minimizing the complexity added to drivers for XDP is critical since
we are basically asking every driver to replicate the functionality. This
property should also apply to HW offloads: the more complexity we
can abstract out of drivers into a common backend infrastructure, the
better for support across different drivers.

Tom

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-21 17:39       ` Tom Herbert
@ 2016-09-21 18:45         ` Jakub Kicinski
  2016-09-21 18:50           ` Tom Herbert
  0 siblings, 1 reply; 39+ messages in thread
From: Jakub Kicinski @ 2016-09-21 18:45 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Alexei Starovoitov, David S. Miller,
	Linux Kernel Network Developers, Kernel Team, Tariq Toukan,
	Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer

On Wed, 21 Sep 2016 10:39:40 -0700, Tom Herbert wrote:
> On Wed, Sep 21, 2016 at 10:26 AM, Jakub Kicinski <kubakici@wp.pl> wrote:
> > On Tue, 20 Sep 2016 17:01:39 -0700, Alexei Starovoitov wrote:  
> >>  >  - Reduces the amount of code and complexity needed in drivers to
> >>  >    manage XDP  
> >>
> >> hmm:
> >> 534 insertions(+), 144 deletions(-)
> >> looks like increase in complexity instead.  
> >
> > and more to come to tie this with HW offloads.  
> 
> The amount of driver code did decrease with these patches:
> 
> drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 64 ++++----------------------
> drivers/net/ethernet/mellanox/mlx4/en_rx.c     | 25 ++++------
> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  1 -
> 
> Minimizing the complexity added to drivers for XDP is critical since
> we are basically asking every driver to replicate the functionality. This
> property should also apply to HW offloads: the more complexity we
> can abstract out of drivers into a common backend infrastructure, the
> better for support across different drivers.

I'm in the middle of writing/testing XDP support for Netronome's
driver, and generic infra is very much appreciated ;)  In my experience
the 50 lines of code required for assigning the programs and
freeing them aren't really a big deal, though.

Let's also separate putting xdp_prog in netdevice/napi_struct from the
generic hook infra.  All the simplifications to the driver AFAICS come
from the former.  If everyone is fine with growing napi_struct we can do
that but IMHO this is not an argument for the generic infra :)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-21 18:45         ` Jakub Kicinski
@ 2016-09-21 18:50           ` Tom Herbert
  2016-09-21 18:54             ` Jakub Kicinski
  2016-09-21 18:58             ` Thomas Graf
  0 siblings, 2 replies; 39+ messages in thread
From: Tom Herbert @ 2016-09-21 18:50 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, David S. Miller,
	Linux Kernel Network Developers, Kernel Team, Tariq Toukan,
	Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer

On Wed, Sep 21, 2016 at 11:45 AM, Jakub Kicinski <kubakici@wp.pl> wrote:
> On Wed, 21 Sep 2016 10:39:40 -0700, Tom Herbert wrote:
>> On Wed, Sep 21, 2016 at 10:26 AM, Jakub Kicinski <kubakici@wp.pl> wrote:
>> > On Tue, 20 Sep 2016 17:01:39 -0700, Alexei Starovoitov wrote:
>> >>  >  - Reduces the amount of code and complexity needed in drivers to
>> >>  >    manage XDP
>> >>
>> >> hmm:
>> >> 534 insertions(+), 144 deletions(-)
>> >> looks like increase in complexity instead.
>> >
>> > and more to come to tie this with HW offloads.
>>
>> The amount of driver code did decrease with these patches:
>>
>> drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 64 ++++----------------------
>> drivers/net/ethernet/mellanox/mlx4/en_rx.c     | 25 ++++------
>> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  1 -
>>
>> Minimizing the complexity added to drivers for XDP is critical since
>> we are basically asking every driver to replicate the functionality. This
>> property should also apply to HW offloads: the more complexity we
>> can abstract out of drivers into a common backend infrastructure, the
>> better for support across different drivers.
>
> I'm in the middle of writing/testing XDP support for Netronome's
> driver, and generic infra is very much appreciated ;)  In my experience
> the 50 lines of code required for assigning the programs and
> freeing them aren't really a big deal, though.
>

50 lines in one driver is not a big deal; 50 lines in a hundred
drivers is! I learned this lesson with BQL, which was well abstracted out
to be minimally invasive, but we still saw many issues because of the
peculiarities of different drivers.

> Let's also separate putting xdp_prog in netdevice/napi_struct from the
> generic hook infra.  All the simplifications to the driver AFAICS come
> from the former.  If everyone is fine with growing napi_struct we can do
> that but IMHO this is not an argument for the generic infra :)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-21 18:50           ` Tom Herbert
@ 2016-09-21 18:54             ` Jakub Kicinski
  2016-09-21 18:58             ` Thomas Graf
  1 sibling, 0 replies; 39+ messages in thread
From: Jakub Kicinski @ 2016-09-21 18:54 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Alexei Starovoitov, David S. Miller,
	Linux Kernel Network Developers, Kernel Team, Tariq Toukan,
	Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer

On Wed, 21 Sep 2016 11:50:06 -0700, Tom Herbert wrote:
> On Wed, Sep 21, 2016 at 11:45 AM, Jakub Kicinski <kubakici@wp.pl> wrote:
> > On Wed, 21 Sep 2016 10:39:40 -0700, Tom Herbert wrote:  
> >> On Wed, Sep 21, 2016 at 10:26 AM, Jakub Kicinski <kubakici@wp.pl> wrote:  
> >> > On Tue, 20 Sep 2016 17:01:39 -0700, Alexei Starovoitov wrote:  
> >> >>  >  - Reduces the amount of code and complexity needed in drivers to
> >> >>  >    manage XDP  
> >> >>
> >> >> hmm:
> >> >> 534 insertions(+), 144 deletions(-)
> >> >> looks like increase in complexity instead.  
> >> >
> >> > and more to come to tie this with HW offloads.  
> >>
> >> The amount of driver code did decrease with these patches:
> >>
> >> drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 64 ++++----------------------
> >> drivers/net/ethernet/mellanox/mlx4/en_rx.c     | 25 ++++------
> >> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  1 -
> >>
> >> Minimizing the complexity added to drivers for XDP is critical since
> >> we are basically asking every driver to replicate the functionality. This
> >> property should also apply to HW offloads: the more complexity we
> >> can abstract out of drivers into a common backend infrastructure, the
> >> better for support across different drivers.
> >
> > I'm in the middle of writing/testing XDP support for Netronome's
> > driver, and generic infra is very much appreciated ;)  In my experience
> > the 50 lines of code required for assigning the programs and
> > freeing them aren't really a big deal, though.
> >  
> 
> 50 lines in one driver is not a big deal; 50 lines in a hundred
> drivers is! I learned this lesson with BQL, which was well abstracted out
> to be minimally invasive, but we still saw many issues because of the
> peculiarities of different drivers.

Agreed, I just meant to say that splitting rings and rewriting the RX
path to behave differently for the XDP vs non-XDP case is way more
brain-consuming than a bit of boilerplate code, so if anyone could
solve those two it would be much appreciated :)  My main point was
what I wrote below, though.

> > Let's also separate putting xdp_prog in netdevice/napi_struct from the
> > generic hook infra.  All the simplifications to the driver AFAICS come
> > from the former.  If everyone is fine with growing napi_struct we can do
> > that but IMHO this is not an argument for the generic infra :)  

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-21 18:50           ` Tom Herbert
  2016-09-21 18:54             ` Jakub Kicinski
@ 2016-09-21 18:58             ` Thomas Graf
  1 sibling, 0 replies; 39+ messages in thread
From: Thomas Graf @ 2016-09-21 18:58 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Jakub Kicinski, Alexei Starovoitov, David S. Miller,
	Linux Kernel Network Developers, Kernel Team, Tariq Toukan,
	Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer

On 09/21/16 at 11:50am, Tom Herbert wrote:
> 50 lines in one driver is not a big deal; 50 lines in a hundred
> drivers is! I learned this lesson with BQL, which was well abstracted out
> to be minimally invasive, but we still saw many issues because of the
> peculiarities of different drivers.

You want to enable XDP in a hundred drivers? Are you planning to
deploy ISA NIC based ILA routers? ;-)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-21 15:08                     ` Tom Herbert
@ 2016-09-21 19:56                       ` Jesper Dangaard Brouer
  2016-09-22 13:14                         ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 39+ messages in thread
From: Jesper Dangaard Brouer @ 2016-09-21 19:56 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Thomas Graf, David S. Miller, Linux Kernel Network Developers,
	Kernel Team, Tariq Toukan, Brenden Blanco, Alexei Starovoitov,
	Eric Dumazet, brouer


On Wed, 21 Sep 2016 08:08:34 -0700 Tom Herbert <tom@herbertland.com> wrote:

> On Wed, Sep 21, 2016 at 7:48 AM, Thomas Graf <tgraf@suug.ch> wrote:
> > On 09/21/16 at 07:19am, Tom Herbert wrote:  
> >> certain design because of constraints on one kernel interface. As
> >> a kernel developer I want flexibility in how we design and implement
> >> things!
> >
> > Perfectly valid argument. I reviewed your ILA changes and did not
> > object to them.
> >
> >  
> >> I think there are two questions that this patch set poses for the
> >> community wrt XDP:
> >>
> >> #1: Should we allow alternate code to run in XDP other than BPF?
> >> #2: If #1 is true what is the best way to implement that?
> >>
> >> If the answer to #1 is "no" then the answer to #2 is irrelevant. So
> >> with this RFC I'm hoping we can come to agreement on question #1.

I vote yes to #1.

> > I'm not opposed to running non-BPF code at XDP. I'm against adding
> > a linked list of hook consumers.

I also worry about the performance impact of a linked list.  We should
simply benchmark it instead of discussing it! ;-)


> > Would anyone need to run XDP-BPF in combination with ILA? Or XDP-BPF
> > in combination with a potential XDP-nftables? We don't know yet, I
> > guess.
> >  
> Right. Admittedly, I feel like we owe a bit of reciprocity to
> nftables. For ILA we are using the NF_INET_PRE_ROUTING hook with our
> own code (it looks like ipvlan sets nfhooks as well). This works really
> well and preserves the benefit of early demux in ILA. Had we not had the
> ability to use nfhooks in this fashion, it's likely we would have had
> to create another hook (we did try putting the translation in nftables
> rules, but that was too inefficient for ILA).

Thinking about it, I actually think Tom is proposing a very valid user
of the XDP hook, which is the kernel itself.  And Tom even has a real
first user: ILA.  The way I read the ILA-RFC-draft[1], the XDP hook
would benefit the NVE (Network Virtualization Edge) component, which
can run separately or run on the Tenant System, where the latter case
could use XDP_PASS.

[1] https://www.ietf.org/archive/id/draft-herbert-nvo3-ila-02.txt
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-21 19:56                       ` Jesper Dangaard Brouer
@ 2016-09-22 13:14                         ` Jesper Dangaard Brouer
  2016-09-22 14:46                           ` Eric Dumazet
  0 siblings, 1 reply; 39+ messages in thread
From: Jesper Dangaard Brouer @ 2016-09-22 13:14 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Thomas Graf, David S. Miller, Linux Kernel Network Developers,
	Kernel Team, Tariq Toukan, Brenden Blanco, Alexei Starovoitov,
	Eric Dumazet, brouer

On Wed, 21 Sep 2016 21:56:58 +0200
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> > > I'm not opposed to running non-BPF code at XDP. I'm against adding
> > > a linked list of hook consumers.  
> 
> I also worry about the performance impact of a linked list.  We should
> simply benchmark it instead of discussing it! ;-)

(Note, there are some stability issues with this RFC patchset when
removing the xdp program that I had to work around/patch.)


I've started benchmarking this, and I only see an added cost of 2.89ns
from these patches; at these crazy speeds that does correspond to -485Kpps.

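(Back-of-envelope on how ns and pps relate -- assuming a baseline of
about 13.2 Mpps, which this mail does not state explicitly:

  P_1 = \frac{1}{1/P_0 + \Delta t}

With P_0 = 13.2 Mpps, 1/P_0 is about 75.8 ns per packet; adding
\Delta t = 2.89 ns gives about 78.6 ns, i.e. P_1 of roughly 12.7 Mpps,
a drop of around 485 Kpps.)
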
I was really expecting to see a higher cost from this approach.

I tested this on two different machines. One was supposed to work with
DDIO, but I could not get DDIO working on that machine (resulting in a
max of 12.7Mpps for drops), even though the mlx5 card does work with
DDIO.  I even removed the mlx5 and used the same slot, but no luck.
(A side note: I also measured a 16ns performance difference depending
on which PCIe slot I'm using.)

The reason I wanted to benchmark this on a DDIO machine is that I
suspect the added cost could be hiding behind the cache miss.

Well, I'm running out of time benchmarking this stuff; I must prepare
for my Network Performance Workshop ;-)


(A side note: my Skylake motherboard also had a PCI slot, so I found an
old e1000 NIC in my garage, and it worked!)
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-22 13:14                         ` Jesper Dangaard Brouer
@ 2016-09-22 14:46                           ` Eric Dumazet
  0 siblings, 0 replies; 39+ messages in thread
From: Eric Dumazet @ 2016-09-22 14:46 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Tom Herbert, Thomas Graf, David S. Miller,
	Linux Kernel Network Developers, Kernel Team, Tariq Toukan,
	Brenden Blanco, Alexei Starovoitov

On Thu, 2016-09-22 at 15:14 +0200, Jesper Dangaard Brouer wrote:
> On Wed, 21 Sep 2016 21:56:58 +0200
> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> 
> > > > I'm not opposed to running non-BPF code at XDP. I'm against adding
> > > > a linked list of hook consumers.  
> > 
> > I also worry about the performance impact of a linked list.  We should
> > simply benchmark it instead of discussing it! ;-)
> 
> (Note, there are some stability issues with this RFC patchset when
> removing the xdp program that I had to work around/patch.)
> 
> 
> I've started benchmarking this, and I only see an added cost of 2.89ns
> from these patches; at these crazy speeds that does correspond to -485Kpps.

I claim the methodology is too biased.

At full speed, all the extra code is hot in the caches, and your core has
full access to the memory bus anyway. Even the branch predictor has fresh
information.

Now, in a mixed workload, where all cores compete to access L2/L3 and
RAM, things might be very different.

Testing icache/dcache pressure is not a matter of measuring how many
Kpps you add or remove on a hot path.

A latency test, when other cpus are busy reading/writing all over
memory, and your caches are cold, would be useful.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-20 22:00 ` [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP Tom Herbert
                     ` (2 preceding siblings ...)
  2016-09-21  0:01   ` Alexei Starovoitov
@ 2016-09-23 11:13   ` Jamal Hadi Salim
  2016-09-23 13:00     ` Jesper Dangaard Brouer
  2016-09-23 14:14     ` Tom Herbert
  3 siblings, 2 replies; 39+ messages in thread
From: Jamal Hadi Salim @ 2016-09-23 11:13 UTC (permalink / raw)
  To: Tom Herbert, davem, netdev
  Cc: kernel-team, tariqt, bblanco, alexei.starovoitov, eric.dumazet, brouer

On 16-09-20 06:00 PM, Tom Herbert wrote:
> This patch creates an infrastructure for registering and running code at
> XDP hooks in drivers. This is based on the original XDP-BPF and borrows
> heavily from the techniques used by netfilter to make generic nfhooks.
>
> An XDP hook is defined by the xdp_hook_ops. This structure contains the
> ops of an XDP hook. A pointer to this structure is passed into the XDP
> register function to set up a hook. The XDP register function mallocs
> its own xdp_hook_ops structure and copies the values from the
> xdp_hook_ops passed in. The register function also stores the pointer
> value of the xdp_hook_ops argument; this pointer is used in subsequent
> calls to XDP to identify the registered hook.
>
> The interface is defined in net/xdp.h. This includes the definition of
> xdp_hook_ops, functions to register and unregister hook ops on a device
> or individual instances of napi, and xdp_hook_run that is called by
> drivers to run the hooks.
>

Tom,
perused the thread and it seems you are serious ;->
Are we heading towards Frankenstein Avenue?
The whole point behind letting in XDP is so that _small programs_
can be written to do some quick thing. eBPF 4K program limit was
touted as the check and bound. eBPF sounded fine.
This sounds like a huge contradiction.

cheers,
jamal

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-23 11:13   ` Jamal Hadi Salim
@ 2016-09-23 13:00     ` Jesper Dangaard Brouer
  2016-09-23 14:26       ` Alexei Starovoitov
  2016-09-25 11:32       ` Jamal Hadi Salim
  2016-09-23 14:14     ` Tom Herbert
  1 sibling, 2 replies; 39+ messages in thread
From: Jesper Dangaard Brouer @ 2016-09-23 13:00 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Tom Herbert, davem, netdev, kernel-team, tariqt, bblanco,
	alexei.starovoitov, eric.dumazet, brouer, Thomas Graf

On Fri, 23 Sep 2016 07:13:30 -0400
Jamal Hadi Salim <jhs@mojatatu.com> wrote:

> On 16-09-20 06:00 PM, Tom Herbert wrote:
> > This patch creates an infrastructure for registering and running code at
> > XDP hooks in drivers. This is based on the original XDP-BPF and borrows
> > heavily from the techniques used by netfilter to make generic nfhooks.
> >
> > An XDP hook is defined by the xdp_hook_ops. This structure contains the
> > ops of an XDP hook. A pointer to this structure is passed into the XDP
> > register function to set up a hook. The XDP register function mallocs
> > its own xdp_hook_ops structure and copies the values from the
> > xdp_hook_ops passed in. The register function also stores the pointer
> > value of the xdp_hook_ops argument; this pointer is used in subsequent
> > calls to XDP to identify the registered hook.
> >
> > The interface is defined in net/xdp.h. This includes the definition of
> > xdp_hook_ops, functions to register and unregister hook ops on a device
> > or individual instances of napi, and xdp_hook_run that is called by
> > drivers to run the hooks.
> >  
> 
> Tom,
> perused the thread and it seems you are serious ;->
> Are we heading towards Frankenstein Avenue?
> The whole point behind letting in XDP is so that _small programs_
> can be written to do some quick thing. eBPF 4K program limit was
> touted as the check and bound. eBPF sounded fine.
> This sounds like a huge contradiction.
> 
> cheers,
> jamal

Hi Jamal,

I don't understand why you think this is so controversial. The way I
see it (after reading the thread): This is about allowing kernel
components to _also_ use the XDP hook.

I believe Tom has a valid use-case in ILA. The NVE (Network
Virtualization Edge) component very naturally fits in the XDP hook, as
it only needs to look at the IPv6 address and do a table lookup (in an
ILA-specific data structure) to see if this packet is for local stack
delivery or forwarding.  For forwarding it does not need to take the
performance hit of allocating SKBs etc.

You can see it as a way to accelerate the NVE component. I can imagine
it could be done in approx 10-20 lines of code, as it would use the
existing ILA lookup function calls; a sketch follows below.

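A sketch of how that might look against the ops structure this RFC
proposes (the hook signature follows patch 1; the lookup/rewrite helper
and how priv gets set are placeholders invented for illustration):

static int ila_xdp_hook(void *priv, struct xdp_buff *xdp)
{
	struct ila_net *ilan = priv;	/* per-netns ILA state */

	/* parse the IPv6 header and look up the destination in the
	 * existing ILA translation table; rewrite the locator on a hit
	 */
	if (!ila_xlat_xdp(ilan, xdp))	/* placeholder helper */
		return XDP_PASS;	/* not ILA, or local delivery */

	return XDP_TX;			/* forward the translated packet */
}

static struct xdp_hook_ops ila_xdp_ops = {
	.hook	= ila_xdp_hook,
	/* .priv pointed at the per-netns ILA state at register time */
};
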
AFAIK Thomas Graf also sees XDP as an acceleration for cls_bpf, but he
is lucky because his code is already eBPF-based.

To support Tom's case with eBPF, I think we would have to implement
specific eBPF helper functions that can do the ILA table lookups.  And
we would then need a userspace component to load the eBPF program.  Why
add all this, when we could contain all of it in the kernel and simply
call it as native C code via the XDP hook?

Notice, this is not about throwing eBPF out.  Using eBPF is _very_
essential for XDP.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-23 11:13   ` Jamal Hadi Salim
  2016-09-23 13:00     ` Jesper Dangaard Brouer
@ 2016-09-23 14:14     ` Tom Herbert
  2016-09-25 12:29       ` Jamal Hadi Salim
  1 sibling, 1 reply; 39+ messages in thread
From: Tom Herbert @ 2016-09-23 14:14 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team,
	Tariq Toukan, Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer

On Fri, Sep 23, 2016 at 4:13 AM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 16-09-20 06:00 PM, Tom Herbert wrote:
>>
>> This patch creates an infrastructure for registering and running code at
>> XDP hooks in drivers. This is based on the original XDP-BPF and borrows
>> heavily from the techniques used by netfilter to make generic nfhooks.
>>
>> An XDP hook is defined by the xdp_hook_ops. This structure contains the
>> ops of an XDP hook. A pointer to this structure is passed into the XDP
>> register function to set up a hook. The XDP register function mallocs
>> its own xdp_hook_ops structure and copies the values from the
>> xdp_hook_ops passed in. The register function also stores the pointer
>> value of the xdp_hook_ops argument; this pointer is used in subsequent
>> calls to XDP to identify the registered hook.
>>
>> The interface is defined in net/xdp.h. This includes the definition of
>> xdp_hook_ops, functions to register and unregister hook ops on a device
>> or individual instances of napi, and xdp_hook_run that is called by
>> drivers to run the hooks.
>>
>
> Tom,
> perused the thread and it seems you are serious ;->
> Are we heading towards Frankenstein Avenue?

I'm surprised. I thought you'd like democratizing an interface. :-)

> The whole point behind letting in XDP is so that _small programs_
> can be written to do some quick thing. eBPF 4K program limit was
> touted as the check and bound. eBPF sounded fine.
> This sounds like a huge contradiction.

Jamal, thanks for the feedback.

IMO, XDP != BPF. XDP is an interface in drivers that allows raw packet
processing in the receive path. This is akin to DPDK in the kernel.
BPF is a critical use case of the interface.

As for the 4K limit, that still allows for a lot of damage and abuse
by the user with a loadable program, and, yes, code in the
kernel can do significant damage as well. However, unlike BPF programs
being set from userspace, kernel code has a well-established process
for acceptance: any suggested code gets reviewed, and if it's going
down "Frankenstein Avenue" I'd expect it to be rejected. I get a little
worried that we are going to start artificially limiting what we can
do in the kernel on the basis that we don't trust kernel programmers
to be competent enough not to abuse kernel interfaces -- the fact that
we are enabling userspace to arbitrarily program the kernel but
purposely limiting what the kernel itself can do would be a huge irony
to me.

Thanks,
Tom




>
> cheers,
> jamal
>
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-23 13:00     ` Jesper Dangaard Brouer
@ 2016-09-23 14:26       ` Alexei Starovoitov
  2016-09-25 11:32       ` Jamal Hadi Salim
  1 sibling, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2016-09-23 14:26 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Jamal Hadi Salim
  Cc: Tom Herbert, davem, netdev, kernel-team, tariqt, bblanco,
	alexei.starovoitov, eric.dumazet, Thomas Graf

On 9/23/16 6:00 AM, Jesper Dangaard Brouer wrote:
> To support Tom's case with eBPF, I think we would have to implement
> specific eBPF helper functions that can do the ILA table lookups.  And
> we would then need a userspace component to load the eBPF program.  Why
> add all this, when we could contain all of it in the kernel and simply
> call it as native C code via the XDP hook?

well, an ILA router was written in BPF already :) Once for tc+clsact
and once for XDP. In both cases no extra bpf helpers were needed.
Due to this use case we added an optimization to be able to do
a map lookup with packet data directly (instead of copying packet
bytes to the stack and pointing the map_lookup helper at the stack).
Looks like those ila router patches never made it to the list.
I'll dig them out and forward them.

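For the archives, the shape of such a program, with the lookup keyed
directly on packet bytes (illustrative only -- the map definition and
header checks here are guesses, not taken from Alexei's patches):

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ipv6.h>
#include <linux/in6.h>
#include "bpf_helpers.h"	/* SEC(), map/helper glue as in samples/bpf */

struct bpf_map_def SEC("maps") ila_map = {
	.type = BPF_MAP_TYPE_HASH,
	.key_size = sizeof(struct in6_addr),
	.value_size = sizeof(struct in6_addr),
	.max_entries = 1024,
};

SEC("xdp")
int ila_router(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;
	struct ipv6hdr *ip6 = data + sizeof(*eth);
	struct in6_addr *new_dst;

	if ((void *)(ip6 + 1) > data_end)
		return XDP_PASS;
	if (eth->h_proto != __constant_htons(ETH_P_IPV6))
		return XDP_PASS;

	/* key the lookup on the destination address in place in the
	 * packet, no copy of the address to the stack
	 */
	new_dst = bpf_map_lookup_elem(&ila_map, &ip6->daddr);
	if (!new_dst)
		return XDP_PASS;

	/* rewrite the locator; checksum-neutral with ILA's scheme */
	ip6->daddr = *new_dst;
	return XDP_TX;
}
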
^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-23 13:00     ` Jesper Dangaard Brouer
  2016-09-23 14:26       ` Alexei Starovoitov
@ 2016-09-25 11:32       ` Jamal Hadi Salim
  1 sibling, 0 replies; 39+ messages in thread
From: Jamal Hadi Salim @ 2016-09-25 11:32 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Tom Herbert, davem, netdev, kernel-team, tariqt, bblanco,
	alexei.starovoitov, eric.dumazet, Thomas Graf

On 16-09-23 09:00 AM, Jesper Dangaard Brouer wrote:
> On Fri, 23 Sep 2016 07:13:30 -0400
> Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>

[..]

>> Tom,
>> perused the thread and it seems you are serious ;->
>> Are we heading towards Frankenstein Avenue?
>> The whole point behind letting in XDP is so that _small programs_
>> can be written to do some quick thing. eBPF 4K program limit was
>> touted as the check and bound. eBPF sounded fine.
>> This sounds like a huge contradiction.
>>
>> cheers,
>> jamal
>
> Hi Jamal,
>
> I don't understand why you think this is so controversial. The way I
> see it (after reading the thread): This is about allowing kernel
> components to _also_ use the XDP hook.
>

The initial push was for XDP to support very small programs that do
very simple things fast (to be extreme: running a whole network stack,
for example, is a no-no). eBPF with the 4K program limit was pointed
to as the check and bound.
What Tom is presenting implies this constraint is now being
removed. That's the controversy.

cheers,
jamal

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
  2016-09-23 14:14     ` Tom Herbert
@ 2016-09-25 12:29       ` Jamal Hadi Salim
  0 siblings, 0 replies; 39+ messages in thread
From: Jamal Hadi Salim @ 2016-09-25 12:29 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team,
	Tariq Toukan, Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer

On 16-09-23 10:14 AM, Tom Herbert wrote:
> On Fri, Sep 23, 2016 at 4:13 AM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>> On 16-09-20 06:00 PM, Tom Herbert wrote:

>> Tom,
>> perused the thread and it seems you are serious ;->
>> Are we heading towards Frankenstein Avenue?
>
> I'm surprised. I thought you'd like democratizing an interface. :-)
>

I do ;-> The challenge here is the worry about the classical camel's
nose metaphor[1]. The original intent was for simplicity. Do one very
simple thing. Do it fast.
Note: this is code that is sitting in a critical driver path.

[Democratizing the interfaces is a bigger-scope discussion we can have,
not just for XDP but for tc, netfilter, etc.]

>> The whole point behind letting in XDP is so that _small programs_
>> can be written to do some quick thing. eBPF 4K program limit was
>> touted as the check and bound. eBPF sounded fine.
>> This sounds like a huge contradiction.
>
> Jamal, thanks for the feedback.
>
> IMO, XDP != BPF. XDP is an interface in drivers that allows raw packet
> processing in the receive path. This is akin to DPDK in the kernel.

Understood.

Most of these DPDK apps do simple things. Ex: ILA forwarding with
high throughput on a single core - which can be achieved with XDP as-is
right now. With DPDK you start doing more generic things and shit starts
slowing down. We can have the best of both worlds in Linux (with 2-3
other modular interfaces above XDP).
Getting both {performance, flexibility} is hard. Add a third dimension
of usability and everything starts spiralling on the z-axis.
I.e. pick one and a quarter of those 3 features; anything else is
an art project. Example: if you start adding flexibility, we will no
doubt need a northbound control interface, etc.

So I see XDP as this "fast and specific blob" and the things above it
as more generic.
Our biggest issue, as the netmap and other folks have shown,
is the obesity of the skb. On XDP, I would bet things
get faster because there is no skb - the danger is turning it into
"move your stack to XDP to avoid skbs".

> BPF is a critical use case of the interface.
> As for the 4K limit, that still allows for a lot of damage and abuse
> by the user with a loadable program, and, yes, code in the
> kernel can do significant damage as well.

But at least we have a handwavy insurance limit.
A determined person could circumvent it - but it is more a deterrent
than an enforcement mechanism.


> However, unlike BPF programs
> being set from userspace, kernel code has a well-established process
> for acceptance: any suggested code gets reviewed, and if it's heading
> down "Frankenstein Avenue" I'd expect it to be rejected. I get a
> little worried that we are going to start artificially limiting what
> we can do in the kernel on the basis that we don't trust kernel
> programmers to be competent enough not to abuse kernel interfaces --
> the fact that we are enabling userspace to arbitrarily program the
> kernel while purposely limiting what the kernel itself can do would
> be a huge irony to me.

My concern is not incompetence. We already allow people to shoot
off their little finger if they want to. And just to be clear, I have
no problem with allowing other people to save the world their own
way ;->

So let's say we agreed eBPF was just one use case:
if you wrote an XDP "app" in plain C, could you still enforce this
limit so that bigcorp doesn't start building islands around Linux?
Again, the camel metaphor applies.
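
To make the question concrete, here is a purely hypothetical sketch of
such a plain-C "app" against Tom's proposed infrastructure - every
name in it is a guess at the RFC's interface, for illustration only:

#include <linux/module.h>
#include <linux/netdevice.h>
#include <net/xdp.h>

static int my_hook(void *priv, struct xdp_buff *xdp)
{
	/* Plain C, no verifier, no 4K cap: nothing but code review
	 * stops this from growing into an skb-free network stack. */
	return XDP_PASS;
}

static struct xdp_hook_ops my_ops = {	/* field layout assumed */
	.hook = my_hook,
	.priv = NULL,
};

static int __init my_init(void)
{
	struct net_device *dev = dev_get_by_name(&init_net, "eth0");
	int err;

	if (!dev)
		return -ENODEV;
	err = xdp_register_netdev_hook(dev, &my_ops); /* name assumed */
	dev_put(dev); /* assuming registration takes its own reference */
	return err;
}
module_init(my_init);

MODULE_LICENSE("GPL");

A perfectly reviewable module today, an off-tree island tomorrow.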

Note, the comparison with kernel modules at the tc or netfilter hooks
doesn't apply, because:
a) it is a lot harder to avoid kernel interfaces (skbs etc.) in those
places - XDP, for example, totally avoids the skb - so contributing to
the Linux base is a given with kernel modules.
b) programs that are usable as kernel modules tend to be geared towards
inclusion into the kernel because of their complexity; it is more
maintenance to keep them off-tree.
XDP is such a simple interface that the opportunity to move a whole
DPDK-like industry into the kernel is huge (to the detriment of
Linux kernel networking).

cheers,
jamal

end of thread

Thread overview: 39+ messages
2016-09-20 22:00 [PATCH RFC 0/3] xdp: Generalize XDP Tom Herbert
2016-09-20 22:00 ` [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP Tom Herbert
2016-09-20 22:37   ` Eric Dumazet
2016-09-20 22:40     ` Tom Herbert
2016-09-20 22:44   ` Thomas Graf
2016-09-20 22:49     ` Tom Herbert
2016-09-20 23:09       ` Thomas Graf
2016-09-20 23:18         ` Tom Herbert
2016-09-20 23:43           ` Thomas Graf
2016-09-20 23:59             ` Tom Herbert
2016-09-21  0:13               ` Alexei Starovoitov
2016-09-21 11:55               ` Thomas Graf
2016-09-21 14:19                 ` Tom Herbert
2016-09-21 14:48                   ` Thomas Graf
2016-09-21 15:08                     ` Tom Herbert
2016-09-21 19:56                       ` Jesper Dangaard Brouer
2016-09-22 13:14                         ` Jesper Dangaard Brouer
2016-09-22 14:46                           ` Eric Dumazet
2016-09-21 15:39                   ` Alexei Starovoitov
2016-09-21 17:26                 ` Jakub Kicinski
2016-09-20 23:22         ` Daniel Borkmann
2016-09-21  0:01   ` Alexei Starovoitov
2016-09-21  6:39     ` Jesper Dangaard Brouer
2016-09-21  8:42       ` Daniel Borkmann
2016-09-21 15:44       ` Alexei Starovoitov
2016-09-21 17:26     ` Jakub Kicinski
2016-09-21 17:39       ` Tom Herbert
2016-09-21 18:45         ` Jakub Kicinski
2016-09-21 18:50           ` Tom Herbert
2016-09-21 18:54             ` Jakub Kicinski
2016-09-21 18:58             ` Thomas Graf
2016-09-23 11:13   ` Jamal Hadi Salim
2016-09-23 13:00     ` Jesper Dangaard Brouer
2016-09-23 14:26       ` Alexei Starovoitov
2016-09-25 11:32       ` Jamal Hadi Salim
2016-09-23 14:14     ` Tom Herbert
2016-09-25 12:29       ` Jamal Hadi Salim
2016-09-20 22:00 ` [PATCH RFC 2/3] mlx4: Change XDP/BPF to use generic XDP infrastructure Tom Herbert
2016-09-20 22:00 ` [PATCH RFC 3/3] netdevice: Remove obsolete xdp_netdev_command Tom Herbert
