netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets
@ 2018-12-07 11:44 Björn Töpel
  2018-12-07 11:44 ` [PATCH bpf-next 1/7] xsk: simplify AF_XDP socket teardown Björn Töpel
                   ` (9 more replies)
  0 siblings, 10 replies; 24+ messages in thread
From: Björn Töpel @ 2018-12-07 11:44 UTC (permalink / raw)
  To: bjorn.topel, magnus.karlsson, magnus.karlsson, ast, daniel, netdev
  Cc: Björn Töpel, brouer, u9012063, qi.z.zhang

From: Björn Töpel <bjorn.topel@intel.com>

Hi!

This patch set adds support for a new XDP socket bind option,
XDP_ATTACH.

The rationale behind attach is performance and ease of use. Many XDP
socket users just need a simple way of creating/binding a socket and
receiving frames right away without loading an XDP program.

XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
is a kernel provided XDP program that is installed to the netdev when
XDP_ATTACH is being passed as a bind() flag.

The builtin program is the simplest program possible to redirect a
frame to an attached socket. In restricted C it would look like this:
    
  SEC("xdp")
  int xdp_prog(struct xdp_md *ctx)
  {
        return bpf_xsk_redirect(ctx);
  }
    
The builtin program loaded via XDP_ATTACH behaves, from an
install-to-netdev/uninstall-from-netdev point of view, differently
from regular XDP programs. The easiest way to look at it is as a
2-level hierarchy, where regular XDP programs has precedence over the
builtin one.
    
If no regular XDP program is installed to the netdev, the builtin will
be install. If the builtin program is installed, and a regular is
installed, regular XDP program will have precedence over the builtin
one.
    
Further, if a regular program is installed, and later removed, the
builtin one will automatically be installed.
    
The sxdp_flags field of struct sockaddr_xdp gets two new options
XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
corresponding XDP netlink install flags.

The builtin XDP program functionally adds even more complexity to the
already hard to read dev_change_xdp_fd. Maybe it would be simpler to
store the program in the struct net_device together with the install
flags instead of calling the ndo_bpf multiple times?

The outline of the series is as following:
  patch 1-2: Introduce the first part of XDP_ATTACH, simply adding
             the socket to the netdev structure.
  patch 3:   Add a new BPF function, bpf_xsk_redirect, that 
             redirects a frame to an attached socket.
  patch 4-5: Preparatory commits for built in BPF programs
  patch 6:   Make XDP_ATTACH load a builtin XDP program
  patch 7:   Extend the samples application with XDP_ATTACH
             support

Patch 1 through 3 gives the performance boost and make it possible to
use AF_XDP sockets without an XSKMAP, but still requires an explicit
XDP program to be loaded.

Patch 4 through 6 make it possible to use XDP socket without explictly
loading an XDP program.

The performance numbers for rxdrop (Intel(R) Xeon(R) Gold 6154 CPU @
3.00GHz):

XDP_SKB:
XSKMAP:     2.8 Mpps
XDP_ATTACH: 2.9 Mpps

XDP_DRV - copy:
XSKMAP:     8.5 Mpps
XDP_ATTACH: 9.3 Mpps

XDP_DRV - zero-copy:
XSKMAP:     15.1 Mpps
XDP_ATTACH: 17.3 Mpps

Thanks!
Björn


Björn Töpel (7):
  xsk: simplify AF_XDP socket teardown
  xsk: add XDP_ATTACH bind() flag
  bpf: add bpf_xsk_redirect function
  bpf: prepare for builtin bpf program
  bpf: add function to load builtin BPF program
  xsk: load a builtin XDP program on XDP_ATTACH
  samples: bpf: add support for XDP_ATTACH to xdpsock

 include/linux/bpf.h         |   2 +
 include/linux/filter.h      |   4 +
 include/linux/netdevice.h   |  11 +++
 include/net/xdp_sock.h      |   2 +
 include/trace/events/xdp.h  |  61 +++++++++++++++
 include/uapi/linux/bpf.h    |  14 +++-
 include/uapi/linux/if_xdp.h |   9 ++-
 kernel/bpf/syscall.c        |  91 ++++++++++++++--------
 net/core/dev.c              |  84 +++++++++++++++++++--
 net/core/filter.c           | 100 ++++++++++++++++++++++++
 net/xdp/xsk.c               | 146 +++++++++++++++++++++++++++++-------
 samples/bpf/xdpsock_user.c  | 108 ++++++++++++++++----------
 12 files changed, 524 insertions(+), 108 deletions(-)

-- 
2.19.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH bpf-next 1/7] xsk: simplify AF_XDP socket teardown
  2018-12-07 11:44 [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets Björn Töpel
@ 2018-12-07 11:44 ` Björn Töpel
  2018-12-07 11:44 ` [PATCH bpf-next 2/7] xsk: add XDP_ATTACH bind() flag Björn Töpel
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Björn Töpel @ 2018-12-07 11:44 UTC (permalink / raw)
  To: bjorn.topel, magnus.karlsson, magnus.karlsson, ast, daniel, netdev
  Cc: Björn Töpel, brouer, u9012063, qi.z.zhang

From: Björn Töpel <bjorn.topel@intel.com>

Prior this commit, when the struct socket object was being released,
the UMEM did not have its reference count decreased. Instead, this was
done in the struct sock sk_destruct function.

There is no reason to keep the UMEM reference around when the socket
is being orphaned, so in this patch the xdp_put_mem is called in the
xsk_release function. This results in that the xsk_destruct function
can be removed!

Note that, it still holds that a struct xsk_sock reference might still
linger in the XSKMAP after the UMEM is released, e.g. if a user does
not clear the XSKMAP prior to closing the process. This sock will be
in a "released" zombie like state, until the XSKMAP is removed.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xsk.c | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 07156f43d295..a03268454a27 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -366,6 +366,7 @@ static int xsk_release(struct socket *sock)
 
 	xskq_destroy(xs->rx);
 	xskq_destroy(xs->tx);
+	xdp_put_umem(xs->umem);
 
 	sock_orphan(sk);
 	sock->sk = NULL;
@@ -713,18 +714,6 @@ static const struct proto_ops xsk_proto_ops = {
 	.sendpage	= sock_no_sendpage,
 };
 
-static void xsk_destruct(struct sock *sk)
-{
-	struct xdp_sock *xs = xdp_sk(sk);
-
-	if (!sock_flag(sk, SOCK_DEAD))
-		return;
-
-	xdp_put_umem(xs->umem);
-
-	sk_refcnt_debug_dec(sk);
-}
-
 static int xsk_create(struct net *net, struct socket *sock, int protocol,
 		      int kern)
 {
@@ -751,9 +740,6 @@ static int xsk_create(struct net *net, struct socket *sock, int protocol,
 
 	sk->sk_family = PF_XDP;
 
-	sk->sk_destruct = xsk_destruct;
-	sk_refcnt_debug_inc(sk);
-
 	sock_set_flag(sk, SOCK_RCU_FREE);
 
 	xs = xdp_sk(sk);
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH bpf-next 2/7] xsk: add XDP_ATTACH bind() flag
  2018-12-07 11:44 [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets Björn Töpel
  2018-12-07 11:44 ` [PATCH bpf-next 1/7] xsk: simplify AF_XDP socket teardown Björn Töpel
@ 2018-12-07 11:44 ` Björn Töpel
  2018-12-07 11:44 ` [PATCH bpf-next 3/7] bpf: add bpf_xsk_redirect function Björn Töpel
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Björn Töpel @ 2018-12-07 11:44 UTC (permalink / raw)
  To: bjorn.topel, magnus.karlsson, magnus.karlsson, ast, daniel, netdev
  Cc: Björn Töpel, brouer, u9012063, qi.z.zhang

From: Björn Töpel <bjorn.topel@intel.com>

In this commit the XDP_ATTACH bind() flag is introduced. When an XDP
socket is bound with this flag set, the socket will be associated with
a certain netdev Rx queue. The idea is that the XDP socket users do
not have to deal with the XSKMAP, or even an XDP program. Instead
XDP_ATTACH will "attach" an XDP socket to a queue, load a builtin XDP
program that forwards all received packets from the attached queue to
the socket.

An XDP socket bound with this option performs better, since the BPF
program is smaller, and the kernel code-path also has fewer
instructions.

This commit only introduces the first part of XDP_ATTACH, namely
associating the XDP socket to a netdev Rx queue.

To redirect XDP frames to an attached socket, the XDP program must use
the bpf_xsk_redirect that will be introduced in the next commit.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 include/linux/netdevice.h   |  1 +
 include/net/xdp_sock.h      |  2 ++
 include/uapi/linux/if_xdp.h |  1 +
 net/xdp/xsk.c               | 50 +++++++++++++++++++++++++++++--------
 4 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 94fb2e12f117..a6cc68d2504c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -743,6 +743,7 @@ struct netdev_rx_queue {
 	struct xdp_rxq_info		xdp_rxq;
 #ifdef CONFIG_XDP_SOCKETS
 	struct xdp_umem                 *umem;
+	struct xdp_sock			*xsk;
 #endif
 } ____cacheline_aligned_in_smp;
 
diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 13acb9803a6d..95315eb0410a 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -72,7 +72,9 @@ struct xdp_sock {
 
 struct xdp_buff;
 #ifdef CONFIG_XDP_SOCKETS
+int xsk_generic_attached_rcv(struct xdp_sock *xs, struct xdp_buff *xdp);
 int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp);
+int xsk_attached_rcv(struct xdp_sock *xs, struct xdp_buff *xdp);
 int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp);
 void xsk_flush(struct xdp_sock *xs);
 bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs);
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index caed8b1614ff..bd76235c2749 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -16,6 +16,7 @@
 #define XDP_SHARED_UMEM	(1 << 0)
 #define XDP_COPY	(1 << 1) /* Force copy-mode */
 #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
+#define XDP_ATTACH	(1 << 3)
 
 struct sockaddr_xdp {
 	__u16 sxdp_family;
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index a03268454a27..1eff7ac8596d 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -100,17 +100,20 @@ static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 	return err;
 }
 
-int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
+int xsk_attached_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
-	u32 len;
+	u32 len = xdp->data_end - xdp->data;
+
+	return (xdp->rxq->mem.type == MEM_TYPE_ZERO_COPY) ?
+		__xsk_rcv_zc(xs, xdp, len) : __xsk_rcv(xs, xdp, len);
+}
 
+int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
+{
 	if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
 		return -EINVAL;
 
-	len = xdp->data_end - xdp->data;
-
-	return (xdp->rxq->mem.type == MEM_TYPE_ZERO_COPY) ?
-		__xsk_rcv_zc(xs, xdp, len) : __xsk_rcv(xs, xdp, len);
+	return xsk_attached_rcv(xs, xdp);
 }
 
 void xsk_flush(struct xdp_sock *xs)
@@ -119,7 +122,7 @@ void xsk_flush(struct xdp_sock *xs)
 	xs->sk.sk_data_ready(&xs->sk);
 }
 
-int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
+int xsk_generic_attached_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
 	u32 metalen = xdp->data - xdp->data_meta;
 	u32 len = xdp->data_end - xdp->data;
@@ -127,9 +130,6 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 	u64 addr;
 	int err;
 
-	if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
-		return -EINVAL;
-
 	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		xs->rx_dropped++;
@@ -152,6 +152,14 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 	return err;
 }
 
+int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
+{
+	if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
+		return -EINVAL;
+
+	return xsk_generic_attached_rcv(xs, xdp);
+}
+
 void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries)
 {
 	xskq_produce_flush_addr_n(umem->cq, nb_entries);
@@ -339,6 +347,18 @@ static int xsk_init_queue(u32 entries, struct xsk_queue **queue,
 	return 0;
 }
 
+static void xsk_detach(struct xdp_sock *xs)
+{
+	WRITE_ONCE(xs->dev->_rx[xs->queue_id].xsk, NULL);
+}
+
+static int xsk_attach(struct xdp_sock *xs, struct net_device *dev, u16 qid)
+{
+	WRITE_ONCE(dev->_rx[qid].xsk, xs);
+
+	return 0;
+}
+
 static int xsk_release(struct socket *sock)
 {
 	struct sock *sk = sock->sk;
@@ -359,6 +379,7 @@ static int xsk_release(struct socket *sock)
 
 		/* Wait for driver to stop using the xdp socket. */
 		xdp_del_sk_umem(xs->umem, xs);
+		xsk_detach(xs);
 		xs->dev = NULL;
 		synchronize_net();
 		dev_put(dev);
@@ -432,7 +453,8 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		struct xdp_sock *umem_xs;
 		struct socket *sock;
 
-		if ((flags & XDP_COPY) || (flags & XDP_ZEROCOPY)) {
+		if ((flags & XDP_COPY) || (flags & XDP_ZEROCOPY) ||
+		    (flags & XDP_ATTACH)) {
 			/* Cannot specify flags for shared sockets. */
 			err = -EINVAL;
 			goto out_unlock;
@@ -478,6 +500,12 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
 		if (err)
 			goto out_unlock;
+
+		if (flags & XDP_ATTACH) {
+			err = xsk_attach(xs, dev, qid);
+			if (err)
+				goto out_unlock;
+		}
 	}
 
 	xs->dev = dev;
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH bpf-next 3/7] bpf: add bpf_xsk_redirect function
  2018-12-07 11:44 [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets Björn Töpel
  2018-12-07 11:44 ` [PATCH bpf-next 1/7] xsk: simplify AF_XDP socket teardown Björn Töpel
  2018-12-07 11:44 ` [PATCH bpf-next 2/7] xsk: add XDP_ATTACH bind() flag Björn Töpel
@ 2018-12-07 11:44 ` Björn Töpel
  2018-12-07 11:44 ` [PATCH bpf-next 4/7] bpf: prepare for builtin bpf program Björn Töpel
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Björn Töpel @ 2018-12-07 11:44 UTC (permalink / raw)
  To: bjorn.topel, magnus.karlsson, magnus.karlsson, ast, daniel, netdev
  Cc: Björn Töpel, brouer, u9012063, qi.z.zhang

From: Björn Töpel <bjorn.topel@intel.com>

The bpf_xsk_redirect function is a new redirect bpf function, in
addition to bpf_redirect/bpf_redirect_map. If an XDP socket has been
attached to a netdev Rx queue via the XDP_ATTACH bind() option and
bpf_xsk_redirect is called, the packet will be redirected to the
attached socket.

The bpf_xsk_redirect function returns XDP_REDIRECT if there is a
socket attached to the originated queue, otherwise XDP_PASS.

This commit also adds the corresponding trace points for the redirect
call.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 include/linux/filter.h     |   4 ++
 include/trace/events/xdp.h |  61 ++++++++++++++++++++++
 include/uapi/linux/bpf.h   |  14 +++++-
 net/core/filter.c          | 100 +++++++++++++++++++++++++++++++++++++
 4 files changed, 178 insertions(+), 1 deletion(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index d16deead65c6..691b5c1003c8 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -525,6 +525,10 @@ struct bpf_redirect_info {
 	u32 flags;
 	struct bpf_map *map;
 	struct bpf_map *map_to_flush;
+#ifdef CONFIG_XDP_SOCKETS
+	struct xdp_sock *xsk;
+	struct xdp_sock *xsk_to_flush;
+#endif
 	u32 kern_flags;
 };
 
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index e95cb86b65cf..30f399bd462b 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -158,6 +158,67 @@ struct _bpf_dtab_netdev {
 	 trace_xdp_redirect_map_err(dev, xdp, devmap_ifindex(fwd, map),	\
 				    err, map, idx)
 
+DECLARE_EVENT_CLASS(xsk_redirect_template,
+
+	TP_PROTO(const struct net_device *dev,
+		 const struct bpf_prog *xdp,
+		 int err,
+		 struct xdp_buff *xbuff),
+
+	TP_ARGS(dev, xdp, err, xbuff),
+
+	TP_STRUCT__entry(
+		__field(int, prog_id)
+		__field(u32, act)
+		__field(int, ifindex)
+		__field(int, err)
+		__field(u32, queue_index)
+		__field(enum xdp_mem_type, mem_type)
+	),
+
+	TP_fast_assign(
+		__entry->prog_id	= xdp->aux->id;
+		__entry->act		= XDP_REDIRECT;
+		__entry->ifindex	= dev->ifindex;
+		__entry->err		= err;
+		__entry->queue_index	= xbuff->rxq->queue_index;
+		__entry->mem_type	= xbuff->rxq->mem.type;
+	),
+
+	TP_printk("prog_id=%d action=%s ifindex=%d err=%d queue_index=%d"
+		  " mem_type=%d",
+		  __entry->prog_id,
+		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
+		  __entry->ifindex,
+		  __entry->err,
+		  __entry->queue_index,
+		  __entry->mem_type)
+);
+
+DEFINE_EVENT(xsk_redirect_template, xsk_redirect,
+	TP_PROTO(const struct net_device *dev,
+		 const struct bpf_prog *xdp,
+		 int err,
+		 struct xdp_buff *xbuff),
+
+	TP_ARGS(dev, xdp, err, xbuff)
+);
+
+DEFINE_EVENT(xsk_redirect_template, xsk_redirect_err,
+	TP_PROTO(const struct net_device *dev,
+		 const struct bpf_prog *xdp,
+		 int err,
+		 struct xdp_buff *xbuff),
+
+	TP_ARGS(dev, xdp, err, xbuff)
+);
+
+#define _trace_xsk_redirect(dev, xdp, xbuff)		\
+	 trace_xsk_redirect(dev, xdp, 0, xbuff)
+
+#define _trace_xsk_redirect_err(dev, xdp, xbuff, err)	\
+	 trace_xsk_redirect_err(dev, xdp, err, xbuff)
+
 TRACE_EVENT(xdp_cpumap_kthread,
 
 	TP_PROTO(int map_id, unsigned int processed,  unsigned int drops,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index a84fd232d934..2912d87a39ba 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2298,6 +2298,17 @@ union bpf_attr {
  *		payload and/or *pop* value being to large.
  *	Return
  *		0 on success, or a negative error in case of failure.
+ *
+ * int bpf_xsk_redirect(struct xdp_buff *xdp_md)
+ *	 Description
+ *		Redirect the packet to the attached XDP socket, if any.
+ *		An XDP socket can be attached to a network interface Rx
+ *		queue by passing the XDP_ATTACH option at bind point of
+ *		the socket.
+ *
+ *	Return
+ *		**XDP_REDIRECT** if there is an XDP socket attached to the Rx
+ *		queue receiving the frame, otherwise **XDP_PASS**.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2391,7 +2402,8 @@ union bpf_attr {
 	FN(map_pop_elem),		\
 	FN(map_peek_elem),		\
 	FN(msg_push_data),		\
-	FN(msg_pop_data),
+	FN(msg_pop_data),		\
+	FN(xsk_redirect),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 3d54af4c363d..86c5fe5a9ec0 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3415,6 +3415,17 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
 	return 0;
 }
 
+static void xdp_do_flush_xsk(struct bpf_redirect_info *ri)
+{
+#ifdef CONFIG_XDP_SOCKETS
+	struct xdp_sock *xsk = ri->xsk_to_flush;
+
+	ri->xsk_to_flush = NULL;
+	if (xsk)
+		xsk_flush(xsk);
+#endif
+}
+
 void xdp_do_flush_map(void)
 {
 	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
@@ -3436,6 +3447,8 @@ void xdp_do_flush_map(void)
 			break;
 		}
 	}
+
+	xdp_do_flush_xsk(ri);
 }
 EXPORT_SYMBOL_GPL(xdp_do_flush_map);
 
@@ -3501,6 +3514,30 @@ static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
 	return err;
 }
 
+#ifdef CONFIG_XDP_SOCKETS
+static int xdp_do_xsk_redirect(struct net_device *dev, struct xdp_buff *xdp,
+			       struct bpf_prog *xdp_prog,
+			       struct bpf_redirect_info *ri)
+{
+	struct xdp_sock *xsk = ri->xsk;
+	int err;
+
+	ri->xsk = NULL;
+	ri->xsk_to_flush = xsk;
+
+	err = xsk_attached_rcv(xsk, xdp);
+	if (unlikely(err))
+		goto err;
+
+	_trace_xsk_redirect(dev, xdp_prog, xdp);
+	return 0;
+
+err:
+	_trace_xsk_redirect_err(dev, xdp_prog, xdp, err);
+	return err;
+}
+#endif /* CONFIG_XDP_SOCKETS */
+
 int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 		    struct bpf_prog *xdp_prog)
 {
@@ -3510,6 +3547,10 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 	if (likely(map))
 		return xdp_do_redirect_map(dev, xdp, xdp_prog, map, ri);
 
+#ifdef CONFIG_XDP_SOCKETS
+	if (ri->xsk)
+		return xdp_do_xsk_redirect(dev, xdp, xdp_prog, ri);
+#endif
 	return xdp_do_redirect_slow(dev, xdp, xdp_prog, ri);
 }
 EXPORT_SYMBOL_GPL(xdp_do_redirect);
@@ -3560,6 +3601,33 @@ static int xdp_do_generic_redirect_map(struct net_device *dev,
 	return err;
 }
 
+#ifdef CONFIG_XDP_SOCKETS
+static int xdp_do_generic_xsk_redirect(struct net_device *dev,
+				       struct xdp_buff *xdp,
+				       struct bpf_prog *xdp_prog,
+				       struct sk_buff *skb)
+{
+	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	struct xdp_sock *xsk = ri->xsk;
+	int err;
+
+	ri->xsk = NULL;
+	ri->xsk_to_flush = NULL;
+
+	err = xsk_generic_attached_rcv(xsk, xdp);
+	if (err)
+		goto err;
+
+	consume_skb(skb);
+	_trace_xsk_redirect(dev, xdp_prog, xdp);
+	return 0;
+
+err:
+	_trace_xsk_redirect_err(dev, xdp_prog, xdp, err);
+	return err;
+}
+#endif /* CONFIG_XDP_SOCKETS */
+
 int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
 			    struct xdp_buff *xdp, struct bpf_prog *xdp_prog)
 {
@@ -3572,6 +3640,11 @@ int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
 	if (map)
 		return xdp_do_generic_redirect_map(dev, skb, xdp, xdp_prog,
 						   map);
+#ifdef CONFIG_XDP_SOCKETS
+	if (ri->xsk)
+		return xdp_do_generic_xsk_redirect(dev, xdp, xdp_prog, skb);
+#endif
+
 	ri->ifindex = 0;
 	fwd = dev_get_by_index_rcu(dev_net(dev), index);
 	if (unlikely(!fwd)) {
@@ -3639,6 +3712,29 @@ static const struct bpf_func_proto bpf_xdp_redirect_map_proto = {
 	.arg3_type      = ARG_ANYTHING,
 };
 
+#ifdef CONFIG_XDP_SOCKETS
+BPF_CALL_1(bpf_xdp_xsk_redirect, struct xdp_buff *, xdp)
+{
+	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	struct xdp_sock *xsk;
+
+	xsk = READ_ONCE(xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk);
+	if (xsk) {
+		ri->xsk = xsk;
+		return XDP_REDIRECT;
+	}
+
+	return XDP_PASS;
+}
+
+static const struct bpf_func_proto bpf_xdp_xsk_redirect_proto = {
+	.func           = bpf_xdp_xsk_redirect,
+	.gpl_only       = false,
+	.ret_type       = RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+};
+#endif /* CONFIG_XDP_SOCKETS */
+
 static unsigned long bpf_skb_copy(void *dst_buff, const void *skb,
 				  unsigned long off, unsigned long len)
 {
@@ -5510,6 +5606,10 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_xdp_sk_lookup_tcp_proto;
 	case BPF_FUNC_sk_release:
 		return &bpf_sk_release_proto;
+#endif
+#ifdef CONFIG_XDP_SOCKETS
+	case BPF_FUNC_xsk_redirect:
+		return &bpf_xdp_xsk_redirect_proto;
 #endif
 	default:
 		return bpf_base_func_proto(func_id);
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH bpf-next 4/7] bpf: prepare for builtin bpf program
  2018-12-07 11:44 [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets Björn Töpel
                   ` (2 preceding siblings ...)
  2018-12-07 11:44 ` [PATCH bpf-next 3/7] bpf: add bpf_xsk_redirect function Björn Töpel
@ 2018-12-07 11:44 ` Björn Töpel
  2018-12-07 11:44 ` [PATCH bpf-next 5/7] bpf: add function to load builtin BPF program Björn Töpel
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Björn Töpel @ 2018-12-07 11:44 UTC (permalink / raw)
  To: bjorn.topel, magnus.karlsson, magnus.karlsson, ast, daniel, netdev
  Cc: Björn Töpel, brouer, u9012063, qi.z.zhang

From: Björn Töpel <bjorn.topel@intel.com>

Break up bpf_prog_load into one function that allocates, initializes
and verifies a bpf program, and one that allocates a file descriptor.

The former function will be used in a later commit to load a builtin
BPF program.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 kernel/bpf/syscall.c | 59 ++++++++++++++++++++++++++------------------
 1 file changed, 35 insertions(+), 24 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index aa05aa38f4a8..ee1328625330 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1441,7 +1441,8 @@ bpf_prog_load_check_attach_type(enum bpf_prog_type prog_type,
 /* last field in 'union bpf_attr' used by this command */
 #define	BPF_PROG_LOAD_LAST_FIELD func_info_cnt
 
-static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
+static struct bpf_prog *__bpf_prog_load(union bpf_attr *attr,
+					union bpf_attr __user *uattr)
 {
 	enum bpf_prog_type type = attr->prog_type;
 	struct bpf_prog *prog;
@@ -1450,45 +1451,45 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 	bool is_gpl;
 
 	if (CHECK_ATTR(BPF_PROG_LOAD))
-		return -EINVAL;
+		return ERR_PTR(-EINVAL);
 
 	if (attr->prog_flags & ~(BPF_F_STRICT_ALIGNMENT | BPF_F_ANY_ALIGNMENT))
-		return -EINVAL;
+		return ERR_PTR(-EINVAL);
 
 	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
 	    (attr->prog_flags & BPF_F_ANY_ALIGNMENT) &&
 	    !capable(CAP_SYS_ADMIN))
-		return -EPERM;
+		return ERR_PTR(-EPERM);
 
 	/* copy eBPF program license from user space */
 	if (strncpy_from_user(license, u64_to_user_ptr(attr->license),
 			      sizeof(license) - 1) < 0)
-		return -EFAULT;
+		return ERR_PTR(-EFAULT);
 	license[sizeof(license) - 1] = 0;
 
 	/* eBPF programs must be GPL compatible to use GPL-ed functions */
 	is_gpl = license_is_gpl_compatible(license);
 
 	if (attr->insn_cnt == 0 || attr->insn_cnt > BPF_MAXINSNS)
-		return -E2BIG;
+		return ERR_PTR(-E2BIG);
 
 	if (type == BPF_PROG_TYPE_KPROBE &&
 	    attr->kern_version != LINUX_VERSION_CODE)
-		return -EINVAL;
+		return  ERR_PTR(-EINVAL);
 
 	if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
 	    type != BPF_PROG_TYPE_CGROUP_SKB &&
 	    !capable(CAP_SYS_ADMIN))
-		return -EPERM;
+		return ERR_PTR(-EPERM);
 
 	bpf_prog_load_fixup_attach_type(attr);
 	if (bpf_prog_load_check_attach_type(type, attr->expected_attach_type))
-		return -EINVAL;
+		return ERR_PTR(-EINVAL);
 
 	/* plain bpf_prog allocation */
 	prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER);
 	if (!prog)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	prog->expected_attach_type = attr->expected_attach_type;
 
@@ -1544,20 +1545,8 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 	if (err)
 		goto free_used_maps;
 
-	err = bpf_prog_new_fd(prog);
-	if (err < 0) {
-		/* failed to allocate fd.
-		 * bpf_prog_put() is needed because the above
-		 * bpf_prog_alloc_id() has published the prog
-		 * to the userspace and the userspace may
-		 * have refcnt-ed it through BPF_PROG_GET_FD_BY_ID.
-		 */
-		bpf_prog_put(prog);
-		return err;
-	}
-
 	bpf_prog_kallsyms_add(prog);
-	return err;
+	return prog;
 
 free_used_maps:
 	kvfree(prog->aux->func_info);
@@ -1570,7 +1559,29 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 	security_bpf_prog_free(prog->aux);
 free_prog_nouncharge:
 	bpf_prog_free(prog);
-	return err;
+	return ERR_PTR(err);
+}
+
+static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
+{
+	struct bpf_prog *prog = __bpf_prog_load(attr, uattr);
+	int fd;
+
+	if (IS_ERR(prog))
+		return PTR_ERR(prog);
+
+	fd = bpf_prog_new_fd(prog);
+	if (fd < 0) {
+		/* failed to allocate fd.
+		 * bpf_prog_put() is needed because the above
+		 * bpf_prog_alloc_id() has published the prog
+		 * to the userspace and the userspace may
+		 * have refcnt-ed it through BPF_PROG_GET_FD_BY_ID.
+		 */
+		bpf_prog_put(prog);
+	}
+
+	return fd;
 }
 
 #define BPF_OBJ_LAST_FIELD file_flags
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH bpf-next 5/7] bpf: add function to load builtin BPF program
  2018-12-07 11:44 [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets Björn Töpel
                   ` (3 preceding siblings ...)
  2018-12-07 11:44 ` [PATCH bpf-next 4/7] bpf: prepare for builtin bpf program Björn Töpel
@ 2018-12-07 11:44 ` Björn Töpel
  2018-12-07 11:44 ` [PATCH bpf-next 6/7] xsk: load a builtin XDP program on XDP_ATTACH Björn Töpel
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Björn Töpel @ 2018-12-07 11:44 UTC (permalink / raw)
  To: bjorn.topel, magnus.karlsson, magnus.karlsson, ast, daniel, netdev
  Cc: Björn Töpel, brouer, u9012063, qi.z.zhang

From: Björn Töpel <bjorn.topel@intel.com>

The added bpf_prog_load_builtin can be used to load and verify a BPF
program that originates from the kernel. We call this a "builtin BPF
program". A builtin program can be used for convenience, e.g. it
allows for the kernel to use the bpf infrastructure for internal
tasks.

This functionality will be used by AF_XDP sockets in a later commit.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 include/linux/bpf.h  |  2 ++
 kernel/bpf/syscall.c | 32 ++++++++++++++++++++++++--------
 2 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e82b7039fc66..e810bfeb6239 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -563,6 +563,8 @@ static inline int bpf_map_attr_numa_node(const union bpf_attr *attr)
 struct bpf_prog *bpf_prog_get_type_path(const char *name, enum bpf_prog_type type);
 int array_map_alloc_check(union bpf_attr *attr);
 
+struct bpf_prog *bpf_prog_load_builtin(union bpf_attr *attr);
+
 #else /* !CONFIG_BPF_SYSCALL */
 static inline struct bpf_prog *bpf_prog_get(u32 ufd)
 {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ee1328625330..323831e1a1e2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1461,10 +1461,16 @@ static struct bpf_prog *__bpf_prog_load(union bpf_attr *attr,
 	    !capable(CAP_SYS_ADMIN))
 		return ERR_PTR(-EPERM);
 
-	/* copy eBPF program license from user space */
-	if (strncpy_from_user(license, u64_to_user_ptr(attr->license),
-			      sizeof(license) - 1) < 0)
-		return ERR_PTR(-EFAULT);
+	/* NB! If uattr is NULL, a builtin BPF is being loaded. */
+	if (uattr) {
+		/* copy eBPF program license from user space */
+		if (strncpy_from_user(license, u64_to_user_ptr(attr->license),
+				      sizeof(license) - 1) < 0)
+			return ERR_PTR(-EFAULT);
+	} else {
+		strncpy(license, (const char *)(unsigned long)attr->license,
+			sizeof(license) - 1);
+	}
 	license[sizeof(license) - 1] = 0;
 
 	/* eBPF programs must be GPL compatible to use GPL-ed functions */
@@ -1505,10 +1511,15 @@ static struct bpf_prog *__bpf_prog_load(union bpf_attr *attr,
 
 	prog->len = attr->insn_cnt;
 
-	err = -EFAULT;
-	if (copy_from_user(prog->insns, u64_to_user_ptr(attr->insns),
-			   bpf_prog_insn_size(prog)) != 0)
-		goto free_prog;
+	if (uattr) {
+		err = -EFAULT;
+		if (copy_from_user(prog->insns, u64_to_user_ptr(attr->insns),
+				   bpf_prog_insn_size(prog)) != 0)
+			goto free_prog;
+	} else {
+		memcpy(prog->insns, (void *)(unsigned long)attr->insns,
+		       bpf_prog_insn_size(prog));
+	}
 
 	prog->orig_prog = NULL;
 	prog->jited = 0;
@@ -1584,6 +1595,11 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 	return fd;
 }
 
+struct bpf_prog *bpf_prog_load_builtin(union bpf_attr *attr)
+{
+	return __bpf_prog_load(attr, NULL);
+}
+
 #define BPF_OBJ_LAST_FIELD file_flags
 
 static int bpf_obj_pin(const union bpf_attr *attr)
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH bpf-next 6/7] xsk: load a builtin XDP program on XDP_ATTACH
  2018-12-07 11:44 [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets Björn Töpel
                   ` (4 preceding siblings ...)
  2018-12-07 11:44 ` [PATCH bpf-next 5/7] bpf: add function to load builtin BPF program Björn Töpel
@ 2018-12-07 11:44 ` Björn Töpel
  2018-12-10  2:17   ` Jakub Kicinski
  2018-12-07 13:42 ` [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets Jesper Dangaard Brouer
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Björn Töpel @ 2018-12-07 11:44 UTC (permalink / raw)
  To: bjorn.topel, magnus.karlsson, magnus.karlsson, ast, daniel, netdev
  Cc: Björn Töpel, brouer, u9012063, qi.z.zhang

From: Björn Töpel <bjorn.topel@intel.com>

This commit extends the XDP_ATTACH bind option by loading a builtin
XDP program.

The builtin program is the simplest program possible to redirect a
frame to an attached socket. In restricted C it would look like this:

  SEC("xdp")
  int xdp_prog(struct xdp_md *ctx)
  {
  	return bpf_xsk_redirect(ctx);
  }

For many XDP socket users, this program would be the most common one.

The builtin program loaded via XDP_ATTACH behaves, from an
install-to-netdev/uninstall-from-netdev point of view, different from
regular XDP programs. The easiest way to look at it is as a 2-level
hierarchy, where regular XDP programs has precedence over the builtin
one.

If no regular XDP program is installed to the netdev, the builtin will
be install. If the builtin program is installed, and a regular is
installed, the regular XDP will have precedence over the builtin one.

Further, if a regular program is installed, and later removed, the
builtin one will automatically be installed.

The sxdp_flags field of struct sockaddr_xdp gets two new options
XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
corresponding XDP netlink install flags.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 include/linux/netdevice.h   | 10 +++++
 include/uapi/linux/if_xdp.h | 10 +++--
 net/core/dev.c              | 84 ++++++++++++++++++++++++++++++++---
 net/xdp/xsk.c               | 88 +++++++++++++++++++++++++++++++++++--
 4 files changed, 179 insertions(+), 13 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a6cc68d2504c..a3094f1a9fcb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2039,6 +2039,13 @@ struct net_device {
 	struct lock_class_key	*qdisc_running_key;
 	bool			proto_down;
 	unsigned		wol_enabled:1;
+
+#ifdef CONFIG_XDP_SOCKETS
+	struct bpf_prog		*xsk_prog;
+	u32			xsk_prog_flags;
+	bool			xsk_prog_running;
+	int			xsk_prog_ref;
+#endif
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -3638,6 +3645,9 @@ struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 				    struct netdev_queue *txq, int *ret);
 
 typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
+int dev_xsk_prog_install(struct net_device *dev, struct bpf_prog *prog,
+			 u32 flags);
+void dev_xsk_prog_uninstall(struct net_device *dev);
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 		      int fd, u32 flags);
 u32 __dev_xdp_query(struct net_device *dev, bpf_op_t xdp_op,
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index bd76235c2749..b8fb3200f640 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -13,10 +13,12 @@
 #include <linux/types.h>
 
 /* Options for the sxdp_flags field */
-#define XDP_SHARED_UMEM	(1 << 0)
-#define XDP_COPY	(1 << 1) /* Force copy-mode */
-#define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
-#define XDP_ATTACH	(1 << 3)
+#define XDP_SHARED_UMEM		(1 << 0)
+#define XDP_COPY		(1 << 1) /* Force copy-mode */
+#define XDP_ZEROCOPY		(1 << 2) /* Force zero-copy mode */
+#define XDP_ATTACH		(1 << 3)
+#define XDP_BUILTIN_SKB_MODE	(1 << 4)
+#define XDP_BUILTIN_DRV_MODE	(1 << 5)
 
 struct sockaddr_xdp {
 	__u16 sxdp_family;
diff --git a/net/core/dev.c b/net/core/dev.c
index abe50c424b29..0a1c30da2f87 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7879,6 +7879,70 @@ static void dev_xdp_uninstall(struct net_device *dev)
 					NULL));
 }
 
+#ifdef CONFIG_XDP_SOCKETS
+int dev_xsk_prog_install(struct net_device *dev, struct bpf_prog *prog,
+			 u32 flags)
+{
+	ASSERT_RTNL();
+
+	if (dev->xsk_prog) {
+		if (prog != dev->xsk_prog)
+			return -EINVAL;
+		if (flags && flags != dev->xsk_prog_flags)
+			return -EINVAL;
+	}
+
+	if (dev->xsk_prog) {
+		dev->xsk_prog_ref++;
+		return 0;
+	}
+
+	dev->xsk_prog = bpf_prog_inc(prog);
+	dev->xsk_prog_flags = flags | XDP_FLAGS_UPDATE_IF_NOEXIST;
+	dev->xsk_prog_ref = 1;
+	(void)dev_change_xdp_fd(dev, NULL, -1, dev->xsk_prog_flags);
+	return 0;
+}
+
+void dev_xsk_prog_uninstall(struct net_device *dev)
+{
+	ASSERT_RTNL();
+
+	if (--dev->xsk_prog_ref == 0) {
+		bpf_prog_put(dev->xsk_prog);
+		dev->xsk_prog = NULL;
+		if (dev->xsk_prog_running)
+			(void)dev_change_xdp_fd(dev, NULL, -1,
+						dev->xsk_prog_flags);
+		dev->xsk_prog_flags = 0;
+		dev->xsk_prog_running = false;
+	}
+}
+#endif
+
+static void dev_xsk_prog_pre_load(struct net_device *dev, int fd,
+				  struct bpf_prog **prog, u32 *flags)
+{
+#ifdef CONFIG_XDP_SOCKETS
+	if (fd >= 0)
+		return;
+
+	if (dev->xsk_prog) {
+		*prog = bpf_prog_inc(dev->xsk_prog);
+		*flags = dev->xsk_prog_flags;
+		dev->xsk_prog_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
+	}
+#endif
+}
+
+static void dev_xsk_prog_post_load(struct net_device *dev, int err,
+				   struct bpf_prog *prog)
+{
+#ifdef CONFIG_XDP_SOCKETS
+	dev->xsk_prog_running = prog && prog == dev->xsk_prog && err >= 0;
+#endif
+}
+
 /**
  *	dev_change_xdp_fd - set or clear a bpf program for a device rx path
  *	@dev: device
@@ -7909,16 +7973,25 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 	if (bpf_op == bpf_chk)
 		bpf_chk = generic_xdp_install;
 
-	if (fd >= 0) {
+	dev_xsk_prog_pre_load(dev, fd, &prog, &flags);
+
+	if (fd >= 0 || prog) {
 		if (__dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG) ||
-		    __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG_HW))
+		    __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG_HW)) {
+			if (prog)
+				bpf_prog_put(prog);
 			return -EEXIST;
+		}
 		if ((flags & XDP_FLAGS_UPDATE_IF_NOEXIST) &&
-		    __dev_xdp_query(dev, bpf_op, query))
+		    __dev_xdp_query(dev, bpf_op, query)) {
+			if (prog)
+				bpf_prog_put(prog);
 			return -EBUSY;
+		}
 
-		prog = bpf_prog_get_type_dev(fd, BPF_PROG_TYPE_XDP,
-					     bpf_op == ops->ndo_bpf);
+		if (!prog)
+			prog = bpf_prog_get_type_dev(fd, BPF_PROG_TYPE_XDP,
+						     bpf_op == ops->ndo_bpf);
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);
 
@@ -7934,6 +8007,7 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 	if (err < 0 && prog)
 		bpf_prog_put(prog);
 
+	dev_xsk_prog_post_load(dev, err, prog);
 	return err;
 }
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 1eff7ac8596d..0d15d25694c4 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -30,6 +30,15 @@
 
 #define TX_BATCH_SIZE 16
 
+static const struct bpf_insn xsk_redirect_prog_insn[] = {
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_xsk_redirect),
+	BPF_EXIT_INSN(),
+};
+
+/* Synchronized via rtnl lock */
+static struct bpf_prog *xsk_redirect_prog; /*builtin XDP program */
+static int xsk_redirect_prog_ref;
+
 static struct xdp_sock *xdp_sk(struct sock *sk)
 {
 	return (struct xdp_sock *)sk;
@@ -347,16 +356,87 @@ static int xsk_init_queue(u32 entries, struct xsk_queue **queue,
 	return 0;
 }
 
+static struct bpf_prog *xsk_builtin_prog_get(void)
+{
+	union bpf_attr attr = {};
+	struct bpf_prog *prog;
+
+	if (xsk_redirect_prog) {
+		xsk_redirect_prog_ref++;
+		return xsk_redirect_prog;
+	}
+
+	attr.prog_type = BPF_PROG_TYPE_XDP;
+	attr.insn_cnt = ARRAY_SIZE(xsk_redirect_prog_insn);
+	attr.insns = (uintptr_t)xsk_redirect_prog_insn;
+	attr.license = (uintptr_t)"GPL";
+	memcpy(attr.prog_name, "AF_XDP_BUILTIN",
+	       min_t(size_t, sizeof("AF_XDP_BUILTIN") - 1,
+		     BPF_OBJ_NAME_LEN - 1));
+
+	prog = bpf_prog_load_builtin(&attr);
+	if (IS_ERR(prog)) {
+		WARN(1, "Failed (%d) to load builtin XDP program!\n",
+		     (int)PTR_ERR(prog));
+		return prog;
+	}
+
+	xsk_redirect_prog = prog;
+	xsk_redirect_prog_ref = 1;
+	return xsk_redirect_prog;
+}
+
+static void xsk_builtin_prog_put(void)
+{
+	if (--xsk_redirect_prog_ref == 0) {
+		bpf_prog_put(xsk_redirect_prog);
+		xsk_redirect_prog = NULL;
+	}
+}
+
 static void xsk_detach(struct xdp_sock *xs)
 {
-	WRITE_ONCE(xs->dev->_rx[xs->queue_id].xsk, NULL);
+	rtnl_lock();
+	if (READ_ONCE(xs->dev->_rx[xs->queue_id].xsk)) {
+		dev_xsk_prog_uninstall(xs->dev);
+		xsk_builtin_prog_put();
+		WRITE_ONCE(xs->dev->_rx[xs->queue_id].xsk, NULL);
+	}
+	rtnl_unlock();
 }
 
-static int xsk_attach(struct xdp_sock *xs, struct net_device *dev, u16 qid)
+static int xsk_attach(struct xdp_sock *xs, struct net_device *dev, u16 qid,
+		      u32 bind_flags)
 {
-	WRITE_ONCE(dev->_rx[qid].xsk, xs);
+	struct bpf_prog *prog;
+	u32 flags = 0;
+	int err;
+
+	rtnl_lock();
+	prog = xsk_builtin_prog_get();
+	if (IS_ERR(prog)) {
+		err = PTR_ERR(prog);
+		goto out;
+	}
 
+	if (bind_flags & XDP_BUILTIN_SKB_MODE)
+		flags |= XDP_FLAGS_SKB_MODE;
+	if (bind_flags & XDP_BUILTIN_DRV_MODE)
+		flags |= XDP_FLAGS_DRV_MODE;
+
+	err = dev_xsk_prog_install(dev, prog, flags);
+	if (err)
+		goto out_put;
+
+	WRITE_ONCE(dev->_rx[qid].xsk, xs);
+	rtnl_unlock();
 	return 0;
+
+out_put:
+	xsk_builtin_prog_put();
+out:
+	rtnl_unlock();
+	return err;
 }
 
 static int xsk_release(struct socket *sock)
@@ -502,7 +582,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 			goto out_unlock;
 
 		if (flags & XDP_ATTACH) {
-			err = xsk_attach(xs, dev, qid);
+			err = xsk_attach(xs, dev, qid, flags);
 			if (err)
 				goto out_unlock;
 		}
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets
  2018-12-07 11:44 [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets Björn Töpel
                   ` (5 preceding siblings ...)
  2018-12-07 11:44 ` [PATCH bpf-next 6/7] xsk: load a builtin XDP program on XDP_ATTACH Björn Töpel
@ 2018-12-07 13:42 ` Jesper Dangaard Brouer
  2018-12-07 14:01   ` Björn Töpel
  2018-12-07 15:27   ` Björn Töpel
  2018-12-07 21:21 ` Alexei Starovoitov
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 24+ messages in thread
From: Jesper Dangaard Brouer @ 2018-12-07 13:42 UTC (permalink / raw)
  To: Björn Töpel
  Cc: magnus.karlsson, magnus.karlsson, ast, daniel, netdev,
	Björn Töpel, u9012063, qi.z.zhang, brouer

On Fri,  7 Dec 2018 12:44:24 +0100
Björn Töpel <bjorn.topel@gmail.com> wrote:

> The rationale behind attach is performance and ease of use. Many XDP
> socket users just need a simple way of creating/binding a socket and
> receiving frames right away without loading an XDP program.
> 
> XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> is a kernel provided XDP program that is installed to the netdev when
> XDP_ATTACH is being passed as a bind() flag.
> 
> The builtin program is the simplest program possible to redirect a
> frame to an attached socket. In restricted C it would look like this:
>     
>   SEC("xdp")
>   int xdp_prog(struct xdp_md *ctx)
>   {
>         return bpf_xsk_redirect(ctx);
>   }
>     
> The builtin program loaded via XDP_ATTACH behaves, from an
> install-to-netdev/uninstall-from-netdev point of view, differently
> from regular XDP programs. The easiest way to look at it is as a
> 2-level hierarchy, where regular XDP programs has precedence over the
> builtin one.
>     
> If no regular XDP program is installed to the netdev, the builtin will
> be install. If the builtin program is installed, and a regular is
> installed, regular XDP program will have precedence over the builtin
> one.
>     
> Further, if a regular program is installed, and later removed, the
> builtin one will automatically be installed.
>     
> The sxdp_flags field of struct sockaddr_xdp gets two new options
> XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
> corresponding XDP netlink install flags.
> 
> The builtin XDP program functionally adds even more complexity to the
> already hard to read dev_change_xdp_fd. Maybe it would be simpler to
> store the program in the struct net_device together with the install
> flags instead of calling the ndo_bpf multiple times?

(As far as I can see from reading the code, correct me if I'm wrong.)

If an AF_XDP program uses XDP_ATTACH, then it installs the
builtin-program as the XDP program on the "entire" device.  That means
all RX-queues will call this XDP-bpf program (indirect call), and it is
actually only relevant for the specific queue_index.  Yes, the helper
call does check that the 'xdp->rxq->queue_index' for an attached 'xsk'
and return XDP_PASS if it is NULL:

+BPF_CALL_1(bpf_xdp_xsk_redirect, struct xdp_buff *, xdp)
+{
+	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	struct xdp_sock *xsk;
+
+	xsk = READ_ONCE(xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk);
+	if (xsk) {
+		ri->xsk = xsk;
+		return XDP_REDIRECT;
+	}
+
+	return XDP_PASS;
+}

Why do every normal XDP_PASS packet have to pay this overhead (indirect
call), when someone loads an AF_XDP socket program?  The AF_XDP socket
is tied hard and only relevant to a specific RX-queue (which is why we
get a performance boost due to SPSC queues).

I acknowledge there is a need for this, but this use-case shows there
is a need for attaching XDP programs per RX-queue basis.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets
  2018-12-07 13:42 ` [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets Jesper Dangaard Brouer
@ 2018-12-07 14:01   ` Björn Töpel
  2018-12-08 14:52     ` Jesper Dangaard Brouer
  2018-12-07 15:27   ` Björn Töpel
  1 sibling, 1 reply; 24+ messages in thread
From: Björn Töpel @ 2018-12-07 14:01 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Karlsson, Magnus, Magnus Karlsson, ast, Daniel Borkmann, Netdev,
	Björn Töpel, William Tu, Zhang, Qi Z

Den fre 7 dec. 2018 kl 14:42 skrev Jesper Dangaard Brouer <brouer@redhat.com>:
>
> On Fri,  7 Dec 2018 12:44:24 +0100
> Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> > The rationale behind attach is performance and ease of use. Many XDP
> > socket users just need a simple way of creating/binding a socket and
> > receiving frames right away without loading an XDP program.
> >
> > XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> > is a kernel provided XDP program that is installed to the netdev when
> > XDP_ATTACH is being passed as a bind() flag.
> >
> > The builtin program is the simplest program possible to redirect a
> > frame to an attached socket. In restricted C it would look like this:
> >
> >   SEC("xdp")
> >   int xdp_prog(struct xdp_md *ctx)
> >   {
> >         return bpf_xsk_redirect(ctx);
> >   }
> >
> > The builtin program loaded via XDP_ATTACH behaves, from an
> > install-to-netdev/uninstall-from-netdev point of view, differently
> > from regular XDP programs. The easiest way to look at it is as a
> > 2-level hierarchy, where regular XDP programs has precedence over the
> > builtin one.
> >
> > If no regular XDP program is installed to the netdev, the builtin will
> > be install. If the builtin program is installed, and a regular is
> > installed, regular XDP program will have precedence over the builtin
> > one.
> >
> > Further, if a regular program is installed, and later removed, the
> > builtin one will automatically be installed.
> >
> > The sxdp_flags field of struct sockaddr_xdp gets two new options
> > XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
> > corresponding XDP netlink install flags.
> >
> > The builtin XDP program functionally adds even more complexity to the
> > already hard to read dev_change_xdp_fd. Maybe it would be simpler to
> > store the program in the struct net_device together with the install
> > flags instead of calling the ndo_bpf multiple times?
>
> (As far as I can see from reading the code, correct me if I'm wrong.)
>
> If an AF_XDP program uses XDP_ATTACH, then it installs the
> builtin-program as the XDP program on the "entire" device.  That means
> all RX-queues will call this XDP-bpf program (indirect call), and it is
> actually only relevant for the specific queue_index.  Yes, the helper
> call does check that the 'xdp->rxq->queue_index' for an attached 'xsk'
> and return XDP_PASS if it is NULL:
>

Yes, you are correct. The builtin XDP program, just like a regular XDP
program, affects the whole netdev. So, yes the non-AF_XDP queues would
get a performance hit from this. Just to reiterate -- this isn't new
for this series. This has always been the case for XDP when acting on
just one queue.

> +BPF_CALL_1(bpf_xdp_xsk_redirect, struct xdp_buff *, xdp)
> +{
> +       struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> +       struct xdp_sock *xsk;
> +
> +       xsk = READ_ONCE(xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk);
> +       if (xsk) {
> +               ri->xsk = xsk;
> +               return XDP_REDIRECT;
> +       }
> +
> +       return XDP_PASS;
> +}
>
> Why do every normal XDP_PASS packet have to pay this overhead (indirect
> call), when someone loads an AF_XDP socket program?  The AF_XDP socket
> is tied hard and only relevant to a specific RX-queue (which is why we
> get a performance boost due to SPSC queues).
>
> I acknowledge there is a need for this, but this use-case shows there
> is a need for attaching XDP programs per RX-queue basis.
>

>From my AF_XDP perspective, having a program per queue would make
sense. The discussion of a per-queue has been up before, and I think
the conclusion was that it would be too complex from a
configuration/tooling point-of-view. Again, for AF_XDP this would be
great.

When we started to hack on AF_PACKET v4, we had some ideas of doing
the "queue slicing" on a netdev level. So, e.g. take a netdev, and
create, say, macvlans that took over parts of parents queues
(something in line of what John did with NETIF_F_HW_L2FW_DOFFLOAD for
macvlan) and then use the macvlan interface as the dedicated AF_XDP
interface.

Personally, I like the current queue slicing model, and having a way
of loading an XDP program per queue would be nice -- unless the UX for
the poor sysadmin will be terrible. :-)


Björn

> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets
  2018-12-07 13:42 ` [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets Jesper Dangaard Brouer
  2018-12-07 14:01   ` Björn Töpel
@ 2018-12-07 15:27   ` Björn Töpel
  1 sibling, 0 replies; 24+ messages in thread
From: Björn Töpel @ 2018-12-07 15:27 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Karlsson, Magnus, Magnus Karlsson, ast, Daniel Borkmann, Netdev,
	Björn Töpel, William Tu, Zhang, Qi Z

Den fre 7 dec. 2018 kl 14:42 skrev Jesper Dangaard Brouer <brouer@redhat.com>:
>
> On Fri,  7 Dec 2018 12:44:24 +0100
> Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> > The rationale behind attach is performance and ease of use. Many XDP
> > socket users just need a simple way of creating/binding a socket and
> > receiving frames right away without loading an XDP program.
> >
> > XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> > is a kernel provided XDP program that is installed to the netdev when
> > XDP_ATTACH is being passed as a bind() flag.
> >
> > The builtin program is the simplest program possible to redirect a
> > frame to an attached socket. In restricted C it would look like this:
> >
> >   SEC("xdp")
> >   int xdp_prog(struct xdp_md *ctx)
> >   {
> >         return bpf_xsk_redirect(ctx);
> >   }
> >
> > The builtin program loaded via XDP_ATTACH behaves, from an
> > install-to-netdev/uninstall-from-netdev point of view, differently
> > from regular XDP programs. The easiest way to look at it is as a
> > 2-level hierarchy, where regular XDP programs has precedence over the
> > builtin one.
> >
> > If no regular XDP program is installed to the netdev, the builtin will
> > be install. If the builtin program is installed, and a regular is
> > installed, regular XDP program will have precedence over the builtin
> > one.
> >
> > Further, if a regular program is installed, and later removed, the
> > builtin one will automatically be installed.
> >
> > The sxdp_flags field of struct sockaddr_xdp gets two new options
> > XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
> > corresponding XDP netlink install flags.
> >
> > The builtin XDP program functionally adds even more complexity to the
> > already hard to read dev_change_xdp_fd. Maybe it would be simpler to
> > store the program in the struct net_device together with the install
> > flags instead of calling the ndo_bpf multiple times?
>
> (As far as I can see from reading the code, correct me if I'm wrong.)
>
> If an AF_XDP program uses XDP_ATTACH, then it installs the
> builtin-program as the XDP program on the "entire" device.  That means
> all RX-queues will call this XDP-bpf program (indirect call), and it is
> actually only relevant for the specific queue_index.  Yes, the helper
> call does check that the 'xdp->rxq->queue_index' for an attached 'xsk'
> and return XDP_PASS if it is NULL:
>

A side-note: What one can do, which I did for the plumbers work, is
bypassing the indirect call in bpf_prog_run_xdp by doing a "if the XDP
program is a builtin, call internal_bpf_xsk_redirect directly". Then,
the XDP_PASS path wont be taxed by the indirect call -- but only for
this special XDP_ATTACH program. And you'll still get an additional
if-statement in for the skb-path.


Björn

> +BPF_CALL_1(bpf_xdp_xsk_redirect, struct xdp_buff *, xdp)
> +{
> +       struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> +       struct xdp_sock *xsk;
> +
> +       xsk = READ_ONCE(xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk);
> +       if (xsk) {
> +               ri->xsk = xsk;
> +               return XDP_REDIRECT;
> +       }
> +
> +       return XDP_PASS;
> +}
>
> Why do every normal XDP_PASS packet have to pay this overhead (indirect
> call), when someone loads an AF_XDP socket program?  The AF_XDP socket
> is tied hard and only relevant to a specific RX-queue (which is why we
> get a performance boost due to SPSC queues).
>
> I acknowledge there is a need for this, but this use-case shows there
> is a need for attaching XDP programs per RX-queue basis.
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets
  2018-12-07 11:44 [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets Björn Töpel
                   ` (6 preceding siblings ...)
  2018-12-07 13:42 ` [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets Jesper Dangaard Brouer
@ 2018-12-07 21:21 ` Alexei Starovoitov
  2018-12-08  9:31   ` Björn Töpel
  2018-12-08 15:12   ` Jesper Dangaard Brouer
       [not found] ` <20181207114431.18038-8-bjorn.topel@gmail.com>
  2018-12-10 14:01 ` [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets Björn Töpel
  9 siblings, 2 replies; 24+ messages in thread
From: Alexei Starovoitov @ 2018-12-07 21:21 UTC (permalink / raw)
  To: Björn Töpel
  Cc: magnus.karlsson, magnus.karlsson, ast, daniel, netdev,
	Björn Töpel, brouer, u9012063, qi.z.zhang

On Fri, Dec 07, 2018 at 12:44:24PM +0100, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> Hi!
> 
> This patch set adds support for a new XDP socket bind option,
> XDP_ATTACH.
> 
> The rationale behind attach is performance and ease of use. Many XDP
> socket users just need a simple way of creating/binding a socket and
> receiving frames right away without loading an XDP program.
> 
> XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> is a kernel provided XDP program that is installed to the netdev when
> XDP_ATTACH is being passed as a bind() flag.
> 
> The builtin program is the simplest program possible to redirect a
> frame to an attached socket. In restricted C it would look like this:
>     
>   SEC("xdp")
>   int xdp_prog(struct xdp_md *ctx)
>   {
>         return bpf_xsk_redirect(ctx);
>   }
>     
> The builtin program loaded via XDP_ATTACH behaves, from an
> install-to-netdev/uninstall-from-netdev point of view, differently
> from regular XDP programs. The easiest way to look at it is as a
> 2-level hierarchy, where regular XDP programs has precedence over the
> builtin one.

The feature makes sense to me.
May be XDP_ATTACH_BUILTIN would be a better name ?
Also I think it needs another parameter to say which builtin
program to use.
This unconditional xsk_redirect is fine for performance
benchmarking, but for production I suspect the users would want
an easy way to stay safe when they're playing with AF_XDP.
So another builtin program that redirects ssh and ping traffic
back to the kernel would be a nice addition.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets
  2018-12-07 21:21 ` Alexei Starovoitov
@ 2018-12-08  9:31   ` Björn Töpel
  2018-12-08 15:12   ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 24+ messages in thread
From: Björn Töpel @ 2018-12-08  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Karlsson, Magnus, Magnus Karlsson, ast, Daniel Borkmann, Netdev,
	Björn Töpel, Jesper Dangaard Brouer, William Tu, Zhang,
	Qi Z

Den fre 7 dec. 2018 kl 22:21 skrev Alexei Starovoitov
<alexei.starovoitov@gmail.com>:
>
> On Fri, Dec 07, 2018 at 12:44:24PM +0100, Björn Töpel wrote:
> > From: Björn Töpel <bjorn.topel@intel.com>
> >
> > Hi!
> >
> > This patch set adds support for a new XDP socket bind option,
> > XDP_ATTACH.
> >
> > The rationale behind attach is performance and ease of use. Many XDP
> > socket users just need a simple way of creating/binding a socket and
> > receiving frames right away without loading an XDP program.
> >
> > XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> > is a kernel provided XDP program that is installed to the netdev when
> > XDP_ATTACH is being passed as a bind() flag.
> >
> > The builtin program is the simplest program possible to redirect a
> > frame to an attached socket. In restricted C it would look like this:
> >
> >   SEC("xdp")
> >   int xdp_prog(struct xdp_md *ctx)
> >   {
> >         return bpf_xsk_redirect(ctx);
> >   }
> >
> > The builtin program loaded via XDP_ATTACH behaves, from an
> > install-to-netdev/uninstall-from-netdev point of view, differently
> > from regular XDP programs. The easiest way to look at it is as a
> > 2-level hierarchy, where regular XDP programs has precedence over the
> > builtin one.
>
> The feature makes sense to me.
> May be XDP_ATTACH_BUILTIN would be a better name ?

Yes, agree, or maybe XDP_BUILTIN_ATTACH? Regardless, I'll change the
name for the next revision.

> Also I think it needs another parameter to say which builtin
> program to use.

Yup, I had a plan to add the parameter when it's actually more than
*one* builtin, but you're right, let's do it right away. I'll add a
builtin prog enum field to the struct sockaddr_xdp.

> This unconditional xsk_redirect is fine for performance
> benchmarking, but for production I suspect the users would want
> an easy way to stay safe when they're playing with AF_XDP.

For setups that don't direct the flows explicitly by HW filters,  yes!

> So another builtin program that redirects ssh and ping traffic
> back to the kernel would be a nice addition.
>

I suspect AF_XDP users would prefer redirecting packets to the kernel
via the CPUMAP instead of XDP_PASS -- not paying for the ipstack on
the AF_XDP core. Another builtin would be a tcpdump-like behavior, but
that would require an XDP clone (which Magnus is actually
experimenting with!).

I'll address your input and get back with a new revision. Thanks for
spending time on the series!


Björn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH bpf-next 7/7] samples: bpf: add support for XDP_ATTACH to xdpsock
       [not found] ` <20181207114431.18038-8-bjorn.topel@gmail.com>
@ 2018-12-08 10:08   ` Zhang, Qi Z
  0 siblings, 0 replies; 24+ messages in thread
From: Zhang, Qi Z @ 2018-12-08 10:08 UTC (permalink / raw)
  To: Björn Töpel, Karlsson, Magnus, magnus.karlsson, ast,
	daniel, netdev
  Cc: Topel, Bjorn, brouer, u9012063

Hi:

Seems there are only 6 patches of the patch set in patchwork

https://patchwork.ozlabs.org/project/netdev/list/?submitter=70569
https://patchwork.ozlabs.org/project/netdev/list/?series=80389

Anyone can help to check why patch 7/7 is missing?

Thanks
Qi

> -----Original Message-----
> From: Björn Töpel [mailto:bjorn.topel@gmail.com]
> Sent: Friday, December 7, 2018 7:45 PM
> To: bjorn.topel@gmail.com; Karlsson, Magnus <magnus.karlsson@intel.com>;
> magnus.karlsson@gmail.com; ast@kernel.org; daniel@iogearbox.net;
> netdev@vger.kernel.org
> Cc: Topel, Bjorn <bjorn.topel@intel.com>; brouer@redhat.com;
> u9012063@gmail.com; Zhang, Qi Z <qi.z.zhang@intel.com>
> Subject: [PATCH bpf-next 7/7] samples: bpf: add support for XDP_ATTACH to
> xdpsock
> 
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> Teach the sample xdpsock application about XDP_ATTACH.
> 
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  samples/bpf/xdpsock_user.c | 108 +++++++++++++++++++++++--------------
>  1 file changed, 67 insertions(+), 41 deletions(-)
> 
> diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c index
> 57ecadc58403..12f908b60468 100644
> --- a/samples/bpf/xdpsock_user.c
> +++ b/samples/bpf/xdpsock_user.c
> @@ -633,7 +633,8 @@ static void int_exit(int sig)  {
>  	(void)sig;
>  	dump_stats();
> -	bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);
> +	if (!(opt_xdp_bind_flags & XDP_ATTACH))
> +		bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);
>  	exit(EXIT_SUCCESS);
>  }
> 
> @@ -650,6 +651,7 @@ static struct option long_options[] = {
>  	{"interval", required_argument, 0, 'n'},
>  	{"zero-copy", no_argument, 0, 'z'},
>  	{"copy", no_argument, 0, 'c'},
> +	{"attach", no_argument, 0, 'a'},
>  	{0, 0, 0, 0}
>  };
> 
> @@ -670,6 +672,7 @@ static void usage(const char *prog)
>  		"  -n, --interval=n	Specify statistics update interval (default 1
> sec).\n"
>  		"  -z, --zero-copy      Force zero-copy mode.\n"
>  		"  -c, --copy           Force copy mode.\n"
> +		"  -a, --attach         XDP_ATTACH mode.\n"
>  		"\n";
>  	fprintf(stderr, str, prog);
>  	exit(EXIT_FAILURE);
> @@ -682,7 +685,7 @@ static void parse_command_line(int argc, char **argv)
>  	opterr = 0;
> 
>  	for (;;) {
> -		c = getopt_long(argc, argv, "rtli:q:psSNn:cz", long_options,
> +		c = getopt_long(argc, argv, "rtli:q:psSNn:cza", long_options,
>  				&option_index);
>  		if (c == -1)
>  			break;
> @@ -725,11 +728,22 @@ static void parse_command_line(int argc, char
> **argv)
>  		case 'c':
>  			opt_xdp_bind_flags |= XDP_COPY;
>  			break;
> +		case 'a':
> +			opt_xdp_bind_flags |= XDP_ATTACH;
> +			break;
>  		default:
>  			usage(basename(argv[0]));
>  		}
>  	}
> 
> +	if (opt_xdp_bind_flags & XDP_ATTACH) {
> +		if (opt_xdp_flags & XDP_FLAGS_DRV_MODE)
> +			opt_xdp_bind_flags |= XDP_BUILTIN_DRV_MODE;
> +		if (opt_xdp_flags & XDP_FLAGS_SKB_MODE)
> +			opt_xdp_bind_flags |= XDP_BUILTIN_SKB_MODE;
> +
> +	}
> +
>  	opt_ifindex = if_nametoindex(opt_if);
>  	if (!opt_ifindex) {
>  		fprintf(stderr, "ERROR: interface \"%s\" does not exist\n", @@
> -903,7 +917,7 @@ int main(int argc, char **argv)
>  	struct bpf_prog_load_attr prog_load_attr = {
>  		.prog_type	= BPF_PROG_TYPE_XDP,
>  	};
> -	int prog_fd, qidconf_map, xsks_map;
> +	int prog_fd, qidconf_map, xsks_map = -1;
>  	struct bpf_object *obj;
>  	char xdp_filename[256];
>  	struct bpf_map *map;
> @@ -918,59 +932,71 @@ int main(int argc, char **argv)
>  		exit(EXIT_FAILURE);
>  	}
> 
> -	snprintf(xdp_filename, sizeof(xdp_filename), "%s_kern.o", argv[0]);
> -	prog_load_attr.file = xdp_filename;
> +	if (!(opt_xdp_bind_flags & XDP_ATTACH)) {
> +		snprintf(xdp_filename, sizeof(xdp_filename), "%s_kern.o",
> +			 argv[0]);
> +		prog_load_attr.file = xdp_filename;
> 
> -	if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
> -		exit(EXIT_FAILURE);
> -	if (prog_fd < 0) {
> -		fprintf(stderr, "ERROR: no program found: %s\n",
> -			strerror(prog_fd));
> -		exit(EXIT_FAILURE);
> -	}
> +		if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
> +			exit(EXIT_FAILURE);
> +		if (prog_fd < 0) {
> +			fprintf(stderr, "ERROR: no program found: %s\n",
> +				strerror(prog_fd));
> +			exit(EXIT_FAILURE);
> +		}
> 
> -	map = bpf_object__find_map_by_name(obj, "qidconf_map");
> -	qidconf_map = bpf_map__fd(map);
> -	if (qidconf_map < 0) {
> -		fprintf(stderr, "ERROR: no qidconf map found: %s\n",
> -			strerror(qidconf_map));
> -		exit(EXIT_FAILURE);
> -	}
> +		map = bpf_object__find_map_by_name(obj, "qidconf_map");
> +		qidconf_map = bpf_map__fd(map);
> +		if (qidconf_map < 0) {
> +			fprintf(stderr, "ERROR: no qidconf map found: %s\n",
> +				strerror(qidconf_map));
> +			exit(EXIT_FAILURE);
> +		}
> 
> -	map = bpf_object__find_map_by_name(obj, "xsks_map");
> -	xsks_map = bpf_map__fd(map);
> -	if (xsks_map < 0) {
> -		fprintf(stderr, "ERROR: no xsks map found: %s\n",
> -			strerror(xsks_map));
> -		exit(EXIT_FAILURE);
> -	}
> +		map = bpf_object__find_map_by_name(obj, "xsks_map");
> +		xsks_map = bpf_map__fd(map);
> +		if (xsks_map < 0) {
> +			fprintf(stderr, "ERROR: no xsks map found: %s\n",
> +				strerror(xsks_map));
> +			exit(EXIT_FAILURE);
> +		}
> 
> -	if (bpf_set_link_xdp_fd(opt_ifindex, prog_fd, opt_xdp_flags) < 0) {
> -		fprintf(stderr, "ERROR: link set xdp fd failed\n");
> -		exit(EXIT_FAILURE);
> -	}
> +		if (bpf_set_link_xdp_fd(opt_ifindex, prog_fd,
> +					opt_xdp_flags) < 0) {
> +			fprintf(stderr, "ERROR: link set xdp fd failed\n");
> +			exit(EXIT_FAILURE);
> +		}
> 
> -	ret = bpf_map_update_elem(qidconf_map, &key, &opt_queue, 0);
> -	if (ret) {
> -		fprintf(stderr, "ERROR: bpf_map_update_elem qidconf\n");
> -		exit(EXIT_FAILURE);
> +		ret = bpf_map_update_elem(qidconf_map, &key, &opt_queue, 0);
> +		if (ret) {
> +			fprintf(stderr, "ERROR: bpf_map_update_elem qidconf\n");
> +			exit(EXIT_FAILURE);
> +		}
>  	}
> 
>  	/* Create sockets... */
>  	xsks[num_socks++] = xsk_configure(NULL);
> 
>  #if RR_LB
> +	if (opt_xdp_bind_flags & XDP_ATTACH)
> +		exit(EXIT_FAILURE);
> +
>  	for (i = 0; i < MAX_SOCKS - 1; i++)
>  		xsks[num_socks++] = xsk_configure(xsks[0]->umem);  #endif
> 
> -	/* ...and insert them into the map. */
> -	for (i = 0; i < num_socks; i++) {
> -		key = i;
> -		ret = bpf_map_update_elem(xsks_map, &key, &xsks[i]->sfd, 0);
> -		if (ret) {
> -			fprintf(stderr, "ERROR: bpf_map_update_elem %d\n", i);
> -			exit(EXIT_FAILURE);
> +	if (!(opt_xdp_bind_flags & XDP_ATTACH)) {
> +		/* ...and insert them into the map. */
> +		for (i = 0; i < num_socks; i++) {
> +			key = i;
> +			ret = bpf_map_update_elem(xsks_map, &key,
> +						  &xsks[i]->sfd, 0);
> +			if (ret) {
> +				fprintf(stderr,
> +					"ERROR: bpf_map_update_elem %d\n",
> +					i);
> +				exit(EXIT_FAILURE);
> +			}
>  		}
>  	}
> 
> --
> 2.19.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets
  2018-12-07 14:01   ` Björn Töpel
@ 2018-12-08 14:52     ` Jesper Dangaard Brouer
  2018-12-08 18:43       ` Björn Töpel
  0 siblings, 1 reply; 24+ messages in thread
From: Jesper Dangaard Brouer @ 2018-12-08 14:52 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Karlsson, Magnus, Magnus Karlsson, ast, Daniel Borkmann, Netdev,
	Björn Töpel, William Tu, Zhang, Qi Z, brouer

On Fri, 7 Dec 2018 15:01:55 +0100
Björn Töpel <bjorn.topel@gmail.com> wrote:

> Den fre 7 dec. 2018 kl 14:42 skrev Jesper Dangaard Brouer <brouer@redhat.com>:
> >
> > On Fri,  7 Dec 2018 12:44:24 +0100
> > Björn Töpel <bjorn.topel@gmail.com> wrote:
> >  
> > > The rationale behind attach is performance and ease of use. Many XDP
> > > socket users just need a simple way of creating/binding a socket and
> > > receiving frames right away without loading an XDP program.
> > >
> > > XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> > > is a kernel provided XDP program that is installed to the netdev when
> > > XDP_ATTACH is being passed as a bind() flag.
> > >
> > > The builtin program is the simplest program possible to redirect a
> > > frame to an attached socket. In restricted C it would look like this:
> > >
> > >   SEC("xdp")
> > >   int xdp_prog(struct xdp_md *ctx)
> > >   {
> > >         return bpf_xsk_redirect(ctx);
> > >   }
> > >
> > > The builtin program loaded via XDP_ATTACH behaves, from an
> > > install-to-netdev/uninstall-from-netdev point of view, differently
> > > from regular XDP programs. The easiest way to look at it is as a
> > > 2-level hierarchy, where regular XDP programs has precedence over the
> > > builtin one.
> > >
> > > If no regular XDP program is installed to the netdev, the builtin will
> > > be install. If the builtin program is installed, and a regular is
> > > installed, regular XDP program will have precedence over the builtin
> > > one.
> > >
> > > Further, if a regular program is installed, and later removed, the
> > > builtin one will automatically be installed.
> > >
> > > The sxdp_flags field of struct sockaddr_xdp gets two new options
> > > XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
> > > corresponding XDP netlink install flags.
> > >
> > > The builtin XDP program functionally adds even more complexity to the
> > > already hard to read dev_change_xdp_fd. Maybe it would be simpler to
> > > store the program in the struct net_device together with the install
> > > flags instead of calling the ndo_bpf multiple times?  
> >
> > (As far as I can see from reading the code, correct me if I'm wrong.)
> >
> > If an AF_XDP program uses XDP_ATTACH, then it installs the
> > builtin-program as the XDP program on the "entire" device.  That means
> > all RX-queues will call this XDP-bpf program (indirect call), and it is
> > actually only relevant for the specific queue_index.  Yes, the helper
> > call does check that the 'xdp->rxq->queue_index' for an attached 'xsk'
> > and return XDP_PASS if it is NULL:
> >  
> 
> Yes, you are correct. The builtin XDP program, just like a regular XDP
> program, affects the whole netdev. So, yes the non-AF_XDP queues would
> get a performance hit from this. Just to reiterate -- this isn't new
> for this series. This has always been the case for XDP when acting on
> just one queue.
> 
> > +BPF_CALL_1(bpf_xdp_xsk_redirect, struct xdp_buff *, xdp)
> > +{
> > +       struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> > +       struct xdp_sock *xsk;
> > +
> > +       xsk = READ_ONCE(xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk);
> > +       if (xsk) {
> > +               ri->xsk = xsk;
> > +               return XDP_REDIRECT;
> > +       }
> > +
> > +       return XDP_PASS;
> > +}
> >
> > Why do every normal XDP_PASS packet have to pay this overhead (indirect
> > call), when someone loads an AF_XDP socket program?  The AF_XDP socket
> > is tied hard and only relevant to a specific RX-queue (which is why we
> > get a performance boost due to SPSC queues).
> >
> > I acknowledge there is a need for this, but this use-case shows there
> > is a need for attaching XDP programs per RX-queue basis.
> >  
> 
> From my AF_XDP perspective, having a program per queue would make
> sense. The discussion of a per-queue has been up before, and I think
> the conclusion was that it would be too complex from a
> configuration/tooling point-of-view. Again, for AF_XDP this would be
> great.

I really think it is about time that we introduce these per-queue XDP
programs.  From a configuration/tooling point-of-view, your proposed
solution have the same issues, we have to solve. (Like listing
introspection need to show the per-queue XDP/BPF programs). And you have
almost solved it, by keeping the per-queue program in net_device
struct.  

Alexei already requested more types of builtin programs.  But as the
XDP-driver API can only load XDP for the entire NIC, then how can you
use two builtin programs at the same time?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets
  2018-12-07 21:21 ` Alexei Starovoitov
  2018-12-08  9:31   ` Björn Töpel
@ 2018-12-08 15:12   ` Jesper Dangaard Brouer
  2018-12-08 16:55     ` Andrew Lunn
  2018-12-08 18:50     ` Björn Töpel
  1 sibling, 2 replies; 24+ messages in thread
From: Jesper Dangaard Brouer @ 2018-12-08 15:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, ast,
	daniel, netdev, Björn Töpel, u9012063, qi.z.zhang,
	brouer

On Fri, 7 Dec 2018 13:21:08 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> for production I suspect the users would want
> an easy way to stay safe when they're playing with AF_XDP.
> So another builtin program that redirects ssh and ping traffic
> back to the kernel would be a nice addition.

Are you saying a buildin program that need to parse different kinds of
Eth-type headers (DSA, VLAN, QinqQ) and find the TCP port to match port
22 to return XDP_PASS, or else call AF_XDP redurect. That seems to be
pure overhead for this fast-path buildin program for AF_XDP.

Would a solution be to install a NIC hardware filter that redirect SSH
port 22 to another RX-queue. And then have a buildin program that
returns XDP_PASS installed on that RX-queue.   And change Bjørns
semantics, such that RX-queue programs takes precedence over the global
XDP program. This would also be a good fail safe in general for XDP.

If the RX-queues take precedence, I can use this fail safe approach.
E.g. when I want to test my new global XDP program, I'll use ethtool
match my management IP and send that to a specific RX-queue and my
fail-safe BPF program.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets
  2018-12-08 15:12   ` Jesper Dangaard Brouer
@ 2018-12-08 16:55     ` Andrew Lunn
  2018-12-08 20:37       ` Jesper Dangaard Brouer
  2018-12-08 18:50     ` Björn Töpel
  1 sibling, 1 reply; 24+ messages in thread
From: Andrew Lunn @ 2018-12-08 16:55 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Alexei Starovoitov, Björn Töpel, magnus.karlsson,
	magnus.karlsson, ast, daniel, netdev, Björn Töpel,
	u9012063, qi.z.zhang

On Sat, Dec 08, 2018 at 04:12:12PM +0100, Jesper Dangaard Brouer wrote:
> On Fri, 7 Dec 2018 13:21:08 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> > for production I suspect the users would want
> > an easy way to stay safe when they're playing with AF_XDP.
> > So another builtin program that redirects ssh and ping traffic
> > back to the kernel would be a nice addition.
> 
> Are you saying a buildin program that need to parse different kinds of
> Eth-type headers (DSA, VLAN, QinqQ)

Hi Jesper

Parsing DSA headers is quite hard, since most of them are not
discoverable, you need to know they are there, and where they actually
are.

I also don't think there are any use cases for actually trying to
parse them on the master interface. You are more likely to want to run
the eBPF program on the slave interface, once the DSA header has been
removed, and it is clear what interface the frame is actually for.

	 Andrew

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets
  2018-12-08 14:52     ` Jesper Dangaard Brouer
@ 2018-12-08 18:43       ` Björn Töpel
  2018-12-08 20:42         ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 24+ messages in thread
From: Björn Töpel @ 2018-12-08 18:43 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Karlsson, Magnus, Magnus Karlsson, ast, Daniel Borkmann, Netdev,
	Björn Töpel, William Tu, Zhang, Qi Z

Den lör 8 dec. 2018 kl 15:52 skrev Jesper Dangaard Brouer <brouer@redhat.com>:
>
> On Fri, 7 Dec 2018 15:01:55 +0100
> Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> > Den fre 7 dec. 2018 kl 14:42 skrev Jesper Dangaard Brouer <brouer@redhat.com>:
> > >
> > > On Fri,  7 Dec 2018 12:44:24 +0100
> > > Björn Töpel <bjorn.topel@gmail.com> wrote:
> > >
> > > > The rationale behind attach is performance and ease of use. Many XDP
> > > > socket users just need a simple way of creating/binding a socket and
> > > > receiving frames right away without loading an XDP program.
> > > >
> > > > XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> > > > is a kernel provided XDP program that is installed to the netdev when
> > > > XDP_ATTACH is being passed as a bind() flag.
> > > >
> > > > The builtin program is the simplest program possible to redirect a
> > > > frame to an attached socket. In restricted C it would look like this:
> > > >
> > > >   SEC("xdp")
> > > >   int xdp_prog(struct xdp_md *ctx)
> > > >   {
> > > >         return bpf_xsk_redirect(ctx);
> > > >   }
> > > >
> > > > The builtin program loaded via XDP_ATTACH behaves, from an
> > > > install-to-netdev/uninstall-from-netdev point of view, differently
> > > > from regular XDP programs. The easiest way to look at it is as a
> > > > 2-level hierarchy, where regular XDP programs has precedence over the
> > > > builtin one.
> > > >
> > > > If no regular XDP program is installed to the netdev, the builtin will
> > > > be install. If the builtin program is installed, and a regular is
> > > > installed, regular XDP program will have precedence over the builtin
> > > > one.
> > > >
> > > > Further, if a regular program is installed, and later removed, the
> > > > builtin one will automatically be installed.
> > > >
> > > > The sxdp_flags field of struct sockaddr_xdp gets two new options
> > > > XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
> > > > corresponding XDP netlink install flags.
> > > >
> > > > The builtin XDP program functionally adds even more complexity to the
> > > > already hard to read dev_change_xdp_fd. Maybe it would be simpler to
> > > > store the program in the struct net_device together with the install
> > > > flags instead of calling the ndo_bpf multiple times?
> > >
> > > (As far as I can see from reading the code, correct me if I'm wrong.)
> > >
> > > If an AF_XDP program uses XDP_ATTACH, then it installs the
> > > builtin-program as the XDP program on the "entire" device.  That means
> > > all RX-queues will call this XDP-bpf program (indirect call), and it is
> > > actually only relevant for the specific queue_index.  Yes, the helper
> > > call does check that the 'xdp->rxq->queue_index' for an attached 'xsk'
> > > and return XDP_PASS if it is NULL:
> > >
> >
> > Yes, you are correct. The builtin XDP program, just like a regular XDP
> > program, affects the whole netdev. So, yes the non-AF_XDP queues would
> > get a performance hit from this. Just to reiterate -- this isn't new
> > for this series. This has always been the case for XDP when acting on
> > just one queue.
> >
> > > +BPF_CALL_1(bpf_xdp_xsk_redirect, struct xdp_buff *, xdp)
> > > +{
> > > +       struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> > > +       struct xdp_sock *xsk;
> > > +
> > > +       xsk = READ_ONCE(xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk);
> > > +       if (xsk) {
> > > +               ri->xsk = xsk;
> > > +               return XDP_REDIRECT;
> > > +       }
> > > +
> > > +       return XDP_PASS;
> > > +}
> > >
> > > Why do every normal XDP_PASS packet have to pay this overhead (indirect
> > > call), when someone loads an AF_XDP socket program?  The AF_XDP socket
> > > is tied hard and only relevant to a specific RX-queue (which is why we
> > > get a performance boost due to SPSC queues).
> > >
> > > I acknowledge there is a need for this, but this use-case shows there
> > > is a need for attaching XDP programs per RX-queue basis.
> > >
> >
> > From my AF_XDP perspective, having a program per queue would make
> > sense. The discussion of a per-queue has been up before, and I think
> > the conclusion was that it would be too complex from a
> > configuration/tooling point-of-view. Again, for AF_XDP this would be
> > great.
>
> I really think it is about time that we introduce these per-queue XDP
> programs.  From a configuration/tooling point-of-view, your proposed
> solution have the same issues, we have to solve. (Like listing
> introspection need to show the per-queue XDP/BPF programs). And you have
> almost solved it, by keeping the per-queue program in net_device
> struct.
>

I just to emphasize; My series is only a convenience method for
loading a (sticky/hierarchical) XDP program and a new bpf-function.
Nothing more. I'm not sure what you mean when you say that "proposed
solution have the same issues". Do you mean XDP in general?

*I'm* all in for a per-queue XDP program -- but I might be biased due
to AF_XDP :-P Let's explore in this thread what that would look like
and the semantics of that!

> Alexei already requested more types of builtin programs.  But as the
> XDP-driver API can only load XDP for the entire NIC, then how can you
> use two builtin programs at the same time?
>

You can't. Again, this is just another way of loading an XDP program.
If socket A loads a builtin B on dev C, and socket X tries to load a
builtin Y on dev C this will fail/clash. This is not different from
today. My view on builtin has been "an XDP program provided by the
kernel". A related problem is if socket A attaches with XDP_DRV and
socket B attaches with XDP_SKB on the same netdev. (Ick, this is the
messy parts of XDP. :-()

> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets
  2018-12-08 15:12   ` Jesper Dangaard Brouer
  2018-12-08 16:55     ` Andrew Lunn
@ 2018-12-08 18:50     ` Björn Töpel
  1 sibling, 0 replies; 24+ messages in thread
From: Björn Töpel @ 2018-12-08 18:50 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Alexei Starovoitov, Karlsson, Magnus, Magnus Karlsson, ast,
	Daniel Borkmann, Netdev, Björn Töpel, William Tu,
	Zhang, Qi Z

Den lör 8 dec. 2018 kl 16:12 skrev Jesper Dangaard Brouer <brouer@redhat.com>:
>
> On Fri, 7 Dec 2018 13:21:08 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> > for production I suspect the users would want
> > an easy way to stay safe when they're playing with AF_XDP.
> > So another builtin program that redirects ssh and ping traffic
> > back to the kernel would be a nice addition.
>
> Are you saying a buildin program that need to parse different kinds of
> Eth-type headers (DSA, VLAN, QinqQ) and find the TCP port to match port
> 22 to return XDP_PASS, or else call AF_XDP redurect. That seems to be
> pure overhead for this fast-path buildin program for AF_XDP.
>
> Would a solution be to install a NIC hardware filter that redirect SSH
> port 22 to another RX-queue. And then have a buildin program that
> returns XDP_PASS installed on that RX-queue.   And change Bjørns
> semantics, such that RX-queue programs takes precedence over the global
> XDP program. This would also be a good fail safe in general for XDP.
>

Exactly this; I'd say this is the most common way of using AF_XDP,
i.e. steer a certain flow to a Rx queue, and *all* packets on that
queue are pulled to the AF_XDP socket. This is why the builtin is the
way it is, it's what is being used, not only for benchmarking
scenarios.

> If the RX-queues take precedence, I can use this fail safe approach.
> E.g. when I want to test my new global XDP program, I'll use ethtool
> match my management IP and send that to a specific RX-queue and my
> fail-safe BPF program.
>

Interesting take to have a *per queue* XDP program that overrides the
regular one. Maybe this is a better way to use builtins?

> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets
  2018-12-08 16:55     ` Andrew Lunn
@ 2018-12-08 20:37       ` Jesper Dangaard Brouer
  2018-12-08 20:48         ` Andrew Lunn
  0 siblings, 1 reply; 24+ messages in thread
From: Jesper Dangaard Brouer @ 2018-12-08 20:37 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Alexei Starovoitov, Björn Töpel, magnus.karlsson,
	magnus.karlsson, ast, daniel, netdev, Björn Töpel,
	u9012063, qi.z.zhang, brouer

On Sat, 8 Dec 2018 17:55:05 +0100
Andrew Lunn <andrew@lunn.ch> wrote:

> On Sat, Dec 08, 2018 at 04:12:12PM +0100, Jesper Dangaard Brouer wrote:
> > On Fri, 7 Dec 2018 13:21:08 -0800
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >   
> > > for production I suspect the users would want
> > > an easy way to stay safe when they're playing with AF_XDP.
> > > So another builtin program that redirects ssh and ping traffic
> > > back to the kernel would be a nice addition.  
> > 
> > Are you saying a buildin program that need to parse different kinds of
> > Eth-type headers (DSA, VLAN, QinqQ)  
> 
> Hi Jesper
> 
> Parsing DSA headers is quite hard, since most of them are not
> discoverable, you need to know they are there, and where they actually
> are.
> 
> I also don't think there are any use cases for actually trying to
> parse them on the master interface. You are more likely to want to run
> the eBPF program on the slave interface, once the DSA header has been
> removed, and it is clear what interface the frame is actually for.

That is not how XDP works. XDP must run on the "master" physical driver
interface, as the "slave" interface is a virtual DSA interface.  I did
mention DSA because I'm handling that on the EspressoBin implementation.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets
  2018-12-08 18:43       ` Björn Töpel
@ 2018-12-08 20:42         ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 24+ messages in thread
From: Jesper Dangaard Brouer @ 2018-12-08 20:42 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Karlsson, Magnus, Magnus Karlsson, ast, Daniel Borkmann, Netdev,
	Björn Töpel, William Tu, Zhang, Qi Z, brouer

On Sat, 8 Dec 2018 19:43:38 +0100
Björn Töpel <bjorn.topel@gmail.com> wrote:

> Den lör 8 dec. 2018 kl 15:52 skrev Jesper Dangaard Brouer <brouer@redhat.com>:
> >
> > On Fri, 7 Dec 2018 15:01:55 +0100
> > Björn Töpel <bjorn.topel@gmail.com> wrote:
> >  
> > > Den fre 7 dec. 2018 kl 14:42 skrev Jesper Dangaard Brouer <brouer@redhat.com>:  
> > > >
> > > > On Fri,  7 Dec 2018 12:44:24 +0100
> > > > Björn Töpel <bjorn.topel@gmail.com> wrote:
> > > >  
> > > > > The rationale behind attach is performance and ease of use. Many XDP
> > > > > socket users just need a simple way of creating/binding a socket and
> > > > > receiving frames right away without loading an XDP program.
> > > > >
> > > > > XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> > > > > is a kernel provided XDP program that is installed to the netdev when
> > > > > XDP_ATTACH is being passed as a bind() flag.
> > > > >
> > > > > The builtin program is the simplest program possible to redirect a
> > > > > frame to an attached socket. In restricted C it would look like this:
> > > > >
> > > > >   SEC("xdp")
> > > > >   int xdp_prog(struct xdp_md *ctx)
> > > > >   {
> > > > >         return bpf_xsk_redirect(ctx);
> > > > >   }
> > > > >
> > > > > The builtin program loaded via XDP_ATTACH behaves, from an
> > > > > install-to-netdev/uninstall-from-netdev point of view, differently
> > > > > from regular XDP programs. The easiest way to look at it is as a
> > > > > 2-level hierarchy, where regular XDP programs has precedence over the
> > > > > builtin one.
> > > > >
> > > > > If no regular XDP program is installed to the netdev, the builtin will
> > > > > be install. If the builtin program is installed, and a regular is
> > > > > installed, regular XDP program will have precedence over the builtin
> > > > > one.
> > > > >
> > > > > Further, if a regular program is installed, and later removed, the
> > > > > builtin one will automatically be installed.
> > > > >
> > > > > The sxdp_flags field of struct sockaddr_xdp gets two new options
> > > > > XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
> > > > > corresponding XDP netlink install flags.
> > > > >
> > > > > The builtin XDP program functionally adds even more complexity to the
> > > > > already hard to read dev_change_xdp_fd. Maybe it would be simpler to
> > > > > store the program in the struct net_device together with the install
> > > > > flags instead of calling the ndo_bpf multiple times?  
> > > >
> > > > (As far as I can see from reading the code, correct me if I'm wrong.)
> > > >
> > > > If an AF_XDP program uses XDP_ATTACH, then it installs the
> > > > builtin-program as the XDP program on the "entire" device.  That means
> > > > all RX-queues will call this XDP-bpf program (indirect call), and it is
> > > > actually only relevant for the specific queue_index.  Yes, the helper
> > > > call does check that the 'xdp->rxq->queue_index' for an attached 'xsk'
> > > > and return XDP_PASS if it is NULL:
> > > >  
> > >
> > > Yes, you are correct. The builtin XDP program, just like a regular XDP
> > > program, affects the whole netdev. So, yes the non-AF_XDP queues would
> > > get a performance hit from this. Just to reiterate -- this isn't new
> > > for this series. This has always been the case for XDP when acting on
> > > just one queue.
> > >  
> > > > +BPF_CALL_1(bpf_xdp_xsk_redirect, struct xdp_buff *, xdp)
> > > > +{
> > > > +       struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> > > > +       struct xdp_sock *xsk;
> > > > +
> > > > +       xsk = READ_ONCE(xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk);
> > > > +       if (xsk) {
> > > > +               ri->xsk = xsk;
> > > > +               return XDP_REDIRECT;
> > > > +       }
> > > > +
> > > > +       return XDP_PASS;
> > > > +}
> > > >
> > > > Why do every normal XDP_PASS packet have to pay this overhead (indirect
> > > > call), when someone loads an AF_XDP socket program?  The AF_XDP socket
> > > > is tied hard and only relevant to a specific RX-queue (which is why we
> > > > get a performance boost due to SPSC queues).
> > > >
> > > > I acknowledge there is a need for this, but this use-case shows there
> > > > is a need for attaching XDP programs per RX-queue basis.
> > > >  
> > >
> > > From my AF_XDP perspective, having a program per queue would make
> > > sense. The discussion of a per-queue has been up before, and I think
> > > the conclusion was that it would be too complex from a
> > > configuration/tooling point-of-view. Again, for AF_XDP this would be
> > > great.  
> >
> > I really think it is about time that we introduce these per-queue XDP
> > programs.  From a configuration/tooling point-of-view, your proposed
> > solution have the same issues, we have to solve. (Like listing
> > introspection need to show the per-queue XDP/BPF programs). And you have
> > almost solved it, by keeping the per-queue program in net_device
> > struct.
> >  
> 
> I just to emphasize; My series is only a convenience method for
> loading a (sticky/hierarchical) XDP program and a new bpf-function.
> Nothing more. I'm not sure what you mean when you say that "proposed
> solution have the same issues". Do you mean XDP in general?
> 
> *I'm* all in for a per-queue XDP program -- but I might be biased due
> to AF_XDP :-P Let's explore in this thread what that would look like
> and the semantics of that!
>

Yes, please! (p.s. I'm going on vacation next week, which is
unfortunate as I really want to find a good semantics for this).


> > Alexei already requested more types of builtin programs.  But as the
> > XDP-driver API can only load XDP for the entire NIC, then how can
> > you use two builtin programs at the same time?
> >  
> 
> You can't. Again, this is just another way of loading an XDP program.
> If socket A loads a builtin B on dev C, and socket X tries to load a
> builtin Y on dev C this will fail/clash. This is not different from
> today. My view on builtin has been "an XDP program provided by the
> kernel". A related problem is if socket A attaches with XDP_DRV and
> socket B attaches with XDP_SKB on the same netdev. (Ick, this is the
> messy parts of XDP. :-()

This again show we need per-RX-queue XDP programs.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets
  2018-12-08 20:37       ` Jesper Dangaard Brouer
@ 2018-12-08 20:48         ` Andrew Lunn
  0 siblings, 0 replies; 24+ messages in thread
From: Andrew Lunn @ 2018-12-08 20:48 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Alexei Starovoitov, Björn Töpel, magnus.karlsson,
	magnus.karlsson, ast, daniel, netdev, Björn Töpel,
	u9012063, qi.z.zhang

> That is not how XDP works. XDP must run on the "master" physical driver
> interface, as the "slave" interface is a virtual DSA interface.  I did
> mention DSA because I'm handling that on the EspressoBin implementation.

Hi Jesper

It is going to be interesting to see how you do that.

As a user, i want to attach the XDP program to a slave interface. So i
assume you somehow redirect that to the master, with some extra meta
data to allow XDP on the master to detect packets for a specific
slave?

   Andrew

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next 6/7] xsk: load a builtin XDP program on XDP_ATTACH
  2018-12-07 11:44 ` [PATCH bpf-next 6/7] xsk: load a builtin XDP program on XDP_ATTACH Björn Töpel
@ 2018-12-10  2:17   ` Jakub Kicinski
  2018-12-10  7:29     ` Björn Töpel
  0 siblings, 1 reply; 24+ messages in thread
From: Jakub Kicinski @ 2018-12-10  2:17 UTC (permalink / raw)
  To: Björn Töpel
  Cc: magnus.karlsson, magnus.karlsson, ast, daniel, netdev,
	Björn Töpel, brouer, u9012063, qi.z.zhang

On Fri,  7 Dec 2018 12:44:30 +0100, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> This commit extends the XDP_ATTACH bind option by loading a builtin
> XDP program.
> 
> The builtin program is the simplest program possible to redirect a
> frame to an attached socket. In restricted C it would look like this:
> 
>   SEC("xdp")
>   int xdp_prog(struct xdp_md *ctx)
>   {
>   	return bpf_xsk_redirect(ctx);
>   }
> 
> For many XDP socket users, this program would be the most common one.
> 
> The builtin program loaded via XDP_ATTACH behaves, from an
> install-to-netdev/uninstall-from-netdev point of view, different from
> regular XDP programs. The easiest way to look at it is as a 2-level
> hierarchy, where regular XDP programs has precedence over the builtin
> one.
> 
> If no regular XDP program is installed to the netdev, the builtin will
> be install. If the builtin program is installed, and a regular is
> installed, the regular XDP will have precedence over the builtin one.
> 
> Further, if a regular program is installed, and later removed, the
> builtin one will automatically be installed.
> 
> The sxdp_flags field of struct sockaddr_xdp gets two new options
> XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
> corresponding XDP netlink install flags.
> 
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>

Why not add this functionality to libbpf?  Libbpf would really benefit
from better AF_XDP support, this would be a trivial part of it.

Are we going to stop here or take the next logical step of not invoking
the BPF program (and paying retpoline cost) at all if we know its the
"built-in/just redirect to the xsk" program?  Because if the answer is
"yes" we could just cut to the chase we can avoid all this unnecessary
complexity.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next 6/7] xsk: load a builtin XDP program on XDP_ATTACH
  2018-12-10  2:17   ` Jakub Kicinski
@ 2018-12-10  7:29     ` Björn Töpel
  0 siblings, 0 replies; 24+ messages in thread
From: Björn Töpel @ 2018-12-10  7:29 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Karlsson, Magnus, Magnus Karlsson, ast, Daniel Borkmann, Netdev,
	Björn Töpel, Jesper Dangaard Brouer, William Tu, Zhang,
	Qi Z

Den mån 10 dec. 2018 kl 03:17 skrev Jakub Kicinski
<jakub.kicinski@netronome.com>:
>
> On Fri,  7 Dec 2018 12:44:30 +0100, Björn Töpel wrote:
> > From: Björn Töpel <bjorn.topel@intel.com>
> >
> > This commit extends the XDP_ATTACH bind option by loading a builtin
> > XDP program.
> >
> > The builtin program is the simplest program possible to redirect a
> > frame to an attached socket. In restricted C it would look like this:
> >
> >   SEC("xdp")
> >   int xdp_prog(struct xdp_md *ctx)
> >   {
> >       return bpf_xsk_redirect(ctx);
> >   }
> >
> > For many XDP socket users, this program would be the most common one.
> >
> > The builtin program loaded via XDP_ATTACH behaves, from an
> > install-to-netdev/uninstall-from-netdev point of view, different from
> > regular XDP programs. The easiest way to look at it is as a 2-level
> > hierarchy, where regular XDP programs has precedence over the builtin
> > one.
> >
> > If no regular XDP program is installed to the netdev, the builtin will
> > be install. If the builtin program is installed, and a regular is
> > installed, the regular XDP will have precedence over the builtin one.
> >
> > Further, if a regular program is installed, and later removed, the
> > builtin one will automatically be installed.
> >
> > The sxdp_flags field of struct sockaddr_xdp gets two new options
> > XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
> > corresponding XDP netlink install flags.
> >
> > Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>
> Why not add this functionality to libbpf?  Libbpf would really benefit
> from better AF_XDP support, this would be a trivial part of it.
>

Hmm, maybe this would be a better intermediate step (given the
discussions on per-queue programs/builtins). I.e. the first 3 patches
just assigning an AF_XDP socket to a Rx queue, and the
bpf_xsk_redirect, and supply the builtin program as part of libbpf.

Regarding AF_XDP and libbpf -- we're planning to post the first AF_XDP
patches this week.

> Are we going to stop here or take the next logical step of not invoking
> the BPF program (and paying retpoline cost) at all if we know its the
> "built-in/just redirect to the xsk" program?  Because if the answer is
> "yes" we could just cut to the chase we can avoid all this unnecessary
> complexity.

A next step could be avoiding the retpoline by a builtin check in
bpf_prog_run_xdp (unless you're HW offloaded). However, we do not want
to bypass the BPF control-plane! The indirect call in bpf_prog_run_xdp
is really a performance killer. A batching bpf_prog_run_xdp version
would help some, but it wouldn't be as good as removing the call.

What I don't like about my series is having raw BPF-programs (flow of
instructions instructions) in the kernel. It's messy and hard to read.
Maybe it would be possible to teach the verifier to detect that the
program is "a trivial pass everything to a socket". :-)

As you say, making the XDP loading more complex does leave a bad taste
in the mouth.


Björn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets
  2018-12-07 11:44 [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets Björn Töpel
                   ` (8 preceding siblings ...)
       [not found] ` <20181207114431.18038-8-bjorn.topel@gmail.com>
@ 2018-12-10 14:01 ` Björn Töpel
  9 siblings, 0 replies; 24+ messages in thread
From: Björn Töpel @ 2018-12-10 14:01 UTC (permalink / raw)
  To: Karlsson, Magnus, Magnus Karlsson, ast, Daniel Borkmann, Netdev,
	Jakub Kicinski
  Cc: Björn Töpel, Jesper Dangaard Brouer, William Tu, Zhang, Qi Z

Den fre 7 dec. 2018 kl 12:44 skrev Björn Töpel <bjorn.topel@gmail.com>:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> Hi!
>
> This patch set adds support for a new XDP socket bind option,
> XDP_ATTACH.
>
> The rationale behind attach is performance and ease of use. Many XDP
> socket users just need a simple way of creating/binding a socket and
> receiving frames right away without loading an XDP program.
>
> XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> is a kernel provided XDP program that is installed to the netdev when
> XDP_ATTACH is being passed as a bind() flag.
>
> The builtin program is the simplest program possible to redirect a
> frame to an attached socket. In restricted C it would look like this:
>
>   SEC("xdp")
>   int xdp_prog(struct xdp_md *ctx)
>   {
>         return bpf_xsk_redirect(ctx);
>   }
>
> The builtin program loaded via XDP_ATTACH behaves, from an
> install-to-netdev/uninstall-from-netdev point of view, differently
> from regular XDP programs. The easiest way to look at it is as a
> 2-level hierarchy, where regular XDP programs has precedence over the
> builtin one.
>
> If no regular XDP program is installed to the netdev, the builtin will
> be install. If the builtin program is installed, and a regular is
> installed, regular XDP program will have precedence over the builtin
> one.
>
> Further, if a regular program is installed, and later removed, the
> builtin one will automatically be installed.
>
> The sxdp_flags field of struct sockaddr_xdp gets two new options
> XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
> corresponding XDP netlink install flags.
>
> The builtin XDP program functionally adds even more complexity to the
> already hard to read dev_change_xdp_fd. Maybe it would be simpler to
> store the program in the struct net_device together with the install
> flags instead of calling the ndo_bpf multiple times?
>
> The outline of the series is as following:
>   patch 1-2: Introduce the first part of XDP_ATTACH, simply adding
>              the socket to the netdev structure.
>   patch 3:   Add a new BPF function, bpf_xsk_redirect, that
>              redirects a frame to an attached socket.
>   patch 4-5: Preparatory commits for built in BPF programs
>   patch 6:   Make XDP_ATTACH load a builtin XDP program
>   patch 7:   Extend the samples application with XDP_ATTACH
>              support
>
> Patch 1 through 3 gives the performance boost and make it possible to
> use AF_XDP sockets without an XSKMAP, but still requires an explicit
> XDP program to be loaded.
>
> Patch 4 through 6 make it possible to use XDP socket without explictly
> loading an XDP program.
>
> The performance numbers for rxdrop (Intel(R) Xeon(R) Gold 6154 CPU @
> 3.00GHz):
>
> XDP_SKB:
> XSKMAP:     2.8 Mpps
> XDP_ATTACH: 2.9 Mpps
>
> XDP_DRV - copy:
> XSKMAP:     8.5 Mpps
> XDP_ATTACH: 9.3 Mpps
>
> XDP_DRV - zero-copy:
> XSKMAP:     15.1 Mpps
> XDP_ATTACH: 17.3 Mpps
>
> Thanks!
> Björn
>

Firstly, thanks for all the feedback on the series.

I will do a respin of the series, but without the builtin
functionality, IOW only the XDP_ATTACH bind() option to assign an XDP
socket to a netdev Rx queue, and the corresponding bpf_xsk_redirect
(patch 1 to 3). This will give a perfomance boost, and also requires a
less complicated XDP program.

The builtin/sticky XDP program has too many down-sides:

* Explicit BPF instructions in the kernel.
* Composability: mix regular BPF programs with builtin ones.
* The sticky XDP program is not intuitive for a user, e.g. "ip link
  dev eth0 xdp off" will not disable the sticky one
* More complex XDP install code.

We'll this route instead: Introduce XDP_ATTACH/bpf_xsk_redirect (as
above) and let libbpf load the trivial AF_XDP program.


Björn

>
> Björn Töpel (7):
>   xsk: simplify AF_XDP socket teardown
>   xsk: add XDP_ATTACH bind() flag
>   bpf: add bpf_xsk_redirect function
>   bpf: prepare for builtin bpf program
>   bpf: add function to load builtin BPF program
>   xsk: load a builtin XDP program on XDP_ATTACH
>   samples: bpf: add support for XDP_ATTACH to xdpsock
>
>  include/linux/bpf.h         |   2 +
>  include/linux/filter.h      |   4 +
>  include/linux/netdevice.h   |  11 +++
>  include/net/xdp_sock.h      |   2 +
>  include/trace/events/xdp.h  |  61 +++++++++++++++
>  include/uapi/linux/bpf.h    |  14 +++-
>  include/uapi/linux/if_xdp.h |   9 ++-
>  kernel/bpf/syscall.c        |  91 ++++++++++++++--------
>  net/core/dev.c              |  84 +++++++++++++++++++--
>  net/core/filter.c           | 100 ++++++++++++++++++++++++
>  net/xdp/xsk.c               | 146 +++++++++++++++++++++++++++++-------
>  samples/bpf/xdpsock_user.c  | 108 ++++++++++++++++----------
>  12 files changed, 524 insertions(+), 108 deletions(-)
>
> --
> 2.19.1
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-12-10 14:01 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-07 11:44 [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets Björn Töpel
2018-12-07 11:44 ` [PATCH bpf-next 1/7] xsk: simplify AF_XDP socket teardown Björn Töpel
2018-12-07 11:44 ` [PATCH bpf-next 2/7] xsk: add XDP_ATTACH bind() flag Björn Töpel
2018-12-07 11:44 ` [PATCH bpf-next 3/7] bpf: add bpf_xsk_redirect function Björn Töpel
2018-12-07 11:44 ` [PATCH bpf-next 4/7] bpf: prepare for builtin bpf program Björn Töpel
2018-12-07 11:44 ` [PATCH bpf-next 5/7] bpf: add function to load builtin BPF program Björn Töpel
2018-12-07 11:44 ` [PATCH bpf-next 6/7] xsk: load a builtin XDP program on XDP_ATTACH Björn Töpel
2018-12-10  2:17   ` Jakub Kicinski
2018-12-10  7:29     ` Björn Töpel
2018-12-07 13:42 ` [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets Jesper Dangaard Brouer
2018-12-07 14:01   ` Björn Töpel
2018-12-08 14:52     ` Jesper Dangaard Brouer
2018-12-08 18:43       ` Björn Töpel
2018-12-08 20:42         ` Jesper Dangaard Brouer
2018-12-07 15:27   ` Björn Töpel
2018-12-07 21:21 ` Alexei Starovoitov
2018-12-08  9:31   ` Björn Töpel
2018-12-08 15:12   ` Jesper Dangaard Brouer
2018-12-08 16:55     ` Andrew Lunn
2018-12-08 20:37       ` Jesper Dangaard Brouer
2018-12-08 20:48         ` Andrew Lunn
2018-12-08 18:50     ` Björn Töpel
     [not found] ` <20181207114431.18038-8-bjorn.topel@gmail.com>
2018-12-08 10:08   ` [PATCH bpf-next 7/7] samples: bpf: add support for XDP_ATTACH to xdpsock Zhang, Qi Z
2018-12-10 14:01 ` [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets Björn Töpel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).