* [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx
@ 2021-11-16  7:37 Ciara Loftus
  2021-11-16  7:37 ` [RFC PATCH bpf-next 1/8] xsk: add struct xdp_sock to netdev_rx_queue Ciara Loftus
                   ` (9 more replies)
  0 siblings, 10 replies; 17+ messages in thread
From: Ciara Loftus @ 2021-11-16  7:37 UTC (permalink / raw)
  To: netdev, bpf
  Cc: ast, daniel, davem, kuba, hawk, john.fastabend, toke, bjorn,
	magnus.karlsson, jonathan.lemon, maciej.fijalkowski,
	Ciara Loftus

The common case for AF_XDP sockets (xsks) is creating a single xsk on a queue for sending and
receiving frames, as this is analogous to HW packet steering through RSS and other classification
methods in the NIC. AF_XDP uses the XDP redirect infrastructure to direct packets to the socket.
That infrastructure was designed for the much more complicated case of DEVMAP redirects, which
direct traffic to another netdev and thus potentially another driver. In the xsk redirect case,
skipping the unnecessary parts of this common code significantly improves performance and paves
the way for batching in the driver. This RFC proposes one such way to simplify the infrastructure,
which yields a 27% increase in throughput and a reduction of 24 cycles per packet [1]. The goal of
this RFC is to start a discussion on how best to simplify the single-socket datapath while
providing one method as an example.

Current approach:
1. XSK pointer: an xsk is created and a handle to the xsk is stored in the XSKMAP.
2. XDP program: the bpf_redirect_map helper triggers the XSKMAP lookup, which stores the result
(handle to the xsk) and the map type (XSKMAP) in the percpu bpf_redirect_info struct. The
XDP_REDIRECT action is returned. (A minimal program for this path is sketched after this list.)
3. XDP_REDIRECT handling called by the driver: the map type (XSKMAP) is read from the
bpf_redirect_info, which selects the xsk_map_redirect path. The xsk pointer is retrieved from the
bpf_redirect_info and the XDP descriptor is pushed to the xsk's Rx ring. The socket is added to a
list for flushing later.
4. xdp_do_flush: iterate through the flush lists of all map types that can be used for redirect
(CPUMAP, DEVMAP and XSKMAP). When the XSKMAP list is flushed, go through all xsks that had any
traffic redirected to them and bump the Rx ring head pointer(s).
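
For reference, a program for the current path is equivalent to the default program libbpf loads
on recent kernels (assuming an XSKMAP named xsks_map):

SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
{
	return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, XDP_PASS);
}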

For the end goal of submitting the descriptor to the Rx ring and bumping the head pointer of that
ring, only some of these steps are needed. The rest is overhead. The bpf_redirect_map
infrastructure is needed for all other redirect operations, but is not necessary when redirecting
to a single AF_XDP socket. Similarly, flushing the list for every map type in step 4 is not
necessary when only one socket needs to be flushed.

Proposed approach:
1. XSK pointer: an xsk is created and a handle to the xsk is stored both in the XSKMAP and in the
netdev_rx_queue struct.
2. XDP program: the new bpf_redirect_xsk helper returns the new XDP_REDIRECT_XSK action (see the
sketch after this list).
3. XDP_REDIRECT_XSK handling called by the driver: the xsk pointer is retrieved from the
netdev_rx_queue struct and the XDP descriptor is pushed to the xsk's Rx ring.
4. xsk_flush: fetch the handle from the netdev_rx_queue and flush the xsk.
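
With the proposed helper, the program becomes the following (this is the C equivalent of the
default program added in patch 8 of this series):

SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
{
	return bpf_redirect_xsk(ctx, &xsks_map, ctx->rx_queue_index, XDP_PASS);
}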

This fast path is triggered on XDP_REDIRECT_XSK if:
  (i) AF_XDP socket SW Rx ring configured
 (ii) Exactly one xsk attached to the queue
If either condition is not met, fall back to the same behavior as the original approach:
xdp_redirect_map. This is handled under the hood in the new bpf_redirect_xsk helper so the user
does not need to be aware of these conditions.
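
In the kernel, the check boils down to a single refcount read (taken from patch 2 of this
series); a value of exactly one implies both conditions hold:

#ifdef CONFIG_XDP_SOCKETS
	if (likely(refcount_read(&xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk_refcnt) == 1))
		return XDP_REDIRECT_XSK;
#endif
	return map->ops->map_redirect(map, ifindex, flags);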

Batching:
With this new approach it is possible to optimize the driver by submitting a batch of descriptors
to the Rx ring in step 3, simply by verifying that the action returned from the program run for
every packet in a batch equals XDP_REDIRECT_XSK. With this action we know the socket to redirect
to is the same for each packet in the batch (see the sketch below). This is not possible with
XDP_REDIRECT because the xsk pointer is stored in the bpf_redirect_info and is not guaranteed to
be the same for every packet in a batch.
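
A condensed sketch of that verification, based on the i40e code in patch 7 (bufs holds the
xdp_buffs of the batch; error handling trimmed):

	u32 last_act = XDP_REDIRECT_XSK;
	int runs = 0, err;

	/* run the program on each buffer; stop early on any other action */
	while (runs < I40E_DESCS_PER_BATCH && last_act == XDP_REDIRECT_XSK)
		last_act = bpf_prog_run_xdp(xdp_prog, bufs[runs++]);

	if (likely(runs == I40E_DESCS_PER_BATCH)) {
		/* the whole batch hit the fast path: push all descriptors in one call */
		err = xsk_rcv_batch(xs, bufs, I40E_DESCS_PER_BATCH);
	} else {
		/* fall back to handling the result of each program run individually */
	}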

[1] Performance:
The benchmarks were performed on a VM running on a 2.4GHz Ice Lake host with an i40e device passed
through. The xdpsock app was run on a single core with busy polling and configured in 'rxonly' mode:
./xdpsock -i <iface> -r -B
The improvement in throughput when using the new bpf helper and XDP action was measured at ~13% for
scalar processing, with a reduction of ~13 cycles per packet. A further ~14% improvement in
throughput and a reduction of ~11 cycles per packet were measured when the batched i40e driver path
was used, for a total improvement of ~27% in throughput and a reduction of ~24 cycles per packet.

Other approaches considered:
Two other approaches were considered, the advantage of both being that neither involves
introducing a new XDP action. The first was to create a new map type
BPF_MAP_TYPE_XSKMAP_DIRECT. When the XDP_REDIRECT action was returned, this map type could be
checked and used as an indicator to skip the map lookup and use the netdev_rx_queue xsk instead.
The second approach was similar and involved using a new bpf_redirect_info flag in the same
fashion.
While both approaches yielded a performance improvement, they were measured at about half of what
was measured for the approach outlined in this RFC. It seems that accessing the bpf_redirect_info
is too expensive.
Centralised processing of XDP actions was also investigated. This would involve porting all drivers
to a common interface for handling XDP actions, which would greatly simplify the work involved in
adding support for new XDP actions such as XDP_REDIRECT_XSK. However, it was deemed at this point
to be more complex than adding support for the new action to every driver. Should this series be
considered worth pursuing for a proper patch set, the intention would be to update each driver
individually.

Thank you to Magnus Karlsson and Maciej Fijalkowski for several suggestions and insight provided.

TODO:
* Add selftest(s)
* Add support for all copy and zero copy drivers
* Libxdp support

The series applies on top of commit e5043894b21f ("bpftool: Use libbpf_get_error() to check error")

Thanks,
Ciara

Ciara Loftus (8):
  xsk: add struct xdp_sock to netdev_rx_queue
  bpf: add bpf_redirect_xsk helper and XDP_REDIRECT_XSK action
  xsk: handle XDP_REDIRECT_XSK and expose xsk_rcv/flush
  i40e: handle the XDP_REDIRECT_XSK action
  xsk: implement a batched version of xsk_rcv
  i40e: isolate descriptor processing in separate function
  i40e: introduce batched XDP rx descriptor processing
  libbpf: use bpf_redirect_xsk in the default program

 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  13 +-
 .../ethernet/intel/i40e/i40e_txrx_common.h    |   1 +
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 285 +++++++++++++++---
 include/linux/netdevice.h                     |   2 +
 include/net/xdp_sock_drv.h                    |  49 +++
 include/net/xsk_buff_pool.h                   |  22 ++
 include/uapi/linux/bpf.h                      |  13 +
 kernel/bpf/verifier.c                         |   7 +-
 net/core/dev.c                                |  14 +
 net/core/filter.c                             |  26 ++
 net/xdp/xsk.c                                 |  69 ++++-
 net/xdp/xsk_queue.h                           |  31 ++
 tools/include/uapi/linux/bpf.h                |  13 +
 tools/lib/bpf/xsk.c                           |  50 ++-
 14 files changed, 551 insertions(+), 44 deletions(-)

-- 
2.17.1



* [RFC PATCH bpf-next 1/8] xsk: add struct xdp_sock to netdev_rx_queue
  2021-11-16  7:37 [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx Ciara Loftus
@ 2021-11-16  7:37 ` Ciara Loftus
  2021-11-16  7:37 ` [RFC PATCH bpf-next 2/8] bpf: add bpf_redirect_xsk helper and XDP_REDIRECT_XSK action Ciara Loftus
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Ciara Loftus @ 2021-11-16  7:37 UTC (permalink / raw)
  To: netdev, bpf
  Cc: ast, daniel, davem, kuba, hawk, john.fastabend, toke, bjorn,
	magnus.karlsson, jonathan.lemon, maciej.fijalkowski,
	Ciara Loftus

Storing a reference to the XDP socket in the netdev_rx_queue structure
makes a single socket accessible without requiring a lookup in the XSKMAP.
A future commit will introduce the XDP_REDIRECT_XSK action which
indicates to use this reference instead of performing the lookup. Since
an rx ring is required for redirection, only store the reference if an
rx ring is configured.

When multiple sockets exist for a given context (netdev, qid), a
reference is not stored because in this case we fall back to the default
behavior of using the XSKMAP to redirect the packets.
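
The xsk_refcnt introduced below effectively encodes three states for a queue (this is just a
summary of the bind/unbind logic in the diff):

	0  - no socket with an Rx ring is bound; rxq->xsk == NULL
	1  - exactly one such socket is bound;   rxq->xsk == that socket (fast path eligible)
	>1 - multiple sockets are bound;         rxq->xsk == NULL (fall back to the XSKMAP)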

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 include/linux/netdevice.h |  2 ++
 net/xdp/xsk.c             | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3ec42495a43a..1ad2491f0391 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -736,6 +736,8 @@ struct netdev_rx_queue {
 	struct net_device		*dev;
 #ifdef CONFIG_XDP_SOCKETS
 	struct xsk_buff_pool            *pool;
+	struct xdp_sock			*xsk;
+	refcount_t			xsk_refcnt;
 #endif
 } ____cacheline_aligned_in_smp;
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index f16074eb53c7..94ee524b9ca8 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -728,6 +728,30 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
 
 	/* Wait for driver to stop using the xdp socket. */
 	xp_del_xsk(xs->pool, xs);
+	if (xs->rx) {
+		if (refcount_read(&dev->_rx[xs->queue_id].xsk_refcnt) == 1) {
+			refcount_set(&dev->_rx[xs->queue_id].xsk_refcnt, 0);
+			WRITE_ONCE(xs->dev->_rx[xs->queue_id].xsk, NULL);
+		} else {
+			refcount_dec(&dev->_rx[xs->queue_id].xsk_refcnt);
+			/* If the refcnt returns to one again store the reference to the
+			 * remaining socket in the netdev_rx_queue.
+			 */
+			if (refcount_read(&dev->_rx[xs->queue_id].xsk_refcnt) == 1) {
+				struct net *net = dev_net(dev);
+				struct xdp_sock *xsk;
+				struct sock *sk;
+
+				mutex_lock(&net->xdp.lock);
+				sk = sk_head(&net->xdp.list);
+				xsk = xdp_sk(sk);
+				mutex_lock(&xsk->mutex);
+				WRITE_ONCE(xs->dev->_rx[xs->queue_id].xsk, xsk);
+				mutex_unlock(&xsk->mutex);
+				mutex_unlock(&net->xdp.lock);
+			}
+		}
+	}
 	xs->dev = NULL;
 	synchronize_net();
 	dev_put(dev);
@@ -972,6 +996,16 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 	xs->queue_id = qid;
 	xp_add_xsk(xs->pool, xs);
 
+	if (xs->rx) {
+		if (refcount_read(&dev->_rx[xs->queue_id].xsk_refcnt) == 0) {
+			WRITE_ONCE(dev->_rx[qid].xsk, xs);
+			refcount_set(&dev->_rx[qid].xsk_refcnt, 1);
+		} else {
+			refcount_inc(&dev->_rx[qid].xsk_refcnt);
+			WRITE_ONCE(dev->_rx[qid].xsk, NULL);
+		}
+	}
+
 out_unlock:
 	if (err) {
 		dev_put(dev);
-- 
2.17.1



* [RFC PATCH bpf-next 2/8] bpf: add bpf_redirect_xsk helper and XDP_REDIRECT_XSK action
  2021-11-16  7:37 [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx Ciara Loftus
  2021-11-16  7:37 ` [RFC PATCH bpf-next 1/8] xsk: add struct xdp_sock to netdev_rx_queue Ciara Loftus
@ 2021-11-16  7:37 ` Ciara Loftus
  2021-11-16  7:37 ` [RFC PATCH bpf-next 3/8] xsk: handle XDP_REDIRECT_XSK and expose xsk_rcv/flush Ciara Loftus
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Ciara Loftus @ 2021-11-16  7:37 UTC (permalink / raw)
  To: netdev, bpf
  Cc: ast, daniel, davem, kuba, hawk, john.fastabend, toke, bjorn,
	magnus.karlsson, jonathan.lemon, maciej.fijalkowski,
	Ciara Loftus

Add a new XDP redirect helper called bpf_redirect_xsk which simply
returns the new XDP_REDIRECT_XSK action if the xsk refcnt for the
netdev_rx_queue is equal to one. Checking this value verifies that the
AF_XDP socket Rx ring is configured and there is exactly one xsk attached
to the queue.

XDP_REDIRECT_XSK indicates to the driver that the XSKMAP lookup can be
skipped and the pointer to the socket to redirect to can instead be
retrieved from the netdev_rx_queue on which the packet was received.
If the aforementioned conditions are not met, fall back to the behavior of
xdp_redirect_map, which returns XDP_REDIRECT for a successful XSKMAP lookup.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 include/uapi/linux/bpf.h | 13 +++++++++++++
 kernel/bpf/verifier.c    |  7 ++++++-
 net/core/filter.c        | 22 ++++++++++++++++++++++
 3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6297eafdc40f..a33cc63c8e6f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4957,6 +4957,17 @@ union bpf_attr {
  *		**-ENOENT** if *task->mm* is NULL, or no vma contains *addr*.
  *		**-EBUSY** if failed to try lock mmap_lock.
  *		**-EINVAL** for invalid **flags**.
+ *
+ * long bpf_redirect_xsk(void *ctx, struct bpf_map *map, u32 key, u64 flags)
+ *	Description
+ *		Redirect the packet to the XDP socket associated with the netdev queue if
+ *		the socket has an rx ring configured and is the only socket attached to the
+ *		queue. Fall back to bpf_redirect_map behavior if either condition is not met.
+ *	Return
+ *		**XDP_REDIRECT_XSK** if successful.
+ *
+ *		**XDP_REDIRECT** if the fallback was successful, or the value of the
+ *		two lower bits of the *flags* argument on error.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5140,6 +5151,7 @@ union bpf_attr {
 	FN(skc_to_unix_sock),		\
 	FN(kallsyms_lookup_name),	\
 	FN(find_vma),			\
+	FN(redirect_xsk),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -5520,6 +5532,7 @@ enum xdp_action {
 	XDP_PASS,
 	XDP_TX,
 	XDP_REDIRECT,
+	XDP_REDIRECT_XSK,
 };
 
 /* user accessible metadata for XDP packet hook
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d31a031ab377..59a973f43965 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5526,7 +5526,8 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		break;
 	case BPF_MAP_TYPE_XSKMAP:
 		if (func_id != BPF_FUNC_redirect_map &&
-		    func_id != BPF_FUNC_map_lookup_elem)
+		    func_id != BPF_FUNC_map_lookup_elem &&
+		    func_id != BPF_FUNC_redirect_xsk)
 			goto error;
 		break;
 	case BPF_MAP_TYPE_ARRAY_OF_MAPS:
@@ -5629,6 +5630,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		    map->map_type != BPF_MAP_TYPE_XSKMAP)
 			goto error;
 		break;
+	case BPF_FUNC_redirect_xsk:
+		if (map->map_type != BPF_MAP_TYPE_XSKMAP)
+			goto error;
+		break;
 	case BPF_FUNC_sk_redirect_map:
 	case BPF_FUNC_msg_redirect_map:
 	case BPF_FUNC_sock_map_update:
diff --git a/net/core/filter.c b/net/core/filter.c
index 46f09a8fba20..4497ad046790 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4140,6 +4140,26 @@ static const struct bpf_func_proto bpf_xdp_redirect_map_proto = {
 	.arg3_type      = ARG_ANYTHING,
 };
 
+BPF_CALL_4(bpf_xdp_redirect_xsk, struct xdp_buff *, xdp, struct bpf_map *, map,
+	   u32, ifindex, u64, flags)
+{
+#ifdef CONFIG_XDP_SOCKETS
+	if (likely(refcount_read(&xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk_refcnt) == 1))
+		return XDP_REDIRECT_XSK;
+#endif
+	return map->ops->map_redirect(map, ifindex, flags);
+}
+
+static const struct bpf_func_proto bpf_xdp_redirect_xsk_proto = {
+	.func           = bpf_xdp_redirect_xsk,
+	.gpl_only       = false,
+	.ret_type       = RET_INTEGER,
+	.arg1_type      = ARG_PTR_TO_CTX,
+	.arg2_type      = ARG_CONST_MAP_PTR,
+	.arg3_type      = ARG_ANYTHING,
+	.arg4_type      = ARG_ANYTHING,
+};
+
 static unsigned long bpf_skb_copy(void *dst_buff, const void *skb,
 				  unsigned long off, unsigned long len)
 {
@@ -7469,6 +7489,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_xdp_redirect_proto;
 	case BPF_FUNC_redirect_map:
 		return &bpf_xdp_redirect_map_proto;
+	case BPF_FUNC_redirect_xsk:
+		return &bpf_xdp_redirect_xsk_proto;
 	case BPF_FUNC_xdp_adjust_tail:
 		return &bpf_xdp_adjust_tail_proto;
 	case BPF_FUNC_fib_lookup:
-- 
2.17.1



* [RFC PATCH bpf-next 3/8] xsk: handle XDP_REDIRECT_XSK and expose xsk_rcv/flush
  2021-11-16  7:37 [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx Ciara Loftus
  2021-11-16  7:37 ` [RFC PATCH bpf-next 1/8] xsk: add struct xdp_sock to netdev_rx_queue Ciara Loftus
  2021-11-16  7:37 ` [RFC PATCH bpf-next 2/8] bpf: add bpf_redirect_xsk helper and XDP_REDIRECT_XSK action Ciara Loftus
@ 2021-11-16  7:37 ` Ciara Loftus
  2021-11-27 11:44     ` kernel test robot
  2021-11-16  7:37 ` [RFC PATCH bpf-next 4/8] i40e: handle the XDP_REDIRECT_XSK action Ciara Loftus
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 17+ messages in thread
From: Ciara Loftus @ 2021-11-16  7:37 UTC (permalink / raw)
  To: netdev, bpf
  Cc: ast, daniel, davem, kuba, hawk, john.fastabend, toke, bjorn,
	magnus.karlsson, jonathan.lemon, maciej.fijalkowski,
	Ciara Loftus

Handle the XDP_REDIRECT_XSK action on the SKB path by retrieving the
handle to the socket from the netdev_rx_queue struct and immediately
calling the xsk_generic_rcv function. Also, prepare for supporting
this action in the drivers by exposing the xsk_rcv and xsk_flush
functions so they can be called directly from driver code.
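
A driver is then expected to use them roughly as follows (patch 4 contains the actual i40e
version; queue_id here stands for the queue index the packet arrived on):

	struct xdp_sock *xs = xsk_get_redirect_xsk(&netdev->_rx[queue_id]);

	err = xsk_rcv(xs, xdp);	/* push the XDP descriptor to the socket's Rx ring */
	...
	xsk_flush(xs);		/* later, once per poll: bump the Rx ring head pointer */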

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 include/net/xdp_sock_drv.h | 21 +++++++++++++++++++++
 net/core/dev.c             | 14 ++++++++++++++
 net/core/filter.c          |  4 ++++
 net/xdp/xsk.c              |  6 ++++--
 4 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 443d45951564..e923f5d1adb6 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -22,6 +22,8 @@ void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool);
 void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool);
 void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool);
 bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool);
+int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp);
+void xsk_flush(struct xdp_sock *xs);
 
 static inline u32 xsk_pool_get_headroom(struct xsk_buff_pool *pool)
 {
@@ -130,6 +132,11 @@ static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool,
 	xp_dma_sync_for_device(pool, dma, size);
 }
 
+static inline struct xdp_sock *xsk_get_redirect_xsk(struct netdev_rx_queue *q)
+{
+	return READ_ONCE(q->xsk);
+}
+
 #else
 
 static inline void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries)
@@ -179,6 +186,15 @@ static inline bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool)
 	return false;
 }
 
+static inline int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
+{
+	return 0;
+}
+
+static inline void xsk_flush(struct xdp_sock *xs)
+{
+}
+
 static inline u32 xsk_pool_get_headroom(struct xsk_buff_pool *pool)
 {
 	return 0;
@@ -264,6 +280,11 @@ static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool,
 {
 }
 
+static inline struct xdp_sock *xsk_get_redirect_xsk(struct netdev_rx_queue *q)
+{
+	return NULL;
+}
+
 #endif /* CONFIG_XDP_SOCKETS */
 
 #endif /* _LINUX_XDP_SOCK_DRV_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index edeb811c454e..9b38b50f1f97 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -106,6 +106,7 @@
 #include <net/pkt_sched.h>
 #include <net/pkt_cls.h>
 #include <net/checksum.h>
+#include <net/xdp_sock.h>
 #include <net/xfrm.h>
 #include <linux/highmem.h>
 #include <linux/init.h>
@@ -4771,6 +4772,7 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp,
 	 * kfree_skb in response to actions it cannot handle/XDP_DROP).
 	 */
 	switch (act) {
+	case XDP_REDIRECT_XSK:
 	case XDP_REDIRECT:
 	case XDP_TX:
 		__skb_push(skb, mac_len);
@@ -4819,6 +4821,7 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 
 	act = bpf_prog_run_generic_xdp(skb, xdp, xdp_prog);
 	switch (act) {
+	case XDP_REDIRECT_XSK:
 	case XDP_REDIRECT:
 	case XDP_TX:
 	case XDP_PASS:
@@ -4875,6 +4878,17 @@ int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
 		act = netif_receive_generic_xdp(skb, &xdp, xdp_prog);
 		if (act != XDP_PASS) {
 			switch (act) {
+#ifdef CONFIG_XDP_SOCKETS
+			case XDP_REDIRECT_XSK: {
+				struct xdp_sock *xs =
+					READ_ONCE(skb->dev->_rx[xdp.rxq->queue_index].xsk);
+				err = xsk_generic_rcv(xs, &xdp);
+				if (err)
+					goto out_redir;
+				consume_skb(skb);
+				break;
+			}
+#endif
 			case XDP_REDIRECT:
 				err = xdp_do_generic_redirect(skb->dev, skb,
 							      &xdp, xdp_prog);
diff --git a/net/core/filter.c b/net/core/filter.c
index 4497ad046790..c65262722c64 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8203,7 +8203,11 @@ static bool xdp_is_valid_access(int off, int size,
 
 void bpf_warn_invalid_xdp_action(u32 act)
 {
+#ifdef CONFIG_XDP_SOCKETS
+	const u32 act_max = XDP_REDIRECT_XSK;
+#else
 	const u32 act_max = XDP_REDIRECT;
+#endif
 
 	WARN_ONCE(1, "%s XDP return value %u, expect packet loss!\n",
 		  act > act_max ? "Illegal" : "Driver unsupported",
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 94ee524b9ca8..ce004f5fae64 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -226,12 +226,13 @@ static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp)
 	return 0;
 }
 
-static void xsk_flush(struct xdp_sock *xs)
+void xsk_flush(struct xdp_sock *xs)
 {
 	xskq_prod_submit(xs->rx);
 	__xskq_cons_release(xs->pool->fq);
 	sock_def_readable(&xs->sk);
 }
+EXPORT_SYMBOL(xsk_flush);
 
 int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
@@ -247,7 +248,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 	return err;
 }
 
-static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
+int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
 	int err;
 	u32 len;
@@ -266,6 +267,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 		xdp_return_buff(xdp);
 	return err;
 }
+EXPORT_SYMBOL(xsk_rcv);
 
 int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
-- 
2.17.1



* [RFC PATCH bpf-next 4/8] i40e: handle the XDP_REDIRECT_XSK action
  2021-11-16  7:37 [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx Ciara Loftus
                   ` (2 preceding siblings ...)
  2021-11-16  7:37 ` [RFC PATCH bpf-next 3/8] xsk: handle XDP_REDIRECT_XSK and expose xsk_rcv/flush Ciara Loftus
@ 2021-11-16  7:37 ` Ciara Loftus
  2021-11-16  7:37 ` [RFC PATCH bpf-next 5/8] xsk: implement a batched version of xsk_rcv Ciara Loftus
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Ciara Loftus @ 2021-11-16  7:37 UTC (permalink / raw)
  To: netdev, bpf
  Cc: ast, daniel, davem, kuba, hawk, john.fastabend, toke, bjorn,
	magnus.karlsson, jonathan.lemon, maciej.fijalkowski,
	Ciara Loftus

If the BPF program returns XDP_REDIRECT_XSK, obtain the pointer to the
socket from the netdev_rx_queue struct and call the newly exposed xsk_rcv
function to push the XDP descriptor to the Rx ring. Then use xsk_flush to
flush the socket.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 13 +++++++++++-
 .../ethernet/intel/i40e/i40e_txrx_common.h    |  1 +
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 21 +++++++++++++------
 3 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 10a83e5385c7..b6a883a8d088 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -4,6 +4,7 @@
 #include <linux/prefetch.h>
 #include <linux/bpf_trace.h>
 #include <net/xdp.h>
+#include <net/xdp_sock_drv.h>
 #include "i40e.h"
 #include "i40e_trace.h"
 #include "i40e_prototype.h"
@@ -2296,6 +2297,7 @@ static int i40e_run_xdp(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 	int err, result = I40E_XDP_PASS;
 	struct i40e_ring *xdp_ring;
 	struct bpf_prog *xdp_prog;
+	struct xdp_sock *xs;
 	u32 act;
 
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
@@ -2315,6 +2317,12 @@ static int i40e_run_xdp(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 		if (result == I40E_XDP_CONSUMED)
 			goto out_failure;
 		break;
+	case XDP_REDIRECT_XSK:
+		xs = xsk_get_redirect_xsk(&rx_ring->netdev->_rx[xdp->rxq->queue_index]);
+		err = xsk_rcv(xs, xdp);
+		if (err)
+			goto out_failure;
+		return I40E_XDP_REDIR_XSK;
 	case XDP_REDIRECT:
 		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
 		if (err)
@@ -2401,6 +2409,9 @@ void i40e_update_rx_stats(struct i40e_ring *rx_ring,
  **/
 void i40e_finalize_xdp_rx(struct i40e_ring *rx_ring, unsigned int xdp_res)
 {
+	if (xdp_res & I40E_XDP_REDIR_XSK)
+		xsk_flush(xsk_get_redirect_xsk(&rx_ring->netdev->_rx[rx_ring->queue_index]));
+
 	if (xdp_res & I40E_XDP_REDIR)
 		xdp_do_flush_map();
 
@@ -2516,7 +2527,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 		}
 
 		if (xdp_res) {
-			if (xdp_res & (I40E_XDP_TX | I40E_XDP_REDIR)) {
+			if (xdp_res & (I40E_XDP_TX | I40E_XDP_REDIR | I40E_XDP_REDIR_XSK)) {
 				xdp_xmit |= xdp_res;
 				i40e_rx_buffer_flip(rx_ring, rx_buffer, size);
 			} else {
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
index 19da3b22160f..17e521a71201 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
@@ -20,6 +20,7 @@ void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val);
 #define I40E_XDP_CONSUMED	BIT(0)
 #define I40E_XDP_TX		BIT(1)
 #define I40E_XDP_REDIR		BIT(2)
+#define I40E_XDP_REDIR_XSK	BIT(3)
 
 /*
  * build_ctob - Builds the Tx descriptor (cmd, offset and type) qword
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index ea06e957393e..31b794672ea5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -144,13 +144,14 @@ int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool,
  * @rx_ring: Rx ring
  * @xdp: xdp_buff used as input to the XDP program
  *
- * Returns any of I40E_XDP_{PASS, CONSUMED, TX, REDIR}
+ * Returns any of I40E_XDP_{PASS, CONSUMED, TX, REDIR, REDIR_XSK}
  **/
 static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 {
 	int err, result = I40E_XDP_PASS;
 	struct i40e_ring *xdp_ring;
 	struct bpf_prog *xdp_prog;
+	struct xdp_sock *xs;
 	u32 act;
 
 	/* NB! xdp_prog will always be !NULL, due to the fact that
@@ -159,14 +160,21 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
 
-	if (likely(act == XDP_REDIRECT)) {
-		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
+	if (likely(act == XDP_REDIRECT_XSK)) {
+		xs = xsk_get_redirect_xsk(&rx_ring->netdev->_rx[xdp->rxq->queue_index]);
+		err = xsk_rcv(xs, xdp);
 		if (err)
 			goto out_failure;
-		return I40E_XDP_REDIR;
+		return I40E_XDP_REDIR_XSK;
 	}
 
 	switch (act) {
+	case XDP_REDIRECT:
+		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
+		if (err)
+			goto out_failure;
+		result = I40E_XDP_REDIR;
+		break;
 	case XDP_PASS:
 		break;
 	case XDP_TX:
@@ -275,7 +283,8 @@ static void i40e_handle_xdp_result_zc(struct i40e_ring *rx_ring,
 	*rx_packets = 1;
 	*rx_bytes = size;
 
-	if (likely(xdp_res == I40E_XDP_REDIR) || xdp_res == I40E_XDP_TX)
+	if (likely(xdp_res == I40E_XDP_REDIR_XSK) || xdp_res == I40E_XDP_REDIR ||
+	    xdp_res == I40E_XDP_TX)
 		return;
 
 	if (xdp_res == I40E_XDP_CONSUMED) {
@@ -371,7 +380,7 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 					  &rx_bytes, size, xdp_res);
 		total_rx_packets += rx_packets;
 		total_rx_bytes += rx_bytes;
-		xdp_xmit |= xdp_res & (I40E_XDP_TX | I40E_XDP_REDIR);
+		xdp_xmit |= xdp_res & (I40E_XDP_TX | I40E_XDP_REDIR | I40E_XDP_REDIR_XSK);
 		next_to_clean = (next_to_clean + 1) & count_mask;
 	}
 
-- 
2.17.1



* [RFC PATCH bpf-next 5/8] xsk: implement a batched version of xsk_rcv
  2021-11-16  7:37 [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx Ciara Loftus
                   ` (3 preceding siblings ...)
  2021-11-16  7:37 ` [RFC PATCH bpf-next 4/8] i40e: handle the XDP_REDIRECT_XSK action Ciara Loftus
@ 2021-11-16  7:37 ` Ciara Loftus
  2021-11-16  7:37 ` [RFC PATCH bpf-next 6/8] i40e: isolate descriptor processing in separate function Ciara Loftus
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Ciara Loftus @ 2021-11-16  7:37 UTC (permalink / raw)
  To: netdev, bpf
  Cc: ast, daniel, davem, kuba, hawk, john.fastabend, toke, bjorn,
	magnus.karlsson, jonathan.lemon, maciej.fijalkowski,
	Ciara Loftus

Introduce a batched version of xsk_rcv called xsk_rcv_batch which takes
an array of xdp_buffs and pushes them to the Rx ring. Also introduce a
batched version of xsk_buff_dma_sync_for_cpu.
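
A rough sketch of the intended driver usage (patch 7 wires these into the i40e batched path with
a batch size of 64):

	xsk_buff_dma_sync_for_cpu_batch(bufs, rx_ring->xsk_pool, batch_size);

	err = xsk_rcv_batch(xs, bufs, batch_size);
	if (err) {
		/* e.g. not enough space in the Rx ring: handle each buffer individually */
	}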

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 include/net/xdp_sock_drv.h  | 28 ++++++++++++++++++++++++++++
 include/net/xsk_buff_pool.h | 22 ++++++++++++++++++++++
 net/xdp/xsk.c               | 29 +++++++++++++++++++++++++++++
 net/xdp/xsk_queue.h         | 31 +++++++++++++++++++++++++++++++
 4 files changed, 110 insertions(+)

diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index e923f5d1adb6..0b352d7a34af 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -23,6 +23,7 @@ void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool);
 void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool);
 bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool);
 int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp);
+int xsk_rcv_batch(struct xdp_sock *xs, struct xdp_buff **bufs, int batch_size);
 void xsk_flush(struct xdp_sock *xs);
 
 static inline u32 xsk_pool_get_headroom(struct xsk_buff_pool *pool)
@@ -125,6 +126,22 @@ static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp, struct xsk_bu
 	xp_dma_sync_for_cpu(xskb);
 }
 
+static inline void xsk_buff_dma_sync_for_cpu_batch(struct xdp_buff **bufs,
+						   struct xsk_buff_pool *pool,
+						   int batch_size)
+{
+	struct xdp_buff_xsk *xskb;
+	int i;
+
+	if (!pool->dma_need_sync)
+		return;
+
+	for (i = 0; i < batch_size; i++) {
+		xskb = container_of(*(bufs + i), struct xdp_buff_xsk, xdp);
+		xp_dma_sync_for_cpu(xskb);
+	}
+}
+
 static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool,
 						    dma_addr_t dma,
 						    size_t size)
@@ -191,6 +208,11 @@ static inline int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 	return 0;
 }
 
+static inline int xsk_rcv_batch(struct xdp_sock *xs, struct xdp_buff **bufs, int batch_size)
+{
+	return 0;
+}
+
 static inline void xsk_flush(struct xdp_sock *xs)
 {
 }
@@ -274,6 +296,12 @@ static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp, struct xsk_bu
 {
 }
 
+static inline void xsk_buff_dma_sync_for_cpu_batch(struct xdp_buff **bufs,
+						   struct xsk_buff_pool *pool,
+						   int batch_size)
+{
+}
+
 static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool,
 						    dma_addr_t dma,
 						    size_t size)
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index ddeefc4a1040..f6d76c7eaf6b 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -214,6 +214,28 @@ static inline void xp_release(struct xdp_buff_xsk *xskb)
 		xskb->pool->free_heads[xskb->pool->free_heads_cnt++] = xskb;
 }
 
+/* Release a batch of xdp_buffs back to an xdp_buff_pool.
+ * The batch of buffs must all come from the same xdp_buff_pool. This way
+ * it is safe to push the batch to the top of the free_heads stack, because
+ * at least the same amount will have been popped from the stack earlier in
+ * the datapath.
+ */
+static inline void xp_release_batch(struct xdp_buff **bufs, int batch_size)
+{
+	struct xdp_buff_xsk *xskb = container_of(*bufs, struct xdp_buff_xsk, xdp);
+	struct xsk_buff_pool *pool = xskb->pool;
+	u32 tail = pool->free_heads_cnt;
+	u32 i;
+
+	if (pool->unaligned) {
+		for (i = 0; i < batch_size; i++) {
+			xskb = container_of(*(bufs + i), struct xdp_buff_xsk, xdp);
+			pool->free_heads[tail + i] = xskb;
+		}
+		pool->free_heads_cnt += batch_size;
+	}
+}
+
 static inline u64 xp_get_handle(struct xdp_buff_xsk *xskb)
 {
 	u64 offset = xskb->xdp.data - xskb->xdp.data_hard_start;
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index ce004f5fae64..22d00173a96f 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -151,6 +151,20 @@ static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 	return 0;
 }
 
+static int __xsk_rcv_zc_batch(struct xdp_sock *xs, struct xdp_buff **bufs, int batch_size)
+{
+	int err;
+
+	err = xskq_prod_reserve_desc_batch(xs->rx, bufs, batch_size);
+	if (err) {
+		xs->rx_queue_full++;
+		return -1;
+	}
+
+	xp_release_batch(bufs, batch_size);
+	return 0;
+}
+
 static void xsk_copy_xdp(struct xdp_buff *to, struct xdp_buff *from, u32 len)
 {
 	void *from_buf, *to_buf;
@@ -269,6 +283,21 @@ int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 }
 EXPORT_SYMBOL(xsk_rcv);
 
+int xsk_rcv_batch(struct xdp_sock *xs, struct xdp_buff **bufs, int batch_size)
+{
+	int err;
+
+	err = xsk_rcv_check(xs, *bufs);
+	if (err)
+		return err;
+
+	if ((*bufs)->rxq->mem.type != MEM_TYPE_XSK_BUFF_POOL)
+		return -1;
+
+	return __xsk_rcv_zc_batch(xs, bufs, batch_size);
+}
+EXPORT_SYMBOL(xsk_rcv_batch);
+
 int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
 	struct list_head *flush_list = this_cpu_ptr(&xskmap_flush_list);
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index e9aa2c236356..3be9f4a01d77 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -338,6 +338,11 @@ static inline bool xskq_prod_is_full(struct xsk_queue *q)
 	return xskq_prod_nb_free(q, 1) ? false : true;
 }
 
+static inline bool xskq_prod_is_full_n(struct xsk_queue *q, u32 n)
+{
+	return xskq_prod_nb_free(q, n) ? false : true;
+}
+
 static inline void xskq_prod_cancel(struct xsk_queue *q)
 {
 	q->cached_prod--;
@@ -399,6 +404,32 @@ static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
 	return 0;
 }
 
+static inline int xskq_prod_reserve_desc_batch(struct xsk_queue *q, struct xdp_buff **bufs,
+					       int batch_size)
+{
+	struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
+	struct xdp_buff_xsk *xskb;
+	u64 addr;
+	u32 len;
+	u32 i;
+
+	if (xskq_prod_is_full_n(q, batch_size))
+		return -ENOSPC;
+
+	/* A, matches D */
+	for (i = 0; i < batch_size; i++) {
+		len = (*(bufs + i))->data_end - (*(bufs + i))->data;
+		xskb = container_of(*(bufs + i), struct xdp_buff_xsk, xdp);
+		addr = xp_get_handle(xskb);
+		ring->desc[(q->cached_prod + i) & q->ring_mask].addr = addr;
+		ring->desc[(q->cached_prod + i) & q->ring_mask].len = len;
+	}
+
+	q->cached_prod += batch_size;
+
+	return 0;
+}
+
 static inline void __xskq_prod_submit(struct xsk_queue *q, u32 idx)
 {
 	smp_store_release(&q->ring->producer, idx); /* B, matches C */
-- 
2.17.1



* [RFC PATCH bpf-next 6/8] i40e: isolate descriptor processing in separate function
  2021-11-16  7:37 [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx Ciara Loftus
                   ` (4 preceding siblings ...)
  2021-11-16  7:37 ` [RFC PATCH bpf-next 5/8] xsk: implement a batched version of xsk_rcv Ciara Loftus
@ 2021-11-16  7:37 ` Ciara Loftus
  2021-11-16  7:37 ` [RFC PATCH bpf-next 7/8] i40e: introduce batched XDP rx descriptor processing Ciara Loftus
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Ciara Loftus @ 2021-11-16  7:37 UTC (permalink / raw)
  To: netdev, bpf
  Cc: ast, daniel, davem, kuba, hawk, john.fastabend, toke, bjorn,
	magnus.karlsson, jonathan.lemon, maciej.fijalkowski,
	Ciara Loftus, Cristian Dumitrescu

To prepare for batched processing, first isolate descriptor processing
in a separate function to make it easier to introduce the batched
interfaces.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 51 +++++++++++++++-------
 1 file changed, 36 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 31b794672ea5..c994b4d9c38a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -323,28 +323,23 @@ static void i40e_handle_xdp_result_zc(struct i40e_ring *rx_ring,
 	WARN_ON_ONCE(1);
 }
 
-/**
- * i40e_clean_rx_irq_zc - Consumes Rx packets from the hardware ring
- * @rx_ring: Rx ring
- * @budget: NAPI budget
- *
- * Returns amount of work completed
- **/
-int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
+static inline void i40e_clean_rx_desc_zc(struct i40e_ring *rx_ring,
+					 unsigned int *stat_rx_packets,
+					 unsigned int *stat_rx_bytes,
+					 unsigned int *xmit,
+					 int budget)
 {
-	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
-	u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
+	unsigned int total_rx_packets = *stat_rx_packets, total_rx_bytes = *stat_rx_bytes;
 	u16 next_to_clean = rx_ring->next_to_clean;
 	u16 count_mask = rx_ring->count - 1;
-	unsigned int xdp_res, xdp_xmit = 0;
-	bool failure = false;
+	unsigned int xdp_xmit = *xmit;
 
 	while (likely(total_rx_packets < (unsigned int)budget)) {
 		union i40e_rx_desc *rx_desc;
+		unsigned int size, xdp_res;
 		unsigned int rx_packets;
 		unsigned int rx_bytes;
 		struct xdp_buff *bi;
-		unsigned int size;
 		u64 qword;
 
 		rx_desc = I40E_RX_DESC(rx_ring, next_to_clean);
@@ -385,7 +380,33 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 	}
 
 	rx_ring->next_to_clean = next_to_clean;
-	cleaned_count = (next_to_clean - rx_ring->next_to_use - 1) & count_mask;
+	*stat_rx_packets = total_rx_packets;
+	*stat_rx_bytes = total_rx_bytes;
+	*xmit = xdp_xmit;
+}
+
+/**
+ * i40e_clean_rx_irq_zc - Consumes Rx packets from the hardware ring
+ * @rx_ring: Rx ring
+ * @budget: NAPI budget
+ *
+ * Returns amount of work completed
+ **/
+int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
+{
+	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
+	u16 count_mask = rx_ring->count - 1;
+	unsigned int xdp_xmit = 0;
+	bool failure = false;
+	u16 cleaned_count;
+
+	i40e_clean_rx_desc_zc(rx_ring,
+			      &total_rx_packets,
+			      &total_rx_bytes,
+			      &xdp_xmit,
+			      budget);
+
+	cleaned_count = (rx_ring->next_to_clean - rx_ring->next_to_use - 1) & count_mask;
 
 	if (cleaned_count >= I40E_RX_BUFFER_WRITE)
 		failure = !i40e_alloc_rx_buffers_zc(rx_ring, cleaned_count);
@@ -394,7 +415,7 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 	i40e_update_rx_stats(rx_ring, total_rx_bytes, total_rx_packets);
 
 	if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) {
-		if (failure || next_to_clean == rx_ring->next_to_use)
+		if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
 			xsk_set_rx_need_wakeup(rx_ring->xsk_pool);
 		else
 			xsk_clear_rx_need_wakeup(rx_ring->xsk_pool);
-- 
2.17.1



* [RFC PATCH bpf-next 7/8] i40e: introduce batched XDP rx descriptor processing
  2021-11-16  7:37 [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx Ciara Loftus
                   ` (5 preceding siblings ...)
  2021-11-16  7:37 ` [RFC PATCH bpf-next 6/8] i40e: isolate descriptor processing in separate function Ciara Loftus
@ 2021-11-16  7:37 ` Ciara Loftus
  2021-11-16  7:37 ` [RFC PATCH bpf-next 8/8] libbpf: use bpf_redirect_xsk in the default program Ciara Loftus
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Ciara Loftus @ 2021-11-16  7:37 UTC (permalink / raw)
  To: netdev, bpf
  Cc: ast, daniel, davem, kuba, hawk, john.fastabend, toke, bjorn,
	magnus.karlsson, jonathan.lemon, maciej.fijalkowski,
	Ciara Loftus, Cristian Dumitrescu

Introduce batched processing of XDP frames in the i40e driver. The batch
size is fixed at 64.

First, the driver performs a lookahead in the rx ring to determine if
there are 64 contiguous descriptors available to be processed. If so,
and if the action returned from the bpf program run for each of the 64
descriptors is XDP_REDIRECT_XSK, the new xsk_rcv_batch API is used to
push the batch to the XDP socket Rx ring.

Logic to fall back to scalar processing is included for situations where
batch processing is not possible, e.g. not enough descriptors, a ring wrap,
or different actions returned from the bpf program.
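
The resulting napi poll flow in i40e_clean_rx_irq_zc (shown in full in the diff below) is
roughly:

	batch_budget = i40_rx_ring_lookahead(rx_ring, budget);

	/* fast path: whole batches of 64 descriptors */
	for (i = 0; i < batch_budget; i += I40E_DESCS_PER_BATCH)
		i40e_clean_rx_desc_zc_batch(rx_ring, xdp_prog, &rx_packets, &rx_bytes, &xdp_xmit);

	/* scalar path consumes whatever budget remains */
	i40e_clean_rx_desc_zc(rx_ring, xdp_prog, &rx_packets, &rx_bytes, &xdp_xmit,
			      budget - rx_packets);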

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 219 +++++++++++++++++++--
 1 file changed, 200 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index c994b4d9c38a..a578bb7b3b99 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -10,6 +10,9 @@
 #include "i40e_txrx_common.h"
 #include "i40e_xsk.h"
 
+#define I40E_DESCS_PER_BATCH 64
+#define I40E_XSK_BATCH_MASK ~(I40E_DESCS_PER_BATCH - 1)
+
 int i40e_alloc_rx_bi_zc(struct i40e_ring *rx_ring)
 {
 	unsigned long sz = sizeof(*rx_ring->rx_bi_zc) * rx_ring->count;
@@ -139,26 +142,12 @@ int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool,
 		i40e_xsk_pool_disable(vsi, qid);
 }
 
-/**
- * i40e_run_xdp_zc - Executes an XDP program on an xdp_buff
- * @rx_ring: Rx ring
- * @xdp: xdp_buff used as input to the XDP program
- *
- * Returns any of I40E_XDP_{PASS, CONSUMED, TX, REDIR, REDIR_XSK}
- **/
-static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
+static int i40e_handle_xdp_action(struct i40e_ring *rx_ring, struct xdp_buff *xdp,
+				  struct bpf_prog *xdp_prog, u32 act)
 {
 	int err, result = I40E_XDP_PASS;
 	struct i40e_ring *xdp_ring;
-	struct bpf_prog *xdp_prog;
 	struct xdp_sock *xs;
-	u32 act;
-
-	/* NB! xdp_prog will always be !NULL, due to the fact that
-	 * this path is enabled by setting an XDP program.
-	 */
-	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
-	act = bpf_prog_run_xdp(xdp_prog, xdp);
 
 	if (likely(act == XDP_REDIRECT_XSK)) {
 		xs = xsk_get_redirect_xsk(&rx_ring->netdev->_rx[xdp->rxq->queue_index]);
@@ -197,6 +186,21 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 	return result;
 }
 
+/**
+ * i40e_run_xdp_zc - Executes an XDP program on an xdp_buff
+ * @rx_ring: Rx ring
+ * @xdp: xdp_buff used as input to the XDP program
+ *
+ * Returns any of I40E_XDP_{PASS, CONSUMED, TX, REDIR, REDIR_XSK}
+ **/
+static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp,
+			   struct bpf_prog *xdp_prog)
+{
+	u32 act = bpf_prog_run_xdp(xdp_prog, xdp);
+
+	return i40e_handle_xdp_action(rx_ring, xdp, xdp_prog, act);
+}
+
 bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count)
 {
 	u16 ntu = rx_ring->next_to_use;
@@ -218,6 +222,7 @@ bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count)
 		dma = xsk_buff_xdp_get_dma(*xdp);
 		rx_desc->read.pkt_addr = cpu_to_le64(dma);
 		rx_desc->read.hdr_addr = 0;
+		rx_desc->wb.qword1.status_error_len = 0;
 
 		rx_desc++;
 		xdp++;
@@ -324,6 +329,7 @@ static void i40e_handle_xdp_result_zc(struct i40e_ring *rx_ring,
 }
 
 static inline void i40e_clean_rx_desc_zc(struct i40e_ring *rx_ring,
+					 struct bpf_prog *xdp_prog,
 					 unsigned int *stat_rx_packets,
 					 unsigned int *stat_rx_bytes,
 					 unsigned int *xmit,
@@ -370,7 +376,7 @@ static inline void i40e_clean_rx_desc_zc(struct i40e_ring *rx_ring,
 		xsk_buff_set_size(bi, size);
 		xsk_buff_dma_sync_for_cpu(bi, rx_ring->xsk_pool);
 
-		xdp_res = i40e_run_xdp_zc(rx_ring, bi);
+		xdp_res = i40e_run_xdp_zc(rx_ring, bi, xdp_prog);
 		i40e_handle_xdp_result_zc(rx_ring, bi, rx_desc, &rx_packets,
 					  &rx_bytes, size, xdp_res);
 		total_rx_packets += rx_packets;
@@ -385,6 +391,172 @@ static inline void i40e_clean_rx_desc_zc(struct i40e_ring *rx_ring,
 	*xmit = xdp_xmit;
 }
 
+/**
+ * i40_rx_ring_lookahead - check for new descriptors in the rx ring
+ * @rx_ring: Rx ring
+ * @budget: NAPI budget
+ *
+ * Returns the number of available descriptors in contiguous memory ie.
+ * without a ring wrap.
+ *
+ **/
+static inline unsigned int i40_rx_ring_lookahead(struct i40e_ring *rx_ring,
+						 unsigned int budget)
+{
+	u32 used = (rx_ring->next_to_clean - rx_ring->next_to_use - 1) & (rx_ring->count - 1);
+	union i40e_rx_desc *rx_desc0 = (union i40e_rx_desc *)rx_ring->desc, *rx_desc;
+	u32 next_to_clean = rx_ring->next_to_clean;
+	u32 potential = rx_ring->count - used;
+	u16 count_mask = rx_ring->count - 1;
+	unsigned int size;
+	u64 qword;
+
+	budget &= I40E_XSK_BATCH_MASK;
+
+	while (budget) {
+		if (budget > potential)
+			goto next;
+		rx_desc = rx_desc0 + ((next_to_clean + budget - 1) & count_mask);
+		qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
+		dma_rmb();
+
+		size = (qword & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
+			I40E_RXD_QW1_LENGTH_PBUF_SHIFT;
+		if (size && ((next_to_clean + budget) <= count_mask))
+			return budget;
+
+next:
+		budget >>= 1;
+		budget &= I40E_XSK_BATCH_MASK;
+	}
+
+	return 0;
+}
+
+/**
+ * i40e_run_xdp_zc_batch - Executes an XDP program on an array of xdp_buffs
+ * @rx_ring: Rx ring
+ * @bufs: array of xdp_buffs used as input to the XDP program
+ * @res: array of ints with result for each buf if an error occurs or slow path taken.
+ *
+ * Returns zero if all xdp_buffs successfully took the fast path (XDP_REDIRECT_XSK).
+ * Otherwise returns -1 and sets individual results for each buf in the array *res.
+ * Individual results are one of I40E_XDP_{PASS, CONSUMED, TX, REDIR, REDIR_XSK}
+ **/
+static int i40e_run_xdp_zc_batch(struct i40e_ring *rx_ring, struct xdp_buff **bufs,
+				 struct bpf_prog *xdp_prog, int *res)
+{
+	u32 last_act = XDP_REDIRECT_XSK;
+	int runs = 0, ret = 0, err, i;
+
+	while ((runs < I40E_DESCS_PER_BATCH) && (last_act == XDP_REDIRECT_XSK))
+		last_act = bpf_prog_run_xdp(xdp_prog, *(bufs + runs++));
+
+	if (likely(runs == I40E_DESCS_PER_BATCH)) {
+		struct xdp_sock *xs =
+			xsk_get_redirect_xsk(&rx_ring->netdev->_rx[(*bufs)->rxq->queue_index]);
+
+		err = xsk_rcv_batch(xs, bufs, I40E_DESCS_PER_BATCH);
+		if (unlikely(err)) {
+			ret = -1;
+			for (i = 0; i < I40E_DESCS_PER_BATCH; i++)
+				*(res + i) = I40E_XDP_PASS;
+		}
+	} else {
+		/* Handle the result of each program run individually */
+		u32 act;
+
+		ret = -1;
+		for (i = 0; i < I40E_DESCS_PER_BATCH; i++) {
+			struct xdp_buff *xdp = *(bufs + i);
+
+			/* The result of the first runs-1 programs was XDP_REDIRECT_XSK.
+			 * The result of the last program run was last_act.
+			 * Any remaining bufs have not yet had the program executed, so
+			 * execute it now.
+			 */
+
+			if (i < runs - 1)
+				act = XDP_REDIRECT_XSK;
+			else if (i == runs - 1)
+				act = last_act;
+			else
+				act = bpf_prog_run_xdp(xdp_prog, xdp);
+
+			*(res + i) = i40e_handle_xdp_action(rx_ring, xdp, xdp_prog, act);
+		}
+	}
+
+	return ret;
+}
+
+static inline void i40e_clean_rx_desc_zc_batch(struct i40e_ring *rx_ring,
+					       struct bpf_prog *xdp_prog,
+					       unsigned int *total_rx_packets,
+					       unsigned int *total_rx_bytes,
+					       unsigned int *xdp_xmit)
+{
+	u16 next_to_clean = rx_ring->next_to_clean;
+	unsigned int xdp_res[I40E_DESCS_PER_BATCH];
+	unsigned int size[I40E_DESCS_PER_BATCH];
+	unsigned int rx_packets, rx_bytes = 0;
+	union i40e_rx_desc *rx_desc;
+	struct xdp_buff **bufs;
+	int j, ret;
+	u64 qword;
+
+	rx_desc = I40E_RX_DESC(rx_ring, next_to_clean);
+
+	prefetch(rx_desc + I40E_DESCS_PER_BATCH);
+
+	for (j = 0; j < I40E_DESCS_PER_BATCH; j++) {
+		qword = le64_to_cpu((rx_desc + j)->wb.qword1.status_error_len);
+		size[j] = (qword & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
+			I40E_RXD_QW1_LENGTH_PBUF_SHIFT;
+	}
+
+	/* This memory barrier is needed to keep us from reading
+	 * any other fields out of the rx_descs until we have
+	 * verified the descriptors have been written back.
+	 */
+	dma_rmb();
+
+	bufs = i40e_rx_bi(rx_ring, next_to_clean);
+
+	for (j = 0; j < I40E_DESCS_PER_BATCH; j++)
+		xsk_buff_set_size(*(bufs + j), size[j]);
+
+	xsk_buff_dma_sync_for_cpu_batch(bufs, rx_ring->xsk_pool, I40E_DESCS_PER_BATCH);
+
+	ret = i40e_run_xdp_zc_batch(rx_ring, bufs, xdp_prog, xdp_res);
+
+	if (unlikely(ret)) {
+		unsigned int err_rx_packets = 0, err_rx_bytes = 0;
+
+		rx_packets = 0;
+		rx_bytes = 0;
+
+		for (j = 0; j < I40E_DESCS_PER_BATCH; j++) {
+			i40e_handle_xdp_result_zc(rx_ring, *(bufs + j), rx_desc + j,
+						  &err_rx_packets, &err_rx_bytes, size[j],
+						  xdp_res[j]);
+			*xdp_xmit |= (xdp_res[j] & (I40E_XDP_TX | I40E_XDP_REDIR |
+						    I40E_XDP_REDIR_XSK));
+			rx_packets += err_rx_packets;
+			rx_bytes += err_rx_bytes;
+		}
+	} else {
+		rx_packets = I40E_DESCS_PER_BATCH;
+		for (j = 0; j < I40E_DESCS_PER_BATCH; j++)
+			rx_bytes += size[j];
+		*xdp_xmit |= I40E_XDP_REDIR_XSK;
+	}
+
+	rx_ring->next_to_clean += I40E_DESCS_PER_BATCH;
+	*total_rx_packets += rx_packets;
+	*total_rx_bytes += rx_bytes;
+}
+
 /**
  * i40e_clean_rx_irq_zc - Consumes Rx packets from the hardware ring
  * @rx_ring: Rx ring
@@ -394,17 +566,26 @@ static inline void i40e_clean_rx_desc_zc(struct i40e_ring *rx_ring,
  **/
 int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 {
+	int batch_budget = i40_rx_ring_lookahead(rx_ring, (unsigned int)budget);
+	struct bpf_prog *xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	u16 count_mask = rx_ring->count - 1;
 	unsigned int xdp_xmit = 0;
 	bool failure = false;
 	u16 cleaned_count;
+	int i;
+
+	for (i = 0; i < batch_budget; i += I40E_DESCS_PER_BATCH)
+		i40e_clean_rx_desc_zc_batch(rx_ring, xdp_prog,
+					    &total_rx_packets,
+					    &total_rx_bytes,
+					    &xdp_xmit);
 
-	i40e_clean_rx_desc_zc(rx_ring,
+	i40e_clean_rx_desc_zc(rx_ring, xdp_prog,
 			      &total_rx_packets,
 			      &total_rx_bytes,
 			      &xdp_xmit,
-			      budget);
+			      (unsigned int)budget - total_rx_packets);
 
 	cleaned_count = (rx_ring->next_to_clean - rx_ring->next_to_use - 1) & count_mask;
 
-- 
2.17.1



* [RFC PATCH bpf-next 8/8] libbpf: use bpf_redirect_xsk in the default program
  2021-11-16  7:37 [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx Ciara Loftus
                   ` (6 preceding siblings ...)
  2021-11-16  7:37 ` [RFC PATCH bpf-next 7/8] i40e: introduce batched XDP rx descriptor processing Ciara Loftus
@ 2021-11-16  7:37 ` Ciara Loftus
  2021-11-16  9:43 ` [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx Jesper Dangaard Brouer
  2021-11-18  2:54 ` Alexei Starovoitov
  9 siblings, 0 replies; 17+ messages in thread
From: Ciara Loftus @ 2021-11-16  7:37 UTC (permalink / raw)
  To: netdev, bpf
  Cc: ast, daniel, davem, kuba, hawk, john.fastabend, toke, bjorn,
	magnus.karlsson, jonathan.lemon, maciej.fijalkowski,
	Ciara Loftus

NOTE: This will be committed to libxdp, not libbpf, as the xsk support in
libbpf has been deprecated. It is only here to serve as an example
of what will be added to libxdp.

Use the new bpf_redirect_xsk helper in the default program if the kernel
supports it.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 tools/include/uapi/linux/bpf.h | 13 +++++++++
 tools/lib/bpf/xsk.c            | 50 ++++++++++++++++++++++++++++++++--
 2 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 6297eafdc40f..a33cc63c8e6f 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4957,6 +4957,17 @@ union bpf_attr {
  *		**-ENOENT** if *task->mm* is NULL, or no vma contains *addr*.
  *		**-EBUSY** if failed to try lock mmap_lock.
  *		**-EINVAL** for invalid **flags**.
+ *
+ * long bpf_redirect_xsk(void *ctx, struct bpf_map *map, u32 key, u64 flags)
+ *	Description
+ *		Redirect the packet to the XDP socket associated with the netdev queue if
+ *		the socket has an rx ring configured and is the only socket attached to the
+ *		queue. Fall back to bpf_redirect_map behavior if either condition is not met.
+ *	Return
+ *		**XDP_REDIRECT_XSK** if successful.
+ *
+ *		**XDP_REDIRECT** if the fallback was successful, or the value of the
+ *		two lower bits of the *flags* argument on error.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5140,6 +5151,7 @@ union bpf_attr {
 	FN(skc_to_unix_sock),		\
 	FN(kallsyms_lookup_name),	\
 	FN(find_vma),			\
+	FN(redirect_xsk),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -5520,6 +5532,7 @@ enum xdp_action {
 	XDP_PASS,
 	XDP_TX,
 	XDP_REDIRECT,
+	XDP_REDIRECT_XSK,
 };
 
 /* user accessible metadata for XDP packet hook
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index fdb22f5405c9..ec66d4206af0 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -50,6 +50,7 @@
 enum xsk_prog {
 	XSK_PROG_FALLBACK,
 	XSK_PROG_REDIRECT_FLAGS,
+	XSK_PROG_REDIRECT_XSK_FLAGS,
 };
 
 struct xsk_umem {
@@ -374,7 +375,15 @@ static enum xsk_prog get_xsk_prog(void)
 		BPF_EMIT_CALL(BPF_FUNC_redirect_map),
 		BPF_EXIT_INSN(),
 	};
-	int prog_fd, map_fd, ret, insn_cnt = ARRAY_SIZE(insns);
+	struct bpf_insn insns_xsk[] = {
+		BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1),
+		BPF_LD_MAP_FD(BPF_REG_2, 0),
+		BPF_MOV64_IMM(BPF_REG_3, 0),
+		BPF_MOV64_IMM(BPF_REG_4, XDP_PASS),
+		BPF_EMIT_CALL(BPF_FUNC_redirect_xsk),
+		BPF_EXIT_INSN(),
+	};
+	int prog_fd, map_fd, ret;
 
 	memset(&map_attr, 0, sizeof(map_attr));
 	map_attr.map_type = BPF_MAP_TYPE_XSKMAP;
@@ -386,9 +395,25 @@ static enum xsk_prog get_xsk_prog(void)
 	if (map_fd < 0)
 		return detected;
 
+	insns_xsk[1].imm = map_fd;
+
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, NULL, "GPL", insns_xsk, ARRAY_SIZE(insns_xsk),
+				NULL);
+	if (prog_fd < 0)
+		goto prog_redirect;
+
+	ret = bpf_prog_test_run(prog_fd, 0, &data_in, 1, &data_out, &size_out, &retval, &duration);
+	if (!ret && retval == XDP_PASS) {
+		detected = XSK_PROG_REDIRECT_XSK_FLAGS;
+		close(map_fd);
+		close(prog_fd);
+		return detected;
+	}
+
+prog_redirect:
 	insns[0].imm = map_fd;
 
-	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, NULL, "GPL", insns, insn_cnt, NULL);
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, NULL, "GPL", insns, ARRAY_SIZE(insns), NULL);
 	if (prog_fd < 0) {
 		close(map_fd);
 		return detected;
@@ -483,10 +508,29 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
 		BPF_EMIT_CALL(BPF_FUNC_redirect_map),
 		BPF_EXIT_INSN(),
 	};
+
+	/* This is the post-5.13 kernel C-program:
+	 * SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
+	 * {
+	 *     return bpf_redirect_xsk(ctx, &xsks_map, ctx->rx_queue_index, XDP_PASS);
+	 * }
+	 */
+	struct bpf_insn prog_redirect_xsk_flags[] = {
+		/* r3 = *(u32 *)(r1 + 16) */
+		BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, 16),
+		/* r2 = xskmap[] */
+		BPF_LD_MAP_FD(BPF_REG_2, ctx->xsks_map_fd),
+		/* r4 = XDP_PASS */
+		BPF_MOV64_IMM(BPF_REG_4, 2),
+		/* call bpf_redirect_xsk */
+		BPF_EMIT_CALL(BPF_FUNC_redirect_xsk),
+		BPF_EXIT_INSN(),
+	};
 	size_t insns_cnt[] = {sizeof(prog) / sizeof(struct bpf_insn),
 			      sizeof(prog_redirect_flags) / sizeof(struct bpf_insn),
+			      sizeof(prog_redirect_xsk_flags) / sizeof(struct bpf_insn),
 	};
-	struct bpf_insn *progs[] = {prog, prog_redirect_flags};
+	struct bpf_insn *progs[] = {prog, prog_redirect_flags, prog_redirect_xsk_flags};
 	enum xsk_prog option = get_xsk_prog();
 	LIBBPF_OPTS(bpf_prog_load_opts, opts,
 		.log_buf = log_buf,
-- 
2.17.1



* Re: [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx
  2021-11-16  7:37 [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx Ciara Loftus
                   ` (7 preceding siblings ...)
  2021-11-16  7:37 ` [RFC PATCH bpf-next 8/8] libbpf: use bpf_redirect_xsk in the default program Ciara Loftus
@ 2021-11-16  9:43 ` Jesper Dangaard Brouer
  2021-11-16 16:29   ` Loftus, Ciara
  2021-11-18  2:54 ` Alexei Starovoitov
  9 siblings, 1 reply; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2021-11-16  9:43 UTC (permalink / raw)
  To: Ciara Loftus, netdev, bpf
  Cc: brouer, ast, daniel, davem, kuba, hawk, john.fastabend, toke,
	bjorn, magnus.karlsson, jonathan.lemon, maciej.fijalkowski



On 16/11/2021 08.37, Ciara Loftus wrote:
> [...]
> 
> [1] Performance:
> The benchmarks were performed on VM running a 2.4GHz Ice Lake host with an i40e device passed
> through. The xdpsock app was run on a single core with busy polling and configured in 'rxonly' mode.
> ./xdpsock -i <iface> -r -B
> The improvement in throughput when using the new bpf helper and XDP action was measured at ~13% for
> scalar processing, with reduction in cycles per packet of ~13. A further ~14% improvement in
> throughput and reduction of ~11 cycles per packet was measured when the batched i40e driver path
> was used, for a total improvement of ~27% in throughput and reduction of ~24 cycles per packet.
> 
> Other approaches considered:
> Two other approaches were considered. The advantage of both being that neither involved introducing
> a new XDP action. The first alternative approach considered was to create a new map type
> BPF_MAP_TYPE_XSKMAP_DIRECT. When the XDP_REDIRECT action was returned, this map type could be
> checked and used as an indicator to skip the map lookup and use the netdev_rx_queue xsk instead.
> The second approach considered was similar and involved using a new bpf_redirect_info flag which
> could be used in a similar fashion.
> While both approaches yielded a performance improvement they were measured at about half of what
> was measured for the approach outlined in this RFC. It seems using bpf_redirect_info is too
> expensive.

I think it was Bjørn who discovered that accessing the per-CPU 
bpf_redirect_info struct has an overhead of approx 2 ns (times 2.4GHz, 
~4.8 cycles).  Your reduction in cycles per packet was ~13, of which 
those ~4.8 cycles would be a large share.

The code accesses this_cpu_ptr(&bpf_redirect_info) twice: once in the
BPF-helper redirect call and again in xdp_do_redirect.
(Hint: xdp_redirect_map ends up calling __bpf_xdp_redirect_map.)

Thus, it seems (as you say) the bpf_redirect_info approach is too 
expensive.  Maybe we should look at storing bpf_redirect_info in a place 
that doesn't require the this_cpu_ptr() lookup... or cache the lookup 
per NAPI cycle.
Have you tried this?
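
Roughly, the per-NAPI caching idea could look like this. This is an
untested sketch: xdp_do_redirect_ri() does not exist and only stands in
for a variant of xdp_do_redirect() that is handed the already-resolved
pointer.

  #include <linux/filter.h>
  #include <net/xdp.h>

  /* Hypothetical variant of xdp_do_redirect() that takes the percpu
   * bpf_redirect_info instead of resolving it itself.
   */
  int xdp_do_redirect_ri(struct net_device *dev, struct xdp_buff *xdp,
                         struct bpf_prog *prog, struct bpf_redirect_info *ri);

  static void napi_rx_burst(struct net_device *dev, struct bpf_prog *prog,
                            struct xdp_buff **bufs, int n)
  {
          /* one this_cpu_ptr() per NAPI cycle instead of two per packet */
          struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
          int i;

          for (i = 0; i < n; i++)
                  xdp_do_redirect_ri(dev, bufs[i], prog, ri);
  }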


> Also, centralised processing of XDP actions was investigated. This would involve porting all drivers
> to a common interface for handling XDP actions which would greatly simplify the work involved in
> adding support for new XDP actions such as XDP_REDIRECT_XSK. However it was deemed at this point to
> be more complex than adding support for the new action to every driver. Should this series be
> considered worth pursuing for a proper patch set, the intention would be to update each driver
> individually.

I'm fine with adding a new helper, but I don't like introducing a new 
XDP_REDIRECT_XSK action, which requires updating ALL the drivers.

With the XDP_REDIRECT infra we believed we didn't need to add more 
XDP-action code to drivers, as we multiplex/add new features by 
extending the bpf_redirect_info.
In this extreme performance case, it seems the this_cpu_ptr "lookup" of 
bpf_redirect_info is the performance issue itself.

Could you experiment with approaches that modify xdp_do_redirect() to 
detect that the new bpf_redirect_xsk helper was called, prior to the 
this_cpu_ptr() call?
(Thus avoiding the introduction of a new XDP action.)
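
One way to express that idea (sketch only: the percpu
xsk_redirect_target slot and xdp_do_redirect_slow() are hypothetical,
and xsk_rcv() assumes the signature patch 3/8 exposes):

  #include <linux/filter.h>
  #include <net/xdp_sock_drv.h>

  /* Hypothetical percpu slot the bpf_redirect_xsk helper would fill. */
  DEFINE_PER_CPU(struct xdp_sock *, xsk_redirect_target);

  /* Stand-in for the existing bpf_redirect_info-based code. */
  int xdp_do_redirect_slow(struct net_device *dev, struct xdp_buff *xdp,
                           struct bpf_prog *xdp_prog);

  int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
                      struct bpf_prog *xdp_prog)
  {
          struct xdp_sock *xs = this_cpu_read(xsk_redirect_target);

          if (xs) {
                  /* fast path: no bpf_redirect_info, no map lookup */
                  this_cpu_write(xsk_redirect_target, NULL);
                  return xsk_rcv(xs, xdp);
          }
          /* slow path: the existing this_cpu_ptr(&bpf_redirect_info) code */
          return xdp_do_redirect_slow(dev, xdp, xdp_prog);
  }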

> Thank you to Magnus Karlsson and Maciej Fijalkowski for several suggestions and insight provided.
> 
> TODO:
> * Add selftest(s)
> * Add support for all copy and zero copy drivers
> * Libxdp support
> 
> The series applies on commit e5043894b21f ("bpftool: Use libbpf_get_error() to check error")
> 
> Thanks,
> Ciara
> 
> Ciara Loftus (8):
>    xsk: add struct xdp_sock to netdev_rx_queue
>    bpf: add bpf_redirect_xsk helper and XDP_REDIRECT_XSK action
>    xsk: handle XDP_REDIRECT_XSK and expose xsk_rcv/flush
>    i40e: handle the XDP_REDIRECT_XSK action
>    xsk: implement a batched version of xsk_rcv
>    i40e: isolate descriptor processing in separate function
>    i40e: introduce batched XDP rx descriptor processing
>    libbpf: use bpf_redirect_xsk in the default program
> 
>   drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  13 +-
>   .../ethernet/intel/i40e/i40e_txrx_common.h    |   1 +
>   drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 285 +++++++++++++++---
>   include/linux/netdevice.h                     |   2 +
>   include/net/xdp_sock_drv.h                    |  49 +++
>   include/net/xsk_buff_pool.h                   |  22 ++
>   include/uapi/linux/bpf.h                      |  13 +
>   kernel/bpf/verifier.c                         |   7 +-
>   net/core/dev.c                                |  14 +
>   net/core/filter.c                             |  26 ++
>   net/xdp/xsk.c                                 |  69 ++++-
>   net/xdp/xsk_queue.h                           |  31 ++
>   tools/include/uapi/linux/bpf.h                |  13 +
>   tools/lib/bpf/xsk.c                           |  50 ++-
>   14 files changed, 551 insertions(+), 44 deletions(-)
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx
  2021-11-16  9:43 ` [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx Jesper Dangaard Brouer
@ 2021-11-16 16:29   ` Loftus, Ciara
  2021-11-17 14:24     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 17+ messages in thread
From: Loftus, Ciara @ 2021-11-16 16:29 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, netdev, bpf
  Cc: brouer, ast, daniel, davem, kuba, hawk, john.fastabend, toke,
	bjorn, Karlsson, Magnus, jonathan.lemon, Fijalkowski, Maciej

> On 16/11/2021 08.37, Ciara Loftus wrote:
> > [...]
> 
> I think it was Bjørn who discovered that accessing the per-CPU
> bpf_redirect_info struct has an overhead of approx 2 ns (times 2.4GHz,
> ~4.8 cycles).  Your reduction in cycles per packet was ~13, of which
> those ~4.8 cycles would be a large share.
> 
> The code accesses this_cpu_ptr(&bpf_redirect_info) twice: once in the
> BPF-helper redirect call and again in xdp_do_redirect.
> (Hint: xdp_redirect_map ends up calling __bpf_xdp_redirect_map.)
> 
> Thus, it seems (as you say) the bpf_redirect_info approach is too
> expensive.  Maybe we should look at storing bpf_redirect_info in a place
> that doesn't require the this_cpu_ptr() lookup... or cache the lookup
> per NAPI cycle.
> Have you tried this?
> 
> 
> > Also, centralised processing of XDP actions was investigated. This would
> involve porting all drivers
> > to a common interface for handling XDP actions which would greatly
> simplify the work involved in
> > adding support for new XDP actions such as XDP_REDIRECT_XSK. However
> it was deemed at this point to
> > be more complex than adding support for the new action to every driver.
> Should this series be
> > considered worth pursuing for a proper patch set, the intention would be
> to update each driver
> > individually.
> 
> I'm fine with adding a new helper, but I don't like introducing a new
> XDP_REDIRECT_XSK action, which requires updating ALL the drivers.
> 
> With the XDP_REDIRECT infra we believed we didn't need to add more
> XDP-action code to drivers, as we multiplex/add new features by
> extending the bpf_redirect_info.
> In this extreme performance case, it seems the this_cpu_ptr "lookup" of
> bpf_redirect_info is the performance issue itself.
> 
> Could you experiment with approaches that modify xdp_do_redirect() to
> detect that the new bpf_redirect_xsk helper was called, prior to the
> this_cpu_ptr() call?
> (Thus avoiding the introduction of a new XDP action.)

Thanks for your feedback Jesper.
I understand the hesitation about adding a new action. If we can achieve the same improvement
without introducing a new action I would be very happy!
Without the new action we'll need a new way to indicate that the bpf_redirect_xsk helper was
called. Maybe another new field in the netdev alongside the xsk_refcnt. Or else extend
bpf_redirect_info - if we can find a new home for it that isn't too costly to access.
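
To make the netdev idea concrete, something like the below. This is
illustrative only: xsk and xsk_refcnt are the fields this series talks
about, while xsk_redirect is the hypothetical new field set by the
helper and checked later in xdp_do_redirect().

  #include <linux/refcount.h>

  struct xdp_sock;

  /* Sketch of a netdev_rx_queue carrying the extra state discussed above. */
  struct netdev_rx_queue_sketch {
          struct xdp_sock *xsk;           /* per-queue socket (patch 1/8) */
          refcount_t       xsk_refcnt;    /* as referenced in this series */
          bool             xsk_redirect;  /* hypothetical: armed by the helper */
          /* ... the existing netdev_rx_queue fields ... */
  };
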
Thanks for your suggestions. I'll experiment as you suggested and report back.

Thanks,
Ciara

> 
> > Thank you to Magnus Karlsson and Maciej Fijalkowski for several
> suggestions and insight provided.
> >
> > TODO:
> > * Add selftest(s)
> > * Add support for all copy and zero copy drivers
> > * Libxdp support
> >
> > The series applies on commit e5043894b21f ("bpftool: Use
> libbpf_get_error() to check error")
> >
> > Thanks,
> > Ciara
> >
> > Ciara Loftus (8):
> >    xsk: add struct xdp_sock to netdev_rx_queue
> >    bpf: add bpf_redirect_xsk helper and XDP_REDIRECT_XSK action
> >    xsk: handle XDP_REDIRECT_XSK and expose xsk_rcv/flush
> >    i40e: handle the XDP_REDIRECT_XSK action
> >    xsk: implement a batched version of xsk_rcv
> >    i40e: isolate descriptor processing in separate function
> >    i40e: introduce batched XDP rx descriptor processing
> >    libbpf: use bpf_redirect_xsk in the default program
> >
> >   drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  13 +-
> >   .../ethernet/intel/i40e/i40e_txrx_common.h    |   1 +
> >   drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 285 +++++++++++++++---
> >   include/linux/netdevice.h                     |   2 +
> >   include/net/xdp_sock_drv.h                    |  49 +++
> >   include/net/xsk_buff_pool.h                   |  22 ++
> >   include/uapi/linux/bpf.h                      |  13 +
> >   kernel/bpf/verifier.c                         |   7 +-
> >   net/core/dev.c                                |  14 +
> >   net/core/filter.c                             |  26 ++
> >   net/xdp/xsk.c                                 |  69 ++++-
> >   net/xdp/xsk_queue.h                           |  31 ++
> >   tools/include/uapi/linux/bpf.h                |  13 +
> >   tools/lib/bpf/xsk.c                           |  50 ++-
> >   14 files changed, 551 insertions(+), 44 deletions(-)
> >


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx
  2021-11-16 16:29   ` Loftus, Ciara
@ 2021-11-17 14:24     ` Toke Høiland-Jørgensen
  2021-11-19 15:48       ` Loftus, Ciara
  0 siblings, 1 reply; 17+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-11-17 14:24 UTC (permalink / raw)
  To: Loftus, Ciara, Jesper Dangaard Brouer, netdev, bpf
  Cc: brouer, ast, daniel, davem, kuba, hawk, john.fastabend, bjorn,
	Karlsson, Magnus, jonathan.lemon, Fijalkowski, Maciej

"Loftus, Ciara" <ciara.loftus@intel.com> writes:

>> I'm fine with adding a new helper, but I don't like introducing a new
>> XDP_REDIRECT_XSK action, which requires updating ALL the drivers.
>> 
>> With the XDP_REDIRECT infra we believed we didn't need to add more
>> XDP-action code to drivers, as we multiplex/add new features by
>> extending the bpf_redirect_info.
>> In this extreme performance case, it seems the this_cpu_ptr "lookup" of
>> bpf_redirect_info is the performance issue itself.
>> 
>> Could you experiment with approaches that modify xdp_do_redirect() to
>> detect that the new bpf_redirect_xsk helper was called, prior to the
>> this_cpu_ptr() call?
>> (Thus avoiding the introduction of a new XDP action.)
>
> Thanks for your feedback Jesper.
> I understand the hesitation about adding a new action. If we can achieve
> the same improvement without introducing a new action I would be very happy!
> Without the new action we'll need a new way to indicate that the
> bpf_redirect_xsk helper was called. Maybe another new field in the netdev
> alongside the xsk_refcnt. Or else extend bpf_redirect_info - if we can
> find a new home for it that isn't too costly to access.
> Thanks for your suggestions. I'll experiment as you suggested and
> report back.

I'll add a +1 to the "let's try to solve this without a new return code" :)

Also, I don't think we need a new helper either; the bpf_redirect()
helper takes a flags argument, so we could just use ifindex=0,
flags=DEV_XSK or something like that.
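
Something like the below, where BPF_F_REDIRECT_XSK is a made-up flag
value used purely to illustrate the calling convention (bpf_redirect()
itself is the existing helper):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  #define BPF_F_REDIRECT_XSK (1U << 1)   /* hypothetical flag */

  SEC("xdp")
  int xdp_sock_prog(struct xdp_md *ctx)
  {
          /* ifindex == 0 would mean "the xsk bound to this Rx queue" */
          return bpf_redirect(0, BPF_F_REDIRECT_XSK);
  }

  char _license[] SEC("license") = "GPL";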

Also, I think the batching in the driver idea can be generalised: we
just need to generalise the idea of "are all these packets going to the
same place" and have a batched version of xdp_do_redirect(), no? The
other map types do batching internally already, though, so I'm wondering
why batching in the driver helps XSK?

-Toke


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx
  2021-11-16  7:37 [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx Ciara Loftus
                   ` (8 preceding siblings ...)
  2021-11-16  9:43 ` [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx Jesper Dangaard Brouer
@ 2021-11-18  2:54 ` Alexei Starovoitov
  9 siblings, 0 replies; 17+ messages in thread
From: Alexei Starovoitov @ 2021-11-18  2:54 UTC (permalink / raw)
  To: Ciara Loftus
  Cc: netdev, bpf, ast, daniel, davem, kuba, hawk, john.fastabend,
	toke, bjorn, magnus.karlsson, jonathan.lemon, maciej.fijalkowski

On Tue, Nov 16, 2021 at 07:37:34AM +0000, Ciara Loftus wrote:
> [...]

I don't think this micro-optimization for a specific use case warrants the addition of new APIs.
Please optimize it without adding new actions or new helpers.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx
  2021-11-17 14:24     ` Toke Høiland-Jørgensen
@ 2021-11-19 15:48       ` Loftus, Ciara
  2021-11-22 14:38         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 17+ messages in thread
From: Loftus, Ciara @ 2021-11-19 15:48 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Jesper Dangaard Brouer, netdev, bpf
  Cc: brouer, ast, daniel, davem, kuba, hawk, john.fastabend, bjorn,
	Karlsson, Magnus, jonathan.lemon, Fijalkowski, Maciej

> "Loftus, Ciara" <ciara.loftus@intel.com> writes:
> 
> >> I'm fine with adding a new helper, but I don't like introducing a new
> >> XDP_REDIRECT_XSK action, which requires updating ALL the drivers.
> >>
> >> With the XDP_REDIRECT infra we believed we didn't need to add more
> >> XDP-action code to drivers, as we multiplex/add new features by
> >> extending the bpf_redirect_info.
> >> In this extreme performance case, it seems the this_cpu_ptr "lookup" of
> >> bpf_redirect_info is the performance issue itself.
> >>
> >> Could you experiment with approaches that modify xdp_do_redirect() to
> >> detect that the new bpf_redirect_xsk helper was called, prior to the
> >> this_cpu_ptr() call?
> >> (Thus avoiding the introduction of a new XDP action.)
> >
> > Thanks for your feedback Jesper.
> > I understand the hesitation about adding a new action. If we can achieve
> > the same improvement without introducing a new action I would be very
> > happy!
> > Without the new action we'll need a new way to indicate that the
> > bpf_redirect_xsk helper was called. Maybe another new field in the netdev
> > alongside the xsk_refcnt. Or else extend bpf_redirect_info - if we can
> > find a new home for it that isn't too costly to access.
> > Thanks for your suggestions. I'll experiment as you suggested and
> > report back.
> 
> I'll add a +1 to the "let's try to solve this without a new return code" :)
> 
> Also, I don't think we need a new helper either; the bpf_redirect()
> helper takes a flags argument, so we could just use ifindex=0,
> flags=DEV_XSK or something like that.

The advantage of a new helper is that we can access the netdev 
struct from it and check if there's a valid xsk stored in it, before
returning XDP_REDIRECT without the xskmap lookup. However,
I think your suggestion could work too. We would just
have to delay the check until xdp_do_redirect. At this point
though, if there isn't a valid xsk we might have to drop the packet
instead of falling back to the xskmap.
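
For what it's worth, that delayed check might look roughly like this
(sketch only: the helper/flag plumbing is omitted, the per-queue xsk
field is from patch 1/8, and xsk_rcv() assumes the signature patch 3/8
exposes):

  #include <linux/netdevice.h>
  #include <net/xdp_sock_drv.h>

  static int xdp_redirect_to_queue_xsk(struct net_device *dev,
                                       struct xdp_buff *xdp)
  {
          struct xdp_sock *xs =
                  READ_ONCE(dev->_rx[xdp->rxq->queue_index].xsk);

          /* no socket on this queue: there is no xskmap to fall back
           * on at this point, so the caller has to drop the packet
           */
          if (!xs)
                  return -EINVAL;

          return xsk_rcv(xs, xdp);
  }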

> 
> Also, I think the batching in the driver idea can be generalised: we
> just need to generalise the idea of "are all these packets going to the
> same place" and have a batched version of xdp_do_redirect(), no? The
> other map types do batching internally already, though, so I'm wondering
> why batching in the driver helps XSK?
> 

With the current infrastructure, figuring out whether "all the packets are
going to the same place" looks like an expensive operation which could undo
the benefits of the batching that would come after it. We would need
to run the program N=batch_size times, store the actions and
bpf_redirect_info for each run and perform a series of compares. The new
action really helped here because it could easily indicate that all the
packets in a batch were going to the same place. But I understand it's
not an option. Maybe if we can mitigate the cost of accessing the
bpf_redirect_info as Jesper suggested, we can use a flag in it to signal
what the new action was signalling.
I'm not familiar with how the other map types handle batching, so I will
look into that.
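
To illustrate the cost, the check that the new action made cheap is
essentially the loop below (sketch; a real driver would also have to
keep the per-packet verdicts so the non-uniform case can still be
handled packet by packet):

  #include <linux/filter.h>
  #include <net/xdp.h>

  static bool burst_is_uniform_xsk(struct bpf_prog *prog,
                                   struct xdp_buff **bufs, u32 *acts, int n)
  {
          int i;

          for (i = 0; i < n; i++) {
                  acts[i] = bpf_prog_run_xdp(prog, bufs[i]);
                  if (acts[i] != XDP_REDIRECT_XSK)
                          return false;   /* mixed verdicts: slow path */
          }
          /* all packets target the same per-queue socket, so one
           * batched submit and a single flush suffice
           */
          return true;
  }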

Appreciate your feedback. I have a few avenues to explore.

Ciara

> -Toke


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx
  2021-11-19 15:48       ` Loftus, Ciara
@ 2021-11-22 14:38         ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 17+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-11-22 14:38 UTC (permalink / raw)
  To: Loftus, Ciara, Jesper Dangaard Brouer, netdev, bpf
  Cc: brouer, ast, daniel, davem, kuba, hawk, john.fastabend, bjorn,
	Karlsson, Magnus, jonathan.lemon, Fijalkowski, Maciej

"Loftus, Ciara" <ciara.loftus@intel.com> writes:

>> "Loftus, Ciara" <ciara.loftus@intel.com> writes:
>> 
>> >> I'm fine with adding a new helper, but I don't like introducing a new
>> >> XDP_REDIRECT_XSK action, which requires updating ALL the drivers.
>> >>
>> >> With the XDP_REDIRECT infra we believed we didn't need to add more
>> >> XDP-action code to drivers, as we multiplex/add new features by
>> >> extending the bpf_redirect_info.
>> >> In this extreme performance case, it seems the this_cpu_ptr "lookup" of
>> >> bpf_redirect_info is the performance issue itself.
>> >>
>> >> Could you experiment with approaches that modify xdp_do_redirect() to
>> >> detect that the new bpf_redirect_xsk helper was called, prior to the
>> >> this_cpu_ptr() call?
>> >> (Thus avoiding the introduction of a new XDP action.)
>> >
>> > Thanks for your feedback Jesper.
>> > I understand the hesitation about adding a new action. If we can achieve
>> > the same improvement without introducing a new action I would be very
>> > happy!
>> > Without the new action we'll need a new way to indicate that the
>> > bpf_redirect_xsk helper was called. Maybe another new field in the netdev
>> > alongside the xsk_refcnt. Or else extend bpf_redirect_info - if we can
>> > find a new home for it that isn't too costly to access.
>> > Thanks for your suggestions. I'll experiment as you suggested and
>> > report back.
>> 
>> I'll add a +1 to the "let's try to solve this without a new return code" :)
>> 
>> Also, I don't think we need a new helper either; the bpf_redirect()
>> helper takes a flags argument, so we could just use ifindex=0,
>> flags=DEV_XSK or something like that.
>
> The advantage of a new helper is that we can access the netdev 
> struct from it and check if there's a valid xsk stored in it, before
> returning XDP_REDIRECT without the xskmap lookup. However,
> I think your suggestion could work too. We would just
> have to delay the check until xdp_do_redirect. At this point
> though, if there isn't a valid xsk we might have to drop the packet
> instead of falling back to the xskmap.

I think it's OK to require the user to make sure that there is such a
socket loaded before using that flag...

>> Also, I think the batching in the driver idea can be generalised: we
>> just need to generalise the idea of "are all these packets going to the
>> same place" and have a batched version of xdp_do_redirect(), no? The
>> other map types do batching internally already, though, so I'm wondering
>> why batching in the driver helps XSK?
>> 
>
> With the current infrastructure, figuring out whether "all the packets are
> going to the same place" looks like an expensive operation which could undo
> the benefits of the batching that would come after it. We would need
> to run the program N=batch_size times, store the actions and
> bpf_redirect_info for each run and perform a series of compares. The new
> action really helped here because it could easily indicate that all the
> packets in a batch were going to the same place. But I understand it's
> not an option. Maybe if we can mitigate the cost of accessing the
> bpf_redirect_info as Jesper suggested, we can use a flag in it to signal
> what the new action was signalling.

Yes, it would probably require comparing the contents of the whole
bpf_redirect_info, at least as it is now; but assuming we can find a way
to mitigate the cost of accessing that structure, this may not be so
bad, and I believe there is some potential for compressing the state
further so we can get down to a single, or maybe two, 64-bit compares...
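
As a sketch of that compression (the field layout is invented for
illustration; the point is only that "same destination?" collapses
into one 64-bit compare per packet):

  #include <linux/types.h>

  /* pack map type, map id and destination index into one u64 */
  static inline u64 redirect_key(u32 map_type, u32 map_id, u32 index)
  {
          return ((u64)map_type << 56) |
                 ((u64)(map_id & 0xffffff) << 32) | index;
  }

  /* a burst shares one destination iff every key equals the first */
  static inline bool same_redirect_dest(const u64 *keys, int n)
  {
          int i;

          for (i = 1; i < n; i++)
                  if (keys[i] != keys[0])
                          return false;
          return true;
  }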

-Toke


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH bpf-next 3/8] xsk: handle XDP_REDIRECT_XSK and expose xsk_rcv/flush
  2021-11-16  7:37 ` [RFC PATCH bpf-next 3/8] xsk: handle XDP_REDIRECT_XSK and expose xsk_rcv/flush Ciara Loftus
@ 2021-11-27 11:44     ` kernel test robot
  0 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2021-11-27 11:44 UTC (permalink / raw)
  To: Ciara Loftus; +Cc: llvm, kbuild-all

Hi Ciara,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Ciara-Loftus/XDP_REDIRECT_XSK-and-Batched-AF_XDP-Rx/20211116-153952
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: x86_64-randconfig-a004-20211116 (https://download.01.org/0day-ci/archive/20211127/202111271955.w9AzLVNN-lkp@intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project fbe72e41b99dc7994daac300d208a955be3e4a0a)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/06daa8f48870f74b06ed02fe7f36b820c09844c9
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Ciara-Loftus/XDP_REDIRECT_XSK-and-Batched-AF_XDP-Rx/20211116-153952
        git checkout 06daa8f48870f74b06ed02fe7f36b820c09844c9
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash net/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> net/core/dev.c:4883:5: error: expected expression
                                   struct xdp_sock *xs =
                                   ^
>> net/core/dev.c:4886:27: error: use of undeclared identifier 'xs'
                                   err = xsk_generic_rcv(xs, &xdp);
                                                         ^
   2 errors generated.


vim +4883 net/core/dev.c

  4870	
  4871	int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
  4872	{
  4873		if (xdp_prog) {
  4874			struct xdp_buff xdp;
  4875			u32 act;
  4876			int err;
  4877	
  4878			act = netif_receive_generic_xdp(skb, &xdp, xdp_prog);
  4879			if (act != XDP_PASS) {
  4880				switch (act) {
  4881	#ifdef CONFIG_XDP_SOCKETS
  4882				case XDP_REDIRECT_XSK:
> 4883					struct xdp_sock *xs =
  4884						READ_ONCE(skb->dev->_rx[xdp.rxq->queue_index].xsk);
  4885	
> 4886					err = xsk_generic_rcv(xs, &xdp);
  4887					if (err)
  4888						goto out_redir;
  4889					consume_skb(skb);
  4890					break;
  4891	#endif
  4892				case XDP_REDIRECT:
  4893					err = xdp_do_generic_redirect(skb->dev, skb,
  4894								      &xdp, xdp_prog);
  4895					if (err)
  4896						goto out_redir;
  4897					break;
  4898				case XDP_TX:
  4899					generic_xdp_tx(skb, xdp_prog);
  4900					break;
  4901				}
  4902				return XDP_DROP;
  4903			}
  4904		}
  4905		return XDP_PASS;
  4906	out_redir:
  4907		kfree_skb(skb);
  4908		return XDP_DROP;
  4909	}
  4910	EXPORT_SYMBOL_GPL(do_xdp_generic);
  4911	
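
The errors come from C's rule that a declaration cannot immediately
follow a case label. A likely fix, shown here only as a sketch and not
necessarily the fix that was applied, is to give the case body its own
scope:

  #ifdef CONFIG_XDP_SOCKETS
  			case XDP_REDIRECT_XSK: {
  				struct xdp_sock *xs =
  					READ_ONCE(skb->dev->_rx[xdp.rxq->queue_index].xsk);

  				err = xsk_generic_rcv(xs, &xdp);
  				if (err)
  					goto out_redir;
  				consume_skb(skb);
  				break;
  			}
  #endif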

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-11-27 11:44 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-16  7:37 [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx Ciara Loftus
2021-11-16  7:37 ` [RFC PATCH bpf-next 1/8] xsk: add struct xdp_sock to netdev_rx_queue Ciara Loftus
2021-11-16  7:37 ` [RFC PATCH bpf-next 2/8] bpf: add bpf_redirect_xsk helper and XDP_REDIRECT_XSK action Ciara Loftus
2021-11-16  7:37 ` [RFC PATCH bpf-next 3/8] xsk: handle XDP_REDIRECT_XSK and expose xsk_rcv/flush Ciara Loftus
2021-11-27 11:44   ` kernel test robot
2021-11-27 11:44     ` kernel test robot
2021-11-16  7:37 ` [RFC PATCH bpf-next 4/8] i40e: handle the XDP_REDIRECT_XSK action Ciara Loftus
2021-11-16  7:37 ` [RFC PATCH bpf-next 5/8] xsk: implement a batched version of xsk_rcv Ciara Loftus
2021-11-16  7:37 ` [RFC PATCH bpf-next 6/8] i40e: isolate descriptor processing in separate function Ciara Loftus
2021-11-16  7:37 ` [RFC PATCH bpf-next 7/8] i40e: introduce batched XDP rx descriptor processing Ciara Loftus
2021-11-16  7:37 ` [RFC PATCH bpf-next 8/8] libbpf: use bpf_redirect_xsk in the default program Ciara Loftus
2021-11-16  9:43 ` [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx Jesper Dangaard Brouer
2021-11-16 16:29   ` Loftus, Ciara
2021-11-17 14:24     ` Toke Høiland-Jørgensen
2021-11-19 15:48       ` Loftus, Ciara
2021-11-22 14:38         ` Toke Høiland-Jørgensen
2021-11-18  2:54 ` Alexei Starovoitov
