* [PATCH net-next 0/4] Generic XDP improvements
@ 2021-06-20 23:31 Kumar Kartikeya Dwivedi
From: Kumar Kartikeya Dwivedi @ 2021-06-20 23:31 UTC (permalink / raw)
  To: netdev
  Cc: Kumar Kartikeya Dwivedi, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer, David S. Miller, Jakub Kicinski,
	John Fastabend, Martin KaFai Lau, bpf

This small series improves generic XDP mode and brings it closer to
native XDP. Patch 1 splits out the generic XDP processing into reusable
parts, patch 2 implements generic cpumap support (details in the commit
message), and patch 3 allows devmap bpf prog execution before
generic_xdp_tx is called.

Patch 4 updates a couple of selftests to adapt to the change in behavior
(specifying a devmap/cpumap prog fd in generic mode is now allowed).
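
As a minimal userspace sketch of what is now accepted (libbpf; map_fd,
prog_fd and cpu are assumptions, not part of this series):

	struct bpf_cpumap_val val = {
		.qsize		= 192,
		.bpf_prog.fd	= prog_fd,
	};
	/* with this series, this also works when the XDP prog is
	 * attached with XDP_FLAGS_SKB_MODE
	 */
	err = bpf_map_update_elem(map_fd, &cpu, &val, 0);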

Kumar Kartikeya Dwivedi (4):
  net: core: split out code to run generic XDP prog
  net: implement generic cpumap
  bpf: devmap: implement devmap prog execution for generic XDP
  bpf: update XDP selftests to not fail with generic XDP

 include/linux/bpf.h                           |   8 +
 include/linux/netdevice.h                     |   2 +
 include/linux/skbuff.h                        |  10 +-
 kernel/bpf/cpumap.c                           | 151 ++++++++++++++++--
 kernel/bpf/devmap.c                           |  42 ++++-
 net/core/dev.c                                |  86 ++++++----
 net/core/filter.c                             |   6 +-
 .../bpf/prog_tests/xdp_cpumap_attach.c        |   4 +-
 .../bpf/prog_tests/xdp_devmap_attach.c        |   4 +-
 9 files changed, 255 insertions(+), 58 deletions(-)

--
2.31.1



* [PATCH net-next 1/4] net: core: split out code to run generic XDP prog
From: Kumar Kartikeya Dwivedi @ 2021-06-20 23:31 UTC (permalink / raw)
  To: netdev
  Cc: Kumar Kartikeya Dwivedi, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer, David S. Miller, Jakub Kicinski,
	John Fastabend, Martin KaFai Lau, bpf

This helper can later be used by code that runs cpumap and devmap
programs in generic redirect mode and adjusts the skb based on the
changes made to the xdp_buff.

When the program returns XDP_REDIRECT/XDP_TX, the helper invokes
__skb_push, so whenever the generic redirect path later invokes a
devmap/cpumap prog (if set), it must __skb_pull again, as those progs
expect the mac header to be pulled.

It also drops the skb_reset_mac_len call after do_xdp_generic, as the
mac_header and network_header are advanced by the same offset, so the
difference (mac_len) remains constant.
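
For illustration, a minimal sketch of the resulting caller contract
(simplified; error handling elided):

	act = bpf_prog_run_generic_xdp(skb, &xdp, xdp_prog);
	switch (act) {
	case XDP_REDIRECT:
	case XDP_TX:
		/* skb->data points at the mac header again (the helper
		 * did __skb_push); a devmap/cpumap prog run later on
		 * this skb must __skb_pull(skb, skb->mac_len) first.
		 */
		break;
	case XDP_PASS:
		break;
	}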

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
NB: I am not too sure why the skb_reset_mac_len call was there, so I
removed it, since the offset addition/subtraction should be the same for
network_header and mac_header, but I could be missing something
important...
---
 include/linux/netdevice.h |  2 +
 net/core/dev.c            | 86 ++++++++++++++++++++++++---------------
 2 files changed, 56 insertions(+), 32 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index be1dcceda5e4..90472ea70db2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3984,6 +3984,8 @@ static inline void dev_consume_skb_any(struct sk_buff *skb)
 	__dev_kfree_skb_any(skb, SKB_REASON_CONSUMED);
 }

+u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp,
+			     struct bpf_prog *xdp_prog);
 void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog);
 int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb);
 int netif_rx(struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 50531a2d0b20..e86c6091f9cf 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4717,44 +4717,17 @@ static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb)
 	return rxqueue;
 }

-static u32 netif_receive_generic_xdp(struct sk_buff *skb,
-				     struct xdp_buff *xdp,
-				     struct bpf_prog *xdp_prog)
+u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp,
+			     struct bpf_prog *xdp_prog)
 {
 	void *orig_data, *orig_data_end, *hard_start;
 	struct netdev_rx_queue *rxqueue;
-	u32 metalen, act = XDP_DROP;
 	bool orig_bcast, orig_host;
 	u32 mac_len, frame_sz;
 	__be16 orig_eth_type;
 	struct ethhdr *eth;
-	int off;
-
-	/* Reinjected packets coming from act_mirred or similar should
-	 * not get XDP generic processing.
-	 */
-	if (skb_is_redirected(skb))
-		return XDP_PASS;
-
-	/* XDP packets must be linear and must have sufficient headroom
-	 * of XDP_PACKET_HEADROOM bytes. This is the guarantee that also
-	 * native XDP provides, thus we need to do it here as well.
-	 */
-	if (skb_cloned(skb) || skb_is_nonlinear(skb) ||
-	    skb_headroom(skb) < XDP_PACKET_HEADROOM) {
-		int hroom = XDP_PACKET_HEADROOM - skb_headroom(skb);
-		int troom = skb->tail + skb->data_len - skb->end;
-
-		/* In case we have to go down the path and also linearize,
-		 * then lets do the pskb_expand_head() work just once here.
-		 */
-		if (pskb_expand_head(skb,
-				     hroom > 0 ? ALIGN(hroom, NET_SKB_PAD) : 0,
-				     troom > 0 ? troom + 128 : 0, GFP_ATOMIC))
-			goto do_drop;
-		if (skb_linearize(skb))
-			goto do_drop;
-	}
+	u32 act = XDP_DROP;
+	int off, metalen;

 	/* The XDP program wants to see the packet starting at the MAC
 	 * header.
@@ -4810,6 +4783,13 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 		skb->protocol = eth_type_trans(skb, skb->dev);
 	}

+	/* Redirect/Tx gives an L2 packet, so code that will reuse the skb
+	 * must __skb_pull again before calling us on the redirect path. We
+	 * do not call do_redirect here; that is left up to the caller.
+	 *
+	 * The caller is responsible for managing the lifetime of the skb
+	 * (i.e. calling kfree_skb for actions it cannot handle/XDP_DROP).
+	 */
 	switch (act) {
 	case XDP_REDIRECT:
 	case XDP_TX:
@@ -4820,6 +4800,49 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 		if (metalen)
 			skb_metadata_set(skb, metalen);
 		break;
+	}
+
+	return act;
+}
+
+static u32 netif_receive_generic_xdp(struct sk_buff *skb,
+				     struct xdp_buff *xdp,
+				     struct bpf_prog *xdp_prog)
+{
+	u32 act = XDP_DROP;
+
+	/* Reinjected packets coming from act_mirred or similar should
+	 * not get XDP generic processing.
+	 */
+	if (skb_is_redirected(skb))
+		return XDP_PASS;
+
+	/* XDP packets must be linear and must have sufficient headroom
+	 * of XDP_PACKET_HEADROOM bytes. This is the guarantee that also
+	 * native XDP provides, thus we need to do it here as well.
+	 */
+	if (skb_cloned(skb) || skb_is_nonlinear(skb) ||
+	    skb_headroom(skb) < XDP_PACKET_HEADROOM) {
+		int hroom = XDP_PACKET_HEADROOM - skb_headroom(skb);
+		int troom = skb->tail + skb->data_len - skb->end;
+
+		/* In case we have to go down the path and also linearize,
+		 * then lets do the pskb_expand_head() work just once here.
+		 */
+		if (pskb_expand_head(skb,
+				     hroom > 0 ? ALIGN(hroom, NET_SKB_PAD) : 0,
+				     troom > 0 ? troom + 128 : 0, GFP_ATOMIC))
+			goto do_drop;
+		if (skb_linearize(skb))
+			goto do_drop;
+	}
+
+	act = bpf_prog_run_generic_xdp(skb, xdp, xdp_prog);
+	switch (act) {
+	case XDP_REDIRECT:
+	case XDP_TX:
+	case XDP_PASS:
+		break;
 	default:
 		bpf_warn_invalid_xdp_action(act);
 		fallthrough;
@@ -5285,7 +5308,6 @@ static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
 			ret = NET_RX_DROP;
 			goto out;
 		}
-		skb_reset_mac_len(skb);
 	}

 	if (eth_type_vlan(skb->protocol)) {
--
2.31.1



* [PATCH net-next 2/4] net: implement generic cpumap
From: Kumar Kartikeya Dwivedi @ 2021-06-20 23:31 UTC (permalink / raw)
  To: netdev
  Cc: Kumar Kartikeya Dwivedi, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer, David S. Miller, Jakub Kicinski,
	John Fastabend, Martin KaFai Lau, bpf

This change implements CPUMAP redirect support for generic XDP programs.
The idea is to reuse the cpumap entry's queue that is used to push
native xdp frames for redirecting an skb to a different CPU. This
matches native XDP behavior (in that RPS is invoked again for packets
reinjected into the networking stack).

To be able to determine whether the incoming skb is from the driver or
from cpumap, we reuse the skb->redirected bit, which skips generic XDP
processing when set. To always be able to make use of it, the
CONFIG_NET_REDIRECT guard on it has been lifted, so the bit is always
available.

From the redirect side, we add the skb to the ptr_ring with its lowest
bit set to 1. This should be safe, as an skb is never 1-byte aligned,
and it allows the kthread to distinguish xdp_frames from sk_buffs. On
consumption of the ptr_ring item, the lowest bit is cleared again.
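
A minimal sketch of the tagging scheme (the patch adds
__ptr_{set,clear,test}_bit() helpers for this):

	/* producer (generic redirect path): tag the skb before queueing */
	ret = ptr_ring_produce(rcpu->queue,
			       (void *)((unsigned long)skb | 1UL));

	/* consumer (cpumap kthread): untag and dispatch */
	if ((unsigned long)f & 1UL)
		skb = (struct sk_buff *)((unsigned long)f & ~1UL);
	else
		xdpf = (struct xdp_frame *)f;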

In the end, the skb is simply added to the list the kthread already
maintains for xdp_frames converted to skbs, and is then received again
via netif_receive_skb_list.

Bulking optimization for generic cpumap is left to a future patch.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h    |   8 +++
 include/linux/skbuff.h |  10 +--
 kernel/bpf/cpumap.c    | 151 +++++++++++++++++++++++++++++++++++++----
 net/core/filter.c      |   6 +-
 4 files changed, 154 insertions(+), 21 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f309fc1509f2..46e6587d3ee6 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1513,6 +1513,8 @@ bool dev_map_can_have_prog(struct bpf_map *map);
 void __cpu_map_flush(void);
 int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
 		    struct net_device *dev_rx);
+int cpu_map_generic_redirect(struct bpf_cpu_map_entry *rcpu,
+			     struct sk_buff *skb);
 bool cpu_map_prog_allowed(struct bpf_map *map);
 
 /* Return map's numa specified by userspace */
@@ -1710,6 +1712,12 @@ static inline int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu,
 	return 0;
 }
 
+static inline int cpu_map_generic_redirect(struct bpf_cpu_map_entry *rcpu,
+					   struct sk_buff *skb)
+{
+	return -EOPNOTSUPP;
+}
+
 static inline bool cpu_map_prog_allowed(struct bpf_map *map)
 {
 	return false;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b2db9cd9a73f..f19190820e63 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -863,8 +863,8 @@ struct sk_buff {
 	__u8			tc_skip_classify:1;
 	__u8			tc_at_ingress:1;
 #endif
-#ifdef CONFIG_NET_REDIRECT
 	__u8			redirected:1;
+#ifdef CONFIG_NET_REDIRECT
 	__u8			from_ingress:1;
 #endif
 #ifdef CONFIG_TLS_DEVICE
@@ -4664,17 +4664,13 @@ static inline __wsum lco_csum(struct sk_buff *skb)
 
 static inline bool skb_is_redirected(const struct sk_buff *skb)
 {
-#ifdef CONFIG_NET_REDIRECT
 	return skb->redirected;
-#else
-	return false;
-#endif
 }
 
 static inline void skb_set_redirected(struct sk_buff *skb, bool from_ingress)
 {
-#ifdef CONFIG_NET_REDIRECT
 	skb->redirected = 1;
+#ifdef CONFIG_NET_REDIRECT
 	skb->from_ingress = from_ingress;
 	if (skb->from_ingress)
 		skb->tstamp = 0;
@@ -4683,9 +4679,7 @@ static inline void skb_set_redirected(struct sk_buff *skb, bool from_ingress)
 
 static inline void skb_reset_redirect(struct sk_buff *skb)
 {
-#ifdef CONFIG_NET_REDIRECT
 	skb->redirected = 0;
-#endif
 }
 
 static inline bool skb_csum_is_sctp(struct sk_buff *skb)
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index a1a0c4e791c6..f016daf8fdcc 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -16,6 +16,7 @@
  * netstack, and assigning dedicated CPUs for this stage.  This
  * basically allows for 10G wirespeed pre-filtering via bpf.
  */
+#include <linux/bitops.h>
 #include <linux/bpf.h>
 #include <linux/filter.h>
 #include <linux/ptr_ring.h>
@@ -79,6 +80,29 @@ struct bpf_cpu_map {
 
 static DEFINE_PER_CPU(struct list_head, cpu_map_flush_list);
 
+static void *__ptr_set_bit(void *ptr, int bit)
+{
+	unsigned long __ptr = (unsigned long)ptr;
+
+	__ptr |= BIT(bit);
+	return (void *)__ptr;
+}
+
+static void *__ptr_clear_bit(void *ptr, int bit)
+{
+	unsigned long __ptr = (unsigned long)ptr;
+
+	__ptr &= ~BIT(bit);
+	return (void *)__ptr;
+}
+
+static int __ptr_test_bit(void *ptr, int bit)
+{
+	unsigned long __ptr = (unsigned long)ptr;
+
+	return __ptr & BIT(bit);
+}
+
 static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 {
 	u32 value_size = attr->value_size;
@@ -168,6 +192,64 @@ static void put_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)
 	}
 }
 
+static void cpu_map_bpf_prog_run_skb(struct bpf_cpu_map_entry *rcpu,
+				    void **frames, int skb_n,
+				    struct xdp_cpumap_stats *stats,
+				    struct list_head *listp)
+{
+	struct xdp_rxq_info rxq = {};
+	struct xdp_buff xdp;
+	int err, i;
+	u32 act;
+
+	xdp.rxq = &rxq;
+
+	if (!rcpu->prog)
+		goto insert;
+
+	for (i = 0; i < skb_n; i++) {
+		struct sk_buff *skb = frames[i];
+
+		rxq.dev = skb->dev;
+
+		act = bpf_prog_run_generic_xdp(skb, &xdp, rcpu->prog);
+		switch (act) {
+		case XDP_PASS:
+			list_add_tail(&skb->list, listp);
+			break;
+		case XDP_REDIRECT:
+			err = xdp_do_generic_redirect(skb->dev, skb, &xdp,
+						      rcpu->prog);
+			if (unlikely(err)) {
+				kfree_skb(skb);
+				stats->drop++;
+			} else {
+				stats->redirect++;
+			}
+			return;
+		default:
+			bpf_warn_invalid_xdp_action(act);
+			fallthrough;
+		case XDP_ABORTED:
+			trace_xdp_exception(skb->dev, rcpu->prog, act);
+			fallthrough;
+		case XDP_DROP:
+			kfree_skb(skb);
+			stats->drop++;
+			return;
+		}
+	}
+
+	return;
+
+insert:
+	for (i = 0; i < skb_n; i++) {
+		struct sk_buff *skb = frames[i];
+
+		list_add_tail(&skb->list, listp);
+	}
+}
+
 static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
 				    void **frames, int n,
 				    struct xdp_cpumap_stats *stats)
@@ -179,8 +261,6 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
 	if (!rcpu->prog)
 		return n;
 
-	rcu_read_lock_bh();
-
 	xdp_set_return_frame_no_direct();
 	xdp.rxq = &rxq;
 
@@ -227,17 +307,36 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
 		}
 	}
 
+	xdp_clear_return_frame_no_direct();
+
+	return nframes;
+}
+
+#define CPUMAP_BATCH 8
+
+static int cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu,
+				void **frames, int xdp_n, int skb_n,
+				struct xdp_cpumap_stats *stats,
+				struct list_head *list)
+{
+	int nframes;
+
+	rcu_read_lock_bh();
+
+	nframes = cpu_map_bpf_prog_run_xdp(rcpu, frames, xdp_n, stats);
+
 	if (stats->redirect)
-		xdp_do_flush_map();
+		xdp_do_flush();
 
-	xdp_clear_return_frame_no_direct();
+	if (unlikely(skb_n))
+		cpu_map_bpf_prog_run_skb(rcpu, frames + CPUMAP_BATCH, skb_n,
+					 stats, list);
 
-	rcu_read_unlock_bh(); /* resched point, may call do_softirq() */
+	rcu_read_unlock_bh();
 
 	return nframes;
 }
 
-#define CPUMAP_BATCH 8
 
 static int cpu_map_kthread_run(void *data)
 {
@@ -254,9 +353,9 @@ static int cpu_map_kthread_run(void *data)
 		struct xdp_cpumap_stats stats = {}; /* zero stats */
 		unsigned int kmem_alloc_drops = 0, sched = 0;
 		gfp_t gfp = __GFP_ZERO | GFP_ATOMIC;
-		void *frames[CPUMAP_BATCH];
+		int i, n, m, nframes, xdp_n, skb_n;
+		void *frames[CPUMAP_BATCH * 2];
 		void *skbs[CPUMAP_BATCH];
-		int i, n, m, nframes;
 		LIST_HEAD(list);
 
 		/* Release CPU reschedule checks */
@@ -280,9 +379,17 @@ static int cpu_map_kthread_run(void *data)
 		 */
 		n = __ptr_ring_consume_batched(rcpu->queue, frames,
 					       CPUMAP_BATCH);
-		for (i = 0; i < n; i++) {
+		for (i = 0, xdp_n = 0, skb_n = 0; i < n; i++) {
 			void *f = frames[i];
-			struct page *page = virt_to_page(f);
+			struct page *page;
+
+			if (unlikely(__ptr_test_bit(f, 0))) {
+				frames[CPUMAP_BATCH + skb_n++] = __ptr_clear_bit(f, 0);
+				continue;
+			}
+
+			frames[xdp_n++] = f;
+			page = virt_to_page(f);
 
 			/* Bring struct page memory area to curr CPU. Read by
 			 * build_skb_around via page_is_pfmemalloc(), and when
@@ -292,7 +399,7 @@ static int cpu_map_kthread_run(void *data)
 		}
 
 		/* Support running another XDP prog on this CPU */
-		nframes = cpu_map_bpf_prog_run_xdp(rcpu, frames, n, &stats);
+		nframes = cpu_map_bpf_prog_run(rcpu, frames, xdp_n, skb_n, &stats, &list);
 		if (nframes) {
 			m = kmem_cache_alloc_bulk(skbuff_head_cache, gfp, nframes, skbs);
 			if (unlikely(m == 0)) {
@@ -316,6 +423,7 @@ static int cpu_map_kthread_run(void *data)
 
 			list_add_tail(&skb->list, &list);
 		}
+
 		netif_receive_skb_list(&list);
 
 		/* Feedback loop via tracepoint */
@@ -333,7 +441,8 @@ static int cpu_map_kthread_run(void *data)
 bool cpu_map_prog_allowed(struct bpf_map *map)
 {
 	return map->map_type == BPF_MAP_TYPE_CPUMAP &&
-	       map->value_size != offsetofend(struct bpf_cpumap_val, qsize);
+	       map->value_size != offsetofend(struct bpf_cpumap_val, qsize) &&
+	       map->value_size != offsetofend(struct bpf_cpumap_val, bpf_prog.fd);
 }
 
 static int __cpu_map_load_bpf_program(struct bpf_cpu_map_entry *rcpu, int fd)
@@ -696,6 +805,24 @@ int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
 	return 0;
 }
 
+int cpu_map_generic_redirect(struct bpf_cpu_map_entry *rcpu,
+			     struct sk_buff *skb)
+{
+	int ret;
+
+	__skb_pull(skb, skb->mac_len);
+	skb_set_redirected(skb, false);
+
+	ret = ptr_ring_produce(rcpu->queue, __ptr_set_bit(skb, 0));
+	if (ret < 0)
+		goto trace;
+
+	wake_up_process(rcpu->kthread);
+trace:
+	trace_xdp_cpumap_enqueue(rcpu->map_id, !ret, !!ret, rcpu->cpu);
+	return ret;
+}
+
 void __cpu_map_flush(void)
 {
 	struct list_head *flush_list = this_cpu_ptr(&cpu_map_flush_list);
diff --git a/net/core/filter.c b/net/core/filter.c
index 0b13d8157a8f..4a21fde3028f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4038,8 +4038,12 @@ static int xdp_do_generic_redirect_map(struct net_device *dev,
 			goto err;
 		consume_skb(skb);
 		break;
+	case BPF_MAP_TYPE_CPUMAP:
+		err = cpu_map_generic_redirect(fwd, skb);
+		if (unlikely(err))
+			goto err;
+		break;
 	default:
-		/* TODO: Handle BPF_MAP_TYPE_CPUMAP */
 		err = -EBADRQC;
 		goto err;
 	}
-- 
2.31.1



* [PATCH net-next 3/4] bpf: devmap: implement devmap prog execution for generic XDP
From: Kumar Kartikeya Dwivedi @ 2021-06-20 23:31 UTC (permalink / raw)
  To: netdev
  Cc: Kumar Kartikeya Dwivedi, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer, David S. Miller, Jakub Kicinski,
	John Fastabend, Martin KaFai Lau, bpf

This lifts the restriction on running devmap BPF progs in generic
redirect mode. To match native XDP behavior, the prog is invoked right
before generic_xdp_tx is called, and only the XDP_PASS/XDP_ABORTED/
XDP_DROP actions are supported.

We also return 0 even if the devmap program drops the packet, as
semantically the redirect has already succeeded; the devmap prog is the
last point before the packet is transmitted to the device at which a
verdict can still be delivered.

This also means the helper must take care of freeing the skb itself, as
callers of xdp_do_generic_redirect only do that when an error is
returned.
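
For reference, a devmap entry carrying such a prog is set up from
userspace roughly like this (libbpf; map_fd, key, target_ifindex and
devmap_prog_fd are assumptions):

	struct bpf_devmap_val val = {
		.ifindex	= target_ifindex,
		.bpf_prog.fd	= devmap_prog_fd,
	};
	err = bpf_map_update_elem(map_fd, &key, &val, 0);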

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/devmap.c | 42 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 2a75e6c2d27d..db3ed8b20c8c 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -322,7 +322,8 @@ bool dev_map_can_have_prog(struct bpf_map *map)
 {
 	if ((map->map_type == BPF_MAP_TYPE_DEVMAP ||
 	     map->map_type == BPF_MAP_TYPE_DEVMAP_HASH) &&
-	    map->value_size != offsetofend(struct bpf_devmap_val, ifindex))
+	    map->value_size != offsetofend(struct bpf_devmap_val, ifindex) &&
+	    map->value_size != offsetofend(struct bpf_devmap_val, bpf_prog.fd))
 		return true;
 
 	return false;
@@ -499,6 +500,37 @@ static inline int __xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
 	return 0;
 }
 
+static u32 dev_map_bpf_prog_run_skb(struct sk_buff *skb, struct bpf_prog *xdp_prog)
+{
+	struct xdp_txq_info txq = { .dev = skb->dev };
+	struct xdp_buff xdp;
+	u32 act;
+
+	if (!xdp_prog)
+		return XDP_PASS;
+
+	__skb_pull(skb, skb->mac_len);
+	xdp.txq = &txq;
+
+	act = bpf_prog_run_generic_xdp(skb, &xdp, xdp_prog);
+	switch (act) {
+	case XDP_PASS:
+		__skb_push(skb, skb->mac_len);
+		break;
+	default:
+		bpf_warn_invalid_xdp_action(act);
+		fallthrough;
+	case XDP_ABORTED:
+		trace_xdp_exception(skb->dev, xdp_prog, act);
+		fallthrough;
+	case XDP_DROP:
+		kfree_skb(skb);
+		break;
+	}
+
+	return act;
+}
+
 int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
 		    struct net_device *dev_rx)
 {
@@ -615,6 +647,14 @@ int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
 	if (unlikely(err))
 		return err;
 	skb->dev = dst->dev;
+
+	/* Redirect has already succeeded semantically at this point, so we just
+	 * return 0 even if packet is dropped. Helper below takes care of
+	 * freeing skb.
+	 */
+	if (dev_map_bpf_prog_run_skb(skb, dst->xdp_prog) != XDP_PASS)
+		return 0;
+
 	generic_xdp_tx(skb, xdp_prog);
 
 	return 0;
-- 
2.31.1



* [PATCH net-next 4/4] bpf: update XDP selftests to not fail with generic XDP
From: Kumar Kartikeya Dwivedi @ 2021-06-20 23:32 UTC (permalink / raw)
  To: netdev
  Cc: Kumar Kartikeya Dwivedi, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer, David S. Miller, Jakub Kicinski,
	John Fastabend, Martin KaFai Lau, bpf

Generic XDP devmaps and cpumaps now allow setting value_size to 8 bytes
(so that a prog_fd can be specified), and XDP progs using them now
succeed in SKB mode. Adjust the checks accordingly.
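
That is, a map declared with the full 8-byte value now attaches in SKB
mode, e.g. (BPF C sketch; needs <bpf/bpf_helpers.h>):

	struct {
		__uint(type, BPF_MAP_TYPE_CPUMAP);
		__uint(max_entries, 4);
		__uint(key_size, sizeof(__u32));
		__uint(value_size, sizeof(struct bpf_cpumap_val)); /* 8 */
	} cpu_map SEC(".maps");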

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/testing/selftests/bpf/prog_tests/xdp_cpumap_attach.c | 4 ++--
 tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_cpumap_attach.c b/tools/testing/selftests/bpf/prog_tests/xdp_cpumap_attach.c
index 0176573fe4e7..42e46d2ae349 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_cpumap_attach.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_cpumap_attach.c
@@ -29,8 +29,8 @@ void test_xdp_with_cpumap_helpers(void)
 	 */
 	prog_fd = bpf_program__fd(skel->progs.xdp_redir_prog);
 	err = bpf_set_link_xdp_fd(IFINDEX_LO, prog_fd, XDP_FLAGS_SKB_MODE);
-	CHECK(err == 0, "Generic attach of program with 8-byte CPUMAP",
-	      "should have failed\n");
+	CHECK(err, "Generic attach of program with 8-byte CPUMAP",
+	      "shouldn't have failed\n");
 
 	prog_fd = bpf_program__fd(skel->progs.xdp_dummy_cm);
 	map_fd = bpf_map__fd(skel->maps.cpu_map);
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c b/tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c
index 88ef3ec8ac4c..861db508ace2 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c
@@ -31,8 +31,8 @@ void test_xdp_with_devmap_helpers(void)
 	 */
 	dm_fd = bpf_program__fd(skel->progs.xdp_redir_prog);
 	err = bpf_set_link_xdp_fd(IFINDEX_LO, dm_fd, XDP_FLAGS_SKB_MODE);
-	CHECK(err == 0, "Generic attach of program with 8-byte devmap",
-	      "should have failed\n");
+	CHECK(err, "Generic attach of program with 8-byte devmap",
+	      "shouldn't have failed\n");
 
 	dm_fd = bpf_program__fd(skel->progs.xdp_dummy_dm);
 	map_fd = bpf_map__fd(skel->maps.dm_ports);
-- 
2.31.1



* Re: [PATCH net-next 2/4] net: implement generic cpumap
From: Toke Høiland-Jørgensen @ 2021-06-21 15:43 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, netdev
  Cc: Kumar Kartikeya Dwivedi, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Jesper Dangaard Brouer, David S. Miller,
	Jakub Kicinski, John Fastabend, Martin KaFai Lau, bpf

Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:

> [...]
> @@ -79,6 +80,29 @@ struct bpf_cpu_map {
>  
>  static DEFINE_PER_CPU(struct list_head, cpu_map_flush_list);
>  
> +static void *__ptr_set_bit(void *ptr, int bit)
> +{
> +	unsigned long __ptr = (unsigned long)ptr;
> +
> +	__ptr |= BIT(bit);
> +	return (void *)__ptr;
> +}
> +
> +static void *__ptr_clear_bit(void *ptr, int bit)
> +{
> +	unsigned long __ptr = (unsigned long)ptr;
> +
> +	__ptr &= ~BIT(bit);
> +	return (void *)__ptr;
> +}
> +
> +static int __ptr_test_bit(void *ptr, int bit)
> +{
> +	unsigned long __ptr = (unsigned long)ptr;
> +
> +	return __ptr & BIT(bit);
> +}

Why not put these into bitops.h instead?

> [...]
> @@ -254,9 +353,9 @@ static int cpu_map_kthread_run(void *data)
>  		struct xdp_cpumap_stats stats = {}; /* zero stats */
>  		unsigned int kmem_alloc_drops = 0, sched = 0;
>  		gfp_t gfp = __GFP_ZERO | GFP_ATOMIC;
> -		void *frames[CPUMAP_BATCH];
> +		int i, n, m, nframes, xdp_n, skb_n;
> +		void *frames[CPUMAP_BATCH * 2];

This double-sized array thing is clever, but it hurts readability. You'd
get basically the same code by having them as two separate arrays and
passing in two separate pointers to cpu_map_bpf_prog_run().
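
I.e., something like (untested; signature sketch only):

	static int cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu,
					void **xdp_frames, int xdp_n,
					void **skbs, int skb_n,
					struct xdp_cpumap_stats *stats,
					struct list_head *list)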

Or you could even just use 'list' - you're passing in that anyway, just
to have cpu_map_bpf_prog_run_skb() add the skbs to it; so why not just
add them right here in the caller, and have cpu_map_bpf_prog_run_skb()
remove them again if the rcpu prog doesn't return XDP_PASS?

-Toke



* Re: [PATCH net-next 3/4] bpf: devmap: implement devmap prog execution for generic XDP
From: Toke Høiland-Jørgensen @ 2021-06-21 15:50 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, netdev
  Cc: Kumar Kartikeya Dwivedi, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Jesper Dangaard Brouer, David S. Miller,
	Jakub Kicinski, John Fastabend, Martin KaFai Lau, bpf

Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:

> [...]
> @@ -322,7 +322,8 @@ bool dev_map_can_have_prog(struct bpf_map *map)
>  {
>  	if ((map->map_type == BPF_MAP_TYPE_DEVMAP ||
>  	     map->map_type == BPF_MAP_TYPE_DEVMAP_HASH) &&
> -	    map->value_size != offsetofend(struct bpf_devmap_val, ifindex))
> +	    map->value_size != offsetofend(struct bpf_devmap_val, ifindex) &&
> +	    map->value_size != offsetofend(struct bpf_devmap_val, bpf_prog.fd))
>  		return true;

With this you've basically removed the need for the check that calls
this, so why not just get rid of it entirely? Same thing for cpumap,
instead of updating cpu_map_prog_allowed(), just get rid of it...

-Toke


