* [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames
@ 2023-03-13 19:08 Alexander Lobakin
  2023-03-13 19:08 ` [PATCH bpf-next v3 1/4] selftests/bpf: robustify test_xdp_do_redirect with more payload magics Alexander Lobakin
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Alexander Lobakin @ 2023-03-13 19:08 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Alexander Lobakin, Maciej Fijalkowski, Larysa Zaremba,
	Toke Høiland-Jørgensen, Song Liu,
	Jesper Dangaard Brouer, John Fastabend, Menglong Dong,
	Mykola Lysenko, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, bpf, netdev, linux-kernel

Yeah, I still remember that "Who needs cpumap nowadays" (c), but anyway.

__xdp_build_skb_from_frame() missed the moment when the networking stack
became able to recycle skb pages backed by a page_pool. This was making
e.g. cpumap redirect even less effective than simple %XDP_PASS. veth was
also affected in some scenarios.
A lot of drivers already use skb_mark_for_recycle(); it's been almost
two years and there seem to be no issues with using it in the generic
code too. {__,}xdp_release_frame() can then be removed as it loses its
last user.
Page Pool then becomes zero-alloc (or almost) in the abovementioned
cases, too. Other memory type models (who needs them at this point)
have no changes.
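
For reference, a minimal sketch of what such drivers do on their Rx path
(illustrative only; `rx_buf` and `truesize` are made-up names, not taken
from any particular driver):

	skb = napi_build_skb(rx_buf->va, truesize);
	if (unlikely(!skb))
		break;		/* OOM, the page stays in the pool */

	/* have the skb core return the pages to their page_pool on
	 * kfree_skb() instead of freeing them to the page allocator
	 */
	skb_mark_for_recycle(skb);

The series makes __xdp_build_skb_from_frame() do the same for skbs built
from Page Pool backed XDP frames.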

Some numbers on 1 Xeon Platinum core bombed with 27 Mpps of 64-byte
IPv6 UDP, iavf w/XDP[0] (CONFIG_PAGE_POOL_STATS is enabled):

Plain %XDP_PASS on baseline, Page Pool driver:

src cpu Rx     drops  dst cpu Rx
  2.1 Mpps       N/A    2.1 Mpps

cpumap redirect (cross-core, w/o leaving its NUMA node) on baseline:

  6.8 Mpps  5.0 Mpps    1.8 Mpps

cpumap redirect with skb PP recycling:

  7.9 Mpps  5.7 Mpps    2.2 Mpps
                       +22% (from cpumap redir on baseline)

[0] https://github.com/alobakin/linux/commits/iavf-xdp

Alexander Lobakin (4):
  selftests/bpf: robustify test_xdp_do_redirect with more payload magics
  net: page_pool, skbuff: make skb_mark_for_recycle() always available
  xdp: recycle Page Pool backed skbs built from XDP frames
  xdp: remove unused {__,}xdp_release_frame()

 include/linux/skbuff.h                        |  4 +--
 include/net/xdp.h                             | 29 ---------------
 net/core/xdp.c                                | 19 ++--------
 .../bpf/progs/test_xdp_do_redirect.c          | 36 +++++++++++++------
 4 files changed, 30 insertions(+), 58 deletions(-)

---
From v2[1]:
* fix the test_xdp_do_redirect selftest failing after the series: it was
  relying on the assumption that %XDP_PASS frames can't be recycled on
  veth (BPF CI, Alexei);
* explain "w/o leaving its node" in the cover letter (Jesper).

From v1[2]:
* make skb_mark_for_recycle() always available, otherwise there are build
  failures on non-PP systems (kbuild bot);
* 'Page Pool' -> 'page_pool' when it's about a page_pool instance, not
  API (Jesper);
* expanded test system info a bit in the cover letter (Jesper).

[1] https://lore.kernel.org/bpf/20230303133232.2546004-1-aleksander.lobakin@intel.com
[2] https://lore.kernel.org/bpf/20230301160315.1022488-1-aleksander.lobakin@intel.com
-- 
2.39.2



* [PATCH bpf-next v3 1/4] selftests/bpf: robustify test_xdp_do_redirect with more payload magics
  2023-03-13 19:08 [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames Alexander Lobakin
@ 2023-03-13 19:08 ` Alexander Lobakin
  2023-03-13 19:08 ` [PATCH bpf-next v3 2/4] net: page_pool, skbuff: make skb_mark_for_recycle() always available Alexander Lobakin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 20+ messages in thread
From: Alexander Lobakin @ 2023-03-13 19:08 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Alexander Lobakin, Maciej Fijalkowski, Larysa Zaremba,
	Toke Høiland-Jørgensen, Song Liu,
	Jesper Dangaard Brouer, John Fastabend, Menglong Dong,
	Mykola Lysenko, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, bpf, netdev, linux-kernel

Currently, the test relies on that only dropped ("xmitted") frames will
be recycled and if a frame became an skb, it will be freed later by the
stack and never come back to its page_pool.
So, it easily gets broken by trying to recycle skbs[0]:

  test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
  test_xdp_do_redirect:FAIL:pkt_count_zero unexpected pkt_count_zero:
actual 9936 != expected 2
  test_xdp_do_redirect:PASS:pkt_count_tc 0 nsec

That huge mismatch happened because after the TC ingress hook zeroes the
magic, the page gets recycled when the skb is freed, not returned to the
MM layer. The "live frames" mode initializes only new pages and keeps the
recycled ones as is by design, so they appear with zeroed magic on the
Rx path again.
Expand the possible magic values from two: 0 (frame was "xmitted"/dropped
or hit the TC hook) and 0x42 (frame hit the input XDP prog), to three:
the new one marks frames that hit the TC hook, so that they elide both
@pkt_count_zero and @pkt_count_xdp. They can then be recycled to their
page_pool or returned to the page allocator; either way, the counters
won't be affected. Just make sure to mark them as "input" (0x42) when
they appear on the Rx path again.
Also make an enum from those magics, so that they are always visible and
can be changed in one place at any time. This also eases adding new
marks later on.
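
As a rough sketch, a page's journey through the test with the new marks
(values taken from the diff below) looks like:

	/* fresh or recycled page, payload == MARK_XMIT (0)
	 *   -> input XDP prog: counted in pkts_seen_zero,
	 *      stamped MARK_IN (0x42), redirected to the dst iface
	 *   -> dst XDP prog: counted in pkts_seen_xdp, re-stamped
	 *      MARK_XMIT, XDP_DROP recycles the page
	 *   or
	 *   -> TC ingress: counted in pkts_seen_tc, re-stamped
	 *      MARK_SKB (0x45); when the page comes back to the Rx
	 *      path, it hits neither pkts_seen_zero nor pkts_seen_xdp
	 */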

Link: https://github.com/kernel-patches/bpf/actions/runs/4386538411/jobs/7681081789
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 .../bpf/progs/test_xdp_do_redirect.c          | 36 +++++++++++++------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
index 77a123071940..cd2d4e3258b8 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
@@ -4,6 +4,19 @@
 
 #define ETH_ALEN 6
 #define HDR_SZ (sizeof(struct ethhdr) + sizeof(struct ipv6hdr) + sizeof(struct udphdr))
+
+/**
+ * enum frame_mark - magics to distinguish page/packet paths
+ * @MARK_XMIT: page was recycled due to the frame being "xmitted" by the NIC.
+ * @MARK_IN: frame is being processed by the input XDP prog.
+ * @MARK_SKB: frame did hit the TC ingress hook as an skb.
+ */
+enum frame_mark {
+	MARK_XMIT	= 0U,
+	MARK_IN		= 0x42,
+	MARK_SKB	= 0x45,
+};
+
 const volatile int ifindex_out;
 const volatile int ifindex_in;
 const volatile __u8 expect_dst[ETH_ALEN];
@@ -34,10 +47,10 @@ int xdp_redirect(struct xdp_md *xdp)
 	if (*metadata != 0x42)
 		return XDP_ABORTED;
 
-	if (*payload == 0) {
-		*payload = 0x42;
+	if (*payload == MARK_XMIT)
 		pkts_seen_zero++;
-	}
+
+	*payload = MARK_IN;
 
 	if (bpf_xdp_adjust_meta(xdp, 4))
 		return XDP_ABORTED;
@@ -51,7 +64,7 @@ int xdp_redirect(struct xdp_md *xdp)
 	return ret;
 }
 
-static bool check_pkt(void *data, void *data_end)
+static bool check_pkt(void *data, void *data_end, const __u32 mark)
 {
 	struct ipv6hdr *iph = data + sizeof(struct ethhdr);
 	__u8 *payload = data + HDR_SZ;
@@ -59,13 +72,13 @@ static bool check_pkt(void *data, void *data_end)
 	if (payload + 1 > data_end)
 		return false;
 
-	if (iph->nexthdr != IPPROTO_UDP || *payload != 0x42)
+	if (iph->nexthdr != IPPROTO_UDP || *payload != MARK_IN)
 		return false;
 
 	/* reset the payload so the same packet doesn't get counted twice when
 	 * it cycles back through the kernel path and out the dst veth
 	 */
-	*payload = 0;
+	*payload = mark;
 	return true;
 }
 
@@ -75,11 +88,11 @@ int xdp_count_pkts(struct xdp_md *xdp)
 	void *data = (void *)(long)xdp->data;
 	void *data_end = (void *)(long)xdp->data_end;
 
-	if (check_pkt(data, data_end))
+	if (check_pkt(data, data_end, MARK_XMIT))
 		pkts_seen_xdp++;
 
-	/* Return XDP_DROP to make sure the data page is recycled, like when it
-	 * exits a physical NIC. Recycled pages will be counted in the
+	/* Return %XDP_DROP to recycle the data page with %MARK_XMIT, like
+	 * it exited a physical NIC. Those pages will be counted in the
 	 * pkts_seen_zero counter above.
 	 */
 	return XDP_DROP;
@@ -91,9 +104,12 @@ int tc_count_pkts(struct __sk_buff *skb)
 	void *data = (void *)(long)skb->data;
 	void *data_end = (void *)(long)skb->data_end;
 
-	if (check_pkt(data, data_end))
+	if (check_pkt(data, data_end, MARK_SKB))
 		pkts_seen_tc++;
 
+	/* Will be either recycled or freed, %MARK_SKB makes sure it won't
+	 * hit any of the counters above.
+	 */
 	return 0;
 }
 
-- 
2.39.2



* [PATCH bpf-next v3 2/4] net: page_pool, skbuff: make skb_mark_for_recycle() always available
  2023-03-13 19:08 [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames Alexander Lobakin
  2023-03-13 19:08 ` [PATCH bpf-next v3 1/4] selftests/bpf: robustify test_xdp_do_redirect with more payload magics Alexander Lobakin
@ 2023-03-13 19:08 ` Alexander Lobakin
  2023-03-13 19:08 ` [PATCH bpf-next v3 3/4] xdp: recycle Page Pool backed skbs built from XDP frames Alexander Lobakin
  2023-03-13 19:08 ` [PATCH bpf-next v3 4/4] xdp: remove unused {__,}xdp_release_frame() Alexander Lobakin
  3 siblings, 0 replies; 20+ messages in thread
From: Alexander Lobakin @ 2023-03-13 19:08 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Alexander Lobakin, Maciej Fijalkowski, Larysa Zaremba,
	Toke Høiland-Jørgensen, Song Liu,
	Jesper Dangaard Brouer, John Fastabend, Menglong Dong,
	Mykola Lysenko, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, bpf, netdev, linux-kernel, kernel test robot

skb_mark_for_recycle() is guarded with CONFIG_PAGE_POOL, which creates
unneeded complications when using it in the generic code. For now, it's
only used in drivers that always select Page Pool, so this works.
Move the guards so that the preprocessor cuts out only the operation
itself; the function then stays a noop on !PAGE_POOL systems, but
becomes available there as well.
No functional changes.
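
Assembled from the diff below, the function then effectively reads:

	static inline void skb_mark_for_recycle(struct sk_buff *skb)
	{
	#ifdef CONFIG_PAGE_POOL
		skb->pp_recycle = 1;
	#endif
	}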

Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202303020342.Wi2PRFFH-lkp@intel.com
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/linux/skbuff.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index fe661011644b..3f3a2a82a86b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -5069,12 +5069,12 @@ static inline u64 skb_get_kcov_handle(struct sk_buff *skb)
 #endif
 }
 
-#ifdef CONFIG_PAGE_POOL
 static inline void skb_mark_for_recycle(struct sk_buff *skb)
 {
+#ifdef CONFIG_PAGE_POOL
 	skb->pp_recycle = 1;
-}
 #endif
+}
 
 #endif	/* __KERNEL__ */
 #endif	/* _LINUX_SKBUFF_H */
-- 
2.39.2



* [PATCH bpf-next v3 3/4] xdp: recycle Page Pool backed skbs built from XDP frames
  2023-03-13 19:08 [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames Alexander Lobakin
  2023-03-13 19:08 ` [PATCH bpf-next v3 1/4] selftests/bpf: robustify test_xdp_do_redirect with more payload magics Alexander Lobakin
  2023-03-13 19:08 ` [PATCH bpf-next v3 2/4] net: page_pool, skbuff: make skb_mark_for_recycle() always available Alexander Lobakin
@ 2023-03-13 19:08 ` Alexander Lobakin
  2023-03-13 19:08 ` [PATCH bpf-next v3 4/4] xdp: remove unused {__,}xdp_release_frame() Alexander Lobakin
  3 siblings, 0 replies; 20+ messages in thread
From: Alexander Lobakin @ 2023-03-13 19:08 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Alexander Lobakin, Maciej Fijalkowski, Larysa Zaremba,
	Toke Høiland-Jørgensen, Song Liu,
	Jesper Dangaard Brouer, John Fastabend, Menglong Dong,
	Mykola Lysenko, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, bpf, netdev, linux-kernel

__xdp_build_skb_from_frame() state(d):

/* Until page_pool get SKB return path, release DMA here */

Page Pool got skb page recycling in April 2021, but this function
missed it.

xdp_release_frame() is relevant only for Page Pool backed frames: it
detaches the page from the corresponding page_pool in order to make it
freeable via page_frag_free(). Instead, just mark the output skb as
eligible for recycling if the frame is backed by a pp. No change for
the other memory model types (the same condition check as before).
cpumap redirect and veth on Page Pool drivers now become zero-alloc (or
almost).

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 net/core/xdp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/xdp.c b/net/core/xdp.c
index 8c92fc553317..a2237cfca8e9 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -658,8 +658,8 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 	 * - RX ring dev queue index	(skb_record_rx_queue)
 	 */
 
-	/* Until page_pool get SKB return path, release DMA here */
-	xdp_release_frame(xdpf);
+	if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
+		skb_mark_for_recycle(skb);
 
 	/* Allow SKB to reuse area used by xdp_frame */
 	xdp_scrub_frame(xdpf);
-- 
2.39.2



* [PATCH bpf-next v3 4/4] xdp: remove unused {__,}xdp_release_frame()
  2023-03-13 19:08 [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames Alexander Lobakin
                   ` (2 preceding siblings ...)
  2023-03-13 19:08 ` [PATCH bpf-next v3 3/4] xdp: recycle Page Pool backed skbs built from XDP frames Alexander Lobakin
@ 2023-03-13 19:08 ` Alexander Lobakin
  3 siblings, 0 replies; 20+ messages in thread
From: Alexander Lobakin @ 2023-03-13 19:08 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Alexander Lobakin, Maciej Fijalkowski, Larysa Zaremba,
	Toke Høiland-Jørgensen, Song Liu,
	Jesper Dangaard Brouer, John Fastabend, Menglong Dong,
	Mykola Lysenko, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, bpf, netdev, linux-kernel

__xdp_build_skb_from_frame() was the last user of
{__,}xdp_release_frame(), which detaches pages from their page_pool.
All the consumers now recycle Page Pool skbs and pages, except the mlx5,
stmmac and tsnep drivers, which use page_pool_release_page() directly
(that might change one day). It's safe to assume this functionality is
not needed anymore and can be removed in favor of recycling.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/net/xdp.h | 29 -----------------------------
 net/core/xdp.c    | 15 ---------------
 2 files changed, 44 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index d517bfac937b..5393b3ebe56e 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -317,35 +317,6 @@ void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq);
 void xdp_return_frame_bulk(struct xdp_frame *xdpf,
 			   struct xdp_frame_bulk *bq);
 
-/* When sending xdp_frame into the network stack, then there is no
- * return point callback, which is needed to release e.g. DMA-mapping
- * resources with page_pool.  Thus, have explicit function to release
- * frame resources.
- */
-void __xdp_release_frame(void *data, struct xdp_mem_info *mem);
-static inline void xdp_release_frame(struct xdp_frame *xdpf)
-{
-	struct xdp_mem_info *mem = &xdpf->mem;
-	struct skb_shared_info *sinfo;
-	int i;
-
-	/* Curr only page_pool needs this */
-	if (mem->type != MEM_TYPE_PAGE_POOL)
-		return;
-
-	if (likely(!xdp_frame_has_frags(xdpf)))
-		goto out;
-
-	sinfo = xdp_get_shared_info_from_frame(xdpf);
-	for (i = 0; i < sinfo->nr_frags; i++) {
-		struct page *page = skb_frag_page(&sinfo->frags[i]);
-
-		__xdp_release_frame(page_address(page), mem);
-	}
-out:
-	__xdp_release_frame(xdpf->data, mem);
-}
-
 static __always_inline unsigned int xdp_get_frame_len(struct xdp_frame *xdpf)
 {
 	struct skb_shared_info *sinfo;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index a2237cfca8e9..8d3ad315f18d 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -531,21 +531,6 @@ void xdp_return_buff(struct xdp_buff *xdp)
 }
 EXPORT_SYMBOL_GPL(xdp_return_buff);
 
-/* Only called for MEM_TYPE_PAGE_POOL see xdp.h */
-void __xdp_release_frame(void *data, struct xdp_mem_info *mem)
-{
-	struct xdp_mem_allocator *xa;
-	struct page *page;
-
-	rcu_read_lock();
-	xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
-	page = virt_to_head_page(data);
-	if (xa)
-		page_pool_release_page(xa->page_pool, page);
-	rcu_read_unlock();
-}
-EXPORT_SYMBOL_GPL(__xdp_release_frame);
-
 void xdp_attachment_setup(struct xdp_attachment_info *info,
 			  struct netdev_bpf *bpf)
 {
-- 
2.39.2



* Re: [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames
  2023-03-15 18:26                 ` Ilya Leoshkevich
@ 2023-03-16 13:22                   ` Alexander Lobakin
  0 siblings, 0 replies; 20+ messages in thread
From: Alexander Lobakin @ 2023-03-16 13:22 UTC (permalink / raw)
  To: Ilya Leoshkevich, Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Maciej Fijalkowski, Larysa Zaremba,
	Toke Høiland-Jørgensen, Song Liu,
	Jesper Dangaard Brouer, John Fastabend, Menglong Dong,
	Mykola Lysenko, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, bpf, Network Development, LKML

From: Ilya Leoshkevich <iii@linux.ibm.com>
Date: Wed, 15 Mar 2023 19:26:00 +0100

> On Wed, 2023-03-15 at 19:12 +0100, Alexander Lobakin wrote:
>> From: Ilya Leoshkevich <iii@linux.ibm.com>
>> Date: Wed, 15 Mar 2023 19:00:47 +0100
>>
>>> On Wed, 2023-03-15 at 15:54 +0100, Ilya Leoshkevich wrote:
>>>> On Wed, 2023-03-15 at 11:54 +0100, Alexander Lobakin wrote:
>>>>> From: Alexander Lobakin <aleksander.lobakin@intel.com>
>>>>> Date: Wed, 15 Mar 2023 10:56:25 +0100
>>>>>
>>>>>> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>>>>>> Date: Tue, 14 Mar 2023 16:54:25 -0700
>>>>>>
>>>>>>> On Tue, Mar 14, 2023 at 11:52 AM Alexei Starovoitov
>>>>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>> test_xdp_do_redirect:PASS:prog_run 0 nsec
>>>>>>> test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
>>>>>>> test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec
>>>>>>> test_xdp_do_redirect:FAIL:pkt_count_tc unexpected
>>>>>>> pkt_count_tc:
>>>>>>> actual
>>>>>>> 220 != expected 9998
>>>>>>> test_max_pkt_size:PASS:prog_run_max_size 0 nsec
>>>>>>> test_max_pkt_size:PASS:prog_run_too_big 0 nsec
>>>>>>> close_netns:PASS:setns 0 nsec
>>>>>>> #289 xdp_do_redirect:FAIL
>>>>>>> Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED
>>>>>>>
>>>>>>> Alex,
>>>>>>> could you please take a look at why it's happening?
>>>>>>>
>>>>>>> I suspect it's an endianness issue in:
>>>>>>>         if (*metadata != 0x42)
>>>>>>>                 return XDP_ABORTED;
>>>>>>> but your patch didn't change that,
>>>>>>> so I'm not sure why it worked before.
>>>>>>
>>>>>> Sure, lemme fix it real quick.
>>>>>
>>>>> Hi Ilya,
>>>>>
>>>>> Do you have s390 testing setups? Maybe you could take a look, since
>>>>> I don't have one and can't debug it? Doesn't seem to be an
>>>>> endianness issue.
>>>>> I mean, I have this (the below patch), but not sure it will fix
>>>>> anything -- IIRC eBPF arch always matches the host arch ._.
>>>>> I can't figure out from the code what goes wrong :s And it
>>>>> happens only on s390.
>>>>>
>>>>> Thanks,
>>>>> Olek
>>>>> ---
>>>>> diff --git
>>>>> a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
>>>>> b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
>>>>> index 662b6c6c5ed7..b21371668447 100644
>>>>> --- a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
>>>>> +++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
>>>>> @@ -107,7 +107,7 @@ void test_xdp_do_redirect(void)
>>>>>                             .attach_point = BPF_TC_INGRESS);
>>>>>  
>>>>>         memcpy(&data[sizeof(__u32)], &pkt_udp,
>>>>> sizeof(pkt_udp));
>>>>> -       *((__u32 *)data) = 0x42; /* metadata test value */
>>>>> +       *((__u32 *)data) = htonl(0x42); /* metadata test value
>>>>> */
>>>>>  
>>>>>         skel = test_xdp_do_redirect__open();
>>>>>         if (!ASSERT_OK_PTR(skel, "skel"))
>>>>> diff --git
>>>>> a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
>>>>> b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
>>>>> index cd2d4e3258b8..2475bc30ced2 100644
>>>>> --- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
>>>>> +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
>>>>> @@ -1,5 +1,6 @@
>>>>>  // SPDX-License-Identifier: GPL-2.0
>>>>>  #include <vmlinux.h>
>>>>> +#include <bpf/bpf_endian.h>
>>>>>  #include <bpf/bpf_helpers.h>
>>>>>  
>>>>>  #define ETH_ALEN 6
>>>>> @@ -28,7 +29,7 @@ volatile int retcode = XDP_REDIRECT;
>>>>>  SEC("xdp")
>>>>>  int xdp_redirect(struct xdp_md *xdp)
>>>>>  {
>>>>> -       __u32 *metadata = (void *)(long)xdp->data_meta;
>>>>> +       __be32 *metadata = (void *)(long)xdp->data_meta;
>>>>>         void *data_end = (void *)(long)xdp->data_end;
>>>>>         void *data = (void *)(long)xdp->data;
>>>>>  
>>>>> @@ -44,7 +45,7 @@ int xdp_redirect(struct xdp_md *xdp)
>>>>>         if (metadata + 1 > data)
>>>>>                 return XDP_ABORTED;
>>>>>  
>>>>> -       if (*metadata != 0x42)
>>>>> +       if (*metadata != __bpf_htonl(0x42))
>>>>>                 return XDP_ABORTED;
>>>>>  
>>>>>         if (*payload == MARK_XMIT)
>>>>
>>>> Okay, I'll take a look. Two quick observations for now:
>>>>
>>>> - Unfortunately the above patch does not help.
>>>>
>>>> - In dmesg I see:
>>>>
>>>>     Driver unsupported XDP return value 0 on prog xdp_redirect
>>>> (id
>>>> 23)
>>>>     dev N/A, expect packet loss!
>>>
>>> I haven't identified the issue yet, but I have found a couple more
>>> things that might be helpful:
>>>
>>> - In problematic cases metadata contains 0, so this is not an
>>>   endianness issue. data is still reasonable though. I'm trying to
>>>   understand what is causing this.
>>>
>>> - Applying the following diff:
>>>
>>> --- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
>>> +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
>>> @@ -52,7 +52,7 @@ int xdp_redirect(struct xdp_md *xdp)
>>>  
>>>         *payload = MARK_IN;
>>>  
>>> -       if (bpf_xdp_adjust_meta(xdp, 4))
>>> +       if (false && bpf_xdp_adjust_meta(xdp, 4))
>>>                 return XDP_ABORTED;
>>>  
>>>         if (retcode > XDP_PASS)
>>>
>>> causes a kernel panic even on x86_64:
>>>
>>> BUG: kernel NULL pointer dereference, address:
>>> 0000000000000d28       
>>> ...
>>> Call Trace:            
>>>  <TASK>                                                            
>>>    
>>>  build_skb_around+0x22/0xb0
>>>  __xdp_build_skb_from_frame+0x4e/0x130
>>>  bpf_test_run_xdp_live+0x65f/0x7c0
>>>  ? __pfx_xdp_test_run_init_page+0x10/0x10
>>>  bpf_prog_test_run_xdp+0x2ba/0x480
>>>  bpf_prog_test_run+0xeb/0x110
>>>  __sys_bpf+0x2b9/0x570
>>>  __x64_sys_bpf+0x1c/0x30
>>>  do_syscall_64+0x48/0xa0
>>>  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>>
>>> I haven't looked into this at all, but I believe this needs to be
>>> fixed - BPF should never cause kernel panics.
>>
>> This one is basically the same issue as syzbot mentioned today
>> (separate subthread). I'm waiting for feedback from Toke on which way
>> of fixing he'd prefer (I proposed 2). If those zeroed metadata magics
>> that you observe have the same roots as the panic, one fix will smash
>> both issues.
>>
>> Thanks,
>> Olek
> 
> Sounds good, I will wait for an update then.
> 
> In the meantime, I found the code that overwrites the metadata:
> 
> #0  0x0000000000aaeee6 in neigh_hh_output (hh=0x83258df0, skb=0x88142200) at linux/include/net/neighbour.h:503
> #1  0x0000000000ab2cda in neigh_output (skip_cache=false, skb=0x88142200, n=<optimized out>) at linux/include/net/neighbour.h:544
> #2  ip6_finish_output2 (net=net@entry=0x88edba00, sk=sk@entry=0x0, skb=skb@entry=0x88142200) at linux/net/ipv6/ip6_output.c:134
> #3  0x0000000000ab4cbc in __ip6_finish_output (skb=0x88142200, sk=0x0, net=0x88edba00) at linux/net/ipv6/ip6_output.c:195
> #4  ip6_finish_output (net=0x88edba00, sk=0x0, skb=0x88142200) at linux/net/ipv6/ip6_output.c:206
> #5  0x0000000000ab5cbc in dst_input (skb=<optimized out>) at linux/include/net/dst.h:454
> #6  ip6_sublist_rcv_finish (head=head@entry=0x38000dbf520) at linux/net/ipv6/ip6_input.c:88
> #7  0x0000000000ab6104 in ip6_list_rcv_finish (net=<optimized out>, head=<optimized out>, sk=0x0) at linux/net/ipv6/ip6_input.c:145
> #8  0x0000000000ab72bc in ipv6_list_rcv (head=0x38000dbf638, pt=<optimized out>, orig_dev=<optimized out>) at linux/net/ipv6/ip6_input.c:354
> #9  0x00000000008b3710 in __netif_receive_skb_list_ptype (orig_dev=0x880b8000, pt_prev=0x176b7f8 <ipv6_packet_type>, head=0x38000dbf638) at linux/net/core/dev.c:5520
> #10 __netif_receive_skb_list_core (head=head@entry=0x38000dbf7b8, pfmemalloc=pfmemalloc@entry=false) at linux/net/core/dev.c:5568
> #11 0x00000000008b4390 in __netif_receive_skb_list (head=0x38000dbf7b8) at linux/net/core/dev.c:5620
> #12 netif_receive_skb_list_internal (head=head@entry=0x38000dbf7b8) at linux/net/core/dev.c:5711
> #13 0x00000000008b45ce in netif_receive_skb_list (head=head@entry=0x38000dbf7b8) at linux/net/core/dev.c:5763
> #14 0x0000000000950782 in xdp_recv_frames (dev=<optimized out>, skbs=<optimized out>, nframes=62, frames=0x8587c600) at linux/net/bpf/test_run.c:256
> #15 xdp_test_run_batch (xdp=xdp@entry=0x38000dbf900, prog=prog@entry=0x37fffe75000, repeat=<optimized out>) at linux/net/bpf/test_run.c:334
> 
> namely:
> 
> static inline int neigh_hh_output(const struct hh_cache *hh, struct
> sk_buff *skb)
>    ...
>    memcpy(skb->data - HH_DATA_MOD, hh->hh_data, HH_DATA_MOD);
> 
> It's hard for me to see what is going on here, since I'm not familiar
> with the networking code. Since the XDP metadata is located at the end
> of the headroom, shouldn't there be something that prevents the network
> stack from overwriting it? Or can it be that netif_receive_skb_list() is

Ok, I got it. Much thanks for the debugging! The metadata gets
overwritten when neigh puts the MAC addr back on xmit. It works on LE
because 0x42 fits into one byte and is stored at `mac_addr - 4`. On BE,
this byte is at `mac_addr - 1`. Neigh rounds the 14 bytes of the Ethernet
header up to 16, so two bytes of metadata get wiped.
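
For illustration, a rough sketch of the headroom layout (assuming the
4-byte metadata used by this selftest; `mac` is the first byte of the
Ethernet header):

	/*  mac-4 mac-3 mac-2 mac-1 | mac ....... mac+13 | L3 hdr ...
	 *  [  XDP metadata, __u32  ] [  Ethernet header ]
	 *
	 * neigh_hh_output() rounds ETH_HLEN (14) up to HH_DATA_MOD (16):
	 *
	 *	memcpy(skb->data - HH_DATA_MOD, hh->hh_data, HH_DATA_MOD);
	 *
	 * with skb->data pointing at the L3 header, this covers
	 * [mac-2, mac+14) and wipes the last two metadata bytes. LE
	 * stores the 0x42 byte at mac-4 (survives), BE at mac-1 (wiped).
	 */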

This is not the same bug as the one that syzbot reported, but both are
provoked by the same thing: the assumption that %XDP_PASS guarantees the
page won't come back to the pool.
So there are two ways: the first is to fix those two bugs (two oneliners,
basically), the second is to turn off the recycling for bpf_test_run
entirely. I like the first more, as it theoretically keeps the perf
boost bpf_test_run gained from enabling the recycling, but it can't
guarantee similar stuff won't happen soon :D Something else might get
overwritten somewhere else and so on. The second one would effectively
revert the bpf_test_run logic to the pre-recycling state.
I'll submit the first way today; it will be a series of two. We'll see
how it goes, I'm fine with both ways.

> free to do whatever it wants with that memory and we cannot expect to
> receive it back intact?
Ideally, yes: after a frame is passed outside the driver, you can't
assume anything on its page will stay unchanged. But that's one of
bpf_test_run's "features" that gives it nice perf :D So as long as the
assumptions don't get broken by some change, like this one with the
recycling, it just works.

Thanks,
Olek


* Re: [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames
  2023-03-13 21:42 [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames Alexander Lobakin
@ 2023-03-16 11:57 ` Alexander Lobakin
  0 siblings, 0 replies; 20+ messages in thread
From: Alexander Lobakin @ 2023-03-16 11:57 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Maciej Fijalkowski, Larysa Zaremba,
	Toke Høiland-Jørgensen, Song Liu,
	Jesper Dangaard Brouer, John Fastabend, Menglong Dong,
	Mykola Lysenko, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, bpf, netdev, linux-kernel

From: Alexander Lobakin <aleksander.lobakin@intel.com>
Date: Mon, 13 Mar 2023 22:42:56 +0100

> Yeah, I still remember that "Who needs cpumap nowadays" (c), but anyway.
> 
> __xdp_build_skb_from_frame() missed the moment when the networking stack
> became able to recycle skb pages backed by a page_pool. This was making
> e.g. cpumap redirect even less effective than simple %XDP_PASS. veth was
> also affected in some scenarios.
> A lot of drivers already use skb_mark_for_recycle(); it's been almost
> two years and there seem to be no issues with using it in the generic
> code too. {__,}xdp_release_frame() can then be removed as it loses its
> last user.
> Page Pool then becomes zero-alloc (or almost) in the abovementioned
> cases, too. Other memory type models (who needs them at this point)
> have no changes.

Sorry, our SMTP proxy went crazy and resent all my messages sent via
git-send-email several times during the last couple of days. Please
ignore this one.

> 
> Some numbers on 1 Xeon Platinum core bombed with 27 Mpps of 64-byte
> IPv6 UDP, iavf w/XDP[0] (CONFIG_PAGE_POOL_STATS is enabled):
> 
> Plain %XDP_PASS on baseline, Page Pool driver:
> 
> src cpu Rx     drops  dst cpu Rx
>   2.1 Mpps       N/A    2.1 Mpps
> 
> cpumap redirect (cross-core, w/o leaving its NUMA node) on baseline:
> 
>   6.8 Mpps  5.0 Mpps    1.8 Mpps
> 
> cpumap redirect with skb PP recycling:
> 
>   7.9 Mpps  5.7 Mpps    2.2 Mpps
>                        +22% (from cpumap redir on baseline)
> 
> [0] https://github.com/alobakin/linux/commits/iavf-xdp
> 
> Alexander Lobakin (4):
>   selftests/bpf: robustify test_xdp_do_redirect with more payload magics
>   net: page_pool, skbuff: make skb_mark_for_recycle() always available
>   xdp: recycle Page Pool backed skbs built from XDP frames
>   xdp: remove unused {__,}xdp_release_frame()
> 
>  include/linux/skbuff.h                        |  4 +--
>  include/net/xdp.h                             | 29 ---------------
>  net/core/xdp.c                                | 19 ++--------
>  .../bpf/progs/test_xdp_do_redirect.c          | 36 +++++++++++++------
>  4 files changed, 30 insertions(+), 58 deletions(-)
> 
> ---
> From v2[1]:
> * fix the test_xdp_do_redirect selftest failing after the series: it was
>   relying on the assumption that %XDP_PASS frames can't be recycled on
>   veth (BPF CI, Alexei);
> * explain "w/o leaving its node" in the cover letter (Jesper).
> 
> From v1[2]:
> * make skb_mark_for_recycle() always available, otherwise there are build
>   failures on non-PP systems (kbuild bot);
> * 'Page Pool' -> 'page_pool' when it's about a page_pool instance, not
>   API (Jesper);
> * expanded test system info a bit in the cover letter (Jesper).
> 
> [1] https://lore.kernel.org/bpf/20230303133232.2546004-1-aleksander.lobakin@intel.com
> [2] https://lore.kernel.org/bpf/20230301160315.1022488-1-aleksander.lobakin@intel.com

Thanks,
Olek


* Re: [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames
  2023-03-15 18:12               ` Alexander Lobakin
@ 2023-03-15 18:26                 ` Ilya Leoshkevich
  2023-03-16 13:22                   ` Alexander Lobakin
  0 siblings, 1 reply; 20+ messages in thread
From: Ilya Leoshkevich @ 2023-03-15 18:26 UTC (permalink / raw)
  To: Alexander Lobakin, Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Maciej Fijalkowski, Larysa Zaremba,
	Toke Høiland-Jørgensen, Song Liu,
	Jesper Dangaard Brouer, John Fastabend, Menglong Dong,
	Mykola Lysenko, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, bpf, Network Development, LKML

On Wed, 2023-03-15 at 19:12 +0100, Alexander Lobakin wrote:
> From: Ilya Leoshkevich <iii@linux.ibm.com>
> Date: Wed, 15 Mar 2023 19:00:47 +0100
> 
> > On Wed, 2023-03-15 at 15:54 +0100, Ilya Leoshkevich wrote:
> > > On Wed, 2023-03-15 at 11:54 +0100, Alexander Lobakin wrote:
> > > > From: Alexander Lobakin <aleksander.lobakin@intel.com>
> > > > Date: Wed, 15 Mar 2023 10:56:25 +0100
> > > > 
> > > > > From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > > > > Date: Tue, 14 Mar 2023 16:54:25 -0700
> > > > > 
> > > > > > On Tue, Mar 14, 2023 at 11:52 AM Alexei Starovoitov
> > > > > > <alexei.starovoitov@gmail.com> wrote:
> > > > > 
> > > > > [...]
> > > > > 
> > > > > > test_xdp_do_redirect:PASS:prog_run 0 nsec
> > > > > > test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
> > > > > > test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec
> > > > > > test_xdp_do_redirect:FAIL:pkt_count_tc unexpected
> > > > > > pkt_count_tc:
> > > > > > actual
> > > > > > 220 != expected 9998
> > > > > > test_max_pkt_size:PASS:prog_run_max_size 0 nsec
> > > > > > test_max_pkt_size:PASS:prog_run_too_big 0 nsec
> > > > > > close_netns:PASS:setns 0 nsec
> > > > > > #289 xdp_do_redirect:FAIL
> > > > > > Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED
> > > > > > 
> > > > > > Alex,
> > > > > > could you please take a look at why it's happening?
> > > > > > 
> > > > > > I suspect it's an endianness issue in:
> > > > > >         if (*metadata != 0x42)
> > > > > >                 return XDP_ABORTED;
> > > > > > but your patch didn't change that,
> > > > > > so I'm not sure why it worked before.
> > > > > 
> > > > > Sure, lemme fix it real quick.
> > > > 
> > > > Hi Ilya,
> > > > 
> > > > Do you have s390 testing setups? Maybe you could take a look,
> > > > since
> > > > I
> > > > don't have one and can't debug it? Doesn't seem to be
> > > > Endianness
> > > > issue.
> > > > I mean, I have this (the below patch), but not sure it will fix
> > > > anything -- IIRC eBPF arch always matches the host arch ._.
> > > > I can't figure out from the code what does happen wrongly :s
> > > > And it
> > > > happens only on s390.
> > > > 
> > > > Thanks,
> > > > Olek
> > > > ---
> > > > diff --git
> > > > a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> > > > b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> > > > index 662b6c6c5ed7..b21371668447 100644
> > > > --- a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> > > > +++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> > > > @@ -107,7 +107,7 @@ void test_xdp_do_redirect(void)
> > > >                             .attach_point = BPF_TC_INGRESS);
> > > >  
> > > >         memcpy(&data[sizeof(__u32)], &pkt_udp,
> > > > sizeof(pkt_udp));
> > > > -       *((__u32 *)data) = 0x42; /* metadata test value */
> > > > +       *((__u32 *)data) = htonl(0x42); /* metadata test value
> > > > */
> > > >  
> > > >         skel = test_xdp_do_redirect__open();
> > > >         if (!ASSERT_OK_PTR(skel, "skel"))
> > > > diff --git
> > > > a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> > > > b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> > > > index cd2d4e3258b8..2475bc30ced2 100644
> > > > --- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> > > > +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> > > > @@ -1,5 +1,6 @@
> > > >  // SPDX-License-Identifier: GPL-2.0
> > > >  #include <vmlinux.h>
> > > > +#include <bpf/bpf_endian.h>
> > > >  #include <bpf/bpf_helpers.h>
> > > >  
> > > >  #define ETH_ALEN 6
> > > > @@ -28,7 +29,7 @@ volatile int retcode = XDP_REDIRECT;
> > > >  SEC("xdp")
> > > >  int xdp_redirect(struct xdp_md *xdp)
> > > >  {
> > > > -       __u32 *metadata = (void *)(long)xdp->data_meta;
> > > > +       __be32 *metadata = (void *)(long)xdp->data_meta;
> > > >         void *data_end = (void *)(long)xdp->data_end;
> > > >         void *data = (void *)(long)xdp->data;
> > > >  
> > > > @@ -44,7 +45,7 @@ int xdp_redirect(struct xdp_md *xdp)
> > > >         if (metadata + 1 > data)
> > > >                 return XDP_ABORTED;
> > > >  
> > > > -       if (*metadata != 0x42)
> > > > +       if (*metadata != __bpf_htonl(0x42))
> > > >                 return XDP_ABORTED;
> > > >  
> > > >         if (*payload == MARK_XMIT)
> > > 
> > > Okay, I'll take a look. Two quick observations for now:
> > > 
> > > - Unfortunately the above patch does not help.
> > > 
> > > - In dmesg I see:
> > > 
> > >     Driver unsupported XDP return value 0 on prog xdp_redirect
> > > (id
> > > 23)
> > >     dev N/A, expect packet loss!
> > 
> > I haven't identified the issue yet, but I have found a couple more
> > things that might be helpful:
> > 
> > - In problematic cases metadata contains 0, so this is not an
> >   endianness issue. data is still reasonable though. I'm trying to
> >   understand what is causing this.
> > 
> > - Applying the following diff:
> > 
> > --- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> > +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> > @@ -52,7 +52,7 @@ int xdp_redirect(struct xdp_md *xdp)
> >  
> >         *payload = MARK_IN;
> >  
> > -       if (bpf_xdp_adjust_meta(xdp, 4))
> > +       if (false && bpf_xdp_adjust_meta(xdp, 4))
> >                 return XDP_ABORTED;
> >  
> >         if (retcode > XDP_PASS)
> > 
> > causes a kernel panic even on x86_64:
> > 
> > BUG: kernel NULL pointer dereference, address:
> > 0000000000000d28       
> > ...
> > Call Trace:            
> >  <TASK>                                                            
> >    
> >  build_skb_around+0x22/0xb0
> >  __xdp_build_skb_from_frame+0x4e/0x130
> >  bpf_test_run_xdp_live+0x65f/0x7c0
> >  ? __pfx_xdp_test_run_init_page+0x10/0x10
> >  bpf_prog_test_run_xdp+0x2ba/0x480
> >  bpf_prog_test_run+0xeb/0x110
> >  __sys_bpf+0x2b9/0x570
> >  __x64_sys_bpf+0x1c/0x30
> >  do_syscall_64+0x48/0xa0
> >  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> > 
> > I haven't looked into this at all, but I believe this needs to be
> > fixed - BPF should never cause kernel panics.
> 
> This one is basically the same issue as syzbot mentioned today
> (separate subthread). I'm waiting for feedback from Toke on which way
> of fixing he'd prefer (I proposed 2). If those zeroed metadata magics
> that you observe have the same roots as the panic, one fix will smash
> both issues.
> 
> Thanks,
> Olek

Sounds good, I will wait for an update then.

In the meantime, I found the code that overwrites the metadata:

#0  0x0000000000aaeee6 in neigh_hh_output (hh=0x83258df0, skb=0x88142200) at linux/include/net/neighbour.h:503
#1  0x0000000000ab2cda in neigh_output (skip_cache=false, skb=0x88142200, n=<optimized out>) at linux/include/net/neighbour.h:544
#2  ip6_finish_output2 (net=net@entry=0x88edba00, sk=sk@entry=0x0, skb=skb@entry=0x88142200) at linux/net/ipv6/ip6_output.c:134
#3  0x0000000000ab4cbc in __ip6_finish_output (skb=0x88142200, sk=0x0, net=0x88edba00) at linux/net/ipv6/ip6_output.c:195
#4  ip6_finish_output (net=0x88edba00, sk=0x0, skb=0x88142200) at linux/net/ipv6/ip6_output.c:206
#5  0x0000000000ab5cbc in dst_input (skb=<optimized out>) at linux/include/net/dst.h:454
#6  ip6_sublist_rcv_finish (head=head@entry=0x38000dbf520) at linux/net/ipv6/ip6_input.c:88
#7  0x0000000000ab6104 in ip6_list_rcv_finish (net=<optimized out>, head=<optimized out>, sk=0x0) at linux/net/ipv6/ip6_input.c:145
#8  0x0000000000ab72bc in ipv6_list_rcv (head=0x38000dbf638, pt=<optimized out>, orig_dev=<optimized out>) at linux/net/ipv6/ip6_input.c:354
#9  0x00000000008b3710 in __netif_receive_skb_list_ptype (orig_dev=0x880b8000, pt_prev=0x176b7f8 <ipv6_packet_type>, head=0x38000dbf638) at linux/net/core/dev.c:5520
#10 __netif_receive_skb_list_core (head=head@entry=0x38000dbf7b8, pfmemalloc=pfmemalloc@entry=false) at linux/net/core/dev.c:5568
#11 0x00000000008b4390 in __netif_receive_skb_list (head=0x38000dbf7b8) at linux/net/core/dev.c:5620
#12 netif_receive_skb_list_internal (head=head@entry=0x38000dbf7b8) at linux/net/core/dev.c:5711
#13 0x00000000008b45ce in netif_receive_skb_list (head=head@entry=0x38000dbf7b8) at linux/net/core/dev.c:5763
#14 0x0000000000950782 in xdp_recv_frames (dev=<optimized out>, skbs=<optimized out>, nframes=62, frames=0x8587c600) at linux/net/bpf/test_run.c:256
#15 xdp_test_run_batch (xdp=xdp@entry=0x38000dbf900, prog=prog@entry=0x37fffe75000, repeat=<optimized out>) at linux/net/bpf/test_run.c:334

namely:

static inline int neigh_hh_output(const struct hh_cache *hh, struct
sk_buff *skb)
   ...
   memcpy(skb->data - HH_DATA_MOD, hh->hh_data, HH_DATA_MOD);

It's hard for me to see what is going on here, since I'm not familiar
with the networking code. Since the XDP metadata is located at the end
of the headroom, shouldn't there be something that prevents the network
stack from overwriting it? Or can it be that netif_receive_skb_list() is
free to do whatever it wants with that memory and we cannot expect to
receive it back intact?


* Re: [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames
  2023-03-15 18:00             ` Ilya Leoshkevich
@ 2023-03-15 18:12               ` Alexander Lobakin
  2023-03-15 18:26                 ` Ilya Leoshkevich
  0 siblings, 1 reply; 20+ messages in thread
From: Alexander Lobakin @ 2023-03-15 18:12 UTC (permalink / raw)
  To: Ilya Leoshkevich, Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Maciej Fijalkowski, Larysa Zaremba,
	Toke Høiland-Jørgensen, Song Liu,
	Jesper Dangaard Brouer, John Fastabend, Menglong Dong,
	Mykola Lysenko, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, bpf, Network Development, LKML

From: Ilya Leoshkevich <iii@linux.ibm.com>
Date: Wed, 15 Mar 2023 19:00:47 +0100

> On Wed, 2023-03-15 at 15:54 +0100, Ilya Leoshkevich wrote:
>> On Wed, 2023-03-15 at 11:54 +0100, Alexander Lobakin wrote:
>>> From: Alexander Lobakin <aleksander.lobakin@intel.com>
>>> Date: Wed, 15 Mar 2023 10:56:25 +0100
>>>
>>>> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>>>> Date: Tue, 14 Mar 2023 16:54:25 -0700
>>>>
>>>>> On Tue, Mar 14, 2023 at 11:52 AM Alexei Starovoitov
>>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>
>>>> [...]
>>>>
>>>>> test_xdp_do_redirect:PASS:prog_run 0 nsec
>>>>> test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
>>>>> test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec
>>>>> test_xdp_do_redirect:FAIL:pkt_count_tc unexpected pkt_count_tc:
>>>>> actual
>>>>> 220 != expected 9998
>>>>> test_max_pkt_size:PASS:prog_run_max_size 0 nsec
>>>>> test_max_pkt_size:PASS:prog_run_too_big 0 nsec
>>>>> close_netns:PASS:setns 0 nsec
>>>>> #289 xdp_do_redirect:FAIL
>>>>> Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED
>>>>>
>>>>> Alex,
>>>>> could you please take a look at why it's happening?
>>>>>
>>>>> I suspect it's an endianness issue in:
>>>>>         if (*metadata != 0x42)
>>>>>                 return XDP_ABORTED;
>>>>> but your patch didn't change that,
>>>>> so I'm not sure why it worked before.
>>>>
>>>> Sure, lemme fix it real quick.
>>>
>>> Hi Ilya,
>>>
>>> Do you have s390 testing setups? Maybe you could take a look, since
>>> I
>>> don't have one and can't debug it? Doesn't seem to be Endianness
>>> issue.
>>> I mean, I have this (the below patch), but not sure it will fix
>>> anything -- IIRC eBPF arch always matches the host arch ._.
>>> I can't figure out from the code what does happen wrongly :s And it
>>> happens only on s390.
>>>
>>> Thanks,
>>> Olek
>>> ---
>>> diff --git
>>> a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
>>> b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
>>> index 662b6c6c5ed7..b21371668447 100644
>>> --- a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
>>> +++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
>>> @@ -107,7 +107,7 @@ void test_xdp_do_redirect(void)
>>>                             .attach_point = BPF_TC_INGRESS);
>>>  
>>>         memcpy(&data[sizeof(__u32)], &pkt_udp, sizeof(pkt_udp));
>>> -       *((__u32 *)data) = 0x42; /* metadata test value */
>>> +       *((__u32 *)data) = htonl(0x42); /* metadata test value */
>>>  
>>>         skel = test_xdp_do_redirect__open();
>>>         if (!ASSERT_OK_PTR(skel, "skel"))
>>> diff --git
>>> a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
>>> b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
>>> index cd2d4e3258b8..2475bc30ced2 100644
>>> --- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
>>> +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
>>> @@ -1,5 +1,6 @@
>>>  // SPDX-License-Identifier: GPL-2.0
>>>  #include <vmlinux.h>
>>> +#include <bpf/bpf_endian.h>
>>>  #include <bpf/bpf_helpers.h>
>>>  
>>>  #define ETH_ALEN 6
>>> @@ -28,7 +29,7 @@ volatile int retcode = XDP_REDIRECT;
>>>  SEC("xdp")
>>>  int xdp_redirect(struct xdp_md *xdp)
>>>  {
>>> -       __u32 *metadata = (void *)(long)xdp->data_meta;
>>> +       __be32 *metadata = (void *)(long)xdp->data_meta;
>>>         void *data_end = (void *)(long)xdp->data_end;
>>>         void *data = (void *)(long)xdp->data;
>>>  
>>> @@ -44,7 +45,7 @@ int xdp_redirect(struct xdp_md *xdp)
>>>         if (metadata + 1 > data)
>>>                 return XDP_ABORTED;
>>>  
>>> -       if (*metadata != 0x42)
>>> +       if (*metadata != __bpf_htonl(0x42))
>>>                 return XDP_ABORTED;
>>>  
>>>         if (*payload == MARK_XMIT)
>>
>> Okay, I'll take a look. Two quick observations for now:
>>
>> - Unfortunately the above patch does not help.
>>
>> - In dmesg I see:
>>
>>     Driver unsupported XDP return value 0 on prog xdp_redirect (id 23)
>>     dev N/A, expect packet loss!
> 
> I haven't identified the issue yet, but I have found a couple more
> things that might be helpful:
> 
> - In problematic cases metadata contains 0, so this is not an
>   endianness issue. data is still reasonable though. I'm trying to
>   understand what is causing this.
> 
> - Applying the following diff:
> 
> --- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> @@ -52,7 +52,7 @@ int xdp_redirect(struct xdp_md *xdp)
>  
>         *payload = MARK_IN;
>  
> -       if (bpf_xdp_adjust_meta(xdp, 4))
> +       if (false && bpf_xdp_adjust_meta(xdp, 4))
>                 return XDP_ABORTED;
>  
>         if (retcode > XDP_PASS)
> 
> causes a kernel panic even on x86_64:
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000d28       
> ...
> Call Trace:            
>  <TASK>                                                               
>  build_skb_around+0x22/0xb0
>  __xdp_build_skb_from_frame+0x4e/0x130
>  bpf_test_run_xdp_live+0x65f/0x7c0
>  ? __pfx_xdp_test_run_init_page+0x10/0x10
>  bpf_prog_test_run_xdp+0x2ba/0x480
>  bpf_prog_test_run+0xeb/0x110
>  __sys_bpf+0x2b9/0x570
>  __x64_sys_bpf+0x1c/0x30
>  do_syscall_64+0x48/0xa0
>  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> 
> I haven't looked into this at all, but I believe this needs to be
> fixed - BPF should never cause kernel panics.

This one is basically the same issue as syzbot mentioned today (separate
subthread). I'm waiting for feedback from Toke on which way of fixing
he'd prefer (I proposed 2). If those zeroed metadata magics that you
observe have the same roots as the panic, one fix will smash both issues.

Thanks,
Olek


* Re: [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames
  2023-03-15 14:54           ` Ilya Leoshkevich
@ 2023-03-15 18:00             ` Ilya Leoshkevich
  2023-03-15 18:12               ` Alexander Lobakin
  0 siblings, 1 reply; 20+ messages in thread
From: Ilya Leoshkevich @ 2023-03-15 18:00 UTC (permalink / raw)
  To: Alexander Lobakin, Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Maciej Fijalkowski, Larysa Zaremba,
	Toke Høiland-Jørgensen, Song Liu,
	Jesper Dangaard Brouer, John Fastabend, Menglong Dong,
	Mykola Lysenko, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, bpf, Network Development, LKML

On Wed, 2023-03-15 at 15:54 +0100, Ilya Leoshkevich wrote:
> On Wed, 2023-03-15 at 11:54 +0100, Alexander Lobakin wrote:
> > From: Alexander Lobakin <aleksander.lobakin@intel.com>
> > Date: Wed, 15 Mar 2023 10:56:25 +0100
> > 
> > > From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > > Date: Tue, 14 Mar 2023 16:54:25 -0700
> > > 
> > > > On Tue, Mar 14, 2023 at 11:52 AM Alexei Starovoitov
> > > > <alexei.starovoitov@gmail.com> wrote:
> > > 
> > > [...]
> > > 
> > > > test_xdp_do_redirect:PASS:prog_run 0 nsec
> > > > test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
> > > > test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec
> > > > test_xdp_do_redirect:FAIL:pkt_count_tc unexpected pkt_count_tc:
> > > > actual
> > > > 220 != expected 9998
> > > > test_max_pkt_size:PASS:prog_run_max_size 0 nsec
> > > > test_max_pkt_size:PASS:prog_run_too_big 0 nsec
> > > > close_netns:PASS:setns 0 nsec
> > > > #289 xdp_do_redirect:FAIL
> > > > Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED
> > > > 
> > > > Alex,
> > > > could you please take a look at why it's happening?
> > > > 
> > > > I suspect it's an endianness issue in:
> > > >         if (*metadata != 0x42)
> > > >                 return XDP_ABORTED;
> > > > but your patch didn't change that,
> > > > so I'm not sure why it worked before.
> > > 
> > > Sure, lemme fix it real quick.
> > 
> > Hi Ilya,
> > 
> > Do you have s390 testing setups? Maybe you could take a look, since
> > I
> > don't have one and can't debug it? Doesn't seem to be Endianness
> > issue.
> > I mean, I have this (the below patch), but not sure it will fix
> > anything -- IIRC eBPF arch always matches the host arch ._.
> > I can't figure out from the code what does happen wrongly :s And it
> > happens only on s390.
> > 
> > Thanks,
> > Olek
> > ---
> > diff --git
> > a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> > b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> > index 662b6c6c5ed7..b21371668447 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> > @@ -107,7 +107,7 @@ void test_xdp_do_redirect(void)
> >                             .attach_point = BPF_TC_INGRESS);
> >  
> >         memcpy(&data[sizeof(__u32)], &pkt_udp, sizeof(pkt_udp));
> > -       *((__u32 *)data) = 0x42; /* metadata test value */
> > +       *((__u32 *)data) = htonl(0x42); /* metadata test value */
> >  
> >         skel = test_xdp_do_redirect__open();
> >         if (!ASSERT_OK_PTR(skel, "skel"))
> > diff --git
> > a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> > b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> > index cd2d4e3258b8..2475bc30ced2 100644
> > --- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> > +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> > @@ -1,5 +1,6 @@
> >  // SPDX-License-Identifier: GPL-2.0
> >  #include <vmlinux.h>
> > +#include <bpf/bpf_endian.h>
> >  #include <bpf/bpf_helpers.h>
> >  
> >  #define ETH_ALEN 6
> > @@ -28,7 +29,7 @@ volatile int retcode = XDP_REDIRECT;
> >  SEC("xdp")
> >  int xdp_redirect(struct xdp_md *xdp)
> >  {
> > -       __u32 *metadata = (void *)(long)xdp->data_meta;
> > +       __be32 *metadata = (void *)(long)xdp->data_meta;
> >         void *data_end = (void *)(long)xdp->data_end;
> >         void *data = (void *)(long)xdp->data;
> >  
> > @@ -44,7 +45,7 @@ int xdp_redirect(struct xdp_md *xdp)
> >         if (metadata + 1 > data)
> >                 return XDP_ABORTED;
> >  
> > -       if (*metadata != 0x42)
> > +       if (*metadata != __bpf_htonl(0x42))
> >                 return XDP_ABORTED;
> >  
> >         if (*payload == MARK_XMIT)
> 
> Okay, I'll take a look. Two quick observations for now:
> 
> - Unfortunately the above patch does not help.
> 
> - In dmesg I see:
> 
>     Driver unsupported XDP return value 0 on prog xdp_redirect (id 23)
>     dev N/A, expect packet loss!

I haven't identified the issue yet, but I have found a couple more
things that might be helpful:

- In problematic cases metadata contains 0, so this is not an
  endianness issue. data is still reasonable though. I'm trying to
  understand what is causing this.

- Applying the following diff:

--- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
@@ -52,7 +52,7 @@ int xdp_redirect(struct xdp_md *xdp)
 
        *payload = MARK_IN;
 
-       if (bpf_xdp_adjust_meta(xdp, 4))
+       if (false && bpf_xdp_adjust_meta(xdp, 4))
                return XDP_ABORTED;
 
        if (retcode > XDP_PASS)

causes a kernel panic even on x86_64:

BUG: kernel NULL pointer dereference, address: 0000000000000d28       
...
Call Trace:            
 <TASK>                                                               
 build_skb_around+0x22/0xb0
 __xdp_build_skb_from_frame+0x4e/0x130
 bpf_test_run_xdp_live+0x65f/0x7c0
 ? __pfx_xdp_test_run_init_page+0x10/0x10
 bpf_prog_test_run_xdp+0x2ba/0x480
 bpf_prog_test_run+0xeb/0x110
 __sys_bpf+0x2b9/0x570
 __x64_sys_bpf+0x1c/0x30
 do_syscall_64+0x48/0xa0
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

I haven't looked into this at all, but I believe this needs to be
fixed - BPF should never cause kernel panics.


* Re: [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames
  2023-03-15 10:54         ` Alexander Lobakin
  2023-03-15 14:54           ` Ilya Leoshkevich
@ 2023-03-15 16:55           ` Alexei Starovoitov
  1 sibling, 0 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2023-03-15 16:55 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Ilya Leoshkevich, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Maciej Fijalkowski,
	Larysa Zaremba, Toke Høiland-Jørgensen, Song Liu,
	Jesper Dangaard Brouer, John Fastabend, Menglong Dong,
	Mykola Lysenko, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, bpf, Network Development, LKML

On Wed, Mar 15, 2023 at 3:55 AM Alexander Lobakin
<aleksander.lobakin@intel.com> wrote:
>
> From: Alexander Lobakin <aleksander.lobakin@intel.com>
> Date: Wed, 15 Mar 2023 10:56:25 +0100
>
> > From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Date: Tue, 14 Mar 2023 16:54:25 -0700
> >
> >> On Tue, Mar 14, 2023 at 11:52 AM Alexei Starovoitov
> >> <alexei.starovoitov@gmail.com> wrote:
> >
> > [...]
> >
> >> test_xdp_do_redirect:PASS:prog_run 0 nsec
> >> test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
> >> test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec
> >> test_xdp_do_redirect:FAIL:pkt_count_tc unexpected pkt_count_tc: actual
> >> 220 != expected 9998
> >> test_max_pkt_size:PASS:prog_run_max_size 0 nsec
> >> test_max_pkt_size:PASS:prog_run_too_big 0 nsec
> >> close_netns:PASS:setns 0 nsec
> >> #289 xdp_do_redirect:FAIL
> >> Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED
> >>
> >> Alex,
> >> could you please take a look at why it's happening?
> >>
> >> I suspect it's an endianness issue in:
> >>         if (*metadata != 0x42)
> >>                 return XDP_ABORTED;
> >> but your patch didn't change that,
> >> so I'm not sure why it worked before.
> >
> > Sure, lemme fix it real quick.
>
> Hi Ilya,
>
> Do you have s390 testing setups? Maybe you could take a look, since I
> don't have one and can't debug it? Doesn't seem to be Endianness issue.
> I mean, I have this (the below patch), but not sure it will fix
> anything -- IIRC eBPF arch always matches the host arch ._.
> I can't figure out from the code what does happen wrongly :s And it
> happens only on s390.
>
> Thanks,
> Olek
> ---
> diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> index 662b6c6c5ed7..b21371668447 100644
> --- a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> +++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> @@ -107,7 +107,7 @@ void test_xdp_do_redirect(void)
>                             .attach_point = BPF_TC_INGRESS);
>
>         memcpy(&data[sizeof(__u32)], &pkt_udp, sizeof(pkt_udp));
> -       *((__u32 *)data) = 0x42; /* metadata test value */
> +       *((__u32 *)data) = htonl(0x42); /* metadata test value */
>
>         skel = test_xdp_do_redirect__open();
>         if (!ASSERT_OK_PTR(skel, "skel"))
> diff --git a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> index cd2d4e3258b8..2475bc30ced2 100644
> --- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> @@ -1,5 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0
>  #include <vmlinux.h>
> +#include <bpf/bpf_endian.h>
>  #include <bpf/bpf_helpers.h>
>
>  #define ETH_ALEN 6
> @@ -28,7 +29,7 @@ volatile int retcode = XDP_REDIRECT;
>  SEC("xdp")
>  int xdp_redirect(struct xdp_md *xdp)
>  {
> -       __u32 *metadata = (void *)(long)xdp->data_meta;
> +       __be32 *metadata = (void *)(long)xdp->data_meta;
>         void *data_end = (void *)(long)xdp->data_end;
>         void *data = (void *)(long)xdp->data;
>
> @@ -44,7 +45,7 @@ int xdp_redirect(struct xdp_md *xdp)
>         if (metadata + 1 > data)
>                 return XDP_ABORTED;
>
> -       if (*metadata != 0x42)
> +       if (*metadata != __bpf_htonl(0x42))
>                 return XDP_ABORTED;

Looks sane to me.
I'd probably use 'u8 *metadata' instead, both in BPF and in user space,
just so there's no need to worry about endianness at all.
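
For illustration, a minimal sketch of that byte-based variant
(hypothetical code, not an actually submitted patch) could look like
this on the BPF side:

	__u8 *metadata = (void *)(long)xdp->data_meta;
	void *data = (void *)(long)xdp->data;

	/* make sure at least one metadata byte is present */
	if (metadata + 1 > data)
		return XDP_ABORTED;

	/* a single byte has no byte order, so this check behaves the
	 * same on little- and big-endian hosts
	 */
	if (*metadata != 0x42)
		return XDP_ABORTED;

with user space storing the magic as a single byte as well:

	/* only the first byte of the __u32-sized metadata slot is
	 * checked in this sketch
	 */
	*(__u8 *)data = 0x42;
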
Could you please submit an official patch and let CI judge?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames
  2023-03-15 10:54         ` Alexander Lobakin
@ 2023-03-15 14:54           ` Ilya Leoshkevich
  2023-03-15 18:00             ` Ilya Leoshkevich
  2023-03-15 16:55           ` Alexei Starovoitov
  1 sibling, 1 reply; 20+ messages in thread
From: Ilya Leoshkevich @ 2023-03-15 14:54 UTC (permalink / raw)
  To: Alexander Lobakin, Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Maciej Fijalkowski, Larysa Zaremba,
	Toke Høiland-Jørgensen, Song Liu,
	Jesper Dangaard Brouer, John Fastabend, Menglong Dong,
	Mykola Lysenko, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, bpf, Network Development, LKML

On Wed, 2023-03-15 at 11:54 +0100, Alexander Lobakin wrote:
> From: Alexander Lobakin <aleksander.lobakin@intel.com>
> Date: Wed, 15 Mar 2023 10:56:25 +0100
> 
> > From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Date: Tue, 14 Mar 2023 16:54:25 -0700
> > 
> > > On Tue, Mar 14, 2023 at 11:52 AM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > 
> > [...]
> > 
> > > test_xdp_do_redirect:PASS:prog_run 0 nsec
> > > test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
> > > test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec
> > > test_xdp_do_redirect:FAIL:pkt_count_tc unexpected pkt_count_tc:
> > > actual
> > > 220 != expected 9998
> > > test_max_pkt_size:PASS:prog_run_max_size 0 nsec
> > > test_max_pkt_size:PASS:prog_run_too_big 0 nsec
> > > close_netns:PASS:setns 0 nsec
> > > #289 xdp_do_redirect:FAIL
> > > Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED
> > > 
> > > Alex,
> > > could you please take a look at why it's happening?
> > > 
> > > I suspect it's an endianness issue in:
> > >         if (*metadata != 0x42)
> > >                 return XDP_ABORTED;
> > > but your patch didn't change that,
> > > so I'm not sure why it worked before.
> > 
> > Sure, lemme fix it real quick.
> 
> Hi Ilya,
> 
> Do you have an s390 testing setup? Maybe you could take a look, since I
> don't have one and can't debug it? It doesn't seem to be an endianness
> issue. I mean, I have this (the patch below), but I'm not sure it will
> fix anything -- IIRC the eBPF arch always matches the host arch ._.
> I can't figure out from the code what goes wrong :s And it happens
> only on s390.
> 
> Thanks,
> Olek
> ---
> diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> index 662b6c6c5ed7..b21371668447 100644
> --- a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> +++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> @@ -107,7 +107,7 @@ void test_xdp_do_redirect(void)
>                             .attach_point = BPF_TC_INGRESS);
>  
>         memcpy(&data[sizeof(__u32)], &pkt_udp, sizeof(pkt_udp));
> -       *((__u32 *)data) = 0x42; /* metadata test value */
> +       *((__u32 *)data) = htonl(0x42); /* metadata test value */
>  
>         skel = test_xdp_do_redirect__open();
>         if (!ASSERT_OK_PTR(skel, "skel"))
> diff --git a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> index cd2d4e3258b8..2475bc30ced2 100644
> --- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> @@ -1,5 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0
>  #include <vmlinux.h>
> +#include <bpf/bpf_endian.h>
>  #include <bpf/bpf_helpers.h>
>  
>  #define ETH_ALEN 6
> @@ -28,7 +29,7 @@ volatile int retcode = XDP_REDIRECT;
>  SEC("xdp")
>  int xdp_redirect(struct xdp_md *xdp)
>  {
> -       __u32 *metadata = (void *)(long)xdp->data_meta;
> +       __be32 *metadata = (void *)(long)xdp->data_meta;
>         void *data_end = (void *)(long)xdp->data_end;
>         void *data = (void *)(long)xdp->data;
>  
> @@ -44,7 +45,7 @@ int xdp_redirect(struct xdp_md *xdp)
>         if (metadata + 1 > data)
>                 return XDP_ABORTED;
>  
> -       if (*metadata != 0x42)
> +       if (*metadata != __bpf_htonl(0x42))
>                 return XDP_ABORTED;
>  
>         if (*payload == MARK_XMIT)

Okay, I'll take a look. Two quick observations for now:

- Unfortunately the above patch does not help.

- In dmesg I see:

    Driver unsupported XDP return value 0 on prog xdp_redirect (id 23)
    dev N/A, expect packet loss!
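
For reference, return value 0 is XDP_ABORTED in the UAPI enum, so one
of the program's checks appears to be tripping:

	/* include/uapi/linux/bpf.h */
	enum xdp_action {
		XDP_ABORTED = 0,
		XDP_DROP,
		XDP_PASS,
		XDP_TX,
		XDP_REDIRECT,
	};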

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames
  2023-03-15  9:56       ` Alexander Lobakin
@ 2023-03-15 10:54         ` Alexander Lobakin
  2023-03-15 14:54           ` Ilya Leoshkevich
  2023-03-15 16:55           ` Alexei Starovoitov
  0 siblings, 2 replies; 20+ messages in thread
From: Alexander Lobakin @ 2023-03-15 10:54 UTC (permalink / raw)
  To: Ilya Leoshkevich, Alexei Starovoitov
  Cc: Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Maciej Fijalkowski,
	Larysa Zaremba, Toke Høiland-Jørgensen, Song Liu,
	Jesper Dangaard Brouer, John Fastabend, Menglong Dong,
	Mykola Lysenko, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, bpf, Network Development, LKML

From: Alexander Lobakin <aleksander.lobakin@intel.com>
Date: Wed, 15 Mar 2023 10:56:25 +0100

> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Date: Tue, 14 Mar 2023 16:54:25 -0700
> 
>> On Tue, Mar 14, 2023 at 11:52 AM Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
> 
> [...]
> 
>> test_xdp_do_redirect:PASS:prog_run 0 nsec
>> test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
>> test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec
>> test_xdp_do_redirect:FAIL:pkt_count_tc unexpected pkt_count_tc: actual
>> 220 != expected 9998
>> test_max_pkt_size:PASS:prog_run_max_size 0 nsec
>> test_max_pkt_size:PASS:prog_run_too_big 0 nsec
>> close_netns:PASS:setns 0 nsec
>> #289 xdp_do_redirect:FAIL
>> Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED
>>
>> Alex,
>> could you please take a look at why it's happening?
>>
>> I suspect it's an endianness issue in:
>>         if (*metadata != 0x42)
>>                 return XDP_ABORTED;
>> but your patch didn't change that,
>> so I'm not sure why it worked before.
> 
> Sure, lemme fix it real quick.

Hi Ilya,

Do you have an s390 testing setup? Maybe you could take a look, since I
don't have one and can't debug it? It doesn't seem to be an endianness
issue. I mean, I have this (the patch below), but I'm not sure it will
fix anything -- IIRC the eBPF arch always matches the host arch ._.
I can't figure out from the code what goes wrong :s And it happens
only on s390.

Thanks,
Olek
---
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
index 662b6c6c5ed7..b21371668447 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
@@ -107,7 +107,7 @@ void test_xdp_do_redirect(void)
 			    .attach_point = BPF_TC_INGRESS);
 
 	memcpy(&data[sizeof(__u32)], &pkt_udp, sizeof(pkt_udp));
-	*((__u32 *)data) = 0x42; /* metadata test value */
+	*((__u32 *)data) = htonl(0x42); /* metadata test value */
 
 	skel = test_xdp_do_redirect__open();
 	if (!ASSERT_OK_PTR(skel, "skel"))
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
index cd2d4e3258b8..2475bc30ced2 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <vmlinux.h>
+#include <bpf/bpf_endian.h>
 #include <bpf/bpf_helpers.h>
 
 #define ETH_ALEN 6
@@ -28,7 +29,7 @@ volatile int retcode = XDP_REDIRECT;
 SEC("xdp")
 int xdp_redirect(struct xdp_md *xdp)
 {
-	__u32 *metadata = (void *)(long)xdp->data_meta;
+	__be32 *metadata = (void *)(long)xdp->data_meta;
 	void *data_end = (void *)(long)xdp->data_end;
 	void *data = (void *)(long)xdp->data;
 
@@ -44,7 +45,7 @@ int xdp_redirect(struct xdp_md *xdp)
 	if (metadata + 1 > data)
 		return XDP_ABORTED;
 
-	if (*metadata != 0x42)
+	if (*metadata != __bpf_htonl(0x42))
 		return XDP_ABORTED;
 
 	if (*payload == MARK_XMIT)

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames
  2023-03-14 23:54     ` Alexei Starovoitov
@ 2023-03-15  9:56       ` Alexander Lobakin
  2023-03-15 10:54         ` Alexander Lobakin
  0 siblings, 1 reply; 20+ messages in thread
From: Alexander Lobakin @ 2023-03-15  9:56 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Maciej Fijalkowski, Larysa Zaremba,
	Toke Høiland-Jørgensen, Song Liu,
	Jesper Dangaard Brouer, John Fastabend, Menglong Dong,
	Mykola Lysenko, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, bpf, Network Development, LKML, Ilya Leoshkevich

From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Date: Tue, 14 Mar 2023 16:54:25 -0700

> On Tue, Mar 14, 2023 at 11:52 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:

[...]

> test_xdp_do_redirect:PASS:prog_run 0 nsec
> test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
> test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec
> test_xdp_do_redirect:FAIL:pkt_count_tc unexpected pkt_count_tc: actual
> 220 != expected 9998
> test_max_pkt_size:PASS:prog_run_max_size 0 nsec
> test_max_pkt_size:PASS:prog_run_too_big 0 nsec
> close_netns:PASS:setns 0 nsec
> #289 xdp_do_redirect:FAIL
> Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED
> 
> Alex,
> could you please take a look at why it's happening?
> 
> I suspect it's an endianness issue in:
>         if (*metadata != 0x42)
>                 return XDP_ABORTED;
> but your patch didn't change that,
> so I'm not sure why it worked before.

Sure, lemme fix it real quick.

Thanks,
Olek

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames
  2023-03-14 18:52   ` Alexei Starovoitov
@ 2023-03-14 23:54     ` Alexei Starovoitov
  2023-03-15  9:56       ` Alexander Lobakin
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2023-03-14 23:54 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Maciej Fijalkowski, Larysa Zaremba,
	Toke Høiland-Jørgensen, Song Liu,
	Jesper Dangaard Brouer, John Fastabend, Menglong Dong,
	Mykola Lysenko, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, bpf, Network Development, LKML, Ilya Leoshkevich

On Tue, Mar 14, 2023 at 11:52 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Mar 14, 2023 at 4:58 AM Alexander Lobakin
> <aleksander.lobakin@intel.com> wrote:
> >
> >   All error logs:
> >   libbpf: prog 'trace_virtqueue_add_sgs': BPF program load failed: Bad
> > address
> >   libbpf: prog 'trace_virtqueue_add_sgs': -- BEGIN PROG LOAD LOG --
> >   The sequence of 8193 jumps is too complex.
> >   verification time 77808 usec
> >   stack depth 64
> >   processed 156616 insns (limit 1000000) max_states_per_insn 8
> > total_states 1754 peak_states 1712 mark_read 12
> >   -- END PROG LOAD LOG --
> >   libbpf: prog 'trace_virtqueue_add_sgs': failed to load: -14
> >   libbpf: failed to load object 'loop6.bpf.o'
> >   scale_test:FAIL:expect_success unexpected error: -14 (errno 14)
> >   #257     verif_scale_loop6:FAIL
> >   Summary: 288/1766 PASSED, 21 SKIPPED, 1 FAILED
> >
> > So, xdp_do_redirect, which was previously failing, now works fine. OTOH,
> > "verif_scale_loop6" now fails, but from what I understand from the log, it
> > has nothing to do with the series ("8193 jumps is too complex" -- I don't
> > even touch program-related stuff). I don't know the reason for the
> > failure -- could it be a CI issue or maybe some recent commits?
>
> Yeah. It's an issue with the latest clang.
> We don't have a workaround for this yet.
> It's not a blocker for your patchset.
> We didn't have time to look at it closely.

I applied the workaround for this test.
It's all green now except s390 where it fails with

test_xdp_do_redirect:PASS:prog_run 0 nsec
test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec
test_xdp_do_redirect:FAIL:pkt_count_tc unexpected pkt_count_tc: actual
220 != expected 9998
test_max_pkt_size:PASS:prog_run_max_size 0 nsec
test_max_pkt_size:PASS:prog_run_too_big 0 nsec
close_netns:PASS:setns 0 nsec
#289 xdp_do_redirect:FAIL
Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED

Alex,
could you please take a look at why it's happening?

I suspect it's an endianness issue in:
        if (*metadata != 0x42)
                return XDP_ABORTED;
but your patch didn't change that,
so I'm not sure why it worked before.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames
  2023-03-13 21:55 Alexander Lobakin
  2023-03-14 11:57 ` Alexander Lobakin
@ 2023-03-14 22:30 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 20+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-03-14 22:30 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: ast, daniel, andrii, martin.lau, maciej.fijalkowski,
	larysa.zaremba, toke, song, hawk, john.fastabend, imagedong,
	mykolal, davem, kuba, edumazet, pabeni, bpf, netdev,
	linux-kernel

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Mon, 13 Mar 2023 22:55:49 +0100 you wrote:
> Yeah, I still remember that "Who needs cpumap nowadays" (c), but anyway.
> 
> __xdp_build_skb_from_frame() missed the moment when the networking stack
> became able to recycle skb pages backed by a page_pool. This was making
> e.g. cpumap redirect even less effective than simple %XDP_PASS. veth was
> also affected in some scenarios.
> A lot of drivers already use skb_mark_for_recycle(); it's been almost
> two years, and there seem to be no issues with using it in the generic
> code too. {__,}xdp_release_frame() can then be removed, as it has lost
> its last user.
> Page Pool then becomes zero-alloc (or nearly so) in the abovementioned
> cases, too. Other memory type models (who needs them at this point) are
> unchanged.
> 
> [...]

Here is the summary with links:
  - [bpf-next,v3,1/4] selftests/bpf: robustify test_xdp_do_redirect with more payload magics
    https://git.kernel.org/bpf/bpf-next/c/487deb3e3393
  - [bpf-next,v3,2/4] net: page_pool, skbuff: make skb_mark_for_recycle() always available
    https://git.kernel.org/bpf/bpf-next/c/2c854e5fcd7e
  - [bpf-next,v3,3/4] xdp: recycle Page Pool backed skbs built from XDP frames
    https://git.kernel.org/bpf/bpf-next/c/9c94bbf9a87b
  - [bpf-next,v3,4/4] xdp: remove unused {__,}xdp_release_frame()
    https://git.kernel.org/bpf/bpf-next/c/d4e492338d11

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames
  2023-03-14 11:57 ` Alexander Lobakin
@ 2023-03-14 18:52   ` Alexei Starovoitov
  2023-03-14 23:54     ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2023-03-14 18:52 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Maciej Fijalkowski, Larysa Zaremba,
	Toke Høiland-Jørgensen, Song Liu,
	Jesper Dangaard Brouer, John Fastabend, Menglong Dong,
	Mykola Lysenko, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, bpf, Network Development, LKML

On Tue, Mar 14, 2023 at 4:58 AM Alexander Lobakin
<aleksander.lobakin@intel.com> wrote:
>
>   All error logs:
>   libbpf: prog 'trace_virtqueue_add_sgs': BPF program load failed: Bad
> address
>   libbpf: prog 'trace_virtqueue_add_sgs': -- BEGIN PROG LOAD LOG --
>   The sequence of 8193 jumps is too complex.
>   verification time 77808 usec
>   stack depth 64
>   processed 156616 insns (limit 1000000) max_states_per_insn 8
> total_states 1754 peak_states 1712 mark_read 12
>   -- END PROG LOAD LOG --
>   libbpf: prog 'trace_virtqueue_add_sgs': failed to load: -14
>   libbpf: failed to load object 'loop6.bpf.o'
>   scale_test:FAIL:expect_success unexpected error: -14 (errno 14)
>   #257     verif_scale_loop6:FAIL
>   Summary: 288/1766 PASSED, 21 SKIPPED, 1 FAILED
>
> So, xdp_do_redirect, which was previously failing, now works fine. OTOH,
> "verif_scale_loop6" now fails, but from what I understand from the log, it
> has nothing to do with the series ("8193 jumps is too complex" -- I don't
> even touch program-related stuff). I don't know the reason for the
> failure -- could it be a CI issue or maybe some recent commits?

Yeah. It's an issue with the latest clang.
We don't have a workaround for this yet.
It's not a blocker for your patchset.
We didn't have time to look at it closely.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames
  2023-03-13 21:55 Alexander Lobakin
@ 2023-03-14 11:57 ` Alexander Lobakin
  2023-03-14 18:52   ` Alexei Starovoitov
  2023-03-14 22:30 ` patchwork-bot+netdevbpf
  1 sibling, 1 reply; 20+ messages in thread
From: Alexander Lobakin @ 2023-03-14 11:57 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Maciej Fijalkowski, Larysa Zaremba,
	Toke Høiland-Jørgensen, Song Liu,
	Jesper Dangaard Brouer, John Fastabend, Menglong Dong,
	Mykola Lysenko, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, bpf, netdev, linux-kernel

From: Alexander Lobakin <aleksander.lobakin@intel.com>
Date: Mon, 13 Mar 2023 22:55:49 +0100

> Yeah, I still remember that "Who needs cpumap nowadays" (c), but anyway.
> 
> __xdp_build_skb_from_frame() missed the moment when the networking stack
> became able to recycle skb pages backed by a page_pool. This was making
> e.g. cpumap redirect even less effective than simple %XDP_PASS. veth was
> also affected in some scenarios.
[...]

Regarding the failing tests, here's an excerpt from the logs:

  #288     xdp_devmap_attach:OK
  [  156.324473] IPv6: ADDRCONF(NETDEV_CHANGE): veth_src: link becomes ready
  [  156.362859] bond2 (unregistering): Released all slaves
  #289     xdp_do_redirect:OK
  #290     xdp_info:OK

[...]

  #297/1   xfrm_info/xfrm_info:OK
  #297     xfrm_info:OK

  All error logs:
  libbpf: prog 'trace_virtqueue_add_sgs': BPF program load failed: Bad
address
  libbpf: prog 'trace_virtqueue_add_sgs': -- BEGIN PROG LOAD LOG --
  The sequence of 8193 jumps is too complex.
  verification time 77808 usec
  stack depth 64
  processed 156616 insns (limit 1000000) max_states_per_insn 8
total_states 1754 peak_states 1712 mark_read 12
  -- END PROG LOAD LOG --
  libbpf: prog 'trace_virtqueue_add_sgs': failed to load: -14
  libbpf: failed to load object 'loop6.bpf.o'
  scale_test:FAIL:expect_success unexpected error: -14 (errno 14)
  #257     verif_scale_loop6:FAIL
  Summary: 288/1766 PASSED, 21 SKIPPED, 1 FAILED

So, xdp_do_redirect, which was previously failing, now works fine. OTOH,
"verif_scale_loop6" now fails, but from what I understand from the log, it
has nothing to do with the series ("8193 jumps is too complex" -- I don't
even touch program-related stuff). I don't know the reason for the
failure -- could it be a CI issue or maybe some recent commits?

Thanks,
Olek

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames
@ 2023-03-13 21:55 Alexander Lobakin
  2023-03-14 11:57 ` Alexander Lobakin
  2023-03-14 22:30 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 20+ messages in thread
From: Alexander Lobakin @ 2023-03-13 21:55 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Alexander Lobakin, Maciej Fijalkowski, Larysa Zaremba,
	Toke Høiland-Jørgensen, Song Liu,
	Jesper Dangaard Brouer, John Fastabend, Menglong Dong,
	Mykola Lysenko, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, bpf, netdev, linux-kernel

Yeah, I still remember that "Who needs cpumap nowadays" (c), but anyway.

__xdp_build_skb_from_frame() missed the moment when the networking stack
became able to recycle skb pages backed by a page_pool. This was making
e.g. cpumap redirect even less effective than simple %XDP_PASS. veth was
also affected in some scenarios.
A lot of drivers already use skb_mark_for_recycle(); it's been almost
two years, and there seem to be no issues with using it in the generic
code too. {__,}xdp_release_frame() can then be removed, as it has lost
its last user.
Page Pool then becomes zero-alloc (or nearly so) in the abovementioned
cases, too. Other memory type models (who needs them at this point) are
unchanged.
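
The gist of patch 3, sketched here assuming the existing
MEM_TYPE_PAGE_POOL memory model (the applied patch may differ in
details):

	/* net/core/xdp.c, in __xdp_build_skb_from_frame(): let the skb
	 * recycle its page_pool pages instead of returning them via
	 * xdp_release_frame()
	 */
	if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
		skb_mark_for_recycle(skb);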

Some numbers on 1 Xeon Platinum core bombed with 27 Mpps of 64-byte
IPv6 UDP, iavf w/XDP[0] (CONFIG_PAGE_POOL_STATS is enabled):

Plain %XDP_PASS on baseline, Page Pool driver:

src cpu Rx     drops  dst cpu Rx
  2.1 Mpps       N/A    2.1 Mpps

cpumap redirect (cross-core, w/o leaving its NUMA node) on baseline:

  6.8 Mpps  5.0 Mpps    1.8 Mpps

cpumap redirect with skb PP recycling:

  7.9 Mpps  5.7 Mpps    2.2 Mpps
                       +22% (from cpumap redir on baseline)

[0] https://github.com/alobakin/linux/commits/iavf-xdp

Alexander Lobakin (4):
  selftests/bpf: robustify test_xdp_do_redirect with more payload magics
  net: page_pool, skbuff: make skb_mark_for_recycle() always available
  xdp: recycle Page Pool backed skbs built from XDP frames
  xdp: remove unused {__,}xdp_release_frame()

 include/linux/skbuff.h                        |  4 +--
 include/net/xdp.h                             | 29 ---------------
 net/core/xdp.c                                | 19 ++--------
 .../bpf/progs/test_xdp_do_redirect.c          | 36 +++++++++++++------
 4 files changed, 30 insertions(+), 58 deletions(-)

---
From v2[1]:
* fix the test_xdp_do_redirect selftest failing after the series: it was
  relying on the fact that %XDP_PASS frames can't be recycled on veth
  (BPF CI, Alexei);
* explain "w/o leaving its node" in the cover letter (Jesper).

From v1[2]:
* make skb_mark_for_recycle() always available, otherwise there are build
  failures on non-PP systems (kbuild bot);
* 'Page Pool' -> 'page_pool' when it's about a page_pool instance, not
  API (Jesper);
* expanded test system info a bit in the cover letter (Jesper).

[1] https://lore.kernel.org/bpf/20230303133232.2546004-1-aleksander.lobakin@intel.com
[2] https://lore.kernel.org/bpf/20230301160315.1022488-1-aleksander.lobakin@intel.com
-- 
2.39.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames
@ 2023-03-13 21:42 Alexander Lobakin
  2023-03-16 11:57 ` Alexander Lobakin
  0 siblings, 1 reply; 20+ messages in thread
From: Alexander Lobakin @ 2023-03-13 21:42 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Alexander Lobakin, Maciej Fijalkowski, Larysa Zaremba,
	Toke Høiland-Jørgensen, Song Liu,
	Jesper Dangaard Brouer, John Fastabend, Menglong Dong,
	Mykola Lysenko, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, bpf, netdev, linux-kernel

Yeah, I still remember that "Who needs cpumap nowadays" (c), but anyway.

__xdp_build_skb_from_frame() missed the moment when the networking stack
became able to recycle skb pages backed by a page_pool. This was making
e.g. cpumap redirect even less effective than simple %XDP_PASS. veth was
also affected in some scenarios.
A lot of drivers already use skb_mark_for_recycle(); it's been almost
two years, and there seem to be no issues with using it in the generic
code too. {__,}xdp_release_frame() can then be removed, as it has lost
its last user.
Page Pool then becomes zero-alloc (or nearly so) in the abovementioned
cases, too. Other memory type models (who needs them at this point) are
unchanged.

Some numbers on 1 Xeon Platinum core bombed with 27 Mpps of 64-byte
IPv6 UDP, iavf w/XDP[0] (CONFIG_PAGE_POOL_STATS is enabled):

Plain %XDP_PASS on baseline, Page Pool driver:

src cpu Rx     drops  dst cpu Rx
  2.1 Mpps       N/A    2.1 Mpps

cpumap redirect (cross-core, w/o leaving its NUMA node) on baseline:

  6.8 Mpps  5.0 Mpps    1.8 Mpps

cpumap redirect with skb PP recycling:

  7.9 Mpps  5.7 Mpps    2.2 Mpps
                       +22% (from cpumap redir on baseline)

[0] https://github.com/alobakin/linux/commits/iavf-xdp

Alexander Lobakin (4):
  selftests/bpf: robustify test_xdp_do_redirect with more payload magics
  net: page_pool, skbuff: make skb_mark_for_recycle() always available
  xdp: recycle Page Pool backed skbs built from XDP frames
  xdp: remove unused {__,}xdp_release_frame()

 include/linux/skbuff.h                        |  4 +--
 include/net/xdp.h                             | 29 ---------------
 net/core/xdp.c                                | 19 ++--------
 .../bpf/progs/test_xdp_do_redirect.c          | 36 +++++++++++++------
 4 files changed, 30 insertions(+), 58 deletions(-)

---
From v2[1]:
* fix the test_xdp_do_redirect selftest failing after the series: it was
  relying on the fact that %XDP_PASS frames can't be recycled on veth
  (BPF CI, Alexei);
* explain "w/o leaving its node" in the cover letter (Jesper).

From v1[2]:
* make skb_mark_for_recycle() always available, otherwise there are build
  failures on non-PP systems (kbuild bot);
* 'Page Pool' -> 'page_pool' when it's about a page_pool instance, not
  API (Jesper);
* expanded test system info a bit in the cover letter (Jesper).

[1] https://lore.kernel.org/bpf/20230303133232.2546004-1-aleksander.lobakin@intel.com
[2] https://lore.kernel.org/bpf/20230301160315.1022488-1-aleksander.lobakin@intel.com
-- 
2.39.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-03-16 13:24 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-13 19:08 [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames Alexander Lobakin
2023-03-13 19:08 ` [PATCH bpf-next v3 1/4] selftests/bpf: robustify test_xdp_do_redirect with more payload magics Alexander Lobakin
2023-03-13 19:08 ` [PATCH bpf-next v3 2/4] net: page_pool, skbuff: make skb_mark_for_recycle() always available Alexander Lobakin
2023-03-13 19:08 ` [PATCH bpf-next v3 3/4] xdp: recycle Page Pool backed skbs built from XDP frames Alexander Lobakin
2023-03-13 19:08 ` [PATCH bpf-next v3 4/4] xdp: remove unused {__,}xdp_release_frame() Alexander Lobakin
2023-03-13 21:42 [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames Alexander Lobakin
2023-03-16 11:57 ` Alexander Lobakin
2023-03-13 21:55 Alexander Lobakin
2023-03-14 11:57 ` Alexander Lobakin
2023-03-14 18:52   ` Alexei Starovoitov
2023-03-14 23:54     ` Alexei Starovoitov
2023-03-15  9:56       ` Alexander Lobakin
2023-03-15 10:54         ` Alexander Lobakin
2023-03-15 14:54           ` Ilya Leoshkevich
2023-03-15 18:00             ` Ilya Leoshkevich
2023-03-15 18:12               ` Alexander Lobakin
2023-03-15 18:26                 ` Ilya Leoshkevich
2023-03-16 13:22                   ` Alexander Lobakin
2023-03-15 16:55           ` Alexei Starovoitov
2023-03-14 22:30 ` patchwork-bot+netdevbpf
