* [PATCH v4 net-next 00/11] skbuff: introduce skbuff_heads bulking and reusing
@ 2021-02-10 16:28 Alexander Lobakin
From: Alexander Lobakin @ 2021-02-10 16:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Jonathan Lemon, Eric Dumazet, Dmitry Vyukov, Willem de Bruijn,
	Alexander Lobakin, Randy Dunlap, Kevin Hao, Pablo Neira Ayuso,
	Jakub Sitnicki, Marco Elver, Dexuan Cui, Paolo Abeni,
	Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Taehee Yoo, Cong Wang, Björn Töpel,
	Miaohe Lin, Guillaume Nault, Yonghong Song, zhudi,
	Michal Kubecek, Marcelo Ricardo Leitner, Dmitry Safonov,
	Yang Yingliang, Florian Westphal, Edward Cree, linux-kernel,
	netdev

Currently, all sorts of skb allocation always allocate
skbuff_heads one by one via kmem_cache_alloc().
On the other hand, we have the percpu napi_alloc_cache, which stores
skbuff_heads queued up for freeing and flushes them in bulks.

We can use this cache not only for bulk-freeing, but also to obtain
heads for new skbs and avoid unconditional allocations, as well as
for bulk-allocating (as XDP's cpumap code and the veth driver
already do).
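
For reference, the bulk-allocation path boils down to a single
kmem_cache_alloc_bulk() call. A sketch (nc->skb_cache and
NAPI_SKB_CACHE_BULK come from later patches in this series):

	/* refill the percpu cache: obtain up to NAPI_SKB_CACHE_BULK
	 * skbuff_heads in one go
	 */
	nc->skb_count = kmem_cache_alloc_bulk(skbuff_head_cache,
					      GFP_ATOMIC,
					      NAPI_SKB_CACHE_BULK,
					      nc->skb_cache);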

As this might affect latencies, cache pressure and lots of hardware-
and driver-dependent behaviour, this new feature is mostly optional
and can be used via (a usage sketch follows the list):
 - a new napi_build_skb() function (as a replacement for build_skb());
 - the existing {,__}napi_alloc_skb() and napi_get_frags() functions;
 - __alloc_skb() with SKB_ALLOC_NAPI passed in flags.
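
A minimal usage sketch of these entry points (hypothetical driver
fragments, all assumed to run in NAPI/softirq context):

	/* 1) build an skb around a pre-filled buffer, with the
	 *    skbuff_head taken from the NAPI percpu cache
	 */
	skb = napi_build_skb(data, frag_size);

	/* 2) the existing NAPI helpers start pulling heads from
	 *    the cache with this series
	 */
	skb = napi_alloc_skb(napi, len);

	/* 3) explicit opt-in for __alloc_skb() */
	skb = __alloc_skb(len, GFP_ATOMIC, SKB_ALLOC_NAPI, NUMA_NO_NODE);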

iperf3 showed 35-70 Mbps bumps for both TCP and UDP while performing
VLAN NAT on a 1.2 GHz MIPS board. The boost is likely to be bigger
on more powerful hosts and NICs handling tens of Mpps.

Note on skbuff_heads from distant slabs or pfmemalloc'ed slabs:
 - kmalloc()/kmem_cache_alloc() itself by default allows allocating
   memory from remote nodes to defragment their slabs. This is
   controlled by a sysctl, so according to that, a skbuff_head from
   a remote node is an OK case;
 - the easiest way to check whether the slab of a skbuff_head is
   remote or pfmemalloc'ed is:

	if (!dev_page_is_reusable(virt_to_head_page(skb)))
		/* drop it */;

   ...*but*, given that most slabs are built of compound pages,
   virt_to_head_page() will hit its unlikely branch on every single
   call. This check cost at least 20 Mbps in test scenarios, so it
   seems better to _not_ do it.
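
   For reference, dev_page_is_reusable() boils down to roughly:

	static inline bool dev_page_is_reusable(const struct page *page)
	{
		return likely(page_to_nid(page) == numa_mem_id() &&
			      !page_is_pfmemalloc(page));
	}

   i.e. the checks themselves are cheap, the virt_to_head_page()
   lookup in front of them is what costs.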

Since v3 [2]:
 - make the feature mostly optional, so driver developers could
   decide whether to use it or not (Paolo Abeni).
   This reuses the old flag for __alloc_skb() and introduces
   a new napi_build_skb();
 - reduce the bulk-allocation size from 32 to 16 elements (also Paolo).
   This equals the value used by XDP's devmap and veth batch
   processing (which were tested a lot) and should be sane enough;
 - don't waste cycles on the explicit in_serving_softirq() check.

Since v2 [1]:
 - also cover the {,__}alloc_skb() and {,__}build_skb() cases (became
   handy after the changes that pass tiny skb requests to the kmalloc
   layer);
 - cover the cache with KASAN instrumentation (suggested by Eric
   Dumazet, with help from Dmitry Vyukov);
 - completely drop redundant __kfree_skb_flush() (also Eric);
 - lots of code cleanups;
 - expand the commit message with NUMA and pfmemalloc points (Jakub).

Since v1 [0]:
 - use one unified cache instead of two separate ones to greatly
   simplify the logic and reduce hotpath overhead (Edward Cree);
 - new: also recycle GRO_MERGED_FREE skbs instead of freeing them
   immediately;
 - correct the performance numbers after optimizations and lots of
   testing across different use cases.

[0] https://lore.kernel.org/netdev/20210111182655.12159-1-alobakin@pm.me
[1] https://lore.kernel.org/netdev/20210113133523.39205-1-alobakin@pm.me
[2] https://lore.kernel.org/netdev/20210209204533.327360-1-alobakin@pm.me

Alexander Lobakin (11):
  skbuff: move __alloc_skb() next to the other skb allocation functions
  skbuff: simplify kmalloc_reserve()
  skbuff: make __build_skb_around() return void
  skbuff: simplify __alloc_skb() a bit
  skbuff: use __build_skb_around() in __alloc_skb()
  skbuff: remove __kfree_skb_flush()
  skbuff: move NAPI cache declarations upper in the file
  skbuff: introduce {,__}napi_build_skb() which reuses NAPI cache heads
  skbuff: allow to optionally use NAPI cache from __alloc_skb()
  skbuff: allow to use NAPI cache from __napi_alloc_skb()
  skbuff: queue NAPI_MERGED_FREE skbs into NAPI cache instead of freeing

 include/linux/skbuff.h |   4 +-
 net/core/dev.c         |  15 +-
 net/core/skbuff.c      | 429 +++++++++++++++++++++++------------------
 3 files changed, 243 insertions(+), 205 deletions(-)

-- 
2.30.1




* [PATCH v4 net-next 01/11] skbuff: move __alloc_skb() next to the other skb allocation functions
From: Alexander Lobakin @ 2021-02-10 16:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Jonathan Lemon, Eric Dumazet, Dmitry Vyukov, Willem de Bruijn,
	Alexander Lobakin, Randy Dunlap, Kevin Hao, Pablo Neira Ayuso,
	Jakub Sitnicki, Marco Elver, Dexuan Cui, Paolo Abeni,
	Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Taehee Yoo, Cong Wang, Björn Töpel,
	Miaohe Lin, Guillaume Nault, Yonghong Song, zhudi,
	Michal Kubecek, Marcelo Ricardo Leitner, Dmitry Safonov,
	Yang Yingliang, Florian Westphal, Edward Cree, linux-kernel,
	netdev

In preparation for reusing several functions in all three skb
allocation variants, move __alloc_skb() next to
__netdev_alloc_skb() and __napi_alloc_skb().
No functional changes.

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 net/core/skbuff.c | 284 +++++++++++++++++++++++-----------------------
 1 file changed, 142 insertions(+), 142 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d380c7b5a12d..a0f846872d19 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -119,148 +119,6 @@ static void skb_under_panic(struct sk_buff *skb, unsigned int sz, void *addr)
 	skb_panic(skb, sz, addr, __func__);
 }
 
-/*
- * kmalloc_reserve is a wrapper around kmalloc_node_track_caller that tells
- * the caller if emergency pfmemalloc reserves are being used. If it is and
- * the socket is later found to be SOCK_MEMALLOC then PFMEMALLOC reserves
- * may be used. Otherwise, the packet data may be discarded until enough
- * memory is free
- */
-#define kmalloc_reserve(size, gfp, node, pfmemalloc) \
-	 __kmalloc_reserve(size, gfp, node, _RET_IP_, pfmemalloc)
-
-static void *__kmalloc_reserve(size_t size, gfp_t flags, int node,
-			       unsigned long ip, bool *pfmemalloc)
-{
-	void *obj;
-	bool ret_pfmemalloc = false;
-
-	/*
-	 * Try a regular allocation, when that fails and we're not entitled
-	 * to the reserves, fail.
-	 */
-	obj = kmalloc_node_track_caller(size,
-					flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
-					node);
-	if (obj || !(gfp_pfmemalloc_allowed(flags)))
-		goto out;
-
-	/* Try again but now we are using pfmemalloc reserves */
-	ret_pfmemalloc = true;
-	obj = kmalloc_node_track_caller(size, flags, node);
-
-out:
-	if (pfmemalloc)
-		*pfmemalloc = ret_pfmemalloc;
-
-	return obj;
-}
-
-/* 	Allocate a new skbuff. We do this ourselves so we can fill in a few
- *	'private' fields and also do memory statistics to find all the
- *	[BEEP] leaks.
- *
- */
-
-/**
- *	__alloc_skb	-	allocate a network buffer
- *	@size: size to allocate
- *	@gfp_mask: allocation mask
- *	@flags: If SKB_ALLOC_FCLONE is set, allocate from fclone cache
- *		instead of head cache and allocate a cloned (child) skb.
- *		If SKB_ALLOC_RX is set, __GFP_MEMALLOC will be used for
- *		allocations in case the data is required for writeback
- *	@node: numa node to allocate memory on
- *
- *	Allocate a new &sk_buff. The returned buffer has no headroom and a
- *	tail room of at least size bytes. The object has a reference count
- *	of one. The return is the buffer. On a failure the return is %NULL.
- *
- *	Buffers may only be allocated from interrupts using a @gfp_mask of
- *	%GFP_ATOMIC.
- */
-struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
-			    int flags, int node)
-{
-	struct kmem_cache *cache;
-	struct skb_shared_info *shinfo;
-	struct sk_buff *skb;
-	u8 *data;
-	bool pfmemalloc;
-
-	cache = (flags & SKB_ALLOC_FCLONE)
-		? skbuff_fclone_cache : skbuff_head_cache;
-
-	if (sk_memalloc_socks() && (flags & SKB_ALLOC_RX))
-		gfp_mask |= __GFP_MEMALLOC;
-
-	/* Get the HEAD */
-	skb = kmem_cache_alloc_node(cache, gfp_mask & ~__GFP_DMA, node);
-	if (!skb)
-		goto out;
-	prefetchw(skb);
-
-	/* We do our best to align skb_shared_info on a separate cache
-	 * line. It usually works because kmalloc(X > SMP_CACHE_BYTES) gives
-	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
-	 * Both skb->head and skb_shared_info are cache line aligned.
-	 */
-	size = SKB_DATA_ALIGN(size);
-	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
-	data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
-	if (!data)
-		goto nodata;
-	/* kmalloc(size) might give us more room than requested.
-	 * Put skb_shared_info exactly at the end of allocated zone,
-	 * to allow max possible filling before reallocation.
-	 */
-	size = SKB_WITH_OVERHEAD(ksize(data));
-	prefetchw(data + size);
-
-	/*
-	 * Only clear those fields we need to clear, not those that we will
-	 * actually initialise below. Hence, don't put any more fields after
-	 * the tail pointer in struct sk_buff!
-	 */
-	memset(skb, 0, offsetof(struct sk_buff, tail));
-	/* Account for allocated memory : skb + skb->head */
-	skb->truesize = SKB_TRUESIZE(size);
-	skb->pfmemalloc = pfmemalloc;
-	refcount_set(&skb->users, 1);
-	skb->head = data;
-	skb->data = data;
-	skb_reset_tail_pointer(skb);
-	skb->end = skb->tail + size;
-	skb->mac_header = (typeof(skb->mac_header))~0U;
-	skb->transport_header = (typeof(skb->transport_header))~0U;
-
-	/* make sure we initialize shinfo sequentially */
-	shinfo = skb_shinfo(skb);
-	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
-	atomic_set(&shinfo->dataref, 1);
-
-	if (flags & SKB_ALLOC_FCLONE) {
-		struct sk_buff_fclones *fclones;
-
-		fclones = container_of(skb, struct sk_buff_fclones, skb1);
-
-		skb->fclone = SKB_FCLONE_ORIG;
-		refcount_set(&fclones->fclone_ref, 1);
-
-		fclones->skb2.fclone = SKB_FCLONE_CLONE;
-	}
-
-	skb_set_kcov_handle(skb, kcov_common_handle());
-
-out:
-	return skb;
-nodata:
-	kmem_cache_free(cache, skb);
-	skb = NULL;
-	goto out;
-}
-EXPORT_SYMBOL(__alloc_skb);
-
 /* Caller must provide SKB that is memset cleared */
 static struct sk_buff *__build_skb_around(struct sk_buff *skb,
 					  void *data, unsigned int frag_size)
@@ -408,6 +266,148 @@ void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
 }
 EXPORT_SYMBOL(__netdev_alloc_frag_align);
 
+/*
+ * kmalloc_reserve is a wrapper around kmalloc_node_track_caller that tells
+ * the caller if emergency pfmemalloc reserves are being used. If it is and
+ * the socket is later found to be SOCK_MEMALLOC then PFMEMALLOC reserves
+ * may be used. Otherwise, the packet data may be discarded until enough
+ * memory is free
+ */
+#define kmalloc_reserve(size, gfp, node, pfmemalloc) \
+	 __kmalloc_reserve(size, gfp, node, _RET_IP_, pfmemalloc)
+
+static void *__kmalloc_reserve(size_t size, gfp_t flags, int node,
+			       unsigned long ip, bool *pfmemalloc)
+{
+	void *obj;
+	bool ret_pfmemalloc = false;
+
+	/*
+	 * Try a regular allocation, when that fails and we're not entitled
+	 * to the reserves, fail.
+	 */
+	obj = kmalloc_node_track_caller(size,
+					flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
+					node);
+	if (obj || !(gfp_pfmemalloc_allowed(flags)))
+		goto out;
+
+	/* Try again but now we are using pfmemalloc reserves */
+	ret_pfmemalloc = true;
+	obj = kmalloc_node_track_caller(size, flags, node);
+
+out:
+	if (pfmemalloc)
+		*pfmemalloc = ret_pfmemalloc;
+
+	return obj;
+}
+
+/* 	Allocate a new skbuff. We do this ourselves so we can fill in a few
+ *	'private' fields and also do memory statistics to find all the
+ *	[BEEP] leaks.
+ *
+ */
+
+/**
+ *	__alloc_skb	-	allocate a network buffer
+ *	@size: size to allocate
+ *	@gfp_mask: allocation mask
+ *	@flags: If SKB_ALLOC_FCLONE is set, allocate from fclone cache
+ *		instead of head cache and allocate a cloned (child) skb.
+ *		If SKB_ALLOC_RX is set, __GFP_MEMALLOC will be used for
+ *		allocations in case the data is required for writeback
+ *	@node: numa node to allocate memory on
+ *
+ *	Allocate a new &sk_buff. The returned buffer has no headroom and a
+ *	tail room of at least size bytes. The object has a reference count
+ *	of one. The return is the buffer. On a failure the return is %NULL.
+ *
+ *	Buffers may only be allocated from interrupts using a @gfp_mask of
+ *	%GFP_ATOMIC.
+ */
+struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
+			    int flags, int node)
+{
+	struct kmem_cache *cache;
+	struct skb_shared_info *shinfo;
+	struct sk_buff *skb;
+	u8 *data;
+	bool pfmemalloc;
+
+	cache = (flags & SKB_ALLOC_FCLONE)
+		? skbuff_fclone_cache : skbuff_head_cache;
+
+	if (sk_memalloc_socks() && (flags & SKB_ALLOC_RX))
+		gfp_mask |= __GFP_MEMALLOC;
+
+	/* Get the HEAD */
+	skb = kmem_cache_alloc_node(cache, gfp_mask & ~__GFP_DMA, node);
+	if (!skb)
+		goto out;
+	prefetchw(skb);
+
+	/* We do our best to align skb_shared_info on a separate cache
+	 * line. It usually works because kmalloc(X > SMP_CACHE_BYTES) gives
+	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
+	 * Both skb->head and skb_shared_info are cache line aligned.
+	 */
+	size = SKB_DATA_ALIGN(size);
+	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
+	if (!data)
+		goto nodata;
+	/* kmalloc(size) might give us more room than requested.
+	 * Put skb_shared_info exactly at the end of allocated zone,
+	 * to allow max possible filling before reallocation.
+	 */
+	size = SKB_WITH_OVERHEAD(ksize(data));
+	prefetchw(data + size);
+
+	/*
+	 * Only clear those fields we need to clear, not those that we will
+	 * actually initialise below. Hence, don't put any more fields after
+	 * the tail pointer in struct sk_buff!
+	 */
+	memset(skb, 0, offsetof(struct sk_buff, tail));
+	/* Account for allocated memory : skb + skb->head */
+	skb->truesize = SKB_TRUESIZE(size);
+	skb->pfmemalloc = pfmemalloc;
+	refcount_set(&skb->users, 1);
+	skb->head = data;
+	skb->data = data;
+	skb_reset_tail_pointer(skb);
+	skb->end = skb->tail + size;
+	skb->mac_header = (typeof(skb->mac_header))~0U;
+	skb->transport_header = (typeof(skb->transport_header))~0U;
+
+	/* make sure we initialize shinfo sequentially */
+	shinfo = skb_shinfo(skb);
+	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
+	atomic_set(&shinfo->dataref, 1);
+
+	if (flags & SKB_ALLOC_FCLONE) {
+		struct sk_buff_fclones *fclones;
+
+		fclones = container_of(skb, struct sk_buff_fclones, skb1);
+
+		skb->fclone = SKB_FCLONE_ORIG;
+		refcount_set(&fclones->fclone_ref, 1);
+
+		fclones->skb2.fclone = SKB_FCLONE_CLONE;
+	}
+
+	skb_set_kcov_handle(skb, kcov_common_handle());
+
+out:
+	return skb;
+nodata:
+	kmem_cache_free(cache, skb);
+	skb = NULL;
+	goto out;
+}
+EXPORT_SYMBOL(__alloc_skb);
+
 /**
  *	__netdev_alloc_skb - allocate an skbuff for rx on a specific device
  *	@dev: network device to receive on
-- 
2.30.1




* [PATCH v4 net-next 02/11] skbuff: simplify kmalloc_reserve()
From: Alexander Lobakin @ 2021-02-10 16:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Jonathan Lemon, Eric Dumazet, Dmitry Vyukov, Willem de Bruijn,
	Alexander Lobakin, Randy Dunlap, Kevin Hao, Pablo Neira Ayuso,
	Jakub Sitnicki, Marco Elver, Dexuan Cui, Paolo Abeni,
	Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Taehee Yoo, Cong Wang, Björn Töpel,
	Miaohe Lin, Guillaume Nault, Yonghong Song, zhudi,
	Michal Kubecek, Marcelo Ricardo Leitner, Dmitry Safonov,
	Yang Yingliang, Florian Westphal, Edward Cree, linux-kernel,
	netdev

Ever since the introduction of __kmalloc_reserve(), its "ip"
argument hasn't been used: _RET_IP_ is embedded inside
kmalloc_node_track_caller().
Remove the now-redundant macro and rename the function after it.
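
For context, kmalloc_node_track_caller() already records the caller
address itself; its definition is roughly:

	#define kmalloc_node_track_caller(size, flags, node) \
		__kmalloc_node_track_caller(size, flags, node, \
					    _RET_IP_)

so carrying a second "ip" down through __kmalloc_reserve() never had
any effect.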

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 net/core/skbuff.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a0f846872d19..70289f22a6f4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -273,11 +273,8 @@ EXPORT_SYMBOL(__netdev_alloc_frag_align);
  * may be used. Otherwise, the packet data may be discarded until enough
  * memory is free
  */
-#define kmalloc_reserve(size, gfp, node, pfmemalloc) \
-	 __kmalloc_reserve(size, gfp, node, _RET_IP_, pfmemalloc)
-
-static void *__kmalloc_reserve(size_t size, gfp_t flags, int node,
-			       unsigned long ip, bool *pfmemalloc)
+static void *kmalloc_reserve(size_t size, gfp_t flags, int node,
+			     bool *pfmemalloc)
 {
 	void *obj;
 	bool ret_pfmemalloc = false;
-- 
2.30.1




* [PATCH v4 net-next 03/11] skbuff: make __build_skb_around() return void
From: Alexander Lobakin @ 2021-02-10 16:29 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Jonathan Lemon, Eric Dumazet, Dmitry Vyukov, Willem de Bruijn,
	Alexander Lobakin, Randy Dunlap, Kevin Hao, Pablo Neira Ayuso,
	Jakub Sitnicki, Marco Elver, Dexuan Cui, Paolo Abeni,
	Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Taehee Yoo, Cong Wang, Björn Töpel,
	Miaohe Lin, Guillaume Nault, Yonghong Song, zhudi,
	Michal Kubecek, Marcelo Ricardo Leitner, Dmitry Safonov,
	Yang Yingliang, Florian Westphal, Edward Cree, linux-kernel,
	netdev

__build_skb_around() can never fail and always returns the passed
skb. Make it return void to simplify and optimize the code.

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 net/core/skbuff.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 70289f22a6f4..c7d184e11547 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -120,8 +120,8 @@ static void skb_under_panic(struct sk_buff *skb, unsigned int sz, void *addr)
 }
 
 /* Caller must provide SKB that is memset cleared */
-static struct sk_buff *__build_skb_around(struct sk_buff *skb,
-					  void *data, unsigned int frag_size)
+static void __build_skb_around(struct sk_buff *skb, void *data,
+			       unsigned int frag_size)
 {
 	struct skb_shared_info *shinfo;
 	unsigned int size = frag_size ? : ksize(data);
@@ -144,8 +144,6 @@ static struct sk_buff *__build_skb_around(struct sk_buff *skb,
 	atomic_set(&shinfo->dataref, 1);
 
 	skb_set_kcov_handle(skb, kcov_common_handle());
-
-	return skb;
 }
 
 /**
@@ -176,8 +174,9 @@ struct sk_buff *__build_skb(void *data, unsigned int frag_size)
 		return NULL;
 
 	memset(skb, 0, offsetof(struct sk_buff, tail));
+	__build_skb_around(skb, data, frag_size);
 
-	return __build_skb_around(skb, data, frag_size);
+	return skb;
 }
 
 /* build_skb() is wrapper over __build_skb(), that specifically
@@ -210,9 +209,9 @@ struct sk_buff *build_skb_around(struct sk_buff *skb,
 	if (unlikely(!skb))
 		return NULL;
 
-	skb = __build_skb_around(skb, data, frag_size);
+	__build_skb_around(skb, data, frag_size);
 
-	if (skb && frag_size) {
+	if (frag_size) {
 		skb->head_frag = 1;
 		if (page_is_pfmemalloc(virt_to_head_page(data)))
 			skb->pfmemalloc = 1;
-- 
2.30.1




* [PATCH v4 net-next 04/11] skbuff: simplify __alloc_skb() a bit
From: Alexander Lobakin @ 2021-02-10 16:29 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Jonathan Lemon, Eric Dumazet, Dmitry Vyukov, Willem de Bruijn,
	Alexander Lobakin, Randy Dunlap, Kevin Hao, Pablo Neira Ayuso,
	Jakub Sitnicki, Marco Elver, Dexuan Cui, Paolo Abeni,
	Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Taehee Yoo, Cong Wang, Björn Töpel,
	Miaohe Lin, Guillaume Nault, Yonghong Song, zhudi,
	Michal Kubecek, Marcelo Ricardo Leitner, Dmitry Safonov,
	Yang Yingliang, Florian Westphal, Edward Cree, linux-kernel,
	netdev

Use unlikely() annotations for skbuff_head and data, similarly to
the two other allocation functions, and remove the totally
redundant goto.

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 net/core/skbuff.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c7d184e11547..88566de26cd1 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -339,8 +339,8 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 
 	/* Get the HEAD */
 	skb = kmem_cache_alloc_node(cache, gfp_mask & ~__GFP_DMA, node);
-	if (!skb)
-		goto out;
+	if (unlikely(!skb))
+		return NULL;
 	prefetchw(skb);
 
 	/* We do our best to align skb_shared_info on a separate cache
@@ -351,7 +351,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	size = SKB_DATA_ALIGN(size);
 	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
-	if (!data)
+	if (unlikely(!data))
 		goto nodata;
 	/* kmalloc(size) might give us more room than requested.
 	 * Put skb_shared_info exactly at the end of allocated zone,
@@ -395,12 +395,11 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 
 	skb_set_kcov_handle(skb, kcov_common_handle());
 
-out:
 	return skb;
+
 nodata:
 	kmem_cache_free(cache, skb);
-	skb = NULL;
-	goto out;
+	return NULL;
 }
 EXPORT_SYMBOL(__alloc_skb);
 
-- 
2.30.1




* [PATCH v4 net-next 05/11] skbuff: use __build_skb_around() in __alloc_skb()
From: Alexander Lobakin @ 2021-02-10 16:29 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Jonathan Lemon, Eric Dumazet, Dmitry Vyukov, Willem de Bruijn,
	Alexander Lobakin, Randy Dunlap, Kevin Hao, Pablo Neira Ayuso,
	Jakub Sitnicki, Marco Elver, Dexuan Cui, Paolo Abeni,
	Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Taehee Yoo, Cong Wang, Björn Töpel,
	Miaohe Lin, Guillaume Nault, Yonghong Song, zhudi,
	Michal Kubecek, Marcelo Ricardo Leitner, Dmitry Safonov,
	Yang Yingliang, Florian Westphal, Edward Cree, linux-kernel,
	netdev

Just call __build_skb_around() instead of open-coding it.

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 net/core/skbuff.c | 18 +-----------------
 1 file changed, 1 insertion(+), 17 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 88566de26cd1..1c6f6ef70339 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -326,7 +326,6 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 			    int flags, int node)
 {
 	struct kmem_cache *cache;
-	struct skb_shared_info *shinfo;
 	struct sk_buff *skb;
 	u8 *data;
 	bool pfmemalloc;
@@ -366,21 +365,8 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	 * the tail pointer in struct sk_buff!
 	 */
 	memset(skb, 0, offsetof(struct sk_buff, tail));
-	/* Account for allocated memory : skb + skb->head */
-	skb->truesize = SKB_TRUESIZE(size);
+	__build_skb_around(skb, data, 0);
 	skb->pfmemalloc = pfmemalloc;
-	refcount_set(&skb->users, 1);
-	skb->head = data;
-	skb->data = data;
-	skb_reset_tail_pointer(skb);
-	skb->end = skb->tail + size;
-	skb->mac_header = (typeof(skb->mac_header))~0U;
-	skb->transport_header = (typeof(skb->transport_header))~0U;
-
-	/* make sure we initialize shinfo sequentially */
-	shinfo = skb_shinfo(skb);
-	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
-	atomic_set(&shinfo->dataref, 1);
 
 	if (flags & SKB_ALLOC_FCLONE) {
 		struct sk_buff_fclones *fclones;
@@ -393,8 +379,6 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 		fclones->skb2.fclone = SKB_FCLONE_CLONE;
 	}
 
-	skb_set_kcov_handle(skb, kcov_common_handle());
-
 	return skb;
 
 nodata:
-- 
2.30.1




* [PATCH v4 net-next 06/11] skbuff: remove __kfree_skb_flush()
From: Alexander Lobakin @ 2021-02-10 16:29 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Jonathan Lemon, Eric Dumazet, Dmitry Vyukov, Willem de Bruijn,
	Alexander Lobakin, Randy Dunlap, Kevin Hao, Pablo Neira Ayuso,
	Jakub Sitnicki, Marco Elver, Dexuan Cui, Paolo Abeni,
	Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Taehee Yoo, Cong Wang, Björn Töpel,
	Miaohe Lin, Guillaume Nault, Yonghong Song, zhudi,
	Michal Kubecek, Marcelo Ricardo Leitner, Dmitry Safonov,
	Yang Yingliang, Florian Westphal, Edward Cree, linux-kernel,
	netdev

This function isn't really needed, as the NAPI skb queue gets
bulk-freed anyway when there's no more room, and it may even reduce
the efficiency of bulk operations.
It will be even less needed after reusing the skb cache on the
allocation path, so remove it and lighten network softirqs a bit.
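
For context, the percpu NAPI cache already flushes itself once full,
which is what makes the explicit flushing redundant. The relevant
logic from _kfree_skb_defer(), slightly simplified:

	/* record skb to the CPU-local list */
	nc->skb_cache[nc->skb_count++] = skb;

	/* flush skb_cache if it is filled */
	if (unlikely(nc->skb_count == NAPI_SKB_CACHE_SIZE)) {
		kmem_cache_free_bulk(skbuff_head_cache,
				     NAPI_SKB_CACHE_SIZE,
				     nc->skb_cache);
		nc->skb_count = 0;
	}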

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 include/linux/skbuff.h |  1 -
 net/core/dev.c         |  6 +-----
 net/core/skbuff.c      | 12 ------------
 3 files changed, 1 insertion(+), 18 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0a4e91a2f873..0e0707296098 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2919,7 +2919,6 @@ static inline struct sk_buff *napi_alloc_skb(struct napi_struct *napi,
 }
 void napi_consume_skb(struct sk_buff *skb, int budget);
 
-void __kfree_skb_flush(void);
 void __kfree_skb_defer(struct sk_buff *skb);
 
 /**
diff --git a/net/core/dev.c b/net/core/dev.c
index 7647278e46f0..7134ae2fc0db 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4944,8 +4944,6 @@ static __latent_entropy void net_tx_action(struct softirq_action *h)
 			else
 				__kfree_skb_defer(skb);
 		}
-
-		__kfree_skb_flush();
 	}
 
 	if (sd->output_queue) {
@@ -7041,7 +7039,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
 
 		if (list_empty(&list)) {
 			if (!sd_has_rps_ipi_waiting(sd) && list_empty(&repoll))
-				goto out;
+				return;
 			break;
 		}
 
@@ -7068,8 +7066,6 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
 		__raise_softirq_irqoff(NET_RX_SOFTIRQ);
 
 	net_rps_action_and_irq_enable(sd);
-out:
-	__kfree_skb_flush();
 }
 
 struct netdev_adjacent {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 1c6f6ef70339..4be2bb969535 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -838,18 +838,6 @@ void __consume_stateless_skb(struct sk_buff *skb)
 	kfree_skbmem(skb);
 }
 
-void __kfree_skb_flush(void)
-{
-	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
-
-	/* flush skb_cache if containing objects */
-	if (nc->skb_count) {
-		kmem_cache_free_bulk(skbuff_head_cache, nc->skb_count,
-				     nc->skb_cache);
-		nc->skb_count = 0;
-	}
-}
-
 static inline void _kfree_skb_defer(struct sk_buff *skb)
 {
 	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
-- 
2.30.1




* [PATCH v4 net-next 07/11] skbuff: move NAPI cache declarations upper in the file
From: Alexander Lobakin @ 2021-02-10 16:30 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Jonathan Lemon, Eric Dumazet, Dmitry Vyukov, Willem de Bruijn,
	Alexander Lobakin, Randy Dunlap, Kevin Hao, Pablo Neira Ayuso,
	Jakub Sitnicki, Marco Elver, Dexuan Cui, Paolo Abeni,
	Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Taehee Yoo, Cong Wang, Björn Töpel,
	Miaohe Lin, Guillaume Nault, Yonghong Song, zhudi,
	Michal Kubecek, Marcelo Ricardo Leitner, Dmitry Safonov,
	Yang Yingliang, Florian Westphal, Edward Cree, linux-kernel,
	netdev

The NAPI cache structures will be used for allocating skbuff_heads,
so move their declarations a bit higher in the file.

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 net/core/skbuff.c | 90 +++++++++++++++++++++++------------------------
 1 file changed, 45 insertions(+), 45 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4be2bb969535..860a9d4f752f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -119,6 +119,51 @@ static void skb_under_panic(struct sk_buff *skb, unsigned int sz, void *addr)
 	skb_panic(skb, sz, addr, __func__);
 }
 
+#define NAPI_SKB_CACHE_SIZE	64
+
+struct napi_alloc_cache {
+	struct page_frag_cache page;
+	unsigned int skb_count;
+	void *skb_cache[NAPI_SKB_CACHE_SIZE];
+};
+
+static DEFINE_PER_CPU(struct page_frag_cache, netdev_alloc_cache);
+static DEFINE_PER_CPU(struct napi_alloc_cache, napi_alloc_cache);
+
+static void *__alloc_frag_align(unsigned int fragsz, gfp_t gfp_mask,
+				unsigned int align_mask)
+{
+	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+
+	return page_frag_alloc_align(&nc->page, fragsz, gfp_mask, align_mask);
+}
+
+void *__napi_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
+{
+	fragsz = SKB_DATA_ALIGN(fragsz);
+
+	return __alloc_frag_align(fragsz, GFP_ATOMIC, align_mask);
+}
+EXPORT_SYMBOL(__napi_alloc_frag_align);
+
+void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
+{
+	struct page_frag_cache *nc;
+	void *data;
+
+	fragsz = SKB_DATA_ALIGN(fragsz);
+	if (in_irq() || irqs_disabled()) {
+		nc = this_cpu_ptr(&netdev_alloc_cache);
+		data = page_frag_alloc_align(nc, fragsz, GFP_ATOMIC, align_mask);
+	} else {
+		local_bh_disable();
+		data = __alloc_frag_align(fragsz, GFP_ATOMIC, align_mask);
+		local_bh_enable();
+	}
+	return data;
+}
+EXPORT_SYMBOL(__netdev_alloc_frag_align);
+
 /* Caller must provide SKB that is memset cleared */
 static void __build_skb_around(struct sk_buff *skb, void *data,
 			       unsigned int frag_size)
@@ -220,51 +265,6 @@ struct sk_buff *build_skb_around(struct sk_buff *skb,
 }
 EXPORT_SYMBOL(build_skb_around);
 
-#define NAPI_SKB_CACHE_SIZE	64
-
-struct napi_alloc_cache {
-	struct page_frag_cache page;
-	unsigned int skb_count;
-	void *skb_cache[NAPI_SKB_CACHE_SIZE];
-};
-
-static DEFINE_PER_CPU(struct page_frag_cache, netdev_alloc_cache);
-static DEFINE_PER_CPU(struct napi_alloc_cache, napi_alloc_cache);
-
-static void *__alloc_frag_align(unsigned int fragsz, gfp_t gfp_mask,
-				unsigned int align_mask)
-{
-	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
-
-	return page_frag_alloc_align(&nc->page, fragsz, gfp_mask, align_mask);
-}
-
-void *__napi_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
-{
-	fragsz = SKB_DATA_ALIGN(fragsz);
-
-	return __alloc_frag_align(fragsz, GFP_ATOMIC, align_mask);
-}
-EXPORT_SYMBOL(__napi_alloc_frag_align);
-
-void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
-{
-	struct page_frag_cache *nc;
-	void *data;
-
-	fragsz = SKB_DATA_ALIGN(fragsz);
-	if (in_irq() || irqs_disabled()) {
-		nc = this_cpu_ptr(&netdev_alloc_cache);
-		data = page_frag_alloc_align(nc, fragsz, GFP_ATOMIC, align_mask);
-	} else {
-		local_bh_disable();
-		data = __alloc_frag_align(fragsz, GFP_ATOMIC, align_mask);
-		local_bh_enable();
-	}
-	return data;
-}
-EXPORT_SYMBOL(__netdev_alloc_frag_align);
-
 /*
  * kmalloc_reserve is a wrapper around kmalloc_node_track_caller that tells
  * the caller if emergency pfmemalloc reserves are being used. If it is and
-- 
2.30.1




* [PATCH v4 net-next 08/11] skbuff: introduce {,__}napi_build_skb() which reuses NAPI cache heads
From: Alexander Lobakin @ 2021-02-10 16:30 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Jonathan Lemon, Eric Dumazet, Dmitry Vyukov, Willem de Bruijn,
	Alexander Lobakin, Randy Dunlap, Kevin Hao, Pablo Neira Ayuso,
	Jakub Sitnicki, Marco Elver, Dexuan Cui, Paolo Abeni,
	Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Taehee Yoo, Cong Wang, Björn Töpel,
	Miaohe Lin, Guillaume Nault, Yonghong Song, zhudi,
	Michal Kubecek, Marcelo Ricardo Leitner, Dmitry Safonov,
	Yang Yingliang, Florian Westphal, Edward Cree, linux-kernel,
	netdev

Instead of just bulk-flushing skbuff_heads queued up through
napi_consume_skb() or __kfree_skb_defer(), try to reuse them
on the allocation path.
If the cache is empty on allocation, bulk-allocate the first
16 elements, which is more efficient than per-skb allocation.
If the cache is full on freeing, bulk-wipe the second half of
the cache (32 elements).
This also includes custom KASAN poisoning/unpoisoning to be doubly
sure there are no use-after-free cases.

To avoid changing the current behaviour, introduce a new function,
napi_build_skb(), so drivers can optionally switch to the new
approach later.

Note on the selected bulk size, 16:
 - it equals XDP_BULK_QUEUE_SIZE, DEV_MAP_BULK_SIZE
   and especially VETH_XDP_BATCH, which is also used to
   bulk-allocate skbuff_heads and was tested on powerful
   setups;
 - it also showed the best performance in the actual
   test series (out of the array of {8, 16, 32}).
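
A hypothetical driver-side consumer of the new function (sketch; the
mynic_* name is made up, the pattern follows how veth converts XDP
buffers into skbs):

	static struct sk_buff *mynic_xdp_to_skb(struct xdp_buff *xdp)
	{
		struct sk_buff *skb;

		/* skbuff_head comes from the NAPI percpu cache; must
		 * be called from NAPI (softirq) context
		 */
		skb = napi_build_skb(xdp->data_hard_start, xdp->frame_sz);
		if (unlikely(!skb))
			return NULL;

		skb_reserve(skb, xdp->data - xdp->data_hard_start);
		skb_put(skb, xdp->data_end - xdp->data);

		return skb;
	}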

Suggested-by: Edward Cree <ecree.xilinx@gmail.com> # Divide on two halves
Suggested-by: Eric Dumazet <edumazet@google.com>   # KASAN poisoning
Cc: Dmitry Vyukov <dvyukov@google.com>             # Help with KASAN
Cc: Paolo Abeni <pabeni@redhat.com>                # Reduced batch size
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 include/linux/skbuff.h |  2 +
 net/core/skbuff.c      | 94 ++++++++++++++++++++++++++++++++++++------
 2 files changed, 83 insertions(+), 13 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0e0707296098..906122eac82a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1087,6 +1087,8 @@ struct sk_buff *build_skb(void *data, unsigned int frag_size);
 struct sk_buff *build_skb_around(struct sk_buff *skb,
 				 void *data, unsigned int frag_size);
 
+struct sk_buff *napi_build_skb(void *data, unsigned int frag_size);
+
 /**
  * alloc_skb - allocate a network buffer
  * @size: size to allocate
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 860a9d4f752f..9e1a8ded4acc 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -120,6 +120,8 @@ static void skb_under_panic(struct sk_buff *skb, unsigned int sz, void *addr)
 }
 
 #define NAPI_SKB_CACHE_SIZE	64
+#define NAPI_SKB_CACHE_BULK	16
+#define NAPI_SKB_CACHE_HALF	(NAPI_SKB_CACHE_SIZE / 2)
 
 struct napi_alloc_cache {
 	struct page_frag_cache page;
@@ -164,6 +166,25 @@ void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
 }
 EXPORT_SYMBOL(__netdev_alloc_frag_align);
 
+static struct sk_buff *napi_skb_cache_get(void)
+{
+	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	struct sk_buff *skb;
+
+	if (unlikely(!nc->skb_count))
+		nc->skb_count = kmem_cache_alloc_bulk(skbuff_head_cache,
+						      GFP_ATOMIC,
+						      NAPI_SKB_CACHE_BULK,
+						      nc->skb_cache);
+	if (unlikely(!nc->skb_count))
+		return NULL;
+
+	skb = nc->skb_cache[--nc->skb_count];
+	kasan_unpoison_object_data(skbuff_head_cache, skb);
+
+	return skb;
+}
+
 /* Caller must provide SKB that is memset cleared */
 static void __build_skb_around(struct sk_buff *skb, void *data,
 			       unsigned int frag_size)
@@ -265,6 +286,53 @@ struct sk_buff *build_skb_around(struct sk_buff *skb,
 }
 EXPORT_SYMBOL(build_skb_around);
 
+/**
+ * __napi_build_skb - build a network buffer
+ * @data: data buffer provided by caller
+ * @frag_size: size of data, or 0 if head was kmalloced
+ *
+ * Version of __build_skb() that uses NAPI percpu caches to obtain
+ * skbuff_head instead of inplace allocation.
+ *
+ * Returns a new &sk_buff on success, %NULL on allocation failure.
+ */
+static struct sk_buff *__napi_build_skb(void *data, unsigned int frag_size)
+{
+	struct sk_buff *skb;
+
+	skb = napi_skb_cache_get();
+	if (unlikely(!skb))
+		return NULL;
+
+	memset(skb, 0, offsetof(struct sk_buff, tail));
+	__build_skb_around(skb, data, frag_size);
+
+	return skb;
+}
+
+/**
+ * napi_build_skb - build a network buffer
+ * @data: data buffer provided by caller
+ * @frag_size: size of data, or 0 if head was kmalloced
+ *
+ * Version of __napi_build_skb() that takes care of skb->head_frag
+ * and skb->pfmemalloc when the data is a page or page fragment.
+ *
+ * Returns a new &sk_buff on success, %NULL on allocation failure.
+ */
+struct sk_buff *napi_build_skb(void *data, unsigned int frag_size)
+{
+	struct sk_buff *skb = __napi_build_skb(data, frag_size);
+
+	if (likely(skb) && frag_size) {
+		skb->head_frag = 1;
+		skb_propagate_pfmemalloc(virt_to_head_page(data), skb);
+	}
+
+	return skb;
+}
+EXPORT_SYMBOL(napi_build_skb);
+
 /*
  * kmalloc_reserve is a wrapper around kmalloc_node_track_caller that tells
  * the caller if emergency pfmemalloc reserves are being used. If it is and
@@ -838,31 +906,31 @@ void __consume_stateless_skb(struct sk_buff *skb)
 	kfree_skbmem(skb);
 }
 
-static inline void _kfree_skb_defer(struct sk_buff *skb)
+static void napi_skb_cache_put(struct sk_buff *skb)
 {
 	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	u32 i;
 
 	/* drop skb->head and call any destructors for packet */
 	skb_release_all(skb);
 
-	/* record skb to CPU local list */
+	kasan_poison_object_data(skbuff_head_cache, skb);
 	nc->skb_cache[nc->skb_count++] = skb;
 
-#ifdef CONFIG_SLUB
-	/* SLUB writes into objects when freeing */
-	prefetchw(skb);
-#endif
-
-	/* flush skb_cache if it is filled */
 	if (unlikely(nc->skb_count == NAPI_SKB_CACHE_SIZE)) {
-		kmem_cache_free_bulk(skbuff_head_cache, NAPI_SKB_CACHE_SIZE,
-				     nc->skb_cache);
-		nc->skb_count = 0;
+		for (i = NAPI_SKB_CACHE_HALF; i < NAPI_SKB_CACHE_SIZE; i++)
+			kasan_unpoison_object_data(skbuff_head_cache,
+						   nc->skb_cache[i]);
+
+		kmem_cache_free_bulk(skbuff_head_cache, NAPI_SKB_CACHE_HALF,
+				     nc->skb_cache + NAPI_SKB_CACHE_HALF);
+		nc->skb_count = NAPI_SKB_CACHE_HALF;
 	}
 }
+
 void __kfree_skb_defer(struct sk_buff *skb)
 {
-	_kfree_skb_defer(skb);
+	napi_skb_cache_put(skb);
 }
 
 void napi_consume_skb(struct sk_buff *skb, int budget)
@@ -887,7 +955,7 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
 		return;
 	}
 
-	_kfree_skb_defer(skb);
+	napi_skb_cache_put(skb);
 }
 EXPORT_SYMBOL(napi_consume_skb);
 
-- 
2.30.1




* [PATCH v4 net-next 09/11] skbuff: allow to optionally use NAPI cache from __alloc_skb()
From: Alexander Lobakin @ 2021-02-10 16:30 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Jonathan Lemon, Eric Dumazet, Dmitry Vyukov, Willem de Bruijn,
	Alexander Lobakin, Randy Dunlap, Kevin Hao, Pablo Neira Ayuso,
	Jakub Sitnicki, Marco Elver, Dexuan Cui, Paolo Abeni,
	Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Taehee Yoo, Cong Wang, Björn Töpel,
	Miaohe Lin, Guillaume Nault, Yonghong Song, zhudi,
	Michal Kubecek, Marcelo Ricardo Leitner, Dmitry Safonov,
	Yang Yingliang, Florian Westphal, Edward Cree, linux-kernel,
	netdev

Reuse the old and forgotten SKB_ALLOC_NAPI flag to add an option to
get a skbuff_head from the NAPI cache instead of an inplace
allocation inside __alloc_skb().
This implies that the function is called from softirq or BH-disabled
context, not for allocating a clone and not from a distant node.
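
A call satisfying these constraints looks like the following (this is
exactly what __napi_alloc_skb() switches to in the next patch):

	skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX | SKB_ALLOC_NAPI,
			  NUMA_NO_NODE);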

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 net/core/skbuff.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9e1a8ded4acc..750fa1825b28 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -397,15 +397,20 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	struct sk_buff *skb;
 	u8 *data;
 	bool pfmemalloc;
+	bool clone;
 
-	cache = (flags & SKB_ALLOC_FCLONE)
-		? skbuff_fclone_cache : skbuff_head_cache;
+	clone = !!(flags & SKB_ALLOC_FCLONE);
+	cache = clone ? skbuff_fclone_cache : skbuff_head_cache;
 
 	if (sk_memalloc_socks() && (flags & SKB_ALLOC_RX))
 		gfp_mask |= __GFP_MEMALLOC;
 
 	/* Get the HEAD */
-	skb = kmem_cache_alloc_node(cache, gfp_mask & ~__GFP_DMA, node);
+	if (!clone && (flags & SKB_ALLOC_NAPI) &&
+	    likely(node == NUMA_NO_NODE || node == numa_mem_id()))
+		skb = napi_skb_cache_get();
+	else
+		skb = kmem_cache_alloc_node(cache, gfp_mask & ~GFP_DMA, node);
 	if (unlikely(!skb))
 		return NULL;
 	prefetchw(skb);
@@ -436,7 +441,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	__build_skb_around(skb, data, 0);
 	skb->pfmemalloc = pfmemalloc;
 
-	if (flags & SKB_ALLOC_FCLONE) {
+	if (clone) {
 		struct sk_buff_fclones *fclones;
 
 		fclones = container_of(skb, struct sk_buff_fclones, skb1);
-- 
2.30.1




* [PATCH v4 net-next 10/11] skbuff: allow to use NAPI cache from __napi_alloc_skb()
From: Alexander Lobakin @ 2021-02-10 16:30 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Jonathan Lemon, Eric Dumazet, Dmitry Vyukov, Willem de Bruijn,
	Alexander Lobakin, Randy Dunlap, Kevin Hao, Pablo Neira Ayuso,
	Jakub Sitnicki, Marco Elver, Dexuan Cui, Paolo Abeni,
	Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Taehee Yoo, Cong Wang, Björn Töpel,
	Miaohe Lin, Guillaume Nault, Yonghong Song, zhudi,
	Michal Kubecek, Marcelo Ricardo Leitner, Dmitry Safonov,
	Yang Yingliang, Florian Westphal, Edward Cree, linux-kernel,
	netdev

{,__}napi_alloc_skb() is mostly used for optional non-linear
receive methods (usually controlled via Ethtool private flags and
off by default) and/or for Rx copybreaks.
Use __napi_build_skb() here to obtain skbuff_heads from the NAPI
cache instead of inplace allocations. This covers both the kmalloc
and the page frag paths.
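
A typical Rx copybreak consumer of this path might look like the
following (hypothetical sketch; MYNIC_RX_COPYBREAK is made up):

	/* copy tiny frames into a fresh skb, whose head is now taken
	 * from the NAPI cache, so the original DMA buffer can be
	 * recycled right away
	 */
	if (len < MYNIC_RX_COPYBREAK) {
		skb = napi_alloc_skb(napi, len);
		if (likely(skb))
			skb_put_data(skb, rx_buf, len);
	}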

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 net/core/skbuff.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 750fa1825b28..ac6e0172f206 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -563,7 +563,8 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 	if (len <= SKB_WITH_OVERHEAD(1024) ||
 	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
 	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
-		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
+		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX | SKB_ALLOC_NAPI,
+				  NUMA_NO_NODE);
 		if (!skb)
 			goto skb_fail;
 		goto skb_success;
@@ -580,7 +581,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 	if (unlikely(!data))
 		return NULL;
 
-	skb = __build_skb(data, len);
+	skb = __napi_build_skb(data, len);
 	if (unlikely(!skb)) {
 		skb_free_frag(data);
 		return NULL;
-- 
2.30.1




* [PATCH v4 net-next 11/11] skbuff: queue NAPI_MERGED_FREE skbs into NAPI cache instead of freeing
From: Alexander Lobakin @ 2021-02-10 16:31 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Jonathan Lemon, Eric Dumazet, Dmitry Vyukov, Willem de Bruijn,
	Alexander Lobakin, Randy Dunlap, Kevin Hao, Pablo Neira Ayuso,
	Jakub Sitnicki, Marco Elver, Dexuan Cui, Paolo Abeni,
	Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Taehee Yoo, Cong Wang, Björn Töpel,
	Miaohe Lin, Guillaume Nault, Yonghong Song, zhudi,
	Michal Kubecek, Marcelo Ricardo Leitner, Dmitry Safonov,
	Yang Yingliang, Florian Westphal, Edward Cree, linux-kernel,
	netdev

napi_frags_finish() and napi_skb_finish() can only be called inside
the NAPI Rx context, so we can feed the NAPI cache with skbuff_heads
that got the NAPI_MERGED_FREE verdict instead of freeing them
immediately.
Replace __kfree_skb() with __kfree_skb_defer() in napi_skb_finish()
and move napi_skb_free_stolen_head() to skbuff.c, so it can drop
skbs into the NAPI cache.
As many drivers call napi_alloc_skb()/napi_get_frags() on their
receive path, this becomes especially useful.

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 include/linux/skbuff.h |  1 +
 net/core/dev.c         |  9 +--------
 net/core/skbuff.c      | 12 +++++++++---
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 906122eac82a..6d0a33d1c0db 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2921,6 +2921,7 @@ static inline struct sk_buff *napi_alloc_skb(struct napi_struct *napi,
 }
 void napi_consume_skb(struct sk_buff *skb, int budget);
 
+void napi_skb_free_stolen_head(struct sk_buff *skb);
 void __kfree_skb_defer(struct sk_buff *skb);
 
 /**
diff --git a/net/core/dev.c b/net/core/dev.c
index 7134ae2fc0db..f04877295b4f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6094,13 +6094,6 @@ struct packet_offload *gro_find_complete_by_type(__be16 type)
 }
 EXPORT_SYMBOL(gro_find_complete_by_type);
 
-static void napi_skb_free_stolen_head(struct sk_buff *skb)
-{
-	skb_dst_drop(skb);
-	skb_ext_put(skb);
-	kmem_cache_free(skbuff_head_cache, skb);
-}
-
 static gro_result_t napi_skb_finish(struct napi_struct *napi,
 				    struct sk_buff *skb,
 				    gro_result_t ret)
@@ -6114,7 +6107,7 @@ static gro_result_t napi_skb_finish(struct napi_struct *napi,
 		if (NAPI_GRO_CB(skb)->free == NAPI_GRO_FREE_STOLEN_HEAD)
 			napi_skb_free_stolen_head(skb);
 		else
-			__kfree_skb(skb);
+			__kfree_skb_defer(skb);
 		break;
 
 	case GRO_HELD:
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index ac6e0172f206..9ff701afa837 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -917,9 +917,6 @@ static void napi_skb_cache_put(struct sk_buff *skb)
 	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
 	u32 i;
 
-	/* drop skb->head and call any destructors for packet */
-	skb_release_all(skb);
-
 	kasan_poison_object_data(skbuff_head_cache, skb);
 	nc->skb_cache[nc->skb_count++] = skb;
 
@@ -936,6 +933,14 @@ static void napi_skb_cache_put(struct sk_buff *skb)
 
 void __kfree_skb_defer(struct sk_buff *skb)
 {
+	skb_release_all(skb);
+	napi_skb_cache_put(skb);
+}
+
+void napi_skb_free_stolen_head(struct sk_buff *skb)
+{
+	skb_dst_drop(skb);
+	skb_ext_put(skb);
 	napi_skb_cache_put(skb);
 }
 
@@ -961,6 +966,7 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
 		return;
 	}
 
+	skb_release_all(skb);
 	napi_skb_cache_put(skb);
 }
 EXPORT_SYMBOL(napi_consume_skb);
-- 
2.30.1




* Re: [PATCH v4 net-next 06/11] skbuff: remove __kfree_skb_flush()
From: kernel test robot @ 2021-02-10 20:22 UTC (permalink / raw)
  To: Alexander Lobakin, David S. Miller, Jakub Kicinski
  Cc: kbuild-all, netdev, Jonathan Lemon, Eric Dumazet, Dmitry Vyukov,
	Willem de Bruijn, Alexander Lobakin, Randy Dunlap, Kevin Hao

[-- Attachment #1: Type: text/plain, Size: 3692 bytes --]

Hi Alexander,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Alexander-Lobakin/skbuff-introduce-skbuff_heads-bulking-and-reusing/20210211-004041
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git de1db4a6ed6241e34cab0e5059d4b56f6bae39b9
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/6bb224307d9f31afa154a8825f9acbd8e9f6b490
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Alexander-Lobakin/skbuff-introduce-skbuff_heads-bulking-and-reusing/20210211-004041
        git checkout 6bb224307d9f31afa154a8825f9acbd8e9f6b490
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=sh 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   net/core/dev.c: In function 'napi_threaded_poll':
>> net/core/dev.c:7012:4: error: implicit declaration of function '__kfree_skb_flush'; did you mean '__kfree_skb_defer'? [-Werror=implicit-function-declaration]
    7012 |    __kfree_skb_flush();
         |    ^~~~~~~~~~~~~~~~~
         |    __kfree_skb_defer
   cc1: some warnings being treated as errors

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for SND_ATMEL_SOC_PDC
   Depends on SOUND && !UML && SND && SND_SOC && SND_ATMEL_SOC && HAS_DMA
   Selected by
   - SND_ATMEL_SOC_SSC && SOUND && !UML && SND && SND_SOC && SND_ATMEL_SOC
   - SND_ATMEL_SOC_SSC_PDC && SOUND && !UML && SND && SND_SOC && SND_ATMEL_SOC && ATMEL_SSC


vim +7012 net/core/dev.c

29863d41bb6e1d Wei Wang 2021-02-08  6996  
29863d41bb6e1d Wei Wang 2021-02-08  6997  static int napi_threaded_poll(void *data)
29863d41bb6e1d Wei Wang 2021-02-08  6998  {
29863d41bb6e1d Wei Wang 2021-02-08  6999  	struct napi_struct *napi = data;
29863d41bb6e1d Wei Wang 2021-02-08  7000  	void *have;
29863d41bb6e1d Wei Wang 2021-02-08  7001  
29863d41bb6e1d Wei Wang 2021-02-08  7002  	while (!napi_thread_wait(napi)) {
29863d41bb6e1d Wei Wang 2021-02-08  7003  		for (;;) {
29863d41bb6e1d Wei Wang 2021-02-08  7004  			bool repoll = false;
29863d41bb6e1d Wei Wang 2021-02-08  7005  
29863d41bb6e1d Wei Wang 2021-02-08  7006  			local_bh_disable();
29863d41bb6e1d Wei Wang 2021-02-08  7007  
29863d41bb6e1d Wei Wang 2021-02-08  7008  			have = netpoll_poll_lock(napi);
29863d41bb6e1d Wei Wang 2021-02-08  7009  			__napi_poll(napi, &repoll);
29863d41bb6e1d Wei Wang 2021-02-08  7010  			netpoll_poll_unlock(have);
29863d41bb6e1d Wei Wang 2021-02-08  7011  
29863d41bb6e1d Wei Wang 2021-02-08 @7012  			__kfree_skb_flush();
29863d41bb6e1d Wei Wang 2021-02-08  7013  			local_bh_enable();
29863d41bb6e1d Wei Wang 2021-02-08  7014  
29863d41bb6e1d Wei Wang 2021-02-08  7015  			if (!repoll)
29863d41bb6e1d Wei Wang 2021-02-08  7016  				break;
29863d41bb6e1d Wei Wang 2021-02-08  7017  
29863d41bb6e1d Wei Wang 2021-02-08  7018  			cond_resched();
29863d41bb6e1d Wei Wang 2021-02-08  7019  		}
29863d41bb6e1d Wei Wang 2021-02-08  7020  	}
29863d41bb6e1d Wei Wang 2021-02-08  7021  	return 0;
29863d41bb6e1d Wei Wang 2021-02-08  7022  }
29863d41bb6e1d Wei Wang 2021-02-08  7023  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 54007 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 net-next 06/11] skbuff: remove __kfree_skb_flush()
  2021-02-10 16:29 ` [PATCH v4 net-next 06/11] skbuff: remove __kfree_skb_flush() Alexander Lobakin
@ 2021-02-11  0:19     ` kernel test robot
  2021-02-11  0:19     ` kernel test robot
  1 sibling, 0 replies; 22+ messages in thread
From: kernel test robot @ 2021-02-11  0:19 UTC (permalink / raw)
  To: Alexander Lobakin, David S. Miller, Jakub Kicinski
  Cc: kbuild-all, clang-built-linux, netdev, Jonathan Lemon,
	Eric Dumazet, Dmitry Vyukov, Willem de Bruijn, Alexander Lobakin,
	Randy Dunlap, Kevin Hao

[-- Attachment #1: Type: text/plain, Size: 10304 bytes --]

Hi Alexander,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Alexander-Lobakin/skbuff-introduce-skbuff_heads-bulking-and-reusing/20210211-004041
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git de1db4a6ed6241e34cab0e5059d4b56f6bae39b9
config: s390-randconfig-r025-20210209 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project c9439ca36342fb6013187d0a69aef92736951476)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install s390 cross compiling tool for clang build
        # apt-get install binutils-s390x-linux-gnu
        # https://github.com/0day-ci/linux/commit/6bb224307d9f31afa154a8825f9acbd8e9f6b490
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Alexander-Lobakin/skbuff-introduce-skbuff_heads-bulking-and-reusing/20210211-004041
        git checkout 6bb224307d9f31afa154a8825f9acbd8e9f6b490
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=s390 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:10:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:80:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
           ___constant_swab32(x) :                 \
                              ^
   include/uapi/linux/swab.h:20:12: note: expanded from macro '___constant_swab32'
           (((__u32)(x) & (__u32)0x0000ff00UL) <<  8) |            \
                     ^
   In file included from net/core/dev.c:89:
   In file included from include/linux/if_ether.h:19:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:10:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:80:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
           ___constant_swab32(x) :                 \
                              ^
   include/uapi/linux/swab.h:21:12: note: expanded from macro '___constant_swab32'
           (((__u32)(x) & (__u32)0x00ff0000UL) >>  8) |            \
                     ^
   In file included from net/core/dev.c:89:
   In file included from include/linux/if_ether.h:19:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:10:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:80:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
           ___constant_swab32(x) :                 \
                              ^
   include/uapi/linux/swab.h:22:12: note: expanded from macro '___constant_swab32'
           (((__u32)(x) & (__u32)0xff000000UL) >> 24)))
                     ^
   In file included from net/core/dev.c:89:
   In file included from include/linux/if_ether.h:19:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:10:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:80:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:120:12: note: expanded from macro '__swab32'
           __fswab32(x))
                     ^
   In file included from net/core/dev.c:89:
   In file included from include/linux/if_ether.h:19:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:10:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:80:
   include/asm-generic/io.h:501:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:511:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:521:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:609:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsb(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:617:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsw(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:625:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsl(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:634:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesb(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:643:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesw(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:652:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesl(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
>> net/core/dev.c:7012:4: error: implicit declaration of function '__kfree_skb_flush' [-Werror,-Wimplicit-function-declaration]
                           __kfree_skb_flush();
                           ^
   20 warnings and 1 error generated.


vim +/__kfree_skb_flush +7012 net/core/dev.c

29863d41bb6e1d Wei Wang 2021-02-08  6996  
29863d41bb6e1d Wei Wang 2021-02-08  6997  static int napi_threaded_poll(void *data)
29863d41bb6e1d Wei Wang 2021-02-08  6998  {
29863d41bb6e1d Wei Wang 2021-02-08  6999  	struct napi_struct *napi = data;
29863d41bb6e1d Wei Wang 2021-02-08  7000  	void *have;
29863d41bb6e1d Wei Wang 2021-02-08  7001  
29863d41bb6e1d Wei Wang 2021-02-08  7002  	while (!napi_thread_wait(napi)) {
29863d41bb6e1d Wei Wang 2021-02-08  7003  		for (;;) {
29863d41bb6e1d Wei Wang 2021-02-08  7004  			bool repoll = false;
29863d41bb6e1d Wei Wang 2021-02-08  7005  
29863d41bb6e1d Wei Wang 2021-02-08  7006  			local_bh_disable();
29863d41bb6e1d Wei Wang 2021-02-08  7007  
29863d41bb6e1d Wei Wang 2021-02-08  7008  			have = netpoll_poll_lock(napi);
29863d41bb6e1d Wei Wang 2021-02-08  7009  			__napi_poll(napi, &repoll);
29863d41bb6e1d Wei Wang 2021-02-08  7010  			netpoll_poll_unlock(have);
29863d41bb6e1d Wei Wang 2021-02-08  7011  
29863d41bb6e1d Wei Wang 2021-02-08 @7012  			__kfree_skb_flush();
29863d41bb6e1d Wei Wang 2021-02-08  7013  			local_bh_enable();
29863d41bb6e1d Wei Wang 2021-02-08  7014  
29863d41bb6e1d Wei Wang 2021-02-08  7015  			if (!repoll)
29863d41bb6e1d Wei Wang 2021-02-08  7016  				break;
29863d41bb6e1d Wei Wang 2021-02-08  7017  
29863d41bb6e1d Wei Wang 2021-02-08  7018  			cond_resched();
29863d41bb6e1d Wei Wang 2021-02-08  7019  		}
29863d41bb6e1d Wei Wang 2021-02-08  7020  	}
29863d41bb6e1d Wei Wang 2021-02-08  7021  	return 0;
29863d41bb6e1d Wei Wang 2021-02-08  7022  }
29863d41bb6e1d Wei Wang 2021-02-08  7023  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28249 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 net-next 09/11] skbuff: allow to optionally use NAPI cache from __alloc_skb()
  2021-02-10 16:30 ` [PATCH v4 net-next 09/11] skbuff: allow to optionally use NAPI cache from __alloc_skb() Alexander Lobakin
@ 2021-02-11 10:16   ` Paolo Abeni
  2021-02-11 14:28     ` Alexander Lobakin
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Abeni @ 2021-02-11 10:16 UTC (permalink / raw)
  To: Alexander Lobakin, David S. Miller, Jakub Kicinski
  Cc: Jonathan Lemon, Eric Dumazet, Dmitry Vyukov, Willem de Bruijn,
	Randy Dunlap, Kevin Hao, Pablo Neira Ayuso, Jakub Sitnicki,
	Marco Elver, Dexuan Cui, Jesper Dangaard Brouer,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Taehee Yoo,
	Cong Wang, Björn Töpel, Miaohe Lin, Guillaume Nault,
	Yonghong Song, zhudi, Michal Kubecek, Marcelo Ricardo Leitner,
	Dmitry Safonov, Yang Yingliang, Florian Westphal, Edward Cree,
	linux-kernel, netdev

On Wed, 2021-02-10 at 16:30 +0000, Alexander Lobakin wrote:
> Reuse the old and forgotten SKB_ALLOC_NAPI to add an option to get
> an skbuff_head from the NAPI cache instead of inplace allocation
> inside __alloc_skb().
> This implies that the function is called from softirq or BH-off
> context, not for allocating a clone or from a distant node.
> 
> Signed-off-by: Alexander Lobakin <alobakin@pm.me>
> ---
>  net/core/skbuff.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 9e1a8ded4acc..750fa1825b28 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -397,15 +397,20 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>  	struct sk_buff *skb;
>  	u8 *data;
>  	bool pfmemalloc;
> +	bool clone;
>  
> -	cache = (flags & SKB_ALLOC_FCLONE)
> -		? skbuff_fclone_cache : skbuff_head_cache;
> +	clone = !!(flags & SKB_ALLOC_FCLONE);
> +	cache = clone ? skbuff_fclone_cache : skbuff_head_cache;
>  
>  	if (sk_memalloc_socks() && (flags & SKB_ALLOC_RX))
>  		gfp_mask |= __GFP_MEMALLOC;
>  
>  	/* Get the HEAD */
> -	skb = kmem_cache_alloc_node(cache, gfp_mask & ~__GFP_DMA, node);
> +	if (!clone && (flags & SKB_ALLOC_NAPI) &&
> +	    likely(node == NUMA_NO_NODE || node == numa_mem_id()))
> +		skb = napi_skb_cache_get();
> +	else
> +		skb = kmem_cache_alloc_node(cache, gfp_mask & ~GFP_DMA, node);
>  	if (unlikely(!skb))
>  		return NULL;
>  	prefetchw(skb);

I hope the opt-in thing would have allowed leaving this code unchanged.
I see it's not trivial to avoid touching this code path.
Still, I think it would be nice if you could let the device
driver use the cache without touching the above, which is also used
e.g. by the TCP xmit path, which in turn will not leverage the cache
(as it requires FCLONE skbs).

If I read correctly, the above chunk is needed to
allow __napi_alloc_skb() to access the cache even for small skb
allocations. Good device drivers should not call alloc_skb() in the fast
path.

What about changing __napi_alloc_skb() to always use
__napi_build_skb(), for both kmalloc- and page-backed skbs? That is,
always doing the 'data' allocation in __napi_alloc_skb() - either via
page_frag or via kmalloc() - and then calling __napi_build_skb().

I think that should avoid adding more checks in __alloc_skb() and
should probably reduce the number of conditionals used
by __napi_alloc_skb().

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 net-next 08/11] skbuff: introduce {,__}napi_build_skb() which reuses NAPI cache heads
  2021-02-10 16:30 ` [PATCH v4 net-next 08/11] skbuff: introduce {,__}napi_build_skb() which reuses NAPI cache heads Alexander Lobakin
@ 2021-02-11 12:54   ` Jesper Dangaard Brouer
  2021-02-11 14:38     ` Alexander Lobakin
  0 siblings, 1 reply; 22+ messages in thread
From: Jesper Dangaard Brouer @ 2021-02-11 12:54 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: David S. Miller, Jakub Kicinski, Jonathan Lemon, Eric Dumazet,
	Dmitry Vyukov, Willem de Bruijn, Randy Dunlap, Kevin Hao,
	Pablo Neira Ayuso, Jakub Sitnicki, Marco Elver, Dexuan Cui,
	Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Taehee Yoo, Cong Wang, Björn Töpel,
	Miaohe Lin, Guillaume Nault, Yonghong Song, zhudi,
	Michal Kubecek, Marcelo Ricardo Leitner, Dmitry Safonov,
	Yang Yingliang, Florian Westphal, Edward Cree, linux-kernel,
	netdev, brouer

On Wed, 10 Feb 2021 16:30:23 +0000
Alexander Lobakin <alobakin@pm.me> wrote:

> Instead of just bulk-flushing skbuff_heads queued up through
> napi_consume_skb() or __kfree_skb_defer(), try to reuse them
> on allocation path.

Maybe you are already aware of these dynamics, but high-speed NICs will
usually run the TX "cleanup" (opportunistic DMA completion) in the napi
poll function call, and often before processing RX packets. For example,
ixgbe_poll[1] calls ixgbe_clean_tx_irq() before ixgbe_clean_rx_irq().

If traffic is symmetric (or is routed back out the same interface), this
SKB recycle scheme will be highly efficient. (I had this as part of my
initial patchset and tested it on ixgbe.)

[1] https://elixir.bootlin.com/linux/v5.11-rc7/source/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c#L3149
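
A condensed sketch of that ordering (the example_* names are generic
placeholders, not lifted from ixgbe):

static int example_poll(struct napi_struct *napi, int budget)
{
	struct example_q_vector *qv;
	int work_done;

	qv = container_of(napi, struct example_q_vector, napi);

	/* Tx completions first: napi_consume_skb() called from here
	 * refills the per-CPU skbuff_head cache...
	 */
	example_clean_tx_irq(qv);

	/* ...which the Rx pass right below can drain again through
	 * napi_build_skb()/napi_alloc_skb().
	 */
	work_done = example_clean_rx_irq(qv, budget);

	if (work_done < budget)
		napi_complete_done(napi, work_done);

	return work_done;
}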

> If the cache is empty on allocation, bulk-allocate the first
> 16 elements, which is more efficient than per-skb allocation.
> If the cache is full on freeing, bulk-wipe the second half of
> the cache (32 elements).
> This also includes custom KASAN poisoning/unpoisoning to be
> double sure there are no use-after-free cases.
> 
> To not change current behaviour, introduce a new function,
> napi_build_skb(), to optionally use a new approach later
> in drivers.
> 
> Note on selected bulk size, 16:
>  - this equals to XDP_BULK_QUEUE_SIZE, DEV_MAP_BULK_SIZE
>    and especially VETH_XDP_BATCH, which is also used to
>    bulk-allocate skbuff_heads and was tested on powerful
>    setups;
>  - this also showed the best performance in the actual
>    test series (from the array of {8, 16, 32}).
> 
> Suggested-by: Edward Cree <ecree.xilinx@gmail.com> # Divide on two halves
> Suggested-by: Eric Dumazet <edumazet@google.com>   # KASAN poisoning
> Cc: Dmitry Vyukov <dvyukov@google.com>             # Help with KASAN
> Cc: Paolo Abeni <pabeni@redhat.com>                # Reduced batch size
> Signed-off-by: Alexander Lobakin <alobakin@pm.me>
> ---
>  include/linux/skbuff.h |  2 +
>  net/core/skbuff.c      | 94 ++++++++++++++++++++++++++++++++++++------
>  2 files changed, 83 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 0e0707296098..906122eac82a 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1087,6 +1087,8 @@ struct sk_buff *build_skb(void *data, unsigned int frag_size);
>  struct sk_buff *build_skb_around(struct sk_buff *skb,
>  				 void *data, unsigned int frag_size);
>  
> +struct sk_buff *napi_build_skb(void *data, unsigned int frag_size);
> +
>  /**
>   * alloc_skb - allocate a network buffer
>   * @size: size to allocate
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 860a9d4f752f..9e1a8ded4acc 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -120,6 +120,8 @@ static void skb_under_panic(struct sk_buff *skb, unsigned int sz, void *addr)
>  }
>  
>  #define NAPI_SKB_CACHE_SIZE	64
> +#define NAPI_SKB_CACHE_BULK	16
> +#define NAPI_SKB_CACHE_HALF	(NAPI_SKB_CACHE_SIZE / 2)
>  


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 net-next 09/11] skbuff: allow to optionally use NAPI cache from __alloc_skb()
  2021-02-11 10:16   ` Paolo Abeni
@ 2021-02-11 14:28     ` Alexander Lobakin
  2021-02-11 14:55       ` Paolo Abeni
  0 siblings, 1 reply; 22+ messages in thread
From: Alexander Lobakin @ 2021-02-11 14:28 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Alexander Lobakin, David S. Miller, Jakub Kicinski,
	Jonathan Lemon, Eric Dumazet, Dmitry Vyukov, Willem de Bruijn,
	Randy Dunlap, Kevin Hao, Pablo Neira Ayuso, Jakub Sitnicki,
	Marco Elver, Dexuan Cui, Jesper Dangaard Brouer,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Taehee Yoo,
	Cong Wang, Björn Töpel, Miaohe Lin, Guillaume Nault,
	Yonghong Song, zhudi, Michal Kubecek, Marcelo Ricardo Leitner,
	Dmitry Safonov, Yang Yingliang, Florian Westphal, Edward Cree,
	linux-kernel, netdev

From: Paolo Abeni <pabeni@redhat.com>
Date: Thu, 11 Feb 2021 11:16:40 +0100

> On Wed, 2021-02-10 at 16:30 +0000, Alexander Lobakin wrote:
> > Reuse the old and forgotten SKB_ALLOC_NAPI to add an option to get
> > an skbuff_head from the NAPI cache instead of inplace allocation
> > inside __alloc_skb().
> > This implies that the function is called from softirq or BH-off
> > context, not for allocating a clone or from a distant node.
> > 
> > Signed-off-by: Alexander Lobakin <alobakin@pm.me>
> > ---
> >  net/core/skbuff.c | 13 +++++++++----
> >  1 file changed, 9 insertions(+), 4 deletions(-)
> > 
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 9e1a8ded4acc..750fa1825b28 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -397,15 +397,20 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
> >  	struct sk_buff *skb;
> >  	u8 *data;
> >  	bool pfmemalloc;
> > +	bool clone;
> >  
> > -	cache = (flags & SKB_ALLOC_FCLONE)
> > -		? skbuff_fclone_cache : skbuff_head_cache;
> > +	clone = !!(flags & SKB_ALLOC_FCLONE);
> > +	cache = clone ? skbuff_fclone_cache : skbuff_head_cache;
> >  
> >  	if (sk_memalloc_socks() && (flags & SKB_ALLOC_RX))
> >  		gfp_mask |= __GFP_MEMALLOC;
> >  
> >  	/* Get the HEAD */
> > -	skb = kmem_cache_alloc_node(cache, gfp_mask & ~__GFP_DMA, node);
> > +	if (!clone && (flags & SKB_ALLOC_NAPI) &&
> > +	    likely(node == NUMA_NO_NODE || node == numa_mem_id()))
> > +		skb = napi_skb_cache_get();
> > +	else
> > +		skb = kmem_cache_alloc_node(cache, gfp_mask & ~GFP_DMA, node);
> >  	if (unlikely(!skb))
> >  		return NULL;
> >  	prefetchw(skb);
> 
> I hope the opt-in thing would have allowed leaving this code unchanged.
> I see it's not trivial to avoid touching this code path.
> Still, I think it would be nice if you could let the device
> driver use the cache without touching the above, which is also used
> e.g. by the TCP xmit path, which in turn will not leverage the cache
> (as it requires FCLONE skbs).
> 
> If I read correctly, the above chunk is needed to
> allow __napi_alloc_skb() to access the cache even for small skb
> allocations.

Not only. I wanted to provide the ability to access the new feature
through __alloc_skb() too, not only through napi_build_skb() or
napi_alloc_skb().
And not only for drivers. As you may remember, napi_consume_skb()'s
batching system initially landed for drivers, but then it got used
in network core code.
I think that some core parts may benefit from reusing the NAPI
caches. We'll only see that later.

It's not as complex as it may seem. The NUMA check is cheap and tends
to be true for the vast majority of cases. The check for fclone is
already present in the baseline code, even twice through the function.
So it's mostly about (flags & SKB_ALLOC_NAPI).
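
For reference, napi_skb_cache_get() itself boils down to roughly this
(condensed from patch 08 of this series):

static struct sk_buff *napi_skb_cache_get(void)
{
	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
	struct sk_buff *skb;

	/* refill by bulks of NAPI_SKB_CACHE_BULK (16) when empty */
	if (unlikely(!nc->skb_count))
		nc->skb_count = kmem_cache_alloc_bulk(skbuff_head_cache,
						      GFP_ATOMIC,
						      NAPI_SKB_CACHE_BULK,
						      nc->skb_cache);
	if (unlikely(!nc->skb_count))
		return NULL;

	skb = nc->skb_cache[--nc->skb_count];
	kasan_unpoison_object_data(skbuff_head_cache, skb);

	return skb;
}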

> Good device drivers should not call alloc_skb() in the fast
> path.

Not really. Several enterprise NIC drivers use __alloc_skb() and
alloc_skb(): Chelsio and Mellanox for inline TLS, Netronome etc.
Lots of RDMA and wireless drivers (not the legacy ones), too.
__alloc_skb() gives you more control over the NUMA node and the needed
skb headroom, so it's still sometimes useful in drivers.
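
E.g. something like this (purely illustrative, the surrounding names
and values are made up):

	/* pin the head to the device's node and reserve
	 * driver-specific headroom in one go
	 */
	skb = __alloc_skb(len + headroom, GFP_ATOMIC, SKB_ALLOC_RX,
			  dev_to_node(dev->dev.parent));
	if (skb)
		skb_reserve(skb, headroom);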

> What about changing __napi_alloc_skb() to always use
> __napi_build_skb(), for both kmalloc- and page-backed skbs? That is,
> always doing the 'data' allocation in __napi_alloc_skb() - either via
> page_frag or via kmalloc() - and then calling __napi_build_skb().
>
> I think that should avoid adding more checks in __alloc_skb() and
> should probably reduce the number of conditionals used
> by __napi_alloc_skb().

I thought of this too. But this will introduce a conditional branch
to set skb->head_frag or not. So it's one branch less in __alloc_skb(),
one branch more here, and we also lose the ability to __alloc_skb()
with a decached head.

> Thanks!
> 
> Paolo

Thanks,
Al


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 net-next 08/11] skbuff: introduce {,__}napi_build_skb() which reuses NAPI cache heads
  2021-02-11 12:54   ` Jesper Dangaard Brouer
@ 2021-02-11 14:38     ` Alexander Lobakin
  0 siblings, 0 replies; 22+ messages in thread
From: Alexander Lobakin @ 2021-02-11 14:38 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Alexander Lobakin, David S. Miller, Jakub Kicinski,
	Jonathan Lemon, Eric Dumazet, Dmitry Vyukov, Willem de Bruijn,
	Randy Dunlap, Kevin Hao, Pablo Neira Ayuso, Jakub Sitnicki,
	Marco Elver, Dexuan Cui, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Taehee Yoo, Cong Wang,
	Björn Töpel, Miaohe Lin, Guillaume Nault,
	Yonghong Song, zhudi, Michal Kubecek, Marcelo Ricardo Leitner,
	Dmitry Safonov, Yang Yingliang, Florian Westphal, Edward Cree,
	linux-kernel, netdev

From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Thu, 11 Feb 2021 13:54:59 +0100

> On Wed, 10 Feb 2021 16:30:23 +0000
> Alexander Lobakin <alobakin@pm.me> wrote:
> 
> > Instead of just bulk-flushing skbuff_heads queued up through
> > napi_consume_skb() or __kfree_skb_defer(), try to reuse them
> > on allocation path.
> 
> Maybe you are already aware of these dynamics, but high-speed NICs will
> usually run the TX "cleanup" (opportunistic DMA completion) in the napi
> poll function call, and often before processing RX packets. For example,
> ixgbe_poll[1] calls ixgbe_clean_tx_irq() before ixgbe_clean_rx_irq().

Sure. The 1G MIPS board is my home project (I'll likely migrate to an
ARM64 cluster in 2-3 months). I mostly work with 10-100G NICs at work.

> If traffic is symmetric (or is routed back out the same interface), this
> SKB recycle scheme will be highly efficient. (I had this as part of my
> initial patchset and tested it on ixgbe.)
> 
> [1] https://elixir.bootlin.com/linux/v5.11-rc7/source/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c#L3149

That's exactly why I introduced this feature. First, the driver fills
the cache with the skbuff_heads consumed from the Tx completion queue,
and then it just decaches them on the Rx completion cycle. That's how
things worked most of the time on my test setup.
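
The freeing half of that cycle is roughly the following (condensed from
patch 08):

static void napi_skb_cache_put(struct sk_buff *skb)
{
	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
	u32 i;

	kasan_poison_object_data(skbuff_head_cache, skb);
	nc->skb_cache[nc->skb_count++] = skb;

	/* on overflow, bulk-wipe the upper half (NAPI_SKB_CACHE_HALF,
	 * i.e. 32 heads) back to the slab allocator
	 */
	if (unlikely(nc->skb_count == NAPI_SKB_CACHE_SIZE)) {
		for (i = NAPI_SKB_CACHE_HALF; i < NAPI_SKB_CACHE_SIZE; i++)
			kasan_unpoison_object_data(skbuff_head_cache,
						   nc->skb_cache[i]);

		kmem_cache_free_bulk(skbuff_head_cache, NAPI_SKB_CACHE_HALF,
				     nc->skb_cache + NAPI_SKB_CACHE_HALF);
		nc->skb_count = NAPI_SKB_CACHE_HALF;
	}
}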

The reason why Paolo proposed this as an option, and why I agreed
it's safer to do that instead of switching unconditionally, is that
different platforms and setups may react differently to this.
We don't have the ability to test the entire zoo, so we provide
an option for driver and network core developers to test and use
"on demand".
As I wrote in the reply to Paolo, there might be cases where even the
core networking code may benefit from this.

> > If the cache is empty on allocation, bulk-allocate the first
> > 16 elements, which is more efficient than per-skb allocation.
> > If the cache is full on freeing, bulk-wipe the second half of
> > the cache (32 elements).
> > This also includes custom KASAN poisoning/unpoisoning to be
> > double sure there are no use-after-free cases.
> > 
> > To not change current behaviour, introduce a new function,
> > napi_build_skb(), to optionally use a new approach later
> > in drivers.
> > 
> > Note on selected bulk size, 16:
> >  - this equals to XDP_BULK_QUEUE_SIZE, DEV_MAP_BULK_SIZE
> >    and especially VETH_XDP_BATCH, which is also used to
> >    bulk-allocate skbuff_heads and was tested on powerful
> >    setups;
> >  - this also showed the best performance in the actual
> >    test series (from the array of {8, 16, 32}).
> > 
> > Suggested-by: Edward Cree <ecree.xilinx@gmail.com> # Divide on two halves
> > Suggested-by: Eric Dumazet <edumazet@google.com>   # KASAN poisoning
> > Cc: Dmitry Vyukov <dvyukov@google.com>             # Help with KASAN
> > Cc: Paolo Abeni <pabeni@redhat.com>                # Reduced batch size
> > Signed-off-by: Alexander Lobakin <alobakin@pm.me>
> > ---
> >  include/linux/skbuff.h |  2 +
> >  net/core/skbuff.c      | 94 ++++++++++++++++++++++++++++++++++++------
> >  2 files changed, 83 insertions(+), 13 deletions(-)
> > 
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index 0e0707296098..906122eac82a 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -1087,6 +1087,8 @@ struct sk_buff *build_skb(void *data, unsigned int frag_size);
> >  struct sk_buff *build_skb_around(struct sk_buff *skb,
> >  				 void *data, unsigned int frag_size);
> >  
> > +struct sk_buff *napi_build_skb(void *data, unsigned int frag_size);
> > +
> >  /**
> >   * alloc_skb - allocate a network buffer
> >   * @size: size to allocate
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 860a9d4f752f..9e1a8ded4acc 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -120,6 +120,8 @@ static void skb_under_panic(struct sk_buff *skb, unsigned int sz, void *addr)
> >  }
> >  
> >  #define NAPI_SKB_CACHE_SIZE	64
> > +#define NAPI_SKB_CACHE_BULK	16
> > +#define NAPI_SKB_CACHE_HALF	(NAPI_SKB_CACHE_SIZE / 2)
> >  
> 
> 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

Thanks,
Al


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 net-next 09/11] skbuff: allow to optionally use NAPI cache from __alloc_skb()
  2021-02-11 14:28     ` Alexander Lobakin
@ 2021-02-11 14:55       ` Paolo Abeni
  2021-02-11 15:26         ` Alexander Lobakin
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Abeni @ 2021-02-11 14:55 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: David S. Miller, Jakub Kicinski, Jonathan Lemon, Eric Dumazet,
	Dmitry Vyukov, Willem de Bruijn, Randy Dunlap, Kevin Hao,
	Pablo Neira Ayuso, Jakub Sitnicki, Marco Elver, Dexuan Cui,
	Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Taehee Yoo, Cong Wang, Björn Töpel,
	Miaohe Lin, Guillaume Nault, Yonghong Song, zhudi,
	Michal Kubecek, Marcelo Ricardo Leitner, Dmitry Safonov,
	Yang Yingliang, Florian Westphal, Edward Cree, linux-kernel,
	netdev

On Thu, 2021-02-11 at 14:28 +0000, Alexander Lobakin wrote:
> From: Paolo Abeni <pabeni@redhat.com> on Thu, 11 Feb 2021 11:16:40 +0100 wrote:
> > What about changing __napi_alloc_skb() to always use
> > __napi_build_skb(), for both kmalloc- and page-backed skbs? That is,
> > always doing the 'data' allocation in __napi_alloc_skb() - either via
> > page_frag or via kmalloc() - and then calling __napi_build_skb().
> > 
> > I think that should avoid adding more checks in __alloc_skb() and
> > should probably reduce the number of conditionals used
> > by __napi_alloc_skb().
> 
> I thought of this too. But this will introduce a conditional branch
> to set skb->head_frag or not. So it's one branch less in __alloc_skb(),
> one branch more here, and we also lose the ability to __alloc_skb()
> with a decached head.

Just to be clear, I mean something like the following (not even
build tested). In the fast path it has fewer branches than the current
code - for both kmalloc and page_frag allocation.

---
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 785daff48030..a242fbe4730e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -506,23 +506,12 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 				 gfp_t gfp_mask)
 {
 	struct napi_alloc_cache *nc;
+	bool head_frag, pfmemalloc;
 	struct sk_buff *skb;
 	void *data;
 
 	len += NET_SKB_PAD + NET_IP_ALIGN;
 
-	/* If requested length is either too small or too big,
-	 * we use kmalloc() for skb->head allocation.
-	 */
-	if (len <= SKB_WITH_OVERHEAD(1024) ||
-	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
-	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
-		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
-		if (!skb)
-			goto skb_fail;
-		goto skb_success;
-	}
-
 	nc = this_cpu_ptr(&napi_alloc_cache);
 	len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	len = SKB_DATA_ALIGN(len);
@@ -530,25 +519,34 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 	if (sk_memalloc_socks())
 		gfp_mask |= __GFP_MEMALLOC;
 
-	data = page_frag_alloc(&nc->page, len, gfp_mask);
+	if (len <= SKB_WITH_OVERHEAD(1024) ||
+            len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
+            (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
+		data = kmalloc_reserve(len, gfp_mask, NUMA_NO_NODE, &pfmemalloc);
+		head_frag = 0;
+		len = 0;
+	} else {
+		data = page_frag_alloc(&nc->page, len, gfp_mask);
+		pfmemalloc = nc->page.pfmemalloc;
+		head_frag = 1;
+	}
 	if (unlikely(!data))
 		return NULL;
 
 	skb = __build_skb(data, len);
 	if (unlikely(!skb)) {
-		skb_free_frag(data);
+		if (head_frag)
+			skb_free_frag(data);
+		else
+			kfree(data);
 		return NULL;
 	}
 
-	if (nc->page.pfmemalloc)
-		skb->pfmemalloc = 1;
-	skb->head_frag = 1;
+	skb->pfmemalloc = pfmemalloc;
+	skb->head_frag = head_frag;
 
-skb_success:
 	skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
 	skb->dev = napi->dev;
-
-skb_fail:
 	return skb;
 }
 EXPORT_SYMBOL(__napi_alloc_skb);


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 net-next 09/11] skbuff: allow to optionally use NAPI cache from __alloc_skb()
  2021-02-11 14:55       ` Paolo Abeni
@ 2021-02-11 15:26         ` Alexander Lobakin
  0 siblings, 0 replies; 22+ messages in thread
From: Alexander Lobakin @ 2021-02-11 15:26 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Alexander Lobakin, David S. Miller, Jakub Kicinski,
	Jonathan Lemon, Eric Dumazet, Dmitry Vyukov, Willem de Bruijn,
	Randy Dunlap, Kevin Hao, Pablo Neira Ayuso, Jakub Sitnicki,
	Marco Elver, Dexuan Cui, Jesper Dangaard Brouer,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Taehee Yoo,
	Cong Wang, Björn Töpel, Miaohe Lin, Guillaume Nault,
	Yonghong Song, zhudi, Michal Kubecek, Marcelo Ricardo Leitner,
	Dmitry Safonov, Yang Yingliang, Florian Westphal, Edward Cree,
	linux-kernel, netdev

From: Paolo Abeni <pabeni@redhat.com>
Date: Thu, 11 Feb 2021 15:55:04 +0100

> On Thu, 2021-02-11 at 14:28 +0000, Alexander Lobakin wrote:
> > From: Paolo Abeni <pabeni@redhat.com> on Thu, 11 Feb 2021 11:16:40 +0100 wrote:
> > > What about changing __napi_alloc_skb() to always use
> > > __napi_build_skb(), for both kmalloc- and page-backed skbs? That is,
> > > always doing the 'data' allocation in __napi_alloc_skb() - either via
> > > page_frag or via kmalloc() - and then calling __napi_build_skb().
> > > 
> > > I think that should avoid adding more checks in __alloc_skb() and
> > > should probably reduce the number of conditionals used
> > > by __napi_alloc_skb().
> > 
> > I thought of this too. But this will introduce a conditional branch
> > to set skb->head_frag or not. So it's one branch less in __alloc_skb(),
> > one branch more here, and we also lose the ability to __alloc_skb()
> > with a decached head.
> 
> Just to be clear, I mean something like the following (not even
> build tested). In the fast path it has fewer branches than the current
> code - for both kmalloc and page_frag allocation.
> 
> ---
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 785daff48030..a242fbe4730e 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -506,23 +506,12 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
>  				 gfp_t gfp_mask)
>  {
>  	struct napi_alloc_cache *nc;
> +	bool head_frag, pfmemalloc;
>  	struct sk_buff *skb;
>  	void *data;
>  
>  	len += NET_SKB_PAD + NET_IP_ALIGN;
>  
> -	/* If requested length is either too small or too big,
> -	 * we use kmalloc() for skb->head allocation.
> -	 */
> -	if (len <= SKB_WITH_OVERHEAD(1024) ||
> -	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
> -	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
> -		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
> -		if (!skb)
> -			goto skb_fail;
> -		goto skb_success;
> -	}
> -
>  	nc = this_cpu_ptr(&napi_alloc_cache);
>  	len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>  	len = SKB_DATA_ALIGN(len);
> @@ -530,25 +519,34 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
>  	if (sk_memalloc_socks())
>  		gfp_mask |= __GFP_MEMALLOC;
>  
> -	data = page_frag_alloc(&nc->page, len, gfp_mask);
> +	if (len <= SKB_WITH_OVERHEAD(1024) ||
> +            len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
> +            (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
> +		data = kmalloc_reserve(len, gfp_mask, NUMA_NO_NODE, &pfmemalloc);
> +		head_frag = 0;
> +		len = 0;
> +	} else {
> +		data = page_frag_alloc(&nc->page, len, gfp_mask);
> +		pfmemalloc = nc->page.pfmemalloc;
> +		head_frag = 1;
> +	}
>  	if (unlikely(!data))
>  		return NULL;

Sure. I have a separate WIP series that reworks all three *alloc_skb()
functions, as there's nice room for optimization, especially now that
tiny skbs fall back to __alloc_skb().
It will likely hit the mailing lists after the merge window, in the
next net-next season, not now. And it's not really connected with NAPI
cache reusing.

>  	skb = __build_skb(data, len);
>  	if (unlikely(!skb)) {
> -		skb_free_frag(data);
> +		if (head_frag)
> +			skb_free_frag(data);
> +		else
> +			kfree(data);
>  		return NULL;
>  	}
>  
> -	if (nc->page.pfmemalloc)
> -		skb->pfmemalloc = 1;
> -	skb->head_frag = 1;
> +	skb->pfmemalloc = pfmemalloc;
> +	skb->head_frag = head_frag;
>  
> -skb_success:
>  	skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
>  	skb->dev = napi->dev;
> -
> -skb_fail:
>  	return skb;
>  }
>  EXPORT_SYMBOL(__napi_alloc_skb);

Al


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-02-11 16:33 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-10 16:28 [PATCH v4 net-next 00/11] skbuff: introduce skbuff_heads bulking and reusing Alexander Lobakin
2021-02-10 16:28 ` [PATCH v4 net-next 01/11] skbuff: move __alloc_skb() next to the other skb allocation functions Alexander Lobakin
2021-02-10 16:28 ` [PATCH v4 net-next 02/11] skbuff: simplify kmalloc_reserve() Alexander Lobakin
2021-02-10 16:29 ` [PATCH v4 net-next 03/11] skbuff: make __build_skb_around() return void Alexander Lobakin
2021-02-10 16:29 ` [PATCH v4 net-next 04/11] skbuff: simplify __alloc_skb() a bit Alexander Lobakin
2021-02-10 16:29 ` [PATCH v4 net-next 05/11] skbuff: use __build_skb_around() in __alloc_skb() Alexander Lobakin
2021-02-10 16:29 ` [PATCH v4 net-next 06/11] skbuff: remove __kfree_skb_flush() Alexander Lobakin
2021-02-10 20:22   ` kernel test robot
2021-02-11  0:19   ` kernel test robot
2021-02-10 16:30 ` [PATCH v4 net-next 07/11] skbuff: move NAPI cache declarations upper in the file Alexander Lobakin
2021-02-10 16:30 ` [PATCH v4 net-next 08/11] skbuff: introduce {,__}napi_build_skb() which reuses NAPI cache heads Alexander Lobakin
2021-02-11 12:54   ` Jesper Dangaard Brouer
2021-02-11 14:38     ` Alexander Lobakin
2021-02-10 16:30 ` [PATCH v4 net-next 09/11] skbuff: allow to optionally use NAPI cache from __alloc_skb() Alexander Lobakin
2021-02-11 10:16   ` Paolo Abeni
2021-02-11 14:28     ` Alexander Lobakin
2021-02-11 14:55       ` Paolo Abeni
2021-02-11 15:26         ` Alexander Lobakin
2021-02-10 16:30 ` [PATCH v4 net-next 10/11] skbuff: allow to use NAPI cache from __napi_alloc_skb() Alexander Lobakin
2021-02-10 16:31 ` [PATCH v4 net-next 11/11] skbuff: queue NAPI_MERGED_FREE skbs into NAPI cache instead of freeing Alexander Lobakin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.