netdev.vger.kernel.org archive mirror
* [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs
@ 2023-02-02 18:57 Eric Dumazet
  2023-02-02 18:57 ` [PATCH net-next 1/4] net: add SKB_HEAD_ALIGN() helper Eric Dumazet
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Eric Dumazet @ 2023-02-02 18:57 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh,
	Eric Dumazet

Our profile data show that using kmalloc(non_const_size)/kfree(ptr)
has a measurable cost, because kfree(ptr) has to pull a 'struct page'
into cpu caches.

Using a dedicated kmem_cache for TCP skb->head allocations makes
a difference, both in cpu cycles and memory savings.

This kmem_cache could also be used for GRO skb allocations;
this is left as a future exercise.

Eric Dumazet (4):
  net: add SKB_HEAD_ALIGN() helper
  net: remove osize variable in __alloc_skb()
  net: factorize code in kmalloc_reserve()
  net: add dedicated kmem_cache for typical/small skb->head

 include/linux/skbuff.h |  8 ++++
 net/core/skbuff.c      | 95 +++++++++++++++++++++++++++---------------
 2 files changed, 70 insertions(+), 33 deletions(-)

-- 
2.39.1.456.gfc5497dd1b-goog


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH net-next 1/4] net: add SKB_HEAD_ALIGN() helper
  2023-02-02 18:57 [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Eric Dumazet
@ 2023-02-02 18:57 ` Eric Dumazet
  2023-02-02 20:06   ` Soheil Hassas Yeganeh
  2023-02-02 18:57 ` [PATCH net-next 2/4] net: remove osize variable in __alloc_skb() Eric Dumazet
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2023-02-02 18:57 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh,
	Eric Dumazet

We have many places using this expression:

 SKB_DATA_ALIGN(sizeof(struct skb_shared_info))

Using SKB_HEAD_ALIGN() will allow us to clean them up.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/skbuff.h |  8 ++++++++
 net/core/skbuff.c      | 18 ++++++------------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 5ba12185f43e311e37c9045763c3ee0efc274f2a..f2141b7e3940cee060e8443dbaa147b843eb43a0 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -255,6 +255,14 @@
 #define SKB_DATA_ALIGN(X)	ALIGN(X, SMP_CACHE_BYTES)
 #define SKB_WITH_OVERHEAD(X)	\
 	((X) - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+
+/* For X bytes available in skb->head, what is the minimal
+ * allocation needed, knowing struct skb_shared_info needs
+ * to be aligned.
+ */
+#define SKB_HEAD_ALIGN(X) (SKB_DATA_ALIGN(X) + \
+	SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+
 #define SKB_MAX_ORDER(X, ORDER) \
 	SKB_WITH_OVERHEAD((PAGE_SIZE << (ORDER)) - (X))
 #define SKB_MAX_HEAD(X)		(SKB_MAX_ORDER((X), 0))
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index bb79b4cb89db344d23609f93b2bcca5103f1e92d..b73de8fb0756c02cf9ba4b7e90854c9c17728463 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -558,8 +558,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
 	 * Both skb->head and skb_shared_info are cache line aligned.
 	 */
-	size = SKB_DATA_ALIGN(size);
-	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	size = SKB_HEAD_ALIGN(size);
 	osize = kmalloc_size_roundup(size);
 	data = kmalloc_reserve(osize, gfp_mask, node, &pfmemalloc);
 	if (unlikely(!data))
@@ -632,8 +631,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
 		goto skb_success;
 	}
 
-	len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
-	len = SKB_DATA_ALIGN(len);
+	len = SKB_HEAD_ALIGN(len);
 
 	if (sk_memalloc_socks())
 		gfp_mask |= __GFP_MEMALLOC;
@@ -732,8 +730,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 		data = page_frag_alloc_1k(&nc->page_small, gfp_mask);
 		pfmemalloc = NAPI_SMALL_PAGE_PFMEMALLOC(nc->page_small);
 	} else {
-		len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
-		len = SKB_DATA_ALIGN(len);
+		len = SKB_HEAD_ALIGN(len);
 
 		data = page_frag_alloc(&nc->page, len, gfp_mask);
 		pfmemalloc = nc->page.pfmemalloc;
@@ -1936,8 +1933,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 	if (skb_pfmemalloc(skb))
 		gfp_mask |= __GFP_MEMALLOC;
 
-	size = SKB_DATA_ALIGN(size);
-	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	size = SKB_HEAD_ALIGN(size);
 	size = kmalloc_size_roundup(size);
 	data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
 	if (!data)
@@ -6288,8 +6284,7 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
 	if (skb_pfmemalloc(skb))
 		gfp_mask |= __GFP_MEMALLOC;
 
-	size = SKB_DATA_ALIGN(size);
-	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	size = SKB_HEAD_ALIGN(size);
 	size = kmalloc_size_roundup(size);
 	data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
 	if (!data)
@@ -6407,8 +6402,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
 	if (skb_pfmemalloc(skb))
 		gfp_mask |= __GFP_MEMALLOC;
 
-	size = SKB_DATA_ALIGN(size);
-	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	size = SKB_HEAD_ALIGN(size);
 	size = kmalloc_size_roundup(size);
 	data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
 	if (!data)
-- 
2.39.1.456.gfc5497dd1b-goog



* [PATCH net-next 2/4] net: remove osize variable in __alloc_skb()
  2023-02-02 18:57 [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Eric Dumazet
  2023-02-02 18:57 ` [PATCH net-next 1/4] net: add SKB_HEAD_ALIGN() helper Eric Dumazet
@ 2023-02-02 18:57 ` Eric Dumazet
  2023-02-02 20:07   ` Soheil Hassas Yeganeh
  2023-02-02 18:58 ` [PATCH net-next 3/4] net: factorize code in kmalloc_reserve() Eric Dumazet
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2023-02-02 18:57 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh,
	Eric Dumazet

This is a cleanup patch, preparing for the following change.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/skbuff.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b73de8fb0756c02cf9ba4b7e90854c9c17728463..a82df5289208d69716e60c5c1f201ec3ca50a258 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -533,7 +533,6 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 {
 	struct kmem_cache *cache;
 	struct sk_buff *skb;
-	unsigned int osize;
 	bool pfmemalloc;
 	u8 *data;
 
@@ -559,16 +558,15 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	 * Both skb->head and skb_shared_info are cache line aligned.
 	 */
 	size = SKB_HEAD_ALIGN(size);
-	osize = kmalloc_size_roundup(size);
-	data = kmalloc_reserve(osize, gfp_mask, node, &pfmemalloc);
+	size = kmalloc_size_roundup(size);
+	data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
 	if (unlikely(!data))
 		goto nodata;
 	/* kmalloc_size_roundup() might give us more room than requested.
 	 * Put skb_shared_info exactly at the end of allocated zone,
 	 * to allow max possible filling before reallocation.
 	 */
-	size = SKB_WITH_OVERHEAD(osize);
-	prefetchw(data + size);
+	prefetchw(data + SKB_WITH_OVERHEAD(size));
 
 	/*
 	 * Only clear those fields we need to clear, not those that we will
@@ -576,7 +574,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	 * the tail pointer in struct sk_buff!
 	 */
 	memset(skb, 0, offsetof(struct sk_buff, tail));
-	__build_skb_around(skb, data, osize);
+	__build_skb_around(skb, data, size);
 	skb->pfmemalloc = pfmemalloc;
 
 	if (flags & SKB_ALLOC_FCLONE) {
-- 
2.39.1.456.gfc5497dd1b-goog



* [PATCH net-next 3/4] net: factorize code in kmalloc_reserve()
  2023-02-02 18:57 [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Eric Dumazet
  2023-02-02 18:57 ` [PATCH net-next 1/4] net: add SKB_HEAD_ALIGN() helper Eric Dumazet
  2023-02-02 18:57 ` [PATCH net-next 2/4] net: remove osize variable in __alloc_skb() Eric Dumazet
@ 2023-02-02 18:58 ` Eric Dumazet
  2023-02-02 20:09   ` Soheil Hassas Yeganeh
  2023-02-02 18:58 ` [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head Eric Dumazet
  2023-02-06 18:16 ` [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Paolo Abeni
  4 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2023-02-02 18:58 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh,
	Eric Dumazet

All kmalloc_reserve() callers have to make the same computation;
we can factorize it, to prepare for the following patch in the series.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/skbuff.c | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a82df5289208d69716e60c5c1f201ec3ca50a258..ae0b2aa1f01e8060cc4fe69137e9bd98e44280cc 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -478,17 +478,20 @@ EXPORT_SYMBOL(napi_build_skb);
  * may be used. Otherwise, the packet data may be discarded until enough
  * memory is free
  */
-static void *kmalloc_reserve(size_t size, gfp_t flags, int node,
+static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
 			     bool *pfmemalloc)
 {
-	void *obj;
 	bool ret_pfmemalloc = false;
+	unsigned int obj_size;
+	void *obj;
 
+	obj_size = SKB_HEAD_ALIGN(*size);
+	*size = obj_size = kmalloc_size_roundup(obj_size);
 	/*
 	 * Try a regular allocation, when that fails and we're not entitled
 	 * to the reserves, fail.
 	 */
-	obj = kmalloc_node_track_caller(size,
+	obj = kmalloc_node_track_caller(obj_size,
 					flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
 					node);
 	if (obj || !(gfp_pfmemalloc_allowed(flags)))
@@ -496,7 +499,7 @@ static void *kmalloc_reserve(size_t size, gfp_t flags, int node,
 
 	/* Try again but now we are using pfmemalloc reserves */
 	ret_pfmemalloc = true;
-	obj = kmalloc_node_track_caller(size, flags, node);
+	obj = kmalloc_node_track_caller(obj_size, flags, node);
 
 out:
 	if (pfmemalloc)
@@ -557,9 +560,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
 	 * Both skb->head and skb_shared_info are cache line aligned.
 	 */
-	size = SKB_HEAD_ALIGN(size);
-	size = kmalloc_size_roundup(size);
-	data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
+	data = kmalloc_reserve(&size, gfp_mask, node, &pfmemalloc);
 	if (unlikely(!data))
 		goto nodata;
 	/* kmalloc_size_roundup() might give us more room than requested.
@@ -1931,9 +1932,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 	if (skb_pfmemalloc(skb))
 		gfp_mask |= __GFP_MEMALLOC;
 
-	size = SKB_HEAD_ALIGN(size);
-	size = kmalloc_size_roundup(size);
-	data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
+	data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL);
 	if (!data)
 		goto nodata;
 	size = SKB_WITH_OVERHEAD(size);
@@ -6282,9 +6281,7 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
 	if (skb_pfmemalloc(skb))
 		gfp_mask |= __GFP_MEMALLOC;
 
-	size = SKB_HEAD_ALIGN(size);
-	size = kmalloc_size_roundup(size);
-	data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
+	data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL);
 	if (!data)
 		return -ENOMEM;
 	size = SKB_WITH_OVERHEAD(size);
@@ -6400,9 +6397,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
 	if (skb_pfmemalloc(skb))
 		gfp_mask |= __GFP_MEMALLOC;
 
-	size = SKB_HEAD_ALIGN(size);
-	size = kmalloc_size_roundup(size);
-	data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
+	data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL);
 	if (!data)
 		return -ENOMEM;
 	size = SKB_WITH_OVERHEAD(size);
-- 
2.39.1.456.gfc5497dd1b-goog



* [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
  2023-02-02 18:57 [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Eric Dumazet
                   ` (2 preceding siblings ...)
  2023-02-02 18:58 ` [PATCH net-next 3/4] net: factorize code in kmalloc_reserve() Eric Dumazet
@ 2023-02-02 18:58 ` Eric Dumazet
  2023-02-02 20:14   ` Soheil Hassas Yeganeh
                     ` (3 more replies)
  2023-02-06 18:16 ` [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Paolo Abeni
  4 siblings, 4 replies; 17+ messages in thread
From: Eric Dumazet @ 2023-02-02 18:58 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh,
	Eric Dumazet

Recent removal of ksize() in alloc_skb() increased
performance because we no longer read
the associated struct page.

We have an equivalent cost at kfree_skb() time.

kfree(skb->head) has to access a struct page,
often cold in cpu caches, to get the owning
struct kmem_cache.

Considering that many allocations are small,
we can have our own kmem_cache to avoid the cache line miss.

This also saves memory because these small heads
are no longer padded to 1024 bytes.

CONFIG_SLUB=y
$ grep skbuff_small_head /proc/slabinfo
skbuff_small_head   2907   2907    640   51    8 : tunables    0    0    0 : slabdata     57     57      0

CONFIG_SLAB=y
$ grep skbuff_small_head /proc/slabinfo
skbuff_small_head    607    624    640    6    1 : tunables   54   27    8 : slabdata    104    104      5

Note: after Kees Cook's patches and this one, we might
be able to revert commit
dbae2b062824 ("net: skb: introduce and use a single page frag cache")
because GRO_MAX_HEAD is also small.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
---
 net/core/skbuff.c | 52 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 47 insertions(+), 5 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index ae0b2aa1f01e8060cc4fe69137e9bd98e44280cc..3e540b4924701cc57b6fbd1b668bab3b652ee94c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -89,6 +89,19 @@ static struct kmem_cache *skbuff_fclone_cache __ro_after_init;
 #ifdef CONFIG_SKB_EXTENSIONS
 static struct kmem_cache *skbuff_ext_cache __ro_after_init;
 #endif
+static struct kmem_cache *skb_small_head_cache __ro_after_init;
+
+#define SKB_SMALL_HEAD_SIZE SKB_HEAD_ALIGN(MAX_TCP_HEADER)
+
+/* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two. */
+#define SKB_SMALL_HEAD_CACHE_SIZE					\
+	(is_power_of_2(SKB_SMALL_HEAD_SIZE) ?			\
+		(SKB_SMALL_HEAD_SIZE + L1_CACHE_BYTES) :	\
+		SKB_SMALL_HEAD_SIZE)
+
+#define SKB_SMALL_HEAD_HEADROOM						\
+	SKB_WITH_OVERHEAD(SKB_SMALL_HEAD_CACHE_SIZE)
+
 int sysctl_max_skb_frags __read_mostly = MAX_SKB_FRAGS;
 EXPORT_SYMBOL(sysctl_max_skb_frags);
 
@@ -486,6 +499,21 @@ static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
 	void *obj;
 
 	obj_size = SKB_HEAD_ALIGN(*size);
+	if (obj_size <= SKB_SMALL_HEAD_CACHE_SIZE &&
+	    !(flags & KMALLOC_NOT_NORMAL_BITS)) {
+
+		/* skb_small_head_cache has non power of two size,
+		 * likely forcing SLUB to use order-3 pages.
+		 * We deliberately attempt a NOMEMALLOC allocation only.
+		 */
+		obj = kmem_cache_alloc_node(skb_small_head_cache,
+				flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
+				node);
+		if (obj) {
+			*size = SKB_SMALL_HEAD_CACHE_SIZE;
+			goto out;
+		}
+	}
 	*size = obj_size = kmalloc_size_roundup(obj_size);
 	/*
 	 * Try a regular allocation, when that fails and we're not entitled
@@ -805,6 +833,14 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data)
 	return page_pool_return_skb_page(virt_to_page(data));
 }
 
+static void skb_kfree_head(void *head, unsigned int end_offset)
+{
+	if (end_offset == SKB_SMALL_HEAD_HEADROOM)
+		kmem_cache_free(skb_small_head_cache, head);
+	else
+		kfree(head);
+}
+
 static void skb_free_head(struct sk_buff *skb)
 {
 	unsigned char *head = skb->head;
@@ -814,7 +850,7 @@ static void skb_free_head(struct sk_buff *skb)
 			return;
 		skb_free_frag(head);
 	} else {
-		kfree(head);
+		skb_kfree_head(head, skb_end_offset(skb));
 	}
 }
 
@@ -1995,7 +2031,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 	return 0;
 
 nofrags:
-	kfree(data);
+	skb_kfree_head(data, size);
 nodata:
 	return -ENOMEM;
 }
@@ -4633,6 +4669,12 @@ void __init skb_init(void)
 						0,
 						SLAB_HWCACHE_ALIGN|SLAB_PANIC,
 						NULL);
+	skb_small_head_cache = kmem_cache_create("skbuff_small_head",
+						SKB_SMALL_HEAD_CACHE_SIZE,
+						0,
+						SLAB_HWCACHE_ALIGN | SLAB_PANIC,
+						NULL);
+
 	skb_extensions_init();
 }
 
@@ -6297,7 +6339,7 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
 	if (skb_cloned(skb)) {
 		/* drop the old head gracefully */
 		if (skb_orphan_frags(skb, gfp_mask)) {
-			kfree(data);
+			skb_kfree_head(data, size);
 			return -ENOMEM;
 		}
 		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
@@ -6405,7 +6447,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
 	memcpy((struct skb_shared_info *)(data + size),
 	       skb_shinfo(skb), offsetof(struct skb_shared_info, frags[0]));
 	if (skb_orphan_frags(skb, gfp_mask)) {
-		kfree(data);
+		skb_kfree_head(data, size);
 		return -ENOMEM;
 	}
 	shinfo = (struct skb_shared_info *)(data + size);
@@ -6441,7 +6483,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
 		/* skb_frag_unref() is not needed here as shinfo->nr_frags = 0. */
 		if (skb_has_frag_list(skb))
 			kfree_skb_list(skb_shinfo(skb)->frag_list);
-		kfree(data);
+		skb_kfree_head(data, size);
 		return -ENOMEM;
 	}
 	skb_release_data(skb, SKB_CONSUMED);
-- 
2.39.1.456.gfc5497dd1b-goog



* Re: [PATCH net-next 1/4] net: add SKB_HEAD_ALIGN() helper
  2023-02-02 18:57 ` [PATCH net-next 1/4] net: add SKB_HEAD_ALIGN() helper Eric Dumazet
@ 2023-02-02 20:06   ` Soheil Hassas Yeganeh
  0 siblings, 0 replies; 17+ messages in thread
From: Soheil Hassas Yeganeh @ 2023-02-02 20:06 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	eric.dumazet, Alexander Duyck

On Thu, Feb 2, 2023 at 1:58 PM Eric Dumazet <edumazet@google.com> wrote:
>
> We have many places using this expression:
>
>  SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
>
> Using SKB_HEAD_ALIGN() will allow us to clean them up.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

> ---
>  include/linux/skbuff.h |  8 ++++++++
>  net/core/skbuff.c      | 18 ++++++------------
>  2 files changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 5ba12185f43e311e37c9045763c3ee0efc274f2a..f2141b7e3940cee060e8443dbaa147b843eb43a0 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -255,6 +255,14 @@
>  #define SKB_DATA_ALIGN(X)      ALIGN(X, SMP_CACHE_BYTES)
>  #define SKB_WITH_OVERHEAD(X)   \
>         ((X) - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
> +
> +/* For X bytes available in skb->head, what is the minimal
> + * allocation needed, knowing struct skb_shared_info needs
> + * to be aligned.
> + */
> +#define SKB_HEAD_ALIGN(X) (SKB_DATA_ALIGN(X) + \
> +       SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
> +
>  #define SKB_MAX_ORDER(X, ORDER) \
>         SKB_WITH_OVERHEAD((PAGE_SIZE << (ORDER)) - (X))
>  #define SKB_MAX_HEAD(X)                (SKB_MAX_ORDER((X), 0))
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index bb79b4cb89db344d23609f93b2bcca5103f1e92d..b73de8fb0756c02cf9ba4b7e90854c9c17728463 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -558,8 +558,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>          * aligned memory blocks, unless SLUB/SLAB debug is enabled.
>          * Both skb->head and skb_shared_info are cache line aligned.
>          */
> -       size = SKB_DATA_ALIGN(size);
> -       size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +       size = SKB_HEAD_ALIGN(size);
>         osize = kmalloc_size_roundup(size);
>         data = kmalloc_reserve(osize, gfp_mask, node, &pfmemalloc);
>         if (unlikely(!data))
> @@ -632,8 +631,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
>                 goto skb_success;
>         }
>
> -       len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> -       len = SKB_DATA_ALIGN(len);
> +       len = SKB_HEAD_ALIGN(len);
>
>         if (sk_memalloc_socks())
>                 gfp_mask |= __GFP_MEMALLOC;
> @@ -732,8 +730,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
>                 data = page_frag_alloc_1k(&nc->page_small, gfp_mask);
>                 pfmemalloc = NAPI_SMALL_PAGE_PFMEMALLOC(nc->page_small);
>         } else {
> -               len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> -               len = SKB_DATA_ALIGN(len);
> +               len = SKB_HEAD_ALIGN(len);
>
>                 data = page_frag_alloc(&nc->page, len, gfp_mask);
>                 pfmemalloc = nc->page.pfmemalloc;
> @@ -1936,8 +1933,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>         if (skb_pfmemalloc(skb))
>                 gfp_mask |= __GFP_MEMALLOC;
>
> -       size = SKB_DATA_ALIGN(size);
> -       size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +       size = SKB_HEAD_ALIGN(size);
>         size = kmalloc_size_roundup(size);
>         data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
>         if (!data)
> @@ -6288,8 +6284,7 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
>         if (skb_pfmemalloc(skb))
>                 gfp_mask |= __GFP_MEMALLOC;
>
> -       size = SKB_DATA_ALIGN(size);
> -       size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +       size = SKB_HEAD_ALIGN(size);
>         size = kmalloc_size_roundup(size);
>         data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
>         if (!data)
> @@ -6407,8 +6402,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
>         if (skb_pfmemalloc(skb))
>                 gfp_mask |= __GFP_MEMALLOC;
>
> -       size = SKB_DATA_ALIGN(size);
> -       size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +       size = SKB_HEAD_ALIGN(size);
>         size = kmalloc_size_roundup(size);
>         data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
>         if (!data)
> --
> 2.39.1.456.gfc5497dd1b-goog
>


* Re: [PATCH net-next 2/4] net: remove osize variable in __alloc_skb()
  2023-02-02 18:57 ` [PATCH net-next 2/4] net: remove osize variable in __alloc_skb() Eric Dumazet
@ 2023-02-02 20:07   ` Soheil Hassas Yeganeh
  0 siblings, 0 replies; 17+ messages in thread
From: Soheil Hassas Yeganeh @ 2023-02-02 20:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	eric.dumazet, Alexander Duyck

On Thu, Feb 2, 2023 at 1:58 PM Eric Dumazet <edumazet@google.com> wrote:
>
> This is a cleanup patch, preparing for the following change.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

> ---
>  net/core/skbuff.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index b73de8fb0756c02cf9ba4b7e90854c9c17728463..a82df5289208d69716e60c5c1f201ec3ca50a258 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -533,7 +533,6 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>  {
>         struct kmem_cache *cache;
>         struct sk_buff *skb;
> -       unsigned int osize;
>         bool pfmemalloc;
>         u8 *data;
>
> @@ -559,16 +558,15 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>          * Both skb->head and skb_shared_info are cache line aligned.
>          */
>         size = SKB_HEAD_ALIGN(size);
> -       osize = kmalloc_size_roundup(size);
> -       data = kmalloc_reserve(osize, gfp_mask, node, &pfmemalloc);
> +       size = kmalloc_size_roundup(size);
> +       data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
>         if (unlikely(!data))
>                 goto nodata;
>         /* kmalloc_size_roundup() might give us more room than requested.
>          * Put skb_shared_info exactly at the end of allocated zone,
>          * to allow max possible filling before reallocation.
>          */
> -       size = SKB_WITH_OVERHEAD(osize);
> -       prefetchw(data + size);
> +       prefetchw(data + SKB_WITH_OVERHEAD(size));
>
>         /*
>          * Only clear those fields we need to clear, not those that we will
> @@ -576,7 +574,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>          * the tail pointer in struct sk_buff!
>          */
>         memset(skb, 0, offsetof(struct sk_buff, tail));
> -       __build_skb_around(skb, data, osize);
> +       __build_skb_around(skb, data, size);
>         skb->pfmemalloc = pfmemalloc;
>
>         if (flags & SKB_ALLOC_FCLONE) {
> --
> 2.39.1.456.gfc5497dd1b-goog
>


* Re: [PATCH net-next 3/4] net: factorize code in kmalloc_reserve()
  2023-02-02 18:58 ` [PATCH net-next 3/4] net: factorize code in kmalloc_reserve() Eric Dumazet
@ 2023-02-02 20:09   ` Soheil Hassas Yeganeh
  0 siblings, 0 replies; 17+ messages in thread
From: Soheil Hassas Yeganeh @ 2023-02-02 20:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	eric.dumazet, Alexander Duyck

On Thu, Feb 2, 2023 at 1:58 PM Eric Dumazet <edumazet@google.com> wrote:
>
> All kmalloc_reserve() callers have to make the same computation;
> we can factorize it, to prepare for the following patch in the series.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

> ---
>  net/core/skbuff.c | 27 +++++++++++----------------
>  1 file changed, 11 insertions(+), 16 deletions(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index a82df5289208d69716e60c5c1f201ec3ca50a258..ae0b2aa1f01e8060cc4fe69137e9bd98e44280cc 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -478,17 +478,20 @@ EXPORT_SYMBOL(napi_build_skb);
>   * may be used. Otherwise, the packet data may be discarded until enough
>   * memory is free
>   */
> -static void *kmalloc_reserve(size_t size, gfp_t flags, int node,
> +static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
>                              bool *pfmemalloc)
>  {
> -       void *obj;
>         bool ret_pfmemalloc = false;
> +       unsigned int obj_size;
> +       void *obj;
>
> +       obj_size = SKB_HEAD_ALIGN(*size);
> +       *size = obj_size = kmalloc_size_roundup(obj_size);
>         /*
>          * Try a regular allocation, when that fails and we're not entitled
>          * to the reserves, fail.
>          */
> -       obj = kmalloc_node_track_caller(size,
> +       obj = kmalloc_node_track_caller(obj_size,
>                                         flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
>                                         node);
>         if (obj || !(gfp_pfmemalloc_allowed(flags)))
> @@ -496,7 +499,7 @@ static void *kmalloc_reserve(size_t size, gfp_t flags, int node,
>
>         /* Try again but now we are using pfmemalloc reserves */
>         ret_pfmemalloc = true;
> -       obj = kmalloc_node_track_caller(size, flags, node);
> +       obj = kmalloc_node_track_caller(obj_size, flags, node);
>
>  out:
>         if (pfmemalloc)
> @@ -557,9 +560,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>          * aligned memory blocks, unless SLUB/SLAB debug is enabled.
>          * Both skb->head and skb_shared_info are cache line aligned.
>          */
> -       size = SKB_HEAD_ALIGN(size);
> -       size = kmalloc_size_roundup(size);
> -       data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
> +       data = kmalloc_reserve(&size, gfp_mask, node, &pfmemalloc);
>         if (unlikely(!data))
>                 goto nodata;
>         /* kmalloc_size_roundup() might give us more room than requested.
> @@ -1931,9 +1932,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>         if (skb_pfmemalloc(skb))
>                 gfp_mask |= __GFP_MEMALLOC;
>
> -       size = SKB_HEAD_ALIGN(size);
> -       size = kmalloc_size_roundup(size);
> -       data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
> +       data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL);
>         if (!data)
>                 goto nodata;
>         size = SKB_WITH_OVERHEAD(size);
> @@ -6282,9 +6281,7 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
>         if (skb_pfmemalloc(skb))
>                 gfp_mask |= __GFP_MEMALLOC;
>
> -       size = SKB_HEAD_ALIGN(size);
> -       size = kmalloc_size_roundup(size);
> -       data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
> +       data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL);
>         if (!data)
>                 return -ENOMEM;
>         size = SKB_WITH_OVERHEAD(size);
> @@ -6400,9 +6397,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
>         if (skb_pfmemalloc(skb))
>                 gfp_mask |= __GFP_MEMALLOC;
>
> -       size = SKB_HEAD_ALIGN(size);
> -       size = kmalloc_size_roundup(size);
> -       data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
> +       data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL);
>         if (!data)
>                 return -ENOMEM;
>         size = SKB_WITH_OVERHEAD(size);
> --
> 2.39.1.456.gfc5497dd1b-goog
>


* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
  2023-02-02 18:58 ` [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head Eric Dumazet
@ 2023-02-02 20:14   ` Soheil Hassas Yeganeh
  2023-02-03  5:11   ` Jakub Kicinski
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Soheil Hassas Yeganeh @ 2023-02-02 20:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	eric.dumazet, Alexander Duyck

On Thu, Feb 2, 2023 at 1:58 PM Eric Dumazet <edumazet@google.com> wrote:
>
> Recent removal of ksize() in alloc_skb() increased
> performance because we no longer read
> the associated struct page.
>
> We have an equivalent cost at kfree_skb() time.
>
> kfree(skb->head) has to access a struct page,
> often cold in cpu caches, to get the owning
> struct kmem_cache.
>
> Considering that many allocations are small,
> we can have our own kmem_cache to avoid the cache line miss.
>
> This also saves memory because these small heads
> are no longer padded to 1024 bytes.
>
> CONFIG_SLUB=y
> $ grep skbuff_small_head /proc/slabinfo
> skbuff_small_head   2907   2907    640   51    8 : tunables    0    0    0 : slabdata     57     57      0
>
> CONFIG_SLAB=y
> $ grep skbuff_small_head /proc/slabinfo
> skbuff_small_head    607    624    640    6    1 : tunables   54   27    8 : slabdata    104    104      5
>
> Note: after Kees Cook patches and this one, we might
> be able to revert commit
> dbae2b062824 ("net: skb: introduce and use a single page frag cache")
> because GRO_MAX_HEAD is also small.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Paolo Abeni <pabeni@redhat.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

Very nice!

> ---
>  net/core/skbuff.c | 52 ++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 47 insertions(+), 5 deletions(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index ae0b2aa1f01e8060cc4fe69137e9bd98e44280cc..3e540b4924701cc57b6fbd1b668bab3b652ee94c 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -89,6 +89,19 @@ static struct kmem_cache *skbuff_fclone_cache __ro_after_init;
>  #ifdef CONFIG_SKB_EXTENSIONS
>  static struct kmem_cache *skbuff_ext_cache __ro_after_init;
>  #endif
> +static struct kmem_cache *skb_small_head_cache __ro_after_init;
> +
> +#define SKB_SMALL_HEAD_SIZE SKB_HEAD_ALIGN(MAX_TCP_HEADER)
> +
> +/* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two. */
> +#define SKB_SMALL_HEAD_CACHE_SIZE                                      \
> +       (is_power_of_2(SKB_SMALL_HEAD_SIZE) ?                   \
> +               (SKB_SMALL_HEAD_SIZE + L1_CACHE_BYTES) :        \
> +               SKB_SMALL_HEAD_SIZE)
> +
> +#define SKB_SMALL_HEAD_HEADROOM                                                \
> +       SKB_WITH_OVERHEAD(SKB_SMALL_HEAD_CACHE_SIZE)
> +
>  int sysctl_max_skb_frags __read_mostly = MAX_SKB_FRAGS;
>  EXPORT_SYMBOL(sysctl_max_skb_frags);
>
> @@ -486,6 +499,21 @@ static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
>         void *obj;
>
>         obj_size = SKB_HEAD_ALIGN(*size);
> +       if (obj_size <= SKB_SMALL_HEAD_CACHE_SIZE &&
> +           !(flags & KMALLOC_NOT_NORMAL_BITS)) {
> +
> +               /* skb_small_head_cache has non power of two size,
> +                * likely forcing SLUB to use order-3 pages.
> +                * We deliberately attempt a NOMEMALLOC allocation only.
> +                */
> +               obj = kmem_cache_alloc_node(skb_small_head_cache,
> +                               flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
> +                               node);
> +               if (obj) {
> +                       *size = SKB_SMALL_HEAD_CACHE_SIZE;
> +                       goto out;
> +               }
> +       }
>         *size = obj_size = kmalloc_size_roundup(obj_size);
>         /*
>          * Try a regular allocation, when that fails and we're not entitled
> @@ -805,6 +833,14 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data)
>         return page_pool_return_skb_page(virt_to_page(data));
>  }
>
> +static void skb_kfree_head(void *head, unsigned int end_offset)
> +{
> +       if (end_offset == SKB_SMALL_HEAD_HEADROOM)
> +               kmem_cache_free(skb_small_head_cache, head);
> +       else
> +               kfree(head);
> +}
> +
>  static void skb_free_head(struct sk_buff *skb)
>  {
>         unsigned char *head = skb->head;
> @@ -814,7 +850,7 @@ static void skb_free_head(struct sk_buff *skb)
>                         return;
>                 skb_free_frag(head);
>         } else {
> -               kfree(head);
> +               skb_kfree_head(head, skb_end_offset(skb));
>         }
>  }
>
> @@ -1995,7 +2031,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>         return 0;
>
>  nofrags:
> -       kfree(data);
> +       skb_kfree_head(data, size);
>  nodata:
>         return -ENOMEM;
>  }
> @@ -4633,6 +4669,12 @@ void __init skb_init(void)
>                                                 0,
>                                                 SLAB_HWCACHE_ALIGN|SLAB_PANIC,
>                                                 NULL);
> +       skb_small_head_cache = kmem_cache_create("skbuff_small_head",
> +                                               SKB_SMALL_HEAD_CACHE_SIZE,
> +                                               0,
> +                                               SLAB_HWCACHE_ALIGN | SLAB_PANIC,
> +                                               NULL);
> +
>         skb_extensions_init();
>  }
>
> @@ -6297,7 +6339,7 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
>         if (skb_cloned(skb)) {
>                 /* drop the old head gracefully */
>                 if (skb_orphan_frags(skb, gfp_mask)) {
> -                       kfree(data);
> +                       skb_kfree_head(data, size);
>                         return -ENOMEM;
>                 }
>                 for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
> @@ -6405,7 +6447,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
>         memcpy((struct skb_shared_info *)(data + size),
>                skb_shinfo(skb), offsetof(struct skb_shared_info, frags[0]));
>         if (skb_orphan_frags(skb, gfp_mask)) {
> -               kfree(data);
> +               skb_kfree_head(data, size);
>                 return -ENOMEM;
>         }
>         shinfo = (struct skb_shared_info *)(data + size);
> @@ -6441,7 +6483,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
>                 /* skb_frag_unref() is not needed here as shinfo->nr_frags = 0. */
>                 if (skb_has_frag_list(skb))
>                         kfree_skb_list(skb_shinfo(skb)->frag_list);
> -               kfree(data);
> +               skb_kfree_head(data, size);
>                 return -ENOMEM;
>         }
>         skb_release_data(skb, SKB_CONSUMED);
> --
> 2.39.1.456.gfc5497dd1b-goog
>


* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
  2023-02-02 18:58 ` [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head Eric Dumazet
  2023-02-02 20:14   ` Soheil Hassas Yeganeh
@ 2023-02-03  5:11   ` Jakub Kicinski
  2023-02-03  7:00     ` Eric Dumazet
  2023-02-03  7:59   ` Paolo Abeni
  2023-02-03 19:37   ` kernel test robot
  3 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2023-02-03  5:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Paolo Abeni, netdev, eric.dumazet,
	Alexander Duyck, Soheil Hassas Yeganeh

On Thu,  2 Feb 2023 18:58:01 +0000 Eric Dumazet wrote:
> +/* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two. */

Why is that?  Is it to prevent potential mixing up of objects from 
the cache with objects from general slabs (since we only do an
end_offset == SKB_SMALL_HEAD_HEADROOM check)?


* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
  2023-02-03  5:11   ` Jakub Kicinski
@ 2023-02-03  7:00     ` Eric Dumazet
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2023-02-03  7:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S . Miller, Paolo Abeni, netdev, eric.dumazet,
	Alexander Duyck, Soheil Hassas Yeganeh

On Fri, Feb 3, 2023 at 6:11 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu,  2 Feb 2023 18:58:01 +0000 Eric Dumazet wrote:
> > +/* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two. */
>
> Why is that?  Is it to prevent potential mixing up of objects from
> the cache with objects from general slabs (since we only do an
> end_offset == SKB_SMALL_HEAD_HEADROOM check)?

Good question.

Some alloc_skb() callers use GFP_DMA (or __GFP_ACCOUNT);
we can not use the dedicated kmem_cache for them.

They could instead get a kmalloc object of size 512 or 1024.

Since I chose not to add yet another
skb->head_has_been_allocated_from_small_head_cache flag,
we want to make sure we will not have issues in the future if
SKB_HEAD_ALIGN(MAX_TCP_HEADER)
becomes a power of two (for example, if some of us increase MAX_SKB_FRAGS).

An alternative would be to add a check at boot time, making sure
no standard kmalloc cache has the same object size.

This might have an issue with CONFIG_SLOB=y; I wish SLOB were gone already...


* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
  2023-02-02 18:58 ` [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head Eric Dumazet
  2023-02-02 20:14   ` Soheil Hassas Yeganeh
  2023-02-03  5:11   ` Jakub Kicinski
@ 2023-02-03  7:59   ` Paolo Abeni
  2023-02-03  8:17     ` Eric Dumazet
  2023-02-03 19:37   ` kernel test robot
  3 siblings, 1 reply; 17+ messages in thread
From: Paolo Abeni @ 2023-02-03  7:59 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski
  Cc: netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh

On Thu, 2023-02-02 at 18:58 +0000, Eric Dumazet wrote:
> Note: after Kees Cook patches and this one, we might
> be able to revert commit
> dbae2b062824 ("net: skb: introduce and use a single page frag cache")
> because GRO_MAX_HEAD is also small.

I guess I'll need some time to do the relevant benchmarks, but I'm not
able to schedule them very soon.

> @@ -486,6 +499,21 @@ static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
>  	void *obj;
>  
>  	obj_size = SKB_HEAD_ALIGN(*size);
> +	if (obj_size <= SKB_SMALL_HEAD_CACHE_SIZE &&
> +	    !(flags & KMALLOC_NOT_NORMAL_BITS)) {
> +
> +		/* skb_small_head_cache has non power of two size,
> +		 * likely forcing SLUB to use order-3 pages.
> +		 * We deliberately attempt a NOMEMALLOC allocation only.
> +		 */
> +		obj = kmem_cache_alloc_node(skb_small_head_cache,
> +				flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
> +				node);
> +		if (obj) {
> +			*size = SKB_SMALL_HEAD_CACHE_SIZE;
> +			goto out;
> +		}

In case of kmem_cache allocation failure, should we skip the 2nd
__GFP_NOMEMALLOC attempt below?

I *think* a non power of two size is also required to avoid an issue with
plain (no GFP_DMA nor __GFP_ACCOUNT) allocations in case of fallback to
kmalloc(), to prevent skb_kfree_head() from mis-interpreting skb->head as
kmem_cache allocated.

Thanks!

Paolo



* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
  2023-02-03  7:59   ` Paolo Abeni
@ 2023-02-03  8:17     ` Eric Dumazet
  2023-02-03 19:47       ` Jakub Kicinski
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2023-02-03  8:17 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: David S . Miller, Jakub Kicinski, netdev, eric.dumazet,
	Alexander Duyck, Soheil Hassas Yeganeh

On Fri, Feb 3, 2023 at 8:59 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Thu, 2023-02-02 at 18:58 +0000, Eric Dumazet wrote:
> > Note: after Kees Cook patches and this one, we might
> > be able to revert commit
> > dbae2b062824 ("net: skb: introduce and use a single page frag cache")
> > because GRO_MAX_HEAD is also small.
>
> I guess I'll need some time to do the relevant benchmarks, but I'm not
> able to schedule them very soon.

No worries, this can be done later.
Note that the results might depend on the SLUB vs SLAB choice.

>
> > @@ -486,6 +499,21 @@ static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
> >       void *obj;
> >
> >       obj_size = SKB_HEAD_ALIGN(*size);
> > +     if (obj_size <= SKB_SMALL_HEAD_CACHE_SIZE &&
> > +         !(flags & KMALLOC_NOT_NORMAL_BITS)) {
> > +
> > +             /* skb_small_head_cache has non power of two size,
> > +              * likely forcing SLUB to use order-3 pages.
> > +              * We deliberately attempt a NOMEMALLOC allocation only.
> > +              */
> > +             obj = kmem_cache_alloc_node(skb_small_head_cache,
> > +                             flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
> > +                             node);
> > +             if (obj) {
> > +                     *size = SKB_SMALL_HEAD_CACHE_SIZE;
> > +                     goto out;
> > +             }
>
> In case of kmem_cache allocation failure, should we skip the 2nd
> __GFP_NOMEMALLOC attempt below?

We could, but my reasoning was that we might still find an object in the
other kmem_cache freelists.

kmalloc-1k objects tend to use smaller page orders, so they are more
likely to succeed when no order-3 page is available in the buddy
allocator. (I mentioned this in the comment.)


>
> I *think* a non power of two size is also required to avoid an issue with
> plain (no GFP_DMA nor __GFP_ACCOUNT) allocations in case of fallback to
> kmalloc(), to prevent skb_kfree_head() from mis-interpreting skb->head as
> kmem_cache allocated.

Indeed, there are multiple reasons why SKB_SMALL_HEAD_CACHE_SIZE
needs to be a non power of two.


* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
  2023-02-02 18:58 ` [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head Eric Dumazet
                     ` (2 preceding siblings ...)
  2023-02-03  7:59   ` Paolo Abeni
@ 2023-02-03 19:37   ` kernel test robot
  2023-02-03 19:42     ` Jakub Kicinski
  3 siblings, 1 reply; 17+ messages in thread
From: kernel test robot @ 2023-02-03 19:37 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: oe-kbuild-all, netdev, eric.dumazet, Alexander Duyck,
	Soheil Hassas Yeganeh, Eric Dumazet

Hi Eric,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/net-add-SKB_HEAD_ALIGN-helper/20230203-031126
patch link:    https://lore.kernel.org/r/20230202185801.4179599-5-edumazet%40google.com
patch subject: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
config: arc-randconfig-r014-20230204 (https://download.01.org/0day-ci/archive/20230204/202302040329.E10xZHbY-lkp@intel.com/config)
compiler: arceb-elf-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/002bb9238e98f3fbeb0402c97f711420c3321b93
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Eric-Dumazet/net-add-SKB_HEAD_ALIGN-helper/20230203-031126
        git checkout 002bb9238e98f3fbeb0402c97f711420c3321b93
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arc olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arc SHELL=/bin/bash net/core/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   net/core/skbuff.c: In function 'kmalloc_reserve':
>> net/core/skbuff.c:503:23: error: 'KMALLOC_NOT_NORMAL_BITS' undeclared (first use in this function)
     503 |             !(flags & KMALLOC_NOT_NORMAL_BITS)) {
         |                       ^~~~~~~~~~~~~~~~~~~~~~~
   net/core/skbuff.c:503:23: note: each undeclared identifier is reported only once for each function it appears in


vim +/KMALLOC_NOT_NORMAL_BITS +503 net/core/skbuff.c

   486	
   487	/*
   488	 * kmalloc_reserve is a wrapper around kmalloc_node_track_caller that tells
   489	 * the caller if emergency pfmemalloc reserves are being used. If it is and
   490	 * the socket is later found to be SOCK_MEMALLOC then PFMEMALLOC reserves
   491	 * may be used. Otherwise, the packet data may be discarded until enough
   492	 * memory is free
   493	 */
   494	static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
   495				     bool *pfmemalloc)
   496	{
   497		bool ret_pfmemalloc = false;
   498		unsigned int obj_size;
   499		void *obj;
   500	
   501		obj_size = SKB_HEAD_ALIGN(*size);
   502		if (obj_size <= SKB_SMALL_HEAD_CACHE_SIZE &&
 > 503		    !(flags & KMALLOC_NOT_NORMAL_BITS)) {
   504	
   505			/* skb_small_head_cache has non power of two size,
   506			 * likely forcing SLUB to use order-3 pages.
   507			 * We deliberately attempt a NOMEMALLOC allocation only.
   508			 */
   509			obj = kmem_cache_alloc_node(skb_small_head_cache,
   510					flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
   511					node);
   512			if (obj) {
   513				*size = SKB_SMALL_HEAD_CACHE_SIZE;
   514				goto out;
   515			}
   516		}
   517		*size = obj_size = kmalloc_size_roundup(obj_size);
   518		/*
   519		 * Try a regular allocation, when that fails and we're not entitled
   520		 * to the reserves, fail.
   521		 */
   522		obj = kmalloc_node_track_caller(obj_size,
   523						flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
   524						node);
   525		if (obj || !(gfp_pfmemalloc_allowed(flags)))
   526			goto out;
   527	
   528		/* Try again but now we are using pfmemalloc reserves */
   529		ret_pfmemalloc = true;
   530		obj = kmalloc_node_track_caller(obj_size, flags, node);
   531	
   532	out:
   533		if (pfmemalloc)
   534			*pfmemalloc = ret_pfmemalloc;
   535	
   536		return obj;
   537	}
   538	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests


* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
  2023-02-03 19:37   ` kernel test robot
@ 2023-02-03 19:42     ` Jakub Kicinski
  0 siblings, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2023-02-03 19:42 UTC (permalink / raw)
  To: kernel test robot
  Cc: Eric Dumazet, David S . Miller, Paolo Abeni, oe-kbuild-all,
	netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh

On Sat, 4 Feb 2023 03:37:10 +0800 kernel test robot wrote:
> All errors (new ones prefixed by >>):
> 
>    net/core/skbuff.c: In function 'kmalloc_reserve':
> >> net/core/skbuff.c:503:23: error: 'KMALLOC_NOT_NORMAL_BITS' undeclared (first use in this function)  
>      503 |             !(flags & KMALLOC_NOT_NORMAL_BITS)) {
>          |                       ^~~~~~~~~~~~~~~~~~~~~~~
>    net/core/skbuff.c:503:23: note: each undeclared identifier is reported only once for each function it appears in

diff --git a/net/Kconfig b/net/Kconfig
index 27bc49c7ad73..41e27a4805a8 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -8,6 +8,7 @@ menuconfig NET
        select NLATTR
        select GENERIC_NET_UTILS
        select BPF
+       depends on !SLOB
        help
          Unless you really know what you are doing, you should say Y here.
          The reason is that some programs need kernel networking support even

? :)


* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
  2023-02-03  8:17     ` Eric Dumazet
@ 2023-02-03 19:47       ` Jakub Kicinski
  0 siblings, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2023-02-03 19:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Paolo Abeni, David S . Miller, netdev, eric.dumazet,
	Alexander Duyck, Soheil Hassas Yeganeh

On Fri, 3 Feb 2023 09:17:55 +0100 Eric Dumazet wrote:
> > I *think* non power of two size is also required to avoid an issue
> > plain (no GFP_DMA nor __GFP_ACCOUNT) allocations in case of fallback to
> > kmalloc(), to prevent skb_kfree_head() mis-interpreting skb->head as
> > kmem_cache allocated.  
> 
> Indeed there are multiple cases explaining why SKB_SMALL_HEAD_CACHE_SIZE
> needs to be a non power of two.

Since you may need to respin, would you mind spelling this out a bit
more in the comment? Maybe:

-/* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two. */
+/* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two.
+ * This should ensure that SKB_SMALL_HEAD_HEADROOM is a unique
+ * size, and we can differentiate heads from skb_small_head_cache
+ * vs system slabs by looking at their size (skb_end_offset()).
+ */


* Re: [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs
  2023-02-02 18:57 [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Eric Dumazet
                   ` (3 preceding siblings ...)
  2023-02-02 18:58 ` [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head Eric Dumazet
@ 2023-02-06 18:16 ` Paolo Abeni
  4 siblings, 0 replies; 17+ messages in thread
From: Paolo Abeni @ 2023-02-06 18:16 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski
  Cc: netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh

On Thu, 2023-02-02 at 18:57 +0000, Eric Dumazet wrote:
> Our profile data show that using kmalloc(non_const_size)/kfree(ptr)
> has a certain cost, because kfree(ptr) has to pull a 'struct page'
> in cpu caches.
> 
> Using a dedicated kmem_cache for TCP skb->head allocations makes
> a difference, both in cpu cycles and memory savings.
> 
> This kmem_cache could also be used for GRO skb allocations,
> this is left as a future exercise.
> 
> Eric Dumazet (4):
>   net: add SKB_HEAD_ALIGN() helper
>   net: remove osize variable in __alloc_skb()
>   net: factorize code in kmalloc_reserve()
>   net: add dedicated kmem_cache for typical/small skb->head
> 
>  include/linux/skbuff.h |  8 ++++
>  net/core/skbuff.c      | 95 +++++++++++++++++++++++++++---------------
>  2 files changed, 70 insertions(+), 33 deletions(-)

LGTM,

Acked-by: Paolo Abeni <pabeni@redhat.com>

Thanks!

Paolo

